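Update: the original analysis was wrong and failed on certain classes of test cases, as Eric Zhang pointed out.
I believe this can be solved as a form of topological sort. Your initial word list defines a partial order, or directed graph, over some set of letters. You want to find a substitution that linearizes this letter graph. Let's use one of your non-trivial examples: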
P A R K O V I S T E
P A R A D O N T O Z A
P A D A K
A B B A
A B E C E D A
A B S I N T
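Let x <* y indicate that substitution(x) < substitution(y) for some letters (or words) x and y. We want word1 <* word2 <* word3 <* word4 <* word5 <* word6 overall, but in terms of letters we only need to look at each pair of adjacent words and find the first pair of differing characters in the same column: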
K <* A (from PAR[K]OVISTE <* PAR[A]DONTOZA)
R <* D (from PA[R]ADONTOZA <* PA[D]AK)
P <* A (from [P]ADAK <* [A]BBA)
B <* E (from AB[B]A <* AB[E]CEDA)
E <* S (from AB[E]CEDA <* AB[S]INT)
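The pairwise comparison above can be sketched roughly as follows. The helper name `first_mismatch` is made up for illustration; it only handles the "a mismatch exists" case and returns None otherwise.

```python
def first_mismatch(word1, word2):
    """Return the first pair of differing characters in the same column, or None."""
    for char1, char2 in zip(word1, word2):
        if char1 != char2:
            return char1, char2
    return None  # one word is a prefix of the other, or they are identical

words = ["PARKOVISTE", "PARADONTOZA", "PADAK", "ABBA", "ABECEDA", "ABSINT"]

# Only adjacent words are compared.
constraints = [first_mismatch(w1, w2) for w1, w2 in zip(words, words[1:])]

for left, right in constraints:
    print(left, "<*", right)
```

For this word list every adjacent pair has a mismatch, so the five printed constraints are the same ones listed above.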
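If we don't find any mismatched letters, there are three cases:
- word 1 and word 2 are identical
- word 1 is a prefix of word 2
- word 2 is a prefix of word 1
In cases 1 and 2 the words are already in lexicographic order, so we don't need to perform any substitution (though we may), and they add no extra constraints we have to respect. In case 3, where word 2 is a proper prefix of word 1, no substitution at all can fix the order (think of ["DOGGO", "DOG"]), so there is no possible solution and we can exit early. Otherwise, we build the directed graph corresponding to the partial-order information we gathered and perform a topological sort. If the sorting process indicates that no linearization is possible, then there is no substitution that sorts the word list. Otherwise, you get a result like the following: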
P <* K <* R <* B <* E <* A <* D <* S
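As a rough cross-check of the two failure modes and the ordering step, here is a sketch that leans on the standard library's `graphlib` module (Python 3.9+) instead of a hand-written sort. The helper names `violates_prefix_rule` and `linearize` are assumptions made for this example.

```python
from graphlib import CycleError, TopologicalSorter

def violates_prefix_rule(word1, word2):
    """True when word2 is a proper prefix of word1 (case 3, unsolvable)."""
    return len(word1) > len(word2) and word1.startswith(word2)

def linearize(constraints):
    """Return one letter ordering respecting every (before, after) pair, or None on a cycle."""
    sorter = TopologicalSorter()
    for before, after in constraints:
        # add(node, *predecessors): `before` has to appear ahead of `after`
        sorter.add(after, before)
    try:
        return list(sorter.static_order())
    except CycleError:
        return None

constraints = [("K", "A"), ("R", "D"), ("P", "A"), ("B", "E"), ("E", "S")]
print(linearize(constraints))                # one valid ordering; the exact order may vary
print(linearize([("B", "A"), ("A", "B")]))   # contradictory constraints, as in AB AA AB
print(violates_prefix_rule("DOGGO", "DOG"))
```

The ordering printed here need not match the one shown above; any ordering that respects all five constraints is acceptable.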
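Depending on how you implement the topological sort, you may get a different linear ordering. Now you just need to assign each letter a substitution that respects this ordering and is itself sorted alphabetically. A simple option is to pair the linear ordering with its own alphabetically sorted version and use that as the substitution: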
P <* K <* R <* B <* E <* A <* D <* S
| | | | | | | |
A < B < D < E < K < P < R < S
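To make the pairing concrete, this small sketch applies exactly the mapping above to the example words. The function name `apply_substitution` is invented for the example, and letters that do not appear in the ordering are assumed to stay unchanged.

```python
words = ["PARKOVISTE", "PARADONTOZA", "PADAK", "ABBA", "ABECEDA", "ABSINT"]
ordering = list("PKRBEADS")  # the linear ordering shown above

# Pair the ordering with its alphabetically sorted self: P->A, K->B, R->D, ...
substitution = dict(zip(ordering, sorted(ordering)))

def apply_substitution(word, substitution):
    """Replace each letter that has a mapping; leave unconstrained letters alone."""
    return "".join(substitution.get(ch, ch) for ch in word)

substituted = [apply_substitution(word, substitution) for word in words]

for before, after in zip(words, substituted):
    print(before, "->", after)

print(substituted == sorted(substituted))  # True
```

Because the mapping is a permutation of the constrained letters and every other letter maps to itself, the substitution stays one-to-one.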
However, you could implement a different substitution rule if you prefer.
Here is a proof of concept in Python:
import collections

# a pair of outgoing and incoming edges
Edges = collections.namedtuple('Edges', 'outgoing incoming')
# a mapping from nodes to edges
Graph = lambda: collections.defaultdict(lambda: Edges(set(), set()))

def substitution_sort(words):
    graph = build_graph(words)

    if graph is None:
        return None

    ordering = toposort(graph)

    if ordering is None:
        return None

    # create a substitution that respects `ordering`
    substitutions = dict(zip(ordering, sorted(ordering)))

    # apply substitutions; letters without a constraint are left as they are
    return [
        ''.join(substitutions.get(char, char) for char in word)
        for word in words
    ]

def build_graph(words):
    graph = Graph()

    # loop over every pair of adjacent words and find the first
    # pair of corresponding characters where they differ
    for word1, word2 in zip(words, words[1:]):
        for char1, char2 in zip(word1, word2):
            if char1 != char2:
                break
        else:  # no differing characters found...
            if len(word1) > len(word2):
                # ...but word2 is a prefix of word1 and comes after;
                # therefore, no solution is possible
                return None
            else:
                # ...so no new information to add to the graph
                continue

        # add edge from char1 -> char2 to the graph
        graph[char1].outgoing.add(char2)
        graph[char2].incoming.add(char1)

    return graph

def toposort(graph):
    "Kahn's algorithm; returns None if graph contains a cycle"
    # note: this consumes the edges stored in `graph` as it goes
    result = []
    working_set = {node for node, edges in graph.items() if not edges.incoming}

    while working_set:
        node = working_set.pop()
        result.append(node)
        outgoing = graph[node].outgoing

        while outgoing:
            neighbour = outgoing.pop()
            neighbour_incoming = graph[neighbour].incoming
            neighbour_incoming.remove(node)

            if not neighbour_incoming:
                working_set.add(neighbour)

    # any edge left over belongs to a cycle
    if any(edges.incoming or edges.outgoing for edges in graph.values()):
        return None
    else:
        return result

def print_all(items):
    for item in items:
        print(item)
    print()

def test():
    test_cases = [
        ('PINEAPPLE BANANA ARTICHOKE TOMATO', True),
        ('ABC ABB ABD', True),
        ('AB AA AB', False),
        ('PARKOVISTE PARADONTOZA PADAK ABBA ABECEDA ABSINT', True),
        ('AA AB CA', True),
        ('DOG DOGGO DOG DIG BAT BAD', False),
        ('DOG DOG DOGGO DIG BIG BAD', True),
    ]

    for words, is_sortable in test_cases:
        words = words.split()
        print_all(words)

        subbed = substitution_sort(words)

        if subbed is not None:
            assert subbed == sorted(subbed), subbed
            print_all(subbed)
        else:
            print('<no solution>')
            print()

        print('expected solution?', 'yes' if is_sortable else 'no')
        print()

if __name__ == '__main__':
    test()
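Now, it isn't ideal (for example, it still performs a substitution even when the original list of words is already sorted), but it seems to work. I can't formally prove that it works, though, so if you find a counterexample, please let me know!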
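One possible way around the "already sorted" wrinkle is a thin wrapper that checks the input first. This is only a sketch: `sort_with_shortcut` and its `solver` parameter are hypothetical names, with `solver` standing in for a function such as `substitution_sort`.

```python
def sort_with_shortcut(words, solver):
    """Return the words untouched if they are already sorted, else defer to `solver`."""
    # If every adjacent pair is in order, the whole list is in order.
    if all(a <= b for a, b in zip(words, words[1:])):
        return list(words)
    return solver(words)

def placeholder_solver(words):
    # Hypothetical stand-in for the real substitution routine.
    return None

print(sort_with_shortcut(["ABBA", "ABECEDA", "ABSINT"], placeholder_solver))
print(sort_with_shortcut(["PADAK", "ABBA"], placeholder_solver))
```

The first call returns the list unchanged without ever invoking the solver; the second falls through to it.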
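Thanks a lot for the counterexample and the correction! I couldn't figure it out myself. Is it strictly necessary to compare every word with every other word, or does comparing each word with the next adjacent word work? Is there an example where that would fail?
Yes, exactly! That would be the `O(NL)` time algorithm. The implementation is slightly more complicated, though.
Couldn't you just replace the nested i, j loops with a single loop (`for i in range(len(wordlist) - 1)`, using `w1, w2 = wordlist[i], wordlist[i + 1]`), or am I misunderstanding?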