You could look into turning your list of names into one regular expression. Take for example this small list of names:
names = ['AARON',
         'ABDUL',
         'ABE',
         'ABEL',
         'ABRAHAM',
         'ABRAM',
         'ADALBERTO',
         'ADAM',
         'ADAN',
         'ADOLFO',
         'ADOLPH',
         'ADRIAN',
         ]
That could be represented with the following regular expression:
\b(?:AARON|ABDUL|ABE|ABEL|ABRAHAM|ABRAM|ADALBERTO|ADAM|ADAN|ADOLFO|ADOLPH|ADRIAN)\b
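But that would not be very efficient, because the regex engine may have to try every alternative in turn at each candidate position. A regular expression that is built like a tree will work better, since names sharing a prefix share the work of matching it: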
\b(?:A(?:B(?:E(?:|L)|RA(?:M|HAM)|DUL)|D(?:A(?:M|N|LBERTO)|OL(?:FO|PH)|RIAN)|ARON))\b
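As a quick sanity check that the two patterns accept the same words, here is a minimal sketch. The helper `matches` and the sample sentence are made up for illustration; the tree-shaped pattern is copied from the line above, and the flat one is rebuilt from the name list.

```python
import re

names = ['AARON', 'ABDUL', 'ABE', 'ABEL', 'ABRAHAM', 'ABRAM',
         'ADALBERTO', 'ADAM', 'ADAN', 'ADOLFO', 'ADOLPH', 'ADRIAN']

# Flat alternation: one branch per name.
flat = re.compile(r'\b(?:' + '|'.join(map(re.escape, names)) + r')\b')

# Tree-shaped pattern, copied from the text above (split over two literals).
tree_shaped = re.compile(
    r'\b(?:A(?:B(?:E(?:|L)|RA(?:M|HAM)|DUL)'
    r'|D(?:A(?:M|N|LBERTO)|OL(?:FO|PH)|RIAN)|ARON))\b'
)

def matches(pattern, text):
    """Return every whole-word hit of pattern in text (hypothetical helper)."""
    return [m.group(0) for m in pattern.finditer(text)]

sample = 'ABE met ABEL, ADOLPH and ABEX near ABRAHAM'
print(matches(flat, sample))
print(matches(tree_shaped, sample))
```

Both calls should list the same names, and `ABEX` should be rejected by both because of the trailing `\b`. This only checks equivalence; how much faster the tree form is depends on the size of the list and on the text, so it is worth timing on your own data.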
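You could then automate the production of this regular expression, by first creating a dict-tree structure from the list of names and then translating that tree into a regular expression. For the above example, that intermediate tree would look like this: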
{
    'A': {
        'A': {
            'R': {
                'O': {
                    'N': {
                        '': {}
                    }
                }
            }
        },
        'B': {
            'D': {
                'U': {
                    'L': {
                        '': {}
                    }
                }
            },
            'E': {
                '': {},
                'L': {
                    '': {}
                }
            },
            ... etc
...which could optionally be simplified to this:
{
    'A': {
        'ARON': {
            '': {}
        },
        'B': {
            'DUL': {
                '': {}
            },
            'E': {
                '': {},
                'L': {
                    '': {}
                }
            },
            'RA': {
                'HAM': {
                    '': {}
                },
                'M': {
                    '': {}
                }
            }
        },
        ... etc
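To make the shape of the unsimplified tree concrete, here is a minimal sketch that builds it for a few names. `build_trie` is a hypothetical helper written for this illustration (an iterative variant using `dict.setdefault`); the empty-string key marks the end of a name, as in the trees shown above.

```python
def build_trie(names):
    """Build a nested-dict trie; the '' key marks the end of a name."""
    root = {}
    for name in names:
        node = root
        for letter in name:
            # Reuse the child for this letter if present, else create it.
            node = node.setdefault(letter, {})
        node[''] = {}
    return root

trie = build_trie(['ABEL', 'ABE', 'ADAM'])
print(trie)
```

Because the end marker is set after walking the whole name, the order of the input does not matter: `ABE` is still recorded as a complete name even though `ABEL` was added first.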
Here is the suggested code for doing this:
import re

def addToTree(tree, name):
    # An exhausted name ends at this node: mark it as terminal, so that a name
    # which is a prefix of an earlier one (e.g. ABE added after ABEL) is kept
    if len(name) == 0:
        tree[''] = {}
        return
    if name[0] in tree:
        addToTree(tree[name[0]], name[1:])
    else:
        # No shared prefix left: add the remaining letters as a new branch
        for letter in name:
            tree[letter] = {}
            tree = tree[letter]
        tree[''] = {}

# Optional improvement of the tree: it combines several consecutive letters into
# one key if there are no alternatives possible
def simplifyTree(tree):
    repeat = True
    while repeat:
        repeat = False
        # Iterate over a copy, because keys are replaced inside the loop
        for key, subtree in list(tree.items()):
            if key != '' and len(subtree) == 1 and '' not in subtree:
                for letter, subsubtree in subtree.items():
                    tree[key + letter] = subsubtree
                del tree[key]
                repeat = True
    for key, subtree in tree.items():
        if key != '':
            simplifyTree(subtree)

def treeToRegExp(tree):
    # Each key becomes an escaped literal followed by the regex of its subtree
    regexp = [re.escape(key) + treeToRegExp(subtree) for key, subtree in tree.items()]
    regexp = '|'.join(regexp)
    return '' if regexp == '' else '(?:' + regexp + ')'

def listToRegExp(names):
    tree = {}
    for name in names:
        if name:  # skip empty strings, they would match at every word boundary
            addToTree(tree, name)
    simplifyTree(tree)
    return re.compile(r'\b' + treeToRegExp(tree) + r'\b', re.I)

# Demo
names = ['AARON',
         'ABDUL',
         'ABE',
         'ABEL',
         'ABRAHAM',
         'ABRAM',
         'ADALBERTO',
         'ADAM',
         'ADAN',
         'ADOLFO',
         'ADOLPH',
         'ADRIAN',
         ]
fields = [
    'This is Aaron speaking',
    'Is Abex a name?',
    'Where did Abraham get the mustard from?'
]
regexp = listToRegExp(names)
# get the search result for each field, and link it with the index of the field
results = [[i, regexp.search(field)] for i, field in enumerate(fields)]
# remove non-matches from the results
results = [[i, match.group(0)] for [i, match] in results if match]
# print results
print(results)
See it run on repl.it
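The demo uses `search`, which reports only the first hit per field. If you need every name in a text, `finditer` on the same kind of compiled pattern is one option. A minimal sketch, assuming the tree-shaped pattern from earlier in this answer is hard-coded (so the block runs on its own) and using a made-up helper name `find_all_names`:

```python
import re

# Tree-shaped pattern copied from the answer, compiled case-insensitively.
pattern = re.compile(
    r'\b(?:A(?:B(?:E(?:|L)|RA(?:M|HAM)|DUL)'
    r'|D(?:A(?:M|N|LBERTO)|OL(?:FO|PH)|RIAN)|ARON))\b',
    re.I,
)

def find_all_names(text):
    """Return (start offset, matched text) for every name found in text."""
    return [(m.start(), m.group(0)) for m in pattern.finditer(text)]

hits = find_all_names('Adam asked Abel whether Abex knew Adrian')
print(hits)
```

The offsets make it easy to highlight or replace the hits afterwards; `Abex` is skipped because the closing `\b` requires the name to end at a word boundary.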
Possible duplicate of [Python - Fastest way to check if a string contains specific characters in any of the items in a list](https://stackoverflow.com/questions/14411633/python-fastest-way-to-check-if-a-string-contains-specific-characters-in-any) – Shubham