2013-07-02
import regex,re 


sequence = 'aaaaaaaaaaaabbbbbbbbbbbbcccccccccccc' #being searched 
query = 'aaabbbbbbbbbbbbccc' #100% coverage 
query_1 = 'aaaabbbbbbbbcbbbcccc' #95% coverage 
query_2 = 'aaabbbbcbbbbbcbccc' #90% coverage 

threshold = .95 
error = len(query_1) - (len(query_1)*threshold) #for query_1 errors must be <= 1 

print regex.search(query_1 + '{e<={}}'.format(error),sequence).group(0) 

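I'm trying to add an extra parameter to the regex search so that a query only matches if a certain percentage of it is found in the sequence being searched. How do I add a variable error count to a regex fuzzy search in Python?

For example, if I require at least 95% coverage, it should work for query_1 but not for query_2.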
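The error budget in the code above is a float (len(query_1) - len(query_1)*threshold), which can land just below a whole number because of binary rounding; truncating it with int() would then silently drop an allowed error. A minimal sketch, assuming "at least 95% coverage" means "at most floor(len(query) × 5%) edits", that does the arithmetic exactly with fractions.Fraction. max_errors is a hypothetical helper name, not part of the regex module.

```python
from fractions import Fraction
import math


def max_errors(length, threshold):
    """Hypothetical helper: largest whole number of edits that still
    leaves at least `threshold` of a `length`-character query matched."""
    # Passing the threshold as a string makes Fraction('0.95') exactly 19/20,
    # so the product below is exact and floor() cannot be off by one.
    return math.floor(length * (1 - Fraction(threshold)))


query_1 = 'aaaabbbbbbbbcbbbcccc'  # 20 characters, 1 substitution needed
query_2 = 'aaabbbbcbbbbbcbccc'    # 18 characters, 2 substitutions needed

for name, q in (('query_1', query_1), ('query_2', query_2)):
    print(name, len(q), max_errors(len(q), '0.95'))
```

For query_1 this allows 20 × 1/20 = 1 edit; for query_2, 18 × 1/20 = 0.9 rounds down to 0, which is why a 95% threshold should accept query_1 and reject query_2.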


The fuzzy matching feature of the [regex module](https://pypi.python.org/pypi/regex) may be what you're looking for.

Answer


Working version using the regex module:

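import regex  # third-party module: pip install regex

sequence = 'aaaaaaaaaaaabbbbbbbbbbbbcccccccccccc'  # being searched
query = 'aaabbbbbbbbbbbbccc'      # 100% coverage
query_1 = 'aaaabbbbbbbbcbbbcccc'  # 95% coverage
query_2 = 'aaabbbbcbbbbbcbccc'    # 90% coverage
threshold = 0.97
queries = (query, query_1, query_2)
for q in queries:
    # Allowed number of errors for this query, truncated to a whole number
    error = int(len(q) - (len(q) * threshold))
    # Grouping the query makes {e<=N} apply to the whole query, not just its
    # last character; regex.escape guards against metacharacters in q
    m = regex.search(r'(%s){e<=%d}' % (regex.escape(q), error), sequence)
    print('match' if m else 'nomatch')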
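To check the numbers without the third-party module, the same yes/no decision can be sketched with the standard library alone. This swaps in a different technique, Sellers' approximate substring matching (a Levenshtein dynamic program whose match may start and end anywhere in the sequence), rather than anything the regex module does internally. min_substring_edits and passes are hypothetical names, and the threshold here is the question's 0.95 rather than 0.97. Like {e<=N}, it counts substitutions, insertions and deletions together.

```python
from fractions import Fraction
import math


def min_substring_edits(query, sequence):
    """Fewest substitutions, insertions and deletions needed to turn
    `query` into some substring of `sequence` (Sellers' algorithm)."""
    # prev[j] = cost of aligning the query prefix seen so far with a
    # substring of `sequence` that ends just before index j.
    # Row 0 is all zeros: the match may start anywhere for free.
    prev = [0] * (len(sequence) + 1)
    for i, qc in enumerate(query, start=1):
        cur = [i] + [0] * len(sequence)  # empty substring: drop all i chars
        for j, sc in enumerate(sequence, start=1):
            cur[j] = min(prev[j - 1] + (qc != sc),  # match or substitution
                         prev[j] + 1,               # query char deleted
                         cur[j - 1] + 1)            # sequence char inserted
        prev = cur
    # The match may also end anywhere, so take the best final cell.
    return min(prev)


def passes(query, sequence, threshold):
    """True if `query` fits somewhere in `sequence` within the error budget."""
    allowed = math.floor(len(query) * (1 - Fraction(threshold)))
    return min_substring_edits(query, sequence) <= allowed


sequence = 'aaaaaaaaaaaabbbbbbbbbbbbcccccccccccc'
queries = ('aaabbbbbbbbbbbbccc', 'aaaabbbbbbbbcbbbcccc', 'aaabbbbcbbbbbcbccc')
for q in queries:
    verdict = 'match' if passes(q, sequence, '0.95') else 'nomatch'
    print(min_substring_edits(q, sequence), verdict)
```

This reports 0, 1 and 2 edits for query, query_1 and query_2, so at 95% only query_2 is rejected. The table is len(query) × len(sequence) cells, which is fine for short queries but slow for long sequences.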

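What is it called when you add (%s)(%d) % (variable1, variable2)? I'd like to look at the docs for it, because I've seen it before. @perreal

It's called old-style string formatting: http://docs.python.org/2/library/stdtypes.html#string-formatting – perreal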
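The question's attempt fails at the string-building step: in str.format, literal braces must be doubled as {{ and }}, so '{e<={}}' is read as a replacement field and raises an exception instead of producing {e<=1}. A minimal sketch comparing the old-style % formatting from the answer with an escaped str.format call and an f-string (f-strings need Python 3.6+); the variable names are illustrative only. The question's pattern also lacks the parentheses, so even a correctly built query_1 + '{e<=1}' would make only the final character fuzzy.

```python
q = 'aaaabbbbbbbbcbbbcccc'  # query_1 from the question
error = 1

# Old-style (printf-style) formatting, as used in the answer
old_style = r'(%s){e<=%d}' % (q, error)

# str.format: '{{' and '}}' are escapes for literal braces
new_style = '({}){{e<={}}}'.format(q, error)

# f-string (Python 3.6+): same brace-doubling rule
f_style = f'({q}){{e<={error}}}'

print(old_style)  # (aaaabbbbbbbbcbbbcccc){e<=1}

# The question's unescaped template is not a valid format string; the
# exception type can differ between Python versions, so both are caught.
try:
    (q + '{e<={}}').format(error)
    template_ok = True
except (KeyError, ValueError):
    template_ok = False
print('unescaped template usable:', template_ok)
```

All three working variants build the same pattern string, so the choice between them is mostly style and Python version.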