How to find character offsets in text using Python

My goal is to identify matching strings in two aligned text documents, and then find the position of the starting character of each matching string in each document.

doc1=['the boy is sleeping', 'in the class', 'not at home'] 
doc2=['the girl is reading', 'in the class', 'a serious student']

I tried:

# find matching string(s) that exist in both document list: 
matchstring=[x for x in doc1 if x in doc2] 
# Output: matchstring = ['in the class']

The problem now is to find the character offset of the matching string in doc1 and doc2 (punctuation excluded, spaces included).

Desired result:

Position of starting character for matching string in doc1=20 
Position of starting character for matching string in doc2=20

Any ideas on text alignment? Thanks.

Source

2014-03-02 Tiger1

Why do I get 19 instead of 21? – zhangxaochen

Hi @zhangxaochen, you stopped counting at the 'g' in 'sleeping' instead of at the character 'i' in 'in the class'. – Tiger1

The length of 'the boy is sleeping' is 19, so 'i' is the 20th character, which sits at index 19 if indexing starts from 0. – zhangxaochen
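To make the counting in these comments concrete, here is a small sketch. It assumes the segments are concatenated with no separator between them, which is what the desired value of 20 implies; the variable names are only illustrative.

```python
doc1 = ['the boy is sleeping', 'in the class', 'not at home']

first = doc1[0]
# Assumption: the document is the segments joined with no separator.
joined = ''.join(doc1)

# Length of the first segment, spaces included.
print(len(first))

# str.find returns a 0-based index, or -1 if the substring is absent.
zero_based = joined.find('in the class')
print(zero_based)

# Counting characters from 1 instead of 0 shifts the result by one.
one_based = zero_based + 1
print(one_based)
```

So the 0-based index and the 1-based position differ by exactly one, and which of the two you report depends only on the counting convention you want.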

Hi man, try this:

doc1=['the boy is sleeping', 'in the class', 'not at home'] 
doc2=['the girl is reading', 'in the class', 'a serious student'] 

temp=''.join(list(set(doc1) & set(doc2))) 
resultDoc1 = ''.join(doc1).find(temp) 
resultDoc2 = ''.join(doc2).find(temp) 

# find() is 0-based, so add 1 to report a 1-based character position
print("Position of starting character for matching string in doc1=%d" % (resultDoc1 + 1))
print("Position of starting character for matching string in doc2=%d" % (resultDoc2 + 1))

It works exactly as you expect!
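One caveat with the snippet above: if the two documents share more than one segment, joining the set intersection glues the matches together in arbitrary order, and find() will most likely return -1. A possible sketch that handles any number of shared segments is shown here. The helper names segment_offsets and match_offsets are made up for illustration, and the code assumes the same convention as above (segments concatenated with no separator, positions counted from 1).

```python
def segment_offsets(doc):
    """Map each segment to its 1-based start position in ''.join(doc)."""
    offsets = {}
    position = 1
    for segment in doc:
        # Keep only the first occurrence if a segment repeats.
        offsets.setdefault(segment, position)
        position += len(segment)
    return offsets


def match_offsets(doc_a, doc_b):
    """Return (segment, start_in_a, start_in_b) for segments in both docs."""
    off_a = segment_offsets(doc_a)
    off_b = segment_offsets(doc_b)
    return [(seg, off_a[seg], off_b[seg]) for seg in off_a if seg in off_b]


doc1 = ['the boy is sleeping', 'in the class', 'not at home']
doc2 = ['the girl is reading', 'in the class', 'a serious student']

for seg, start1, start2 in match_offsets(doc1, doc2):
    print("%r starts at %d in doc1 and at %d in doc2" % (seg, start1, start2))
```

Summing segment lengths instead of searching the joined string also avoids a false hit when a segment's text happens to appear earlier across a segment boundary.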

Source

2014-03-02 19:40:05

Al Mamun, thanks for your solution. As you said, it works perfectly. – Tiger1

Accept the answer and upvote, man :) –

@Al Mamum, I was still hoping to get an answer in two lines of code. – Tiger1
