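I have 2 CSV files. I want every element in list A to be matched against every element in list B. List A acts as the training set; list B contains errors, which get corrected once a match is found using edit distance. The two columns in the CSV file are being read as a single column. Python 2.7
The problem is that B has two columns: the first column holds a unique number and the second holds the string to be fixed.
The output I currently get is: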
628227teitARMTEteke : iQIARMTEMAC
628226iQIARMTEMAC 9 : iQIARMTEMAC
628229iQIAConfigCH : iQIAConfigCH
627701iQIAConfigCH : iQIAConfigCH
but I want my output to be:
628227 : teitARMTEteke : iQIARMTEMAC
628226 : iQIARMTEMAC 9 : iQIARMTEMAC
628229 : iQIAConfigCH : iQIAConfigCH
627701 : iQIAConfigCH : iQIAConfigCH
CODE
import csv
from nltk.metrics import distance

with open("all_correct_promo.csv", "rb") as file1:
    reader1 = csv.reader(file1)
    # Every row's columns are joined into one string
    correctPromoList = [''.join(i) for i in reader1]
# print correctPromoList

with open("all_extracted_promo3.csv", "rb") as file2:
    reader2 = csv.reader(file2)
    # The join fuses the number and the string, e.g. '628227' + 'teitARMTEteke'
    extractedPromoList = [''.join(i) for i in reader2]
# print extractedPromoList

incorrectPromo = {}
count = 0
for extracted in extractedPromoList:
    # print 'Computing %dth promo code...' % count
    # find_min_edit is defined elsewhere (not shown here)
    incorrectPromo[extracted] = find_min_edit(extracted, correctPromoList)  # get comma separated str of real promo codes nearest to extracted
    count += 1
# print incorrectPromo

for key, value in incorrectPromo.iteritems():
    print key, ':', value
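The code above calls find_min_edit, whose body is not shown. For readers following along, here is a minimal sketch of what such a helper might look like, based only on the comment that it returns a comma-separated string of the nearest real promo codes. The body is an assumption, not the asker's actual function. The pure-Python edit_distance stands in for nltk's distance.edit_distance so the sketch runs without nltk installed.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        current = [i]
        for j, ch_b in enumerate(b, 1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def find_min_edit(extracted, correct_list):
    """Hypothetical helper: comma-separated codes at the minimum edit distance."""
    if not correct_list:
        return ''
    distances = [(edit_distance(extracted, c), c) for c in correct_list]
    best = min(d for d, _ in distances)
    # Ties are kept, in list order, and joined with commas
    return ','.join(c for d, c in distances if d == best)
```

With this sketch, find_min_edit("iQIARMTEMAC 9", ["iQIARMTEMAC", "iQIAConfigCH"]) picks "iQIARMTEMAC", since only the trailing " 9" has to be removed.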
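Right now the unique number is read together with the string, which affects how the string gets corrected. I want to keep the number with the string, but without it affecting how the string is matched against the entries from all_correct_promo.csv.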
Sample of all_extracted_promo3.csv:
628229 iQIABundUPGR
628229 iQIAPortUPGR
628229 iQIAConfigCH
628229 iQIARMTEMAC 9
Sample of list A (the strings to match against):
iQ BundleUPGR
IQ MANAGED
IQ04 BRP
IQ1MOBILSUP
IQ2MOBILSUP
iQBundIeUPGR
iQBundle 1
iQBundle 2
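One possible way to get the desired "number : extracted : corrected" output is to stop joining the two columns and match only the second one. The sketch below is a rough illustration under stated assumptions: the file is comma-separated with the number in the first column, the names correct_rows and nearest are hypothetical, and difflib.get_close_matches from the standard library is used as a stand-in matcher (it ranks by a similarity ratio, not by edit distance). Any edit-distance matcher with the same arguments could be passed in instead.

```python
import csv
import difflib


def nearest(code, correct_codes):
    """Stand-in matcher: most similar correct code according to difflib."""
    matches = difflib.get_close_matches(code, correct_codes, n=1, cutoff=0.0)
    return matches[0] if matches else ''


def correct_rows(lines, correct_codes, matcher=nearest):
    """Return (number, extracted, corrected) tuples; only the code column is matched."""
    results = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        number, extracted = row[0].strip(), row[1].strip()
        results.append((number, extracted, matcher(extracted, correct_codes)))
    return results


# Assumed file content: comma-separated, number first, then the extracted code
sample_lines = ["628226,iQIARMTEMAC 9", "628229,iQIAConfigCH"]
correct_codes = ["iQIARMTEMAC", "iQIAConfigCH"]
for number, extracted, corrected in correct_rows(sample_lines, correct_codes):
    print("%s : %s : %s" % (number, extracted, corrected))
```

The results are kept in a list, not in a dict keyed by number, because the same number (628229 in the sample) can appear on several rows and a dict would keep only one of them.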
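What is list A? –
all_correct_promo.csv - that is list A – safwan
I'm a bit confused. The strings with numbers come from 'all_correct_promo.csv', and you want the distance computed on the strings without the numbers? –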