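Extracting strings from a pandas DataFrame

I'm here again hoping to find a solution to my coding nightmare. I have a dictionary, term_dict, that holds a list of terms as keys and the term categories as values. I also have a DataFrame, data, with ID and Notes columns. The task is to use term_dict to find matches in data.Notes for each data.ID record.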
    term_dict = {
        "Ibuprofen 800mg": "Drug",
        "Hip Replacement Surgery": "Treatment",
        "Tylenol AM": "Drug",
        "Mild Dislocation": "Treatment",
        "Advil": "Drug",
        "Fractured Tibia": "Treatment",
        "Quinone": "Drug",
        "Fever": "Treatment",
        "Penicillin 250mg": "Drug",
        "Hysterectomy": "Treatment",
        "Surgical removal of bunion": "Treatment",
        "Therapy": "Treatment",
        "Bunion": "Treatment",
        "Hospital X": "Location",
        "mg": "Dosage",
        "stop": "Exclusion",
    }
data:

    ID   Notes
    604  Take 2 tablets of advil & 3 caps of pen
         250mg twice daily
    602  Stop pen but cont. with advil
         as needed for the fracture
    210  2 tabs of Tyl 3x daily for 5 days
    607  nan
    700  surgery scheduled for 01/01/2017
    515  nan
    019  Call my office if bunion pain persist
         after 3 days
    604  f/up appt. @Hospital X
Here is my code so far:
    import re

    import datefinder  # third-party package used to find dates in text

    lists = []
    for s in data['Notes']:
        # Lowercase each note and replace special characters with spaces
        cleanNotes = " " + " ".join(re.split(r'[^a-z 0-9]|[w/]', s.lower())) + " "
        for k, v in term_dict.items():
            # Pad the term with spaces so only whole words/phrases match
            k = " %s " % k
            if k in cleanNotes and v != 'exclusion':
                if k in cleanNotes and v == 'drug':
                    lists.append(k)
                    data['Drug'] = ':'.join(str(lists))
                elif k in cleanNotes and v == 'location':
                    lists.append(k)
                    data['Location'] = ' '.join(str(lists))
                elif k in cleanNotes and v == 'treatment':
                    lists.append(k)
                    data['Treatment'] = ':'.join(str(lists))
                elif k in cleanNotes and v == 'dosage':
                    lists.append(k)
                    data['Dosage'] = ':'.join(str(lists))
            else:
                # No term matched: look for dates instead
                for s in data.Notes:
                    matches = list(datefinder.find_dates(s.lower()))
                    data['Date'] = ', '.join([str(dates) for dates in matches])
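One thing that may be worth checking in the cleaning step: inside a character class, `[w/]` matches a single `w` or a single `/`, not the two-character sequence `w/`, so every `w` in a note is treated as a separator. Here is a minimal sketch of the difference, assuming the intent was to drop the abbreviation `w/` (the helper names `clean_original` and `clean_alt` are made up for illustration and are not part of the code above):

```python
import re

def clean_original(note):
    # Same expression as in the loop above: [w/] is a character class,
    # so the text is split on every "w" and on every "/".
    return " " + " ".join(re.split(r'[^a-z 0-9]|[w/]', note.lower())) + " "

def clean_alt(note):
    # Assumed intent: replace the literal "w/" and any other special
    # character with a space, then collapse runs of whitespace.
    text = re.sub(r'w/|[^a-z0-9 ]', ' ', note.lower())
    return " " + " ".join(text.split()) + " "

print(repr(clean_original("250mg twice daily")))  # ' 250mg t ice daily '
print(repr(clean_alt("250mg twice daily")))       # ' 250mg twice daily '
print(repr(clean_alt("cont w/advil")))            # ' cont advil '
```

With the original pattern, a term containing the letter `w` could never match a cleaned note, since the letter is removed before the comparison.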
My output is nothing like what I expected, because the code just fills the new columns of the DataFrame with the matches from the last record:
data:

    ID   Notes                                    Drug              Dosage   Location    Treatment  Date
    604  Take 2 tablets of advil & 3 caps of pen  advil                      Hospital X
         250mg twice daily
    602  Stop pen but cont. with advil            advil                      Hospital X
         as needed for the fracture
    210  2 tabs of Tyl 3x daily for 5 days        advil
    607  nan                                      advil
    700  surgery scheduled for 01/01/2017         advil
    515  nan                                      advil
    019  Call my office if bunion pain persist    advil
         after 3 days
    604  f/up appt. @Hospital X. cont w/advil     advil                      Hospital X
But the expected output is:
data:

    ID   Notes                                    Drug              Dosage   Location    Treatment  Date
    604  Take 2 tablets of advil & 3 caps of pen  advil:penicillin  0:250mg
         250mg twice daily
    602  Stop pen but cont. with advil            advil                                  fracture
         as needed for the fracture
    210  2 tabs of Tyl 3x daily for 5 days        Tylenol
    607  nan
    700  surgery scheduled for 01/01/2017                                                surgery    01/01/2017
    515  nan
    019  Call my office if bunion pain persist                                           bunion
         after 3 days
    604  f/up appt. @Hospital X. cont w/advil     advil                      Hospital X
I would really appreciate any help fixing this iteration. Thanks!
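For anyone reading along, two details in the loop could contribute to the repeated values. `str(lists)` turns the whole list into a single string, so `':'.join(...)` joins its characters rather than its items. And because `lists` is created once, outside the loop, it keeps every match from earlier notes. Separately, assigning a single string to a column such as `data['Drug']` sets that value on every row. Below is a small sketch of the first two points using plain lists only (no pandas; the sample notes are invented for illustration):

```python
matches = ["advil", "tylenol am"]

# str() on a list gives its repr; join then walks it character by character.
joined_wrong = ":".join(str(["advil"]))
# Joining the list itself yields the terms.
joined_right = ":".join(matches)

print(joined_wrong)  # [:':a:d:v:i:l:':]
print(joined_right)  # advil:tylenol am

sample_notes = ["take advil", "nothing here"]

# A list created once, outside the loop, accumulates across notes...
shared = []
per_note_shared = []
for note in sample_notes:
    if "advil" in note:
        shared.append("advil")
    per_note_shared.append(list(shared))

# ...while a list created inside the loop starts empty for each note.
per_note_fresh = []
for note in sample_notes:
    found = []
    if "advil" in note:
        found.append("advil")
    per_note_fresh.append(found)

print(per_note_shared)  # [['advil'], ['advil']]
print(per_note_fresh)   # [['advil'], []]
```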
What is "words"? Why are you using it? –
@COLDSPEED - It is the clean version of each note in the Notes column. Clean, meaning free of any/all special characters – CodeLearner
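One possible direction, offered as a rough sketch rather than a definitive fix: compute the matches for each note separately and return them per row, instead of assigning to a whole column inside the loop. The names `sample_terms`, `clean`, and `extract_terms` are hypothetical, the cleaning pattern assumes `w/` was meant as a literal abbreviation, and matching is case-insensitive on exact phrases only:

```python
import re

# A few entries copied from term_dict; the full dictionary works the same way.
sample_terms = {
    "Advil": "Drug",
    "Bunion": "Treatment",
    "Hospital X": "Location",
    "stop": "Exclusion",
}

def clean(note):
    # Lowercase, replace "w/" and special characters with spaces, collapse
    # whitespace, and pad with spaces so whole-phrase checks work at the edges.
    text = re.sub(r"w/|[^a-z0-9 ]", " ", note.lower())
    return " " + " ".join(text.split()) + " "

def extract_terms(note, terms):
    """Return {category: "term:term"} for a single note (exact phrases only)."""
    if not isinstance(note, str):      # e.g. a float NaN coming from pandas
        return {}
    cleaned = clean(note)
    found = {}                         # rebuilt for every note
    for term, category in terms.items():
        if category.lower() == "exclusion":
            continue                   # skip excluded terms, as in the question
        if " %s " % term.lower() in cleaned:
            found.setdefault(category, []).append(term)
    return {category: ":".join(items) for category, items in found.items()}

notes = [
    "f/up appt. @Hospital X. cont w/advil",
    "Call my office if bunion pain persist after 3 days",
    "nan",
]
results = [extract_terms(note, sample_terms) for note in notes]
for note, result in zip(notes, results):
    print(note, "->", result)
```

With pandas, something along the lines of `data['Notes'].apply(lambda n: extract_terms(n, term_dict))` should give one dict per row, which could then be expanded into the Drug, Location, and Treatment columns. This sketch will not produce abbreviation matches such as "pen" or "Tyl" from the expected output; those would need an alias table or fuzzy matching, and dates would still need a separate step.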