2017-03-25 90 views
0

Python dictionary and function not working

These are the first 20 rows of my text file; the full file has about 50K rows like this.

prov_type|prov_type_desc 
0|FAMILY PRACTICE/CLINIC 
1|FAMILY PRACTICE 
2|ALLERGIST 
3|DERMATOLOGIST 
4|INTERNIST 
5|NEUROLOGIST 
6|NEUROSURGEON 
7|OB/GYN 
8|OPTHAMOLOGIST 
9|ORTHOPEDIST 
10|OTOLARYNGOLOGIST 
11|PATHOLOGIST 
12|PEDIATRICIAN 
13|PLASTIC SURGEON 
14|COLON AND RECTAL SURGERY 
15|PSYCHIATRIST 
16|RADIOLOGIST 
17|SURGEON 
18|THORACIC SURGEON 
19|UROLOGIST 
20|ANESTHESIOLOGIST 

I read it like this:

import pandas as pd
ovations = pd.read_csv("Ovations.txt", sep='|', dtype=object)
ovations.rename(columns={'prov_type_desc': 'specialty'}, inplace=True)

I wrote a dictionary to map the specialties. Here it is:

options = {'FAMILYPRACTICESELF-REFFERAL' : 'FAMILY PRACTICE', 
'FAMILYPRACTICESPECIALIST' : 'FAMILY PRACTICE', 
'FAMILYPRACTICE/CLINIC' : 'FAMILY PRACTICE', 
'GENERALPRACTICE' : 'FAMILY PRACTICE', 
'ALLERGY' : 'ALLERGIST', 
'ALLERGYANDIMMUNOLOGY' : 'ALLERGIST', 
'ALLERGY&IMMUNOLOGY' : 'ALLERGIST', 
'ALLERGY/IMMUNOLOGY' : 'ALLERGIST', 
'CARDIOLOGY' : 'CARDIOLOGIST', 
'CARDIOLOGYGROUP' : 'CARDIOLOGIST', 
'CARDIOVASCULARDISEASE' : 'CARDIOLOGIST', 
'COLON&RECTALSURGERY' : 'COLON AND RECTAL SURGERY', 
'COLON/RECTALSURGERY' : 'COLON AND RECTAL SURGERY', 
'COLORECTALSURGERY' : 'COLON AND RECTAL SURGERY', 
'DERMATOLOGYGROUP' : 'DERMATOLOGIST', 
'DERMATOLOGY' : 'DERMATOLOGIST', 
'ENDOCRINOLOGY,DIABETES,ANDMETABOLISM' : 'ENDOCRINOLOGIST', 
'ENDOCRINOLOGY' : 'ENDOCRINOLOGIST', 
'ENDODONDIST' : 'ENDODONTICS', 
'GASTROENTEROLOGY' : 'GASTROENTEROLOGIST', 
'GASTROENTEROLOGYGROUP' : 'GASTROENTEROLOGIST', 
'GENETICCOUNSELOR' : 'GENETIC TESTING/COUNSELING CENTER', 
'GENETICS,CLINICAL(MD)' : 'GENETIC TESTING/COUNSELING CENTER', 
'GENETICS,CLINICALMOLECULAR' : 'GENETIC TESTING/COUNSELING CENTER', 
'HEMATOLOGYONCOLOGY' : 'HEMATOLOGY/ONCOLOGY', 
'HEMATOLOGIST' : 'HEMATOLOGY/ONCOLOGY', 
'HEMATOLOGY' : 'HEMATOLOGY/ONCOLOGY', 
'HEMATOLOGYGROUP' : 'HEMATOLOGY/ONCOLOGY', 
'HEMATOLOGY-ONCOLOGY' : 'HEMATOLOGY/ONCOLOGY', 
'HEMATOLOGY-ONCOLOGYGROUP' : 'HEMATOLOGY/ONCOLOGY', 
'HOSPICE&PALLATIVEMED' : 'HOSPICE', 
'HOSPITALOP/LAB/XRAY' : 'HOSPITAL', 
'HOSPITALIST' : 'HOSPITAL', 
'INFECTIOUSDISEASEMEDICINE' : 'INFECTIOUS DISEASE', 
'INTERNALMED' : 'INTERNAL MEDICINE', 
'INTERNALMEDICINESPECIALIST' : 'INTERNAL MEDICINE', 
'INTERNIST' : 'INTERNAL MEDICINE', 
'INFECTIOUSDISEASESEPCIALIST' : 'INFECTIOUS DISEASE', 
'NEPHROLOGY' : 'NEPHROLOGIST', 
'NEUROLOGY' : 'NEUROLOGIST', 
'OBSTETRICS' : 'OBSTETRICS AND GYNECOLOGY', 
'OBSTETRICS&GYNECOLOGY' : 'OBSTETRICS AND GYNECOLOGY', 
'OBSTETRICS/GYNECOLOGY' : 'OBSTETRICS AND GYNECOLOGY', 
'OB/GYNGROUP' : 'OBSTETRICS AND GYNECOLOGY', 
'OBSTETRICSGYNECOLOGY' : 'OBSTETRICS AND GYNECOLOGY', 
'OBGYNECOLOGISTSPECIALTY' : 'OBSTETRICS AND GYNECOLOGY', 
'OB/GYN' : 'OBSTETRICS AND GYNECOLOGY', 
'OB/GYNSELFREFCAP' : 'OBSTETRICS AND GYNECOLOGY', 
'GYNECOLOGY' : 'OBSTETRICS AND GYNECOLOGY', 
'ONCOLOGY' : 'ONCOLOGIST', 
'GYNECOLOGICONCOLOGY' : 'ONCOLOGIST', 
'GYNECOLOGICALONCOLOGY' : 'ONCOLOGIST', 
'GYNECOLOGICAL/ONCOLOGY' : 'ONCOLOGIST', 
'OPHTHALMOLOGY' : 'OPTHAMOLOGIST', 
'OTOLARYNGOLOGY' : 'OTOLARYNGOLOGIST', 
'OTOLARYNGOLOGY(ENT)' : 'OTOLARYNGOLOGIST', 
'PATHOLOGY' : 'PATHOLOGIST', 
'PATHOLOGYSERVICES' : 'PATHOLOGIST', 
'PATHOLOGY,ANATOMIC' : 'PATHOLOGIST', 
'CYTOPATHOLOGY' : 'PATHOLOGIST', 
'PATHOLOGY,ANATOMICAL&CLINICAL' : 'PATHOLOGIST', 
'PATHOLOGY,BLOOD BANKING/TRANSFUSIONMED' : 'PATHOLOGIST', 
'PATHOLOGY,CLINICAL' : 'PATHOLOGIST', 
'PATHOLOGY,CYTOPATHOLOGY' : 'PATHOLOGIST', 
'PATHOLOGY,DERMATOPATHOLOGY' : 'PATHOLOGIST', 
'PATHOLOGY,HEMATOLOGY' : 'PATHOLOGIST', 
'PATHOLOGY,IMMUNOPATHOLOGY' : 'PATHOLOGIST', 
'PATHOLOGY,NEUROPATHOLOGY' : 'PATHOLOGIST', 
'DERMATOLOGY-DERMATOPATHOLOGY' : 'PATHOLOGIST', 
'DERMATOPATHOLOGY' : 'PATHOLOGIST', 
'PEDIATRICMEDICINE' : 'PEDIATRICIAN', 
'PEDIATRSELFREFCAP' : 'PEDIATRICIAN', 
'PEDIATRICSPECIALTYIALIST' : 'PEDIATRICIAN', 
'PEDIATRICS' : 'PEDIATRICIAN', 
'PEDIATRICSSPECIALTYIALIST' : 'PEDIATRICIAN', 
'PLASTICANDRECONSTRUCTIVESURGERY' : 'PLASTIC SURGEON', 
'PLASTICSURGERY' : 'PLASTIC SURGEON', 
'PLASTICSURGERYWITHINTHEHEAD&NECK' : 'PLASTIC SURGEON', 
'PSYCHIATRY' : 'PSYCHIATRIST'} 

To get the value for each key, I wrote this function:

def key_in_dic(p): 
    return next((options[x] for x in p if x in options), 'Other') 
ovations['specialty_adj'] = key_in_dic(list(ovations['specialty'])) 

It does not work as expected. What could be the problem?

Below is what I get. Non-matching keys should return 'Other', but the result is ALLERGIST instead.

Thank you.

+1

Maybe add how it currently works, and highlight what should not have matched? – Dilettant

+1

Updated, please check. – subro

+0

Why not use 'options.get(x, "Other")' to specify a default value for specialties that are not in the dictionary? – Barmar

Answers

1

As Barmar already pointed out, you can use the dictionary's get method. I think the following should give you what you want:

ovations["specialty_adj"] = ovations["specialty"].apply(lambda x: options.get(x, "Other")) 
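
To see why the original function can fill the whole column with one value, here is a minimal sketch on a made-up three-row frame. The sample rows and the shortened options dict are assumptions for illustration, not the real data. next() returns only the first match found in the entire list, and pandas then broadcasts that single string to every row, whereas apply looks up each row separately.

```python
import pandas as pd

# Shortened, illustrative mapping and data (assumptions, not the real file)
options = {"ALLERGY": "ALLERGIST", "NEUROLOGY": "NEUROLOGIST"}
df = pd.DataFrame({"specialty": ["UNKNOWN", "ALLERGY", "NEUROLOGY"]})

# Approach from the question: next() yields ONE value for the whole list
first_match = next((options[x] for x in df["specialty"] if x in options), "Other")

# Assigning a single string to a column broadcasts it to every row
df["broadcast"] = first_match

# Per-row lookup, with a default for keys that are missing
df["specialty_adj"] = df["specialty"].apply(lambda x: options.get(x, "Other"))
```

In this sketch every row of the broadcast column holds the first matching value, while specialty_adj holds one lookup result per row.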
+0

Thanks, it works as expected. – subro

+0

One more thing: I would like to return the 'specialty' value itself if there is no match. Can you suggest how to do that? – subro

+0

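Happy to help. Note that the spelling has to match exactly for the string comparison here: 'options' contains 'FAMILYPRACTICE/CLINIC', but in the dataframe it is written 'FAMILY PRACTICE/CLINIC', with a space. Depending on what you want to achieve, you could try 'options.get(x.replace(" ", ""), "Other")' in the lambda expression. But you should accept this answer as the correct one, – ValD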
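
The two follow-up points above (matching despite spaces, and returning the original specialty when nothing matches) could be combined roughly as follows. This is only a sketch: adjust is a hypothetical helper name, and the two-entry options dict and the sample rows are assumptions made for the example.

```python
import pandas as pd

# Illustrative subset of the mapping and data (assumptions)
options = {
    "FAMILYPRACTICE/CLINIC": "FAMILY PRACTICE",
    "OB/GYN": "OBSTETRICS AND GYNECOLOGY",
}
df = pd.DataFrame({"specialty": ["FAMILY PRACTICE/CLINIC", "OB/GYN", "UROLOGIST"]})

def adjust(specialty):
    # Hypothetical helper: strip spaces so the value lines up with the dict keys
    key = specialty.replace(" ", "")
    # Fall back to the original, unmodified specialty when there is no match
    return options.get(key, specialty)

df["specialty_adj"] = df["specialty"].apply(adjust)
```

One caveat: a key that itself contains a space, such as 'PATHOLOGY,BLOOD BANKING/TRANSFUSIONMED' in the full dictionary, would never match after the spaces are stripped from the input, so the keys would need the same normalization.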

1

Use the dict.get() method to specify a default value for when the key is not found.

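def key_in_dict(p):
    # dict.get() takes the default positionally; it does not accept default= as a keyword
    # Note: this returns a generator; wrap the call in list() to get the values
    return (options.get(x, 'Other') for x in p)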
+1

It gives me '<generator object key_in_dic. – subro
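
The generator object shows up because a parenthesized comprehension is lazy: the function hands back a generator rather than the looked-up values. A minimal sketch of one way around it, using a one-entry options dict and a made-up input list as assumptions, is to materialize the generator with list() (a list comprehension inside the function would work as well):

```python
# Illustrative one-entry mapping (assumption)
options = {"ALLERGY": "ALLERGIST"}

def key_in_dict(p):
    # Generator expression: nothing is looked up until it is iterated
    return (options.get(x, "Other") for x in p)

gen = key_in_dict(["ALLERGY", "UROLOGY"])

# list() consumes the generator and produces the actual values
values = list(gen)
```

The resulting list has one entry per input item, so it can be assigned to a dataframe column of the same length.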