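In your original code, the line
dic[country]= dic[country]+1
should raise a KeyError, because the key is not yet in the dictionary the first time a country is encountered. Instead, you should check whether the key is present and, if it is not, initialize its value to 1.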
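A minimal sketch of that check, using a made-up word list (the `words` list is a hypothetical stand-in, not the data from the question):

```python
# Hypothetical input; the real code iterates over the words of the text.
words = ["Finland", "Japan", "Finland"]

dic = {}
for country in words:
    if country in dic:
        dic[country] += 1  # key already present: increment the count
    else:
        dic[country] = 1   # first occurrence: initialize to 1

print(dic)
```

The same effect can be had in one line with dic[country] = dic.get(country, 0) + 1.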
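On the other hand, it will not raise, because the check
if country in country_codes['English short name lower case']:
yields False for all the values: a Series object's __contains__ works with the index instead of the values. You should, for example, check
if country in country_codes['English short name lower case'].values:
if your list of values is short, since membership testing on an array is a linear scan.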
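To illustrate the difference, here is a small sketch with a two-element Series standing in for the country name column (the `names` Series and its contents are assumptions made for the example):

```python
import pandas as pd

# Hypothetical stand-in for country_codes['English short name lower case'].
names = pd.Series(["Finland", "Japan"], name="English short name lower case")

in_index = "Finland" in names          # tests the index labels (0 and 1)
in_values = "Finland" in names.values  # tests the underlying array of values
label_hit = 0 in names                 # 0 is an index label, so this matches

print(in_index, in_values, label_hit)
```

With the default integer index, a country name is never one of the labels, which is why the original check never succeeded.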
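For general counting tasks, Python provides collections.Counter, which acts a bit like a defaultdict(int) but brings added benefits. It removes the need for manual checking of keys and the like.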
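A short sketch of that comparison, again on a made-up word list (`words` is hypothetical):

```python
from collections import Counter, defaultdict

words = ["Finland", "Japan", "Finland"]  # hypothetical input

counts = Counter()
for w in words:
    counts[w] += 1  # a missing key counts as 0, so no KeyError

dd = defaultdict(int)
for w in words:
    dd[w] += 1      # same loop works with defaultdict(int)

same = dict(counts) == dict(dd)
top = counts.most_common(1)   # one of the extras Counter adds
missing = counts["Russia"]    # 0, and the key is not inserted by the lookup

print(same, top, missing)
```

Unlike defaultdict(int), looking up a missing key in a Counter does not add that key to the mapping.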
As you already have DataFrame objects, you could use the tools that pandas provides:
In [12]: country_codes = pd.read_csv('wikipedia-iso-country-codes.csv')
In [13]: text = pd.DataFrame({'SomeText': """Finland , Finland , Finland
...: The country where I want to be
...: Pony trekking or camping or just watch T.V.
...: Finland , Finland , Finland
...: It's the country for me
...:
...: You're so near to Russia
...: so far away from Japan
...: Quite a long way from Cairo
...: lots of miles from Vietnam
...:
...: Finland , Finland , Finland
...: The country where I want to be
...: Eating breakfast or dinner
...: or snack lunch in the hall
...: Finland , Finland , Finland
...: Finland has it all
...:
...: Read more: Monty Python - Finland Lyrics | MetroLyrics
...: """.split()})
In [14]: text[text['SomeText'].isin(
...:     country_codes['English short name lower case']
...: )]['SomeText'].value_counts().to_dict()
...:
Out[14]: {'Finland': 14, 'Japan': 1}
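This finds the rows of text where the value of the SomeText column is in the English short name column of country_codes, counts the unique values of SomeText, and converts the result to a dictionary. The same with descriptive intermediate variables: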
In [49]: where_sometext_isin_country_codes = text['SomeText'].isin(
...:     country_codes['English short name lower case'])
In [50]: filtered_text = text[where_sometext_isin_country_codes]
In [51]: value_counts = filtered_text['SomeText'].value_counts()
In [52]: value_counts.to_dict()
Out[52]: {'Finland': 14, 'Japan': 1}
The same with Counter:
In [23]: from collections import Counter
In [24]: dic = Counter()
...: ccs = set(country_codes['English short name lower case'])
...: for country in text['SomeText']:
...:     if country in ccs:
...:         dic[country] += 1
...:
In [25]: dic
Out[25]: Counter({'Finland': 14, 'Japan': 1})
Or, more simply:
In [30]: ccs = set(country_codes['English short name lower case'])
In [31]: Counter(country for country in text['SomeText'] if country in ccs)
Out[31]: Counter({'Finland': 14, 'Japan': 1})
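Is 'country_codes' a 'dictionary'? –
Your current code has an indentation error - you should look at that first. –
No, the indentation is just a result of my cutting and pasting here – JayDoe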