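I want to filter the columns of a DataFrame with a Unicode regular expression, and the code needs to work on both Python 2 and Python 3. How can I use DataFrame.filter with a regex that contains Unicode characters?
df.filter(regex=u'证券代码')
This call raises the following error on Python 2: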
  File "D:\Applications\Anaconda2\lib\site-packages\pandas\core\generic.py", line 2469, in filter
    axis=axis_name)
  File "D:\Applications\Anaconda2\lib\site-packages\pandas\core\generic.py", line 1838, in select
    np.asarray([bool(crit(label)) for label in axis_values])]
  File "D:\Applications\Anaconda2\lib\site-packages\pandas\core\generic.py", line 2468, in <lambda>
    return self.select(lambda x: matcher.search(str(x)) is not None,
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
So I wrote a unit test:
import unittest

class StrTest(unittest.TestCase):
    def test_str(self):
        # On Python 2, str() implicitly encodes the unicode object as ASCII
        str(u'证券代码')
It reports the same error.
Any ideas about this error? How can I filter a DataFrame with a Unicode regular expression?
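For what it's worth, the failure in the unit test can be reproduced without pandas. On Python 2, str() applied to a unicode object implicitly encodes it with the default codec, which is ASCII, and that is what the str(x) call shown in the traceback does to each column label. A minimal sketch that makes the implicit encode explicit, so it behaves the same on Python 2 and Python 3 (the variable names are only for illustration):

```python
# -*- coding: utf-8 -*-
label = u'证券代码'

# Python 2's str(label) is roughly label.encode('ascii'); doing the encode
# explicitly shows the same UnicodeEncodeError on either Python version.
try:
    label.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding with a codec that can represent the characters works fine.
utf8_bytes = label.encode('utf-8')
```

So the regex itself is not the problem; the label is forced through ASCII before the regex ever sees it.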
This question is related to yours: https://stackoverflow.com/questions/9942594/unicodeencodeerror-ascii-codec-cant-encode-character-u-xa0-in-position-20 – Craig
This open pandas bug report looks like it describes your problem: https://github.com/pandas-dev/pandas/issues/13101 – Craig
It seems I can work around the problem with sys.setdefaultencoding("utf-8"), but this says it should be avoided: http://stackoverflow.com/questions/3828723/why-should-we-not-use-sys-setdefaultencodingutf-8-in-a-py-script – user1633272
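One way to sidestep both str() and sys.setdefaultencoding might be to do the matching yourself with the re module and hand the matching labels back to pandas. This is only a sketch, not part of the pandas API: matching_labels is a hypothetical helper, and it assumes the column labels are already unicode text. Non-text labels are converted with the text type, which on Python 2 would still fail for non-ASCII byte strings.

```python
# -*- coding: utf-8 -*-
import re

text_type = type(u'')  # unicode on Python 2, str on Python 3


def matching_labels(labels, pattern):
    """Return the labels whose text form matches the regex `pattern`."""
    matcher = re.compile(pattern, re.UNICODE)
    selected = []
    for label in labels:
        # Convert with the text type rather than str(), so unicode labels
        # are never encoded to ASCII on Python 2.
        text = label if isinstance(label, text_type) else text_type(label)
        if matcher.search(text) is not None:
            selected.append(label)
    return selected


# Plain list standing in for df.columns (hypothetical column names).
columns = [u'证券代码', u'证券名称', u'price']
selected = matching_labels(columns, u'证券代码')
```

With a real DataFrame this would be used as df[matching_labels(df.columns, u'证券代码')], which selects columns by label and never goes through the str(x) call in the traceback.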