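pandas assert_frame_equal error
I am building test cases and want to compare two DataFrames. Although the DataFrames have the same columns and values, assert_frame_equal reports that they are not equal. The column order differs, and I have tried reordering the columns without success.
In my test case I am using the following function: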
testing.assert_frame_equal(expected, tested, check_dtype=False)
The first DataFrame is declared like this:
df2 = pandas.DataFrame({
    'artista': [u'Beyoncé', 'Radiolab', 'Xmas', 'Beyonce'],
    'mid_sugerido': ['/g/11bz0dg4b_', '/g/11bt_6j9dk', '/g/11c2nz8jc2', '/g/11bt_6jXXX'],
    'texto': ['Lemonade', 'Radiolab', 'Merry Christmas Lil Mama', 'Beyonce'],
    'busqueda': [u'Beyoncé', 'Radiolab', 'Xmas', 'Beyonce'],
    'texto_sugerido': ['Lemonade', 'Radiolab', 'Merry Christmas Lil Mama', 'Beyonce'],
    'artista_sugerido': [u'Beyoncé', 'Radiolab', None, 'Beyonce'],
    'media_sugerido': ['album', 'album', 'track', 'album'],
})
Printed, the pandas DataFrame df2 looks like this:
    artista artista_sugerido  busqueda media_sugerido   mid_sugerido  \
0   Beyoncé          Beyoncé   Beyoncé          album  /g/11bz0dg4b_
1  Radiolab         Radiolab  Radiolab          album  /g/11bt_6j9dk
2      Xmas             None      Xmas          track  /g/11c2nz8jc2
3   Beyonce          Beyonce   Beyonce          album  /g/11bt_6jXXX
                      texto            texto_sugerido
0                  Lemonade                  Lemonade
1                  Radiolab                  Radiolab
2  Merry Christmas Lil Mama  Merry Christmas Lil Mama
3                   Beyonce                   Beyonce
The second DataFrame (result) is the one returned from a function:
    artista  busqueda   mid_sugerido                     texto  \
0   Beyoncé   Beyoncé  /g/11bz0dg4b_                  Lemonade
1  Radiolab  Radiolab  /g/11bt_6j9dk                  Radiolab
2      Xmas      Xmas  /g/11c2nz8jc2  Merry Christmas Lil Mama
3   Beyonce   Beyonce  /g/11bt_6jXXX                   Beyonce
             texto_sugerido artista_sugerido media_sugerido
0                  Lemonade          Beyoncé          album
1                  Radiolab         Radiolab          album
2  Merry Christmas Lil Mama             None          track
3                   Beyonce          Beyonce          album
I get the following error when I run assert_frame_equal(df2, result):
Traceback (most recent call last):
File "/Users/spicyramen/Documents/Development/parzee/python/coverage/experimental/pandas_creation.py", line 158, in <module>
assert_frame_equal(df6, _Normalize(df5, test_dict))
File "/Users/spicyramen/Documents/Development/parzee/python/coverage/experimental/pandas_creation.py", line 16, in assert_frame_equal
testing.assert_frame_equal(expected, tested, check_dtype=False)
File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 1142, in assert_frame_equal
obj='{0}.columns'.format(obj))
File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 761, in assert_index_equal
obj=obj, lobj=left, robj=right)
File "pandas/src/testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas/src/testing.c:3887)
File "pandas/src/testing.pyx", line 147, in pandas._testing.assert_almost_equal (pandas/src/testing.c:2769)
File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 915, in raise_assert_detail
raise AssertionError(msg)
AssertionError: DataFrame.columns are different
DataFrame.columns values are different (85.71429 %)
[left]: Index([u'artista', u'artista_sugerido', u'busqueda', u'media_sugerido',
u'mid_sugerido', u'texto', u'texto_sugerido'],
dtype='object')
[right]: Index([u'artista', u'busqueda', u'mid_sugerido', u'texto', u'texto_sugerido',
u'artista_sugerido', u'media_sugerido'],
dtype='object')
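The failure above is purely about column order, so one option is to make the comparison order-insensitive. The sketch below is not the code from the question: it uses small stand-in frames and assumes a pandas version that exposes pandas.testing and supports the check_like parameter, which tells assert_frame_equal to ignore the order of index and columns.

```python
import pandas as pd
from pandas import testing

# Stand-in data; the real frames in the question have seven columns.
expected = pd.DataFrame({
    'artista': ['Beyoncé', 'Xmas'],
    'texto': ['Lemonade', 'Merry Christmas Lil Mama'],
})
# Same values, columns in a different order (stands in for the function result).
tested = expected[['texto', 'artista']]

# Option 1: let pandas ignore index/column order.
testing.assert_frame_equal(expected, tested, check_dtype=False, check_like=True)

# Option 2: reorder the tested frame to the expected column order first.
testing.assert_frame_equal(expected, tested[expected.columns], check_dtype=False)
```

Both calls pass on the stand-in data. Note that option 2 raises a KeyError if the tested frame is missing one of the expected columns, which may or may not be the behavior you want in a test.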
The columns are the same but in a different order. If I reorder the columns with df.sort_index(axis=1), I get:
Traceback (most recent call last):
File "/Users/spicyramen/Documents/Development/parzee/python/coverage/experimental/pandas_creation.py", line 154, in <module>
assert_frame_equal(df6.sort_index(axis=1), _Normalize(df5, test_dict).sort_index(axis=1))
File "/Users/spicyramen/Documents/Development/parzee/python/coverage/experimental/pandas_creation.py", line 16, in assert_frame_equal
testing.assert_frame_equal(expected, tested, check_dtype=False, check_like=False)
File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 1166, in assert_frame_equal
obj='DataFrame.iloc[:, {0}]'.format(i))
File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 1049, in assert_series_equal
check_less_precise, obj='{0}'.format(obj))
File "pandas/src/testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas/src/testing.c:3887)
File "pandas/src/testing.pyx", line 147, in pandas._testing.assert_almost_equal (pandas/src/testing.c:2769)
File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 914, in raise_assert_detail
[right]: {3}""".format(obj, message, left, right)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)
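This second traceback passes through raise_assert_detail, which suggests that after sorting the columns pandas did find a differing value and then failed while formatting the message under Python 2, so the UnicodeEncodeError may be hiding the real mismatch. A minimal sketch for locating the differing cells without relying on the formatted assertion message follows. The helper name differing_cells is hypothetical, and it assumes both frames share the same index and the same set of column labels.

```python
import pandas as pd

def differing_cells(left, right):
    """Hypothetical helper: list the cells where two frames disagree.

    Assumes both frames have identical index and column labels.
    """
    left = left.sort_index(axis=1)
    right = right.sort_index(axis=1)
    # Missing values compare as unequal, so treat "both missing" as a match.
    mismatch = (left != right) & ~(left.isnull() & right.isnull())
    rows = []
    for col in left.columns:
        for idx in mismatch.index[mismatch[col].values]:
            rows.append((idx, col, left.at[idx, col], right.at[idx, col]))
    return pd.DataFrame(rows, columns=['row', 'column', 'left', 'right'])

# Stand-in frames with one deliberately different cell.
a = pd.DataFrame({'x': ['Beyoncé', None], 'y': ['album', 'track']})
b = pd.DataFrame({'y': ['album', 'song'], 'x': ['Beyoncé', None]})
diff = differing_cells(a, b)
print(diff)
```

On the real data, inspecting the repr of the reported left and right values would show whether, for example, one side holds a unicode string and the other a byte string.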
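Why is there any difference between these two approaches? Shouldn't they produce the same result? – pansen
I think so. I still don't understand why it would work; I will debug further. – spicyramen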