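Setting the header with pandas.read_csv
I have a CSV file that I read into a DataFrame using the pandas API. I want to set my own header instead of using the default first row. (I am also dropping some rows.) What is the best way to achieve this?
I tried the following, but it does not work as expected: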
header_row=['col1','col2','col3','col4', 'col1', 'col2'] # note the header has duplicate column values
df = pandas.read_csv(csv_file, skiprows=[0,1,2,3,4,5], names=header_row)
This gives the following error:
File "third_party/py/pandas/io/parsers.py", line 187, in read_csv
File "third_party/py/pandas/io/parsers.py", line 160, in _read
File "third_party/py/pandas/io/parsers.py", line 628, in get_chunk
File "third_party/py/pandas/core/frame.py", line 302, in __init__
File "third_party/py/pandas/core/frame.py", line 388, in _init_dict
File "third_party/py/pandas/core/internals.py", line 1008, in form_blocks
File "third_party/py/pandas/core/internals.py", line 1036, in _simple_blockify
File "third_party/py/pandas/core/internals.py", line 1068, in _stack_dict
IndexError: index out of bounds
I then tried to set the columns with
df.columns = header_row
but this also fails, probably because of the duplicate column values:
File "engines.pyx", line 101, in pandas._engines.DictIndexEngine.get_loc
(third_party/py/pandas/src/engines.c:2498)
File "engines.pyx", line 107, in pandas._engines.DictIndexEngine.get_loc
(third_party/py/pandas/src/engines.c:2447)
Exception: ('Index values are not unique', 'occurred at index entity')
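I am using pandas version 0.7.3. From the documentation:
names : array-like
    List of column names
I'm sure I am missing something simple here. Thanks for your help.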
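For reference, here is a minimal sketch of the call I am aiming for. It assumes the column names are made unique (col5 and col6 are stand-ins for the repeated col1 and col2) and uses a hypothetical in-memory CSV with six lines to discard. It is written against a recent pandas release; I have not checked whether 0.7.3 behaves the same way.

```python
import io

import pandas as pd

# Hypothetical sample data: six lines to discard, followed by two data rows.
raw = "\n".join(
    ["junk line %d" % i for i in range(6)]
    + ["1,2,3,4,5,6", "7,8,9,10,11,12"]
)

# Assumed unique replacement names (the original list repeated col1 and col2).
header_row = ["col1", "col2", "col3", "col4", "col5", "col6"]

# skiprows=6 drops the first six lines; header=None tells the parser that
# no remaining line is a header, so `names` supplies the column labels.
df = pd.read_csv(io.StringIO(raw), skiprows=6, header=None, names=header_row)
print(df)
```

Passing header=None alongside names makes it explicit that the first remaining line is data rather than a header.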
Thanks for the reference. I revisited this and dropped the requirement for duplicate column values. – Manju