Set header using pandas.read_csv

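I have a CSV file that I read into a DataFrame using the pandas API. I want to set my own header instead of using the default first row. (I also skip some rows.) What is the best way to achieve this?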

I tried the following, but it did not work as expected:

header_row=['col1','col2','col3','col4', 'col1', 'col2'] # note the header has duplicate column values 
df = pandas.read_csv(csv_file, skiprows=[0,1,2,3,4,5], names=header_row)

This gives the following error:

File "third_party/py/pandas/io/parsers.py", line 187, in read_csv 
File "third_party/py/pandas/io/parsers.py", line 160, in _read 
File "third_party/py/pandas/io/parsers.py", line 628, in get_chunk 
File "third_party/py/pandas/core/frame.py", line 302, in __init__ 
File "third_party/py/pandas/core/frame.py", line 388, in _init_dict 
File "third_party/py/pandas/core/internals.py", line 1008, in form_blocks 
File "third_party/py/pandas/core/internals.py", line 1036, in _simple_blockify 
File "third_party/py/pandas/core/internals.py", line 1068, in _stack_dict 
IndexError: index out of bounds

I then tried setting the columns via

df.columns = header_row

but this failed, probably because of the duplicate column values.

File "engines.pyx", line 101, in pandas._engines.DictIndexEngine.get_loc  
(third_party/py/pandas/src/engines.c:2498) 
File "engines.pyx", line 107, in pandas._engines.DictIndexEngine.get_loc 
(third_party/py/pandas/src/engines.c:2447) 
Exception: ('Index values are not unique', 'occurred at index entity')

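I am using pandas version 0.7.3. From the documentation:

names : array-like; list of column names

I am sure I am missing something simple here. Thanks for your help.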

2012-08-22 Manju

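Pandas 0.7.3 does not support duplicates in the index. You need at least 0.8.0. Several issues with duplicates in the index were fixed between 0.8.0 and 0.8.1, so 0.8.1 (the latest stable release) would be best. However, even 0.8.1 will not solve your problem, because that release has an open issue with duplicate column names (a DataFrame with duplicate column names cannot be displayed).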
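To see whether a header list is affected before passing it to read_csv, a quick check along these lines may help. This is only a sketch, assuming a reasonably recent pandas release in which Index.is_unique and Index.duplicated are available; it reuses the header_row list from the question.

```python
import pandas as pd

# Header list from the question; 'col1' and 'col2' each appear twice.
header_row = ['col1', 'col2', 'col3', 'col4', 'col1', 'col2']

# Installed pandas version, useful when comparing against the releases
# mentioned above.
print(pd.__version__)

names = pd.Index(header_row)

# False when at least one name is repeated.
print(names.is_unique)

# Names that occur more than once, each reported a single time.
duplicated = sorted(set(names[names.duplicated()]))
print(duplicated)
```

If is_unique is False, renaming the repeated columns before loading avoids the problem regardless of the pandas version.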

2012-08-22 07:27:59

Thanks for the reference. I revisited this and got rid of the requirement for duplicate column values. – Manju
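For anyone landing here later, a minimal sketch of that workaround: make the names unique first, then pass them to read_csv while skipping the leading rows. The make_unique helper and the in-memory sample data are hypothetical and only for illustration (make_unique is not a pandas API), and the sketch assumes a recent pandas release.

```python
import io
from collections import Counter

import pandas as pd


def make_unique(names):
    """Append _1, _2, ... to repeated names (hypothetical helper, not part of pandas)."""
    seen = Counter()
    result = []
    for name in names:
        count = seen[name]
        # First occurrence keeps its name; later ones get a numeric suffix.
        result.append(name if count == 0 else "%s_%d" % (name, count))
        seen[name] += 1
    return result


# Made-up sample standing in for csv_file: six rows to skip, then two data rows.
raw = "\n".join(
    ["junk line %d" % i for i in range(6)]
    + ["1,2,3,4,5,6", "7,8,9,10,11,12"]
)

header_row = make_unique(['col1', 'col2', 'col3', 'col4', 'col1', 'col2'])

# header=None tells pandas that the remaining rows contain no header line,
# so the supplied names are used as the column labels.
df = pd.read_csv(
    io.StringIO(raw),
    skiprows=[0, 1, 2, 3, 4, 5],
    names=header_row,
    header=None,
)
print(df)
```

Note that this simple suffixing scheme could still collide if the original list already contained a name such as col1_1, so it is worth checking the result when the names come from an untrusted source.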
