You need to split each string in the list on the comma:
import pandas as pd
df = pd.DataFrame([sub.split(",") for sub in l])
print(df)
Output:
0 1 2 3 4 5 6
0 AN 2__AS000 26 20150826113000 -283.000 20150826120000 -283.000
1 AN 2__A000 26 20150826113000 0.000 20150826120000 0.000
2 AN 2__AE000 26 20150826113000 -269.000 20150826120000 -269.000
3 AN 2__AE000 26 20150826113000 -255.000 20150826120000 -255.000
4 AN 2__AE00 26 20150826113000 -254.000 20150826120000 -254.000
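As a small self-contained sketch of why the split matters (the two sample strings are copied from the output above; the names rows, unsplit and split are only illustrative), passing the raw strings gives a single column, while splitting first gives one column per field:

```python
import pandas as pd

# Two sample rows copied from the output above; in the question this list is called l.
rows = [
    "AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000",
    "AN,2__A000,26,20150826113000,0.000,20150826120000,0.000",
]

# Unsplit: each whole string becomes a single cell, so there is only one column.
unsplit = pd.DataFrame(rows)

# Split on the comma first: one column per field.
split = pd.DataFrame([s.split(",") for s in rows])

print(unsplit.shape)
print(split.shape)
```

Note that after str.split every cell is still a string, so numeric columns would need converting afterwards.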
If you know how many lines of the CSV to skip, you can let read_csv do all of this with skiprows=lines_of_metadata:
import pandas as pd
df = pd.read_csv("in.csv",skiprows=3,header=None)
print(df)
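To try this without a file on disk, here is a minimal sketch that reads from an in-memory string instead. The three metadata lines are made up for illustration and io.StringIO stands in for "in.csv":

```python
import io

import pandas as pd

# Hypothetical file contents: three metadata lines followed by the data rows.
text = """various
metadata
lines
AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000
AN,2__A000,26,20150826113000,0.000,20150826120000,0.000
"""

# skiprows=3 discards the first three lines; header=None treats the
# first remaining line as data rather than as column names.
df = pd.read_csv(io.StringIO(text), skiprows=3, header=None)

print(df.shape)
print(df.dtypes)
```

Unlike the manual split, read_csv infers column types, so the numeric fields come back as numbers rather than strings.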
Or, if every metadata line starts with a particular character, you can use comment:
df = pd.read_csv("in.csv",header=None,comment="#")
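A quick way to check the behaviour is to feed read_csv an in-memory string. This is only a sketch: the "#" metadata lines are invented for the example and io.StringIO stands in for "in.csv":

```python
import io

import pandas as pd

# Hypothetical file contents: metadata lines marked with "#", then the data rows.
text = """# various
# metadata
AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000
AN,2__A000,26,20150826113000,0.000,20150826120000,0.000
"""

# Lines that begin with the comment character are ignored entirely.
df = pd.read_csv(io.StringIO(text), header=None, comment="#")

print(df.shape)
```

Two things to keep in mind: comment takes a single character, and if that character appears inside a data line, the rest of that line is dropped as well.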
If you need to match on more than one character, you can combine this with itertools.dropwhile, which will drop the leading lines that start with xxx:
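import csv
from itertools import dropwhile

import pandas as pd

with open("in.csv", newline="") as f:
    # Skip the leading lines that start with "#!!"
    lines = dropwhile(lambda x: x.startswith("#!!"), f)
    r = csv.reader(lines)
    # Build the frame while the file is still open
    df = pd.DataFrame.from_records(list(r))
print(df)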
Using your input data with some lines starting with #!! added:
#!! various
#!! metadata
#!! lines
AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000
AN,2__A000,26,20150826113000,0.000,20150826120000,0.000
AN,2__AE000,26,20150826113000,-269.000,20150826120000,-269.000
AN,2__AE000,26,20150826113000,-255.000,20150826120000,-255.000
AN,2__AE00,26,20150826113000,-254.000,20150826120000,-254.000
Output:
0 1 2 3 4 5 6
0 AN 2__AS000 26 20150826113000 -283.000 20150826120000 -283.000
1 AN 2__A000 26 20150826113000 0.000 20150826120000 0.000
2 AN 2__AE000 26 20150826113000 -269.000 20150826120000 -269.000
3 AN 2__AE000 26 20150826113000 -255.000 20150826120000 -255.000
4 AN 2__AE00 26 20150826113000 -254.000 20150826120000 -254.000
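One caveat with dropwhile is that it stops dropping as soon as the predicate is false once, so only the leading block of metadata is removed. The sketch below uses only the standard library, with shortened, made-up sample lines, and shows that a later "#!!" line is kept as an ordinary row:

```python
import csv
import io
from itertools import dropwhile

# Hypothetical input: two leading metadata lines, plus one "#!!" line
# in the middle of the data to show that it is NOT removed.
text = """#!! various
#!! metadata
AN,2__AS000,26
#!! trailing note
AN,2__A000,26
"""

# dropwhile skips lines only until the first line that does not match.
lines = dropwhile(lambda line: line.startswith("#!!"), io.StringIO(text))
rows = list(csv.reader(lines))

for row in rows:
    print(row)
```

If the metadata lines can appear anywhere in the file, this approach alone is not enough and the lines would need to be filtered rather than dropped from the front.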
I can't reproduce your error; the following works fine: l = [['AA','2__000',26,20150826113000,-283.000,20150826120000,-283.000],['BB','2__DI9',26,20150826113000,0.000,20150826120000,0.000],['CC','2__GH6',26,20150826113000,-269.000,20150826120000,-269.000]]; pd.DataFrame(l) – EdChum
Can you post the output of print(mylist)? – EdChum
Since there are about 2k rows, I limited the output shown above. But when I print(df) I get all the data as [1922 rows x 1 columns] – user636322