2017-05-24 121 views

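Scientific notation read as strings in pandas

I am trying to read a .csv file that has a column containing numbers in scientific notation. No matter what I do, pandas ends up reading them as strings: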

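import numpy as np
import pandas as pd

def readData(path, cols):
    # Map each selected column position to the dtype it should be read as.
    types = [str, str, str, str, np.float32]
    t_dict = {key: value for (key, value) in zip(cols, types)}

    # With chunksize set, read_csv returns an iterator of DataFrames, not a single DataFrame.
    df = pd.read_csv(path, header=0, sep=';', encoding='latin1', usecols=cols, dtype=t_dict, chunksize=5000)

    return df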

c = [3, 6, 7, 9, 16] 
df2017_chunks = readData('Data/2017.csv', c) 

def preProcess(df, f):  
    df.columns = f 
    df['id_client'] = df['id_client'].apply(lambda x: str(int(float(x)))) 

    return df 

f = ['issue_date', 'channel', 'product', 'issue', 'id_client'] 

df = pd.DataFrame(columns=f) 
for chunk in df2017_chunks: 
    aux = preProcess(chunk, f) 
    df = pd.concat([df, aux]) 

How can I read this data correctly?


Could you post a small sample of the CSV that pandas is trying to read? – cardamom

Answers

0

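Your preprocessing function applies a string conversion after the other conversions, so id_client ends up as strings regardless of how it was read. Is this the intended behavior?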
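To illustrate the point, here is a minimal sketch of what the conversion in preProcess does to a numeric column. The values are hypothetical stand-ins for the real id_client data, which the question does not show.

```python
import pandas as pd

# Hypothetical values standing in for the real id_client column.
ids = pd.Series([1.2345e4, 6.789e3, 1.0e5])

# Same conversion as in preProcess: float -> int -> str.
as_text = ids.apply(lambda x: str(int(float(x))))

print(pd.api.types.is_numeric_dtype(ids))      # numeric before the conversion
print(pd.api.types.is_numeric_dtype(as_text))  # no longer numeric afterwards
print(as_text.tolist())
```

If a numeric id_client is what you want, dropping the final str(...) call is probably enough.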

Could you try:

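# No dtype and no chunksize here, so read_csv returns a single DataFrame.
df = pd.read_csv(path, header=0, sep=';', encoding='latin1', usecols=cols)
df.columns = f  # rename as in preProcess so that 'id_client' exists
df["id_client"] = pd.to_numeric(df["id_client"])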
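If the file is too large to load at once and chunksize has to stay, the same conversion can be applied to each chunk instead. This is only a sketch: the sample CSV text and its three columns are made up for illustration, and id_client is read as str on purpose to mimic the column arriving as text.

```python
import io
import pandas as pd

# Hypothetical sample; the real file's layout is not shown in the question.
csv_text = (
    "issue_date;channel;id_client\n"
    "2017-01-02;web;1.2345E+04\n"
    "2017-01-03;phone;6.789E+03\n"
    "2017-01-04;web;1.0E+05\n"
)

# Read id_client as text to mimic the situation described in the question.
chunks = pd.read_csv(io.StringIO(csv_text), sep=';', header=0,
                     dtype={'id_client': str}, chunksize=2)

converted = []
for chunk in chunks:
    # Convert the scientific-notation strings to numbers, one chunk at a time.
    chunk['id_client'] = pd.to_numeric(chunk['id_client'])
    converted.append(chunk)

# Concatenate once at the end rather than inside the loop.
df = pd.concat(converted, ignore_index=True)
print(df['id_client'].tolist())
```

Collecting the chunks in a list and calling pd.concat once avoids copying the growing DataFrame on every iteration.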
0

Sample DataFrame:

df = pd.DataFrame({'issue_date': [1920,1921,1922,1923,1924,1925,1926],
    'name': ['jon doe1','jon doe2','jon doe3','jon doe4','jon doe5','jon doe6','jon doe7'],
    'id_client': ['18.61', '17.60', '18.27', '16.18', '16.81', '16.37', '67.07']})

You can check the dtypes of the DataFrame with the following command:

print(df.dtypes)

Output:

id_client  object 
issue_date  int64 
name   object 
dtype: object 

Convert the dtype of df['id_client'] from object to float64 with the following command:

df['id_client'] = pd.to_numeric(df['id_client'], errors='coerce') 

errors='coerce' produces NaN for any item that cannot be converted. Running print(df.dtypes) afterwards gives:

id_client  float64 
issue_date  int64 
name   object 
dtype: object 
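
The same call also handles strings written in scientific notation, which is the case in the question. A small sketch, with made-up values and one deliberately bad entry to show what errors='coerce' does:

```python
import pandas as pd

# Hypothetical values: two scientific-notation strings and one unparseable entry.
raw = pd.Series(['1.861E+01', '1.76e1', 'n/a'], dtype=object)

# Strings that parse become floats; anything else becomes NaN.
clean = pd.to_numeric(raw, errors='coerce')

print(clean.tolist())
print(clean.isna().tolist())  # [False, False, True]
```

Counting the NaN values afterwards with clean.isna().sum() is a quick way to see how many entries failed to parse.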