Concatenating chunks changes the category dtype to object/float64. If I read just one chunk of the CSV, I get the following data structure:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 100000 entries, (2015-11-01 00:00:00, 4980770) to (2016-06-01 00:00:00, 8850573)
Data columns (total 5 columns):
CHANNEL 100000 non-null category
MCC 92660 non-null category
DOMESTIC_FLAG 100000 non-null category
AMOUNT 100000 non-null float32
CNT 100000 non-null uint8
dtypes: category(3), float32(1), uint8(1)
memory usage: 1.9+ MB
If I read the whole CSV and concat the chunks, I get the following structure:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 30345312 entries, (2015-11-01 00:00:00, 4980770) to (2015-08-01 00:00:00, 88838)
Data columns (total 5 columns):
CHANNEL object
MCC float64
DOMESTIC_FLAG category
AMOUNT float32
CNT uint8
dtypes: category(1), float32(1), float64(1), object(1), uint8(1)
memory usage: 784.6+ MB
Why do the categorical variables change to object/float64? How can I avoid this type change, especially the change to float64?
This is the concatenation code:
df = pd.concat([process(chunk) for chunk in reader])
The process function just does some cleaning and type assignment.
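For reference, here is a minimal sketch of what may be happening, assuming each chunk ends up with a different set of categories for the same column. The frames `chunk_a` and `chunk_b` and the values `"web"`, `"pos"`, `"atm"` are hypothetical stand-ins, not my real data. When the category sets differ, `pd.concat` falls back to the dtype of the underlying values; giving every chunk the same categories first (here via `union_categoricals`) appears to keep the column categorical.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Hypothetical stand-ins for two processed chunks; each chunk infers
# its own categories from the values it happens to contain.
chunk_a = pd.DataFrame({"CHANNEL": pd.Categorical(["web", "pos"])})
chunk_b = pd.DataFrame({"CHANNEL": pd.Categorical(["pos", "atm"])})

# Naive concat: the category sets differ, so the result is no longer categorical.
naive = pd.concat([chunk_a, chunk_b], ignore_index=True)
naive_is_categorical = isinstance(naive["CHANNEL"].dtype, pd.CategoricalDtype)

# Possible workaround: compute the union of all categories, then give
# every chunk that same category set before concatenating.
all_categories = union_categoricals(
    [chunk_a["CHANNEL"], chunk_b["CHANNEL"]]
).categories
aligned = [
    chunk.assign(CHANNEL=chunk["CHANNEL"].cat.set_categories(all_categories))
    for chunk in (chunk_a, chunk_b)
]
fixed = pd.concat(aligned, ignore_index=True)
fixed_is_categorical = isinstance(fixed["CHANNEL"].dtype, pd.CategoricalDtype)

print(naive["CHANNEL"].dtype, fixed["CHANNEL"].dtype)
```

With a chunked reader this needs either two passes or the full list of categories known up front, so that `process` can assign one fixed `pd.CategoricalDtype` to every chunk.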
Can you post the code you use to load and concatenate it? –
Categoricals also sometimes have problems with 'NaN' –
Added to the text now – snovik