You can use:
import io
from datetime import datetime

import pandas as pd

temp = u"""#,Job_ID,Date/Time,value1,value2,
0,ID1,05/01 24:00:00,5,6
1,ID2,05/02 24:00:00,6,15
2,ID3,05/03 24:00:00,20,21"""

# pd.datetime was removed in pandas 2.0, so use the standard library datetime.
# strptime rejects hour 24, so rewrite it as hour 00 before parsing.
dateparse = lambda x: datetime.strptime(x.replace('24:', '00:'), '%m/%d %H:%M:%S')

# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp),
                 skipinitialspace=True,
                 date_parser=dateparse,
                 parse_dates=['Date/Time'],
                 index_col=['Date/Time'],
                 usecols=['Job_ID', 'Date/Time', 'value1', 'value2'],
                 header=0)
print(df)
           Job_ID  value1  value2
Date/Time
1900-05-01    ID1       5       6
1900-05-02    ID2       6      15
1900-05-03    ID3      20      21
Another solution uses a double replace; the year can also be added this way:
dateparse = lambda x: x.replace('24:', '00:').replace(' ', '/1900 ')

df = pd.read_csv(io.StringIO(temp),
                 skipinitialspace=True,
                 date_parser=dateparse,
                 parse_dates=['Date/Time'],
                 index_col=['Date/Time'],
                 usecols=['Job_ID', 'Date/Time', 'value1', 'value2'],
                 header=0)
print(df)
           Job_ID  value1  value2
Date/Time
1900-05-01    ID1       5       6
1900-05-02    ID2       6      15
1900-05-03    ID3      20      21
dateparse = lambda x: x.replace('24:', '00:').replace(' ', '/2016 ')

df = pd.read_csv(io.StringIO(temp),
                 skipinitialspace=True,
                 date_parser=dateparse,
                 parse_dates=['Date/Time'],
                 index_col=['Date/Time'],
                 usecols=['Job_ID', 'Date/Time', 'value1', 'value2'],
                 header=0)
print(df)
           Job_ID  value1  value2
Date/Time
2016-05-01    ID1       5       6
2016-05-02    ID2       6      15
2016-05-03    ID3      20      21
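The date_parser argument is deprecated as of pandas 2.0, so on newer versions it may be more convenient to read the column as plain text and convert it afterwards. The following is only a sketch of that variant, not part of the original answer: the year 2016 is an assumption (the data carries none), the trailing comma in the header is dropped for simplicity, and the match is narrowed to ' 24:' (with a leading space) so that a minutes field such as 10:24:00 is left untouched.

```python
import io

import pandas as pd

temp = """#,Job_ID,Date/Time,value1,value2
0,ID1,05/01 24:00:00,5,6
1,ID2,05/02 24:00:00,6,15
2,ID3,05/03 24:00:00,20,21"""

# Read the Date/Time column as ordinary strings first.
df = pd.read_csv(io.StringIO(temp),
                 skipinitialspace=True,
                 usecols=['Job_ID', 'Date/Time', 'value1', 'value2'],
                 header=0)

# Rewrite hour 24 as hour 00; the leading space keeps minutes like ":24:" intact.
cleaned = df['Date/Time'].str.replace(' 24:', ' 00:', regex=False)

# Prepend an assumed year and parse with an explicit format.
df['Date/Time'] = pd.to_datetime('2016/' + cleaned, format='%Y/%m/%d %H:%M:%S')
df = df.set_index('Date/Time')
print(df)
```

As in the answer above, 24:00:00 ends up as 00:00:00 on the same calendar day.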
You're always spot on! – Andreuccio
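I am faced with importing a similar dataset, but with hourly rather than daily values. So instead of replacing '24:' with '00:', I need to shift all the hours back by one unit, i.e. '24:' -> '23:', ..., '01:' -> '00:'. How would the code change? – Andreuccio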
I think the same approach works; just subtract one hour afterwards, like df.index = df.index - pd.Timedelta(1, unit='h') – jezrael
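For the hourly case, note that replacing '24:' with '00:' and then subtracting one hour would place the last hour of each day on the previous day. One possible way around that, sketched here under assumptions rather than taken from the thread, is to parse the date and the time of day separately: pd.to_timedelta accepts '24:00:00' as a duration, so no string replacement is needed. The sample rows and the year 2016 are hypothetical.

```python
import io

import pandas as pd

# Hypothetical hourly sample; hours run from 01 to 24 as described in the comment.
temp = """#,Job_ID,Date/Time,value1,value2
0,ID1,05/01 01:00:00,5,6
1,ID2,05/01 02:00:00,6,15
2,ID3,05/01 24:00:00,20,21"""

df = pd.read_csv(io.StringIO(temp),
                 skipinitialspace=True,
                 usecols=['Job_ID', 'Date/Time', 'value1', 'value2'],
                 header=0)

# Split "05/01 24:00:00" into a date part (column 0) and a time part (column 1).
parts = df['Date/Time'].str.split(' ', n=1, expand=True)

# The year is an assumption, since the data carries none.
dates = pd.to_datetime(parts[0] + '/2016', format='%m/%d/%Y')

# Treat the time of day as a duration, so "24:00:00" becomes one full day.
offsets = pd.to_timedelta(parts[1])

# Shift every timestamp back by one hour: 24 -> 23, ..., 01 -> 00.
shifted = dates + offsets - pd.Timedelta(1, unit='h')
df.index = pd.DatetimeIndex(shifted, name='Date/Time')
df = df.drop(columns='Date/Time')
print(df)
```

With this sketch the 24:00:00 row stays on 05/01 as 23:00:00 instead of moving to the previous day.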