2016-11-29 34 views
1

Add a date column to a pandas DataFrame using a constant value held in a str

I have a table in a pandas DataFrame:

product_id_x product_id_y count 
0 2727846   7872456  1 
1 29234    2932348  2 
2 29346    9137500  1 
3 29453    91365738  1 
4 2933666   91323494  1 

I want to add a new column, 'dates', whose value I have defined in a str.

dateSelect = "'2016-11-06'" 

So I added a new constant column:

df['dates'] = dateSelect 

But the result I get is:

product_id_x product_id_y count dates 
0 2727846   7872456   1 '2016-11-06' 
1 29234   2932348   2 '2016-11-06' 
2 29346   9137500   1 '2016-11-06' 
3 29453   91365738  1 '2016-11-06' 
4 2933666   91323494  1 '2016-11-06' 

The values in dates come out wrapped in quotes, and

type(df['dates'][0]) == str 

But I want it in a date format, because I am going to store this table in my MySQL database. The type I want is a date.

Answers

2

I think you can first use replace to remove the ' characters, and then call to_datetime:

dateSelect = pd.to_datetime("'2016-11-06'".replace("'","")) 
print (dateSelect) 
2016-11-06 00:00:00 

print (type(dateSelect)) 
<class 'pandas.tslib.Timestamp'> 

df['dates'] = pd.to_datetime("'2016-11-06'".replace("'","")) 

print (df) 
    product_id_x product_id_y count  dates 
0  2727846  7872456  1 2016-11-06 
1   29234  2932348  2 2016-11-06 
2   29346  9137500  1 2016-11-06 
3   29453  91365738  1 2016-11-06 
4  2933666  91323494  1 2016-11-06 

print (df.dtypes) 
product_id_x    int64 
product_id_y    int64 
count     int64 
dates   datetime64[ns] 
dtype: object 
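
If the quoted strings have already been written into the column, as in the question, the same idea can be applied to the whole column at once. The following is a minimal sketch under that assumption; the small frame is rebuilt from the first rows of the question's table, and the name `cleaned` is purely illustrative.

```python
import pandas as pd

# Rebuild a few rows of the table from the question.
df = pd.DataFrame({
    "product_id_x": [2727846, 29234, 29346],
    "product_id_y": [7872456, 2932348, 9137500],
    "count": [1, 2, 1],
})

# Reproduce the problem: every value is a str that contains literal quotes.
df["dates"] = "'2016-11-06'"

# Strip the surrounding quotes from each value, then parse the column.
cleaned = df["dates"].str.strip("'")
df["dates"] = pd.to_datetime(cleaned, format="%Y-%m-%d")

print(df.dtypes)
```

After this the dates column should hold datetime values rather than strings; the exact datetime64 resolution shown by dtypes may differ between pandas versions.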
+0

Yes sir, I tried the same thing a few seconds before your answer, and it works fine with .replace("'", "") – Shubham

+0

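Yes, if you combine ' and " together, you first need to remove the inner quotes, and then it works perfectly. Or use only one kind, like '2016-11-06' or "2016-11-06", and then replace is not needed. – jezrael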

0

Don't put the extra double quotes around it, so that the quote characters do not become part of the string.

dateSelect = '2016-11-06' 
df['dates'] = dateSelect 
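
Dropping the extra quotes removes the stray ' characters, but the column still holds plain strings. A minimal sketch of checking that and converting the column afterwards; the two-row frame and the name `is_str` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# A cut-down version of the table from the question.
df = pd.DataFrame({"product_id_x": [2727846, 29234], "count": [1, 2]})

# Assigning a plain string broadcasts it to every row, still as str.
df["dates"] = "2016-11-06"
is_str = isinstance(df["dates"].iloc[0], str)
print(is_str)

# Parse the string column into real datetime values.
df["dates"] = pd.to_datetime(df["dates"], format="%Y-%m-%d")
print(df.dtypes)
```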
1

Ah! @jezrael got there first...

import timeit 

# Using the standard library datetime 
print(timeit.timeit(""" 
import pandas as pd 
import datetime as dt 
df = pd.read_csv('date_time_pandas.csv') 
dateSelect_str = "2016-11-06" 

# parse with datetime.strptime, then assign the constant to the column 
dateSelect = dt.datetime.strptime(dateSelect_str, "%Y-%m-%d") 
df['dates'] = dateSelect 
#print(df['dates']) 
""", number=100)) 


# Alternate method using pandas datetime 
print(timeit.timeit(""" 
import pandas as pd 
import datetime as dt 
df = pd.read_csv('date_time_pandas.csv') 
dateSelect_str = "2016-11-06" 

# parse with pd.to_datetime, then assign the constant to the column 
dateSelect = pd.to_datetime(dateSelect_str, format='%Y-%m-%d') 
df['dates'] = dateSelect 
#print(df['dates']) 
""", number=100)) 

This gives the output:

0.228258825751 
0.167258402887 

on average.

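Conclusion: in this measurement, pd.to_datetime was the faster of the two (note that each timed run also includes the read_csv call).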
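
The timed statements above re-read the CSV on every run, so the parse-and-assign step is only part of what is measured. A rough sketch of isolating that step with the setup argument of timeit.timeit; the in-memory frame stands in for date_time_pandas.csv, and the names `setup`, `t_strptime` and `t_pandas` are assumptions made for illustration.

```python
import timeit

# Setup code runs once per timeit call and is not included in the timing.
setup = """
import datetime as dt
import pandas as pd
df = pd.DataFrame({"count": range(1000)})  # stand-in for the CSV file
date_str = "2016-11-06"
"""

# Time only the parse-and-assign step for each approach.
t_strptime = timeit.timeit(
    "df['dates'] = dt.datetime.strptime(date_str, '%Y-%m-%d')",
    setup=setup,
    number=100,
)
t_pandas = timeit.timeit(
    "df['dates'] = pd.to_datetime(date_str, format='%Y-%m-%d')",
    setup=setup,
    number=100,
)

print(t_strptime, t_pandas)
```

The absolute numbers depend on the machine and the pandas version, so they are best compared only against each other within a single run.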

1

The most direct route:

df['dates'] = pd.Timestamp('2016-11-06') 
df 

    product_id_x product_id_y count  dates 
0  2727846  7872456  1 2016-11-06 
1   29234  2932348  2 2016-11-06 
2   29346  9137500  1 2016-11-06 
3   29453  91365738  1 2016-11-06 
4  2933666  91323494  1 2016-11-06
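
Since the question asks for a date rather than a full timestamp, the datetime column can also be reduced to plain datetime.date objects if that is what the database layer expects. A minimal sketch, assuming a cut-down frame; `date_only` is an illustrative name, and whether this step is needed depends on how the table is written to MySQL.

```python
import datetime as dt

import pandas as pd

# A cut-down version of the table from the question.
df = pd.DataFrame({"product_id_x": [2727846, 29234], "count": [1, 2]})

# A Timestamp scalar is broadcast to every row as a datetime column.
df["dates"] = pd.Timestamp("2016-11-06")

# .dt.date yields Python datetime.date objects (the time part is dropped).
date_only = df["dates"].dt.date

print(type(date_only.iloc[0]))
```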