2017-09-18

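ValueError: Length of values does not match length of index | Calculating the difference between dates

I have a dataset / pandas df with ~50 columns; the columns are a mix of strings, numbers, and dates. Five of the columns are dates, labeled Meeting1-Meeting5, and I am trying to calculate the number of days between the meeting dates.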

My df usually looks like this:

ID_number Meeting1 Meeting2 Meeting3 Meeting4 Meeting5 Comments … 
123456789 2014-09-17 2015-04-22 2015-05-30 NaN   NaN   text text … 
987654321 2015-09-22 NaN   2016-02-20 NaN   NaN   text text … 
456789123 2016-10-22 2017-05-29 NaN   NaN   NaN   text text … 

In SQL I would normally use SELECT DATEDIFF(dd, Meeting1, Meeting2) AS diff_mt1_mt2. In Python I tried

from datetime import datetime 
from datetime import date 

df['diff_mt1_mt2'] = (df['Meeting2']-df['Meeting1']) 

but got a ValueError: Length of values does not match length of index (full error below).

Is there an easier/better way to do this in Python?

Full error:

ValueError        Traceback (most recent call last) 
<ipython-input-9-055085bc04d7> in <module>() 
     3 from datetime import date 
     4 
----> 5 df['diff_mt1_mt2'] = (df['Meeting2']-df['Meeting1']), 

C:\Users\lmgagne\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value) 
    2427   else: 
    2428    # set column 
-> 2429    self._set_item(key, value) 
    2430 
    2431  def _setitem_slice(self, key, value): 

C:\Users\lmgagne\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value) 
    2493 
    2494   self._ensure_valid_index(value) 
-> 2495   value = self._sanitize_column(key, value) 
    2496   NDFrame._set_item(self, key, value) 
    2497 

C:\Users\lmgagne\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast) 
    2664 
    2665    # turn me into an ndarray 
-> 2666    value = _sanitize_index(value, self.index, copy=False) 
    2667    if not isinstance(value, (np.ndarray, Index)): 
    2668     if isinstance(value, list) and len(value) > 0: 

C:\Users\lmgagne\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\series.py in _sanitize_index(data, index, copy) 
    2877 
    2878  if len(data) != len(index): 
-> 2879   raise ValueError('Length of values does not match length of ' 'index') 
    2880 
    2881  if isinstance(data, PeriodIndex): 

ValueError: Length of values does not match length of index 

I'm using:

Python 3.6.1 and pandas 0.20.1


Can you add a sample of the data? Because it should work. – jezrael


@jezrael added some data – LMGagne

Answers

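I think you first need to convert the Meeting columns to datetimes with to_datetime, using the parameter errors='coerce' so that anything that is not a valid datetime becomes NaT (the datetime missing value):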

import pandas as pd

# select the Meeting* columns and parse them; unparseable values become NaT
cols = df.columns[df.columns.str.startswith('Meeting')]
df[cols] = df[cols].apply(lambda x: pd.to_datetime(x, errors='coerce'))

df['diff_mt1_mt2'] = df['Meeting2'] - df['Meeting1']
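A self-contained version of this approach, run on the sample rows from the question, might look like the sketch below. It assumes the Meeting columns were loaded as strings with missing dates stored as NaN (as happens when reading a CSV), and only the first three Meeting columns are included to keep it short.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; dates are plain strings (assumption: loaded
# from a CSV), missing meetings are NaN.
df = pd.DataFrame({
    'ID_number': [123456789, 987654321, 456789123],
    'Meeting1': ['2014-09-17', '2015-09-22', '2016-10-22'],
    'Meeting2': ['2015-04-22', np.nan, '2017-05-29'],
    'Meeting3': ['2015-05-30', '2016-02-20', np.nan],
})

# Parse every column whose name starts with 'Meeting'; values that cannot be
# parsed (including NaN) become NaT instead of raising.
cols = df.columns[df.columns.str.startswith('Meeting')]
df[cols] = df[cols].apply(lambda x: pd.to_datetime(x, errors='coerce'))

# datetime64 minus datetime64 gives a timedelta column; NaT propagates.
df['diff_mt1_mt2'] = df['Meeting2'] - df['Meeting1']
print(df[['ID_number', 'diff_mt1_mt2']])
```

The second row has no Meeting2, so its difference is NaT rather than an error.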

I added this code, but it still produces the same error – LMGagne


It's hard to answer without data. But maybe you are using an older version of pandas. – jezrael
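One thing worth checking that the thread does not mention: line 5 of the traceback reads df['diff_mt1_mt2'] = (df['Meeting2']-df['Meeting1']), with a trailing comma that is not in the code quoted in the question. In Python that comma turns the right-hand side into a one-element tuple, so pandas receives a value of length 1 for a frame with more rows, which is consistent with the "Length of values does not match length of index" message. A minimal reproduction, assuming two already-parsed Meeting columns:

```python
import pandas as pd

df = pd.DataFrame({
    'Meeting1': pd.to_datetime(['2014-09-17', '2015-09-22', '2016-10-22']),
    'Meeting2': pd.to_datetime(['2015-04-22', None, '2017-05-29']),
})

# The trailing comma wraps the Series in a 1-element tuple.
value = (df['Meeting2'] - df['Meeting1']),
print(type(value), len(value))  # <class 'tuple'> 1

raised = False
try:
    # pandas compares len(value) == 1 with len(df.index) == 3 and raises.
    df['diff_mt1_mt2'] = value
except ValueError as exc:
    raised = True
    print('ValueError:', exc)

# Without the comma the right-hand side is the Series itself and this works.
df['diff_mt1_mt2'] = df['Meeting2'] - df['Meeting1']
```

The exact wording of the error differs between pandas versions, but the cause is the same.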

import pandas as pd
import numpy as np

# one DatetimeIndex per row (Meeting 1 ... Meeting 5); np.nan becomes NaT
# (np.nan rather than np.NaN, which was removed in NumPy 2.0)
d1 = pd.to_datetime(['2014-09-17', '2015-04-22', '2015-05-30', np.nan, np.nan])
d2 = pd.to_datetime(['2015-09-22', np.nan, '2016-02-20', np.nan, np.nan])
d3 = pd.to_datetime(['2016-10-22', '2017-05-29', np.nan, np.nan, np.nan])
data = [d1, d2, d3]
index_serie = np.array((123456789, 987654321, 456789123))

df = pd.DataFrame(data=data, index=index_serie,
                  columns=['Meeting 1', 'Meeting 2', 'Meeting 3', 'Meeting 4', 'Meeting 5'])
df.index.name = 'ID_number'
df['diff_mt1_mt2'] = df['Meeting 2'] - df['Meeting 1']

This works for me with the latest versions of Python and pandas.
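To mirror SQL's DATEDIFF(dd, Meeting1, Meeting2), which returns an integer day count rather than a timedelta, the .dt.days accessor can be applied to the difference. The sketch below is an extension not found in either answer: it computes the day count for each consecutive pair of meetings. The column names beyond diff_mt1_mt2 follow the question's naming pattern but are hypothetical.

```python
import pandas as pd

# Sample data from the question, already parsed to datetime64 (None -> NaT).
df = pd.DataFrame({
    'Meeting1': pd.to_datetime(['2014-09-17', '2015-09-22', '2016-10-22']),
    'Meeting2': pd.to_datetime(['2015-04-22', None, '2017-05-29']),
    'Meeting3': pd.to_datetime(['2015-05-30', '2016-02-20', None]),
    'Meeting4': pd.to_datetime([None, None, None]),
    'Meeting5': pd.to_datetime([None, None, None]),
}, index=pd.Index([123456789, 987654321, 456789123], name='ID_number'))

# Days between each pair of consecutive meetings, like DATEDIFF(dd, start, end).
# Hypothetical column names: diff_mt1_mt2, diff_mt2_mt3, diff_mt3_mt4, diff_mt4_mt5.
for i in range(1, 5):
    start, end = f'Meeting{i}', f'Meeting{i + 1}'
    df[f'diff_mt{i}_mt{i + 1}'] = (df[end] - df[start]).dt.days

print(df.filter(like='diff_'))
```

Missing dates give NaN, so the day-count columns come out as float64 (an integer column cannot hold NaN). Only consecutive pairs are compared, so for the second row the gap from Meeting1 to Meeting3 is not computed.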
