2015-01-04

Pandas: find the interval a value falls in

In pandas, I have transaction data in a data frame (transdf) that looks like this:

OrderId, ShippmentSegmentsDays 
1  , 1 
2  , 3 
3  , 4 
4  , 10 

I also have another DataFrame (segmentdf) that specifies the intervals:

ShippmentSegmentDaysStart , ShippmentSegmentDaysEnd , ShippmentSegment 
-9999999     , 0      , 'On-Time' 
0       , 1      , '1 day late' 
1       , 2      , '2 days late' 
2       , 3      , '3 days late' 
3       , 9999999     , '>3 days late' 

I need to add a 'ShippmentSegment' column based on 'ShippmentSegmentsDays'. Basically, for each row of 'transdf' I need to take the 'ShippmentSegmentsDays' value and find the interval in 'segmentdf' that it falls in.

As a result, 'transdf' should look like this:

OrderId, ShippmentSegmentsDays, ShippmentSegment 
1  , 1     , '1 day late' 
2  , 0     , 'On-Time' 
3  , 4     , '>3 days late' 
4  , 10     , '>3 days late' 
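One vectorized way to do the interval lookup described above is `pd.cut`, with the bin edges and labels taken directly from `segmentdf`. This is a minimal sketch, not the method used in the answer below. It assumes the rows of `segmentdf` are sorted, contiguous, and right-closed, i.e. each row covers (start, end]. That is why 1 day maps to '1 day late'. The two DataFrames are rebuilt from the tables above.

```python
import pandas as pd

# Sample data copied from the tables in the question
transdf = pd.DataFrame({
    'OrderId': [1, 2, 3, 4],
    'ShippmentSegmentsDays': [1, 3, 4, 10],
})
segmentdf = pd.DataFrame({
    'ShippmentSegmentDaysStart': [-9999999, 0, 1, 2, 3],
    'ShippmentSegmentDaysEnd': [0, 1, 2, 3, 9999999],
    'ShippmentSegment': ['On-Time', '1 day late', '2 days late',
                         '3 days late', '>3 days late'],
})

# Assumption: intervals are sorted, contiguous and right-closed (start, end],
# so the bin edges are every start value plus the last end value
bins = (segmentdf['ShippmentSegmentDaysStart'].tolist()
        + [segmentdf['ShippmentSegmentDaysEnd'].iloc[-1]])

# pd.cut assigns each value the label of the bin it falls in;
# right=True makes the bins (start, end], matching the assumption above
transdf['ShippmentSegment'] = pd.cut(
    transdf['ShippmentSegmentsDays'],
    bins=bins,
    labels=segmentdf['ShippmentSegment'].tolist(),
    right=True,
)
print(transdf)
```

The new column is categorical. Values outside all bins become NaN. With the input as posted, order 2 has 3 days, falls in (2, 3] and gets '3 days late'. Only a value of 0 would map to 'On-Time'.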

Can anyone give me a suggestion on how to handle this situation?

Thanks! Stefan


This looks similar to a question I answered earlier. See if it helps: http://stackoverflow.com/questions/27464394/find-points-in-cells-through-pandas-dataframes-of-coordinates/27466566#27466566

Answer


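If you know that the rules set in segmentdf are static and won't change, you can use Series.apply to apply a function to every value of the 'ShippmentSegmentsDays' column of the transdf data frame. Maybe the snippet below can help you. I haven't tested it, so be careful, but I think it should get you started in the right direction.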

# Utility function encoding the rules from the 'segmentdf' data frame.
# It has to be defined before it is used below, otherwise Python raises
# a NameError. Each branch mirrors one right-closed interval (start, end].
def calc_ship_segment(num):
    if num <= 0:
        return 'On-Time'
    elif num <= 1:
        return '1 day late'
    elif num <= 2:
        return '2 days late'
    elif num <= 3:
        return '3 days late'
    else:
        return '>3 days late'

# Create a series of just the data from the 'ShippmentSegmentsDays' column
seg_days = transdf['ShippmentSegmentsDays']

# Create a new column, 'ShippmentSegment', in the 'transdf' data frame by
# calling our utility function on every value of the series above.
# Series.apply works element-wise, so no axis argument is needed.
transdf['ShippmentSegment'] = seg_days.apply(calc_ship_segment)
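The hard-coded function only holds while the rules in segmentdf stay fixed. If they might change, one option is to build the lookup from segmentdf itself. This sketch swaps in `pd.IntervalIndex` rather than `apply`. It assumes the intervals are right-closed (start, end] and do not overlap, which `get_indexer` requires. Gaps are allowed: a value outside every interval gets position -1, which is mapped to None here. The DataFrames are rebuilt from the question's tables for illustration.

```python
import pandas as pd

# Sample data copied from the tables in the question
transdf = pd.DataFrame({
    'OrderId': [1, 2, 3, 4],
    'ShippmentSegmentsDays': [1, 3, 4, 10],
})
segmentdf = pd.DataFrame({
    'ShippmentSegmentDaysStart': [-9999999, 0, 1, 2, 3],
    'ShippmentSegmentDaysEnd': [0, 1, 2, 3, 9999999],
    'ShippmentSegment': ['On-Time', '1 day late', '2 days late',
                         '3 days late', '>3 days late'],
})

# One interval per row of segmentdf; closed='right' gives (start, end]
intervals = pd.IntervalIndex.from_arrays(
    segmentdf['ShippmentSegmentDaysStart'],
    segmentdf['ShippmentSegmentDaysEnd'],
    closed='right',
)

# Row position of the interval containing each value, or -1 if none does
positions = intervals.get_indexer(transdf['ShippmentSegmentsDays'])

# Translate positions to labels, leaving unmatched values as None
labels = segmentdf['ShippmentSegment'].to_numpy()
transdf['ShippmentSegment'] = [labels[p] if p != -1 else None
                               for p in positions]
print(transdf)
```

Editing a row of segmentdf changes the result without touching any Python code. The hard-coded `calc_ship_segment` would need to be rewritten instead.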

I don't think `!num` is valid Python syntax. – DSM


Good catch. I'll edit it.