Pandas: compare DataFrames based on conditions and return the matching rows

I have two DataFrames:

[in] print(testing_df.head(n=5)) 
print(product_combos1.head(n=5)) 

[out]
                     product_id  length
transaction_id
001                      (P01,)       1
002                  (P01, P02)       2
003             (P01, P02, P09)       3
004                  (P01, P03)       2
005             (P01, P03, P05)       3

             product_id  count  length
0            (P06, P09)  36340       2
1  (P01, P05, P06, P09)  10085       4
2            (P01, P06)  36337       2
3            (P01, P09)  49897       2
4            (P02, P09)  11573       2

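For each row of testing_df, I want to return the product_combos row whose length is the testing_df length + 1, that contains the testing_df products, and that has the highest count. So for example, for transaction_id 001 I want to return product_combos[3] (only P09, though).

For the first part (making the comparison purely on length) I tried: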

# Return the product combos values that are of the appropriate length and the strings match 
for i in testing_df['length']:
    for k in product_combos1['length']:
        if (i)+1 == (k):
            matches = list(k)

However, this returns the error:

TypeError: 'numpy.int64' object is not iterable

2017-08-05 zsad512

You can't create a list from a non-iterable like that. Try replacing matches = list(k) with matches = [k]. Also, those parentheses are redundant: you can replace if (i)+1 == (k): with if i + 1 == k:.
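To see the difference in isolation, here is a minimal sketch. It assumes the loop variable k is a numpy.int64, as the error message in the question suggests; the names error_message and wrapped are made up for this illustration.

```python
import numpy as np

# Assumption: iterating over the 'length' column yields NumPy integer scalars.
k = np.int64(3)

try:
    list(k)  # list() expects an iterable; a NumPy integer scalar is not one
except TypeError as exc:
    error_message = str(exc)

wrapped = [k]  # a list literal simply wraps the single value

print(error_message)
print(len(wrapped))
```

Note that matches = [k] still rebinds matches on every hit, so only the last matching value survives the loop.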

2017-08-05 16:46:59 vahndi

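Just use the .append() method. I'd also suggest initializing matches as an empty list at the top, so that you don't get duplicates when you re-run the cell.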

# Setup

import pandas as pd

testing_df = pd.DataFrame({
    'product_id': [('P01',), ('P01', 'P02')],
    'length': [1, 2],
})
# Build from a dict: attribute assignment such as product_combos1.count = ...
# would overwrite the DataFrame.count method instead of filling the column.
product_combos1 = pd.DataFrame({
    'product_id': [('P01',), ('P01', 'P02')],
    'count': [100, 5000],
    'length': [3, 1],
})

# Matching

matches = []  # reset on every run so re-running the cell adds no duplicates
for i in testing_df['length']:
    for k in product_combos1['length']:
        if i + 1 == k:
            matches.append(k)

Let me know if this works, or if there's anything else! Good luck!

2017-08-05 16:49:48 CalendarJ

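Thanks, but unfortunately this didn't work. I was able to solve the problem with a different approach, though. – zsad512

Sorry to hear that! It ran fine in my notebook with the example setup I gave. Glad to hear you were able to solve it! When you get a chance, please remember to post it as an answer so that others who come to this post can refer to it. – CalendarJ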
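The approach zsad512 ended up using is not shown in this thread. For readers who land here, the following is only one possible sketch of the full requirement from the question (length + 1, contains the transaction's products, highest count). The helper best_combo and the variable names best and extra are hypothetical, and the data is a subset of the sample output shown in the question.

```python
import pandas as pd

# First two transactions and the five combo rows printed in the question.
testing_df = pd.DataFrame(
    {"product_id": [("P01",), ("P01", "P02")], "length": [1, 2]},
    index=pd.Index(["001", "002"], name="transaction_id"),
)
product_combos1 = pd.DataFrame({
    "product_id": [("P06", "P09"), ("P01", "P05", "P06", "P09"),
                   ("P01", "P06"), ("P01", "P09"), ("P02", "P09")],
    "count": [36340, 10085, 36337, 49897, 11573],
    "length": [2, 4, 2, 2, 2],
})


def best_combo(products, combos):
    """Hypothetical helper: index label of the highest-count combo that is one
    product longer than `products` and contains all of them, else None."""
    wanted = set(products)
    right_length = combos["length"] == len(products) + 1
    contains_all = combos["product_id"].apply(lambda combo: wanted.issubset(combo))
    candidates = combos[right_length & contains_all]
    if candidates.empty:
        return None
    # Use bracket access: combos.count would be the DataFrame.count method.
    return candidates["count"].idxmax()


best = {tid: best_combo(products, product_combos1)
        for tid, products in testing_df["product_id"].items()}

# The product that the chosen combo adds to transaction 001.
extra = (set(product_combos1.at[best["001"], "product_id"])
         - set(testing_df.at["001", "product_id"]))

print(best)
print(extra)
```

A row-by-row apply like this is easy to read but slow on large frames; treat it as a starting point rather than a tuned solution.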
