2017-04-16 157 views
0

Pandas - check if a string column contains a pair of strings

Say I have a DataFrame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'consumption': ['squirrel eats apple', 'monkey eats apple',
                                   'monkey eats banana', 'badger eats banana'],
                   'food': ['apple', 'apple', 'banana', 'banana'],
                   'creature': ['squirrel', 'badger', 'monkey', 'elephant']})

           consumption  creature    food
0  squirrel eats apple  squirrel   apple
1    monkey eats apple    badger   apple
2   monkey eats banana    monkey  banana
3   badger eats banana  elephant  banana

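I want to find the rows where the 'creature' and 'food' values both appear in the 'consumption' column. For example, if apple and squirrel occur together the result should be True, but if apple occurs with elephant it should be False. Similarly, monkey and banana together give True, but monkey and apple would be False.

The approach I tried was something like this: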

creature_list = list(df['creature']) 
creature_list = '|'.join(map(str, creature_list)) 

food_list = list(df['food']) 
food_list = '|'.join(map(str, food_list)) 

np.where((df['consumption'].str.contains('('+creature_list+')', case = False)) 
      & (df['consumption'].str.contains('('+food_list+')', case = False)), 1, 0) 

However, this does not work, because I get True in every case.

How can I check for pairs of strings?
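A minimal sketch of why the attempt above flags every row: joining each column with '|' builds a pattern that matches any creature (or any food) found anywhere in the column, not the one on the same row. The variable names below are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "consumption": ["squirrel eats apple", "monkey eats apple",
                    "monkey eats banana", "badger eats banana"],
    "food": ["apple", "apple", "banana", "banana"],
    "creature": ["squirrel", "badger", "monkey", "elephant"],
})

# One alternation built from the WHOLE column: 'squirrel|badger|monkey|elephant'
creature_pattern = "|".join(df["creature"])
food_pattern = "|".join(df["food"])

# Each test asks "does this row mention ANY creature / ANY food?",
# which holds for every row of this data.
any_creature = df["consumption"].str.contains(creature_pattern, case=False)
any_food = df["consumption"].str.contains(food_pattern, case=False)

flags = np.where(any_creature & any_food, 1, 0)
print(flags)
```

Because the pattern is shared by all rows, the row-to-row pairing of creature and food is lost, so the comparison has to be made per row instead.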

Answers

4

Here is one possible way:

def match_consumption(r):
    # True only if this row's own creature and food both occur in its consumption text
    return (r['creature'] in r['consumption']) and (r['food'] in r['consumption'])

df['match'] = df.apply(match_consumption, axis=1)
df

           consumption  creature    food  match
0  squirrel eats apple  squirrel   apple   True
1    monkey eats apple    badger   apple  False
2   monkey eats banana    monkey  banana   True
3   badger eats banana  elephant  banana  False
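As a possible variation on the same row-wise idea, the check can be written as a list comprehension over the three columns instead of df.apply. This is only a sketch of an alternative and was not suggested in the thread; it assumes all three columns hold plain strings.

```python
import pandas as pd

df = pd.DataFrame({
    "consumption": ["squirrel eats apple", "monkey eats apple",
                    "monkey eats banana", "badger eats banana"],
    "food": ["apple", "apple", "banana", "banana"],
    "creature": ["squirrel", "badger", "monkey", "elephant"],
})

# Walk the three columns in lockstep so each row is compared
# only against its own creature and food.
df["match"] = [
    (creature in text) and (food in text)
    for text, creature, food in zip(df["consumption"], df["creature"], df["food"])
]
print(df)
```

Both versions use plain substring tests, so 'apple' would also match inside a longer word such as 'pineapple'.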
+0

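Hey @foglerit, thanks. A question: if 'r['consumption']' lived in another DataFrame, say 'x['consumption']', and I modified the function to take x as an extra parameter, would this still work? – vagabond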

+0

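I just tried it with two DataFrames and got this error: 'TypeError: 'Series' objects are mutable, thus they cannot be hashed'. Your answer is correct for the question as I asked it, though; I did not describe the full problem in my question. – vagabond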

+0

@vagabond, if 'consumption' is in another DataFrame, you need to merge the two DataFrames first and then apply this method. – foglerit
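The merge-then-apply suggestion might look roughly like the sketch below. The thread does not say how the two frames relate, so the shared key column 'id' and the frame names pairs and x are assumptions made purely for illustration.

```python
import pandas as pd

# Hypothetical layout: creature/food in one frame, consumption in another,
# linked by an assumed key column "id".
pairs = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "creature": ["squirrel", "badger", "monkey", "elephant"],
    "food": ["apple", "apple", "banana", "banana"],
})
x = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "consumption": ["squirrel eats apple", "monkey eats apple",
                    "monkey eats banana", "badger eats banana"],
})

# Bring all three columns onto the same rows first.
merged = pairs.merge(x, on="id", how="left")

# Then the row-wise check works exactly as in the single-frame case.
merged["match"] = merged.apply(
    lambda r: (r["creature"] in r["consumption"]) and (r["food"] in r["consumption"]),
    axis=1,
)
print(merged)
```

If some ids had no partner in x, the left merge would leave NaN in 'consumption' and the substring test would fail, so those rows would need to be dropped or filled first.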

0

I'm sure there is a better way to do this, but here is one way.

import pandas as pd 
import re 

df = pd.DataFrame({'consumption':['squirrel eats apple', 'monkey eats apple', 'monkey eats banana', 'badger eats banana'], 'food':['apple', 'apple', 'banana', 'banana'], 'creature':['squirrel', 'badger', 'monkey', 'elephant']}) 

test = [] 
for i in range(len(df.consumption)): 
    test.append(bool(re.search(df.creature[i],df.consumption[i])) & bool((re.search(df.food[i], df.consumption[i])))) 
df['test'] = test 
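One caveat with the loop above: re.search treats the creature and food values as regular-expression patterns. That is harmless for this data, but a value containing characters such as '+' or '(' could match unexpectedly or raise an error. A hedged sketch that escapes each value with re.escape first (the zip-based iteration is a stylistic choice, not part of the original answer):

```python
import re

import pandas as pd

df = pd.DataFrame({
    "consumption": ["squirrel eats apple", "monkey eats apple",
                    "monkey eats banana", "badger eats banana"],
    "food": ["apple", "apple", "banana", "banana"],
    "creature": ["squirrel", "badger", "monkey", "elephant"],
})

# re.escape makes each value match literally, whatever characters it contains.
df["test"] = [
    bool(re.search(re.escape(creature), text)) and bool(re.search(re.escape(food), text))
    for text, creature, food in zip(df["consumption"], df["creature"], df["food"])
]
print(df)
```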
1

Is checking for string equality too simple? You can test whether the string '<creature> eats <food>' is equal to the corresponding value in the consumption column:

(df.consumption == df.creature + " eats " + df.food) 
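Run against the sample data, the equality test could be used as in the sketch below. It assumes every consumption value has exactly the form '<creature> eats <food>'; extra words, different casing, or a different verb would make the comparison False even when both names are present.

```python
import pandas as pd

df = pd.DataFrame({
    "consumption": ["squirrel eats apple", "monkey eats apple",
                    "monkey eats banana", "badger eats banana"],
    "food": ["apple", "apple", "banana", "banana"],
    "creature": ["squirrel", "badger", "monkey", "elephant"],
})

# Build the expected sentence per row and compare element-wise;
# the whole operation is vectorised, with no Python-level loop.
expected = df["creature"] + " eats " + df["food"]
df["match"] = df["consumption"] == expected
print(df)
```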