2017-08-03 174 views
1

How do I remove values from a list in a pandas DataFrame?

I created a DataFrame:

[in] testing_df = pd.DataFrame(test_array, columns=['transaction_id', 'product_id'])

# Split the product_id's for the testing data 
testing_df.set_index(['transaction_id'],inplace=True) 
testing_df['product_id'] = testing_df['product_id'].apply(lambda row: row.split(',')) 

[out]     product_id 
transaction_id     
001      [P01] 
002     [P01, P02] 
003    [P01, P02, P09] 
004     [P01, P03] 
005    [P01, P03, P05] 
006    [P01, P03, P07] 
007    [P01, P03, P08] 
008     [P01, P04] 
009    [P01, P04, P05] 
010    [P01, P04, P08] 

How can I now remove 'P04' and 'P08' from the results?

I tried:

# Remove P04 and P08 from consideration 
testing_df['product_id'] = testing_df['product_id'].map(lambda x: x.strip('P04')) 

testing_df['product_id'].replace(regex=True,inplace=True,to_replace=r'P04,',value=r'') 

However, neither option seems to work.

The data types are:

[in] print(testing_df.dtypes) 
[out] product_id object 
dtype: object 

[in] print(testing_df['product_id'].dtypes) 
[out] object 
+0

It would help to know whether 'product_id' is a column of lists or a column of strings. –

+0

'product_id' is a column of lists of strings, i.e. '["P01", "P02", "P03"]' – zsad512

+0

You should print the type of the column's elements to make this clear. –
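The list-versus-string distinction raised in these comments probably explains why the first attempt in the question does nothing useful. The sketch below uses plain Python values standing in for a single cell of 'product_id' (the names as_list and as_string are made up for illustration): a list has no strip method at all, and on a string strip('P04') treats its argument as a set of characters to trim from both ends, not as a substring to delete.

```python
# One cell as stored after the split in the question: a list of strings.
as_list = ['P01', 'P04', 'P08']
# The same cell before the split: one comma-separated string.
as_string = 'P01,P04,P08'

# Lists have no .strip method, so x.strip('P04') cannot work on them.
list_has_strip = hasattr(as_list, 'strip')

# On a string, strip('P04') trims any of the characters 'P', '0', '4'
# from both ends; it does not remove the substring 'P04'.
stripped = as_string.strip('P04')

# Filtering the list is one way to drop whole items.
filtered = [item for item in as_list if item not in ('P04', 'P08')]

print(list_has_strip, repr(stripped), filtered)
```

So for a column of lists, the removal has to happen per list element, which is what the answers below do.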

Answers

1

Store all the elements you want to remove in a list, then loop over the rows and remove them.

remove_results = ['P04', 'P08']
for k in range(len(testing_df['product_id'])):
    for r in remove_results:
        # .iloc[k] selects the k-th row by position, whatever the index labels are
        if r in testing_df['product_id'].iloc[k]:
            testing_df['product_id'].iloc[k].remove(r)
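A possible alternative to the nested loop, as a rough sketch: build a new list per row with a list comprehension. The helper name drop_unwanted and the sample rows are made up for illustration. Unlike list.remove, which deletes only the first occurrence, the comprehension drops every occurrence, and it leaves the original lists untouched.

```python
REMOVE = {'P04', 'P08'}  # items to drop, taken from the question

def drop_unwanted(items, unwanted=REMOVE):
    """Return a new list without the unwanted items, keeping the original order."""
    return [item for item in items if item not in unwanted]

# Plain lists standing in for a few cells of the 'product_id' column.
rows = [['P01'], ['P01', 'P04', 'P05'], ['P01', 'P04', 'P08']]
cleaned = [drop_unwanted(row) for row in rows]
print(cleaned)
```

On the DataFrame from the question this would presumably be applied as testing_df['product_id'] = testing_df['product_id'].apply(drop_unwanted).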
+0

Awesome! Thanks! – zsad512

2

I would do it before splitting:

Data:

In [269]: df 
Out[269]: 
       product_id 
transaction_id 
1      P01 
2     P01,P02 
3    P01,P02,P09 
4     P01,P03 
5    P01,P03,P05 
6    P01,P03,P07 
7    P01,P03,P08 
8     P01,P04 
9    P01,P04,P05 
10    P01,P04,P08 

Solution:

In [271]: df['product_id'] = df['product_id'].str.replace(r'\,*?(?:P04|P08)\,*?', '', regex=True) \
              .str.split(',')

In [272]: df 
Out[272]: 
        product_id 
transaction_id 
1       [P01] 
2     [P01, P02] 
3    [P01, P02, P09] 
4     [P01, P03] 
5    [P01, P03, P05] 
6    [P01, P03, P07] 
7     [P01, P03] 
8       [P01] 
9     [P01, P05] 
10      [P01] 
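To see what the pattern does to individual strings, here is a small check using the standard re module, which should behave like a regex-based str.replace on each value. The helper strip_then_split and the 'P04,P05' input are made up for illustration. In the sample data 'P01' always comes first, so every match has a comma in front of it; if an unwanted ID came first, a leading comma would be left behind and the split would produce an empty string.

```python
import re

# Pattern copied from the solution above.
PATTERN = r'\,*?(?:P04|P08)\,*?'

def strip_then_split(value):
    """Remove P04/P08 from a comma-separated string, then split it into a list."""
    return re.sub(PATTERN, '', value).split(',')

middle = strip_then_split('P01,P04,P05')
trailing = strip_then_split('P01,P04,P08')
leading = strip_then_split('P04,P05')  # hypothetical row, not in the sample data
print(middle, trailing, leading)
```

Splitting first and then filtering the resulting list avoids that edge case.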

Or you can change:

testing_df['product_id'] = testing_df['product_id'].apply(lambda row: row.split(',')) 

to:

testing_df['product_id'] = testing_df['product_id'].apply(lambda row: list(set(row.split(','))- set(['P04','P08']))) 

Demo:

In [280]: df.product_id.apply(lambda row: list(set(row.split(','))- set(['P04','P08']))) 
Out[280]: 
transaction_id 
1    [P01] 
2   [P01, P02] 
3  [P09, P01, P02] 
4   [P01, P03] 
5  [P01, P03, P05] 
6  [P07, P01, P03] 
7   [P01, P03] 
8    [P01] 
9   [P01, P05] 
10    [P01] 
Name: product_id, dtype: object 
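Note that the set difference does not keep the original order, as rows 3 and 6 of the demo show. If a stable order matters, one option is to sort the result. This is only a sketch: the helper name split_without is made up, and sorting restores the original order here only because the sample IDs are already in ascending order within each row.

```python
UNWANTED = {'P04', 'P08'}  # items to drop, taken from the question

def split_without(value, unwanted=UNWANTED):
    """Split a comma-separated string, drop unwanted items, return them sorted."""
    return sorted(set(value.split(',')) - unwanted)

row3 = split_without('P01,P02,P09')
row10 = split_without('P01,P04,P08')
print(row3, row10)
```

Like the set-based version above, this also collapses duplicate IDs within a row.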
+0

Wait a minute. 'product_id' is a list. Use 'astype(str)' first, then 'apply(ast.literal_eval)' –

+0

@cᴏʟᴅsᴘᴇᴇᴅ, it becomes a list only after: 'testing_df['product_id'].apply(lambda row: row.split(','))' – MaxU

+0

@cᴏʟᴅsᴘᴇᴇᴅ is correct – zsad512