How to remove values from a list in a pandas DataFrame?

I created a DataFrame:

[in] import pandas as pd

testing_df = pd.DataFrame(test_array, columns=['transaction_id', 'product_id'])

# Split the comma-separated product_id strings into lists
testing_df.set_index(['transaction_id'], inplace=True)
testing_df['product_id'] = testing_df['product_id'].apply(lambda row: row.split(','))

[out]     product_id 
transaction_id     
001      [P01] 
002     [P01, P02] 
003    [P01, P02, P09] 
004     [P01, P03] 
005    [P01, P03, P05] 
006    [P01, P03, P07] 
007    [P01, P03, P08] 
008     [P01, P04] 
009    [P01, P04, P05] 
010    [P01, P04, P08]

How can I now remove 'P04' and 'P08' from the results?

I tried:

# Remove P04 and P08 from consideration 
testing_df['product_id'] = testing_df['product_id'].map(lambda x: x.strip('P04')) 

testing_df['product_id'].replace(regex=True,inplace=True,to_replace=r'P04,',value=r'')

However, neither option seems to work.
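A plausible reason both attempts fail: str.strip(chars) does not remove a substring, it trims any of the given characters from both ends of a string, and once the cells have been split they are lists, which have no .strip() method at all. (As far as I know, Series.replace with regex=True also only rewrites string cells, so list cells pass through unchanged.) The plain-Python sketch below illustrates the first two points; the cell values are hypothetical examples shaped like the question's data.

```python
# Hypothetical cell values, mirroring the question's data
cell_as_string = 'P01,P04'
cell_as_list = ['P01', 'P04']

# str.strip(chars) treats its argument as a *set* of characters ({'P', '0', '4'})
# and trims any of them from both ends; it never removes a whole substring.
stripped = cell_as_string.strip('P04')
print(repr(stripped))  # prints '1,'

# After row.split(','), each cell is a list, and lists have no .strip() method.
try:
    cell_as_list.strip('P04')
    raised = False
except AttributeError as exc:
    raised = True
    print('AttributeError:', exc)  # 'list' object has no attribute 'strip'
```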

The data types are:

[in] print(testing_df.dtypes) 
[out] product_id object 
dtype: object 

[in] print(testing_df['product_id'].dtypes) 
[out] object

2017-08-03 zsad512

Please clarify: is 'product_id' a column of lists or of strings? –

'product_id' is a column of lists of strings, i.e. '["P01", "P02", "P03"]' – zsad512

You should print out the type of the column's values to make this clear. –

Store all the elements you want to remove in a list, then remove them from each row's list:

remove_results = ['P04', 'P08']
# Each cell holds a Python list, so it can be modified in place
for ids in testing_df['product_id']:
    for r in remove_results:
        if r in ids:
            ids.remove(r)  # list.remove drops only the first occurrence of r
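The loop above mutates each stored list in place. If you would rather build new lists, a list comprehension inside Series.apply does the same filtering without mutation; this is a minimal sketch of that alternative, not part of the original answer, and the small frame below is hypothetical data shaped like the question's.

```python
import pandas as pd

# Hypothetical sample data with the question's layout (lists in each cell)
testing_df = pd.DataFrame(
    {'product_id': [['P01'], ['P01', 'P03', 'P08'], ['P01', 'P04'], ['P01', 'P04', 'P05']]},
    index=pd.Index(['001', '007', '008', '009'], name='transaction_id'),
)

remove_results = {'P04', 'P08'}  # a set gives fast membership tests

# Build a fresh list per row: keeps the original order and drops every
# occurrence of an unwanted product, not just the first one.
testing_df['product_id'] = testing_df['product_id'].apply(
    lambda ids: [p for p in ids if p not in remove_results]
)
print(testing_df)
```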

2017-08-03 15:59:27

Awesome! Thanks! – zsad512

I would do it before splitting:

Data:

In [269]: df 
Out[269]: 
       product_id 
transaction_id 
1      P01 
2     P01,P02 
3    P01,P02,P09 
4     P01,P03 
5    P01,P03,P05 
6    P01,P03,P07 
7    P01,P03,P08 
8     P01,P04 
9    P01,P04,P05 
10    P01,P04,P08

Solution:

In [271]: # pandas >= 2.0 defaults to regex=False, so pass regex=True explicitly
     ...: df['product_id'] = (df['product_id']
     ...:                     .str.replace(r'\,*?(?:P04|P08)\,*?', '', regex=True)
     ...:                     .str.split(','))

In [272]: df 
Out[272]: 
        product_id 
transaction_id 
1       [P01] 
2     [P01, P02] 
3    [P01, P02, P09] 
4     [P01, P03] 
5    [P01, P03, P05] 
6    [P01, P03, P07] 
7     [P01, P03] 
8       [P01] 
9     [P01, P05] 
10      [P01]
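A different technique, not used in the original answer: split, expand to one row per product with Series.explode, filter with isin, and regroup into lists. This sidesteps regex edge cases (for example, with the pattern above a leading 'P04,P01' would leave an empty first element after splitting). A minimal sketch on hypothetical sample rows; note that a transaction whose products are all removed would drop out of the regrouped result and become NaN on assignment.

```python
import pandas as pd

# Hypothetical sample: comma-separated strings, as in the answer's "Data" frame
df = pd.DataFrame(
    {'product_id': ['P01', 'P01,P02', 'P01,P03,P08', 'P01,P04', 'P01,P04,P08']},
    index=pd.Index([1, 2, 7, 8, 10], name='transaction_id'),
)

drop = ['P04', 'P08']

# One row per product; transaction_id is repeated in the index
exploded = df['product_id'].str.split(',').explode()

# Keep wanted products, then regroup into one list per transaction
kept = exploded[~exploded.isin(drop)]
df['product_id'] = kept.groupby(level=0).apply(list)  # aligns on transaction_id
print(df)
```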

Alternatively, you can change:

testing_df['product_id'] = testing_df['product_id'].apply(lambda row: row.split(','))

to:

testing_df['product_id'] = testing_df['product_id'].apply(lambda row: list(set(row.split(','))- set(['P04','P08'])))

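Demo (removing the values while splitting; note that the set difference does not keep the original order, e.g. row 3):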

In [280]: df.product_id.apply(lambda row: list(set(row.split(','))- set(['P04','P08']))) 
Out[280]: 
transaction_id 
1    [P01] 
2   [P01, P02] 
3  [P09, P01, P02] 
4   [P01, P03] 
5  [P01, P03, P05] 
6  [P07, P01, P03] 
7   [P01, P03] 
8    [P01] 
9   [P01, P05] 
10    [P01] 
Name: product_id, dtype: object

2017-08-03 15:51:55 MaxU

Hold on, 'product_id' is a list. Use 'astype(str)' first, then 'apply(ast.literal_eval)'. –

@cᴏʟᴅsᴘᴇᴇᴅ, it only becomes a list after: 'testing_df['product_id'].apply(lambda row: row.split(','))' – MaxU

@cᴏʟᴅsᴘᴇᴇᴅ that's correct – zsad512
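If the column instead holds the text of a list (such as '["P01", "P02"]', for example after a round trip through CSV), ast.literal_eval can turn each cell back into a real list, after which any of the list-based answers above apply. A minimal sketch, assuming hypothetical string cells like the ones zsad512 described:

```python
import ast

import pandas as pd

# Hypothetical data: cells contain the *string* representation of a list
df = pd.DataFrame({'product_id': ['["P01", "P02"]', '["P01", "P04", "P08"]']})

# ast.literal_eval only evaluates Python literals (lists, strings, numbers, ...),
# so unlike eval() it will not execute arbitrary code
df['product_id'] = df['product_id'].apply(ast.literal_eval)
print(df['product_id'].tolist())
```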
