2017-03-10 34 views
1

I have a DataFrame with 3 columns and more than 1,000 rows. How can I subset it into a list of DataFrames using a for loop in Python?

df
day         product        order
2010-01-01  150ml Mask         9
2010-01-02  230ml Lotion      27
2010-01-03  600ml Shampoo     33

I want to split it into one subset per product, like this:

df_mask              df_lotion            df_shampoo
day         order    day         order    day         order
2010-01-01      9    2010-01-02     27    2010-01-03     33
2010-01-09      8    2010-01-05     30    2010-01-04     25
2010-01-11     13    2010-01-06     29    2010-01-06     46

This is what I have tried so far:

# Create a list of the distinct product names
productName = df['product'].unique().tolist()

# Return the rows of df that belong to one product
def subtable(df, productName):
    return df[df['product'] == productName]

# Subset one product at a time
df_mask = subtable(df, '150ml Mask')
df_lotion = subtable(df, '230ml Lotion')
df_shampoo = subtable(df, '600ml Shampoo')

Is there a way to create all of the subsets at once with a for loop? The DataFrame contains many different products.

Answers

2

You can use groupby for this, which does exactly what you need:

# show example data 
print(df) 

    day   product    order 
0 2010-01-01 "150ml Mask"   9 
1 2010-01-02 "230ml Lotion"  27 
2 2010-01-03 "600ml Shampoo"  33 
3 2010-01-04 "250ml Mask"   12 
4 2010-01-05 "330ml Lotion"  24 
5 2010-01-06 "400ml Shampoo"  13 

# split product column and keep only product name 
df["product"] = df["product"].str.split(expand=True)[1] 

# groupby product 
products = df.groupby("product") 

# print product and corresponding product df 
for product, product_df in products: 
    print(product) 
    print(product_df) 

Lotion 
      day product order 
1 2010-01-02 Lotion  27 
4 2010-01-05 Lotion  24 

Mask 
      day product order 
0 2010-01-01 Mask  9 
3 2010-01-04 Mask  12 

Shampoo 
      day product order 
2 2010-01-03 Shampoo  33 
5 2010-01-06 Shampoo  13 

To access each subgroup individually, you can use get_group, which corresponds to your subtable function:

mask_df = products.get_group("Mask") 
print(mask_df) 

    day   product  order 
0 2010-01-01 Mask  9 
3 2010-01-04 Mask  12 

Finally, to get all of the sub-DataFrames in a dictionary, you can iterate over products and drop the product column itself:

df_dict = {product: product_df.drop("product", axis=1) 
      for product, product_df in products} 
print(df_dict["Mask"]) 

    day   order 
0 2010-01-01 9 
3 2010-01-04 12 
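
The steps above can be put together as a minimal, self-contained sketch. The sample values are copied from the example data. Unlike the snippets above, it groups on a derived Series (named `names` here for illustration only) instead of overwriting the product column, and it assumes every label has the form "<size> <name>".

```python
import pandas as pd

# Sample data mirroring the example above
df = pd.DataFrame({
    "day": pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-03",
                           "2010-01-04", "2010-01-05", "2010-01-06"]),
    "product": ["150ml Mask", "230ml Lotion", "600ml Shampoo",
                "250ml Mask", "330ml Lotion", "400ml Shampoo"],
    "order": [9, 27, 33, 12, 24, 13],
})

# Assumption: the product name is the second whitespace-separated token
names = df["product"].str.split(expand=True)[1]

# Group on the derived names; the original column is left untouched in df
df_dict = {name: group.drop(columns="product")
           for name, group in df.groupby(names)}

print(sorted(df_dict))
print(df_dict["Mask"])
```

Each value in df_dict is an ordinary DataFrame, so df_dict["Mask"] plays the role of df_mask in the question.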
+0

Thank you for your answer. I tried 'df["product"] = df["product"].str.split(expand=True)[1]', but some products are not grouped correctly because some product names look like '0.7OZ Mask UK 6'. Is there another way to solve this? – Peggy

+0

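@peggy What variations can the product labels have? Extracting the product name depends entirely on your input data. For the example in your comment, though, 'df["product"].str.split(expand=True)[1]' should successfully extract *Mask* from '0.7OZ Mask UK 6'. Or do you need *Mask* to include *UK 6*? – pansen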

+0

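Yes, I need _Mask UK 6_. But I decided to assign a specific number to each product to make the classification easier. Other than that, the code runs perfectly. Thank you very much! – Peggy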
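
For labels such as '0.7OZ Mask UK 6', one possible approach (a sketch, not something proposed in the thread) is to split only at the first whitespace, so that everything after the size is kept as the product name. It assumes the size is always the first token; the sample labels below are illustrative.

```python
import pandas as pd

# Illustrative labels, including the multi-word one from the comment
products = pd.Series(["150ml Mask", "0.7OZ Mask UK 6", "230ml Lotion"])

# n=1 limits the split to the first whitespace, so the rest stays intact
names = products.str.split(n=1).str[1]
print(names.tolist())
```

The resulting Series can be passed to df.groupby in the same way as the single-word product names.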

0

See if this helps:

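dfs = {}
for grp in df.groupby('product'):
    # grp is a (product label, sub-DataFrame) tuple; the second word of the
    # label is used as the key, so labels sharing that word overwrite each other
    dfs[grp[0].split(' ')[1]] = grp[1]

for key in dfs.keys():
    print(dfs[key])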
0

I think you can use a dict to store all of the DataFrames, created with a dict comprehension over groupby and split:

producs = df['product'].str.split().str[-1] 
print (producs) 
0  Mask 
1  Lotion 
2 Shampoo 
Name: product, dtype: object 

dfs = {i:df.reset_index(drop=True) for i, df in df.groupby(producs)} 
print (dfs) 
{'Shampoo':   day  product order 
0 2010-01-03 600ml Shampoo  33, 'Mask':   day  product order 
0 2010-01-01 150ml Mask  9, 'Lotion':   day  product order 
0 2010-01-02 230ml Lotion  27} 

print (dfs['Shampoo']) 
      day  product order 
0 2010-01-03 600ml Shampoo  33 

If you need to remove the product column, select the subset [['day','order']] or use drop:

dfs = {i:df.reset_index(drop=True)[['day','order']] for i, df in df.groupby(producs)} 
#dfs = {i:df.reset_index(drop=True).drop('product', axis=1) for i, df in df.groupby(producs)} 
print (dfs) 
{'Shampoo':   day order 
0 2010-01-03  33, 'Mask':   day order 
0 2010-01-01  9, 'Lotion':   day order 
0 2010-01-02  27} 

print (dfs['Shampoo']) 
      day order 
0 2010-01-03  33