2017-05-26

Python pandas: combine multiple columns into a single column

I have a Python pandas DataFrame like the one below:

movie       unknown  action  adventure  animation  fantasy  horror  romance  sci-fi
Toy Story         0       1          1          0        1       0        0       1
Golden Eye        0       1          0          0        0       0        1       0
Four Rooms        1       0          0          0        0       0        0       0
Get Shorty        0       0          0          1        1       0        1       0
Copy Cat          0       0          1          0        0       1        0       0

I want to combine the movie genres into a single column. The output would look like this:

movie       genre
Toy Story   action, adventure, fantasy, sci-fi
Golden Eye  action, romance
Four Rooms  unknown
Get Shorty  animation, fantasy, romance
Copy Cat    adventure, horror

Answers

You can do it this way:

In [171]: df['genre'] = df.iloc[:, 1:].apply(lambda x: df.iloc[:, 1:].columns[x.astype(bool)].tolist(), axis=1) 

In [172]: df 
Out[172]: 
        movie  unknown  action  adventure  animation  fantasy  horror  romance  sci-fi                                 genre
0   Toy Story        0       1          1          0        1       0        0       1  [action, adventure, fantasy, sci-fi]
1  Golden Eye        0       1          0          0        0       0        1       0                     [action, romance]
2  Four Rooms        1       0          0          0        0       0        0       0                             [unknown]
3  Get Shorty        0       0          0          1        1       0        1       0         [animation, fantasy, romance]
4    Copy Cat        0       0          1          0        0       1        0       0                   [adventure, horror]
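
The result above holds Python lists, while the question asks for comma-separated strings. A minimal sketch of one way to get strings instead, assuming the first column is `movie` and every other column is a 0/1 genre flag (the frame is rebuilt here from the question's sample data, and `genre_cols` is just an illustrative name):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "movie": ["Toy Story", "Golden Eye", "Four Rooms", "Get Shorty", "Copy Cat"],
    "unknown":   [0, 0, 1, 0, 0],
    "action":    [1, 1, 0, 0, 0],
    "adventure": [1, 0, 0, 0, 1],
    "animation": [0, 0, 0, 1, 0],
    "fantasy":   [1, 0, 0, 1, 0],
    "horror":    [0, 0, 0, 0, 1],
    "romance":   [0, 1, 0, 1, 0],
    "sci-fi":    [1, 0, 0, 0, 0],
})

# Assumption: everything after the first column is a genre flag.
genre_cols = df.columns[1:]

# For each row, keep the column names whose flag is 1 and join them with ", ".
df["genre"] = df[genre_cols].apply(
    lambda row: ", ".join(genre_cols[row.astype(bool).to_numpy()]),
    axis=1,
)

print(df[["movie", "genre"]])
```

Taking the column index once outside the lambda avoids recomputing `df.iloc[:, 1:].columns` for every row.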

PS: that said, I don't understand how this would help you. I don't see any benefit compared to the one-hot encoded matrix.
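
To illustrate that point: the combined column carries the same information as the one-hot matrix, and it can be expanded back with `Series.str.get_dummies`. A rough sketch, assuming the genres were joined with `", "` as in the question's desired output (the `combined` frame below is typed in by hand from that output):

```python
import pandas as pd

# Hypothetical input: the combined genre strings from the question's desired output.
combined = pd.DataFrame({
    "movie": ["Toy Story", "Golden Eye", "Four Rooms", "Get Shorty", "Copy Cat"],
    "genre": [
        "action, adventure, fantasy, sci-fi",
        "action, romance",
        "unknown",
        "animation, fantasy, romance",
        "adventure, horror",
    ],
})

# Split each string on ", " and build one indicator column per distinct genre.
one_hot = combined["genre"].str.get_dummies(sep=", ")

# Put the movie titles back next to the indicator columns.
restored = pd.concat([combined[["movie"]], one_hot], axis=1)
print(restored)
```

The indicator columns may come back in a different order than in the original frame, so reorder them if the column order matters.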

`df['genre'] = df.apply(lambda x: df.columns[x.astype(bool)].tolist()[1:], axis=1)` +1, and I agree that it doesn't provide any additional benefit – bernie

@bernie, thanks :) – MaxU