2017-05-09 34 views
2

Pandas DataFrame has a column containing lists. How do I repeat the row for each value in the list?

I have a pandas DataFrame like this:

title author    year type 
0 t1  a1     1980 article 
1 t2  ['a2', 'a3', 'a4'] 1983 article 
2 t3  a5     1982 article 
3 t4  a6     1977 article 
4 t5  ['a7','a8']   2011 book 

This is a short example; the original is much larger.

I need a DataFrame like this:

title author year type 
0 t1  a1  1980 article 
1 t2  a2  1983 article 
2 t2  a3  1983 article 
3 t2  a4  1983 article 
4 t3  a5  1982 article 
5 t4  a6  1977 article 
6 t5  a7  2011 book 
7 t5  a8  2011 book 

Note that the lists have different numbers of elements.

+0

Possible duplicate of http://stackoverflow.com/questions/27263805/pandas-when-cell-contents-are-lists-create-a-row-for-each-element-in-the-list – bigbounty

Answers

1
import pandas as pd

# Expand each list of authors into separate rows and build an authors df
df_author = df.author.apply(pd.Series).stack().rename('author').reset_index()

# Join the authors df back to the original df on the original row index
pd.merge(df_author, df, left_on='level_0', right_index=True, suffixes=('', '_old'))[df.columns]

Out[184]: 
    title author year  type 
0 t1  a1 1980 article 
1 t2  a2 1983 article 
2 t2  a3 1983 article 
3 t2  a4 1983 article 
4 t3  a5 1982 article 
5 t4  a6 1977 article 
6 t5  a7 2011 article 
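
As an alternative to the stack-and-merge approach above, newer pandas releases (0.25 and later) provide `DataFrame.explode`, which repeats a row once per list element. The sketch below rebuilds the question's example data by hand and assumes the `author` cells hold real Python lists, not strings that look like lists.

```python
import pandas as pd

# Example data rebuilt from the question; list cells are real Python lists.
df = pd.DataFrame({
    "title": ["t1", "t2", "t3", "t4", "t5"],
    "author": ["a1", ["a2", "a3", "a4"], "a5", "a6", ["a7", "a8"]],
    "year": [1980, 1983, 1982, 1977, 2011],
    "type": ["article", "article", "article", "article", "book"],
})

# explode() emits one row per list element; scalar cells pass through unchanged.
# reset_index(drop=True) replaces the repeated original index with 0..n-1.
result = df.explode("author").reset_index(drop=True)
print(result)
```

The other columns (`title`, `year`, `type`) are copied onto each new row, so lists of different lengths need no special handling.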
+0

It doesn't work. The result is the same as the first DF (with the lists) – IvanMarkus

+0

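I think that when the DataFrame is created, the list elements in the author column are not interpreted as lists. The DataFrame is read from a CSV with df = pd.read_csv('./file.csv', names=['title','author','year','type'], header=0, sep=';', low_memory=False). That is why your solution doesn't work. What can I do? – IvanMarkus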
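
If the column comes out of `read_csv` as strings such as `"['a2', 'a3', 'a4']"`, one possible fix is to parse those strings into real lists before expanding them. This is a minimal sketch, not a tested fix for the actual file: the CSV text is a made-up stand-in for `file.csv`, `parse_authors` is a hypothetical helper name, and it assumes list cells are written as Python list literals while single authors are plain text.

```python
import ast
import io

import pandas as pd

# Hypothetical CSV content standing in for file.csv (semicolon-separated).
csv_text = """title;author;year;type
t1;a1;1980;article
t2;['a2', 'a3', 'a4'];1983;article
t5;['a7','a8'];2011;book
"""

df = pd.read_csv(io.StringIO(csv_text),
                 names=["title", "author", "year", "type"],
                 header=0, sep=";")


def parse_authors(value):
    """Turn "['a2', 'a3']" into a real list; leave plain names such as "a1" alone."""
    text = str(value).strip()
    if text.startswith("[") and text.endswith("]"):
        return ast.literal_eval(text)  # safely evaluates a Python literal
    return value


# After parsing, list cells are real lists and can be expanded row by row.
df["author"] = df["author"].apply(parse_authors)
result = df.explode("author").reset_index(drop=True)
print(result)
```

`ast.literal_eval` only accepts Python literals, so it is a safer choice than `eval` for text read from a file. Once the column holds real lists, the stack-and-merge answer above should also work on it.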
