Apply a function to a DataFrame column

I have a pandas DataFrame:

name sample 
1 a  Category 1: qwe, asd (line break) Category 2: sdf, erg 
2 b  Category 2: sdf, erg(line break) Category 5: zxc, eru 
... 
30 p  Category 1: asd, Category PE: 2134, EFDgh, Pdr tke, err

I want to end up with:

name qwe asd sdf erg zxc eru 2134 EFDgh Pdr tke err 
1 a  1  1  1  1 0  0 0  0  0  0 
2 b  0  0  1  1 1  1 0  0  0  0 
... 
30 p  0 1  0  0 0  0 0  1  1  0

I created the following function:

def cleanattributes(istring): 

    istring=str(istring) 
    istring=istring.rstrip().split('\\n') 

    counter=0 
    for line in istring: 
     istring[counter]=istring[counter].rpartition(': ')[-1] 
     counter+=1 
    istring=str(istring) 
    istring = istring.replace("'", "") 
    istring = istring.replace("\"", "") 
    return(str(istring))

This function returns the expected result: the category information without the category titles (the idea is to then use get_dummies to get the columns).

teststring="Category 1: qwe, asd\\nCategory 2: sdf, erg" 
cleanattributes(teststring) 
OUTPUT: '[qwe, asd, sdf, erg]'

I don't know how best to apply this function to every record so that the DataFrame looks like this:

name sample 
1 a  qwe, asd, sdf, erg 
2 b  sdf, erg, zxc, eru 
... 
30 p  asd, 2134, EFDgh, Pdr tke, err

Or whether this is even the best way to approach the problem.

As requested:

df['sample'].iat[0] 
OUTPUT: 'Category 1: qwe, asd\nCategory 2: sdf, erg'
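
One way to run a cleaning function over every record is `Series.apply`. The following is a minimal sketch rather than a definitive solution: `strip_category_labels` is a hypothetical, simplified stand-in for `cleanattributes`, and it assumes the cells contain real newline characters, as the `iat[0]` value above suggests.

```python
import pandas as pd


def strip_category_labels(text):
    """Drop the 'Category X: ' prefix from each line and join the values.

    Hypothetical helper for illustration; not the function from the question.
    """
    lines = str(text).rstrip().split('\n')  # assumes real newline characters
    values = [line.rpartition(': ')[-1] for line in lines]
    return ', '.join(values)


# Two illustrative rows modelled on the question's sample data.
df = pd.DataFrame({
    'name': ['a', 'b'],
    'sample': ['Category 1: qwe, asd\nCategory 2: sdf, erg',
               'Category 2: sdf, erg\nCategory 5: zxc, eru'],
})

# Apply the function to every value in the column.
df['sample'] = df['sample'].apply(strip_category_labels)
print(df)
```

Like `cleanattributes`, this keeps only the text after the last `': '` on each line, so a line holding two categories (as in row 30) would lose the first category's values.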


2016-04-05 M Arroyo

What is the EXACT output of df['sample'].iat[0]? – Alexander

The output is 'Category 1: qwe, asd\nCategory 2: sdf, erg' (edit: removed an extra \n that I had accidentally added for testing purposes)

import pandas as pd

df = pd.DataFrame(
    {'name': ['a', 'b'],
     'sample': ['Category 1: asd, Category PE: 2134, EFDgh, Pdr tke, err',
                'Category 2: sdf, erg\nCategory 5: zxc, eru\nCategory 1: asd, Category PE: 2134, EFDgh, Pdr tke, err']})

df2 = pd.concat([df.name,
                 df['sample']
                 .str.replace("(Category .*:)+", '', regex=True)  # Remove "Category [*]:"
                 .str.replace(r'\n', '', regex=True)  # Remove "\n"
                 .str.split(', ', expand=True)],
                axis=1)

df3 = pd.melt(df2, id_vars='name')[['name', 'value']] 

>>> pd.concat([df3['name'], pd.get_dummies(df3['value'])], axis=1) 
    name 2134 EFDgh Pdr tke ergzxc err eru2134 sdf 
0  a  1  0  0  0 0  0 0 
1  b  0  0  0  0 0  0 1 
2  a  0  1  0  0 0  0 0 
3  b  0  0  0  1 0  0 0 
4  a  0  0  1  0 0  0 0 
5  b  0  0  0  0 0  1 0 
6  a  0  0  0  0 1  0 0 
7  b  0  1  0  0 0  0 0 
8  a  0  0  0  0 0  0 0 
9  b  0  0  1  0 0  0 0 
10 a  0  0  0  0 0  0 0 
11 b  0  0  0  0 1  0 0
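
The output above appears to contain merged labels such as `ergzxc`, because the newline is removed without inserting a separator, and `asd` is missing because `.*` is greedy. A possible alternative, offered as a sketch under assumptions and not as part of the original answer, strips each label with a pattern that stops at the first colon, turns newlines into `', '`, and lets `Series.str.get_dummies` build one row per record. The sample rows and the names `cleaned`, `dummies`, and `result` are illustrative.

```python
import pandas as pd

# Illustrative rows modelled on the question's sample data.
df = pd.DataFrame({
    'name': ['a', 'b', 'p'],
    'sample': ['Category 1: qwe, asd\nCategory 2: sdf, erg',
               'Category 2: sdf, erg\nCategory 5: zxc, eru',
               'Category 1: asd, Category PE: 2134, EFDgh, Pdr tke, err'],
})

cleaned = (
    df['sample']
    # Drop every "Category X: " label; [^:]* stops at the first colon.
    .str.replace(r'Category [^:]*: ', '', regex=True)
    # Join the lines with the same separator used between values.
    .str.replace('\n', ', ', regex=False)
)

# One indicator column per distinct value, one row per original record.
dummies = cleaned.str.get_dummies(sep=', ')
result = pd.concat([df['name'], dummies], axis=1)
print(result)
```

This avoids the melt step, so each name keeps a single row of indicators.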


2016-04-05 21:31:47 Alexander
