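New pandas columns from regex parsing

I want to parse text data in a pandas DataFrame. One column holds text made up of tags, each followed by its value, and I want to store each tag's value in its own column. For example, if I create this DataFrame, df: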
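import re
import pandas as pd

df = pd.DataFrame([[1, 2], ['A: this is a value B: this is the b val C: and here is c.', 'A: and heres another a. C: and another c']])
df = df.T
df.columns = ['col1', 'col2']
# Collect the tag names (the word before each colon) found in col2
df['tags'] = df['col2'].apply(lambda x: re.findall(r'(\w+):', x))
# Build the list of unique tags across all rows (set order is arbitrary)
all_tags = []
for val in df['tags']:
    all_tags = all_tags + val
all_tags = list(set(all_tags))
# Add one empty column per tag
for val in all_tags:
    df[val] = ''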
This gives df:

  col1                                               col2       tags A C B
0    1  A: this is a value B: this is the b val C: and...  [A, B, C]
1    2           A: and heres another a. C: and another c     [A, C]

How can I populate each new tag column with its value from col2, so that I get this df:

  col1                                               col2       tags  \
0    1  A: this is a value B: this is the b val C: and...  [A, B, C]
1    2           A: and heres another a. C: and another c     [A, C]

                      A               C                  B
0       this is a value  and here is c.  this is the b val
1  and heres another a.   and another c
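
For reference, here is a minimal sketch of one way this could be done. It is not necessarily the best approach. It assumes every tag is a single word immediately followed by a colon, and that no value itself contains a word followed by a colon (such a word would be mistaken for a tag). The names TAG_VALUE, parse_tags, tag_df and result are made up for this illustration, and the sketch rebuilds the sample data directly rather than relying on the tags column.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2],
    'col2': ['A: this is a value B: this is the b val C: and here is c.',
             'A: and heres another a. C: and another c'],
})

# A tag (word + colon), then its value captured lazily up to the
# next "tag:" or the end of the string. Assumption: values never
# contain a word directly followed by a colon.
TAG_VALUE = re.compile(r'(\w+):\s*(.*?)(?=\s+\w+:|$)')

def parse_tags(text):
    """Return a {tag: value} dict for one cell of col2."""
    return dict(TAG_VALUE.findall(text))

# One dict per row, expanded into one column per tag.
# Tags missing from a row come out as NaN, replaced here by ''.
parsed = df['col2'].apply(parse_tags)
tag_df = pd.DataFrame(parsed.tolist(), index=df.index).fillna('')

# Attach the new tag columns to the original frame by index.
result = df.join(tag_df)
print(result)
```

Parsing each row into a dict first means the tag columns are created and filled in one step, so the separate loop that adds empty columns would not be needed with this approach.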