Extract multiple patterns and save them to a pandas DataFrame [python]
My text file looks like this:
Description: Text 1 follows <br/> blah blah blah Cause: Cause Text 1
follows here <br/>Description: Text 2 follows <br/> blah blah
blah Cause: Cause Text 2 follows here<br/>Description: Text 3 follows <br/>
blah blah blah Description: Text 4 follows <br/> blah blah
blah Cause: Cause Text 4 follows<br/>
I want to get all the Descriptions and Causes into a pandas DataFrame, in a structured format for NLP:
Description       Cause
Text 1 follows    Cause Text 1 follows here
Text 2 follows    Cause Text 2 follows here
Text 3 follows
Text 4 follows    Cause Text 4 follows here
What I have done so far:
re.findall(r'Description:(.*?)<br/>',textfile)
re.findall(r'Cause:(.*?)<br/>',textfile)
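However, this doesn't let me line up each Description with its Cause when I try to build the larger data frame, because some Descriptions (like Text 3) have no Cause and the two lists end up with different lengths!
Thanks for any input or guidance on how to do this. I'm very new to Python!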
Try [this regex](https://regex101.com/r/bRIOev/1), which uses named groups to capture each Description together with its optional Cause.
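One way to keep each Description paired with its own Cause is to split the text into one chunk per record first and then search inside each chunk, so a missing Cause simply becomes an empty cell. The following is a minimal sketch, not a definitive solution: it assumes every record starts with the literal `Description:` and that both fields end at the next `<br/>`. The names `text`, `clean`, and `rows` are made up for illustration, and `text` stands in for the contents of your file.

```python
import re
import pandas as pd

# Stand-in for the file contents (assumption: read your file into one string).
text = (
    "Description: Text 1 follows <br/> blah blah blah Cause: Cause Text 1 "
    "follows here <br/>Description: Text 2 follows <br/> blah blah "
    "blah Cause: Cause Text 2 follows here<br/>Description: Text 3 follows <br/> "
    "blah blah blah Description: Text 4 follows <br/> blah blah "
    "blah Cause: Cause Text 4 follows<br/>"
)


def clean(s):
    """Collapse runs of whitespace (including line breaks) into single spaces."""
    return " ".join(s.split())


rows = []
# Everything before the first "Description:" is dropped by the [1:] slice.
for chunk in text.split("Description:")[1:]:
    # The description runs from the start of the chunk to the first <br/>.
    desc = re.match(r"(.*?)<br/>", chunk, flags=re.S)
    # The cause is optional; search only within this record's chunk.
    cause = re.search(r"Cause:(.*?)<br/>", chunk, flags=re.S)
    rows.append({
        "Description": clean(desc.group(1)) if desc else "",
        "Cause": clean(cause.group(1)) if cause else "",
    })

df = pd.DataFrame(rows, columns=["Description", "Cause"])
print(df)
```

Because the search for `Cause:` never looks past the current chunk, the row for Text 3 gets an empty Cause instead of borrowing the one that belongs to Text 4.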