2016-12-15 123 views

Splitting a Python string by single quotes

I have a string like this:

text = ['Adult' 'Adverse Drug Reaction Reporting Systems/*classification' '*Drug-Related Side Effects and Adverse Reactions' 'Hospital Bed Capacity 300 to 499' 'Hospitals County' 'Humans' 'Indiana' 'Pharmacy Service Hospital/*statistics & numerical data'] 

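I need to split this string so that each category (delimited by single quotation marks) is stored as a separate element of an array. For example: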

text = Adult, Adverse Drug Reaction Reporting Systems... 

I have tried the split function, but I'm not sure how to go about it.

Answers


You can do something like this with a regular expression, assuming there are no constraints that you haven't listed:

>>> s = "'Adult' 'Adverse Drug Reaction Reporting Systems/*classification' '*Drug-Related Side Effects and Adverse Reactions' 'Hospital Bed Capacity 300 to 499' 'Hospitals County' 'Humans' 'Indiana' 'Pharmacy Service Hospital/*statistics & numerical data'" 
>>> import re 
>>> regex = re.compile(r"'[^']*'") 
>>> regex.findall(s) 
["'Adult'", "'Adverse Drug Reaction Reporting Systems/*classification'", "'*Drug-Related Side Effects and Adverse Reactions'", "'Hospital Bed Capacity 300 to 499'", "'Hospitals County'", "'Humans'", "'Indiana'", "'Pharmacy Service Hospital/*statistics & numerical data'"] 

My regex leaves the ' characters on each string; you can easily remove them with str.strip("'"):

>>> [x.strip("'") for x in regex.findall(s)] 
['Adult', 'Adverse Drug Reaction Reporting Systems/*classification', '*Drug-Related Side Effects and Adverse Reactions', 'Hospital Bed Capacity 300 to 499', 'Hospitals County', 'Humans', 'Indiana', 'Pharmacy Service Hospital/*statistics & numerical data'] 
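
As a possible alternative to stripping afterwards, the quotes can be left out of the result by putting a capture group inside the pattern. This is a minimal sketch, not part of the original answer; the shortened input string and the variable name `categories` are chosen here for illustration only.

```python
import re

# Shortened sample input (an assumption for this sketch, same shape as s above).
s = "'Adult' 'Hospitals County' 'Pharmacy Service Hospital/*statistics & numerical data'"

# The parentheses form a capture group. When a pattern has exactly one group,
# re.findall returns the group's text, so the surrounding quotes are excluded.
categories = re.findall(r"'([^']*)'", s)
print(categories)
```

Like the pattern above, this assumes that no item contains an escaped single quote.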

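Note that this only works because I'm assuming you don't have any escaped quotes in the string, i.e. you never have something like:

'foo\'bar', which is a perfectly valid way to express a string in many programming situations. If you do have this case, you'll need a more robust parser, e.g. pyparsing: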

>>> import pyparsing as pp 
>>> [x[0][0].strip("'") for x in pp.sglQuotedString.scanString(s)] 
['Adult', 'Adverse Drug Reaction Reporting Systems/*classification', '*Drug-Related Side Effects and Adverse Reactions', 'Hospital Bed Capacity 300 to 499', 'Hospitals County', 'Humans', 'Indiana', 'Pharmacy Service Hospital/*statistics & numerical data'] 
>>> s2 = r"'foo\'bar' 'baz'" 
>>> [x[0][0].strip("'") for x in pp.sglQuotedString.scanString(s2)] 
["foo\\'bar", 'baz']
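
If installing pyparsing is not an option, a regular expression that tolerates backslash escapes may be enough for simple inputs. The following is a hedged sketch using only the standard library, not a replacement for a real parser; the names `pattern`, `items`, and `unescaped` are introduced here for illustration, and it assumes that a backslash always escapes exactly the one character that follows it.

```python
import re

s2 = r"'foo\'bar' 'baz'"

# Between the quotes, accept either a backslash followed by any character
# (an escape sequence) or any character that is neither a quote nor a backslash.
pattern = re.compile(r"'((?:\\.|[^'\\])*)'")

# The capture group returns the text between the quotes, escapes still in place.
items = pattern.findall(s2)
print(items)

# Optionally drop the escaping backslashes (assumption: backslash + char means char).
unescaped = [re.sub(r"\\(.)", r"\1", item) for item in items]
print(unescaped)
```

The first list keeps the backslash, as the pyparsing output above does; the second list shows the same items with the escapes resolved.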