Regex and Python to remove a certain format of string

I have a list of strings of the following form:

d = ['0.04M sodium propionate', ' 0.02M sodium cacodylate', ' 0.04M bis-tris propane', ' pH 8.0 ']

I want to remove the x.xxM part but keep the number that follows pH. I tried the following:

import re 
for i in range(len(d)): 
    d[i] = d[i].translate(None,'[1-9]+\.*[0-9]*M')

which produces the following:

>>> d 
['4 sodium propionate', ' 2 sodium cacodylate', ' 4 bistris propane', ' pH 8 ']

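This removes the .0 from the pH as well. I think translate() does not consider order, right? Also, I do not understand why the 4, 2, etc. are still in the elements. How can I remove only strings that are strictly of the form [1-9]+\.*[0-9]*M (meaning there should be a digit, followed by a . and zero or more digits, and an M)?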

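Edit: I am aware that regular expressions do not work with translate(). It matches 0, . and M and removes them. I guess I could try re.search(), locate the exact piece of the string, and then do sub().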
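For anyone reproducing this on Python 3, here is a minimal sketch of why translate() gave that output. It treats its argument as a plain set of characters to delete, not as a pattern. This assumes Python 3, where the two-argument `translate(None, ...)` call from the question raises a TypeError and `str.maketrans` is used to build the deletion table instead; the variable names are only for illustration.

```python
# The "pattern" is just a bag of characters as far as translate() is concerned.
pattern_text = r'[1-9]+\.*[0-9]*M'

# Third argument of str.maketrans: every character listed here is deleted.
delete_table = str.maketrans('', '', pattern_text)

d = ['0.04M sodium propionate', ' 0.02M sodium cacodylate',
     ' 0.04M bis-tris propane', ' pH 8.0 ']

translated = [s.translate(delete_table) for s in d]

# The characters that actually get deleted: note '-' (so "bis-tris" loses its
# hyphen) and only the digits 0, 1 and 9 (so 4, 2 and 8 survive).
print(sorted(set(pattern_text)))
print(translated)
```

The ranges `1-9` and `0-9` are never expanded, which is why 4, 2 and 8 stay while 0, `.`, `-` and M disappear.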

Source

2015-05-05 sodiumnitrate

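Have you tried using the regular expression module ('import re')? – Kevin

Did you read the documentation for 'translate'? Because it is entirely unsuited to the job. –

I thought I was already using it. I will add that to the question. – sodiumnitrate

Here is a regex that filters out x.xxM: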

这里是正则表达式过滤掉x.xxM：

[\d|.]+M

This means a string of digits (\d) or (|) dots (.) occurring more than 0 times (+) and ending with M (M).

Here is the code:

result = [re.sub(r'[\d|.]+M',r'',i) for i in d] 
# re.sub(A,B,Str) replaces all A with B in Str.

which produces this result:

[' sodium propionate', ' sodium cacodylate', ' bis-tris propane', ' pH 8.0 ']

Source

2015-05-05 21:49:34 Hua2308

Are you aware that this regex also matches '|'? – Jerry

@Jerry Do we need to write '[\d\|\.]+M'? – sodiumnitrate

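@sodiumnitrate That could be problematic, even though it might not actually cause issues when you run the code. '[\d|.]+' will match a digit, '|', or '.'. If you get badly formatted input, such as '12|.32M', the regex will match it. It will even match '.....M' because of how the regex is constructed. – Jerry

Why use re.search and then re.sub? You only need re.sub. You also want to do two completely different things, so it makes sense to split them into two parts.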
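To make the point in the comment above concrete, here is a small sketch comparing the character-class pattern against a stricter one. The stricter pattern `[0-9]+\.[0-9]*M` and the names `loose`, `strict` and `samples` are assumptions chosen for this comparison, not something the commenters posted.

```python
import re

# Inside [...] the '|' is a literal character, not alternation.
loose = re.compile(r'[\d|.]+M')

# Assumed stricter form: digits, one literal dot, optional digits, then M.
strict = re.compile(r'[0-9]+\.[0-9]*M')

samples = ['0.04M', '12|.32M', '.....M']

# fullmatch() returns a match object only if the whole string fits the pattern.
loose_hits = [bool(loose.fullmatch(s)) for s in samples]
strict_hits = [bool(strict.fullmatch(s)) for s in samples]

print(loose_hits)   # every sample is accepted
print(strict_hits)  # only the well-formed molarity is accepted
```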

In [8]: d = ['0.04M sodium propionate', ' 0.02M sodium cacodylate', ' 0.04M bis-tris propane', ' pH 8.0 '] 

In [9]: d1 = [ re.sub(r"\d\.\d\dM", "",x) for x in d ] 
In [10]: d1 
Out[10]: [' sodium propionate', ' sodium cacodylate', ' bis-tris propane', ' pH 8.0 '] 

In [11]: d2 = [ re.sub(r"pH (\d+)\.\d+",r"pH \1", x) for x in d1 ] 

In [12]: d2 
Out[12]: [' sodium propionate', ' sodium cacodylate', ' bis-tris propane', ' pH 8 ']

Note that I used \d, which is shorthand for any digit.

Source
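One thing to watch with a fixed-width pattern like `\d\.\d\dM` is that it only matches concentrations written with exactly one digit before the dot and two after. The sketch below contrasts it with a more permissive variant; the `FLEXIBLE` pattern and the extra sample strings are assumptions made up for illustration, since the question only shows x.xxM values.

```python
import re

FIXED = re.compile(r"\d\.\d\dM")          # exactly x.xxM, as in the answer above
FLEXIBLE = re.compile(r"\d+(?:\.\d+)?M")  # assumed variant: any digits, optional fraction

# Only the first sample comes from the question; the rest are hypothetical.
samples = ["0.04M sodium propionate", "0.5M sodium chloride",
           "12.5M urea", "2M NaCl"]

fixed = [FIXED.sub("", s) for s in samples]
flexible = [FLEXIBLE.sub("", s) for s in samples]

print(fixed)
print(flexible)
```

Whether the permissive form is appropriate depends on what else can appear in the data, so treat it as a starting point rather than a drop-in replacement.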

2015-05-05 21:33:11 cge

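Consider re.sub:

re.sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl.

In your case: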

>>> re.sub(r'\d\.\d(\d).', r'\1', '0.04M sodium propionate') 
'4 sodium propionate'
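It may help to see what that pattern actually captures. In the sketch below (variable names are illustrative only), group 1 is the last digit before the M, and the trailing `.` consumes the M itself, so replacing with `\1` keeps the 4. Replacing with an empty string drops the whole concentration instead.

```python
import re

pattern = r'\d\.\d(\d).'
text = '0.04M sodium propionate'

m = re.search(pattern, text)
whole_match = m.group(0)   # everything the pattern consumed
captured = m.group(1)      # the parenthesised digit

kept_digit = re.sub(pattern, r'\1', text)  # put the captured digit back
dropped = re.sub(pattern, '', text)        # remove the match entirely

print(whole_match, captured)
print(repr(kept_digit), repr(dropped))
```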

Source

2015-05-05 21:34:42

I think your regex is almost correct, only you should use re.sub instead:

import re 
for i in range(len(d)): 
    d[i] = re.sub(r'[0-9]+\.[0-9]*M *', '', d[i])

ideone demo

This way, d becomes:

['sodium propionate', ' sodium cacodylate', ' bis-tris propane', ' pH 8.0 ']

I made minimal changes to your regex, but here is what each part means:

[0-9]+ # Match at least 1 digit (0 through 9 inclusive)
\.     # Match a literal dot
[0-9]* # Match 0 or more digits (0 through 9 inclusive)
M *    # Match the character 'M' and any spaces following it

Source
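The same breakdown can live inside the code by compiling the pattern with re.VERBOSE, which lets you put comments in the pattern itself. This is only a sketch; the one adjustment it assumes is writing the space as `[ ]`, because verbose mode ignores unescaped whitespace in the pattern.

```python
import re

molarity = re.compile(r"""
    [0-9]+   # at least one digit
    \.       # a literal dot
    [0-9]*   # zero or more digits
    M[ ]*    # the letter M and any spaces after it
""", re.VERBOSE)

d = ['0.04M sodium propionate', ' 0.02M sodium cacodylate',
     ' 0.04M bis-tris propane', ' pH 8.0 ']

# A list comprehension builds a new list instead of modifying d in place.
cleaned = [molarity.sub('', s) for s in d]
print(cleaned)
```

The leading space on the second and third elements survives because the pattern only consumes spaces after the M, not before the number.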

2015-05-05 21:41:24 Jerry

How about the quick and dirty:

[re.sub(r'\b[.\d]+M\b', '', a).strip() for a in d]

This gives

['sodium propionate', 'sodium cacodylate', 'bis-tris propane', 'pH 8.0']

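where [.\d]+ matches any consecutive sequence of digits and dots, and M stands for molar. The two \b make sure it is a whole word, and strip() chops off the extra whitespace!

Source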
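A quick sketch of what the trailing \b buys you: the M has to end a word, so a token where M is followed by another letter is left alone. The '0.5Mg sulfate' string is a made-up example for illustration and does not come from the question.

```python
import re

MOLARITY = re.compile(r'\b[.\d]+M\b')

samples = ['0.04M sodium propionate',  # from the question
           '0.5Mg sulfate',            # hypothetical: M is part of "Mg"
           ' pH 8.0 ']                 # no M after the number

cleaned = [MOLARITY.sub('', s).strip() for s in samples]
print(cleaned)
```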

2015-05-05 22:00:57 user2963623
