Filter out strings that contain only digits and/or punctuation - python

I need to filter out strings that contain only digits and/or a fixed set of punctuation characters.

I tried checking each character, summing the boolean results, and comparing the total with len(str). Is there a more Pythonic way to do this?

>>> import string 
>>> x = ['12,523', '3.46', "this is not", "foo bar 42", "23fa"] 
>>> [i for i in x if [True if j.isdigit() else False for j in i] ] 
['12,523', '3.46', 'this is not', 'foo bar 42', '23fa']
>>> [i for i in x if sum([True if j.isdigit() or j in string.punctuation else False for j in i]) == len(i)] 
['12,523', '3.46']
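Why the first attempt keeps every string: the `if` clause tests a list, and any non-empty list is truthy regardless of its contents, so only empty strings would be rejected. A minimal check of that, next to the counting version that does work:

```python
import string

x = ['12,523', '3.46', "this is not", "foo bar 42", "23fa"]

# A non-empty list is truthy even if every element is False.
flags = [j.isdigit() for j in "abc"]
print(flags, bool(flags))  # [False, False, False] True

# Hence the first filter keeps every non-empty string.
kept = [i for i in x if [j.isdigit() for j in i]]
print(kept == x)  # True

# Counting matching characters and comparing with len(i) tests "every character".
counted = [i for i in x
           if sum(j.isdigit() or j in string.punctuation for j in i) == len(i)]
print(counted)  # ['12,523', '3.46']
```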


2014-02-11 alvas

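Are you sure you don't **really** mean "I need to find strings that might represent numbers, but `float` etc. doesn't work because I also want to allow commas as thousands separators"? –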

Yes, I'll need that later, but a two-stage filter would also catch numeric indices in legitimate documents (e.g. `x = ["chapter", "1.2.3.5"]`). – alvas
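The distinction in these comments can be made concrete with a small sketch. It is an illustration only: the helper `looks_like_number` is hypothetical and not part of any answer here. It treats commas as thousands separators and lets `float()` decide, so a section index such as `"1.2.3.5"` passes the digits-and-punctuation filter but is not parsed as a number.

```python
import string

PUNCT = set(string.punctuation)

def only_digits_or_punct(s):
    # Character-class test: every character is a digit or ASCII punctuation.
    return all(c.isdigit() or c in PUNCT for c in s)

def looks_like_number(s):
    # Hypothetical helper: strip commas as thousands separators, then try float().
    # Caveat: this also accepts "1,2,3" (as 123), "nan" and "inf".
    try:
        float(s.replace(",", ""))
    except ValueError:
        return False
    return True

for s in ["12,523", "3.46", "1.2.3.5", "chapter"]:
    print(s, only_digits_or_punct(s), looks_like_number(s))
```

For `"1.2.3.5"` this prints `True False`: it is made of digits and punctuation, but it is not a number.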

Use `all` with a generator expression; then you don't need to count and compare with the length:

>>> [i for i in x if all(j.isdigit() or j in string.punctuation for j in i)] 
['12,523', '3.46']

BTW, both the code above and the OP's code will also include strings that contain only punctuation:

>>> x = [',,,', '...', '123', 'not number'] 
>>> [i for i in x if all(j.isdigit() or j in string.punctuation for j in i)] 
[',,,', '...', '123']

To handle that, add another condition:

>>> [i for i in x if all(j.isdigit() or j in string.punctuation for j in i) and any(j.isdigit() for j in i)] 
['123']
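One more edge case the extra condition also covers: `all()` over an empty iterable returns `True`, so the empty string `''` passes the digits-or-punctuation test on its own. A short sketch (the helper name `digits_or_punct` is just for illustration, and it already uses a set of punctuation characters):

```python
import string

puncs = set(string.punctuation)
x = ['', ',,,', '...', '123', 'not number']

def digits_or_punct(s):
    # True when every character is a digit or ASCII punctuation (vacuously True for '').
    return all(c.isdigit() or c in puncs for c in s)

# all() over an empty iterable is True, so '' passes this filter.
first = [s for s in x if digits_or_punct(s)]
print(first)   # ['', ',,,', '...', '123']

# Requiring at least one digit drops both '' and the punctuation-only strings.
second = [s for s in x if digits_or_punct(s) and any(c.isdigit() for c in s)]
print(second)  # ['123']
```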

You can make it a bit faster by storing string.punctuation in a set:

>>> puncs = set(string.punctuation) 
>>> [i for i in x if all(j.isdigit() or j in puncs for j in i) and any(j.isdigit() for j in i)] 
['123']
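The speed-up comes from how `in` works: on a str it scans the characters (string.punctuation has 32 of them), while on a set it is an average O(1) hash lookup. A rough way to check this on your own machine, assuming an input repeated 100 times just to make the timing measurable (the numbers will vary by machine and Python version):

```python
import string
import timeit

punct_str = string.punctuation        # 32-character str: `in` scans it
punct_set = set(string.punctuation)   # set: `in` is a hash lookup

x = ['12,523', '3.46', "this is not", "foo bar 42", "23fa"] * 100

def filter_with(punct):
    return [i for i in x
            if all(j.isdigit() or j in punct for j in i)
            and any(j.isdigit() for j in i)]

# Both versions must select the same strings; only the speed differs.
assert filter_with(punct_str) == filter_with(punct_set)

for name, punct in [("str", punct_str), ("set", punct_set)]:
    t = timeit.timeit(lambda: filter_with(punct), number=100)
    print(f"{name}: {t:.3f}s")
```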


2014-02-11 08:31:06 falsetru

You can make it faster by storing `string.punctuation` in a `set`. –

@FrerichRaabe, thanks for the comment. I've added your suggestion. – falsetru

You can use a precompiled regular expression to check this:

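import re, string
# Character class of digits plus every (escaped) punctuation character;
# match() anchors at the start and $ anchors at the end.
pattern = re.compile(r"[\d{}]+$".format(re.escape(string.punctuation)))
x = ['12,523', '3.46', "this is not", "foo bar 42", "23fa"]
print([item for item in x if pattern.match(item)])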

Output:

['12,523', '3.46']
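Two details of the regex version are worth knowing. In Python, `$` also matches just before a trailing newline, so `pattern.match` accepts `"123\n"`, which the `all()`-based answer rejects. `re.fullmatch` (Python 3.4+) avoids that, and a lookahead can mirror the extra "at least one digit" condition. A sketch under those assumptions:

```python
import re
import string

x = ['12,523', '3.46', "this is not", "foo bar 42", "23fa", "123\n", ",,,"]

cls = r"[\d{}]".format(re.escape(string.punctuation))
dollar = re.compile(cls + r"+$")
full = re.compile(cls + r"+")
# Lookahead requiring at least one digit, like the any(isdigit) condition.
with_digit = re.compile(r"(?=.*\d)" + cls + r"+")

# '$' also matches just before a trailing newline, so "123\n" slips through.
via_dollar = [s for s in x if dollar.match(s)]
print(via_dollar)  # ['12,523', '3.46', '123\n', ',,,']

# fullmatch() requires the whole string to match.
via_full = [s for s in x if full.fullmatch(s)]
print(via_full)    # ['12,523', '3.46', ',,,']

via_digit = [s for s in x if with_digit.fullmatch(s)]
print(via_digit)   # ['12,523', '3.46']
```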

A quick timing comparison of @falsetru's solution and mine, on my machine:

import re, string
punct = string.punctuation  # plain str here, not a set
pattern = re.compile(r"[\d{}]+$".format(re.escape(string.punctuation)))
x = ['12,523', '3.46', "this is not", "foo bar 42", "23fa"]

from timeit import timeit
# timeit runs each statement 1,000,000 times by default.
print(timeit("[item for item in x if pattern.match(item)]", "from __main__ import pattern, x"))
print(timeit("[i for i in x if all(j.isdigit() or j in punct for j in i)]", "from __main__ import x, punct"))

Output:

2.03506183624 
4.28856396675

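So on this machine the precompiled regex approach was about twice as fast as the `all`-based approach (note that the timed `all` version looks characters up in the string `punct`, not in a set).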
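Timings like these depend on the machine, the Python version and the input, so they are worth re-running. A Python 3 sketch of the same comparison, assuming the set-based `all()` variant and `fullmatch` instead of `match` + `$`; it uses the `globals=` argument of `timeit.repeat` (Python 3.5+) instead of the `from __main__ import ...` setup string, and takes the best of several runs to reduce noise:

```python
import re
import string
import timeit

puncs = set(string.punctuation)
pattern = re.compile(r"[\d{}]+".format(re.escape(string.punctuation)))
x = ['12,523', '3.46', "this is not", "foo bar 42", "23fa"]

stmts = {
    "regex fullmatch": "[s for s in x if pattern.fullmatch(s)]",
    "all() with set": "[s for s in x if all(c.isdigit() or c in puncs for c in s)]",
}

for name, stmt in stmts.items():
    # globals= lets the timed statement see x, pattern and puncs directly.
    best = min(timeit.repeat(stmt, globals=globals(), repeat=3, number=20_000))
    print(f"{name}: {best:.3f}s per 20k runs")
```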


2014-02-11 08:39:57 thefourtheye
