2016-12-28 43 views

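How to extract a number from a sentence using Python

Here is a sentence: "a building is 100 m tall and 20 m wide". I want to extract the number for the height, which is 100, so I use: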

import re
question = input(" ")
height = re.findall(r'(\d+) m tall', question)

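But sometimes the sentence says "100 m high" instead of "100 m tall". In that case my program can no longer extract the number I want. Is there a way to improve my program so that it works regardless of whether the sentence contains "tall" or "high"?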

Answers


You can check for the "tall or high" condition with |:

(\d+) m (tall|high) 

Demo:

>>> re.findall(r'(\d+) m (tall|high)', 'a building is 100 m tall and 20 m wide') 
[('100', 'tall')] 
>>> re.findall(r'(\d+) m (tall|high)', 'a building is 100 m high and 20 m wide') 
[('100', 'high')] 
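
Because the pattern now has two capturing groups, `re.findall` returns a list of tuples rather than a list of strings. A minimal sketch of pulling the number out of those tuples (the variable names `sentence`, `matches`, and `heights` are illustrative, not from the question):

```python
import re

sentence = "a building is 100 m high and 20 m wide"

# Two capturing groups -> findall yields (number, word) tuples.
matches = re.findall(r'(\d+) m (tall|high)', sentence)

# Keep only the first element of each tuple and convert it to int.
heights = [int(number) for number, _word in matches]
print(heights)  # [100]
```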

If you don't want the alternative to be captured, use a non-capturing group:

(\d+) m (?:tall|high) 
>>> import re 
>>> re.findall(r'(\d+) m (?:tall|high)', "a building is 100 m tall and 20 m wide") 
['100'] 
>>> re.findall(r'(\d+) m (?:tall|high)', "a building is 100 m high and 20 m wide") 
['100'] 
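
The patterns above assume exactly one space on each side of `m` and lowercase words. If the input may vary, a more tolerant pattern can be sketched as follows. The allowance for decimals, flexible whitespace, and mixed case is an assumption about the input, not something the question specifies, and `HEIGHT_RE` is a hypothetical name.

```python
import re

# Assumed variations: optional decimals, any whitespace around "m", any letter case.
HEIGHT_RE = re.compile(r'(\d+(?:\.\d+)?)\s*m\s+(?:tall|high)\b', re.IGNORECASE)

print(HEIGHT_RE.findall("a building is 100 m tall and 20 m wide"))  # ['100']
print(HEIGHT_RE.findall("The tower is 82.5m High"))                 # ['82.5']
```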

According to your requirement, the regex should match either of the terms "tall" or "high".

  i.e., (?:tall|high)
     where, (?: ... ) groups the alternatives without capturing them
       and,  | means 'or' 
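
The parentheses matter here because `|` has the lowest precedence in a regex: without a group, the alternation splits the whole pattern in two. A short sketch of the difference, using the sentence from the question:

```python
import re

sentence = "a building is 100 m high and 20 m wide"

# Ungrouped: means "(\d+) m tall" OR "high". The bare "high" matches,
# but group 1 took no part in that match, so findall reports ''.
ungrouped = re.findall(r'(\d+) m tall|high', sentence)
print(ungrouped)  # ['']

# Grouped: the alternation is limited to the final word.
grouped = re.findall(r'(\d+) m (?:tall|high)', sentence)
print(grouped)    # ['100']
```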

So the solution can look like:

>>> re.findall(r'(\d+) m (?:tall|high)', question)
['100']
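
For reuse, the same pattern can be wrapped in a small function. This is only a sketch: `extract_height` is a hypothetical helper, and returning `None` when no height is found is an assumed convention.

```python
import re

def extract_height(sentence):
    """Return the height in metres as an int, or None if none is found."""
    match = re.search(r'(\d+) m (?:tall|high)', sentence)
    return int(match.group(1)) if match else None

print(extract_height("a building is 100 m tall and 20 m wide"))  # 100
print(extract_height("a building is 20 m wide"))                 # None
```

Using `re.search` instead of `re.findall` returns only the first height in the sentence, which fits the single-value case in the question.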