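I want to filter street names and keep only the part I need. The names come in several formats. Here are some examples, along with what I want to extract from each. My Python regex doesn't work the way I want it to.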

Car Cycle 5 B Ap 1233  < what I have 
Car Cycle 5 B    < what I want 

Potato street 13 1 AB  < what I have 
Potato street 13   < what I want 

Chrome Safari 41 Ap 765  < what I have 
Chrome Safari 41   < what I want 

Highstreet 53 Ap 2632/BH < what I have 
Highstreet 53    < what I want 

Something street 91/Daniel < what I have 
Something street 91   < what I want

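In general, what I want is the street name (1-4 words), followed by the street number if there is one, followed by the street letter (1 letter) if there is one. I can't get it to work properly.

Here is my code (I know, it sucks):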

import re

def address_regex(address):
    regex1 = re.compile(r"(\w+){1,4}(\d{1,4}){1}(\w{1})")
    regex2 = re.compile(r"(\w+){1,4}(\d{1,4}){1}")
    regex3 = re.compile(r"(\w+){1,4}(\d){1,4}")
    regex4 = re.compile(r"(\w+){1,4}(\w+)")

    s1 = regex1.search(address)
    s2 = regex2.search(address)
    s3 = regex3.search(address)
    s4 = regex4.search(address)

    regex_address = ""

    # Use the first pattern that matches; otherwise return the input unchanged.
    if s1 is not None:
        regex_address = s1.group()
    elif s2 is not None:
        regex_address = s2.group()
    elif s3 is not None:
        regex_address = s3.group()
    elif s4 is not None:
        regex_address = s4.group()
    else:
        regex_address = address

    return regex_address

I'm using Python 3.4.


2015-08-18 ZeZe

Just use a regex tool like Kodos. – bgusach

Don't you want the number 91 in the last example? – Falko

What's the logic behind the last example? Why doesn't "street 91/Daniel" keep the 91? – PYPL

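I'm going to go out on a limb here and assume that in the last example you actually do want to capture the number 91, since leaving it out makes no sense.

Here is a solution that captures all of your examples (including the last one, but with the 91):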

^([\p{L} ]+ \d{1,4}(?: ?[A-Za-z])?\b)

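^ Start matching at the beginning of the string
[\p{L} ]+ Character class: a space or any Unicode character in the "letter" category, 1 to unlimited times
\d{1,4} A digit, 1-4 times
(?: ?[A-Za-z])? Non-capturing group: an optional space and a single letter, 0-1 times
\b Word boundary, so the trailing letter only matches when it stands alone (the "A" of "Ap" is rejected)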

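Capture group 1 is the whole address. I don't quite understand the logic behind your grouping, so feel free to group it however you like.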

See demo
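Since the question is about Python, one caveat: the built-in `re` module does not support `\p{L}` (the third-party `regex` package does). Below is a minimal sketch of an equivalent for `re`, assuming `[^\W\d_]` as an approximate stand-in for `\p{L}`. The names `ADDRESS_RE` and `extract_address` are made up for illustration.

```python
import re

# Assumption: [^\W\d_] ("a word character that is neither a digit nor an
# underscore") approximates \p{L}, which the stdlib re module does not support.
ADDRESS_RE = re.compile(r"^((?:[^\W\d_]| )+ \d{1,4}(?: ?[A-Za-z])?\b)")

def extract_address(address):
    """Return street name + number (+ optional single letter).

    Falls back to the unchanged input when the pattern does not match.
    """
    match = ADDRESS_RE.search(address)
    return match.group(1) if match else address

samples = [
    "Car Cycle 5 B Ap 1233",
    "Potato street 13 1 AB",
    "Chrome Safari 41 Ap 765",
    "Highstreet 53 Ap 2632/BH",
    "Something street 91/Daniel",
]

for sample in samples:
    print(extract_address(sample))
```

Falling back to the original string mirrors the `else` branch of the function in the question. This pattern requires a street number, so a name without one is returned unchanged.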


2015-08-18 11:42:31 ohaal

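Thanks, this works better, but sometimes there is no number in the street name (I forgot to mention that), and then the regex doesn't match anything. One example is "Potato street", which returns nothing. What should I do in that case? – ZeZe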

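The more optional parts you have that depend on each other, the more complexity you add. You could try this improved version; I won't explain it any further, other than to say it's more robust: '^(\p{L}[\p{L} -]*\p{L}(?: \d{1,4}(?: ?[A-Za-z])?)?\b)' [See demo](http://rubular.com/r/MMpglOziNu) - If you want to understand it better, try an online regex resource such as [RegexStorm](http://regexstorm.net/reference). – ohaal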

Yes, that works, thank you! I'll look into RegexStorm too. – ZeZe
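For Python readers, here is a rough sketch of how the improved pattern from the comment above might look with the built-in `re` module. Two things are assumptions: the whitespace in the quoted pattern appears to have been mangled, so the placement of the literal spaces is a best guess, and `[^\W\d_]` is again used as an approximate substitute for `\p{L}`. The name `OPTIONAL_NUMBER_RE` is hypothetical.

```python
import re

# Assumed reading of the comment's pattern: a name that starts and ends with a
# letter (letters, spaces, hyphens in between), then an OPTIONAL " number"
# part, itself optionally followed by a single letter.
LETTER = r"[^\W\d_]"  # stand-in for \p{L}
OPTIONAL_NUMBER_RE = re.compile(
    r"^(" + LETTER + r"(?:" + LETTER + r"|[ -])*" + LETTER +
    r"(?: \d{1,4}(?: ?[A-Za-z])?)?\b)"
)

for sample in ["Potato street", "Chrome Safari 41 Ap 765", "Car Cycle 5 B Ap 1233"]:
    match = OPTIONAL_NUMBER_RE.search(sample)
    print(match.group(1) if match else sample)
```

Because the number part is wrapped in an optional group, a name such as "Potato street" now matches on its own.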

This works for your examples:

^([a-z]+\s+)*(\d*(?=\s))?(\s+[a-z])*\b

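Set multiline mode (to run it against all 5 samples at once) and case-insensitive mode. If your regex flavor supports inline flags, that is (?im).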
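In Python, those two modes can be passed as compile flags instead of the inline `(?im)`. A small sketch, assuming the five samples are joined into one multi-line string; `STREET_RE` and `results` are illustrative names:

```python
import re

# re.IGNORECASE | re.MULTILINE is the flag equivalent of the inline (?im).
STREET_RE = re.compile(
    r"^([a-z]+\s+)*(\d*(?=\s))?(\s+[a-z])*\b",
    re.IGNORECASE | re.MULTILINE,
)

text = "\n".join([
    "Car Cycle 5 B Ap 1233",
    "Potato street 13 1 AB",
    "Chrome Safari 41 Ap 765",
    "Highstreet 53 Ap 2632/BH",
    "Something street 91/Daniel",
])

# With MULTILINE, ^ matches at the start of every line, so finditer yields
# one match per sample. strip() removes the trailing space of the last match.
results = [m.group(0).strip() for m in STREET_RE.finditer(text)]
for result in results:
    print(result)
```

Note that this pattern only accepts a number that is followed by whitespace, so for "Something street 91/Daniel" it stops before the 91.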


2015-08-18 11:30:26 buckley

Maybe you'd prefer a more readable Python version (without regular expressions):

import string 

names = [ 
    "Car Cycle 5 B Ap 1233", 
    "Potato street 13 1 AB", 
    "Chrome Safari 41 Ap 765", 
    "Highstreet 53 Ap 2632/BH", 
    "Something street 91/Daniel", 
    ] 

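for name in names:
    result = []
    words = name.split()
    # Leading words made up only of letters form the street name.
    while words and all(c in string.ascii_letters for c in words[0]):
        result.append(words[0])
        words = words[1:]
    # An all-digit word right after the name is the street number.
    if words and all(c in string.digits for c in words[0]):
        result.append(words[0])
        words = words[1:]
    # A single uppercase letter after that is the street letter.
    if words and len(words[0]) == 1 and words[0] in string.ascii_uppercase:
        result.append(words[0])
        words = words[1:]
    print(" ".join(result))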

Output:

Car Cycle 5 B 
Potato street 13 
Chrome Safari 41 
Highstreet 53 
Something street


2015-08-18 11:32:45 Falko
