2013-08-20 257 views
2

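Getting substrings from a string with parentheses, brackets, and hyphens

Regular expressions are not my strong suit, and I'm having some trouble with this case.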

I have the following string:

locale (district - town) [parish] 

I need to extract the following information: 1 - locale, 2 - district, 3 - town.

And I have these solutions:

1 - locale

preg_match("/([^(]*)\s/", $input_line, $output_array); 

2 - district

preg_match("/.*\(([^-]*)\s/", $input_line, $output_array); 

3 - town

preg_match("/.*\-\s([^)]*)/", $input_line, $output_array); 

These seem to work well. However, the string can also look like any of these:

localeA(localeB) (district - town) [parish] 
locale (district - townA(townB)) [parish] 
locale (district - townA-townB) [parish] 

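The locale can contain parentheses of its own. The town can contain parentheses and/or hyphens of its own.

This makes it hard to extract the right information. For the three scenarios above, I would have to extract:

localeA(localeB) + district + town

locale + district + townA(townB)

locale + district + townA-townB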
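
To make the problem concrete, here is a minimal sketch that runs the three patterns above against two of the harder inputs. The helper name `extract_with_original_patterns` is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical helper: applies the three original patterns to one line.
function extract_with_original_patterns(string $input_line): array
{
    preg_match("/([^(]*)\s/", $input_line, $locale);
    preg_match("/.*\(([^-]*)\s/", $input_line, $district);
    preg_match("/.*\-\s([^)]*)/", $input_line, $town);

    return [
        'locale'   => $locale[1] ?? null,
        'district' => $district[1] ?? null,
        'town'     => $town[1] ?? null,
    ];
}

// [^(]* cannot cross a "(", so the full "localeA(localeB)" is never captured.
print_r(extract_with_original_patterns('localeA(localeB) (district - town) [parish]'));

// [^)]* cannot cross a ")", so the full "townA(townB)" is never captured.
print_r(extract_with_original_patterns('locale (district - townA(townB)) [parish]'));
```

The negated character classes are what break here: each one stops at the first parenthesis it meets, regardless of which field that parenthesis belongs to.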

I'm finding it hard to handle all of these cases. Can you help me?

Thanks in advance

+3

Got speed? http://regex101.com/r/xS9fZ1 – HamZa

+0

@HamZa: why a comment and not an answer? – anubhava

+0

@anubhava I was busy with other things, and this was a quick fiddle. If I post it as an answer, I should at least provide some explanation. – HamZa

Answers

0

If the locale, district, and town contain no whitespace:

preg_match("/^\s*(\S+)\s*\((\S+)\s*-\s*(\S+)\)/", $input_line, $output_array); 

Explanation:

The regular expression: 

(?-imsx:^\s*(\S+)\s*\((\S+)\s*-\s*(\S+)\)) 

matches as follows: 

NODE      EXPLANATION 
---------------------------------------------------------------------- 
(?-imsx:     group, but do not capture (case-sensitive) 
         (with ^ and $ matching normally) (with . not 
         matching \n) (matching whitespace and # 
         normally): 
---------------------------------------------------------------------- 
^      the beginning of the string 
---------------------------------------------------------------------- 
    \s*      whitespace (\n, \r, \t, \f, and " ") (0 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (      group and capture to \1: 
---------------------------------------------------------------------- 
    \S+      non-whitespace (all but \n, \r, \t, \f, 
          and " ") (1 or more times (matching the 
          most amount possible)) 
---------------------------------------------------------------------- 
)      end of \1 
---------------------------------------------------------------------- 
    \s*      whitespace (\n, \r, \t, \f, and " ") (0 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    \(      '(' 
---------------------------------------------------------------------- 
    (      group and capture to \2: 
---------------------------------------------------------------------- 
    \S+      non-whitespace (all but \n, \r, \t, \f, 
          and " ") (1 or more times (matching the 
          most amount possible)) 
---------------------------------------------------------------------- 
)      end of \2 
---------------------------------------------------------------------- 
    \s*      whitespace (\n, \r, \t, \f, and " ") (0 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    -      '-' 
---------------------------------------------------------------------- 
    \s*      whitespace (\n, \r, \t, \f, and " ") (0 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (      group and capture to \3: 
---------------------------------------------------------------------- 
    \S+      non-whitespace (all but \n, \r, \t, \f, 
          and " ") (1 or more times (matching the 
          most amount possible)) 
---------------------------------------------------------------------- 
)      end of \3 
---------------------------------------------------------------------- 
    \)      ')' 
---------------------------------------------------------------------- 
)      end of grouping 
---------------------------------------------------------------------- 
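
As a quick check, the pattern can be run against the four sample strings from the question. This is only a sketch: the function name `extract_parts` and the multi-word input at the end are hypothetical, added to show where the no-whitespace assumption stops holding.

```php
<?php
// Hypothetical wrapper around the pattern above.
// Returns [locale, district, town], or null when the line does not match.
function extract_parts(string $input_line): ?array
{
    $pattern = '/^\s*(\S+)\s*\((\S+)\s*-\s*(\S+)\)/';
    if (preg_match($pattern, $input_line, $m) !== 1) {
        return null;
    }
    return [$m[1], $m[2], $m[3]];
}

$samples = [
    'locale (district - town) [parish]',
    'localeA(localeB) (district - town) [parish]',
    'locale (district - townA(townB)) [parish]',
    'locale (district - townA-townB) [parish]',
];

foreach ($samples as $line) {
    print_r(extract_parts($line));
}

// A locale with a space in it (made-up input) is not matched at all,
// because \S+ cannot span the space before the opening parenthesis.
var_dump(extract_parts('Vila Nova (district - town) [parish]'));
```

The greedy `\S+` for the town first swallows the closing parenthesis and then backtracks by one character so that the final `\)` can match, which is why `townA(townB)` comes out whole.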
0

I don't know exactly what your rules and edge cases are, but this works for the examples provided:

preg_match('#^(.+?) \((.+?) - (.+?)\) \[(.+)\]$#',$str,$matches); 

It gives these results (when run against each example string in $str):

Array 
(
    [0] => locale (district - town) [parish] 
    [1] => locale 
    [2] => district 
    [3] => town 
    [4] => parish 
) 

Array 
(
    [0] => localeA(localeB) (district - town) [parish] 
    [1] => localeA(localeB) 
    [2] => district 
    [3] => town 
    [4] => parish 
) 

Array 
(
    [0] => locale (district - townA(townB)) [parish] 
    [1] => locale 
    [2] => district 
    [3] => townA(townB) 
    [4] => parish 
) 

Array 
(
    [0] => locale (district - townA-townB) [parish] 
    [1] => locale 
    [2] => district 
    [3] => townA-townB 
    [4] => parish 
)
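
If numeric indexes are awkward to work with, the same pattern could be wrapped with named groups. This is a sketch under a few assumptions: the function name `parse_location` is hypothetical, and the `trim()` call is an addition meant to tolerate stray leading or trailing whitespace, which the `$` anchor would otherwise reject.

```php
<?php
// Hypothetical wrapper: same pattern as above, with named capture groups.
// Returns an associative array, or null when the string does not match.
function parse_location(string $str): ?array
{
    $pattern = '#^(?<locale>.+?) \((?<district>.+?) - (?<town>.+?)\) \[(?<parish>.+)\]$#';

    // trim() is an assumption added here so surrounding whitespace is ignored.
    if (preg_match($pattern, trim($str), $m) !== 1) {
        return null;
    }

    return [
        'locale'   => $m['locale'],
        'district' => $m['district'],
        'town'     => $m['town'],
        'parish'   => $m['parish'],
    ];
}

print_r(parse_location('locale (district - townA(townB)) [parish]'));
print_r(parse_location('localeA(localeB) (district - town) [parish] '));
```

The pattern relies on the literal separators ` (`, ` - `, and `) [` to delimit the fields. Because the district group is lazy, it ends at the first ` - `, so a district that itself contains a space-padded hyphen would be split in the wrong place.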