Split a quoted string with a delimiter

I want to split a string using a space as the delimiter, but it should handle quoted strings intelligently. For example, for a string like

"John Smith" Ted Barry

it should return three strings: John Smith, Ted and Barry.

2012-05-22 fastcodejava

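You probably need to split out the quoted strings first, and then split the rest of the string on whitespace. There must be existing questions here about how to do the first step. The second step is trivial. – jahroy

What have you tried? –

A decent CSV parser library would work for you. Most let you choose the delimiter and will respect quoted text and avoid splitting it. –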

After messing around with it, I found you can use a regular expression for this. Run the equivalent of "match all" on:

((?<=("))[\w ]*(?=("(\s|$))))|((?<!")\w+(?!"))

Java example:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class Test
{
    public static void main(String[] args)
    {
        String someString = "\"Multiple quote test\" not in quotes \"inside quote\" \"A work in progress\"";
        Pattern p = Pattern.compile("((?<=(\"))[\\w ]*(?=(\"(\\s|$))))|((?<!\")\\w+(?!\"))");
        Matcher m = p.matcher(someString);

        while (m.find()) {
            System.out.println("'" + m.group() + "'");
        }
    }
}

Output:

'Multiple quote test' 
'not' 
'in' 
'quotes' 
'inside quote' 
'A work in progress'

A breakdown of the regular expression used in the example above can be viewed here:

http://regex101.com/r/wM6yT9

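With all that said, regex should not be the go-to solution for everything - I was just having fun. This example has many edge cases, such as handling Unicode characters, symbols, etc. You would be better off using a tried and true library for this kind of task. Please look at the other answers before using this one.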
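For the Unicode and symbol edge cases mentioned above, a simpler pattern is sometimes used instead: match either a double-quoted run or a run of non-whitespace. The sketch below is not the pattern from this answer; the class name QuoteAwareRegexSplit and its split method are made up for illustration. It assumes quotes open at the start of a token and that there are no escaped quotes. An unclosed quote is not reported; the text simply falls through to the non-whitespace branch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class QuoteAwareRegexSplit {
    // Group 1: contents of a double-quoted run (quotes excluded).
    // Group 2: a run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Exactly one of the two groups participates in each match.
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("\"John Smith\" Ted Barry"));          // [John Smith, Ted, Barry]
        System.out.println(split("日本語 \"Tiếng Việt\" English"));      // [日本語, Tiếng Việt, English]
    }
}
```

Because \S matches any non-whitespace character, non-ASCII words need no special handling here, which \w in the original pattern does not give you by default.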

2012-05-22 03:12:23

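I'm not sure whether the input contains Unicode, but your code will not be able to handle it. – nhahtdh

This is a nice example. +1. Why not add an if to check whether m.group() returns a space, so that you don't have to output the spaces? –

Brilliant... +1 –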

Try this ugly code.

// Requires java.util.ArrayList and java.util.List.
String str = "hello my dear \"John Smith\" where is Ted Barry";
List<String> resultList = new ArrayList<String>();
StringBuilder builder = new StringBuilder();
boolean inQuote = false; // true between an opening quote and its closing quote
for (String s : str.split("\\s")) {
    if (!inQuote && s.startsWith("\"")) {
        inQuote = true;
        s = s.substring(1); // drop the opening quote
    }
    if (inQuote && s.endsWith("\"")) {
        inQuote = false;
        s = s.substring(0, s.length() - 1); // drop the closing quote
    }
    builder.append(s);
    if (inQuote) {
        builder.append(' '); // still inside the quotes: keep accumulating
    } else {
        resultList.add(builder.toString());
        builder.setLength(0);
    }
}
System.out.println(resultList);

2012-05-22 03:35:13

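Much better than my code. +1 –

Excessive whitespace will cause the program to produce empty strings. – nhahtdh

@nhahtdh: Oh yeah. Actually, I was only giving a hint, not a 100% working solution. Trevor Senior nailed it. His answer has the same whitespace problem, though. But that is not a real problem and can easily be fixed. –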

commons-lang has a StrTokenizer class that does this for you, and there is also the java-csv library.

Example with StrTokenizer:

// StrTokenizer lives in org.apache.commons.lang.text (commons-lang 2.x)
// or org.apache.commons.lang3.text (commons-lang 3.x).
String params = "\"John Smith\" Ted Barry";
// Initialize tokenizer with input string, delimiter character, quote character
StrTokenizer tokenizer = new StrTokenizer(params, ' ', '"');
for (String token : tokenizer.getTokenArray()) {
    System.out.println(token);
}

Output:

John Smith 
Ted 
Barry
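If adding a dependency is not an option, the JDK's java.io.StreamTokenizer can be configured to behave in a similar way. This is only a sketch and not part of the original answer; the class name StreamTokenizerSplit and its split method are invented for illustration. Two caveats as far as I know: inside quotes StreamTokenizer interprets backslash escapes such as \n, and it accepts an unterminated quote without complaint.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
final class StreamTokenizerSplit {
    static List<String> split(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.resetSyntax();            // drop the default number and comment handling
        st.whitespaceChars(0, ' ');  // control characters and space separate tokens
        st.wordChars('!', 255);      // the remaining Latin-1 characters form words
        st.quoteChar('"');           // set after wordChars, whose range also covers '"'

        List<String> tokens = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                // For a quoted string, ttype is the quote character and sval its body.
                if (st.ttype == StreamTokenizer.TT_WORD || st.ttype == '"') {
                    tokens.add(st.sval);
                }
            }
        } catch (IOException e) {
            // Not expected when reading from a StringReader.
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("\"John Smith\" Ted Barry")); // [John Smith, Ted, Barry]
    }
}
```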

2012-05-22 03:35:18 Matt

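@BasilioGerman I have added an example, so you might consider deleting your comment. –

OK, I wrote a small snippet that does what you want and a bit more. Since you did not specify any further conditions, I did not go to the trouble of handling them. I know this is a somewhat dirty way of doing it, and you can probably get better results. But for the fun of programming, here is the example: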

String example = "hello\"John Smith\" Ted Barry lol\"Basi German\"hello";
int wordQuoteStartIndex = 0;
int wordQuoteEndIndex = 0;

int wordSpaceStartIndex = 0;
int wordSpaceEndIndex = 0;

boolean foundQuote = false;
for (int index = 0; index < example.length(); index++) {
    if (example.charAt(index) == '\"') {
        if (foundQuote) {
            wordQuoteEndIndex = index + 1;
            // Print the quoted word. To remove the quotes, use
            // (wordQuoteStartIndex + 1, wordQuoteEndIndex - 1) instead.
            System.out.println(example.substring(wordQuoteStartIndex, wordQuoteEndIndex));
            foundQuote = false;
            if (index + 1 < example.length()) {
                wordSpaceStartIndex = index + 1;
            }
        } else {
            wordSpaceEndIndex = index;
            if (wordSpaceStartIndex != wordSpaceEndIndex) {
                // Print the word in spaces
                System.out.println(example.substring(wordSpaceStartIndex, wordSpaceEndIndex));
            }
            wordQuoteStartIndex = index;
            foundQuote = true;
        }
    }

    if (!foundQuote) {
        if (example.charAt(index) == ' ') {
            wordSpaceEndIndex = index;
            if (wordSpaceStartIndex != wordSpaceEndIndex) {
                // Print the word in spaces
                System.out.println(example.substring(wordSpaceStartIndex, wordSpaceEndIndex));
            }
            wordSpaceStartIndex = index + 1;
        }

        if (index == example.length() - 1) {
            if (example.charAt(index) != '\"') {
                // Print the word in spaces
                System.out.println(example.substring(wordSpaceStartIndex, example.length()));
            }
        }
    }
}

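This also handles words that are not separated by a space before or after a quote, such as the hello before "John Smith" and the hello after "Basi German".

When the string is changed to "John Smith" Ted Barry, the output is three strings: 1) "John Smith" 2) Ted 3) Barry

The string in the example is hello"John Smith" Ted Barry lol"Basi German"hello, and it prints: 1) hello 2) "John Smith" 3) Ted 4) Barry 5) lol 6) "Basi German" 7) hello

Hope it helps.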

2012-05-22 03:35:29

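This is the best code of all of these. It can handle Unicode input, and it does not produce empty strings when there is excessive whitespace. It keeps everything inside the quotes, including the quotes themselves (well, this can be a plus or a minus). I think the code could be modified a little to remove the quotes. A further extension could be adding support for escaped quotes. – nhahtdh

Sure, the quotes can be removed. I just chose to keep them. I've added a comment about removing the quotes. –

This is my own version, cleaned up from http://pastebin.com/aZngu65y (posted in a comment). It takes care of Unicode. It cleans up all excessive whitespace (even inside quotes) - this may be good or bad, depending on your needs. Escaped quotes are not supported.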

// Requires java.util.Arrays.
private static String[] parse(String param) {
    // Surround every quote with spaces so that each quote becomes its own fragment.
    param = param.replaceAll("\"", " \" ").trim();
    String[] fragments = param.split("\\s+");

    // fragments[0..curr) holds finished tokens; fragments[curr] is being built.
    int curr = 0;
    boolean matched = fragments[curr].matches("[^\"]*");
    if (matched) {
        curr++;
    }

    for (int i = 1; i < fragments.length; i++) {
        if (!matched) {
            // Still inside an open quote: glue the next fragment on.
            fragments[curr] = fragments[curr] + " " + fragments[i];
        }

        if (!fragments[curr].matches("(\"[^\"]*\"|[^\"]*)")) {
            matched = false;
        } else {
            matched = true;

            // Strip the surrounding quotes from a completed quoted token.
            if (fragments[curr].matches("\"[^\"]*\"")) {
                fragments[curr] = fragments[curr].substring(1, fragments[curr].length() - 1).trim();
            }

            // Empty tokens (from "") are dropped by not advancing curr.
            if (fragments[curr].length() != 0) {
                curr++;
            }

            if (i + 1 < fragments.length) {
                fragments[curr] = fragments[i + 1];
            }
        }
    }

    if (matched) {
        return Arrays.copyOf(fragments, curr);
    }

    return null; // Parameter failure (double-quotes do not match up properly).
}

Sample input for comparison:

"sdfskjf" sdfjkhsd "hfrif ehref" "fksdfj sdkfj fkdsjf" sdf sfssd 


asjdhj sdf ffhj "fdsf fsdjh" 
日本語　中文 "Tiếng Việt" "English" 
    dsfsd  
    sdf  " s dfs fsd f " sd f fs df fdssf "日本語　中文" 
"" ""  "" 
" sdfsfds " "f fsdf

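(The second line is empty, the third line consists of spaces, and the last line is malformed.) Please judge against your own expected output, since it may vary, but the baseline is that the first case should return [sdfskjf, sdfjkhsd, hfrif ehref, fksdfj sdkfj fkdsjf, sdf, sfssd].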
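Since escaped quotes are not supported above, here is a rough sketch of how a character-by-character scanner could add them. It is a different technique from the parse method above, and the names EscapeAwareSplitter and split are invented for illustration. It assumes a backslash escapes only a double quote or another backslash. Unlike parse, it keeps whitespace inside quotes as is, keeps empty quoted tokens (""), and merges text adjacent to a quoted part into the same token, so judge it against your own expected output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
final class EscapeAwareSplitter {
    /** Returns the tokens, or null if a double quote is left unclosed. */
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;  // currently between an opening and a closing quote
        boolean hasToken = false;  // tells an empty quoted token apart from no token

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean escapes = c == '\\' && i + 1 < input.length()
                    && (input.charAt(i + 1) == '"' || input.charAt(i + 1) == '\\');
            if (escapes) {
                current.append(input.charAt(++i)); // take the escaped character literally
                hasToken = true;
            } else if (c == '"') {
                inQuotes = !inQuotes;              // the quote itself is not kept
                hasToken = true;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (hasToken) {                    // whitespace outside quotes ends a token
                    tokens.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(c);
                hasToken = true;
            }
        }
        if (inQuotes) {
            return null; // double quotes do not match up
        }
        if (hasToken) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("say \"a \\\"quoted\\\" word\" end")); // [say, a "quoted" word, end]
        System.out.println(split("\" sdfsfds \" \"f fsdf"));           // null
    }
}
```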

2012-05-22 04:23:00 nhahtdh
