2011-10-11 38 views

Answers

2

You can use String#split with a regular expression pattern as the argument, like this:

"Hello_World I am Learning,Ruby".split /[ _,.!?]/ 
=> ["Hello", "World", "I", "am", "Learning", "Ruby"] 
1
ruby-1.9.2-p290 :022 > str = "Hello_World I am Learning,Ruby" 
ruby-1.9.2-p290 :023 > str.split(/\s|,|_/) 
=> ["Hello", "World", "I", "am", "Learning", "Ruby"] 
0

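Although the examples above work, I think that when splitting a string into words it may be better to split on characters that are not considered part of any kind of word. To do that, I did this: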

str = "Hello_World I am Learning,Ruby" 
str.split(/[^a-zA-Z]/).reject(&:empty?).compact 

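This statement does the following:

  1. Splits the string on characters that are not letters
  2. Then rejects anything that is an empty string
  3. And removes all nil values from the array

This handles most combinations of words. The examples above require you to list every character you want to match against. It is far easier to specify the characters that are not considered part of a word.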
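
The steps above can be sketched one stage at a time to show what each contributes. The helper name `letters_only_words` and the sample input are made up for illustration. Note that String#split only ever returns strings, so `compact` has no nil values to remove here.

```ruby
# Hypothetical helper: split on every non-letter, then drop empty strings.
def letters_only_words(text)
  pieces = text.split(/[^a-zA-Z]/)  # adjacent delimiters leave "" between them
  pieces.reject(&:empty?).compact   # compact is a no-op: split never yields nil
end

p "Hello, World".split(/[^a-zA-Z]/)   # => ["Hello", "", "World"]
p letters_only_words("Hello, World")  # => ["Hello", "World"]
```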

1

String#scan seems to be an appropriate method for this task

irb(main):018:0> "Hello_World I am Learning,Ruby".scan(/[a-z]+/i) 
=> ["Hello", "World", "I", "am", "Learning", "Ruby"] 

or you can use the built-in \w matcher

irb(main):020:0> "Hello_World I am Learning,Ruby".scan(/\w+/) 
=> ["Hello_World", "I", "am", "Learning", "Ruby"] 
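
The two results differ because `\w` also matches digits and the underscore, which is why `Hello_World` stays in one piece. A small sketch with a made-up input and hypothetical helper names:

```ruby
# Runs of ASCII letters only (case-insensitive).
def letter_runs(text)
  text.scan(/[a-z]+/i)
end

# Runs of "word" characters: letters, digits and underscore.
def word_char_runs(text)
  text.scan(/\w+/)
end

p letter_runs("Hello_World v2")     # => ["Hello", "World", "v"]
p word_char_runs("Hello_World v2")  # => ["Hello_World", "v2"]
```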
4

You can use \W for any non-word character:

"Hello_World I am Learning,Ruby".split /[\W_]/ 
=> ["Hello", "World", "I", "am", "Learning", "Ruby"] 

"Hello_World I am Learning, Ruby".split /[\W_]+/ 
=> ["Hello", "World", "I", "am", "Learning", "Ruby"] 
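
The `+` in the second pattern matters when delimiters sit next to each other, such as a comma followed by a space. A minimal sketch (the helper names are illustrative only):

```ruby
# Split on each single non-word character or underscore.
def split_each_delimiter(text)
  text.split(/[\W_]/)
end

# Split on runs of one or more such characters.
def split_delimiter_runs(text)
  text.split(/[\W_]+/)
end

p split_each_delimiter("Learning, Ruby")  # => ["Learning", "", "Ruby"]
p split_delimiter_runs("Learning, Ruby")  # => ["Learning", "Ruby"]
p split_delimiter_runs("_Hello")          # => ["", "Hello"]
```

A leading delimiter still produces an empty first element, so `reject(&:empty?)` may be worth keeping even with the `+`.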
0

Just for fun, a Unicode-aware version for 1.9 (or 1.8 with Oniguruma):

>> "This_µstring has words.and thing's".split(/[^\p{Word}']|\p{Connector_Punctuation}/) 
=> ["This", "µstring", "has", "words", "and", "thing's"] 

Or perhaps:

>> "This_µstring has words.and thing's".split(/[^\p{Word}']|_/) 
=> ["This", "µstring", "has", "words", "and", "thing's"] 

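The real problem is deciding which sequences of characters make up a "word" in this context. You may want to look at the Oniguruma docs for the supported character properties; Wikipedia has some notes on the properties.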
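
One way to make the definition of a "word" explicit is to scan for what you do want instead of splitting on what you don't. The sketch below assumes a word is a run of Unicode letters and apostrophes; that definition and the helper name `unicode_words` are illustrative choices, not something the answers above prescribe.

```ruby
# Assumption: a "word" is one or more Unicode letters (\p{L}) or apostrophes.
def unicode_words(text)
  text.scan(/[\p{L}']+/)
end

p unicode_words("This_µstring has words.and thing's")
# => ["This", "µstring", "has", "words", "and", "thing's"]
```

Changing the character class (for example, adding `\p{N}` for digits) changes what counts as a word without touching the rest of the code.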