The metacharacter "." in a swirl lesson in R

The goal is to filter and print the state names that both start and end with a vowel.

Here is the code:

start_end_vowel <- "^[AEIOU]{1}.+[aeiou]{1}$"             #Q1
vowel_state_lgl <- grepl(start_end_vowel, state.name)     #Q2
state.name[vowel_state_lgl]                               #Q3

[1] "Alabama" "Alaska" "Arizona" "Idaho" "Indiana" "Iowa"  "Ohio"  "Oklahoma"

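My question is: what is the . in Q1 used for?

I know that . matches any character, and that in the case above we want the state name to start with a vowel, but why doesn't +[aeiou]{1}$ need a .? In fact, R reports an error if I use +[aeiou]{1}.$

So what is the proper way to use . in this case?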

Source

2016-11-25 leveygao

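'.*' is the component. It says to match any series of characters. So '.*[aeiou]$' means match any series of characters that ends in a vowel. [This website](http://www.regular-expressions.info/) is a great reference. – lmo

'.' is the wildcard of regular expressions: it will match any character. Quantifiers such as '*' and '+' say how many times to match the preceding token: '*' means "any number of times, including zero" and '+' means "one or more times". I keep this cheat sheet around to help myself: https://www.cheatography.com/davechild/cheat-sheets/regular-expressions/ – Nate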
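To make the difference between these quantifiers concrete, here is a minimal sketch using base R's grepl(). The test strings and variable names are made up for illustration only.

```r
# Toy strings that all start with "a"
x <- c("a", "ab", "abc")

# "a" followed by exactly one more character
one_char <- grepl("^a.$", x)

# "a" followed by zero or more characters
zero_or_more <- grepl("^a.*$", x)

# "a" followed by one or more characters
one_or_more <- grepl("^a.+$", x)

print(data.frame(x, one_char, zero_or_more, one_or_more))
```

The only string on which '*' and '+' disagree is "a": there is nothing after the first character, which is fine for '.*' but not for '.+'.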

Here is a breakdown of how your regular expression reads:

^[AEIOU]{1}.+[aeiou]{1}$ 

^            Start of the line
[AEIOU]      Match any single character inside the []: one upper-case vowel
{1}          Match the preceding token exactly once
.            Match any single character (the exact behavior varies slightly with your RegEx engine and options)
+            Match the preceding token one or more times, as many times as possible: "any character, at least once"
[aeiou]{1}   Match a single lower-case vowel
$            Match the end of the line
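As a quick check of this breakdown, the sketch below runs the pattern against a few hand-picked strings. The sample strings are assumptions chosen to exercise each piece of the pattern; they are not from the lesson.

```r
pattern <- "^[AEIOU]{1}.+[aeiou]{1}$"

samples <- c(
  "Aa",    # too short: .+ needs at least one character in the middle
  "Aba",   # upper-case vowel, one middle character, lower-case vowel
  "Ohio",  # upper-case vowel, "hi" in the middle, lower-case vowel
  "ohio",  # starts with a lower-case vowel
  "Utah"   # ends with a consonant
)

matches <- grepl(pattern, samples)
print(setNames(matches, samples))
```

Only "Aba" and "Ohio" should pass: the pattern needs an upper-case vowel first, a lower-case vowel last, and at least one character between them.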

To answer your question: [aeiou]{1}$ doesn't need a . because with one it would read like this:

[aeiou]{1}.$ 

[aeiou]{1}   Match any lower-case vowel one time
.            Match any single character one time
$            Match the end of the line

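In other words, the line would have to end with a lower-case vowel followed by one more character of any kind, so it would no longer be required to end in a vowel. Note also that your regex, as written, will only match lines that start with an upper-case vowel.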
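To see what the variant with the extra . actually selects, here is a small sketch against the built-in state.name vector. The names with_dot and dot_matches are illustrative only.

```r
# Same pattern as in the question, but with a "." before the "$"
with_dot <- "^[AEIOU]{1}.+[aeiou]{1}.$"

# Keep the state names that match the modified pattern
dot_matches <- state.name[grepl(with_dot, state.name)]
print(dot_matches)

# Last two characters of each match: a vowel, then any character
print(substring(dot_matches, nchar(dot_matches) - 1))
```

Run against the built-in data, this should pick out names such as "Arkansas" and "Utah", whose second-to-last character is a vowel, rather than names that end in a vowel.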

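I always say this before giving RegEx advice: I am not a RegEx ninja; there may be better ways to do this. With that said, if you need to match lines that start and end with a vowel regardless of case, use this instead: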

^[aeiouAEIOU].+[aeiouAEIOU]$

Note that I removed the redundant {1} quantifiers. A character class matches exactly one character by default.
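As one possible alternative (not part of the original answer), grepl() also has an ignore.case argument, so the character classes do not have to list both cases. The sketch below compares the two spellings on state.name and checks that adding {1} back changes nothing. The variable names are illustrative.

```r
# Explicit character classes listing both cases
with_class <- grepl("^[aeiouAEIOU].+[aeiouAEIOU]$", state.name)

# Lower-case classes only, with case-insensitive matching switched on
with_flag <- grepl("^[aeiou].+[aeiou]$", state.name, ignore.case = TRUE)

# The same pattern with the redundant {1} quantifiers put back
with_quantifier <- grepl("^[aeiouAEIOU]{1}.+[aeiouAEIOU]{1}$", state.name)

same_result <- identical(with_class, with_flag)
print(same_result)
print(identical(with_class, with_quantifier))
print(state.name[with_class])
```

Because every entry in state.name starts with a capital letter and ends with a lower-case one, these patterns select the same states as the original case-sensitive one; the difference only shows up on data with mixed capitalization.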

Source

2016-11-25 15:10:10 Brandon

Thanks! This is a very clear answer! – leveygao
