2010-09-08 123 views
1

I am trying to write a Unix command that uses a regular expression to identify all strings that do not conform to the following format:

First letter is uppercase
Followed by any number of letters
Underscore
Followed by an uppercase letter
Followed by any number of letters
Underscore
and so on ...

The number of underscores is variable.

Valid ones are             Invalid ones are
Alpha_Beta_Gamma           alph_Beta_Gamma
Alpha_Beta_Gamma_Delta     Alpha_beta_Gamma
Alppha_Beta                Alpha_beta
Aliph_Theta_Pi_Chi_Ming    Alpha_theta_Pi_Chi_Ming

Is "A_B_C" valid or invalid? What about "ABC"? "Abc"? "AB_CD"? The empty string? – 2010-09-08 13:51:24


A_B_C is valid. ABC is invalid, Abc is also invalid, and AB_CD is invalid; it should be Ab_Cd. – lisa 2010-09-08 14:05:16

Answers

4

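grep has a -v option that inverts the match (i.e. it returns the lines that do not match). The -E option puts grep into extended-regexp mode, which allows + and parentheses to be used in the pattern without escaping.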

The pattern you can use (broken down for clarity):

^             # beginning of string
[A-Z]         # a single uppercase letter
[a-z]*        # zero or more lowercase letters
(             # start a group
  _           # an underscore
  [A-Z]       # a single uppercase letter
  [a-z]*      # zero or more lowercase letters
)+            # close the group; it can appear one or more times
$             # end of string
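
As a quick check of the breakdown above, the following sketch runs the pattern against the edge cases raised in the comments on the question. The helper name classify is hypothetical, and LC_ALL=C is an added assumption (not part of the original answer) so that [A-Z] and [a-z] cover only ASCII letters regardless of the system locale.

```shell
# The pattern from the breakdown above, written on one line.
pattern='^[A-Z][a-z]*(_[A-Z][a-z]*)+$'

# classify is a hypothetical helper: prints "valid" if its argument
# matches the pattern and "invalid" otherwise.
classify() {
    # LC_ALL=C (an assumption) keeps the bracket ranges ASCII-only.
    if printf '%s\n' "$1" | LC_ALL=C grep -Eq "$pattern"; then
        echo valid
    else
        echo invalid
    fi
}

# Edge cases asked about in the comments on the question.
for s in A_B_C ABC Abc AB_CD Ab_Cd; do
    printf '%-6s %s\n' "$s" "$(classify "$s")"
done
```

Only A_B_C and Ab_Cd are reported as valid, which agrees with lisa's reply. Abc is rejected because the group must appear at least once, so at least one underscore is required.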

So, assuming you have a file test.dat containing the 8 strings from your question:

grep -E -v "^[A-Z][a-z]*(_[A-Z][a-z]*)+$" test.dat 

returns:

alph_Beta_Gamma 
Alpha_beta_Gamma 
Alpha_beta 
Alpha_theta_Pi_Chi_Ming 
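
To reproduce this end to end, here is a minimal sketch that recreates test.dat and runs the same pattern. The order of the lines in the file is an assumption (valid strings first, then invalid ones). The single quotes and LC_ALL=C are defensive additions that are not in the original command: the quotes keep the shell from interpreting $, and the C locale keeps the bracket ranges ASCII-only.

```shell
# Recreate test.dat with the eight strings from the question
# (line order is an assumption).
cat > test.dat <<'EOF'
Alpha_Beta_Gamma
Alpha_Beta_Gamma_Delta
Alppha_Beta
Aliph_Theta_Pi_Chi_Ming
alph_Beta_Gamma
Alpha_beta_Gamma
Alpha_beta
Alpha_theta_Pi_Chi_Ming
EOF

# -v prints the lines that do NOT match the pattern.
LC_ALL=C grep -E -v '^[A-Z][a-z]*(_[A-Z][a-z]*)+$' test.dat

# Adding -c prints the number of non-matching lines instead.
LC_ALL=C grep -E -v -c '^[A-Z][a-z]*(_[A-Z][a-z]*)+$' test.dat
```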

Daniel, thank you very much. – lisa 2010-09-08 14:07:16


Thanks for the explanation. – lisa 2010-09-08 14:12:55


The '+' after '[a-z]' needs to be a '*' to match 'A_B_C'. – Omnifarious 2010-09-08 14:41:13
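
The difference between the two quantifiers can be seen with a small sketch. The three sample strings and the variable names are illustrative assumptions, and LC_ALL=C is added so the bracket ranges are ASCII-only.

```shell
# Illustrative sample strings (an assumption, not from the thread's test file).
samples='A_B_C
Ab_Cd
Alpha_Beta'

# With [a-z]+ every segment needs at least one lowercase letter.
with_plus=$(printf '%s\n' "$samples" | LC_ALL=C grep -E '^[A-Z][a-z]+(_[A-Z][a-z]+)+$')

# With [a-z]* a segment may be a single uppercase letter.
with_star=$(printf '%s\n' "$samples" | LC_ALL=C grep -E '^[A-Z][a-z]*(_[A-Z][a-z]*)+$')

printf 'with +:\n%s\n\nwith *:\n%s\n' "$with_plus" "$with_star"
```

The + variant lists only Ab_Cd and Alpha_Beta, while the * variant also lists A_B_C.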
