2016-04-27 231 views
0

Removing phone numbers from text: how do I delete phone numbers from a string when their formats differ?

For example, I have:

text=' 
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78 
    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array) 
    Smart Functionality: Yes - xx TV Streaming Platform 
    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78' 

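Also, how can I remove these formats from the text?

09414241441 095-41-41-441 (096)4141441 091-123-11-22 094 00 111 222

How do I remove these phone numbers?

(093) 123-34-56 (068) 123 45 67 (095) 123 456 78

I tried gsub, but it removed every number that looked similar.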

+1

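Please post what you have already tried. Are you using a regular expression? –

+1

Which phone number formats do you need to remove? [There are many.](https://en.wikipedia.org/wiki/National_conventions_for_writing_telephone_numbers) –

+0

There is no specific format; it can vary – user

Answers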

3

You can use:

text.gsub(/\([0-9]*\)\s[0-9]*(-|\s)[0-9]*(-|\s)[0-9]*/, '') 

This removes phone numbers in the formats you specified in your text:

  • (XXX) XXX-XX-XX
  • (XXX) XXX XX XX

And whenever you are writing a regex, try it out in Rubular.

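  • \([0-9]*\) matches digits inside parentheses (...). Parentheses are special characters in a regex, so each one is preceded by \. [0-9] means a digit is expected inside, and the added * means zero or more, so more than one digit may appear inside,

  • \s matches a whitespace character,

  • (-|\s) matches a dash (-) OR (|) a whitespace character (\s).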
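To make the pieces listed above easier to follow, here is a sketch of the same pattern written in Ruby's extended (/x) mode, where each part can carry its own comment. The constant name PHONE_LOOSE and the sample string are made up for illustration, and the groups are written as non-capturing (?:...), which does not change what is matched.

```ruby
# Same pattern as in the answer, spread out with comments.
# In x mode, literal whitespace inside the pattern is ignored.
PHONE_LOOSE = /
  \( [0-9]* \)   # digits wrapped in literal parentheses, e.g. "(093)"
  \s             # one whitespace character
  [0-9]*         # first digit group
  (?:-|\s)       # a dash or a whitespace character
  [0-9]*         # second digit group
  (?:-|\s)       # a dash or a whitespace character
  [0-9]*         # last digit group
/x

sample  = "(093) 123-34-56 Refresh Rate: 60Hz (Native)"
cleaned = sample.gsub(PHONE_LOOSE, "")

puts cleaned.strip
# => Refresh Rate: 60Hz (Native)
```

Because every digit group uses *, the pattern accepts groups of any length, which is why it is fairly loose. "(Native)" survives only because the parentheses contain letters rather than digits.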

For other formats, such as:

  • XXXXXXXXXX
  • XXX-XX-XX-XXX
  • (XXX)XXXXXXX

in addition to the ones above, use the following:

text.gsub(/\(*[0-9]+(\)|-)+\s*[0-9]+(-|\s)*[0-9]+(-|\s)*[0-9]+|[0-9]{10}/, '') 
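
A quick check of this broader pattern against single numbers may help when deciding whether it fits your data. The sketch below copies the pattern unchanged into a constant (the name BROAD is made up). ADJUSTED is only one possible tweak and an assumption on my part, not something the answer proposes.

```ruby
# The broader pattern from the answer, copied unchanged.
BROAD = /\(*[0-9]+(\)|-)+\s*[0-9]+(-|\s)*[0-9]+(-|\s)*[0-9]+|[0-9]{10}/

# The three formats listed in the answer are removed completely.
p "095-41-41-441".gsub(BROAD, "")   # => ""
p "(096)4141441".gsub(BROAD, "")    # => ""
p "0941424144".gsub(BROAD, "")      # => ""

# An 11-digit number is only partly removed: the [0-9]{10} branch
# takes the first ten digits and leaves the last one behind.
p "09414241441".gsub(BROAD, "")     # => "1"

# Hypothetical adjustment: accept 10 or 11 digits and require word
# boundaries so that longer digit runs are not cut in the middle.
ADJUSTED = /\(*[0-9]+(\)|-)+\s*[0-9]+(-|\s)*[0-9]+(-|\s)*[0-9]+|\b[0-9]{10,11}\b/
p "09414241441".gsub(ADJUSTED, "")  # => ""
```

Whether 10 or 11 digits is the right range depends on the numbers you actually expect, so treat the adjusted pattern as a starting point.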
+0

Regular expressions are very useful, but a bit complicated to understand – user

+0

Also, how do I remove these formats from the text: '09414241441 095-41-41-441(096)4141441' – user

+0

I'm adding a new note to the post now, just a minute –

1

Based on your formats, the following regex works:

/\(\d{3}\)\s+\d{3}[-\s]\d{2,3}[-\s]\d{2}/ 

Ruby code:

print text.gsub(/\(\d{3}\)\s+\d{3}[-\s]\d{2,3}[-\s]\d{2}/, "") 

Ideone Demo
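
Since the original complaint was that gsub removed too much, it can help to list what a pattern would delete before deleting anything. The sketch below does that with String#scan, using the regex from this answer. The constant name PHONE_STRICT and the shortened sample text are assumptions for illustration.

```ruby
# The regex from this answer, stored in a constant.
PHONE_STRICT = /\(\d{3}\)\s+\d{3}[-\s]\d{2,3}[-\s]\d{2}/

text = "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n" \
       "Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n" \
       "Dimensions: 28.98x17x3.18\n"

# scan returns every substring the pattern would remove.
matches = text.scan(PHONE_STRICT)
p matches
# => ["(093) 123-34-56", "(068) 123 45 67", "(095) 123 456 78"]

# Once the list looks right, delete the matches.
puts text.gsub(PHONE_STRICT, "")
```

The fixed digit counts (\d{3}, \d{2,3}, \d{2}) are what keep values such as 28.98x17x3.18 out of the match list.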

0

If your text has a fixed format and the numbers are always on the first line of the block, then simply remove the first line:

text=' 
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78 
    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array) 
    Smart Functionality: Yes - xx TV Streaming Platform 
    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78' 

text.strip 
# => "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n Smart Functionality: Yes - xx TV Streaming Platform\n Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78" 
text.strip.lines 
# => ["(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n", " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", " Smart Functionality: Yes - xx TV Streaming Platform\n", " Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"] 
text.strip.lines[1..-1].join 
# => " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n Smart Functionality: Yes - xx TV Streaming Platform\n Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78" 

Or:

lines = text.strip.lines 
# => ["(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n", " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", " Smart Functionality: Yes - xx TV Streaming Platform\n", " Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"] 
lines.shift 
# => "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n" 
lines.join 
# => " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n Smart Functionality: Yes - xx TV Streaming Platform\n Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78" 

Using a regex with gsub can work, but it is also more likely to become a maintenance problem.

If the phone numbers are always on a single line, but not necessarily the first one, I'd still use lines to break the text into an array, but I'd use reject with a regex to check each line and drop any line that matches a phone-number-like pattern:

lines = text.lines 
lines.reject{ |l| l[/\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/] } 
# => ["\n", " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", " Smart Functionality: Yes - xx TV Streaming Platform\n", " Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"] 

lines.reject{ |l| l[/\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/] }.join 
# => "\n Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n Smart Functionality: Yes - xx TV Streaming Platform\n Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78" 

Note that because strip is not used here, the leading "\n" is retained.

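Using lines to convert the text into an array helps isolate any damage in case something else triggers the pattern match and unintentionally mangles the text.

This approach breaks down when the phone numbers are scattered throughout the text. Even so, I'd probably still split the text into individual lines this way, because it limits the possible damage if there is a false positive.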
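
The sketch below takes the damage-limiting idea one step further. It is a stricter variation of the line-based approach, not what the answer itself does: a line is dropped only if it contains nothing except phone numbers and whitespace. The constant names and the helper drop_phone_only_lines are hypothetical, and match? needs Ruby 2.4 or newer.

```ruby
# Phone-like pattern, same shape as the one used with reject above.
PHONE = /\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/

# A line qualifies only if it is made up entirely of phone numbers
# and whitespace. Interpolating PHONE embeds it as a sub-pattern.
PHONE_ONLY_LINE = /\A\s*(?:#{PHONE}\s*)+\z/

# Hypothetical helper: returns the text without phone-only lines.
def drop_phone_only_lines(text)
  text.lines.reject { |line| line.match?(PHONE_ONLY_LINE) }.join
end

text = "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78 \n" \
       "Call (093) 123-34-56 for support\n" \
       "Refresh Rate: 60Hz\n"

puts drop_phone_only_lines(text)
# Call (093) 123-34-56 for support
# Refresh Rate: 60Hz
```

The trade-off is that a number embedded in a sentence is left alone, so this only suits input where the numbers sit on their own lines.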

0
phone_formats = [/\(\d{3}\) \d{3}-\d{4}/, 
       /\d{3}-\d{3}-\d{4}/, 
       /\d{3} \d{3} \d{4}/, 
       /\(\d{3}\) \d{3} \d{3} \d{2}/, 
       /\(\d{3}\) \d{3} \d{2} \d{2}/, 
       /\(\d{3}\) \d{3}-\d{2}-\d{2}/, 
       /\d{3}-\d{3}-\d{2}-\d{2}/] 

r = Regexp.union(phone_formats) 
    #=> /(?-mix:\(\d{3}\) \d{3}-\d{4})| 
    # (?-mix:\d{3}-\d{3}-\d{4})| 
    # (?-mix:\d{3} \d{3} \d{4})| 
    # (?-mix:\(\d{3}\) \d{3} \d{3} \d{2})| 
    # (?-mix:\(\d{3}\) \d{3} \d{2} \d{2})| 
    # (?-mix:\(\d{3}\) \d{3}-\d{2}-\d{2})| 
    # (?-mix:\d{3}-\d{3}-\d{2}-\d{2})/ 

(I have broken the return value of Regexp.union after each | to improve readability.)

text =<<_ 
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78 
Refresh Rate: 60Hz (Native). Backlight: LED (Full Array) 
Smart Functionality: Yes - xx TV Streaming Platform 
Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, 
TV with stand (inches) : 28.98x18.68x7.78 
_ 

puts text.gsub(r,'') 

Refresh Rate: 60Hz (Native). Backlight: LED (Full Array) 
Smart Functionality: Yes - xx TV Streaming Platform 
Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, 
TV with stand (inches) : 28.98x18.68x7.78
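
The same list-of-formats idea can be extended to the extra formats mentioned in the question (09414241441, 095-41-41-441, (096)4141441, 091-123-11-22, 094 00 111 222). The sketch below is one way to do it. The constant names and the individual patterns are my own guesses at what those formats mean, and \b is added so that a pattern does not fire in the middle of a longer digit run.

```ruby
# Hypothetical patterns, one per extra format from the question.
EXTRA_FORMATS = [
  /\b\d{11}\b/,                   # 09414241441
  /\b\d{3}-\d{2}-\d{2}-\d{3}\b/,  # 095-41-41-441
  /\(\d{3}\)\d{7}/,               # (096)4141441
  /\b\d{3}-\d{3}-\d{2}-\d{2}\b/,  # 091-123-11-22
  /\b\d{3} \d{2} \d{3} \d{3}\b/   # 094 00 111 222
].freeze

EXTRA_PHONES = Regexp.union(EXTRA_FORMATS)

text = "09414241441 095-41-41-441 (096)4141441 091-123-11-22 " \
       "094 00 111 222 Refresh Rate: 60Hz"

# Remove the numbers, then collapse the spaces they leave behind.
puts text.gsub(EXTRA_PHONES, "").squeeze(" ").strip
# => Refresh Rate: 60Hz
```

Keeping one pattern per format makes it easy to add or drop a format later without touching the others.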