如果你的文字是固定的格式,这些数字将永远是第一行在块中,然后简单地删除第一行:
text='
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78
Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)
Smart Functionality: Yes - xx TV Streaming Platform
Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78'
text.strip
# => "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n Smart Functionality: Yes - xx TV Streaming Platform\n Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"
text.strip.lines
# => ["(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n", " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", " Smart Functionality: Yes - xx TV Streaming Platform\n", " Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"]
text.strip.lines[1..-1].join
# => " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n Smart Functionality: Yes - xx TV Streaming Platform\n Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"
或者:
lines = text.strip.lines
# => ["(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n", " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", " Smart Functionality: Yes - xx TV Streaming Platform\n", " Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"]
lines.shift
# => "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n"
lines.join
# => " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n Smart Functionality: Yes - xx TV Streaming Platform\n Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"
使用正则表达式和gsub
可以工作,但它也更容易成为一个维护问题。
如果电话号码将永远是一条线,但不一定是第一,那么我仍然使用lines
打破文本到一个数组,但我会用reject
用正则表达式来数模式相匹配检查每一行,并拒绝一个与电话号码般的正则表达式匹配:在使用strip
导致领先的“\ n”被保留不
lines = text.lines
lines.reject{ |l| l[/\(\d{3}\) \d{3}[ -]\d+{2,3}[ -]\d{2,3}/] }
# => ["\n", " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", " Smart Functionality: Yes - xx TV Streaming Platform\n", " Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"]
lines.reject{ |l| l[/\(\d{3}\) \d{3}[ -]\d+{2,3}[ -]\d{2,3}/] }.join
# => "\n Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n Smart Functionality: Yes - xx TV Streaming Platform\n Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"
注意。
使用lines
将文本转换为数组有助于隔离任何损坏,以防其他情况触发模式匹配,从而导致文本无意中损坏。
这种方法出现故障时,电话号码分散在整个文本中。尽管如此,我仍然可能会使用这种方法将文本减少到单独的行,如果存在误报,也可以减少可能的损害。
后你已经尝试了什么。你使用的是正则表达式吗? –
你需要删除哪些电话号码格式? [有很多。](https://en.wikipedia.org/wiki/National_conventions_for_writing_telephone_numbers) –
有没有一些特定的格式,它可以是不同的 – user