2011-08-23
4

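What is a clever way in Ruby to remove rows from a CSV file where a particular value exists in a particular row? Remove rows in a file - Ruby

Here is an example of a file: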

350 lbs., Outrigger Footprint, 61" x 53", Weight, 767 lbs., 300-2080 
350 lbs., Outrigger Footprint, 61" x 53", Weight, 817 lbs., 300-2580 
350 lbs., Outrigger Footprint, 61" x 53", Weight, 817 lbs., 300-2580 
350 lbs., Outrigger Footprint, 69" x 61", Weight, 867 lbs., 300-3080 
350 lbs., Outrigger Footprint, 69" x 61", Weight, 867 lbs., 300-3080 

Ideally, I would like to create a new file containing only this:

350 lbs., Outrigger Footprint, 61" x 53", Weight, 767 lbs., 300-2080 
350 lbs., Outrigger Footprint, 61" x 53", Weight, 817 lbs., 300-2580 
350 lbs., Outrigger Footprint, 69" x 61", Weight, 867 lbs., 300-3080 

when given this:

300-2580 
300-3080 
300-2080 

I know I could do this with sort filename|uniq -d, but I am trying to learn Ruby (somewhat painfully).

Thanks in advance, M

Answers

10

You can use this to get the unique lines of the CSV file as an array:

File.readlines("file.csv").uniq 
=> ["350 lbs., Outrigger Footprint, 61\" x 53\", Weight, 767 lbs., 300-2080\n", "350 lbs., Outrigger Footprint, 61\" x 53\", Weight, 817 lbs., 300-2580\n", "350 lbs., Outrigger Footprint, 69\" x 61\", Weight, 867 lbs., 300-3080\n"] 

To write them to a new file, open a file in write mode and write to it like this:

File.open("new_csv", "w+") { |file| file.puts File.readlines("csv").uniq } 
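One thing worth noting is that uniq only drops lines that are identical character for character, so two rows that differ only in trailing spaces would both survive. The following is a minimal sketch of one way to handle that, assuming trailing whitespace is not significant in your data. The helper name dedupe_file and the file names are hypothetical, chosen just for illustration.

```ruby
require "tmpdir"

# dedupe_file is a hypothetical helper, not part of any library.
# It strips trailing whitespace from each line (an assumption about the data)
# and writes the unique lines to output_path.
def dedupe_file(input_path, output_path)
  unique = File.readlines(input_path).map(&:rstrip).uniq
  File.open(output_path, "w") { |file| file.puts unique }
  unique
end

# Demonstration in a temporary directory so no real files are touched.
Dir.mktmpdir do |dir|
  input  = File.join(dir, "file.csv")
  output = File.join(dir, "new_file.csv")
  File.write(input, "a, 1\na, 1 \nb, 2\nb, 2\n")

  dedupe_file(input, output)
  print File.read(output) # prints "a, 1" and "b, 2" on separate lines
end
```

If trailing whitespace does matter in your file, drop the rstrip call and use chomp instead.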

To compare values, you can use the split function on "," to access each column, like this:

rows = File.readlines("csv").map(&:chomp) # equivalent to File.readlines("csv").map { |f| f.chomp } 
mapped_columns = rows.map { |r| r.split(",").map(&:strip) } 
=> [["350 lbs.", "Outrigger Footprint", "61\" x 53\"", "Weight", "767 lbs.", "300-2080"], ["350 lbs.", "Outrigger Footprint", "61\" x 53\"", "Weight", "817 lbs.", "300-2580"], .....] 
mapped_columns[0][5] 
=> "300-2080" 
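Building on the split approach, here is a rough sketch of how the question's actual goal (one row per wanted value in the last column) might be reached. It assumes the value to match is always the last comma-separated field and that no field contains a comma. The names last_column, wanted, and selected are made up for this example.

```ruby
# Sample rows copied from the question; in practice they would come from
# something like File.readlines("file.csv").map(&:chomp).
lines = [
  %q(350 lbs., Outrigger Footprint, 61" x 53", Weight, 767 lbs., 300-2080),
  %q(350 lbs., Outrigger Footprint, 61" x 53", Weight, 817 lbs., 300-2580),
  %q(350 lbs., Outrigger Footprint, 61" x 53", Weight, 817 lbs., 300-2580),
  %q(350 lbs., Outrigger Footprint, 69" x 61", Weight, 867 lbs., 300-3080),
  %q(350 lbs., Outrigger Footprint, 69" x 61", Weight, 867 lbs., 300-3080)
]

# The values given in the question.
wanted = ["300-2580", "300-3080", "300-2080"]

# Hypothetical helper: returns the last comma-separated field, stripped.
def last_column(line)
  line.split(",").last.strip
end

# Keep rows whose last column is wanted, then keep only the first row
# seen for each distinct last-column value.
selected = lines
  .select { |line| wanted.include?(last_column(line)) }
  .uniq { |line| last_column(line) }

puts selected
```

Passing a block to uniq makes it compare rows by the block's result instead of the whole line, so rows that share a part number but differ elsewhere would also collapse to one.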

If you need more functionality, you are better off installing the FasterCSV gem.

+2

You only need FasterCSV if you are stuck on Ruby 1.8; the CSV library in 1.9 is FasterCSV (with some improvements). –

+0

@mu: yes, you are right – rubyprince

+0

I am using FasterCSV; can I still use .uniq? – MarkL

0

Hmm, I don't think this example will get you the answer you are looking for... but this will work...

tmp.txt =>

350 lbs., Outrigger Footprint, 61" x 53", Weight, 767 lbs., 300-2080 
350 lbs., Outrigger Footprint, 61" x 53", Weight, 817 lbs., 300-2580 
350 lbs., Outrigger Footprint, 61" x 53", Weight, 817 lbs., 300-2580 
350 lbs., Outrigger Footprint, 69" x 61", Weight, 867 lbs., 300-3080 
350 lbs., Outrigger Footprint, 69" x 61", Weight, 867 lbs., 300-3080 

File.readlines('tmp.txt').uniq will return this:

350 lbs., Outrigger Footprint, 61" x 53", Weight, 767 lbs., 300-2080 
350 lbs., Outrigger Footprint, 61" x 53", Weight, 817 lbs., 300-2580 
350 lbs., Outrigger Footprint, 69" x 61", Weight, 867 lbs., 300-3080 

So you can also easily sort using the Array functions. Google "ruby arrays" and I'm sure you can learn how to select an entry by comparing it against a desired string.
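As a rough illustration of the Array functions mentioned here, the sketch below uses select to pick entries matching a desired string and sort_by to order them. The rows are shortened, hypothetical stand-ins for the lines in tmp.txt, and matching on the end of the line is just one possible comparison.

```ruby
# Shortened, hypothetical stand-ins for the rows in tmp.txt.
lines = [
  "767 lbs., 300-2080",
  "817 lbs., 300-2580",
  "817 lbs., 300-2580",
  "867 lbs., 300-3080",
  "867 lbs., 300-3080"
]

unique = lines.uniq

# select keeps only the entries for which the block returns true.
matching = unique.select { |line| line.end_with?("300-2580") }

# sort_by orders entries by the block's result; here the last field,
# compared as a string, then reversed for descending order.
sorted = unique.sort_by { |line| line.split(",").last.strip }.reverse

puts matching
puts sorted
```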

0

You could also create a hash, since a hash does not allow duplicate keys. For example, the following code should help:

require 'optparse' 

options = { input_file: '' } 

OptionParser.new do |opts| 
    opts.banner = "Usage: remove_extras.rb [options] file1 ..." 

    opts.on('-i', '--input_file FILENAME', 'File to have extra rows removed') do |file| 
        options[:input_file] = file 
    end 
end.parse! 

if File.exist?(options[:input_file]) # File.exists? was removed in Ruby 3.2 
    puts "Parsing: #{options[:input_file]}" 

    # Hash keys are unique, so storing each row as a key drops duplicates. 
    uniq_rows = {} 
    File.foreach(options[:input_file]) do |row| 
        uniq_rows[row] = row.hash 
    end 

    puts "please enter the output filename:" 
    # $stdin.gets avoids Kernel#gets reading from files left in ARGV. 
    # The block form closes the output file when it finishes. 
    File.open($stdin.gets.chomp, "a+") do |out| 
        uniq_rows.each_key { |row| out.write(row) } 
    end 
end
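To isolate the idea from the option parsing, here is a minimal sketch of just the hash-based deduplication. The helper name unique_lines is hypothetical. It relies on Ruby hashes (1.9 and later) preserving insertion order, so the first occurrence of each line keeps its original position.

```ruby
# unique_lines is a hypothetical helper for illustration only.
# It accepts anything that responds to each, such as an Array of lines
# or the enumerator returned by File.foreach(path).
def unique_lines(lines)
  seen = {}
  # Assigning to an existing key does not create a second entry,
  # so duplicates are silently dropped.
  lines.each { |line| seen[line] = true }
  seen.keys
end

puts unique_lines(["a, 1\n", "b, 2\n", "a, 1\n"])
```

For plain arrays this gives the same result as Array#uniq, so the hash version is mainly interesting when you want to stream lines from a file rather than read them all into an array first.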