2014-01-27

Why does csvkit give me a "list index out of range" error?

I'm using the zipcode dataset with csvkit, but I'm getting nowhere. If I run csvcut -n zipcode.csv, I see a clean list of columns:

  1: zip
  2: city
  3: state
  4: latitude
  5: longitude
  6: timezone
  7: dst

However, every search I try with csvgrep just gives me an error. Here is a chunk of the data:

"99919","Thorne Bay","AK","55.677232","-132.55624","-9","1" 
"99921","Craig","AK","55.456449","-133.02648","-9","1" 
"99922","Hydaburg","AK","55.209339","-132.82545","-9","1" 
"99923","Hyder","AK","55.941442","-130.0545","-9","1" 
"99925","Klawock","AK","55.555164","-133.07316","-9","1" 
"99926","Metlakatla","AK","55.123897","-131.56883","-9","1" 
"99927","Point Baker","AK","56.337957","-133.60689","-9","1" 
"99928","Ward Cove","AK","55.395359","-131.67537","-9","1" 
"99929","Wrangell","AK","56.409507","-132.33822","-9","1" 
"99950","Ketchikan","AK","55.875767","-131.46633","-9","1" 

From the docs, I'd expect csvgrep -c 2 -m "Hyder" zipcode.csv to turn up a match, but instead I get:

zip,city,state,latitude,longitude,timezone,dst 
list index out of range 

I can use csvgrep on other CSV files just fine. Why does it choke on this one?

Answer


Your problem is that "zipcode.csv" is malformed: it contains blank lines. For example, line #17 is blank:

"00607","Aguas Buenas","PR","18.256995","-66.104657","-4","0" 

"00609","Aibonito","PR","18.142002","-66.273278","-4","0" 

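The file's author probably did this to indicate that ZIP code 00608 doesn't exist. That may be useful in some situations, but it prevents you from using the csvkit utilities.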
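To confirm the diagnosis before changing anything, you can list the line numbers of the empty lines with grep -n. This is a minimal sketch: sample.csv is a hypothetical stand-in built from the excerpt above, so on the real data you would point grep at zipcode.csv instead.

```shell
# Build a tiny hypothetical sample that mimics the excerpt above:
# a header, two records, and one empty line between them.
printf '%s\n' \
  'zip,city,state,latitude,longitude,timezone,dst' \
  '"00607","Aguas Buenas","PR","18.256995","-66.104657","-4","0"' \
  '' \
  '"00609","Aibonito","PR","18.142002","-66.273278","-4","0"' > sample.csv

# Print "LINE:" for every completely empty line (the content after the colon is empty).
grep -n '^$' sample.csv
```

On this sample, grep prints 3:, meaning the third line is empty.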

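If you're on a *nix-based operating system, you can use sed (which should already be installed) to remove the blank lines automatically, like this: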

$ sed '/^$/d' zipcode.csv > zipcode2.csv 

This stores the result in "zipcode2.csv". Now we can use our new, "fixed" zip code file:

$ csvgrep -c 2 -m "Hyder" zipcode2.csv 
zip,city,state,latitude,longitude,timezone,dst 
99923,Hyder,AK,55.941442,-130.0545,-9,1 
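The '/^$/d' pattern only removes lines that are completely empty. If a "blank" line happens to contain a stray space or a carriage return (for example, from Windows CRLF line endings; the excerpt above doesn't show whether that applies to this file), a whitespace-tolerant pattern is safer. A sketch on a hypothetical sample file, sample_crlf.csv:

```shell
# Hypothetical sample: line 3 looks blank but actually holds a space and a '\r'.
printf 'zip,city\n"99923","Hyder"\n \r\n"99925","Klawock"\n' > sample_crlf.csv

# '^$' only matches truly empty lines, so the " \r" line survives here.
sed '/^$/d' sample_crlf.csv > strict.csv

# [[:space:]] also matches spaces, tabs and '\r', so whitespace-only lines are dropped too.
sed '/^[[:space:]]*$/d' sample_crlf.csv > loose.csv

wc -l < strict.csv   # 4 lines remain
wc -l < loose.csv    # 3 lines remain
```

On clean input both patterns behave the same; the looser one just also catches lines that only look empty.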

To head off most errors like the one described, I use csvclean (also part of csvkit) to find and fix corrupt data in the source CSV. See also this blog post for a complete how-to.
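If csvclean isn't at hand, a rough first pass is to compare each line's field count with the header's: an empty line has zero fields and a truncated row has too few. This awk sketch is a swapped-in technique, not what csvclean does internally. It splits naively on commas, so a quoted field containing a comma would be miscounted, which a CSV-aware tool such as csvclean avoids. sample.csv is hypothetical.

```shell
# Hypothetical sample with one empty line and one truncated row.
printf '%s\n' \
  'zip,city,state,latitude,longitude,timezone,dst' \
  '"00607","Aguas Buenas","PR","18.256995","-66.104657","-4","0"' \
  '' \
  '"00609","Aibonito","PR"' > sample.csv

# Remember the header's field count, then report every line that differs from it.
awk -F',' 'NR == 1 { n = NF; next } NF != n { print NR ": " NF " fields (expected " n ")" }' sample.csv
```

On this sample it reports line 3 (0 fields) and line 4 (3 fields), the kind of rows that trip up csvgrep -c 2.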
