Equivalent of R's read.table in Python

I am trying to move some processing work from R to Python. In R, I use read.table() to read really messy CSV files, and it automatically splits the records correctly. For example, this record:

391788,"HP Deskjet 3050 scanner always seems to break","<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p> 

<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p> 
","windows-7 printer hp"

is correctly split into 4 columns. A single record can span many lines, and there are commas all over the place. In R I just do:

read.table(infile, header = FALSE, nrows=chunksize, sep=",", stringsAsFactors=FALSE)

Is there anything in Python that can do this equally well?

Thanks!

Source

2013-10-23 mchangun

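You can use the csv module.

from csv import reader

# newline="" lets the csv module handle line breaks inside quoted fields itself
with open("C:/text.txt", "r", newline="") as f:
    csv_reader = reader(f, quotechar='"')
    for row in csv_reader:
        print(row)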

['391788', 'HP Deskjet 3050 scanner always seems to break', "<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p>\n\n<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p>\n", 'windows-7 printer hp']

Length of the output row = 4

Source

2013-10-23 08:59:05

But this only returns strings. It does not infer the type of each column the way read.table does.
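One way to narrow the gap raised in the comment above, while staying in the standard library, is to convert numeric-looking fields after parsing. The sketch below is only an illustration: the helper names infer and read_typed and the sample record are made up for this example, and the conversion is done per value rather than per column, so it is cruder than what read.table does.

```python
import csv
import io


def infer(value):
    """Try int, then float; fall back to the original string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def read_typed(handle):
    """Parse CSV records from a file-like object, converting numeric-looking fields."""
    return [[infer(field) for field in row]
            for row in csv.reader(handle, quotechar='"')]


# Hypothetical sample: one record with a comma and line breaks inside quoted fields.
sample = io.StringIO(
    '391788,"HP Deskjet, broken","<p>line one</p>\n\n<p>line two</p>\n","windows-7 printer hp"\n'
)
rows = read_typed(sample)
print(len(rows[0]), type(rows[0][0]))
```

Because each value is converted independently, a column that mixes numbers and text ends up with mixed Python types; per-column inference needs an extra pass over the data or a library such as pandas.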

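The pandas module also provides many R-like functions and data structures, including read_csv. The advantage here is that the data is read in as a pandas DataFrame, which is easier to manipulate than standard Python lists or dictionaries (especially if you are used to R). Here is an example: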

>>> from pandas import read_csv 
>>> ugly = read_csv("ugly.csv",header=None) 
>>> ugly 
        0                                              1  \
0  391788  HP Deskjet 3050 scanner always seems to break

                                                   2                     3
0  <p>I'm running a Windows 7 64 blah blah blah.....  windows-7 printer hp

Source

2013-10-23 14:25:30 David
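
The R call in the question also passes nrows=chunksize to read a limited number of records at a time. A rough pandas counterpart is the chunksize argument of read_csv, which yields DataFrames of at most that many records. The sketch below uses a made-up two-record sample held in memory so that it is self-contained; with a real file you would pass the file path instead of the StringIO object.

```python
import io

import pandas as pd

# Hypothetical sample: two records, the first spanning several lines.
raw = (
    '391788,"HP Deskjet, broken","<p>line one</p>\n\n<p>line two</p>\n","windows-7 printer hp"\n'
    '391789,"Second title","<p>body</p>","linux"\n'
)

# header=None: the data has no header row (like header = FALSE in R).
# chunksize=1: iterate over DataFrames holding at most one record each.
chunks = list(pd.read_csv(io.StringIO(raw), header=None, chunksize=1))

first = chunks[0]
print(first.dtypes)  # column types are inferred, unlike with csv.reader
```

Each chunk is an ordinary DataFrame, so the first column comes back as an integer type without any manual conversion, and the quoted field keeps its embedded line breaks.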
