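Trying to write to CSV, but some fields are excluded (Scrapy for Python)

2015-07-21

I'm trying to write to a CSV file, but when I check the output, some "reviews" fields are left blank, even though the same data prints correctly when I print it. I believe this is a limitation of zip(), because I'm using it to turn the column-wise lists into rows. Again, the XPath output printed by the spider is correct. Is this a limitation of zip, or of my syntax? My other guess is that it might be the delimiter=','.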

Pipline.py

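import csv
import itertools


class CSVPipeline(object):

    def __init__(self):
        # 'wb' is the mode the csv module expects on Python 2
        self.csvfile = open('Output.csv', 'wb')
        self.csvwriter = csv.writer(self.csvfile, delimiter=',')
        self.csvwriter.writerow(['names', 'date', 'location', 'starts', 'subjects', 'reviews'])

    def process_item(self, item, spider):
        # zip() pairs the lists by position and stops at the shortest one
        rows = zip(item['names'], item['date'], item['location'],
                   item['stars'], item['subjects'], item['reviews'])

        for row in rows:
            self.csvwriter.writerow(row)

        return item

    def close_spider(self, spider):
        # flush and close Output.csv when the crawl ends
        self.csvfile.close()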

Sample output (some reviews are left out):

names,date,location,starts,subjects,reviews 
Aastha2015,20 July 2015," 
Bengaluru (Bangalore), India 
",5,Amazing Time in Ooty," 
Hi All, i visited Ooty on July 10th, choose to stay in Elk Hills hotel, i read reviews of almost all good hotels and decided to try Elk Hills. I must say the property is huge, very well maintained. Rooms are clean spacious & views are great. Food in the Cafe Blue was awesome. They forgot to give us the... 
" 
pushp2015,11 July 2015," 
Gurgaon, India 
",3,Nice Hotel ...under going maintainance," 
" 
REDDY84,25 June 2015," 
Chennai, India 
",4,Good old property," 
Its an old property with a very good view. We booked a suite at a very reasonable price but they charged for an extra bed 1500 + txs which i feel was not required because the bed was already their in the suite room.Other then that everything was good. Breakfast was nice . The room they had given was neat... 
" 
arun606,20 June 2015," 
Mumbai, India 
",5,Amazing Hospitality," 
" 

It would be nice to see some samples of the output and of your items. – GHajba


Added sample output. – Smashed


As you can see, some reviews are dropped. – Smashed

Answers


I'm not sure, but I think the limitation you describe is really just how zip works.

Check out izip_longest, which doesn't stop at the shortest list.

Example:

>>> zip('abc', '12345') 
[('a', '1'), ('b', '2'), ('c', '3')] 
>>> list(itertools.izip_longest('abc', '12345', fillvalue=0)) 
[('a', '1'), ('b', '2'), ('c', '3'), (0, '4'), (0, '5')] 

This doesn't work; even where data exists, it still puts in the default fill value. – Smashed


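Figured it out. As @Martin Evans suggested, I checked the lengths of the lists and found lots of carriage returns, which simply end up as blank entries. I don't know why, but they do. To fix it, just add this code: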

while "\n" in yourlist['key']: yourlist['key'].remove("\n")
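The loop above removes only entries that are exactly "\n"; a text node such as "\n   " (a newline followed by indentation) would survive it. The sketch below assumes the stray entries are whitespace-only strings (presumably extra text nodes returned by the XPath). It reproduces the misalignment with made-up data, then cleans the list by stripping each entry and dropping whitespace-only ones. clean_list is a hypothetical helper, not part of Scrapy.

```python
def clean_list(values):
    """Hypothetical helper: strip each string and drop empty or whitespace-only entries."""
    return [v.strip() for v in values if v.strip()]


# Made-up scraped lists: the reviews list also contains bare newline entries
names = ['user1', 'user2', 'user3']
reviews = ['review 1', '\n', 'review 2', '\n   ', 'review 3']

# Before cleaning: zip() pairs by position, so user2 gets a blank review,
# user3 gets 'review 2', and 'review 3' is silently dropped
before = list(zip(names, reviews))

# After cleaning, the lists have equal length and line up one-to-one again
reviews = clean_list(reviews)
after = list(zip(names, reviews))

print(len(names), len(reviews))  # 3 3
print(after)
```

In the pipeline, the same idea would mean running each field through something like clean_list at the start of process_item, before zipping. Stripping would also remove the newlines around the location values seen in the sample output.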