2016-04-24 57 views

I keep running into problems getting clean CSV output. Whitespace in CSV output - Python

Here is the program:

import csv 
import requests 
from lxml import html 

page = requests.get('http://www.mediamarkt.be/mcs/productlist/_108-tot-127-cm-43-tot-50-,98952,501090.html?langId=-17') 
tree = html.fromstring(page.content) 

outfile = open("./tv_test1.csv", "wb") 
writer = csv.writer(outfile) 

rows = tree.xpath('//*[@id="category"]/ul[2]/li') 
writer.writerow(["Product Name", "Price"]) 

for row in rows: 
    price = row.xpath('div/aside[2]/div[1]/div[1]/div/text()') 
    product_ref = row.xpath('div/div/h2/a/text()') 
    writer.writerow([product_ref,price]) 

outfile.close() 

Current output:

['\r\n\t\t\t\t\tTV SAMSUNG UE48JU6640UXXN 48" LCD FULL LED Smart Ultra HD Curved\r\n\t\t\t\t'],"['999,-']" 

Desired output:

TV SAMSUNG UE48JU6640UXXN 48" LCD FULL LED Smart Ultra HD Curve,999,- 

Answers


Found it:

import csv 
import requests 
from lxml import html 

page = requests.get('http://www.mediamarkt.be/mcs/productlist/_108-tot-127-cm-43-tot-50-,98952,501090.html?langId=-17')
tree = html.fromstring(page.content)

outfile = open("./tv_test1.csv", "wb")
writer = csv.writer(outfile)

rows = tree.xpath('//*[@id="category"]/ul[2]/li') 
writer.writerow(["Product Name", "Price"]) 

for row in rows: 
    price = row.xpath('normalize-space(div/aside[2]/div[1]/div[1]/div/text())') 
    product_ref = row.xpath('normalize-space(div/div/h2/a/text())') 
    writer.writerow([product_ref,price]) 

outfile.close() 
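
XPath's normalize-space() strips leading and trailing whitespace and collapses internal runs of whitespace into a single space. Because it returns a string rather than a list of text nodes, the brackets also disappear from the CSV. The sketch below approximates the same cleanup in plain Python 3 without fetching the page. The helper name normalize_space and the in-memory buffer are illustrative assumptions, not part of the answer above, and str.split() treats a few more characters as whitespace than XPath does.

```python
import csv
import io


def normalize_space(text):
    """Approximate XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(text.split())


# Sample values copied from the question's output.
raw_name = '\r\n\t\t\t\t\tTV SAMSUNG UE48JU6640UXXN 48" LCD FULL LED Smart Ultra HD Curved\r\n\t\t\t\t'
raw_price = '999,-'

# Write to an in-memory buffer instead of a file so the sketch has no side effects.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Product Name", "Price"])
writer.writerow([normalize_space(raw_name), normalize_space(raw_price)])

csv_text = buffer.getvalue()
print(csv_text)
```

Note that the csv module quotes any field containing a comma or a double quote, so the price is written as "999,-" rather than as two separate columns.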

You can simply remove \n, \r and \t from the data before writing it to the CSV file:

import csv 
import requests 
from lxml import html 

page = requests.get('http://www.mediamarkt.be/mcs/productlist/_108-tot-127-cm-43-tot-50-,98952,501090.html?langId=-17') 
tree = html.fromstring(page.content) 

outfile = open("./tv_test1.csv", "wb") 
writer = csv.writer(outfile) 

rows = tree.xpath('//*[@id="category"]/ul[2]/li') 
writer.writerow(["Product Name", "Price"]) 

for row in rows: 
    price = row.xpath('div/aside[2]/div[1]/div[1]/div/text()') 
    for i in range(len(price)):
        price[i] = price[i].replace("\n", "")
        price[i] = price[i].replace("\t", "")
        price[i] = price[i].replace("\r", "")

    product_ref = row.xpath('div/div/h2/a/text()')
    for i in range(len(product_ref)):
        product_ref[i] = product_ref[i].replace("\n", "")
        product_ref[i] = product_ref[i].replace("\t", "")
        product_ref[i] = product_ref[i].replace("\r", "")
    # Skip rows where either field is missing.
    if len(product_ref) and len(price):
        writer.writerow([product_ref, price])

outfile.close() 

With this, you will get:

[screenshot of the resulting output]

Note that I also check the lengths of price and product_ref before writing them to the file.
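
The same cleanup and length check can be written more compactly. The following is only a sketch in Python 3 under a few assumptions: the helper names first_clean and build_row are hypothetical, and it takes the first text node of each XPath result so that plain strings, not lists, end up in the row (csv.writer converts non-string values with str(), which is where the brackets in the question's output come from).

```python
# Translation table that deletes CR, LF and TAB characters.
_STRIP_TABLE = str.maketrans("", "", "\r\n\t")


def first_clean(values):
    """Return the first string in values with CR, LF and TAB removed, or None if values is empty."""
    if not values:
        return None
    return values[0].translate(_STRIP_TABLE)


def build_row(product_ref, price):
    """Return [name, price] as plain strings, or None if either XPath result is empty."""
    name = first_clean(product_ref)
    amount = first_clean(price)
    if name is None or amount is None:
        return None
    return [name, amount]


# Lists shaped like the results of row.xpath(...) in the code above.
row = build_row(['\r\n\t\tTV SAMSUNG\r\n\t'], ['999,-'])
skipped = build_row([], ['999,-'])
print(row, skipped)
```

In the scraping loop, a row would only be passed to writer.writerow when build_row returns something other than None. On Python 3 the output file would also need to be opened with open("./tv_test1.csv", "w", newline="") instead of "wb".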


'strip()' would also work, since these characters are at the ends –


@cricket_007 Yes, but with this approach internal characters can be removed as well. – EbraHim
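
To make the difference discussed in these comments concrete, here is a small comparison on a made-up string (the internal tab is an assumption for illustration; the product names in the question only have whitespace at the ends). It contrasts strip(), the chained replace() calls from the answer, and collapsing whitespace with split() and join().

```python
# Hypothetical value with whitespace at both ends and one tab in the middle.
raw = "\r\n\t\tTV SAMSUNG\tUE48JU6640UXXN\r\n\t"

# strip() only trims the ends; the internal tab survives.
stripped = raw.strip()

# Chained replace() removes the characters everywhere, gluing the words together.
replaced = raw.replace("\n", "").replace("\t", "").replace("\r", "")

# split() + join() trims the ends and turns internal whitespace into single spaces.
collapsed = " ".join(raw.split())

print(repr(stripped))
print(repr(replaced))
print(repr(collapsed))
```

Which variant fits depends on the data: strip() is enough when the whitespace is only at the ends, while removing the characters outright can merge adjacent words if a tab or newline was acting as a separator.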