Python Scrapy code prints out the file I am reading

I have written some Python code using Scrapy to extract some addresses from a website.

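The first part of the code assembles start_urls by reading latitude and longitude coordinates from a separate file, googlecoords.txt, and using them as part of each start URL. (I prepared googlecoords.txt beforehand; it holds UK postcodes converted to Google Maps coordinates.)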

So, for example, the first item in the start_urls list is "https://www.howdens.com/process/searchLocationsNear.php?lat=53.674434&lon=-1.4908923&distance=1000&units=MILES", where "lat=53.674434&lon=-1.4908923" comes from the googlecoords.txt file.
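
A minimal sketch of that URL construction, assuming each line of googlecoords.txt is already a query fragment of the form lat=<latitude>&lon=<longitude> (as the example URL suggests). The helper name build_start_urls and the second sample coordinate pair are illustrative and do not come from the original code.

```python
URL_TEMPLATE = (
    "https://www.howdens.com/process/searchLocationsNear.php"
    "?{}&distance=1000&units=MILES"
)


def build_start_urls(lines):
    """Yield one search URL per non-empty coordinate line.

    Each line is assumed to look like "lat=53.674434&lon=-1.4908923".
    """
    for line in lines:
        coords = line.strip()
        if coords:  # skip blank lines, e.g. a trailing newline at end of file
            yield URL_TEMPLATE.format(coords)


# Hypothetical file contents; the second coordinate pair is made up.
sample_lines = ["lat=53.674434&lon=-1.4908923\n", "\n", "lat=51.5&lon=-0.12\n"]
urls = list(build_start_urls(sample_lines))
```

Because the helper accepts any iterable of lines, an open file object can be passed to it directly.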

However, when I run the code it works perfectly, except that it first prints out the contents of googlecoords.txt, which I don't need.

How do I stop this printing? (Although I can live with it.)

import scrapy
import sys

from scrapy.http import FormRequest, Request
from Howdens.items import HowdensItem


class howdensSpider(scrapy.Spider):
    name = "howdens"
    allowed_domains = ["www.howdens.com"]

    # read the file that has a list of google coordinates that are converted from postcodes
    with open("googlecoords.txt") as f:
        googlecoords = [x.strip('\n') for x in f.readlines()]

    # from the google coordinates build the start URLs
    start_urls = []
    for coords in googlecoords:
        start_urls.append("https://www.howdens.com/process/searchLocationsNear.php?{}&distance=1000&units=MILES".format(coords))

    # cycle through 6 of the first relevant items returned in the text
    def parse(self, response):
        for sel in response.xpath('/html/body'):
            for i in range(0, 6):
                try:
                    item = HowdensItem()
                    item['name'] = sel.xpath('.//text()').re(r'(?<="name":")(.*?)(?=","street")')[i]
                    item['street'] = sel.xpath('.//text()').re(r'(?<="street":")(.*?)(?=","town")')[i]
                    item['town'] = sel.xpath('.//text()').re(r'(?<="town":")(.*?)(?=","pc")')[i]
                    item['pc'] = sel.xpath('.//text()').re(r'(?<="pc":")(.*?)(?=","state")')[i]
                    yield item
                except IndexError:
                    # fewer than 6 matches were returned for this URL
                    pass

Source

2016-12-27 nevster

The data is JSON... use a JSON parser to work with it...

As someone pointed out in the comments, you should load it with the json module in the start_requests() method:

import scrapy
import json


class MySpider(scrapy.Spider):
    name = "howdens"  # Scrapy requires every spider to have a name
    start_urls = ['https://www.howdens.com/process/searchLocationsNear.php?lat=53.674434&lon=-1.4908923&distance=1000&units=MILES']

    def parse(self, response):
        # decode the JSON body instead of matching it with regular expressions
        data = json.loads(response.text)
        items = data['response']['depots']
        for item in items:
            url_template = "https://www.howdens.com/process/searchLocationsNear.php?{}&distance=1000&units=MILES"
            url = url_template.format(item['lat'])  # format in your location here
            yield scrapy.Request(url, self.parse_item)

    def parse_item(self, response):
        print(response.url)
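
To connect this back to the four fields the question pulls out with regular expressions, here is a hedged sketch that reads them from the decoded JSON instead. It assumes the response has the data['response']['depots'] shape used above and that each depot carries name, street, town and pc keys; the function name extract_depots and the sample payload are made up for illustration.

```python
import json

FIELDS = ("name", "street", "town", "pc")


def extract_depots(body, limit=6):
    """Return up to `limit` depots as dicts holding only FIELDS."""
    data = json.loads(body)
    depots = data["response"]["depots"]
    return [
        {field: depot.get(field) for field in FIELDS}
        for depot in depots[:limit]
    ]


# Made-up payload for illustration; the real response may differ.
sample_body = json.dumps({
    "response": {
        "depots": [
            {"name": "Example Depot", "street": "1 Example Road",
             "town": "Wakefield", "pc": "WF1 1AA", "state": ""},
        ]
    }
})
depots = extract_depots(sample_body)
```

Slicing with depots[:limit] replaces the try/except IndexError pattern: it simply returns fewer entries when the response holds fewer than six depots.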

Source

2016-12-28 11:06:24 Granitosaurus
