I made a spider with Scrapy, and I am trying to save the download links into a (Python) list so that I can later call a list entry with downloadlist[1]. But Scrapy saves the urls as items instead of as a list. Is there a way to append each url to a list?
from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider
from scrapy.http import Request
import scrapy
from scrapy.linkextractors import LinkExtractor
DOMAIN = 'some-domain.com'
URL = 'http://' +str(DOMAIN)
linklist = []
class subtitles(scrapy.Spider):
    name = DOMAIN
    allowed_domains = [DOMAIN]
    start_urls = [URL]

    # First parse returns all the links of the website and feeds them to parse2
    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        for url in hxs.select('//a/@href').extract():
            if not (url.startswith('http://') or url.startswith('https://')):
                url = URL + url
            yield Request(url, callback=self.parse2)

    # Second parse selects only the links that contain "download"
    def parse2(self, response):
        le = LinkExtractor(allow=("download"))
        for link in le.extract_links(response):
            yield Request(url=link.url, callback=self.parse2)
            print link.url

# prints list of urls, 'downloadlist' should be a list but isn't.
downloadlist = subtitles()
print downloadlist
If 'downloadlist' isn't a list, what is it? –
According to the Scrapy docs I thought it might be a Request object. When I try to print 'downloadlist[3]' I get: TypeError: 'subtitles' object does not support indexing. – LuukS
Why not pass 'downloadlist' as an argument and then append elements where you need them? –
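The suggestion above can be sketched like this: keep the list as an attribute on the spider instance and append to it inside the callback, instead of assigning the spider object itself to downloadlist. This is a minimal illustration that runs without a live crawl, so the DummyResponse class and SubtitleSpider name are hypothetical stand-ins, not part of the original spider or the Scrapy API.

```python
class DummyResponse:
    """Hypothetical stand-in for scrapy's Response, so the pattern runs without a crawl."""
    def __init__(self, urls):
        self.urls = urls


class SubtitleSpider:
    def __init__(self):
        # One list per spider instance; the callback appends to it as pages arrive.
        self.downloadlist = []

    def parse2(self, response):
        # In the real spider this loop would iterate over
        # LinkExtractor(allow=("download")).extract_links(response).
        for url in response.urls:
            if "download" in url:
                self.downloadlist.append(url)


spider = SubtitleSpider()
spider.parse2(DummyResponse([
    "http://some-domain.com/download/a.srt",
    "http://some-domain.com/about",
]))
print(spider.downloadlist)
```

Because downloadlist is a plain list attribute, indexing such as spider.downloadlist[0] works once the crawl has finished, which is what the subtitles() instance in the question could not offer.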