2016-03-09 117 views

How can I extract image URLs from a website using Scrapy in Python? Please help me. This is my code:

from scrapy.spiders import CrawlSpider, Rule
# scrapy.contrib.linkextractors is the deprecated import path
from scrapy.linkextractors import LinkExtractor
from scrapy.item import Item, Field


class MyItem(Item):
    url = Field()


class someSpider(CrawlSpider):
    name = 'crawltest'
    allowed_domains = ['bambeeq.com']
    start_urls = ['http://www.bambeeq.com/']
    rules = (Rule(LinkExtractor(allow=()), callback='parse_obj', follow=True),)

    def parse_obj(self, response):
        item = MyItem()
        item['url'] = []
        # LinkExtractor yields hyperlink targets; deny= drops URLs matching the site's own domain
        for link in LinkExtractor(allow=(), deny=self.allowed_domains).extract_links(response):
            item['url'].append(link.url)
            # item['image'].append(link.img)
        return item

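Questions seeking debugging help (**"why isn't this code working?"**) must include the desired behavior, *a specific problem or error* and *the shortest code necessary* to reproduce it **in the question itself**. Questions without **a clear problem statement** are not useful to other readers. See: [How to create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve). – MattDMo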

Answers


You are extracting links ('a' elements), not images ('img' elements). Try this:

# iterate over the list of images 
for image in response.xpath('//img/@src').extract(): 
    # make each one into a full URL and add to item[] 
    item['url'].append(response.urljoin(image)) 

yield item