Can Scrapy scrape the contents of an iframe on its own?

I tried copying and pasting the XPath of the element from the website, but it returned no results.

Can Scrapy scrape data inside an iframe? If so, how? If not, what else should I do? Thanks!

rules = (
    Rule(SgmlLinkExtractor(deny=path_deny_base, restrict_xpaths=('*')),
         callback="parse", follow=True),
)


def parse(self, response):
    # url must be the iframe's own URL, not the URL of the page that embeds it
    yield Request(url, callback=self.parse_iframe)

def parse_iframe(self, response):
    # scrape the content of the iframe
    hxs = HtmlXPathSelector(response)
    titles = hxs.select('//div[2]/h1')
    linker = hxs.select('//div[2]/div[10]/a[1]')
    loc_Con = hxs.select('//div[2]/div[1]/div[2]/span/span/span[1]')
    loc_Reg = hxs.select('//div[2]/div[1]/div[2]/span/span/span[2]')
    loc_Loc = hxs.select('//div[2]/div[1]/div[2]/span/span/span[3]')
    items = []
    for title in titles:  # renamed so the loop variable no longer shadows the list
        item = CraigslistSampleItem()
        # item["job_id"] = id.select('text()').extract()[0].strip()
        item["title"] = map(unicode.strip, title.select('text()').extract())
        item["link"] = linker.select('@href').extract()
        item["info"] = response.url
        temp1 = loc_Con.select('text()').extract()
        temp2 = loc_Reg.select('text()').extract()
        temp3 = loc_Loc.select('text()').extract()
        # fall back to an empty string when a location part is missing
        temp1 = temp1[0] if temp1 else ""
        temp2 = temp2[0] if temp2 else ""
        temp3 = temp3[0] if temp3 else ""
        item["code"] = "{0}-{1}-{2}".format(temp1, temp2, temp3)
        items.append(item)
    return items


2014-06-19 chano

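Scrapy does not fetch the content of an iframe when it downloads the parent page, so XPath run against that page finds nothing inside the iframe. Instead, request the iframe URL yourself, for example: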

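from scrapy.http import Request

def parse(self, response):
    # url is the iframe's URL (its src attribute), not the parent page's URL
    yield Request(url, callback=self.parse_iframe)

def parse_iframe(self, response):
    # your code to scrape the content from the iframe
    pass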

Here, url should be the iframe URL (for example https://career-meridia....../jobs).

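Edit:

Replace url with the part underlined in red.

Edit 2:

Make sure you pass every parameter the iframe URL requires; otherwise you will get nothing back. If the iframe is loaded with a POST request, you must send all of the POST parameters.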


2014-06-19 08:36:38

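If I want to get "Environmental Services Assistant", will the XPath work as usual? – chano

It certainly will, once you have the response for this child iframe. –

It is hard to read your code here; could you edit your question to include it? Thanks –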

This is how I do it: first get the iframe URLs, then issue a new request for each one and parse the result.

urls = response.css('iframe::attr(src)').extract()
for url in urls:
    yield scrapy.Request(url....)


2017-07-21 11:08:34 chairam
