2015-01-06 25 views
How to get the XPath from different URLs returned by the start_requests method

Here is my Scrapy code:

import scrapy
from scrapy.selector import Selector
import MySQLdb


class AmazonSpider(scrapy.Spider):
    name = "amazon"
    allowed_domains = ["amazon.com"]
    start_urls = []

    def parse(self, response):
        print(self.start_urls)

    def start_requests(self):
        # Read the URLs to crawl from the products table.
        conn = MySQLdb.connect(user='root', passwd='root', db='mydb', host='localhost')
        cursor = conn.cursor()
        cursor.execute('SELECT url FROM products;')
        rows = cursor.fetchall()
        for row in rows:
            # One Request per stored URL; dont_filter=True matches what
            # make_requests_from_url(row[0]) did.
            yield scrapy.Request(row[0], dont_filter=True)
        conn.close()

How can I get the XPath of the URLs returned by the start_requests function?

Note: the URLs belong to different domains; they are not all the same.

Answers

Because it uses yield, start_requests is a generator function. Use a for loop to get each result it yields.

Like this:

...
my_spider = AmazonSpider()
# start_requests() returns a generator; iterate over it to get each Request.
for request in my_spider.start_requests():
    print('we get URL: %s' % request.url)
...
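
To see why the loop is needed, here is a minimal sketch that uses only the Python standard library. The names `fetch_urls`, `rows`, and `log` are hypothetical stand-ins for `start_requests`, the database result, and the connection; they are not part of Scrapy. It shows that calling a generator function runs none of its body, and that the code after the loop (where `conn.close()` sits in the question) only runs once the generator has been fully consumed.

```python
def fetch_urls(rows, log):
    """Yield one URL per row; `log` records when each step actually runs."""
    log.append("opened")   # stands in for MySQLdb.connect(...)
    for row in rows:
        yield row[0]
    log.append("closed")   # stands in for conn.close()


log = []
rows = [("http://example.com/a",), ("http://example.org/b",)]

gen = fetch_urls(rows, log)   # creates the generator; the body has not run yet
log_before = list(log)        # snapshot: still empty at this point

urls = list(gen)              # iterating drives the function to completion

print(log_before)
print(urls)
print(log)
```

If the generator is abandoned halfway, the final step is never reached, which is one reason some people prefer to close the connection right after `fetchall()` instead.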

Thanks mate, it works! – user2728494
