Trouble going to the next page

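When I run my function to scrape certain links from a site, it gets the links from the first page, but instead of moving on to the next page to do the same thing, it shows the following error.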

Scraper:

import requests 
from lxml import html 

def Startpoint(mpage): 
    page=4 
    while page<=mpage: 
     address = "https://www.katalystbusiness.co.nz/business-profiles/bindex"+str(page)+".html" 
     tail="https://www.katalystbusiness.co.nz/business-profiles/" 
     page = requests.get(address) 
     tree = html.fromstring(page.text) 
     titles = tree.xpath('//p/a/@href') 
     for title in titles: 
      if "bindex" not in title: 
       if "cdn-cgi" not in title: 
        print(tail + title) 


    page+=1 

Startpoint(5)

Error message:

Traceback (most recent call last): 
    File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\New.py", line 19, in <module> 
    Startpoint(5) 
    File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\New.py", line 6, in Startpoint 
    while page<=mpage: 
TypeError: unorderable types: Response() <= int()

Source

2017-04-21 SIM

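You assigned the result of requests.get(address) to page. After that, Python cannot compare the requests.Response object with an int. Just give it a name other than page, such as response. The page+=1 line also has an indentation error: it needs to be inside the while loop.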

import requests
from lxml import html

def Startpoint(mpage):
    page = 4
    while page <= mpage:
        address = "https://www.katalystbusiness.co.nz/business-profiles/bindex" + str(page) + ".html"
        tail = "https://www.katalystbusiness.co.nz/business-profiles/"
        # Store the response under its own name so the page counter stays an int
        response = requests.get(address)
        tree = html.fromstring(response.text)
        titles = tree.xpath('//p/a/@href')
        for title in titles:
            # Skip pagination links and cdn-cgi links
            if "bindex" not in title and "cdn-cgi" not in title:
                print(tail + title)
        # Increment inside the while loop so the next page is requested
        page += 1

Startpoint(5)

Source

2017-04-21 17:27:23 bernie

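Thanks, sir bernie, for your sharp response. It worked like magic. I will accept your answer when the site allows me to. Thanks again. – SIM

You're very welcome! Happy coding to you. – bernie

I made a legendary mistake, and my head was spinning when I noticed it. That's why coding shouldn't be done on the fly. Thanks again, sir bernie. – SIM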

You are overwriting the page variable on this line: page = requests.get(address)

So when it gets back to while page<=mpage: on the second iteration, it tries to compare page (now a response object) to mpage (an integer).
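The failure can be reproduced without any network access. The following is only a minimal sketch: FakeResponse is a hypothetical stand-in for requests.Response (like the real class, it defines no ordering against int), and comparison_fails is an illustrative name, not part of the original code.

```python
class FakeResponse:
    """Hypothetical stand-in for requests.Response; it defines no ordering."""


def comparison_fails(mpage):
    page = 4                  # page starts out as an int
    assert page <= mpage      # first loop check: int <= int works
    page = FakeResponse()     # rebinding, like page = requests.get(address)
    try:
        page <= mpage         # second loop check: object <= int
    except TypeError:
        return True           # Python 3 refuses to order these two types
    return False


print(comparison_fails(5))
```

The exact wording of the TypeError message differs between Python versions, but the cause is the same: the name page no longer refers to an integer.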

Also, page+=1 should be inside the while loop.
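One way to sidestep both problems is to let range drive the page number, so there is no counter to overwrite or to increment at the wrong indentation level. This is a sketch under assumptions, not the asker's code: BASE, page_urls and keep_link are hypothetical names, and the fetching step is left out so the example runs offline.

```python
BASE = "https://www.katalystbusiness.co.nz/business-profiles/"


def page_urls(first_page, last_page):
    """Build the index-page URLs for first_page..last_page inclusive."""
    return [BASE + "bindex" + str(n) + ".html"
            for n in range(first_page, last_page + 1)]


def keep_link(href):
    """Same filter as the original loop: drop pagination and cdn-cgi links."""
    return "bindex" not in href and "cdn-cgi" not in href


for url in page_urls(4, 5):
    print(url)
```

Each URL produced this way could then be passed to requests.get and parsed with lxml exactly as in the question, with the hrefs filtered through keep_link before printing.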

Source

2017-04-21 17:27:39 Stacktrace
