Python: mechanize randomly stops program indefinitely

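I'm writing some code that uses mechanize to access a website, but often, when I run my Python code, it stops indefinitely on the line where I call mechanize.ParseResponse. It doesn't give me an error; I have to interrupt it with CTRL+C. I also believe I'm passing the correct arguments to the method, so I'm confused about why my program would suddenly stop running. Any ideas?

As additional background, I'm running on a Mac.

Any help would be greatly appreciated!

EDIT: Here is my code

Note: I run python bikes.py, and it occasionally hangs on the following line: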

form = mechanize.ParseResponse(response, backwards_compat=False)

Sometimes it also stops at:

text = response.read()

# bikes.py 
import re 
import webbrowser 
import mechanize 
import urllib 

brands = ["cannondale", "felt", "fuji", "giant", "specialized", "trek"] 
keywords = ["52", "53", "54", "shimano", "sora", "tiagra", "105", "ultegra",
            "road", "allez", "defy"]
avoid = ["bmx", "mountain", "kids", "fixie", "jacket", "clothing", "fixed gear",
         "hybrid", "mtb"]

def openLink(text):
    text = text.lower()
    open = False
    for word in avoid:
        if word in text:
            return False
    for word in keywords:
        if word in text:
            open = True

    return open

def scourPage(text, fileRead, fileWrite):
    links = re.findall(r'class="row".+?href="(.+?)"', text)

    for link in links:
        if "http:" in link:
            url = link
        else:
            url = homePage + link

        page = urllib.urlopen(url)
        pageText = page.read()
        title = re.search(r'"postingtitle">.{0,10}<span.+?>[\s\'"]+(.+?)[\s\'"]{0,10}</h2>',
                          pageText, re.DOTALL)
        body = re.search(r'"postingbody">(.+?)</section>', pageText, re.DOTALL)
        openBody = False
        openTitle = False

        if body != None:
            body = body.group(1)
            openBody = openLink(body)

        if title != None:
            title = title.group(1)
            openTitle = openLink(title)

        if (openTitle and openBody) and (url not in fileRead) and (title not in fileRead):
            fileWrite.write(title + "\n" + url + "\n")

        fileWrite.close()

homePage = "http://sfbay.craigslist.org" 
request = mechanize.Request(homePage) 
response = mechanize.urlopen(request) 
forms = mechanize.ParseResponse(response, backwards_compat=False) 
form = forms[0] 

request = form.click() 
response = mechanize.urlopen(request) 
emptySearch = response.geturl() 
request = mechanize.Request(emptySearch) 
response = mechanize.urlopen(request) 
forms = mechanize.ParseResponse(response, backwards_compat=False) 
form = forms[0] 

form["catAbb"] = ["bik"] 
form["maxAsk"] = "500" 
form.find_control("hasPic").items[0].selected = True 

for brand in brands: 
    form["query"] = brand 

    request = form.click() 
    response = mechanize.urlopen(request) 
    text = response.read() 

    fileR = open('bikes.txt', 'r').read() 
    fileA = open('bikes.txt', 'a') 

    scourPage(text, fileR, fileA) 

    fileA.close() 

    next = re.findall(r'class="nplink next".{0,50}<a href=\'(.+?)\'>', text, re.DOTALL) 

    while len(next) != 0:
        text = urllib.urlopen(next[0]).read()

        fileR = open('bikes.txt', 'r').read()
        fileA = open('bikes.txt', 'a')

        scourPage(text, fileR, fileA)

        fileA.close()

        next = re.findall(r'class="nplink next".{0,50}<a href=\'(.+?)\'>', text, re.DOTALL)

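This code combs through Craigslist ads and tries to weed out the ones I don't want. In this case, I'm trying to find a road bike while avoiding mountain bikes and other items.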

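UPDATE:

After waiting for quite a long time, I finally interrupted the run from the keyboard again, and it had stopped on the form = mechanize.ParseResponse(response, backwards_compat=False) line. I tried running it once more and got this error: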

Traceback (most recent call last): 
    File "bikes.py", line 97, in <module> 
    forms = mechanize.ParseResponse(response, backwards_compat=False) 
    File "build/bdist.macosx-10.8-intel/egg/mechanize/_form.py", line 945, in ParseResponse 
    File "build/bdist.macosx-10.8-intel/egg/mechanize/_form.py", line 981, in _ParseFileEx 
    File "build/bdist.macosx-10.8-intel/egg/mechanize/_form.py", line 758, in feed 
    File "build/bdist.macosx-10.8-intel/egg/mechanize/_sgmllib_copy.py", line 110, in feed 
    File "build/bdist.macosx-10.8-intel/egg/mechanize/_sgmllib_copy.py", line 192, in goahead 
    File "build/bdist.macosx-10.8-intel/egg/mechanize/_form.py", line 654, in handle_charref 
    File "build/bdist.macosx-10.8-intel/egg/mechanize/_form.py", line 149, in unescape_charref 
ValueError: unichr() arg not in range(0x10000) (narrow Python build)
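
One possible workaround for the ValueError above, offered as a sketch under assumptions rather than a confirmed fix: on a narrow Python 2 build, unichr() rejects code points above 0xFFFF, and the traceback ends in unescape_charref, so a numeric character reference such as &#128690; somewhere in the page may be what triggers it. The helper below, strip_astral_charrefs, is a hypothetical name (it is not part of mechanize); it replaces such references with a placeholder so that the cleaned text can be parsed instead of the raw response.

```python
import re

# Matches decimal (&#128690;) and hexadecimal (&#x1F6B2;) character references.
_CHARREF = re.compile(r'&#([xX][0-9a-fA-F]+|[0-9]+);')


def strip_astral_charrefs(html, placeholder="?"):
    """Replace numeric character references above U+FFFF with a placeholder.

    Hypothetical helper: references at or below U+FFFF are left untouched.
    """
    def _replace(match):
        ref = match.group(1)
        if ref[0] in "xX":
            code_point = int(ref[1:], 16)
        else:
            code_point = int(ref)
        if code_point > 0xFFFF:
            return placeholder
        return match.group(0)

    return _CHARREF.sub(_replace, html)
```

The idea would be to call response.read() first, pass the text through this function, and then hand the result to the form parser through a file-like object. Whether that removes the error depends on the page actually containing such a reference, which the traceback suggests but does not prove.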

Source

2013-07-27 Zhouster

Can you show us the code? – svineet

Added. Hope that helps. :X – Zhouster

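Your while loop could run forever, which would explain the behavior. Have you checked that it doesn't?

Getting a runtime error when you CTRL-C your code doesn't necessarily mean the code is broken.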
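
One way to rule out an endless pagination loop is to cap the number of pages and stop when a "next" link repeats. The sketch below reuses the "nplink next" regex from the question; walk_pages, its fetch argument, and the max_pages limit are hypothetical names introduced here for illustration.

```python
import re

# Same pattern the question uses to find the "next page" link.
NEXT_LINK = re.compile(r'class="nplink next".{0,50}<a href=\'(.+?)\'>', re.DOTALL)


def walk_pages(first_text, fetch, max_pages=50):
    """Yield page texts, following 'next' links.

    Stops when there is no next link, when a link repeats,
    or after max_pages pages. fetch(url) must return the page text.
    """
    seen = set()
    text = first_text
    for _ in range(max_pages):
        yield text
        match = NEXT_LINK.search(text)
        if match is None or match.group(1) in seen:
            return
        seen.add(match.group(1))
        text = fetch(match.group(1))
```

In the script above, fetch could be something like lambda url: urllib.urlopen(url).read(), and the body of the while loop would move into a for loop over walk_pages(text, fetch).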

Source

2013-07-31 20:56:11 djas

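I've put print statements in my for loop, and it keeps stopping on the "text = response.read()" line. I'm fairly sure the while loop is working correctly, because if it were infinite it would print a huge number of statements, and that isn't happening. I think it has something to do with urllib, but that's just a guess. – Zhouster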
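
If the stall really is in the network read, one thing to try is a socket timeout, so that a stuck connection should surface as an exception instead of an indefinite wait. This is a minimal sketch using only the standard library's socket.setdefaulttimeout; read_with_timeout and its open_url argument are hypothetical names, and the 30-second default is an arbitrary assumption.

```python
import socket


def read_with_timeout(open_url, url, timeout=30):
    """Open url with open_url and read the body under a default socket timeout.

    open_url is any callable returning an object with a read() method.
    The previous default timeout is restored afterwards.
    """
    previous = socket.getdefaulttimeout()
    # Applies to sockets created from here on, including the one open_url makes.
    socket.setdefaulttimeout(timeout)
    try:
        return open_url(url).read()
    finally:
        socket.setdefaulttimeout(previous)
```

In the script above, open_url could be urllib.urlopen or mechanize.urlopen. Since ParseResponse reads from the response object, reading the body this way first would also show whether the wait is in the network or in the parsing.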
