Need advice on avoiding stack overflow in Python 3

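Disclaimer: I have absolutely no background in computer science and know nothing about the inner workings of anything that happens behind the scenes. I taught myself to code using everything on the internet.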

Python version:

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit(Intel)] on win32

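In an otherwise ordinary parser, the main idea is to get the URL of the page's full-size image, save it to a file for later download, and then move on to the next image in line. Because of the site's poor web architecture, this is almost mandatory. When I finished the program, I ran into an error during the 976th execution: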

RuntimeError: maximum recursion depth exceeded in comparison

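After some research, I found that the problem is caused by a "stack overflow". However, I currently have no idea how to fix it without causing a significant performance drop. (Although that is not a real concern, since I am only doing this for research.)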

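Which brings me to my question: how can I fix this, and where can I learn more about things like this, such as what a stack overflow is in the first place?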

(The program works fine, apart from the stack overflow stopping it.)

import requests 
from bs4 import BeautifulSoup 

def somesite_parsing(url): 

    connection = requests.get(url) 
    html = connection.text 
    soup = BeautifulSoup(html, "html.parser") 

    # The exception is necessary due to the web architecture. 
    # Images that don't have different versions by size have an img tag. 
    # Returns "http://www.somesite.net/tag_tag_tag.full.jpg" 
    try:
        semi_link = soup.select("html > body > #wrapper > #body > #content > #large > a")
        full_link = semi_link[0].get("href")
        print(full_link)

    except IndexError:
        semi_link = soup.select("html > body > #wrapper > #body > #content > #large > img")
        full_link = semi_link[0].get("src")
        print(full_link)

    # File was created during testing so I switched to appending.
    # Saves the link into the file.
    with open("list_file.txt", "a") as fx:
        fx.write(full_link + "\n")

    # Fetches the next url. 
    # Returns "/id_number" 
    next_link = soup.select("html > body > #wrapper > #body > #menu > .smallthumbs > li > a") 
    next_link = next_link[0].get("href") 
    next_link = "http://www.somesite.net" + next_link 
    print(next_link) 

    print() 
    somesite_parsing(next_link) 


somesite_parsing("http://www.somesite.net/1905220")

2015-08-20 N. Cross

I assume the last call, 'zerochan_parsing', is actually supposed to be 'somesite_parsing'? –

Whoops, didn't notice that XD –

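Every time the function runs, it always calls 'somesite_parsing' again. You need to work out a way to stop calling 'somesite_parsing'. So maybe try checking whether you still get an id_number. If you don't get an id_number, then 'return' from the function before you call 'somesite_parsing' again. – Dan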
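Dan's suggestion, a base case that returns instead of recursing, might look like the sketch below. `FAKE_SITE` and `parse_chain` are hypothetical stand-ins for the real pages and for somesite_parsing, so the example runs without network access; `None` plays the role of "no next id_number found".

```python
# Hypothetical stand-in for the site: each page maps to the next page,
# and None means the page has no "next" link.
FAKE_SITE = {"/1": "/2", "/2": "/3", "/3": None}


def parse_chain(page, visited=None):
    """Recursive walk with a base case; returns the list of visited pages."""
    if visited is None:
        visited = []
    visited.append(page)
    next_page = FAKE_SITE.get(page)
    if next_page is None:
        # Base case: no next link, so return instead of calling ourselves again.
        return visited
    return parse_chain(next_page, visited)


print(parse_chain("/1"))  # ['/1', '/2', '/3']
```

A base case makes the recursion finish, but every page still adds one nested call, so a chain of roughly a thousand pages or more would still hit the same error.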

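A stack overflow occurs when there are too many nested function calls. This mostly happens when a function keeps calling itself endlessly.

In your case, you call somesite_parsing inside itself. This eventually causes the stack overflow.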
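To see where the limit comes from: CPython caps how deeply Python function calls may nest, and `sys.getrecursionlimit()` reports that cap (1000 by default). The sketch below counts how deep an endless self-call gets before the interpreter stops it. On Python 3.4 the error is a plain `RuntimeError`; from Python 3.5 on it is `RecursionError`, a subclass of `RuntimeError`, so catching `RuntimeError` works on both. The name `recurse_forever` is made up for illustration.

```python
import sys

depth = 0


def recurse_forever():
    """Calls itself with no base case, like the original somesite_parsing."""
    global depth
    depth += 1
    recurse_forever()


try:
    recurse_forever()
except RuntimeError:  # RecursionError on Python 3.5+, a RuntimeError subclass
    print("stopped after", depth, "nested calls; limit is", sys.getrecursionlimit())
```

Frames that are already on the stack, plus any calls made inside each step (such as the requests and BeautifulSoup calls in the scraper), also count toward the limit, so the error can appear somewhat before the 1000th call.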

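There are a few ways to avoid this. I would suggest parsing in a loop.

Change somesite_parsing to return the next link instead of calling itself, and then you can do this: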

next_link = "http://www.somesite.net/1905220" 
while next_link: 
    next_link = somesite_parsing(next_link)

This lets you return a falsy value (such as None) from somesite_parsing to stop the loop.
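A minimal sketch of that loop, assuming somesite_parsing is changed to return the next link or None. The real function fetches pages with requests and BeautifulSoup; here a hypothetical `FAKE_NEXT` dictionary of 5000 pages stands in for the site so the pattern runs offline, and `visit_page` is a made-up name. The chain is deliberately longer than the default recursion limit.

```python
# Hypothetical stand-in for the site: page number -> next page number,
# with None marking the last page. 5000 pages is far past the default
# recursion limit of 1000.
PAGE_COUNT = 5000
FAKE_NEXT = {n: n + 1 for n in range(1, PAGE_COUNT)}
FAKE_NEXT[PAGE_COUNT] = None

visited = []


def visit_page(page):
    """Handle one page, then return the next page instead of calling itself."""
    visited.append(page)  # the real code would save the full-size link here
    return FAKE_NEXT[page]  # None (falsy) on the last page stops the loop


next_page = 1
while next_page:
    next_page = visit_page(next_page)

print(len(visited))  # 5000
```

Each call to `visit_page` returns before the next one starts, so the stack never gets deeper than one call, however many pages there are.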

2015-08-20 17:13:03

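That fixed it: I converted everything to a while loop and set url equal to next_link. Could you explain in more detail why this works? It doesn't seem any different. –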

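The best technical answer I can give you is the one I already gave: a stack overflow occurs when there are too many nested function calls. Every time a function calls any other function (including itself) before it has finished, another layer of nesting is added. A more technical answer: the runtime only has room for a certain number of stack frames (google it). A frame is pushed onto the stack when a function is called and popped off when it returns. If you keep adding frames without popping any, you get an overflow (too many frames). –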
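The push/pop behaviour described in that comment can be observed directly. This sketch walks the current frame chain with `inspect.currentframe()` and `f_back` (CPython supports this; other implementations may return None, in which case the count is 0). The helper names `stack_depth`, `recursive_steps`, `one_step`, and `looping_steps` are hypothetical.

```python
import inspect


def stack_depth():
    """Count the frames on the call stack, starting from this function's frame."""
    depth = 0
    frame = inspect.currentframe()  # may be None outside CPython
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth


def recursive_steps(n, depths):
    depths.append(stack_depth())
    if n > 1:
        recursive_steps(n - 1, depths)  # nested call: one more frame stays on top


def one_step(depths):
    depths.append(stack_depth())  # this frame is popped when one_step returns


def looping_steps(n, depths):
    for _ in range(n):
        one_step(depths)


rec, loop = [], []
recursive_steps(5, rec)
looping_steps(5, loop)
print([d - rec[0] for d in rec])    # [0, 1, 2, 3, 4]
print([d - loop[0] for d in loop])  # [0, 0, 0, 0, 0]
```

The recursive version gets one frame deeper per step, while the loop version returns to the same depth every time, which is why the while loop never overflows.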

What you were doing originally would be fine in a language with tail call optimization (google it): some languages handle recursion better than others. –
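Python does not perform tail call optimization, so even a call like `somesite_parsing(next_link)` on the last line of the function keeps the caller's frame alive. One common workaround is a "trampoline"; this is a swapped-in technique, not something suggested in this thread. Each step returns a description of the next call instead of making it, and a driver loop performs the calls. `Bounce`, `trampoline`, and `count_down` are hypothetical names.

```python
class Bounce:
    """Marks 'call this function next' instead of making the call directly."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args


def trampoline(func, *args):
    """Run bounces in a loop until a step returns a plain (non-Bounce) value."""
    result = func(*args)
    while isinstance(result, Bounce):
        result = result.func(*result.args)
    return result


def count_down(n):
    """Written like tail recursion, but returns a Bounce instead of recursing."""
    if n == 0:
        return "done"
    return Bounce(count_down, n - 1)


print(trampoline(count_down, 100000))  # done
```

In practice a plain while loop, as in the accepted answer, is simpler and more idiomatic in Python; the trampoline only shows how tail-recursive code can be kept without growing the stack.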

A 'while' loop is indeed what you need.

Here is how I would do it, though I haven't run the code.

import requests
import json

start_url = "http://your_start_url"


def parse_your_response_for_next_url(html):
    """Placeholder: write this for your site. It should return the next
    page's URL, or None when there is no next page."""
    raise NotImplementedError


def save_data(data):
    """Or however you want to save your data.
    I like .jsonl, see http://jsonlines.org/"""
    data = json.dumps(data)
    with open("data_file.jsonl", "a") as fx:
        fx.write(data + "\n")


def get_url(url):
    "This returns something similar to an 'option type'."
    r = requests.get(url)
    return {"success": r.ok,
            "next_url": parse_your_response_for_next_url(r.text),
            "page": r.text,
            "url": url}

##################################


response = get_url(start_url)

while response["success"]:
    save_data(response)
    if not response["next_url"]:
        break  # no next page, so stop instead of requesting None
    response = get_url(response["next_url"])

(I'm using a pseudo "option type" and a jsonl file, but that's just a style decision. See https://en.wikipedia.org/wiki/Option_type and http://jsonlines.org/)

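Also, if you are making enough requests to reach the maximum recursion depth, it might be a good idea to cache the responses with @functools.lru_cache or some disk-backed alternative.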
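A minimal sketch of the `functools.lru_cache` idea. `fetch_page` is a hypothetical function that only builds a string and counts calls instead of doing a real `requests.get`, so you can see a repeated URL being served from the cache. Caching only helps when the same URL is requested more than once, and an in-memory cache is lost when the program exits (hence the suggestion of a disk-backed alternative).

```python
import functools

call_count = 0


@functools.lru_cache(maxsize=None)
def fetch_page(url):
    """Hypothetical fetcher; a real one might return requests.get(url).text."""
    global call_count
    call_count += 1
    return "<html>page for " + url + "</html>"


fetch_page("http://www.somesite.net/1905220")
fetch_page("http://www.somesite.net/1905220")  # served from the cache
print(call_count)                    # 1
print(fetch_page.cache_info().hits)  # 1
```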

2015-08-20 18:28:01 Logan