Cache Access Denied. Authentication Required in requests module

I am trying to build a basic web crawler. My internet connection goes through a proxy, so I used the solution given here. But I still get an error when I run the code. My code is:

#!/usr/bin/python3.4
import requests
from bs4 import BeautifulSoup

# Route both HTTP and HTTPS traffic through the authenticating proxy.
proxies = {
    "http": r"http://usr:[email protected]:3128",
    "https": r"http://usr:[email protected]:3128",
}

url = input("Ask user for something")

def santabanta(max_pages, url):
    page = 1
    while page <= max_pages:
        source_code = requests.get(url, proxies=proxies)
        plain_text = source_code.text
        print(plain_text)
        soup = BeautifulSoup(plain_text, "lxml")
        # Print the target of every link on the page.
        for link in soup.find_all('a'):
            href = link.get('href')
            print(href)
        page = page + 1

santabanta(1, url)

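However, when I run it in a terminal on Ubuntu 14.04, I get the following error:

The following error was encountered while trying to retrieve the URL: http://www.santabanta.com/wallpapers/gauhar-khan/

Cache Access Denied.

Sorry, you are not currently allowed to request http://www.santabanta.com/wallpapers/gauhar-khan/? from this cache until you have authenticated yourself.

The URL I entered is: http://www.santabanta.com/wallpapers/gauhar-khan/

Please help me.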

2016-02-13 Kevin Pandya

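Open the URL.
Press F12 (Chrome users).
Go to the "Network" tab in the panel that opens.
Press F5 to reload the page so that Chrome records all the data received from the server.
Open any of the received files and scroll down to "Request Headers".
Pass all of those headers to requests.get().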

[Here is an image to help you][1]

[1]: http://i.stack.imgur.com/zUEBE.png

Set the headers as follows:

headers = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Proxy-Authorization': 'Basic ZWRjZ3Vlc3Q6ZWRjZ3Vlc3Q=',
    'If-Modified-Since': 'Fri, 13 Nov 2015 17:47:23 GMT',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'
}
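
The Proxy-Authorization value above is specific to one proxy account, so copying it verbatim is unlikely to work elsewhere. With HTTP Basic authentication the value is the word "Basic" followed by the base64 encoding of "username:password". A minimal sketch of building it yourself is below; the helper names `basic_proxy_auth` and `build_headers` and the credentials "usr" / "pwd" are placeholders I made up for illustration, not anything from the original post.

```python
import base64


def basic_proxy_auth(username, password):
    """Build a Basic auth value: 'Basic ' + base64('username:password')."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_headers(username, password):
    """Return a small header dict carrying the proxy credentials.

    Only two headers are set here; add the others copied from the
    browser's "Request Headers" panel if your proxy or site needs them.
    """
    return {
        "Accept": "*/*",
        "Proxy-Authorization": basic_proxy_auth(username, password),
    }


# Placeholder credentials; substitute your own proxy account.
headers = build_headers("usr", "pwd")
```

The resulting dict can then be passed as `requests.get(url, headers=headers, proxies=proxies)`. As far as I know this only reaches the proxy for plain http:// URLs; for https:// URLs the request is tunnelled, so putting the credentials in the proxy URL itself is probably the more reliable route.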

2016-02-20 05:42:49

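There is another way to solve this problem.
You can have your Python script pick up the proxy from environment variables.

Open a terminal (CTRL + ALT + T) and define the proxy: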

export http_proxy="http://usr:[email protected]:port"
export https_proxy="https://usr:[email protected]:port"

Then remove the proxy definition from the code.
Here is the modified code:

#!/usr/bin/python3.4
import requests
from bs4 import BeautifulSoup

url = input("Ask user for something")

def santabanta(max_pages, url):
    page = 1
    while page <= max_pages:
        # No proxies argument: requests reads http_proxy / https_proxy
        # from the environment.
        source_code = requests.get(url)
        plain_text = source_code.text
        print(plain_text)
        soup = BeautifulSoup(plain_text, "lxml")
        for link in soup.find_all('a'):
            href = link.get('href')
            print(href)
        page = page + 1

santabanta(1, url)
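
If the script still fails, it may be worth checking that the exported variables are actually visible to the Python process, since `export` only affects the shell it was run in and that shell's children. The sketch below uses the standard library's `urllib.request.getproxies()` as a stand-in check; it is not the lookup that requests performs internally, and the helper names `proxies_from_environment` and `describe` are my own. It reports only whether each variable is set, so that credentials are not echoed to the screen.

```python
import os
import urllib.request


def proxies_from_environment():
    """Return the scheme -> proxy URL mapping that Python detects."""
    return urllib.request.getproxies()


def describe(proxies):
    """Summarise which proxy schemes are configured, hiding the URLs."""
    lines = []
    for scheme in ("http", "https"):
        state = "set" if proxies.get(scheme) else "NOT set"
        lines.append("{}: {}".format(scheme, state))
    return lines


for line in describe(proxies_from_environment()):
    print(line)
```

Run it from the same terminal in which the `export` commands were entered. If either line says "NOT set", the crawler started from that terminal will not see the proxy either.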

2016-08-22 12:54:02 alphaguy
