2011-08-09

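Why can't a Python script download a web page through a proxy?

I'm new to Python and trying my luck with sockets. I wrote a simple HTTP client, but to my surprise it cannot fetch a page that Firefox can fetch, even though both send the same headers.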

import socket 
clientsocket= socket.socket(socket.AF_INET, socket.SOCK_STREAM) 
clientsocket.connect(("213.229.83.205",80))#connect to proxy at given address 
print "connected to 213.229.83.205" 
sdata= """GET http://google.co.ug/ HTTP/1.1 
Host: google.co.ug 
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20100101 Firefox/6.0 
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 
Accept-Language: en-us,en;q=0.5 
Accept-Encoding: gzip, deflate 
Proxy-Connection: keep-alive 
Cookie: cookie <-- Real cookie deleted 

""" 
print "sending request" 
clientsocket.send(sdata); 
rdata=clientsocket.recv(10240) 
if not rdata: print "no data found" 
else: 
    print "receiving data !" 
    myfile=open("c:/users/markdenis/desktop/google.html","w") 
    myfile.write(str(rdata)) 
    myfile.close() 
    print "data written to file on desktop" 
clientsocket.close() 
raw_input()#system(pause) 

When I run it, it prints:

connected to 213.229.83.205 
sending request 
no data found 
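There is a glype proxy running at the address above. –

Are you sure the line breaks between your lines and after the headers are '\r\n'? Some servers require it (most, in my experience). – Skurmedel

May I know the goal of your code? Is there any special reason for not using urllib2? – Kracekumar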

Answer

5

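HTTP requires \r\n at the end of each header line and a blank line after the last header. You were not explicit about line endings in the sdata buffer, so its lines end with a bare \n.

Tested on Windows, Linux and OS X, to be sure: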

>>> x = """a
... b
... c"""
>>> x
'a\nb\nc'

whereas you need:

>>> x = "a\r\nb\r\nc\r\n"
>>> x
'a\r\nb\r\nc\r\n'

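Add the \r\n sequences and give it a shot. Doing this directly in the triple-quoted buffer will leave you with an extra set of \n, so split it up: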

sdata = "GET http://google.co.ug/ HTTP/1.1\r\n" 
sdata += "Host: google.co.ug\r\n" 
sdata += "User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20100101 Firefox/6.0\r\n" 
sdata += "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n" 
sdata += "Accept-Language: en-us,en;q=0.5\r\n" 
sdata += "Accept-Encoding: gzip, deflate\r\n" 
sdata += "Proxy-Connection: keep-alive\r\n" 
sdata += "\r\n"
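The same request can also be assembled by joining the header lines with \r\n, which avoids repeating the terminator by hand. This is only a minimal sketch, written for Python 3 (the question uses Python 2): the helper names build_request and has_bare_lf are made up for illustration, and only two of the headers from the question are included to keep it short.

```python
CRLF = "\r\n"


def build_request(url, host, extra_headers=()):
    """Return a proxy-style HTTP/1.1 GET request as bytes with CRLF line endings.

    Hypothetical helper for illustration; not part of any library.
    """
    lines = ["GET %s HTTP/1.1" % url, "Host: %s" % host]
    lines.extend("%s: %s" % (name, value) for name, value in extra_headers)
    # Two empty strings at the end make join() emit the CRLF that closes
    # the last header plus the blank line that ends the header section.
    lines.extend(["", ""])
    return CRLF.join(lines).encode("ascii")


def has_bare_lf(data):
    """Return True if the buffer contains an LF that is not part of a CRLF."""
    return b"\n" in data.replace(b"\r\n", b"")


request = build_request(
    "http://google.co.ug/",
    "google.co.ug",
    [("Accept-Encoding", "gzip, deflate"), ("Proxy-Connection", "keep-alive")],
)
print(repr(request))
```

Printing the repr of the buffer before sending it makes the line endings visible, and has_bare_lf gives a quick check that a triple-quoted string has not slipped a bare \n into the request.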