2014-04-25 113 views

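Python script question

I have a question about writing a small tool that returns the headers for any website. I am new to Python, and I would like to know whether there is anything besides encoding that I must account for in my code while developing this tool. A draft of my code is shown below. Any pointers from Python coders?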

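#!/usr/bin/python
# Python 2 script: print the response headers returned for a URL.
import sys, urllib

# Require exactly one argument (the URL); otherwise sys.argv[1] does not exist.
if len(sys.argv) != 2:
    print "Usage: " + sys.argv[0] + " <url>"
    sys.exit(1)

website = urllib.urlopen(sys.argv[1])
if website.code != 200:
    print "Something went wrong here"
    print website.code
    sys.exit(1)  # non-zero exit status signals the failure

print 'Printing the headers'
print '-----------------------------------------'
for header, value in website.headers.items():
    print header + ' : ' + value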

What does this have to do with security? Also, why not use curl? – phoops

Answer


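This seems like a fairly straightforward script (though the question seems better suited to Stack Overflow). A couple of comments. First, curl -I is a useful command-line tool to compare your output against. Second, even if you do not get a 200 status, the response often still contains useful content or headers. For example,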

$ curl -I http://security.stackexchange.com/asdf 
HTTP/1.1 404 Not Found 
Cache-Control: private 
Content-Length: 24068 
Content-Type: text/html; charset=utf-8 
X-Frame-Options: SAMEORIGIN 
Set-Cookie: prov=678b5b9c-0130-4398-9834-673475961dc6; domain=.stackexchange.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly 
Date: Fri, 25 Apr 2014 07:24:00 GMT 
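
As a rough Python 3 sketch of that second point (a port for illustration, not the asker's Python 2 script): urllib.request.urlopen raises HTTPError for 4xx/5xx responses, but the exception object still carries the status code and the headers, so they can be reported instead of discarded. The function name fetch_headers and its method parameter are made up for this example; passing method="HEAD" roughly mimics the request that curl -I sends.

```python
import urllib.error
import urllib.request


def fetch_headers(url, method="GET"):
    """Return (status_code, [(header, value), ...]) for any HTTP status."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.headers.items()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions, yet the status code
        # and the headers are still available on the exception object.
        return err.code, err.headers.items()


def print_headers(url):
    """Print the status and every header, whatever the status is."""
    status, headers = fetch_headers(url)
    print("Status:", status)
    for name, value in headers:
        print(name + " : " + value)
```

Note that this still follows redirects, so a 301 would not show up here.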

Also note that urllib automatically follows redirects. For example, with curl you would see:

$ curl -I http://www.security.stackexchange.com 
HTTP/1.1 301 Moved Permanently 
Content-Length: 157 
Content-Type: text/html; charset=UTF-8 
Location: http://security.stackexchange.com/ 
Date: Fri, 25 Apr 2014 07:26:52 GMT 

whereas your tool will only give you the headers of the final page:

$ python user3567119.py http://www.security.stackexchange.com 
Printing the headers 
----------------------------------------- 
content-length : 68639 
set-cookie : prov=9bf4f3d4-e3ae-4161-8e34-9aaa83f0aa4b; domain=.stackexchange.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly 
expires : Fri, 25 Apr 2014 07:29:32 GMT 
vary : * 
last-modified : Fri, 25 Apr 2014 07:28:32 GMT 
connection : close 
cache-control : public, no-cache="Set-Cookie", max-age=60 
date : Fri, 25 Apr 2014 07:28:31 GMT 
x-frame-options : SAMEORIGIN 
content-type : text/html; charset=utf-8 
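
If you want to see the redirect itself using only the standard library, one possible approach in Python 3 (again a sketch, not the asker's Python 2 code) is to install a redirect handler that declines to follow redirects. urllib then reports the 3xx response as an HTTPError, from which the status and the Location header can be read. The names NoRedirect and first_response are invented for this example.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that never follows a redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to build a follow-up request,
        # so the 3xx response surfaces as an HTTPError instead.
        return None


def first_response(url):
    """Return (status_code, Location header or None) without redirecting."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")
```

For a URL that redirects, this should return the 3xx status and the redirect target rather than the headers of the final page.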

Third, if you keep playing around with HTTP requests in Python, I strongly recommend using requests. With requests, you can see the 301 if you do:

In [1]: import requests 

In [2]: r=requests.get('http://www.security.stackexchange.com') 

In [3]: r 
Out[3]: <Response [200]> 

In [4]: r.history 
Out[4]: (<Response [301]>,) 
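
The same information can be collected in a script. The sketch below assumes the third-party requests package is installed; redirect_chain and fetch_chain are hypothetical helper names. It relies only on the history, status_code and url attributes of a requests response.

```python
def redirect_chain(response):
    """Return [(status_code, url), ...] for each hop, final response last."""
    # response.history holds the intermediate redirect responses,
    # oldest first; the response itself is the final hop.
    hops = list(response.history) + [response]
    return [(hop.status_code, hop.url) for hop in hops]


def fetch_chain(url):
    """Fetch url with requests and describe every hop that was taken."""
    import requests  # third-party; imported here so the helper above works without it

    return redirect_chain(requests.get(url, timeout=10))
```

If you would rather stop at the first response, requests.get also accepts allow_redirects=False.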

It is also worth trying some HTTP requests over plain old telnet. For example, run telnet security.stackexchange.com 80 and then quickly type:

GET / HTTP/1.1
Host: security.stackexchange.com 

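followed by a blank line. You will then see the actual HTTP response as it goes over the wire (rather than a version reconstructed after urllib has processed the HTTP response):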

HTTP/1.1 200 OK 
Cache-Control: public, no-cache="Set-Cookie", max-age=60 
Content-Type: text/html; charset=utf-8 
Expires: Fri, 25 Apr 2014 07:38:37 GMT 
Last-Modified: Fri, 25 Apr 2014 07:37:37 GMT 
Vary: * 
X-Frame-Options: SAMEORIGIN 
Set-Cookie: prov=a75de1f2-678b-4a9d-bbfd-39e933e60237; domain=.stackexchange.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly 
Date: Fri, 25 Apr 2014 07:37:36 GMT 
Content-Length: 68849 

<!DOCTYPE html>
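
The telnet session can be approximated in Python 3 with a plain socket, which avoids having to type quickly. This is only a sketch: build_request and raw_response_head are made-up names, and the Connection: close header is an addition that was not in the telnet example, included so the server closes the connection after replying.

```python
import socket


def build_request(host, path="/"):
    """Build the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Connection: close",  # assumption: not part of the telnet example
        "",  # the blank line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def raw_response_head(host, port=80, path="/"):
    """Return the status line and headers exactly as the server sent them."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        data = b""
        # Keep reading until the blank line that separates headers from body.
        while b"\r\n\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    head = data.split(b"\r\n\r\n", 1)[0]
    return head.decode("iso-8859-1")
```

Printing raw_response_head("security.stackexchange.com") would show the status line and headers with their original capitalization and order, unlike the lower-cased names in the urllib output.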