HTTP banner grabbing in Python

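I'm interested in writing an HTTP banner grabber, but when I connect to a server on port 80 and send something (e.g. "HEAD / HTTP/1.1"), recv doesn't return anything, unlike when I do the same thing with, say, netcat.

How would I go about this?

Thanks!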

2010-06-19 Kaep

Try using the urllib2 module.

>>> import urllib2
>>> data = urllib2.urlopen('http://www.example.com').read()
>>> print data 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> 
<HTML> 
<HEAD> 
    <META http-equiv="Content-Type" content="text/html; charset=utf-8"> 
    <TITLE>Example Web Page</TITLE> 
</HEAD> 
<body> 
<p>You have reached this web page by typing &quot;example.com&quot;, 
&quot;example.net&quot;, 
    or &quot;example.org&quot; into your web browser.</p> 
<p>These domain names are reserved for use in documentation and are not available 
    for registration. See <a href="http://www.rfc-editor.org/rfc/rfc2606.txt">RFC 
    2606</a>, Section 3.</p> 
</BODY> 
</HTML> 

>>>

If you only ask for an example, you may miss the finer points. To look at the content-type header:

>>> stream = urllib2.urlopen('http://www.example.com') 
>>> stream.headers['content-type'] 
'text/html; charset=UTF-8' 
>>> data = stream.read() 
>>> print data[:100] 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> 
<HTML> 
<HEAD> 
    <META http-equiv= 
>>>


2010-06-19 16:37:30 gimel

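How would I go about parsing the response? Say I want my script to recognize the content type and print it out? – Kaep 2010-06-19 16:54:21

See the content-type example (added). Actually, you should take a look at BeautifulSoup - http://stackoverflow.com/questions/tagged/beautifulsoup – gimel 2010-06-19 17:18:32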

Did you send "\r\n\r\n" to mark the end of the request? If you didn't, the server is still waiting for the rest of the request.
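A minimal sketch of what this could look like with a raw socket, assuming Python 3 and a plain HTTP server on port 80. The helper names build_head_request and grab_banner are hypothetical, not part of any library. The Host header is included because HTTP/1.1 requires it, and Connection: close is an added assumption so that the server closes the socket once it has replied.

```python
import socket


def build_head_request(host):
    """Build a HEAD request; the trailing blank line marks its end."""
    lines = [
        "HEAD / HTTP/1.1",
        "Host: " + host,        # required by HTTP/1.1
        "Connection: close",    # ask the server to close after replying
    ]
    # Each line ends with CRLF, and one extra CRLF terminates the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def grab_banner(host, port=80, timeout=5.0):
    """Send the HEAD request and return the raw response as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_head_request(host))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:        # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")


# Example call (needs network access):
# print(grab_banner("www.example.com"))
```

Without the final "\r\n\r\n" the server keeps waiting for more header lines, which is why recv appears to hang.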


2010-06-19 16:32:04

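I'll try it right away. Hold on. – Kaep 2010-06-19 16:41:00

Thanks, it works! – Kaep 2010-06-19 16:42:47

How would I go about parsing the response? Say I want my script to recognize the content type and print it out? – Kaep 2010-06-19 16:44:54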
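One way to pull the content type out of a raw response is to split the header block on CRLF and then on the first colon of each line. This is only a sketch under stated assumptions: parse_response_head is a hypothetical helper, the sample response text is made up for illustration, and repeated headers simply overwrite each other here.

```python
def parse_response_head(raw):
    """Split a raw HTTP response into (status_line, headers dict)."""
    head = raw.split("\r\n\r\n", 1)[0]      # ignore any body after the blank line
    lines = head.split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:                             # skip lines that have no colon
            # Header names are case-insensitive, so store them lowercased.
            headers[name.strip().lower()] = value.strip()
    return status_line, headers


# Hypothetical response text, for illustration only.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Server: ExampleServer/1.0\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "\r\n"
)

status, headers = parse_response_head(sample)
print(status)                        # HTTP/1.1 200 OK
print(headers.get("content-type"))   # text/html; charset=UTF-8
print(headers.get("server"))         # ExampleServer/1.0
```

The Server header is usually the part people mean by the "banner", though a server is free to omit or alter it.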
