2012-10-03 23 views
1

I have successfully used Python and the urllib2 module to retrieve the HTML of regular web pages. How can I get the HTML from a URL that contains a colon (":")?

However, when I try it on a page whose URL contains a colon, it does not work. This code:

f = urllib2.urlopen("http://http://gulasidorna.eniro.se/hitta:svenska+kyrkan/") 
htmlcode = f.read() 
print htmlcode 

The code above produces this error message:

File "/Users/jonathan/Documents/Dropbox/Python/eniro.py", line 137, in <module> 
    f = urllib2.urlopen("http://http://gulasidorna.eniro.se/hitta:svenska+kyrkan/") 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 394, in open 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 412, in _open 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1199, in http_open 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1140, in do_open 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 693, in __init__ 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 718, in _set_hostport 
httplib.InvalidURL: nonnumeric port: '' 
+5

Maybe the address is wrong. Take another look and see if you can spot it: 'http://http://gulasidorna.eniro.se/hitta:svenska+kyrkan/' –

+3

Shouldn't the address be 'http://gulasidorna.eniro.se/hitta:svenska+kyrkan/'? – shahkalpesh

+0

I feel so ashamed. Sometimes you make the silliest mistakes. Thanks for pointing it out! – Jonathan

Answers

3

This should work. You had an extra http:// at the start of the URL:

f = urllib2.urlopen("http://gulasidorna.eniro.se/hitta:svenska+kyrkan/") 
htmlcode = f.read() 
print htmlcode
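
To see why the doubled prefix leads to "nonnumeric port: ''" while the colon in the path does not, it may help to look at how the two URLs are split. This is only a rough sketch: the variable names are made up for illustration, and the try/except import is an assumption added so the snippet runs on both Python 2 and Python 3. No network request is made.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2, as used in the question

# Hypothetical variable names; the URLs are the ones from the question and answer.
bad_url = "http://http://gulasidorna.eniro.se/hitta:svenska+kyrkan/"
good_url = "http://gulasidorna.eniro.se/hitta:svenska+kyrkan/"

bad = urlsplit(bad_url)
good = urlsplit(good_url)

# With the doubled prefix, the network location is "http:", that is,
# a host named "http" followed by a colon and an empty port.
print(bad.netloc)   # http:

# With the corrected URL, the colon sits in the path, not in the host part.
print(good.netloc)  # gulasidorna.eniro.se
print(good.path)    # /hitta:svenska+kyrkan/
```

The empty string after the colon in "http:" is what the port check in the traceback appears to be complaining about; the colon in "hitta:svenska+kyrkan" never reaches that check because it is part of the path.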