Can't open Unicode URL with Python

Using Python 2.5.2 on Debian Linux, I am trying to get the content of a Spanish URL that contains a Spanish character ('í'):

import urllib 
url = u'http://mydomain.es/índice.html' 
content = urllib.urlopen(url).read()

I'm getting this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 8: ordinal not in range(128)

I have tried this before passing the URL to urllib:

url = urllib.quote(url)

and this:

url = url.encode('UTF-8')

but it doesn't work.

Can you tell me what I am doing wrong?

Source

2009-12-16 odeceixe

Per the applicable standard, RFC 1738, URLs can only contain ASCII characters. It is explained well here, and I quote:

"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."

As the URLs I've given explain, this probably means you have to replace the "lowercase i with acute accent" with '%ED'.
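
To make the percent-encoding concrete, here is a minimal sketch. It is written for Python 3 (an assumption; the question uses Python 2.5, where `urllib.quote` works on byte strings instead). It shows that '%ED' is the Latin-1 form of 'í', while the UTF-8 form is two escaped bytes. Which one a given server expects is not something the thread establishes.

```python
from urllib.parse import quote

# The path segment from the question.
name = "índice.html"

# quote() encodes to UTF-8 by default before percent-escaping.
utf8_form = quote(name)
# Passing encoding="latin-1" yields the single-byte form mentioned above.
latin1_form = quote(name, encoding="latin-1")

print(utf8_form)    # %C3%ADndice.html
print(latin1_form)  # %EDndice.html
```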

Source

2009-12-16 18:42:02

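I believe this has changed, and domains at least can now contain arbitrary Unicode characters. – Cerin

@Cerin Sort of. [IRIs can contain arbitrary Unicode characters](https://www.w3.org/International/articles/idn-and-iri), but when you convert them to regular URIs they are normalized to ASCII using Punycode (for the domain component) and percent-encoding (for the path component). –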
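
The conversion described in this comment could be sketched roughly as follows in Python 3. The helper name `iri_to_uri` is hypothetical, and the sketch assumes an absolute URL with a host and no user:password part. It uses Python's built-in `idna` codec for the domain and UTF-8 percent-encoding for the rest.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def iri_to_uri(iri):
    """Hypothetical helper: turn an absolute IRI into an ASCII-only URI."""
    parts = urlsplit(iri)
    # Domain component: Punycode via the built-in "idna" codec.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = "%s:%d" % (host, parts.port)
    # Path, query and fragment: UTF-8 percent-encoding.
    # "%" is kept safe so already-escaped input is not escaped twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, host, path, query, fragment))

print(iri_to_uri("http://münchen.de/índice.html"))
# http://xn--mnchen-3ya.de/%C3%ADndice.html
```

An ASCII-only host such as mydomain.es passes through the `idna` codec unchanged, so only the path is altered for the URL in the question.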

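Encoding the URL as UTF-8 should have worked. I wonder whether your source file is encoded correctly and whether the interpreter knows about it. If your Python source file is saved as UTF-8, for example, then you should have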

# coding=UTF-8

as the first or second line.

import urllib 
url = u'http://mydomain.es/índice.html' 
content = urllib.urlopen(url.encode('utf-8')).read()

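works for me.

Edit: Also, note that Unicode text in interactive Python sessions (whether through IDLE or a console) is fraught with encoding-related difficulties. In those cases, you should use Unicode escapes (such as \u00ED).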
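
As a small illustration of the escape advice, the sketch below spells the URL with `\u00ed` so the source contains only ASCII. It is written for Python 3, where string literals are Unicode by default; in Python 2 the literal would need the `u` prefix, as in the question.

```python
# The escape \u00ed denotes the character "í" (U+00ED) using ASCII-only source text.
url = "http://mydomain.es/\u00edndice.html"

# Encoding to UTF-8 turns that one character into the two bytes C3 AD.
encoded = url.encode("utf-8")
print(encoded)  # b'http://mydomain.es/\xc3\xadndice.html'
```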

Source

2009-12-16 18:40:45

This works for me:

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 
# The line above defines the source file encoding and must be line 1 or 2, see: http://www.python.org/dev/peps/pep-0263/ 

import urllib 
url = u'http://example.com/índice.html' 
content = urllib.urlopen(url.encode("UTF-8")).read()

Source

2009-12-16 18:41:14 miku

It works for me. Make sure you are using a reasonably recent version of Python and that your file encoding is correct. Here is my code:

# -*- coding: utf-8 -*- 
import urllib 
url = u'http://mydomain.es/índice.html' 
url = url.encode('utf-8') 
content = urllib.urlopen(url).read()

(mydomain.es does not exist, so the DNS lookup fails, but there is no Unicode problem up to that point.)

Source

2009-12-16 18:43:20

Using Python 3 I get "AttributeError: 'bytes' object has no attribute 'timeout'" when using this code. Is there a Python 3 solution? – byxor

@BrandonIbbotson You should try 'urllib.parse.quote(url)' instead of 'url.encode('utf-8')'. You can read more about it here: https://docs.python.org/dev/library/urllib.parse.html#urllib.parse.quote – Snooze

Thanks @Snooze! – byxor
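
One caveat with the suggestion above, shown as a hedged Python 3 sketch: with its default arguments, `quote` also escapes the ':' after the scheme, so passing `safe=":/"` is one way to keep the URL usable. The `fetch` helper is hypothetical and is not called here because mydomain.es does not resolve. The sketch does not handle non-ASCII host names or query strings.

```python
from urllib.parse import quote
from urllib.request import urlopen

url = "http://mydomain.es/índice.html"

# Default safe="/" leaves slashes alone but escapes the colon after "http".
quoted_all = quote(url)
# Marking ":" and "/" as safe escapes only the non-ASCII character.
quoted_path = quote(url, safe=":/")

print(quoted_all)   # http%3A//mydomain.es/%C3%ADndice.html
print(quoted_path)  # http://mydomain.es/%C3%ADndice.html

def fetch(iri):
    """Hypothetical helper: pass urlopen() an ASCII str, not bytes."""
    return urlopen(quote(iri, safe=":/")).read()
```

The AttributeError in the comment appears to arise because `urlopen` in Python 3 expects a `str` or a `Request` object, so a `bytes` value is treated like a `Request`.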
