1
中国网站与中国符号simbols。 我如何报废中文simbolse?编码错误刮
from urllib.request import urlopen
from urllib.parse import urljoin
from lxml.html import fromstring
URL = 'http://list.suning.com/0-258003-0.html'
ITEM_PATH = '.clearfix .product .border-out .border-in .wrap .res-info .sell-point'
def parse_items():
f = urlopen(URL)
list_html = f.read().decode('utf-8')
list_doc = fromstring(list_html)
for elem in list_doc.cssselect(ITEM_PATH):
a = elem.cssselect('a')[0]
href = a.get('href')
title = a.text
em = elem.cssselect('em')[0]
title2 = em.text
print(href, title, title2)
def main():
parse_items()
if __name__ == '__main__':
main()
错误是这样的。 错误看起来像这样 错误看起来像这样 错误看起来像这样 错误看起来像这样
http://product.suning.com/0000000000/146422477.html Traceback (most recent call last):
File "parser.py", line 27, in <module>
main()
File "parser.py", line 24, in main
parse_items()
File "parser.py", line 20, in parse_items
print(href, title, title2)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
请提供你给了我们在这个问题 – DomTomCat
我有一些问题,UTF-8编码的完整的错误堆栈。添加 –
也许这个答案[http://stackoverflow.com/questions/9942594/unicodeencodeerror-ascii-codec-cant-encode-character-u-xa0-in-position-20](http://stackoverflow.com/questions/9942594/unicodeencodeerror-ascii-codec -cant-encode-character-u-xa0-in-position-20)可以帮助你。 – gzc