Python 3 UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d

I want to build a search engine, and I am following a tutorial from a web page. I want to test parsing HTML:

from bs4 import BeautifulSoup

def parse_html(filename):
    """Extract the Author, Title and Text from a HTML file
    which was produced by pdftotext with the option -htmlmeta."""
    with open(filename) as infile:
        html = BeautifulSoup(infile, "html.parser", from_encoding='utf-8')
        d = {'text': html.pre.text}
        if html.title is not None:
            d['title'] = html.title.text
        for meta in html.findAll('meta'):
            try:
                if meta['name'] in ('Author', 'Title'):
                    d[meta['name'].lower()] = meta['content']
            except KeyError:
                continue
        return d

parse_html("C:\\pdf\\pydf\\data\\muellner2011.html")

and I get this error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 867: character maps to <undefined>

I have seen some solutions on the web that use encode(), but I don't know how to insert the encode() function into my code. Can anyone help me?

Source

2015-06-10 Fakhriyanto

What is the **full** traceback of the exception?

In Python 3, files are opened as text (decoded to Unicode) for you; you don't need to tell BeautifulSoup which codec to decode from.
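To illustrate the point about text mode, here is a minimal sketch using only the standard library. The temporary file and its contents are made up for the demonstration and are not the asker's data. It shows that a file opened in text mode already yields decoded `str` objects, while binary mode yields raw `bytes`.

```python
import os
import tempfile

# Hypothetical sample file: a tiny HTML fragment stored as UTF-8.
with tempfile.NamedTemporaryFile(delete=False, suffix=".html") as tmp:
    tmp.write("<title>caf\u00e9</title>".encode("utf-8"))
    path = tmp.name

try:
    # Text mode: open() decodes the bytes, so read() returns str.
    with open(path, encoding="utf-8") as infile:
        as_text = infile.read()
    # Binary mode: no decoding happens, so read() returns bytes.
    with open(path, "rb") as infile:
        as_bytes = infile.read()
finally:
    os.remove(path)

print(type(as_text).__name__, type(as_bytes).__name__)
```

Because the text-mode handle hands over already decoded text, the decoding step that matters is the one done by open().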

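If decoding the data fails, that is because you did not tell the open() call which codec to use when reading the file; add the correct codec with the encoding argument: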

with open(filename, encoding='utf8') as infile: 
    html = BeautifulSoup(infile, "html.parser")

Otherwise the file is opened with your system's default codec, which depends on the operating system.
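The failure and the fix can be reproduced with a small sketch. It assumes the system default is cp1252, as is common on Western-language Windows installations, and requests that codec explicitly so the example behaves the same on any platform. The sample text and temporary file are hypothetical. The UTF-8 encoding of the right double quotation mark (U+201D) ends in byte 0x9d, which cp1252 leaves undefined.

```python
import locale
import os
import tempfile

# The codec open() typically falls back to when no encoding is given.
print("default codec:", locale.getpreferredencoding(False))

# Hypothetical sample text; U+201D encodes to bytes e2 80 9d in UTF-8.
original = "He said \u201dhello\u201d"

with tempfile.NamedTemporaryFile(delete=False, suffix=".html") as tmp:
    tmp.write(original.encode("utf-8"))
    path = tmp.name

try:
    # Reading UTF-8 data with cp1252 (assumed default) fails on 0x9d.
    try:
        with open(path, encoding="cp1252") as infile:
            infile.read()
        failed = False
    except UnicodeDecodeError as exc:
        failed = True
        bad_byte = exc.object[exc.start]
        print("decode failed on byte", hex(bad_byte))

    # Naming the file's actual encoding makes the read succeed.
    with open(path, encoding="utf-8") as infile:
        text = infile.read()
finally:
    os.remove(path)
```

If the file's real encoding is not UTF-8, the same pattern applies with the correct codec name passed to open().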

Source

2015-06-10 08:36:33
