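I want to use BeautifulSoup to parse HTML, but whenever I open a page that contains an inline script tag, BeautifulSoup encodes the script's content as HTML entities and never decodes it again. How can I make BeautifulSoup both encode and decode the content of the script tag?
Here is the code I'm using: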
from bs4 import BeautifulSoup
if __name__ == '__main__':
    htmlData = '<html> <head> <script type="text/javascript"> console.log("< < not able to write these & also these >> "); </script> </head> <body> <div> start of div </div> </body> </html>'
    soup = BeautifulSoup(htmlData)
    #... using BeautifulSoup ...
    print(soup.prettify())
I want output like this:
<html>
 <head>
  <script type="text/javascript">
   console.log("< < not able to write these & also these >> ");
  </script>
 </head>
 <body>
  <div>
   start of div
  </div>
 </body>
</html>
But I get output like this:
<html>
 <head>
  <script type="text/javascript">
   console.log("&lt; &lt; not able to write these &amp; also these &gt;&gt; ");
  </script>
 </head>
 <body>
  <div>
   start of div
  </div>
 </body>
</html>
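There is a [bug filed](https://bugs.launchpad.net/beautifulsoup/+bug/950459) for this against Beautiful Soup 3, and it looks like the bug is still present in Beautiful Soup 4. You may want to [file](https://bugs.launchpad.net/beautifulsoup/) a bug report.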
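Until the bug is fixed, one possible workaround is sketched below; it is not a confirmed fix for this report. It relies on two things from the Beautiful Soup 4 documentation: the parsed tree stores the script body as raw text, and passing `formatter=None` to `prettify()` tells Beautiful Soup not to modify strings on output. The explicit `"html.parser"` argument and the variable names are my own assumptions, not part of the question's code, and newer Beautiful Soup 4 releases may already leave script content unescaped by default.

```python
from bs4 import BeautifulSoup

html_data = (
    '<html> <head> <script type="text/javascript"> '
    'console.log("< < not able to write these & also these >> "); '
    '</script> </head> <body> <div> start of div </div> </body> </html>'
)

# Assumption: name the parser explicitly so the result does not depend on
# which optional parsers (lxml, html5lib) happen to be installed.
soup = BeautifulSoup(html_data, "html.parser")

# The parsed tree keeps the script body as raw text, without entities.
script_text = soup.script.string

# formatter=None writes every string unmodified, so nothing is turned into
# &lt; / &gt; / &amp; on output.
# Caveat: this applies to the whole document, not only to <script>, so a
# bare "<" or "&" elsewhere in the page is written out unescaped as well.
raw_output = soup.prettify(formatter=None)

print(script_text)
print(raw_output)
```

If only the script body is needed, reading `soup.script.string` avoids serializing the document at all. Use `formatter=None` with care when the page's ordinary text may contain `<` or `&`, since the result can then be invalid HTML.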