Python BS4 print/write error

I'm trying to write some Python 3 code that scrapes data from a website, as you can see from the code:

from bs4 import BeautifulSoup 
import urllib.request 
import sys 
headers={} 
headers['User-Agent']="Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36" 
req=urllib.request.Request('http://www.cjcyw.com/a/chuanbodangan/2015/0930/47853.html',headers=headers) 
resp=urllib.request.urlopen(req) 
xml=BeautifulSoup(resp,'html.parser') 
x=xml.findAll('dd') 
for item in x: 
    item=item.text.encode('utf-8') 
    print(sys.stdout.buffer.write(item))

The result looks like this:

[screenshot: result1]

When I write this data to a txt file:

I debugged it with str(), and this is where the real problem shows up:

[screenshot: buggggggg]


2015-10-21 dongjian xiao

The 4.txt file shows numbers, which is not the result I want. –

Why are you using 'sys.stdout.buffer.write'? Try 'f.write(item)'. –

I don't think '.encode()' is needed here. –
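A likely source of the stray numbers: in the question's loop, sys.stdout.buffer.write(item) already writes the bytes, and it also returns the number of bytes written. Wrapping the call in print() then prints that count as well. The minimal sketch below uses an in-memory io.BytesIO buffer as a stand-in for sys.stdout.buffer (an assumption made so the example is self-contained) and a short hypothetical string in place of the scraped text.

```python
import io

# Stand-in for sys.stdout.buffer: a binary stream that accepts bytes.
buf = io.BytesIO()

item = "长江".encode("utf-8")  # hypothetical text; each CJK character is 3 bytes in UTF-8
count = buf.write(item)         # write() returns the number of bytes written

# This count is what print(sys.stdout.buffer.write(item)) ends up printing.
print(count)

# The text itself was written to the stream separately by write().
print(buf.getvalue().decode("utf-8"))
```

So the loop emits each item's text followed by its byte count. Calling write() without print(), or using a plain print(item.text), avoids the extra numbers.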

You can use .strings here.
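Tag.strings yields every text fragment inside a tag, including text nested in child tags such as links. The minimal sketch below assumes beautifulsoup4 is installed and uses a hypothetical inline HTML snippet in place of the downloaded page, so no network access is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the real page's <dd> elements.
html = "<dl><dd>来源：<a href='#'>长江</a>网</dd><dd>2015-09-30</dd></dl>"
soup = BeautifulSoup(html, "html.parser")

lines = []
for dd in soup.find_all("dd"):
    # .strings yields each text node under <dd>, including the one inside <a>.
    lines.append("".join(dd.strings))

print(lines)
```

Joining with "".join() is equivalent to the s += string loop in the code below. For this kind of markup, it yields the same text as dd.get_text().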

from bs4 import BeautifulSoup 
import urllib.request 
import sys 
headers={} 
headers['User-Agent']="Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36"
req=urllib.request.Request('http://www.cjcyw.com/a/chuanbodangan/2015/0930/47853.html',headers=headers)
resp=urllib.request.urlopen(req) 
xml=BeautifulSoup(resp,'html.parser') 
x=xml.findAll('dd') 

file = open("4.txt", 'a') 
for item in x: 
    s = "" 
    for string in item.strings: 
        s += string
    s += "\n" 
    file.write(s) 
file.close()

This is the complete code.


2015-10-21 08:43:07 uoryon

Doesn't work, but thanks. I think it would be more helpful if you ran the code first. –

I have run the code, and I've pasted my entire code here. Run it and you'll get the correct text file. – uoryon

@dongjianxiao I ran this code on my Mac. – uoryon

First, as I said, don't use sys.stdout.buffer.write here; just use f.write(str(item)) instead.
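A file opened in text mode ('a', 'w') accepts str, not bytes, so the .encode('utf-8') step from the question is not needed. If that step is kept, str(item) produces the bytes' repr (b'...') rather than the readable text. The minimal sketch below uses io.StringIO as a stand-in for the opened file (an assumption for self-containment) and a short hypothetical string.

```python
import io

text = "长江"                   # hypothetical scraped text (already a str)
encoded = text.encode("utf-8")  # what the question's .encode() step produces

f = io.StringIO()  # behaves like a file opened in text mode
try:
    f.write(encoded)  # a text stream rejects bytes
except TypeError as exc:
    print("TypeError:", exc)

# str() on bytes gives their repr, e.g. b'\xe9...', not the Chinese text.
print(str(encoded))

# Writing the str directly works; no .encode() is needed.
f.write(text)
print(f.getvalue())
```

In the original loop, this means writing item.text (a str) directly rather than encoding it first.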

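Second, the default file encoding on the Chinese edition of Microsoft Windows is GBK, while the text appears to be UTF-8. So you need to open the file with UTF-8 encoding, like this: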

open('4.txt', 'a', encoding="utf-8")

Then try running the code again.
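To illustrate why the encoding argument matters, the sketch below writes scraped-style text with an explicit GBK codec, standing in for the Chinese-Windows default (an assumption), and then with UTF-8. The text contains a non-breaking space (\xa0), which is what &nbsp; becomes after HTML parsing. This is one plausible failure mode, offered as an illustration. GBK has no mapping for U+00A0, so the first write raises UnicodeEncodeError, while UTF-8 can encode any str. The sample text and the temporary file location are hypothetical.

```python
import os
import tempfile

text = "长江\xa0传播\n"  # hypothetical text; \xa0 is what &nbsp; becomes after parsing

path = os.path.join(tempfile.mkdtemp(), "4.txt")

# Explicit GBK mimics the platform default on Chinese-language Windows.
try:
    with open(path, "a", encoding="gbk") as f:
        f.write(text)
    gbk_ok = True
except UnicodeEncodeError as exc:
    print("GBK failed:", exc)
    gbk_ok = False

# UTF-8 can encode any str, so this write succeeds on every platform.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read back with the same encoding to confirm the text round-trips.
with open(path, encoding="utf-8") as f:
    content = f.read()
print(repr(content))
```

Passing encoding="utf-8" explicitly makes the script behave the same on Windows and macOS, instead of depending on each platform's default. That would explain why the same code worked on one machine and not the other.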


2015-10-21 09:22:41
