BeautifulSoup: saving scraped results to a text file

I'm trying to scrape data from a table with BeautifulSoup and save it to a file. I wrote this:

import urllib2
from bs4 import BeautifulSoup

url = "http://dofollow.netsons.org/table1.htm"

page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page)

# Skip the first two rows (presumably headers) and print the first three cells of each row
for tr in soup.find_all('tr')[2:]:
    tds = tr.find_all('td')
    print "%s, %s, %s" % (tds[0].text, tds[1].text, tds[2].text)

This works.

Then I tried to write the results to a file, but it doesn't work. :(

logfile = open("log.txt", 'a')    
logfile.write("%s,%s,%s\n" % (tds[0].text, tds[1].text, tds[2].text)) 
logfile.close()

How can I save my results to a text file?


2013-09-23 kingcope

It doesn't work? What did you expect to see? Is 'log.txt' there, but empty? Did you get an error message? If so, please post the full traceback.

Yes, the file is empty! – kingcope

I think you're getting a 'UnicodeEncodeError'. Why didn't you include that error in your question?

BeautifulSoup gives you Unicode data; you need to encode it before writing it to a file.
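In Python 2, writing a unicode object to a file opened with the built-in open() implicitly encodes it with the default ASCII codec, which fails as soon as a cell contains a non-ASCII character; the file has already been created by open(), so it stays empty. The sketch below makes that implicit step explicit and then shows the manual fix: encode to UTF-8 bytes yourself and write them in binary mode. It assumes Python 3, uses invented cell values in place of tds[...].text, and writes to a temporary directory rather than the question's log.txt.

```python
import os
import tempfile

# Invented cell texts standing in for tds[0].text, tds[1].text, tds[2].text
cells = ["Zürich", "café", "naïve"]
line = "%s,%s,%s\n" % tuple(cells)

# Encoding with ASCII (what Python 2 did implicitly on write) fails on non-ASCII text
try:
    line.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print("ASCII failed:", exc.reason)

# Encoding explicitly to UTF-8 works for any of these characters
data = line.encode("utf-8")

path = os.path.join(tempfile.mkdtemp(), "log.txt")
with open(path, "ab") as logfile:  # binary mode: we hand over bytes we encoded ourselves
    logfile.write(data)

# Read the raw bytes back to confirm what ended up on disk
with open(path, "rb") as f:
    written = f.read()
print(written.decode("utf-8"), end="")
```

Encoding by hand works, but every write site has to remember to do it, which is why a file object that encodes for you is usually more convenient.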

It's easier if you use the io library, which can open file objects that handle the encoding transparently:

import io

# io.open() returns a file object that encodes unicode text to UTF-8 as it writes
with io.open('log.txt', 'a', encoding='utf8') as logfile:
    for tr in soup.find_all('tr')[2:]:
        tds = tr.find_all('td')
        logfile.write(u"%s, %s, %s\n" % (tds[0].text, tds[1].text, tds[2].text))

The with statement takes care of closing the file object for you.
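To see the with statement at work, here is a sketch of the same write pattern in Python 3, where io.open is the very same function as the built-in open. The rows are invented stand-ins for what the find_all('td') loop would yield, and the file lives in a temporary directory; once the block ends, logfile.closed is True even though close() is never called.

```python
import io
import os
import tempfile

# Invented rows standing in for (tds[0].text, tds[1].text, tds[2].text)
rows = [("Zürich", "1", "ä"), ("Kraków", "2", "ł")]

path = os.path.join(tempfile.mkdtemp(), "log.txt")

with io.open(path, "a", encoding="utf8") as logfile:
    for a, b, c in rows:
        # The file object accepts text and encodes it to UTF-8 on write
        logfile.write("%s, %s, %s\n" % (a, b, c))

# Leaving the with block closed the file, without an explicit close()
print(logfile.closed)  # → True

# Reopen with the same encoding to read the text back
with io.open(path, encoding="utf8") as f:
    contents = f.read()
print(contents, end="")
```

The file is also closed if an exception is raised inside the block, which a bare open()/close() pair does not guarantee.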

I used UTF-8 as the codec, but you can pick any codec that can handle all the code points used in the page you are scraping.
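One simple way to check whether a candidate codec can handle every code point in the scraped text is to try encoding the text with it. The helper below, codecs_that_fit, is a hypothetical name used only for illustration, and the sample strings are invented.

```python
def codecs_that_fit(text, candidates):
    """Return the codec names from candidates that can encode every character in text."""
    fitting = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one code point is not representable in this codec
        fitting.append(name)
    return fitting


# Invented sample: "é" is in Latin-1, but the euro sign is not (cp1252 has both)
sample = "café, 5 €"
print(codecs_that_fit(sample, ["ascii", "latin-1", "cp1252", "utf-8", "utf-16"]))
# → ['cp1252', 'utf-8', 'utf-16']
```

A narrow codec only fits as long as the page never uses a character outside its range; the UTF encodings can represent any ordinary Unicode text, which is what makes UTF-8 a safe default.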


2013-09-23 20:38:39
