2014-06-14

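I need to remove the [u' prefix and '] suffix that surround the data I care about. This will be put into a database, and from what I can see it does not need these extra characters. How can I remove them? I tried .replace on the variable, but it returns an error. I need to remove the extra characters from the string output of BeautifulSoup.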

import urllib 
import mechanize 
from bs4 import BeautifulSoup 
import requests 
import re 
import MySQLdb 
import time 

db = MySQLdb.connect(
    host=" ", 
    user=" ", 
    passwd=" ", 
    db=" ") 

inc = 0 

# while inc != 3289: 
c = db.cursor() 
c.execute("""SELECT `symbol` FROM `stocks` LIMIT %s,1""", (inc,)) 
result = c.fetchall() 
result = str(result) 

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' 
br = mechanize.Browser() 
br.set_handle_robots(False) 
br.addHeaders = [('User-agent',user_agent)] 

term = result.replace('((','').replace(',)','').replace("'",'') 
url = "http://www.marketwatch.com/investing/stock/"+term 
soup = BeautifulSoup(requests.get(url).text) 
search = soup.find('p', attrs = {'class':'data bgLast'}) 
cur = search.findAll(text = True) 
search2 = soup.find('span', attrs = {'class':'bgChange'}) 
diff = search2.findAll(text = True) 
print term 
print cur 
print diff 

c.execute("""UPDATE stocks SET cur = %s WHERE symbol = %s""", (cur,term)) 
c.execute("""UPDATE stocks SET diff = %s WHERE symbol = %s""", (diff,term)) 
db.commit() 

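Never mind, thank you @jonrsharpe, I found the answer. In the original code, .findAll was returning a result set. All I had to do was convert it to a str, which lets me call strip on it. The revised code is below: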

import urllib 
import mechanize 
from bs4 import BeautifulSoup 
import requests 
import re 
import MySQLdb 
import time 

db = MySQLdb.connect(
    host=" ", 
    user=" ", 
    passwd=" ", 
    db=" ") 

inc = 0 

# while inc != 3289: 
c = db.cursor() 
c.execute("""SELECT `symbol` FROM `stocks` LIMIT %s,1""", (inc,)) 
result = c.fetchall() 
result = str(result) 

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' 
br = mechanize.Browser() 
br.set_handle_robots(False) 
br.addHeaders = [('User-agent',user_agent)] 

term = result.replace('((','').replace(',)','').replace("'",'') 
url = "http://www.marketwatch.com/investing/stock/"+term 
soup = BeautifulSoup(requests.get(url).text) 
search = soup.find('p', attrs = {'class':'data bgLast'}) 
cur = str(search.findAll(text = True)) 
search2 = soup.find('span', attrs = {'class':'bgChange'}) 
diff = str(search2.findAll(text = True)) 
cur = cur.strip("'[]u") 
diff = diff.strip("'[]u") 
print term 
print cur 
print diff 

c.execute("""UPDATE stocks SET cur = %s WHERE symbol = %s""", (cur,term)) 
c.execute("""UPDATE stocks SET diff = %s WHERE symbol = %s""", (diff,term)) 
db.commit() 
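You do realize that what you are seeing is a single-element list containing a Unicode string, right? – jonrsharpe

Yes, but how can I get the variable to hold just the text, without the u and the brackets? – user3643141

Or at least display only the text... – user3643141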

Answer

result = str(result) 
... 
cur = str(search.findAll(text = True)) 

Stop doing this! There are data types other than strings!

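result is a sequence of rows (MySQLdb's fetchall returns a tuple of tuples); search.findAll gives you a list of text nodes. For example, you can get the symbol value of the first row with result[0][0], and you can get the text of an element simply by calling search.getText().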
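As a rough illustration of the indexing described above, here is a minimal Python 3 sketch that runs without a database or network. The values in rows and text_nodes are made-up stand-ins for what c.fetchall() and search.findAll(text=True) might return, and first_symbol and node_text are hypothetical helper names, not part of MySQLdb or BeautifulSoup.

```python
# Hypothetical stand-in for c.fetchall(): one row with one column.
rows = (("AAPL",),)

# Hypothetical stand-in for search.findAll(text=True): a list of text nodes.
text_nodes = ["\n", "91.28", "\n"]


def first_symbol(rows):
    """Return the first column of the first row, or None if there are no rows."""
    return rows[0][0] if rows else None


def node_text(nodes):
    """Join the text nodes into one string and trim surrounding whitespace."""
    return "".join(nodes).strip()


term = first_symbol(rows)
cur = node_text(text_nodes)

print(term)  # AAPL
print(cur)   # 91.28
```

In the real script, search.getText() does roughly the job of node_text on the element itself, and term can be passed straight to the parameterized UPDATE without any .replace calls.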

Serializing a structured object such as a list into a flat string and then trying to pick the pieces back out of it is not a sensible approach.
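To make the fragility concrete, here is a small Python 3 sketch of the str-then-strip workaround next to plain indexing. The sample values are invented, and strip_repr and first_text are hypothetical helper names. Note that Python 3 does not print the u prefix that the Python 2 output in the question shows, but strip behaves the same way in both: it removes any of the listed characters from both ends, not a fixed prefix and suffix.

```python
def strip_repr(value):
    """Mimic the workaround: stringify the list, then strip a set of characters."""
    return str(value).strip("'[]u")


def first_text(nodes):
    """Take the first text node directly, or an empty string if there is none."""
    return nodes[0] if nodes else ""


ok = strip_repr(["91.28"])         # happens to work for this value
clipped = strip_repr(["up 1.2u"])  # real 'u' characters at the ends are lost
leaky = strip_repr(["1.5", "%"])   # list punctuation survives in the middle

print(ok)                       # 91.28
print(clipped)                  # p 1.2
print(leaky)                    # 1.5', '%
print(first_text(["up 1.2u"]))  # up 1.2u
```

Indexing into the list keeps the value intact whatever characters it begins or ends with, which is why the workaround only appears to work for numeric strings.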
