2013-01-11 72 views
1

Parsing HTML: pyparsing or BeautifulSoup?

Coast (NS/WS) = EUR35.99/US$46.09

Money Object = EUR42.00/US$53.79

<div id="t142_1" class="text" >Data Center</div> 
<div id="t143_1" class="text" >Coast (NS/WS)</div> 
<div id="t144_1" class="text" >EUR35.99/US$46.09</div> 
<div id="t145_1" class="text" >Money Object</div> 
<div id="t146_1" class="text" >EUR42.00/US$53.79</div> 
<div id="t147_1" class="text" >Date</div> 
<div id="t148_1" class="text" >7-Nov-2013/7-Nov-2013</div> 
<div id="t149_1" class="text" >Opinions</div> 

How can I get the values for "Money Object" and "Coast (NS/WS)" from this markup, using pyparsing or BeautifulSoup?

The variables I need (for example):

coast = 'EUR35.99/US$46.09' 

money_obj = 'EUR42.00/US$53.79' 

Edit:

a = soup.find_all(text='Money Object')
for i in a:
    print(i.find_next('div').text)

But this returns:

Change 

EUR42.00/US$53.79 

I only need one value (EUR42.00/US$53.79).

Answers

1

Where text is your sample HTML:

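from bs4 import BeautifulSoup as bs

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = bs(text, 'html.parser')
# Find the text node "Money Object", then read the text of the next div
print(soup.find(text='Money Object').find_next('div').text)
# EUR42.00/US$53.79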

What it does: find the node whose text content is "Money Object", then take the text of the next div that follows it.
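The answer above only prints the "Money Object" value. As a minimal sketch of how both variables from the question might be obtained the same way, the snippet below wraps the lookup in a small helper. The helper name value_after is hypothetical, the markup is copied from the question, and the search is restricted to div elements on the assumption that the labels always sit in a div of their own.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """\
<div id="t142_1" class="text" >Data Center</div>
<div id="t143_1" class="text" >Coast (NS/WS)</div>
<div id="t144_1" class="text" >EUR35.99/US$46.09</div>
<div id="t145_1" class="text" >Money Object</div>
<div id="t146_1" class="text" >EUR42.00/US$53.79</div>
<div id="t147_1" class="text" >Date</div>
<div id="t148_1" class="text" >7-Nov-2013/7-Nov-2013</div>
<div id="t149_1" class="text" >Opinions</div>
"""


def value_after(soup, label):
    """Return the text of the div following the div whose text equals label.

    Hypothetical helper for illustration; returns None if either div is missing.
    """
    label_div = soup.find('div', string=label)
    if label_div is None:
        return None
    value_div = label_div.find_next('div')
    if value_div is None:
        return None
    return value_div.get_text(strip=True)


soup = BeautifulSoup(html, 'html.parser')
coast = value_after(soup, 'Coast (NS/WS)')
money_obj = value_after(soup, 'Money Object')
print(coast)
print(money_obj)
```

Returning None for a missing label keeps the caller in charge of deciding whether an absent field is an error.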

What should I do if the words "Money Object" appear several times? I get a bad value in the "next line" – user1966421

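@user1966421 Then you use 'find_all' and loop over the results... Not sure why you would be getting that, though; it means the "Money Object" in your page is not laid out as in your sample data –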

Thanks! I have updated the question – user1966421
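The find_all suggestion from the comments could look roughly like the sketch below. The real page is not shown in the question, so the markup here is a hypothetical reconstruction in which "Money Object" occurs twice and one occurrence is followed by "Change". The regular expression PRICE is also an assumption about what a valid value looks like; it is used to discard the unwanted match.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the label appears twice, once followed by an unrelated div.
html = """\
<div class="text">Money Object</div>
<div class="text">Change</div>
<div class="text">Money Object</div>
<div class="text">EUR42.00/US$53.79</div>
"""

# Assumed shape of a price pair, e.g. EUR42.00/US$53.79.
PRICE = re.compile(r'^EUR\d+\.\d{2}/US\$\d+\.\d{2}$')

soup = BeautifulSoup(html, 'html.parser')

prices = []
for label in soup.find_all('div', string='Money Object'):
    following = label.find_next('div')
    if following is None:
        continue
    value = following.get_text(strip=True)
    # Keep only values that look like a price pair.
    if PRICE.match(value):
        prices.append(value)

# Take the first usable value, or None if nothing matched.
money_obj = prices[0] if prices else None
print(money_obj)
```

Filtering on the shape of the value rather than on its position avoids depending on which occurrence of the label comes first in the page.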

0

Using pyparsing:

from pyparsing import makeHTMLTags, SkipTo, withAttribute

data = """\ 
<div id="t142_1" class="text" >Data Center</div> 
<div id="t143_1" class="text" >Coast (NS/WS)</div> 
<div id="t144_1" class="text" >EUR35.99/US$46.09</div> 
<div id="t145_1" class="text" >Money Object</div> 
<div id="t146_1" class="text" >EUR42.00/US$53.79</div> 
<div id="t147_1" class="text" >Date</div> 
<div id="t148_1" class="text" >7-Nov-2013/7-Nov-2013</div> 
<div id="t149_1" class="text" >Opinions</div> 
""" 

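divS,divE = makeHTMLTags("div")

# A div is an opening tag, everything up to the closing tag (named "body"), and the closing tag
div = divS + SkipTo(divE).setResultsName("body") + divE
# Only accept the opening tag whose id is t144_1 (the Coast value)
divS.setParseAction(withAttribute(id="t144_1"))

for tokens,start,end in div.scanString(data):
    print("coast = " + tokens.body)

# Replace the parse action so that the Money Object value is matched instead
divS.setParseAction(withAttribute(id="t146_1"))
for tokens,start,end in div.scanString(data):
    print("money_obj = " + tokens.body)

Output:

coast = EUR35.99/US$46.09
money_obj = EUR42.00/US$53.79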
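The pyparsing answer selects the values by hard-coded id. If the ids are not stable, one possible variation is to collect the body of every div and look a value up by the label that precedes it. This is only a sketch under that assumption: the helper value_after and the list bodies are hypothetical names, and the data is the sample from the question.

```python
from pyparsing import makeHTMLTags, SkipTo

# Sample markup copied from the question.
data = """\
<div id="t142_1" class="text" >Data Center</div>
<div id="t143_1" class="text" >Coast (NS/WS)</div>
<div id="t144_1" class="text" >EUR35.99/US$46.09</div>
<div id="t145_1" class="text" >Money Object</div>
<div id="t146_1" class="text" >EUR42.00/US$53.79</div>
<div id="t147_1" class="text" >Date</div>
<div id="t148_1" class="text" >7-Nov-2013/7-Nov-2013</div>
<div id="t149_1" class="text" >Opinions</div>
"""

div_start, div_end = makeHTMLTags("div")
# Opening tag, the text up to the closing tag (named "body"), closing tag.
div = div_start + SkipTo(div_end).setResultsName("body") + div_end

# Body text of every div, in document order.
bodies = [tokens.body.strip() for tokens, start, end in div.scanString(data)]


def value_after(label):
    """Return the div body that directly follows the body equal to label, else None."""
    for current, following in zip(bodies, bodies[1:]):
        if current == label:
            return following
    return None


coast = value_after("Coast (NS/WS)")
money_obj = value_after("Money Object")
print("coast = " + coast)
print("money_obj = " + money_obj)
```

Scanning once and pairing neighbours avoids resetting the parse action for each field, at the cost of assuming that every label is immediately followed by its value.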