2012-12-06
<p> 
    Glassware veteran 
    <strong>Corning </strong> 
    (
    <span class="ticker"> 
     NYSE: 
     <a class="qsAdd qs-source-isssitthv0000001" href="http://caps.fool.com/Ticker/GLW.aspx?source=isssitthv0000001" data-id="203758">GLW</a> 
    </span> 
    <a class="addToWatchListIcon qsAdd qs-source-iwlsitbut0000010" href="http://my.fool.com/watchlist/add?ticker=&source=iwlsitbut0000010" title="Add to My Watchlist"> </a> 
    ) has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback? 
</p> 

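I want to get both "Glassware veteran" and "has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback?". How do I parse the text out of HTML with lxml?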

Using this code:

tnode = root.xpath("/p")[0]  # xpath() returns a list; take the first <p>
content = tnode.text

I only get "Glassware veteran". Why?
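The reason is how lxml (like the standard library's ElementTree) stores mixed content: an element's .text holds only the text before its first child element, and the text that follows each child is stored on that child's .tail. A minimal sketch on a simplified copy of the markup (the FRAGMENT name is illustrative, and the link attributes are dropped so it also parses as plain XML). It falls back to xml.etree.ElementTree, which exposes the same .text/.tail attributes, if lxml is not installed:

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:  # stdlib fallback with the same .text/.tail API
    import xml.etree.ElementTree as etree

# Simplified version of the question's <p> (attributes removed)
FRAGMENT = (
    "<p>Glassware veteran <strong>Corning</strong> "
    "(<span>NYSE: <a>GLW</a></span>) has fallen on hard times lately.</p>"
)

p = etree.fromstring(FRAGMENT)

# .text is only the text *before the first child element*
print(repr(p.text))  # 'Glassware veteran '

# Text that follows each direct child is stored on that child's .tail
for child in p:
    print(child.tag, repr(child.text), repr(child.tail))
# strong 'Corning' ' ('
# span 'NYSE: ' ') has fallen on hard times lately.'
```

So reading p.text alone can never reach the words after <strong>; they live on the children.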

Answer


Something like this might get you what you want:

>>> tnode = root.xpath('/p')[0]
>>> content = tnode.xpath('text()')
>>> print(''.join(content))

Glassware veteran 

(


) has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback? 
>>> 

If you want all of the text nodes, just use //text() instead of text():

>>> print(' '.join([x.strip() for x in root.xpath('//text()')]))
Glassware veteran Corning (NYSE: GLW ) has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback? 
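One scoping caveat: in lxml's XPath, a path that starts with // is evaluated from the document root even when you call .xpath() on an element. To collect only the text under one element, use './/text()' or the element's itertext() method, which yields the element's own text plus every descendant's text and tail in document order (but not the element's own tail). A small sketch; the element_text helper name is hypothetical, and the markup is the same simplified fragment as above:

```python
try:
    from lxml import etree
except ImportError:  # xml.etree.ElementTree also provides itertext()
    import xml.etree.ElementTree as etree

FRAGMENT = (
    "<p>Glassware veteran <strong>Corning</strong> "
    "(<span>NYSE: <a>GLW</a></span>) has fallen on hard times lately.</p>"
)


def element_text(el):
    """Concatenate every text node inside el, in document order."""
    return "".join(el.itertext())


p = etree.fromstring(FRAGMENT)
print(element_text(p))
# Glassware veteran Corning (NYSE: GLW) has fallen on hard times lately.

# Scoped to one element: the <span>'s own tail (") has fallen ...") is excluded
print(element_text(p.find("span")))  # NYSE: GLW
```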

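Thank you very much. But now I have a new problem: I want to get "Glassware veteran Corning (NYSE: GLW) has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback?". Using the code tnode = root.xpath('/p | /p/strong | /p/a | /p/span'), content = tnode.xpath('text()'), print ''.join(content), the result is: "Glassware veteran ( ) has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback? Corning NYSE: GLW". Do you have any ideas? Thanks. – yinyao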


I've updated my answer. – larsks
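For the follow-up, the union query returns the <p>'s text nodes first and then each child's, which is why "Corning NYSE: GLW" ends up at the end. Walking the single <p> in document order avoids that; the indentation and newlines in the real markup then have to be collapsed. A sketch under stated assumptions: the markup below is the question's with the link attributes trimmed and the sentence shortened, clean_text is a hypothetical helper, and the two regex steps assume you want no space just inside parentheses, as in the desired "(NYSE: GLW)":

```python
import re

try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# The question's layout (newlines and indentation kept, attributes trimmed)
HTML = """<p>
    Glassware veteran
    <strong>Corning </strong>
    (
    <span class="ticker">
     NYSE:
     <a>GLW</a>
    </span>
    <a> </a>
    ) has fallen on hard times lately.
</p>"""


def clean_text(el):
    text = "".join(el.itertext())       # all text nodes, in document order
    text = " ".join(text.split())       # collapse whitespace runs, trim ends
    text = re.sub(r"\(\s+", "(", text)  # "( NYSE" -> "(NYSE"
    text = re.sub(r"\s+\)", ")", text)  # "GLW )" -> "GLW)"
    return text


print(clean_text(etree.fromstring(HTML)))
# Glassware veteran Corning (NYSE: GLW) has fallen on hard times lately.
```

With lxml specifically, the whitespace-collapsing step can also be done in XPath, e.g. p.xpath('normalize-space(.)'), which returns a string; it still leaves "( NYSE: GLW )" for the regexes to tidy.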