Selecting table data from a web page

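I'm trying to use YQL to get the results from the "Film Reviews (Popular Matches)" table on the Empire magazine website, for example http://www.empireonline.com/search/default.asp?search=Dragonheart. I used Firebug to get the XPath, but the query doesn't return any results. This is what I'm using: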

select * from html where url='http://www.empireonline.com/search/default.asp?search=cars' and xpath='/html/body/table[3]/tbody/tr[5]/td[2]/table[2]/tbody/tr/td/table[2]/tbody/tr/td/table[2]'

This, on the other hand, does seem to work:

select * from html where url='http://www.empireonline.com/search/default.asp?search=cars' and xpath='//table'

However, it returns a whole load of data that I don't need.


2011-04-26 Garbit

This is a common problem: browsers add mandatory HTML elements such as 'head' and 'tbody' to the DOM, and those 'tbody' elements are not present in the page source. – 2011-04-26 19:26:36
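A minimal Python sketch of the point above, assuming (as the comment says) that the served HTML contains no <tbody> tags. The one-table markup is made up, and strip_tbody is a hypothetical helper, not part of YQL or Firebug: it just deletes /tbody steps from a browser-copied XPath. This is only a first step, since a browser can rewrite other parts of sloppy markup as well, so the stripped path may still need adjusting.

```python
import re
import xml.etree.ElementTree as ET


def strip_tbody(xpath):
    """Hypothetical helper: drop /tbody steps (with or without a [n] index)
    from a browser-copied XPath, assuming the served HTML has no <tbody>."""
    return re.sub(r"/tbody(\[\d+\])?(?=/|$)", "", xpath)


# Made-up raw markup: rows sit directly under <table>, as in the source the
# server sends. A browser would insert <tbody> when it builds the DOM.
RAW = "<html><body><table><tr><td>cell</td></tr></table></body></html>"
root = ET.fromstring(RAW)

# ElementTree paths are relative to the root element (<html>), hence "./body".
print(len(root.findall("./body/table/tbody/tr")))  # 0: no tbody in the source
print(len(root.findall("./body/table/tr")))        # 1

# The path from the question, as copied from Firebug.
firebug = "/html/body/table[3]/tbody/tr[5]/td[2]/table[2]/tbody/tr/td/table[2]/tbody/tr/td/table[2]"
print(strip_tbody(firebug))
# /html/body/table[3]/tr[5]/td[2]/table[2]/tr/td/table[2]/tr/td/table[2]
```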

The best I've managed is the following: select * from html where url="http://www.empireonline.com/search/default.asp?search=cars" and xpath="//table[3]//table[2]//table[2]//table[2]" – Garbit 2011-04-26 19:28:00
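Here is a small sketch of why a path made of // steps, like the one in this comment, can cope with the tbody mismatch while a step-by-step path copied from the browser cannot. It uses Python's standard-library ElementTree (which supports only a subset of XPath, but enough for this comparison) and two made-up versions of the same nested tables: one as served, and one with the tbody elements a browser would add.

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same nested tables: as served (no tbody)
# and as a browser's DOM would present them (tbody inserted).
SERVED = "<div><table><tr><td><table><tr><td>x</td></tr></table></td></tr></table></div>"
DOM = ("<div><table><tbody><tr><td><table><tbody><tr><td>x</td></tr></tbody>"
       "</table></td></tr></tbody></table></div>")


def cells(markup, path):
    """Text of every element matched by path, evaluated from the root <div>."""
    return [td.text for td in ET.fromstring(markup).findall(path)]


# Child steps spell out every level, so they break when tbody differs.
print(cells(SERVED, "./table/tr/td/table/tr/td"))  # ['x']
print(cells(DOM, "./table/tr/td/table/tr/td"))     # []

# Descendant steps (//) skip over whatever sits in between.
print(cells(SERVED, ".//table//table//td"))        # ['x']
print(cells(DOM, ".//table//table//td"))           # ['x']
```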

So is your problem solved? It sounds like the XPath in your comment does exactly what you need, right? – LarsH 2011-04-26 19:49:18

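You just need to take care when building the XPath query. The query below gets the link and name of each review listed in that HTML table by first locating the "Film Reviews (Popular Matches)" paragraph and then navigating from there to the list of films.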

SELECT href, strong 
FROM html 
WHERE url = 'http://www.empireonline.com/search/default.asp?search=Thor' 
AND xpath = ' 
    //p[.="Film Reviews (Popular Matches)"] 
    /ancestor::table[1] 
    /following-sibling::table[1] 
    //td[2]/a 
'

(Try this query in the YQL console)
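The same navigation can be traced step by step outside YQL. Below is a Python sketch using the standard-library ElementTree, which lacks the ancestor and following-sibling axes, so those two steps are done by hand with a child-to-parent map. The markup is a hypothetical, heavily simplified stand-in for the Empire results page (the real page is not well-formed XML and would need a proper HTML parser), and popular_matches is an illustrative name, not an existing API.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup mirroring only the nesting the query relies on.
SAMPLE = """
<html><body>
  <div>
    <table><tr><td><p>Film Reviews (Popular Matches)</p></td></tr></table>
    <table>
      <tr><td>1</td><td><a href="/reviews/thor"><strong>Thor</strong></a></td></tr>
      <tr><td>2</td><td><a href="/reviews/thor-2"><strong>Thor 2</strong></a></td></tr>
    </table>
  </div>
</body></html>
"""


def popular_matches(root, label="Film Reviews (Popular Matches)"):
    """Return (href, strong text) pairs, mimicking
    //p[.=label]/ancestor::table[1]/following-sibling::table[1]//td[2]/a"""
    parents = {child: parent for parent in root.iter() for child in parent}
    results = []
    for p in root.iter("p"):
        # p[.=label]: compare the paragraph's full string value.
        if "".join(p.itertext()) != label:
            continue
        # ancestor::table[1]: walk up to the nearest enclosing table.
        node = parents.get(p)
        while node is not None and node.tag != "table":
            node = parents.get(node)
        if node is None:
            continue
        # following-sibling::table[1]: next table under the same parent.
        container = parents.get(node)
        if container is None:
            continue
        siblings = list(container)
        later = [el for el in siblings[siblings.index(node) + 1:] if el.tag == "table"]
        if not later:
            continue
        # //td[2]/a: links that are direct children of a row's second cell.
        for a in later[0].iterfind(".//td[2]/a"):
            strong = a.find("strong")
            results.append((a.get("href"), strong.text if strong is not None else None))
    return results


root = ET.fromstring(SAMPLE.strip())
for href, title in popular_matches(root):
    print(href, title)
```

Anchoring on the visible heading text rather than on table positions should make this less brittle than a copied positional path, since it does not depend on how many tables precede the list.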


2011-05-03 18:33:42 salathe

That's spot on, thanks salathe! – Garbit 2011-05-03 18:59:37
