Accessing structure fields (XML package)

I get the structure below from htmlTreeParse, and I need to access the fields it contains (the content of the page).

doc <- htmlTreeParse(url, useInternalNodes = FALSE) 
doc 
$file 
[1] "http://www.google.com/trends/fetchComponent?q=asdf,qwerty&cid=TIMESERIES_GRAPH_0&export=3" 

$version 
[1] "" 

$children 
$children$html 
<html> 
<body> 
<p>// Data table response google.visualization.Query.setResponse([INSERT LOT OF JSON HERE])</p> 
</body> 
</html> 
attr(,"class") 
[1] "XMLDocumentContent"

I'm looking for the text inside the "p" block. So far I haven't found anything that helps.
So how can I get at this data?

Source

2014-03-06 aaaaaaa

Have you read the text in the help file for `?htmlTreeParse`? –

Yes, several times, but my problem isn't with the htmlTreeParse function itself; it's more about how to manipulate the data it returns. – aaaaaaa

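Sorry for not being clearer earlier. There's a goldmine at the bottom of the help page. I'm sorry I can't give you any specific `xpath` pointers, but I think those examples are a good start. –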

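If you want to run XPath on the document, you need to set useInternalNodes = TRUE (see the documentation for this parameter). The code below should get you started with XPath.

[Note: when I run the code I get an error page, not the document you got.]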

library(XML) 
url <- "http://www.google.com/trends/fetchComponent?q=asdf,qwerty&cid=TIMESERIES_GRAPH_0&export=3" 
doc <- htmlTreeParse(url, useInternalNodes = TRUE)  # internal nodes are needed for XPath
# XPath examples
p        <- doc["//p"]            # node list of all the <p> elements (there aren't any...)
div      <- doc["//div"]          # node list of all the <div> elements
scripts  <- doc["//script"]       # node list of all the <script> elements
b.script <- doc["//body/script"]  # node list of all <script> elements within the <body>

# title of the page 
xmlValue(doc["//head/title"][[1]]) 
# [1] "Google Trends - An error has been detected"

Basically, you can use an XPath string as if it were an index into the document. So in your case,

xmlValue(doc["//p"][[1]])

should return the text contained in the (first) <p> element of doc.
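
For anyone who wants to try this without hitting the Google URL, here is a minimal offline sketch. It assumes the page has the same shape as the output in the question; the `html` string and its JSON payload are made up for illustration, and the regular expression is just one possible way to strip the `setResponse(...)` wrapper, not something taken from the original page. It also shows that with useInternalNodes = FALSE you can still reach the same text by walking the `$children` structure printed in the question.

```r
library(XML)

# Stand-in for the real page (assumption: same structure as in the question;
# the JSON payload is invented for this example).
html <- paste0(
  "<html><body><p>// Data table response ",
  "google.visualization.Query.setResponse({\"status\":\"ok\"})",
  "</p></body></html>"
)

# 1) XPath route: needs internal (C-level) nodes.
doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)
p_text <- xpathSApply(doc, "//p", xmlValue)  # text of every <p> element

# Keep only what sits between "setResponse(" and the final ")".
json_text <- sub("^.*setResponse\\((.*)\\);?\\s*$", "\\1", p_text[[1]], perl = TRUE)

# 2) R-level route: no XPath, walk the list structure by child name.
tree <- htmlTreeParse(html, asText = TRUE, useInternalNodes = FALSE)
p_text_tree <- xmlValue(tree$children$html[["body"]][["p"]])

cat(json_text, "\n")
```

`json_text` can then be handed to whichever JSON parser you prefer. The XPath route is usually the more convenient one, since it does not depend on knowing the exact nesting of the page.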

Source

2014-03-07 06:18:04 jlhoward

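Thanks for your help, it works fine. As for the error page, it may be that you accessed the URL several times in a row and Google blocked you (if you paste it manually into any browser, you will get the data). – aaaaaaa

You're welcome. Since you're new here, please see: [What should I do when someone answers my question?](http://stackoverflow.com/help/someone-answers). – jlhoward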
