R: rvest: scraping a dynamic e-commerce page

I'm using rvest in R to do some scraping. I know some HTML and CSS.

I want to get the price of every product at this URI:

http://www.linio.com.co/tecnologia/celulares-telefonia-gps/

New items load as you move down the page (as you scroll).

What I've done so far:

library(rvest)

# read_html() is the current name for the deprecated html()
Linio_Celulares <- read_html("http://www.linio.com.co/celulares-telefonia-gps/")

Linio_Celulares %>% 
    html_nodes(".product-itm-price-new") %>% 
    html_text()

And I get what I need, but only for the first 25 elements (the ones that load by default).

[1] "$ 1.999.900" "$ 1.999.900" "$ 1.999.900" "$ 2.299.900" "$ 2.279.900" 
[6] "$ 2.279.900" "$ 1.159.900" "$ 1.749.900" "$ 1.879.900" "$ 189.900" 
[11] "$ 2.299.900" "$ 2.499.900" "$ 2.499.900" "$ 2.799.000" "$ 529.900" 
[16] "$ 2.699.900" "$ 2.149.900" "$ 189.900" "$ 2.549.900" "$ 1.395.900" 
[21] "$ 249.900" "$ 41.900" "$ 319.900" "$ 149.900"
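The values come back as strings such as "$ 1.999.900", so they usually need one more step before they can be sorted or summed. A minimal sketch, assuming the dots are thousands separators and the prices have no decimal part (an assumption read off the output above); `prices_raw` and `prices_num` are illustrative names:

```r
# A few of the scraped strings, copied from the output above
prices_raw <- c("$ 1.999.900", "$ 189.900", "$ 41.900")

# Drop everything that is not a digit, then convert to numeric.
# Assumes "." is a thousands separator and there are no decimals.
prices_num <- as.numeric(gsub("[^0-9]", "", prices_raw))

print(prices_num)
```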

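Question: how do I get all the elements of this dynamic section?

I guess I could scroll the page until all the elements have loaded and then use html(URL) (read_html(URL) in current rvest). But that seems like a lot of work, since I plan to do this for several sections. There should be a programmatic workaround.

Any hints are welcome!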


2015-04-25 Omar Gonzales

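You need to use XPath (in R or outside of R). Take a look at the 'XML' package.

Can't rvest do it? I've seen that rvest imports XML, and I've read a bit about XML. But in the URL from my example I don't see any XML meta tags. Can you help me?

Here, I think this might help you do it in 'rvest': http://stackoverflow.com/questions/27812259/following-next-link-with-relative-paths-using-rvest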

As @nrussell suggested, you can use RSelenium to scroll down the page programmatically before getting the page source.

For example, you could do:

library(RSelenium) 
library(rvest) 
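# start RSelenium
# (checkForServer() and startServer() are defunct in recent RSelenium
# releases, where rsDriver() starts the server and browser instead)
checkForServer()
startServer()
remDr <- remoteDriver()
remDr$open()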

#navigate to your page 
remDr$navigate("http://www.linio.com.co/tecnologia/celulares-telefonia-gps/") 

# scroll down 5 times, waiting 3 seconds each time for new items to load
for (i in 1:5) {
  remDr$executeScript(paste("scroll(0,", i * 10000, ");"))
  Sys.sleep(3)
}

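# get the page HTML (a list whose first element is the source string)
page_source <- remDr$getPageSource()

# parse it (read_html() is the current name for the deprecated html())
read_html(page_source[[1]]) %>%
    html_nodes(".product-itm-price-new") %>%
    html_text()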
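A fixed count of five scrolls may fall short on longer listings. One possible variation, sketched under assumptions, is to keep scrolling until the page height stops growing. `scroll_until_stable` is a hypothetical helper, not part of RSelenium or rvest. It takes the scrolling and height-reading steps as plain functions, so the stopping logic can be tried with a stand-in page and no browser:

```r
# Hypothetical helper: scroll repeatedly until the reported page height
# stops increasing, or until max_rounds is reached.
# scroll_to_bottom and get_height are functions supplied by the caller.
scroll_until_stable <- function(scroll_to_bottom, get_height,
                                max_rounds = 20, wait = 0) {
  last_height <- get_height()
  rounds <- 0
  while (rounds < max_rounds) {
    scroll_to_bottom()
    Sys.sleep(wait)                        # give new items time to load
    rounds <- rounds + 1
    new_height <- get_height()
    if (new_height <= last_height) break   # nothing new was loaded
    last_height <- new_height
  }
  rounds
}

# Stand-in "page" that grows three times and then stops (illustration only)
heights <- c(1000, 2000, 3000, 4000, 4000)
pos <- 1
fake_scroll <- function() pos <<- min(pos + 1, length(heights))
fake_height <- function() heights[pos]

rounds <- scroll_until_stable(fake_scroll, fake_height)
print(rounds)
```

With the session from the answer, the two functions could wrap remDr$executeScript() calls, for instance one running window.scrollTo(0, document.body.scrollHeight) and one returning document.body.scrollHeight, with wait set to a few seconds. That wiring is an assumption and has not been tested against the site.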


2015-04-30 10:24:09 NicE

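Very nice. It works like a charm. Thanks!

I've been learning some JavaScript, but I don't get the for loop you used. Could you point me to some documentation?

That is an R 'for' loop, not a JavaScript one. Some info here: http://paleocave.sciencesortof.com/2013/03/writing-a-for-loop-in-r/ – NicE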
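To see what that loop does without opening a browser, the same paste() call can be run on its own. This is only an illustration, and `scroll_calls` is a made-up name:

```r
# Build the JavaScript strings that the loop passes to executeScript(),
# one per iteration, without opening a browser.
scroll_calls <- character(0)
for (i in 1:5) {
  scroll_calls <- c(scroll_calls, paste("scroll(0,", i * 10000, ");"))
}
print(scroll_calls)
```

The loop itself is plain R. Only the strings it builds are JavaScript, and each one asks the browser to scroll to a vertical offset of i * 10000 pixels.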
