2016-06-27 37 views

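I want to scrape a table like this one (click Search and you will get a table of partners). I want to scrape the partners' names. The problem is that I don't know what kind of table this is or how to scrape it. I am using the RSelenium package; if it can be done with rvest, that would be very helpful.

So what kind of table is this, can it be scraped with RSelenium or rvest, and if so, how?

Using RSelenium: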

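# Assumes remDr is an already-open RSelenium remoteDriver session
ul <- "http://partnerlocator.symantec.com"
remDr$navigate(ul)
# Click the search button to load the partner list
webElem <- remDr$findElement(using = "class", value = "button")
webElem$clickElement()
Sys.sleep(10)  # crude fixed wait for the results to render
# Grab the whole results container as one block of text
webElem <- remDr$findElement(using = "class", value = "results")
unlist(webElem$getElementText())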

But I get a very messy text output like this:

CDW\nCDW\n200 North Milwaukee Avenue\nVernon Hills ,Illinois ,60061\nUnited States\nDistance: 0 mi\nSymantec Platinum Partner\nCore Security - Platinum\nThreat Protection - Platinum\nCyber Security Services - Platinum\nInformation Protection - Platinum\nDLT Solutions\nDLT Solutions\n2411 Dulles Corner Park Suite 800\nHerndon ,Virginia ,20171\nUnited States\nDistance: 0 mi\nSymantec Platinum Partner\nInformation Protection - Platinum\nThreat Protection - Platinum\nCore Security - Platinum\nCyber Security Services - Platinum\nInsight Direct USA\nInsight Direct USA\n3480 Lotus Drive\nPlano ,Texas ,75075\nUnited States\nDistance: 0 mi\nSymantec Platinum Partner\nCyber Security Services - Platinum\nCore Security - Platinum\nThreat Prot......... 

A similar question is discussed here: http://stackoverflow.com/questions/29953394/how-to-find-a-subset-of-cells-in-an-html-table-using-r-or-jquery – Mohammad

Answer

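This looks like a very basic HTML table whose text has been collapsed into a single string. It can be split back into one element per line like this: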

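library(RSelenium)

# Download and start a local Selenium server.
# Note: later RSelenium releases made checkForServer() and startServer()
# defunct in favour of rsDriver(); this is the 2016-era workflow.
checkForServer()
startServer()

ul <- "http://partnerlocator.symantec.com"
remDr <- remoteDriver()
remDr$open()
remDr$navigate(ul)

# Click the search button, then wait for the results to render
webElem <- remDr$findElement(using = "class", value = "button")
webElem$clickElement()
Sys.sleep(10)

# getElementText() returns a list holding one long string;
# split it on newlines to get a character vector of lines
webElem <- remDr$findElement(using = "class", value = "results")
results <- webElem$getElementText()
results_chr <- unlist(strsplit(results[[1]], "\n"))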

head(results_chr) 
[1] "CDW"       "CDW"       "200 North Milwaukee Avenue" 
[4] "Vernon Hills ,Illinois ,60061" "United States"     "Distance: 0 mi" 
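
Once the text is split into lines, the partner names still have to be separated from the address and competency lines. In the sample output from the question, each partner name appears twice in a row ("CDW", "CDW"), so one possible heuristic is to keep every line that is identical to the line after it. This is only a sketch: the duplicated-name pattern is an assumption based on the truncated sample, and `same_as_next` and `partner_names` are names invented here for illustration.

```r
# Sample lines copied from the output shown in the question (truncated there)
results_chr <- c(
  "CDW", "CDW", "200 North Milwaukee Avenue",
  "Vernon Hills ,Illinois ,60061", "United States", "Distance: 0 mi",
  "Symantec Platinum Partner", "Core Security - Platinum",
  "DLT Solutions", "DLT Solutions", "2411 Dulles Corner Park Suite 800",
  "Herndon ,Virginia ,20171", "United States", "Distance: 0 mi",
  "Symantec Platinum Partner", "Information Protection - Platinum",
  "Insight Direct USA", "Insight Direct USA", "3480 Lotus Drive",
  "Plano ,Texas ,75075", "United States", "Distance: 0 mi"
)

# Assumption: a partner name is a line that is repeated on the next line
n <- length(results_chr)
same_as_next <- c(results_chr[-n] == results_chr[-1], FALSE)

partner_names <- results_chr[same_as_next]
print(partner_names)
```

On the sample above this keeps "CDW", "DLT Solutions" and "Insight Direct USA". If some other field ever repeats on consecutive lines, the heuristic would pick it up too, so the result is worth checking against the page.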

You could probably get a cleaner result by parsing the HTML table on that results page with rvest, but I was not able to do so.
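
For reference, one way the rvest route might look is to hand the rendered page source from RSelenium to rvest and select the name elements with a CSS selector. The sketch below is not tested against the real site. Only the `results` class comes from the question; the inner `partner` and `name` classes and the inline HTML are hypothetical stand-ins, so the selector would need to be adapted after inspecting the actual page.

```r
library(rvest)

# Hypothetical markup: only the "results" class appears in the question;
# the "partner" and "name" classes are invented for illustration.
page_source <- '
<div class="results">
  <div class="partner"><span class="name">CDW</span><p>200 North Milwaukee Avenue</p></div>
  <div class="partner"><span class="name">DLT Solutions</span><p>2411 Dulles Corner Park Suite 800</p></div>
</div>'

# With a live RSelenium session the string would instead come from:
# page_source <- remDr$getPageSource()[[1]]

# Parse the HTML and pull the text of every matching name node
page <- read_html(page_source)
name_nodes <- html_nodes(page, ".results .partner .name")
partner_names <- html_text(name_nodes, trim = TRUE)
print(partner_names)
```

Selecting nodes rather than splitting text avoids guessing which lines are names, at the cost of depending on the page's markup.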