2014-09-10 30 views

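RSelenium and findElements with inspect element

I would appreciate some help getting each verse of the Bible chapter on the following website as a string, one verse per row, in a data frame.

I am struggling to find the right element and am not sure how to use findElements() together with the browser's inspect element tool. General pointers on how to do the same for other parts of the page, e.g. cross-references and footnotes, would also be great. (Note that cross-references can be shown by adjusting "Page Options", which you reach by clicking the cog near the top of the page.)

Below is the code I have tried.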

chapter.url <- "https://www.biblegateway.com/passage/?search=Genesis+50&version=ESV" 
library(RSelenium) 
RSelenium:::startServer() 
remDr <- remoteDriver() 
remDr$open() 
remDr$navigate(chapter.url) 
webElem <- remDr$findElements('id','passage-text') 

You forgot to add 'remDr$open()'. – 2014-09-10 09:51:30


Ah, sorry... adding it now. – 2014-09-10 10:05:46

Answer


Usually I would inspect the page with Firebug in Firefox, or something similar, to find the relevant HTML.

The relevant HTML snippet is <div class="version-ESV result-text-style-normal text-html ">, so we can find the element using the version-ESV class:

chapter.url <- "https://www.biblegateway.com/passage/?search=Genesis+50&version=ESV" 
library(RSelenium) 
RSelenium:::startServer() 
remDr <- remoteDriver() 
remDr$open() 
remDr$navigate(chapter.url) 
webElem <- remDr$findElement('class', 'version-ESV') 
webElem$highlightElement() # check visually we have the right element 

The highlightElement method gives us visual confirmation that we have the required block of HTML. Finally, we can get this HTML using the getElementAttribute method:

appData <- webElem$getElementAttribute("outerHTML")[[1]] 

This HTML can then be parsed for the verses using the XML package.

UPDATE:

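The individual verses are contained in span elements whose id begins with "en-ESV-". We can target these with the XPath '//span[contains(@id,"en-ESV-")]'. Within each of these spans, however, we only want the child nodes that are text nodes. Once we have found those text nodes, we paste them together separated by a space: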

library(XML) # provides htmlParse, xpathSApply, xmlChildren and xmlValue
appXPATH <- '//span[contains(@id,"en-ESV-")]' 
appFunc <- function(x){ 
    appChildren <- xmlChildren(x) 
    # keep only the direct text-node children, dropping nested elements
    out <- appChildren[names(appChildren) == "text"] 
    paste(sapply(out, xmlValue), collapse = ' ') 
} 
doc <- htmlParse(appData, encoding = 'UTF8') # specify encoding 
results <- xpathSApply(doc, appXPATH, appFunc) 

The results are as follows:

> head(results) 
[1] "Then Joseph fell on his father's face and wept over him and kissed him."                                     
[2] "And Joseph commanded his servants the physicians to embalm his father. So the physicians embalmed Israel."                             
[3] "Forty days were required for it, for that is how many are required for embalming. And the Egyptians wept for him seventy days."                        
[4] "And when the days of weeping for him were past, Joseph spoke to the household of Pharaoh, saying, “If now I have found favor in your eyes, please speak in the ears of Pharaoh, saying,"         
[5] "‘My father made me swear, saying, “I am about to die: in my tomb that I hewed out for myself in the land of Canaan, there shall you bury me.” Now therefore, let me please go up and bury my father. Then I will return.’”" 
[6] "And Pharaoh answered, “Go up, and bury your father, as he made you swear.”"                      
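
To get one verse per row in a data frame, as the question asks, the character vector returned by xpathSApply can be wrapped in data.frame. What follows is a minimal sketch, not a tested scraper: sampleHTML is a made-up fragment that only imitates the span/sup layout described above (its ids, class names and abbreviated verse text are assumptions), and verseText and verseDF are names introduced here for illustration. It runs without a Selenium server.

```r
library(XML)

# Hypothetical fragment imitating the page layout; the real markup may differ.
sampleHTML <- paste0(
  '<div class="version-ESV">',
  '<span id="en-ESV-2" class="text">',
  '<sup class="versenum">2</sup>',
  'And Joseph commanded his servants the physicians to embalm his father.',
  '</span>',
  '<span id="en-ESV-3" class="text">',
  '<sup class="versenum">3</sup>',
  'Forty days were required for it.',
  '</span>',
  '</div>'
)

# Same idea as appFunc: keep only the direct text-node children of a span,
# so the verse number held in the nested <sup> is dropped.
verseText <- function(x) {
  kids <- xmlChildren(x)
  textKids <- kids[names(kids) == "text"]
  paste(sapply(textKids, xmlValue), collapse = " ")
}

verseXPath <- '//span[contains(@id, "en-ESV-")]'
doc <- htmlParse(sampleHTML, asText = TRUE, encoding = "UTF-8")

# One row per matched span: its id attribute and its verse text.
verseDF <- data.frame(
  id = xpathSApply(doc, verseXPath, xmlGetAttr, "id"),
  text = xpathSApply(doc, verseXPath, verseText),
  stringsAsFactors = FALSE
)
print(verseDF)
```

With the real page, appData would take the place of sampleHTML. Keeping the span id next to the text makes it possible to recover the verse order later if rows are filtered or merged.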

Thanks! That's useful... I'm not a huge expert with XML... how would I extract the verses, one per row, from the 'appData' object? – 2014-09-10 10:00:47


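I have added a simple example that extracts the verses from the resulting block of HTML using an appropriate XPath. The first verse is not marked with a class, so it may be easiest to handle it separately. – jdharrison 2014-09-10 10:15:29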


Please explain how you got '//sup[@class="versenum"]/following-sibling::text()'? – 2014-09-10 10:24:37
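
As a rough illustration of what that expression selects: //sup[@class="versenum"] matches every sup element holding a verse number, and following-sibling::text() then returns the text nodes that come after that sup under the same parent. The sketch below runs it against a made-up fragment; the markup and the placeholder sentences are assumptions, not copied from the page.

```r
library(XML)

# Hypothetical markup: each verse is a span holding a <sup> verse number
# followed by the verse text as a sibling text node.
sampleHTML <- paste0(
  '<p>',
  '<span class="text"><sup class="versenum">2</sup>First sentence.</span>',
  '<span class="text"><sup class="versenum">3</sup>Second sentence.</span>',
  '</p>'
)

doc <- htmlParse(sampleHTML, asText = TRUE)

# For every verse-number <sup>, take the text nodes that follow it
# inside the same parent element.
siblingText <- xpathSApply(
  doc,
  '//sup[@class="versenum"]/following-sibling::text()',
  xmlValue
)
print(siblingText)
```

Because the axis only looks at siblings of the sup, a span that has no versenum sup (as noted above for the first verse) would not be matched and would need its own expression.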