RSelenium: using findElements together with Inspect Element

I would appreciate some help getting each verse of a Bible chapter from the following website into a data frame, one verse per row as a string.

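I am struggling to find the correct element, and I do not know how to use findElements() together with the browser's Inspect Element tool. General pointers on how to do the same for other parts of the page, for example the cross-references and footnotes, would be great. (Note that cross-references can be shown by adjusting "Page Options", which is reached by clicking the cog near the top of the page.)

Below is the code I have tried so far.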

chapter.url <- "https://www.biblegateway.com/passage/?search=Genesis+50&version=ESV" 
library(RSelenium) 
RSelenium:::startServer() 
remDr <- remoteDriver() 
remDr$open() 
remDr$navigate(chapter.url) 
webElem <- remDr$findElements('id','passage-text')

Source

2014-09-10 h.l.m

You forgot to add `remDr$open()`. – 2014-09-10 09:51:30

Ah, sorry... adding it now. – 2014-09-10 10:05:46

Normally I would inspect the relevant HTML of the page with Firebug in Firefox, or a similar tool.

The relevant HTML fragment is <div class="version-ESV result-text-style-normal text-html ">, so we can find the element by its version-ESV class:

chapter.url <- "https://www.biblegateway.com/passage/?search=Genesis+50&version=ESV" 
library(RSelenium) 
RSelenium:::startServer() 
remDr <- remoteDriver() 
remDr$open() 
remDr$navigate(chapter.url) 
webElem <- remDr$findElement('class', 'version-ESV') 
webElem$highlightElement() # check visually we have the right element

The highlightElement method gives us visual confirmation that we have the required block of HTML. Finally, we can use the getElementAttribute method to retrieve this HTML:

appData <- webElem$getElementAttribute("outerHTML")[[1]]

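This HTML can then be parsed for the verses using the XML package.

UPDATE:

The verses are contained in span elements whose id begins with "en-ESV-". We can target these with the XPath '//span[contains(@id,"en-ESV-")]'. Within these spans, however, we only want the child nodes that are text nodes. Once we have found those text nodes, we paste them together separated by a space: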

library(XML) # provides htmlParse, xmlChildren, xmlValue and xpathSApply
appXPATH <- '//span[contains(@id,"en-ESV-")]'
appFunc <- function(x){ 
    appChildren <- xmlChildren(x) 
    out <- appChildren[names(appChildren) == "text"] 
    paste(sapply(out, xmlValue), collapse = ' ') 
} 
doc <- htmlParse(appData, encoding = 'UTF8') # specify encoding 
results <- xpathSApply(doc, appXPATH, appFunc)

The results are as follows:

> head(results) 
[1] "Then Joseph fell on his father's face and wept over him and kissed him."                                     
[2] "And Joseph commanded his servants the physicians to embalm his father. So the physicians embalmed Israel."                             
[3] "Forty days were required for it, for that is how many are required for embalming. And the Egyptians wept for him seventy days."                        
[4] "And when the days of weeping for him were past, Joseph spoke to the household of Pharaoh, saying, “If now I have found favor in your eyes, please speak in the ears of Pharaoh, saying,"         
[5] "‘My father made me swear, saying, “I am about to die: in my tomb that I hewed out for myself in the land of Canaan, there shall you bury me.” Now therefore, let me please go up and bury my father. Then I will return.’”" 
[6] "And Pharaoh answered, “Go up, and bury your father, as he made you swear.”"
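The text-node filtering can also be tried without a browser session. The following is only a minimal sketch: `sampleHTML` is a made-up fragment that loosely imitates the page's markup (the `footnote` class and the verse text are assumptions, not copied from the site), and `textOnly` is a hypothetical name for the same idea as `appFunc` above.

```r
library(XML)

# Hypothetical fragment, loosely modelled on the page's markup (an assumption).
sampleHTML <- paste0(
  '<div class="version-ESV">',
  '<p><span id="en-ESV-1" class="text">',
  '<sup class="versenum">2</sup>',
  'First part of the verse.',
  '<sup class="footnote">[a]</sup>',
  'Second part of the verse.',
  '</span></p>',
  '</div>'
)

# Parse the string directly instead of reading it from a file or URL.
doc <- htmlParse(sampleHTML, asText = TRUE)

# Keep only the direct text-node children of a span, so the contents of
# the <sup> children (verse numbers, footnote markers) are dropped.
textOnly <- function(x) {
  kids <- xmlChildren(x)
  paste(sapply(kids[names(kids) == "text"], xmlValue), collapse = " ")
}

verses <- xpathSApply(doc, '//span[contains(@id,"en-ESV-")]', textOnly)
print(verses)
```

In this sketch the verse number and the footnote marker sit inside sup elements, so they should not appear in the pasted text.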

Source

2014-09-10 09:49:13 jdharrison

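Thanks! This is helpful... I am not a huge XML expert... how would I extract a single line from the 'appData' object? – 2014-09-10 10:00:47

I have added a simple example that uses an appropriate XPath to extract the verses from the resulting block of HTML. The first verse is not marked with the class, so it is probably easiest to handle separately. – jdharrison 2014-09-10 10:15:29

Could you explain how you arrived at '//sup[@class="versenum"]/following-sibling::text()'? – 2014-09-10 10:24:37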
