2017-09-03 19 views

Error with the rvest package

I am trying to scrape the tables from all 233 pages of the Securities Class Action Filings website. My code is as follows:

install.packages("rvest") 
install.packages("magrittr") 
install.packages("xml2") 

library(xml2) 
library(rvest) 
library(magrittr) 
library(data.table) 


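# Build one URL per results page (1 to 233)
i <- 1:233
urls <- paste0("http://securities.stanford.edu/filings.html?page=", i)

# Download one page and parse the table inside the element with id="records"
get_table <- function(url) {
    url %>%
    read_html() %>%
    html_nodes(xpath = '//*[@id="records"]/table') %>%
    html_table()
}

# Apply get_table() to every URL
results <- sapply(urls, get_table)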
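With 233 requests in a single sapply() call, one failing page stops the whole run and discards everything scraped so far. The sketch below assumes you would rather keep going and log which URLs failed. scrape_all() is a hypothetical helper, not part of rvest; it wraps each call in tryCatch() and returns the successful tables together with the failed URLs.

```r
# Hypothetical helper (not part of rvest): run `fetch` on every URL, keep
# going when a page fails, and record which URLs failed.
scrape_all <- function(urls, fetch, pause = 0) {
  results <- lapply(urls, function(u) {
    out <- tryCatch(
      fetch(u),
      error = function(e) {
        message("Failed on ", u, ": ", conditionMessage(e))
        NULL  # mark this page as failed
      }
    )
    if (pause > 0) Sys.sleep(pause)  # optional delay between requests
    out
  })
  names(results) <- urls
  failed <- vapply(results, is.null, logical(1))
  list(tables = results[!failed], failed = urls[failed])
}

# Usage with the question's urls and get_table() (requires network access).
# get_table() returns the list produced by html_table(), so take element 1:
# res <- scrape_all(urls, get_table, pause = 0.5)
# combined <- do.call(rbind, lapply(res$tables, `[[`, 1))
# res$failed  # URLs that raised an error
```

Note that a fetch function that legitimately returns NULL would also be counted as a failure here.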

The code produces the following error:

Error in xpath_element() :
could not find function "xpath_element"

Any ideas?

I have tried restarting R, restarting the computer, and updating all packages.

Answers


I reinstalled R (this time not through Anaconda), and now the code works. Sorry for wasting your time.
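Since the fix turned out to be reinstalling R outside Anaconda, it may help to check which R build and package library a session is actually using. r_environment_report() below is a hypothetical helper; the "conda"-in-path test is only a heuristic assumption, not a reliable way to detect an Anaconda build. The package list (rvest plus xml2 and selectr, which rvest builds on) is likewise just a starting point.

```r
# Hypothetical diagnostic: report where the running R and its packages live,
# to help spot an Anaconda/conda-based R installation.
r_environment_report <- function(pkgs = c("rvest", "xml2", "selectr")) {
  versions <- vapply(pkgs, function(p) {
    # packageVersion() errors for packages that are not installed
    tryCatch(as.character(utils::packageVersion(p)),
             error = function(e) NA_character_)
  }, character(1))
  list(
    r_version = R.version.string,
    r_home = R.home(),
    lib_paths = .libPaths(),
    # Heuristic only: conda-based installs usually have "conda" in the path
    looks_like_conda = grepl("conda", R.home(), ignore.case = TRUE),
    package_versions = versions
  )
}

str(r_environment_report())
```

If lib_paths or r_home point into an Anaconda directory, updated packages may be landing in a different library than the one a standalone R install would use.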


I think this code will get you close to what you need.

suppressPackageStartupMessages(library(tidyverse)) 
suppressPackageStartupMessages(library(rvest)) 


# iterate over the first 10 pages 
iter_page <- 1:10 
pb <- progress_estimated(length(iter_page)) 

# define function to scrape the table data from a page 
get_table <- function(i) { 
    base_url <- "http://securities.stanford.edu/filings.html?page=" 
    url <- paste0(base_url, i) 
    url %>% 
    read_html() %>% 
    html_nodes(xpath = '//*[@id="records"]/table') %>% 
    html_table() %>% 
    .[[1]] %>% 
    as_tibble() 
} 

# scrape first 10 pages 
map_df(iter_page, ~ { 
    pb$tick()$print() 
    df <- get_table(.x) 
    Sys.sleep(sample(10, 1) * 0.1) 
    df 
}) 
#> # A tibble: 200 x 5 
#>              `Filing Name` 
#>                <chr> 
#> 1         Dr. Reddy's Laboratories Ltd. 
#> 2            PetMed Express, Inc. 
#> 3             Top Ships Inc. 
#> 4              Sevcon, Inc. 
#> 5              XCerra Corp. 
#> 6            Zillow Group, Inc. 
#> 7             ShoreTel, Inc. 
#> 8 Teva Pharmaceutical Industries Ltd. : American Depository Shares 
#> 9             Depomed, Inc. 
#> 10          Blue Apron Holdings, Inc. 
#> # ... with 190 more rows, and 4 more variables: `Filing Date` <chr>, 
#> # `District Court` <chr>, Exchange <chr>, Ticker <chr> 
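The answer above relies on dplyr's progress_estimated(), which newer dplyr releases (1.0.0 onwards) deprecate. A base-R sketch of the same loop, assuming you only need a text progress bar, a random polite pause between requests, and row-bound results, could look like this. scrape_with_progress() and its max_delay parameter are hypothetical names chosen for illustration.

```r
# Hypothetical base-R alternative to the progress_estimated()/map_df() loop:
# text progress bar, random polite pause, then row-bind all tables.
scrape_with_progress <- function(pages, fetch, max_delay = 1) {
  if (length(pages) == 0) return(NULL)  # txtProgressBar needs max > min
  pb <- utils::txtProgressBar(min = 0, max = length(pages), style = 3)
  on.exit(close(pb))
  out <- vector("list", length(pages))
  for (k in seq_along(pages)) {
    out[[k]] <- fetch(pages[[k]])
    utils::setTxtProgressBar(pb, k)
    # Sleep a random 0..max_delay seconds between requests
    if (max_delay > 0) Sys.sleep(stats::runif(1, min = 0, max = max_delay))
  }
  do.call(rbind, out)
}

# Usage with the answer's get_table() (requires network access):
# filings <- scrape_with_progress(1:10, get_table)
```

Like the original, this stops at the first page that raises an error; wrap fetch in tryCatch() if you want to skip failing pages instead.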

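Thanks. I ran the code but still get the same error message. After running map_df(...), R starts scraping (|======| 10% ~1 m remaining), but shortly afterwards it stops with the error message: Error in xpath_element() : could not find function "xpath_element"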