
I have created a looping function that extracts tweets using the Search API at a given interval (say, every 5 minutes). The function does what it is supposed to do: connect to Twitter, extract tweets containing a specific keyword, and save them in a CSV file. However, occasionally (2-3 times a day) the loop stops because of one of these two errors:

  • Error in htmlTreeParse(URL, useInternal = TRUE): error in creating parser for http://search.twitter.com/search.atom?q=6.95322e-310tst&rpp=100&page=10

  • Error in UseMethod("xmlNamespaceDefinitions"): no applicable method for 'xmlNamespaceDefinitions' applied to an object of class "NULL"

I hope you can help me deal with these errors by answering some of my questions:

  • What causes these errors?
  • How can I adjust my code to avoid these errors?
  • How can I force the loop to keep running when it encounters an error (e.g. using a try function)?

My function (based on several scripts found online) is as follows:

    library(XML) # for htmlTreeParse

    twitter.search <- "Keyword"

    QUERY <- URLencode(twitter.search)

    # time between search rounds (in seconds) and number of rounds
    d_time <- 300
    number_of_times <- 3000

    for (i in 1:number_of_times) {

        tweets <- NULL
        tweet.count <- 0
        page <- 1
        read.more <- TRUE

        while (read.more) {
            # construct Twitter search URL
            URL <- paste('http://search.twitter.com/search.atom?q=', QUERY,
                         '&rpp=100&page=', page, sep = '')
            # fetch remote URL and parse
            XML <- htmlTreeParse(URL, useInternal = TRUE, error = function(...) {})

            # extract list of "entry" nodes
            entry <- getNodeSet(XML, "//entry")

            read.more <- (length(entry) > 0)
            if (read.more) {
                # use j for the inner loop so the outer counter i is not shadowed
                for (j in 1:length(entry)) {
                    subdoc <- xmlDoc(entry[[j]]) # put entry in a separate object to manipulate

                    published <- unlist(xpathApply(subdoc, "//published", xmlValue))
                    published <- gsub("Z", " ", gsub("T", " ", published))

                    # convert from GMT to local time
                    time.gmt <- as.POSIXct(published, "GMT")
                    local.time <- format(time.gmt, tz = "Europe/Amsterdam")

                    title <- unlist(xpathApply(subdoc, "//title", xmlValue))
                    author <- unlist(xpathApply(subdoc, "//author/name", xmlValue))

                    tweet <- paste(local.time, " @", author, ": ", title, sep = "")

                    entry.frame <- data.frame(tweet, author, local.time, stringsAsFactors = FALSE)
                    tweet.count <- tweet.count + 1
                    rownames(entry.frame) <- tweet.count
                    tweets <- rbind(tweets, entry.frame)
                }
                page <- page + 1
                read.more <- (page <= 15) # there seems to be a 15-page limit
            }
        }

        # top 15 tweeters
        # sort(table(tweets$author), decreasing = TRUE)[1:15]

        write.table(tweets, file = paste("Twitts - ", format(Sys.time(), "%a %b %d %H_%M_%S %Y"), ".csv"), sep = ";")

        Sys.sleep(d_time)

    } # end for

'tryCatch' and 'try' are what you're after. –


Any suggestions on how to implement tryCatch or try? – Gert


Example of tryCatch + a useful link: http://stackoverflow.com/a/2622930/473899 Running your code right now, but no errors so far, so we'll see whether I have anything useful to add. – Esteis

Answers


My guess is that your problem corresponds to Twitter (or your connection to the web) being down or slow or something, so you got a bad result. Have you tried setting

options(error = recover) 

Then the next time you get an error, a nice browser environment will come up and let you have a poke around.
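For illustration, here is roughly what that looks like (a hedged sketch: the frame list below is hypothetical and depends on where the error occurs):

    options(error = recover)
    # The next time an error is raised, R prints the call stack and
    # prompts for a frame to inspect, e.g.:
    #
    #   Enter a frame number, or 0 to exit
    #
    #   1: htmlTreeParse(URL, useInternal = TRUE)
    #
    # Selection:

Typing a frame number drops you into that environment, where you can inspect URL, page, and so on at the moment of failure.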


Here's my solution, using try, to a similar problem with the Twitter API.

I was asking the Twitter API for the follower count of each person on a long list of Twitter users. When a user protects their account, I get an error, and before I put in the try function the loop would break at that point. Using try allows the loop to keep working by skipping to the next person on the list.

Here's the setup:

    # load library
    library(twitteR)
    #
    # search Twitter for your term
    s <- searchTwitter('#rstats', n = 1500)
    # convert search results to a data frame
    df <- do.call("rbind", lapply(s, as.data.frame))
    # extract the usernames
    users <- unique(df$screenName)
    users <- sapply(users, as.character)
    # make a data frame for the loop to work with
    users.df <- data.frame(users = users,
                           followers = "", stringsAsFactors = FALSE)

And here's the loop with try, which handles errors while populating users.df$followers with the follower counts obtained from the Twitter API:

    for (i in 1:nrow(users.df)) {
        # try the request; if the user's account is protected
        # (or some other error occurs), skip to the next user
        result <- try(getUser(users.df$users[i])$followersCount, silent = TRUE)
        if (inherits(result, "try-error")) next
        # store the follower count for this user (reusing the value
        # already fetched, rather than calling getUser a second time)
        users.df$followers[i] <- result
        # pause for 60 s between iterations to avoid exceeding
        # the Twitter API request limit
        print('Sleeping for 60 seconds...')
        Sys.sleep(60)
    }
# 
# Now inspect users.df to see the follower data
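Applied back to the loop in the question, the same pattern with tryCatch might look like the sketch below. This is only a sketch, not code from either answer: it wraps the single fetch-and-parse step, reusing the URL, d_time, and read.more variables from the question's code plus a hypothetical helper fetch.page, so that a failed request ends the current search round instead of stopping R.

    # sketch: make one fetch return NULL on failure instead of raising an error
    fetch.page <- function(URL) {
        tryCatch(
            htmlTreeParse(URL, useInternal = TRUE),
            error = function(e) {
                message("Request failed: ", conditionMessage(e))
                NULL
            }
        )
    }

    XML <- fetch.page(URL)
    if (is.null(XML)) {
        read.more <- FALSE  # give up on this round; retry after Sys.sleep(d_time)
    } else {
        entry <- getNodeSet(XML, "//entry")
        read.more <- (length(entry) > 0)
    }

Checking for NULL before calling getNodeSet also avoids the second error in the question, since 'xmlNamespaceDefinitions' was being applied to a NULL document after a failed parse.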