Extracting part of a string in R


I need to extract part of a string from Twitter output. I pull the data with this code:

library(twitteR)  # provides searchTwitter() and twListToDF()

some_tweets <- searchTwitter('weather', n = 4, lang = 'en')
st <- twListToDF(some_tweets)
st[, "statusSource"]

and the output looks something like this:

[1] "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>" 
[2] "<a href=\"http://www.facebook.com/twitter\" rel=\"nofollow\">Facebook</a>"    
[3] "<a href=\"http://instagram.com\" rel=\"nofollow\">Instagram</a>"       
[4] "<a href=\"http://www.hootsuite.com\" rel=\"nofollow\">Hootsuite</a>"

I would like to extract only the last part of each string, like this:

Twitter for iPhone 
Facebook 
Instagram 
Hootsuite

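What I want to do is count the number of entries for each connection type.

Any ideas on how to extract these strings so that I can count them?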

Source

2017-03-06 Selrac

Use gsub("<[^>]+>", "", st[, "statusSource"])

Also, perhaps a closer match: http://stackoverflow.com/q/26809847/1000343

I checked several solutions but couldn't figure it out. Thanks Wiktor, this worked for me – Selrac
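For anyone reading along, the gsub suggestion from the comments can be tried out in base R without a Twitter connection. This is only a minimal sketch: the vector name status_source and the result name source_labels are made up for illustration, and the values are copied from the question's output. The pattern "<[^>]+>" matches anything that starts with "<", ends with ">", and has no ">" in between, so both the opening and the closing tag are removed.

```r
# Sample values copied from the question's output
# (status_source is an illustrative name, not part of the original code)
status_source <- c(
  "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
  "<a href=\"http://www.facebook.com/twitter\" rel=\"nofollow\">Facebook</a>",
  "<a href=\"http://instagram.com\" rel=\"nofollow\">Instagram</a>",
  "<a href=\"http://www.hootsuite.com\" rel=\"nofollow\">Hootsuite</a>"
)

# Remove every "<...>" tag, leaving only the text between the tags
source_labels <- gsub("<[^>]+>", "", status_source)
print(source_labels)
```

A regex like this is fine for simple, well-formed anchors such as these; for messier HTML a real parser is probably the safer choice.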

Here is one approach using the rvest package.

x <- c("<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>", 
     "<a href=\"http://www.facebook.com/twitter\" rel=\"nofollow\">Facebook</a>", 
     "<a href=\"http://instagram.com\" rel=\"nofollow\">Instagram</a>", 
     "<a href=\"http://www.hootsuite.com\" rel=\"nofollow\">Hootsuite</a>") 


library(rvest) 

unname(sapply(x, FUN = function(m) html_text(html_nodes(read_html(m), "a")))) 
[1] "Twitter for iPhone" "Facebook"   "Instagram"   "Hootsuite"
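The question also asks for the number of entries per connection type. Once the labels are extracted, base R's table() should be enough for that. The sketch below assumes a hypothetical vector source_labels; the repeated values are invented purely so the counts are not all 1, and the column names source and n are arbitrary choices.

```r
# Hypothetical labels, as they might look after the tags have been stripped;
# the repeated values are made up so that the counts are non-trivial
source_labels <- c("Twitter for iPhone", "Facebook", "Twitter for iPhone",
                   "Instagram", "Hootsuite", "Twitter for iPhone", "Facebook")

# table() counts how often each distinct value occurs
counts <- table(source_labels)

# Sort from most to least frequent and convert to a data frame
counts_df <- as.data.frame(sort(counts, decreasing = TRUE),
                           stringsAsFactors = FALSE)
names(counts_df) <- c("source", "n")
print(counts_df)
```

With real data, source_labels would be replaced by the result of the extraction step above.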

Source

2017-03-06 19:33:01
