Getting a tapply result back onto the original data frame in R

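I have a data frame of firms exporting to different countries in different years. I need to create a variable that says how many firms there are in each country in each year. I can do this perfectly well with tapply, like so: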

incumbents <- tapply(id, `destination-year`, function(x) length(unique(x)))

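It works fine. My problem is that incumbents has one entry per destination-year combination, and I need it to have length(id), since there are many firms for each destination in each year, so that I can use it in a subsequent regression (matched by year and destination, of course). A for loop can do this, but it is very time-consuming because the data set is huge.

Any suggestions?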

Source

2012-02-13 Francisco Roldán

Sorry about the missing sample data... rookie mistake – 2012-02-13 23:45:54

You didn't provide a reproducible example, so I can't test this, but you should be able to use ave:

incumbents <- ave(id, `destination-year`, FUN=function(x) length(unique(x)))
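
As a quick check that ave returns one value per row rather than one per group, here is a minimal sketch on simulated data. The sample vectors are made up for illustration, the grouping is passed as two separate variables instead of a combined destination-year key, and id is assumed to be numeric, since ave writes its result into a copy of its first argument.

```r
set.seed(1)  # made-up sample data, for illustration only
n <- 20
id <- sample(1:5, n, replace = TRUE)
year <- sample(2000:2001, n, replace = TRUE)
destination <- sample(c("A", "B"), n, replace = TRUE)

# ave() splits id by destination and year, applies FUN within each group,
# and writes the result back in the original row order
incumbents <- ave(id, destination, year, FUN = function(x) length(unique(x)))

# One value per row, so it lines up with id in a later regression
length(incumbents) == length(id)
```

Note that FUN has to be passed by name, because every unnamed argument after the first is treated as a grouping variable.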

Source

2012-02-13 21:11:05

Works great. Thanks!! – 2012-02-13 23:37:35

Just merge the tapply summary back into the original data frame, using merge.

Since you didn't provide sample data, I made some up. Modify accordingly.

n = 1000
id = sample(1:10, n, replace=TRUE)
year = sample(2000:2011, n, replace=TRUE)
destination = sample(LETTERS[1:6], n, replace=TRUE)

`destination-year` = paste(destination, year, sep='-')

# data.frame() renames this column to destination.year (check.names=TRUE)
dat = data.frame(id, year, destination, `destination-year`)

Now build your summary. Note how I reformat it as a data frame and make the names match the original data.

incumbents = tapply(id, `destination-year`, function(x) length(unique(x))) 
incumbents = data.frame(`destination-year`=names(incumbents), incumbents)

Finally, merge it back in with the original data:

merge(dat, incumbents)
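
One thing to watch with this approach: merge sorts its result by the key column by default, so the rows come back in a different order from dat, although each row still carries the count for its own destination-year. A minimal self-contained sketch, using made-up data and a hypothetical key column named destYear (chosen only to avoid the hyphen):

```r
set.seed(42)  # made-up sample data
n <- 1000
id <- sample(1:10, n, replace = TRUE)
year <- sample(2000:2011, n, replace = TRUE)
destination <- sample(LETTERS[1:6], n, replace = TRUE)
destYear <- paste(destination, year, sep = "-")  # hypothetical key name
dat <- data.frame(id, year, destination, destYear)

# Summary with one entry per destination-year, turned into a data frame
tab <- tapply(id, destYear, function(x) length(unique(x)))
counts <- data.frame(destYear = names(tab), incumbents = as.vector(tab))

# Join on the shared key; the result has one row per original observation
merged <- merge(dat, counts, by = "destYear")
nrow(merged) == nrow(dat)
```

Naming the key explicitly with by is a design choice here: it keeps the join from silently picking up any other column names the two data frames happen to share.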

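By the way, instead of combining destination and year into a third variable, as it looks like you have done, tapply can handle the two variables directly as a list (melt is not part of base R; it is provided by the reshape2 package, among others):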

incumbents = melt(tapply(id, list(destination=destination, year=year), function(x) length(unique(x))))
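
If you would rather not load a package for melt, the same long layout can be produced in base R with as.table and as.data.frame. This is a sketch of a stand-in technique, not the code above; the sample data is made up, and the column name incumbents is chosen for illustration.

```r
set.seed(42)  # made-up sample data
n <- 1000
id <- sample(1:10, n, replace = TRUE)
year <- sample(2000:2011, n, replace = TRUE)
destination <- sample(LETTERS[1:6], n, replace = TRUE)
dat <- data.frame(id, year, destination, stringsAsFactors = FALSE)

# Two grouping variables give a destination x year matrix of counts
tab <- tapply(id, list(destination = destination, year = year),
              function(x) length(unique(x)))

# Base R stand-in for melt(): one row per destination-year cell
long <- as.data.frame(as.table(tab), responseName = "incumbents",
                      stringsAsFactors = FALSE)
long$year <- as.integer(long$year)  # the table stores year as text

# Join on both keys to get one count per original row
merged <- merge(dat, long, by = c("destination", "year"))
```

Cells for destination-year pairs that never occur come out as NA in tab and in long, but they match no row of dat, so they drop out of the merge.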

Source

2012-02-13 20:30:12

Using @JohnColby's excellent example data, I was thinking of something more along these lines:

#I prefer not to deal with the pesky '-' in a variable name 
destinationYear = paste(destination, year, sep='-') 

dat = data.frame(id, year, destination, destinationYear) 

library(plyr)  # ddply() comes from plyr
dat <- ddply(dat,.(destinationYear),transform,newCol = length(unique(id))) 

#Or if more speed is required, use data.table 
require(data.table) 
datTable <- data.table(dat) 

datTable <- datTable[,transform(.SD,newCol = length(unique(id))),by = destinationYear]
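
With data.table, a grouped column can also be added by reference with :=, which avoids building a transformed copy of .SD for each group. This is a sketch of that alternative idiom on made-up data, not a replacement for the code above:

```r
library(data.table)

set.seed(42)  # made-up sample data
n <- 1000
datTable <- data.table(
  id = sample(1:10, n, replace = TRUE),
  year = sample(2000:2011, n, replace = TRUE),
  destination = sample(LETTERS[1:6], n, replace = TRUE)
)

# := adds newCol in place; by groups on both columns directly,
# so no pasted destination-year key is needed
datTable[, newCol := length(unique(id)), by = .(destination, year)]
```

The rows stay in their original order, and newCol has one value per row.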

Source

2012-02-13 21:06:00 joran
