编辑:对不起,低质量的职位。我应该花更多的时间向你介绍这件事。该文章已被编辑,我已经为整个事情添加了一个工作语法示例。感谢迄今为止提供建议的所有人。使我的R功能更快
EDIT2:发现脚本在其他计算机上只有缓慢。可能是由一些本地问题或REPL引起的。
我做了这个功能。它生成带标签(标记为&天堂包)数据框中的值的频率表。它可以工作,但我打算在具有许多列的数据框架上使用它,并且我认为它运行得有点慢;用户可能会认为R运行超过100列时发生崩溃,所以我想加快速度。
该脚本的目的是产生输出,帮助我查找调查数据集中的处理错误。这有点费劲,因为我想知道答案的频率,并同时评估价值标签的形状。因此,此脚本为每个变量生成一个频率表,显示频率,未使用的标签以及没有值标签的值。在查看脚本的输出时,希望这会更清晰。
我将不胜感激,如果你能指出一些方法,使这个更高效:
# demonstration dataset
library(knitr)
library(data.table)
library(labelled)
df <- data.frame(q1 = rep(1:6, 3), q2 = rep(6:1, 3))
val_labels(df[, c("q1", "q2")]) <- c(YES = 1, MAYBE = 2, NO = 3, DK = 4, MISSING=5)
val_label(df$q2, 1) <- NULL
# Produce a frequency table over values and labels in a labelled-class dataframe object
# --------------------------------------------------------------------------------------------------
# Example: freqlab(ds[[1]]) or freqlab(ds[1:10]) or freqlab(ds)
# Wrong: freqlab(ds[1])
freqlab <- function(x){
# If the function is called on double brackets, eg. freqlab(ds[[11]])
if (!is.list(x)){
# Make a frequency distribution, put it in a data.table
xFreq <- data.table(table(x))
names(xFreq) <- c("Value", "Frequency")
class(xFreq[[1]]) <- "numeric"
setkey(xFreq, Value)
# Put the value labels in another data.table
if (!is.null(val_labels(x))){
xLab <- data.table(val_labels(x), names(val_labels(x)))
names(xLab) <- c("Value", "Label")
setkey(xLab, Value)
} else {
# If the variable does not have labels, create one to avoid errors
xLab <- data.table(xFreq[[1,1]], "** UNLABELLED **")
names(xLab) <- c("Value", "Label")
setkey(xLab, Value)
}
# Perform a FULL OUTER JOIN
outTable <- merge(xFreq, xLab, all = TRUE)
# Arrange values in ascending order of absolute value
outTable <- arrange(outTable, abs(outTable[[1]]))
# Edit the Label column for value cases with no label
outTable[[2]][is.na(outTable[[2]])] <- 0
outTable[[3]][is.na(outTable[[3]])] <- "** UNLABELLED **"
# If the output has more than 25 rows, cut it short
if (dim(outTable)[1] > 25){
outTable <- outTable[1:25]
}
# Output the table
print(kable(outTable, format = "rst", align = "l"))
# If the function is called on a list of variables, eg. freqlab(ds[10:11]),
# do the same steps as above, looping through all the input variables
} else {
for (y in 1:length(x)){
xFreq <- data.table(table(x[[y]]))
names(xFreq) <- c("Value", "Frequency")
class(xFreq[[1]]) <- "numeric"
setkey(xFreq, Value)
if (!is.null(val_labels(x[[y]]))){
xLab <- data.table(val_labels(x[[y]]), names(val_labels(x[[y]])))
names(xLab) <- c("Value", "Label")
setkey(xLab, Value)
} else {
xLab <- data.table(xFreq[[1,1]], "** UNLABELLED **")
names(xLab) <- c("Value", "Label")
setkey(xLab, Value)
}
outTable <- merge(xFreq, xLab, all = TRUE)
outTable <- arrange(outTable, abs(outTable[[1]]))
outTable[[2]][is.na(outTable[[2]])] <- 0
outTable[[3]][is.na(outTable[[3]])] <- "** UNLABELLED **"
if (dim(outTable)[1] > 25){
outTable <- outTable[1:25]
}
# Extra information printed when function is called on a list of variables
cat("Name:\t", names(x[y]),"\n")
print(kable(outTable, format = "rst", align = "l"))
cat(rep("-", 80), sep='', "\n\n")
}
}
}
输出的例子:
> freqlab(df)
Name: q1
===== ========= ================
Value Frequency Label
===== ========= ================
1 3 YES
2 3 MAYBE
3 3 NO
4 3 DK
5 3 MISSING
6 3 ** UNLABELLED **
===== ========= ================
--------------------------------------------------------------------------------
Name: q2
===== ========= ================
Value Frequency Label
===== ========= ================
1 3 ** UNLABELLED **
2 3 MAYBE
3 3 NO
4 3 DK
5 3 MISSING
6 3 ** UNLABELLED **
===== ========= ================
--------------------------------------------------------------------------------
您可以尝试对其进行分析(使用RStudio)以查看您必须改进的部分。 –
关于大'for'循环,是否可以使用'apply'函数呢? –
你的功能是做什么的?如果你想把一个频率表写入一个数据帧,为什么不只是'as.data.frame(table(x))'? –