2016-09-26 43 views
2

我想使用ifelse语句对数据表进行子集化,但是我没有得到我正在查找的结果。在旧数据表上基于ifelse创建新数据表

我最初的数据表看起来像这样:

head(Data_copy, n = 18) 

    Company  Date  DOW variable value Year Month End_of_Month 
1: ASXRI 1991-09-06 Friday  RI NA 1991 Sep   0 
2: ASXRI 1991-09-09 Monday  RI NA 1991 Sep   0 
3: ASXRI 1991-09-10 Tuesday  RI NA 1991 Sep   0 
4: ASXRI 1991-09-11 Wednesday  RI NA 1991 Sep   0 
5: ASXRI 1991-09-12 Thursday  RI NA 1991 Sep   0 
6: ASXRI 1991-09-13 Friday  RI NA 1991 Sep   0 
7: ASXRI 1991-09-16 Monday  RI NA 1991 Sep   0 
8: ASXRI 1991-09-17 Tuesday  RI NA 1991 Sep   0 
9: ASXRI 1991-09-18 Wednesday  RI NA 1991 Sep   0 
10: ASXRI 1991-09-19 Thursday  RI NA 1991 Sep   0 
11: ASXRI 1991-09-20 Friday  RI NA 1991 Sep   0 
12: ASXRI 1991-09-23 Monday  RI NA 1991 Sep   0 
13: ASXRI 1991-09-24 Tuesday  RI NA 1991 Sep   0 
14: ASXRI 1991-09-25 Wednesday  RI NA 1991 Sep   0 
15: ASXRI 1991-09-26 Thursday  RI NA 1991 Sep   0 
16: ASXRI 1991-09-27 Friday  RI NA 1991 Sep   0 
17: ASXRI 1991-09-30 Monday  RI NA 1991 Sep   1 
18: ASXRI 1991-10-01 Tuesday  RI NA 1991 Oct   0 

这是18行了25万。

我想要的是如下分裂基础上的ifelse功能此数据表:

Data1 <- ifelse("Weekly" == "Weekly", Data_copy[End_of_Month ==1,], Data_copy) 

*“周刊” ==“周刊”位将是在一个函数以后使用。

我想让Data1成为一个只包含End_of_Month == 1的行的新数据表。

当我运行上面的代码,我发现我得到名单的公司名称,就是这样。

我会告诉你的输出是什么样子:

Data1[[1]] 
    [1] "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" 

现在,如果我滚动进一步下跌,我得到:

[1387] "AANRI" "AANRI" "AANRI" "AANRI" "AANRI" "AANRI" "APARI" "APARI" "APARI" "APARI" "APARI" 
[1398] "APARI" "APARI" "APARI" "APARI" "APARI" "APARI" "APARI" "APARI" "APARI" "APARI" "APARI" 

这些条目中的每一个仅仅是公司的名字之一。

我得到我想要的结果,如果我做的:

Data2 <- Data_copy[End_of_Month == 1, ] 

Company  Date  DOW variable value Year Month End_of_Month 
1: ASXRI 1991-09-30 Monday  RI NA 1991 Sep   1 
2: ASXRI 1991-10-31 Thursday  RI NA 1991 Oct   1 
3: ASXRI 1991-11-29 Friday  RI NA 1991 Nov   1 
4: ASXRI 1991-12-31 Tuesday  RI NA 1991 Dec   1 
5: ASXRI 1992-01-31 Friday  RI NA 1992 Jan   1 
6: ASXRI 1992-02-28 Friday  RI NA 1992 Feb   1 

从本质上讲,我想重复数据2,但使用的ifelse语句。

这里的第一个100行:

dput(head(Data_copy, n = 100)) 
structure(list(Company = c("ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI"), Date = structure(c(7918, 
7921, 7922, 7923, 7924, 7925, 7928, 7929, 7930, 7931, 7932, 7935, 
7936, 7937, 7938, 7939, 7942, 7943, 7944, 7945, 7946, 7949, 7950, 
7951, 7952, 7953, 7956, 7957, 7958, 7959, 7960, 7963, 7964, 7965, 
7966, 7967, 7970, 7971, 7972, 7973, 7974, 7977, 7978, 7979, 7980, 
7981, 7984, 7985, 7986, 7987, 7988, 7991, 7992, 7993, 7994, 7995, 
7998, 7999, 8000, 8001, 8002, 8005, 8006, 8007, 8008, 8009, 8012, 
8013, 8014, 8015, 8016, 8019, 8020, 8021, 8022, 8023, 8026, 8027, 
8028, 8029, 8030, 8033, 8034, 8035, 8036, 8037, 8040, 8041, 8042, 
8043, 8044, 8047, 8048, 8049, 8050, 8051, 8054, 8055, 8056, 8057 
), class = "Date"), DOW = c("Friday", "Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", 
"Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", 
"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", 
"Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", 
"Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", 
"Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", 
"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", 
"Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", 
"Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", 
"Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", 
"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", 
"Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", 
"Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday" 
), variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("RI", 
"VO", "MV", "TD", "ND"), class = "factor"), value = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_), Year = c("1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1992", "1992", "1992", 
"1992", "1992", "1992", "1992", "1992", "1992", "1992", "1992", 
"1992", "1992", "1992", "1992", "1992", "1992"), Month = c("Sep", 
"Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", 
"Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Oct", "Oct", 
"Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", 
"Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", 
"Oct", "Oct", "Oct", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov", 
"Nov", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov", 
"Nov", "Nov", "Nov", "Nov", "Nov", "Nov", "Dec", "Dec", "Dec", 
"Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", 
"Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", 
"Dec", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", 
"Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan" 
), End_of_Month = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0)), .Names = c("Company", "Date", "DOW", "variable", "value", 
"Year", "Month", "End_of_Month"), class = c("data.table", "data.frame" 
), row.names = c(NA, -100L), .internal.selfref = <pointer: 0x00000000001f0788>) 
+0

我想复制您的初始data.table – jangorecki

+0

@jangorecki如果您想尝试添加了一些数据。 –

+0

走出你的方式来使用'ifelse'通常是一个坏主意。尽管函数的语法很好,但它有很多缺点和局限性,所以我会坚持使用'Data_copy [End_of_Month == 1]'的方法。也许我错过了一些东西,因为你没有说你为什么要在这里使用'ifelse'。 – Frank

回答

2

其他用户已经注意到ifelse是不适合你的目的。解释原因可能很有用。从?ifelseifelse(test, yes, no)返回相同长度的

向量和属性(包括尺寸 和“‘类’”),为“测试”和数据值从“是” “不”

的值或

换言之,如果您的test矢量是长度为1,ifelse(...)将返回长度1。例如矢量,

> ifelse(TRUE, 1:3, 7:9) 
[1] 1 
> ifelse(c(TRUE, FALSE), 1:3, 7:9) 
[1] 1 8 

在你的情况下,

ifelse("Weekly" == "Weekly", Data_copy[End_of_Month ==1,], Data_copy) 

将返回长度为1的向量。更确切地说,由于测试返回TRUE,ifelse将返回您的yes参数中的第一个元素;因为它是一个数据帧(一种列表),所以ifelse返回数据帧的第一个元素,它是第一列。这就是为什么你会得到公司名称的列表。如果你真的想使用ifelse建设,努力

ifelse("Weekly" == "Weekly", list(Data_copy[End_of_Month ==1,]), list(Data_copy)) 

尽管如其他人所说的,你可能会更好使用if {} else {}