2017-08-12

Subsetting a data.frame gives some weird output in R

Here is the file I use:

https://d37djvu3ytnwxt.cloudfront.net/assets/courseware/v1/ccdc87b80d92a9c24de2f04daec5bb58/asset-v1:[email protected]+block/WHO.csv

After reading it into R, the data has 194 obs. of 13 variables.

> str(WHO) 
'data.frame': 194 obs. of 13 variables: 
$ Country      : Factor w/ 194 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 10 ... 
$ Region      : Factor w/ 6 levels "Africa","Americas",..: 3 4 1 4 1 2 2 4 6 4 ... 
$ Population     : int 29825 3162 38482 78 20821 89 41087 2969 23050 8464 ... 
$ Under15      : num 47.4 21.3 27.4 15.2 47.6 ... 
$ Over60      : num 3.82 14.93 7.17 22.86 3.84 ... 
$ FertilityRate    : num 5.4 1.75 2.83 NA 6.1 2.12 2.2 1.74 1.89 1.44 ... 
$ LifeExpectancy    : int 60 74 73 82 51 75 76 71 82 81 ... 
$ ChildMortality    : num 98.5 16.7 20 3.2 163.5 ... 
$ CellularSubscribers   : num 54.3 96.4 99 75.5 48.4 ... 
$ LiteracyRate     : num NA NA NA NA 70.1 99 97.8 99.6 NA NA ... 
$ GNI       : num 1140 8820 8310 NA 5230 ... 
$ PrimarySchoolEnrollmentMale : num NA NA 98.2 78.4 93.1 91.1 NA NA 96.9 NA ... 
$ PrimarySchoolEnrollmentFemale: num NA NA 96.4 79.4 78.2 84.5 NA NA 97.5 NA ... 

But the result of the subset function differs from the result of subsetting with df[, ], as in the examples below.

> Outliers <- WHO[WHO$GNI > 10000 & WHO$FertilityRate > 2.5,] 
> nrow(Outliers) 
[1] 27 
> Outliers 
Country    Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers 
NA     <NA>     <NA>   NA  NA  NA   NA    NA    NA     NA 
23    Botswana    Africa  2004 33.75 5.63   2.71    66   53.3    142.82 
NA.1    <NA>     <NA>   NA  NA  NA   NA    NA    NA     NA 
NA.2    <NA>     <NA>   NA  NA  NA   NA    NA    NA     NA 
(trimmed ...) 

There are a lot of NA obs.

Using the subset function, however, yields the correct result.
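
A minimal sketch of where the NA rows probably come from, using a small made-up data frame (the values below are hypothetical, not taken from WHO.csv). A comparison against NA yields NA, `TRUE & NA` is NA as well, and `[` returns an all-NA row for every NA in a logical row index:

```r
# Toy data (hypothetical values, NOT from WHO.csv)
df <- data.frame(
  Country       = c("A", "B", "C", "D"),
  GNI           = c(12000, NA, 15000, 500),
  FertilityRate = c(3.0, 3.0, NA, NA)
)

cond <- df$GNI > 10000 & df$FertilityRate > 2.5
print(cond)
# Row A: TRUE  & TRUE -> TRUE
# Row B: NA    & TRUE -> NA
# Row C: TRUE  & NA   -> NA
# Row D: FALSE & NA   -> FALSE (FALSE & anything is FALSE)

# Each NA in the index produces a row filled with NA
res <- df[cond, ]
print(res)
print(nrow(res))
```

Here only row A satisfies the condition, but the result has three rows because rows B and C contribute one NA row each.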

> Outliers <- subset(WHO, GNI > 10000 & FertilityRate > 2.5) 
> nrow(Outliers) 
[1] 7 
> Outliers 
      Country    Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers 
23   Botswana    Africa  2004 33.75 5.63   2.71    66   53.3    142.82 
56 Equatorial Guinea    Africa  736 38.95 4.53   5.04    54   100.3    59.15 
63    Gabon    Africa  1633 38.49 7.38   4.18    62   62.0    117.32 
83    Israel    Europe  7644 27.53 15.15   2.92    82   4.2    121.66 
88   Kazakhstan    Europe  16271 25.46 10.04   2.52    67   18.7    155.74 
131   Panama    Americas  3802 28.65 10.13   2.52    77   18.5    188.60 
150  Saudi Arabia Eastern Mediterranean  28288 29.69 4.59   2.76    76   8.6    191.24 
(trimmed ...) 
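
The help page for `subset` states that missing values in the condition are taken as false, which would account for the different row counts. The sketch below, again on made-up data (hypothetical values, not from WHO.csv), counts TRUE and NA entries in the condition and compares them with the two row counts. If the same logic holds for the WHO data, 20 of the 27 rows returned by `[` (27 minus 7) would come from NA conditions.

```r
# Toy data (hypothetical values, NOT from WHO.csv)
df <- data.frame(
  GNI           = c(12000, NA, 15000, 500, 20000),
  FertilityRate = c(3.0, 3.0, NA, NA, 1.5)
)

cond <- df$GNI > 10000 & df$FertilityRate > 2.5

n_true    <- sum(cond, na.rm = TRUE)  # rows where the condition is TRUE
n_na      <- sum(is.na(cond))         # rows where the condition is NA
n_bracket <- nrow(df[cond, ])         # `[` keeps TRUE rows and adds NA rows
n_subset  <- nrow(subset(df, GNI > 10000 & FertilityRate > 2.5))

print(c(true = n_true, na = n_na, bracket = n_bracket, subset = n_subset))
```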

Hope this link helps: https://stackoverflow.com/questions/40446165/how-to-subset-data-in-r-without-losing-na-rows – Wen


Thanks, that is a clear answer. –

Answer


How about making sure you get rid of the NAs first?

Outliers <- WHO[!is.na(WHO$GNI) & WHO$GNI > 10000 & 
!is.na(WHO$FertilityRate) & WHO$FertilityRate > 2.5,] 
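
A shorter alternative, sketched here on made-up data (hypothetical values, not from WHO.csv), is to wrap the condition in `which()`. It returns only the positions where the condition is TRUE, so NA entries never reach `[`:

```r
# Toy data (hypothetical values, NOT from WHO.csv)
df <- data.frame(
  Country       = c("A", "B", "C", "D"),
  GNI           = c(12000, NA, 15000, 500),
  FertilityRate = c(3.0, 3.0, NA, NA)
)

cond <- df$GNI > 10000 & df$FertilityRate > 2.5

# which() drops both FALSE and NA, keeping only the TRUE positions
by_which  <- df[which(cond), ]
by_subset <- subset(df, GNI > 10000 & FertilityRate > 2.5)

print(by_which)
print(by_subset)
```

On this toy data both approaches keep the same single row. One thing to watch with `which()`: negating the index (`df[-which(cond), ]`) returns zero rows when nothing matches, so it is safer for selecting rows than for dropping them.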

Thanks, so when subsetting with **[** one has to take care of **NA**. –