2016-05-15 40 views

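Plotting PCA against one dimension in R

I have a dataset with 10 dimensions as features and 1 dimension as the cluster number (11 dimensions in total). How can I plot the PCA of my data (PC1) against the cluster number using R?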

qplot(x = not_null_df$TSC_8125, y = pca, data = subset(not_null_df, select = c (not_null_df$AVG_ERTEBAT,not_null_df$AVG_ROSHD,not_null_df$AVG_HOGHOGH,not_null_df$AVG_MM,not_null_df$AVG_MK,not_null_df$AVG_TM,not_null_df$AVG_VEJHE,not_null_df$AVG_ANGIZEH,not_null_df$AVG_TAHOD)), main = "Loadings for PC1", xlab = "cluster number") 

This is the code I wrote, and it gives me this error:

Don't know how to automatically pick scale for object of type princomp. Defaulting to continuous. 
Error: Aesthetics must be either length 1 or the same as the data (564): x, y 

summary(not_null_df) 
    ï..QN   NAMECODE  GENDER  VAZEYATTAAHOL  TAHSILAT   SEN   SABEGHE  
Min. : 1.00 Min. : 1.0 Min. :1.000 Min. :1.00 Min. :1.000 Min. :1.000 Min. :1.000 
1st Qu.: 28.00 1st Qu.:11.0 1st Qu.:1.000 1st Qu.:1.75 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.000 
Median : 60.00 Median :13.0 Median :1.000 Median :2.00 Median :3.000 Median :1.000 Median :1.000 
Mean : 68.63 Mean :11.7 Mean :1.152 Mean :1.75 Mean :2.578 Mean :1.394 Mean :1.121 
3rd Qu.:103.25 3rd Qu.:14.0 3rd Qu.:1.000 3rd Qu.:2.00 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:1.000 
Max. :190.00 Max. :16.0 Max. :2.000 Max. :2.00 Max. :3.000 Max. :3.000 Max. :3.000 
    AVG_ERTEBAT  AVG_ROSHD  AVG_HOGHOGH   AVG_MM   AVG_MK   AVG_TM   AVG_VEJHE  
Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 
1st Qu.: 5.333 1st Qu.: 4.125 1st Qu.: 1.750 1st Qu.: 5.000 1st Qu.: 3.125 1st Qu.: 5.981 1st Qu.: 4.556 
Median : 7.000 Median : 5.875 Median : 3.500 Median : 7.727 Median : 5.000 Median : 8.000 Median : 6.333 
Mean : 6.730 Mean : 5.787 Mean : 4.001 Mean : 6.903 Mean : 4.890 Mean : 7.390 Mean : 6.095 
3rd Qu.: 8.425 3rd Qu.: 7.656 3rd Qu.: 6.000 3rd Qu.: 9.182 3rd Qu.: 6.688 3rd Qu.: 9.204 3rd Qu.: 7.778 
Max. :10.000 Max. :10.000 Max. :10.000 Max. :10.000 Max. :10.000 Max. :10.000 Max. :10.000 
    AVG_ANGIZEH  AVG_TAHOD  AVG_SOALAT  TSC_8125   avg  
Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. :1.000 Min. :0.000 
1st Qu.: 5.000 1st Qu.: 5.833 1st Qu.: 4.000 1st Qu.:1.000 1st Qu.:4.788 
Median : 7.000 Median : 7.667 Median : 7.000 Median :2.000 Median :6.301 
Mean : 6.549 Mean : 7.171 Mean : 6.025 Mean :2.046 Mean :6.154 
3rd Qu.: 8.750 3rd Qu.: 9.000 3rd Qu.: 8.000 3rd Qu.:3.000 3rd Qu.:7.599 
Max. :10.000 Max. :10.000 Max. :10.000 Max. :3.000 Max. :9.978 

I can compute the PCA with this code:

pca <- princomp(not_null_df, cor=TRUE, scores=TRUE) 

summary(pca) 
Importance of components: 
         Comp.1  Comp.2  Comp.3  Comp.4  Comp.5  Comp.6  Comp.7  Comp.8  Comp.9 
Standard deviation  2.887437 1.28937443 1.12619079 1.08816449 0.98432226 0.91257779 0.90980017 0.82303807 0.74435256 
Proportion of Variance 0.438805 0.08749929 0.06675293 0.06232116 0.05099423 0.04383149 0.04356507 0.03565219 0.02916109 
Cumulative Proportion 0.438805 0.52630426 0.59305720 0.65537835 0.70637258 0.75020406 0.79376914 0.82942133 0.85858242 
          Comp.10 Comp.11 Comp.12 Comp.13 Comp.14 Comp.15 Comp.16 Comp.17  Comp.18 
Standard deviation  0.70304085 0.67709130 0.62905993 0.59284646 0.50799135 0.48013732 0.4476952 0.39317004 0.378722707 
Proportion of Variance 0.02601402 0.02412909 0.02082718 0.01849826 0.01358185 0.01213325 0.0105490 0.00813593 0.007548994 
Cumulative Proportion 0.88459644 0.90872553 0.92955271 0.94805097 0.96163282 0.97376607 0.9843151 0.99245101 1.000000000 
          Comp.19 
Standard deviation  1.838143e-08 
Proportion of Variance 1.778301e-17 
Cumulative Proportion 1.000000e+00 

My goal is to plot the PCA (Comp.1 only) against TSC_8125 (i.e. the cluster number).

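I would check whether your `subset` statement returns what you think it does. – user20650

Do you think it is a subset problem? – aliakbarian

How do I access PC1? In other words, how can I use PC1 instead of pca in qplot? – aliakbarian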

Answer


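The function princomp() returns a list with 7 elements: sdev, loadings, center, scale, n.obs, scores, and call. Their descriptions are on the function's help page (type ?princomp to open it). Given the purpose of your plot, the element of interest here is probably scores.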

scores: the scores of the supplied data on the principal components.

loadings: the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors).

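The easiest way to access a list element is the $ operator, so pca$scores and pca$loadings give you the scores and the loadings, respectively. Both are matrix-like, with one column per principal component (the first column is the first principal component, and so on).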
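As a quick illustration of the elements described above, here is a minimal sketch that runs princomp() on the built-in USArrests data. That dataset and the name pca_demo are stand-ins chosen for this example; they are not part of the question's data.

```r
# USArrests (built into R) stands in for not_null_df; this is an assumption
# made only so the example is self-contained.
pca_demo <- princomp(USArrests, cor = TRUE, scores = TRUE)

# The seven list elements returned by princomp()
print(names(pca_demo))

# scores: one row per observation, one column per principal component
print(dim(pca_demo$scores))

# Scores of the first few observations on the first principal component
print(head(pca_demo$scores[, 1]))

# Loadings of each original variable on the first principal component
print(pca_demo$loadings[, 1])
```

Column indexing works the same way on both elements, which is why `[, 1]` picks out the first principal component in each case.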

So, to get the scores on the first principal component, you can use

comp.1 <- pca$scores[,1] 

To plot them against the cluster number, you can use

plot (comp.1 ~ not_null_df$TSC_8125) 

or, if you prefer to draw the plot with qplot, use

qplot(x = not_null_df$TSC_8125, y = comp.1, main = "Scores for PC1", xlab = "cluster number")
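
For reference, the whole workflow can be sketched end to end on data anyone can run. This is only an illustration under stated assumptions: the built-in iris measurements stand in for the feature columns, and a kmeans() assignment stands in for TSC_8125. Running the PCA on the feature columns only (leaving the cluster column out) is a suggestion of this sketch, not something the original code does. Base R graphics are used so that no extra package is needed.

```r
# iris measurements stand in for the feature columns (assumption).
features <- iris[, 1:4]

# A k-means assignment stands in for the cluster-number column TSC_8125 (assumption).
set.seed(1)
cluster <- kmeans(scale(features), centers = 3)$cluster

# Run the PCA on the features only, so the cluster label does not
# influence the components it is later compared against.
pca_demo <- princomp(features, cor = TRUE, scores = TRUE)
pc1 <- pca_demo$scores[, 1]

# pc1 is a plain numeric vector with one value per row, which is what a
# plotting function needs (the princomp object itself is not).
plot_df <- data.frame(cluster = factor(cluster), pc1 = pc1)

# One box per cluster, showing the distribution of PC1 scores
boxplot(pc1 ~ cluster, data = plot_df,
        main = "Scores for PC1",
        xlab = "cluster number", ylab = "PC1 score")
```

Treating the cluster number as a factor gives one box per cluster rather than a continuous x axis, which tends to read better when there are only a few cluster labels.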