2017-10-19 25 views
-1

我有一个这样的格式的数据,为了能够可视化数据我需要将格式更改为像第二个例子,你知道我可以如何改变这样的格式?如何转换我的数据R

The first row is Age range 
0–14 15–24 25–34 35–44 45–54 55–64 65 years and over 
1,873.4 1,088.4 1,296.4 1,157.2 1,207.5 1,177.5 1,498.7 
513.0 351.8 339.1 419.1 485.0 624.1 925.7 
1,049.9 666.4 594.2 682.9 645.7 650.2 727.1 
422.6 287.7 354.1 344.9 400.6 411.5 528.3 
2,069.1 1,234.7 1,429.0 1,310.3 1,323.1 1,229.6 1,514.9 
178.0 306.8 253.8 248.9 178.5 75.2 42.1  
2,247.2 1,541.5 1,682.9 1,559.2 1,501.5 1,304.8 1,557.0 

如何,我可以将我的数据是这样的:

Age  Count 
0-14 1,873.4 
15-24 1,088.4 
25-34 1,296.4 
35-44 1,157.2 
45-54 1,207.5 
55-64 1,177.5 
65+  1,498.7 
0-14 513.0 
15-24 351.8 
25-34 339.1 
35-44 419.1 
45-54 485.0 
55-64 624.1 
65+  925.7 
0-14 1,049.0 
15-24 666.4 
25-34 594.2 
35-44 682.9 
45-54 645.7 
55-64 650.2 
65+  727.1 
0-14 422.6 
15-24 287.7 
25-34 354.1 
35-44 344.9 
45-54 400.6 
55-64 411.5 
65+  528.3 
+4

请使用'dput'分享你的数据 - 现在它的结构尚不清楚。只需将'dput(your_data)'粘贴到问题中即可。 – Gregor

+1

这个字符是“1,873.4”吗? – PoGibas

回答

-1

如果您的表被命名为tab,我会做:

# Load library 
    library(reshape) 

# Use melt() function 
    melt.tab <- melt(tab) 

你要找的是第二和第三行melt.tab;那就是:

melt.tab[,-1] 
0

我会使用reshape2::melt但不同。 首先数据,格式为dput

Tamra <- 
structure(list(`0–14` = c("1,873.4", "513.0", "1,049.9", "422.6", 
"2,069.1", "178.0", "2,247.2"), `15–24` = c("1,088.4", "351.8", 
"666.4", "287.7", "1,234.7", "306.8", "1,541.5"), `25–34` = c("1,296.4", 
"339.1", "594.2", "354.1", "1,429.0", "253.8", "1,682.9"), `35–44` = c("1,157.2", 
"419.1", "682.9", "344.9", "1,310.3", "248.9", "1,559.2"), `45–54` = c("1,207.5", 
"485.0", "645.7", "400.6", "1,323.1", "178.5", "1,501.5"), `55–64` = c("1,177.5", 
"624.1", "650.2", "411.5", "1,229.6", "75.2", "1,304.8"), `65+` = c("1,498.7", 
"925.7", "727.1", "528.3", "1,514.9", "42.1", "1,557.0")), .Names = c("0–14", 
"15–24", "25–34", "35–44", "45–54", "55–64", "65+"), class = "data.frame", row.names = c(NA, 
-7L)) 

现在的代码。

molten <- reshape2::melt(Tamra, measure.vars = names(Tamra), 
         variable.name = "Age", value.name = "Count") 
#head(molten) 
# Age Count 
#1 0–14 1,873.4 
#2 0–14 513.0 
#3 0–14 1,049.9 
#4 0–14 422.6 
#5 0–14 2,069.1 
#6 0–14 178.0 
0

解决方案与dplyr + tidyr

library(dplyr) 
library(tidyr) 
df %>% 
    gather(AgeRange, count) %>% 
    mutate(count = as.numeric(gsub(",", "", count))) %>% 
    arrange(rep(1:nrow(df), ncol(df))) 

结果:

AgeRange count 
1  0–14 1873.4 
2  15–24 1088.4 
3  25–34 1296.4 
4  35–44 1157.2 
5  45–54 1207.5 
6  55–64 1177.5 
7  65+ 1498.7 
8  0–14 513.0 
9  15–24 351.8 
10 25–34 339.1 
11 35–44 419.1 
12 45–54 485.0 
13 55–64 624.1 
14  65+ 925.7 
15  0–14 1049.9 
16 15–24 666.4 
17 25–34 594.2 
18 35–44 682.9 
19 45–54 645.7 
20 55–64 650.2 
... 

注:

我还添加了mutate步到counts列转换为numeric

数据:

df = structure(list(`0–14` = c("1,873.4", "513.0", "1,049.9", "422.6", 
"2,069.1", "178.0", "2,247.2"), `15–24` = c("1,088.4", "351.8", 
"666.4", "287.7", "1,234.7", "306.8", "1,541.5"), `25–34` = c("1,296.4", 
"339.1", "594.2", "354.1", "1,429.0", "253.8", "1,682.9"), `35–44` = c("1,157.2", 
"419.1", "682.9", "344.9", "1,310.3", "248.9", "1,559.2"), `45–54` = c("1,207.5", 
"485.0", "645.7", "400.6", "1,323.1", "178.5", "1,501.5"), `55–64` = c("1,177.5", 
"624.1", "650.2", "411.5", "1,229.6", "75.2", "1,304.8"), `65+` = c("1,498.7", 
"925.7", "727.1", "528.3", "1,514.9", "42.1", "1,557.0")), .Names = c("0–14", 
"15–24", "25–34", "35–44", "45–54", "55–64", "65+"), class = "data.frame", row.names = c(NA, 
-7L))