2017-10-12
2

How to group and fill rows

I'm new to R and need to group rows by similarity and fill them in. I have a data frame that looks like this:

Name   Job   Gender City 
California NA   NA  1 
Bob   plumber  M  0 
Carol  nurse   F  0 
Chicago  NA   NA  1 
Tom   Chef   M  0 
Ann   Swimmer  F  0 
Joy   Police  F  0 

The data frame I need looks like this:

Name   Job   Gender City 
Bob   plumber  M  California 
Carol  nurse   F  California 
Tom   Chef   M  Chicago 
Ann   Swimmer  F  Chicago 
Joy   Police  F  Chicago 

Apologies if this is similar to a question that has already been asked; I'm very new to this. Thanks!

Answers

0

Using zoo, with the steps broken down:

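library(zoo)
# Assumes Name is a character column (not a factor); header rows are flagged by City == 1
dat1$City[dat1$City == 1] <- dat1$Name[dat1$City == 1]  # copy the city name into City on header rows
dat1$City[dat1$City == 0] <- NA                         # blank out City on the person rows
dat1$City <- na.locf(dat1$City)                         # carry the last city name forward
dat1 <- dat1[!is.na(dat1$Gender), ]                     # drop the header rows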

dat1 
    Name  Job Gender  City 
2 Bob plumber  M California 
3 Carol nurse  F California 
5 Tom Chef  M Chicago 
6 Ann Swimmer  F Chicago 
7 Joy Police  F Chicago 
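
To see what the na.locf step does on its own, here is a minimal sketch on a standalone vector. The vector `city` is a hypothetical stand-in for the City column after the first two assignments, not something taken from the answer.

```r
library(zoo)

# Hypothetical stand-in for dat1$City once header rows hold the city name
# and person rows hold NA
city <- c("California", NA, NA, "Chicago", NA, NA, NA)

# "Last observation carried forward": each NA takes the most recent non-NA value
filled <- na.locf(city)
filled
# "California" "California" "California" "Chicago" "Chicago" "Chicago" "Chicago"
```

The header rows themselves are still present after this step, which is why the answer then filters on is.na(Gender).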
+0

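This works exactly the way I need it to! The only problem is that I'm left with rows that look like this: California NA NA California, Chicago NA NA Chicago. Is there a way for me to get rid of those? Thank you so much for your help! – maggietron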

+0

@maggietron So do you need to keep them or remove them? – Wen

1

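Assuming each group begins where City equals 1, and using the NA values in the Job (or Gender) column to find the header rows, we can do the following.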

na.omit(transform(df, City = Name[is.na(Job)][cumsum(City)])) 
# Name  Job Gender  City 
# 2 Bob plumber  M California 
# 3 Carol nurse  F California 
# 5 Tom Chef  M Chicago 
# 6 Ann Swimmer  F Chicago 
# 7 Joy Police  F Chicago 
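
The one-liner above packs three steps together. The following sketch unpacks them into intermediate variables; the names `city_names`, `group_id`, and `filled` are introduced here for illustration only, and the data are rebuilt with character columns instead of factors to keep the output easy to read.

```r
# Rebuild the example data (character columns; an assumption made for readability)
df <- data.frame(
  Name   = c("California", "Bob", "Carol", "Chicago", "Tom", "Ann", "Joy"),
  Job    = c(NA, "plumber", "nurse", NA, "Chef", "Swimmer", "Police"),
  Gender = c(NA, "M", "F", NA, "M", "F", "F"),
  City   = c(1L, 0L, 0L, 1L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

city_names <- df$Name[is.na(df$Job)]  # names found on the header rows
group_id   <- cumsum(df$City)         # 1 1 1 2 2 2 2: increases at every header row
filled     <- city_names[group_id]    # look up one city name per row

# Replace City with the looked-up names, then drop the header rows (they contain NA)
result <- na.omit(transform(df, City = filled))
result
```

Because `cumsum(City)` only increases on a header row, every person row inherits the index of the most recent header, which is what turns the 0/1 flag into a group label.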

Data:

df <- structure(list(Name = structure(c(3L, 2L, 4L, 5L, 7L, 1L, 6L), .Label = c("Ann", 
"Bob", "California", "Carol", "Chicago", "Joy", "Tom"), class = "factor"), 
    Job = structure(c(NA, 3L, 2L, NA, 1L, 5L, 4L), .Label = c("Chef", 
    "nurse", "plumber", "Police", "Swimmer"), class = "factor"), 
    Gender = structure(c(NA, 2L, 1L, NA, 2L, 1L, 1L), .Label = c("F", 
    "M"), class = "factor"), City = c(1L, 0L, 0L, 1L, 0L, 0L, 
    0L)), .Names = c("Name", "Job", "Gender", "City"), class = "data.frame", row.names = c(NA, 
-7L)) 
+0

Thanks for your help - this completely eliminates the "City" variable, and it doesn't group them – maggietron

+1

@maggietron - This gives exactly your desired result. 'cumsum(City)' creates the groups. –

+1

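Technically, it doesn't eliminate "City" at all. It's in 'df'. It has been transformed to hold the values '['California', 'Chicago']'. It matches your test case perfectly. Don't blame @RichScriven for a bad spec. –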

0

Yet another way you might approach this is to use fill from the tidyr package.

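I've added a new variable City_Name and kept the original City variable (since the OP's comments suggest they may still want to see that information in the result).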

library(dplyr) 
library(tidyr) 

df %>% 
    mutate(City_Name = if_else(City == 1, Name, NA_character_)) %>% 
    fill(City_Name) %>% 
    filter(City == 0) 
#> Name  Job Gender City City_Name 
#> 1 Bob plumber  M 0 California 
#> 2 Carol nurse  F 0 California 
#> 3 Tom Chef  M 0 Chicago 
#> 4 Ann Swimmer  F 0 Chicago 
#> 5 Joy Police  F 0 Chicago 
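
If the extra column is not wanted, one possible variation (a sketch, not part of the original answer) is to finish the pipeline with select, which can drop the 0/1 flag and rename City_Name to City in a single step so the result has the layout shown in the question.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, with character columns
df <- data.frame(
  Name   = c("California", "Bob", "Carol", "Chicago", "Tom", "Ann", "Joy"),
  Job    = c(NA, "plumber", "nurse", NA, "Chef", "Swimmer", "Police"),
  Gender = c(NA, "M", "F", NA, "M", "F", "F"),
  City   = c(1L, 0L, 0L, 1L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(City_Name = if_else(City == 1, Name, NA_character_)) %>%  # name on header rows only
  fill(City_Name) %>%                                              # carry the name downward
  filter(City == 0) %>%                                            # keep the person rows
  select(Name, Job, Gender, City = City_Name)                      # drop the flag, rename

result
```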

Data

df <- read.table(text = "Name   Job   Gender City 
California NA   NA  1 
Bob   plumber  M  0 
Carol  nurse   F  0 
Chicago  NA   NA  1 
Tom   Chef   M  0 
Ann   Swimmer  F  0 
Joy   Police  F  0", header = TRUE, stringsAsFactors = FALSE)