2014-02-28 76 views
1

How do I remove all NAs from the strings in a dataframe column in R?

I have a CSV file like this:

LocationList,Identity,Category 
"New York,New York,United States","42","S" 
"NA,California,United States","89","lyt" 
"Hartford,Connecticut,United States","879","polo" 
"San Diego,California,United States","45454","utyr" 
"Seattle,Washington,United States","uytr","69" 
"NA,NA,United States","87","tree" 

I want to remove every 'NA' from the 'LocationList' column of this CSV file.

Desired result:

LocationList,Identity,Category 
"New York,New York,United States","42","S" 
"California,United States","89","lyt" 
"Hartford,Connecticut,United States","879","polo" 
"San Diego,California,United States","45454","utyr" 
"Seattle,Washington,United States","uytr","69" 
"United States","87","tree" 

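The number of columns is not fixed and may increase or decrease. I also want to write the CSV file without quotes and without any escaping for the 'LocationList' column.

How can I do this in R? I am new to R, so any help is appreciated.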

+0

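Removing NA the way you imply makes the header information wrong. Do you want NA to be replaced by a blank or a space? If you really want to remove NA, there are ways to do it, but I wonder how the result will be used afterwards. If this is a CSV and the desired output is also a CSV, can't you simply use any text processor to replace 'NA,' with '' (nothing) and strip the quotes (")? – Ananta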

+1

@Ananta The values in the 'LocationList' column look like 'NA,NA,United States'; I don't see how that makes the header information wrong? – user3188390

+0

Oops, my bad. Then `df$LocationList <- gsub("NA,", "", df$LocationList)`, and when writing the file use the `write.table` argument `quote = FALSE`. – Ananta

Answers

1

Try:

# Read the sample data; the quoted fields keep their embedded commas together
my.data <- read.table(text='LocationList,Identity,Category
"New York,New York,United States","42","S"
"NA,California,United States","89","lyt"
"Hartford,Connecticut,United States","879","polo"
"San Diego,California,United States","45454","utyr"
"Seattle,Washington,United States","uytr","69"
"NA,NA,United States","87","tree"', header=TRUE, sep=",")
# Remove every "NA," from the LocationList strings
my.data$LocationList <- gsub("NA,", "", my.data$LocationList)
my.data
#                         LocationList Identity Category
# 1    New York,New York,United States       42        S
# 2           California,United States       89      lyt
# 3 Hartford,Connecticut,United States      879     polo
# 4 San Diego,California,United States    45454     utyr
# 5   Seattle,Washington,United States     uytr       69
# 6                      United States       87     tree

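If you drop the quotes when writing a conventional CSV file, you will not be able to read the data back in later. Every value in the LocationList variable contains commas, so commas would appear both in the middle of fields and as the breaks between fields. You could try write.csv2() instead, which uses a semicolon (;) to mark a new field. You could use: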

write.csv2(my.data, file="myFile.csv", quote=FALSE, row.names=FALSE) 

which produces the following file:

LocationList;Identity;Category 
New York,New York,United States;42;S 
California,United States;89;lyt 
Hartford,Connecticut,United States;879;polo 
San Diego,California,United States;45454;utyr 
Seattle,Washington,United States;uytr;69 
United States;87;tree 
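
As a rough illustration of the point about separators, the sketch below writes the cleaned data both ways and counts the fields on each line. The data frame is rebuilt by hand from the values in the question, and the temporary file names are assumptions made only for this example.

```r
# Cleaned sample data (values copied from the question, NA entries already removed)
my.data <- data.frame(
  LocationList = c("New York,New York,United States",
                   "California,United States",
                   "Hartford,Connecticut,United States",
                   "San Diego,California,United States",
                   "Seattle,Washington,United States",
                   "United States"),
  Identity = c("42", "89", "879", "45454", "uytr", "87"),
  Category = c("S", "lyt", "polo", "utyr", "69", "tree"),
  stringsAsFactors = FALSE
)

# Hypothetical temporary files, used only for this illustration
comma_file <- tempfile(fileext = ".csv")
semi_file  <- tempfile(fileext = ".csv")

# Unquoted and comma-separated: embedded commas look like field separators
write.csv(my.data, file = comma_file, quote = FALSE, row.names = FALSE)
n_comma <- count.fields(comma_file, sep = ",")

# Unquoted and semicolon-separated: every line has the same number of fields
write.csv2(my.data, file = semi_file, quote = FALSE, row.names = FALSE)
n_semi <- count.fields(semi_file, sep = ";")

# Reading the semicolon file back recovers the LocationList strings
back <- read.csv2(semi_file, stringsAsFactors = FALSE)
```

Here n_comma varies from line to line while n_semi is 3 throughout, which is why only the semicolon file can be read back reliably. Note that write.csv2() also writes a comma as the decimal mark; that makes no difference here because no column is numeric.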

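I now notice that the Identity and Category values in row 5 are presumably mixed up; you may want to swap them before writing the file:

# Swap the Identity and Category values in row 5 via a temporary variable
x <- my.data[5, 2]
my.data[5, 2] <- my.data[5, 3]
my.data[5, 3] <- x
rm(x)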
+0

' )'? – Ananta

+0

Good point, @Ananta. I've changed that; I think I defaulted to `lapply()`. – gung

+0

Thank you for your answer, it gave me a good idea. I am still learning R. – user3188390

2

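In this case, you simply need to replace NA, with nothing. However, this is not the standard way of removing NA values.

Assuming dat is your data, use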

dat$LocationList <- gsub("^(NA,)+", "", dat$LocationList) 
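
To make the difference between the patterns concrete, here is a small sketch comparing the unanchored pattern, the anchored pattern above, and a token-wise alternative. The last two input values and the helper drop_na_tokens() are hypothetical, added only for illustration; they are not part of the question's data or of either answer.

```r
x <- c("NA,NA,United States",
       "NA,California,United States",
       "GHANA,Africa",             # hypothetical value, not in the question's data
       "Boston,NA,United States")  # hypothetical value, not in the question's data

# Unanchored: also removes "NA," when it is the tail of a longer word
unanchored <- gsub("NA,", "", x)

# Anchored: removes only one or more leading "NA," entries
anchored <- gsub("^(NA,)+", "", x)

# Token-wise (hypothetical helper): split on commas, drop tokens equal to "NA"
drop_na_tokens <- function(s) {
  parts <- strsplit(s, ",", fixed = TRUE)
  vapply(parts, function(p) paste(p[p != "NA"], collapse = ","), character(1))
}
tokenwise <- drop_na_tokens(x)
```

The anchored pattern is enough for the data shown in the question, where NA only appears at the start of a value. The token-wise version is one possible choice if NA could also appear in the middle of the list.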