2016-09-23 60 views
1

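Spark filter function with a Map

I'm a beginner trying to filter a map, and I've run into a problem. I'm trying to remove the header from a .csv file and then filter out certain records, but for some reason my filter condition isn't working.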

val dataWithHeader = sc.textFile("/user/skv/airlines.csv") 
val headerAndRows = dataWithHeader.map(x => x.split(",").map(_.trim))
val Header = headerAndRows.first  
val data = headerAndRows.filter(_(0) != Header(0)) 

val maps = data.map(x => Header.zip(x).toMap)  
// Result looks like:
// res0: Array[scala.collection.immutable.Map[String,String]] =
// Array(Map(Code -> "19031", Description -> "Mackey International Inc.: MAC"),
//  Map(Code -> "19032", Description -> "Munz Northern Airlines Inc.: XY"),
// Now when I try to filter the maps with the condition below, the filter does not work:

val result = maps.filter(x => x("Code") != "19031") 

airlines.csv looks like this:

Code,Description 
"19031","Mackey International Inc.: MAC" 
"19032","Munz Northern Airlines Inc.: XY" 
"19033","Cochise Airlines Inc.: COC" 
"19034","Golden Gate Airlines Inc.: GSA" 
"19035","Aeromech Inc.: RZZ" 
"19036","Golden West Airlines Co.: GLW" 
"19037","Puerto Rico Intl Airlines: PRN" 
"19038","Air America Inc.: STZ" 
"19039","Swift Aire Lines Inc.: SWT" 

Answers

3

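You seem to have one pair of double quotes too many: the quotes in the CSV are read as part of each value, so the stored code is "19031" including the quote characters, and it never equals the bare string 19031.

Try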

val headerAndRows = dataWithHeader.map(x => x.split(",").map(_.trim.replace("\"", "")))
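For anyone who wants to check this without a cluster, here is a minimal sketch of the same pipeline on plain Scala collections instead of an RDD. The object name `QuoteStripSketch` and the helper `parseLine` are made up for illustration and are not part of the question's code; the sample lines are copied from airlines.csv above. It assumes no field contains a comma, since a plain `split(",")` would break on that.

```scala
object QuoteStripSketch {
  // Hypothetical helper: split on commas, trim, and drop double quotes.
  // Assumption: no field contains a comma (naive split).
  def parseLine(line: String): Array[String] =
    line.split(",").map(_.trim.replace("\"", ""))

  // Sample lines copied from airlines.csv in the question.
  val lines: Seq[String] = Seq(
    "Code,Description",
    "\"19031\",\"Mackey International Inc.: MAC\"",
    "\"19032\",\"Munz Northern Airlines Inc.: XY\""
  )

  val headerAndRows: Seq[Array[String]] = lines.map(parseLine)
  val header: Array[String] = headerAndRows.head

  // Drop the header row by comparing the first column with the header's first column.
  val data: Seq[Array[String]] = headerAndRows.filter(_(0) != header(0))

  // Pair each column name with its value.
  val maps: Seq[Map[String, String]] = data.map(row => header.zip(row).toMap)

  // With the quotes stripped, the plain comparison matches the stored value.
  val result: Seq[Map[String, String]] = maps.filter(m => m("Code") != "19031")

  def main(args: Array[String]): Unit =
    result.foreach(println)
}
```

The functions passed to `map` and `filter` here have the same shape as the ones passed to the RDD in the question, so the same change should carry over.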
+0

Thanks, Raphael. I used replace to remove the double quotes.

0

Replace

val headerAndRows = dataWithHeader.map(x => x.split(",").map(_.trim))

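Your data contains double quotes, so you can make it work in two ways:

  1. By removing the double quotes (as in Raphael Roth's answer)

  2. By leaving the data as it is and comparing your value against a string that includes the double quotes, like this:
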
val result = maps.filter(x => {
  x("Code") != "\"19031\""
})
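
As a rough illustration of the second option, the sketch below keeps the quotes in the data and shows how the two comparisons behave. It uses plain Scala collections rather than an RDD, and the object name `QuotedCompareSketch` and its field names are hypothetical; the rows mimic what `split(",").map(_.trim)` produces for the first two records of airlines.csv.

```scala
object QuotedCompareSketch {
  val header: Array[String] = Array("Code", "Description")

  // Rows as produced by split(",").map(_.trim): the quote characters are still there.
  val rows: Seq[Array[String]] = Seq(
    Array("\"19031\"", "\"Mackey International Inc.: MAC\""),
    Array("\"19032\"", "\"Munz Northern Airlines Inc.: XY\"")
  )

  val maps: Seq[Map[String, String]] = rows.map(r => header.zip(r).toMap)

  // The stored code is 7 characters long: 5 digits plus the 2 quote characters.
  val storedLength: Int = maps.head("Code").length

  // The bare literal never equals the stored value, so no row is removed.
  val unquotedFilter: Seq[Map[String, String]] =
    maps.filter(m => m("Code") != "19031")

  // A literal with embedded quotes equals the stored value, so the row is removed.
  val quotedFilter: Seq[Map[String, String]] =
    maps.filter(m => m("Code") != "\"19031\"")

  def main(args: Array[String]): Unit = {
    println(s"rows kept with bare literal: ${unquotedFilter.size}")
    println(s"rows kept with quoted literal: ${quotedFilter.size}")
  }
}
```

The trade-off is that every later comparison has to remember the embedded quotes, whereas stripping them once at parse time keeps the rest of the code simple.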
+0

Thanks, p2. That helped me solve it.

+0

@satish_venu Glad to help, and welcome to Stack Overflow. If this answer or any other one solved your problem, please mark it as accepted.