I have input like the sample below, and I am trying to write a Spark result of type Array[Array[Any]] to a file:
3070811,1963,1096,,"US","CA",,1,
3022811,1963,1096,,"US","CA",,1,56
3033811,1963,1096,,"US","CA",,1,23
After replacing the empty fields with 0, I tried to write the result to a text file, and I am getting:
scala> result.saveAsTextFile("data/result")
<console>:34: error: value saveAsTextFile is not a member of Array[Array[Any]]
result.saveAsTextFile("data/result")
Here is what I ran:
scala> val file2 = sc.textFile("data/file.txt")
scala> val mapper = file2.map(x => x.split(",",-1))
scala> val result = mapper.map(x => x.map(x => if(x.isEmpty) 0 else x)).collect()
result: Array[Array[Any]] = Array(Array(3070811, 1963, 1096, 0, "US", "CA", 0, 1, 0), Array(3022811, 1963, 1096, 0, "US", "CA", 0, 1, 56), Array(3033811, 1963, 1096, 0, "US", "CA", 0, 1, 23))
scala> result.saveAsTextFile("data/result")
<console>:34: error: value saveAsTextFile is not a member of Array[Array[Any]]
result.saveAsTextFile("data/result")
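(For context: collect() returns a plain Scala Array on the driver, and Array has no saveAsTextFile; that method is only defined on RDDs. The per-line cleaning itself can be sketched on plain Scala collections, no Spark needed, using one sample line from the input above:)

```scala
// Plain-Scala sketch (no Spark) of the same per-line cleaning,
// using a sample line from the input above.
val line = "3070811,1963,1096,,\"US\",\"CA\",,1,"
val fields = line.split(",", -1)                  // limit -1 keeps trailing empty fields
val cleaned = fields.map(f => if (f.isEmpty) "0" else f)
println(cleaned.mkString(","))                    // 3070811,1963,1096,0,"US","CA",0,1,0
```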
I also tried the following, and it failed as well:
scala> val output = result.map(x => (x(0),x(1),x(2),x(3), x(4), x(5), x(7), x(8)))
output: Array[(Any, Any, Any, Any, Any, Any, Any, Any)] = Array((3070811,1963,1096,0,"US","CA",1,0), (3022811,1963,1096,0,"US","CA",1,56), (3033811,1963,1096,0,"US","CA",1,23))
scala> output.saveAsTextFile("data/output")
<console>:36: error: value saveAsTextFile is not a member of Array[(Any, Any, Any, Any, Any, Any, Any, Any)]
output.saveAsTextFile("data/output")
Then I added the following, which also failed:
scala> output.mapValues(_.toList).saveAsTextFile("data/output")
<console>:36: error: value mapValues is not a member of Array[(Any, Any, Any, Any, Any, Any, Any, Any)]
output.mapValues(_.toList).saveAsTextFile("data/output")
How can I view the contents of the result or output variables on the console, or write them to a result file? I must be missing something basic here.
Update 1
Per the answer, I removed .collect and then ran the save:
scala> val result = mapper.map(x => x.map(x => if(x.isEmpty) 0 else x))
This produced the following output:
[Ljava.lang.Object;@7a1167b6
[Ljava.lang.Object;@60d86d2f
[Ljava.lang.Object;@20e85a55
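(Note on this output: a JVM array inherits Object.toString, so writing an RDD[Array[Any]] with saveAsTextFile emits the array's reference string rather than its elements; mkString renders the elements instead. A minimal sketch on one sample row:)

```scala
// A JVM array's default toString is its reference string, not its contents;
// mkString renders the elements instead.
val row: Array[Any] = Array(3070811, 1963, 1096, 0, "\"US\"", "\"CA\"", 0, 1, 0)
println(row.toString)        // e.g. [Ljava.lang.Object;@7a1167b6 (hash varies)
println(row.mkString(","))   // 3070811,1963,1096,0,"US","CA",0,1,0
```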
Update 1.A
Using the updated answer, this now gives the correct data:
scala> val result = mapper.map(x => x.map(x => if(x.isEmpty) 0 else x).mkString(","))
result: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[29] at map at <console>:31
scala> result.saveAsTextFile("data/mkstring")
Result:
3070811,1963,1096,0,"US","CA",0,1,0
3022811,1963,1096,0,"US","CA",0,1,56
3033811,1963,1096,0,"US","CA",0,1,23
Update 2
scala> val output = result.map(x => (x(0),x(1),x(2),x(3), x(4), x(5), x(7), x(8)))
output: org.apache.spark.rdd.RDD[(Any, Any, Any, Any, Any, Any, Any, Any)] = MapPartitionsRDD[27] at map at <console>:33
scala> output.saveAsTextFile("data/newOutPut")
And I got this result:
(3070811,1963,1096,0,"US","CA",1,0)
(3022811,1963,1096,0,"US","CA",1,56)
(3033811,1963,1096,0,"US","CA",1,23)
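(The parentheses here come from the tuple's own toString, which saveAsTextFile writes out verbatim. If bare CSV lines are wanted, one option is to flatten each tuple with productIterator before saving, e.g. output.map(_.productIterator.mkString(",")). A sketch on one sample row:)

```scala
// A tuple's toString adds surrounding parentheses; productIterator
// exposes the fields so they can be joined as a plain CSV line.
val row = (3070811, 1963, 1096, 0, "\"US\"", "\"CA\"", 1, 0)
println(row.productIterator.mkString(","))  // 3070811,1963,1096,0,"US","CA",1,0
```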
See Update 1 and Update 2 in the question. –
If you have CSV files, you can use spark-csv (https://github.com/databricks/spark-csv) to read and write them, which is simpler and more efficient. –
Thanks, I added section 1.A to the question. mkString works, as does Update 2. –