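Spark Scala: converting an Array of Struct column to a String column
I have a column of type array<struct> that was inferred from a JSON file. I want to convert the array<struct> to a string so that I can keep this array column as it is in Hive and export it to an RDBMS as a single column.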
temp.json
{"properties":{"items":[{"invoicid":{"value":"923659"},"job_id":{"value":"296160"},"sku_id":{"value":"312002"}}],"user_id":"6666","zip_code":"666"}}
Processing:
scala> val temp = spark.read.json("s3://check/1/temp1.json")
temp: org.apache.spark.sql.DataFrame = [properties: struct<items:
array<struct<invoicid:struct<value:string>,job_id:struct<value:string>,sku_id:struct<value:string>>>, user_id: string ... 1 more field>]
scala> temp.printSchema
root
 |-- properties: struct (nullable = true)
 |    |-- items: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- invoicid: struct (nullable = true)
 |    |    |    |    |-- value: string (nullable = true)
 |    |    |    |-- job_id: struct (nullable = true)
 |    |    |    |    |-- value: string (nullable = true)
 |    |    |    |-- sku_id: struct (nullable = true)
 |    |    |    |    |-- value: string (nullable = true)
 |    |-- user_id: string (nullable = true)
 |    |-- zip_code: string (nullable = true)
scala> temp.select("properties").show
+--------------------+
|          properties|
+--------------------+
|[WrappedArray([[9...|
+--------------------+
scala> temp.select("properties.items").show
+--------------------+
|               items|
+--------------------+
|[[[923659],[29616...|
+--------------------+
scala> temp.createOrReplaceTempView("tempTable")
scala> spark.sql("select properties.items from tempTable").show
+--------------------+
|               items|
+--------------------+
|[[[923659],[29616...|
+--------------------+
How can I get a result like this:
+-----------------------------------------------------------------------------------------+
|items                                                                                    |
+-----------------------------------------------------------------------------------------+
|[{"invoicid":{"value":"923659"},"job_id":{"value":"296160"},"sku_id":{"value":"312002"}}]|
+-----------------------------------------------------------------------------------------+
In other words, I want the array element values exactly as they are, without any change.
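One possible approach, offered as a sketch rather than a definitive answer: the built-in `to_json` function in `org.apache.spark.sql.functions` can serialize the `properties.items` column to a JSON string. This assumes a Spark version in which `to_json` accepts an array of structs and `spark.read.json` accepts a `Dataset[String]` (as far as I know, 2.2 or later). The object name `ItemsToJson`, the helper `itemsAsJson`, and the inline `sample` string are hypothetical names used only for illustration; the inline string stands in for the S3 file in the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, to_json}

// Hypothetical wrapper object, for illustration only.
object ItemsToJson {

  // Inline copy of the record from temp.json, used instead of the S3 path.
  val sample: String =
    """{"properties":{"items":[{"invoicid":{"value":"923659"},"job_id":{"value":"296160"},"sku_id":{"value":"312002"}}],"user_id":"6666","zip_code":"666"}}"""

  // Serialize the array<struct> column properties.items into a single
  // string column named "items". Assumes to_json supports array<struct>.
  def itemsAsJson(df: DataFrame): DataFrame =
    df.select(to_json(col("properties.items")).as("items"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("items-to-json")
      .master("local[*]") // local mode, for the sketch only
      .getOrCreate()
    import spark.implicits._

    // Same schema inference as spark.read.json on the file in the question.
    val temp = spark.read.json(Seq(sample).toDS())

    val result = itemsAsJson(temp)
    result.printSchema()           // "items" is now a string column
    result.show(truncate = false)  // print the full JSON text

    spark.stop()
  }
}
```

Because `items` is then an ordinary string column, it can be written to a Hive table or exported over JDBC like any other string column. In the spark-shell session above, the equivalent one-liner would be `temp.select(to_json(col("properties.items")).as("items")).show(false)` after importing `org.apache.spark.sql.functions._`.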
I was looking for exactly this. Thanks, this is a much-needed feature.