
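I have multiple JSON files that I have to parse with Apache Spark. They contain nested keys. I have to print all columns, including the nested keys. How can I create columns from the nested keys of a JSON file using Apache Spark in Java?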

The files also have nested keys. I want to get all column names, including the nested column names. How can I get them?

I tried this to read the files:

String jsonFilePath = "/home/vipin/workspace/Smarten/jsonParsing/Employee/Employee-01.json,/home/vipin/workspace/Smarten/jsonParsing/Employee/Employee-02.json"; 

String[] jsonFiles = jsonFilePath.split(","); 

Dataset<Row> people = sparkSession.read().json(jsonFiles); 

JSON structure:

{ 
    "Name":"Vipin Suman", 
    "Email":"[email protected]", 
    "Designation":"Programmer", 
    "Age":22 , 
    "location": 
      { 
      "City":"Ahmedabad", 
      "State":"Gujarat" 
      } 
} 

The result I get:

people.show(50, false); 

Age | Designation | Email   | Name  | Location 
------------------------------------------------------------ 
22 |Programmer |[email protected] | Vipin Suman|[Ahmedabad,Gujarat] 

I want the data to look like this:

Age | Designation | Email   | Name  | City  | State 
------------------------------------------------------------ 
22 |Programmer |[email protected] | Vipin Suman| Ahmedabad |Gujarat 

Or like this:

Age | Designation | Email   | Name  | Location 
--------------------------------------------------------------- 
22 |Programmer |[email protected] | Vipin Suman| Ahmedabad,Gujarat 

If the schema looks like this:

root 
|-- Age: long (nullable = true) 
|-- Company: struct (nullable = true) 
| |-- Company Name: string (nullable = true) 
| |-- Domain: string (nullable = true) 
|-- Designation: string (nullable = true) 
|-- Email: string (nullable = true) 
|-- Name: string (nullable = true) 
|-- Test: array (nullable = true) 
| |-- element: string (containsNull = true) 
|-- location: struct (nullable = true) 
| |-- City: struct (nullable = true) 
| | |-- City Name: string (nullable = true) 
| | |-- Pin: long (nullable = true) 
| |-- State: string (nullable = true) 

and the JSON structure is:

{ 
    "Name":"Vipin Suman", 
    "Email":"[email protected]", 
"Designation":"Trainee Programmer", 
"Age":22 , 
"location": 
    {"City": 
      { 
      "Pin":324009, 
      "City Name":"Ahmedabad" 
      }, 
    "State":"Gujarat" 
    }, 
"Company": 
      { 
      "Company Name":"Elegant", 
      "Domain":"Java" 
      }, 
"Test":["Test1","Test2"] 

} 

then how can I find the nested keys and display them in a proper table format?


Please provide: a sample of the input data, what you have tried, and what the problem is. –

Answer


To display the data in the expected format shown above, you can use the following code:

people.select("*", "location.*").drop("location").show();

It gives the following output:

+---+-----------+-----------------+----------+---------+-------+
|Age|Designation|            Email|      Name|     City|  State|
+---+-----------+-----------------+----------+---------+-------+
| 22| Programmer|[email protected]|VipinSuman|Ahmedabad|Gujarat|
+---+-----------+-----------------+----------+---------+-------+
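For the case where the nested keys are not known in advance, one possible approach is to walk the schema that Spark inferred when reading the JSON (`people.schema()` returns a `StructType`) and collect the dotted path of every leaf field. The sketch below shows only that recursion, in plain Java so it runs without Spark: nested `Map`s stand in for structs, and the `NestedColumns` class, its method names, and the type-name strings are hypothetical, not part of any Spark API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a nested Map stands in for a Spark struct,
// and any non-Map value stands in for a leaf type (string, long, array...).
class NestedColumns {

    // Returns the dotted, backtick-quoted path of every leaf field, depth first.
    static List<String> leafPaths(Map<String, Object> struct) {
        List<String> paths = new ArrayList<>();
        collect("", struct, paths);
        return paths;
    }

    @SuppressWarnings("unchecked")
    private static void collect(String prefix, Map<String, Object> struct, List<String> paths) {
        for (Map.Entry<String, Object> field : struct.entrySet()) {
            // Backticks keep names containing spaces ("City Name") valid in a column expression.
            String quoted = "`" + field.getKey() + "`";
            String path = prefix.isEmpty() ? quoted : prefix + "." + quoted;
            if (field.getValue() instanceof Map) {
                // Nested struct: recurse with the extended prefix.
                collect(path, (Map<String, Object>) field.getValue(), paths);
            } else {
                // Leaf column: record its full path.
                paths.add(path);
            }
        }
    }

    // Mirrors the second schema printed in the question.
    static Map<String, Object> exampleSchema() {
        Map<String, Object> company = new LinkedHashMap<>();
        company.put("Company Name", "string");
        company.put("Domain", "string");

        Map<String, Object> city = new LinkedHashMap<>();
        city.put("City Name", "string");
        city.put("Pin", "long");

        Map<String, Object> location = new LinkedHashMap<>();
        location.put("City", city);
        location.put("State", "string");

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("Age", "long");
        root.put("Company", company);
        root.put("Designation", "string");
        root.put("Email", "string");
        root.put("Name", "string");
        root.put("Test", "array<string>");
        root.put("location", location);
        return root;
    }

    public static void main(String[] args) {
        // Prints one flattened column path per line.
        for (String path : leafPaths(exampleSchema())) {
            System.out.println(path);
        }
    }
}
```

With Spark on the classpath, the same recursion would presumably iterate `people.schema().fields()` and recurse whenever `field.dataType()` is a `StructType`; each collected path could then be wrapped with `functions.col(path)` and passed to `people.select(...)`. Array fields such as `Test` are treated as leaves here, and leaf names that repeat under different parents would need aliases to stay distinguishable.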

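Thank you very much for the reply, @himanshuIIITian. May I ask one more question? If I don't know which keys are nested, how can I find them? And if I have multiple nested columns, how can I find them and handle that case? –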


@Vpn_talent That is not possible: if we don't know the DataFrame's schema, we can't know whether it is nested. – himanshuIIITian


@Vpn_talent Did this answer solve your problem? – himanshuIIITian
