2016-11-08

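I have LinkedIn account data with the schema shown below. I need to query the `skills` array for rows where it contains "JAVA", "java", "Java", "JAVA developer", or "Java developer". With Spark SQL I can't get this query to work: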

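// AND binds more tightly than OR, so the skill checks are grouped in
// parentheses; otherwise the experience condition applies only to the last OR.
Dataset<Row> sqlDF = spark.sql("SELECT * FROM people"
      + " WHERE (ARRAY_CONTAINS(skills, 'Java')"
      + " OR ARRAY_CONTAINS(skills, 'JAVA')"
      + " OR ARRAY_CONTAINS(skills, 'Java developer'))"
      + " AND ARRAY_CONTAINS(experience['description'], 'Java developer')");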
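`ARRAY_CONTAINS` compares elements by exact, case-sensitive equality, so every spelling would have to be listed separately. A case-insensitive alternative is to lowercase each element before comparing; in Spark 2.4 or later this can be written in SQL as `exists(skills, s -> lower(s) IN ('java', 'java developer'))`. The sketch below is plain Java with no Spark dependency (the class `SkillMatcher` and method `hasJavaSkill` are hypothetical names); it only models that predicate so the matching rule can be checked in isolation.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper modelling the SQL predicate
//   exists(skills, s -> lower(s) IN ('java', 'java developer'))
// It is not Spark code; it only illustrates the case-insensitive rule.
public class SkillMatcher {
    // Lowercased spellings to accept; covers Java, JAVA, java, JAVA developer, etc.
    private static final Set<String> WANTED = Set.of("java", "java developer");

    public static boolean hasJavaSkill(List<String> skills) {
        if (skills == null) {
            return false; // the skills column is nullable
        }
        for (String s : skills) {
            // containsNull = true, so null elements are skipped
            if (s != null && WANTED.contains(s.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasJavaSkill(List.of("JAVA")));           // true
        System.out.println(hasJavaSkill(List.of("Java developer"))); // true
        System.out.println(hasJavaSkill(List.of("dev")));            // false
    }
}
```

Lowercasing once and comparing against a small set of lowercase targets avoids enumerating every capitalization in the query.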

Answer

df.printSchema()

root
 |-- skills: array (nullable = true)
 |    |-- element: string (containsNull = true)


df.show() 

+--------------------+
|              skills|
+--------------------+
|        [Java, java]|
|[Java Developer, ...|
|               [dev]|
+--------------------+
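On Spark versions before 2.4, which lack higher-order functions such as `exists`, one common route is to explode the array (`LATERAL VIEW explode(skills)` in Spark SQL), lowercase each element with `lower()`, filter, and then collapse the result back to one entry per record, because an array like `[Java, java]` produces one match per matching element. The sketch below models those steps in plain Java on the three sample rows above; it is an illustration, not the answer's original code, `ExplodeFilterSketch` and `matchingRows` are hypothetical names, and only the visible "Java Developer" element of the truncated second row is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java model of "explode, lowercase, filter, de-duplicate".
// Not Spark code; class and method names are hypothetical.
public class ExplodeFilterSketch {
    private static final Set<String> WANTED = Set.of("java", "java developer");

    // Returns the indices of rows whose skills array has a matching element.
    public static Set<Integer> matchingRows(List<List<String>> rows) {
        // Step 1: "explode" each array into (rowIndex, element) pairs.
        List<Map.Entry<Integer, String>> exploded = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            List<String> skills = rows.get(i);
            if (skills == null) {
                continue; // explode yields no rows for a null array
            }
            for (String skill : skills) {
                // Null elements are dropped here; lower(null) IN (...) is null in SQL,
                // so they would be filtered out anyway.
                if (skill != null) {
                    exploded.add(Map.entry(i, skill));
                }
            }
        }
        // Steps 2-3: lowercase and filter; the TreeSet removes duplicate row indices.
        Set<Integer> result = new TreeSet<>();
        for (Map.Entry<Integer, String> e : exploded) {
            if (WANTED.contains(e.getValue().toLowerCase(Locale.ROOT))) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("Java", "java"),
                List.of("Java Developer"), // remaining elements are truncated in the df.show() output
                List.of("dev"));
        System.out.println(matchingRows(rows)); // [0, 1]
    }
}
```

Row 0 matches twice before de-duplication, which is why the exploded form needs a final grouping or distinct step when used in SQL.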