Spark: complex grouping
I have this data structure in Spark:
import spark.implicits._ // needed for toDF outside spark-shell
val df = Seq(
("Package 1", Seq("address1", "address2", "address3")),
("Package 2", Seq("address3", "address4", "address5", "address6")),
("Package 3", Seq("address7", "address8")),
("Package 4", Seq("address9")),
("Package 5", Seq("address9", "address1")),
("Package 6", Seq("address10")),
("Package 7", Seq("address8"))).toDF("Package", "Destinations")
df.show(20, false)
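I need to find all addresses that are seen together across different packages, directly or through a chain of shared addresses. I can't seem to find a way to do this efficiently. I've tried groupBy, map, etc. Ideally, for the given df the result would be: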
+----+------------------------------------------------------------------------+
| Id | Addresses |
+----+------------------------------------------------------------------------+
| 1 | [address1, address2, address3, address4, address5, address6, address9] |
| 2 | [address7, address8] |
| 3 | [address10] |
+----+------------------------------------------------------------------------+
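Grouping addresses this way amounts to finding connected components: each address is a node, and every package links all of its addresses together. A minimal sketch in plain Scala using a union-find (disjoint set), assuming the data is small enough to collect to the driver; the object name `AddressGroups` and its `group` method are hypothetical, not part of Spark.

```scala
import scala.collection.mutable

// Hypothetical helper: groups addresses connected through shared packages
// using a union-find with path compression. Runs locally, not distributed.
object AddressGroups {

  // Return the root of x's set, registering x if unseen and compressing the path.
  private def find(parent: mutable.Map[String, String], x: String): String = {
    val p = parent.getOrElseUpdate(x, x)
    if (p == x) x
    else {
      val root = find(parent, p)
      parent(x) = root
      root
    }
  }

  // Merge the sets that contain a and b.
  private def union(parent: mutable.Map[String, String], a: String, b: String): Unit = {
    val ra = find(parent, a)
    val rb = find(parent, b)
    if (ra != rb) parent(ra) = rb
  }

  // Link every address of a package to that package's first address,
  // then collect addresses by root. Groups are sorted largest first.
  def group(packages: Seq[(String, Seq[String])]): List[List[String]] = {
    val parent = mutable.Map.empty[String, String]
    for ((_, dests) <- packages; first <- dests.headOption; d <- dests)
      union(parent, first, d)
    parent.keys.toList
      .groupBy(k => find(parent, k))
      .values
      .map(_.sorted)
      .toList
      .sortBy(g => (-g.size, g.head))
  }
}
```

In spark-shell it could be called as `AddressGroups.group(df.as[(String, Seq[String])].collect().toSeq)`. Collecting only makes sense for small inputs; for large data, GraphX's `connectedComponents` is the distributed way to compute the same components.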
I know I'm asking a lot, but could you please include a small example? – twoface88
treeReduce doesn't have a 'sequential' or 'combine' function; treeAggregate does – twoface88
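Since `treeAggregate` takes both a sequential function and a combine function, the grouping can be phrased as merging lists of disjoint address sets. A sketch of the two functions under that assumption; `MergeGroups`, `add`, `seqOp` and `combOp` are hypothetical names. They are exercised here with local folds so the logic can be checked without a cluster.

```scala
// Hypothetical merge functions for RDD.treeAggregate. The accumulator is a
// list of disjoint address sets; adding a set merges every group it touches.
object MergeGroups {

  // Insert s into the groups, unioning it with all groups sharing an address.
  def add(groups: List[Set[String]], s: Set[String]): List[Set[String]] = {
    val (touching, rest) = groups.partition(g => g.exists(s.contains))
    touching.foldLeft(s)(_ ++ _) :: rest
  }

  // Sequential step: fold one package's destinations into a partition's groups.
  val seqOp: (List[Set[String]], Seq[String]) => List[Set[String]] =
    (acc, dests) => if (dests.isEmpty) acc else add(acc, dests.toSet)

  // Combine step: fold another partition's groups into this one.
  val combOp: (List[Set[String]], List[Set[String]]) => List[Set[String]] =
    (a, b) => b.foldLeft(a)(add)
}

// With Spark the wiring would look roughly like:
//   df.as[(String, Seq[String])].rdd.map(_._2)
//     .treeAggregate(List.empty[Set[String]])(MergeGroups.seqOp, MergeGroups.combOp)
```

Because the final result is a single list returned to the driver, this still assumes the set of groups fits in driver memory.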