2017-03-21 78 views
1

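I want to write a program that groups a dataset into different sub-datasets based on the value of one column. The dataset has several columns, as shown below: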

+-----------+---------------+-------------------+-----------------------+ 
|Item_ID |Product_Name |Manufacturer_Name |Product_Description | 
+-----------+---------------+-------------------+-----------------------+ 
|12345  |Pen   |Cello    |Ball Pen Soft Nib... | 
|12346  |Pencil   |Nataraja   |Pencil HB Extra D... | 
|42345  |Ruler   |Nataraja   |Scale No.1103 15c... | 
|12677  |Sharpener  |Nataraja   |Pencil Shraperner... | 
|12987  |Pen   |Reynolds   |Dot Pen Extra Gr... | 
|44326  |Pen   |Reynolds   |Gel Pen German T... | 
|13456  |Pen   |Cello    |Dot Pen 0.5mm Nib... | 
|19876  |Eraser   |Cello    |Dust free Eraser ... | 
|43246  |Ink Pen  |Hero    |Ink Pen Smooth Ha... | 
+-----------+---------------+-------------------+-----------------------+ 

I want to group the dataset by Manufacturer_Name, as shown below:

Manufacturer = Cello 
+-----------+---------------+-------------------+-----------------------+ 
|Item_ID |Product_Name |Manufacturer_Name |Product_Description | 
+-----------+---------------+-------------------+-----------------------+ 
|12345  |Pen   |Cello    |Ball Pen Soft Nib... | 
|13456  |Pen   |Cello    |Dot Pen 0.5mm Nib... | 
|19876  |Eraser   |Cello    |Dust free Eraser ... | 
+-----------+---------------+-------------------+-----------------------+ 

Manufacturer = Nataraja 
+-----------+---------------+-------------------+-----------------------+ 
|Item_ID |Product_Name |Manufacturer_Name |Product_Description | 
+-----------+---------------+-------------------+-----------------------+ 
|12346  |Pencil   |Nataraja   |Pencil HB Extra D... | 
|42345  |Ruler   |Nataraja   |Scale No.1103 15c... | 
|12677  |Sharpener  |Nataraja   |Pencil Shraperner... | 
+-----------+---------------+-------------------+-----------------------+ 

Manufacturer = Reynolds 
+-----------+---------------+-------------------+-----------------------+ 
|Item_ID |Product_Name |Manufacturer_Name |Product_Description | 
+-----------+---------------+-------------------+-----------------------+ 
|12987  |Pen   |Reynolds   |Dot Pen Extra Gr... | 
|44326  |Pen   |Reynolds   |Gel Pen German T... | 
+-----------+---------------+-------------------+-----------------------+ 

Manufacturer = Hero 
+-----------+---------------+-------------------+-----------------------+ 
|Item_ID |Product_Name |Manufacturer_Name |Product_Description | 
+-----------+---------------+-------------------+-----------------------+ 
|43246  |Ink Pen  |Hero    |Ink Pen Smooth Ha... | 
+-----------+---------------+-------------------+-----------------------+ 

I tried the code below, but it does not produce a good result. Please help me improve this program. Here is the code I used:

Dataset<Row> countsBy = src.select("Manufacturer_Name").distinct(); 
List<Row> lsts = countsBy.collectAsList(); 
for (Row lst : lsts) { 
    String man = lst.toString(); 
    System.out.println("Records of " + man + " only"); 
    Dataset<Row> mandataset = src.filter("Manufacturer_Name='" + man + "'"); 
    mandataset.show(); 
} 

Could you be more specific about what is bad about the result? Is it slow, or is it wrong?


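I want the subsets of the dataset to be usable outside the loop. Because the variable is declared locally and overwritten on every iteration, I cannot use any subset except the one produced during the last iteration. @AugustinBocken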

Answer

0

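Maybe you could try building a map of datasets keyed by a String (the Manufacturer_Name). On each iteration, read the Manufacturer_Name, check whether it is already in the map (creating the entry if it is not), and finally add your row to the matching dataset.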

You would end up with something like this:

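import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// ShopItem, items and getManufacturerName() are placeholders for your own row type and data.
Map<String, List<ShopItem>> dic = new HashMap<>();
for (ShopItem item : items)
{
    String manufacturerName = item.getManufacturerName(); // you get the name
    if (!dic.containsKey(manufacturerName))
    {
        // First time this manufacturer is seen: create its list
        dic.put(manufacturerName, new ArrayList<>());
    }
    dic.get(manufacturerName).add(item); // what you want to add
}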

Then you need a second loop, but only to print the data.
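To make the idea concrete, here is a minimal, self-contained sketch in plain Java. It is only an illustration under some assumptions: `ShopItem`, `sampleItems` and `groupByManufacturer` are hypothetical names, the rows are copied from the table in the question (without the descriptions), and plain lists stand in for Spark datasets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupByManufacturer {

    // Hypothetical stand-in for one row of the table in the question.
    static final class ShopItem {
        final int itemId;
        final String productName;
        final String manufacturerName;

        ShopItem(int itemId, String productName, String manufacturerName) {
            this.itemId = itemId;
            this.productName = productName;
            this.manufacturerName = manufacturerName;
        }
    }

    // Rows taken from the question's table.
    static List<ShopItem> sampleItems() {
        return Arrays.asList(
            new ShopItem(12345, "Pen", "Cello"),
            new ShopItem(12346, "Pencil", "Nataraja"),
            new ShopItem(42345, "Ruler", "Nataraja"),
            new ShopItem(12677, "Sharpener", "Nataraja"),
            new ShopItem(12987, "Pen", "Reynolds"),
            new ShopItem(44326, "Pen", "Reynolds"),
            new ShopItem(13456, "Pen", "Cello"),
            new ShopItem(19876, "Eraser", "Cello"),
            new ShopItem(43246, "Ink Pen", "Hero"));
    }

    // Builds one list per manufacturer; the map keeps every group
    // reachable after the loop, instead of overwriting a local variable.
    static Map<String, List<ShopItem>> groupByManufacturer(List<ShopItem> items) {
        // LinkedHashMap keeps manufacturers in the order they first appear.
        Map<String, List<ShopItem>> groups = new LinkedHashMap<>();
        for (ShopItem item : items) {
            // computeIfAbsent creates the list the first time a name is seen.
            groups.computeIfAbsent(item.manufacturerName, k -> new ArrayList<>())
                  .add(item);
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, List<ShopItem>> groups = groupByManufacturer(sampleItems());

        // Second loop: printing only.
        for (Map.Entry<String, List<ShopItem>> entry : groups.entrySet()) {
            System.out.println("Manufacturer = " + entry.getKey());
            for (ShopItem item : entry.getValue()) {
                System.out.println("  " + item.itemId + " " + item.productName);
            }
        }
    }
}
```

`computeIfAbsent` folds the "is the key already there?" check and the `put` into one call. With Spark, the map values could presumably be the filtered `Dataset<Row>` objects rather than lists; that variant is not shown here.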

I hope this solves your problem!

Edit: replaced Dictionary with Map (sorry) and added a link

How do you create a dictionary in Java?

Edit: changed the code to match


Just to clarify a few things before I implement this: can a Dictionary be instantiated? And are the methods add and AddValue available in Java?


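Maybe only in a library... You are right, I need to check more, or you could implement your own dictionary! It is a list of entries, each holding a key and an Object value, which should be enough for your needs... But I will take a look and edit my answer.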
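For reference, a "list of entries" dictionary along those lines might look roughly like the sketch below. `SimpleDictionary` and its methods are hypothetical names made up for illustration. Note that `java.util.Dictionary` is an abstract class, so it cannot be instantiated directly, and the standard `Map` interface offers `put` and `get` rather than `add` or `AddValue`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class SimpleDictionary<K, V> {

    // One key/value pair stored in the list.
    private static final class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<Entry<K, V>> entries = new ArrayList<>();

    // Replaces the value if the key already exists, otherwise appends a new entry.
    void put(K key, V value) {
        for (Entry<K, V> e : entries) {
            if (Objects.equals(e.key, key)) {
                e.value = value;
                return;
            }
        }
        entries.add(new Entry<>(key, value));
    }

    // Returns the value stored for the key, or null if the key is absent.
    V get(K key) {
        for (Entry<K, V> e : entries) {
            if (Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    // Number of distinct keys stored.
    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        SimpleDictionary<String, Integer> counts = new SimpleDictionary<>();
        counts.put("Cello", 3);
        counts.put("Hero", 1);
        counts.put("Cello", 4); // overwrites the earlier value for "Cello"
        System.out.println(counts.get("Cello"));    // prints 4
        System.out.println(counts.get("Reynolds")); // prints null
    }
}
```

Every lookup here scans the whole list, so for anything beyond a handful of keys `java.util.HashMap` is usually the more practical choice.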


There, I have corrected the answer so it is more accurate :D