2016-05-14

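Say a k-means clustering process has been run on a set of points and produced 5 clusters. Is it possible to write a description for each cluster to a database, based on the majority of the points in that individual cluster? That is, a description derived from the RapidMiner clustering result.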

i.e., in pseudocode:

if majority of points within cluster have attribute category == 'state'
    add record in database with attribute description = 'state'
else
    add record in database with attribute description = 'private'

Hope my explanation is clear!
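Outside RapidMiner, the pseudocode above amounts to a per-cluster majority vote. The following is a minimal Python sketch, assuming the clustered points are available as hypothetical (cluster_id, category) pairs; `describe_clusters` is an illustrative name, not part of RapidMiner.

```python
from collections import Counter, defaultdict


def describe_clusters(points):
    """Map each cluster id to 'state' or 'private' by strict majority vote.

    `points` is an iterable of (cluster_id, category) pairs; this input
    shape is hypothetical and chosen only for illustration.
    """
    counts = defaultdict(Counter)
    for cluster_id, category in points:
        counts[cluster_id][category] += 1

    descriptions = {}
    for cluster_id, counter in counts.items():
        total = sum(counter.values())
        # Strict majority: more than half of the cluster's points are 'state'.
        if counter["state"] * 2 > total:
            descriptions[cluster_id] = "state"
        else:
            descriptions[cluster_id] = "private"
    return descriptions


points = [
    ("cluster_0", "state"), ("cluster_0", "state"), ("cluster_0", "private"),
    ("cluster_1", "state"), ("cluster_1", "private"),
]
print(describe_clusters(points))  # {'cluster_0': 'state', 'cluster_1': 'private'}
```

Comparing `counter["state"] * 2 > total` avoids floating-point division and treats an exact tie as not a majority.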

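This would be possible, but to be clear, do you mean the following? If cluster1 has 100 examples, of which 51 have another attribute called 'category' set to 'state', then set another attribute called 'description' to 'state'; otherwise set 'description' to 'private'. Repeat for the other clusters, taking into account the number of examples in each cluster. Save the final result to the database. – awchisholm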
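The rule in this comment is a threshold on the share of 'state' examples in each cluster. Written out, with C_k the set of examples in cluster k and n_k their number (notation introduced here for illustration):

```latex
\bar{c}_k = \frac{1}{n_k} \sum_{i \in C_k} \mathbb{1}\left[\text{category}_i = \text{state}\right],
\qquad
\text{description}_k =
\begin{cases}
\text{state}   & \text{if } \bar{c}_k > 0.5,\\
\text{private} & \text{otherwise.}
\end{cases}
```

For the comment's example, 51/100 = 0.51 > 0.5, so cluster 1 is described as 'state'. An exact 50/50 split gives 0.5, which is not greater than 0.5, so such a cluster would be described as 'private'.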

Exactly. So the final result to be saved in the database (if, for example, the majority is 'state') would be: [centroid of cluster 1] [desc = 'state'] –
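The record proposed here, [centroid] [desc], can be computed from the clustered examples alone. A minimal Python sketch follows, assuming each example is a hypothetical (cluster_id, features, category) tuple; `cluster_summary` is an illustrative name. The centroid is computed as the mean of each cluster's members, which should coincide with the k-means centroid once the algorithm has converged; the cluster model produced by the clustering operator also holds its own centroids.

```python
from collections import defaultdict


def cluster_summary(rows):
    """Return {cluster_id: (centroid, description)}.

    Each row is (cluster_id, feature_vector, category); this shape is an
    assumption for the sketch, not RapidMiner's data model.
    """
    sums = {}
    sizes = defaultdict(int)
    state_counts = defaultdict(int)
    for cluster_id, features, category in rows:
        if cluster_id not in sums:
            sums[cluster_id] = [0.0] * len(features)
        # Accumulate per-dimension sums for the centroid.
        for j, value in enumerate(features):
            sums[cluster_id][j] += value
        sizes[cluster_id] += 1
        if category == "state":
            state_counts[cluster_id] += 1

    summary = {}
    for cluster_id, totals in sums.items():
        n = sizes[cluster_id]
        centroid = [s / n for s in totals]
        # Strict majority of 'state' examples -> 'state', else 'private'.
        desc = "state" if state_counts[cluster_id] * 2 > n else "private"
        summary[cluster_id] = (centroid, desc)
    return summary


rows = [
    ("cluster_1", [1.0, 2.0], "state"),
    ("cluster_1", [3.0, 4.0], "state"),
    ("cluster_1", [5.0, 6.0], "private"),
]
print(cluster_summary(rows))  # {'cluster_1': ([3.0, 4.0], 'state')}
```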

Answer

It is a relatively complex process, but here is a working example you can reproduce.

<?xml version="1.0" encoding="UTF-8" standalone="no"?> 
<process version="7.0.000"> 
    <context> 
    <input/> 
    <output/> 
    <macros/> 
    </context> 
    <operator activated="true" class="process" compatibility="7.0.000" expanded="true" name="Process"> 
    <process expanded="true"> 
     <operator activated="true" class="retrieve" compatibility="7.0.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="112" y="34"> 
     <parameter key="repository_entry" value="//Samples/data/Iris"/> 
     </operator> 
     <operator activated="true" class="k_means" compatibility="7.0.000" expanded="true" height="82" name="Clustering" width="90" x="246" y="34"> 
     <parameter key="k" value="10"/> 
     </operator> 
     <operator activated="true" class="generate_attributes" compatibility="7.0.000" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="136"> 
     <list key="function_descriptions"> 
      <parameter key="category" value="if(rand()&gt;0.5, &quot;state&quot;, &quot;notstate&quot;)"/> 
      <parameter key="categoryNumeric" value="if(category==&quot;state&quot;, 1, 0)"/> 
     </list> 
     </operator> 
     <operator activated="true" class="aggregate" compatibility="7.0.000" expanded="true" height="82" name="Aggregate" width="90" x="246" y="238"> 
     <list key="aggregation_attributes"> 
      <parameter key="categoryNumeric" value="average"/> 
     </list> 
     <parameter key="group_by_attributes" value="cluster"/> 
     </operator> 
     <operator activated="true" class="generate_attributes" compatibility="7.0.000" expanded="true" height="82" name="Generate Attributes (4)" width="90" x="380" y="340"> 
     <list key="function_descriptions"> 
      <parameter key="description" value="if ([average(categoryNumeric)]&gt;0.5, &quot;state&quot;,&quot;private&quot;)"/> 
     </list> 
     </operator> 
     <operator activated="true" class="join" compatibility="7.0.000" expanded="true" height="82" name="Join" width="90" x="514" y="238"> 
     <parameter key="join_type" value="left"/> 
     <parameter key="use_id_attribute_as_key" value="false"/> 
     <list key="key_attributes"> 
      <parameter key="cluster" value="cluster"/> 
     </list> 
     </operator> 
     <operator activated="true" class="jdbc_connectors:write_database" compatibility="7.0.000" expanded="true" height="68" name="Write Database" width="90" x="715" y="238"> 
     <parameter key="connection" value="LocalMYSQL"/> 
     <parameter key="schema_name" value="ascom"/> 
     <parameter key="table_name" value="joinresult"/> 
     </operator> 
     <connect from_op="Retrieve Iris" from_port="output" to_op="Clustering" to_port="example set"/> 
     <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/> 
     <connect from_op="Clustering" from_port="clustered set" to_op="Generate Attributes" to_port="example set input"/> 
     <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/> 
     <connect from_op="Aggregate" from_port="example set output" to_op="Generate Attributes (4)" to_port="example set input"/> 
     <connect from_op="Aggregate" from_port="original" to_op="Join" to_port="left"/> 
     <connect from_op="Generate Attributes (4)" from_port="example set output" to_op="Join" to_port="right"/> 
     <connect from_op="Join" from_port="join" to_op="Write Database" to_port="input"/> 
     <connect from_op="Write Database" from_port="through" to_port="result 2"/> 
     <portSpacing port="source_input 1" spacing="0"/> 
     <portSpacing port="sink_result 1" spacing="0"/> 
     <portSpacing port="sink_result 2" spacing="0"/> 
     <portSpacing port="sink_result 3" spacing="0"/> 
    </process> 
    </operator> 
</process> 

The key points are:

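  • Create an attribute called categoryNumeric that corresponds to category: it is set to 1 if category is state and 0 otherwise. (The Iris sample data has no category attribute, so the process generates a random one with rand() purely for demonstration.)
  • Aggregate by cluster, taking the average of categoryNumeric. If the aggregated value is greater than 0.5, the majority of the cluster's examples have category equal to state.
  • Based on this majority decision, create a new attribute called description in the aggregated result.
  • Each cluster now carries this additional data, which can be joined back to the original data using the cluster identifier as the key.
  • Write the result to the database (I used MySQL).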
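To check the logic outside RapidMiner, the five steps above can be mirrored in plain Python. This is a minimal sketch, not the answer's process: the rows are hypothetical dictionaries, the function names are illustrative, and the standard-library sqlite3 module with an in-memory database stands in for the MySQL connection (LocalMYSQL) used in the process. The table name joinresult is taken from the Write Database operator.

```python
import sqlite3
from collections import defaultdict


def add_description(rows):
    """Mirror the RapidMiner steps on plain dicts (field names are illustrative)."""
    # Step 1: categoryNumeric = 1 if category == 'state' else 0
    for row in rows:
        row["categoryNumeric"] = 1 if row["category"] == "state" else 0

    # Step 2: aggregate by cluster, averaging categoryNumeric
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        totals[row["cluster"]] += row["categoryNumeric"]
        counts[row["cluster"]] += 1
    averages = {c: totals[c] / counts[c] for c in counts}

    # Step 3: description from the majority decision (average > 0.5)
    descriptions = {
        c: ("state" if avg > 0.5 else "private") for c, avg in averages.items()
    }

    # Step 4: join back onto the original rows using cluster as the key
    return [dict(row, description=descriptions[row["cluster"]]) for row in rows]


def write_rows(conn, rows):
    # Step 5: write to a database; SQLite stands in here for the MySQL table
    conn.execute(
        "CREATE TABLE IF NOT EXISTS joinresult "
        "(cluster TEXT, category TEXT, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO joinresult (cluster, category, description) VALUES (?, ?, ?)",
        [(r["cluster"], r["category"], r["description"]) for r in rows],
    )
    conn.commit()


rows = [
    {"cluster": "cluster_0", "category": "state"},
    {"cluster": "cluster_0", "category": "state"},
    {"cluster": "cluster_0", "category": "notstate"},
    {"cluster": "cluster_1", "category": "notstate"},
    {"cluster": "cluster_1", "category": "state"},
]
conn = sqlite3.connect(":memory:")
write_rows(conn, add_description(rows))
print(conn.execute(
    "SELECT cluster, description, COUNT(*) FROM joinresult "
    "GROUP BY cluster, description ORDER BY cluster"
).fetchall())  # [('cluster_0', 'state', 3), ('cluster_1', 'private', 2)]
```

Because the join happens before writing, every original row receives its cluster's description. Writing only the per-cluster `descriptions` mapping instead would give the one-row-per-cluster table described in the comments.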

Hope this helps as a start.