If statement based on RapidMiner clustering results

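Say a k-means clustering process is run on a set of points and produces 5 clusters. Is it possible to write a record to a database for each cluster, based on what the majority of the points in that individual cluster have?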

i.e. in pseudocode:

if majority of points within cluster have attribute category == 'state'
    add record in database with attribute description = 'state'
else
    add record in database with attribute description = 'private'
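The pseudocode above could be expressed as the following Python sketch. It assumes each clustered point is available as a `(cluster, category)` pair; the function name `describe_clusters` and the sample data are hypothetical. "Majority" is taken to mean strictly more than half, so a tie falls through to 'private', matching the `else` branch.

```python
from collections import Counter, defaultdict


def describe_clusters(points):
    """Return {cluster: description} using a strict-majority rule.

    points: iterable of (cluster, category) pairs (hypothetical input shape).
    """
    counts = defaultdict(Counter)  # cluster -> Counter of category values
    for cluster, category in points:
        counts[cluster][category] += 1

    descriptions = {}
    for cluster, counter in counts.items():
        total = sum(counter.values())
        # Strict majority: more than half of this cluster's points are 'state'
        if counter["state"] * 2 > total:
            descriptions[cluster] = "state"
        else:
            descriptions[cluster] = "private"
    return descriptions


sample = [
    ("cluster_0", "state"), ("cluster_0", "state"), ("cluster_0", "notstate"),
    ("cluster_1", "state"), ("cluster_1", "notstate"),
]
print(describe_clusters(sample))
# {'cluster_0': 'state', 'cluster_1': 'private'}
```

Each resulting `(cluster, description)` pair is what would then be written to the database as one record.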

Hope my explanation is clear!


2016-05-14 X' Byte

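That would be possible, but to be clear, do you mean the following? If cluster1 contains 100 examples and 51 of them have another attribute, called 'category', set to 'state', then set a further attribute, called 'description', to 'state'; otherwise set 'description' to 'private'. Repeat this for the other clusters, taking each cluster's own counts into account. Save the final result to a database. – awchisholm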

Exactly. So the final result to be saved in the database (if, for example, the majority is 'state') would be: [centroid of cluster 1] [desc = 'state'] –
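A record of that `[centroid] [desc]` shape could be assembled as in the sketch below. It assumes the points are numeric tuples tagged with a cluster id and that each cluster's description has already been decided; `make_records` and the sample values are hypothetical. The centroid is computed as the coordinate-wise mean of the cluster's points, as in standard k-means.

```python
from collections import defaultdict


def make_records(points, descriptions):
    """Build one (cluster, centroid, description) record per cluster.

    points: iterable of (cluster, coords), coords being a tuple of floats.
    descriptions: {cluster: "state" | "private"}, decided beforehand.
    """
    members = defaultdict(list)  # cluster -> list of coordinate tuples
    for cluster, coords in points:
        members[cluster].append(coords)

    records = []
    for cluster, rows in members.items():
        # Coordinate-wise mean of the member points gives the centroid
        centroid = tuple(sum(col) / len(rows) for col in zip(*rows))
        records.append((cluster, centroid, descriptions[cluster]))
    return records


points = [
    ("cluster_0", (1.0, 2.0)), ("cluster_0", (3.0, 4.0)),
    ("cluster_1", (10.0, 10.0)),
]
print(make_records(points, {"cluster_0": "state", "cluster_1": "private"}))
# [('cluster_0', (2.0, 3.0), 'state'), ('cluster_1', (10.0, 10.0), 'private')]
```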

It's a relatively complicated process, but here is a working example that you can reproduce.

<?xml version="1.0" encoding="UTF-8" standalone="no"?> 
<process version="7.0.000"> 
    <context> 
    <input/> 
    <output/> 
    <macros/> 
    </context> 
    <operator activated="true" class="process" compatibility="7.0.000" expanded="true" name="Process"> 
    <process expanded="true"> 
     <operator activated="true" class="retrieve" compatibility="7.0.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="112" y="34"> 
     <parameter key="repository_entry" value="//Samples/data/Iris"/> 
     </operator> 
     <operator activated="true" class="k_means" compatibility="7.0.000" expanded="true" height="82" name="Clustering" width="90" x="246" y="34"> 
     <parameter key="k" value="10"/> 
     </operator> 
     <operator activated="true" class="generate_attributes" compatibility="7.0.000" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="136"> 
     <list key="function_descriptions"> 
      <parameter key="category" value="if(rand()&gt;0.5, &quot;state&quot;, &quot;notstate&quot;)"/> 
      <parameter key="categoryNumeric" value="if(category==&quot;state&quot;, 1, 0)"/> 
     </list> 
     </operator> 
     <operator activated="true" class="aggregate" compatibility="7.0.000" expanded="true" height="82" name="Aggregate" width="90" x="246" y="238"> 
     <list key="aggregation_attributes"> 
      <parameter key="categoryNumeric" value="average"/> 
     </list> 
     <parameter key="group_by_attributes" value="cluster"/> 
     </operator> 
     <operator activated="true" class="generate_attributes" compatibility="7.0.000" expanded="true" height="82" name="Generate Attributes (4)" width="90" x="380" y="340"> 
     <list key="function_descriptions"> 
      <parameter key="description" value="if ([average(categoryNumeric)]&gt;0.5, &quot;state&quot;,&quot;private&quot;)"/> 
     </list> 
     </operator> 
     <operator activated="true" class="join" compatibility="7.0.000" expanded="true" height="82" name="Join" width="90" x="514" y="238"> 
     <parameter key="join_type" value="left"/> 
     <parameter key="use_id_attribute_as_key" value="false"/> 
     <list key="key_attributes"> 
      <parameter key="cluster" value="cluster"/> 
     </list> 
     </operator> 
     <operator activated="true" class="jdbc_connectors:write_database" compatibility="7.0.000" expanded="true" height="68" name="Write Database" width="90" x="715" y="238"> 
     <parameter key="connection" value="LocalMYSQL"/> 
     <parameter key="schema_name" value="ascom"/> 
     <parameter key="table_name" value="joinresult"/> 
     </operator> 
     <connect from_op="Retrieve Iris" from_port="output" to_op="Clustering" to_port="example set"/> 
     <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/> 
     <connect from_op="Clustering" from_port="clustered set" to_op="Generate Attributes" to_port="example set input"/> 
     <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/> 
     <connect from_op="Aggregate" from_port="example set output" to_op="Generate Attributes (4)" to_port="example set input"/> 
     <connect from_op="Aggregate" from_port="original" to_op="Join" to_port="left"/> 
     <connect from_op="Generate Attributes (4)" from_port="example set output" to_op="Join" to_port="right"/> 
     <connect from_op="Join" from_port="join" to_op="Write Database" to_port="input"/> 
     <connect from_op="Write Database" from_port="through" to_port="result 2"/> 
     <portSpacing port="source_input 1" spacing="0"/> 
     <portSpacing port="sink_result 1" spacing="0"/> 
     <portSpacing port="sink_result 2" spacing="0"/> 
     <portSpacing port="sink_result 3" spacing="0"/> 
    </process> 
    </operator> 
</process>

The main points are:

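Create an attribute called categoryNumeric that corresponds to category: it is set to 1 if category is state, and 0 otherwise.
Aggregate by cluster, taking the average of categoryNumeric. If the aggregated value is greater than 0.5, the majority of the cluster's examples have category equal to state.
Based on that majority test, create a new attribute called description in the aggregated result.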
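The three steps above can be mirrored outside RapidMiner with this Python sketch, which uses the same indicator-then-average idea: a local `category_numeric` plays the role of `categoryNumeric`, and the per-cluster mean of that indicator is compared against 0.5. The `(cluster, category)` input shape and the function name are assumptions for illustration.

```python
from collections import defaultdict


def aggregate_by_cluster(rows):
    """Return {cluster: (average, description)} for (cluster, category) rows."""
    sums = defaultdict(int)    # cluster -> sum of categoryNumeric
    counts = defaultdict(int)  # cluster -> number of examples
    for cluster, category in rows:
        # Step 1: categoryNumeric is 1 for "state", otherwise 0
        category_numeric = 1 if category == "state" else 0
        # Step 2: accumulate so the per-cluster average can be taken
        sums[cluster] += category_numeric
        counts[cluster] += 1

    result = {}
    for cluster in counts:
        average = sums[cluster] / counts[cluster]
        # Step 3: average > 0.5 means most examples have category == "state"
        result[cluster] = (average, "state" if average > 0.5 else "private")
    return result


# The 51-out-of-100 case from the comment on the question
rows = [("cluster_0", "state")] * 51 + [("cluster_0", "notstate")] * 49
print(aggregate_by_cluster(rows))
# {'cluster_0': (0.51, 'state')}
```

Averaging a 0/1 indicator is equivalent to counting and dividing by the cluster size, which is why a single Aggregate operator with "average" is enough for the majority test.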

Each cluster now has additional data, and this can be joined back to the original data using the cluster identifier as the key.
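The join step can be pictured as a left join keyed on the cluster identifier, as in this sketch: every original example keeps its attributes and gains the cluster-level `description`. An example whose cluster has no aggregate row gets `None`, which is how a left join treats unmatched rows. The dict-based data shapes and the function name are hypothetical.

```python
def left_join_on_cluster(examples, cluster_info):
    """Attach cluster-level fields to each example (left join on 'cluster').

    examples: list of dicts, each with a 'cluster' key (assumed shape).
    cluster_info: {cluster: {"description": ...}} from the aggregation step.
    """
    joined = []
    for example in examples:
        # Unmatched clusters still keep the example, with description None
        extra = cluster_info.get(example["cluster"], {"description": None})
        # Build a new dict so the original examples are left unchanged
        joined.append({**example, **extra})
    return joined


examples = [
    {"id": 1, "cluster": "cluster_0", "category": "state"},
    {"id": 2, "cluster": "cluster_1", "category": "notstate"},
]
info = {"cluster_0": {"description": "state"}}
for row in left_join_on_cluster(examples, info):
    print(row)
# {'id': 1, 'cluster': 'cluster_0', 'category': 'state', 'description': 'state'}
# {'id': 2, 'cluster': 'cluster_1', 'category': 'notstate', 'description': None}
```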

Write the result to the database (I used MySQL).
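For the final write, the sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the MySQL connection used in the answer; that substitution is an assumption made so the example is self-contained. The table name `joinresult` comes from the process XML, while the column layout is hypothetical.

```python
import sqlite3


def write_joined(conn, rows):
    """Create the joinresult table if needed and insert the joined rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS joinresult ("
        " id INTEGER, cluster TEXT, category TEXT, description TEXT)"
    )
    # Named placeholders are filled from each row dict
    conn.executemany(
        "INSERT INTO joinresult (id, cluster, category, description)"
        " VALUES (:id, :cluster, :category, :description)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
write_joined(conn, [
    {"id": 1, "cluster": "cluster_0", "category": "state", "description": "state"},
    {"id": 2, "cluster": "cluster_1", "category": "notstate", "description": "private"},
])
print(conn.execute("SELECT cluster, description FROM joinresult ORDER BY id").fetchall())
# [('cluster_0', 'state'), ('cluster_1', 'private')]
```

With MySQL the same idea applies through whichever driver is in use, though the placeholder syntax may differ from `sqlite3`'s.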

Hope this helps as a starting point.


2016-05-16 12:06:38 awchisholm
