2013-11-27 105 views
2

Importing multi-valued fields into Solr from MySQL using the Solr Data Import Handler

We have the following two tables in our MySQL database:

mysql> describe comment; 
+--------------+--------------+------+-----+---------+-------+ 
| Field  | Type   | Null | Key | Default | Extra | 
+--------------+--------------+------+-----+---------+-------+ 
| id   | int(11)  | YES |  | NULL |  | 
| blogpost_id | int(11)  | YES |  | NULL |  | 
| comment_text | varchar(256) | YES |  | NULL |  | 
+--------------+--------------+------+-----+---------+-------+ 

mysql> describe comment_tags; 
+------------+-------------+------+-----+---------+-------+ 
| Field  | Type  | Null | Key | Default | Extra | 
+------------+-------------+------+-----+---------+-------+ 
| comment_id | int(11)  | YES |  | NULL |  | 
| tag  | varchar(80) | YES |  | NULL |  | 
+------------+-------------+------+-----+---------+-------+ 

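Each comment can have multiple tags. We can import the comments themselves into Solr using the Data Import Handler. However, I'm not sure how to import the tags of each comment into a multi-valued field, defined in schema.xml, on that comment's document.

Please advise. Thanks.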

Answers

4

Try something like this:

<dataConfig>
    <!-- dataSource is just an example. Included just for completeness. -->
    <dataSource batchSize="500" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/my-database" user="root" password="somethinglong1283"/>
    <document>
        <entity name="comment" pk="id" query="SELECT * FROM comment">
            <field column="blogpost_id" name="blogpost_id"/>
            <field column="comment_text" name="comment_text"/>
            <!-- Child entity: runs once per comment; each row returned adds one value to the multiValued "tag" field -->
            <entity name="comment_tags" pk="comment_id" query="SELECT * FROM comment_tags WHERE comment_id='${comment.id}'">
                <field column="tag" name="tag"/>
            </entity>
        </entity>
    </document>
</dataConfig>
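To make the behaviour of the nested entity concrete, here is a minimal Python sketch of what the import does conceptually: one query for the comments, then one extra query per comment for its tags. It is not Solr code. It uses an in-memory SQLite database in place of MySQL, and the function names and sample rows are made up for illustration.

```python
import sqlite3


def make_sample_db():
    """Create an in-memory stand-in for the two MySQL tables (sample rows are invented)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE comment (id INTEGER, blogpost_id INTEGER, comment_text TEXT);
        CREATE TABLE comment_tags (comment_id INTEGER, tag TEXT);
        INSERT INTO comment VALUES (1, 10, 'Nice post'), (2, 10, 'I disagree');
        INSERT INTO comment_tags VALUES (1, 'solr'), (1, 'mysql'), (2, 'dih');
    """)
    return conn


def build_documents(conn):
    """Mimic a parent entity with one child entity: one tag query per comment."""
    docs = []
    comments = conn.execute(
        "SELECT id, blogpost_id, comment_text FROM comment ORDER BY id"
    ).fetchall()
    for comment_id, blogpost_id, comment_text in comments:
        # Plays the role of the child entity's query with ${comment.id} filled in.
        tag_rows = conn.execute(
            "SELECT tag FROM comment_tags WHERE comment_id = ? ORDER BY tag",
            (comment_id,),
        ).fetchall()
        docs.append({
            "id": comment_id,
            "blogpost_id": blogpost_id,
            "comment_text": comment_text,
            "tag": [row[0] for row in tag_rows],  # becomes the multi-valued field
        })
    return docs


if __name__ == "__main__":
    for doc in build_documents(make_sample_db()):
        print(doc)
```

The sketch also shows the cost of this approach: the number of queries grows with the number of comments.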

+0

Scroll down a bit in the [full example in the Solr wiki](http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example); there is also an example of a sub-entity. – cheffe

+1

'${comment.id}' should be just ${comment.id}, without the ' quotes –

10

You can also use GROUP_CONCAT with a separator (such as ",") and then try something like this:

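<dataConfig>
    <!-- dataSource is just an example. Included just for completeness. -->
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db" user="root" password="root"/>
    <document>
        <!-- The tags live in comment_tags, so they are concatenated per comment in a joined subquery -->
        <entity name="comment" pk="id" query="SELECT c.*, t.tags AS comment_tags FROM comment c LEFT JOIN (SELECT comment_id, GROUP_CONCAT(tag) AS tags FROM comment_tags GROUP BY comment_id) t ON t.comment_id = c.id" transformer="RegexTransformer">
            <field column="blogpost_id" name="blogpost_id"/>
            <field column="comment_text" name="comment_text"/>
            <!-- Split the comment_tags column on "," and store the values in the multiValued "tag" field -->
            <field column="tag" sourceColName="comment_tags" splitBy=","/>
        </entity>
    </document>
</dataConfig>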

This improves performance and removes the dependency on a second query.
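The single-query idea can be sketched in Python as follows: concatenate the tags per comment in SQL, then split the string back into a list, which is the role splitBy plays in the config. This is only an illustration under assumptions. It uses SQLite's two-argument `GROUP_CONCAT(tag, ',')` rather than the MySQL syntax, and the function names and sample rows are invented.

```python
import sqlite3

# One query for everything: tags are aggregated per comment in a subquery.
QUERY = """
    SELECT c.id, c.blogpost_id, c.comment_text, t.tags
    FROM comment AS c
    LEFT JOIN (
        SELECT comment_id, GROUP_CONCAT(tag, ',') AS tags
        FROM comment_tags
        GROUP BY comment_id
    ) AS t ON t.comment_id = c.id
    ORDER BY c.id
"""


def make_grouped_sample_db():
    """In-memory stand-in for the MySQL tables (sample rows are invented)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE comment (id INTEGER, blogpost_id INTEGER, comment_text TEXT);
        CREATE TABLE comment_tags (comment_id INTEGER, tag TEXT);
        INSERT INTO comment VALUES (1, 10, 'Nice post'), (2, 10, 'I disagree'), (3, 11, 'No tags');
        INSERT INTO comment_tags VALUES (1, 'solr'), (1, 'mysql'), (2, 'dih');
    """)
    return conn


def build_documents_single_query(conn):
    """Build one document per comment from a single result set."""
    docs = []
    for comment_id, blogpost_id, comment_text, tags in conn.execute(QUERY):
        docs.append({
            "id": comment_id,
            "blogpost_id": blogpost_id,
            "comment_text": comment_text,
            # Equivalent of splitBy=",": untagged comments get an empty list.
            "tag": tags.split(",") if tags else [],
        })
    return docs


if __name__ == "__main__":
    for doc in build_documents_single_query(make_grouped_sample_db()):
        print(doc)
```

One thing to keep in mind with MySQL: the result of GROUP_CONCAT is truncated at `group_concat_max_len` (1024 by default), so comments with very many tags may need that setting raised.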

+0

The query should be 'SELECT comment.*, tags FROM comment LEFT JOIN (SELECT comment_id, GROUP_CONCAT(tag) AS tags FROM comment_tags GROUP BY comment_id) AS ctag ON comment.id = ctag.comment_id' –

2

If the other solutions don't work, try this. In schema.xml:

<field name="tag" type="string" indexed="true" stored="true" multiValued="true"/> 

If you want to use a custom separator in MySQL, use the configuration below as a starting point:

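<dataConfig>
    <!-- dataSource is just an example. Included just for completeness. -->
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db" user="root" password="root"/>
    <document>
        <!-- The tags live in comment_tags, so they are concatenated per comment in a joined subquery -->
        <entity name="comment" pk="id" query="SELECT c.*, t.tag FROM comment c LEFT JOIN (SELECT comment_id, GROUP_CONCAT(tag) AS tag FROM comment_tags GROUP BY comment_id) t ON t.comment_id = c.id" transformer="RegexTransformer">
            <field column="blogpost_id" name="blogpost_id"/>
            <field column="comment_text" name="comment_text"/>
            <!-- splitBy must match the separator used by GROUP_CONCAT in the query -->
            <field column="tag" splitBy="," sourceColName="tag"/>
        </entity>
    </document>
</dataConfig>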

In the query, build the field with the custom separator (and set splitBy to the same separator):

GROUP_CONCAT(tag SEPARATOR '~,~') AS tag

If you want only DISTINCT tags in the concatenated value, then:

GROUP_CONCAT(DISTINCT tag SEPARATOR '~,~') AS tag
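Why a custom separator can matter is easiest to see with a tag that itself contains a comma. The sketch below imitates the concatenate-then-split round trip in plain Python. It assumes that splitBy is applied as a regular expression, which is how I understand RegexTransformer to work. The helper names and the sample tags are hypothetical.

```python
import re


def concat(tags, separator):
    """Stand-in for GROUP_CONCAT(tag SEPARATOR '...')."""
    return separator.join(tags)


def distinct(tags):
    """Stand-in for the DISTINCT keyword: drop duplicates, keep first-seen order."""
    return list(dict.fromkeys(tags))


def split_by(value, pattern):
    """Stand-in for splitBy, assuming the pattern is treated as a regex."""
    return re.split(pattern, value) if value else []


if __name__ == "__main__":
    tags = ["c++, stl", "solr"]  # the first tag contains a comma

    # A plain comma breaks the first tag into two values.
    print(split_by(concat(tags, ","), ","))

    # An unusual separator such as "~,~" survives the round trip.
    print(split_by(concat(tags, "~,~"), "~,~"))

    # Duplicates are removed before concatenation.
    print(concat(distinct(["solr", "mysql", "solr"]), "~,~"))
```

Because the separator is used as a pattern on the Solr side, it is safest to pick one without regex metacharacters, or to escape it.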