2014-02-26

Mahout user friend recommendation based on Cassandra

I want to recommend to a user a list of other users that the current user could add as friends.

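I am using Cassandra and Mahout. The Mahout integration package already contains an implementation called CassandraDataModel, and I want to use this class.

So my recommender class looks like this: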

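import java.util.List;
import javax.inject.Inject;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.cassandra.CassandraDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.AveragingPreferenceInferrer;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class UserFriendsRecommender {

    @Inject
    private CassandraDataModel dataModel;

    public List<RecommendedItem> recommend(Long userId, int number) throws TasteException {
        // User-to-user similarity computed from the stored preference values
        UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(dataModel);
        // Optional: fill in missing preferences with each user's average
        userSimilarity.setPreferenceInferrer(new AveragingPreferenceInferrer(dataModel));

        // Neighborhood made of the 3 most similar users
        UserNeighborhood neighborhood =
            new NearestNUserNeighborhood(3, userSimilarity, dataModel);
        Recommender recommender = new GenericUserBasedRecommender(dataModel, neighborhood, userSimilarity);
        Recommender cachingRecommender = new CachingRecommender(recommender);
        return cachingRecommender.recommend(userId, number);
    }

}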

CassandraDataModel has 4 column families:

static final String USERS_CF = "users";
static final String ITEMS_CF = "items";
static final String USER_IDS_CF = "userIDs";
static final String ITEM_IDS_CF = "itemIDs";

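I am having a hard time understanding this class, especially the column families. Is there an example I can look at, or could someone explain it with a good example?

The Javadoc says this: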

* <p> 
* First, it uses a column family called "users". This is keyed by the user ID 
* as an 8-byte long. It contains a column for every preference the user 
* expresses. The column name is item ID, again as an 8-byte long, and value is 
* a floating point value represented as an IEEE 32-bit floating point value.
* </p> 
* 
* <p> 
* It uses an analogous column family called "items" for the same data, but 
* keyed by item ID rather than user ID. In this column family, column names are 
* user IDs instead. 
* </p> 
* 
* <p> 
* It uses a column family called "userIDs" as well, with an identical schema. 
* It has one row under key 0. It contains a column for every user ID in the 
* model. It has no values. 
* </p> 
* 
* <p> 
* Finally it also uses an analogous column family "itemIDs" containing item 
* IDs. 
* </p> 
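
To make the layout described in the Javadoc more concrete, here is a minimal in-memory sketch in plain Java. It does not touch Cassandra or Mahout: the class name `ColumnFamilyLayoutSketch` and its method are made up for illustration, and sorted maps and sets merely stand in for the column families. It shows that, under this reading of the Javadoc, a single preference has to be recorded in all four places.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only: in-memory stand-ins for the four column
// families described in the Javadoc. Not part of Mahout or Cassandra.
class ColumnFamilyLayoutSketch {
    // "users": row key = user ID, column name = item ID, value = preference
    final Map<Long, Map<Long, Float>> users = new TreeMap<>();
    // "items": row key = item ID, column name = user ID, value = preference
    final Map<Long, Map<Long, Float>> items = new TreeMap<>();
    // "userIDs": a single row (key 0) whose column names are all user IDs
    final Set<Long> userIDs = new TreeSet<>();
    // "itemIDs": a single row (key 0) whose column names are all item IDs
    final Set<Long> itemIDs = new TreeSet<>();

    // Records one preference in all four structures so they stay consistent.
    void setPreference(long userID, long itemID, float value) {
        users.computeIfAbsent(userID, k -> new TreeMap<>()).put(itemID, value);
        items.computeIfAbsent(itemID, k -> new TreeMap<>()).put(userID, value);
        userIDs.add(userID);
        itemIDs.add(itemID);
    }

    public static void main(String[] args) {
        ColumnFamilyLayoutSketch layout = new ColumnFamilyLayoutSketch();
        layout.setPreference(1L, 0L, 3.0f);
        layout.setPreference(2L, 2L, 1.0f);
        System.out.println("users   = " + layout.users);
        System.out.println("items   = " + layout.items);
        System.out.println("userIDs = " + layout.userIDs);
        System.out.println("itemIDs = " + layout.itemIDs);
    }
}
```

For a friend recommender, the "item" IDs would presumably be user IDs as well (the users someone is already connected to), but that is an assumption about the use case, not something the Javadoc states.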

Answers


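All of the following steps for the column families required by CassandraDataModel should be performed in cassandra-cli, under the keyspace you created (recommender or another name).

1: Table users

userID is the row key, each itemID is a separate column name, and the value is the preference: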

CREATE COLUMN FAMILY users 
WITH comparator = LongType 
AND key_validation_class=LongType 
AND default_validation_class=FloatType; 

Insert values:

set users[0][0]='1.0'; 
set users[1][0]='3.0'; 
set users[2][2]='1.0'; 

2: Table items

itemID is the row key, each userID is a separate column name, and the value is the preference:

CREATE COLUMN FAMILY items 
WITH comparator = LongType 
AND key_validation_class=LongType 
AND default_validation_class=FloatType; 

Insert values:

set items[0][0]='1.0'; 
set items[0][1]='3.0'; 
set items[2][2]='1.0'; 

3: Table userIDs

This table has just one row but many columns, i.e. each userID is a separate column:

CREATE COLUMN FAMILY userIDs 
WITH comparator = LongType 
AND key_validation_class=LongType; 

Insert values:

set userIDs[0][0]=''; 
set userIDs[0][1]=''; 
set userIDs[0][2]=''; 

4: Table itemIDs

This table has just one row but many columns, i.e. each itemID is a separate column:

CREATE COLUMN FAMILY itemIDs 
WITH comparator = LongType 
AND key_validation_class=LongType; 

Insert values:

set itemIDs[0][0]=''; 
set itemIDs[0][1]=''; 
set itemIDs[0][2]=''; 
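
Since every preference has to be written to all four column families, typing the `set` statements by hand gets error-prone quickly. As a rough sketch, the statements could be generated with a small helper like the one below. `CliStatementSketch` and `statementsFor` are hypothetical names, not part of Mahout; the helper only builds strings in the same form as the cassandra-cli examples above and does not talk to Cassandra.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout: builds cassandra-cli statements
// in the same form as the hand-written examples above.
class CliStatementSketch {
    // Returns the four `set` statements that a single preference requires.
    static List<String> statementsFor(long userID, long itemID, float value) {
        List<String> out = new ArrayList<>();
        out.add("set users[" + userID + "][" + itemID + "]='" + value + "';");
        out.add("set items[" + itemID + "][" + userID + "]='" + value + "';");
        // The ID column families use the single row key 0 and empty values.
        out.add("set userIDs[0][" + userID + "]='';");
        out.add("set itemIDs[0][" + itemID + "]='';");
        return out;
    }

    public static void main(String[] args) {
        // User 1 expresses preference 3.0 for item 0.
        for (String statement : statementsFor(1L, 0L, 3.0f)) {
            System.out.println(statement);
        }
    }
}
```

The printed lines can then be pasted into cassandra-cli, or saved to a file and loaded from there.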

To complement the answer above: for Cassandra 2.0 the new syntax is as follows, since the CLI has been deprecated.

Table users:

CREATE TABLE users (userID bigint, itemID bigint, value float, PRIMARY KEY (userID, itemID));

Table items:

CREATE TABLE items (itemID bigint, userID bigint, value float, PRIMARY KEY (itemID, userID));

Table userIDs:

CREATE TABLE userIDs (id bigint, userID bigint, PRIMARY KEY (id, userID));

Table itemIDs:

CREATE TABLE itemIDs (id bigint, itemID bigint, PRIMARY KEY (id, itemID));
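
This answer only gives the table definitions. As a sketch of what the matching inserts might look like, the helper below builds CQL `INSERT` statements for one preference. It assumes the tables use the column names `userID`, `itemID`, `value` and `id`, and that the ID tables keep everything under `id = 0`, mirroring the single row with key 0 from the Javadoc. `CqlInsertSketch` and `insertsFor` are hypothetical names, and the code only builds strings; it does not connect to Cassandra.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout or any Cassandra driver.
// Assumed column names: userID, itemID, value, id.
class CqlInsertSketch {
    // Returns one INSERT per table for a single (user, item, value) preference.
    static List<String> insertsFor(long userID, long itemID, float value) {
        List<String> out = new ArrayList<>();
        out.add("INSERT INTO users (userID, itemID, value) VALUES ("
                + userID + ", " + itemID + ", " + value + ");");
        out.add("INSERT INTO items (itemID, userID, value) VALUES ("
                + itemID + ", " + userID + ", " + value + ");");
        // id = 0 plays the role of the single row key used by the CLI schema.
        out.add("INSERT INTO userIDs (id, userID) VALUES (0, " + userID + ");");
        out.add("INSERT INTO itemIDs (id, itemID) VALUES (0, " + itemID + ");");
        return out;
    }

    public static void main(String[] args) {
        for (String statement : insertsFor(1L, 0L, 3.0f)) {
            System.out.println(statement);
        }
    }
}
```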