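IndexError: index 5688 is out of bounds for axis 0 with size 3706
I am trying to build a recommender system. I got the data from http://grouplens.org/datasets/movielens/ (the ml-1m ratings file).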
import numpy as np
import pandas as pd
header = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv('ml-1m/ratings.dat', sep='::', names=header, engine='python')  # multi-character separators need the python engine
n_users = df.user_id.unique().shape[0]
n_items = df.item_id.unique().shape[0]
print('Number of users = ' + str(n_users) + ' | Number of movies = ' + str(n_items))
Number of users = 6040 | Number of movies = 3706
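Note that these two numbers are counts of *distinct* ids, not the largest id. If the ids have gaps (some ids in the range never appear in the ratings), then `id - 1` can be larger than the count. As far as I know, the ml-1m README documents MovieIDs between 1 and 3952, while only 3706 distinct movies appear in the ratings. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy ratings: only three distinct item ids, but the largest id is 10.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "item_id": [1, 5, 10, 5],
    "rating": [4, 3, 5, 2],
})

n_items = ratings.item_id.unique().shape[0]  # number of distinct ids
max_item_id = ratings.item_id.max()          # largest id value

print(n_items, max_item_id)
# A matrix with n_items columns has valid column indices 0..n_items-1,
# but item 10 would be written to column 10 - 1 = 9.
print(max_item_id - 1 >= n_items)
```

Here the count is 3 while the largest id is 10, so writing item 10 at column 9 would overflow a 3-column matrix, the same kind of failure as in the traceback below.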
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
train_data, test_data = train_test_split(df, test_size=0.25)
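`train_test_split` shuffles the rows but keeps each row's original index label, which is why `itertuples()` later reports labels such as `Index=483019`. A small sketch on a hypothetical toy frame; `random_state` is an addition here, used only to make the split reproducible:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame with a default RangeIndex 0..7.
df = pd.DataFrame({"user_id": range(1, 9), "rating": [5, 3, 4, 1, 2, 5, 4, 3]})

train, test = train_test_split(df, test_size=0.25, random_state=0)

# Rows are shuffled between the two parts, but each keeps its original label.
print(len(train), len(test))
print(sorted(train.index.tolist() + test.index.tolist()))
```

Together the two parts still cover every original label exactly once; only the order and the assignment to train or test change.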
Then I try to build two user-item matrices, one for training and one for testing:
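# In each tuple, line[0] is the index label, line[1] is user_id,
# line[2] is item_id and line[3] is rating; ids are 1-based, hence the -1.
train_data_matrix = np.zeros((n_users, n_items))
for line in train_data.itertuples():
    train_data_matrix[line[1]-1, line[2]-1] = line[3]

test_data_matrix = np.zeros((n_users, n_items))
for line in test_data.itertuples():
    test_data_matrix[line[1]-1, line[2]-1] = line[3]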
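One possible fix, assuming you want to keep `id - 1` as the position, is to size the matrices by the largest id instead of the number of distinct ids, computed on the full `df` before splitting so that train and test matrices have the same shape (for example `n_users = df.user_id.max()` and `n_items = df.item_id.max()`). The helper name `build_matrix_by_max_id` is hypothetical, and the loop is replaced by NumPy fancy indexing, which writes all ratings in one step:

```python
import numpy as np
import pandas as pd


def build_matrix_by_max_id(ratings, n_rows, n_cols):
    """Fill an (n_rows, n_cols) matrix using id - 1 as the position.

    Assumes ids are positive integers no larger than n_rows / n_cols.
    """
    matrix = np.zeros((n_rows, n_cols))
    # Fancy indexing: one assignment for all (user, item) pairs.
    matrix[ratings.user_id.to_numpy() - 1,
           ratings.item_id.to_numpy() - 1] = ratings.rating.to_numpy()
    return matrix


# Hypothetical toy data with a gap in item ids (2..4 and 6..9 never appear).
ratings = pd.DataFrame({"user_id": [1, 2, 3], "item_id": [1, 10, 5], "rating": [4, 5, 2]})

# Size by the largest id rather than by the number of distinct ids.
n_rows = ratings.user_id.max()
n_cols = ratings.item_id.max()
matrix = build_matrix_by_max_id(ratings, n_rows, n_cols)
print(matrix.shape)  # (3, 10)
print(matrix[1, 9])  # 5.0
```

The trade-off is that unused ids leave all-zero rows or columns; remapping ids to contiguous positions avoids that.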
I get (full traceback):
IndexError                                Traceback (most recent call last)
<ipython-input-39-180dea01cdf8> in <module>()
      2 train_data_matrix = np.zeros((n_users, n_items))
      3 for line in train_data.itertuples():
----> 4     train_data_matrix[line[1]-1, line[2]-1] = line[3]
      5 
      6 test_data_matrix = np.zeros((n_users, n_items))

IndexError: index 5688 is out of bounds for axis 0 with size 3706
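The message reports the offending index (5688, i.e. user id 5689 minus 1) and the size of the axis it was checked against (3706 rows on axis 0). A minimal reproduction, assuming only a matrix with 3706 rows and an arbitrary small number of columns:

```python
import numpy as np

# Matrix with 3706 rows (axis 0); the column count is arbitrary here.
m = np.zeros((3706, 3))

message = ""
try:
    m[5688, 0] = 1.0  # valid row indices are 0..3705, so this row does not exist
except IndexError as exc:
    message = str(exc)

print(message)
```

NumPy does not wrap or grow the array: any row position at or beyond the first dimension raises `IndexError`.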
What is wrong?
P.S.
train_data.head()
user_id item_id rating timestamp
483019 2968 2268 5 971107926
943582 5689 3615 3 963719230
116153 752 1147 5 975458000
103250 686 1704 5 975601762
235333 1425 3752 4 1023560349
P.P.S.
for line in train_data.itertuples():
    print(line)
Pandas(Index=483019, user_id=2968, item_id=2268, rating=5, timestamp=971107926)
Pandas(Index=943582, user_id=5689, item_id=3615, rating=3, timestamp=963719230)
Pandas(Index=116153, user_id=752, item_id=1147, rating=5, timestamp=975458000)
Pandas(Index=103250, user_id=686, item_id=1704, rating=5, timestamp=975601762)
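As the printed namedtuples show, position 0 holds the index label, so `line[1]` and `line[2]` really are `user_id` and `item_id`; the lookup itself is correct, but those are raw id values, not matrix positions. Accessing fields by name makes this explicit. A sketch using two rows copied from the `head()` output above:

```python
import pandas as pd

# Two rows copied from the head() output above, keeping their original index labels.
sample = pd.DataFrame(
    {"user_id": [2968, 5689], "item_id": [2268, 3615],
     "rating": [5, 3], "timestamp": [971107926, 963719230]},
    index=[483019, 943582],
)

first = next(sample.itertuples())
print(first[0], first[1], first[2])  # index label, user_id, item_id
print(first.user_id, first.item_id)  # the same values, by field name
```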
train_data_matrix is a matrix of the unique user and movie ids. 5689 is the user's ID, see train_data.head(). – Edward
I answered my own question. – Edward
But the rows of the matrix are indexed by row number, not by user ID. – hpaulj
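Following hpaulj's point, another approach is to map raw ids to contiguous 0-based positions with `pd.factorize`, so the matrix has exactly one row per distinct user and one column per distinct movie. This is a minimal sketch; `encode_ids` and `to_matrix` are hypothetical helper names. Encoding the full `df` before calling `train_test_split` would let the train and test matrices share the same mapping:

```python
import numpy as np
import pandas as pd


def encode_ids(ratings):
    """Map raw ids to contiguous 0-based positions (hypothetical helper)."""
    user_pos, user_ids = pd.factorize(ratings.user_id, sort=True)
    item_pos, item_ids = pd.factorize(ratings.item_id, sort=True)
    encoded = ratings.assign(user_pos=user_pos, item_pos=item_pos)
    # user_ids / item_ids are Index objects: position -> raw id.
    return encoded, user_ids, item_ids


def to_matrix(encoded, n_users, n_items):
    """Build a dense user-item matrix from already-encoded positions."""
    matrix = np.zeros((n_users, n_items))
    matrix[encoded.user_pos.to_numpy(),
           encoded.item_pos.to_numpy()] = encoded.rating.to_numpy()
    return matrix


# Hypothetical toy ratings using ids from the question.
ratings = pd.DataFrame({"user_id": [5689, 752, 5689],
                        "item_id": [3615, 1147, 1147],
                        "rating": [3, 5, 4]})
encoded, user_ids, item_ids = encode_ids(ratings)
matrix = to_matrix(encoded, len(user_ids), len(item_ids))
print(matrix.shape)            # (2, 2)
row = user_ids.get_loc(5689)   # raw user id -> row position
print(row, matrix[row])
```

In the question's setting this would look like `encoded, user_ids, item_ids = encode_ids(df)`, then `train_data, test_data = train_test_split(encoded, test_size=0.25)` and `to_matrix(train_data, len(user_ids), len(item_ids))`.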