2015-11-05 58 views

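Problem with additional information on a scatter plot with a legend

I am trying to reduce the dimensionality of a dataset with PCA. Then, based on a criterion (the number in the name of the file each data point was taken from), I assign every data point a class/category and plot all data points as a scatter plot with a legend.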

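For each data point I have some additional information stored in a separate list, and I want every data point to be pickable so that I can read that information in the terminal. When I plot my scatter plot, the order gets messed up, I assume because I plot the data subset by subset. The indices in the received pick event no longer match the array holding the additional information.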

I tried to rearrange the information array while plotting, but somehow it still does not work. Here is my code:

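import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

targets = []
trainNames = []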

# React to a click on a data point.
def onPick(event):
    indexes = event.ind
    xy = event.artist.get_offsets()
    for index in indexes:
        print(trainNames[index])


# Load the additional information for each data point. It's stored in the
# same order as the data points in 'trainingfile.csv'.
modelNamesFile = open("training-names.csv") 
for line in modelNamesFile: 

    # Save the target for the data point. It's the class of the object, separated
    # into "rectangular", "cylindrical", "irregular", depending on the
    # object's file number.
    objnum = int(line.split(",")[-1].split("/")[-1].split(".")[0])
    if objnum <= 16:
        objnum = 0
    elif 17 <= objnum <= 34:
        objnum = 1
    else:
        objnum = 2
    targets.append(objnum)

    # Save name description for datapoint. 
    sceneName = line.split(",")[0].split("/")[-1] 
    modelName = line.split(",")[-1].split("/")[-1].split(".")[0] 
    trainNames.append(sceneName + ", " + modelName) 


target_names = ["rectangular", "cylindrical", "irregular"] 


# Load the actual data. 
f = open("trainingfile.csv") 
tData = [] 
for line in f: 
    lsplit = line.split(",") 
    datapoint = [] 
    for feature in lsplit:
        datapoint.append(float(feature))

    tData.append(datapoint) 
data = np.array(tData) 

# Transform it into 2D with PCA. 
y = np.array(targets) 
X = np.delete(data, data.shape[1] - 1, 1)  # Strip the class column.
pipeline = Pipeline([('scaling', StandardScaler()), ('pca', PCA(n_components=2))])
X_reduced = pipeline.fit_transform(X)  # Fit on X so the class column is not used as a feature.


# Create plot. 
trainNames = np.array(trainNames) 
tmpTrainNames = np.array([]) 
fig = plt.figure() 
for c, i, target_name in zip("rgb", [0, 1, 2], target_names): 
    plt.scatter(X_reduced[y == i, 0], X_reduced[y == i, 1], c=c, label=target_name, picker=True) 

    # Here I try to rearrange the additional information into the order in which
    # the points were plotted.
    tmpTrainNames = np.append(tmpTrainNames, trainNames[y == i]) 

trainNames = tmpTrainNames 

plt.legend() 
plt.xlabel('Feature 1') 
plt.ylabel('Feature 2') 
fig.canvas.mpl_connect('pick_event', onPick) 
plt.show() 

If it is too complicated, I can try to simplify it. Just tell me.

Answer

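Since I could not find a solution to the index problem, I solved it a different way. Instead of assigning the classes {0, 1, 2} and then mapping them with zip(), I assign the color values directly as the classes and pass the whole class target as the color argument. That way I can plot everything in one call and keep the original order of the data points.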

# Here y is the class target holding color values, e.g. ['r', 'g', ..., 'r']
plt.scatter(X_reduced[:,0], X_reduced[:,1], c=y, picker=True) 
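
For illustration, here is a minimal sketch of how the single-call approach above might be wired up end to end. It is not the original poster's full code: the names `CLASS_COLORS`, `colors_for`, `names_for_pick` and `plot_pickable` are hypothetical helpers, and it assumes the class ids 0, 1, 2 from the question. Because all points live in one scatter artist, `event.ind` indexes the full data set, so the name list needs no reordering.

```python
import numpy as np

# Hypothetical class-to-color mapping; class ids 0/1/2 follow the question.
CLASS_COLORS = np.array(["r", "g", "b"])


def colors_for(targets):
    """Return one color per data point, in the original data order."""
    return CLASS_COLORS[np.asarray(targets)]


def names_for_pick(picked_indices, names):
    """Look up the labels for the indices reported by a pick event.

    With a single scatter call, these indices refer to the full data
    set, so `names` can stay in its original order.
    """
    return [names[i] for i in picked_indices]


def plot_pickable(points_2d, targets, names):
    """Draw all points in one scatter call and print names on click."""
    # Imported here so the helpers above can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(points_2d[:, 0], points_2d[:, 1],
               c=list(colors_for(targets)), picker=True)

    def on_pick(event):
        for name in names_for_pick(event.ind, names):
            print(name)

    fig.canvas.mpl_connect("pick_event", on_pick)
    return fig, ax
```

With the variables from the question, this would be called as `plot_pickable(X_reduced, y, trainNames)` followed by `plt.show()`. One trade-off of this approach is that a single scatter call produces no per-class legend entries on its own.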