2016-09-01 26 views

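How do I get a list of all permutations of two-column combinations, based on the value of another column?

My end goal is to create a Force-Directed graph with d3 that shows clusters of users who use certain features in my applications. To do this, I need to build a set of "links" in the following format (taken from the link above):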

{"source": "Napoleon", "target": "Myriel", "value": 1} 

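To get to that point, though, I am starting with a pandas DataFrame that looks like this. How can I generate a list of permutations of APP_NAME/FEAT_ID combinations for each USER_ID?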

     APP_NAME    FEAT_ID USER_ID  CNT
280      app1   feature1   user1  114
2622     app2   feature2   user1    8
1698     app2   feature3   user1   15
184      app3   feature4   user1  157
2879     app2   feature5   user1    7
3579     app2   feature6   user1    5
232      app2   feature7   user1  136
295      app2   feature8   user1  111
2620     app2   feature9   user1    8
2047     app3  feature10   user2   11
3395     app2   feature2   user2    5
3044     app2  feature11   user2    6
3400     app2  feature12   user2    5

Expected result:

Based on the DataFrame above, I expect user1 and user2 to generate the following permutations:

user1: 
    app1-feature1 -> app2-feature2, app2-feature3, app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9 
    app2-feature2 -> app2-feature3, app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9 
    app2-feature3 -> app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9 
    app3-feature4 -> app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9 
    app2-feature5 -> app2-feature6, app2-feature7, app2-feature8, app2-feature9 
    app2-feature6 -> app2-feature7, app2-feature8, app2-feature9 
    app2-feature7 -> app2-feature8, app2-feature9 
    app2-feature8 -> app2-feature9 

user2: 
    app3-feature10 -> app2-feature2, app2-feature11, app2-feature12 
    app2-feature2 -> app2-feature11, app2-feature12 
    app2-feature11 -> app2-feature12 

From this, I expect to be able to generate the input D3 expects, which would look like this for user2:

{"source": "app3-feature10", "target": "app2-feature2"} 
{"source": "app3-feature10", "target": "app2-feature11"} 
{"source": "app3-feature10", "target": "app2-feature12"} 
{"source": "app2-feature2", "target": "app2-feature11"} 
{"source": "app2-feature2", "target": "app2-feature12"} 
{"source": "app2-feature11", "target": "app2-feature12"} 

How can I get a list of permutations of APP_NAME/FEAT_ID combinations for each USER_ID in my DataFrame?

Answer


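I would look at making some tuples out of your DataFrame, then using something like itertools.permutations to create all the permutations, and from there crafting your dictionaries as you need them: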

import itertools 

allUserPermutations = {} 

groupedByUser = df.groupby('USER_ID') 
for k, g in groupedByUser: 

    requisiteColumns = g[['APP_NAME', 'FEAT_ID']] 

    # tuples out of dataframe rows 
    userCombos = [tuple(x) for x in requisiteColumns.values] 

    # lazy iterator over every ordered pair of this user's tuples
    userPermutations = itertools.permutations(userCombos, 2)

    # create a list of specified dicts for the current user
    # (each source/target is an (APP_NAME, FEAT_ID) tuple)
    userPermutations = [{'source': s, 'target': tar} for s, tar in userPermutations]

    # store the current user's specified dicts
    allUserPermutations[k] = userPermutations

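If permutations doesn't return the desired behavior, you can try some of the other combinatoric generators found here. Hopefully this strategy works (I don't have a pandas-enabled REPL to test it with at the moment). Good luck!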
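The expected output in the question lists each pair only once, in row order, which looks more like what itertools.combinations produces than itertools.permutations (which also yields every pair reversed). Below is a minimal sketch of that variant, not a tested drop-in for the code above. The helper name links_by_user and the hard-coded rows are assumptions made for illustration, and plain tuples stand in for the DataFrame so the snippet runs without pandas.

```python
import itertools
import json
from collections import defaultdict


def links_by_user(rows):
    """Pair up each user's "app-feature" labels (hypothetical helper).

    rows is an iterable of (APP_NAME, FEAT_ID, USER_ID) tuples.
    itertools.combinations yields each unordered pair once, in input order.
    """
    labels_by_user = defaultdict(list)
    for app, feat, user in rows:
        labels_by_user[user].append(f"{app}-{feat}")

    return {
        user: [{"source": s, "target": t}
               for s, t in itertools.combinations(labels, 2)]
        for user, labels in labels_by_user.items()
    }


# user2's rows copied from the question. With pandas available, the rows could
# instead come from:
#   df[['APP_NAME', 'FEAT_ID', 'USER_ID']].itertuples(index=False, name=None)
rows = [
    ("app3", "feature10", "user2"),
    ("app2", "feature2", "user2"),
    ("app2", "feature11", "user2"),
    ("app2", "feature12", "user2"),
]

links = links_by_user(rows)
for link in links["user2"]:
    print(json.dumps(link))
```

For user2 this should print the six source/target lines shown in the question. If both directions of every pair are wanted, swapping combinations for permutations in the helper gives the behavior of the original code.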
