Python statistical analysis

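Given 15 players (2 goalkeepers, 5 defenders, 5 midfielders and 3 strikers), and given that each player has a value (cost) and a score, I want to work out the highest-scoring team I can have. Each team must consist of 1 GK plus a formation such as 4:4:2, 4:3:3, etc. I started with sample data like this: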

player  role  score  cost

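I then did the following to evaluate all combinations:

I read each row into a list (one list per role), then used nested itertools loops to get all the combinations: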

import itertools

G, D, M, S = [], [], [], []
for line in rows:  # rows: one [name, role, ...] list per input line (assumed)
    if line[1] == "G": G.append(line[0])
    elif line[1] == "D": D.append(line[0])
    elif line[1] == "M": M.append(line[0])
    elif line[1] == "S": S.append(line[0])

teams = {}
count = 0
for gk in itertools.combinations(G, 1):
    for de in itertools.combinations(D, 4):
        for mi in itertools.combinations(M, 4):
            for st in itertools.combinations(S, 2):
                # Every team is stored in a dict: this is what exhausts memory later
                teams[str(count)] = " ".join(gk + de + mi + st)
                count += 1
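The "one list per role" step can also be written without the if-chain by grouping on the role letter. This is a minimal sketch assuming a comma-separated file whose columns are name, role, score, cost (the column order, the `players_by_role` helper and the sample rows are all hypothetical):

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample file: name, role (G/D/M/S), score, cost
SAMPLE = """\
GK1,G,7,5
DF1,D,6,4
MF1,M,8,6
ST1,S,9,7
"""

def players_by_role(f):
    """Group player names by role letter in a single pass over the rows."""
    groups = defaultdict(list)
    for name, role, _score, _cost in csv.reader(f):
        groups[role].append(name)
    return groups

groups = players_by_role(io.StringIO(SAMPLE))
G, D, M, S = groups["G"], groups["D"], groups["M"], groups["S"]
print(G, D, M, S)  # ['GK1'] ['DF1'] ['MF1'] ['ST1']
```

A `defaultdict(list)` means a new role letter needs no extra branch.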

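Once I have the teams, I calculate their score and the team cost. If the cost is below a threshold, I print the team.
But if I now make this 20 goalkeepers, 150 defenders, 150 midfielders and 100 strikers, I understandably run out of memory.
What can I do to carry out this analysis? Is a generator, rather than a recursive function, what I need?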

Many thanks


2010-10-12 user317225

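You could probably solve this with recursion. The basic outline is shown below, but it ignores some details, such as a team having to consist of a set number of players of each type.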

players = [{'name': 'A', 'score': 5, 'cost': 10},
           {'name': 'B', 'score': 10, 'cost': 3},
           {'name': 'C', 'score': 6, 'cost': 8}]

def player_cost(player):
    return player['cost']

def player_score(player):
    return player['score']

def total_score(players):
    return sum(player['score'] for player in players)

def finance_team_recurse(budget, available_players):
    affordable_players = []
    for player in available_players:
        if player_cost(player) <= budget:
            # Since we've ordered available players, the first player appended
            # will be the one with the highest score.
            affordable_players.append(player)
    result = []
    if affordable_players:
        candidate_player = affordable_players[0]
        other_players = affordable_players[1:]
        # if you include candidate_player on your team
        team_with_candidate = finance_team_recurse(
            budget - player_cost(candidate_player), other_players)
        team_with_candidate.append(candidate_player)
        score_of_team_with_candidate = total_score(team_with_candidate)
        if score_of_team_with_candidate > total_score(other_players):
            # With non-negative scores, no team without the candidate can beat this
            result = team_with_candidate
        else:
            # if you exclude candidate_player from your team
            team_without_candidate = finance_team_recurse(budget, other_players)
            score_of_team_without_candidate = total_score(team_without_candidate)
            if score_of_team_with_candidate > score_of_team_without_candidate:
                result = team_with_candidate
            else:
                result = team_without_candidate
    return result

def finance_team(budget, available_players):
    tmp = available_players[:]
    # Sort so player with highest score is first. (Greedy algorithm?)
    tmp.sort(key=player_score, reverse=True)
    return finance_team_recurse(budget, tmp)

print(finance_team(20, players))
# [{'name': 'C', 'score': 6, 'cost': 8}, {'name': 'B', 'score': 10, 'cost': 3}]

20 choose 1 = 20 combinations 
150 choose 4 = 20260275 combinations 
100 choose 2 = 4950 combinations

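So the teams dict would end up with 20 * 20260275 * 20260275 * 4950 = 40637395564486875000 entries (the 20260275 appears twice because both defenders and midfielders are 150 choose 4). That takes a lot of memory.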
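The counts above can be checked directly. This sketch assumes Python 3.8+ for `math.comb` and the 4-4-2 formation with the pool sizes from the question:

```python
from math import comb

# 4-4-2 drawn from 20 GK, 150 D, 150 M, 100 S (pool sizes from the question)
gk = comb(20, 1)
de = comb(150, 4)
mi = comb(150, 4)
st = comb(100, 2)
total = gk * de * mi * st
print(gk, de, st)  # 20 20260275 4950
print(total)       # 40637395564486875000
```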

import itertools

for gk in itertools.combinations(G, 1):
    for de in itertools.combinations(D, 4):
        for mi in itertools.combinations(M, 4):
            for st in itertools.combinations(S, 2):
                # Don't collect the results into a dict.
                # That's what's killing you (memory-wise).
                # Just compute the cost and print the result here.
                team = gk + de + mi + st
                print(" ".join(team))

PS. 40637395564486875000 is on the order of 10**19. Assuming your program could process 10**6 combinations per second, it would take about 1.3 million years to finish...
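The runtime estimate works out as follows, under the same assumed rate of 10**6 teams per second (the rate is the answer's assumption, not a measurement):

```python
# Brute-force runtime estimate, assuming 10**6 teams processed per second
total_teams = 40637395564486875000
rate = 10**6                            # teams per second (assumed)
seconds_per_year = 365.25 * 24 * 3600   # 31,557,600 s
years = total_teams / rate / seconds_per_year
print(f"{years:.3g} years")             # 1.29e+06 years
```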


2010-10-12 18:56:41 unutbu

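In a thousand years, computers will be faster! – florin 2010-10-12 19:13:09

+1: Correct use of combinatorics. This is a textbook example of computability, big-O complexity, and what can't be done. Brilliant. Wish I could upvote it more. – 2010-10-12 19:35:56

Would processing it row by row work? – user317225 2010-10-12 19:41:44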

Functions and generators help a lot:

import itertools

def make_teams(G, D, M, S):
    """ returns all possible teams, each as a flat tuple of players """
    for gk in itertools.combinations(G, 1):
        for de in itertools.combinations(D, 4):
            for mi in itertools.combinations(M, 4):
                for st in itertools.combinations(S, 2):
                    yield gk + de + mi + st

# Players are assumed to be objects with .score and .cost attributes
def get_cost(team):
    return sum(member.cost for member in team)

def get_score(team):
    return sum(member.score for member in team)

def good_teams(G, D, M, S, min_score=0):
    for team in make_teams(G, D, M, S):
        if get_score(team) > min_score:
            yield team

for team in good_teams(G, D, M, S, min_score=100):
    print(team)
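To see the memory/time trade-off concretely, here is a self-contained sketch that streams teams through `max()`, so only one team is held at a time. The `Player` namedtuple, the `formation` parameter, `best_team` and the sample values are hypothetical; the pool sizes match the question's 15-player example:

```python
import itertools
from collections import namedtuple

Player = namedtuple("Player", "name score cost")  # hypothetical record type

def make_teams(G, D, M, S, formation=(4, 4, 2)):
    """Lazily yield every 1-GK team for a D-M-S formation as a flat tuple."""
    d, m, s = formation
    for gk in itertools.combinations(G, 1):
        for de in itertools.combinations(D, d):
            for mi in itertools.combinations(M, m):
                for st in itertools.combinations(S, s):
                    yield gk + de + mi + st

def best_team(G, D, M, S, budget, formation=(4, 4, 2)):
    """Best-scoring team whose total cost is within budget, or None."""
    affordable = (t for t in make_teams(G, D, M, S, formation)
                  if sum(p.cost for p in t) <= budget)
    return max(affordable, key=lambda t: sum(p.score for p in t), default=None)

# Hypothetical players: 2 GK, 5 D, 5 M, 3 S
G = [Player("G1", 6, 5), Player("G2", 4, 2)]
D = [Player(f"D{i}", 2 + i, 1 + i) for i in range(5)]
M = [Player(f"M{i}", 3 + i, 2 + i) for i in range(5)]
S = [Player(f"S{i}", 5 + i, 4 + i) for i in range(3)]

print(sum(1 for _ in make_teams(G, D, M, S)))  # 150
team = best_team(G, D, M, S, budget=40)
print([p.name for p in team])
```

Memory stays flat, but the loop still visits every combination, which is why this only postpones the problem for 20/150/150/100 players.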

It still generates every possible combination, so now you'll probably run out of time rather than memory.

What you're doing looks like a variation of the knapsack problem: you can do better than trying every possible combination, but not much better.
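One standard way to beat full enumeration, not taken from this answer, is a knapsack-style dynamic program: for each role, track the best score for picking exactly k players at each exact total cost, then merge the roles by cost. This sketch assumes non-negative integer costs and a modest integer budget (the work grows with the budget, so it is pseudo-polynomial), returns only the best score rather than the team, and uses hypothetical function names and sample data:

```python
NEG = float("-inf")  # marks "no selection with exactly this cost"

def best_per_role(players, k, budget):
    """dp[j][c] = best score picking exactly j of these (score, cost)
    players with total cost exactly c; returns the row for j == k."""
    dp = [[NEG] * (budget + 1) for _ in range(k + 1)]
    dp[0][0] = 0
    for score, cost in players:
        # j runs downwards so each player is used at most once (0/1 knapsack)
        for j in range(k, 0, -1):
            for c in range(cost, budget + 1):
                if dp[j - 1][c - cost] != NEG:
                    dp[j][c] = max(dp[j][c], dp[j - 1][c - cost] + score)
    return dp[k]

def best_team_score(roles, budget):
    """roles: list of (players, how_many). Merges the per-role tables so
    the whole team costs at most budget; returns -inf if impossible."""
    total = [NEG] * (budget + 1)
    total[0] = 0
    for players, k in roles:
        role_best = best_per_role(players, k, budget)
        merged = [NEG] * (budget + 1)
        for c1, s1 in enumerate(total):
            if s1 == NEG:
                continue
            for c2 in range(budget - c1 + 1):
                if role_best[c2] != NEG:
                    merged[c1 + c2] = max(merged[c1 + c2], s1 + role_best[c2])
        total = merged
    return max(total)

# Hypothetical (score, cost) pairs: 2 GK, 5 D, 5 M, 3 S
GK = [(6, 5), (4, 2)]
DEF = [(2 + i, 1 + i) for i in range(5)]
MID = [(3 + i, 2 + i) for i in range(5)]
STR = [(5 + i, 4 + i) for i in range(3)]
print(best_team_score([(GK, 1), (DEF, 4), (MID, 4), (STR, 2)], budget=40))  # 52
```

Recovering the actual players would need back-pointers stored alongside each table entry.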

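One way to get good solutions quickly is to sort the players by score per cost. You should get the highest-scoring teams first, but there's no guarantee you'll get the best solution. Wikipedia calls this the "greedy approximation algorithm".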

import itertools

def score_per_cost(player):
    return player.score / player.cost  # true division in Python 3

def sorted_combinations(seq, n):
    # Combinations drawn from the players sorted best score-per-cost first
    return itertools.combinations(
        sorted(seq, key=score_per_cost, reverse=True), n)

def make_teams(G, D, M, S):
    """ returns all possible teams, best score-per-cost players first """
    for gk in sorted_combinations(G, 1):
        for de in sorted_combinations(D, 4):
            for mi in sorted_combinations(M, 4):
                for st in sorted_combinations(S, 2):
                    yield gk + de + mi + st

def get_cost(team):
    return sum(member.cost for member in team)

def top_teams(n):
    # Take only the first n teams instead of enumerating all of them
    return itertools.islice(make_teams(G, D, M, S), n)

for team in top_teams(100):
    print(team)

I'll leave adding the requirement "each team < threshold cost" as an exercise for the reader (hint: it's one line in make_teams :P).
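One possible way to fill in that hint: a sketch assuming players carry numeric `score` and `cost` attributes and a hypothetical `max_cost` parameter. Unlike the code above, it sorts each role list once up front instead of re-sorting inside every inner loop; the `Player` type and sample values are hypothetical:

```python
import itertools
from collections import namedtuple

Player = namedtuple("Player", "name score cost")  # hypothetical record type

def by_ratio(seq):
    """Players sorted best score-per-cost first (sorted once, up front)."""
    return sorted(seq, key=lambda p: p.score / p.cost, reverse=True)

def make_teams(G, D, M, S, max_cost):
    G, D, M, S = by_ratio(G), by_ratio(D), by_ratio(M), by_ratio(S)
    for gk in itertools.combinations(G, 1):
        for de in itertools.combinations(D, 4):
            for mi in itertools.combinations(M, 4):
                for st in itertools.combinations(S, 2):
                    team = gk + de + mi + st
                    if sum(p.cost for p in team) < max_cost:  # the one-line budget check
                        yield team

def top_teams(G, D, M, S, n, max_cost):
    return itertools.islice(make_teams(G, D, M, S, max_cost), n)

# Hypothetical sample players, sized like the question's 15-player example
G = [Player("G1", 6, 5), Player("G2", 4, 2)]
D = [Player(f"D{i}", 2 + i, 1 + i) for i in range(5)]
M = [Player(f"M{i}", 3 + i, 2 + i) for i in range(5)]
S = [Player(f"S{i}", 5 + i, 4 + i) for i in range(3)]

for team in top_teams(G, D, M, S, n=3, max_cost=40):
    print([p.name for p in team], sum(p.cost for p in team))
```

As the answer says, the early teams are good guesses, not a guaranteed optimum.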


2010-10-12 19:03:44

Looking at the knapsack problem almost made me feel ill! – user317225 2010-10-12 19:43:58
