2014-11-21

Create sublists from a list

I have a Training_list, which is a list of lists, for example:

[[1,2,3,4,5,6,7,8,9,10,11,12,13,'<50k'],[1,2,3,4,5,6,7,8,9,10,11,12,13,'<50k'], 
[1,2,3,4,5,6,7,8,9,10,11,12,13,'>50k'],[1,2,3,4,5,6,7,8,9,10,11,12,13,'>50k'], 
[1,2,3,4,5,6,7,8,9,10,11,12,13,'<50k'],[1,2,3,4,5,6,7,8,9,10,11,12,13,'>50k'], 
... 
] 

I want to split this list into two sublists based on the last attribute. The first, under_50k_list, should contain all the <50k records, e.g.,

[[1,2,3,4,5,6,7,8,9,10,11,12,13], [1,2,3,4,5,6,7,8,9,10,11,12,13], ...] 

The second, over_50k_list, should contain all the >50k records, e.g.,

[[1,2,3,4,5,6,7,8,9,10,11,12,13], [1,2,3,4,5,6,7,8,9,10,11,12,13], ...] 

Once the two lists are created, I then want to add up each list index by index, e.g.

[1,2,3,4,5,6,7,8,9,10,11,12,13] + [1,2,3,4,5,6,7,8,9,10,11,12,13] 
= [2,4,6,8,10,12,14,16,18,20,22,24,26] 
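For reference, the index-by-index addition described above can be sketched with zip, which pairs up the elements at the same position in both lists. The helper name add_lists is hypothetical and does not come from the question.

```python
def add_lists(a, b):
    """Add two equal-length lists element by element (hypothetical helper)."""
    # zip pairs a[0] with b[0], a[1] with b[1], and so on
    return [x + y for x, y in zip(a, b)]


row = list(range(1, 14))  # [1, 2, ..., 13]
print(add_lists(row, row))
# [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26]
```

If the lists have different lengths, zip stops at the shorter one.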

I can't seem to get the list breakdown working.

def sums_list(): 

    sums_list = [] 
    try: 
        for index in range(15): 
            sums_list.append(under_50k_list[index]+over_50k_list[index]) 
    except: 
        pass 
        return(sums_list) 

def under_over_lists(): 

    under_50k_list = [0]*14 
    under_50k_count = 0 
    over_50k_list = [0]*14 
    over_50k_count = 0 
    try: 
        for row in training_list: 
            if row[-1].lstrip() == '<=50K': 
                under_50k_list = sums_list(under_50k_list, row[:-1]) 
                under_50k_count += 1 
            else: 
                if row[-1].lstrip() == '>50K': 
                    over_50k_list = sums_list(over_50k_list, row[:-1]) 
                    over_50k_count += 1 
    except: 
        pass 
        print(under_50k_list) 
        return under_over_lists 

Any help would be greatly appreciated. Thanks – saggart 2014-11-21 14:33:56


You should add more tags, e.g. which programming language this is. – user1438038 2014-11-21 14:35:46


Sorry, I'm new to Stack Overflow. It's Python. – saggart 2014-11-21 14:41:49

Answers


You can use numpy if every sublist in the list is the same size:

>>> import numpy as np 
>>> llist=[[1,2,3,4,5,6,7,8,9,10,11,12,13,'<50k'],[1,2,3,4,5,6,7,8,9,10,11,12,13,'<50k'], [1,2,3,4,5,6,7,8,9,10,11,12,13,'>50k'],[1,2,3,4,5,6,7,8,9,10,11,12,13,'>50k'],[1,2,3,4,5,6,7,8,9,10,11,12,13,'<50k'],[1,2,3,4,5,6,7,8,9,10,11,12,13,'>50k']] 
>>> under_50k_list=[i[:-1] for i in llist if i[-1]=='<50k'] 
>>> under_50k_list 
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]] 
>>> over_50k_list=[i[:-1] for i in llist if i[-1]=='>50k'] 
>>> over_50k_list 
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]] 
>>> sum(np.array(under_50k_list)) 
array([ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39]) 
>>> under_50k_sum=sum(np.array(under_50k_list)) 
>>> over_50k_sum=sum(np.array(over_50k_list)) 
>>> under_50k_sum 
array([ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39]) 
>>> over_50k_sum 
array([ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39]) 
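The built-in sum above iterates over the rows of the 2-D array and adds them together one row at a time. NumPy also provides its own array sum method, where axis=0 sums down the columns. Here is a minimal sketch, assuming the label column has already been sliced off so the array has a numeric dtype. The three-row sample is hypothetical.

```python
import numpy as np

# Hypothetical sample: three 13-column rows with the label already removed
under_50k_list = [list(range(1, 14))] * 3

# axis=0 sums down the columns, giving one total per index
under_50k_sum = np.array(under_50k_list).sum(axis=0)
print(under_50k_sum.tolist())
# [3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39]
```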

You should use the append method to get your breakdown working properly. I find it friendlier for lists whose size isn't known in advance.

over_50k = [] 
under_50k = [] 
for row in training_list: 
    if row[-1] == "<50k": 
        under_50k.append(row[:-1]) 
    elif row[-1] == ">50k": 
        over_50k.append(row[:-1]) 
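The comparisons above only match labels spelled exactly "<50k" and ">50k". The asker's own code calls lstrip() and compares against '<=50K' and '>50K', which suggests the real labels carry leading whitespace and a different spelling. Under that assumption, nothing would match, and both lists would stay empty. Below is a minimal sketch that strips each label before grouping rows into a dict. The function name split_by_label and the sample rows are hypothetical.

```python
def split_by_label(rows):
    """Group rows by their stripped last element, dropping the label column."""
    groups = {}
    for row in rows:
        label = row[-1].strip()  # e.g. ' <=50K' -> '<=50K'
        groups.setdefault(label, []).append(row[:-1])
    return groups


# Hypothetical rows; labels assumed to have a leading space as in the asker's data
training_list = [
    [1, 2, 3, ' <=50K'],
    [4, 5, 6, ' <=50K'],
    [7, 8, 9, ' >50K'],
]

# An exact comparison against "<50k" finds nothing in data shaped like this
print([r for r in training_list if r[-1] == "<50k"])  # []

groups = split_by_label(training_list)
under_50k = groups.get('<=50K', [])
over_50k = groups.get('>50K', [])
print(under_50k)  # [[1, 2, 3], [4, 5, 6]]
print(over_50k)   # [[7, 8, 9]]
```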

Now, get your sums:

over_50k_sum = list(over_50k[0])  # initialize with a copy of the first row 
for i in range(1, len(over_50k)):  # skip the first row 
    for j in range(len(over_50k[i])): 
        over_50k_sum[j] += over_50k[i][j] 

under_50k_sum = list(under_50k[0])  # initialize with a copy of the first row 
for i in range(1, len(under_50k)):  # skip the first row 
    for j in range(len(under_50k[i])): 
        under_50k_sum[j] += under_50k[i][j] 
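The two loops above are identical apart from the list they read. Both also raise IndexError if a group turns out to be empty, because they start from element [0]. Here is a sketch that folds them into one hypothetical helper, column_sums, with a guard for that case:

```python
def column_sums(rows):
    """Sum equal-length rows index by index; returns [] when there are no rows."""
    if not rows:
        return []  # avoids IndexError on rows[0]
    totals = list(rows[0])  # copy so the first row itself is not modified
    for row in rows[1:]:
        for j, value in enumerate(row):
            totals[j] += value
    return totals


over_50k = [[1, 2, 3], [4, 5, 6]]
print(column_sums(over_50k))  # [5, 7, 9]
print(column_sums([]))        # []
```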

[37, 0.7173543689320389, None, None, 9, 0.3351132686084142, 0.05165857605177993, 0.2942961165048544, 0.8373381877022654, 0.6119741100323625, 0, 0, 45, None, ' <= 50K'], [46, 0.7173543689320389, None, None, 13, 0.03673139158576052, 0.113989838187702265, 0.3013349514563107, 0.8373381877022654, 0.38802588996763754, 0, 0, 25, None, '= 50K'], [44, 0.7173543689320389, None, None, 9, 0.1610032362459547, 0.12823624595469255, 0.3013349514563107, 0.8373381877022654, 0.6119741100323625, 0, 0, 40, None, '> 50K'], – saggart 2014-11-21 17:57:39


Hi Joe, I tried your code: over_50k = [] under_50k = [] for row in training_list: print("U") if row[-1].lstrip() == '<=50K': print("yes") under_50k.append(row[:-1]) print("b") else: if row[-1].lstrip() == '>50K': over_50k.append(row[:-1]) – saggart 2014-11-21 17:58:33


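It doesn't seem to create the over and under lists. I tried adding print statements below for row in training_list:, below if row[-1].lstrip() == '<=50K':, and below if row[-1].lstrip() == '>50K':, and they all print, so I'm not sure what the problem is. I included my actual training list. – saggart 2014-11-21 18:02:15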


Assuming you only want the results and not the intermediate lists, it's just:

from functools import reduce  # reduce is a builtin only in Python 2 

ll = [[1,2,3,4,5,6,7,8,9,10,11,12,13,'<50k'], ... ] 

sumlist = lambda a,b:[x+y for x,y in zip(a,b)] 
def sum_if(lists, key): 
    return reduce(sumlist, (l[:-1] for l in lists if l[-1]==key)) 

under_50k_count = sum_if(ll, '<50k') 
over_50k_count = sum_if(ll, '>50k') 
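One edge case: if no row matches the key, reduce receives an empty sequence and raises TypeError. Here is a sketch that passes a row of zeros as reduce's initial value, so an empty match returns zeros instead. It assumes the number of value columns is known. The width parameter is a hypothetical addition, not part of the answer.

```python
from functools import reduce


def sumlist(a, b):
    """Element-wise sum of two equal-length lists."""
    return [x + y for x, y in zip(a, b)]


def sum_if(lists, key, width):
    """Column sums of rows whose last element equals key.

    width (hypothetical) is the number of value columns; the zero row
    built from it is reduce's initial value.
    """
    rows = (l[:-1] for l in lists if l[-1] == key)
    return reduce(sumlist, rows, [0] * width)


ll = [[1, 2, 3, '<50k'], [4, 5, 6, '<50k'], [7, 8, 9, '>50k']]
print(sum_if(ll, '<50k', 3))  # [5, 7, 9]
print(sum_if(ll, '>50k', 3))  # [7, 8, 9]
print(sum_if(ll, '=50k', 3))  # [0, 0, 0]
```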

If your lists are long and you want to reduce copying, it may be worth importing izip from itertools and using it instead of zip, but it certainly isn't necessary.
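itertools.izip exists only in Python 2. In Python 3 the built-in zip already returns a lazy iterator, and izip was removed. If code has to run on both versions, a common compatibility import looks like the sketch below. The name lazy_zip is hypothetical.

```python
try:
    from itertools import izip as lazy_zip  # Python 2 only
except ImportError:
    lazy_zip = zip  # Python 3: the built-in zip is already lazy


def sumlist(a, b):
    """Element-wise sum using the lazy zip chosen above."""
    return [x + y for x, y in lazy_zip(a, b)]


print(sumlist([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```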


For the fun of one-liners:

ll = [[1,2,3,4,5,6,7,8,9,10,11,12,13,'<50k'], ...] 
under_50k_count = [sum(b) for b in zip(*[a[:-1] for a in ll if a[-1].startswith('<')])] 
over_50k_count = [sum(b) for b in zip(*[a[:-1] for a in ll if a[-1].startswith('>')])] 
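Here is a quick check of the same one-liners on a small hypothetical list with three value columns instead of thirteen:

```python
ll = [[1, 2, 3, '<50k'], [4, 5, 6, '<50k'], [7, 8, 9, '>50k']]

under_50k_sum = [sum(b) for b in zip(*[a[:-1] for a in ll if a[-1].startswith('<')])]
over_50k_sum = [sum(b) for b in zip(*[a[:-1] for a in ll if a[-1].startswith('>')])]
print(under_50k_sum)  # [5, 7, 9]
print(over_50k_sum)   # [7, 8, 9]
```

If no row matches, zip(*[]) is just zip(), which yields nothing, so the result is an empty list rather than a row of zeros. startswith('<') also won't match labels with leading whitespace such as ' <=50K' unless they are stripped first.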

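On its own this isn't a useful Stack Overflow answer, since it falls into the "try this" category with no explanation, so let's break it down a bit.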

We use a list comprehension to separate out the two different kinds of lists:

[a[:-1] for a in ll if a[-1].startswith('<')] 

We then unpack this list and pass it to zip, which gives us a list of tuples (an iterator of tuples in Python 3), one tuple per column:

[(1,1,1,...), (2,2,2,...), ...] 

Then we use another list comprehension to sum each of these tuples.
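The three steps can be traced on a small hypothetical list by printing each intermediate value:

```python
ll = [[1, 2, 3, '<50k'], [4, 5, 6, '<50k'], [7, 8, 9, '>50k']]

# Step 1: keep the '<' rows and drop the label column
rows = [a[:-1] for a in ll if a[-1].startswith('<')]
print(rows)  # [[1, 2, 3], [4, 5, 6]]

# Step 2: zip(*rows) is zip([1, 2, 3], [4, 5, 6]), which groups the columns
columns = list(zip(*rows))
print(columns)  # [(1, 4), (2, 5), (3, 6)]

# Step 3: sum each column tuple
sums = [sum(c) for c in columns]
print(sums)  # [5, 7, 9]
```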

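List comprehensions are notably fast, but unpacking a list is not (especially a large list). So while cramming something into one line like this is fun, I wouldn't recommend it if speed is any concern, or if there's even a remote chance that someone else will inherit your code.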
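If readability matters more than brevity, one alternative is a single pass that accumulates per-label column sums as it goes. This walks the data once and never builds a transposed copy with zip(*...). It is a sketch that has not been benchmarked here, so whether it is actually faster depends on the data. The function name sums_by_label and the width parameter are hypothetical.

```python
def sums_by_label(rows, width):
    """One pass over rows, accumulating column sums for each stripped label."""
    totals = {}
    for row in rows:
        label = row[-1].strip()
        acc = totals.setdefault(label, [0] * width)  # zero row on first sight
        for j in range(width):
            acc[j] += row[j]
    return totals


ll = [[1, 2, 3, '<50k'], [4, 5, 6, '<50k'], [7, 8, 9, '>50k']]
totals = sums_by_label(ll, 3)
print(totals['<50k'])  # [5, 7, 9]
print(totals['>50k'])  # [7, 8, 9]
```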