How do I form clusters based on a condition? Sample input file (the actual input file contains about 50,000 entries):
615 146
615 180
615 53
615 42
615 52
615 52
615 51
615 45
615 49
616 34
616 44
616 42
616 41
616 42
617 42
617 43
617 42
685 33
685 33
685 33
686 33
686 33
687 47
687 68
737 449
737 41
737 1138
738 46
738 53
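I have to compare each value in column 0 with the ones that follow it: rows with the same value (615, 615, 615, ...) must be grouped together, and the cluster must contain their column 1 values (146, 180, ..., 45, 49). When the column 0 value changes, the cluster must break and a new cluster must form for the next run of identical values (616, 616, 616, ...), and so on.
The code I wrote is: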
from __future__ import division
from sys import exit
h = 0
historyjobs = []
targetjobs = []
def quickzh(zhlistsub,
            targetjobs=targetjobs,num=0,denom=0):
    li = [] ; ji = []
    j = 0
    for i in zhlistsub:
        x1 = targetjobs[j][0]
        x = targetjobs[i][0]
        num += x
        denom += 1
        if x1 >= 0.9 * (num/denom):#to group all items with same value in column 0
            li.append(targetjobs[i][1])
        else:
            break
    return li
def filewr(listli):
    global h
    s = open("newout1","a")
    if(len(listli) != 0):
        h += 1
        s.write("cluster: %d"%h)
        s.write("\n")
        s.write(str(listli))
        s.write("\n\n")
    else:
        print "0"
def new(inputfile,
        historyjobs=historyjobs,targetjobs=targetjobs):
    zhlistsub = [];zhlist = []
    k = 0
    with open(inputfile,'r') as f:
        for line in f:
            job = map(int,line.split())
            targetjobs.append(job)
    while True:
        if len(targetjobs) != 0:
            zhlistsub = [i for i, element in enumerate(targetjobs)]
            if zhlistsub:
                listrun = quickzh(zhlistsub)
                filewr(listrun)
            historyjobs.append(targetjobs.pop(0))
            k += 1
        else:
            break
new('newfinal1')
The output I get is:
cluster: 1
[146, 180, 53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53]
cluster: 2
[180, 53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53]
cluster: 3
[53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53]
... and so on
But I need the output to be:
cluster: 1
[146, 180, 53, 42, 52, 52, 51, 45, 49]
cluster: 2
[34, 44, 42, 41, 42]
cluster: 3
[42, 43, 42]
... and so on
So can anyone suggest what changes I should make to the condition to get the desired result? It would be really helpful.
I'm having a really tough time understanding what you need... but usually for grouping, 'itertools.groupby' or 'collections.defaultdict' is the way to go... – mgilson
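To illustrate the 'itertools.groupby' suggestion, here is a minimal sketch in Python 3. It assumes that rows with the same column 0 value are always adjacent in the file, as they are in the sample. The names `cluster_by_first_column` and `read_rows` are made up for this example and do not come from the question's code.

```python
from itertools import groupby
from operator import itemgetter


def cluster_by_first_column(rows):
    """Return one list of column-1 values per run of equal column-0 values."""
    clusters = []
    # groupby starts a new group every time the key (column 0) changes,
    # so it only merges rows that are next to each other.
    for _key, group in groupby(rows, key=itemgetter(0)):
        clusters.append([value for _, value in group])
    return clusters


def read_rows(path):
    """Parse a whitespace-separated two-column file into (int, int) pairs."""
    with open(path) as f:
        return [tuple(map(int, line.split())) for line in f if line.strip()]


# A shortened version of the sample data from the question.
sample = [(615, 146), (615, 180), (615, 53), (616, 34), (616, 44), (617, 42)]
clusters = cluster_by_first_column(sample)
for number, values in enumerate(clusters, start=1):
    print("cluster: %d" % number)
    print(values)
```

For the real file, something like `cluster_by_first_column(read_rows('newfinal1'))` should work under the same assumption. If equal column 0 values can be scattered through the file, 'groupby' will split them into separate clusters; in that case sort the rows first, or collect them in a 'collections.defaultdict(list)' keyed on column 0.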