2012-04-16
1

I'm somewhat familiar with Python. I have a data file that needs to be read into arrays in a specific way. Here is an example:

1 
6 
0.714285714286 
0 0 1.00000000000 
0 1 0.61356352337 
... 
-1 -1 0.00000000000 
0 0 5.13787636499 
0 1 0.97147643932 
... 
-1 -1 0.00000000000 
0 0 5.13787636499 
0 1 0.97147643932 
... 
-1 -1 0.00000000000 
0 0 0 0 5.13787636499 
0 0 0 1 0.97147643932 
.... 

So every file will have this structure (tab-separated).

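  • The first line must be read in as a variable, and so must the second and third lines.
  • Next come four blocks separated by -1 -1 0.0000000000. Each block is 'n' lines long. The first two numbers on a line give the position in the array where the third number on that line is to be inserted. Only unique positions are listed (position 0 1 is the same as 1 0, but that information is not shown).
  • Note: the fourth block has four index numbers.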

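I need:

  • The first 3 lines read in as unique variables.
  • Each block of data read into an array, using the first 2 (or 4) columns of numbers as the array indices and the third column as the value inserted into the array.
  • Only unique array elements are listed. I need the mirrored positions filled with the appropriate value (the 0 1 value should also appear at 1 0).
  • The last block needs to be inserted into a 4-dimensional array.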
+4

What have you tried? – yannis 2012-04-16 13:07:06

+0

I haven't tried anything (lack of Python experience), hence my post on SE. – LordStryker 2012-04-17 14:28:23

Answers

3

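I rewrote the code. It is now almost what you need; you only have to fine-tune it.

I decided to leave the old answer in place; it may also be helpful, because the new one does more and may sometimes be harder to follow.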

def the_function(filename):
    """
    Return a tuple: the list of the three header values and a list of sparse
    arrays stored as dicts, e.g. ([1, 2, 0.5], [{(0, 0): 1, (0, 1): 2}, ...]).
    On failure, print the reason and return None, e.g.
    "failed on text.txt: invalid literal for int() with base 10: '0.0', line: 5"
    """

    # open file and read content
    try:
        with open(filename, "r") as f:
            data_txt = [line.split() for line in f]
    # no such file
    except IOError as e:
        print('fail on open ' + str(e))
        return

    # try to get the first 3 variables
    try:
        header = [int(data_txt[0][0]), int(data_txt[1][0]), float(data_txt[2][0])]
    except (ValueError, IndexError) as e:
        print('failed on ' + filename + ': ' + str(e) + ', somewhere on lines 1-3')
        return

    # now get arrays
    arrays = [dict()]
    for lineidx, item in enumerate(data_txt[3:]):
        try:
            # for 2d array data
            if len(item) == 3:
                i, j = map(int, item[:2])
                val = float(item[-1])
                # check for 'block separator'
                if (i, j, val) == (-1, -1, 0.0):
                    # make new array
                    arrays.append(dict())
                else:
                    # update last, existing
                    arrays[-1][(i, j)] = val
            # almost the same for 4d array data
            elif len(item) == 5:
                i, j, k, m = map(int, item[:4])
                val = float(item[-1])
                arrays[-1][(i, j, k, m)] = val
        # if value is unparsable like '0.00' for int or 'text'
        except ValueError as e:
            # +4: skip the three header lines and count lines from 1
            print('failed on ' + filename + ': ' + str(e) + ', line: ' + str(lineidx + 4))
            return
    return header, arrays
+0

Incredible. I'm adapting this code now. – LordStryker 2012-04-20 16:43:16

+0

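I can map the float to position (i,j) in the array, but not to (j,i). I tried inserting 'array[-1][(j,i)] = val' into the if/else statement, but the size of my array doesn't increase at all (21 elements instead of the required 42). Any ideas? – LordStryker 2012-04-23 15:36:51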

+0

Strange. That should work. Did you check for typos? And what about the i = 0, j = 0 case? – akaRem 2012-04-23 19:53:20
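On the element count discussed in the comments above: if the 21 listed entries are the upper triangle (diagonal included) of a 6x6 symmetric table, mirroring them cannot give 42 entries, because diagonal keys such as (0, 0) mirror onto themselves. A minimal sketch under that assumption; the values are made up for illustration and `mirror` is a hypothetical helper, not part of the answer's code:

```python
def mirror(sparse):
    """Return a copy of a {(i, j): value} dict with (j, i) filled in as well."""
    full = dict(sparse)
    for (i, j), val in sparse.items():
        full[(j, i)] = val  # for i == j this rewrites the same key
    return full


# Upper triangle (diagonal included) of a 6x6 table: 21 unique positions.
# The values are invented for the example.
n = 6
upper = {(i, j): float(i + j) for i in range(n) for j in range(i, n)}
full = mirror(upper)

print(len(upper))  # 21
print(len(full))   # 36 = 21 + 15 off-diagonal mirrors, not 42
```

If the count stays at 21 after adding the mirrored assignment, one thing worth checking is that the assignment targets the same list the function fills (`arrays`, not `array`).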

1

To iterate over the lines of the file, you can use something like:

with open(filename, "r") as f:
    var1 = int(next(f))
    var2 = int(next(f))
    var3 = float(next(f))
    for line in f:
        # do some stuff particular to the line we are on...
        pass

Just create some data structure outside the loop and fill it inside the loop above. To split a string into its elements, you can use:

>>> "spam ham".split() 
['spam', 'ham'] 
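Building on `split()`, one data line can be turned into an index tuple and a float. This is only a sketch: `parse_line` is a hypothetical helper, and it assumes every data line ends with the value and that all preceding fields are integer indices (two or four of them, as in the question's sample).

```python
def parse_line(line):
    """Split one data line into (index_tuple, value)."""
    fields = line.split()                       # splits on tabs and spaces alike
    index = tuple(int(s) for s in fields[:-1])  # 2 or 4 integer indices
    value = float(fields[-1])                   # last column is the value
    return index, value


print(parse_line("0\t1\t0.61356352337"))    # ((0, 1), 0.61356352337)
print(parse_line("0 0 0 1 0.97147643932"))  # ((0, 0, 0, 1), 0.97147643932)
```

Calling `split()` with no argument handles the tab-separated file and space-separated text the same way, so the separator does not need to be named.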

I also think you will want to look at the array data structures in the NumPy library, and possibly at the SciPy library for analysis.

+3

Better to use 'with open(filename, "r") as f:' and put the statements inside the 'with' block – jamylak 2012-04-16 13:17:07

+0

Edited the answer. I think the main advantage is that 'close' does not need to be called on the file. – 2012-04-16 13:20:01

+2

Is the repeated open(filename, "r") an editing mistake? – Levon 2012-04-16 13:24:27
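The point made in the comments about `with` can be checked directly: the file is closed when the block ends, without an explicit `close()` call. A small sketch, assuming a throwaway temporary file holding just the three header lines from the question's sample:

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("1\n6\n0.714285714286\n")

with open(path, "r") as f:
    var1 = int(next(f))
    var2 = int(next(f))
    var3 = float(next(f))

print(f.closed)          # True: leaving the with block closed the file
print(var1, var2, var3)  # 1 6 0.714285714286
os.remove(path)
```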

2

As I understand what you are asking:

# read data from file into list
parsed = []
with open(filename, "r") as f:
    for line in f:
        # # you can exclude the separator here with this code (uncomment) (1)
        # # be careful: one zero more or less in the literal and it won't match
        # if line.split() == ['-1', '-1', '0.00000000000']:
        #     continue
        parsed.append(line.split())

# a simpler version
with open(filename, "r") as f:
    # # you can exclude the separator here with this code (uncomment, replace) (2)
    # parsed = [line.split() for line in f
    #           if line.split() != ['-1', '-1', '0.00000000000']]
    parsed = [line.split() for line in f]

# at this point 'parsed' is a list of lists of strings.
# [['1'],['6'],['0.714285714286'],['0', '0', '1.00000000000'],['0', '1', '0.61356352337'] .. ]

# ALT 1 -------------------------------
# we do know the len of each data block

# get the first 3 lines:
head = parsed[:3]

# get the body:
body = parsed[3:-2]

# get the last 2 lines:
tail = parsed[-2:]

# now you can do anything you want with your data
# but remember to convert str to int or float

# first3 as unique:
unique0 = int(head[0][0])
unique1 = int(head[1][0])
unique2 = float(head[2][0])

# cast body:
# check each item of body has 3 inner items
is_correct = all(len(item) == 3 for item in body)
# parse str and cast
if is_correct:
    for i, j, v in body:
        # # you can exclude the separator here (uncomment) (3)
        # # * 0. is the same as float(0)
        # if (int(i), int(j), float(v)) == (-1, -1, 0.):
        #     # here we skip the iteration for the line w/ '-1 -1 0.0...'
        #     # but you can place other code here that will be executed
        #     # at the point where block-termination lines appear
        #     continue

        # some_body_cast_function is a placeholder: supply your own
        some_body_cast_function(int(i), int(j), float(v))
else:
    raise Exception('incorrect body')


# cast tail
# check each item of tail has 5 inner items
is_correct = all(len(item) == 5 for item in tail)
# parse str and cast
if is_correct:
    for i, j, k, m, v in tail:  # 'l' is a bad index name, because it looks like 1.
        # some_tail_cast_function is a placeholder: supply your own
        some_tail_cast_function(int(i), int(j), int(k), int(m), float(v))
else:
    raise Exception('incorrect tail')

# ALT 2 -----------------------------------
# we do NOT know the len of each data block

# maybe we have some array?
array = dict()  # your array may be of another type

v1, v2, v3 = parsed[:3]
unique0 = int(v1[0])
unique1 = int(v2[0])
unique2 = float(v3[0])

for item in parsed[3:]:
    if len(item) == 3:
        i, j, v = item
        i = int(i)
        j = int(j)
        v = float(v)

        # # you can exclude the separator here (uncomment) (4)
        # # * 0. is the same as float(0)
        # # logic is the same as in the 3rd variant
        # if (i, j, v) == (-1, -1, 0.):
        #     continue

        # do your stuff
        # for example,
        array[(i, j)] = v
        array[(j, i)] = v

    elif len(item) == 5:
        i, j, k, m, v = item
        i = int(i)
        j = int(j)
        k = int(k)
        m = int(m)
        v = float(v)

        # do your stuff

    else:
        raise Exception('unsupported')  # or, maybe just 'pass'
+0

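This is almost exactly what I need. I forgot to mention explicitly that the '-1 -1 0.00000' lines are just block-termination lines (when the iteration reaches a -1 value, end the current array and start a new one). I think I can adapt your example to get what I need. Of course, any help is always welcome. – LordStryker 2012-04-17 14:27:48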

+0

Added some code inserts (4 variants) where you can exclude the "block-termination lines" or handle them as needed. Hope you like it! – akaRem 2012-04-18 10:30:22

+0

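I appreciate your continued help. I can't get the program to create a new array each time it reaches the -1 indicator and then fill that array with the block that follows. Right now it dumps all the blocks that are 3 elements long into a single array. – LordStryker 2012-04-18 20:31:51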
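One way to get a new array per block, as the last comment asks, is to start a fresh dict whenever a terminator row is seen. A minimal sketch, assuming `rows` is the already-split body of the file (everything after the three header lines) and that a terminator is any row whose indices are exactly -1 -1; `split_blocks` is a hypothetical name, and the sample rows are shortened for illustration:

```python
def split_blocks(rows):
    """Group rows into one dict per block, mirroring two-index entries."""
    blocks = [{}]
    for fields in rows:
        if not fields:
            continue  # skip blank lines
        index = tuple(int(s) for s in fields[:-1])
        value = float(fields[-1])
        if index == (-1, -1):
            blocks.append({})  # terminator: later rows go into a new block
            continue
        blocks[-1][index] = value
        if len(index) == 2:
            blocks[-1][index[::-1]] = value  # fill the mirrored position
    return blocks


rows = [line.split() for line in [
    "0 0 1.0", "0 1 0.6", "-1 -1 0.0",
    "0 0 5.1", "0 1 0.9", "-1 -1 0.0",
    "0 0 0 1 0.97",
]]
blocks = split_blocks(rows)
print(len(blocks))        # 3
print(blocks[0][(1, 0)])  # 0.6
```

Keeping each block in its own dict leaves the choice of final container open; how the four-index block should be mirrored is not stated in the question, so the sketch stores it as listed.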