2013-06-28 149 views
6

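Reading the previous line in a file (Python)

I need to get the value of the previous line in a file and compare it with the current line while iterating over the file. The file is HUGE, so I can't read it all into memory, and I can't access lines randomly by line number with linecache, because that library function still reads the whole file into memory.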

EDIT: I'm sorry, I forgot to mention that I have to read the file backwards.

EDIT2

I have tried the following:

import linecache

f = open("filename", "r")
for line in reversed(f.readlines()):  # this doesn't work: readlines() loads every line into memory at once
    pass

line = linecache.getline("filename", num_line)  # this doesn't work either, for the same reason
+1

Do you mean the line immediately before? Can't you just save it as you go? –

+2

You are much more likely to get help if you show us what you have written so far. – That1Guy

+0

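Can you show what you have tried? Looping over a file line by line and assigning each line to a variable is possible, so what exactly is going wrong? By the way, how big is HUGE? – ChrisP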

Answers

12

Just save the previous line as you iterate to the next one:

prevLine = ""
with open("filename") as f:
    for line in f:
        # do some work here, e.g. compare line with prevLine
        prevLine = line

This keeps the previous line in prevLine while you loop.
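The same idea as a small, self-contained function. The question does not say what the comparison is, so checking whether a line repeats the previous one is only a hypothetical example; repeated_lines is an illustrative name, and io.StringIO stands in for a real file object returned by open().

```python
import io


def repeated_lines(lines):
    """Yield (line_number, text) for every line that repeats the line before it.

    Only the previous line is remembered, so the input is never loaded whole.
    """
    prev_text = None  # sentinel: the first line has no previous line
    for number, line in enumerate(lines, start=1):
        # Drop the newline so a final line without one still compares equal.
        text = line.rstrip("\n")
        if text == prev_text:
            yield number, text
        prev_text = text


# io.StringIO stands in for a file opened with open(...)
sample = io.StringIO("a\nb\nb\nc\nc\nc")
print(list(repeated_lines(sample)))  # [(3, 'b'), (5, 'c'), (6, 'c')]
```

Because it accepts any iterable of lines, the same function works unchanged on a real file handle.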

EDIT: Apparently the OP needs to read this file backwards:

aaand after about an hour of research, I failed multiple times to do this within the memory limits.

Here you go, Lim. That guy knows what he's doing; here is his best idea:

General approach #2: Read the entire file, store position of lines

With this approach, you also read through the entire file once, but instead of storing the entire file (all the text) in memory, you only store the binary positions inside the file where each line started. You can store these positions in a similar data structure as the one storing the lines in the first approach.

Whenever you want to read line X, you have to re-read the line from the file, starting at the position you stored for the start of that line.

Pros: Almost as easy to implement as the first approach.
Cons: Can take a while to read large files.
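A minimal Python sketch of approach #2, assuming UTF-8 text with \n line endings. The first pass stores only the byte offset at which each line starts; the second pass seeks to those offsets in reverse order, which gives the backwards iteration the question asks for. line_offsets and lines_backwards are hypothetical helper names, not part of the quoted answer or of any library.

```python
import array
import os
import tempfile


def line_offsets(path):
    """First pass: record the byte offset at which each line starts.

    Only integers are kept (a compact array of signed 64-bit values),
    never the text of the lines.
    """
    offsets = array.array("q")
    pos = 0
    with open(path, "rb") as f:
        for line in f:  # binary iteration splits on b"\n"
            offsets.append(pos)
            pos += len(line)
    return offsets


def lines_backwards(path, encoding="utf-8"):
    """Second pass: seek to each stored offset, last line first."""
    offsets = line_offsets(path)
    with open(path, "rb") as f:
        for pos in reversed(offsets):
            f.seek(pos)
            yield f.readline().decode(encoding)


# Demo on a small temporary file.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"one\ntwo\nthree\n")
try:
    previous = None
    for line in lines_backwards(tmp.name):
        # 'previous' is the line read just before, i.e. the one below it in the file
        print(repr(previous), "->", repr(line))
        previous = line
finally:
    os.remove(tmp.name)
```

The offset array still grows with the number of lines, but at a few bytes per line it is far smaller than the text itself; the cost is the extra full pass noted under "Cons".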

+0

Thanks a lot. But I forgot to mention that I have to read the file backwards. –

+0

@LimH. I added code to loop backwards :D – Stephan

+0

Magic. I'm new to Python, and although I knew files are iterable, I never thought of using [::-1]. Thanks. –

2

I would write a simple generator for the task:

def pairwise(fname):
    with open(fname) as fin:
        prev = next(fin, None)  # None if the file is empty
        if prev is None:
            return
        for line in fin:
            yield prev, line
            prev = line

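Alternatively, you can use the pairwise recipe from the itertools documentation (Python 3.10+ also provides it directly as itertools.pairwise):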

import itertools

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)  # use itertools.izip on Python 2
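Neither generator above reads the file backwards, which the question requires. As an assumption (this is not part of the answer above), one common way to do that with bounded memory is to read fixed-size blocks from the end of the file, split them on newlines, and carry the incomplete first piece of each block over to the next, earlier block; the resulting lines can then be fed to pairwise. reverse_lines, block_size and the UTF-8 default are hypothetical choices for this sketch.

```python
import itertools
import os
import tempfile


def reverse_lines(path, block_size=8192, encoding="utf-8"):
    """Yield the lines of a file last-first, reading fixed-size blocks from the end.

    Memory use is roughly block_size plus the longest line. Lines are yielded
    without their trailing newline; a carriage return from Windows line
    endings is left in place.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = pos = f.tell()
        tail = b""  # start of a line whose beginning lies in an earlier block
        at_end = True
        while pos > 0:
            read_size = min(block_size, pos)
            pos -= read_size
            f.seek(pos)
            block = f.read(read_size) + tail
            if at_end and block.endswith(b"\n"):
                # A final newline ends the last line; it does not start an empty one.
                block = block[:-1]
            at_end = False
            pieces = block.split(b"\n")
            tail = pieces[0]  # possibly incomplete: finish it with the next (earlier) block
            for piece in reversed(pieces[1:]):
                yield piece.decode(encoding)
        if size > 0:
            yield tail.decode(encoding)  # the first line of the file


def pairwise(iterable):
    """The itertools recipe above (itertools.pairwise on Python 3.10+)."""
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)


# Demo on a small temporary file.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"first\nsecond\nthird\n")
try:
    # Reading backwards, the previous line (read just before) is the one below it in the file.
    for previous, current in pairwise(reverse_lines(tmp.name)):
        print(repr(previous), "->", repr(current))
finally:
    os.remove(tmp.name)
```

Splitting raw bytes before decoding is safe for UTF-8 because the newline byte never occurs inside a multi-byte character.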
4

@Lim, here is how I would write it (in reply to your comment):

def do_stuff_with_two_lines(previous_line, current_line):
    print("--------------")
    print(previous_line)
    print(current_line)

with open('my_file.txt', 'r') as my_file:
    # read the first line up front so the loop always has a previous line
    current_line = my_file.readline()

    for line in my_file:
        previous_line = current_line
        current_line = line

        do_stuff_with_two_lines(previous_line, current_line)
+0

Thank you. I'm very sorry, but I forgot to mention that I have to read the file backwards. –