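The following code creates a function Which_Line_for_Position(pos) that gives the number of the line for the position pos, that is to say the number of the line containing the character located at position pos in the file.
This function can be called with any position as its argument, independently of the current position of the file's pointer and of the history of that pointer's movements before the call.
So, with this function, one is not limited to determining the number of the current line only during an uninterrupted iteration over the lines, as is the case with Greg Hewgill's solution.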
with open(filepath, 'rb') as f:
    # Map the end offset of each line (in bytes) to its 0-based line number
    GIVE_NO_FOR_END = {}
    end = 0
    for i, line in enumerate(f):
        end += len(line)
        GIVE_NO_FOR_END[end] = i
        if line.endswith(b'\n'):
            GIVE_NO_FOR_END[end+1] = i+1
    end_positions = sorted(GIVE_NO_FOR_END)

def Which_Line_for_Position(pos,
                            dic = GIVE_NO_FOR_END,
                            keys = end_positions,
                            kmax = end_positions[-1]):
    # The first recorded offset greater than pos identifies the line
    return dic[next(k for k in keys if pos < k)] if pos < kmax else None
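
As a possible variation (not the code above), the linear scan over the sorted keys could be replaced by a binary search with the standard bisect module. This is only a sketch under assumptions: build_index and which_line_for_position are hypothetical names, io.BytesIO stands in for a file opened in 'rb' mode, and any position at or beyond the end of the data gives None.

```python
import bisect
import io

def build_index(f):
    """Return a list where ends[i] is the offset just past line i."""
    ends = []
    end = 0
    for line in f:
        end += len(line)
        ends.append(end)
    return ends

def which_line_for_position(pos, ends):
    """Return the 0-based number of the line containing offset pos, or None."""
    if pos < 0 or not ends or pos >= ends[-1]:
        return None
    # Number of line ends that are <= pos, which is the index of the line
    return bisect.bisect_right(ends, pos)

# In-memory stand-in for a binary file (made-up content)
sample = io.BytesIO(b'alpha\nbeta\ngamma\n')
ends = build_index(sample)
numbers = [which_line_for_position(p, ends) for p in (0, 5, 6, 16, 17)]
print(numbers)  # [0, 0, 1, 2, None]
```

The list of ends needs one entry per line only, and each lookup takes logarithmic rather than linear time.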
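The same solution can be written with the help of the module fileinput:

import fileinput

GIVE_NO_FOR_END = {}
end = 0
# mode is passed by keyword: the second positional parameter is inplace
for line in fileinput.input(filepath, mode='rb'):
    end += len(line)
    # filelineno() counts from 1, unlike enumerate()
    GIVE_NO_FOR_END[end] = fileinput.filelineno()
    if line.endswith(b'\n'):
        GIVE_NO_FOR_END[end+1] = fileinput.filelineno()+1
fileinput.close()
end_positions = sorted(GIVE_NO_FOR_END)

def Which_Line_for_Position(pos,
                            dic = GIVE_NO_FOR_END,
                            keys = end_positions,
                            kmax = end_positions[-1]):
    return dic[next(k for k in keys if pos < k)] if pos < kmax else None

But this solution has some drawbacks:
- it requires importing the module fileinput
- the mode has to be passed by keyword. The second positional parameter of fileinput.input() is inplace, so a call written as fileinput.input(filepath,'rb') turns on in-place editing and deletes all the content of the file!
- it seems that the file is read entirely before any iteration starts. If so, for a very big file, the size of the file may exceed the capacity of the RAM. I am not sure of this point: I tried to test it with a 1.5 GB file, but it took a long time and I dropped it for the moment. If this point is right, it is an argument for using the other solution, the one with enumerate()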
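
A small check for the second point, run on a throwaway temporary file (its name and content are made up for the demonstration). It assumes nothing beyond the standard library: with the mode given by keyword the file is only read, whereas a string in the second positional slot is taken as the inplace flag.

```python
import fileinput
import os
import tempfile

def read_all(path):
    """Return the raw bytes currently stored in the file."""
    with open(path, 'rb') as f:
        return f.read()

# Throwaway file used only for this demonstration
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'one\ntwo\nthree\n')

# Mode passed by keyword: the file is only read
lines_seen = 0
for line in fileinput.input(path, mode='rb'):
    lines_seen += 1
fileinput.close()
content_after_keyword = read_all(path)

# 'rb' in the second positional slot is a truthy value for inplace:
# standard output is redirected into the file while it is being read,
# and since nothing is printed, the file ends up empty
for line in fileinput.input(path, 'rb'):
    pass
fileinput.close()
content_after_positional = read_all(path)

os.remove(path)
```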
An example:
text = '''Harold Acton (1904–1994)
Gilbert Adair (born 1944)
Helen Adam (1909–1993)
Arthur Henry Adams (1872–1936)
Robert Adamson (1852–1902)
Fleur Adcock (born 1934)
Joseph Addison (1672–1719)
Mark Akenside (1721–1770)
James Alexander Allan (1889–1956)
Leslie Holdsworthy Allen (1879–1964)
William Allingham (1824/28-1889)
Kingsley Amis (1922–1995)
Ethel Anderson (1883–1958)
Bruce Andrews (born 1948)
Maya Angelou (born 1928)
Rae Armantrout (born 1947)
Simon Armitage (born 1963)
Matthew Arnold (1822–1888)
John Ashbery (born 1927)
Thomas Ashe (1836–1889)
Thea Astley (1925–2004)
Edwin Atherstone (1788–1872)'''
#with open('alao.txt','rb') as f:
f = text.splitlines(True)
# argument True in splitlines() makes the newlines kept

GIVE_NO_FOR_END = {}
end = 0
for i, line in enumerate(f):
    end += len(line)
    GIVE_NO_FOR_END[end] = i
    if line.endswith('\n'):
        GIVE_NO_FOR_END[end+1] = i+1
end_positions = sorted(GIVE_NO_FOR_END)

print('\n'.join('line %-3s ending at position %s' % (GIVE_NO_FOR_END[end], end)
                for end in end_positions))

def Which_Line_for_Position(pos,
                            dic = GIVE_NO_FOR_END,
                            keys = end_positions,
                            kmax = end_positions[-1]):
    return dic[next(k for k in keys if pos < k)] if pos < kmax else None

print()
for x in (2, 450, 320, 104, 105, 599, 600):
    print('pos=%-6s line %s' % (x, Which_Line_for_Position(x)))
Result:
line 0 ending at position 25
line 1 ending at position 51
line 2 ending at position 74
line 3 ending at position 105
line 4 ending at position 132
line 5 ending at position 157
line 6 ending at position 184
line 7 ending at position 210
line 8 ending at position 244
line 9 ending at position 281
line 10 ending at position 314
line 11 ending at position 340
line 12 ending at position 367
line 13 ending at position 393
line 14 ending at position 418
line 15 ending at position 445
line 16 ending at position 472
line 17 ending at position 499
line 18 ending at position 524
line 19 ending at position 548
line 20 ending at position 572
line 21 ending at position 600
pos=2 line 0
pos=450 line 16
pos=320 line 11
pos=104 line 3
pos=105 line 4
pos=599 line 21
pos=600 line None
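
One way to cross-check such a table: the 0-based line number of a position is simply the number of newline characters located before it. The sketch below applies that idea to a short made-up text; build_table and which_line are hypothetical helper names that follow the same scheme as the code above.

```python
def build_table(lines):
    """Map cumulative end offsets to 0-based line numbers."""
    table, end = {}, 0
    for i, line in enumerate(lines):
        end += len(line)
        table[end] = i
        if line.endswith('\n'):
            table[end + 1] = i + 1
    return table

def which_line(pos, table, keys):
    """Line number for pos, or None when pos is past the last offset."""
    if not keys or pos >= keys[-1]:
        return None
    return table[next(k for k in keys if pos < k)]

# Made-up sample containing an empty line and no final newline
sample = 'first line\nsecond\n\nlast without newline'
table = build_table(sample.splitlines(True))
keys = sorted(table)

# Compare the table lookup with a direct count of the preceding newlines
mismatches = [pos for pos in range(len(sample))
              if which_line(pos, table, keys) != sample.count('\n', 0, pos)]
print(mismatches)  # []
```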
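Then, having the function Which_Line_for_Position(), it is easy to obtain the number of the current line: just pass f.tell() as the argument to the function.
But a warning: when using f.tell() and moving the file's pointer around in the file, it is absolutely necessary that the file be opened in binary mode: 'rb' or 'rb+' or 'ab' or ...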
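
A minimal sketch of that usage, on a temporary file with made-up content. The helper make_line_lookup is a hypothetical wrapper around the same table as above, and the file is opened in 'rb' mode so that f.tell() returns plain byte offsets.

```python
import os
import tempfile

def make_line_lookup(f):
    """Build a position -> line number function from a binary-mode file."""
    table, end = {}, 0
    for i, line in enumerate(f):
        end += len(line)
        table[end] = i
        if line.endswith(b'\n'):
            table[end + 1] = i + 1
    keys = sorted(table)

    def lookup(pos):
        if not keys or pos >= keys[-1]:
            return None
        return table[next(k for k in keys if pos < k)]
    return lookup

# Temporary file used only for this demonstration
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'alpha\nbeta\ngamma\n')

with open(path, 'rb') as f:
    which_line = make_line_lookup(f)
    f.seek(0)
    f.readline()                             # pointer now at the start of 'beta'
    after_first_line = which_line(f.tell())
    f.seek(13)                               # jump forward, inside 'gamma'
    inside_gamma = which_line(f.tell())
    f.seek(3)                                # jump back, inside 'alpha'
    back_in_alpha = which_line(f.tell())
os.remove(path)

print(after_first_line, inside_gamma, back_in_alpha)  # 1 2 0
```

The pointer can be moved in any order, since the lookup depends only on the offset that f.tell() reports.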
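+1, nice simple solution, since it only requires the 'open' call to change. You may want to provide wrappers for any other functions used (such as 'close'), but they should be fairly minimal pass-thru functions. – paxdiablo 2011-06-16 04:32:49
Oh right, 'close' is handy, I'll add that. – 2011-06-16 04:34:30
Both solutions are great, awesome! – 2011-06-16 04:36:50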