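I'm just starting out with Python, and I can't wrap my head around the basic file navigation methods. Why does `file_object.tell()` report the same byte position for different positions in the file?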
The `tell()` tutorial I read says that it returns the position (in bytes) where I am currently sitting in my file.
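My reasoning was that every character of the file adds to the byte coordinate, right? That would mean that after a newline (which is just one more character in the string, the one the string gets split on), my byte coordinate would have changed... but this seems to be incorrect.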
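That reasoning holds as long as "character" means "byte": the offset after a line is the running total of the byte lengths of all lines so far, with each `\n` counted as one byte. A short Python 3 sketch of that arithmetic (the sample lines are made up) also shows where characters and bytes part ways: in UTF-8, `é` takes two bytes.

```python
from itertools import accumulate

# Hypothetical sample lines; the last one contains a non-ASCII character.
lines = ["@ this is the 1th line\n", " this is the 11th line\n", "café\n"]

chars = [len(s) for s in lines]                   # characters per line
sizes = [len(s.encode("utf-8")) for s in lines]   # bytes per line, "\n" included
ends = list(accumulate(sizes))                    # byte offset after each line

print(chars)  # [23, 23, 5]
print(sizes)  # [23, 23, 6]
print(ends)   # [23, 46, 52]
```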
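I generated a quick toy text file in bash:

```shell
$ for i in {1..10}; do echo "@ this is the "$i"th line" ; done > toy.txt
$ for i in {11..20}; do echo " this is the "$i"th line" ; done >> toy.txt
```

Now I iterate over this file, printing each line, its count, and the result of calling `tell()` on every cycle. The `@` marks the lines that delimit the chunks of the file I want to return to later (see below).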
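For anyone without bash, the same toy file can be produced from Python. This is a sketch that mirrors the two `echo` loops; the helper name `make_toy_file` is hypothetical. Because every line is plain ASCII, the size on disk equals the total number of characters written, newlines included.

```python
import os
import tempfile

def make_toy_file(path):
    """Write the 20-line toy file: lines 1-10 start with '@', 11-20 with a space."""
    lines = ["@ this is the {}th line\n".format(i) for i in range(1, 11)]
    lines += [" this is the {}th line\n".format(i) for i in range(11, 21)]
    # newline="\n" disables newline translation, so each "\n" is exactly one byte
    with open(path, "w", encoding="ascii", newline="\n") as f:
        f.writelines(lines)
    return lines

path = os.path.join(tempfile.mkdtemp(), "toy.txt")
lines = make_toy_file(path)

# ASCII only: one character == one byte
print(len(lines), os.path.getsize(path) == sum(len(s) for s in lines))  # 20 True
```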
My guess is that the `for` loop runs through the file object first, reaches its end, and that is why `tell()` always stays the same.
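That guess is close: the loop does not read the whole file up front, but it does read ahead of the line it hands back. Python 3 makes the problem explicit for text files: calling `tell()` inside `for line in f` raises `OSError` instead of returning a misleading number. A minimal sketch (file name and contents are hypothetical):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "toy.txt")
with open(path, "w") as f:
    f.write("@ first\n second\n")

message = None
with open(path) as f:            # text mode
    try:
        for line in f:
            f.tell()             # not allowed while the iterator is active
    except OSError as exc:
        message = str(exc)

print(message)  # telling position disabled by next() call
```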
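This is a toy example. In my real problem the file is several gigabytes long, and applying the same method gives me `tell()` results that, as far as I can see, mirror how the `for` loop walks through the file object. Is this correct? Could you shed some light on the concept I'm missing?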
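The missing concept is buffering. Python 2's file iterator uses a hidden read-ahead buffer (documented under `file.next()`), so once the first line has been handed out, the underlying file position has already jumped a whole block forward; for a 451-byte file that block is the entire file, hence `tell 451` on every line. Python 3's binary files keep the two positions apart, and a sketch can show both: `f.raw.tell()` is where the OS-level file descriptor sits after the block was fetched, while `f.tell()` is the logical position the caller has reached. The sketch assumes the demo file is smaller than one buffer block, which holds for the usual 4 to 8 KiB sizes.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "toy.txt")
with open(path, "wb") as f:
    f.write(b"@ first line\n second line\n third line\n")   # 38 bytes

with open(path, "rb") as f:      # io.BufferedReader wrapping a raw FileIO
    first = f.readline()         # the buffer fill pulls the whole small file in
    logical = f.tell()           # position as seen by the caller
    physical = f.raw.tell()      # position of the underlying file descriptor

print(len(first), logical, physical, os.path.getsize(path))  # 13 13 38 38
```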
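My ultimate goal is to locate specific byte coordinates in the file and then process these huge files in parallel from several distributed starting points, which I can't do if I can't keep track of positions while sifting through the file like this.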
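One way to get there, sketched in Python 3 under the assumption that chunk boundaries are the lines starting with `@`: make a single pass in binary mode that records the byte offset where each marker line starts, then let each worker `seek()` to its own offset and read up to the next one. The function names `index_markers` and `read_block` are illustrative, not from any library.

```python
import os
import tempfile

def index_markers(path, marker=b"@"):
    """Return the byte offset of every line that starts with `marker`."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:                # binary iteration; bytes are counted by hand
            if line.startswith(marker):
                offsets.append(pos)
            pos += len(line)
    return offsets

def read_block(path, start, end):
    """Read the bytes in [start, end) through a fresh file handle."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)

# Hypothetical demo file: two '@' blocks with one body line each.
path = os.path.join(tempfile.mkdtemp(), "toy.txt")
with open(path, "wb") as f:
    f.write(b"@ block A\n a1\n@ block B\n b1\n")

starts = index_markers(path)
bounds = list(zip(starts, starts[1:] + [os.path.getsize(path)]))
blocks = [read_block(path, s, e) for s, e in bounds]
print(starts)  # [0, 14]
print(blocks)  # [b'@ block A\n a1\n', b'@ block B\n b1\n']
```

Because each `read_block` call opens its own handle, the `(start, end)` pairs can be handed to separate processes, for example with `multiprocessing.Pool.starmap`. For gigabyte-sized chunks a worker would normally stream its range line by line rather than `read()` it all at once.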
```python
# Python 2 (print statements)
import os

print os.path.getsize("toy.txt")  # 451
fa = open("toy.txt")
fa.seek(0)  # let's double check
print fa.tell()
count = 0
for line in fa:
    if line.startswith("@"):
        print line,
        print "tell {} count {}".format(fa.tell(), count)
    else:
        if count < 32775:
            print line,
            print "tell {} count {}".format(fa.tell(), count)
    count += 1
```
Output:

```
@ this is the 1th line
tell 451 count 0
@ this is the 2th line
tell 451 count 1
@ this is the 3th line
tell 451 count 2
@ this is the 4th line
tell 451 count 3
@ this is the 5th line
tell 451 count 4
@ this is the 6th line
tell 451 count 5
@ this is the 7th line
tell 451 count 6
@ this is the 8th line
tell 451 count 7
@ this is the 9th line
tell 451 count 8
@ this is the 10th line
tell 451 count 9
this is the 11th line
tell 451 count 10
this is the 12th line
tell 451 count 11
this is the 13th line
tell 451 count 12
this is the 14th line
tell 451 count 13
this is the 15th line
tell 451 count 14
this is the 16th line
tell 451 count 15
this is the 17th line
tell 451 count 16
this is the 18th line
tell 451 count 17
this is the 19th line
tell 451 count 18
this is the 20th line
tell 451 count 19
```
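Rewritten for Python 3, the loop from the question reports a different offset on every line once it reads with `readline()` instead of the file iterator. The sketch below (demo content hypothetical) prints the same `tell ... count ...` lines. The `iter(f.readline, sentinel)` idiom is also the usual workaround in Python 2, where `readline()` does not go through the iterator's read-ahead buffer.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "toy.txt")
with open(path, "wb") as f:
    f.write(b"@ this is the 1th line\n this is the 11th line\n")

results = []
with open(path, "rb") as fa:
    # iter(callable, sentinel) calls fa.readline() until it returns b"" (EOF)
    for count, line in enumerate(iter(fa.readline, b"")):
        pos = fa.tell()              # byte offset just past this line
        results.append((pos, count))
        print(line.decode("ascii").rstrip("\n"))
        print("tell {} count {}".format(pos, count))
```

Each `tell` value is now the offset just past the line it follows (23, then 46 here), which is exactly the kind of coordinate that can later be passed to `seek()`.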