How do I iterate through a defaultdict(list) in Python?
Is there a better way of having a dictionary of lists in Python? I tried the normal iter(dict), but I got this error:
>>> import para
>>> para.print_doc('./sentseg_en/essentials.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "para.py", line 31, in print_doc
    for para in iter(doc):
TypeError: iteration over non-sequence
The main class:
import para
para.print_doc('./foo/bar/para-lines.txt')
The para.py module:
# -*- coding: utf-8 -*-
## Modified paragraph into a defaultdict(list) structure
## Original code from http://code.activestate.com/recipes/66063/
from collections import defaultdict

class Paragraphs:
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')
    # Separator here refers to the paragraph separator,
    # the default separator is '\n'.
    def __init__(self, filename, separator=None):
        # Set separator if passed into object's parameter,
        # else set default separator as '\n'
        if separator is None:
            def separator(line): return line == '\n'
        elif not callable(separator):
            raise TypeError, "separator argument must be callable"
        self.separator = separator
        # Reading lines from files into a dictionary of lists
        self.doc = defaultdict(list)
        paraIndex = 0
        with open(filename) as readFile:
            for line in readFile:
                if line == separator:
                    paraIndex+=1
                else:
                    self.doc[paraIndex].append(line)

# Prints out populated doc from txtfile
def print_doc(filename):
    text = Paragraphs(filename)
    for para in iter(text.doc):
        for sent in text.doc[para]:
            print "Para#%d, Sent#%d: %s" % (
                para, text.doc[para].index(sent), sent)
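
For reference, a defaultdict(list) iterates like any other dict: iterating it directly yields the keys, and items() yields key/list pairs. Below is a minimal sketch in Python 3 syntax (the listing above is Python 2). The sample data and the helper name format_doc are hypothetical, standing in for the parsed file. It numbers sentences with enumerate, because list.index(sent) returns the position of the first matching item and would misnumber a repeated line.

```python
from collections import defaultdict

# Hypothetical sample data standing in for Paragraphs(filename).doc.
doc = defaultdict(list)
doc[0].append("foo bar")
doc[0].append("foo bar")   # a repeated line, where list.index() would return 0 twice
doc[1].append("bar foo")

def format_doc(doc):
    """Return one formatted line per sentence, ordered by paragraph key."""
    lines = []
    for para, sents in sorted(doc.items()):      # key/list pairs, sorted by key
        for sent_no, sent in enumerate(sents):   # position comes from enumerate
            lines.append("Para#%d, Sent#%d: %s" % (para, sent_no, sent))
    return lines

for line in format_doc(doc):
    print(line)
```

Sorting the items is a choice made here so the paragraphs come out in key order regardless of insertion order.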
An example of ./foo/bar/para-lines.txt looks like this:
This is a start of a paragraph.
foo barr
bar foo
foo foo
This is the end.

This is the start of next para.
foo boo bar bar
this is the end.
The output of the main class should look like this:
Para#1,Sent#1: This is a start of a paragraph.
Para#1,Sent#2: foo barr
Para#1,Sent#3: bar foo
Para#1,Sent#4: foo foo
Para#1,Sent#5: This is the end.
Para#2,Sent#1: This is the start of next para.
Para#2,Sent#2: foo boo bar bar
Para#2,Sent#3: this is the end.
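
One way to get output in this shape is sketched below in Python 3, assuming paragraphs in the file are separated by a blank line. The function name read_paragraphs and the shortened in-memory sample are hypothetical. Two details differ from the listing in the question: the separator is called as a predicate (the expression line == separator compares a string with a function, which is never true), and numbering starts at 1 to match the desired output.

```python
import io
from collections import defaultdict

def read_paragraphs(lines, separator=None):
    """Group lines into a dict of lists, moving to a new key at each separator line."""
    if separator is None:
        def separator(line):
            return line == "\n"          # assumption: a blank line ends a paragraph
    elif not callable(separator):
        raise TypeError("separator argument must be callable")
    doc = defaultdict(list)
    para_index = 1                       # 1-based, as in the desired output
    for line in lines:
        if separator(line):              # call the predicate instead of comparing with ==
            if para_index in doc:        # ignore leading or repeated separator lines
                para_index += 1
        else:
            doc[para_index].append(line.rstrip("\n"))
    return doc

# Shortened, hypothetical stand-in for the sample file.
sample = io.StringIO(
    "This is a start of a paragraph.\n"
    "foo barr\n"
    "\n"
    "This is the start of next para.\n"
    "this is the end.\n"
)
doc = read_paragraphs(sample)
for para, sents in sorted(doc.items()):
    for sent_no, sent in enumerate(sents, start=1):
        print("Para#%d,Sent#%d: %s" % (para, sent_no, sent))
```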
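Am I right to say that 'paragraph' goes out of scope when I leave the 'for' loop? How do I retain paragraph and continue to access it outside the 'itertools.groupby' loop? – alvas 2011-12-27 16:48:25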
No, the name 'paragraph' won't go out of scope. Python doesn't open a new scope for block structures such as 'with' and 'for', only for functions. – kindall 2011-12-27 16:59:26
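
A small sketch of the point about scope, in Python 3 with a hypothetical helper name: the loop variable is still bound after the 'for' loop ends and holds the last value assigned to it.

```python
def last_paragraph(paragraphs):
    """Return the final value bound to the loop variable."""
    for paragraph in paragraphs:
        pass                  # nothing to do; only the final binding matters here
    return paragraph          # still bound: 'for' did not open a new scope

print(last_paragraph([["a", "b"], ["c"]]))
```

If the input is empty, the loop body never runs and the name is never bound, so this helper would raise UnboundLocalError.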
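paragraph is reassigned a new value each time through the loop. If you wish to retain the old paragraphs, you can define a list 'paragraphs = []' outside the loop and append each paragraph inside the loop: 'paragraphs.append(paragraph)'. – unutbu 2011-12-27 17:00:17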
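
The 'itertools.groupby' loop being discussed is not shown in this excerpt, so the following Python 3 sketch is only an assumption about its shape, with made-up input lines. It shows the suggestion above: the list is created before the loop and each paragraph is appended inside it.

```python
import itertools

# Hypothetical input: two paragraphs separated by a blank line.
lines = ["first\n", "second\n", "\n", "third\n"]

paragraphs = []                                    # defined outside the loop
for is_blank, group in itertools.groupby(lines, key=lambda line: line == "\n"):
    if not is_blank:                               # skip the runs of separator lines
        paragraph = [line.rstrip("\n") for line in group]
        paragraphs.append(paragraph)               # keep every paragraph, not just the last

print(paragraphs)
print(paragraph)   # the name is still bound here and holds the last paragraph
```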