How to iterate through a defaultdict(list) in Python?

How do I iterate through a defaultdict(list) in Python? Is there a better way of having a dictionary of lists in Python? I tried the normal iter(dict), but I got this error:

>>> import para 
>>> para.print_doc('./sentseg_en/essentials.txt') 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "para.py", line 31, in print_doc 
    for para in iter(doc): 
TypeError: iteration over non-sequence

Main class:

import para 
para.print_doc('./foo/bar/para-lines.txt')

para.pyc:

# -*- coding: utf-8 -*- 
## Modified paragraph into a defaultdict(list) structure 
## Original code from http://code.activestate.com/recipes/66063/ 
from collections import defaultdict 
class Paragraphs:
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')
    # Separator here refers to the paragraph separator,
    # the default separator is '\n'.
    def __init__(self, filename, separator=None):
        # Set separator if passed into object's parameter,
        # else set default separator as '\n'
        if separator is None:
            def separator(line): return line == '\n'
        elif not callable(separator):
            raise TypeError, "separator argument must be callable"
        self.separator = separator
        # Reading lines from files into a dictionary of lists
        self.doc = defaultdict(list)
        paraIndex = 0
        with open(filename) as readFile:
            for line in readFile:
                if line == separator:
                    paraIndex+=1
                else:
                    self.doc[paraIndex].append(line)

# Prints out populated doc from txtfile
def print_doc(filename):
    text = Paragraphs(filename)
    for para in iter(text.doc):
        for sent in text.doc[para]:
            print "Para#%d, Sent#%d: %s" % (
                para, text.doc[para].index(sent), sent)

./foo/bar/para-lines.txt looks like this:

This is a start of a paragraph. 
foo barr 
bar foo 
foo foo 
This is the end. 

This is the start of next para. 
foo boo bar bar 
this is the end.

The output of the main class should look like this:

Para#1,Sent#1: This is a start of a paragraph. 
Para#1,Sent#2: foo barr 
Para#1,Sent#3: bar foo 
Para#1,Sent#4: foo foo 
Para#1,Sent#5: This is the end. 

Para#2,Sent#1: This is the start of next para. 
Para#2,Sent#2: foo boo bar bar 
Para#2,Sent#3: this is the end.

2011-12-27 alvas

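The recipe you linked to is rather old. It was written in 2001, before Python had more modern tools such as itertools.groupby (introduced in Python 2.4, released in late 2004). Here is what your code could look like using groupby: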

import itertools 
import sys 

with open('para-lines.txt', 'r') as f:
    paranum = 0
    for is_separator, paragraph in itertools.groupby(f, lambda line: line == '\n'):
        if is_separator:
            # we've reached paragraph separator
            print
        else:
            paranum += 1
            for n, sentence in enumerate(paragraph, start = 1):
                sys.stdout.write(
                    'Para#{i:d},Sent#{n:d}: {s}'.format(
                        i = paranum, n = n, s = sentence))

2011-12-27 16:25:12 unutbu

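Am I right to say that when I leave the 'for' loop, 'paragraph' will go out of scope? How do I retain paragraph and keep accessing it outside the 'itertools.groupby' loop? – alvas 2011-12-27 16:48:25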

No, the name 'paragraph' does not go out of scope. Python doesn't open new scopes for block structures such as 'with' and 'for', only for functions. – kindall 2011-12-27 16:59:26
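
To illustrate the scoping point, here is a minimal sketch. It is written for Python 3 (an assumption; the thread's code is Python 2), and the sample text and the names 'sample', 'last_line' and 'helper' are made up for the example. Names bound inside 'with' and 'for' blocks stay visible afterwards, while a name bound inside a function does not leak out:

```python
import io

# Hypothetical in-memory stand-in for the text file.
sample = io.StringIO("first line\nsecond line\n")

with sample as f:
    for line in f:
        last_line = line.rstrip("\n")

# 'last_line' was bound inside the blocks but is still usable here.
print(last_line)


def helper():
    inside = "only visible inside helper"
    return inside


helper()
try:
    inside  # never bound outside helper(), so this raises NameError
except NameError:
    leaked = False
else:
    leaked = True
print(leaked)
```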

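paragraph is reassigned a new value on each pass through the loop. If you want to keep the old paragraphs, you could define a 'paragraphs = []' list outside the loop and append each paragraph inside the loop: 'paragraphs.append(paragraph)'. – unutbu 2011-12-27 17:00:17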
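
A sketch of that idea, assuming Python 3 and a made-up in-memory sample instead of the real file. One detail worth noting: each group yielded by itertools.groupby is a lazy iterator that stops being usable once the loop advances, so this sketch copies it into a list before storing it.

```python
import io
import itertools

# Hypothetical sample text; a blank line ('\n') separates paragraphs.
sample = io.StringIO(
    "This is a start of a paragraph.\n"
    "foo barr\n"
    "\n"
    "This is the start of next para.\n"
    "this is the end.\n"
)

paragraphs = []
for is_separator, group in itertools.groupby(sample, lambda line: line == "\n"):
    if not is_separator:
        # Materialize the lazy group before the loop moves on.
        paragraphs.append([line.rstrip("\n") for line in group])

# The collected paragraphs remain accessible after the loop.
for i, paragraph in enumerate(paragraphs, start=1):
    for n, sentence in enumerate(paragraph, start=1):
        print("Para#{}, Sent#{}: {}".format(i, n, sentence))
```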

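The problem seems to be that you are iterating over your Paragraphs class rather than the dictionary. Also, instead of iterating over the keys and then looking up each dictionary entry, consider using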

for (key, value) in d.items():
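
For illustration, a minimal Python 3 sketch of iterating a defaultdict(list) this way. The data and the 'lines_out' name are made up for the example; the dictionary has the same shape as the question's doc (paragraph index mapped to a list of lines).

```python
from collections import defaultdict

# Hypothetical data: paragraph index -> list of lines.
doc = defaultdict(list)
doc[1].append("This is a start of a paragraph.")
doc[1].append("foo barr")
doc[2].append("This is the start of next para.")

lines_out = []
# sorted() visits paragraphs in index order regardless of insertion order.
for para, sentences in sorted(doc.items()):
    for n, sent in enumerate(sentences, start=1):
        lines_out.append("Para#%d, Sent#%d: %s" % (para, n, sent))

print("\n".join(lines_out))
```

Using enumerate here instead of list.index(sent) avoids a repeated search per line, and it still numbers lines correctly when a paragraph contains the same line twice.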

2011-12-27 16:02:52 Nicolas78

It fails because you have not defined __iter__() in your Paragraphs class and then try to call iter(doc) (where doc is a Paragraphs instance).

To be iterable, a class needs an __iter__() method that returns an iterator. See the Python documentation on iterator types.
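
A minimal sketch of the difference, assuming Python 3. The class names 'WithoutIter' and 'WithIter' are invented for the example. In Python 3 the TypeError message reads "object is not iterable" rather than the "iteration over non-sequence" shown in the question's Python 2 traceback.

```python
class WithoutIter:
    """Holds data but defines no __iter__ (and no __getitem__)."""

    def __init__(self):
        self.items = ["a", "b"]


class WithIter:
    """Delegates iteration to the list it wraps."""

    def __init__(self):
        self.items = ["a", "b"]

    def __iter__(self):
        return iter(self.items)


try:
    iter(WithoutIter())
    failed = False
except TypeError:
    failed = True

print(failed)
print(list(WithIter()))
```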

2011-12-27 16:04:14 soulcheck

You have the line

for para in iter(doc):

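The problem is that doc is an instance of Paragraphs, not a defaultdict. The default dict you use in the __init__ method goes out of scope and is lost. So you need to do two things: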

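Save the doc created in the __init__ method as an instance variable (self.doc, for example).
Either make Paragraphs itself iterable (by adding an __iter__ method), or allow access to the created doc object.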
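
The two steps above could be sketched as follows. This is a Python 3 sketch under stated assumptions, not the asker's actual class: it takes an iterable of lines instead of a filename so that it runs without a file, and it calls separator(line) rather than comparing the line with the function object.

```python
from collections import defaultdict


class Paragraphs:
    """Sketch: groups lines into paragraphs keyed by paragraph index."""

    def __init__(self, lines, separator=None):
        # 'lines' is any iterable of strings (an assumption; the
        # question's class takes a filename instead).
        if separator is None:
            def separator(line):
                return line == "\n"
        elif not callable(separator):
            raise TypeError("separator argument must be callable")
        # Step 1: keep the dict on the instance so it outlives __init__.
        self.doc = defaultdict(list)
        para_index = 0
        for line in lines:
            if separator(line):
                para_index += 1
            else:
                self.doc[para_index].append(line)

    # Step 2: make the object itself iterable.
    def __iter__(self):
        return iter(sorted(self.doc.items()))


text = Paragraphs(["a\n", "b\n", "\n", "c\n"])
for para, sentences in text:
    print(para, sentences)
```

Because the dict is stored as self.doc, callers can also reach it directly as text.doc, which covers the second option in the list.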

2011-12-27 16:06:11

I tried saving 'doc' as 'self.doc', using 'self.doc = defaultdict(list)' and 'self.doc[paraIndex].append(line)'. But the same out-of-scope problem occurs. – alvas 2011-12-27 16:50:09

@2er0: It is in scope, but as 'doc.doc' (which means there is also a naming issue - you should be using 'paragraph' rather than 'doc' in 'print_doc'). – 2011-12-27 17:26:09

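Yes, thanks for spotting the naming issue; it crept in after some small changes while I was iterating on the code. But let me see whether I can combine the 'self.doc' solution with unutbu's loop solution. – alvas 2011-12-27 18:28:23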

I can't think of any reason why you are using a dictionary here, let alone a defaultdict. A list of lists would be much simpler.

separator = '\n'  # a blank line ends a paragraph
doc = []
with open(filename) as readFile:
    para = []
    for line in readFile:
        if line == separator:
            doc.append(para)
            para = []
        else:
            para.append(line)
    doc.append(para)

2011-12-27 16:09:42

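It's because my txt file will be a large one, so accessing it through nested lists would take a lot of time. Maybe I will need a dictionary of dictionaries. If I want a dict of dicts, how should I go about it? – alvas 2011-12-27 16:51:56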

How so? Why do you think a nested list would take longer than a dictionary? – 2011-12-27 18:08:52
