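Apache Beam Python: TypeCheckError: FlatMap and ParDo must return an iterable
I am creating a Google Dataflow pipeline using Apache Beam 2.x for Python. Basically, I have a text file that contains one English sentence per line.
I would like to call the Google NLP (Sentiment) API for each line/sentence.
So I have a function that calls the NLP API: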
    class CalculateSentiments(beam.DoFn):
        def process(self, element):
            language_client = language.Client()
            pre_text = re.sub('<[^>]*>', '', element)
            text = re.sub(r'[^\w]', ' ', pre_text)
            document = language_client.document_from_text(text)
            sentiment = document.analyze_sentiment().sentiment
            return sentiment.score
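I use ParDo to call this function for every sentence. I assume the following ParDo will automatically call the NLP Sentiment API for each line of the text file (basically, I don't have to iterate over the lines of the text file myself!?):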
    output = lines | beam.ParDo(CalculateSentiments())
    output | WriteToText(known_args.output)
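As a rough illustration of that assumption, here is a simplified plain-Python model of what ParDo does with a `process` function. It is a sketch only, not Beam's actual implementation: `simulate_pardo` is a hypothetical helper invented for this example. The point is that the runner, not user code, loops over the elements, and that each call's result is treated as a collection of outputs.

```python
def simulate_pardo(elements, process):
    """Simplified, hypothetical model of ParDo; not Beam's real code."""
    outputs = []
    for element in elements:
        # The "runner" calls process once per element ...
        result = process(element)
        # ... and treats the result as zero or more output elements.
        outputs.extend(result)
    return outputs


lines = ["first sentence", "second sentence here"]

# Each call returns a one-element list holding the word count of the line.
word_counts = simulate_pardo(lines, lambda line: [len(line.split())])
print(word_counts)
```

So no explicit loop over the file's lines is needed in user code; the per-element call is the transform's job.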
But I get this error after running the Dataflow pipeline:
    TypeCheckError: FlatMap and ParDo must return an iterable. was returned instead. [while running 'ParDo(CalculateSentiments)']
    Traceback (most recent call last):
      File "/Users/gsattanthan/.local/lib/python2.7/site-packages/apache_beam/runners/direct/executor.py", line 297, in call
        evaluator.process_element(value)
      File "/Users/gsattanthan/.local/lib/python2.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 366, in process_element
        self.runner.process(element)
      File "/Users/gsattanthan/.local/lib/python2.7/site-packages/apache_beam/runners/common.py", line 267, in process
        self.reraise_augmented(exn)
      File "/Users/gsattanthan/.local/lib/python2.7/site-packages/apache_beam/runners/common.py", line 263, in process
        self._dofn_simple_invoker(element)
      File "/Users/gsattanthan/.local/lib/python2.7/site-packages/apache_beam/runners/common.py", line 198, in _dofn_simple_invoker
        self._process_outputs(element, self.dofn_process(element.value))
      File "/Users/gsattanthan/.local/lib/python2.7/site-packages/apache_beam/typehints/typecheck.py", line 60, in process
        return self.wrapper(self.dofn.process, args, kwargs)
      File "/Users/gsattanthan/.local/lib/python2.7/site-packages/apache_beam/typehints/typecheck.py", line 84, in wrapper
        return self._check_type(result)
      File "/Users/gsattanthan/.local/lib/python2.7/site-packages/apache_beam/typehints/typecheck.py", line 98, in _check_type
        % type(output))
What am I doing wrong? The way I use ParDo is very similar to what is shown in the Apache Beam docs!
Any ideas?
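One likely cause, offered as a sketch rather than a definitive diagnosis: ParDo treats whatever `process` returns as a collection of output elements, so a bare number such as `sentiment.score` would fail the iterable check named in the error. Yielding the value (or returning it inside a one-element list) satisfies that contract. In the example below, `fake_sentiment_score` is a hypothetical stand-in for the Google NLP call so the snippet runs without credentials, and the fallback base class is an assumption added only so the snippet also runs where Beam is not installed.

```python
import re

try:
    import apache_beam as beam
    DoFnBase = beam.DoFn
except ImportError:
    # Assumption for this sketch only: fall back to a plain class
    # so the example still runs without Beam installed.
    DoFnBase = object


def fake_sentiment_score(text):
    """Hypothetical stand-in for the NLP API call; returns a float."""
    if "good" in text:
        return 1.0
    if "bad" in text:
        return -1.0
    return 0.0


class CalculateSentiments(DoFnBase):
    def process(self, element):
        pre_text = re.sub('<[^>]*>', '', element)
        text = re.sub(r'[^\w]', ' ', pre_text)
        # yield turns process into a generator, i.e. an iterable of
        # outputs; `return [score]` would work just as well.
        yield fake_sentiment_score(text)


# Calling process directly, outside any pipeline, to inspect its output.
scores = list(CalculateSentiments().process("<b>good</b> day!"))
print(scores)
```

If every input line should produce exactly one output, `beam.Map` with a plain function is another option, since Map wraps each return value as a single output element.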