Apache Spark with SQLContext :: IndexError

I am trying to run the basic "Inferring the Schema Using Reflection" example provided as part of the Apache Spark documentation.

I am doing this on the Cloudera QuickStart VM (CDH5).

The example I am trying to run is as follows:

# sc is an existing SparkContext. 
from pyspark.sql import SQLContext, Row 
sqlContext = SQLContext(sc) 

# Load a text file and convert each line to a Row. 
lines = sc.textFile("/user/cloudera/analytics/book6_sample.csv") 
parts = lines.map(lambda l: l.split(",")) 
people = parts.map(lambda p: Row(name=p[0], age=int(p[1]))) 

# Infer the schema, and register the DataFrame as a table. 
schemaPeople = sqlContext.createDataFrame(people) 
schemaPeople.registerTempTable("people") 

# SQL can be run over DataFrames that have been registered as a table. 
teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19") 

# The results of SQL queries are RDDs and support all the normal RDD operations. 
teenNames = teenagers.map(lambda p: "Name: " + p.name) 
for teenName in teenNames.collect(): 
    print(teenName) 

I ran the code exactly as shown above, but when I execute the last command (the for loop) I always get the error "IndexError: list index out of range".

The input file book6_sample is available at book6_sample.csv

Please point out where I am going wrong.

Thanks in advance.

Regards, Sri

Answer

Your file has an empty line at the end, which is causing this error. Open the file in your text editor, remove that line, and it should work.
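
If removing the trailing blank line by hand is not enough, the same "IndexError: list index out of range" can be raised by any input line that has fewer than two comma-separated fields (for example a header row or a stray blank line in the middle of the file). A minimal defensive sketch, reusing the path and column order from the question, filters such lines out before building the Rows:

# sc is an existing SparkContext.
from pyspark.sql import SQLContext, Row
sqlContext = SQLContext(sc)

lines = sc.textFile("/user/cloudera/analytics/book6_sample.csv")
parts = lines.map(lambda l: l.split(","))

# Keep only rows with at least two fields and a numeric age,
# so that p[1] and int(p[1]) can no longer fail.
valid = parts.filter(lambda p: len(p) >= 2 and p[1].strip().isdigit())
people = valid.map(lambda p: Row(name=p[0], age=int(p[1].strip())))

schemaPeople = sqlContext.createDataFrame(people)
schemaPeople.registerTempTable("people")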

Hey Sachin, that did not work even after making the suggested change. Thanks – Sri