你需要使用过滤器
package dataframe
import org.apache.spark.sql.SparkSession
/**
* @author [email protected]
*/
//
object DataFrameExample{
//
case class Employee(id: Integer, name: String, address: String, salary: Double, state: String,zip:Integer)
//
def main(args: Array[String]) {
val spark =
SparkSession.builder()
.appName("DataFrame-Basic")
.master("local[4]")
.getOrCreate()
import spark.implicits._
// create a sequence of case class objects
// (we defined the case class above)
val emp = Seq(
Employee(1, "vaquar khan", "111 algoinquin road chicago", 120000.00, "AZ",60173),
Employee(2, "Firdos Pasha", "1300 algoinquin road chicago", 2500000.00, "IL",50112),
Employee(3, "Zidan khan", "112 apt abcd timesqure NY", 50000.00, "NY",55490),
Employee(4, "Anwars khan", "washington dc", 120000.00, "VA",33245),
Employee(5, "Deepak sharma ", "rolling edows schumburg", 990090.00, "IL",60172),
Employee(6, "afaq khan", "saeed colony Bhopal", 1000000.00, "AZ",60173)
)
val employee=spark.sparkContext.parallelize(emp, 4).toDF()
employee.printSchema()
employee.show()
employee.select("state", "zip").show()
println("*** use filter() to choose rows")
employee.filter($"state".equalTo("IL")).show()
println("*** multi contidtion in filer || ")
employee.filter($"state".equalTo("IL") || $"state".equalTo("AZ")).show()
println("*** multi contidtion in filer && ")
employee.filter($"state".equalTo("AZ") && $"zip".equalTo("60173")).show()
}
}
通过这条线来看:'斯卡拉>从pyspark.sql。列导入列'看起来像你试图使用pyspark代码时,你真的y使用scala –
@TonyTorres是的,这是一个错误,我发现后,发布这个问题。现在进行编辑。 – dheee