Apache Spark MySQL connection: no suitable JDBC driver found

I am using Apache Spark to analyze query logs and have run into some difficulty setting Spark up. I am currently running the queries on a standalone cluster.

First I used the sample Java word-count code, which works fine. The problem appears when I try to connect to a MySQL server. I am on Ubuntu 14.04 LTS, 64-bit, with Spark 1.4.1 and MySQL 5.1.

Here is my code. The "No suitable driver found" error appears when I use the master URL instead of local[*]. I have included the log below.

import java.io.Serializable; 
import java.util.HashMap; 
import java.util.List; 
import java.util.Map; 

import org.apache.spark.SparkConf; 
import org.apache.spark.api.java.JavaSparkContext; 
import org.apache.spark.sql.DataFrame; 
import org.apache.spark.sql.Row; 
import org.apache.spark.sql.SQLContext; 

public class LoadFromDb implements Serializable { 

    private static final org.apache.log4j.Logger LOGGER = org.apache.log4j.Logger.getLogger(LoadFromDb.class); 

    private static final String MYSQL_DRIVER = "com.mysql.jdbc.Driver"; 
    private static final String MYSQL_USERNAME = "spark"; 
    private static final String MYSQL_PWD = "spark123"; 
    private static final String MYSQL_CONNECTION_URL = 
      "jdbc:mysql://localhost/productsearch_userinfo?user=" + MYSQL_USERNAME + "&password=" + MYSQL_PWD; 

    private static final JavaSparkContext sc = 
      new JavaSparkContext(new SparkConf().setAppName("SparkJdbcDs").setMaster("spark://shawon-H67MA-USB3-B3:7077")); 

    private static final SQLContext sqlContext = new SQLContext(sc); 

    public static void main(String[] args) { 
     //Data source options 
     Map<String, String> options = new HashMap<>(); 
     options.put("driver", MYSQL_DRIVER); 
     options.put("url", MYSQL_CONNECTION_URL); 
     options.put("dbtable", 
        "query"); 
     //options.put("partitionColumn", "sessionID"); 
     // options.put("lowerBound", "10001"); 
     //options.put("upperBound", "499999"); 
     //options.put("numPartitions", "10"); 

     //Load MySQL query result as DataFrame 
     DataFrame jdbcDF = sqlContext.load("jdbc", options); 

     //jdbcDF.show(); 
     jdbcDF.select("id", "queryText").show(); 
    } 
} 
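
(For reference: sqlContext.load(source, options) is deprecated as of Spark 1.4. The equivalent, non-deprecated DataFrameReader form, reusing the same options map, is shown below; it does not change the driver error, only the API used.)

    //Non-deprecated equivalent of sqlContext.load("jdbc", options) on Spark 1.4
    DataFrame jdbcDF = sqlContext.read().format("jdbc").options(options).load();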

Any sample project would be a big help. The log: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

15/08/29 03:38:26 INFO SparkContext: Running Spark version 1.4.1 
15/08/29 03:38:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
15/08/29 03:38:27 WARN Utils: Your hostname, shawon-H67MA-USB3-B3 resolves to a loopback address: 127.0.0.1; using 192.168.1.102 instead (on interface eth0) 
15/08/29 03:38:27 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address 
15/08/29 03:38:27 INFO SecurityManager: Changing view acls to: shawon 
15/08/29 03:38:27 INFO SecurityManager: Changing modify acls to: shawon 
15/08/29 03:38:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(shawon); users with modify permissions: Set(shawon) 
15/08/29 03:38:27 INFO Slf4jLogger: Slf4jLogger started 
15/08/29 03:38:27 INFO Remoting: Starting remoting 
15/08/29 03:38:27 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:60742] 
15/08/29 03:38:27 INFO Utils: Successfully started service 'sparkDriver' on port 60742. 
15/08/29 03:38:27 INFO SparkEnv: Registering MapOutputTracker 
15/08/29 03:38:27 INFO SparkEnv: Registering BlockManagerMaster 
15/08/29 03:38:27 INFO DiskBlockManager: Created local directory at /tmp/spark-85b7b4c4-ed50-4ccf-97fc-25b14ab404b1/blockmgr-57acbba4-d7d4-4557-9e6c-e1acf97d4c88 
15/08/29 03:38:27 INFO MemoryStore: MemoryStore started with capacity 473.3 MB 
15/08/29 03:38:27 INFO HttpFileServer: HTTP File server directory is /tmp/spark-85b7b4c4-ed50-4ccf-97fc-25b14ab404b1/httpd-a5e6844d-ac3a-4da2-822c-1b98d0a287c4 
15/08/29 03:38:27 INFO HttpServer: Starting HTTP Server 
15/08/29 03:38:27 INFO Utils: Successfully started service 'HTTP file server' on port 55199. 
15/08/29 03:38:27 INFO SparkEnv: Registering OutputCommitCoordinator 
15/08/29 03:38:28 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
15/08/29 03:38:28 INFO SparkUI: Started SparkUI at http://192.168.1.102:4040 
15/08/29 03:38:28 INFO AppClient$ClientActor: Connecting to master akka.tcp://[email protected]:7077/user/Master... 
15/08/29 03:38:28 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150829033828-0000 
15/08/29 03:38:28 INFO AppClient$ClientActor: Executor added: app-20150829033828-0000/0 on worker-20150829033238-192.168.1.102-36976 (192.168.1.102:36976) with 4 cores 
15/08/29 03:38:28 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150829033828-0000/0 on hostPort 192.168.1.102:36976 with 4 cores, 512.0 MB RAM 
15/08/29 03:38:28 INFO AppClient$ClientActor: Executor updated: app-20150829033828-0000/0 is now RUNNING 
15/08/29 03:38:28 INFO AppClient$ClientActor: Executor updated: app-20150829033828-0000/0 is now LOADING 
15/08/29 03:38:28 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 58874. 
15/08/29 03:38:28 INFO NettyBlockTransferService: Server created on 58874 
15/08/29 03:38:28 INFO BlockManagerMaster: Trying to register BlockManager 
15/08/29 03:38:28 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.102:58874 with 473.3 MB RAM, BlockManagerId(driver, 192.168.1.102, 58874) 
15/08/29 03:38:28 INFO BlockManagerMaster: Registered BlockManager 
15/08/29 03:38:28 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0 
15/08/29 03:38:30 INFO SparkContext: Starting job: show at LoadFromDb.java:43 
15/08/29 03:38:30 INFO DAGScheduler: Got job 0 (show at LoadFromDb.java:43) with 1 output partitions (allowLocal=false) 
15/08/29 03:38:30 INFO DAGScheduler: Final stage: ResultStage 0(show at LoadFromDb.java:43) 
15/08/29 03:38:30 INFO DAGScheduler: Parents of final stage: List() 
15/08/29 03:38:30 INFO DAGScheduler: Missing parents: List() 
15/08/29 03:38:30 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at show at LoadFromDb.java:43), which has no missing parents 
15/08/29 03:38:30 INFO MemoryStore: ensureFreeSpace(4304) called with curMem=0, maxMem=496301506 
15/08/29 03:38:30 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.2 KB, free 473.3 MB) 
15/08/29 03:38:30 INFO MemoryStore: ensureFreeSpace(2274) called with curMem=4304, maxMem=496301506 
15/08/29 03:38:30 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.2 KB, free 473.3 MB) 
15/08/29 03:38:30 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.102:58874 (size: 2.2 KB, free: 473.3 MB) 
15/08/29 03:38:30 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:874 
15/08/29 03:38:30 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at show at LoadFromDb.java:43) 
15/08/29 03:38:30 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks 
15/08/29 03:38:30 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://[email protected]:56580/user/Executor#1344522225]) with ID 0 
15/08/29 03:38:30 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.1.102, PROCESS_LOCAL, 1171 bytes) 
15/08/29 03:38:30 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.102:56904 with 265.4 MB RAM, BlockManagerId(0, 192.168.1.102, 56904) 
15/08/29 03:38:31 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.102:56904 (size: 2.2 KB, free: 265.4 MB) 
15/08/29 03:38:31 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.1.102): java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost/productsearch_userinfo?user=spark&password=spark123 
    at java.sql.DriverManager.getConnection(DriverManager.java:596) 
    at java.sql.DriverManager.getConnection(DriverManager.java:187) 
    at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:185) 
    at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:177) 
    at org.apache.spark.sql.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:359) 
    at org.apache.spark.sql.jdbc.JDBCRDD.compute(JDBCRDD.scala:350) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) 
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) 
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63) 
    at org.apache.spark.scheduler.Task.run(Task.scala:70) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
    at java.lang.Thread.run(Thread.java:745) 

15/08/29 03:38:31 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 1, 192.168.1.102, PROCESS_LOCAL, 1171 bytes) 
15/08/29 03:38:31 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1) on executor 192.168.1.102: java.sql.SQLException (No suitable driver found for jdbc:mysql://localhost/productsearch_userinfo?user=spark&password=spark123) [duplicate 1] 
15/08/29 03:38:31 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 2, 192.168.1.102, PROCESS_LOCAL, 1171 bytes) 
15/08/29 03:38:31 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 2) on executor 192.168.1.102: java.sql.SQLException (No suitable driver found for jdbc:mysql://localhost/productsearch_userinfo?user=spark&password=spark123) [duplicate 2] 
15/08/29 03:38:31 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 3, 192.168.1.102, PROCESS_LOCAL, 1171 bytes) 
15/08/29 03:38:31 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 3) on executor 192.168.1.102: java.sql.SQLException (No suitable driver found for jdbc:mysql://localhost/productsearch_userinfo?user=spark&password=spark123) [duplicate 3] 
15/08/29 03:38:31 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job 
15/08/29 03:38:31 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/08/29 03:38:31 INFO TaskSchedulerImpl: Cancelling stage 0 
15/08/29 03:38:31 INFO DAGScheduler: ResultStage 0 (show at LoadFromDb.java:43) failed in 1.680 s 
15/08/29 03:38:31 INFO DAGScheduler: Job 0 failed: show at LoadFromDb.java:43, took 1.840969 s 
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 192.168.1.102): java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost/productsearch_userinfo?user=spark&password=spark123 
    at java.sql.DriverManager.getConnection(DriverManager.java:596) 
    at java.sql.DriverManager.getConnection(DriverManager.java:187) 
    at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:185) 
    at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:177) 
    at org.apache.spark.sql.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:359) 
    at org.apache.spark.sql.jdbc.JDBCRDD.compute(JDBCRDD.scala:350) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) 
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) 
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63) 
    at org.apache.spark.scheduler.Task.run(Task.scala:70) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
    at java.lang.Thread.run(Thread.java:745) 

Driver stacktrace: 
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263) 
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730) 
    at scala.Option.foreach(Option.scala:236) 
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730) 
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457) 
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418) 
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) 
15/08/29 03:38:31 INFO SparkContext: Invoking stop() from shutdown hook 
15/08/29 03:38:31 INFO SparkUI: Stopped Spark web UI at http://192.168.1.102:4040 
15/08/29 03:38:31 INFO DAGScheduler: Stopping DAGScheduler 
15/08/29 03:38:31 INFO SparkDeploySchedulerBackend: Shutting down all executors 
15/08/29 03:38:31 INFO SparkDeploySchedulerBackend: Asking each executor to shut down 
15/08/29 03:38:31 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 
15/08/29 03:38:31 INFO Utils: path = /tmp/spark-85b7b4c4-ed50-4ccf-97fc-25b14ab404b1/blockmgr-57acbba4-d7d4-4557-9e6c-e1acf97d4c88, already present as root for deletion. 
15/08/29 03:38:31 INFO MemoryStore: MemoryStore cleared 
15/08/29 03:38:31 INFO BlockManager: BlockManager stopped 
15/08/29 03:38:32 INFO BlockManagerMaster: BlockManagerMaster stopped 
15/08/29 03:38:32 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 
15/08/29 03:38:32 INFO SparkContext: Successfully stopped SparkContext 
15/08/29 03:38:32 INFO Utils: Shutdown hook called 
15/08/29 03:38:32 INFO Utils: Deleting directory /tmp/spark-85b7b4c4-ed50-4ccf-97fc-25b14ab404b1 
15/08/29 03:38:32 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 

It seems the MySQL JDBC driver is missing. Have you tried connecting to any MySQL database from a plain Java installation? You probably need a MySQL JDBC driver that works for your system. – flaschenpost
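
(A minimal plain-JDBC connectivity check of the kind the comment suggests; it uses the URL and credentials from the question, and the class name JdbcCheck is hypothetical:)

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class JdbcCheck {  // hypothetical name; a standalone connectivity test
        public static void main(String[] args) throws Exception {
            // Throws the same "No suitable driver found" SQLException if the
            // MySQL connector jar is missing from the classpath
            Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/productsearch_userinfo?user=spark&password=spark123");
            System.out.println("Connected: " + !conn.isClosed());
            conn.close();
        }
    }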


I have the driver as a jar file named mysql-connector-java-5.1.36-bin.jar. I can connect from plain Java through this jar, but the problem appears when I try to use it with Spark... :( – Shawon91


How do you submit your application? – eliasah

Answer


I had the same problem with a connection to an Oracle database.

I did the following four things, but I think the first two fixed it (a MySQL adaptation is sketched after the list):

  1. When running your application, add the driver jar to the classpath:

     %YOUR_SPARK_HOME%/bin/spark-submit --jars file://c:/jdbcDrivers/ojdbc7.jar

  2. Add the driver to the connection properties (the options object in your case):

     dbProps.put("driver", "oracle.jdbc.driver.OracleDriver");

  3. In the %YOUR_SPARK_HOME%/conf/spark-defaults.conf file, add:

     spark.driver.extraClassPath=file://c:/jdbcDrivers/ojdbc7.jar

  4. In the same conf file you can also add:

     spark.executor.extraClassPath=file://c:/jdbcDrivers/ojdbc7.jar

I got the DB connection working, but as of now I am having some issues inserting into Oracle.
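
For the MySQL setup in the question, a minimal sketch of the same fix (steps 1 and 2) might look like the following; the connector jar name is taken from the comments above, while the application jar name and paths are assumptions:

    # --jars ships the connector to the driver and the executors, where the tasks in the log fail
    $SPARK_HOME/bin/spark-submit \
      --class LoadFromDb \
      --master spark://shawon-H67MA-USB3-B3:7077 \
      --jars /path/to/mysql-connector-java-5.1.36-bin.jar \
      load-from-db.jar

The code in the question already sets the matching driver option (com.mysql.jdbc.Driver), which covers step 2. If --jars alone is not enough, steps 3 and 4 (spark.driver.extraClassPath and spark.executor.extraClassPath pointing at the same jar) put the connector directly on the JVM classpath of the driver and executors.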
