2014-02-17

NoSuchElementException when exporting data from HDFS to MySQL with Sqoop

I want to export some data from HDFS to MySQL using Sqoop. The problem is that an uncompressed file exports correctly, but if I try to export the same file compressed with LZO, the Sqoop job fails. I am trying this in a standard Cloudera CDH4 VM environment. The columns in the file are tab-separated, and nulls are represented as '\N'.

File contents:

[[email protected] ~]$ cat dipayan-test.txt 
dipayan koramangala 29 
raju marathahalli 32 
raju marathahalli 32 
raju \N 32 
raju marathahalli 32 
raju \N 32 
raju marathahalli 32 
raju marathahalli \N 
raju marathahalli \N 

Description of the MySQL table:

mysql> describe sqooptest; 
+---------+--------------+------+-----+---------+-------+ 
| Field   | Type         | Null | Key | Default | Extra | 
+---------+--------------+------+-----+---------+-------+ 
| name    | varchar(100) | YES  |     | NULL    |       | 
| address | varchar(100) | YES  |     | NULL    |       | 
| age     | int(11)      | YES  |     | NULL    |       | 
+---------+--------------+------+-----+---------+-------+ 
3 rows in set (0.01 sec) 

File in HDFS:

[[email protected] ~]$ hadoop fs -ls /user/cloudera/dipayan-test 
Found 1 items 
-rw-r--r-- 3 cloudera cloudera  138 2014-02-16 23:18 /user/cloudera/dipayan-test/dipayan-test.txt.lzo 

Sqoop command:

sqoop export --connect "jdbc:mysql://localhost/bigdata" --username "root" --password "XXXXXX" --driver "com.mysql.jdbc.Driver" --table sqooptest --export-dir /user/cloudera/dipayan-test/ --input-fields-terminated-by '\t' -m 1 --input-null-string '\\N' --input-null-non-string '\\N' 

Error:

[[email protected] ~]$ sqoop export --connect "jdbc:mysql://localhost/bigdata" --username "root" --password "mysql" --driver "com.mysql.jdbc.Driver" --table sqooptest --export-dir /user/cloudera/dipayan-test/ --input-fields-terminated-by '\t' -m 1 --input-null-string '\\N' --input-null-non-string '\\N' 
14/02/16 23:19:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 
14/02/16 23:19:26 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time. 
14/02/16 23:19:26 INFO manager.SqlManager: Using default fetchSize of 1000 
14/02/16 23:19:26 INFO tool.CodeGenTool: Beginning code generation 
14/02/16 23:19:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqooptest AS t WHERE 1=0 
14/02/16 23:19:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqooptest AS t WHERE 1=0 
14/02/16 23:19:27 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-0.20-mapreduce 
14/02/16 23:19:27 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-0.20-mapreduce/hadoop-core.jar 
Note: /tmp/sqoop-cloudera/compile/676bc185f1efffa3b0de0a924df4a02d/sqooptest.java uses or overrides a deprecated API. 
Note: Recompile with -Xlint:deprecation for details. 
14/02/16 23:19:29 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/676bc185f1efffa3b0de0a924df4a02d/sqooptest.jar 
14/02/16 23:19:29 INFO mapreduce.ExportJobBase: Beginning export of sqooptest 
14/02/16 23:19:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqooptest AS t WHERE 1=0 
14/02/16 23:19:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 
14/02/16 23:19:31 INFO input.FileInputFormat: Total input paths to process : 1 
14/02/16 23:19:31 INFO input.FileInputFormat: Total input paths to process : 1 
14/02/16 23:19:31 INFO mapred.JobClient: Running job: job_201402162201_0013 
14/02/16 23:19:32 INFO mapred.JobClient: map 0% reduce 0% 
14/02/16 23:19:41 INFO mapred.JobClient: Task Id : attempt_201402162201_0013_m_000000_0, Status : FAILED 
java.io.IOException: Can't export data, please check task tracker logs 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112) 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140) 
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330) 
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:396) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) 
    at org.apache.hadoop.mapred.Child.main(Child.java:262) 
Caused by: java.util.NoSuchElementException 
    at java.util.AbstractList$Itr.next(AbstractList.java:350) 
    at sqooptest.__loadFromFields(sqooptest.java:225) 
    at sqooptest.parse(sqooptest.java:174) 
    at org.apach 
14/02/16 23:19:48 INFO mapred.JobClient: Task Id : attempt_201402162201_0013_m_000000_1, Status : FAILED 
java.io.IOException: Can't export data, please check task tracker logs 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112) 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140) 
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330) 
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:396) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) 
    at org.apache.hadoop.mapred.Child.main(Child.java:262) 
Caused by: java.util.NoSuchElementException 
    at java.util.AbstractList$Itr.next(AbstractList.java:350) 
    at sqooptest.__loadFromFields(sqooptest.java:225) 
    at sqooptest.parse(sqooptest.java:174) 
    at org.apach 
14/02/16 23:19:55 INFO mapred.JobClient: Task Id : attempt_201402162201_0013_m_000000_2, Status : FAILED 
java.io.IOException: Can't export data, please check task tracker logs 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112) 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140) 
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330) 
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:396) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) 
    at org.apache.hadoop.mapred.Child.main(Child.java:262) 
Caused by: java.util.NoSuchElementException 
    at java.util.AbstractList$Itr.next(AbstractList.java:350) 
    at sqooptest.__loadFromFields(sqooptest.java:225) 
    at sqooptest.parse(sqooptest.java:174) 
    at org.apach 
14/02/16 23:20:04 INFO mapred.JobClient: Job complete: job_201402162201_0013 
14/02/16 23:20:04 INFO mapred.JobClient: Counters: 7 
14/02/16 23:20:04 INFO mapred.JobClient: Job Counters 
14/02/16 23:20:04 INFO mapred.JobClient:  Failed map tasks=1 
14/02/16 23:20:04 INFO mapred.JobClient:  Launched map tasks=4 
14/02/16 23:20:04 INFO mapred.JobClient:  Data-local map tasks=4 
14/02/16 23:20:04 INFO mapred.JobClient:  Total time spent by all maps in occupied slots (ms)=29679 
14/02/16 23:20:04 INFO mapred.JobClient:  Total time spent by all reduces in occupied slots (ms)=0 
14/02/16 23:20:04 INFO mapred.JobClient:  Total time spent by all maps waiting after reserving slots (ms)=0 
14/02/16 23:20:04 INFO mapred.JobClient:  Total time spent by all reduces waiting after reserving slots (ms)=0 
14/02/16 23:20:04 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead 
14/02/16 23:20:04 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 33.5335 seconds (0 bytes/sec) 
14/02/16 23:20:04 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead 
14/02/16 23:20:04 INFO mapreduce.ExportJobBase: Exported 0 records. 
14/02/16 23:20:04 ERROR tool.ExportTool: Error during export: Export job failed! 

This works perfectly with the uncompressed file, when I work directly with dipayan-test.txt.

I need help resolving this, and would like to know whether I am missing something when working with LZO files.

Answers

0

Most likely your table does not have the right columns. You can always go into the Java file that Sqoop generates for you and debug from there: sqooptest.java:225
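sqooptest.java:225 sits inside the generated __loadFromFields method, which pulls one token per table column from the parsed record. A minimal sketch of why it throws (assuming, hypothetically, that the generated code iterates over the split fields roughly like this):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class LoadFromFieldsSketch {
    // Rough stand-in for Sqoop's generated __loadFromFields: it calls
    // next() once per table column. If the record yields fewer fields
    // than columns (wrong delimiter, or compressed LZO bytes read as
    // plain text), next() throws NoSuchElementException.
    static String[] loadFromFields(List<String> fields) {
        Iterator<String> it = fields.iterator();
        String name = it.next();     // column 1: name
        String address = it.next();  // column 2: address
        String age = it.next();      // column 3: age
        return new String[] {name, address, age};
    }

    public static void main(String[] args) {
        // A correctly tab-split record parses fine:
        loadFromFields(Arrays.asList("raju\tmarathahalli\t32".split("\t")));

        // Compressed binary garbage does not split into three
        // tab-separated fields, so the second next() call fails:
        try {
            loadFromFields(Arrays.asList("\u0089LZO-binary-garbage".split("\t")));
        } catch (NoSuchElementException e) {
            System.out.println("NoSuchElementException, as in the task logs");
        }
    }
}
```

This matches the symptom in the logs: the mapper reads the .lzo file as raw text, the record does not split into three fields, and __loadFromFields runs out of tokens.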

2

Some of the reasons an export may fail:

* Loss of connectivity from the Hadoop cluster to the database (either due to hardware fault, or server software crashes) 
* Attempting to INSERT a row which violates a consistency constraint (for example, inserting a duplicate primary key value) 
* Attempting to parse an incomplete or malformed record from the HDFS source data 
* Attempting to parse records using incorrect delimiters 
* Capacity issues (such as insufficient RAM or disk space) 

(from the Sqoop User Guide)

In my case, I got the same NoSuchElementException, and setting the proper field terminator with --fields-terminated-by '\t' solved the problem.

When not specified, Sqoop assumes MySQL-style defaults: ',' as the field terminator and '\n' as the line terminator.
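The delimiter-mismatch failure mode can be seen directly: splitting a tab-separated record on the default ',' terminator yields a single token, so any attempt to read a second or third column fails exactly as in the stack trace above. A minimal sketch:

```java
public class DelimiterMismatch {
    public static void main(String[] args) {
        String record = "raju\tmarathahalli\t32"; // tab-separated, as in the file

        // With the correct '\t' terminator the record has 3 fields:
        System.out.println(record.split("\t").length); // 3

        // With the default ',' terminator the whole line is one field,
        // so pulling the 2nd and 3rd columns throws NoSuchElementException:
        System.out.println(record.split(",").length); // 1
    }
}
```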
