2015-12-23 91 views

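Nutch problem executing crawl

I want to get Nutch 1.11 to execute a crawl. I am running these commands in Windows 7 using Cygwin.

Nutch itself runs, and I get output from running bin/nutch, but I receive an error message when I try to run a crawl.

I get the following error when I try to run a crawl with Nutch: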

Error running: /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/bin/nutch inject TestCrawl/crawldb C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/seed.txt

Failed with exit value 127.

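I have my JAVA_HOME classpath set, and I have changed the hosts file to include 127.0.0.1 as localhost.

I am wondering whether I am specifying the directories correctly, and whether that might be the problem.

The full printout looks like this: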

[email protected] /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local 
$ bin/crawl -i -D solr.server.url=http://localhost:8983/solr/ C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/ TestCrawl/ 2 

Injecting seed URLs 
/cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/bin/nutch inject TestCrawl//crawldb C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/ 
Injector: starting at 2015-12-23 17:48:21 
Injector: crawlDb: TestCrawl/crawldb 
Injector: urlDir: C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls 
Injector: Converting injected urls to crawl db entries. 
Injector: java.lang.NullPointerException 
     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012) 
     at org.apache.hadoop.util.Shell.runCommand(Shell.java:445) 
     at org.apache.hadoop.util.Shell.run(Shell.java:418) 
     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650) 
     at org.apache.hadoop.util.Shell.execCommand(Shell.java:739) 
     at org.apache.hadoop.util.Shell.execCommand(Shell.java:722) 
     at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633) 
     at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:421) 
     at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:281) 
     at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:125) 
     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:348) 
     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) 
     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:422) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) 
     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) 
     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) 
     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:422) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) 
     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) 
     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) 
     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833) 
     at org.apache.nutch.crawl.Injector.inject(Injector.java:323) 
     at org.apache.nutch.crawl.Injector.run(Injector.java:379) 
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
     at org.apache.nutch.crawl.Injector.main(Injector.java:369) 

Error running: 
    /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/bin/nutch inject TestCrawl//crawldb C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/ 
Failed with exit value 127. 

The Hadoop log entries that I think may be related to the error I am getting are:

2016-01-07 12:24:40,360 ERROR util.Shell - Failed to locate the winutils binary in the hadoop binary path 
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. 
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) 
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) 
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:326) 
    at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:432) 
    at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:478) 
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170) 
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64) 
    at org.apache.nutch.crawl.Injector.main(Injector.java:369) 
2016-01-07 12:24:40,450 ERROR crawl.Injector - Injector: java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 15: solr.server.url=http://localhost:8983/solr 
    at org.apache.hadoop.fs.Path.initialize(Path.java:206) 
    at org.apache.hadoop.fs.Path.<init>(Path.java:172) 
    at org.apache.nutch.crawl.Injector.run(Injector.java:379) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
    at org.apache.nutch.crawl.Injector.main(Injector.java:369) 
Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 15: solr.server.url=http://localhost:8983/solr 
    at java.net.URI$Parser.fail(URI.java:2848) 
    at java.net.URI$Parser.checkChars(URI.java:3021) 
    at java.net.URI$Parser.parse(URI.java:3048) 
    at java.net.URI.<init>(URI.java:746) 
    at org.apache.hadoop.fs.Path.initialize(Path.java:203) 
    ... 4 more 

Answers

0

You are running Linux commands from Cygwin, and there is no C:\ path on a Linux system. The correct command should look like this:

/cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/bin/nutch inject TestCrawl/crawldb /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/seed.txt
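
As a rough illustration of the path rewrite suggested here, the sketch below turns a Windows drive path into its /cygdrive form. The function name to_cygdrive is hypothetical and is not part of Nutch or Cygwin; Cygwin also ships a cygpath utility that performs this kind of conversion. The thread does not confirm that this change alone resolves the error.

```shell
# Hypothetical helper (not part of Nutch or Cygwin): rewrite a Windows
# drive path such as C:/Users/... or C:\Users\... as /cygdrive/c/Users/...
# Paths without a drive letter are returned with only backslashes normalised.
to_cygdrive() {
  # Normalise backslashes to forward slashes first.
  p=$(printf '%s' "$1" | tr '\\' '/')
  case "$p" in
    [A-Za-z]:/*|[A-Za-z]:)
      # Lower-case the drive letter and drop the "X:" prefix.
      drive=$(printf '%s' "$p" | cut -c1 | tr 'A-Z' 'a-z')
      rest=${p#?:}
      printf '/cygdrive/%s%s\n' "$drive" "$rest"
      ;;
    *)
      printf '%s\n' "$p"
      ;;
  esac
}

# Example: convert the seed file path used in the question.
to_cygdrive 'C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/seed.txt'
```
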
+0

Thank you for helping me with that error; I did not realize I was doing that. – JerrittPace

+0

@JerrittPace You're welcome. If it helped you, please select my answer as the best answer. Thanks. –

+0

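Sorry, I tried to edit but did not get it right. I really appreciate the help, and I believed the mistake I made was causing the problem, but I still get the same error with the modified command, as well as with a simple command that does not spell out the directories: bin/crawl -i -D solr.server.url=http://localhost:8983/solr urls TestCrawl 2; bin/crawl -i -D /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/ solr.server.url=http://localhost:8983/solr/ TestCrawl/ 2; both return the same error: Failed with exit value 127. – JerrittPace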
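
A note on the second command in this comment: the seed directory comes directly after -D, while the first command puts solr.server.url=... there. The Hadoop log in the question shows solr.server.url=http://localhost:8983/solr being parsed as a path, which would fit that ordering. The sketch below assumes that -D takes the next token as key=value. The function check_d_args is hypothetical and only inspects the argument list; it does not run Nutch.

```shell
# Hypothetical helper: report any -D that is not followed by a key=value token.
# Returns 0 when every -D looks well formed, 1 otherwise.
check_d_args() {
  status=0
  while [ "$#" -gt 0 ]; do
    if [ "$1" = "-D" ]; then
      case "${2-}" in
        [!=/]*=*) ;;   # looks like key=value (does not start with "/" or "=")
        *) printf 'bad -D argument: %s\n' "${2-}" >&2; status=1 ;;
      esac
      # Skip the token that -D consumed, if there is one.
      [ "$#" -ge 2 ] && shift
    fi
    shift
  done
  return "$status"
}

# The first command from the comment passes the check.
check_d_args -i -D solr.server.url=http://localhost:8983/solr urls TestCrawl 2 \
  && echo "argument order looks fine"

# The second command puts a path after -D, so the check flags it.
check_d_args -i -D /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/ \
  solr.server.url=http://localhost:8983/solr/ TestCrawl/ 2 \
  || echo "-D is followed by a path, not key=value"
```
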

0

The answer to your question is in this message:

2016-01-07 12:24:40,360 ERROR util.Shell - Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

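This happens because the version of Hadoop that ships with Nutch 1.11 is designed to work out of the box on Linux, not on Windows.

I had the same situation, and I ended up using Nutch 1.11 in an Ubuntu VirtualBox VM.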
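
For anyone who wants to confirm this diagnosis before switching to a VM: as far as I know, the "null" in null\bin\winutils.exe is where the Hadoop home directory would normally appear, which Hadoop takes from the hadoop.home.dir system property or the HADOOP_HOME environment variable. The sketch below only checks whether a given Hadoop home contains bin/winutils.exe. The function name has_winutils is hypothetical, and nothing in this thread confirms that supplying winutils.exe makes Nutch 1.11 work on Windows.

```shell
# Hypothetical check: does the given Hadoop home (default: $HADOOP_HOME)
# contain bin/winutils.exe? Returns 0 if found, 1 otherwise.
has_winutils() {
  home=${1:-${HADOOP_HOME:-}}
  if [ -z "$home" ]; then
    echo "HADOOP_HOME is not set" >&2
    return 1
  fi
  if [ ! -f "$home/bin/winutils.exe" ]; then
    echo "missing: $home/bin/winutils.exe" >&2
    return 1
  fi
  echo "found: $home/bin/winutils.exe"
}

# Check the current environment without aborting the shell on failure.
has_winutils "${HADOOP_HOME:-}" || echo "winutils.exe not available"
```
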

0

The hadoop-core JAR file is needed when you are working with Nutch.