我正在使用Solr 6.4.1版本,我最近在索引文件中发布了大约1000个文件。我在Windows 10中使用Windows Powershell来使用该命令发布文件。Solr索引在发布文件时产生错误
PS C:\ solr的-6.4.1>的java -DC = Solr_sample -Dauto =是-ddata =文件 -Drecursive =是-jar示例/ exampledocs/post.jar E:\测试\
但其中我发现一个文件没有索引,我试图使用下面的命令再次索引该特定文件,但没有运气。该文件大小为212MB。我附上了以下错误。你能帮我把这个文件发布到Solr索引。
PS C:\solr-6.4.1> java -Dc=Solr_sample -Dauto=yes -Ddata=files -Drecursive=yes -jar example/exampledocs/post.jar E:\Test\C0000000045\
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/Solr_sample/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, max depth=999, delay=0s
Indexing directory E:\Test\C0000000045 (1 files, depth=0)
POSTing file 20162436739-Spheres Volume 3 Foams Plural Spherology. Peter Sloterdijk. MIT.pdf (application/pdf) to [base]/extract
SimplePostTool: WARNING: Solr returned an error #500 (Server Error) for url: http://localhost:8983/solr/Solr_sample/update/extract?resource.name=E%3A%5CTest%5CC0000000045%5C20162436739-Spheres+Volume+3+Foams+Plural+Spherology.+Peter+Sloterdijk.+MIT.pdf&literal.id=E%3A%5C
Test%5CC0000000045%5C20162436739-Spheres+Volume+3+Foams+Plural+Spherology.+Peter+Sloterdijk.+MIT.pdf
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 500 Server Error</title>
</head>
<body><h2>HTTP ERROR 500</h2>
<p>Problem accessing /solr/Solr_sample/update/extract. Reason:
<pre> Server Error</pre></p><h3>Caused by:</h3><pre>java.lang.OutOfMemoryError: Java heap space
at java.io.PushbackInputStream.<init>(Unknown Source)
at org.apache.pdfbox.pdfparser.InputStreamSource.<init>(InputStreamSource.java:39)
at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:55)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:821)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:727)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:652)
at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:612)
at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:215)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:249)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:972)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:908)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:131)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
</pre>
</body>
</html>
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 500 for URL: http://localhost:8983/solr/Solr_sample/update/extract?resource.name=E%3A%5CTest%5CC0000000045%5C20162436739-Spheres+Volume+3+Foams+Plural+Sp
herology.+Peter+Sloterdijk.+MIT.pdf&literal.id=E%3A%5CTest%5CC0000000045%5C20162436739-Spheres+Volume+3+Foams+Plural+Spherology.+Peter+Sloterdijk.+MIT.pdf
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/Solr_sample/update...
Time spent: 0:00:13.795
我用命令PS C:\ Solr的-6.4.1> java的-Xmx8g - Dc = Solr_sample -Dauto = yes -Ddata = files -Drecursive = yes -jar example/exampledocs/post.jar E:\ Test \ C0000000045 \但它仍然给出相同的错误。 – Simant
你有多少公羊免费?尽可能多地给它。但是你不能排除PDF文本提取中的一个错误,你可能永远不会从文本中提取文本(或者你可以通过其他方式,然后创建一个DOC文件与人工手动/脚本,如果它是重要的) – Persimmonium
错误消息在服务器上生成,给索引工具更多的内存将无济于事。您可以更改/使用您正在使用的Solr启动脚本的已分配内存的大小。 – MatsLindh