2014-02-11

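"Size exceeds the configured maximum" error while indexing

I need to index PDF files, and I was told that Solr can do this. So I installed a Solr server on WebLogic and tried a few things with the web interface.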

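Finally, I wrote a JUnit-based test class to try the same thing with Java and SolrJ.

I wrote some (simple) code that indexes a couple of PDF files and then runs a query to check whether the files were indexed: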

@Test
public void documentSearchTest() throws IDSystemException
{
    try
    {
        // Start from an empty index.
        server.deleteByQuery("*:*");

        Assert.assertTrue("Document not found! - " + TEST_PDF_DOCUMENT1, new File(TEST_PDF_DOCUMENT1).exists());
        Assert.assertTrue("Document not found! - " + TEST_PDF_DOCUMENT2, new File(TEST_PDF_DOCUMENT2).exists());

        // Both PDFs are attached to the same request, so their combined size counts.
        req.addFile(new File(TEST_PDF_DOCUMENT1), CONTENT_TYPE_APPLICATION_PDF);
        req.addFile(new File(TEST_PDF_DOCUMENT2), CONTENT_TYPE_APPLICATION_PDF);

        NamedList<Object> result = server.request(req);

        // Query everything to see whether the files were indexed.
        SolrQuery solrQuery = new SolrQuery().setQuery("*:*");
        QueryResponse rsp = server.query(solrQuery);
        SolrDocumentList docs = rsp.getResults();
    }
    catch (SolrServerException sse)
    {
        throw new IDSystemException(LOG, sse.getMessage(), sse);
    }
    catch (IOException ioe)
    {
        throw new IDSystemException(LOG, ioe.getMessage(), ioe);
    }
}

Running this test produces the following error:

<11.02.2014 09:08 Uhr MEZ> <Notice> <Stdout> <BEA-000000> <785764 [[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'] INFO org.apache.solr.core.SolrCore ? [Collection1] REMOVING ALL DOCUMENTS FROM INDEX> 
<11.02.2014 09:08 Uhr MEZ> <Notice> <Stdout> <BEA-000000> <785764 [[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'] INFO org.apache.solr.update.processor.LogUpdateProcessor ? [Collection1] webapp=/solr path=/update params={wt=javabin&version=2} {deleteByQuery=*:*} 0 0> 
<11.02.2014 09:08 Uhr MEZ> <Notice> <Stdout> <BEA-000000> <786215 [[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'] ERROR org.apache.solr.servlet.SolrDispatchFilter ? null:org.apache.commons.fileupload.FileUploadBase$SizeLimitExceededException: the request was rejected because its size (2100088) exceeds the configured maximum (2097152) 
    at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl$1.raiseError(FileUploadBase.java:902) 
    at org.apache.commons.fileupload.util.LimitedInputStream.checkLimit(LimitedInputStream.java:71) 
    at org.apache.commons.fileupload.util.LimitedInputStream.read(LimitedInputStream.java:128) 
    at org.apache.commons.fileupload.MultipartStream$ItemInputStream.makeAvailable(MultipartStream.java:977) 
    at org.apache.commons.fileupload.MultipartStream$ItemInputStream.read(MultipartStream.java:887) 
    at java.io.InputStream.read(InputStream.java:85) 
    at org.apache.commons.fileupload.util.Streams.copy(Streams.java:94) 
    at org.apache.commons.fileupload.util.Streams.copy(Streams.java:64) 
    at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:362) 
    at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126) 
    at org.apache.solr.servlet.SolrRequestParsers$MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:547) 
    at org.apache.solr.servlet.SolrRequestParsers$StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:681) 
    at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:150) 
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:393) 
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197) 
    at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56) 
    at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3592) 
    at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321) 
    at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:121) 
    at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2202) 
    at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2108) 
    at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1432) 
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201) 
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)> 

I checked the WebLogic settings (Servers -> Protocols -> HTTP), and the maximum POST size there is set to -1 (which should mean unlimited).

Is there anywhere else where this also has to be set?

Edit: here is the solrconfig.xml:

<?xml version="1.0" encoding="UTF-8" ?> 
<config> 
    <luceneMatchVersion>LUCENE_45</luceneMatchVersion> 
    <directoryFactory name='DirectoryFactory' class='solr.MMapDirectoryFactory' /> 

    <codecFactory name="CodecFactory" class="solr.SchemaCodecFactory" /> 

    <lib dir='${solr.core.instanceDir}\lib' /> 
    <lib dir="${solr.core.instanceDir}\dist\" regex="solr-cell-\d.*\.jar" /> 
    <lib dir="${solr.core.instanceDir}\contrib\extraction\lib" regex=".*\.jar" /> 

    <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" /> 

    <requestHandler name="/update" class="solr.UpdateRequestHandler"> 
     <lst name="defaults"> 
      <str name="update.chain">deduplication</str> 
     </lst> 
    </requestHandler> 

    <requestHandler name="/update/extract" 
     class="solr.extraction.ExtractingRequestHandler"> 
     <lst name="defaults"> 
      <str name="captureAttr">true</str> 
      <str name="lowernames">true</str> 
      <str name="overwrite">true</str> 
      <str name="literalsOverride">true</str> 
      <str name="fmap.a">link</str> 
      <!-- the configuration here could be useful for tests --> 
      <str name="update.chain">deduplication</str> 
     </lst> 
    </requestHandler> 

    <updateRequestProcessorChain name="deduplication"> 
     <processor 
      class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory"> 
      <bool name="overwriteDupes">false</bool> 
      <str name="signatureField">uid</str> 
      <bool name="enabled">true</bool> 
      <str name="fields">content</str> 
      <str name="minTokenLen">10</str> 
      <str name="quantRate">.2</str> 
      <str name="signatureClass">solr.update.processor.TextProfileSignature</str> 
     </processor> 
     <processor class="solr.LogUpdateProcessorFactory" /> 
     <processor class="solr.RunUpdateProcessorFactory" /> 
    </updateRequestProcessorChain> 

    <requestHandler name="/admin/" 
     class="org.apache.solr.handler.admin.AdminHandlers" /> 
    <admin> 
     <defaultQuery>*:*</defaultQuery> 
    </admin> 

</config> 

1 Answer

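Multipart file uploads to the ExtractingRequestHandler are limited in size.

You should modify this value in the Solr configuration; it appears to be 2048 KB by default, which is 2,097,152 bytes and matches the configured maximum in your error message. Look for <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" /> in the <requestDispatcher ...> section.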
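The numbers in the error message can be checked against this setting. The following is a minimal Java sketch of that arithmetic; the class and method names are made up for illustration and are not part of Solr or SolrJ, and it assumes the limit is applied as "request size greater than limit in bytes", which is what the logged message suggests.

```java
// Illustrative helper, not a Solr API: relates a limit in KB to the byte counts in the log.
final class UploadLimitCheck {
    private UploadLimitCheck() {}

    /** Converts a limit given in KB (as written in solrconfig.xml) to bytes. */
    static long limitInBytes(long limitInKb) {
        return limitInKb * 1024L;
    }

    /** Returns true if a request of the given size is larger than the limit. */
    static boolean exceedsLimit(long requestBytes, long limitInKb) {
        return requestBytes > limitInBytes(limitInKb);
    }

    /** Smallest limit in KB that still admits a request of the given size (rounded up). */
    static long minimumLimitInKb(long requestBytes) {
        return (requestBytes + 1023L) / 1024L;
    }

    public static void main(String[] args) {
        long requestBytes = 2100088L; // request size reported in the stack trace
        long currentLimitKb = 2048L;  // limit that appears to be in effect

        System.out.println(limitInBytes(currentLimitKb));               // 2097152
        System.out.println(exceedsLimit(requestBytes, currentLimitKb)); // true
        System.out.println(minimumLimitInKb(requestBytes));             // 2051
    }
}
```

In other words, the request in the question is only about 3 KB over the limit, so any value comfortably above 2051 KB would let this particular request through; pick a value based on the largest upload you expect.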

In your case, you have to put this in solrconfig.xml:

<requestDispatcher handleSelect="false">
    <!-- set multipartUploadLimitInKB to the size you need (in KB) -->
    <requestParsers enableRemoteStreaming="true"
       multipartUploadLimitInKB="2048000"
       formdataUploadLimitInKB="2048"
       addHttpRequestToContext="false"/>
</requestDispatcher>
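Because the test adds both PDFs to a single request, their combined size counts against the limit. If raising the limit very far is not desirable, the client could instead send the files in several smaller requests. The sketch below is only an illustration of that idea in plain Java: `UploadBatcher`, `batchBySize`, and the `overheadPerFile` allowance for multipart headers are hypothetical names and assumptions, not SolrJ features.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper, not a Solr API: groups file sizes into batches that fit under a byte limit.
final class UploadBatcher {
    private UploadBatcher() {}

    /**
     * Greedily groups file sizes (in bytes) so that each batch, including an assumed
     * per-file overhead, stays within limitBytes. A single file that is larger than
     * the limit still gets its own batch; that upload needs a higher server limit.
     */
    static List<List<Long>> batchBySize(List<Long> fileSizes, long limitBytes, long overheadPerFile) {
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentTotal = 0L;

        for (long size : fileSizes) {
            long cost = size + overheadPerFile;
            // Start a new batch when adding this file would push the total over the limit.
            if (!current.isEmpty() && currentTotal + cost > limitBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentTotal = 0L;
            }
            current.add(size);
            currentTotal += cost;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Each resulting batch would then be sent as its own update request. The overhead value is a guess that should be measured for real requests, so treat this as a starting point rather than a guarantee that a batch will be accepted.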

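I'm quite new to Solr, so would you mind telling me where I need to modify this value? I don't understand it very well. Maybe in solrconfig.xml? I have added it to my question for completeness. – Francesco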
