2016-06-01

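Solr: how to limit the searched content in a Solr query

I want a Solr query to search for a word only up to a specific line of the field, not across the whole content. I tried a proximity search, but it didn't work. My data looks like this: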

"Date: Thu, 24 Jul 2014 09:36:44 GMT\nCache-Control: private\nContent-Type: application/json; charset=utf-8\nContent-Encoding: gzip\nVary: Accept-Encoding\nP3P: CP=%20CURo TAIo IVAo IVO ONL UNI COM NAV INT DEM STA OUR%20\nX-Powered-By: ASP.NET\nContent-Length: 570\nKeep-Alive: timeout=120\nConnection: Keep-Alive\n\n[{%20rows%20:[],%20index%20:[],%20folders%20:[[%20Inbox%20,%20Inbox%20,%20%20,1,1,0,0,0,%20Inbox%20,0,0,%20none%20,0],[%20Drafts%20,%20Drafts%20,%20%20,1,1,0,0,0,%20Drafts%20,0,0,%20none%20,0],[%20Sent%20,%20Sent%20,%20%20,1,1,0,0,11,%20Sent%20,1,0,%20none%20,0],[%20Spam%20,%20Spam%20,%20%20,1,1,0,0,0,%20Spam%20,1,0,%20none%20,0],[%20Deleted%20,%20Trash%20,%20%20,1,1,0,7,9,%20Deleted%20,1,0,%20none%20,0],[%20Saved%20,%20Saved Mail%20,%20%20,1,1,0,0,0,%20Saved%20,1,0,%20none%20,0],[%20Saved IM%20,%20Saved Chats%20,%20Saved%20,2,1,0,0,0,%20Saved Content%20,1,0,%20none%20,0]],%20fcsupport%20:true,%20hasNewMsg%20:false,%20totalItems%20:0,%20isSuccess%20:true,%20foldersCanMoveTo%20:[%20Sent%20,%20Spam%20,%20Deleted%20,%20Saved%20,%20Saved%20],%20indexStart%20:}} POST /38664-816/aol-6/en-us/common/rpc/RPC.aspx?user=hl1lkgReIh&transport=xmlhttp&r=0.019667088333411797&a=GetMessageList&l=31211 HTTP/1.1\nHost: mail.aol.com\nUser-Agent: Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\nAccept-Language: en-US,en;q=0.5\nAcept-Encoding: gzip, deflate\nContent-Type: application/x-www-form-urlencoded; charset=UTF-8\nX-Requested-With: XMLHttpRequest\nReferer: http://mail.aol.com/38664-816/aol-6/en-us/Suite.aspx\nContent-Length: 452\nCookie: mbox=PC#1405514778803-136292.22_06#1407395182|session#1406185366924-436868#1406187442|check#true#1406185642; s_pers=%20s_fid%3D55C638B5F089E6FB-19ACDEED1644FD86%7C1469344726539%3B%20s_getnr%3D1406186326569-Repeat%7C1469258326569%3B%20s_nrgvo%3DRepeat%7C1469258326571%3B; s_vi=[CS]v1|29E33A0D051D366F-60000105200097FF[CE]; UNAUTHID=1.5efb4a11934a40b8b5272557263dadfe.88c5; RSP_COOKIE=type=3&name=LTState=ver:5&LAV:22&UN:*UQo5AwAnAytffwJSYg%3D%3D&SN:*UQo5AwAnAytffwJSYg%3D%3D&UV:AOL&LC:EN-US&UD:aol.com&EA:*UQo5AwAnAytffwJSCAsnWWoJASZL&PRMC:825345&MT:6&AMS:1&CMAI:365&SNT:0&vnop:false&MH:core-mia002b.r1000.mail.aol.com&pbr:100&WM:mail.aol.com&CKD:.mail.aol.com&ckp:%2f&ha:1NGRuUTRRxGFF2s5A4JwkuCT43Q%3d&; aolweatherlocation=10003; DataLayer=cons%3D6.107%26coms%3D629; grvinsights=69f3a2bb86ed3cd31aa1d14a1ce9e845; CUNAUTHID=1.5efb4a11934a40b8b5272557263dadfe.88c5; s_sess=%20s_cc%3Dtrue%3B%20s_sq%3Daolcmp%253D%252526pid%25253Dcmp%2525253A%25252520Help%25252520%2525257C%25252520View%25252520Article%2525253A%25252520Clear%25252520cookies%2525252C%25252520cache%2525252C%25252520history%25252520and%25252520footprints%252526pidt%25253D1%252526oid%25253Dhttp%2525253A%2525252F%2525252Fwebmail.aol.com%2525252F%2525253F_AOLLOCAL%2525253Dmail%252526ot%25253DA%2526aolsnssignin%253D%252526pid%25253Dsso%25252520%2525253A%25252520login%252526pidt%25253D1%252526oid%25253DSign%25252520In%252526oidt%25253D3%252526ot%25253DSUBMIT%3B; L7Id=31211; Context=ver:3&sid:923f783b-bc6e-4edf-87c9-e52f19b3ce67&rt:STANDARD&i:f&ckd:.mail.aol.com&ckp:%2f&ha:X80Ku4ffRKsOVSwgmEVPCfpfxeU%3d&; IDP_A=S-1-V0c3QiuO6BzQ5S6_u3s0brfUqMCktezAz7sWlVfHD90omIijDXRrMJkSM-9-xcnUcSTnXbcZ1aUCgvfuToVeJihcftKY5KtsC_nB7Y9qf6P0xUnNfCIAmWVtRf4ctSQ9JwRIzHa40dhFuULwYLu3NUPTxckeFUFAzcSS4hrmb4grhEtyOGp0qV5rIKtjs4u8; MC_CMP_ESK=NonSense; SNS_AA=asrc=2&sst=1406185424&type=0; _utd=GD#MzRb%2FjjHIe8odpr%2FfxZR2g%3D%3D|PR#a|ST#sns.webmail.aol.com|UID#; Auth=ver:22&UAS:*UQo5AwAnAytffwJSZAskRiwLBSIDWVpVXxVTVwJCLFxdSnpHUWBbeV1jcikERgl6CEYLJUweGUhdFQQLW1h%2bBAZRcllWfVl8VH4DUmRaZARoPhw%2bBFBA&IDL:0&UN:*UQo5AwAnAytffwJSYg%3D%3D&at:SNS&SN:*UQo5AwAnAytffwJSYg%3D%3D&WIM:%252FwQCAAAAAAAEk2ihy%252BE4MMebm4R1jvxY07zNZhFOHSz2EFBnsNdOAUsl8QyZceo54kWYZ4vwVayLFF7w&sty:0&UD:aol.com&UID:hl1lkgReIh&SS:635417678271359104&SVS:SNS_AA%7c1406185424&LA:635417687268954835&AAT:A&act:M&pbr:100&CBR:AOL&MT:&pay:0&MBT:M&UV:AOL&LC:EN-US&bid:1&ACD:1403348988&PIX:3829&PRMC:825345&RELM:AOL&mj:%2\nConnection: keep-alive\n"

I want to search the data only up to the Content-Type: application/json line, and not after it. I tried

http://192.168.0.164:8983/solr/collection_with_all_details/select?q=Content%3AContent-Type json*&wt=json&indent=true

but it searches the entire content. I need to limit which part of the content is searched.

Answers

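I don't think this is possible in this case. You could look at the highlighter to highlight and return only the first 200 characters in the response.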
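A minimal sketch of the highlighter route, assuming the host, collection name and Content field from the question. It only builds the request URL: hl.fl names the field to highlight, and hl.maxAnalyzedChars=200 tells the highlighter to look no further than the first 200 characters of that field. Note that this shapes what comes back in the highlighting section of the response; it does not stop the query itself from matching text further down the field. The highlight_url helper is hypothetical.

```python
from urllib.parse import urlencode

# Assumed from the question: Solr host, collection and the searched field.
SOLR_SELECT = "http://192.168.0.164:8983/solr/collection_with_all_details/select"


def highlight_url(query: str, field: str = "Content", max_chars: int = 200) -> str:
    """Build a select URL whose highlighter only considers the first
    `max_chars` characters of `field`. Matching itself is NOT restricted."""
    params = {
        "q": f"{field}:{query}",
        "wt": "json",
        "indent": "true",
        "hl": "true",                      # turn highlighting on
        "hl.fl": field,                    # field(s) to highlight
        "hl.maxAnalyzedChars": max_chars,  # stop looking for highlights after this many chars
    }
    # urlencode percent-encodes values and converts the int to a string.
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(highlight_url('"application/json"'))
```

The matched documents are the same as without highlighting; only the snippets in the response are limited to the start of the field.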

You might need to write a custom response writer to help with this.

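Another option could be to create one more field with indexed="false" stored="true", which will be more efficient.

Make your original field indexed="true" stored="false"; this reduces your index size. The new copy field will be indexed="false" stored="true":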

<copyField source="text" dest="textShort" maxChars="200"/> 

Check whether this works for you.

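You should really preprocess your data so that you only index the part you want to use. Doing it after the fact is not a good solution: most of the content is already in the index, and the delimiter you are looking for is not at a fixed character position (a fixed cut-off is all maxChars can do).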
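To make the fixed-offset point concrete, here is a small simulation, assuming maxChars simply keeps the first N characters of the source field (which is how copyField's maxChars is documented). The two hypothetical documents have Content-Type: application/json at different offsets, so a single cut-off of 60 characters either keeps part of the body or misses the header entirely, while cutting at the delimiter works for both. The helper names are illustrative only.

```python
MARKER = "Content-Type: application/json"


def copy_field_max_chars(text: str, max_chars: int) -> str:
    """Rough model of <copyField ... maxChars="N"/>: keep the first N characters."""
    return text[:max_chars]


def cut_at_marker(text: str, marker: str = MARKER) -> str:
    """Keep everything up to and including `marker`; drop the rest.
    If the marker is missing, keep the whole text."""
    idx = text.find(marker)
    return text if idx == -1 else text[: idx + len(marker)]


# Hypothetical documents: same header, different amount of text in front of it.
short_doc = "Date: Thu, 24 Jul 2014\nContent-Type: application/json\n[{...body...}]"
long_doc = ("Date: Thu, 24 Jul 2014\n" + "X-Padding: abc\n" * 20
            + "Content-Type: application/json\n[{...}]")

if __name__ == "__main__":
    for name, doc in (("short", short_doc), ("long", long_doc)):
        truncated = copy_field_max_chars(doc, 60)
        print(name,
              "| maxChars=60 keeps header:", MARKER in truncated,
              "| maxChars=60 leaks body:", "[{" in truncated,
              "| marker cut ends at header:", cut_at_marker(doc).endswith(MARKER))
```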

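Depending on how you index, you can do this during the indexing step (RegexTransformer, your own code using SolrJ, etc.) or in the field's analysis chain with something like PatternReplaceFilter. That lets you remove everything after the header you are looking for.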
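A sketch of the regex cut, tried out in Python. The pattern is an assumption for illustration: it keeps everything up to and including the first Content-Type: application/json and drops the rest. One caveat: PatternReplaceFilterFactory runs on individual tokens after tokenization, so a pattern meant to span the whole field fits better in PatternReplaceCharFilterFactory, which rewrites the raw field text before the tokenizer sees it. Solr patterns use Java regex syntax; this particular pattern ((?s), a lazy .*?, one group) means the same in Python's re, but in Solr the replacement would be written $1 rather than \1.

```python
import re

# Illustrative pattern (an assumption, not from the original answer):
# lazily capture everything up to and including the first JSON Content-Type
# header, then swallow the rest of the field. (?s) lets "." match newlines.
HEADER_CUT = re.compile(r"(?s)^(.*?Content-Type: application/json).*$")


def strip_after_header(text: str) -> str:
    """Return the text up to the JSON Content-Type header; unchanged if absent."""
    return HEADER_CUT.sub(r"\1", text, count=1)


if __name__ == "__main__":
    sample = ("Date: Thu, 24 Jul 2014 09:36:44 GMT\nCache-Control: private\n"
              "Content-Type: application/json; charset=utf-8\n"
              "Content-Encoding: gzip\n\n[{...}]")
    print(strip_after_header(sample))
```

Because the quantifier before the header is lazy, only the first occurrence counts, so a second JSON Content-Type further down the field is cut away as well.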

That way you should be able to index the content into, for example, a header field and a body field, depending on your needs.
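A minimal preprocessing sketch in Python (the answer mentions SolrJ; Python is swapped in here only to keep the examples in one language). It splits each raw record at the first Content-Type: application/json and produces a document with separate header and body fields, serialized as JSON. The field names header and body, the id value and the split_record helper are hypothetical; the fields would have to exist in your schema.

```python
import json

MARKER = "Content-Type: application/json"


def split_record(doc_id: str, raw: str, marker: str = MARKER) -> dict:
    """Split `raw` into a header part (up to and including `marker`) and a body part.
    If the marker is missing, everything goes into `header` and `body` is empty."""
    idx = raw.find(marker)
    if idx == -1:
        return {"id": doc_id, "header": raw, "body": ""}
    end = idx + len(marker)
    return {"id": doc_id, "header": raw[:end], "body": raw[end:]}


if __name__ == "__main__":
    raw = ("Date: Thu, 24 Jul 2014 09:36:44 GMT\n"
           "Content-Type: application/json; charset=utf-8\n\n[{...}]")
    docs = [split_record("mail-1", raw)]
    # A JSON array of documents like this is what you would send to the
    # collection's /update handler.
    print(json.dumps(docs, indent=2))
```

With the split done at index time, a query such as q=header:json only looks at the part before the body.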