2014-09-23 28 views
0

Solr does not insert the first line of a CSV file when the file is uploaded with the curl command below

C:\>curl "http://localhost:8983/solr/update/csv?commit=true&stream.file=C:\dev\tools\solr-4.7.2\data.txt&stream.contentType=text/csv&header=false&fieldnames=id,cat,pubyear_i,title,author,series_s,sequence_i&skipLines=0"

The contents of data.txt are as follows:

book1,fantasy,2000,A Storm of Swords,George R.R. Martin,A Song of Ice and Fire,3 
book2,fantasy,2005,A Feast for Crows,George R.R. Martin,A Song of Ice and Fire,4 
book3,fantasy,2011,A Dance with Dragons,George R.R. Martin,A Song of Ice and Fire,5 
book4,sci-fi,1987,Consider Phlebas,Iain M. Banks,The Culture,1 
book5,sci-fi,1988,The Player of Games,Iain M. Banks,The Culture,2 
book6,sci-fi,1990,Use of Weapons,Iain M. Banks,The Culture,3 
book7,fantasy,1984,Shadows Linger,Glen Cook,The Black Company,2 
book8,fantasy,1984,The White Rose,Glen Cook,The Black Company,3 
book9,fantasy,1989,Shadow Games,Glen Cook,The Black Company,4 
book10,sci-fi,2001,Gridlinked,Neal Asher,Ian Cormac,1 
book11,sci-fi,2003,The Line of Polity,Neal Asher,Ian Cormac,2 
book12,sci-fi,2005,Brass Man,Neal Asher,Ian Cormac,3 

The first record in data.txt, the one whose ID is "book1", is not inserted into Solr. Can anyone tell me why?

http://localhost:8983/solr/query?q=id:book1 
{ 
    "responseHeader":{ 
    "status":0, 
    "QTime":1, 
    "params":{ 
     "q":"id:book1"}}, 
    "response":{"numFound":0,"start":0,"docs":[] 
    }} 

The Solr log says that book1 is being added.

15440876 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – [collection1] Registered new searcher [email protected][collection1] main{StandardDirectoryReader(segments_1g:124:nrt _z(4.7):C12)}
15440877 [qtp84034882-11] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={fieldnames=id,cat,pubyear_i,title,author,series_s,sequence_i&skipLines=0&commit=true&stream.contentType=text/csv&header=false&stream.file=C:\dev\tools\solr-4.7.2\data.txt} {add=[?book1 (1480070032327180288), book2 (1480070032332423168), book3 (1480070032335568896), book4 (1480070032337666048), book5 (1480070032339763200), book6 (1480070032341860352), book7 (1480070032343957504), book8 (1480070032347103232), book9 (1480070032349200384), book10 (1480070032351297536), ... (12 adds)],commit=} 0 92

If I query for all the documents, you can see below that book1 is still missing. The query result is followed by a hex dump of the file.

http://localhost:8983/solr/query?q=id:book*&sort=pubyear_i+desc&fl=id,title,pubyear_i&rows=15 
{ 
    "responseHeader":{ 
    "status":0, 
    "QTime":1, 
    "params":{ 
     "fl":"id,title,pubyear_i", 
     "sort":"pubyear_i desc", 
     "q":"id:book*", 
     "rows":"15"}}, 
    "response":{"numFound":11,"start":0,"docs":[ 
     { 
     "id":"book3", 
     "pubyear_i":2011, 
     "title":["A Dance with Dragons"]}, 
     { 
     "id":"book2", 
     "pubyear_i":2005, 
     "title":["A Feast for Crows"]}, 
     { 
     "id":"book12", 
     "pubyear_i":2005, 
     "title":["Brass Man"]}, 
     { 
     "id":"book11", 
     "pubyear_i":2003, 
     "title":["The Line of Polity"]}, 
     { 
     "id":"book10", 
     "pubyear_i":2001, 
     "title":["Gridlinked"]}, 
     { 
     "id":"book6", 
     "pubyear_i":1990, 
     "title":["Use of Weapons"]}, 
     { 
     "id":"book9", 
     "pubyear_i":1989, 
     "title":["Shadow Games"]}, 
     { 
     "id":"book5", 
     "pubyear_i":1988, 
     "title":["The Player of Games"]}, 
     { 
     "id":"book4", 
     "pubyear_i":1987, 
     "title":["Consider Phlebas"]}, 
     { 
     "id":"book7", 
     "pubyear_i":1984, 
     "title":["Shadows Linger"]}, 
     { 
     "id":"book8", 
     "pubyear_i":1984, 
     "title":["The White Rose"]}] 
    }} 

Hex dump of data.txt:

0000000 ef bb bf 62 6f 6f 6b 31 2c 66 61 6e 74 61 73 79 
0000020 2c 32 30 30 30 2c 41 20 53 74 6f 72 6d 20 6f 66 
0000040 20 53 77 6f 72 64 73 2c 47 65 6f 72 67 65 20 52 
0000060 2e 52 2e 20 4d 61 72 74 69 6e 2c 41 20 53 6f 6e 
0000100 67 20 6f 66 20 49 63 65 20 61 6e 64 20 46 69 72 
0000120 65 2c 33 0d 0a 62 6f 6f 6b 32 2c 66 61 6e 74 61 
0000140 73 79 2c 32 30 30 35 2c 41 20 46 65 61 73 74 20 
0000160 66 6f 72 20 43 72 6f 77 73 2c 47 65 6f 72 67 65 
0000200 20 52 2e 52 2e 20 4d 61 72 74 69 6e 2c 41 20 53 
0000220 6f 6e 67 20 6f 66 20 49 63 65 20 61 6e 64 20 46 
0000240 69 72 65 2c 34 0d 0a 62 6f 6f 6b 33 2c 66 61 6e 
0000260 74 61 73 79 2c 32 30 31 31 2c 41 20 44 61 6e 63 
0000300 65 20 77 69 74 68 20 44 72 61 67 6f 6e 73 2c 47 
0000320 65 6f 72 67 65 20 52 2e 52 2e 20 4d 61 72 74 69 
0000340 6e 2c 41 20 53 6f 6e 67 20 6f 66 20 49 63 65 20 
0000360 61 6e 64 20 46 69 72 65 2c 35 0d 0a 62 6f 6f 6b 
0000400 34 2c 73 63 69 2d 66 69 2c 31 39 38 37 2c 43 6f 
0000420 6e 73 69 64 65 72 20 50 68 6c 65 62 61 73 2c 49 
0000440 61 69 6e 20 4d 2e 20 42 61 6e 6b 73 2c 54 68 65 
0000460 20 43 75 6c 74 75 72 65 2c 31 0d 0a 62 6f 6f 6b 
0000500 35 2c 73 63 69 2d 66 69 2c 31 39 38 38 2c 54 68 
0000520 65 20 50 6c 61 79 65 72 20 6f 66 20 47 61 6d 65 
0000540 73 2c 49 61 69 6e 20 4d 2e 20 42 61 6e 6b 73 2c 
0000560 54 68 65 20 43 75 6c 74 75 72 65 2c 32 0d 0a 62 
0000600 6f 6f 6b 36 2c 73 63 69 2d 66 69 2c 31 39 39 30 
0000620 2c 55 73 65 20 6f 66 20 57 65 61 70 6f 6e 73 2c 
0000640 49 61 69 6e 20 4d 2e 20 42 61 6e 6b 73 2c 54 68 
0000660 65 20 43 75 6c 74 75 72 65 2c 33 0d 0a 62 6f 6f 
0000700 6b 37 2c 66 61 6e 74 61 73 79 2c 31 39 38 34 2c 
0000720 53 68 61 64 6f 77 73 20 4c 69 6e 67 65 72 2c 47 
0000740 6c 65 6e 20 43 6f 6f 6b 2c 54 68 65 20 42 6c 61 
0000760 63 6b 20 43 6f 6d 70 61 6e 79 2c 32 0d 0a 62 6f 
0001000 6f 6b 38 2c 66 61 6e 74 61 73 79 2c 31 39 38 34 
0001020 2c 54 68 65 20 57 68 69 74 65 20 52 6f 73 65 2c 
0001040 47 6c 65 6e 20 43 6f 6f 6b 2c 54 68 65 20 42 6c 
0001060 61 63 6b 20 43 6f 6d 70 61 6e 79 2c 33 0d 0a 62 
0001100 6f 6f 6b 39 2c 66 61 6e 74 61 73 79 2c 31 39 38 
0001120 39 2c 53 68 61 64 6f 77 20 47 61 6d 65 73 2c 47 
0001140 6c 65 6e 20 43 6f 6f 6b 2c 54 68 65 20 42 6c 61 
0001160 63 6b 20 43 6f 6d 70 61 6e 79 2c 34 0d 0a 62 6f 
0001200 6f 6b 31 30 2c 73 63 69 2d 66 69 2c 32 30 30 31 
0001220 2c 47 72 69 64 6c 69 6e 6b 65 64 2c 4e 65 61 6c 
0001240 20 41 73 68 65 72 2c 49 61 6e 20 43 6f 72 6d 61 
0001260 63 2c 31 0d 0a 62 6f 6f 6b 31 31 2c 73 63 69 2d 
0001300 66 69 2c 32 30 30 33 2c 54 68 65 20 4c 69 6e 65 
0001320 20 6f 66 20 50 6f 6c 69 74 79 2c 4e 65 61 6c 20 
0001340 41 73 68 65 72 2c 49 61 6e 20 43 6f 72 6d 61 63 
0001360 2c 32 0d 0a 62 6f 6f 6b 31 32 2c 73 63 69 2d 66 
0001400 69 2c 32 30 30 35 2c 42 72 61 73 73 20 4d 61 6e 
0001420 2c 4e 65 61 6c 20 41 73 68 65 72 2c 49 61 6e 20 
0001440 43 6f 72 6d 61 63 2c 33 0d 0a 
0001452 
+0

Does it contain everything? – Mysterion 2014-09-23 19:59:01

+0

Yes, it contains everything except 'book1' from the first line – 2014-09-23 20:02:46

Answers

3

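Look closely at the log... it says "?book1" was added (note the question mark in the ID). My best guess is that there are some stray characters at the start of the file that became part of the ID. Perhaps a BOM (which I know some text editors annoyingly add). http://en.wikipedia.org/wiki/Byte_order_mark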
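To illustrate why a leading BOM would make the lookup for `id:book1` come back empty, here is a minimal Python sketch. It does not reproduce Solr's own parser; it only assumes that the file is decoded as plain UTF-8, in which case the three bytes `ef bb bf` become the character U+FEFF at the front of the first field. The helper name `first_id` is made up for this example.

```python
import csv
import io

# First line of data.txt as it appears in the question's hex dump:
# the bytes EF BB BF followed by the CSV text and CR LF.
raw = (
    b"\xef\xbb\xbf"
    b"book1,fantasy,2000,A Storm of Swords,George R.R. Martin,"
    b"A Song of Ice and Fire,3\r\n"
)


def first_id(data: bytes, encoding: str) -> str:
    """Decode the bytes and return the first field of the first CSV row."""
    reader = csv.reader(io.StringIO(data.decode(encoding), newline=""))
    return next(reader)[0]


plain = first_id(raw, "utf-8")         # BOM survives as U+FEFF in the ID
stripped = first_id(raw, "utf-8-sig")  # this codec drops a leading BOM

print(repr(plain))
print(repr(stripped))
print(plain == "book1", stripped == "book1")
```

With plain UTF-8 decoding the first ID is not equal to `book1`, which would be consistent with the `?book1` entry in the log and with `numFound` being 0 for `id:book1`.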

You can verify this with something like "hexdump data.txt" or "od -tx1 data.txt". You could also try recreating the file with a different text editor.
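If hexdump or od is not at hand, the same check can be sketched in Python, together with a way to write a copy of the file without the BOM. This is only one possible approach; the function names and the output file name `data_nobom.txt` are hypothetical, and the demo works on a temporary file rather than the real data.txt.

```python
import codecs
import tempfile
from pathlib import Path


def has_utf8_bom(path: Path) -> bool:
    """Return True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    with path.open("rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def strip_utf8_bom(src: Path, dst: Path) -> bool:
    """Copy src to dst without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False otherwise.
    """
    data = src.read_bytes()
    found = data.startswith(codecs.BOM_UTF8)
    if found:
        data = data[len(codecs.BOM_UTF8):]
    dst.write_bytes(data)
    return found


if __name__ == "__main__":
    # Demo on a throwaway file that mimics the start of data.txt.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "data.txt"
        dst = Path(tmp) / "data_nobom.txt"  # hypothetical output name
        src.write_bytes(codecs.BOM_UTF8 + b"book1,fantasy,2000\r\n")
        print(has_utf8_bom(src))
        print(strip_utf8_bom(src, dst))
        print(has_utf8_bom(dst))
```

Pointing `stream.file` at the cleaned copy should then index the first row under the plain ID `book1`, assuming the BOM was the only problem.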

+0

Thank you, you are the man – 2014-09-24 15:12:37

0

It looks like the first line is being treated as a header.

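But you do have the header parameter. I would check that it is spelled correctly and perhaps move it earlier in the parameter list. Check whether solrconfig.xml overrides the value. Also, if you are using Solr 4+, try dropping the /csv bit at the end of the URL. It is a legacy address, and perhaps it has some expectations about the header line.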

+0

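Yep, it is spelled correctly, and even when I remove /csv it still does not insert "book1". I am also using 4+. What do you mean by solrconfig.xml? I mean, which parameter should I check? I have added the Solr log to my question. – 2014-09-23 20:43:23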

+0

application/csv – 2014-09-23 20:52:49

+0

Hi Alexander, you can try the same thing in your Solr and you will get the same result. – 2014-09-24 11:49:46