2014-02-23 45 views

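Txt to JSON: split by paragraph

I have a large pile of documents in plain-text format that I want to put into JSON. Here is my parser. It splits the text into a new JSON string after every line, using "\n" as the delimiter; I want to change it so that it cuts at every paragraph instead.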

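package main

import (
    "bufio"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "os"
    "strings"
)

func main() {
    // Open the input and make sure it is closed when main returns.
    in, err := os.Open("strangecountess.txt")
    if err != nil {
        log.Println("Error :", err)
        return
    }
    defer in.Close()

    myBigThing := make(map[string]map[string]string)
    var currentPage map[string]string
    pageNum := 0

    r := bufio.NewReader(in)
    for {
        line, err := r.ReadString('\n')
        if err != nil && err != io.EOF {
            log.Println("Error in parsing :", err)
            break
        }
        atEOF := err == io.EOF
        // ReadString keeps the trailing delimiter, so strip it; otherwise
        // a blank line is "\n" and never compares equal to "".
        line = strings.TrimRight(line, "\r\n")
        if line == "" {
            // A blank line ends the current page.
            currentPage = nil
        } else {
            if currentPage == nil {
                // The first non-blank line after a break starts a new page.
                currentPage = make(map[string]string)
                myBigThing[fmt.Sprintf("page%d", pageNum)] = currentPage
                pageNum++
            }
            // Only lines of the form "key:value" are stored.
            tokens := strings.Split(line, ":")
            if len(tokens) == 2 {
                currentPage[tokens[0]] = tokens[1]
            }
        }
        if atEOF {
            // The last line may lack a trailing newline; it was handled above.
            break
        }
    }

    out, err := os.Create("strangecountess.json")
    if err != nil {
        log.Println("Error :", err)
        return
    }
    defer out.Close()

    bout, err := json.Marshal(myBigThing)
    if err != nil {
        log.Println("Error :", err)
        return
    }
    if _, err := out.Write(bout); err != nil {
        log.Println("Error :", err)
    }
}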

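I am open to switching languages for this particular task; if there is some awesome library out there that does this, I'm all ears. Staying with Go is preferred, though :).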

Which language? – jeremyjjbrown

How do you identify a paragraph? –

@jeremyjjbrown This is in Go. I didn't want to specify a language because I'm open to anything that will get the job done :) – collinglass

Answers

If you are open to other tools, jq can probably do what you need.

Suppose the file data contains

When in the course of human events it becomes necessary for one people to dissolve the political bands which have connected them with another and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation. 
We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness. That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed. 

The command

$ jq -MR '.' data 

produces a sequence of JSON strings, one per input line:

"When in the course of human events it becomes necessary for one people to dissolve the political bands which have connected them with another and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation." 
"We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness. That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed." 

The command

$ jq -MR -n '[inputs]' data 

collects the lines into an array:

[ 
    "When in the course of human events it becomes necessary for one people to dissolve the political bands which have connected them with another and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.", 
    "We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness. That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed." 
] 

Once the text is in JSON form, it is easy to add more processing. For example, this filter

$ jq -MR -n '[inputs] | map("\(.[:30])... \(length) characters")' data 

summarizes each line:

[ 
    "When in the course of human ev... 404 characters", 
    "We hold these truths to be sel... 337 characters" 
] 

And this command

$ jq -MR -n 'reduce inputs as $i ({}; .["\(.|length)"]=$i)' data 

collects the lines into an object keyed by line index:

{ 
    "0": "When in the course of human events it becomes necessary for one people to dissolve the political bands which have connected them with another and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.", 
    "1": "We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness. That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed." 
} 

There is also an online version at https://jqplay.org/.