2017-10-06 24 views
3

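How do I use a string literal in a custom Golang scanner, and expand memory so the entire file is loaded?

I've been trying to figure out how to implement what I originally thought would be a simple program. I have a text file of quotes, all separated by '$$'.

I want the program to parse the quotes file and randomly select 3 quotes to print to standard output.

The file contains 1022 quotes.

When I try to split the file, I get this error: missing '

I can't seem to figure out how to write $$ as a string literal; I keep getting:
missing '

This is the custom scanner: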

onDollarSign := func(data []byte, atEOF bool) (advance int, token []byte, err error) { 
    for i := 0; i < len(data); i++ { 
     //if data[i] == "$$" {    # this is what I did originally 
     //if data[i:i+2] == "$$" { # (mismatched types []byte and string) 
     //if data[i:i+2] == `$$` { # throws (mismatched types []byte and string) 
     // below throws syntax error: unexpected $ AND missing ' 
     if data[1:i+2] == '$$' { 
      return i + 1, data[:i], nil 
     } 
    } 

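The literal works fine if I use only a single $.

For some reason only 71 quotes get loaded into the quotes slice. I'm not sure how to expand it so that all 1022 quotes are stored in memory.

I've been having a really hard time trying to figure out how to do this. This is what I have right now: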

package main 
import ( 
    "bufio" 
    "fmt" 
    "log" 
    "math/rand" 
    "os" 
    "time" 
) 

func main() { 
    rand.Seed(time.Now().UnixNano()) // Try changing this number! 
    quote_file, err := os.Open("/Users/bryan/Dropbox/quotes_file.txt") 
    if err != nil { 
     log.Fatal(err) 
    } 
    scanner := bufio.NewScanner(quote_file) 
    // define split function 
    onDollarSign := func(data []byte, atEOF bool) (advance int, token []byte, err error) { 
     for i := 0; i < len(data); i++ { 
      if data[i] == '$$' { 
       return i + 1, data[:i], nil 
      } 
     } 
     fmt.Print(data) 
     return 0, data, bufio.ErrFinalToken 
    } 
    scanner.Split(onDollarSign) 
    var quotes []string 

    // I think this will scan the file and append all the parsed quotes into quotes 
    for scanner.Scan() { 
     quotes = append(quotes, scanner.Text()) 

    } 
    if err := scanner.Err(); err != nil { 
     fmt.Fprintln(os.Stderr, "reading input:", err) 
    } 
    fmt.Print(len(quotes)) 
    fmt.Println("quote 1:", quotes[rand.Intn(len(quotes))]) 
    fmt.Println("quote 2:", quotes[rand.Intn(len(quotes))]) 
    fmt.Println("quote 3:", quotes[rand.Intn(len(quotes))]) 
} 

Answers

3

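Using a scanner is a convoluted way to go about this if you end up reading the entire file anyway. I would read the whole file and then simply split it into a list of quotes: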

package main 

import (
    "bytes" 
    "io/ioutil" 
    "log" 
    "math/rand" 
    "os" 
) 

func main() { 
    // Slurp file. 
    contents, err := ioutil.ReadFile("/Users/bryan/Dropbox/quotes_file.txt") 
    if err != nil { 
      log.Fatal(err) 
    } 

    // Split the quotes 
    separator := []byte("$$") // Convert string to []byte 
    quotes := bytes.Split(contents, separator) 

    // Select three random quotes and write them to stdout 
    for i := 0; i < 3; i++ { 
      n := rand.Intn(len(quotes)) 
      quote := quotes[n] 

      os.Stdout.Write(quote) 
      os.Stdout.Write([]byte{'\n'}) // new line, if necessary 
    } 
} 

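Using a scanner only makes sense if you select the three quotes before reading the file; then you can stop reading once you have reached the last selected quote.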

+0

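So Scanner is only worth using when you consume part of a file? In that case, how would you count the total number of quotes in the file without reading it completely? – BryanWheelock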

+1

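If you don't know the number of quotes up front, you have no choice but to read the entire file. In that case, using a scanner is more complicated than slurping the file and splitting the bytes. – Peter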

3

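In Go, single quotes ' are used for a single character (a so-called "rune"; internally it is an int32 holding a Unicode code point), while double quotes are used for strings, which can hold more than one character: "$$"

So after the first dollar sign the parser expects the closing ' of a rune literal.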

Here is a good article: https://blog.golang.org/strings

UPDATE: If you want to avoid converting all of data to a string, you can check it like this:

... 
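    onDollarSign := func(data []byte, atEOF bool) (advance int, token []byte, err error) { 
     // Stop at len(data)-1 so that data[i+1] is always in range. 
     for i := 0; i+1 < len(data); i++ { 
      if data[i] == '$' && data[i+1] == '$' { ///// <---- 
       // Skip both '$' bytes; the token is everything before them. 
       return i + 2, data[:i], nil 
      } 
     } 
     if !atEOF { 
      // No separator in the buffer yet: ask the Scanner for more data. 
      return 0, nil, nil 
     } 
     // At EOF: whatever is left is the final token. 
     return 0, data, bufio.ErrFinalToken 
    } 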
... 
+0

I changed it to double quotes: cannot convert "$$" to type byte – BryanWheelock

+0

I think this is due to the comparison 'data[i] == "$$"'. Try using 'data[i:i+1] == "$$"'. Could you load the code and some sample data to https://play.golang.org so we can see it live? –

+0

My mistake: 'data[i:i+2] == "$$"' –

1

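I rewrote your split function, basing it on the stdlib func bufio.ScanLines.

I haven't tested it thoroughly, so you should exercise it. You should also decide how to handle stray whitespace at the end of the file, such as a newline.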

func onDollarSign(data []byte, atEOF bool) (advance int, token []byte, err error) { 

    // If we are at the end of the file and there's no more data then we're done 
    if atEOF && len(data) == 0 { 
     return 0, nil, nil 
    } 

    // If we find "$$", advance past the second $ and return a token up to but 
    // not including the first $. Searching for the two-byte separator (rather 
    // than a single $) keeps a lone $ inside a quote from stalling the scan. 
    if i := bytes.Index(data, []byte("$$")); i >= 0 { 
     return i + 2, data[0:i], nil 
    } 

    // If we are at the end of the file and there IS more data return it. 
    // This check must come after the separator search; otherwise several 
    // remaining quotes would be returned as a single token. 
    if atEOF { 
     return len(data), data, nil 
    } 

    // Request more data. 
    return 0, nil, nil 
} 
1

Scanning quotes (scanQuotes) is similar to scanning lines (bufio.ScanLines). For example,

package main 

import (
    "bufio" 
    "bytes" 
    "fmt" 
    "os" 
    "strings" 
) 

func dropCRLF(data []byte) []byte { 
    if len(data) > 0 && data[len(data)-1] == '\n' { 
     data = data[0 : len(data)-1] 
     if len(data) > 0 && data[len(data)-1] == '\r' { 
      data = data[0 : len(data)-1] 
     } 
    } 
    return data 
} 

func scanQuotes(data []byte, atEOF bool) (advance int, token []byte, err error) { 
    if atEOF && len(dropCRLF(data)) == 0 { 
     return len(data), nil, nil 
    } 
    sep := []byte("$$") 
    if i := bytes.Index(data, sep); i >= 0 { 
     return i + len(sep), dropCRLF(data[0:i]), nil 
    } 
    if atEOF { 
     return len(data), dropCRLF(data), nil 
    } 
    return 0, nil, nil 
} 

func main() { 
    /* 
     quote_file, err := os.Open("/Users/bryan/Dropbox/quotes_file.txt") 
     if err != nil { 
     log.Fatal(err) 
     } 
    */ 
    quote_file := strings.NewReader(shakespeare) // test data 

    var quotes []string 
    scanner := bufio.NewScanner(quote_file) 
    scanner.Split(scanQuotes) 
    for scanner.Scan() { 
     quotes = append(quotes, scanner.Text()) 
    } 
    if err := scanner.Err(); err != nil { 
     fmt.Fprintln(os.Stderr, "reading quotes:", err) 
    } 

    fmt.Println(len(quotes)) 
    for i, quote := range quotes { 
     fmt.Println(i, quote) 
    } 
} 

var shakespeare = `To be, or not to be: that is the question$$All the world‘s a stage, and all the men and women merely players. They have their exits and their entrances; And one man in his time plays many parts.$$Romeo, Romeo! wherefore art thou Romeo?$$Now is the winter of our discontent$$Is this a dagger which I see before me, the handle toward my hand?$$Some are born great, some achieve greatness, and some have greatness thrust upon them.$$Cowards die many times before their deaths; the valiant never taste of death but once.$$Full fathom five thy father lies, of his bones are coral made. Those are pearls that were his eyes. Nothing of him that doth fade, but doth suffer a sea-change into something rich and strange.$$A man can die but once.$$How sharper than a serpent’s tooth it is to have a thankless child!` + "\n" 

Playground: https://play.golang.org/p/zMuWMxXJyQ

Output:

10 
0 To be, or not to be: that is the question 
1 All the world‘s a stage, and all the men and women merely players. They have their exits and their entrances; And one man in his time plays many parts. 
2 Romeo, Romeo! wherefore art thou Romeo? 
3 Now is the winter of our discontent 
4 Is this a dagger which I see before me, the handle toward my hand? 
5 Some are born great, some achieve greatness, and some have greatness thrust upon them. 
6 Cowards die many times before their deaths; the valiant never taste of death but once. 
7 Full fathom five thy father lies, of his bones are coral made. Those are pearls that were his eyes. Nothing of him that doth fade, but doth suffer a sea-change into something rich and strange. 
8 A man can die but once. 
9 How sharper than a serpent’s tooth it is to have a thankless child! 