2014-09-20 179 views
2

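Reading a large file line by line

I am trying to write a reader for a large file in Clojure, based on iteration. But how can I return the file one line at a time? I would like to be able to do something like this: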

(println (do_something (readFile (:file opts)))) ; process and print the first line
(println (do_something (readFile (:file opts)))) ; process and print the second line

Code:

(ns testapp.core
  (:gen-class)
  (:require [clojure.tools.cli :refer [cli]]
            [clojure.java.io]))

(defn readFile [file]
  ; Iterate over opened file (read line by line)
  (with-open [rdr (clojure.java.io/reader file)]
    (let [seq (line-seq rdr)]
      ; how to return only one line here, and then, when needed, take the next line?
      )))

(defn -main [& args]
  ; Main function for project
  (let [[opts args banner]
        (cli args
             ["-h" "--help" "Print this help" :default false :flag true]
             ["-f" "--file" "REQUIRED: File with data"]
             ["-c" "--clusters" "Count of clusters" :default 3]
             ["-g" "--hamming" "Use Hamming algorithm"]
             ["-e" "--evklid" "Use Evklid algorithm"])]
    ; Print help when requested
    (when (:help opts)
      (println banner)
      (System/exit 0))
    ; Or process args and start work
    (if (and (:file opts) (or (:hamming opts) (:evklid opts)))
      (do
        ; Use Hamming algorithm
        (if (:hamming opts)
          (do
            (println (readFile (:file opts)))
            (println (readFile (:file opts))))
          ;(count (readFile (:file opts)))
          ; Use Evklid algorithm
          (println "Evklid")))
      (println "Please, type path for file and algorithm!"))))
+0

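What do you mean by "return line by line"? You could store your lines in an atom, but then reading line by line becomes pointless, because the atom holds everything in memory. Make your readFile accept a processing function and print the results. – coredump 2014-09-20 15:23:01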

Answers

3

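Maybe I do not correctly understand what you mean by "return line by line", but I would suggest writing a function that accepts a file and a processing function, then prints the result of the processing function for every line of your large file. Or, more generally, let it accept both a processing function and an output function (println by default), so that the result can be not just printed but also sent over the network, saved somewhere, handed to another thread, and so on: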

(defn process-file-by-lines
  "Process file reading it line-by-line"
  ([file]
   (process-file-by-lines file identity))
  ([file process-fn]
   (process-file-by-lines file process-fn println))
  ([file process-fn output-fn]
   (with-open [rdr (clojure.java.io/reader file)]
     (doseq [line (line-seq rdr)]
       (output-fn
         (process-fn line))))))

So:

(process-file-by-lines "/tmp/tmp.txt") ;; Will just print the file line by line
(process-file-by-lines "/tmp/tmp.txt"
                       reverse) ;; Will print each line reversed (as a sequence of characters)
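
The three-argument form is where the output function comes into play. The following is only a minimal sketch of that idea: the temporary file, the `collected` atom, and the choice of `clojure.string/upper-case` as the processing function are assumptions made for illustration and are not part of the original answer.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Same function as in the answer, repeated so the sketch is self-contained.
(defn process-file-by-lines
  ([file] (process-file-by-lines file identity))
  ([file process-fn] (process-file-by-lines file process-fn println))
  ([file process-fn output-fn]
   (with-open [rdr (io/reader file)]
     (doseq [line (line-seq rdr)]
       (output-fn (process-fn line))))))

;; Hypothetical sample file, created only for this demonstration.
(def sample-file (java.io.File/createTempFile "lines" ".txt"))
(spit sample-file "alpha\nbeta\ngamma\n")

;; Custom output function: append each processed line to an atom
;; instead of printing it.
(def collected (atom []))
(process-file-by-lines sample-file str/upper-case #(swap! collected conj %))

@collected ;; => ["ALPHA" "BETA" "GAMMA"]
```

Collecting into an atom keeps every result in memory, so it only suits small outputs. For a genuinely large file, an output function that writes to a stream or another file would probably be the better fit.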
4

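You may also try to read lazily from the reader itself, which is not the same as the lazy list of strings returned by line-seq. The details are discussed in this answer to a very similar question, but the gist of it is here: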

(defn lazy-file-lines [file]
  (letfn [(helper [rdr]
            (lazy-seq
              (if-let [line (.readLine rdr)]
                (cons line (helper rdr))
                (do (.close rdr) nil))))]
    (helper (clojure.java.io/reader file))))

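Then you can map over the lines, and they will be read only as far as necessary. As discussed in more detail in the linked answer, the downside is that if you do not read all the way to the end of the file, (.close rdr) will never run, which may cause resource problems.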
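
To make the trade-off concrete, here is a small sketch of how that function might be used. The temporary file and the names `first-two` and `line-count` are assumptions introduced for this example only.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; lazy-file-lines as in the answer, repeated so the sketch is self-contained.
(defn lazy-file-lines [file]
  (letfn [(helper [rdr]
            (lazy-seq
              (if-let [line (.readLine rdr)]
                (cons line (helper rdr))
                (do (.close rdr) nil))))]
    (helper (io/reader file))))

;; Hypothetical sample file, created only for this demonstration.
(def sample-file (java.io.File/createTempFile "lazy-lines" ".txt"))
(spit sample-file "one\ntwo\nthree\nfour\n")

;; Only the lines that `take` demands are requested from the sequence.
;; The reader is left open here, because the end of the file is never reached.
(def first-two
  (doall (take 2 (map str/upper-case (lazy-file-lines sample-file)))))
;; first-two => ("ONE" "TWO")

;; Consuming the whole sequence reaches the end of the file,
;; which is the only point at which the reader gets closed.
(def line-count (count (lazy-file-lines sample-file)))
;; line-count => 4
```

The first call illustrates the resource problem mentioned above: nothing closes that reader until it is garbage collected.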

+1

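Even if you want to close the file, you cannot, because the descriptor lives in a local scope. If you really need a lazy seq, it is probably better to open and close the file explicitly. – coredump 2014-09-20 16:52:47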
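
One way to read this suggestion is to let the caller work on the lazy sequence inside a scope that owns the reader. This is a sketch under that assumption only: `with-file-lines` is a hypothetical helper name and the temporary file exists purely for the demonstration.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: opens the file, passes the lazy line sequence to f,
;; and closes the reader when f returns, even if f stopped reading early.
(defn with-file-lines [file f]
  (with-open [rdr (io/reader file)]
    (f (line-seq rdr))))

;; Hypothetical sample file, created only for this demonstration.
(def sample-file (java.io.File/createTempFile "scoped-lines" ".txt"))
(spit sample-file "one\ntwo\nthree\nfour\n")

;; doall forces the two lines while the reader is still open. Returning an
;; unrealized lazy sequence from f would fail later, once the reader is closed.
(def first-two (with-file-lines sample-file #(doall (take 2 %))))
;; first-two => ("one" "two")
```

The cost of this design is that all work on the lines has to happen inside the function passed in; in exchange the reader is always closed.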

2

Try doseq:

(defn readFile [file]
  (with-open [rdr (clojure.java.io/reader file)]
    (doseq [line (line-seq rdr)]
      (println line))))