haskell io流内存

我尝试解析haskell中的大型日志文件。我使用System.IO.Streams，但是当我折叠输入时，它似乎吃了很多内存。这里有两个（难看的）例子：haskell io流内存

首先将1M Int加载到列表中的存储器中。

let l = foldl (\aux p -> p:aux) [] [1..1000000] 
return (sum l)

内存消耗很漂亮。 INTS吃3MB和列表需要6MB：

see memory consumption of building list of 1M Int

然后试着用字节串的流相同。我们需要一个丑陋的来回交谈，但我不认为有什么差别

let s = Streams.fromList $ map (B.pack . show) [1..1000000] 

l <- s >>= 
     Streams.map bsToInt >>= 
     Streams.fold (\aux p -> p:aux) [] 

return (sum l)

see memory consumption of building a list of Ints from a stream

为什么它需要更多的内存？如果我从文件中读取它会更糟糕。它需要90Mb

result <- withFileAsInput file load 
putStrLn $ "loaded " ++ show result 
where load is = do 
     l <- Streams.lines is >>= 
      Streams.map bsToInt >>= 
      Streams.fold (\aux p -> p:aux) [] 

     return (sum l)

我的假设是Streams.fold有一些问题。因为库的内置countInput方法不使用它。任何想法？

编辑

调查后我降低问题是：为什么会发生这种代码需要额外的50MB的？

do 
    let l = map (Builder.toLazyByteString . intDec) [1..1000000] 
    let l2 = map (fst . fromJust . B.readInt) l 
    return (foldl' (\aux p -> p:aux) [] l2)

没有转换它只需要30Mb，转换90Mb。

来源

2013-08-20 halacsy

在你的第一个例子中，foldl (\aux p -> p:aux) []是多余的。它使用与它作为参数所需的列表相同的元素构造一个列表！没有冗余，该示例相当于sum [1..1000000]或foldl (+) 0 [1..1000000]。另外，最好使用严格左边的折叠foldl'来避免堆中可还原表达式的积累。请参阅Haskell wiki上的Foldr Foldl Foldl'。

在上例中，您使用System.IO.Streams.Combinators.fold来构建从文件中读取的所有整数列表，然后尝试对列表求和，就像您在第一个示例中所做的那样。

的问题是，因为读取由IO单子规定操作，开始总结列表之前文件中的所有数据被读取文件的测序，并潜伏在堆上，还可能从原来的字符串未转换，并采取更多的内存。

解决方案是在每个新元素到达时执行内部的实际总和;这样，您不需要随时在内存中拥有完整列表，只有当前元素（在执行I/O时能够执行此操作是流式库的目标之一）。由io-streams提供的折叠是严格的，类似于foldl'。所以你也不会在堆上累积可缩减的表达式。

尝试类似System.IO.Streams.Combinators.fold (+) 0。

来源

2013-08-20 19:07:07 danidiaz

'foldl（\ aux p - > p：aux）[]'并不总是多余的。试着拿20美元foldl（\ aux p - > p：aux）[] [1 ..]'vs'20美元[1 ..]'。 – bennofs

@bennofs你是对的。处理无限列表是有区别的.http：//stackoverflow.com/questions/3082324/foldl-versus-foldr-behavior-with-infinite-lists – danidiaz

也许我只是错了，问题来自将int转换为bytestring然后回来。我比较了两种解决方案'foldl（\ aux p - > p：aux）[] $ map（bsToInt。iToBs）[1..1000000]'并且没有转换。如果我有字节串，会出现额外的40Mb。 – halacsy

-1

所以问题是ByteString的惰性创建，而不是迭代器。见 Why creating and disposing temporal ByteStrings eats up my memory in Haskell?

来源

2013-08-25 22:36:50 halacsy

haskell io流内存

回答

相关问题