2016-02-02 24 views
60

同时培养了tensorflow seq2seq模式,我看到以下消息:如何解释张量流中的Poolallocator消息?

 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 27282 get requests, put_count=9311 evicted_count=1000 eviction_rate=0.1074 and unsatisfied allocation rate=0.699032 
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:239] Raising pool_size_limit_ from 100 to 110 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 13715 get requests, put_count=14458 evicted_count=10000 eviction_rate=0.691659 and unsatisfied allocation rate=0.675684 
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:239] Raising pool_size_limit_ from 110 to 121 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 6965 get requests, put_count=6813 evicted_count=5000 eviction_rate=0.733891 and unsatisfied allocation rate=0.741421 
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:239] Raising pool_size_limit_ from 133 to 146 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 44 get requests, put_count=9058 evicted_count=9000 eviction_rate=0.993597 and unsatisfied allocation rate=0 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 46 get requests, put_count=9062 evicted_count=9000 eviction_rate=0.993158 and unsatisfied allocation rate=0 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 4 get requests, put_count=1029 evicted_count=1000 eviction_rate=0.971817 and unsatisfied allocation rate=0 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 2 get requests, put_count=1030 evicted_count=1000 eviction_rate=0.970874 and unsatisfied allocation rate=0 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 44 get requests, put_count=6074 evicted_count=6000 eviction_rate=0.987817 and unsatisfied allocation rate=0 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 12 get requests, put_count=6045 evicted_count=6000 eviction_rate=0.992556 and unsatisfied allocation rate=0 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 2 get requests, put_count=1042 evicted_count=1000 eviction_rate=0.959693 and unsatisfied allocation rate=0 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 44 get requests, put_count=6093 evicted_count=6000 eviction_rate=0.984737 and unsatisfied allocation rate=0 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 4 get requests, put_count=1069 evicted_count=1000 eviction_rate=0.935454 and unsatisfied allocation rate=0 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 17722 get requests, put_count=9036 evicted_count=1000 eviction_rate=0.110668 and unsatisfied allocation rate=0.550615 
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:239] Raising pool_size_limit_ from 792 to 871 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 6 get requests, put_count=1093 evicted_count=1000 eviction_rate=0.914913 and unsatisfied allocation rate=0 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 6 get requests, put_count=1101 evicted_count=1000 eviction_rate=0.908265 and unsatisfied allocation rate=0 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 3224 get requests, put_count=4684 evicted_count=2000 eviction_rate=0.426985 and unsatisfied allocation rate=0.200062 
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:239] Raising pool_size_limit_ from 1158 to 1273 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 17794 get requests, put_count=17842 evicted_count=9000 eviction_rate=0.504428 and unsatisfied allocation rate=0.510228 
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:239] Raising pool_size_limit_ from 1400 to 1540 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 31 get requests, put_count=1185 evicted_count=1000 eviction_rate=0.843882 and unsatisfied allocation rate=0 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 40 get requests, put_count=8209 evicted_count=8000 eviction_rate=0.97454 and unsatisfied allocation rate=0 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 0 get requests, put_count=2272 evicted_count=2000 eviction_rate=0.880282 and unsatisfied allocation rate=-nan 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 0 get requests, put_count=2362 evicted_count=2000 eviction_rate=0.84674 and unsatisfied allocation rate=-nan 
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 38 get requests, put_count=5436 evicted_count=5000 eviction_rate=0.919794 and unsatisfied allocation rate=0 

是什么意思,难道我有一些资源分配问题?我正在Titan X 3500+ CUDA上运行,12 GB GPU

回答

75

TensorFlow有多个内存分配器,用于将以不同方式使用的内存。他们的行为有一些适应性方面。

在您的特殊情况下,由于您使用的是GPU,因此有一个用于CPU内存的PoolAllocator,它已预先向GPU注册以实现快速DMA。预计将从CPU转移到GPU的张量将从该池中分配。

PoolAllocators尝试通过保留一个可以立即重用的已分配和已释放块的池来分摊调用更昂贵的底层分配器的成本。他们的默认行为是缓慢增长,直到驱逐率下降到某个常数以下。 (驱逐率是免费调用的比例,我们将未使用的块从池中返回到底层池,以便不超过大小限制。)在上面的日志消息中,您会看到显示池的“提升pool_size_limit_”行规模越来越大假设你的程序实际上有一个稳定的状态行为,并且它需要最大的区块集合,这个池将会增长以适应它,然后不再增长。它的行为方式是这样的,而不是简单地保留所有分配的块,这样只有很少或仅在程序启动时才需要的大小不太可能保留在池中。

如果内存不足,这些消息应该只是一个值得关注的原因。在这种情况下,日志消息可能有助于诊断问题。另请注意,只有在内存池增长到适当的大小后才能达到峰值执行速度。

+8

让人放心的答案,但对于没有GPU架构专业知识的人来说,这并不容易理解。不过谢谢! – stpk

+0

基于此,你可以回答这个问题:http://stackoverflow.com/questions/35171405/how-to-determine-maximum-batch-s-a-a-a-seq2seq-tensorflow-rnn-training-model – stackit

+0

As一个附录,我认为[code here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/gpu/pool_allocator.cc#L213)将有助于理解。 TF似乎会定期检查驱逐率,如果驱逐率和不满意率都超过了某个数字(那里为0.003),它会增加限制。 –