2012-11-12 21 views
8

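I need to convert a string containing memory usage in bytes, such as 1048576 (which is 1M), into a human-readable version, and vice versa: bytes to human-readable and back again, with no data loss.

Note: I have already looked here: Reusable library to get human readable version of file size?

and here (even though it isn't Python): How to convert human readable memory size into bytes?

Neither has helped so far, so I looked elsewhere.

I found something that does this for me here: http://code.google.com/p/pyftpdlib/source/browse/trunk/test/bench.py?spec=svn984&r=984#137, or, as a shorter URL: http://goo.gl/zeJZl

The code: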

def bytes2human(n, format="%(value)i%(symbol)s"):
    """
    >>> bytes2human(10000)
    '9K'
    >>> bytes2human(100001221)
    '95M'
    """
    symbols = ('B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y')
    prefix = {}
    for i, s in enumerate(symbols[1:]):
        prefix[s] = 1 << (i+1)*10
    for symbol in reversed(symbols[1:]):
        if n >= prefix[symbol]:
            value = float(n)/prefix[symbol]
            return format % locals()
    return format % dict(symbol=symbols[0], value=n)

and also a function for converting the other way (from the same site):

def human2bytes(s):
    """
    >>> human2bytes('1M')
    1048576
    >>> human2bytes('1G')
    1073741824
    """
    symbols = ('B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y')
    letter = s[-1:].strip().upper()
    num = s[:-1]
    assert num.isdigit() and letter in symbols
    num = float(num)
    prefix = {symbols[0]:1}
    for i, s in enumerate(symbols[1:]):
        prefix[s] = 1 << (i+1)*10
    return int(num * prefix[letter])

This is all great, but it loses some information, for example:

>>> bytes2human(10000) 
'9K' 
>>> human2bytes('9K') 
9216 

To try to solve this, I changed the format argument of the bytes2human function

to: format="%(value).3f%(symbol)s"

which is nicer and gives me this result:

>>> bytes2human(10000) 
'9.766K' 

But when I try to convert it back with the human2bytes function:

>>> human2bytes('9.766K') 

Traceback (most recent call last): 
    File "<pyshell#366>", line 1, in <module> 
    human2bytes('9.766K') 
    File "<pyshell#359>", line 12, in human2bytes 
    assert num.isdigit() and letter in symbols 
AssertionError 

This is because of the decimal point: num.isdigit() returns False for '9.766', so the assertion fails.
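To see this in isolation, here is a small check (a sketch that is not part of the original code; the variable names are only for illustration) of how str.isdigit() treats a decimal string, while float() parses both forms:

```python
# str.isdigit() is True only when every character is a digit,
# so a decimal point makes the check fail.
whole = "9"
fractional = "9.766"

whole_ok = whole.isdigit()            # all characters are digits
fractional_ok = fractional.isdigit()  # the '.' is not a digit

# float() accepts both forms.
parsed = float(fractional)
print(whole_ok, fractional_ok, parsed)
```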

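So my question is: how can I convert a human-readable version back into the byte version without data loss?

Note: I know that 3 decimal places also loses a little data. But for the purposes of this question, let's ignore that for now; I can always change it to something larger.

Answers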

5

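So it turns out the answer was much simpler than I thought: one of the links I provided actually led to a more detailed version of the function, which is able to handle any range I give it.

But thank you very much for your help, in particular for raising the point about the precision of floating-point values: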

## {{{ http://code.activestate.com/recipes/578019/ (r15) 
#!/usr/bin/env python 

""" 
Bytes-to-human/human-to-bytes converter. 
Based on: http://goo.gl/kTQMs 
Working with Python 2.x and 3.x. 

Author: Giampaolo Rodola' <g.rodola [AT] gmail [DOT] com> 
License: MIT 
""" 

# see: http://goo.gl/kTQMs 
SYMBOLS = {
    'customary': ('B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y'),
    'customary_ext': ('byte', 'kilo', 'mega', 'giga', 'tera', 'peta', 'exa',
                      'zetta', 'iotta'),
    'iec': ('Bi', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi', 'Yi'),
    'iec_ext': ('byte', 'kibi', 'mebi', 'gibi', 'tebi', 'pebi', 'exbi',
                'zebi', 'yobi'),
}

def bytes2human(n, format='%(value).1f %(symbol)s', symbols='customary'): 
    """ 
    Convert n bytes into a human readable string based on format. 
    symbols can be either "customary", "customary_ext", "iec" or "iec_ext", 
    see: http://goo.gl/kTQMs 

     >>> bytes2human(0) 
     '0.0 B' 
     >>> bytes2human(0.9) 
     '0.0 B' 
     >>> bytes2human(1) 
     '1.0 B' 
     >>> bytes2human(1.9) 
     '1.0 B' 
     >>> bytes2human(1024) 
     '1.0 K' 
     >>> bytes2human(1048576) 
     '1.0 M' 
     >>> bytes2human(1099511627776127398123789121) 
     '909.5 Y' 

     >>> bytes2human(9856, symbols="customary") 
     '9.6 K' 
     >>> bytes2human(9856, symbols="customary_ext") 
     '9.6 kilo' 
     >>> bytes2human(9856, symbols="iec") 
     '9.6 Ki' 
     >>> bytes2human(9856, symbols="iec_ext") 
     '9.6 kibi' 

     >>> bytes2human(10000, "%(value).1f %(symbol)s/sec") 
     '9.8 K/sec' 

     >>> # precision can be adjusted by playing with %f operator 
     >>> bytes2human(10000, format="%(value).5f %(symbol)s") 
     '9.76562 K' 
    """ 
    n = int(n)
    if n < 0:
        raise ValueError("n < 0")
    symbols = SYMBOLS[symbols]
    prefix = {}
    for i, s in enumerate(symbols[1:]):
        prefix[s] = 1 << (i+1)*10
    for symbol in reversed(symbols[1:]):
        if n >= prefix[symbol]:
            value = float(n)/prefix[symbol]
            return format % locals()
    return format % dict(symbol=symbols[0], value=n)

def human2bytes(s): 
    """ 
    Attempts to guess the string format based on default symbols 
    set and return the corresponding bytes as an integer. 
    When unable to recognize the format ValueError is raised. 

     >>> human2bytes('0 B') 
     0 
     >>> human2bytes('1 K') 
     1024 
     >>> human2bytes('1 M') 
     1048576 
     >>> human2bytes('1 Gi') 
     1073741824 
     >>> human2bytes('1 tera') 
     1099511627776 

     >>> human2bytes('0.5kilo') 
     512 
     >>> human2bytes('0.1 byte') 
     0 
     >>> human2bytes('1 k') # k is an alias for K 
     1024 
     >>> human2bytes('12 foo') 
     Traceback (most recent call last): 
      ... 
     ValueError: can't interpret '12 foo' 
    """ 
    init = s
    num = ""
    while s and s[0:1].isdigit() or s[0:1] == '.':
        num += s[0]
        s = s[1:]
    num = float(num)
    letter = s.strip()
    for name, sset in SYMBOLS.items():
        if letter in sset:
            break
    else:
        if letter == 'k':
            # treat 'k' as an alias for 'K' as per: http://goo.gl/kTQMs
            sset = SYMBOLS['customary']
            letter = letter.upper()
        else:
            raise ValueError("can't interpret %r" % init)
    prefix = {sset[0]:1}
    for i, s in enumerate(sset[1:]):
        prefix[s] = 1 << (i+1)*10
    return int(num * prefix[letter])


if __name__ == "__main__": 
    import doctest 
    doctest.testmod() 
## end of http://code.activestate.com/recipes/578019/ }}} 
4

You almost answered your own question in your last note.

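In human2bytes(s), the input string (9.766K, for example) is split into two parts: the number and the prefix. After the assertion (which, as you correctly observed, is what throws the error), the number is multiplied by the value the prefix represents, so 9.766 * 1024 = 10000.384, which int() truncates to 10000. The only way to "avoid" data loss is to accept a sufficiently precise floating-point value as input.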
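As a rough illustration of what "sufficiently precise" can mean, the sketch below compares a fixed three-decimal format with repr(), which keeps every significant digit of the float. The helpers to_human and to_bytes are hypothetical names introduced only for this example. It assumes IEEE 754 doubles and byte counts below 2**53, where dividing by a power of two is exact.

```python
SYMBOLS = ('B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y')

def to_human(n, fmt):
    """Format n bytes with the largest binary unit not exceeding n."""
    for i in reversed(range(1, len(SYMBOLS))):
        unit = 1 << (i * 10)
        if n >= unit:
            return fmt % (float(n) / unit) + SYMBOLS[i]
    return "%d%s" % (n, SYMBOLS[0])

def to_bytes(s):
    """Parse strings such as '95.369M' back into a byte count."""
    unit = 1 << (SYMBOLS.index(s[-1]) * 10)
    return int(float(s[:-1]) * unit)

n = 100001221
lossy = to_bytes(to_human(n, "%.3f"))  # three decimals discard information
exact = to_bytes(to_human(n, "%r"))    # repr() keeps every significant digit
print(lossy == n, exact == n)
```

The trade-off is readability: the repr() form can be long, so it suits storage or logging better than display.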

To make human2bytes accept floating-point input, you can either remove num.isdigit() from the assertion and wrap the type conversion num = float(num) in a try-except, or check it by some other means.
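A minimal sketch of the first option, assuming the single-letter symbols from the question's code. The name human2bytes_float is made up for this example, and raising ValueError in place of the assertion is a choice made here, not something the original code does:

```python
SYMBOLS = ('B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y')

def human2bytes_float(s):
    """Like the question's human2bytes, but accepts '9.766K' as well as '9K'."""
    letter = s[-1:].strip().upper()
    if letter not in SYMBOLS:
        raise ValueError("can't interpret %r" % s)
    try:
        # float() handles both '9' and '9.766'
        num = float(s[:-1])
    except ValueError:
        raise ValueError("can't interpret %r" % s)
    # B -> 1, K -> 2**10, M -> 2**20, ...
    multiplier = 1 << (SYMBOLS.index(letter) * 10)
    return int(num * multiplier)

print(human2bytes_float('9.766K'))
```

Note that float() is more permissive than isdigit(): it also accepts signs and exponent notation, so stricter validation may be needed depending on where the input comes from.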

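+1 for copying the code here for posterity. You basically can't perform a symmetric translation between the two: for the sake of brevity, the human-readable form truncates the value. – synthesizerpatel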