2017-04-04 68 views
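Python: converting from base64 to binary

I have a question about converting a base64-encoded string to binary. I am collecting the Fingerprint2D property from the link below: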

url = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/108770/property/Fingerprint2D/xml" 

Fingerprint2D=AAADccB6OAAAAAAAAAAAAAAAAAAAAAAAAAA8WIEAAAAAAACxAAAAHgAACAAADAzBmAQwzoMABgCI AiTSSACCCAAhIAAAiAEMTMgMJibMsZuGeijn4BnI+YeQ0OMOKAACAgAKAABQAAQEABQAAAAAAAAA AA== 

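The description in the PubChem database says that this is a 115-byte string, so it should be 920 bits when converted to binary. I tried to convert it to binary with the following code: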

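import requests
import xml.etree.ElementTree as ET

response = requests.get(url)
tree = ET.fromstring(response.text)

for el in tree[0]:
    if "Fingerprint2D" in el.tag:
        # Attempt 1: parse the element text as a hexadecimal number
        fpp = bin(int(el.text, 16))
        print(len(fpp))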

With the code above I get the following error: "ValueError: invalid literal for int() with base 16".

With the code below, the length of fpp (the binary string) comes out as 1278, which is not what I expected.

from binascii import hexlify

response = requests.get(url)
tree = ET.fromstring(response.text)

for el in tree[0]:
    if "Fingerprint2D" in el.tag:
        # Attempt 2: hexlify the text first, then parse it as hexadecimal
        fpp = bin(int(hexlify(el.text), 16))
        print(len(fpp))

Thanks a lot!

Answer

To decode base64, you need to pass a bytes object to the base64.decodebytes function:

import base64 

t = "AAADccB6OAAAAAAAAAAAAAAAAAAAAAAAAAA8WIEAAAAAAACxAAAAHgAACAAADAzBmAQwzoMABgCI AiTSSACCCAAhIAAAiAEMTMgMJibMsZuGeijn4BnI+YeQ0OMOKAACAgAKAABQAAQEABQAAAAAAAAA AA==".encode("ascii") 

decoded = base64.decodebytes(t) 

print(decoded) 
print(len(decoded)*8) 

I get the following:

b'\x00\x00\x03q\xc0z8\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00<X\x81\x00\x00\x00\x00\x00\x00\xb1\x00\x00\x00\x1e\x00\x00\x08\x00\x00\x0c\x0c\xc1\x98\x040\xce\x83\x00\x06\x00\x88\x02$\xd2H\x00\x82\x08\x00! \x00\x00\x88\x01\x0cL\xc8\x0c&&\xcc\xb1\x9b\x86z(\xe7\xe0\x19\xc8\xf9\x87\x90\xd0\xe3\x0e(\x00\x02\x02\x00\n\x00\x00P\x00\x04\x04\x00\x14\x00\x00\x00\x00\x00\x00\x00\x00' 
920 

So that is 920 bits, as expected.
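As a cross-check of the 920-bit figure, here is a minimal sketch that strips the whitespace from the fingerprint string, decodes it strictly, and compares the decoded size with the size predicted from the base64 length. The variable names (raw, compact, expected_bytes) are only illustrative, and the string is the one quoted in the question, not freshly downloaded data.

```python
import base64

# Fingerprint string copied from the question; the embedded spaces are
# line-wrap residue and are not part of the base64 data.
raw = "AAADccB6OAAAAAAAAAAAAAAAAAAAAAAAAAA8WIEAAAAAAACxAAAAHgAACAAADAzBmAQwzoMABgCI AiTSSACCCAAhIAAAiAEMTMgMJibMsZuGeijn4BnI+YeQ0OMOKAACAgAKAABQAAQEABQAAAAAAAAA AA=="

# Remove whitespace explicitly instead of relying on the decoder to skip it.
compact = "".join(raw.split())

# validate=True rejects any character outside the base64 alphabet.
decoded = base64.b64decode(compact, validate=True)

# Every 4 base64 characters carry 3 bytes; each '=' pad removes one byte.
expected_bytes = len(compact) // 4 * 3 - compact.count("=")

print(len(decoded), expected_bytes, len(decoded) * 8)
```

This also hints at why int(el.text, 16) fails in the question: base64 text uses letters beyond A-F plus characters such as "+" and "=", so it is not a valid hexadecimal literal.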

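To get the data as a string of binary digits, iterate over the bytes, convert each one with format, zero-padded to 8 digits (bin adds a 0b prefix and drops leading zeros, so it is not suitable here), and join the pieces together: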

print("".join(["{:08b}".format(x) for x in decoded])) 

Result:

00000000000000000000001101110001110000000111101000111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011110001011000100000010000000000000000000000000000000000000000000000001011000100000000000000000000000000011110000000000000000000001000000000000000000000001100000011001100000110011000000001000011000011001110100000110000000000000110000000001000100000000010001001001101001001001000000000001000001000001000000000000010000100100000000000000000000010001000000000010000110001001100110010000000110000100110001001101100110010110001100110111000011001111010001010001110011111100000000110011100100011111001100001111001000011010000111000110000111000101000000000000000001000000010000000000000101000000000000000000101000000000000000001000000010000000000000101000000000000000000000000000000000000000000000000000000000000000000 

(It is 920 characters long, as expected.)
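An alternative, sketched here as an illustration rather than as part of the original answer, is to convert the whole byte string to one integer with int.from_bytes and format it with an explicit width. The sample below uses only the first four bytes of the decoded output shown above, and the names sample, per_byte, whole and naive are made up for the example. It also shows how a conversion that goes through bin() can come out shorter than expected: leading zero bits are dropped.

```python
# First four bytes of the decoded fingerprint shown above.
sample = b"\x00\x00\x03q"
n_bits = len(sample) * 8

# Per-byte formatting, as in the answer: every byte becomes exactly 8 digits.
per_byte = "".join("{:08b}".format(b) for b in sample)

# Whole-value formatting: pad the single big integer to the full bit width.
whole = format(int.from_bytes(sample, byteorder="big"), "0{}b".format(n_bits))

# bin() drops the leading zero bits and adds a "0b" prefix.
naive = bin(int.from_bytes(sample, byteorder="big"))

print(per_byte)  # 00000000000000000000001101110001
print(whole)     # 00000000000000000000001101110001
print(naive)     # 0b1101110001
```

Both padded forms keep every leading zero, so applying either to the full 115-byte value gives a string of exactly 920 characters.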

For the binary output I was expecting a string like "0001001..". Or do I have to perform a binary conversion on the "decoded" output?

Understood. See my edit.

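Thanks! However, there is one more thing I don't understand: the resulting binary string has a length of 423. Can we say that we have correctly decoded the expected 920 bits? Is there a way to retrieve a binary string of length 920?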