2013-06-04

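I've been trying to write a Python program that reads cell values from an Excel file, translates the cell contents from Estonian into English or Russian, and joins them into a single string. The result is written to a text file. Estonian -> English seems to work fine, but with Russian, errors start to appear: UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 8: ordinal not in range(128)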

Traceback (most recent call last):
  File "erid.py", line 140, in <module>
    f.write(aNimed(row_index, 1, 'ru')+ '\n')
  File "erid.py", line 120, in aNimed
    nimi += komponendid[i].strip()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 8: ordinal not in range(128)

Traceback (most recent call last):
  File "erid.py", line 140, in <module>
    f.write(aNimed(row_index, 1, 'ru')+ '\n')
  File "erid.py", line 120, in aNimed
    nimi = nimi + komponendid[i][1:].strip()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 9: ordinal not in range(128)

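The first error is triggered by the word "antibakteriaalne" and the second by "+hoobkäepide". I suspect that in the second case the "+" sign, rather than the "ä", is what causes the failure. Some Russian characters seem to be a problem while others are not. I'm out of ideas.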
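One way to narrow down which character is involved: the bytes named in the two errors, 0xd0 and 0xd1, are the first bytes UTF-8 uses for Cyrillic letters, whereas "ä" encodes as 0xc3 0xa4 and "+" is plain ASCII (0x2b). The short Python 3 sketch below prints the UTF-8 bytes of a few sample characters (it prints d0b0 for "а", d18f for "я", c3a4 for "ä" and 2b for "+"). This suggests, though it does not prove, that the string being decoded already contains UTF-8 encoded Russian text, for example a translation taken from the dictionary module, rather than failing on the "+" or the "ä". The helper name `utf8_bytes` is made up for this illustration.

```python
def utf8_bytes(ch):
    """Return the UTF-8 encoding of a character as a hex string."""
    return ch.encode("utf-8").hex()

# Cyrillic "а" (U+0430), Cyrillic "я" (U+044F), Estonian "ä" (U+00E4), plus sign.
# Code points are printed instead of the characters so the output stays ASCII.
for ch in ["\u0430", "\u044f", "\u00e4", "+"]:
    print("U+%04X" % ord(ch), utf8_bytes(ch))
```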

Python code:

# -*- coding: utf-8 -*- 
from xlrd import open_workbook, cellname, XL_CELL_TEXT 
from xlwt import Workbook 
from xlutils.copy import copy 
import sonaraamatud #dictionary 

# open file with data 
book = open_workbook('Datafile.xls') 
# Safe write unicode to ignore unicode errors 
# http://developer.plone.org/troubleshooting/unicode.html 
def safe_write(failName, word):
    if type(word) == str:
        failName.write(word + '\n')
    else:
        failName.write(word.decode("utf-8") + '\n')

def safeDecode(word):
    if type(word) == str:
        word = unicode(word, 'utf-8', errors='ignore')
        return word
    else:
        word = unicode(word)
        return word

# Translate surface coating name 
def translatePind(langa, langb, word):
    answ = ""
    if (sonaraamatud.kasOlemas3(langa, sonaraamatud.pinnaKatted) == True):
        answ = langa
        return answ
    # if langa is Estonian
    if (langa == 'et'):
        # if langb is English
        if (langb == 'en'):
            try:
                answ = sonaraamatud.pinnakattedEstEng[word]
            except KeyError:
                answ = word
        # if langb is Russian
        elif (langb == "ru"):
            try:
                answ = sonaraamatud.pinnakattedEngRus[sonaraamatud.pinnakattedEstEng[word]]
            except KeyError:
                answ = word

    # if langa is English
    elif (langa == "en"):
        # if langb is Estonian
        if (langb == "et"):
            try:
                answ = sonaraamatud.pinnakattedEngEst[word]
            except KeyError:
                answ = word
        # if langb is Russian
        elif (langb == "ru"):
            try:
                answ = sonaraamatud.pinnakattedEngRus[word]
            except KeyError:
                answ = "KeyError"
    return answ

def aNimed(row, sheetNr, lang):
    # Function combines name
    # name: aNimed
    # @param: row, sheet number
    # @return: Product name
    # select sheet (worksheet)
    sheet = book.sheet_by_index(sheetNr)  # sheetNr
    komponendid = []
    nimi = ""
    if (lang == 'et'):
        komponendid.append(str(sheet.cell(row, 5).value))  # Model
        komponendid.append('(' + sheet.cell(row, 6).value + ')')  # surface
        komponendid.append(sheet.cell(row, 7).value)  # extras
    elif (lang == 'en'):
        komponendid.append(str(sheet.cell(row, 5).value))  # Model
        komponendid.append('(' + translatePind('et', 'en', sheet.cell(row, 6).value) + ')')
        komponendid.append(sheet.cell(row, 7).value)  # extras
    elif (lang == 'ru'):
        """
        Alternative method trying to use safeDecode, NOT working!
        komponendid.append(str(safeDecode(sheet.cell(row, 5).value)))  # Model
        surface = safeDecode(sheet.cell(row, 6).value)
        komponendid.append('(' + translatePind('et', 'ru', str(surface)) + ')')
        komponendid.append(safeDecode(sheet.cell(row, 7).value))  # extras
        """
        komponendid.append(str(sheet.cell(row, 5).value))  # Model
        komponendid.append('(' + translatePind('et', 'ru', sheet.cell(row, 6).value) + ')')
        komponendid.append(sheet.cell(row, 7).value)  # extras
    pikkus = len(komponendid)

    print(komponendid)
    for i in range(0, pikkus):
        if (komponendid[i] == "" or komponendid[i] == "()" or komponendid[i] == " "):
            i += 1
            continue
        elif (i == pikkus-1 and komponendid[i][0] != " "):
            print("1" + komponendid[i])
            nimi += komponendid[i].strip()
            i += 1
        elif (komponendid[i][0] == " " and komponendid[i][1] == "+"):
            #print("2" + komponendid[i])
            nimi = nimi + komponendid[i][1:].strip()
            i += 1
        else:
            #print("4" + komponendid[i])
            nimi = nimi + komponendid[i].strip() + " "
            i += 1
    return nimi

# Use: aNimed(row, sheetNr, lang)
sheet = book.sheet_by_index(7)
f = open('data.txt', 'w')
for row_index in range(1, sheet.nrows):
    #print(aNimed(row_index, 5, 'en'))
    f.write(aNimed(row_index, 1, 'ru')+ '\n')
    #safe_write(f, aNimed(row_index, 1, 'ru'))
f.close()
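A related detail at the output end, assuming the product names end up as text (unicode) strings: the script opens data.txt with a bare `open('data.txt', 'w')`. In Python 2 such a file expects byte strings, so writing a unicode name that contains Cyrillic would most likely fail with a UnicodeEncodeError (the ASCII codec again, this time encoding). A minimal Python 3 sketch that gives the output file an explicit encoding; the `write_names` helper and the sample name are hypothetical.

```python
import io
import os
import tempfile

def write_names(path, names):
    """Write one name per line; the file object encodes the text as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as f:
        for name in names:
            f.write(name + "\n")

# Hypothetical product name mixing a number, Russian and Estonian text.
names = ["12345 (\u043f\u043e\u043a\u0440\u044b\u0442\u0438\u0435) +hoobk\u00e4epide"]

path = os.path.join(tempfile.mkdtemp(), "data.txt")
write_names(path, names)

# Read the file back with the same encoding to check the text survived.
with io.open(path, encoding="utf-8") as f:
    round_trip = f.read()
print(round_trip == names[0] + "\n")  # True if nothing was lost or mangled
```

In Python 2 the same `io.open(path, "w", encoding="utf-8")` call exists, but every string written to it must then be `unicode`, which is another reason to keep all the pieces as text.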

Does it work if you use Unicode strings instead of byte strings (i.e. `u'\n'` etc.)? –


I don't think I understand how I'm supposed to use them... – kyng


Change every string from `''` or `""` to `u''` and `u""` so that you are using Unicode strings, and see whether any other errors come up. – User
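A minimal sketch of why this suggestion may help, under an assumption: the sonaraamatud module is not shown, but if its Russian entries are plain `'...'` literals in a UTF-8 source file, they are byte strings, while xlrd returns text cells as unicode. Python 2 then has to decode the bytes implicitly with the ASCII codec when the two are concatenated (as in `nimi += ...`), which would produce exactly "can't decode byte 0xd0". The sketch below is Python 3, where mixing bytes and str raises a TypeError instead of decoding silently; the helper `to_text` and the sample values are hypothetical. It decodes any byte strings once, up front, so every piece is text before joining.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a str, decoding byte strings exactly once."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)  # str passes through unchanged; numbers become str()

# Hypothetical inputs: a model number, a dictionary entry stored as UTF-8
# bytes ("покрытие", "coating"), and a text cell as xlrd would return it.
surface_ru = "\u043f\u043e\u043a\u0440\u044b\u0442\u0438\u0435".encode("utf-8")
parts = [
    to_text(12345),
    "(" + to_text(surface_ru) + ")",  # "(" + surface_ru alone raises TypeError
    to_text("antibakteriaalne"),
]

# Skip empty pieces, then join with single spaces.
name = " ".join(p.strip() for p in parts if p.strip() not in ("", "()"))
print(name == "12345 (\u043f\u043e\u043a\u0440\u044b\u0442\u0438\u0435) antibakteriaalne")
```

In Python 2 terms, the equivalent is writing the dictionary entries as `u'...'` literals (or calling `.decode('utf-8')` on them) so that no byte string ever meets a unicode string.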

Answer


This isn't particularly elegant, but I think I have a workaround: read from a CSV file instead of the Excel file. For example,

```python
import csv
data = []
opened_file = open(csv_filename, 'rb')
reader = csv.reader(opened_file)
for row in reader:
    data.append(row)
opened_file.close()
```
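The snippet above is Python 2, where the csv module works on byte strings (hence `'rb'`). In Python 3 the csv module reads text, so the same step can open the file with an explicit encoding and get `str` cells directly. A minimal sketch, assuming the CSV was saved as UTF-8; the `read_rows` helper, the file name and the sample row are made up so the example runs on its own.

```python
import csv
import io
import os
import tempfile

def read_rows(csv_filename, encoding="utf-8"):
    """Read every row of a CSV file into a list of lists of str."""
    # newline="" is what the csv docs recommend for files given to csv.reader.
    with io.open(csv_filename, newline="", encoding=encoding) as opened_file:
        return [row for row in csv.reader(opened_file)]

# Hypothetical sample file so the sketch is self-contained.
csv_filename = os.path.join(tempfile.mkdtemp(), "Datafile.csv")
with io.open(csv_filename, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows([["12345", "lakitud", "+hoobk\u00e4epide"]])

data = read_rows(csv_filename)
print(data == [["12345", "lakitud", "+hoobk\u00e4epide"]])
```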

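Now you have your data saved as a list. Do the translation and save the result as a different list, translated_data. Now, and this is the key part, you can open a new workbook: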

```python
from xlwt import Workbook
book = Workbook(encoding="utf8")
foo = book.add_sheet("foo")
for row_num in range(len(translated_data)):
    for col_num in range(len(translated_data[row_num])):
        foo.write(row_num, col_num, translated_data[row_num][col_num])
book.save("filename.xls")
```

The key point is that with Workbook() you can specify the encoding, whereas with open_workbook() it looks like you're stuck with ASCII.
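The answer leaves the translation step ("do the translation and save the result as a different list") to the reader. A minimal Python 3 sketch of that step, roughly mirroring the et -> en -> ru chained lookup in translatePind; the dictionaries `EST_ENG` and `ENG_RUS` and their entries are hypothetical stand-ins for the sonaraamatud module, and every value is kept as `str` so nothing needs implicit decoding later.

```python
# Hypothetical stand-ins for sonaraamatud.pinnakattedEstEng / pinnakattedEngRus.
EST_ENG = {"lakitud": "varnished"}
ENG_RUS = {"varnished": "\u043b\u0430\u043a\u0438\u0440\u043e\u0432\u0430\u043d\u043d\u044b\u0439"}

def translate_cell(word, target):
    """Translate an Estonian word to 'en' or 'ru', falling back to the input."""
    english = EST_ENG.get(word)
    if english is None:
        return word
    if target == "en":
        return english
    if target == "ru":
        return ENG_RUS.get(english, word)
    return word

def translate_rows(data, column, target):
    """Return a new list of rows with one column translated; data is untouched."""
    translated_data = []
    for row in data:
        new_row = list(row)
        new_row[column] = translate_cell(row[column], target)
        translated_data.append(new_row)
    return translated_data

data = [["12345", "lakitud", "+hoobk\u00e4epide"], ["67890", "tundmatu", ""]]
translated_data = translate_rows(data, 1, "ru")
print(translated_data[1][1])  # unknown words pass through unchanged: tundmatu
```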
