Python script receiving UnicodeEncodeError: 'ascii' codec can't encode character

I have a simple Python script that pulls posts from reddit and posts them on Twitter. Unfortunately, tonight it started failing, I assume because someone's title on reddit has a formatting issue. The error I'm receiving is:

  File "redditbot.py", line 82, in <module>
    main()
  File "redditbot.py", line 64, in main
    tweeter(post_dict, post_ids)
  File "redditbot.py", line 74, in tweeter
    print post+" "+post_dict[post]+" #python"
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 34: ordinal not in range(128)

Here is my script:

# encoding=utf8 
import praw 
import json 
import requests 
import tweepy 
import time 
import urllib2 
import sys 
reload(sys) 
sys.setdefaultencoding('utf8') 

access_token = 'hidden' 
access_token_secret = 'hidden' 
consumer_key = 'hidden' 
consumer_secret = 'hidden' 


def strip_title(title):
    if len(title) < 75:
        return title
    else:
        return title[:74] + "..."

def tweet_creator(subreddit_info):
    post_dict = {}
    post_ids = []
    print "[bot] Getting posts from Reddit"
    for submission in subreddit_info.get_hot(limit=2000):
        post_dict[strip_title(submission.title)] = submission.url
        post_ids.append(submission.id)
    print "[bot] Generating short link using goo.gl"
    mini_post_dict = {}
    for post in post_dict:
        post_title = post
        post_link = post_dict[post]

        mini_post_dict[post_title] = post_link
    return mini_post_dict, post_ids

def setup_connection_reddit(subreddit):
    print "[bot] setting up connection with Reddit"
    r = praw.Reddit('PythonReddit PyReTw'
                    'monitoring %s' %(subreddit))
    subreddit = r.get_subreddit('python')
    return subreddit



def duplicate_check(id):
    found = 0
    with open('posted_posts.txt', 'r') as file:
        for line in file:
            if id in line:
                found = 1
    return found

def add_id_to_file(id):
    with open('posted_posts.txt', 'a') as file:
        file.write(str(id) + "\n")

def main():
    subreddit = setup_connection_reddit('python')
    post_dict, post_ids = tweet_creator(subreddit)
    tweeter(post_dict, post_ids)

def tweeter(post_dict, post_ids):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    for post, post_id in zip(post_dict, post_ids):
        found = duplicate_check(post_id)
        if found == 0:
            print "[bot] Posting this link on twitter"
            print post+" "+post_dict[post]+" #python"
            api.update_status(post+" "+post_dict[post]+" #python")
            add_id_to_file(post_id)
            time.sleep(3000)
        else:
            print "[bot] Already posted"

if __name__ == '__main__':
    main()

Any help would be greatly appreciated - thanks in advance!

2016-01-17 Arbaxas

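Would you mind fixing the indentation of your example? Also, try formatting the string and explicitly encoding post to bytes before printing. – karlson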

You might find this article helpful: [Pragmatic Unicode](http://nedbatchelder.com/text/unipain.html), written by SO veteran Ned Batchelder. –

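The problem probably stems from concatenating a mix of byte strings and unicode strings. As an alternative to prefixing every string literal with u, maybe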

from __future__ import unicode_literals

will fix some things for you. See here for a more in-depth explanation and decide whether it is right for you.
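To illustrate why mixing the two types is fragile, here is a minimal sketch. It assumes the title arrives as UTF-8 bytes; `implicit_coerce` is a hypothetical helper that only mimics the ASCII decode Python 2 applies when a byte string is concatenated with a unicode string, and the URL is a placeholder.

```python
def implicit_coerce(byte_string):
    # Hypothetical stand-in for Python 2's implicit conversion:
    # byte strings are decoded with the ascii codec before concatenation.
    return byte_string.decode('ascii')

# A made-up title containing curly quotes, stored as UTF-8 bytes.
title_bytes = u'\u201cHello\u201d'.encode('utf-8')

try:
    implicit_coerce(title_bytes)
    coercion_failed = False
except UnicodeDecodeError:
    # Non-ASCII bytes cannot be decoded by the ascii codec.
    coercion_failed = True

# Decoding explicitly with the right codec keeps everything unicode.
title_text = title_bytes.decode('utf-8')
tweet = title_text + u' ' + u'http://example.com' + u' #python'
```

Once every piece is unicode, the concatenation itself no longer triggers any implicit conversion.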

2016-01-17 10:58:43 karlson

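You are trying to print a unicode string to your terminal (or possibly to a file via IO redirection), but the encoding used by your terminal (or file system) is ASCII. Python therefore tries to convert the string from its unicode representation to ASCII, and fails because the code point u'\u201c' (“) cannot be represented in ASCII. Effectively, your code is doing this: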

>>> print u'\u201c'.encode('ascii') 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 0: ordinal not in range(128)

You could try converting to UTF-8:

print (post + " " + post_dict[post] + " #python").encode('utf8')

or converting to ASCII like this:

print (post + " " + post_dict[post] + " #python").encode('ascii', 'replace')

which replaces characters that cannot be represented in ASCII with ?.
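The two options can be compared side by side. This is a minimal sketch: `sample` is a made-up title, and the `backslashreplace` error handler is an extra option that the answer does not mention.

```python
# A made-up title with left and right curly quotes.
sample = u'\u201cquoted\u201d title'

# Keeps every character, stored as UTF-8 bytes.
as_utf8 = sample.encode('utf-8')

# Characters outside ASCII are replaced with '?'.
as_ascii = sample.encode('ascii', 'replace')

# Characters outside ASCII become \uXXXX escapes (not in the original answer).
as_escaped = sample.encode('ascii', 'backslashreplace')

try:
    # Strict mode is what the implicit conversion uses, so it fails here.
    sample.encode('ascii')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

UTF-8 is the only one of these that loses no information; the other two are mainly useful when the destination really can only hold ASCII.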

Another option, useful if you are printing for debugging purposes, is to print the repr of the string:

print repr(post + " " + post_dict[post] + " #python")

This will output something like this:

>>> s = u'string with \u201cLEFT DOUBLE QUOTATION MARK\u201c'
>>> print repr(s) 
u'string with \u201cLEFT DOUBLE QUOTATION MARK\u201c'

2016-01-17 11:00:48 mhawke

Consider this simple program:

print(u'\u201c' + "python")

If you print to a terminal (with an appropriate character encoding), you get

“python

However, if you redirect the output to a file, you get a UnicodeEncodeError.

script.py > /tmp/out 
Traceback (most recent call last): 
    File "/home/unutbu/pybin/script.py", line 4, in <module> 
    print(u'\u201c' + "python") 
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 0: ordinal not in range(128)

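When you print to a terminal, Python uses the terminal's character encoding to encode the unicode. (The terminal can only print bytes, so the unicode must be encoded in order to be printed.)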

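When you redirect output to a file, Python cannot determine the character encoding, since files have no declared encoding. So by default Python2 implicitly encodes all unicode using the ascii codec before writing it to the file. Since u'\u201c' cannot be ascii-encoded, you get a UnicodeEncodeError. (Only the first 128 unicode code points can be encoded with ascii.)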
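One way to avoid depending on the detected encoding is to wrap the output stream in an explicit UTF-8 writer. The following is a minimal sketch, not the answer's own method: an in-memory `io.BytesIO` stands in for the redirected file, and `make_utf8_writer` is a hypothetical helper name.

```python
import codecs
import io
import sys

def make_utf8_writer(byte_stream):
    """Wrap a byte stream so unicode written to it is encoded as UTF-8."""
    return codecs.getwriter('utf-8')(byte_stream)

# The encoding Python detected for stdout; it may be None when output is
# redirected, which is the situation described above.
detected = getattr(sys.stdout, 'encoding', None)

# Stand-in for a file that stdout was redirected to.
fake_file = io.BytesIO()
writer = make_utf8_writer(fake_file)
writer.write(u'\u201c' + u'python\n')

# The bytes that would have ended up in the file.
written = fake_file.getvalue()
```

In a Python 2 script the same wrapper could be applied to `sys.stdout` itself. Setting the `PYTHONIOENCODING` environment variable to `utf-8` before running the script is another option that needs no code change.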

This issue is explained in detail on the Why Print Fails wiki page.

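To fix the problem, first avoid adding unicode and byte strings together. Doing so causes an implicit conversion using the ascii codec in Python2, and an exception in Python3. To future-proof your code, it is better to be explicit.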

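# Build the line as unicode first (post stays a valid key for post_dict),
# then encode once, right before printing.
line = u'{} {} #python'.format(post, post_dict[post])
print(line.encode('utf-8'))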
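Being explicit can be sketched more generally as: keep text as unicode while building the tweet, and encode once at the point of output. `to_text` and `build_tweet` are hypothetical helpers, the URL is a placeholder, and UTF-8 input is an assumption.

```python
def to_text(value, encoding='utf-8'):
    """Return value as a unicode string, decoding byte strings explicitly."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

def build_tweet(title, url):
    """Build the tweet as unicode; nothing is encoded at this stage."""
    return u'{0} {1} #python'.format(to_text(title), to_text(url))

# Inputs may be unicode or bytes; both are normalized to unicode.
tweet = build_tweet(u'\u201cNews\u201d', b'http://example.com/a')

# Encode once, right before printing or writing to a file.
output_bytes = tweet.encode('utf-8')
```

Because all normalization happens in one place, the rest of the script never has to mix the two string types.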

2016-01-17 11:14:08 unutbu
