How to create a .csv file from a MongoDB tweet database using Python

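I created a database in MongoDB that holds tweets and their sentiment analysis, built with tweepy and NLTK. After using mongoexport to create a CSV file from the dataset stored in MongoDB, I decided to explore other, more flexible options (in particular, using a delimiter other than the comma), such as generating the CSV file from Python itself. So far I can print the dataset successfully, working around the ASCII and Unicode issues and using "|" as the delimiter, but I am struggling to create a CSV file from the printed result. The code so far is as follows: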

import json 
import csv 
from pymongo import MongoClient 

client = MongoClient('localhost', 27017) 
db = client['twitter_db_stream_1'] 
collection = db['twitter_collection'] 
data_python = collection.find({"user.location":{"$exists":True},"user.location":{"$ne":"null"}},{"created_at":1,"text":1,"user.name":1,"user.location":1,"geo.coordinates":1,"sentiment_value":1,"confidence_value":1}) 

for data in data_python: 
    print(data['created_at'],'|',data['text'].encode('utf8'),'|',data['user']['name'].encode('utf8'),'|',data['user']['location'],'|',data['sentiment_value'],'|',data['confidence_value'])

The printed result looks like this:

Tue Apr 18 06:51:58 +0000 2017 | b'Samsung Galaxy S8 International Giveaway @androidauth #giveaway | b'Matt Torok' | None | pos | 1.0

I tried adding the following snippet that uses csv.writer, based on some examples from tutorials, but it does not work...

csv_file = open('Sentiment_Analisys.csv', 'wb') 
writer = csv.writer(csv_file) 

fields = [["created_at"],["text"],["user.name"],["user.location"],["sentiment_value"],["confidential_value"]] #field names 
writer.writerow(fields) 

for data in data_python: 
    writer.writerow(data['created_at'],data['text'].encode('utf8'),data['user']['name'].encode('utf8'),data['user']['location'],data['sentiment_value'],data['confidence_value']) 

csv_file.close()

Could someone please give me some guidance on how to create this CSV file from the printed result above?

Thank you very much!

Source

2017-07-05 Marcelo Couto

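Could you convert everything in 'data_python' to 'str' and simply print it with commas? – PYA

Why do you want to create the CSV from the printed result? Or do you intend to create a '|'-delimited CSV? – Tanu

Thank you for the comments, friends! I need to create this CSV file so that I can use it later in a SQL database. When a CSV file is uploaded to SQL as a flat file source, the comma is sometimes not a usable delimiter, especially when the tweet text itself contains commas. –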

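You appear to have copied a Python 2.x example while writing Python 3.x code. The csv module is used slightly differently in the two versions: in Python 3 the file should be opened in text mode with newline='' rather than in binary mode ('wb'). It is also better to use a with statement when working with files, so that you do not have to close the file explicitly at the end.

writerow() takes a single list of values. Your field names are defined as a list of lists, and the values you pass to writerow() also need to be wrapped in one list, as follows: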

field_names = ["created_at", "text", "user.name", "user.location", "sentiment_value", "confidence_value"]

with open('Sentiment_Analisys.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(field_names)

    for data in data_python:
        # writerow() takes one list holding every column of the row
        csv_output.writerow([
            data['created_at'],
            data['text'].encode('utf8', 'ignore'),
            data['user']['name'].encode('utf8'),
            data['user']['location'],
            data['sentiment_value'],
            data['confidence_value'],
        ])
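On the delimiter concern raised in the comments, here is a minimal sketch of how writerow() behaves with a list and with a custom delimiter. It writes to an in-memory buffer so it runs without MongoDB; the helper name rows_to_csv_text and the sample row are made up for illustration. With the default quoting, the csv module wraps any field that contains the delimiter in quotes, so a comma inside the tweet text does not by itself break a comma-separated file.

```python
import csv
import io


def rows_to_csv_text(header, rows, delimiter=","):
    """Write a header and rows to an in-memory buffer and return the CSV text.

    Hypothetical helper for illustration only; it is not part of the thread's code.
    """
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=delimiter)
    writer.writerow(header)      # one list per call, one item per column
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


header = ["created_at", "text"]
# Sample data: the text contains both a comma and a pipe on purpose.
rows = [["Tue Apr 18 06:51:58 +0000 2017", "Hello, world | again"]]

comma_text = rows_to_csv_text(header, rows)
pipe_text = rows_to_csv_text(header, rows, delimiter="|")

print(comma_text)
print(pipe_text)
```

Whether the quoted form is acceptable depends on the tool that later imports the file; if that tool does not honour quotes, choosing a delimiter that never occurs in the data is the safer route.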

Source

2017-07-05 10:18:15

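Thank you very much, Martin, I will try this option! –

Hi Martin, thank you very much for your suggestion. I applied it and it works well! The only problem is that when the tweet text contains "emoji", I get the following error message: return codecs.charmap_encode(input, self.errors, encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode characters in position 161-162: character maps to <undefined> –

Do you have any suggestion for skipping the "emoji" characters? I tried changing 'utf8' to 'unicode_escape', but it did not work. Thank you very much! –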
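One possible way around the emoji error, offered as a sketch rather than a confirmed fix: a 'charmap' error usually means the output file was opened with a platform default encoding that cannot represent those characters (this is an assumption about the asker's environment). Opening the file with encoding="utf-8" and passing plain str values avoids it. Note also that in Python 3 .encode() returns bytes, and csv.writer stringifies non-string values with str(), so an encoded cell ends up as the literal text b'...'. The function names, the sample tweet and the temporary file path below are invented for the example.

```python
import csv
import os
import tempfile


def write_tweets(path, rows, delimiter="|"):
    """Write (text, name) pairs to a UTF-8 encoded CSV file."""
    # encoding="utf-8" lets the file hold any Unicode character, emoji included.
    with open(path, "w", newline="", encoding="utf-8") as f_output:
        writer = csv.writer(f_output, delimiter=delimiter)
        writer.writerow(["text", "user.name"])
        for text, name in rows:
            writer.writerow([text, name])   # plain str values, no .encode()


def read_tweets(path, delimiter="|"):
    """Read the file back as a list of rows, using the same encoding."""
    with open(path, newline="", encoding="utf-8") as f_input:
        return list(csv.reader(f_input, delimiter=delimiter))


# What csv.writer would store for an encoded value: the repr of the bytes.
bytes_cell = str("caf\u00e9".encode("utf8"))

# Sample data with an emoji (U+1F600), written to a throwaway temp directory.
sample_rows = [("Great giveaway \U0001F600", "Matt Torok")]
demo_path = os.path.join(tempfile.mkdtemp(), "emoji_demo.csv")
write_tweets(demo_path, sample_rows)
round_trip = read_tweets(demo_path)
```

If the emoji really should be dropped instead of kept, text.encode("ascii", "ignore").decode("ascii") returns a str without any non-ASCII characters, at the cost of also removing accented letters.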

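Dear all, below I would like to share the final code, which I arrived at with the support of good friends here on Stack Overflow. mongoexport has its advantages, but if you need the flexibility to define your own delimiter when creating the CSV file, this code may be of interest. The only issue is that you may lose "emoji" characters, because they are converted to text codes by the UTF-8 encoding step. Depending on your requirements, this limitation may not be a problem. Compared with the code posted above, the query I carried over from the Mongo client, "user.location":{"$ne":"null"}, is different: in Python code you should change "null" to None. I hope my journey to the working code below, and the support my friends gave in this post, will be useful to someone in the future. Best regards!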

import csv
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client['twitter_db_stream_1']
collection = db['twitter_collection']
# Both conditions on "user.location" go in one sub-document: repeating the key
# in a Python dict literal would keep only the last condition.
data_python = collection.find(
    {"user.location": {"$exists": True, "$ne": None}},
    {"created_at": 1, "text": 1, "user.name": 1, "user.location": 1, "sentiment_value": 1, "confidence_value": 1})

field_names = ["created_at", "text", "user.name", "user.location", "sentiment_value", "confidence_value"]

with open('Sentiment_Analisys.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output, delimiter="|")
    csv_output.writerow(field_names)

    for data in data_python:
        # writerow() takes one list holding every column of the row
        csv_output.writerow([
            data['created_at'],
            data['text'].encode('utf8', 'ignore'),
            data['user']['name'].encode('utf8'),
            data['user']['location'],
            data['sentiment_value'],
            data['confidence_value'],
        ])
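Two details may be worth checking in code like the above; what follows is a sketch under stated assumptions, not the author's method. First, a Python dict literal that repeats a key keeps only the last value, so two separate "user.location" conditions collapse into one unless they share a single sub-document. Second, indexing with data['user']['location'] raises KeyError when a document lacks a field, which .get() avoids. The helper names build_location_filter and tweet_to_row are hypothetical, and the sample document is invented.

```python
def build_location_filter():
    """Return a filter with both conditions on "user.location" in one sub-document."""
    return {"user.location": {"$exists": True, "$ne": None}}


def tweet_to_row(doc):
    """Turn one tweet document (a dict) into a list of six columns.

    Missing fields become empty strings instead of raising KeyError.
    """
    user = doc.get("user") or {}
    return [
        doc.get("created_at", ""),
        doc.get("text", ""),
        user.get("name", ""),
        user.get("location", ""),
        doc.get("sentiment_value", ""),
        doc.get("confidence_value", ""),
    ]


# A dict literal with a repeated key silently keeps only the last entry.
repeated_key_filter = {
    "user.location": {"$exists": True},
    "user.location": {"$ne": None},
}

# Invented sample document without a "user.location" or "confidence_value" field.
sample_doc = {
    "created_at": "Tue Apr 18 06:51:58 +0000 2017",
    "text": "Samsung Galaxy S8 International Giveaway",
    "user": {"name": "Matt Torok"},
    "sentiment_value": "pos",
}
sample_row = tweet_to_row(sample_doc)
```

The filter returned by build_location_filter() could be passed as the first argument to collection.find(), and each resulting document handed to tweet_to_row() before writerow().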

Source

2017-07-16 01:30:07
