2013-05-08

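Python KeyError when using the networkx package to visualize data

I am trying to download Twitter account and follower information, build a relationship graph from it with the NetworkX Python package, and visualize the result in Gephi. I have already downloaded the data.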

    import networkx as nx
    import MySQLdb

    conn = MySQLdb.connect(host="localhost",  # your host, usually localhost
                           user="root",       # your username
                           passwd="123456",   # your password
                           db="twitterbank")  # name of the database
    cur = conn.cursor()

    def get_user_info(m):
        cur.execute("SELECT tweeter_name FROM tweets_fetch where tweeter_id=%s" % m)

    g = nx.Graph()

    def add_node_tw(n, weight=None, time=None, location=None):
        if not g.has_node(n):
            screen_name = get_user_info(n)
            g.add_node(n)
            g.node[n]['weight'] = 1
            g.node[n]["screen_name"] = screen_name
        else:
            g.node[n]['weight'] += 1

    def add_edge_tw(n1, n2, weight=None):
        if not g.has_edge(n1, n2):
            g.add_edge(n1, n2)
            g[n1][n2]['weight'] = 1
        else:
            g[n1][n2]['weight'] += 1

    # generate set of users

    users = set()
    cur.execute("SELECT distinct tweeter_id FROM tweets_fetch")
    cur.fetchall()
    for row in cur:
        users.add(row[0])


    g = nx.DiGraph()

    for u_id in users:
        add_node_tw(u_id)
        cur.execute("select * from tweeter_followers where tweeter_id=%s" % u_id)
        cur.fetchall()
        for row1 in cur:
            if row1[0] in users:
                add_node_tw(row1[0])
                add_edge_tw(row1[0], row1[1])
    nx.write_graphml(g, 'relationship_graphml')

The two tables are:
tweets_fetch: with columns (tweeter_id, tweeter_name, tweet_content, datetime...)
tweeter_followers: with columns (tweeter_id, follower_id)

When I run the code above, the following error appears:

    Traceback (most recent call last):
      File "D:\Sepups\eclipse-SDK-3.7.1-win32-x86_64\eclipse\plugins\org.python.pydev_2.7.3.2013031601\pysrc\pydevd.py", line 1397, in <module>
        debugger.run(setup['file'], None, None)
      File "D:\Sepups\eclipse-SDK-3.7.1-win32-x86_64\eclipse\plugins\org.python.pydev_2.7.3.2013031601\pysrc\pydevd.py", line 1090, in run
        pydev_imports.execfile(file, globals, locals) #execute the script
      File "D:\java\python\workspace\tweetsHarvest\src\tweet_graph.py", line 47, in <module>
        add_node_tw(u_id)
      File "D:\java\python\workspace\tweetsHarvest\src\tweet_graph.py", line 24, in add_node_tw
        g.node[n]['weight']+=1
    KeyError: 'weight'

Does anyone know how to solve this? I am really new to both Python and Gephi. I based my code on the example at http://giladlotan.com/blog/mapping-twitters-python-data-science-communities/

Answer

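I wrote a script based on the same code and hit the same error with one of my datasets. If your problem is the same as mine, a few rows in your data are causing it. In my case it was only a small fraction of several thousand edges. To find out where it goes wrong, print each row just before the add_edge_tw call and wrap that call in a try/except clause.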
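A minimal sketch of that diagnostic loop, assuming the rows have already been fetched into a list. The `rows` list and the `bad_rows` collector are hypothetical stand-ins for the database query, and the malformed rows are made-up examples, not data from the question:

```python
import networkx as nx

g = nx.DiGraph()

def add_edge_tw(n1, n2):
    """Add an edge, or bump its weight if it already exists (as in the question)."""
    if not g.has_edge(n1, n2):
        g.add_edge(n1, n2)
        g[n1][n2]['weight'] = 1
    else:
        g[n1][n2]['weight'] += 1

# Hypothetical (tweeter_id, follower_id) rows standing in for the SQL result.
# The short row and the None value are invented examples of "bad" data.
rows = [(1, 2), (1, 3), (2,), (3, None), (1, 2)]

bad_rows = []  # collects (row, exception) pairs for later inspection
for row in rows:
    print("processing row:", row)  # shows which row was being handled
    try:
        tweeter_id, follower_id = row  # fails on rows of the wrong length
        if tweeter_id is None or follower_id is None:
            raise ValueError("missing id")
        add_edge_tw(tweeter_id, follower_id)
    except (KeyError, ValueError, TypeError) as exc:
        bad_rows.append((row, exc))  # keep going instead of crashing

print("skipped rows:", [row for row, _ in bad_rows])
```

Collecting the failing rows instead of stopping at the first one makes it easier to see whether they share a pattern, such as a missing follower id.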

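I'm sure others who are more skilled with Python and NetworkX can give a better answer, but I hope this gives you a quick fix while you diagnose the problem.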
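One further possibility worth checking, offered as an assumption rather than a confirmed diagnosis: in NetworkX, `add_edge` creates any endpoint node that does not exist yet, and such nodes start with no attributes. If a follower is first seen as an edge endpoint and only later passed to `add_node_tw`, the `has_node` check succeeds and the `+= 1` branch looks up a 'weight' key that was never set. The sketch below reproduces that without a database and shows a tolerant variant of `add_node_tw`. It assumes a recent NetworkX release, where node attributes are reached through `g.nodes[n]` (the question's `g.node[n]` is the older spelling):

```python
import networkx as nx

g = nx.DiGraph()

def add_node_tw(n):
    """Count how often a node is seen, tolerating nodes created by add_edge."""
    if not g.has_node(n):
        g.add_node(n, weight=1)
    else:
        # .get() supplies 0 when the node exists but has no 'weight' yet
        g.nodes[n]['weight'] = g.nodes[n].get('weight', 0) + 1

# add_edge implicitly creates both endpoints with empty attribute dicts
g.add_edge("a", "b")
has_weight = 'weight' in g.nodes["b"]
print("node 'b' has a weight attribute:", has_weight)

# With a bare `g.nodes[n]['weight'] += 1` this call would raise KeyError
add_node_tw("b")
print("weight of 'b' after add_node_tw:", g.nodes["b"]['weight'])
```

An alternative with the same effect would be to call `add_node_tw` on both endpoints before every `add_edge_tw` call, so that no node is ever created without a weight.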