2013-08-06 42 views
0

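I want to query the Twitter API to retrieve the follower IDs of specific users so that I can map their connections. The code I use to retrieve followers from the Twitter API returns the same follower IDs for every user.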

When I run the code below, the follower IDs are identical for every user, which cannot be right:

try: 
     import json 
    except ImportError: 
     import simplejson as json 
     import urllib2 
     import urllib 
     import codecs 
     import time 
     import datetime 
     import os 
     import random 
     import time 
     import tweepy 
    from tweepy.parsers import RawParser 
     import sys 

    fhLog = codecs.open("LOG.txt",'a','UTF-8') 
    def logPrint(s):
        fhLog.write("%s\n"%s)
        print s

    #List of screennames of users whose followers we want to get 
    users =["_AReichert", 
    "_CindyWallace_", 
    "_MahmoudAbdelal", 
    "1939Ford9N", 
    "1FAMILY2MAN", 
    "8Amber8", 
    "AboutTeaching", 
    "AcamorAcademy", 
    "acraftymom", 
    "ActivNews", 
    "ActuVideosPub", 
    "ad_jonez", 
    "adamsteaching", 
    "ADHD_HELP", 
    "AIHEHistory", 
    "ajpodchaski", 
    "ak2mn", 
    "AkaMsCrowley", 
    "AlanAwstyn", 
    "albertateachers"] 


    # == OAuth Authentication == 


    # The consumer keys can be found on your application's Details 
    # page located at https://dev.twitter.com/apps (under "OAuth settings") 
    consumer_key="" 
    consumer_secret="" 

    # After the step above, you will be redirected to your app's page. 
    # Create an access token under the "Your access token" section
    access_token="" 
    access_token_secret="" 


    auth = tweepy.OAuthHandler(consumer_key, consumer_secret) 
    auth.set_access_token(access_token, access_token_secret) 

    rawParser = RawParser() 
    api = tweepy.API(auth_handler=auth, parser=rawParser) 


    #Will store ids of followers for each user in the user_output directory 
    os.system("mkdir -p user_output") #Create directory if it does not exist 

    userCnt=0 
    fhOverall=None 
    for user in users:
        userCnt+=1
        print("Getting user %s of %s"%(userCnt,len(users)))
        count=1
        nCursor=-1 #First page
        while count>0:
            id_str=user

            try:
                fh=open("user_output/"+str(id_str)+"_" + str(count) + ".json","r")
                result=fh.read()
                fh.close()
                wait=0
            except:
                result=api.followers_ids(count=5000,user_id=id_str,cursor=nCursor)
                fh=open("user_output/"+str(id_str)+"_" + str(count) + ".json","w")
                fh.write(result)
                fh.close()
                wait=60


            result=json.loads(result)
            nCursor=result["next_cursor_str"]
            if nCursor=="0":
                count=-1
                nCursor=None
            else:
                count+=1
                print("Another page to get")

            time.sleep(wait)



    logPrint("\nDONE! Completed Successfully") 
    fhLog.close()  

How can I fix this?


For one thing, you never seem to assign the `result` variable in the first place. Its first use in your code is `result = json.loads(result)`. – grncdr


Hi, thanks for your reply. I realized I had accidentally left out the try/except block that defines `result` after `id_str = user`. – Clockchan

Answers

0

This probably won't answer your question, but there are also indentation problems in the imports. Try this:

try: 
    import json 
except ImportError: 
    import simplejson as json 
import urllib2 
import urllib 
import codecs 
import time 
import datetime 
import os 
import random 
import time 
import tweepy 
from tweepy.parsers import RawParser 
import sys 

Also, you can create the directory directly with the os module. Try this:

if not os.path.exists("./user_output"):
    os.makedirs("./user_output")
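
The same idea as a small reusable helper, as a minimal sketch. It assumes Python 3.2 or later, where `os.makedirs` accepts `exist_ok`; the name `ensure_dir` is made up for illustration. On Python 2, keep an explicit `os.path.exists` check before calling `os.makedirs`.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path (and any missing parents) if it does not exist yet."""
    # exist_ok=True turns the call into a no-op when the directory is already there.
    os.makedirs(path, exist_ok=True)
    return path


# Demonstration in a throwaway location so nothing is left in the working directory.
base = tempfile.mkdtemp()
out_dir = ensure_dir(os.path.join(base, "user_output"))
ensure_dir(out_dir)  # calling it a second time does not raise
print(os.path.isdir(out_dir))
```

Unlike `os.system("mkdir -p ...")`, this does not depend on a shell being available.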

Finally, you call time.sleep(wait), but wait may not have been set. Try this:

if api.followers_ids(count=5000,user_id=id_str,cursor=nCursor): 
    time.sleep(60) 
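
Another way to keep the pause tied to real requests is to return the wait time together with the data. This is only a sketch: `load_or_fetch` and its `fetch` argument are hypothetical names, with `fetch` standing in for whatever call actually hits the API, and the 60 seconds is taken from the question's code.

```python
import os
import tempfile


def load_or_fetch(path, fetch):
    """Return (text, wait_seconds) for one page of results.

    fetch is a hypothetical zero-argument callable standing in for the
    real API request; it is only called when no cached file exists.
    """
    if os.path.exists(path):
        with open(path, "r") as fh:
            return fh.read(), 0  # served from disk, so no need to sleep
    text = fetch()
    with open(path, "w") as fh:
        fh.write(text)
    return text, 60  # a real request was made, so pause before the next one


# Offline demonstration with a fake request function.
cache_file = os.path.join(tempfile.mkdtemp(), "page_1.json")
first = load_or_fetch(cache_file, lambda: '{"ids": [1, 2]}')
second = load_or_fetch(cache_file, lambda: '{"ids": []}')
print(first, second)
```

Because `wait` always comes back from the function, it can never be unset when `time.sleep(wait)` runs.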

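Thanks for your reply. I fixed the things you pointed out, but unfortunately it still doesn't work. One thing I forgot to mention in the original post is that every time I run the program I get a different set of common follower IDs: every user has the same follower IDs, but the repeated IDs differ from run to run. I don't know if that helps. Thanks again! – Clockchan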

0

The tweepy documentation says that the only parameters api.followers_ids accepts are id, user_id, or screen_name, not the three arguments you are passing.

http://pythonhosted.org/tweepy/html/api.html#api-reference

You also need to assign the returned value to the result variable. Get rid of the if statement and put this in its place:

result=api.followers_ids(id_str) 
wait=60
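
One thing worth checking, although nobody in this thread confirmed it as the cause: the `users` list holds screen names, but the question passes them as `user_id`. Below is a minimal sketch of a per-user cursor loop that passes them as `screen_name` instead. `fetch_page` is a hypothetical callable, and the fake pages exist only so the sketch runs offline. With tweepy and `RawParser`, something like `lambda **kw: api.followers_ids(**kw)` might serve as `fetch_page`, assuming your tweepy version accepts `screen_name` and `cursor` keywords.

```python
import json


def fetch_all_follower_ids(fetch_page, screen_name):
    """Collect follower IDs for one account by following the cursor.

    fetch_page is a hypothetical callable taking screen_name and cursor
    keyword arguments and returning the raw JSON text of one page.
    """
    ids = []
    cursor = "-1"  # -1 asks for the first page
    while cursor != "0":  # "0" means there are no more pages
        page = json.loads(fetch_page(screen_name=screen_name, cursor=cursor))
        ids.extend(page["ids"])
        cursor = page["next_cursor_str"]
    return ids


# Made-up pages standing in for real API responses.
FAKE_PAGES = {
    ("alice", "-1"): {"ids": [1, 2], "next_cursor_str": "7"},
    ("alice", "7"): {"ids": [3], "next_cursor_str": "0"},
    ("bob", "-1"): {"ids": [9], "next_cursor_str": "0"},
}


def fake_fetch_page(screen_name, cursor):
    return json.dumps(FAKE_PAGES[(screen_name, cursor)])


for name in ("alice", "bob"):
    print(name, fetch_all_follower_ids(fake_fetch_page, name))
```

The cursor is reset to `"-1"` inside the function, so each user starts from their own first page rather than inheriting a cursor from the previous user.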