2013-02-25 40 views

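I have to download a CSV file from the AdWords Billing page during a Celery task. I don't know what is wrong with my implementation, so I need your help: there is no Content-Disposition header in the mechanize response.

Logging in: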

import mechanize

browser = mechanize.Browser()
browser.open('https://accounts.google.com/ServiceLogin') 
browser.select_form(nr=0) 
browser['Email'] = g_email 
browser['Passwd'] = g_password 
browser.submit() 

browser.set_handle_robots(False) 
billing_resp = browser.open('https://adwords.google.com/') 

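This works; I am now on the billing page. Next, I parsed the resulting page for the token and the IDs, inspected the request headers and the action URL in the Chrome debugger, and now I want to send a POST request and receive my CSV file. The response headers (in Chrome) are: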

content-disposition:attachment; filename="myclientcenter.csv.gz" 
content-length:307479 
content-type:application/x-gzip; charset=UTF-8 
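
For reference, the filename can be read out of a Content-Disposition value like the one above with the standard library alone. This is a minimal sketch written for Python 3; the helper name `attachment_filename` is made up for illustration and is not part of mechanize.

```python
from email.message import Message


def attachment_filename(content_disposition):
    """Return the filename parameter of a Content-Disposition value, or None."""
    msg = Message()
    msg['Content-Disposition'] = content_disposition
    return msg.get_filename()


print(attachment_filename('attachment; filename="myclientcenter.csv.gz"'))
```

A return value of `None` means the header carried no filename, which is a quick way to tell a real download from an empty reply.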

With mechanize:

data = { 
    '__u': effectiveUserId, 
    '__c': customerId, 
    'token': token, 
} 

browser.addheaders = [ 
    ('accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'), 
    ('content-type', 'application/x-www-form-urlencoded'), 
    ("accept-encoding", "gzip,deflate,sdch"), 
    ('user-agent', "Mozilla/5.0"), 
    ('referer', "https://adwords.google.com/mcm/Mcm?__u=8183865359&__c=3069937889"), 
    ('origin', "https://adwords.google.com"), 
] 

browser.set_handle_refresh(True) 
browser.set_debug_responses(True) 
browser.set_debug_redirects(True) 
browser.set_handle_referer(True) 
browser.set_debug_http(True) 
browser.set_handle_equiv(True) 
browser.set_handle_gzip(True) 

response = browser.open(
    'https://adwords.google.com/mcm/file/ClientSummary/', 
    data='&'.join(['='.join(pair) for pair in data.items()]), 
) 

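But the Content-Length header of this response is 0, and the response contains no Content-Disposition header. Why? What can I do to make it work?

I also tried the requests library, but could not even get past the login stage...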
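
One detail worth checking in the POST body above: joining the pairs by hand does not percent-encode the values, so a reserved character such as `&` or `=` inside a value changes how the body is parsed. Whether that matters for this particular endpoint is not established here. The sketch below only shows the difference, using invented placeholder values; it is written for Python 3, where the function lives in `urllib.parse` (under Python 2 it is `urllib.urlencode`).

```python
from urllib.parse import parse_qs, urlencode

# Placeholder values for illustration only, not real credentials.
fields = {
    '__c': '3069937889',
    'token': 'AB/cd&ef=1',
}

# Body built the way the question builds it: no escaping at all.
hand_joined = '&'.join('='.join(pair) for pair in fields.items())
# Body built with proper form encoding.
encoded = urlencode(fields)

# The raw '&' inside the token splits it into two separate fields.
print(parse_qs(hand_joined))
# The encoded body round-trips to the original values.
print(parse_qs(encoded))
```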

Answer


I now have the answer to my own question (thanks to my team lead).

The main mistake was this incorrect request data:

data = { 
    '__u': effectiveUserId, 
    '__c': customerId, 
    'token': token, 
} 

Let's try again and do it properly.

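import datetime
import gzip
import re
import StringIO
import urllib
from urllib2 import HTTPError
import mechanize

class AdWordsException(Exception):
    """Not defined in the original answer; assumed to be a plain Exception subclass."""

# Open the Google login page and log in.
browser = mechanize.Browser()
try:
    browser.open('https://accounts.google.com/ServiceLogin')
    browser.select_form(nr=0)
    browser['Email'] = '[email protected]'
    browser['Passwd'] = 'password'
    browser.submit()
except HTTPError:
    raise AdWordsException("Can't find the Google login form")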

We are now logged in and can go deeper.

try: 
    browser.set_handle_robots(False) 
    billing_resp = browser.open('https://adwords.google.com/') 
except HTTPError: 
    raise AdWordsException("Can't open AdWords dashboard page") 

# Welcome to the AdWords billing dashboard. This page contains a
# session-unique token that the later POST request needs.
token_re = re.search(r"token:\'(.{41})\'", billing_resp.read()) 
if token_re is None: 
    raise AdWordsException("Can't parse the token") 

# Time for some magic. We have to construct a proper serialized
# mcsSelector data structure. This is GWT-RPC wire protocol hell.
# Paste your own version, copied from the web debugger.
MCS_TEMPLATE = (
    "7|0|49|https://adwords.google.com/mcm/gwt/|18FBB090A5C26E56AC16C9DF0689E720|" 
    "com.google.ads.api.services.common.selector.Selector/1054041135|" 
    "com.google.ads.api.services.common.date.DateRange/1118087507|" 
    "com.google.ads.api.services.common.date.Date/373224763|" 
    "java.util.ArrayList/4159755760|java.lang.String/2004016611|ClientName|" 
    "ExternalCustomerId|PrimaryUserLogin|PrimaryCompanyName|IsManager|" 
    "SalesChannel|Tier|AccountSettingTypes|Labels|Alerts|CostWithCurrency|" 
    "CostUsd|Clicks|Impressions|Ctr|Conversions|ConversionRate|SearchCtr|" 
    "ContentCtr|BudgetAmount|BudgetStartDate|BudgetEndDate|BudgetPercentSpent|" 
    "BudgetType|RemainingBudget|ClientDateTimeZoneId|" 
    "com.google.ads.api.services.common.selector.OrderBy/524388450|" 
    "SearchableData|" 
    "com.google.ads.api.services.common.sorting.SortOrder/2037387810|" 
    "com.google.ads.api.services.common.pagination.Paging/363399854|" 
    "com.google.ads.api.services.common.selector.Predicate/451365360|" 
    "SeedObfuscatedCustomerId|" 
    "com.google.ads.api.services.common.selector.Predicate$Operator/2293561107|" 
    "java.util.Arrays$ArrayList/2507071751|[Ljava.lang.String;/2600011424|" 
    "3069937889|ExcludeSeeds|true|ClientTraversal|DIRECT|" 
    "com.google.ads.api.services.common.selector.Summary/3224078220|included|1|" 
    "2|3|4|5|" 
    "{report_date}|5|{report_date}" # take a note of this 
    "|6|26|7|8|7|9|7|10|7|11|7|12|7|13|7|14|7|15|7|16|7|17|7|18|7|19|7|20|7|21|" 
    "7|22|7|23|7|24|7|25|7|26|7|27|7|28|7|29|7|30|7|31|7|32|7|33|6|0|0|0|6|2|34|" 
    "35|36|0|34|9|-35|37|100|0|6|0|6|3|38|39|40|2|41|42|1|43|38|44|40|0|41|42|1|" 
    "45|38|46|-45|41|42|1|47|0|0|6|0|6|1|48|6|0|49|6|0|0|" 
) 

# Request the stats for today.
report_date = datetime.date.today()
mcs_selector = MCS_TEMPLATE.format(
    report_date='%s|%s|%s' % (
        report_date.day,
        report_date.month,
        report_date.year,
    ),
)
data = urllib.urlencode({ 
    'token': token_re.group(1), 
    'mcsSelector': mcs_selector, 
}) 

# And... it finally works! The token and a proper mcsSelector are all we need.
# A POST request with this data returns a gzipped CSV file with the current
# balance state and other information that is not available via the AdWords API.
zipped_csv = browser.open(
    'https://adwords.google.com/mcm/file/ClientSummary', 
    data=data 
) 
# Unpack it and use as you wish. 
with gzip.GzipFile(mode='r', fileobj=zipped_csv) as csv_io:
    try:
        csv = StringIO.StringIO(csv_io.read())
    except IOError:
        raise AdWordsException("Can't get CSV file from response")
    finally:
        browser.close()
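
The three pure-Python steps above (token extraction, the date fragment for the selector, and unpacking the download) can be tried offline without touching AdWords. This is a minimal sketch for Python 3, not the code from the answer: the helper names, the sample page, the sample payload, and the UTF-8 encoding of the CSV are all assumptions made for illustration.

```python
import csv
import datetime
import gzip
import io
import re

# Same pattern as in the answer: a 41-character token in single quotes.
TOKEN_RE = re.compile(r"token:'(.{41})'")


def extract_token(page_html):
    """Return the 41-character token embedded in the page, or None."""
    match = TOKEN_RE.search(page_html)
    return match.group(1) if match else None


def selector_date(day):
    """Format a date as the day|month|year fragment used in MCS_TEMPLATE."""
    return '%s|%s|%s' % (day.day, day.month, day.year)


def read_gzipped_csv(raw_bytes, encoding='utf-8'):
    """Decompress a gzip payload and parse it into a list of CSV rows.

    The encoding is an assumption; adjust it to what the real file uses.
    """
    text = gzip.decompress(raw_bytes).decode(encoding)
    return list(csv.reader(io.StringIO(text)))


# Offline demonstration with invented data; no request is made.
sample_page = "var cfg = {token:'" + "x" * 41 + "'};"
sample_payload = gzip.compress(b"Client,Cost\nExample,1.23\n")

print(extract_token(sample_page))
print(selector_date(datetime.date(2013, 2, 25)))
print(read_gzipped_csv(sample_payload))
```

With a real response, the bytes passed to `read_gzipped_csv` would come from reading the object returned by `browser.open(...)`.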