2017-08-04

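Porting a BigQuery web interface query to the Python API

I have a query that runs from the web interface and returns results. Next, I want to use this query in a Python script, but that fails. Details follow.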

Suppose this is the query used in the web interface.

SELECT 
    MIN(visitStartTime) 
FROM (TABLE_DATE_RANGE([123456789.ga_sessions_], TIMESTAMP('2017-02-22'), TIMESTAMP('2017-05-22'))) 
GROUP BY 
    visitId, 
    fullVisitorId 
LIMIT 
    1000 

Next, I want to run this query from Python. First, here are two utility functions (based on Google's reference):

import time
import uuid

from google.cloud import bigquery


def async_query(query, project='ga---big-query', max_results=1000):
    client = bigquery.Client(project)
    # Each job needs a unique name, hence the random UUID.
    query_job = client.run_async_query(str(uuid.uuid4()), query)
    query_job.use_legacy_sql = False
    query_job.begin()

    wait_for_job(query_job)

    rows = query_job.results().fetch_data(max_results)
    return rows


def wait_for_job(job):
    while True:
        job.reload()  # Refreshes the state via a GET request.
        if job.state == 'DONE':
            if job.error_result:
                raise RuntimeError(job.errors)
            return
        time.sleep(1)

Finally, here is the query:

query = """SELECT 
    MIN(visitStartTime) 
FROM (TABLE_DATE_RANGE([94860076.ga_sessions_], TIMESTAMP('2017-02-22'), TIMESTAMP('2017-05-22'))) 
GROUP BY 
    visitId, 
    fullVisitorId 
LIMIT 
    1000 
""" 

res = async_query(query) 

This returns the following error:

--------------------------------------------------------------------------- 
RuntimeError        Traceback (most recent call last) 
<ipython-input-64-1573194bda70> in <module>() 
    27 # query = 'SELECT visitId FROM `94860076.ga_sessions_20170802`' 
    28 
---> 29 res = async_query(query) 

<ipython-input-33-e8addf14673a> in async_query(query, project, max_results) 
     5  query_job.begin() 
     6 
----> 7  wait_for_job(query_job) 
     8 
     9  rows = query_job.results().fetch_data(max_results) 

<ipython-input-33-e8addf14673a> in wait_for_job(job) 
    16   if job.state == 'DONE': 
    17    if job.error_result: 
---> 18     raise RuntimeError(job.errors) 
    19    return 
    20   time.sleep(1) 

RuntimeError: [{'reason': 'invalidQuery', 'location': 'query', 'message': 'Syntax error: Expected "," or "]" but got identifier "ga_sessions_" at [3:34]'}] 

I suspect the problem lies in how the table is named, but I don't know how to fix it. I did try porting SELECT visitId FROM [94860076.ga_sessions_20170802] to query = 'SELECT visitId FROM `94860076.ga_sessions_20170802`'.

Answer

The problem is on this line:

query_job.use_legacy_sql = False 

Because your query is written in legacy SQL, it should be:

query_job.use_legacy_sql = True 

Or you can leave it unspecified, since the default is True.
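To illustrate why the flag has to match the query text, here is a minimal sketch of a heuristic that guesses the dialect from the query string. The helper name `looks_like_legacy_sql` is hypothetical and not part of google-cloud-bigquery, and the regex only catches the most obvious legacy markers (square-bracket table references and `TABLE_DATE_RANGE`), so treat it as a sanity check rather than a parser.

```python
import re

# Hypothetical helper, not part of google-cloud-bigquery.
# Matches either a bracketed table reference such as
# [94860076.ga_sessions_20170802] or a TABLE_DATE_RANGE( call.
_LEGACY_MARKERS = re.compile(
    r"\[[\w\-:.]*\.[A-Za-z_]\w*\]"
    r"|\bTABLE_DATE_RANGE\s*\(",
    re.IGNORECASE,
)


def looks_like_legacy_sql(query):
    """Return True if the query contains obvious legacy SQL syntax."""
    return bool(_LEGACY_MARKERS.search(query))
```

With the `query_job` object from the question, the result could be assigned as `query_job.use_legacy_sql = looks_like_legacy_sql(query)` before calling `begin()`.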

That said, I strongly recommend that you start using Standard SQL. It is more powerful, more stable, and it is the approach the BigQuery team recommends.

In Standard SQL the query would look like this:

SELECT 
    MIN(visitStartTime) 
FROM `94860076.ga_sessions_*` 
WHERE _TABLE_SUFFIX BETWEEN '20170222' AND '20170522' 
GROUP BY 
    visitId, 
    fullVisitorId 
LIMIT 
    1000 
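The translation above follows a mechanical pattern: the bracketed table prefix becomes a backtick-quoted wildcard table, and the two `TIMESTAMP` bounds become a `_TABLE_SUFFIX BETWEEN` filter in `YYYYMMDD` form. A rough sketch of that rewrite is below. The function name `date_range_to_wildcard` is hypothetical, and it assumes the exact `(TABLE_DATE_RANGE([prefix], TIMESTAMP('YYYY-MM-DD'), TIMESTAMP('YYYY-MM-DD')))` shape used in the question; anything else raises an error.

```python
import re
from datetime import datetime

# Assumes the FROM clause has the same shape as in the question.
_DATE_RANGE = re.compile(
    r"\(\s*TABLE_DATE_RANGE\(\s*\[(?P<prefix>[^\]]+)\]\s*,"
    r"\s*TIMESTAMP\('(?P<start>\d{4}-\d{2}-\d{2})'\)\s*,"
    r"\s*TIMESTAMP\('(?P<end>\d{4}-\d{2}-\d{2})'\)\s*\)\s*\)",
    re.IGNORECASE,
)


def _suffix(date_text):
    """Convert 'YYYY-MM-DD' to the 'YYYYMMDD' form used in table suffixes."""
    return datetime.strptime(date_text, "%Y-%m-%d").strftime("%Y%m%d")


def date_range_to_wildcard(from_clause):
    """Hypothetical helper: return (table, where) for a Standard SQL query."""
    match = _DATE_RANGE.search(from_clause)
    if match is None:
        raise ValueError("no TABLE_DATE_RANGE expression found")
    # Legacy SQL separates the project with ':', Standard SQL uses '.'.
    prefix = match.group("prefix").replace(":", ".")
    table = "`{}*`".format(prefix)
    where = "_TABLE_SUFFIX BETWEEN '{}' AND '{}'".format(
        _suffix(match.group("start")), _suffix(match.group("end"))
    )
    return table, where
```

Applied to the FROM clause in the question, this produces the same wildcard table and `WHERE` condition as the query above. The rest of the statement (`SELECT`, `GROUP BY`, `LIMIT`) carries over unchanged in this case.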

Great catch! Could you give me a hint on how to translate my query to Standard SQL? – Dror

Just edited the answer :) –

Awesome! If I understand correctly, the dialect used in the web interface is legacy SQL. Is that right? – Dror