2017-08-07
0

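Update all values in column B for an ID in column A if a value is already defined for that ID (BigQuery SQL)

I have a simple table that is the result of the following query: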

select clientId cid, customDimensions.value way4id 
from mytable 
where customDimensions.index = 2 and customDimensions.value != 'undefined' 
group by customDimensions.value, cid 
order by cid asc 
limit 100; 

...which looks like this:

 
cid | way4id 
------ | ------ 
a  | not set 
b  | 123 
b  | not set 
c  | not set 
d  | 1234 
d  | not set 

What I want is to run an update query so that I get this:

 
cid | way4id 
------ | ------ 
a  | not set 
b  | 123 
b  | 123 
c  | not set 
d  | 1234 
d  | 1234 

I can't wrap my head around this. A subquery? A loop?

Thanks!

-

It seems that this works:

#standardSQL 
SELECT 
    clientId cid, 
    COALESCE (MAX(CASE WHEN cd.value != 'not set' THEN cd.value END) OVER (PARTITION BY clientId), 'not set') way4id 
FROM `mytable`, UNNEST(customDimensions) as cd 
WHERE TRUE 
    AND cd.index = 2 
    AND cd.value != 'undefined' 
ORDER BY cid asc 
LIMIT 100; 

I am just wondering how I can use UPDATE ... SET syntax to apply this.

I tried an update like the one below, but nothing works:

#standardSQL 
UPDATE (SELECT 
    clientId cid, 
    COALESCE (MAX(CASE WHEN cd.value != 'not set' THEN cd.value END) OVER (PARTITION BY clientId), 'not set') way4id 
FROM `akbars-ru-data-streaming.akbars_ru_data_streaming_maximp.stream_max`, UNNEST(customDimensions) as cd 
WHERE TRUE 
    AND cd.index = 2 
    AND cd.value != 'undefined' 
ORDER BY cid asc 
LIMIT 100) as t1 
SET way4id = IF(t2.way4id IS NOT NULL, t2.way4id, 'not set') 
FROM (
    SELECT cid, max(CASE WHEN way4id != 'not set' THEN way4id END) way4id FROM `yourTable` group by cid) t2 
WHERE t1.cid = t2.cid 

Can you fix this query?

Answers

1

This might give you directly what you want:

#standardSQL 
SELECT 
    clientId cid, 
    COALESCE(MAX(CASE WHEN customDimensions.value != 'not set' THEN customDimensions.value END) OVER(PARTITION BY clientId), 'not set') way4id 
FROM `mytable` 
WHERE TRUE 
    AND customDimensions.index = 2 
    AND customDimensions.value != 'undefined' 
ORDER BY cid asc 
LIMIT 100; 

You can test it with mock data, like this:

#standardSQL 
WITH `mytable` AS(
    SELECT 'a' AS clientID, STRUCT<index INT64, value STRING> (2, 'not set') customDimensions UNION ALL 
    SELECT 'a' AS clientID, STRUCT<index INT64, value STRING> (3, '3') customDimensions UNION ALL 
    SELECT 'a' AS clientID, STRUCT<index INT64, value STRING> (2, 'undefined') customDimensions UNION ALL 
    SELECT 'b' AS clientID, STRUCT<index INT64, value STRING> (2, '123') customDimensions UNION ALL 
    SELECT 'b' AS clientID, STRUCT<index INT64, value STRING> (2, 'not set') customDimensions UNION ALL 
    SELECT 'c' AS clientID, STRUCT<index INT64, value STRING> (2, 'not set') customDimensions UNION ALL 
    SELECT 'd' AS clientID, STRUCT<index INT64, value STRING> (2, '1234') customDimensions UNION ALL 
    SELECT 'd' AS clientID, STRUCT<index INT64, value STRING> (2, 'not set') customDimensions 
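) 
SELECT 
    clientId cid, 
    COALESCE(MAX(CASE WHEN customDimensions.value != 'not set' THEN customDimensions.value END) OVER(PARTITION BY clientId), 'not set') way4id 
FROM `mytable` 
WHERE TRUE 
    AND customDimensions.index = 2 
    AND customDimensions.value != 'undefined' 
ORDER BY cid asc;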

Result:

Row cid way4id 
1 a not set 
2 b 123 
3 b 123 
4 c not set 
5 d 1234  
6 d 1234 
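
To make the window expression easier to follow, here is a small Python sketch (not BigQuery) of what COALESCE(MAX(CASE WHEN ... END) OVER (PARTITION BY clientId), 'not set') does on the same mock rows. The helper name `fill_way4id` and the tuple layout are assumptions made only for this illustration.

```python
from collections import defaultdict

# Mock rows mirroring the test data: (clientId, index, value).
ROWS = [
    ("a", 2, "not set"),
    ("a", 3, "3"),
    ("a", 2, "undefined"),
    ("b", 2, "123"),
    ("b", 2, "not set"),
    ("c", 2, "not set"),
    ("d", 2, "1234"),
    ("d", 2, "not set"),
]


def fill_way4id(rows):
    """Hypothetical helper: emulate the filtered window query in plain Python."""
    # WHERE index = 2 AND value != 'undefined'
    kept = [(cid, value) for cid, index, value in rows
            if index == 2 and value != "undefined"]

    # MAX(CASE WHEN value != 'not set' THEN value END) per clientId partition.
    defined = defaultdict(list)
    for cid, value in kept:
        if value != "not set":
            defined[cid].append(value)

    # COALESCE(..., 'not set'): fall back when a partition has no defined value.
    result = [(cid, max(defined[cid]) if defined[cid] else "not set")
              for cid, _ in kept]
    return sorted(result)  # ORDER BY cid


if __name__ == "__main__":
    for cid, way4id in fill_way4id(ROWS):
        print(cid, way4id)
```

Every row of a partition receives the same defined value, which is why both rows for b and d end up filled in while a and c stay 'not set'.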

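Rather than running that query and then performing an UPDATE on the result set, it may be easier to produce the desired output directly in the first query.

If you still want to use UPDATE syntax, you can use something like: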

UPDATE `yourTable` t1 
SET way4id = IF(t2.way4id IS NOT NULL, t2.way4id, 'not set') 
FROM (SELECT cid, max(CASE WHEN way4id != 'not set' THEN way4id END) way4id FROM `yourTable` group by cid) t2 
WHERE t1.cid = t2.cid 

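Here yourTable already holds cid and way4id, i.e. the result of your first original query. Keep in mind that if you can solve your task with a regular query, it is best to avoid DML operations in BQ.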

+0

This could be an option, but _mytable_ contains about a million rows. How do we solve this in that case? –

+0

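I'm not sure what you mean. In this answer, 'mytable' only simulates your actual table, so you can ignore the first part and use your own table instead. Other than that, I wonder whether you are running into performance issues? –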

+0

I edited my answer; it may be a bit clearer now. –

0

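You can write a subquery that returns the defined values from a subset of the table. I am assuming that each ID has only one defined value and that undefined values are NULL, so I kept the condition as IS NOT NULL. Then join that subset back to the main table to get the defined value for every row.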

SELECT clientid, subset.val FROM mytable LEFT JOIN (SELECT clientid AS id, 
way4id AS val FROM mytable WHERE way4id IS NOT null) subset ON subset.id=clientid; 
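
As a rough illustration of how this join behaves, the Python sketch below (not BigQuery) emulates the LEFT JOIN against the IS NOT NULL subset. It assumes a flat table with (clientid, way4id) rows where undefined values are NULL (None here); the helper name `left_join_defined` is hypothetical.

```python
def left_join_defined(rows):
    """Hypothetical helper: emulate the LEFT JOIN ... IS NOT NULL query."""
    # Subquery: rows of the table whose way4id is defined (not NULL).
    subset = [(cid, val) for cid, val in rows if val is not None]

    joined = []
    for cid, _ in rows:
        matches = [val for sid, val in subset if sid == cid]
        if matches:
            # One output row per matching subset row.
            joined.extend((cid, val) for val in matches)
        else:
            # LEFT JOIN keeps unmatched rows with NULL on the right side.
            joined.append((cid, None))
    return joined


if __name__ == "__main__":
    print(left_join_defined([("a", None), ("b", "123"), ("b", None), ("c", None)]))
    # Two defined rows for one id multiply the rows of that id.
    print(left_join_defined([("d", "1"), ("d", "2")]))
```

IDs without any defined value come back as NULL rather than 'not set', and an ID with more than one defined row produces duplicated output rows, so the approach relies on the single-defined-value assumption stated above.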