SQL Statement for reconciliation with different operators

This is closely related to the question "SQL Statement for Reconciliation", but more complex.

Given the schema below:

create table TBL1 (ID varchar2(100) primary key not null, MATCH_CRITERIA timestamp); 
create table TBL2 (ID varchar2(100) primary key not null, MATCH_CRITERIA timestamp); 
create table TBL_RESULT (ID varchar2(100) primary key not null, TBL1_ID varchar2(100), TBL2_ID varchar2(100)); 

create unique index UK_TBL_RESULT_TBL1_ID on TBL_RESULT(TBL1_ID); 
create unique index UK_TBL_RESULT_TBL2_ID on TBL_RESULT(TBL2_ID); 

insert into TBL1 VALUES('1', to_date('01/26/2012 20:00:00', 'mm/dd/yyyy hh24:mi:ss')); 
insert into TBL1 VALUES('2', to_date('01/26/2012 20:05:00', 'mm/dd/yyyy hh24:mi:ss')); 

insert into TBL2 VALUES('3', to_date('01/26/2012 19:59:00', 'mm/dd/yyyy hh24:mi:ss')); 
insert into TBL2 VALUES('4', to_date('01/26/2012 20:04:00', 'mm/dd/yyyy hh24:mi:ss'));

Our current query:

INSERT INTO TBL_RESULT (ID, TBL1_ID, TBL2_ID) 
SELECT rawtohex(sys_guid()),t1.id,t2.id 
FROM 
(SELECT t1.match_criteria,t1.id, row_number() OVER (PARTITION BY t1.match_criteria ORDER BY t1.id) rn 
FROM tbl1 t1) t1, 
(SELECT t2.match_criteria,t2.id, row_number() OVER (PARTITION BY t2.match_criteria ORDER BY t2.id) rn 
FROM tbl2 t2) t2 
WHERE t1.match_criteria between t2.match_criteria - (10/1440) AND t2.match_criteria + (10/1440) 
AND t1.rn=t2.rn

Its output:

| ID | TBL1_ID | TBL2_ID | 
| '1' | '1'  | '3' | 
| '2' | '1'  | '4' | 
| '3' | '2'  | '3' | 
| '4' | '2'  | '4' |

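As you can see, the result violates the unique constraints (duplicate TBL1_ID / duplicate TBL2_ID). This is because:

RN is always 1 for every record (and is therefore always equal).
All of the records are within 10 minutes of each other.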
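
Both points can be checked directly against the sample data. The two diagnostic queries below are a minimal sketch and not part of the original post: the aliases src and gap_minutes, and the CAST to DATE used to get a numeric difference, are illustrative assumptions.

```sql
-- 1) Row numbers as the original query computes them. Every MATCH_CRITERIA
--    value in the sample data is distinct, so each partition holds a single
--    row and rn comes out as 1 everywhere.
SELECT 'TBL1' AS src, id, match_criteria,
       ROW_NUMBER() OVER (PARTITION BY match_criteria ORDER BY id) AS rn
FROM   tbl1
UNION ALL
SELECT 'TBL2', id, match_criteria,
       ROW_NUMBER() OVER (PARTITION BY match_criteria ORDER BY id)
FROM   tbl2;

-- 2) Every candidate pair inside the 10-minute window, with the gap in minutes.
--    Casting to DATE makes the subtraction return a number of days.
SELECT t1.id AS tbl1_id,
       t2.id AS tbl2_id,
       ROUND(ABS(CAST(t1.match_criteria AS DATE)
               - CAST(t2.match_criteria AS DATE)) * 1440) AS gap_minutes
FROM   tbl1 t1, tbl2 t2
WHERE  t1.match_criteria BETWEEN t2.match_criteria - (10/1440)
                             AND t2.match_criteria + (10/1440)
ORDER  BY t1.id, t2.id;
```

With the four sample rows, the gaps are 1, 4, 6 and 1 minutes, so all four combinations pass the window test, and since rn is 1 on both sides the rn filter removes none of them.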

The output we expect looks like the table below:

| ID | TBL1_ID | TBL2_ID | 
| '1' | '1'  | '4' | 
| '2' | '2'  | '3' |

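Note 1: It doesn't matter if '1' is matched with '3', but then '2' should be matched with '4' to conform to the constraints, as long as T1.MATCH_CRITERIA is within 10 minutes of T2.MATCH_CRITERIA.

Note 2: We have 1 million records inserted from TBL1 and another 1 million records inserted from TBL2. Sequential inserts using PL/SQL are therefore not acceptable unless they can run really fast (in less than 15 minutes).

Note 3: Unmatched data should be eliminated. Unbalanced data is also expected.

Note 4: We are not limited to executing a single query. A finite series of queries will do.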

Source

2012-01-26 John

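What happens if there are rows in T1 that cannot be matched to any row in T2 (or vice versa)? Do you eliminate that data? Or do you want to end up with output where 'TBL1_ID' or 'TBL2_ID' is NULL? –

By the way, your test data uses 'MM' twice in the format mask. The second time you meant 'MI'. It's a common mistake. – APC

@JustinCave, eliminate the data. – John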

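Your query generates a cross join because your business rules provide no mechanism for linking one record in T1 to one record in T2. Given that this is obviously a toy example, it is hard for us to suggest anything other than something very simple: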

(SELECT t1.match_criteria,t1.id, row_number() OVER (ORDER BY t1.match_criteria,t1.id) rn 
.... 
(SELECT t2.match_criteria,t2.id, row_number() OVER (ORDER BY t2.match_criteria,t2.id) rn

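This simply matches the first row in the T1 result set with the first row in the T2 result set, the second row in the T1 result set with the second row in the T2 result set, and so on.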
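
The INSERT that follows takes its IDs from a sequence named seq_tbl_result, which the question's schema does not define. To reproduce the run, a sequence along these lines presumably has to exist first; the name comes from the answer, while the default options are an assumption.

```sql
-- Assumed setup: the answer uses seq_tbl_result but never shows its definition.
-- Default options (START WITH 1, INCREMENT BY 1) are a guess, not from the post.
CREATE SEQUENCE seq_tbl_result;
```

The numeric value returned by NEXTVAL is converted implicitly when it is inserted into the VARCHAR2 column TBL_RESULT.ID.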

SQL> INSERT INTO TBL_RESULT (ID, TBL1_ID, TBL2_ID)
SELECT seq_tbl_result.nextval,t1.id,t2.id
FROM
(SELECT t1.match_criteria,t1.id, row_number() OVER (ORDER BY t1.match_criteria, t1.id) rn
FROM tbl1 t1) t1,
(SELECT t2.match_criteria,t2.id, row_number() OVER (ORDER BY t2.match_criteria, t2.id) rn
FROM tbl2 t2) t2
WHERE t1.match_criteria between t2.match_criteria - (10/1440) AND t2.match_criteria + (10/1440)
AND t1.rn=t2.rn
/

2 rows created. 


SQL> select * from tbl_result
/

ID  TBL1_I TBL2_I 
------ ------ ------ 
9  1  3 
10  2  4 

SQL>

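That may not be what you want. In that case you need to explain your data and the rules that decide what links to what. For instance, is there some pattern that gives you an anchor point? Also, when I rule the world, people who use VARCHAR2(100) columns to hold numeric IDs will be shot.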

Source

2012-01-26 15:15:04 APC

The aside alone would warrant a +1 :) –

This is only sample data. The query does not work given this data: INSERT INTO TBL1 VALUES('1', to_date('01/26/2012 01:00:00', 'mm/dd/yyyy hh24:mi:ss')); INSERT INTO TBL1 VALUES('2', to_date('01/26/2012 02:00:00', 'mm/dd/yyyy hh24:mi:ss')); INSERT INTO TBL2 VALUES('3', to_date('01/26/2012 02:00:00', 'mm/dd/yyyy hh24:mi:ss')); INSERT INTO TBL2 VALUES('4', to_date('01/26/2012 03:00:00', 'mm/dd/yyyy hh24:mi:ss')); – John
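
To see why positional matching breaks on the data in the comment above, the query below lists the pairs that matching by row position produces, together with their time gap. It is a minimal sketch that assumes TBL1 and TBL2 contain only the four rows from that comment; the aliases a, b and gap_minutes are illustrative and do not come from the thread.

```sql
-- Pair the n-th row of TBL1 with the n-th row of TBL2 (both ordered by
-- MATCH_CRITERIA, ID) and show how far apart the two timestamps are.
-- The 10-minute filter is deliberately left out so the gaps stay visible.
SELECT a.id AS tbl1_id,
       b.id AS tbl2_id,
       ROUND(ABS(CAST(a.match_criteria AS DATE)
               - CAST(b.match_criteria AS DATE)) * 1440) AS gap_minutes
FROM   (SELECT id, match_criteria,
               ROW_NUMBER() OVER (ORDER BY match_criteria, id) AS rn
        FROM   tbl1) a
JOIN   (SELECT id, match_criteria,
               ROW_NUMBER() OVER (ORDER BY match_criteria, id) AS rn
        FROM   tbl2) b
ON     a.rn = b.rn
ORDER  BY a.rn;
```

With that data the positional pairs are ('1', '3') and ('2', '4'), each 60 minutes apart, so the 10-minute filter rejects both and nothing is inserted, even though '2' and '3' share the same 02:00 timestamp and would be a valid match.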

I think this could work:

INSERT INTO TBL_RESULT (ID, TBL1_ID, TBL2_ID) 
select seq_tbl_result.nextval, 
tt1.id, tt2.id 
from (select id, v, row_number() over(partition by v order by id) rn 
from (select distinct t1.id, 
case 
when (t1.match_criteria between 
t2.match_criteria - (10/1440) and 
t2.match_criteria + (10/1440)) then 
1 
else 
2 
end v 
from tbl1 t1, tbl2 t2 
where t1.match_criteria between 
t2.match_criteria - (10/1440) and 
t2.match_criteria + (10/1440))) tt1, 
(select id, v, row_number() over(partition by v order by id) rn 
from (select distinct t2.id, 
case 
when (t1.match_criteria between 
t2.match_criteria - (10/1440) and 
t2.match_criteria + (10/1440)) then 
1 
else 
2 
end v 
from tbl1 t1, tbl2 t2 
where t1.match_criteria between 
t2.match_criteria - (10/1440) and 
t2.match_criteria + (10/1440))) tt2 
where tt1.v = tt2.v 
and tt1.rn = tt2.rn

Source

2012-01-26 15:59:45

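Hi, I think v is always 1. Anyway, I tried mixing up the data. TBL1 data (ordered by ID): 2 AM, then 10 records at 1-minute intervals starting from 1:01 AM. TBL2 data (ordered by ID): 10 records at 1-minute intervals starting from 1:01 AM, then 2 AM. Result: TBL1's 2 AM row was matched with TBL2's 1:01 AM row. But I think you're close... – John

Another scenario: TBL1 has 3 rows: 01:00 AM, 01:05 AM, 02:00 AM. TBL2 has only 2 rows: 01:00 AM, 02:00 AM. TBL1's 01:00 AM will be matched with TBL2's 01:00 AM, but TBL1's 01:05 AM will be matched with TBL2's 02:00 AM, and TBL1's 02:00 AM will be left unmatched. – John

You're right, v should somehow depend on t2.match_criteria –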
