2014-10-01 79 views

I have a set of data with frequencies, e.g. counts of feature occurrences, stored in SQLite:

w1  w2   w3    freq
a   a    a     4
a   a    and   3
a   a    band  1
a   a    well  1
a   and  a     2

I want to obtain the observation counts laid out in the following table:

            (w3)  not(w3)
(w1,w2)     n1    n2
not(w1,w2)  n3    n4

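Here n1, ..., n4 are the sums of the frequencies of the observations that satisfy each condition. For example, the first observation has w1 = a, w2 = a, w3 = a. We first look at all observations with w1 = a, w2 = a, w3 = a: only one matches, and its frequency is 4, so n1 = 4. Next, for w1 = a, w2 = a, w3 != a, the matching observations have frequencies 3, 1 and 1, so n2 = 5. Then w1 != a, w2 != a, w3 = a matches nothing, so n3 = 0, and w1 != a, w2 != a, w3 != a also matches nothing, so n4 = 0.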

I would like to end up with a table like this:

w1  w2   w3    freq  n1  n2  n3  n4
a   a    a     4     4   5   0   0
a   a    and   3     3   6   0   0
a   a    band  1
a   a    well  1
a   and  a     2
etc.

How can I achieve this using sqlite3?

Answer


This can be done with correlated scalar subqueries:

SELECT w1,
       w2,
       w3,
       freq,
       (SELECT SUM(freq)
        FROM MyLittleTable AS T2
        WHERE T2.w1 = T1.w1
          AND T2.w2 = T1.w2
          AND T2.w3 = T1.w3
       ) AS n1,
       (SELECT SUM(freq)
        FROM MyLittleTable AS T2
        WHERE T2.w1 = T1.w1
          AND T2.w2 = T1.w2
          AND T2.w3 != T1.w3
       ) AS n2,
       ...
FROM MyLittleTable AS T1
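
For anyone who wants to try this end to end, here is a minimal sketch using Python's built-in sqlite3 module and the sample rows from the question. Several things here are assumptions, not part of the answer above: the column types in CREATE TABLE, the n3 and n4 subqueries (written to follow the question's worked example, where both w1 and w2 must differ), and the COALESCE wrapper. The wrapper is there because SUM() in SQLite returns NULL when no rows match, while the desired output shows 0.

```python
import sqlite3

# In-memory database holding the sample data from the question.
# The column types are an assumption; the question does not give a schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyLittleTable (w1 TEXT, w2 TEXT, w3 TEXT, freq INTEGER)"
)
conn.executemany(
    "INSERT INTO MyLittleTable VALUES (?, ?, ?, ?)",
    [
        ("a", "a", "a", 4),
        ("a", "a", "and", 3),
        ("a", "a", "band", 1),
        ("a", "a", "well", 1),
        ("a", "and", "a", 2),
    ],
)

# One correlated scalar subquery per cell of the 2x2 table.
# n3 and n4 are assumptions based on the question's example
# (w1 != ... AND w2 != ...), not taken from the answer.
# COALESCE turns the NULL that SUM() yields on zero matches into 0.
QUERY = """
SELECT w1, w2, w3, freq,
       COALESCE((SELECT SUM(freq) FROM MyLittleTable AS T2
                 WHERE T2.w1 = T1.w1 AND T2.w2 = T1.w2
                   AND T2.w3 = T1.w3), 0) AS n1,
       COALESCE((SELECT SUM(freq) FROM MyLittleTable AS T2
                 WHERE T2.w1 = T1.w1 AND T2.w2 = T1.w2
                   AND T2.w3 != T1.w3), 0) AS n2,
       COALESCE((SELECT SUM(freq) FROM MyLittleTable AS T2
                 WHERE T2.w1 != T1.w1 AND T2.w2 != T1.w2
                   AND T2.w3 = T1.w3), 0) AS n3,
       COALESCE((SELECT SUM(freq) FROM MyLittleTable AS T2
                 WHERE T2.w1 != T1.w1 AND T2.w2 != T1.w2
                   AND T2.w3 != T1.w3), 0) AS n4
FROM MyLittleTable AS T1
ORDER BY T1.rowid
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With the sample data, the first two rows come out as (a, a, a, 4, 4, 5, 0, 0) and (a, a, and, 3, 3, 6, 0, 0), matching the table in the question. If not(w1,w2) is instead meant as "not both equal", the n3 and n4 conditions would need to become NOT (T2.w1 = T1.w1 AND T2.w2 = T1.w2), which gives different counts.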