2017-07-12 56 views
1

Recursive (cascading) select in Python sqlite3

I have an SQLite table with three columns named ID (integer), N (integer) and V (real). The pair (ID, N) is unique.

Using the Python module sqlite3, I would like to perform a recursive select of the form

select ID from TABLE where N = 0 and V between ? and ? and ID in 
    (select ID from TABLE where N = 7 and V between ? and ? and ID in 
     (select ID from TABLE where N = 8 and V between ? and ? and ID in 
      (...) 
     ) 
    ) 

I get the following error, possibly because the maximum recursion depth is exceeded (?). I need about 20 to 50 recursion levels.

sqlite3.OperationalError: parser stack overflow 

I also tried joining the subselects, like

select ID from 
     (select ID from TABLE where N = 0 and V between ? and ?) 
    join (select ID from TABLE where N = 7 and V between ? and ?) using (ID) 
    join (select ID from TABLE where N = 8 and V between ? and ?) using (ID) 
    join ... 

but this approach is surprisingly slow, even with only a few subselects.

Is there a better way to perform the same select?
Note: the table is indexed on (N, V).

Here is an example showing how the select works:

ID N V 
0 0 0.1 
0 1 0.2 
0 2 0.3 
1 0 0.5 
1 1 0.6 
1 2 0.7 
2 0 0.8 
2 1 0.9 
2 2 1.2 

Step 0

select ID from TABLE where N = 0 and V between 0 and 0.6 

ID is in (0, 1)
Step 1

select ID from TABLE where N = 1 and V between 0 and 1 and ID in (0, 1) 

ID is still in (0, 1)
Step 2

select ID from TABLE where N = 2 and V between 0.5 and 1 and ID in (0, 1) 

ID is 1
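The three steps above can be reproduced with a short script. This is only a sketch: the table name `t`, the index name `t_n_v` and the helper `refine` are made up for illustration, and the intersection of ID sets is done in Python rather than with nested IN clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table name "t" is an assumption; the question only calls it TABLE.
conn.execute("CREATE TABLE t (ID INTEGER, N INTEGER, V REAL, UNIQUE (ID, N))")
conn.execute("CREATE INDEX t_n_v ON t (N, V)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (0, 0, 0.1), (0, 1, 0.2), (0, 2, 0.3),
    (1, 0, 0.5), (1, 1, 0.6), (1, 2, 0.7),
    (2, 0, 0.8), (2, 1, 0.9), (2, 2, 1.2),
])

# One (N, lower bound, upper bound) triple per refinement step.
steps = [(0, 0.0, 0.6), (1, 0.0, 1.0), (2, 0.5, 1.0)]


def refine(conn, steps):
    """Return the surviving IDs after each step (hypothetical helper)."""
    ids = None
    history = []
    for n, low, high in steps:
        found = {row[0] for row in conn.execute(
            "SELECT ID FROM t WHERE N = ? AND V BETWEEN ? AND ?",
            (n, low, high))}
        # Keep only IDs that also passed every earlier step.
        ids = found if ids is None else ids & found
        history.append(sorted(ids))
    return history


history = refine(conn, steps)
print(history)
```

Each entry of `history` corresponds to one step, so the last entry is the final selection.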

+0

Are the "V" bounds the same at every step? Where do the "N" values come from? Do you want the IDs from all steps? –

+0

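No, the V bounds differ at each step. The N values are arbitrary, as far as this example goes. The idea is simply to refine the selection step by step. – user2660966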

+0

How many steps are typical? Do you want the IDs from all steps? –

Answers

2

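Unroll the recursion, run it in reverse order (innermost select first), and drive it from Python. To try this I created a table of 100 records, each with an ID between 0 and 99, N = 3 and V = 5. I arbitrarily chose the entire set of records as the innermost level.

You need to imagine the values of N and V as indexed lists, so that the values for the last SQL SELECT are taken from the head of the list. All the loop does is take the list of IDs produced by the inner SELECT and feed it to the next SELECT as part of its IN clause.

Without any indexes, this all runs in an augenblick (the blink of an eye).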

>>> import sqlite3 
>>> conn = sqlite3.connect('recur.db') 
>>> c = conn.cursor() 
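>>> previous_ids = '(%s)' % ', '.join(str(i) for i in range(100)) 
>>> for it in range(50): 
...  rows = c.execute('''SELECT ID FROM the_table WHERE N=3 AND V BETWEEN 2 AND 7 AND ID IN %s''' % previous_ids) 
...  # join the IDs by hand: str(tuple) would give '(5,)' for a single ID, which is not valid SQL 
...  previous_ids = '(%s)' % ', '.join(str(int(_[0])) for _ in rows.fetchall()) 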
...  
>>> previous_ids 
'(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99)' 

Edit: This version avoids the long string, at the cost of taking longer than an augenblick. It is essentially the same idea implemented with tables.

>>> import sqlite3 
>>> conn = sqlite3.connect('recur.db') 
>>> c = conn.cursor() 
>>> N_V = [ 
... (0, (0,6)), 
... (0, (0, 1)), 
... (1, (0, 2)), 
... (2, (0, 3)), 
... (0, (0, 5)), 
... (1, (0, 6)), 
... (2, (0, 7)), 
... (0, (0, 8)), 
... (1, (0, 9)), 
... (2, (1, 2)) 
... ] 
>>> r = c.execute('''CREATE TABLE essentials AS SELECT ID, N, V FROM the_table WHERE N=0 AND V BETWEEN 0 AND 6''') 
>>> for n_v in N_V[1:]: 
...  r = c.execute('''CREATE TABLE next AS SELECT * FROM essentials WHERE essentials.ID IN (SELECT ID FROM the_table WHERE N=%s AND V BETWEEN %s AND %s)''' % (n_v[0], n_v[1][0], n_v[1][1])) 
...  r = c.execute('''DROP TABLE essentials''') 
...  r = c.execute('''ALTER TABLE next RENAME TO essentials''') 
... 
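A possible variant of the table-based idea keeps a single temporary table of candidate IDs and deletes from it at each step, passing the bounds as bound parameters instead of formatting them into the SQL string. This is a sketch only: the temporary table `candidates` and the function `refine_in_sqlite` are hypothetical names, and the data is the small example from the question rather than the 100-record test table.

```python
import sqlite3


def refine_in_sqlite(conn, steps):
    """Shrink a temp table of candidate IDs once per (N, low, high) step."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS candidates (ID INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM candidates")
    first, rest = steps[0], steps[1:]
    # The first step seeds the candidate set.
    conn.execute(
        "INSERT INTO candidates "
        "SELECT ID FROM the_table WHERE N = ? AND V BETWEEN ? AND ?", first)
    # Every later step removes the IDs that fail its filter.
    for step in rest:
        conn.execute(
            "DELETE FROM candidates WHERE ID NOT IN "
            "(SELECT ID FROM the_table WHERE N = ? AND V BETWEEN ? AND ?)",
            step)
    return [row[0] for row in
            conn.execute("SELECT ID FROM candidates ORDER BY ID")]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (ID INTEGER, N INTEGER, V REAL, UNIQUE (ID, N))")
conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", [
    (0, 0, 0.1), (0, 1, 0.2), (0, 2, 0.3),
    (1, 0, 0.5), (1, 1, 0.6), (1, 2, 0.7),
    (2, 0, 0.8), (2, 1, 0.9), (2, 2, 1.2),
])

result = refine_in_sqlite(conn, [(0, 0.0, 0.6), (1, 0.0, 1.0), (2, 0.5, 1.0)])
print(result)
```

With this shape the IDs never leave SQLite until the final SELECT, so no ID list has to be fetched into Python or pasted into an IN clause between steps.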
+0

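Thanks for your answer, it is indeed a good approach. Do you think this method can cope with large tables? Because (1) I believe the length of an IN clause is limited when a list of numbers is used, and (2) this approach implies loading a lot of data (at least in the first iteration), which slows down the select. – user2660966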

+0

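I hadn't considered a workaround for the limit on the length of the list of numbers; I believe you are right. What do you mean by "this approach implies loading [a] lot of data"? –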

+0

I mean that the list "previous_ids" may have a lot of items in the first step (a few million in my case), and then the fetchall command takes time. – user2660966

0

Indexing the triplet (ID, N, V), instead of only the pair (N, V), makes the join approach fast enough to be considered:

create index I on TABLE(ID, N, V) 

then

select ID from 
     (select ID from TABLE where N = 0 and V between ? and ?) 
    join (select ID from TABLE where N = 7 and V between ? and ?) using (ID) 
    join (select ID from TABLE where N = 8 and V between ? and ?) using (ID) 
    join ...
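For 20 to 50 steps the join query is easier to generate than to write by hand. The sketch below builds it from a list of (N, low, high) triples and binds all bounds as parameters. The names `build_join_query`, `select_ids`, the table `t`, the index name `t_id_n_v` and the subquery aliases `s0`, `s1`, ... are assumptions for illustration; the data is the small example from the question.

```python
import sqlite3


def build_join_query(n_steps, table="t"):
    """Return the chained-join SELECT for n_steps filters (hypothetical helper)."""
    if n_steps < 1:
        raise ValueError("at least one step is required")
    sub = "(SELECT ID FROM {} WHERE N = ? AND V BETWEEN ? AND ?)".format(table)
    parts = ["SELECT ID FROM {} AS s0".format(sub)]
    for i in range(1, n_steps):
        parts.append("JOIN {} AS s{} USING (ID)".format(sub, i))
    return " ".join(parts)


def select_ids(conn, steps, table="t"):
    """Run the generated query; steps is a list of (N, low, high) triples."""
    sql = build_join_query(len(steps), table)
    # Flatten the triples into one parameter list, in query order.
    params = [value for step in steps for value in step]
    return sorted(row[0] for row in conn.execute(sql, params))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, N INTEGER, V REAL, UNIQUE (ID, N))")
conn.execute("CREATE INDEX t_id_n_v ON t (ID, N, V)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (0, 0, 0.1), (0, 1, 0.2), (0, 2, 0.3),
    (1, 0, 0.5), (1, 1, 0.6), (1, 2, 0.7),
    (2, 0, 0.8), (2, 1, 0.9), (2, 2, 1.2),
])

result = select_ids(conn, [(0, 0.0, 0.6), (1, 0.0, 1.0), (2, 0.5, 1.0)])
print(result)
```

SQLite allows at most 64 tables in a single join, so this shape covers the 20 to 50 steps mentioned in the question but would not scale much beyond that.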