2010-04-09

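I want to merge data. Below is my MySQL table. I want to use Python to iterate over two lists (one with dupe = 'x', the other with null dupes). How do I compare the two lists and merge them in Python/MySQL?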

This is sample data. The actual data is huge.

For example (blank cells are NULL):

a  b  c  d  e  f  key  dupe
---------------------------
1  d  c  f  k  l  1    x
2  g     h     j  1
3  i     h  u  u  2
4  u  r        t  2    x

From the sample table above, the desired output is:

a  b  c  d  e  f  key  dupe
---------------------------
2  g  c  h  k  j  1
3  i  r  h  u  u  2
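Read column by column, the desired output follows a COALESCE-style rule: for each pair of rows sharing a `key`, a non-null value in the original row (dupe is NULL) wins, and a NULL field is filled from the duplicate row (dupe = 'x'). A minimal sketch of that rule for a single pair of rows, assuming both rows are lists in the same column order with the dupe column already dropped; `coalesce_row` is a hypothetical helper name, not part of any library:

```python
def coalesce_row(original, dupe):
    """Field-wise COALESCE(original, dupe): keep the original's value unless it is None."""
    return [o if o is not None else d for o, d in zip(original, dupe)]


# Sample rows for key 1 without the dupe column (None stands for NULL).
original = [2, 'g', None, 'h', None, 'j', 1]
duplicate = [1, 'd', 'c', 'f', 'k', 'l', 1]
print(coalesce_row(original, duplicate))  # [2, 'g', 'c', 'h', 'k', 'j', 1]
```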

What I have so far:

import os
import MySQLdb
from EncryptedFile import EncryptedFile  # the asker's own helper module

# Read the database credentials from an encrypted file in $HOME.
enc = EncryptedFile(os.path.join(os.getenv("HOME"), '.py-encrypted-file'))
user = enc.getValue("user")
pw = enc.getValue("pw")

db = MySQLdb.connect(host="127.0.0.1", user=user, passwd=pw, db=user)

cursor = db.cursor()
cursor2 = db.cursor()

# Original rows (dupe is NULL) and duplicate rows (dupe is set, e.g. 'x').
cursor.execute("select * from delThisTable where dupe is null")
cursor2.execute("select * from delThisTable where dupe is not null")
result = cursor.fetchall()
result2 = cursor2.fetchall()

for record in result:
    for field in record:
        # Perform the comparison and the necessary updates.
        # How do I compare the record with the same key value and update the
        # original row's null field values with the non-null values from the
        # duplicate? Please fill this void...
        pass

cursor.close()
cursor2.close()
db.close()

Thanks, everyone!


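I can't quite understand the question. Do you want an algorithm, or an implementation for a specific framework? In fact, you don't need to iterate over the cursors and 'coalesce' the fields of the items yourself. Can you execute plain SQL in this case? If so, the query is simple. – 2010-04-09 20:26:21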


This is just simple test data. In reality there are several thousand rows and a few hundred columns, hence this approach. Thanks. – ThinkCode 2010-04-09 20:33:14


update delthistable t set ta = coalesce(dup.a,ta), tb = coalesce(dup.b,tb)... from (select * from delthistable where dupe = 'x') dup where t.dupe <> 'x' and t.key = dup.key; then: delete from delthistable where dupe <> 'x' – 2010-04-09 20:51:51
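As written, that comment's SQL would not run on MySQL: `UPDATE ... FROM` is PostgreSQL/SQL Server syntax, while MySQL's multi-table UPDATE uses a JOIN. In addition, `t.dupe <> 'x'` is NULL (not true) when dupe is NULL, so the original rows would never match; to keep the original's non-null values as in the desired output, the original column has to come first in COALESCE; and the final DELETE should remove the duplicate rows (dupe = 'x'). With hundreds of columns, the statement can be generated. A minimal sketch, assuming every column other than `key` and `dupe` is merged and each key has at most one duplicate row; `build_merge_sql` is a hypothetical helper:

```python
def build_merge_sql(table, columns, key_col="key", dupe_col="dupe"):
    """Return (update_sql, delete_sql) for MySQL; identifiers are backtick-quoted."""
    def q(name):
        # Backticks are required for `key`, which is a reserved word in MySQL.
        return "`" + name.replace("`", "``") + "`"

    merged = [c for c in columns if c not in (key_col, dupe_col)]
    # The original row's value wins; NULLs are filled from the duplicate.
    assignments = ", ".join(
        "t.{0} = COALESCE(t.{0}, dup.{0})".format(q(c)) for c in merged
    )
    update_sql = (
        "UPDATE {tbl} AS t JOIN {tbl} AS dup "
        "ON t.{key} = dup.{key} AND dup.{dupe} = 'x' "
        "SET {assign} "
        "WHERE t.{dupe} IS NULL"
    ).format(tbl=q(table), key=q(key_col), dupe=q(dupe_col), assign=assignments)
    delete_sql = "DELETE FROM {0} WHERE {1} = 'x'".format(q(table), q(dupe_col))
    return update_sql, delete_sql


update_sql, delete_sql = build_merge_sql(
    "delThisTable", ["a", "b", "c", "d", "e", "f", "key", "dupe"])
print(update_sql)
print(delete_sql)  # DELETE FROM `delThisTable` WHERE `dupe` = 'x'
```

With the real table, the column names could come from `[d[0] for d in cursor.description]` after running `SELECT * FROM delThisTable LIMIT 0`, and the two statements could then be run with `cursor.execute(...)` followed by `db.commit()`.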

Answer


OK, let's have some fun...

mysql> create table so (a int, b char, c char, d char, e char, f char, `key` int, dupe char); 
Query OK, 0 rows affected (0.05 sec) 

mysql> insert into so values (1, 'd', 'c', 'f', 'k', 'l', 1, 'x'), (2, 'g', null, 'h', null, 'j', 1, null), (3, 'i', null, 'h', 'u', 'u', 2, null), (4, 'u', 'r', null, null, 't', 2, 'x'); 
Query OK, 4 rows affected (0.00 sec) 
Records: 4 Duplicates: 0 Warnings: 0 

mysql> select * from so order by a; 
+------+------+------+------+------+------+------+------+
| a    | b    | c    | d    | e    | f    | key  | dupe |
+------+------+------+------+------+------+------+------+
|    1 | d    | c    | f    | k    | l    |    1 | x    |
|    2 | g    | NULL | h    | NULL | j    |    1 | NULL |
|    3 | i    | NULL | h    | u    | u    |    2 | NULL |
|    4 | u    | r    | NULL | NULL | t    |    2 | x    |
+------+------+------+------+------+------+------+------+
4 rows in set (0.00 sec) 

Python 2.6.5 (r265:79063, Mar 26 2010, 22:43:05) 
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin 
Type "help", "copyright", "credits" or "license" for more information. 
>>> import MySQLdb 
>>> db = MySQLdb.connect(host="127.0.0.1", db="test") 
>>> c = db.cursor() 
>>> c.execute("SELECT a, b, c, d, e, f, `key`, dupe FROM so") 
4L 
>>> rows = c.fetchall() 
>>> rows 
((1L, 'd', 'c', 'f', 'k', 'l', 1L, 'x'), (4L, 'u', 'r', None, None, 't', 2L, 'x'), (2L, 'g', None, 'h', None, 'j', 1L, None), (3L, 'i', None, 'h', 'u', 'u', 2L, None)) 
>>> data = dict() 
>>> for row in rows:
...     key, isDupe = row[-2], row[-1]
...     if key not in data:
...         data[key] = list(row[:-1])
...     else:
...         for i in range(len(row)-1):
...             if data[key][i] is None or (not isDupe and row[i] is not None):
...                 data[key][i] = row[i]
...
>>> data 
{1L: [2L, 'g', 'c', 'h', 'k', 'j', 1L], 2L: [3L, 'i', 'r', 'h', 'u', 'u', 2L]} 
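The loop above assumes `key` and `dupe` are the last two columns. For a wide table, their positions could instead be looked up by name from the cursor's `description` attribute (in the DB-API, the first item of each entry is the column name). A minimal sketch that applies the same rule as the loop above; `merge_by_key` is a hypothetical name:

```python
def merge_by_key(rows, column_names, key_col="key", dupe_col="dupe"):
    """Merge rows sharing a key; the dupe column is dropped from the result."""
    key_idx = column_names.index(key_col)
    dupe_idx = column_names.index(dupe_col)
    keep = [i for i in range(len(column_names)) if i != dupe_idx]
    merged = {}
    for row in rows:
        values = [row[i] for i in keep]
        is_dupe = row[dupe_idx] is not None
        key = row[key_idx]
        if key not in merged:
            merged[key] = values
            continue
        current = merged[key]
        for i, value in enumerate(values):
            # Fill a NULL, or let a non-dupe row's non-NULL value win.
            if current[i] is None or (not is_dupe and value is not None):
                current[i] = value
    return merged


# With a live cursor: names = [d[0] for d in c.description]
names = ["a", "b", "c", "d", "e", "f", "key", "dupe"]
rows = [(1, 'd', 'c', 'f', 'k', 'l', 1, 'x'), (4, 'u', 'r', None, None, 't', 2, 'x'),
        (2, 'g', None, 'h', None, 'j', 1, None), (3, 'i', None, 'h', 'u', 'u', 2, None)]
print(merge_by_key(rows, names))
# {1: [2, 'g', 'c', 'h', 'k', 'j', 1], 2: [3, 'i', 'r', 'h', 'u', 'u', 2]}
```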

Thanks for the solution. I have a few hundred rows in my actual table. How do I adapt your code to my actual table? Thanks again! – ThinkCode 2010-04-09 20:45:25


Does the data in the table fit in your RAM? If so, I don't think any adaptation is needed. – Messa 2010-04-09 20:53:14
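If the table does not fit in memory, one option (an assumption, not part of the answer) is to let MySQL sort the rows with ``ORDER BY `key` `` and merge one key group at a time with `itertools.groupby`, so only one group is held at once. The sketch below runs the grouping on an in-memory list; with MySQLdb, an unbuffered `MySQLdb.cursors.SSCursor` could supply the rows instead of `fetchall()`. `merge_groups` is a hypothetical name:

```python
from itertools import groupby


def merge_groups(sorted_rows, key_idx, dupe_idx):
    """Yield one merged row (dupe column dropped) per key.

    sorted_rows must already be ordered by the key column, because
    groupby only groups adjacent rows with equal keys.
    """
    for _, group in groupby(sorted_rows, key=lambda r: r[key_idx]):
        merged = None
        for row in group:
            values = [v for i, v in enumerate(row) if i != dupe_idx]
            is_dupe = row[dupe_idx] is not None
            if merged is None:
                merged = values
                continue
            for i, value in enumerate(values):
                # Same rule as the answer: fill NULLs, original values win.
                if merged[i] is None or (not is_dupe and value is not None):
                    merged[i] = value
        yield merged


sorted_rows = [(1, 'd', 'c', 'f', 'k', 'l', 1, 'x'), (2, 'g', None, 'h', None, 'j', 1, None),
               (4, 'u', 'r', None, None, 't', 2, 'x'), (3, 'i', None, 'h', 'u', 'u', 2, None)]
for merged_row in merge_groups(sorted_rows, key_idx=6, dupe_idx=7):
    print(merged_row)
```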


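It works! Thanks a lot. Now I'm trying to figure out the best way to dump the final data into a MySQL table. Some fields are None, and the dates are in date.datetime format. Is there a simple way to dump this into MySQL? – ThinkCode 2010-04-09 21:55:17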
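MySQLdb fills `%s` placeholders itself and converts Python `None` to `NULL` and `datetime` objects to date/time literals, so the merged rows can be passed as parameters instead of being formatted into the SQL string. A minimal sketch, assuming a target table `merged_table` and column names that are both hypothetical; `build_insert` is a hypothetical helper:

```python
import datetime


def build_insert(table, columns, rows):
    """Return (sql, params) for cursor.executemany with MySQLdb's %s paramstyle."""
    col_list = ", ".join("`%s`" % c for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = "INSERT INTO `%s` (%s) VALUES (%s)" % (table, col_list, placeholders)
    # Values stay out of the SQL string; the driver quotes them, sending
    # None as NULL and datetime objects as date/time literals.
    params = [tuple(row) for row in rows]
    return sql, params


sql, params = build_insert(
    "merged_table",          # hypothetical target table
    ["a", "b", "created"],   # hypothetical column names
    [[2, "g", datetime.datetime(2010, 4, 9, 21, 55)], [3, None, None]],
)
print(sql)  # INSERT INTO `merged_table` (`a`, `b`, `created`) VALUES (%s, %s, %s)
# With a live connection: cursor.executemany(sql, params); db.commit()
```

For the answer's `data` dict, the rows would be `data.values()` and the columns the table's names without `dupe`.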
