2011-01-14 210 views
2

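Improving SQL query performance

I have three tables that store the actual person data (person), the teams (team), and the entries (athlete). The schema of the three tables is: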

Database schema

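Each team may have two or more athletes.

I am trying to write a query that produces the most frequent pairs, meaning the people who play together in two-person teams. I came up with the following query: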

SELECT p1.surname, p1.name, p2.surname, p2.name, COUNT(*) AS freq 
FROM person p1, athlete a1, person p2, athlete a2 
WHERE 
    p1.id = a1.person_id AND 
    p2.id = a2.person_id AND 
    a1.team_id = a2.team_id AND 
    a1.team_id IN 
      (SELECT team.id 
      FROM team, athlete 
      WHERE team.id = athlete.team_id 
      GROUP BY team.id 
      HAVING COUNT(*) = 2) 
GROUP BY p1.id 
ORDER BY freq DESC 

Obviously this is a resource-intensive query. Is there a way to improve it?

+0

Will indexes help? – Sudantha 2011-01-14 10:31:27

+0

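Not really, everything is already properly indexed. The problem is that the database contains several hundred thousand rows (person: 10K, team: 450K, athlete: 900K) – Anax 2011-01-14 10:37:05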

+1

The subquery has no join clause - do you need both the team and athlete tables in the subquery? – 2011-01-14 10:37:20

Answers

4
SELECT id 
FROM team, athlete 
WHERE team.id = athlete.team_id 
GROUP BY team.id 
HAVING COUNT(*) = 2 

Performance tip 1: you only need the athlete table.
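
The tip can be tried on a few rows. The following is a minimal sketch using Python's built-in sqlite3 module rather than MySQL, with made-up sample rows; it assumes every athlete.team_id refers to an existing team, in which case the join to team adds nothing to the result.

```python
import sqlite3

# Hypothetical miniature of the question's athlete table (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE athlete (id INTEGER PRIMARY KEY, person_id INTEGER, team_id INTEGER)"
)
conn.executemany(
    "INSERT INTO athlete (person_id, team_id) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1),   # team 1: three athletes
     (3, 2), (4, 2),           # team 2: two athletes
     (1, 3), (5, 3)],          # team 3: two athletes
)

# team_id is already stored in athlete, so grouping athlete alone is enough.
two_person_teams = [
    row[0]
    for row in conn.execute(
        "SELECT team_id FROM athlete "
        "GROUP BY team_id HAVING COUNT(*) = 2 ORDER BY team_id"
    )
]
print(two_person_teams)  # [2, 3]
```

Dropping the team table removes one join from the subquery, which is the part that is evaluated over the whole athlete table.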

2

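You might consider the following approach, which uses triggers to maintain counters in the team and person tables, so that you can easily find out which teams have 2 or more athletes and which people are in 2 or more teams.

(Note: I have removed the surrogate id key from your athlete table and replaced it with a composite key, which enforces data integrity better. I have also renamed athlete to team_athlete.)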

drop table if exists person; 
create table person 
(
person_id int unsigned not null auto_increment primary key, 
name varchar(255) not null, 
team_count smallint unsigned not null default 0 
) 
engine=innodb; 

drop table if exists team; 
create table team 
(
team_id int unsigned not null auto_increment primary key, 
name varchar(255) not null, 
athlete_count smallint unsigned not null default 0, 
key (athlete_count) 
) 
engine=innodb; 

drop table if exists team_athlete; 
create table team_athlete 
(
team_id int unsigned not null, 
person_id int unsigned not null, 
primary key (team_id, person_id), -- note clustered composite PK 
key person(person_id) -- added index 
) 
engine=innodb; 

delimiter # 

create trigger team_athlete_after_ins_trig after insert on team_athlete 
for each row 
begin 
    update team set athlete_count = athlete_count+1 where team_id = new.team_id; 
    update person set team_count = team_count+1 where person_id = new.person_id; 
end# 

delimiter ; 

insert into person (name) values ('p1'),('p2'),('p3'),('p4'),('p5'); 
insert into team (name) values ('t1'),('t2'),('t3'),('t4'); 

insert into team_athlete (team_id, person_id) values 
(1,1),(1,2),(1,3), 
(2,3),(2,4), 
(3,1),(3,5); 

select * from team_athlete; 
select * from person; 
select * from team; 

select * from team where athlete_count >= 2; 
select * from person where team_count >= 2; 
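
The script above only maintains the counters on insert. As a rough sketch of how the counters could also be kept correct when a row is removed, here is the same idea in Python's sqlite3 (SQLite trigger syntax, not MySQL). The delete trigger and its name are my assumption and are not part of the answer's script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                     team_count INTEGER NOT NULL DEFAULT 0);
CREATE TABLE team (team_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                   athlete_count INTEGER NOT NULL DEFAULT 0);
CREATE TABLE team_athlete (team_id INTEGER NOT NULL, person_id INTEGER NOT NULL,
                           PRIMARY KEY (team_id, person_id));

-- Same bookkeeping as the MySQL insert trigger above.
CREATE TRIGGER team_athlete_after_ins AFTER INSERT ON team_athlete
BEGIN
    UPDATE team SET athlete_count = athlete_count + 1 WHERE team_id = NEW.team_id;
    UPDATE person SET team_count = team_count + 1 WHERE person_id = NEW.person_id;
END;

-- Assumed counterpart (not in the answer): undo the counts on delete.
CREATE TRIGGER team_athlete_after_del AFTER DELETE ON team_athlete
BEGIN
    UPDATE team SET athlete_count = athlete_count - 1 WHERE team_id = OLD.team_id;
    UPDATE person SET team_count = team_count - 1 WHERE person_id = OLD.person_id;
END;

INSERT INTO person (name) VALUES ('p1'),('p2'),('p3'),('p4'),('p5');
INSERT INTO team (name) VALUES ('t1'),('t2'),('t3'),('t4');
INSERT INTO team_athlete (team_id, person_id) VALUES
    (1,1),(1,2),(1,3),(2,3),(2,4),(3,1),(3,5);
""")


def athlete_counts(connection):
    """Return {team_id: athlete_count} as maintained by the triggers."""
    return dict(connection.execute("SELECT team_id, athlete_count FROM team"))


counts_before = athlete_counts(conn)
conn.execute("DELETE FROM team_athlete WHERE team_id = 1 AND person_id = 2")
counts_after = athlete_counts(conn)

print(counts_before)  # {1: 3, 2: 2, 3: 2, 4: 0}
print(counts_after)   # {1: 2, 2: 2, 3: 2, 4: 0}
```

Without a delete trigger, team 1 would still report three athletes after the removal and would be missed by any filter on athlete_count = 2.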

EDIT

Added the following because I initially misunderstood the question:

Create a view that only includes teams with 2 people.

drop view if exists teams_with_2_players_view; 

create view teams_with_2_players_view as 
select 
t.team_id, 
ta.person_id, 
p.name as person_name 
from 
team t 
inner join team_athlete ta on t.team_id = ta.team_id 
inner join person p on ta.person_id = p.person_id 
where 
t.athlete_count = 2; 

Now use the view to find the most frequently occurring pairs of people.

select 
p1.person_id as p1_person_id, 
p1.person_name as p1_person_name, 
p2.person_id as p2_person_id, 
p2.person_name as p2_person_name, 
count(*) as counter 
from 
teams_with_2_players_view p1 
inner join teams_with_2_players_view p2 on 
    p2.team_id = p1.team_id and p2.person_id > p1.person_id 
group by 
p1.person_id, p2.person_id 
order by 
counter desc; 

Hope this helps :)

EDIT 2: checking performance

select count(*) as counter from person; 

+---------+ 
| counter | 
+---------+ 
|   10000 |
+---------+ 
1 row in set (0.00 sec) 

select count(*) as counter from team; 

+---------+ 
| counter | 
+---------+ 
|  450000 |
+---------+ 
1 row in set (0.08 sec) 

select count(*) as counter from team where athlete_count = 2; 

+---------+ 
| counter | 
+---------+ 
|  112644 |
+---------+ 
1 row in set (0.03 sec) 

select count(*) as counter from team_athlete; 

+---------+ 
| counter | 
+---------+ 
| 1124772 | 
+---------+ 
1 row in set (0.21 sec) 

explain 
select 
p1.person_id as p1_person_id, 
p1.person_name as p1_person_name, 
p2.person_id as p2_person_id, 
p2.person_name as p2_person_name, 
count(*) as counter 
from 
teams_with_2_players_view p1 
inner join teams_with_2_players_view p2 on 
    p2.team_id = p1.team_id and p2.person_id > p1.person_id 
group by 
p1.person_id, p2.person_id 
order by 
counter desc 
limit 10; 

+----+-------------+-------+--------+---------------------+-------------+---------+---------------------+-------+----------------------------------------------+
| id | select_type | table | type   | possible_keys       | key         | key_len | ref                 | rows  | Extra                                        |
+----+-------------+-------+--------+---------------------+-------------+---------+---------------------+-------+----------------------------------------------+
|  1 | SIMPLE      | t     | ref    | PRIMARY,t_count_idx | t_count_idx | 2       | const               | 86588 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | t     | eq_ref | PRIMARY,t_count_idx | PRIMARY     | 4       | foo_db.t.team_id    |     1 | Using where                                  |
|  1 | SIMPLE      | ta    | ref    | PRIMARY,person      | PRIMARY     | 4       | foo_db.t.team_id    |     1 | Using index                                  |
|  1 | SIMPLE      | p     | eq_ref | PRIMARY             | PRIMARY     | 4       | foo_db.ta.person_id |     1 |                                              |
|  1 | SIMPLE      | ta    | ref    | PRIMARY,person      | PRIMARY     | 4       | foo_db.t.team_id    |     1 | Using where; Using index                     |
|  1 | SIMPLE      | p     | eq_ref | PRIMARY             | PRIMARY     | 4       | foo_db.ta.person_id |     1 |                                              |
+----+-------------+-------+--------+---------------------+-------------+---------+---------------------+-------+----------------------------------------------+

6 rows in set (0.00 sec) 

select 
p1.person_id as p1_person_id, 
p1.person_name as p1_person_name, 
p2.person_id as p2_person_id, 
p2.person_name as p2_person_name, 
count(*) as counter 
from 
teams_with_2_players_view p1 
inner join teams_with_2_players_view p2 on 
    p2.team_id = p1.team_id and p2.person_id > p1.person_id 
group by 
p1.person_id, p2.person_id 
order by 
counter desc 
limit 10; 

+--------------+----------------+--------------+----------------+---------+
| p1_person_id | p1_person_name | p2_person_id | p2_person_name | counter |
+--------------+----------------+--------------+----------------+---------+
|          221 | person 221     |          739 | person 739     |       5 |
|          129 | person 129     |          249 | person 249     |       5 |
|          874 | person 874     |          877 | person 877     |       4 |
|          717 | person 717     |          949 | person 949     |       4 |
|          395 | person 395     |          976 | person 976     |       4 |
|          415 | person 415     |          828 | person 828     |       4 |
|          287 | person 287     |          470 | person 470     |       4 |
|          455 | person 455     |          860 | person 860     |       4 |
|           13 | person 13      |           29 | person 29      |       4 |
|            1 | person 1       |          743 | person 743     |       4 |
+--------------+----------------+--------------+----------------+---------+
10 rows in set (2.02 sec) 
0

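Shouldn't there be an additional constraint, a1.person_id != a2.person_id, to avoid producing a pair made up of the same player twice? It may not change the final ordering of the results, but it does affect the accuracy of the counts.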
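
To illustrate the point, here is a small sketch in Python's sqlite3 with one invented two-person team. It compares the unconstrained self-join with the != constraint suggested here and with the stricter < ordering used in other answers in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athlete (person_id INTEGER, team_id INTEGER)")
# One hypothetical two-person team: persons 1 and 2 on team 10.
conn.executemany("INSERT INTO athlete VALUES (?, ?)", [(1, 10), (2, 10)])

BASE_QUERY = (
    "SELECT a1.person_id, a2.person_id FROM athlete a1 "
    "JOIN athlete a2 ON a1.team_id = a2.team_id "
)


def pairs(extra_condition=""):
    """Run the self-join with an optional WHERE clause and return sorted pairs."""
    return sorted(conn.execute(BASE_QUERY + extra_condition).fetchall())


unconstrained = pairs()
no_self_pairs = pairs("WHERE a1.person_id != a2.person_id")
ordered_only = pairs("WHERE a1.person_id < a2.person_id")

print(unconstrained)  # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(no_self_pairs)  # [(1, 2), (2, 1)]
print(ordered_only)   # [(1, 2)]
```

With != each real pair is still reported twice, once per ordering; the < form reports it exactly once.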

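If possible, you could add a column called athlete_count (with an index) to the team table and update it whenever an athlete is added or removed. That avoids the subquery that has to scan the whole athlete table to find the two-player teams. Also, if I understand the original query correctly, grouping by p1.id only gives you the number of times a player has played in a two-player team, not the number of times each pair has played together. You probably need GROUP BY p1.id, p2.id.

0

REVISED: based on exactly two per team

With an innermost pre-aggregate restricted to teams of exactly two people, I can use MIN() and MAX() to get each team's PersonA and PersonB into a single row per team. That way the person IDs are always in low-high order, so the pairs compare consistently across teams. Then I can query the COUNT by the common Mate1 and Mate2 across all teams and join directly to get their names.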

SELECT STRAIGHT_JOIN
    p1.surname,
    p1.name,
    p2.surname,
    p2.name,
    TeamAggregates.CommonTeams
FROM
    (SELECT PreQueryTeams.Mate1,
            PreQueryTeams.Mate2,
            COUNT(*) CommonTeams
     FROM
        -- one row per two-person team, IDs in low-high order
        (SELECT team_id,
                MIN(person_id) Mate1,
                MAX(person_id) Mate2
         FROM athlete
         GROUP BY team_id
         HAVING COUNT(*) = 2) PreQueryTeams
     GROUP BY
        PreQueryTeams.Mate1,
        PreQueryTeams.Mate2) TeamAggregates,
    person p1,
    person p2
WHERE
    -- the person key is id in the question's schema
    TeamAggregates.Mate1 = p1.id
    AND TeamAggregates.Mate2 = p2.id
ORDER BY
    TeamAggregates.CommonTeams DESC   -- most frequent pairs first
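
The MIN()/MAX() pre-aggregation can be checked on a handful of rows. This is a minimal sketch in Python's sqlite3 with invented data; the join to person is left out so that only the pairing logic is shown, and the column aliases are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athlete (person_id INTEGER, team_id INTEGER)")
conn.executemany("INSERT INTO athlete VALUES (?, ?)", [
    (1, 1), (2, 1),
    (2, 2), (1, 2),          # same pair, inserted in the opposite order
    (1, 3), (2, 3),
    (3, 4), (4, 4),
    (1, 5), (2, 5), (3, 5),  # three people: dropped by HAVING COUNT(*) = 2
])

pair_counts = conn.execute("""
    SELECT mate1, mate2, COUNT(*) AS common_teams
    FROM (SELECT team_id,
                 MIN(person_id) AS mate1,
                 MAX(person_id) AS mate2
          FROM athlete
          GROUP BY team_id
          HAVING COUNT(*) = 2) AS two_person_teams
    GROUP BY mate1, mate2
    ORDER BY common_teams DESC, mate1, mate2
""").fetchall()

print(pair_counts)  # [(1, 2, 3), (3, 4, 1)]
```

Because each two-person team collapses to a single (low, high) row, no self-join of the athlete table is needed, and the insertion order of the two athletes does not matter.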

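ORIGINAL ANSWER: for teams with any number of teammates

I would do it as follows. The inner pre-query first joins every possible combination of people within each individual team, but requiring person1 < person2 eliminates counting the same person as both person1 and person2. It also prevents reversed duplicates, where the higher-numbered person ID comes first. For example: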

athlete  person  team
1        1       1
2        2       1
3        3       1
4        4       1
5        1       2
6        3       2
7        4       2
8        1       3
9        4       3

So, from team 1 you would get person pairs of 
1,2 1,3 1,4  2,3  2,4 3,4 
and NOT get reversed duplicates such as 
2,1 3,1 4,1  3,2  4,2 4,3 
nor same person 
1,1 2,2 3,3 4,4 


Then from team 2, you would have pairs of
1,3 1,4 3,4 

Finally in team 3 the single pair of 
1,4 

thus teammates 1,4 have occurred in 3 common teams.
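
The worked example above can be checked mechanically. Here is a minimal sketch in Python's sqlite3 that loads the nine sample rows and applies the person1 < person2 self-join just described; names are not joined in, and the column aliases are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athlete (id INTEGER PRIMARY KEY, person_id INTEGER, team_id INTEGER)")
# The nine (person, team) rows from the sample table above.
conn.executemany("INSERT INTO athlete (person_id, team_id) VALUES (?, ?)", [
    (1, 1), (2, 1), (3, 1), (4, 1),
    (1, 2), (3, 2), (4, 2),
    (1, 3), (4, 3),
])

common_teams = conn.execute("""
    SELECT a1.person_id AS person_id1,
           a2.person_id AS person_id2,
           COUNT(*)     AS common_teams
    FROM athlete a1
    JOIN athlete a2
      ON a1.team_id = a2.team_id
     AND a1.person_id < a2.person_id   -- no self pairs, no reversed duplicates
    GROUP BY a1.person_id, a2.person_id
    HAVING COUNT(*) > 1
    ORDER BY common_teams DESC, person_id1, person_id2
""").fetchall()

print(common_teams)  # [(1, 4, 3), (1, 3, 2), (3, 4, 2)]
```

Pair 1,4 comes out on top with 3 common teams, matching the hand count; pairs that share only one team are filtered out by the HAVING clause.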

SELECT STRAIGHT_JOIN
    p1.surname,
    p1.name,
    p2.surname,
    p2.name,
    PreQuery.CommonTeams
FROM
    (SELECT
        a1.Person_ID Person_ID1,
        a2.Person_ID Person_ID2,
        COUNT(*) CommonTeams
     FROM
        athlete a1,
        athlete a2
     WHERE
        a1.Team_ID = a2.Team_ID
        AND a1.Person_ID < a2.Person_ID   -- no self pairs, no reversed duplicates
     GROUP BY
        1, 2
     HAVING CommonTeams > 1) PreQuery,
    person p1,
    person p2
WHERE
    PreQuery.Person_ID1 = p1.id
    AND PreQuery.Person_ID2 = p2.id
ORDER BY
    PreQuery.CommonTeams DESC   -- most frequent pairs first
0

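Here are some tips for improving the performance of SQL SELECT queries:

  • Use SET NOCOUNT ON; it helps reduce network traffic and so improves performance.
  • Use fully qualified procedure names (for example database.schema.objectname).
  • Use sp_executesql instead of EXECUTE for dynamic queries.
  • Do not use select *; use select column1, column2, .. in IF EXISTS and SELECT operations.
  • Avoid naming user stored procedures like sp_procedureName, because when a stored procedure name begins with sp_, SQL Server searches the master database first, which can reduce query performance.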