2012-12-09 78 views

Apologies in advance for the mixed formatting; I've broken it up a bit so it reads better. The slow query has a lot of joins.

I can usually use the cost figures to find the slow part of a query, but in this one nothing stands out. For some reason, when I sort by random(), it slows from a few milliseconds to about 8 seconds:

EXPLAIN for: 
    SELECT songs.* 
    FROM "songs" 
    INNER JOIN "posts" ON "posts"."id" = "songs"."post_id" 
    INNER JOIN broadcasts on broadcasts.song_id = songs.id 
    inner join stations as ss on ss.id = broadcasts.station_id 
    inner join artists on artists.station_slug = ss.slug 
    inner join artists_genres on artists_genres.artist_id = artists.id 
    inner join genres on genres.id = artists_genres.genre_id 
    inner join broadcasts bb on bb.song_id = songs.id 
    inner join stations sss on sss.id = bb.station_id 
    inner join blogs b on b.station_slug = sss.slug 
    inner join blogs_genres on blogs_genres.blog_id = b.id 
    inner join genres gg on gg.id = blogs_genres.genre_id 
    INNER JOIN "stations" on "stations"."blog_id" = "songs"."blog_id" 
    INNER JOIN blogs on blogs.id = posts.blog_id 
    WHERE "songs"."working" = 't' 
    AND "songs"."processed" = 't' 
    AND (genres.id = 20 and gg.id = 20) 
    AND (NOT EXISTS(
     SELECT NULL 
     FROM authors 
     WHERE authors.song_id = songs.id 
     AND authors.role IN ('remixer', 'mashup') 
    )) 
    AND (songs.source != 'youtube') 
    AND (songs.seconds < 600) 
    AND (songs.matching_id = songs.id) 
    ORDER BY random() desc 
                              QUERY PLAN 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
    Sort (cost=3770.97..3770.98 rows=1 width=2132) 
    Sort Key: (random()) 
    -> Nested Loop (cost=3142.28..3770.96 rows=1 width=2132) 
      Join Filter: ((b.station_slug)::text = (sss.slug)::text) 
      -> Nested Loop Anti Join (cost=3142.28..3762.68 rows=1 width=2150) 
       -> Nested Loop (cost=2526.24..3135.35 rows=1 width=2150) 
         Join Filter: (posts.blog_id = blogs.id) 
         -> Nested Loop (cost=2526.24..3128.65 rows=1 width=2127) 
          -> Nested Loop (cost=2526.24..3127.39 rows=1 width=2131) 
            -> Hash Join (cost=2515.19..3112.87 rows=1 width=2113) 
             Hash Cond: (stations.blog_id = songs.blog_id) 
             -> Seq Scan on stations (cost=0.00..475.49 rows=19549 width=40) 
             -> Hash (cost=2515.15..2515.15 rows=3 width=2077) 
               -> Nested Loop (cost=1265.66..2515.15 rows=3 width=2077) 
                -> Nested Loop (cost=1265.66..2506.44 rows=1 width=2077) 
                  -> Nested Loop (cost=1265.66..2499.97 rows=1 width=1809) 
                   -> Nested Loop (cost=1265.66..2498.71 rows=1 width=1813) 
                     -> Hash Join (cost=1265.66..2357.90 rows=277 width=8) 
                      Hash Cond: (broadcasts.station_id = ss.id) 
                      -> Seq Scan on broadcasts (cost=0.00..895.98 rows=51598 width=8) 
                      -> Hash (cost=1264.35..1264.35 rows=105 width=8) 
                        -> Hash Join (cost=714.50..1264.35 rows=105 width=8) 
                         Hash Cond: ((ss.slug)::text = (artists.station_slug)::text) 
                         -> Seq Scan on stations ss (cost=0.00..475.49 rows=19549 width=18) 
                         -> Hash (cost=713.20..713.20 rows=104 width=18) 
                           -> Nested Loop (cost=5.06..713.20 rows=104 width=18) 
                            -> Bitmap Heap Scan on artists_genres (cost=5.06..99.85 rows=104 width=8) 
                              Recheck Cond: (genre_id = 20) 
                              -> Bitmap Index Scan on index_artists_genres_on_genre_id_and_artist_id (cost=0.00..5.03 rows=104 width=0) 
                               Index Cond: (genre_id = 20) 
                            -> Index Scan using artists_pkey on artists (cost=0.00..5.89 rows=1 width=18) 
                              Index Cond: (artists.id = artists_genres.artist_id) 
                     -> Index Scan using index_songs_on_shared_id on songs (cost=0.00..0.50 rows=1 width=1805) 
                      Index Cond: (songs.matching_id = broadcasts.song_id) 
                      Filter: (songs.working AND songs.processed AND ((songs.source)::text <> 'youtube'::text) AND (songs.seconds < 600) AND (songs.id = songs.matching_id)) 
                   -> Seq Scan on genres (cost=0.00..1.25 rows=1 width=4) 
                     Filter: (genres.id = 20) 
                  -> Index Scan using posts_pkey on posts (cost=0.00..6.46 rows=1 width=268) 
                   Index Cond: (posts.id = songs.post_id) 
                -> Index Scan using index_songs_stations_on_song_id_and_station_id on broadcasts bb (cost=0.00..8.67 rows=3 width=8) 
                  Index Cond: (bb.song_id = broadcasts.song_id) 
            -> Hash Join (cost=11.05..14.39 rows=13 width=18) 
             Hash Cond: (blogs_genres.blog_id = b.id) 
             -> Bitmap Heap Scan on blogs_genres (cost=4.35..7.51 rows=13 width=8) 
               Recheck Cond: (genre_id = 20) 
               -> Bitmap Index Scan on index_blogs_genres_on_genre_id_and_blog_id (cost=0.00..4.35 rows=13 width=0) 
                Index Cond: (genre_id = 20) 
             -> Hash (cost=5.20..5.20 rows=120 width=18) 
               -> Seq Scan on blogs b (cost=0.00..5.20 rows=120 width=18) 
          -> Seq Scan on genres gg (cost=0.00..1.25 rows=1 width=4) 
            Filter: (gg.id = 20) 
         -> Seq Scan on blogs (cost=0.00..5.20 rows=120 width=31) 
       -> Bitmap Heap Scan on authors (cost=616.04..623.56 rows=2 width=4) 
         Recheck Cond: (authors.song_id = songs.id) 
         Filter: ((authors.role)::text = ANY ('{remixer,mashup}'::text[])) 
         -> Bitmap Index Scan on index_authors_on_artist_id_and_song_id_and_role (cost=0.00..616.04 rows=2 width=0) 
          Index Cond: (authors.song_id = songs.id) 
      -> Index Scan using stations_pkey on stations sss (cost=0.00..8.27 rows=1 width=18) 
       Index Cond: (sss.id = bb.station_id) 

Any ideas what is causing the slowdown? Using Postgres 9.1 on a Mac.


I think I would refactor this... it's overly complex and could be solved with another table.


What is the reason for the ORDER BY random()?


Without seeing the table schemas and indexes, we can't reliably answer an optimization question.

Answers


You want to make sure you have indexes built on all of the columns you are joining on. You should also try to avoid nested subqueries (in my experience), so you may want to rewrite the NOT EXISTS() part.
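As a sketch of what that rewrite could look like (using only the songs and authors columns shown in the original query), the NOT EXISTS can be expressed as a LEFT JOIN anti-join:

```sql
-- Sketch only: anti-join rewrite of the NOT EXISTS clause.
-- Note the plan above already shows a "Nested Loop Anti Join",
-- so this may not change what the planner does in practice.
SELECT songs.*
FROM songs
LEFT JOIN authors
       ON authors.song_id = songs.id
      AND authors.role IN ('remixer', 'mashup')
WHERE authors.song_id IS NULL;
```

Since PostgreSQL already plans NOT EXISTS as an anti-join here, any speedup would more likely come from an index covering (song_id, role) on authors than from the rewrite itself.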


As far as I can tell, this is why order by random() has such a large effect on query speed. I'm not certain, and without EXPLAIN ANALYZE output for both versions of the query I can't know for sure, but I suspect the problem relates to how volatile functions are handled. A volatile function cannot be inlined, and in this case PostgreSQL may be going back and computing a random() value for each row, then sorting, after the tables have already been scanned. That may be your problem.

A better approach may be to add random() to the column list and then sort by the alias or column number. That allows the random() values to be computed earlier in the process, where it can be done more efficiently.
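A minimal sketch of that suggestion, keeping the rest of the original query unchanged:

```sql
-- Sketch only: compute random() once per row in the select list
-- and sort by the alias (or by column position) instead of
-- calling random() in the ORDER BY clause.
SELECT songs.*, random() AS sort_key
FROM songs
-- ... same joins and WHERE clauses as the original query ...
ORDER BY sort_key;
```

Whether this actually changes the plan is worth verifying with EXPLAIN ANALYZE on both forms, since the planner may treat the two spellings identically.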