Improve query speed: simple SELECT with LIKE

I have inherited a large legacy codebase that runs on Django 1.5, and my current task is to speed up a section of the site that takes ~1 min to load.

I profiled the application, and the main culprit is the following query (stripped for brevity):

SELECT COUNT(*) FROM "entities_entity" WHERE (
    "entities_entity"."date_filed" <= '2016-01-21' AND (
    UPPER("entities_entity"."entity_city_state_zip"::text) LIKE UPPER('%Atherton%') OR 
    UPPER("entities_entity"."entity_city_state_zip"::text) LIKE UPPER('%Berkeley%') OR 
    -- 34 more of these 
    UPPER("entities_entity"."agent_city_state_zip"::text) LIKE UPPER('%Atherton%') OR 
    UPPER("entities_entity"."agent_city_state_zip"::text) LIKE UPPER('%Berkeley%') OR 
    -- 34 more of these 
) 
)

Basically it is one big query over two fields, entity_city_state_zip and agent_city_state_zip, both of which are character varying(200) | not null columns.

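The query is executed twice (!), taking 18814.02 ms each time, and once more with the COUNT replaced by a SELECT, which takes an additional 20216.49 ms (I am going to cache the COUNT result).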

The EXPLAIN output looks like this:

Aggregate (cost=175867.33..175867.34 rows=1 width=0) (actual time=17841.502..17841.502 rows=1 loops=1) 
    -> Seq Scan on entities_entity (cost=0.00..175858.95 rows=3351 width=0) (actual time=0.849..17818.551 rows=145075 loops=1) 
     Filter: ((date_filed <= '2016-01-21'::date) AND ((upper((entity_city_state_zip)::text) ~~ '%ATHERTON%'::text) OR (upper((entity_city_state_zip)::text) ~~ '%BERKELEY%'::text) (..skipped..) OR (upper((agent_city_state_zip)::text) ~~ '%ATHERTON%'::text) OR (upper((agent_city_state_zip)::text) ~~ '%BERKELEY%'::text) OR (upper((agent_city_state_zip)::text) ~~ '%BURLINGAME%'::text))) 
     Rows Removed by Filter: 310249 
Planning time: 2.110 ms 
Execution time: 17841.944 ms

I tried indexes of various types on entity_city_state_zip and agent_city_state_zip, in combinations such as:

CREATE INDEX ON entities_entity (upper(entity_city_state_zip)); 
CREATE INDEX ON entities_entity (upper(agent_city_state_zip));

or using varchar_pattern_ops, with no luck.

The server generates the query with something like this:

qs = queryset.filter(Q(entity_city_state_zip__icontains=all_city_list) |
                     Q(agent_city_state_zip__icontains=all_city_list))

I don't know what else to try.

Thanks!

Source

2016-01-21 NicoSantangelo

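'LIKE' queries whose pattern starts with '%...' will not use any B-tree index (including the 'xxx_pattern_ops' ones). Those indexes are only chosen when the pattern is left-anchored (e.g. col LIKE 'XXX%' or col ~ '^XXX'). You can try the ['pg_trgm' module](http://www.postgresql.org/docs/current/static/pgtrgm.html), [which gives you a suitable index](http://dba.stackexchange.com/questions/10694/pattern-matching-with-like-similar-to-or-regular-expressions-in-postgresql/10696). (Also, you can use 'ILIKE' instead of 'LIKE' with the 'lower()'/'upper()' calls.) – pozs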

@pozs I didn't know that! I'll give it a try – NicoSantangelo

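I would at least want to know what effect the 'Seq Scan' has, and whether an index scan could replace it. See what effect 'set enable_seqscan = false' has on the plan. Is the database running from an SSD? –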

I think the problem is in the "multiple LIKE" and the UPPER("entities_entity ...

You can use:

WHERE entities_entity.entity_city_state_zip SIMILAR TO '%Atherton%|%Berkeley%'

Or something like this:

WHERE entities_entity.entity_city_state_zip LIKE ANY(ARRAY['%Atherton%', '%Berkeley%'])

Edit

Regarding raw SQL queries in Django:
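
A minimal sketch of how the LIKE ANY form might be passed through Django, not a tested solution. It assumes the psycopg2 backend (which adapts a Python list to a PostgreSQL array) and QuerySet.extra(). The helper names escape_like and build_city_filter are made up for illustration, and ILIKE stands in for the UPPER(...) LIKE UPPER(...) pairs of the original query.

```python
def escape_like(term):
    """Escape LIKE metacharacters so a city name is matched literally."""
    # Backslash is PostgreSQL's default LIKE escape character; escape it first.
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_city_filter(cities,
                      columns=("entity_city_state_zip", "agent_city_state_zip")):
    """Return (where_sql, params) for a 'contains any of these cities' test."""
    patterns = ["%" + escape_like(city) + "%" for city in cities]
    # One ILIKE ANY(...) test per column, OR-ed together. The column names are
    # the fixed identifiers from the question, never user input.
    tests = ['"{0}" ILIKE ANY(%s)'.format(col) for col in columns]
    where_sql = "(" + " OR ".join(tests) + ")"
    # Each %s placeholder receives the whole pattern list as one array parameter.
    params = [list(patterns) for _ in columns]
    return where_sql, params


where_sql, params = build_city_filter(["Atherton", "Berkeley"])

# Hypothetical usage with the queryset from the question (assumption, untested):
# qs = queryset.extra(where=[where_sql], params=params)
```

This mainly shortens the generated SQL. Whether it is actually faster would need to be checked with EXPLAIN ANALYZE, since a pattern with a leading % still cannot use a plain B-tree index.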

Regards

Source

2016-01-21 15:41:29

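I didn't know 'LIKE' supported 'ANY' with an array as the value. My problem is getting Django to generate that query. I'll google around a bit and see what I can find – NicoSantangelo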

That is supported by Postgres. As for Django... I think "raw queries" are what you want: https://docs.djangoproject.com/es/1.9/topics/db/sql/ and also read this link: http://stackoverflow.com/questions/31698103/how-do-i-execute-raw-sql-in-a-django-migration –

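I watched a Pluralsight course that addressed a very similar problem. The course is "Postgres for .NET Developers", and the relevant part is the "Full Text Search" section of "Fun with Simple SQL".

To summarize their solution, using your example:

Create a new column in your table that will represent your entity_city_state_zip as a tsvector: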

create table entities_entity (
    date_filed date, 
    entity_city_state_zip text, 
    csz_search tsvector not null -- add this column 
);

Initially, you may have to make it nullable, then populate the data and make it non-nullable.

update entities_entity 
set csz_search = to_tsvector (entity_city_state_zip);

Next, create a trigger that will populate the new field any time a record is added or modified:

create trigger entities_insert_update 
before insert or update on entities_entity 
for each row execute procedure 
tsvector_update_trigger(csz_search,'pg_catalog.english',entity_city_state_zip);

Search queries can now run against the tsvector field instead of the city/state/zip field:

select * from entities_entity 
where csz_search @@ to_tsquery('Atherton')

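Some notes of interest on this:

to_tsquery, if you haven't used it, is more sophisticated than the example above. It allows AND conditions, partial matches, etc.
It is also case-insensitive, so there is no need for the upper functions you have in your query.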
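
As an illustration of those AND/OR conditions, here is a rough sketch of how the city list from the question might be turned into a single to_tsquery argument. The function names are hypothetical, city names are assumed to consist of ASCII letters and digits, and multi-word cities are combined with &, which only requires both words to be present, not adjacent. For example, ["Atherton", "Palo Alto"] becomes Atherton | (Palo & Alto).

```python
import re


def city_to_tsquery_term(city):
    """Turn one city name into a tsquery fragment, or None if nothing is usable."""
    # Keep only alphanumeric runs so tsquery operators (&, |, !, :, parentheses)
    # in the input cannot change the structure of the query.
    words = re.findall(r"[A-Za-z0-9]+", city)
    if not words:
        return None
    if len(words) == 1:
        return words[0]
    # Multi-word city: every word must be present in the tsvector.
    return "(" + " & ".join(words) + ")"


def cities_to_tsquery(cities):
    """OR together one fragment per city, for use as a to_tsquery(%s) parameter."""
    terms = [city_to_tsquery_term(city) for city in cities]
    return " | ".join(term for term in terms if term)


query_text = cities_to_tsquery(["Atherton", "Berkeley", "Palo Alto"])

# Hypothetical usage (assumption, untested):
# cursor.execute(
#     "select count(*) from entities_entity "
#     "where csz_search @@ to_tsquery('pg_catalog.english', %s)",
#     [query_text],
# )
```

Note that full-text search matches whole (stemmed) words rather than arbitrary substrings, so the results may differ from those of LIKE '%...%'.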

As a final step, put a GIN index on the tsvector field:

create index entities_entity_ix1 on entities_entity 
using gin(csz_search);

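If I understood the course correctly, this should make your query fly, and it will overcome the problem of B-tree indexes being unable to work on like '% queries.

Here is the explain plan for such a query: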

Bitmap Heap Scan on entities_entity (cost=56.16..1204.78 rows=505 width=81) 
    Recheck Cond: (csz_search @@ to_tsquery('Atherton'::text)) 
    -> Bitmap Index Scan on entities_entity_ix1 (cost=0.00..56.04 rows=505 width=0) 
     Index Cond: (csz_search @@ to_tsquery('Atherton'::text))

Source

2016-01-22 03:52:16 Hambone

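This is really cool, I'll try it as soon as I can – NicoSantangelo

This really is awesome. I ran some quick tests against about 2,000,000 rows, and this approach took roughly 300 ms, versus about 2.4 seconds for the traditional query. With nested "or" conditions on a larger dataset, I bet the difference would be even more dramatic. – Hambone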
