Cassandra filter based on a secondary index

We have been using Cassandra for a while, and we are trying to get a really well-optimized table that can be queried and filtered quickly over roughly 100k rows.

Our model looks like this:

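# Assumes the cqlengine bundled with the DataStax Python driver;
# the older standalone package imports from `cqlengine` instead.
from cassandra.cqlengine import columns
from cassandra.cqlengine.models import Model

class FailedCDR(Model):
    uuid = columns.UUID(partition_key=True, primary_key=True)
    num_attempts = columns.Integer(index=True)
    datetime = columns.Integer()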

If I describe the table, it clearly shows that num_attempts is indexed:

CREATE TABLE cdrs.failed_cdrs (
    uuid uuid PRIMARY KEY, 
    datetime int, 
    num_attempts int 
) WITH bloom_filter_fp_chance = 0.01 
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}' 
    AND comment = '' 
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'} 
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} 
    AND dclocal_read_repair_chance = 0.1 
    AND default_time_to_live = 0 
    AND gc_grace_seconds = 864000 
    AND max_index_interval = 2048 
    AND memtable_flush_period_in_ms = 0 
    AND min_index_interval = 128 
    AND read_repair_chance = 0.0 
    AND speculative_retry = '99.0PERCENTILE'; 
CREATE INDEX index_failed_cdrs_num_attempts ON cdrs.failed_cdrs (num_attempts);

We would like to be able to run a filter like this:

failed = FailedCDR.filter(num_attempts__lte=9)

But this is what we get:

QueryException: Where clauses require either a "=" or "IN" comparison with either a primary key or indexed field

How can we accomplish something like this?


2015-09-03 electrometro

If you want to run a range query in CQL, the field you compare on has to be a clustering column. So you need num_attempts to be a clustering column.

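Also, if you want to do this in a single query, all the rows you query need to be in the same partition (or in a small number of partitions that you can reach with an IN clause). Since you only have 100K rows, that is small enough to fit in one partition.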

So you could define your table like this:

CREATE TABLE test.failed_cdrs (
    partition int, 
    num_attempts int, 
    uuid uuid, 
    datetime int, 
    PRIMARY KEY (partition, num_attempts, uuid));

You would insert your data with a constant partition key, such as 1:

INSERT INTO failed_cdrs (uuid, datetime, num_attempts, partition) 
    VALUES (now(), 123, 5, 1);

Then you can do range queries like this:

SELECT * from failed_cdrs where partition=1 and num_attempts >=8;
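
To illustrate why the clustering column makes this range query possible, here is a toy in-memory sketch in Python. It is not how Cassandra is implemented; it only mimics the idea that rows inside one partition are kept sorted by the clustering key (num_attempts, uuid), so a range condition becomes a lookup of a start position followed by a contiguous read. The class name ToyTable and its methods are made up for this illustration.

```python
import uuid
from bisect import bisect_left, insort
from collections import defaultdict


class ToyTable:
    """Toy model of PRIMARY KEY (partition, num_attempts, uuid). Illustration only."""

    def __init__(self):
        # partition key -> rows (num_attempts, uuid, datetime), always kept sorted
        self._partitions = defaultdict(list)

    def insert(self, partition, num_attempts, row_uuid, datetime):
        # insort keeps the partition ordered by (num_attempts, uuid)
        insort(self._partitions[partition], (num_attempts, row_uuid, datetime))

    def range_ge(self, partition, min_attempts):
        """Rows with num_attempts >= min_attempts, read as one contiguous slice."""
        rows = self._partitions[partition]
        # (n,) sorts before every (n, uuid, datetime), so this finds the first match
        start = bisect_left(rows, (min_attempts,))
        return rows[start:]

    def range_le(self, partition, max_attempts):
        """Rows with num_attempts <= max_attempts (the filter from the question)."""
        rows = self._partitions[partition]
        end = bisect_left(rows, (max_attempts + 1,))
        return rows[:end]


table = ToyTable()
for attempts in (5, 9, 8, 3):
    table.insert(1, attempts, uuid.uuid4(), 123)

for row in table.range_ge(1, 8):
    print(row[0], row[1])
```

A secondary index on num_attempts gives no such ordering across the whole table, which is consistent with the error in the question allowing only "=" or "IN" on indexed fields.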

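The drawback of this approach is that to change the value of num_attempts you need to delete the old row and insert a new one, because you are not allowed to update key fields. You can do the delete and the insert together in a batch statement.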
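
As a rough sketch of that delete-and-insert batch, the Python helper below only builds the CQL text and its parameters for the failed_cdrs table defined above. The function name move_row_batch is hypothetical, and the %s placeholders assume the statement is later passed, unprepared, to something like the DataStax Python driver's session.execute(cql, params).

```python
import uuid


def move_row_batch(row_uuid, old_attempts, new_attempts, datetime, partition=1):
    """Build a batch that replaces a row under a new num_attempts value.

    Returns (cql, params). Hypothetical helper, nothing is executed here.
    """
    if old_attempts == new_attempts:
        # Same clustering key: there is nothing to move, so refuse rather than
        # emit a delete and an insert that target the same row.
        raise ValueError("num_attempts is unchanged")
    cql = (
        "BEGIN BATCH "
        "DELETE FROM failed_cdrs "
        "WHERE partition = %s AND num_attempts = %s AND uuid = %s; "
        "INSERT INTO failed_cdrs (partition, num_attempts, uuid, datetime) "
        "VALUES (%s, %s, %s, %s); "
        "APPLY BATCH"
    )
    params = (
        partition, old_attempts, row_uuid,            # row to delete
        partition, new_attempts, row_uuid, datetime,  # row to insert
    )
    return cql, params


cql, params = move_row_batch(uuid.uuid4(), old_attempts=5, new_attempts=6, datetime=123)
print(cql)
```

Because both statements touch the same partition, the batch stays on a single partition, which keeps it cheap compared with a batch spread across many partitions.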

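A better option, available in Cassandra 3.0, is to create a materialized view that has num_attempts as a clustering column. In that case, when you update num_attempts in the base table, Cassandra takes care of the delete and insert for you. The 3.0 release is currently in beta testing.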
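
A possible shape for the materialized view approach is sketched below as CQL strings held in Python. The table name failed_cdrs_base, the view name failed_cdrs_by_attempts and the helper apply_schema are assumptions made for this example, and the statements have not been run against a cluster here. The base table keeps num_attempts as a regular column so it can be updated in place, while the view adds it to the clustering key.

```python
# Base table: num_attempts is a regular column, so UPDATE works on it.
BASE_TABLE_CQL = """
CREATE TABLE test.failed_cdrs_base (
    partition int,
    uuid uuid,
    num_attempts int,
    datetime int,
    PRIMARY KEY (partition, uuid))
"""

# View: same rows, clustered by num_attempts so range queries are allowed.
# Every primary key column of the view is restricted with IS NOT NULL.
VIEW_CQL = """
CREATE MATERIALIZED VIEW test.failed_cdrs_by_attempts AS
    SELECT * FROM test.failed_cdrs_base
    WHERE partition IS NOT NULL
      AND num_attempts IS NOT NULL
      AND uuid IS NOT NULL
    PRIMARY KEY (partition, num_attempts, uuid)
"""

# The range query then goes against the view instead of the base table.
RANGE_QUERY_CQL = (
    "SELECT * FROM test.failed_cdrs_by_attempts "
    "WHERE partition = 1 AND num_attempts >= 8"
)


def apply_schema(session):
    """Run both DDL statements.

    `session` is any object with an execute(str) method, for example a
    connected Session from the DataStax Python driver (an assumption).
    """
    for statement in (BASE_TABLE_CQL, VIEW_CQL):
        session.execute(statement)
```

With this layout, an ordinary UPDATE of num_attempts on test.failed_cdrs_base would be enough, and the view is maintained by Cassandra.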


2015-09-03 16:53:05
