Django Queryset: Need help optimizing this set of queries

I want to find the most common tag combinations in a list of educational question records.

For this example I'm only looking at combinations of 2 tags (tag + tag). I should get results like: "point" + "curve" (65 entries), "add" + "subtract" (40 entries) ...

This is the expected result expressed as an SQL statement:

SELECT a.tag, b.tag, count(*) 
FROM examquestions.dbmanagement_tag as a 
INNER JOIN examquestions.dbmanagement_tag as b on a.question_id_id = b.question_id_id 
where a.tag != b.tag 
group by a.tag, b.tag

Basically, the tags that belong to the same question are paired up, and identical tag pairs are grouped together and counted.
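To see what the query above returns, here is a sketch that runs it against an in-memory SQLite database with a few invented rows. The table and column names follow the query; the 'examquestions.' schema prefix is dropped because SQLite has no such schema. Note that 'a.tag != b.tag' returns every pair twice, once in each order, while 'a.tag < b.tag' keeps each unordered pair once, which matches what the Python loop below tries to do with startindex.

```python
import sqlite3

# Hypothetical toy data: (question_id, tag) rows mirroring dbmanagement_tag
rows = [
    (1, "point"), (1, "curve"),
    (2, "point"), (2, "curve"), (2, "add"),
    (3, "add"), (3, "subtract"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dbmanagement_tag (question_id_id INTEGER, tag TEXT)")
conn.executemany("INSERT INTO dbmanagement_tag VALUES (?, ?)", rows)

def pair_counts(comparison):
    # Self-join the tags on their shared question and count each (tag, tag) pair.
    # 'comparison' is only ever one of the literal operators used below.
    sql = f"""
        SELECT a.tag, b.tag, COUNT(*)
        FROM dbmanagement_tag AS a
        INNER JOIN dbmanagement_tag AS b ON a.question_id_id = b.question_id_id
        WHERE a.tag {comparison} b.tag
        GROUP BY a.tag, b.tag
        ORDER BY a.tag, b.tag
    """
    return conn.execute(sql).fetchall()

both_orders = pair_counts("!=")  # each pair appears as (x, y) and as (y, x)
one_order = pair_counts("<")     # each unordered pair appears once
print(one_order)
```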

I tried to do a similar query with Django querysets:

twotaglist = []  # final set of results

# distinct() gives one row per (tag, type); qnlist is the list of questions being analysed
alphatags = tag.objects.values('tag', 'type').distinct().order_by('tag')
betatags = tag.objects.values('tag', 'type').distinct().order_by('tag')
startindex = 0  # increased by 1 each time atag advances, shortening the betatags range so each pair of tags is compared only once
for atag in alphatags:
    for btag in betatags[startindex:]:
        if atag['tag'] != btag['tag']:
            commonQns = []  # questions that carry both tags
            atagQns = tag.objects.filter(tag=atag['tag'], question_id__in=qnlist).values('question_id')
            btagQns = tag.objects.filter(tag=btag['tag'], question_id__in=qnlist).values('question_id')
            for atagQ in atagQns:
                for btagQ in btagQns:
                    if atagQ['question_id'] == btagQ['question_id']:
                        commonQns.append(atagQ['question_id'])
            if len(commonQns) > 0:
                twotaglist.append({'atag': atag['tag'],
                                   'btag': btag['tag'],
                                   'count': len(commonQns)})
    startindex = startindex + 1
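The innermost double loop compares every question ID of one tag against every question ID of the other, and two queries run for every pair of tags. A sketch of a cheaper approach, assuming the (question_id, tag) pairs have already been fetched in one go (for example with a single values_list('question_id', 'tag') query; the toy rows below are invented): build a dict from each tag to the set of its question IDs once, then count the common questions with a set intersection.

```python
from itertools import combinations

# Hypothetical rows as (question_id, tag)
rows = [
    (1, "point"), (1, "curve"),
    (2, "point"), (2, "curve"), (2, "add"),
    (3, "add"), (3, "subtract"),
]

# Build the index once: tag -> set of question IDs carrying that tag
questions_by_tag = {}
for question_id, tag_name in rows:
    questions_by_tag.setdefault(tag_name, set()).add(question_id)

twotaglist = []
# combinations() over the sorted tags visits each unordered pair exactly once,
# which is what the startindex bookkeeping does by hand
for atag, btag in combinations(sorted(questions_by_tag), 2):
    common = questions_by_tag[atag] & questions_by_tag[btag]
    if common:
        twotaglist.append({'atag': atag, 'btag': btag, 'count': len(common)})

print(twotaglist)
```

This still visits every pair of tags, but it issues no database queries inside the loop, and each comparison is a set intersection instead of a nested loop.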

The logic works, but I'm very new to this platform and I don't know whether there is a shorter solution that would also be more efficient.

Currently the query takes about 45 seconds for roughly 5K x 5K tag comparisons :(

Addendum: the Tag class

from django.db import models

class tag(models.Model):
    id = models.IntegerField('id', primary_key=True, null=False)
    # on_delete is required by Django 2.0 and later
    question_id = models.ForeignKey(question, null=False, on_delete=models.CASCADE)
    tag = models.TextField('tag', null=True)
    type = models.CharField('type', max_length=1)

    def __str__(self):
        return str(self.tag)


2013-01-20 jdtoh

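Unfortunately, Django doesn't let you do this kind of join unless a foreign key (or one-to-one relation) is involved, so you'll have to do it in code. I've found a way (completely untested) to do it with just two queries, one for the questions and one for the prefetched tags, which should cut the execution time drastically.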

from collections import Counter
from itertools import combinations

from django.db import models

# Assuming models like these
class Question(models.Model):
    ...

class Tag(models.Model):
    tag = models.TextField('tag', null=True)
    question = models.ForeignKey(Question, related_name='tags', on_delete=models.CASCADE)

c = Counter()
questions = Question.objects.all().prefetch_related('tags')  # fetch all related tags in one extra query
for q in questions:
    # sort them so 'point' + 'curve' == 'curve' + 'point';
    # the set drops duplicate tags and the condition skips NULL tags
    tags = sorted({t.tag for t in q.tags.all() if t.tag})
    c.update(combinations(tags, 2))  # count every 2-tag combination of this question
c.most_common(5)  # show the top 5

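The code above uses Counter, itertools.combinations and Django's prefetch_related, which covers most of the parts that may be unfamiliar. If the code doesn't work as-is, look these up and adjust it accordingly.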
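Stripped of the Django parts, the counting step can be tried on plain data. In this sketch the dict tags_by_question stands in for the prefetched q.tags.all() results; its contents are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for each question's prefetched tags
tags_by_question = {
    1: ["point", "curve"],
    2: ["curve", "point", "add"],
    3: ["subtract", "add"],
}

c = Counter()
for tag_names in tags_by_question.values():
    # Sorting makes ('curve', 'point') and ('point', 'curve') the same key;
    # set() drops a tag attached twice to the same question
    tags = sorted(set(tag_names))
    c.update(combinations(tags, 2))  # each 2-tag tuple adds 1 to its count

print(c.most_common(1))  # [(('curve', 'point'), 2)]
```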

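Even if you don't have an M2M field on the Question model, you can still reach the tags through the reverse relation as if it were an M2M field. See my edit: I changed the reverse relation from tag_set to tags (via related_name='tags'). I also made a few other edits that should be consistent with how you defined your models.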

If you don't specify related_name='tags', just replace tags with tag_set in q.tags.all() and in prefetch_related, and you should be good.
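Roughly speaking, prefetch_related('tags') runs one query for the questions and one for all of their tags, then attaches each tag to its question in Python, so q.tags.all() inside the loop hits no database. The sketch below imitates only that in-memory grouping step on plain tuples (the data is invented); it is not Django's actual implementation.

```python
from collections import defaultdict

# Hypothetical results of the two queries:
# one for the questions, one for all tags belonging to those questions
question_ids = [1, 2, 3]
tag_rows = [(1, "point"), (1, "curve"), (2, "point"), (2, "curve"),
            (2, "add"), (3, "add"), (3, "subtract")]  # (question_id, tag)

# Group the tags by their foreign key so each question can read its own list
# without another query, the role q.tags.all() plays after prefetching
tags_for = defaultdict(list)
for question_id, tag_name in tag_rows:
    tags_for[question_id].append(tag_name)

for qid in question_ids:
    print(qid, tags_for[qid])
```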


2013-01-20 11:49:21

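Your model is a bit different from mine, so I can't apply your solution directly. I've just edited the question and added the Tag class. Do you think you could provide a solution for it? Thanks a lot for the suggestion anyway, but changing to that model would affect too much of my other code. I'll keep it in mind for the future :) – jdtoh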

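@jdtoh Could you please also edit your question to include your 'question' model? From what I can see, my solution should still work for you. A foreign key has a "reverse relation", which means that on a question instance q you can access its tags as a collection via 'q.tag_set.all()'. Also, model names conventionally start with a capital letter. –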

@jdtoh See my edit; it should solve your problem with the models as you defined them. –

If I understand your question correctly, I would keep things simple and do something like this:

relevant_tags = list(Tag.objects.filter(question_id__in=qnlist))
# relevant_tags holds both the A and B tags

unique_tags = set()
for tag_item in relevant_tags:
    unique_tags.add(tag_item.tag)

# unique_tags should contain your A and B tags

a_tag = unique_tags.pop()
b_tag = unique_tags.pop()

# Some logic to decide which tag is A and which is B

# list() is needed on Python 3, where filter() returns a one-shot iterator
a_tags = list(filter(lambda t: t.tag == a_tag, relevant_tags))
b_tags = list(filter(lambda t: t.tag == b_tag, relevant_tags))

# a_tags and b_tags contain the A and B tags filtered from relevant_tags

same_question_tags = dict()

for q in qnlist:
    # qnlist is assumed to hold question objects; question_id_id is the
    # raw foreign-key value behind the question_id field
    a_list = list(filter(lambda t: t.question_id_id == q.id, a_tags))
    b_list = list(filter(lambda t: t.question_id_id == q.id, b_tags))
    same_question_tags[q] = a_list + b_list

The nice thing about this is that it extends to N tags: loop over the unique tags to pick up each one, then repeat the per-tag filtering.
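The same generalization to N tags can be sketched with a different technique than the filter loop: itertools.combinations(tags, n) yields every n-tag combination of a question, so the counting approach from the other answer works for any group size. count_tag_groups is a hypothetical helper name, and the data is invented.

```python
from collections import Counter
from itertools import combinations

def count_tag_groups(tags_by_question, n):
    """Count how many questions share each combination of n distinct tags."""
    counts = Counter()
    for tag_names in tags_by_question.values():
        # sorted(set(...)) gives one canonical order per combination
        counts.update(combinations(sorted(set(tag_names)), n))
    return counts

# Hypothetical data: question ID -> its tags
tags_by_question = {
    1: ["point", "curve", "add"],
    2: ["curve", "point", "add"],
    3: ["subtract", "add"],
}

print(count_tag_groups(tags_by_question, 3))
```

The number of combinations per question grows quickly with n, so this is practical mainly for small n.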

There are certainly other ways to do this.


2013-01-20 10:06:44

't.tag = a_tag' will always be 'True' –

I tried to implement this. Why do I get the syntax error "lambda cannot contain assignment"? – jdtoh

Oh, I've corrected it. Use the == operator instead of =. –
