How do I query rows with unique values on a joined column?

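I want my popular_query subquery to remove rows with duplicate Place.id values, but it doesn't remove them. The code is below. I tried using distinct, but then it doesn't respect the order_by rules. How do I query rows with unique values on a joined column?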

from sqlalchemy import desc, func
from sqlalchemy.orm import aliased

# Aliases let Post/PostOption appear twice: once as the post being ranked,
# once as the "similar" posts that share an option value with it.
SimilarPost = aliased(Post)
SimilarPostOption = aliased(PostOption)
popular_query = (db.session.query(Post, func.count(SimilarPost.id)).
     join(Place, Place.id == Post.place_id).
     join(PostOption, PostOption.post_id == Post.id).
     outerjoin(SimilarPostOption, PostOption.val == SimilarPostOption.val).
     join(SimilarPost, SimilarPost.id == SimilarPostOption.post_id).
     filter(Place.id == Post.place_id).
     filter(self.radius_cond()).
     group_by(Post.id).
     group_by(Place.id).
     order_by(desc(func.count(SimilarPost.id))).
     order_by(desc(Post.timestamp))
     ).subquery().select()

all_posts = db.session.query(Post).select_from(filter.pick()).all()

I test it with

print([x.place.name for x in all_posts])

which prints:

[u'placeB', u'placeB', u'placeB', u'placeC', u'placeC', u'placeA']

How can I fix this so that the test output lists each place only once?

Thanks!


2012-09-07 nubela

If you remove the 'group_by(Place.id)' clause and add 'distinct(Place.id)', does it respect the ordering? I'd think 'group_by' would be unnecessary if you use 'distinct'. – Nicholas

Are 'self.radius_cond()' and 'filter.pick()' things you define elsewhere? I don't see anything that actually uses 'popular_query'. –

This should get you what you want:

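from sqlalchemy import func, sql
from sqlalchemy.orm import aliased

SimilarPost = aliased(Post)
SimilarPostOption = aliased(PostOption)

# Scalar subquery (correlated to the outer Post): how many posts at the same
# place share an option value with that Post.
post_popularity = (db.session.query(func.count(SimilarPost.id))
     .select_from(PostOption)
     .filter(PostOption.post_id == Post.id)
     .correlate(Post)
     .outerjoin(SimilarPostOption, PostOption.val == SimilarPostOption.val)
     .join(SimilarPost, sql.and_(
       SimilarPost.id == SimilarPostOption.post_id,
       SimilarPost.place_id == Post.place_id)
     )
     .as_scalar())  # deprecated since SQLAlchemy 1.4 in favour of scalar_subquery()

# Scalar subquery (correlated to the outer Place): id of that place's most popular post.
popular_post_id = (db.session.query(Post.id)
     .filter(Post.place_id == Place.id)
     .correlate(Place)
     .order_by(post_popularity.desc())
     .limit(1)
     .as_scalar())

# Keep only each place's top post, then order the survivors.
deduped_posts = (db.session.query(Post, post_popularity)
     .join(Place)
     .filter(Post.id == popular_post_id)
     .order_by(post_popularity.desc(), Post.timestamp.desc())
     .all())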

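I can't speak to runtime performance on large datasets, and there may be a better solution, but this is what I managed to put together from quite a few sources (MySQL JOIN with LIMIT 1 on joined table, SQLAlchemy - subquery in a WHERE clause, SQLAlchemy Query documentation). The biggest complication is that you apparently need as_scalar to nest each subquery in the right place, so you can't return both the Post ID and the count from the same subquery.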
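To make the shape of the ORM code above easier to follow, here is roughly the same three-part pattern written as raw SQL against an in-memory SQLite database with Python's standard sqlite3 module: a correlated scalar subquery for popularity, a second correlated subquery that picks the top post id per place with ORDER BY ... LIMIT 1, and an outer filter on that id. The tables (place, post, post_option), the sample rows and the helper popularity() are hypothetical stand-ins for the question's models, and this is a sketch, not the exact SQL SQLAlchemy would emit.

```python
import sqlite3

# Hypothetical minimal schema standing in for the Place/Post/PostOption models.
SCHEMA = """
CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post (id INTEGER PRIMARY KEY, place_id INTEGER REFERENCES place(id), timestamp INTEGER);
CREATE TABLE post_option (post_id INTEGER REFERENCES post(id), val TEXT);
"""


def popularity(alias):
    """Correlated scalar subquery: posts at the same place sharing an option value with `alias`."""
    return f"""(SELECT COUNT(sp.id)
                  FROM post_option po
                  JOIN post_option spo ON spo.val = po.val
                  JOIN post sp ON sp.id = spo.post_id AND sp.place_id = {alias}.place_id
                 WHERE po.post_id = {alias}.id)"""


# Keep a post only if it is the top-ranked post of its own place.
DEDUPED = f"""
SELECT pl.name, p.id, {popularity('p')} AS popularity
  FROM post p
  JOIN place pl ON pl.id = p.place_id
 WHERE p.id = (SELECT p2.id
                 FROM post p2
                WHERE p2.place_id = pl.id
                ORDER BY {popularity('p2')} DESC, p2.timestamp DESC
                LIMIT 1)
 ORDER BY popularity DESC, p.timestamp DESC
"""


def deduped_posts(conn):
    return conn.execute(DEDUPED).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO place VALUES (?, ?)",
                 [(1, "placeA"), (2, "placeB"), (3, "placeC")])
conn.executemany("INSERT INTO post VALUES (?, ?, ?)",
                 [(1, 2, 1), (2, 2, 2), (3, 2, 3), (4, 3, 4), (5, 3, 5), (6, 1, 6)])
conn.executemany("INSERT INTO post_option VALUES (?, ?)",
                 [(1, "x"), (2, "x"), (2, "y"), (3, "z"), (4, "x"), (6, "y")])

print(deduped_posts(conn))
# [('placeB', 2, 3), ('placeA', 6, 1), ('placeC', 4, 1)]
```

Each place now appears once, and the ORDER BY on the outer query still decides the final order, which is the property plain DISTINCT was losing.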

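FWIW, this is a monster, and I agree with user1675804 that this SQLAlchemy code is deep, hard to understand and not maintainable. You should take a closer look at lower-tech solutions, such as adding columns to the DB or doing more of the work in Python code.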
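A minimal sketch of the "do more of the work in Python" route: run the query that is already ordered the way you want, then keep only the first row seen for each place. The helper name first_per_key and the sample tuples are hypothetical; with the question's models the key would presumably be something like lambda p: p.place_id. This loads every row into memory, so it suits moderate result sizes.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def first_per_key(rows: Iterable[T], key: Callable[[T], Hashable]) -> List[T]:
    """Keep the first row seen for each key, preserving the incoming order."""
    seen = set()
    result = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            result.append(row)
    return result


# Hypothetical stand-in for already-ordered query results: (place_name, post_id).
ordered = [("placeB", 3), ("placeB", 1), ("placeB", 2),
           ("placeC", 5), ("placeC", 4), ("placeA", 6)]
print(first_per_key(ordered, key=lambda r: r[0]))
# [('placeB', 3), ('placeC', 5), ('placeA', 6)]
```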


2012-09-16 23:36:46

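I don't want to sound like the bad guy here, but... it seems to me your approach to this problem is far from optimal. If you use PostgreSQL, you could use WITH to simplify the whole thing. A better approach, though, given my assumption that these posts are read far more often than they are updated, is to add some columns to the table that are kept up to date by triggers on inserts/updates to the other tables. At least if performance might ever become an issue, that is the solution I would go for.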
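The trigger-maintained column idea can be sketched in SQLite, used here only because it ships with Python; in PostgreSQL a trigger calls a separate trigger function instead of holding its body inline. The option_count column is a hypothetical, simplified stand-in for a popularity score: it counts only a post's own options, not the cross-post similarity the question computes. The point is that reads can then sort on a plain column instead of a nested COUNT subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (
    id INTEGER PRIMARY KEY,
    place_id INTEGER,
    option_count INTEGER NOT NULL DEFAULT 0  -- hypothetical denormalized column
);
CREATE TABLE post_option (post_id INTEGER REFERENCES post(id), val TEXT);

-- Keep post.option_count in sync whenever options are added or removed.
CREATE TRIGGER post_option_ins AFTER INSERT ON post_option BEGIN
    UPDATE post SET option_count = option_count + 1 WHERE id = NEW.post_id;
END;
CREATE TRIGGER post_option_del AFTER DELETE ON post_option BEGIN
    UPDATE post SET option_count = option_count - 1 WHERE id = OLD.post_id;
END;
""")

conn.executemany("INSERT INTO post (id, place_id) VALUES (?, ?)",
                 [(1, 10), (2, 10), (3, 20)])
conn.executemany("INSERT INTO post_option VALUES (?, ?)",
                 [(1, "x"), (2, "x"), (2, "y"), (2, "z"), (3, "z")])
conn.execute("DELETE FROM post_option WHERE post_id = 2 AND val = 'z'")

# Reads now sort on a stored column rather than recomputing a COUNT per post.
print(conn.execute(
    "SELECT id, option_count FROM post ORDER BY option_count DESC, id").fetchall())
# [(2, 2), (1, 1), (3, 1)]
```

The cost moves to writes: every insert or delete on post_option now also updates post, which pays off when reads dominate, as the answer assumes.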

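I'm not very familiar with SQLAlchemy, so I can't write clean code for you, but the only other solution I can see is to use at least one subquery that selects the ORDER BY values for each GROUP BY column, which will noticeably slow down your already slow query.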
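One way to write "a subquery that picks the top row per group" is a WITH clause (common table expression) combined with the ROW_NUMBER() window function. This is a swapped-in technique, not what the answerer wrote; it is shown in SQLite, which needs version 3.25 or newer for window functions, and PostgreSQL accepts the same SQL. The single post table with a precomputed popularity column and its rows are hypothetical.

```python
import sqlite3

# Hypothetical table: one row per post with a precomputed popularity score.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, place TEXT,"
             " popularity INTEGER, timestamp INTEGER)")
conn.executemany("INSERT INTO post VALUES (?, ?, ?, ?)", [
    (1, "placeB", 2, 1), (2, "placeB", 3, 2), (3, "placeB", 1, 3),
    (4, "placeC", 1, 4), (5, "placeC", 0, 5), (6, "placeA", 1, 6),
])

# Rank posts within each place, keep rank 1, then order the winners globally.
TOP_PER_PLACE = """
WITH ranked AS (
    SELECT id, place, popularity, timestamp,
           ROW_NUMBER() OVER (PARTITION BY place
                              ORDER BY popularity DESC, timestamp DESC) AS rn
      FROM post
)
SELECT place, id
  FROM ranked
 WHERE rn = 1
 ORDER BY popularity DESC, timestamp DESC
"""

print(conn.execute(TOP_PER_PLACE).fetchall())
# [('placeB', 2), ('placeA', 6), ('placeC', 4)]
```

The per-place ordering lives inside OVER (...), so it no longer competes with the final ORDER BY the way DISTINCT or GROUP BY did.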


2012-09-16 21:38:00 xception
