内容简介:Aggregation is a source of confusion in any type of ORM and Django is no different. The documentation provides a variety of examples and cheat-sheets that demonstrate how to group and aggregate data using the ORM, but I decided to approach this from a diff
Aggregation is a source of confusion in any type of ORM and Django is no different. The documentation provides a variety of examples and cheat-sheets that demonstrate how to group and aggregate data using the ORM, but I decided to approach this from a different angle.
In this article I put QuerySets and SQL side by side . If SQL is where you are most comfortable, this is the Django GROUP BY cheat-sheet for you.
- How to Group By in Django
- How to Use Aggregate Functions
- How to Filter a QuerySet With Group By
- How to Sort a QuerySet With Group By
- How to Combine Multiple Aggregations
- How to Group by Multiple Fields
- How to Group by an Expression
- How to Use Conditional Aggregation
- How to Group by Distinct
- How to Create Expressions Using Aggregate Fields
- How to Group By Across Relations
- How to Group By a Many to Many Relationship
How to Group By in Django
To demonstrate different GROUP BY queries, I will use models from Django's built-in django.contrib.auth
app.
>>> from django.contrib.auth.models import User
Django ORM produces SQL statements with long aliases. For brevity, I will show a cleaned-up, but equivalent, version of what Django executes.
SQL Logging
To see the SQL actually executed by Django, you can turn on SQL logging in the Django settings .
How to Count Rows
Let's count how many users we have:
SELECT COUNT(*) FROM auth_user;
User.objects.count()
Counting rows is so common that Django includes a function for it right on the QuerySet. Unlike other QuerySets we'll see next, count
returns a number.
How to Use Aggregate Functions
Django offers two more ways to count rows in a table.
We'll start with aggregate
:
SELECT COUNT(id) AS id__count FROM auth_user;
from django.db.models import Count User.objects.aggregate(Count('id'))
To use aggregate
we imported the aggregate function Count
. The function accepts an expression to count. In this case, we used the name of the primary key column id
to count all the rows in the table.
Aggregate NULL
Aggregations ignore NULL
values. For more on how aggregations handle NULL
, see 12 Common Mistakes and Missed Optimization Opportunities in SQL .
The result of aggregate
is a dict:
>>> from django.db.models import Count >>> User.objects.aggregate(Count('id')) {"id__count": 891}
The name of the key is derived from the name of the field and the name of the aggregate. In this case, it's id__count
. It's a good idea not to rely on this naming convention, and instead provide your own name:
SELECT COUNT(id) as total FROM auth_user;
>>> from django.db.models import Count >>> User.objects.aggregate(total=Count('id')) {"total": 891}
The name of the argument to aggregate
is also the name of the key in the resulting dictionary.
Using aggregate
we got the result of applying the aggregate function on the entire table. This is useful, but usually we want to apply the aggregation on groups of rows.
Let's count users by their active status:
SELECT is_active, COUNT(id) AS total FROM auth_user GROUP BY is_active
(User.objects .values('is_active') .annotate(total=Count('id')))
This time we used the function annotate
. To produce a GROUP BY we use a combination of values
and annotate
:
values('is_active') annotate(total=Count('id'))
The order is important: failing to call values
before annotate
will not produce aggregate results.
Just like aggregate
, the name of the argument to annotate
is the key in the result of the evaluated QuerySet. In this case it's total
.
How to Filter a QuerySet With Group By
To apply aggregation on a filtered query you can use filter
anywhere in the query. For example, count only staff users by their active status:
SELECT is_active, COUNT(id) AS total FROM auth_user WHERE is_staff = True GROUP BY is_active
(User.objects .values('is_active') .filter(is_staff=True) .annotate(total=Count('id')))
How to Sort a QuerySet With Group By
Like filter, to sort a queryset use order_by
anywhere in the query:
SELECT is_active, COUNT(id) AS total FROM auth_user GROUP BY is_active ORDER BY is_active, total
(User.objects .values('is_active') .annotate(total=Count('id')) .order_by('is_staff', 'total'))
Notice that you can sort by both the GROUP BY key and the aggregate field.
How to Combine Multiple Aggregations
To produce multiple aggregations of the same group, add multiple annotations:
SELECT is_active, COUNT(id) AS total, MAX(date_joined) AS last_joined FROM auth_user GROUP BY is_active
from django.db.models import Max (User.objects .values('is_active') .annotate( total=Count('id'), last_joined=Max('date_joined'), ))
The query will produce the number of active and inactive users, and the last date a user joined in each group.
How to Group by Multiple Fields
Just like performing multiple aggregations, we might also want to group by multiple fields. For example, group by active status and staff status:
SELECT is_active, is_staff, COUNT(id) AS total FROM auth_user GROUP BY is_active, is_staff
(User.objects .values('is_active', 'is_staff') .annotate(total=Count('id')))
The result of this query includes is_active
, is_staff
and the number of users in each group.
How to Group by an Expression
Another common use case for GROUP BY is to group by an expression. For example, count the number of users that joined each year:
SELECT EXTRACT('year' FROM date_joined), COUNT(id) AS total FROM auth_user GROUP BY EXTRACT('year' FROM date_joined)
(User.objects .values('date_joiend__year') .annotate(total=Count('id')))
Notice that to get the year from the date we used the special expression <field>__year
in the first call to values()
. The result of the query is a dict, and the name of the key will be date_joined__year
.
Sometimes, the built-in expressions are not enough, and you need to aggregate on a more complicated expression. For example, group by users that have logged in since they signed-up:
SELECT last_login > date_joined AS logged_since_joined, COUNT(id) AS total FROM auth_user GROUP BY last_login > date_joined
from django.db.models import ( ExpressionWrapper, Q, F, BooleanField, ) (User.objects .annotate( logged_since_joined=ExpressionWrapper( Q(last_login__gt=F('date_joined')), output_field=BooleanField(), ) ) .values('logged_since_joined') .annotate(total=Count('id')) .values('logged_since_joined', 'total')
The expression here is fairly complicated. We first use annotate
to built the expression, and we mark it as a GROUP BY key by referencing the expression in the following call to values()
. From here on, it's exactly the same.
How to Use Conditional Aggregation
Using conditional aggregation, you can aggregate only a part of the group. Conditions come in handy when you have multiple aggregates. For example, count the number of staff and non-staff users by the year they signed-up:
SELECT EXTRACT('year' FROM date_joined), COUNT(id) FILTER ( WHERE is_staff = True ) AS staff_users, COUNT(id) FILTER ( WHERE is_staff = False ) AS non_staff_users FROM auth_user GROUP BY EXTRACT('year' FROM date_joined)
from django.db.models import F, Q (User.objects .values('date_joined__year') .annotate( staff_users=( Count('id', filter=Q(is_staff=True)) ), non_staff_users=( Count('id', filter=Q(is_staff=False)) ), ))
The SQL above is from PostgreSQL, which along with SQLite is currently the only database backend that supports the FILTER
syntax shortcut (formally called "selective aggregates" ). For other database backends, the ORM will use CASE ... WHEN
instead.
tip
I previously wrote about aggregations with filters. Check out my 9 Django tips for working with databases .
How to Use Having
The HAVING
clause is used to filter on the result of an aggregate function. For example, find the years in which more than a 100 users joined:
SELECT is_active, COUNT(id) AS total FROM auth_user GROUP BY is_active HAVING COUNT(id) > 100
(User.objects .annotate(year_joined=F('date_joined__year')) .values('is_active') .annotate(total=Count('id')) .filter(total__gt=100))
The filter on the annotated field total
added an HAVING clause in the generated SQL.
How to Group by Distinct
For some aggregate functions such as COUNT
, it is sometimes desirable to only count distinct occurrences. For example, how many different last names are there per user active status:
SELECT is_active, COUNT(id) AS total, COUNT(DISTINCT last_name) AS unique_names FROM auth_user GROUP BY is_active
(User.objects .values('is_active') .annotate( total=Count('id'), unique_names=Count('last_name', distinct=True), ))
Notice the use of distinct=True
in the call to Count
.
How to Create Expressions Using Aggregate Fields
Aggregate fields are often just the first step to a greater question. For example, what is the percent of unique last names by user active status:
SELECT is_active, COUNT(id) AS total, COUNT(DISTINCT last_name) AS unique_names, (COUNT(DISTINCT last_name)::float / COUNT(id)::float) AS pct_unique_names FROM auth_user GROUP BY is_active
from django.db.models import FloatField from django.db.models.functions import Cast (User.objects .values('is_active') .annotate( total=Count('id'), unique_names=Count('last_name', distinct=True), ) .annotate(pct_unique_names=( Cast('unique_names', FloatField()) / Cast('total', FloatField()) ))
The first annotate()
defines the aggregate fields. The second annotate()
uses the aggregate function to construct an expression.
How to Group By Across Relations
So far we've used only data in a single model, but aggregates are often used across relations. The simpler scenario is of a one-to-one or a foreign key relation. For example, say we have a UserProfile
with a one-to-one relationship to the User, and we want to count users by the type of profile:
SELECT p.type, COUNT(u.id) AS total FROM auth_user u JOIN user_profile p ON u.id = p.user_id GROUP BY p.type
(User.objects .values('user_profile__type') .annotate(total=Count('id')))
Just like GROUP BY expressions, using relations in values
will group by that field. Note that the name of the user profile type in the result will be 'user_profile__type'.
How to Group By a Many to Many Relationship
A more complicated type of relation is the many to many relationship. For example, count in how many groups each user is a member:
SELECT u.id, COUNT(ug.group_id) AS memberships FROM auth_user LEFT OUTER JOIN auth_user_groups ug ON ( u.id = ug.user_id ) GROUP BY u.id
(User.objects .annotate(memberships=Count('groups')) .values('id', 'memberships'))
A user can be a member of more than one group. To count the number of groups the user is member of we used the related name "groups" in the User
model. If the related name is not explicitly set (and not explicitly disabled), Django will automatically generate a name in the format {related model model}_set
. For example, group_set
.
To dig deeper into the ORM and GROUP BY in particular, check out these links:
- How to use grouping sets in Django : An article about advanced group by technics such as group by cube, group by rollup and group by grouping sets.
- How to Get the First or Last Value in a Group Using Group By in SQL : A neat little trick using arrays in PostgreSQL.
- 12 Common Mistakes and Missed Optimization Opportunities in SQL : Some SQL do's and dont's you need to know if you are working with data and writing SQL.
- Django Aggregation cheat-sheet page : How to do common aggregate queries.
以上所述就是小编给大家介绍的《Understand Group by in Django with SQL》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
Web全栈工程师的自我修养
余果 / 人民邮电出版社 / 2015-9-1 / 49.00
全栈工程师正成为 IT 行业的新秀,论是上市互联网公司还是创业公司,都对全栈工程师青睐有加。本书作者是腾讯公司高级工程师,在前端、后端和APP开发方面都有丰富的经验,在本书中分享了全栈工程师的技能要求、核心竞争力、未来发展方向、对移动端的思考。除此之外,本书还详细记录了作者从零开始、学习成长的心路历程。 本书内容全面,客观务实,适合互联网行业新人、程序员,以及期待技术转型的从业者阅读参考。一起来看看 《Web全栈工程师的自我修养》 这本书的介绍吧!