Union queries
Union queries are surprisingly easy to create. I need a list of ids and content type ids from a bunch of different models. I was very surpised at how straightforward it is in Django 3.2 to create the UNION query I want.
union_query = None
for content_type in <queryset of content types>:
model = apps.get_model(content_type.app_label, content_type.model)
query = model.objects.filter(<criteria>).values('pk', <content_type.id>)
if union_query is None:
union_query = query
else:
union_query = union_query.union(query, all=True)
Note: I used all=True
because I will never have duplicates in my
(id, content_type_id)
tuples and UNION ALL is faster than UNION in
this case because we can skip the DISTINCT
operation on the final
result.
The observant among you will have noticed a bit of pseudocode in the
example above. I want to insert the content_type_id
from python into
my query. In SQL this would be something like:
SELECT id, 99 FROM myapp_model;
In the Django ORM, that turns out to be something I didn’t know how to
do. I can’t leave it as a bare name and I can’t quote it or the ORM
tries to turn it into a column name or relation that could be turned
into a column name. Turns out I need to use Value
:
query = model.objects \
.filter(<criteria>) \
.values('pk', Value(content_type.id, output_field=IntegerField()))
OK so that now will give me a queryset that produces a list of dicts
like: [{pk: 3, content_type_id: 44}, {pk: 3, content_type_id: 48}]
But when I tried to use those results in the filter section of another
query… I had my next problem.
Querying by value - without Foreign Key relationships
So now I need to use those ids and content_type_ids to filter another
model that has rows with content_type_id and object_id columns. I want
all the lines in the table for the ModelLogEntry model where the
(object_id, content_type_id)
tuple is in the list of (pk,
content_type_id)
tuples created by our UNION query above.
If I only needed to match on a single value, I would probably evaluate
the UNION query, and then do something like .filter(pk__in=<list of
pks>)
- as I did to get the list of content types I need. But I need
to match the id and content_type_id fields. In SQL, I would do:
SELECT wagtailcore_modellogentry.*
FROM wagtailcore_modellogentry
INNER JOIN (
(
((SELECT `link_farms_audience`.`id`, 104 AS `content_type_id` FROM `link_farms_audience` WHERE `link_farms_audience`.`site_id` = 12)
UNION
(SELECT `link_farms_collection`.`id`, 105 AS `content_type_id` FROM `link_farms_collection` WHERE `link_farms_collection`.`site_id` = 12))
UNION
(SELECT `link_farms_link`.`id`, 106 AS `content_type_id` FROM `link_farms_link` WHERE `link_farms_link`.`site_id` = 12))
UNION
(SELECT `core_didyouknowitem`.`id`, 110 AS `content_type_id` FROM `core_didyouknowitem` WHERE `core_didyouknowitem`.`site_id` = 12 ORDER BY `core_didyouknowitem`.`text` ASC)
) AS models
ON models.id = wagtailcore_modellogentry.object_id
AND models.content_type_id = wagtailcore_modellogentry.content_type_id
This was relatively straightforward to write in SQL, so I tried using
raw SQL, e.g. ModelLogQuery.objets.raw('<query here>')
. That
definitely gave me the data I was looking for when I ran it in
shell_plus. But when I tried to use it in my monkey patch, the calling
function wanted to use values()
, which is a function only defined on
real ORM QuerySets - and not available when using raw
.
At this point I suspect I won’t want to use this in production. Goodness only knows how big the union query is likely to get. But it is bothering me that I don’t know how to get Django to let me do a relatively straightforward join without having defined a ForeignQuery relationship in my Python model code.
I still don’t know how to tell Django “do this join damn it!”, but after some reading and thinking of alternate ways to write the SQL, I think I have found a way to write this in the ORM using Exists to create a correlated subquery.
from django.apps import apps
from django.db.models import Exists, IntegerField, OuterRef, Value
from django.contrib.contenttypes.models import ContentType
request = get_current_request()
site = Site.find_for_request(request)
union_query = None
content_types = (
ContentType.objects
.filter(id__in=ModelLogEntry.objects.values_list('content_type_id', flat=True).distinct())
)
for content_type in content_types:
model = apps.get_model(content_type.app_label, content_type.model)
query = (model.objects.filter(site_id=site.id)
.values('pk', content_type_id=Value(content_type.id, output_field=IntegerField()))
)
if union_query is None:
union_query = query
else:
union_query = union_query.union(query, all=True)
return ModelLogEntry.objects.filter(Exists(
union_query.filter(pk=OuterRef('object_id'), content_type_id=OuterRef('content_type_id'))
))
Sigh. One can’t combine .filter
with a union query.
NotSupportedError at /admin/reports/site-history/
Calling QuerySet.filter() after union() is not supported.
I tested the Exists query by setting the union_query
to be just one
type and it works fine. So I learned something useful about the Django
ORM - even if I can’t apply that knowledge in the context in which I
wanted to to use it this time.