Last updated on December 24, 2021
When I apply an OR filter across a field on the model and a field on its related many-to-many (m2m) model, combined with annotate, the query takes a long time to execute (1500-2000 ms).
If I remove Q(tags__name__icontains=value) from the filter in the queryset below, it runs in about 30-50 ms, so I think the m2m lookup is the cause.
Filtering on the m2m field appears to scan the entire through table, which I believe is where the time goes. How can I rewrite the queryset to improve this?
Video: 300k rows, Tag: 5k rows, video_tag_through: 1.3m rows
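To make the suspected cause concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than the real PostgreSQL database. The table names (video, tag, video_tags) and the sample rows are hypothetical stand-ins for the schema in this question. It shows that joining through the m2m table returns one row per matching video/tag pair, which is why DISTINCT is needed, while an EXISTS subquery returns each video at most once. Whether the EXISTS form is actually faster on the real data has not been measured here.

```python
import sqlite3

# In-memory toy schema; names are simplified stand-ins, not the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE video (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE video_tags (video_id INTEGER, tag_id INTEGER);
    INSERT INTO video VALUES (1, 'word games'), (2, 'cooking'), (3, 'travel');
    INSERT INTO tag VALUES (1, 'wordplay'), (2, 'crossword'), (3, 'food');
    INSERT INTO video_tags VALUES (1, 1), (1, 2), (2, 3), (3, 2);
""")

pattern = "%word%"

# Shape of the query in the question: join through the m2m table, then filter.
join_sql = """
    SELECT v.id FROM video v
    LEFT JOIN video_tags vt ON vt.video_id = v.id
    LEFT JOIN tag t ON t.id = vt.tag_id
    WHERE v.title LIKE :p OR t.name LIKE :p
    ORDER BY v.id
"""

# Alternative shape: test tag membership with a correlated EXISTS subquery.
exists_sql = """
    SELECT v.id FROM video v
    WHERE v.title LIKE :p
       OR EXISTS (
            SELECT 1 FROM video_tags vt
            JOIN tag t ON t.id = vt.tag_id
            WHERE vt.video_id = v.id AND t.name LIKE :p)
    ORDER BY v.id
"""

join_ids = [row[0] for row in conn.execute(join_sql, {"p": pattern})]
exists_ids = [row[0] for row in conn.execute(exists_sql, {"p": pattern})]

print(join_ids)    # video 1 appears once per tag attached to it
print(exists_ids)  # each matching video appears once, no DISTINCT needed
```

In ORM terms, the second shape would roughly correspond to replacing Q(tags__name__icontains=value) with an Exists over Video.tags.through filtered by video=OuterRef("pk") and tag__name__icontains=value, which would also make .distinct() unnecessary. Treat that as a direction to test with EXPLAIN ANALYZE, not a confirmed fix.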
# models.py
from django.conf import settings
from django.db import models
from django.utils import timezone

class Tag(models.Model):
    name = models.CharField(unique=True, max_length=30)
    created_at = models.DateTimeField(default=timezone.now)
    ...

class Video(models.Model):
    title = models.CharField(max_length=300)
    tags = models.ManyToManyField(Tag, blank=True)
    updated_at = models.DateTimeField(auto_now=True)
    ...

class History(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    video = models.ForeignKey(Video, on_delete=models.CASCADE)
    ...

class Favorite(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    video = models.ForeignKey(Video, on_delete=models.CASCADE)
    ...

class Playlist(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    is_wl = models.BooleanField(default=False, editable=False)
    ...

class Track(models.Model):
    playlist = models.ForeignKey(Playlist, on_delete=models.CASCADE, null=True)
    video = models.ForeignKey(Video, on_delete=models.CASCADE)
    ...
The queryset may look complicated, but this is it:
# query
Video.objects.annotate(
    is_viewed=Exists(History.objects.filter(user=user, video=OuterRef("pk"))),
    is_favorited=Exists(
        Favorite.objects.filter(user=user, video=OuterRef("pk"))
    ),
    is_wl=Exists(
        Track.objects.filter(
            playlist__user=user, playlist__is_wl=True, video=OuterRef("pk")
        )
    ),
).filter(
    Q(title__icontains=value) | Q(tags__name__icontains=value),
    is_public=True,
    published_at__lte=timezone.now(),
).order_by(
    "-published_at"
).distinct()[:20]
The following is the query that was issued and the execution plan.
SELECT DISTINCT "videos_video"."id",
       "videos_video"."published_at",
       EXISTS
         (SELECT (1) AS "a"
          FROM "videos_history" U0
          WHERE (U0."user_id" IS NULL
                 AND U0."video_id" = "videos_video"."id")
          LIMIT 1) AS "is_viewed",
       EXISTS
         (SELECT (1) AS "a"
          FROM "videos_favorite" U0
          WHERE (U0."user_id" IS NULL
                 AND U0."video_id" = "videos_video"."id")
          LIMIT 1) AS "is_favorited",
       EXISTS
         (SELECT (1) AS "a"
          FROM "videos_track" U0
          INNER JOIN "videos_playlist" U1 ON (U0."playlist_id" = U1."id")
          WHERE (U1."is_wl"
                 AND U1."user_id" IS NULL
                 AND U0."video_id" = "videos_video"."id")
          LIMIT 1) AS "is_wl"
FROM "videos_video"
LEFT OUTER JOIN "videos_video_tags" ON ("videos_video"."id" = "videos_video_tags"."video_id")
LEFT OUTER JOIN "videos_tag" ON ("videos_video_tags"."tag_id" = "videos_tag"."id")
WHERE ("videos_video"."is_public"
       AND "videos_video"."published_at" <= '2021-12-24 08:16:29.506387+00:00'
       AND (UPPER("videos_video"."title"::text) LIKE UPPER('%word%')
            OR UPPER("videos_tag"."name"::text) LIKE UPPER('%word%')))
ORDER BY "videos_video"."published_at" DESC
LIMIT 20;
EXPLAIN ANALYZE
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=875859.76..875860.06 rows=20 width=27) (actual time=1479.900..1506.379 rows=0 loops=1)
-> Unique (cost=875859.76..876179.61 rows=21323 width=27) (actual time=1149.773..1176.252 rows=0 loops=1)
-> Sort (cost=875859.76..875913.07 rows=21323 width=27) (actual time=1149.772..1176.251 rows=0 loops=1)
Sort Key: videos_video.published_at DESC, videos_video.id, ((hashed SubPlan 2)), ((hashed SubPlan 4)), ((hashed SubPlan 6))
Sort Method: quicksort Memory: 25kB
-> Gather (cost=28681.08..874326.63 rows=21323 width=27) (actual time=1149.765..1176.242 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Hash Left Join (cost=27681.08..51237.97 rows=8885 width=24) (actual time=1120.558..1120.598 rows=0 loops=3)
Hash Cond: (videos_video_tags.tag_id = videos_tag.id)
Filter: ((upper((videos_video.title)::text) ~~ '%WORD%'::text) OR (upper((videos_tag.name)::text) ~~ '%WORD%'::text))
Rows Removed by Filter: 446092
-> Parallel Hash Left Join (cost=27506.95..49598.28 rows=557509 width=132) (actual time=375.071..506.831 rows=446092 loops=3)
Hash Cond: (videos_video.id = videos_video_tags.video_id)
-> Parallel Seq Scan on videos_video (cost=0.00..11106.55 rows=120764 width=116) (actual time=0.029..15.911 rows=96611 loops=3)
Filter: (is_public AND (published_at <= '2021-12-24 08:16:29.506387+00'::timestamp with time zone))
-> Parallel Hash (cost=16726.09..16726.09 rows=557509 width=32) (actual time=321.399..321.399 rows=446007 loops=3)
Buckets: 65536 Batches: 32 Memory Usage: 3200kB
-> Parallel Seq Scan on videos_video_tags (cost=0.00..16726.09 rows=557509 width=32) (actual time=183.551..220.850 rows=446007 loops=3)
-> Hash (cost=104.61..104.61 rows=5561 width=29) (actual time=1.694..1.695 rows=5561 loops=3)
Buckets: 8192 Batches: 1 Memory Usage: 400kB
-> Seq Scan on videos_tag (cost=0.00..104.61 rows=5561 width=29) (actual time=0.016..0.453 rows=5561 loops=3)
SubPlan 2
-> Bitmap Heap Scan on videos_history u0 (cost=4.18..12.63 rows=4 width=16) (never executed)
Recheck Cond: (user_id IS NULL)
-> Bitmap Index Scan on videos_history_user_id_9a1343c1 (cost=0.00..4.18 rows=4 width=0) (never executed)
Index Cond: (user_id IS NULL)
SubPlan 4
-> Bitmap Heap Scan on videos_favorite u0_1 (cost=4.19..12.65 rows=5 width=16) (never executed)
Recheck Cond: (user_id IS NULL)
-> Bitmap Index Scan on videos_favorite_user_id_c4289dec (cost=0.00..4.19 rows=5 width=0) (never executed)
Index Cond: (user_id IS NULL)
SubPlan 6