CNK's Blog

S3 bucket configuration for Wagtail

We host our websites in Docker containers using Fargate on AWS. This means we don’t have a permanent file system, so we need to use S3 to store the media files our users upload. Fortunately there is a Django package, django-storages, that adds a variety of file storage options, including S3. Setting up S3 via django-storages is pretty straightforward - install the package, configure storages.backends.s3.S3Storage as the storage backend, and include AWS access information in your environment variables.

Configuring the S3 Bucket

The new default for S3 buckets is to block all public access. This is appropriate for documents, which we may need to keep private. But an authentication token in the query string interferes with browser caching of images, because the signed url changes each time it is generated. So I would like image files to be public while still keeping documents private.

The first step is to turn off parts of the S3 public access block so we can install a bucket policy that makes images public. We like to manage our AWS resources via Terraform, so we need to create the bucket and configure the access block there.

    resource "aws_s3_bucket" "example" {
      bucket = "my-tf-test-bucket"
    }

    resource "aws_s3_bucket_public_access_block" "example" {
      bucket = aws_s3_bucket.example.id

      block_public_acls       = true
      block_public_policy     = false  # Temporarily turn off this block until we add a bucket policy
      ignore_public_acls      = true
      restrict_public_buckets = false  # This needs to be false so the bucket policy works
    }

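Then we add a bucket policy. The policy below takes the bucket ARN from module.storage-bucket.s3-bucket-arn and the account ID from local.account_id, which are names specific to our configuration; substitute however your configuration exposes those values (for the example bucket above, aws_s3_bucket.example.arn).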

    # Allow public read access to images in our s3 bucket
    data "aws_iam_policy_document" "images_public_read_policy" {
      # First our normal 'all access from account'
      statement {
        actions = ["s3:*"]

        principals {
          type        = "AWS"
          identifiers = ["arn:aws:iam::${local.account_id}:root"]
        }

        resources = [
          "${module.storage-bucket.s3-bucket-arn}",
          "${module.storage-bucket.s3-bucket-arn}/*",
        ]
      }

      statement {
        actions = ["s3:GetObject"]

        principals {
          type        = "AWS"
          identifiers = ["*"]  # Allow access to everyone (public)
        }

        # Block Public Access settings can allow public access to specific
        # resources, but not the entire bucket. Set restrict_public_buckets = false
        # to allow a policy that grants access to specific resources.
        resources = [
          "${module.storage-bucket.s3-bucket-arn}/images/*",
          "${module.storage-bucket.s3-bucket-arn}/original_images/*",
        ]
      }
    }

    resource "aws_s3_bucket_policy" "public_images_policy" {
      bucket = aws_s3_bucket.example.id
      policy = data.aws_iam_policy_document.images_public_read_policy.json
    }
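
To make the effect of the second statement concrete, here is a small Python sketch of which object keys it opens up. It is only a simplified model of the two resource patterns above (a prefix check), not how AWS actually evaluates policies, and the function name and sample keys are made up for illustration.

```python
# Simplified model of the public-read statement in the bucket policy.
# PUBLIC_PREFIXES mirrors the two resource patterns ("images/*" and
# "original_images/*"); is_publicly_readable is a hypothetical helper.
PUBLIC_PREFIXES = ("images/", "original_images/")


def is_publicly_readable(key: str) -> bool:
    """Return True if the key falls under one of the public prefixes."""
    return key.startswith(PUBLIC_PREFIXES)


if __name__ == "__main__":
    # Hypothetical object keys: two image keys and one document key.
    for key in ("images/cat.jpg", "original_images/cat.jpg", "documents/report.pdf"):
        print(key, is_publicly_readable(key))
```

Anything outside those two prefixes, such as the documents in this sketch, is not covered by the public statement and still needs a signed request.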

Once that bucket policy is in place, we can set block_public_policy back to true so that S3 rejects any later attempt to apply a new public bucket policy.

    resource "aws_s3_bucket_public_access_block" "example" {
      bucket = aws_s3_bucket.example.id

      block_public_acls       = true
      block_public_policy     = true
      ignore_public_acls      = true
      restrict_public_buckets = false  # This needs to be false so the bucket policy works
    }

Configuring Django STORAGES

The Terraform / AWS code above gives us S3 objects that behave as we want them to - objects inside the images or original_images directories can be viewed without authentication, but objects anywhere else in the bucket need a token. However, my Django project still generates urls with authentication tokens in the query string, for both documents and images. Not what I wanted.

To get the image and document urls to behave differently, I need to configure two different kinds of storage: the default one is private, so Django gives us urls with authentication query strings, and a separate images storage produces public urls.

    # settings.py
    AWS_STORAGE_BUCKET_NAME = env('AWS_STORAGE_BUCKET_NAME', default=None)
    if AWS_STORAGE_BUCKET_NAME:
        AWS_S3_REGION_NAME = env('AWS_DEFAULT_REGION', default='us-west-2')
        STORAGES = {
            "default": {
                "BACKEND": "storages.backends.s3.S3Storage"
            },
            "images": {
                "BACKEND": "storages.backends.s3.S3Storage",
                "OPTIONS": {
                    # Return plain urls with no signing parameters
                    "querystring_auth": False,
                },
            },
            "staticfiles": {
                "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage",
            },
        }
    else:
        STORAGES = {
            "default": {
                "BACKEND": "django.core.files.storage.FileSystemStorage",
            },
            "images": {
                "BACKEND": "django.core.files.storage.FileSystemStorage",
            },
            "staticfiles": {
                "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage",
            },
        }
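
One way to check that the two storages behave differently is to generate a url from each (for example with storages["default"].url(name) and storages["images"].url(name) in a Django shell) and look for signing parameters in the query string. The helper below is a minimal, stdlib-only sketch of that check. The function name and the sample urls are made up; the parameter names are the ones AWS presigned urls use (Signature Version 4, plus the older Version 2 style).

```python
from urllib.parse import parse_qs, urlsplit

# Query parameters that carry the signature in AWS presigned urls:
# "X-Amz-Signature" for Signature Version 4, "Signature" for Version 2.
SIGNATURE_PARAMS = {"X-Amz-Signature", "Signature"}


def has_auth_querystring(url: str) -> bool:
    """Return True if the url's query string contains a signature parameter."""
    params = parse_qs(urlsplit(url).query)
    return any(name in params for name in SIGNATURE_PARAMS)


# Hypothetical urls, shaped like what each storage might return.
private_url = (
    "https://example-bucket.s3.amazonaws.com/documents/report.pdf"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=3600&X-Amz-Signature=abc123"
)
public_url = "https://example-bucket.s3.amazonaws.com/images/cat.jpg"

print(has_auth_querystring(private_url))  # True
print(has_auth_querystring(public_url))   # False
```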

Then we need to make Wagtail’s images use this images storage. To do this, we create a custom image class and set the storage in the file field definition.

    from django.core.files.storage import storages
    from django.utils.translation import gettext_lazy as _
    from wagtail.images.models import AbstractImage, WagtailImageField, get_upload_to

    class CustomImage(AbstractImage):
        # Get the 'images' storage from the storages defined in settings.py
        image_storage = storages['images']

        # Same definition as Wagtail's own file field, plus the storage argument
        file = WagtailImageField(
            verbose_name=_("file"),
            storage=image_storage,
            upload_to=get_upload_to,
            width_field="width",
            height_field="height",
        )

However, most image urls are actually for renditions, so what we really need is for renditions to use the images storage too. Wanting to serve renditions from a different location isn’t uncommon, so there is a setting for that: WAGTAILIMAGES_RENDITION_STORAGE.

    # in settings.py
    WAGTAILIMAGES_IMAGE_MODEL = 'core.CustomImage'
    # Use the images storage so we don't get auth querystrings!!
    WAGTAILIMAGES_RENDITION_STORAGE = 'images'