
Django: Bypass large file upload limit on Cloud Run

Introduction

Our team hit a media upload problem in our CMS, built with Wagtail: uploads failed with HTTP 413 Payload Too Large. The root cause was hard to pin down since everything, configurations and all, seemed to be just fine, and a quick internet search only led down a rabbit hole. Thanks to Stack Overflow, we realised the problem was Google's Cloud Run, which has a request size limit of 32 MB, and the proposed solution was to use signed URLs to bypass it.

Sounds pretty straightforward, but the devil is in the details. It involves uploading directly to GCS from the frontend, then making a subsequent request, without the large file, to our backend to store the additional details.

Signed URL Generation

First, we need to generate a temporary signed URL. Of note is that Cloud Run (and other compute platforms) does not inject a service account key file. Instead, it makes access tokens available on the instance metadata service. You can then exchange this access token for a JWT, hence the use of google.auth.compute_engine. Skipping this step leads to another headache: AttributeError: you need a private key to sign credentials. Here's a GitHub gist that really helped.

The view takes two parameters to generate the signed URL: the file name and the content type. It returns a JSON response with the signed URL, which is then used for the upload.

views.py

import datetime
import json

from google.auth import compute_engine
from google.auth.transport import requests
from google.cloud import storage

from django.http import JsonResponse
from django.views.generic import View

def generate_signed_upload_url(bucket_name, blob_name, content_type):
    auth_request = requests.Request()
    storage_client = storage.Client()

    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(blob_name)

    # No key file exists on Cloud Run; these credentials sign via the
    # instance metadata service, avoiding the "you need a private key
    # to sign credentials" error.
    signing_credentials = compute_engine.IDTokenCredentials(auth_request, "")
    url = blob.generate_signed_url(
        version="v4",
        credentials=signing_credentials,
        expiration=datetime.timedelta(hours=1),
        method="PUT",
        content_type=content_type,
    )

    return url

class SignedURLView(View):
    def post(self, request, *args, **kwargs):
        data = json.loads(request.body)

        file_name = data["fileName"]
        file_type = data["fileType"]

        bucket_name = "your-bucket-name"  # replace with your bucket

        # Spaces are not URL-friendly; this same transformation must be
        # applied to the name saved on the backend later.
        blob_name = file_name.replace(" ", "_")

        signed_url = generate_signed_upload_url(
            bucket_name=bucket_name,
            blob_name=blob_name,
            content_type=file_type,
        )

        return JsonResponse({"url": signed_url})
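For completeness, the view needs a route whose name matches the signed-url reference used in the template further down. A minimal sketch; the URL path and module layout here are assumptions, not from the original project:

urls.py

from django.urls import path

from .views import SignedURLView

urlpatterns = [
    # The name must match {% url 'signed-url' %} in the upload template.
    path("signed-url/", SignedURLView.as_view(), name="signed-url"),
]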

CORS Configuration

CORS is a mechanism for web services to announce that they will accept certain requests from web applications not hosted on their own servers. Google Cloud Storage, like other cloud object stores, implements this security feature, and without configuration it blocks cross-origin uploads via signed URLs. Allowed origins should therefore be configured on a per-bucket basis. For Google, the guide Configure cross-origin resource sharing walks through the setup.

An example of a CORS policy:

[
  {
    "origin": ["https://your-website.com"],
    "responseHeader": [
      "Content-Type",
      "Access-Control-Allow-Origin",
      "X-Upload-Content-Length",
      "X-Goog-Resumable"
    ],
    "method": ["PUT", "OPTIONS"],
    "maxAgeSeconds": 3600
  }
]
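The same policy can also be applied from Python rather than the console or gsutil, since the client library exposes the bucket's CORS configuration directly. A minimal sketch mirroring the JSON above; the bucket name and origin are placeholders:

from google.cloud import storage

def set_bucket_cors(bucket_name, origin):
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)

    # Same policy as the JSON example above.
    bucket.cors = [
        {
            "origin": [origin],
            "responseHeader": [
                "Content-Type",
                "Access-Control-Allow-Origin",
                "X-Upload-Content-Length",
                "X-Goog-Resumable",
            ],
            "method": ["PUT", "OPTIONS"],
            "maxAgeSeconds": 3600,
        }
    ]
    bucket.patch()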

Upload Handling and Template

Due to the extensible nature of Wagtail, we use wagtailmedia to handle media files. The approach below is adaptable to any view and template with a form that performs uploads; it's just HTML and JavaScript.

The upload form should also override the save functionality to store only the file name, since the upload itself has already been done via the signed URL. This assumes the storage class is configured to use Google Cloud Storage. For a deeper dive, see the documentation on managing files in Django, file uploads in Django, and django-storages.
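For reference, a minimal django-storages setup for Google Cloud Storage might look like the following sketch; the bucket name is a placeholder and must match the one used by the signing view:

settings.py

# Requires django-storages installed with the google extra.
DEFAULT_FILE_STORAGE = "storages.backends.gcloud.GoogleCloudStorage"
GS_BUCKET_NAME = "your-bucket-name"  # must match the signing view's bucket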

The <script> ... </script> block is where all the magic happens:

  • Capture the selected file in the form field using an event listener
  • During form submission make a call to get the signed URL
  • Perform upload using XMLHttpRequest
  • After upload is complete make the backend call with the additional form data

forms.py

from django.forms import ModelForm
from .models import Media
from .utils import generate_media_name

class MediaForm(ModelForm):

    def save(self, commit=True):
        instance = super().save(commit=False)

        # Store only the name; the file itself was already uploaded via
        # the signed URL. The name must match the blob name generated
        # for the signed URL.
        instance.file = generate_media_name(instance.file.name)

        if commit:
            instance.save()

        return instance

    class Meta:
        model = Media
        fields = "__all__"  # a ModelForm requires fields or exclude
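The post relies on generate_media_name but never shows it; whatever it does, it must produce exactly the same name as blob_name in SignedURLView, otherwise the saved record will point at a non-existent object. A minimal sketch mirroring the view's logic (the body here is an assumption):

utils.py

def generate_media_name(file_name):
    # Must mirror the blob_name logic in SignedURLView so the saved
    # file field references the object the browser actually uploaded.
    return file_name.replace(" ", "_")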

/templates/{django app}/add.html

{% load i18n %}
<div>
    <form id="fileForm" action="{% block action %}{% url 'wagtailmedia:add' media_type %}{% endblock %}" method="POST" enctype="multipart/form-data" novalidate>
        {% csrf_token %}
        <ul>
            {% for field in form %}
                {% include "wagtailadmin/shared/field_as_li.html" with field=field %}
            {% endfor %}
            <li>
                <button type="submit" id="submitBtn"><em>{% trans 'Upload' %}</em></button>
            </li>
        </ul>
    </form>
    <div>
        <h3 id="uploadStatus"></h3>
    </div>
</div>

<script>
    let file = null;

    function setFile(val) {
        file = val;
    }

    const submitButton = document.getElementById('submitBtn');

    document.getElementById('id_file').addEventListener('change', (event) => {
        setFile(event.target.files[0]);
    });

    const getSignedURL = async () => {
        const body = {
            fileName: file.name,
            fileType: file.type,
        };

        const response = await fetch("{% url 'signed-url' %}", {
            method: 'POST',
            body: JSON.stringify(body),
            headers: { 'Content-Type': 'application/json', 'X-CSRFToken': '{{ csrf_token }}' },
        });

        const { url } = await response.json();

        return url;
    };

    function progressHandler(event) {
        const percent = Math.round((event.loaded / event.total) * 100);
        document.getElementById('uploadStatus').innerHTML = `${percent}% uploaded... please wait`;
        submitButton.disabled = true;
    }

    function completeHandler() {
        document.getElementById('uploadStatus').innerHTML = 'Upload Completed';

        // Re-submit the form to the backend, swapping the real file for a
        // tiny placeholder so the request stays under Cloud Run's 32 MB
        // limit. Only the file name is used server-side (see MediaForm.save).
        const formData = new FormData(document.getElementById('fileForm'));
        const tempFile = new File(['temporary'], file.name, {
            type: file.type
        });
        formData.set('file', tempFile, tempFile.name);

        fetch("{% url 'wagtailmedia:add' media_type %}", {
            method: 'POST',
            redirect: 'follow',
            body: formData,
            headers: {
                'X-CSRFToken': '{{ csrf_token }}'
            },
            mode: "same-origin",
        })
        .then(response => window.location.replace(response.url))
        .catch((err) => {
            alert('There was an error uploading your file.');
        });
    }

    function errorHandler() {
        submitButton.disabled = false;
        document.getElementById('uploadStatus').innerHTML = 'Upload Failed';
    }

    function abortHandler() {
        submitButton.disabled = false;
        document.getElementById('uploadStatus').innerHTML = 'Upload Aborted';
    }

    function uploadFile(signedURL) {
        const xhr = new XMLHttpRequest();

        xhr.upload.addEventListener('progress', progressHandler);
        xhr.addEventListener('load', completeHandler);
        xhr.addEventListener('error', errorHandler);
        xhr.addEventListener('abort', abortHandler);

        xhr.open('PUT', signedURL);
        xhr.setRequestHeader('Content-Type', file.type);
        xhr.send(file);
    }

    const handleSubmit = async (event) => {
        event.preventDefault();

        try {
            const url = await getSignedURL();

            uploadFile(url);

        } catch (err) {
            console.log(err);
            alert('There was an error uploading your file.');
        }
    };

    document.getElementById('fileForm').addEventListener('submit', (event) => {
        handleSubmit(event);
    });
</script>

Conclusion

The solution works, raising the effective upload limit from 32 MB to a mind-blowing 5 TiB (the maximum object size in Cloud Storage). It's also a lesson to always read the Quotas and Limits pages when adopting managed infrastructure and services from a cloud provider, and to evaluate early whether workarounds will be needed now or down the line.

“Programming isn’t about what you know; it’s about what you can figure out.” - Chris Pine