Skip to content

Stream POST request in order to handle large files#161

Merged
ColdHeat merged 7 commits into
CTFd:masterfrom
maelle-le-bon:master
May 18, 2026
Merged

Stream POST request in order to handle large files#161
ColdHeat merged 7 commits into
CTFd:masterfrom
maelle-le-bon:master

Conversation

@maelle-le-bon
Copy link
Copy Markdown
Contributor

@maelle-le-bon maelle-le-bon commented Nov 10, 2024

The bug

I ran into a problem last year: when I tried to create or synchronize a challenge containing a large file (i.e. a forensics challenge with a 15 GB disk image), the entire file was put into memory before starting the request

This causes crashes since I only have 16GB of RAM in my computer.

The cause

Although the requests module supports body streaming when you pass a file pointer to the data parameter, it is not capable of streaming form-data.

When the requests module prepares the headers, it tries to calculate the Content-Length. As a result, the entire body will be stored in memory.

The fix

One solution would be to switch to another HTTP client, capable of streaming form-data.

I chose to modify as little code as possible. I made the choice to delegate the body encoding to the MultipartEncoder from the requests-toolbelt module. This requires a few modifications to the API class, since the MultipartEncoder takes parameters differently from requests.

As a result files must be sent with a filename hint:

# Before
api.post("/api/v1/files", files=[
    ( "file", open("./file.ova") )
], data={ ... })

# After
api.post("/api/v1/files", files={
    "file": ( "file.ova", open("./file.ova"))
}, data={ ... })

# If you want to send multiple files under the key "file", you can use tuple or list instead of dict
api.post("/api/v1/files", files=[
    ("file", ( "file.ova", open("./file.ova"))),
    ("file", ( "description.txt", open("./description.txt")))
], data={ ... })

The Multipart encoder helps requests to upload large files without the need to read the entire file in memory
@ColdHeat
Copy link
Copy Markdown
Member

Did you try compressing the file? I feel like streaming is okay but I think really the issue is that disk images are really huge.

@maelle-le-bon
Copy link
Copy Markdown
Contributor Author

maelle-le-bon commented Nov 26, 2024

Even when compressing asset, it is really common to upload very large files. It could be .ova or android disk image for example.

Storing large file into memory just to send a HTTP request is not ok. It's a known limitation of the requests module and it's a shame we need to use another module to achieve what should be a normal behavior.

(Sorry if I made grammar mistakes it's 2am in France)

@ColdHeat
Copy link
Copy Markdown
Member

My point is more that instead of switching out the behavior in ctfcli, it probably would have been better to just compress your file.

While I am roughly okay with the PR and using streaming, I am not sure if ctfcli should support behavior like this. No one really wants to download a 16 GB file.

@maelle-le-bon
Copy link
Copy Markdown
Contributor Author

maelle-le-bon commented Nov 26, 2024

I understand but sometimes we have simply no choice 😅

For example, this is a CTF we organize each year. The last edition, we got files up to 2.5GB. During deployments this caused big spikes of RAM usage in order to send the file, and sometimes it caused OOM errors.
https://github.com/BreizhCTF/breizhctf-2024/blob/main/Forensic/Tampered/dist/bzhctf.ova

Forensic challenges can be really big.16GB was the largest archive I've ever seen (I was a disk dump from a Windows server if I remember correctly)

@pl4nty
Copy link
Copy Markdown
Contributor

pl4nty commented Nov 30, 2024

This would help me too, I often have compressed forensics artefacts in the 1-5GB range. I upload to CTFd Cloud using beefy CI runners which don't crash, but synchronous reading into memory makes the upload pretty slow. My other workaround was manually uploading to external blob storage, then linking from CTFd

@Zeecka
Copy link
Copy Markdown

Zeecka commented May 17, 2026

I ran into the same issue. No one really wants to download a 16 GB file. -> +16 GB compressed disc images for Forensic local training is definitely a use case. I'm already aware of the importance of using high ratio compression on such files. But CTFd is not only intended to be used on remote CTF with low bandwidth.

@ColdHeat
Copy link
Copy Markdown
Member

I think that's a stronger rationale and I think in truth limits need to truly enforced versus how they are ultimately hit.

I will fix the merge conflicts and accept.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces requests' built-in multipart encoding with requests_toolbelt.MultipartEncoder in ctfcli.core.api.API.request, so that large file uploads (e.g. multi-GB challenge artifacts) are streamed instead of being buffered in memory to compute Content-Length. The new encoder takes fields in a different shape, so callers must now pass each file as (filename, fileobj) tuples, and int form values are coerced to strings since MultipartEncoder rejects non-string values.

Changes:

  • Rewrite API.request to detect form-data calls and route them through MultipartEncoder, setting Content-Type from the encoder.
  • Update file-upload call sites (_create_file, _create_all_files) to the new (filename, fileobj) value shape.
  • Add requests-toolbelt==1.0.0 dependency and regenerate uv.lock.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 5 comments.

File Description
ctfcli/core/api.py Builds multipart body via MultipartEncoder for streaming uploads
ctfcli/core/challenge.py Updates _create_file/_create_all_files to the new files= shape
pyproject.toml Adds requests-toolbelt==1.0.0 dependency
uv.lock Regenerated lockfile (revision bump and upload_time fields)

Note: _create_solution in ctfcli/core/challenge.py (around line 587–594) still passes ("file", open(...)) to files=, which is incompatible with the new MultipartEncoder contract; this is outside the diff regions so a comment could not be attached there, but it should be fixed in the same PR.

Comments suppressed due to low confidence (1)

ctfcli/core/api.py:67

  • Only dict is handled here, but the outer guard on line 58 also admits any Mapping. A non-dict Mapping (e.g. an OrderedDict subclass with a custom type, or another mapping) would skip the value-stringification path and fields would be left empty. Use isinstance(data, Mapping) for consistency with the outer check.
            if isinstance(data, dict):
                # int are not allowed as value in MultipartEncoder
                fields = list(map(lambda v: (v[0], str(v[1]) if isinstance(v[1], int) else v[1]), data.items()))

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ctfcli/core/api.py
Comment on lines +65 to +70
if isinstance(data, dict):
# int are not allowed as value in MultipartEncoder
fields = list(map(lambda v: (v[0], str(v[1]) if isinstance(v[1], int) else v[1]), data.items()))

if files is not None:
if isinstance(files, dict):
Comment thread ctfcli/core/api.py
Comment on lines +66 to +67
# int are not allowed as value in MultipartEncoder
fields = list(map(lambda v: (v[0], str(v[1]) if isinstance(v[1], int) else v[1]), data.items()))
Comment thread ctfcli/core/api.py Outdated
Comment on lines +75 to +80

return super(API, self).request(
method,
url,
data=multipart,
headers={"Content-Type": multipart.content_type},
Comment thread ctfcli/core/api.py
Comment thread ctfcli/core/api.py Outdated
Comment on lines +64 to +80
fields = list()
if isinstance(data, dict):
# int are not allowed as value in MultipartEncoder
fields = list(map(lambda v: (v[0], str(v[1]) if isinstance(v[1], int) else v[1]), data.items()))

if files is not None:
if isinstance(files, dict):
files = list(files.items())
fields.extend(files) # type: ignore

multipart = MultipartEncoder(fields)

return super(API, self).request(
method,
url,
data=multipart,
headers={"Content-Type": multipart.content_type},
@ColdHeat ColdHeat merged commit 44da596 into CTFd:master May 18, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants