Skip to content

feat(security): enforce strict file-type validation on uploads#3092

Open
ukanga wants to merge 12 commits into
mainfrom
enforce-strict-file-type-on-uploadsa
Open

feat(security): enforce strict file-type validation on uploads#3092
ukanga wants to merge 12 commits into
mainfrom
enforce-strict-file-type-on-uploadsa

Conversation

@ukanga
Copy link
Copy Markdown
Member

@ukanga ukanga commented May 26, 2026

Changes / Features implemented

Deny-by-default upload validation across all entry points via a central upload_validation module: per-extension MIME allowlist + magic-byte/structural content checks, configurable size caps (STRICT_UPLOAD_MAX_BYTES), and UUID storage names so client-supplied filenames never reach disk. Covers XLSForm publishing, CSV/XLS imports, form media, and supporting docs. Adds an RFC 6266 Content-Disposition parser and sets X-Content-Type-Options: nosniff on file downloads.

Steps taken to verify this change does what is intended

flake8, isort, black clean; unit tests for the validator and parser pass.

Side effects of implementing this change

Stricter uploads: the legacy content-type-only allowlist (and audio/SVG/DOC-OLE/ZIP types) is removed. Operators raising a per-extension cap must also raise nginx client_max_body_size.

Before submitting this PR for review, please make sure you have:

  • Included tests
  • Updated documentation

Closes #

Comment thread onadata/apps/main/forms.py Fixed
Comment thread onadata/apps/api/viewsets/xform_viewset.py Fixed
Comment thread onadata/apps/api/viewsets/xform_viewset.py Fixed
Comment thread onadata/libs/serializers/metadata_serializer.py Fixed
Comment thread onadata/libs/utils/logger_tools.py Fixed
Comment thread onadata/apps/logger/models/xform.py Outdated
``settings.id_string``.
"""
extension = os.path.splitext(os.path.split(filename)[1])[1].lower()
return os.path.join(instance.user.username, "xls", f"{uuid.uuid4().hex}{extension}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What could be the negative implications of doing this? I guess one is that downloading an XLS file now will have the random name instead of the original name?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if accessing the storage directly, not through the API. By default, the downloaded file will now have the {id_string}.{extension} format. For example, the form Registration Form with the id_string = 'reg_form' will result in the XLSForm file reg_form.xlsx.

Comment thread onadata/apps/main/models/meta_data.py Outdated
except UploadValidationError as error:
raise ValidationError({"data_file": str(error)}) from error

data_file.name = upload.storage_basename
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation appears similar to supporting_docs docs implementation. Can we DRY?

return os.path.join(username, "xls", os.path.split(filename)[1])
original_basename = os.path.split(filename)[1]
extension = os.path.splitext(original_basename)[1].lower()
return os.path.join(username, "xls", f"{uuid.uuid4().hex}{extension}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is similar to the one in xform.py, can we DRY?

@ukanga ukanga force-pushed the enforce-strict-file-type-on-uploadsa branch 2 times, most recently from 0f4c038 to 8b2db81 Compare May 26, 2026 13:49
Comment thread onadata/apps/main/forms.py Dismissed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants