Skip to content

Conversation

@r-bit-rry
Copy link
Contributor

@r-bit-rry r-bit-rry commented Dec 3, 2025

What does this PR do?

Remove support for file:// URIs in RAGDocument content. Previously, the URI regex pattern matched file:// but the code then passed it to httpx which only supports http/https, causing an UnsupportedProtocol error.

Closes #3463

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Dec 3, 2025
@r-bit-rry r-bit-rry marked this pull request as ready for review December 4, 2025 17:25
Copy link
Collaborator

@mattf mattf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is dangerous. please make it configurable and disabled by default.

alternatively, advise users to upload files using /v1/files and attach them with POST /v1/vector_stores or POST /v1/vector_stores/{id}/files.

Copy link
Collaborator

@franciscojavierarceo franciscojavierarceo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are actively deprecating the RAGDocument so we shouldn't be steering users this way. As @mattf we recommend the Files API.

@r-bit-rry
Copy link
Contributor Author

Thanks for the feedback @mattf @franciscojavierarceo - addressed the security concerns:

Changes:

  • file:// URIs now disabled by default via LLAMA_STACK_ALLOW_FILE_URI env var
  • Error guides users to Files API: "file:// URIs disabled. Use Files API (/v1/files) instead, or set LLAMA_STACK_ALLOW_FILE_URI=true."
  • Refactored content_from_doc() to eliminate duplicate code paths

Security retained:

  • Path sanitization (os.path.realpath())
  • Size limits (LLAMA_STACK_MAX_FILE_URI_SIZE_MB, default 100MB)

mattf
mattf previously requested changes Dec 7, 2025
Copy link
Collaborator

@mattf mattf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about using aiofiles, which gets you async stat and io?

@r-bit-rry
Copy link
Contributor Author

aiofiles

I'm not opposing in any way, but I don't see us using this anywhere in the project, which most likely means a new dependency, if we are ok with introducing a new dependency here, ok, but it might introduce new issues

@mattf mattf dismissed their stale review December 8, 2025 11:31

i'll help you get this in shape, but you'll need someone else to approve.

Copy link
Collaborator

@mattf mattf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aiofiles

I'm not opposing in any way, but I don't see us using this anywhere in the project, which most likely means a new dependency, if we are ok with introducing a new dependency here, ok, but it might introduce new issues

fyi, the localfs files impl isn't async either. it's arguably ok to wait to make this async.

@r-bit-rry
Copy link
Contributor Author

aiofiles

I'm not opposing in any way, but I don't see us using this anywhere in the project, which most likely means a new dependency, if we are ok with introducing a new dependency here, ok, but it might introduce new issues

fyi, the localfs files impl isn't async either. it's arguably ok to wait to make this async.

So I'm thinking of finishing this, the current way, then introducing aiofiles as a different RFE, see what the general feeling towards this, and then perform the changes in a different PR (or PRs)

@r-bit-rry r-bit-rry changed the title fix(memory/rag): add support of loading file from file:// uri fix(memory/rag): remove file:// uri prefix Dec 11, 2025
Copy link
Contributor

@ashwinb ashwinb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to update the title / summary of the PR now given the discussion?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RFE: Accept file:// for URI for RAGDocument

5 participants