feat(tools): allow tools to return File instances for multimodal context (#5758) #5759

Open
devin-ai-integration[bot] wants to merge 3 commits into main from devin/1778293043-add-file-tool

devin-ai-integration Bot commented May 9, 2026

Summary

Closes #5758.

Tools can now return crewai_files.FileInput instances (single, list/tuple, or dict) from their _run / _arun methods. The framework detects these returns, replaces the raw return with a short confirmation message that goes back to the LLM as the textual tool result, and attaches the files to the agent's most recent user message so they become part of the next LLM call's multimodal context.

This lets agents add files to their own context dynamically (e.g. a "fetch document" tool can hand back a PDFFile) without forcing the user to pre-load every file via inputs.files up front.
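
As a rough illustration of the "fetch document" case, a tool might look like the sketch below. `PDFFile` here is a local stand-in for `crewai_files.PDFFile`, and `FetchDocumentTool` is a hypothetical class that does not subclass the real tool base class, so the snippet runs without the framework installed. The point is only the return annotation: `_run` returns the file object itself rather than a string.

```python
from dataclasses import dataclass


@dataclass
class PDFFile:
    """Stand-in for crewai_files.PDFFile (assumption: the real constructor may differ)."""

    source: str


class FetchDocumentTool:
    """Hypothetical tool; a real one would subclass the framework's tool base class."""

    name = "fetch_document"

    def _run(self, path: str) -> PDFFile:
        # Returning the file object (not a string) is what lets the framework
        # detect it and attach it to the agent's multimodal context.
        return PDFFile(source=path)


result = FetchDocumentTool()._run("reports/q3.pdf")
print(type(result).__name__, result.source)
```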

Changes

  • New helper crewai.utilities.tool_files.extract_files_from_tool_result — detects single/list/tuple/dict-of-BaseFile returns, picks unique keys from filename stems, and produces a confirmation message.
  • ToolResult gains an optional files: dict[str, FileInput] | None field (default None) to carry extracted files through to the executor.
  • ToolUsage._use / _ause call the helper after tool.invoke / tool.ainvoke, stash files on self._last_extracted_files, replace the textual result with the confirmation message, and disable caching when files are present.
  • tool_utils.execute_tool_and_check_finality (sync + async) reads _last_extracted_files off the ToolUsage instance and forwards it into the constructed ToolResult.
  • CrewAgentExecutor._handle_agent_action calls a new _attach_tool_files_to_messages when tool_result.files is set. The ReAct loop and _execute_single_native_tool_call (native function-calling path) both attach files to the most recent user message, merging with any existing files mapping.
  • LiteAgent mirrors the same _attach_tool_files_to_messages logic in its ReAct invoke loop.
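
The detection step in the first bullet can be sketched roughly as follows. This is not the code from `tool_files.py`: `BaseFile` is a local stand-in, and the `_2` suffix scheme, the confirmation wording, and the handling of mixed lists and empty collections are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class BaseFile:
    """Stand-in for the crewai_files base class (assumption: it exposes a filename)."""

    filename: str


def _unique_key(filename: str, taken: set[str]) -> str:
    # Derive a key from the filename stem; add a numeric suffix on collision.
    stem = Path(filename).stem or "file"
    key, n = stem, 2
    while key in taken:
        key = f"{stem}_{n}"
        n += 1
    taken.add(key)
    return key


def extract_files(result: Any) -> tuple[dict[str, BaseFile] | None, str | None]:
    if isinstance(result, BaseFile):
        files = {_unique_key(result.filename, set()): result}
    elif isinstance(result, dict) and result and all(
        isinstance(v, BaseFile) for v in result.values()
    ):
        files = {str(k): v for k, v in result.items()}
    elif isinstance(result, (list, tuple)) and result and all(
        isinstance(v, BaseFile) for v in result
    ):
        taken: set[str] = set()
        files = {_unique_key(f.filename, taken): f for f in result}
    else:
        # Strings, empty collections, and mixed lists fall through unchanged
        # (an assumption about how the real helper treats them).
        return None, None
    return files, f"Added {len(files)} file(s) to context: {', '.join(files)}"


files, message = extract_files([BaseFile("a/report.pdf"), BaseFile("b/report.pdf")])
print(message)
```

Returning `(None, None)` for anything that is not purely files keeps the existing string-based path untouched, which matches the behavior the description states for non-file returns.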

Tests

  • 11 unit tests for extract_files_from_tool_result (covers single file, list, tuple, dict, empty collections, mixed lists, duplicate filename stems, generic File vs typed TextFile/ImageFile/PDFFile).
  • 11 integration-style tests in tests/tools/test_tool_file_returns.py covering ToolUsage._use extraction, execute_tool_and_check_finality propagation, _attach_tool_files_to_messages merge/append behavior, and _handle_agent_action end-to-end inside the executor (using model_construct to bypass full agent wiring).

All 22 new tests pass locally, and ruff check and ruff format --check are both clean.

Review & Testing Checklist for Human

  • Run a real agent end-to-end with a tool that returns a TextFile / PDFFile / ImageFile and verify the LLM actually receives it as multimodal content on the next turn. The unit tests assert the files dict is attached to the user message, but they do not verify the LLM payload after _inject_multimodal_files runs. This is the highest-risk gap.
  • Native function-calling path (_execute_single_native_tool_call): the file extraction was added inline (with a local import) in addition to the ReAct path, but the new tests primarily cover the ReAct path. Worth exercising an agent configured with native tool calling that returns a file.
  • LiteAgent ReAct loop: same logic was duplicated into lite_agent.py but the new tests don't directly exercise LiteAgent._invoke_loop. A quick LiteAgent smoke test would help.
  • Side-channel via ToolUsage._last_extracted_files: the executor flows already construct a fresh ToolUsage per tool call, but if anything reuses a ToolUsage instance for multiple calls this could leak files across calls. Confirm this assumption holds.
  • Interaction with AddImageTool: _handle_agent_action now runs the file-attachment block before the AddImageTool special case. AddImageTool returns a {"role", "content"} dict (not a BaseFile) so extract_files_from_tool_result should return (None, None), but worth a quick check that AddImageTool still works.

Notes

  • The user's original feature request also floated the idea of a built-in AddFileTool (parallel to AddImageTool). This PR intentionally only adds the underlying capability — any tool can now return a file — and does not ship a built-in AddFileTool. That can be added in a follow-up if desired.
  • agent_utils.execute_single_native_tool_call (the standalone function used by step_executor and friends, not the executor method) was not updated. Tools used through that path will not have their file returns auto-attached. Left as a follow-up since it doesn't go through tool_usage.py.
  • Caching is force-disabled when a tool returns files, since cached strings would not bring the file objects back on subsequent invocations.

Link to Devin session: https://app.devin.ai/sessions/2b2c9c44b57c42feb4000e8f5c487f95

Summary by CodeRabbit

  • New Features

    • Tools can now return multimodal files (images, PDFs, text files, etc.). Returned files are detected, normalized into a human-readable confirmation, attached/merged into the conversation so subsequent agent responses include the file context, and tool results with files are excluded from caching.
  • Tests

    • Added tests covering file extraction, key generation/de-duplication, propagation into tool results, and attachment into agent message history.

Closes #5758

Tools can now return crewai_files.FileInput instances (or lists/dicts of
them) from their _run method to dynamically extend the agent's
multimodal context. The framework detects file returns, replaces the
raw return with a confirmation message, and attaches the files to the
most recent user message so subsequent LLM calls include them.

- Add extract_files_from_tool_result helper
- Extend ToolResult dataclass with optional files field
- Detect FileInput returns in ToolUsage._use / _ause
- Propagate files through execute_tool_and_check_finality
- Attach files to messages in CrewAgentExecutor (ReAct + native tool flows)
- Mirror file attachment in LiteAgent ReAct loop
- Add comprehensive unit tests


coderabbitai Bot commented May 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 33f76253-dbdb-4d24-9016-085a5c0044d8

📥 Commits

Reviewing files that changed from the base of the PR and between 280e910 and 9229b78.

📒 Files selected for processing (1)
  • lib/crewai/src/crewai/utilities/tool_files.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/crewai/src/crewai/utilities/tool_files.py

Walkthrough

This PR implements support for tools returning file objects. When tools return BaseFile instances, the framework extracts them, generates confirmation messages, propagates them through ToolResult, and attaches the files to agent message history for multimodal LLM context.

Changes

Tool File Return Support

| Layer / File(s) | Summary |
|---|---|
| Data Model Contract: lib/crewai/src/crewai/tools/tool_types.py | ToolResult gains optional files: dict[str, FileInput] \| None field to carry file mappings. |
| File Extraction Utilities: lib/crewai/src/crewai/utilities/tool_files.py | New module provides extract_files_from_tool_result() to detect and normalize BaseFile instances from tool results (single, list, tuple, or dict forms) with key generation and confirmation messages. |
| Tool Execution Integration: lib/crewai/src/crewai/tools/tool_usage.py | ToolUsage detects extracted files after sync/async tool invocation, replaces results with confirmation messages, disables caching, stores files in _last_extracted_files, and includes files in tools_results tracking. |
| Tool Result Propagation: lib/crewai/src/crewai/utilities/tool_utils.py | Both execute_tool_and_check_finality() and aexecute_tool_and_check_finality() capture extracted files and include them in returned ToolResult. |
| CrewAgentExecutor Integration: lib/crewai/src/crewai/agents/crew_agent_executor.py | Adds _attach_tool_files_to_messages() helper; attaches files during native tool execution and when processing ToolResult.files in _handle_agent_action. |
| LiteAgent Integration: lib/crewai/src/crewai/lite_agent.py | Adds _attach_tool_files_to_messages() helper; attaches ToolResult.files to messages in _invoke_loop before next LLM call. |
| Test Coverage: lib/crewai/tests/tools/test_tool_file_returns.py, lib/crewai/tests/utilities/test_tool_files.py | Validates file extraction from single/list/dict returns, ToolResult propagation, message attachment behavior, and end-to-end agent integration across both executor types. |

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Agent
  participant Tool
  participant Extractor
  User->>Agent: ask + context
  Agent->>Tool: invoke tool
  Tool-->>Agent: return (result or BaseFile(s))
  Agent->>Extractor: extract_files_from_tool_result(result)
  Extractor-->>Agent: files dict + message OR None
  alt files returned
    Agent->>Agent: attach files to last user message
    Agent->>Agent: replace tool result with confirmation message
  end
  Agent->>LLM: next LLM call includes attached files

🎯 3 (Moderate) | ⏱️ ~25 minutes

🐰 A tool returns files with glee,
Extracted with logic so free,
Through messages they flow,
As multimodal they go,
Rich context for agents to see! 📄✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main feature: enabling tools to return File instances for multimodal context, which is the primary change in the changeset. |
| Linked Issues check | ✅ Passed | The PR fully addresses #5758's requirement: tools can now return File instances, which are automatically attached to agent context, reducing upfront token costs by enabling dynamic file loading. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to supporting tools returning File instances. No unrelated modifications detected in the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 81.82%, which is sufficient. The required threshold is 80.00%. |

tool_usage = self._make_tool_usage(tool)
calling = ToolCalling(tool_name="get_named_files", arguments={"name": "test"})

result = tool_usage.use(calling=calling, tool_string="Action: get_named_files")

from unittest.mock import MagicMock, patch

import pytest
from crewai_files import File, ImageFile, TextFile

coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/crewai/src/crewai/agents/crew_agent_executor.py`:
- Around line 944-949: The code mutates shared executor state from worker
threads by calling _attach_tool_files_to_messages(...) inside
_execute_single_native_tool_call; remove that call and instead have
_execute_single_native_tool_call return any extracted_files (e.g., include them
in its returned execution_result or raw_result tuple). Then in
_handle_native_tool_calls, while iterating ordered_results on the main thread,
merge each execution_result["files"] into the executor's files/messages (call
_attach_tool_files_to_messages or an equivalent merge there) before appending
the reasoning prompt so file merges happen only on the main thread and avoid
races.

In `@lib/crewai/src/crewai/tools/tool_usage.py`:
- Around line 344-350: Reset stale extracted-file state by clearing
self._last_extracted_files at the start of the tool-invocation path and whenever
extraction fails: before calling extract_files_from_tool_result set
self._last_extracted_files = None, and after calling it, if extracted_files is
None explicitly set self._last_extracted_files = None (otherwise set it to
extracted_files when non-None). Apply the same change around the other
occurrence that invokes extract_files_from_tool_result so ToolResult.files
cannot be populated with a previous invocation's files.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: b775eada-ac1e-4c57-87f4-362b126f9c4b

📥 Commits

Reviewing files that changed from the base of the PR and between e4a91cd and 90f4021.

📒 Files selected for processing (8)
  • lib/crewai/src/crewai/agents/crew_agent_executor.py
  • lib/crewai/src/crewai/lite_agent.py
  • lib/crewai/src/crewai/tools/tool_types.py
  • lib/crewai/src/crewai/tools/tool_usage.py
  • lib/crewai/src/crewai/utilities/tool_files.py
  • lib/crewai/src/crewai/utilities/tool_utils.py
  • lib/crewai/tests/tools/test_tool_file_returns.py
  • lib/crewai/tests/utilities/test_tool_files.py

Comment on lines +944 to +949
extracted_files, files_message = extract_files_from_tool_result(
    raw_result
)
if extracted_files is not None:
    self._attach_tool_files_to_messages(extracted_files)
    raw_result = files_message

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not mutate self.messages from worker threads in native parallel tool execution.

In the parallel native path, _execute_single_native_tool_call runs in a thread pool, and this new call to _attach_tool_files_to_messages(...) mutates shared executor state from worker threads. That can race and lose/overwrite file merges.

Suggested fix approach
-                if extracted_files is not None:
-                    self._attach_tool_files_to_messages(extracted_files)
-                    raw_result = files_message
+                if extracted_files is not None:
+                    raw_result = files_message

         return {
             "call_id": call_id,
             "func_name": func_name,
             "result": result,
             "from_cache": from_cache,
             "original_tool": original_tool,
+            "files": extracted_files,
         }

Then, merge execution_result["files"] on the main thread (e.g., in _handle_native_tool_calls while iterating ordered_results) before appending the reasoning prompt.
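
The shape of that suggestion can be sketched with the standard library alone. `run_tool` and `handle_tool_calls` are hypothetical stand-ins for `_execute_single_native_tool_call` and `_handle_native_tool_calls`, and the file payloads are placeholder strings. Workers only return data, and the single merge loop runs on the calling thread.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tool(call_id: str) -> dict:
    # Worker: compute the result and *return* any files; touch no shared state.
    files = {f"{call_id}_doc": f"<file from {call_id}>"} if call_id != "c2" else None
    return {"call_id": call_id, "result": f"ok:{call_id}", "files": files}


def handle_tool_calls(call_ids: list[str], messages: list[dict]) -> None:
    with ThreadPoolExecutor(max_workers=4) as pool:
        # map() yields results in submission order, regardless of finish order.
        ordered_results = list(pool.map(run_tool, call_ids))

    # Main thread only: merge files into the most recent user message.
    last_user = next(m for m in reversed(messages) if m["role"] == "user")
    for execution_result in ordered_results:
        if execution_result["files"]:
            last_user.setdefault("files", {}).update(execution_result["files"])


messages = [{"role": "user", "content": "summarize the docs"}]
handle_tool_calls(["c1", "c2", "c3"], messages)
print(sorted(messages[0]["files"]))
```

Because the merge happens after the pool has drained, no lock is needed around the message list in this sketch.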

Comment on lines +344 to +350
extracted_files, files_message = extract_files_from_tool_result(
    result
)
if extracted_files is not None:
    result = files_message
    self._last_extracted_files = extracted_files

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reset _last_extracted_files per invocation to prevent stale file leakage.

_last_extracted_files is only assigned when extraction succeeds, so a later call/retry that does not extract files can still expose old files. This can incorrectly populate ToolResult.files from a previous execution (Line 344 and Line 589 flow).

Suggested fix
 def use(
     self, calling: ToolCalling | InstructorToolCalling, tool_string: str
 ) -> str:
+    self._last_extracted_files = None
     if isinstance(calling, ToolUsageError):
         error = calling.message
         ...
 
 async def ause(
     self, calling: ToolCalling | InstructorToolCalling, tool_string: str
 ) -> str:
+    self._last_extracted_files = None
     if isinstance(calling, ToolUsageError):
         error = calling.message
         ...

Also applies to: 589-595
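
A minimal model of the leak and the reset, under the assumption that one instance is reused for two calls. `ToolUsageSketch` is a hypothetical stand-in that models only the side-channel attribute; the `reset` flag exists solely to show the behavior before and after the suggested fix.

```python
from __future__ import annotations


class ToolUsageSketch:
    """Hypothetical stand-in for ToolUsage; only the side-channel is modeled."""

    def __init__(self) -> None:
        self._last_extracted_files: dict | None = None

    def use(self, raw_result: object, reset: bool = True) -> str:
        if reset:
            # Clear the side-channel first so a file-less call cannot expose
            # files left over from an earlier invocation.
            self._last_extracted_files = None
        if isinstance(raw_result, dict):  # stand-in for "tool returned files"
            self._last_extracted_files = raw_result
            return f"Added {len(raw_result)} file(s)"
        return str(raw_result)


usage = ToolUsageSketch()
usage.use({"report": "<pdf>"})
usage.use("plain text", reset=False)
stale = usage._last_extracted_files  # without the reset, the old files remain
usage.use("plain text")
fresh = usage._last_extracted_files  # with the reset, nothing is carried over
print(stale, fresh)
```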

Mypy could not infer that BaseFile values are valid FileInput entries.
Using TypeGuard[FileInput] makes the type narrowing explicit.

konbraphat51 left a comment

Don't forget to update the docs.

Comment on lines +961 to +962
if extracted_files is not None:
    should_cache = False

Is this necessary?

Comment on lines +1459 to +1467
for i in range(len(self.messages) - 1, -1, -1):
    msg = self.messages[i]
    if msg.get("role") == "user":
        existing = msg.get("files") or {}
        merged = {**existing, **files}
        msg["files"] = merged
        return

self.messages.append({"role": "user", "content": "", "files": dict(files)})

Doesn't this duplicate existing files and needlessly inflate the context token count?
If FILE_1 is already in the first user message and this new message refers to the same FILE_1, I suspect the LLM will have to read FILE_1 twice.
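
The situation described can be reproduced on plain dicts. `attach_files` below restates the loop from the snippet as a free function (a hypothetical name), and the file payloads are placeholder strings. The sketch only shows which messages end up holding the key; whether the provider payload then contains the file twice depends on later processing that is not modeled here.

```python
def attach_files(messages: list[dict], files: dict) -> None:
    # Same logic as the snippet under review, expressed on plain dicts.
    for msg in reversed(messages):
        if msg.get("role") == "user":
            msg["files"] = {**(msg.get("files") or {}), **files}
            return
    messages.append({"role": "user", "content": "", "files": dict(files)})


messages = [
    {"role": "user", "content": "task", "files": {"FILE_1": "<pdf>"}},
    {"role": "assistant", "content": "calling tool"},
    {"role": "user", "content": "observation"},
]
attach_files(messages, {"FILE_1": "<pdf>"})

# Indices of messages that now reference FILE_1.
holders = [i for i, m in enumerate(messages) if "FILE_1" in (m.get("files") or {})]
print(holders)  # [0, 2]
```

Within a single message the dict merge overwrites an existing key rather than duplicating it, so any duplication arises only across different user messages.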

Comment on lines +57 to +84
def extract_files_from_tool_result(
    result: Any,
) -> tuple[dict[str, FileInput] | None, str | None]:
    """Inspect a tool's return value and extract any ``FileInput`` instances.

    Tools may return:

    - A single ``BaseFile`` instance (``File``, ``PDFFile``, ``ImageFile``,
      ``TextFile``, ``AudioFile``, ``VideoFile``).
    - A list/tuple of ``BaseFile`` instances.
    - A dict mapping names to ``BaseFile`` instances.

    When any of these shapes are detected this returns a tuple
    ``(files, message)`` where ``files`` is a dict suitable for the
    multimodal ``files`` slot on a user message and ``message`` is a short
    confirmation string describing what was added (intended to be shown to
    the LLM as the textual tool result).

    For any other return type the helper returns ``(None, None)`` so the
    caller can keep the existing string-based behavior unchanged.

    Args:
        result: The raw return value of a tool's ``run`` / ``_run`` method.

    Returns:
        A ``(files, message)`` tuple. ``files`` is ``None`` when no files
        were detected.
    """

Make it easy for users to understand how to type user-defined tools that return a File.

Development

Successfully merging this pull request may close these issues.

[FEATURE] Tool to add input_files