feat(tools): allow tools to return File instances for multimodal context (#5758)#5759
feat(tools): allow tools to return File instances for multimodal context (#5758)#5759devin-ai-integration[bot] wants to merge 3 commits intomainfrom
Conversation
Closes #5758 Tools can now return crewai_files.FileInput instances (or lists/dicts of them) from their _run method to dynamically extend the agent's multimodal context. The framework detects file returns, replaces the raw return with a confirmation message, and attaches the files to the most recent user message so subsequent LLM calls include them. - Add extract_files_from_tool_result helper - Extend ToolResult dataclass with optional files field - Detect FileInput returns in ToolUsage._use / _ause - Propagate files through execute_tool_and_check_finality - Attach files to messages in CrewAgentExecutor (ReAct + native tool flows) - Mirror file attachment in LiteAgent ReAct loop - Add comprehensive unit tests
|
Prompt hidden (unlisted session) |
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
| tool_usage = self._make_tool_usage(tool) | ||
| calling = ToolCalling(tool_name="get_named_files", arguments={"name": "test"}) | ||
|
|
||
| result = tool_usage.use(calling=calling, tool_string="Action: get_named_files") |
|
|
||
| from unittest.mock import MagicMock, patch | ||
|
|
||
| import pytest |
| from unittest.mock import MagicMock, patch | ||
|
|
||
| import pytest | ||
| from crewai_files import File, ImageFile, TextFile |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/crewai/src/crewai/agents/crew_agent_executor.py`:
- Around line 944-949: The code mutates shared executor state from worker
threads by calling _attach_tool_files_to_messages(...) inside
_execute_single_native_tool_call; remove that call and instead have
_execute_single_native_tool_call return any extracted_files (e.g., include them
in its returned execution_result or raw_result tuple). Then in
_handle_native_tool_calls, while iterating ordered_results on the main thread,
merge each execution_result["files"] into the executor's files/messages (call
_attach_tool_files_to_messages or an equivalent merge there) before appending
the reasoning prompt so file merges happen only on the main thread and avoid
races.
In `@lib/crewai/src/crewai/tools/tool_usage.py`:
- Around line 344-350: Reset stale extracted-file state by clearing
self._last_extracted_files at the start of the tool-invocation path and whenever
extraction fails: before calling extract_files_from_tool_result set
self._last_extracted_files = None, and after calling it, if extracted_files is
None explicitly set self._last_extracted_files = None (otherwise set it to
extracted_files when non-None). Apply the same change around the other
occurrence that invokes extract_files_from_tool_result so ToolResult.files
cannot be populated with a previous invocation's files.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: b775eada-ac1e-4c57-87f4-362b126f9c4b
📒 Files selected for processing (8)
lib/crewai/src/crewai/agents/crew_agent_executor.pylib/crewai/src/crewai/lite_agent.pylib/crewai/src/crewai/tools/tool_types.pylib/crewai/src/crewai/tools/tool_usage.pylib/crewai/src/crewai/utilities/tool_files.pylib/crewai/src/crewai/utilities/tool_utils.pylib/crewai/tests/tools/test_tool_file_returns.pylib/crewai/tests/utilities/test_tool_files.py
| extracted_files, files_message = extract_files_from_tool_result( | ||
| raw_result | ||
| ) | ||
| if extracted_files is not None: | ||
| self._attach_tool_files_to_messages(extracted_files) | ||
| raw_result = files_message |
There was a problem hiding this comment.
Do not mutate self.messages from worker threads in native parallel tool execution.
In the parallel native path, _execute_single_native_tool_call runs in a thread pool, and this new call to _attach_tool_files_to_messages(...) mutates shared executor state from worker threads. That can race and lose/overwrite file merges.
Suggested fix approach
- if extracted_files is not None:
- self._attach_tool_files_to_messages(extracted_files)
- raw_result = files_message
+ if extracted_files is not None:
+ raw_result = files_message
return {
"call_id": call_id,
"func_name": func_name,
"result": result,
"from_cache": from_cache,
"original_tool": original_tool,
+ "files": extracted_files,
}Then, merge execution_result["files"] on the main thread (e.g., in _handle_native_tool_calls while iterating ordered_results) before appending the reasoning prompt.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lib/crewai/src/crewai/agents/crew_agent_executor.py` around lines 944 - 949,
The code mutates shared executor state from worker threads by calling
_attach_tool_files_to_messages(...) inside _execute_single_native_tool_call;
remove that call and instead have _execute_single_native_tool_call return any
extracted_files (e.g., include them in its returned execution_result or
raw_result tuple). Then in _handle_native_tool_calls, while iterating
ordered_results on the main thread, merge each execution_result["files"] into
the executor's files/messages (call _attach_tool_files_to_messages or an
equivalent merge there) before appending the reasoning prompt so file merges
happen only on the main thread and avoid races.
| extracted_files, files_message = extract_files_from_tool_result( | ||
| result | ||
| ) | ||
| if extracted_files is not None: | ||
| result = files_message | ||
| self._last_extracted_files = extracted_files | ||
|
|
There was a problem hiding this comment.
Reset _last_extracted_files per invocation to prevent stale file leakage.
_last_extracted_files is only assigned when extraction succeeds, so a later call/retry that does not extract files can still expose old files. This can incorrectly populate ToolResult.files from a previous execution (Line 344 and Line 589 flow).
Suggested fix
def use(
self, calling: ToolCalling | InstructorToolCalling, tool_string: str
) -> str:
+ self._last_extracted_files = None
if isinstance(calling, ToolUsageError):
error = calling.message
...
async def ause(
self, calling: ToolCalling | InstructorToolCalling, tool_string: str
) -> str:
+ self._last_extracted_files = None
if isinstance(calling, ToolUsageError):
error = calling.message
...Also applies to: 589-595
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lib/crewai/src/crewai/tools/tool_usage.py` around lines 344 - 350, Reset
stale extracted-file state by clearing self._last_extracted_files at the start
of the tool-invocation path and whenever extraction fails: before calling
extract_files_from_tool_result set self._last_extracted_files = None, and after
calling it, if extracted_files is None explicitly set self._last_extracted_files
= None (otherwise set it to extracted_files when non-None). Apply the same
change around the other occurrence that invokes extract_files_from_tool_result
so ToolResult.files cannot be populated with a previous invocation's files.
Mypy could not infer that BaseFile values are valid FileInput entries. Using TypeGuard[FileInput] makes the type narrowing explicit.
| if extracted_files is not None: | ||
| should_cache = False |
| for i in range(len(self.messages) - 1, -1, -1): | ||
| msg = self.messages[i] | ||
| if msg.get("role") == "user": | ||
| existing = msg.get("files") or {} | ||
| merged = {**existing, **files} | ||
| msg["files"] = merged | ||
| return | ||
|
|
||
| self.messages.append({"role": "user", "content": "", "files": dict(files)}) |
There was a problem hiding this comment.
Doesn't this make the exsisting file size duplicated, and make the context token size meanlessly increase a lot?
Because if there is FILE_1 in the first user message, this new message refers the same FILE_1, and I doubt that LLM have to read the FILE_1 twice
| def extract_files_from_tool_result( | ||
| result: Any, | ||
| ) -> tuple[dict[str, FileInput] | None, str | None]: | ||
| """Inspect a tool's return value and extract any ``FileInput`` instances. | ||
|
|
||
| Tools may return: | ||
|
|
||
| - A single ``BaseFile`` instance (``File``, ``PDFFile``, ``ImageFile``, | ||
| ``TextFile``, ``AudioFile``, ``VideoFile``). | ||
| - A list/tuple of ``BaseFile`` instances. | ||
| - A dict mapping names to ``BaseFile`` instances. | ||
|
|
||
| When any of these shapes are detected this returns a tuple | ||
| ``(files, message)`` where ``files`` is a dict suitable for the | ||
| multimodal ``files`` slot on a user message and ``message`` is a short | ||
| confirmation string describing what was added (intended to be shown to | ||
| the LLM as the textual tool result). | ||
|
|
||
| For any other return type the helper returns ``(None, None)`` so the | ||
| caller can keep the existing string-based behavior unchanged. | ||
|
|
||
| Args: | ||
| result: The raw return value of a tool's ``run`` / ``_run`` method. | ||
|
|
||
| Returns: | ||
| A ``(files, message)`` tuple. ``files`` is ``None`` when no files | ||
| were detected. | ||
| """ |
There was a problem hiding this comment.
Make user easy to understand how to define the typing of the user-defined tools using File
Summary
Closes #5758.
Tools can now return
crewai_files.FileInputinstances (single, list/tuple, or dict) from their_run/_arunmethods. The framework detects these returns, replaces the raw return with a short confirmation message that goes back to the LLM as the textual tool result, and attaches the files to the agent's most recent user message so they become part of the next LLM call's multimodal context.This lets agents add files to their own context dynamically (e.g. a "fetch document" tool can hand back a
PDFFile) without forcing the user to pre-load every file viainputs.filesup front.Changes
crewai.utilities.tool_files.extract_files_from_tool_result— detects single/list/tuple/dict-of-BaseFilereturns, picks unique keys from filename stems, and produces a confirmation message.ToolResultgains an optionalfiles: dict[str, FileInput] | Nonefield (defaultNone) to carry extracted files through to the executor.ToolUsage._use/_ausecall the helper aftertool.invoke/tool.ainvoke, stash files onself._last_extracted_files, replace the textual result with the confirmation message, and disable caching when files are present.tool_utils.execute_tool_and_check_finality(sync + async) reads_last_extracted_filesoff theToolUsageinstance and forwards it into the constructedToolResult.CrewAgentExecutor._handle_agent_actioncalls a new_attach_tool_files_to_messageswhentool_result.filesis set. The ReAct loop and_execute_single_native_tool_call(native function-calling path) both attach files to the most recent user message, merging with any existingfilesmapping.LiteAgentmirrors the same_attach_tool_files_to_messageslogic in its ReAct invoke loop.Tests
extract_files_from_tool_result(covers single file, list, tuple, dict, empty collections, mixed lists, duplicate filename stems, genericFilevs typedTextFile/ImageFile/PDFFile).tests/tools/test_tool_file_returns.pycoveringToolUsage._useextraction,execute_tool_and_check_finalitypropagation,_attach_tool_files_to_messagesmerge/append behavior, and_handle_agent_actionend-to-end inside the executor (usingmodel_constructto bypass full agent wiring).All 22 new tests pass locally with
ruff checkandruff format --checkclean.Review & Testing Checklist for Human
TextFile/PDFFile/ImageFileand verify the LLM actually receives it as multimodal content on the next turn. The unit tests assert thefilesdict is attached to the user message, but they do not verify the LLM payload after_inject_multimodal_filesruns. This is the highest-risk gap._execute_single_native_tool_call): the file extraction was added inline (with a local import) in addition to the ReAct path, but the new tests primarily cover the ReAct path. Worth exercising an agent configured with native tool calling that returns a file.LiteAgentReAct loop: same logic was duplicated intolite_agent.pybut the new tests don't directly exerciseLiteAgent._invoke_loop. A quickLiteAgentsmoke test would help.ToolUsage._last_extracted_files: the executor flows already construct a freshToolUsageper tool call, but if anything reuses aToolUsageinstance for multiple calls this could leak files across calls. Confirm this assumption holds.AddImageTool:_handle_agent_actionnow runs the file-attachment block before theAddImageToolspecial case.AddImageToolreturns a{"role", "content"}dict (not aBaseFile) soextract_files_from_tool_resultshould return(None, None), but worth a quick check thatAddImageToolstill works.Notes
AddFileTool(parallel toAddImageTool). This PR intentionally only adds the underlying capability — any tool can now return a file — and does not ship a built-inAddFileTool. That can be added in a follow-up if desired.agent_utils.execute_single_native_tool_call(the standalone function used bystep_executorand friends, not the executor method) was not updated. Tools used through that path will not have their file returns auto-attached. Left as a follow-up since it doesn't go throughtool_usage.py.Link to Devin session: https://app.devin.ai/sessions/2b2c9c44b57c42feb4000e8f5c487f95
Summary by CodeRabbit
New Features
Tests