@vblagoje
Member

@vblagoje vblagoje commented Jan 22, 2026

Why

Large tool catalogs overwhelm LLM context windows. Agents need a way to discover tools on-demand rather than receiving all tool definitions upfront.

What

  • ToolSearchToolset: Toolset subclass with BM25-based tool discovery
  • Self-contained _BM25SearchEngine (BM25L variant) - no external dependencies
  • search_tools(query, k) bootstrap tool for LLM-driven discovery
  • Passthrough mode for catalogs below search_threshold (default: 8)
  • clear() method for resetting discovered tools between agent runs
  • Full serialization support (to_dict/from_dict)
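For readers unfamiliar with BM25L, here is a minimal, dependency-free sketch of the scoring idea. It is not the PR's `_BM25SearchEngine`: the class name `TinyBM25L`, the tokenizer, and the parameter defaults (`k1=1.5`, `b=0.75`, `delta=0.5`) are assumptions chosen for illustration, and the formula follows the commonly cited BM25L definition.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyBM25L:
    """Illustrative BM25L ranker over short tool descriptions (hypothetical name)."""

    def __init__(self, documents: list[str], k1: float = 1.5, b: float = 0.75, delta: float = 0.5):
        self.k1, self.b, self.delta = k1, b, delta
        self.term_counts = [Counter(tokenize(doc)) for doc in documents]
        self.doc_lengths = [sum(counts.values()) for counts in self.term_counts]
        # Fall back to 1.0 so an empty index never divides by zero.
        self.avg_length = (sum(self.doc_lengths) / max(len(documents), 1)) or 1.0
        # Number of documents each term appears in.
        self.doc_freq = Counter(term for counts in self.term_counts for term in counts)

    def score(self, query: str, doc_index: int) -> float:
        counts = self.term_counts[doc_index]
        length_norm = 1 - self.b + self.b * self.doc_lengths[doc_index] / self.avg_length
        total = 0.0
        for term in set(tokenize(query)):
            tf = counts.get(term, 0)
            if tf == 0:
                continue  # only terms present in the document contribute
            idf = math.log((len(self.term_counts) + 1) / (self.doc_freq[term] + 0.5))
            c = tf / length_norm  # length-normalized term frequency
            total += idf * (self.k1 + 1) * (c + self.delta) / (self.k1 + c + self.delta)
        return total

    def search(self, query: str, k: int = 3) -> list[int]:
        """Return indexes of the top-k documents with a positive score."""
        scored = [(self.score(query, i), i) for i in range(len(self.term_counts))]
        scored = [(s, i) for s, i in scored if s > 0]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [i for _, i in scored[:k]]


if __name__ == "__main__":
    descriptions = [
        "get_weather: fetch the current weather forecast for a city",
        "send_email: send an email message to a recipient",
        "search_flights: find flights between two airports",
    ]
    print(TinyBM25L(descriptions).search("weather forecast", k=2))
```

The `delta` shift is what distinguishes BM25L from plain BM25: it keeps long documents from being penalized too heavily once a term matches.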

How can it be used

from haystack.components.agents import Agent
from haystack.tools import ToolSearchToolset

# `tools` and `small_tools` can be any tool or toolset we support, as long as it is ToolsType.
# `generator` is a chat generator defined elsewhere.

# Large catalog - LLM discovers tools via search_tools
toolset = ToolSearchToolset(catalog=tools)
agent = Agent(chat_generator=generator, tools=toolset)

# Small catalog - passthrough mode, all tools visible
toolset = ToolSearchToolset(catalog=small_tools)  # < 8 tools (default search_threshold)

How did you test it

  • Unit tests for _BM25SearchEngine (indexing, scoring, edge cases)
  • Unit tests for ToolSearchToolset (passthrough, discovery, iteration, serialization)
  • Integration with ComponentTool warm-up verification

Notes for the reviewer

  • BM25L chosen over embedding search for zero dependencies and predictable latency
  • Embedding mode deferred to future PR
  • Tools warmed up lazily on discovery, not at index time
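The lazy warm-up note can be pictured with a small sketch. The classes below (`LazyTool`, `DiscoveryRegistry`) are hypothetical stand-ins, not Haystack classes; they only assume that a tool exposes a `warm_up()` method and that discovered tools are tracked separately from the indexed catalog.

```python
class LazyTool:
    """Hypothetical tool whose setup (connection, model load) is deferred."""

    def __init__(self, name: str):
        self.name = name
        self.is_warm = False

    def warm_up(self) -> None:
        # Expensive initialization would happen here.
        self.is_warm = True


class DiscoveryRegistry:
    """Hypothetical registry: indexes every tool, warms up only discovered ones."""

    def __init__(self, catalog: list[LazyTool]):
        # Indexing the catalog does not call warm_up() on anything.
        self._catalog = {tool.name: tool for tool in catalog}
        self._discovered: dict[str, LazyTool] = {}

    def discover(self, names: list[str]) -> list[str]:
        for name in names:
            if name not in self._discovered:
                tool = self._catalog[name]
                tool.warm_up()  # warm up at discovery time, once per tool
                self._discovered[name] = tool
        return list(self._discovered)

    def clear(self) -> None:
        """Forget discovered tools, for example between agent runs."""
        self._discovered.clear()
```

With this shape, a catalog of hundreds of tools pays the initialization cost only for the handful the LLM actually finds through search.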

…alogs

Implements ToolSearchToolset - a Toolset subclass that enables dynamic tool
discovery from large catalogs. Tools are discovered through `search_tools`, a
special BM25-based search tool, and then become available to the LLM.

Key features:
- Single discovery mode: "bm25"; "embedding" mode is postponed to a future PR
- Passthrough mode for small catalogs (< search_threshold)
- Self-contained BM25L search engine implementation
- Full serialization support (to_dict/from_dict)
- Auto warm-up when iterating to ensure bootstrap tool availability
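As a rough sketch of how passthrough and discovery modes differ in what the LLM sees, consider the helper below. The function name `visible_tool_names` and the ordering of the result are assumptions for illustration; the strict `<` comparison follows the "< search_threshold" wording above.

```python
def visible_tool_names(
    catalog_names: list[str],
    discovered_names: list[str],
    search_threshold: int = 8,
) -> list[str]:
    """Return the tool names exposed to the LLM (illustrative only)."""
    if len(catalog_names) < search_threshold:
        # Passthrough: the catalog is small enough to expose in full.
        return list(catalog_names)
    # Discovery: expose the bootstrap tool plus whatever was found so far.
    return ["search_tools", *discovered_names]
```

For example, a two-tool catalog is returned unchanged, while an eight-tool catalog with nothing discovered yet yields only `search_tools`.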
@vblagoje added the ignore-for-release-notes label (PRs with this flag won't be included in the release notes) Jan 22, 2026
@github-actions bot added the type:documentation label (Improvements on the docs) Jan 22, 2026
@coveralls
Collaborator

coveralls commented Jan 22, 2026

Pull Request Test Coverage Report for Build 21243903558

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.03%) to 92.362%

Totals
Change from base Build 21241419170: +0.03%
Covered Lines: 14850
Relevant Lines: 16078


The description claimed to return "a JSON array of tool definitions"
but actually returns a plain text confirmation message with tool names.

Tools discovered via search_tools were added to _discovered_tools
without calling warm_up(), causing tools that require initialization
(connections, model loading) to fail when invoked.

The inherited __getitem__ accessed self.tools which is always empty
in ToolSearchToolset. This caused IndexError for valid indexes even
when tools were available through __iter__.
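The `__getitem__` problem described above can be reproduced with two toy classes. `BaseToolset` and `SearchLikeToolset` are hypothetical simplifications, not the Haystack classes; the sketch only assumes a parent that indexes into `self.tools` and a subclass that keeps its tools elsewhere. Routing `__getitem__` through iteration is one possible fix, not necessarily the one used in this PR.

```python
class BaseToolset:
    """Hypothetical parent: iteration and indexing both read self.tools."""

    def __init__(self, tools=None):
        self.tools = list(tools or [])

    def __iter__(self):
        return iter(self.tools)

    def __len__(self):
        return len(self.tools)

    def __getitem__(self, index):
        return self.tools[index]


class SearchLikeToolset(BaseToolset):
    """Hypothetical subclass: self.tools stays empty, tools live in _available."""

    def __init__(self, catalog):
        super().__init__([])
        self._available = list(catalog)

    def __iter__(self):
        return iter(self._available)

    def __len__(self):
        return len(self._available)

    def __getitem__(self, index):
        # Index into what __iter__ yields, so both views stay consistent.
        return list(self)[index]
```

Without the override, `toolset[0]` would fall through to the parent and raise `IndexError`, even though `list(toolset)` is non-empty.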
@vblagoje removed the ignore-for-release-notes label Jan 22, 2026
@vblagoje
Member Author

@sjrl @julian-risch - give me 1-2 days to test it thoroughly with large MCP toolsets, and if all is good I'll open this PR. This is the general direction @mpangrazzi and I talked about. LMK if you agree.

@vblagoje changed the title from "feat: Add ToolSearchToolset for dynamic tool discovery from large catalogs" to "feat: Add ToolSearchToolset for dynamic tool discovery from large tool catalogs" Jan 22, 2026
Labels

topic:tests type:documentation Improvements on the docs

Development

Successfully merging this pull request may close these issues.

Add Tool Search Tool

3 participants