
feat: add Data Fabric tool support#726

Open
milind-jain-uipath wants to merge 2 commits intomainfrom
feat-df-agent-integrations-v0

Conversation


@milind-jain-uipath milind-jain-uipath commented Mar 25, 2026

Summary

Adds Data Fabric tool support. Enables agents to query Data Fabric entities using SQL by injecting entity schemas into the system prompt at INIT time and providing a generic query_datafabric tool.

What it does: When an agent has Data Fabric entity contexts configured in Studio Web, the INIT node fetches entity metadata (table names, column names, types) from the Data Fabric API at startup. This metadata is formatted as markdown with SQL guidelines and constraints, then appended to the system prompt. A generic query_datafabric tool is registered that accepts raw SQL and dispatches it to the Data Fabric API.
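The schema-to-markdown step can be sketched roughly as follows. `format_schemas_for_context` is the function name used in the PR, but the `EntityField`/`EntitySchema` shapes below are illustrative assumptions (the PR uses its own Pydantic models), so treat this as a sketch rather than the actual implementation:

```python
from dataclasses import dataclass


@dataclass
class EntityField:
    name: str
    sql_type: str


@dataclass
class EntitySchema:
    table: str
    fields: list[EntityField]


def format_schemas_for_context(entities: list[EntitySchema]) -> str:
    """Render entity schemas as a markdown block for system-prompt injection."""
    lines = ["## Data Fabric Entities"]
    for entity in entities:
        lines.append(f"### {entity.table}")
        lines.append("| Column | Type |")
        lines.append("| --- | --- |")
        for field in entity.fields:
            lines.append(f"| {field.name} | {field.sql_type} |")
    return "\n".join(lines)
```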

Changes

  • react/agent.py - Add resources param, pass to create_init_node()
  • react/init_node.py - Fetch DF schemas at INIT, inject into system prompt via additional_context
  • react/types.py - Define AgentResources type alias
  • tools/datafabric_tool/ - New module: schema fetching, formatting, entity detection, query_datafabric tool, SQL guidelines and constraints
  • tools/tool_factory.py - Register DF tool, skip DF contexts in generic resource loop
  • tools/context_tool.py - Guard: raise error if DF context is accidentally routed here

Companion PR

https://github.com/UiPath/uipath-agents-python/pull/383

Testing

  • Tested the integration with a local robot run. Please ignore the error thrown from the tool; that is the folder-level integration we are doing in FQS.
Screen.Recording.2026-03-24.at.4.48.53.PM.mov

@milind-jain-uipath milind-jain-uipath force-pushed the feat-df-agent-integrations-v0 branch from 0ec64ac to 4f94efc Compare March 25, 2026 05:22
@milind-jain-uipath milind-jain-uipath marked this pull request as ready for review March 25, 2026 05:53
@UIPath-Harshit

#721

Please address the comments from the above PR on this one.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4f94efcd44


resolved_messages = list(messages(state))
if schema_context:
    resolved_messages = list(
        messages(state, additional_context=schema_context)
    )


P1: Preserve one-arg message callables when injecting DF context

When Data Fabric schemas are available, INIT invokes the message factory as messages(state, additional_context=...). Existing callable message generators commonly take only one positional parameter (e.g., create_messages(state)), so this raises a TypeError at startup for agents with enabled Data Fabric contexts. This is a runtime-breaking regression for previously valid message callables and should be guarded by signature detection or a fallback call without the extra keyword.

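The fallback the review asks for can be done with signature inspection. A minimal sketch (`resolve_messages` is a hypothetical helper name, not from the PR): pass `additional_context` only when the callable declares that keyword or accepts `**kwargs`, otherwise fall back to the legacy one-argument call.

```python
import inspect


def resolve_messages(messages, state, additional_context=None):
    """Call a message factory, passing additional_context only when supported."""
    params = inspect.signature(messages).parameters
    accepts_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if additional_context and ("additional_context" in params or accepts_var_kw):
        return list(messages(state, additional_context=additional_context))
    # Fallback: legacy one-argument message generators keep working unchanged.
    return list(messages(state))
```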

    else:
        resolved_messages = list(messages(state))
else:
    resolved_messages = list(messages)


P2: Inject schema context for static message definitions

If messages is provided as a static sequence (which create_agent supports), the INIT node ignores schema_context and returns list(messages) unchanged. In that path, Data Fabric schemas are fetched but never reach the system prompt, so query_datafabric runs without table/column guidance and the feature silently underperforms or fails to generate valid SQL. The static-message branch should also append/update a system message with the generated schema context.


@milind-jain-uipath
Author

Addressed the comments @UIPath-Harshit .

Changes addressing review feedback:

  • Init context registry
  • Concurrent schema fetch
  • Remove FK reference
  • Pydantic models for schema
  • Tool initialisation moved out from tool factory & dedup added

Couldn't find refined SQL tool though.

@milind-jain-uipath milind-jain-uipath force-pushed the feat-df-agent-integrations-v0 branch 2 times, most recently from 91edca2 to 9bfc30c Compare March 25, 2026 21:21
from uipath.agent.models.agent import BaseAgentResourceConfig

logger = logging.getLogger(__name__)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant whitespace. Can you run ruff with a whitespace check if there is a rule for that? I don't like these spaces agents generate.
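For reference, ruff does ship pycodestyle whitespace rules, notably W291 (trailing whitespace) and W293 (whitespace on an otherwise blank line). A minimal pyproject.toml fragment to enable them could look like this (depending on the ruff version, some W rules may require preview mode):

```toml
# pyproject.toml - enable pycodestyle whitespace rules in ruff
[tool.ruff.lint]
extend-select = ["W291", "W293"]
```

Both rules are auto-fixable, so `ruff check --fix` would strip the stray whitespace.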


Copilot AI left a comment


Pull request overview

Adds Data Fabric (DF) tool support to the ReAct agent runtime by registering a generic query_datafabric tool and injecting DF entity schema/context into the system prompt at INIT via a pluggable init-context registry.

Changes:

  • Introduces a datafabric_tool module (schema fetching, prompt/context formatting, and query_datafabric tool).
  • Adds init-time context registry + wires INIT node to gather provider context (and plumbs resources into agent creation).
  • Updates context/tool factory behavior and adjusts tests for new required context fields + async init node.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/agent/tools/test_tool_factory.py Updates test context resource config to include context_type.
tests/agent/tools/test_context_tool.py Updates helper to include context_type in context resources.
tests/agent/react/test_init_node.py Updates INIT node tests for async init-node execution and callable signature change.
tests/agent/react/test_create_agent.py Updates expectations for create_init_node(..., resources_for_init=None).
src/uipath_langchain/agent/tools/tool_factory.py Adds tool deduping behavior when building from resources.
src/uipath_langchain/agent/tools/datafabric_tool/system_prompt.txt Adds SQL generation guidelines for DF.
src/uipath_langchain/agent/tools/datafabric_tool/sql_constraints.txt Adds explicit SQL constraints reference text.
src/uipath_langchain/agent/tools/datafabric_tool/schema_context.py Builds/derives DF schema + query pattern context for system prompt injection.
src/uipath_langchain/agent/tools/datafabric_tool/models.py Adds Pydantic models for schema/context + tool input.
src/uipath_langchain/agent/tools/datafabric_tool/datafabric_tool.py Implements DF schema fetch + query_datafabric tool creation.
src/uipath_langchain/agent/tools/datafabric_tool/__init__.py Registers DF init-context provider at import time.
src/uipath_langchain/agent/tools/context_tool.py Routes DF context resources to the query_datafabric tool.
src/uipath_langchain/agent/tools/__init__.py Exposes DF helpers and ensures DF module import (provider registration).
src/uipath_langchain/agent/react/types.py Adds AgentResources type alias.
src/uipath_langchain/agent/react/init_node.py Makes INIT node async and gathers init-context from registered providers.
src/uipath_langchain/agent/react/init_context_registry.py Adds registry + gather_init_context mechanism.
src/uipath_langchain/agent/react/agent.py Plumbs resources into INIT node creation.
src/uipath_langchain/agent/react/__init__.py Exports AgentResources.

Comment on lines +34 to +50
async def _datafabric_init_context_provider(
    resources: Sequence[BaseAgentResourceConfig],
) -> str | None:
    """Fetch and format DataFabric entity schemas for system prompt injection."""
    entity_identifiers = get_datafabric_entity_identifiers_from_resources(resources)
    if not entity_identifiers:
        return None

    _logger.info(
        "Fetching Data Fabric schemas for %d identifier(s)",
        len(entity_identifiers),
    )
    entities = await fetch_entity_schemas(entity_identifiers)
    return format_schemas_for_context(entities)


register_init_context_provider("datafabric", _datafabric_init_context_provider)

Copilot AI Mar 27, 2026


New Data Fabric init-time context registration and schema formatting logic is introduced here, but there are no unit tests covering (a) provider registration + gather_init_context integration, and (b) that fetched entity schemas are formatted and injected as expected. Given the repo’s existing comprehensive tool/init tests, adding focused tests for the provider behavior would help prevent regressions.

Copilot uses AI. Check for mistakes.
    else:
        resolved_messages = list(messages(state))
else:
    resolved_messages = list(messages)

Copilot AI Mar 27, 2026


additional_context is computed from resources_for_init, but it is only passed into a callable messages generator. When messages is a static list/sequence, the additional context is silently ignored, so Data Fabric schemas (and any other provider output) won’t be injected into the system prompt in that common usage. Consider appending additional_context to the SystemMessage content (or inserting a new SystemMessage) when messages is a sequence, so init-time context is applied consistently.

Suggested change:

resolved_messages = list(messages)
# When using a static sequence of messages, inject any init-time context
# into the system prompt so provider output (e.g., Data Fabric schemas)
# is not silently ignored.
if additional_context:
    # Try to append the additional context to the first SystemMessage.
    system_msg_index = next(
        (i for i, m in enumerate(resolved_messages) if isinstance(m, SystemMessage)),
        None,
    )
    if system_msg_index is not None:
        system_msg = resolved_messages[system_msg_index]
        # Safely append to existing content, assuming string content.
        existing_content = str(system_msg.content)
        updated_content = f"{existing_content}\n\n{additional_context}"
        resolved_messages[system_msg_index] = SystemMessage(
            content=updated_content, additional_kwargs=system_msg.additional_kwargs
        )
    else:
        # No SystemMessage present; prepend a new one with the additional context.
        resolved_messages.insert(0, SystemMessage(content=additional_context))

Comment on lines 36 to +42
if callable(messages):
    if additional_context:
        resolved_messages = list(
            messages(state, additional_context=additional_context)
        )
    else:
        resolved_messages = list(messages(state))

Copilot AI Mar 27, 2026


When additional_context is present, the init node calls messages(state, additional_context=...). This will raise TypeError for existing message generator callables that only accept a single positional state argument (no **kwargs), making Data Fabric/resources injection a breaking change for those users. Consider detecting whether the callable accepts additional_context (or **kwargs) and falling back to messages(state) if not.

Comment on lines +82 to +96
group_field = text_field or (field_names[0] if field_names else "Category")
agg_field = numeric_field or (field_names[1] if len(field_names) > 1 else "Amount")
filter_field = text_field or (field_names[0] if field_names else "Name")
fields_sample = ", ".join(field_names[:5]) if field_names else "*"
count_col = field_names[0] if field_names else "id"

query_patterns = [
    QueryPattern(
        intent="Show all",
        sql=f"SELECT {fields_sample} FROM {table} LIMIT 100",
    ),
    QueryPattern(
        intent="Find by X",
        sql=f"SELECT {fields_sample} FROM {table} WHERE {filter_field} = 'value' LIMIT 100",
    ),

Copilot AI Mar 27, 2026


fields_sample falls back to "*" when no non-hidden fields are present. This contradicts the documented constraints (NO SELECT *) and will produce example patterns that are guaranteed to violate the tool’s own SQL rules. Prefer omitting the “Show all / Find by X / Top N” patterns when there are no fields, or choose a safe default like the primary key field when available.
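The safe fallback suggested here could look like this (the helper name and the idea of returning None to mean "omit the example patterns" are hypothetical, not from the PR):

```python
def example_select_columns(field_names, primary_key=None, max_cols=5):
    """Pick columns for example query patterns without ever emitting SELECT *.

    Returns None when no safe column exists, signalling the caller to skip
    the "Show all" / "Find by X" / "Top N" patterns entirely.
    """
    if field_names:
        return ", ".join(field_names[:max_cols])
    if primary_key:
        # Degenerate but valid: sample only the primary key column.
        return primary_key
    return None
```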

Comment on lines +97 to +108
QueryPattern(
intent="Top N by Y",
sql=f"SELECT {fields_sample} FROM {table} ORDER BY {agg_field} DESC LIMIT N",
),
QueryPattern(
intent="Count by X",
sql=f"SELECT {group_field}, COUNT({count_col}) as count FROM {table} GROUP BY {group_field}",
),
QueryPattern(
intent="Top N segments",
sql=f"SELECT {group_field}, COUNT({count_col}) as count FROM {table} GROUP BY {group_field} ORDER BY count DESC LIMIT N",
),

Copilot AI Mar 27, 2026


The query pattern examples use LIMIT N, which is not valid SQLite syntax and conflicts with the prompt rule “Ensure the query is syntactically correct”. Since these examples are intended for LLM copying, they should be executable SQL (e.g., LIMIT 10) or use a clearly marked comment-style placeholder that won’t be emitted verbatim.

Comment on lines +116 to +121
async def _query_datafabric(sql_query: str) -> dict[str, Any]:
    from uipath.platform import UiPath

    logger.debug(f"query_datafabric called with SQL: {sql_query}")

    sdk = UiPath()

Copilot AI Mar 27, 2026


create_datafabric_query_tool() is cached as a singleton, but _query_datafabric creates a new UiPath() SDK instance on every call. This adds avoidable overhead on each SQL query and can impact latency. Consider instantiating the SDK once at tool-creation time (outside the inner coroutine) so calls reuse the same client/config.

Suggested change:

from uipath.platform import UiPath

# Instantiate the SDK once per tool instance to avoid per-call overhead.
sdk = UiPath()


async def _query_datafabric(sql_query: str) -> dict[str, Any]:
    logger.debug(f"query_datafabric called with SQL: {sql_query}")

            )
        except Exception:
            logger.exception("Init context provider '%s' failed; skipping", name)
    return "\n\n".join(parts) if parts else None


Make it structured rather than free-form text. Expose it as a Pydantic model to be consumed by the caller.
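A minimal sketch of what that structured result could look like (`InitContextPart`/`InitContext` are hypothetical names, not from the PR; the caller would flatten to text only at the prompt-injection boundary):

```python
from pydantic import BaseModel, Field


class InitContextPart(BaseModel):
    """Output of one init-context provider (e.g. 'datafabric')."""
    provider: str
    content: str


class InitContext(BaseModel):
    """Structured aggregate a gather step could return instead of a joined string."""
    parts: list[InitContextPart] = Field(default_factory=list)

    def as_prompt_text(self) -> str:
        """Flatten to the string form used for system-prompt injection."""
        return "\n\n".join(part.content for part in self.parts)
```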


@UIPath-Harshit UIPath-Harshit left a comment


Left a few comments for @cristian-groza to chime in on. Others, please address.

# --- Init-time context self-registration ---


async def _datafabric_init_context_provider(


Move this to datafabric_tool rather than exposing it as a module method.


# --- Generic Tool Creation ---

_MAX_RECORDS_IN_RESPONSE = 50


We can remove this check now that queries are free-form and carry a LIMIT clause.

def display_type(self) -> str:
    """Type string with PK/required modifiers for markdown display."""
    modifiers = []
    if self.is_primary_key:


The concept of a PK will go away with the VDOs integration. What you need is is_required and other attributes.

query_patterns: list[QueryPattern]


class SQLContext(BaseModel):


EntitySQLContext

@lru_cache(maxsize=1)
def _load_system_prompt() -> str:
    """Load SQL generation strategy from system_prompt.txt."""
    prompt_path = _PROMPTS_DIR / "system_prompt.txt"


@cristian-groza this needs a discussion.


return create_datafabric_query_tool()

assert resource.settings is not None


Put this behind the isIndexTool flag.

async def create_tools_from_resources(
    agent: LowCodeAgentDefinition, llm: BaseChatModel
) -> list[BaseTool]:
    """Create tools from agent resources including Data Fabric tools.


Let's remove this comment. It doesn't make sense from the overall agent perspective.

messages,
None, # input schema
True, # is_conversational
resources_for_init=None,


Can we add a test for resources_for_init as well? Also, this involves an async network call. @cristian-groza do we see any implication of this when using the "convert to coded agent" flow?

