Bug: Parallel tool calls concatenate arguments into invalid JSON instead of separate function_call objects #4430

@rrbanda

System Info

Environment

  • Llama Stack Version: 0.3.3
  • API Endpoint: /v1/responses
  • MCP Server: Kubernetes MCP Server
  • Model: gemini-llm/models/gemini-2.5-flash

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

Bug Description

When using the OpenAI-compatible Responses API (/v1/responses) with MCP tools that trigger parallel tool calls, Llama Stack concatenates the arguments of multiple tool calls into a single arguments field, producing invalid JSON.

This violates the OpenAI Responses API specification where each parallel tool call should be a separate object in the response output array, each with its own valid JSON arguments and unique call_id.

Steps to Reproduce

  1. Configure Llama Stack with an MCP server (e.g., Kubernetes MCP server)
  2. Send a request to /v1/responses that triggers parallel tool calls
  3. Example prompt: "What resources are there in namespace aap?"
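
A minimal sketch of the request in step 2, for anyone trying to reproduce this. The base URL, MCP server label, and MCP server URL are placeholders rather than values from this report, and the MCP tool entry assumes the OpenAI Responses API tool shape (type, server_label, server_url); the exact payload used when the bug was observed may differ.

```python
import json
import urllib.request


def build_responses_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/responses with one MCP tool."""
    payload = {
        "model": "gemini-llm/models/gemini-2.5-flash",
        "input": prompt,
        "tools": [
            {
                "type": "mcp",
                "server_label": "kubernetes",  # placeholder label
                "server_url": "http://localhost:8080/mcp",  # placeholder URL
            }
        ],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The base URL is an assumption; point it at your own Llama Stack server.
request = build_responses_request(
    "http://localhost:8321",
    "What resources are there in namespace aap?",
)

# To send it against a running server:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["output"])
```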

Impact

  1. Parsing failures: JSON.parse() fails on the malformed arguments
  2. Infinite loops: When used with HITL (Human-in-the-Loop) approval workflows, the malformed arguments cause 500 errors when sent back to Llama Stack, triggering retry loops
  3. Lost tool calls: A regex workaround can extract only the first tool call from the concatenated string; every subsequent parallel call is lost.
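
As a client-side stopgap, the concatenated string can be split without a regex. The sketch below assumes the malformed arguments value is nothing more than valid JSON objects written back to back, as in the example in this report; split_concatenated_json is a hypothetical helper, not part of Llama Stack. It recovers only the argument payloads: the per-call call_id and name are still missing because the server returns a single object.

```python
import json


def split_concatenated_json(raw: str) -> list:
    """Split back-to-back JSON values such as '{"a":1}{"a":2}' into a list."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while pos < len(raw):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(raw) and raw[pos].isspace():
            pos += 1
        if pos >= len(raw):
            break
        # raw_decode parses one value and returns the index where it ended.
        value, pos = decoder.raw_decode(raw, pos)
        values.append(value)
    return values


malformed = '{"namespace":"aap"}{"namespace":"aap"}'
recovered = split_concatenated_json(malformed)
print(recovered)
```

This keeps every parallel call's arguments, but it is a workaround only; the fix belongs on the server side.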

Related

This may be related to how Llama Stack handles the parallel_tool_calls behavior internally when interfacing with MCP servers.

Error logs

Actual Behavior

Llama Stack returns a single tool call object with concatenated JSON in the arguments field:

{
  "call_id": "single_call_id",
  "type": "function_call",
  "name": "get_resources",
  "arguments": "{\"namespace\":\"aap\"}{\"namespace\":\"aap\"}"
}

This is invalid JSON: multiple JSON objects concatenated without an array wrapper or separators.

Expected behavior

Per the OpenAI Function Calling documentation, parallel tool calls should return separate objects:

[
  {
    "id": "fc_12345xyz",
    "call_id": "call_12345xyz",
    "type": "function_call",
    "name": "get_resources",
    "arguments": "{\"namespace\":\"aap\"}"
  },
  {
    "id": "fc_67890abc",
    "call_id": "call_67890abc",
    "type": "function_call",
    "name": "get_resources",
    "arguments": "{\"namespace\":\"aap\"}"
  }
]
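
The expectation above can be turned into a small check over a response's output array: every function_call item must carry arguments that parse as JSON, and no two items may share a call_id. This is a sketch only; check_function_calls is a hypothetical helper, and the sample items are the ones from this report.

```python
import json


def check_function_calls(output: list) -> list:
    """Return a list of problems found in the function_call items of `output`."""
    problems = []
    seen_call_ids = set()
    for index, item in enumerate(output):
        if item.get("type") != "function_call":
            continue
        call_id = item.get("call_id")
        if call_id in seen_call_ids:
            problems.append(f"item {index}: duplicate call_id {call_id!r}")
        seen_call_ids.add(call_id)
        arguments = item.get("arguments")
        if not isinstance(arguments, str):
            problems.append(f"item {index}: arguments is not a string")
            continue
        try:
            json.loads(arguments)
        except json.JSONDecodeError as exc:
            problems.append(f"item {index}: arguments is not valid JSON ({exc.msg})")
    return problems


actual = [
    {
        "call_id": "single_call_id",
        "type": "function_call",
        "name": "get_resources",
        "arguments": '{"namespace":"aap"}{"namespace":"aap"}',
    }
]
expected = [
    {
        "id": "fc_12345xyz",
        "call_id": "call_12345xyz",
        "type": "function_call",
        "name": "get_resources",
        "arguments": '{"namespace":"aap"}',
    },
    {
        "id": "fc_67890abc",
        "call_id": "call_67890abc",
        "type": "function_call",
        "name": "get_resources",
        "arguments": '{"namespace":"aap"}',
    },
]

print(check_function_calls(actual))    # one problem: arguments do not parse
print(check_function_calls(expected))  # no problems
```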

Labels: bug