22 changes: 20 additions & 2 deletions docs/tools.md
@@ -20,14 +20,32 @@ Two static tools are built into the server:

`execute_query` accepts:

- `query` (string, required) — the SQL to run.
- `limit` (integer, optional) — caps returned rows.
- `query` (string, required) — the SQL to run. Include `LIMIT N` in the SQL itself if you want a specific row cap; the server does not rewrite your query.
- `settings` (object, optional) — ClickHouse query settings forwarded with the request.

`write_query` accepts the same `query` and `settings` parameters and executes the statement as-is.

**Handler mapping:** `name: execute_query` registers the `HandleReadOnlyQuery` function in `pkg/server/server.go`, which enforces the SELECT-only guard and then delegates to `HandleExecuteQuery`. `name: write_query` registers `HandleExecuteQuery` directly. These two names are the only valid values for static tool entries.

### Server-enforced result caps

Operators configure two DoS / context-window guardrails on `execute_query` and on read-mode dynamic tools (SELECT-like queries only — `write_query` and write-mode dynamic tools are unaffected):

| Config key | Default | `0` means | Negative means |
|------------|---------|-----------|----------------|
| `clickhouse.max_result_rows` | 500 | use default | disable (defer to ClickHouse user profile) |
| `clickhouse.max_result_bytes` | 50000 | use default | disable |

The deprecated `clickhouse.limit` is kept as an alias for `clickhouse.max_result_rows`: it applies only when `max_result_rows` is unset (`0`), so when both are set, `max_result_rows` wins. The legacy key triggers a one-time deprecation warning at startup.
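
The resolution rules in the table and the alias paragraph can be restated as a small standalone sketch. The names `resolveCap` and `resolveRowCap` are hypothetical and are not the server's API; the sketch also assumes that a return value of `0` stands for "cap disabled".

```go
package main

import "fmt"

// Defaults taken from the table above.
const (
	defaultRows  = 500
	defaultBytes = 50000
)

// resolveCap applies the shared rule: negative disables the cap (returned
// here as 0), zero selects the default, a positive value is used as-is.
func resolveCap(configured, def int) int {
	switch {
	case configured < 0:
		return 0
	case configured > 0:
		return configured
	default:
		return def
	}
}

// resolveRowCap additionally consults the deprecated limit key, but only
// when max_result_rows is unset (0).
func resolveRowCap(maxResultRows, legacyLimit int) int {
	if maxResultRows == 0 && legacyLimit > 0 {
		return legacyLimit
	}
	return resolveCap(maxResultRows, defaultRows)
}

func main() {
	fmt.Println(resolveRowCap(0, 0))         // 500: nothing set, default applies
	fmt.Println(resolveRowCap(0, 1000))      // 1000: legacy alias fills in
	fmt.Println(resolveRowCap(250, 1000))    // 250: max_result_rows wins
	fmt.Println(resolveRowCap(-1, 1000))     // 0: explicit disable beats the alias
	fmt.Println(resolveCap(0, defaultBytes)) // 50000: byte cap default
}
```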

Caps are enforced in two layers: ClickHouse session settings (`max_result_rows`, `max_result_bytes`, `result_overflow_mode='break'`) are pushed per-query so the engine stops early, and the MCP server itself stops appending rows once the configured cap is hit. The row cap is exact; the byte cap is approximate (cheap per-row sizing, not exact JSON byte counts).
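
The server-side layer can be pictured with a simplified, self-contained sketch that iterates over an in-memory slice instead of a ClickHouse cursor. `capRows`, `approxSize`, and `truncation` are hypothetical names, and the sizing function is deliberately cruder than whatever the server uses; the point is only that the row cap is checked before appending (exact) while the byte cap is checked after appending (may overshoot by one row).

```go
package main

import "fmt"

// truncation is a hypothetical stand-in for the `truncated` object.
type truncation struct {
	Reason       string
	Limit        int
	ReturnedRows int
	BytesApprox  int
}

// approxSize is a cheap per-row estimate: two bytes for the row brackets,
// plus the printed length of each value and one separator byte per field.
func approxSize(row []interface{}) int {
	n := 2
	for _, v := range row {
		n += len(fmt.Sprint(v)) + 1
	}
	return n
}

// capRows copies rows from src until a cap fires. maxRows<=0 or maxBytes<=0
// disables the corresponding cap.
func capRows(src [][]interface{}, maxRows, maxBytes int) ([][]interface{}, *truncation) {
	var out [][]interface{}
	bytesApprox := 0
	for _, row := range src {
		// Row cap: seeing one more row after maxRows were kept means the
		// underlying result was larger, so flag and stop without appending.
		if maxRows > 0 && len(out) >= maxRows {
			return out, &truncation{Reason: "max_result_rows", Limit: maxRows, ReturnedRows: len(out), BytesApprox: bytesApprox}
		}
		out = append(out, row)
		bytesApprox += approxSize(row)
		// Byte cap: the row that crosses the budget is kept, then we stop.
		if maxBytes > 0 && bytesApprox > maxBytes {
			return out, &truncation{Reason: "max_result_bytes", Limit: maxBytes, ReturnedRows: len(out), BytesApprox: bytesApprox}
		}
	}
	return out, nil
}

func main() {
	src := [][]interface{}{{1, "a"}, {2, "b"}, {3, "c"}}

	rows, tr := capRows(src, 2, 0)
	fmt.Println(len(rows), tr != nil) // 2 true

	rows, tr = capRows(src, 3, 0)
	fmt.Println(len(rows), tr != nil) // 3 false: exactly at the cap is not truncation

	rows, tr = capRows(src, 0, 10)
	fmt.Println(len(rows), tr.Reason) // 2 max_result_bytes
}
```

Note that a result with exactly `maxRows` rows is returned without a truncation flag, which is why the engine-side setting has to allow at least one row beyond the cap for the flag to be detectable.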

When a cap fires, the response carries:

- A `truncated` object inside the JSON `QueryResult` with `reason` (`max_result_rows` or `max_result_bytes`), `limit`, `returned_rows`, and `returned_bytes_approx`.
- For MCP tool responses: a second `text` content block explaining the truncation and recommending narrowing the query (tighter `WHERE`, server-side aggregation, key-range pagination, narrower `SELECT` list). The model should treat the data block as partial until the underlying query is narrowed.
- For OpenAPI REST responses: an `X-MCP-Truncated: max_result_rows` (or `max_result_bytes`) HTTP header alongside the same body field.

---

## Read dynamic tools (Views)
127 changes: 118 additions & 9 deletions pkg/clickhouse/client.go
@@ -18,11 +18,29 @@ import (

// QueryResult represents the result of a query execution
type QueryResult struct {
Columns []string `json:"columns"`
Types []string `json:"types"`
Rows [][]interface{} `json:"rows"`
Count int `json:"count"`
Error string `json:"error,omitempty"`
Columns []string `json:"columns"`
Types []string `json:"types"`
Rows [][]interface{} `json:"rows"`
Count int `json:"count"`
Error string `json:"error,omitempty"`
Truncated *TruncationInfo `json:"truncated,omitempty"`
}

// TruncationReason identifies which server-side cap fired.
const (
TruncationReasonMaxResultRows = "max_result_rows"
TruncationReasonMaxResultBytes = "max_result_bytes"
)

// TruncationInfo describes a result that was capped by MCP before being
// returned to the caller. Surfaces in the JSON payload as `truncated` and is
// also rendered as an extra MCP text-content block / X-MCP-Truncated header
// at the handler layer so the model is told to narrow its query.
type TruncationInfo struct {
Reason string `json:"reason"`
Limit int `json:"limit"`
ReturnedRows int `json:"returned_rows"`
ReturnedBytesApprox int `json:"returned_bytes_approx"`
}

// TableInfo represents information about a table
@@ -377,17 +395,58 @@ func scanRow(rows driver.Rows) ([]interface{}, error) {
// ExecuteQuery executes a SQL query and returns the results.
// For non-SELECT queries (DDL, DML) it returns a single row with `OK`.
func (c *Client) ExecuteQuery(ctx context.Context, query string, args ...interface{}) (*QueryResult, error) {
return c.executeWithCaps(ctx, query, 0, 0, args...)
}

// ExecuteCappedQuery executes a SELECT-like query subject to a server-controlled
// row and/or byte cap. The caps are enforced in two layers:
// 1. ClickHouse session settings (max_result_rows, max_result_bytes,
// result_overflow_mode='break') pushed via the per-query context — saves
// ClickHouse from computing rows that will be discarded; safe to no-op when
// the CH user profile forbids settings changes.
// 2. A hard cap inside executeSelect's row-iteration loop — guarantees exact
// row counts (no block-granularity overshoot), bounds MCP-side memory, and
// works regardless of CH-side cooperation.
//
// On non-SELECT queries the caps are ignored and the call falls through to the
// uncapped ExecuteQuery path. maxRows<=0 disables the row cap; maxBytes<=0
// disables the byte cap.
func (c *Client) ExecuteCappedQuery(ctx context.Context, query string, maxRows, maxBytes int, args ...interface{}) (*QueryResult, error) {
if maxRows <= 0 && maxBytes <= 0 {
return c.ExecuteQuery(ctx, query, args...)
}
if !IsSelectQuery(query) {
return c.ExecuteQuery(ctx, query, args...)
}
settings := clickhouse.Settings{
"result_overflow_mode": "break",
}
if maxRows > 0 {
// +1 so the engine returns at least one row past the cap when the
// underlying result is larger — Layer 2 uses that overshoot as the
// truncation signal.
settings["max_result_rows"] = uint64(maxRows + 1)
}
if maxBytes > 0 {
settings["max_result_bytes"] = uint64(maxBytes)
}
ctx = clickhouse.Context(ctx, clickhouse.WithSettings(settings))
return c.executeWithCaps(ctx, query, maxRows, maxBytes, args...)
}

func (c *Client) executeWithCaps(ctx context.Context, query string, maxRows, maxBytes int, args ...interface{}) (*QueryResult, error) {
if c.config.ReadOnly && !IsSelectQuery(query) {
return nil, fmt.Errorf("query rejected: read-only mode allows only SELECT/WITH/SHOW/DESC/EXISTS/EXPLAIN statements")
}
if IsSelectQuery(query) {
return c.executeSelect(ctx, query, args...)
return c.executeSelect(ctx, query, maxRows, maxBytes, args...)
}
return c.executeNonSelect(ctx, query, args...)
}

// executeSelect executes a SELECT query
func (c *Client) executeSelect(ctx context.Context, query string, args ...interface{}) (*QueryResult, error) {
// executeSelect executes a SELECT query with optional row/byte caps.
// maxRows<=0 disables the row cap; maxBytes<=0 disables the byte cap.
func (c *Client) executeSelect(ctx context.Context, query string, maxRows, maxBytes int, args ...interface{}) (*QueryResult, error) {
result := &QueryResult{}

rows, err := c.conn.Query(ctx, query, args...)
@@ -415,13 +474,39 @@ func (c *Client) executeSelect(ctx context.Context, query string, args ...interf
result.Types[i] = ct.DatabaseTypeName()
}

// Fetch rows
// Fetch rows with optional caps.
bytesApprox := 0
for rows.Next() {
rowValues, err := scanRow(rows)
if err != nil {
return nil, err
}
// Row cap (Layer 2). The session-settings push asked CH for maxRows+1,
// so seeing a (maxRows+1)th row here means the underlying result was
// larger and we should truncate + flag.
if maxRows > 0 && len(result.Rows) >= maxRows {
result.Truncated = &TruncationInfo{
Reason: TruncationReasonMaxResultRows,
Limit: maxRows,
ReturnedRows: len(result.Rows),
ReturnedBytesApprox: bytesApprox,
}
break
}
result.Rows = append(result.Rows, rowValues)
bytesApprox += approxRowBytes(rowValues)
// Byte cap (Layer 2). Stop after appending the row that first crosses
// the budget: one row of overshoot is the natural truncation signal and
// avoids a look-ahead size check before each append.
if maxBytes > 0 && bytesApprox > maxBytes {
result.Truncated = &TruncationInfo{
Reason: TruncationReasonMaxResultBytes,
Limit: maxBytes,
ReturnedRows: len(result.Rows),
ReturnedBytesApprox: bytesApprox,
}
break
}
}

if err := rows.Err(); err != nil {
@@ -531,6 +616,30 @@ func truncateString(s string, maxLen int) string {
return s[:maxLen] + "..."
}

// approxRowBytes returns a cheap estimate of the JSON-encoded size of one
// result row. The exact serialized size doesn't matter for a DoS guardrail,
// only the order of magnitude does, so strings and byte slices are measured
// with len, other values with len(fmt.Sprint(v)), plus two bytes per row for
// the brackets and one byte per field for the separator.
func approxRowBytes(row []interface{}) int {
total := 2 // outer brackets per row in JSON
for _, v := range row {
if v == nil {
total += 4 // "null"
continue
}
switch s := v.(type) {
case string:
total += len(s) + 2 // quotes
case []byte:
total += len(s) + 2
default:
total += len(fmt.Sprint(v))
}
total++ // comma
}
return total
}

// convertToSerializable converts ClickHouse-specific types to JSON-serializable types
func convertToSerializable(v interface{}) interface{} {
switch val := v.(type) {
44 changes: 41 additions & 3 deletions pkg/config/config.go
@@ -40,7 +40,11 @@ type ClickHouseConfig struct {
TLS TLSConfig `json:"tls" yaml:"tls"`
ReadOnly bool `json:"read_only" yaml:"read_only" flag:"read-only" env:"CLICKHOUSE_READ_ONLY" desc:"Connect to ClickHouse in read-only mode"`
MaxExecutionTime int `json:"max_execution_time" yaml:"max_execution_time" flag:"clickhouse-max-execution-time" env:"CLICKHOUSE_MAX_EXECUTION_TIME" default:"600" desc:"ClickHouse max execution time in seconds"`
Limit int `json:"limit" yaml:"limit" flag:"clickhouse-limit" env:"CLICKHOUSE_LIMIT" desc:"Maximum limit for query results (0 means no limit)"`
// Limit is DEPRECATED; use MaxResultRows. Retained as a silent alias: when
// MaxResultRows is unset (0) and Limit > 0, EffectiveMaxResultRows() returns Limit.
Limit int `json:"limit,omitempty" yaml:"limit,omitempty" flag:"clickhouse-limit" env:"CLICKHOUSE_LIMIT" desc:"DEPRECATED: alias for max_result_rows"`
MaxResultRows int `json:"max_result_rows,omitempty" yaml:"max_result_rows,omitempty" flag:"clickhouse-max-result-rows" env:"CLICKHOUSE_MAX_RESULT_ROWS" desc:"Per-request row cap on SELECT-like queries (0=default 500, <0=disable and defer to ClickHouse user profile)"`
MaxResultBytes int `json:"max_result_bytes,omitempty" yaml:"max_result_bytes,omitempty" flag:"clickhouse-max-result-bytes" env:"CLICKHOUSE_MAX_RESULT_BYTES" desc:"Per-request approximate byte cap on result body (0=default 50000, <0=disable)"`
HttpHeaders map[string]string `json:"http_headers" yaml:"http_headers" flag:"clickhouse-http-headers" env:"CLICKHOUSE_HTTP_HEADERS" desc:"HTTP Headers for ClickHouse"`
ExtraSettings map[string]string `json:"extra_settings,omitempty" yaml:"extra_settings,omitempty" desc:"Per-request ClickHouse settings injected by tool_input_settings"`
// ClusterName + ClusterSecret enable interserver-secret authentication.
@@ -56,8 +60,42 @@ type ClickHouseConfig struct {
MaxQueryLength int `json:"max_query_length,omitempty" yaml:"max_query_length,omitempty" flag:"clickhouse-max-query-length" env:"CLICKHOUSE_MAX_QUERY_LENGTH" desc:"Max bytes of SQL query string accepted from clients (0=default 10MB, <0=disabled)"`
}

// defaultMaxQueryLength is the default cap applied when MaxQueryLength is 0.
const defaultMaxQueryLength = 10 * 1024 * 1024 // 10 MiB
// Defaults applied by the Effective* getters when the corresponding field is 0.
// A negative value disables the cap entirely.
const (
defaultMaxQueryLength = 10 * 1024 * 1024 // 10 MiB
defaultMaxResultRows = 500
defaultMaxResultBytes = 50000
)

// EffectiveMaxResultRows returns the per-request row cap for SELECT-like queries.
// Negative => disabled (returns 0; defer to ClickHouse user profile); 0 => default 500;
// >0 => exact cap. The deprecated Limit field is consulted as a silent alias
// only when MaxResultRows is 0.
func (c ClickHouseConfig) EffectiveMaxResultRows() int {
if c.MaxResultRows < 0 {
return 0
}
if c.MaxResultRows > 0 {
return c.MaxResultRows
}
if c.Limit > 0 {
return c.Limit
}
return defaultMaxResultRows
}

// EffectiveMaxResultBytes returns the approximate per-request response-body cap.
// Negative => disabled (returns 0); 0 => default 50000; >0 => the configured
// cap, enforced approximately.
func (c ClickHouseConfig) EffectiveMaxResultBytes() int {
if c.MaxResultBytes < 0 {
return 0
}
if c.MaxResultBytes > 0 {
return c.MaxResultBytes
}
return defaultMaxResultBytes
}

// EffectiveMaxQueryLength returns the effective cap after applying defaults/disable semantics.
// Returns 0 if the check is disabled.
23 changes: 23 additions & 0 deletions pkg/config/config_test.go
@@ -477,6 +477,29 @@ func TestConfigStructs(t *testing.T) {
require.Equal(t, "custom-value", cfg.HttpHeaders["X-Custom-Header"])
})

t.Run("effective_max_result_rows", func(t *testing.T) {
t.Parallel()
// Unset => default 500
require.Equal(t, 500, ClickHouseConfig{}.EffectiveMaxResultRows())
// Explicit positive value wins
require.Equal(t, 250, ClickHouseConfig{MaxResultRows: 250}.EffectiveMaxResultRows())
// Negative => disabled
require.Equal(t, 0, ClickHouseConfig{MaxResultRows: -1}.EffectiveMaxResultRows())
// Deprecated Limit is consulted only when MaxResultRows is 0
require.Equal(t, 1000, ClickHouseConfig{Limit: 1000}.EffectiveMaxResultRows())
// MaxResultRows beats Limit when both are set
require.Equal(t, 250, ClickHouseConfig{MaxResultRows: 250, Limit: 1000}.EffectiveMaxResultRows())
// Explicit-disable (negative) beats Limit too
require.Equal(t, 0, ClickHouseConfig{MaxResultRows: -1, Limit: 1000}.EffectiveMaxResultRows())
})

t.Run("effective_max_result_bytes", func(t *testing.T) {
t.Parallel()
require.Equal(t, 50000, ClickHouseConfig{}.EffectiveMaxResultBytes())
require.Equal(t, 1000, ClickHouseConfig{MaxResultBytes: 1000}.EffectiveMaxResultBytes())
require.Equal(t, 0, ClickHouseConfig{MaxResultBytes: -1}.EffectiveMaxResultBytes())
})

t.Run("tls_config", func(t *testing.T) {
t.Parallel()
cfg := TLSConfig{