RFC: Skills Observability via Client Hooks #36

Open

JAORMX wants to merge 1 commit into main from rfc/skills-observability-hooks

Contributor

JAORMX commented Feb 3, 2026

Summary

This RFC proposes adding client hook management to ToolHive CLI, enabling OpenTelemetry-based observability for agent skill execution.

ToolHive will install and manage hooks in supported AI clients that capture skill invocation telemetry and forward it to the existing pkg/telemetry/ infrastructure for OTLP export.

Key Design Decisions

Client shim architecture: Normalizes different hook formats (Claude, Cursor, Windsurf, Cline) to a unified event format
Primary clients: Claude Code and Cursor (stretch: Windsurf, Cline)
Two operating modes:
- Standalone (transitional): CLI directly modifies client config files
- Server-managed (target): Integrates with THV-0034 long-running server for auto-install, drift detection, and enforcement
Enterprise-focused: Configuration via config.yml deployed by IT; developers don't need to run commands
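
A minimal sketch of the shim idea, assuming a hypothetical unified `HookEvent` shape. The Claude Code fields `session_id` and `tool_name` come from the RFC's client comparison table; `hook_event_name` and the Cursor field names (`conversation_id`, `tool_name`) are assumptions that would need checking against each client's hook documentation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class HookEvent:
    """Unified event shape (hypothetical; the RFC does not fix field names)."""
    client: str
    session_id: str
    tool_name: str
    event: str


def from_claude(payload: Dict[str, Any]) -> HookEvent:
    # session_id and tool_name follow the RFC's client comparison table;
    # hook_event_name is an assumption.
    return HookEvent(
        client="claude",
        session_id=payload["session_id"],
        tool_name=payload["tool_name"],
        event=payload.get("hook_event_name", "unknown"),
    )


def from_cursor(payload: Dict[str, Any]) -> HookEvent:
    # All Cursor field names here are assumptions, not confirmed by the RFC.
    return HookEvent(
        client="cursor",
        session_id=payload["conversation_id"],
        tool_name=payload["tool_name"],
        event=payload.get("hook_event_name", "unknown"),
    )


# One shim per client; adding Windsurf or Cline means adding one entry.
SHIMS: Dict[str, Callable[[Dict[str, Any]], HookEvent]] = {
    "claude": from_claude,
    "cursor": from_cursor,
}


def normalize(client: str, payload: Dict[str, Any]) -> HookEvent:
    """Dispatch to the client-specific shim; unknown clients raise KeyError."""
    return SHIMS[client](payload)
```

Keeping the per-client differences inside small shim functions means the telemetry pipeline downstream only ever sees one event type.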

Dependencies

THV-0034: Local Long-Running Server Architecture - Server-managed mode depends on this

Test plan

Review RFC for completeness and clarity
Validate client hook formats against actual client documentation
Confirm API endpoint design aligns with existing pkg/api/ patterns

🤖 Generated with Claude Code


          RFC: Skills Observability via Client Hooks

a5db3e8

This RFC proposes adding client hook management to ToolHive, enabling
OpenTelemetry-based observability for agent skill execution.

Key features:
- Client shim architecture to normalize different hook formats
- Primary support for Claude Code and Cursor
- Standalone mode (transitional) and server-managed mode (target)
- Enterprise deployment via config file + server auto-start
- Integration with THV-0034 long-running server architecture

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

JAORMX force-pushed the rfc/skills-observability-hooks branch from 07214b7 to a5db3e8 on February 3, 2026 13:59

lorr1 reviewed


rfcs/THV-0036-skills-observability-hooks.md


		## Summary

		This RFC proposes adding client hook management to ToolHive CLI, enabling OpenTelemetry-based observability for agent skill execution. ToolHive will install and manage hooks in supported AI clients that capture skill invocation telemetry and forward it to the existing `pkg/telemetry/` infrastructure for OTLP export. A client-specific shim architecture normalizes the different hook formats into a unified telemetry pipeline. Primary support targets Claude Code and Cursor, with Windsurf and Cline as stretch goals.

lorr1 Feb 3, 2026

The way some plugins are written means we can capture more than just skill execution: MCP server use, tool use, which bash commands were run, etc. My initial idea was to capture all of it and let the dashboards / UI filter down to skill usage.

Is there any reason to be more restrictive and install a hook only for skills? Or should we let the user configure it?

Contributor Author

JAORMX Feb 4, 2026

It was just to scope down the work. I agree this can expand to more things

lorr1 reviewed


rfcs/THV-0036-skills-observability-hooks.md

+              | Client | Priority | Hooks API | Session ID | Tool Name Location | Response Format | Platform |
+              |--------|----------|-----------|------------|-------------------|-----------------|----------|
+              | **Claude Code** | Primary | Full (12 events) | `session_id` | `tool_name` | Exit code only | All |

lorr1 Feb 3, 2026

Claude Code can do exit codes, but it also supports JSON decision control, like Cursor:

https://code.claude.com/docs/en/hooks#pretooluse-decision-control
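
For illustration, a hedged sketch of what a JSON decision response from a `PreToolUse` hook could look like. The `hookSpecificOutput`, `permissionDecision`, and `permissionDecisionReason` field names reflect one reading of the linked documentation and should be verified there; `render_decision` is a hypothetical helper, not part of ToolHive.

```python
import json
from typing import Any, Dict

# Decision values as understood from the linked docs (an assumption to verify).
ALLOWED_DECISIONS = {"allow", "deny", "ask"}


def render_decision(decision: str, reason: str) -> str:
    """Build the JSON a PreToolUse hook would print to stdout."""
    if decision not in ALLOWED_DECISIONS:
        raise ValueError(f"unsupported decision: {decision!r}")
    body: Dict[str, Any] = {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": decision,
            "permissionDecisionReason": reason,
        }
    }
    return json.dumps(body)
```

A purely observational hook would likely emit nothing and exit 0; the JSON path matters if the same hooks are later used for access control.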

rfcs/THV-0036-skills-observability-hooks.md

+              ```
+              **Capabilities:**
+              - **Centralized configuration**: IT deploys `config.yml` via MDM; server handles the rest

lorr1 Feb 3, 2026

To make sure I understand, all admins have to do is get toolhive installed via MDM and push some config file with the hook info via MDM. Then, toolhive acts as the process that installs it and has some enforcement policy in case the user removes it.

This is far simpler than a more naive MDM solution where the admins have to build the hooks config, the install script, and the verification process to enforce it is installed.

Is that a correct understanding?

rfcs/THV-0036-skills-observability-hooks.md

+              - **Centralized configuration**: IT deploys `config.yml` via MDM; server handles the rest
+              - **Auto-installation**: Server installs hooks on startup based on configuration
+              - **Drift detection**: Server watches client config files for unauthorized changes
+              - **Auto-remediation**: If `enforce: true`, server re-installs hooks when users remove them

lorr1 Feb 3, 2026

Will we need any alerting here in case admins want to know about violations? I assume this would be secondary to the main thrust.

rfcs/THV-0036-skills-observability-hooks.md

+                # Install hooks automatically when server starts
+                auto_install: true
+                # Re-install hooks if user removes them (requires file watcher)

lorr1 Feb 3, 2026

This may be a naive question, but can the file watcher handle granular changes? ~/.claude/settings.json contains both the ToolHive hooks that we install and any hooks the user wants to add. For each hook event, the user can have a list of hooks.

So, if the user edits this file and removes one of their personal hooks, that's fine. We do nothing. If they remove our hook, we get grumpy.
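
One way to get that granularity is to have the watcher re-parse the file on every change and check only for the managed entries, ignoring everything else. A minimal sketch, assuming the settings file nests hooks as event -> list of matcher groups -> list of `{"type": "command", "command": ...}` entries, and that ToolHive's hooks can be recognized by a fixed command string (`thv hooks emit` is a hypothetical placeholder):

```python
from typing import Any, Dict, List

# Hypothetical marker: the command string ToolHive would install.
MANAGED_COMMAND = "thv hooks emit"


def missing_managed_hooks(
    settings: Dict[str, Any],
    required_events: List[str],
    command: str = MANAGED_COMMAND,
) -> List[str]:
    """Return the hook events where the managed command is no longer present.

    User-owned hooks are never inspected beyond comparing their command
    string, so adding or removing them does not count as drift.
    """
    hooks = settings.get("hooks", {})
    missing = []
    for event in required_events:
        groups = hooks.get(event, [])
        present = any(
            entry.get("command") == command
            for group in groups
            for entry in group.get("hooks", [])
        )
        if not present:
            missing.append(event)
    return missing
```

With this shape, removing a personal hook yields an empty result (no action), while removing the managed entry returns the affected event names, which the server could then re-install when `enforce: true` is set.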

rfcs/THV-0036-skills-observability-hooks.md

+              | Method | Endpoint | Description |
+              |--------|----------|-------------|
+              | `POST` | `/api/v1beta/hooks/install` | Install hooks for specified client(s) |

lorr1 Feb 3, 2026

Is this an all-or-nothing install? If we have both observability hooks for skills and, for example, access control hooks that prevent non-ToolHive skills from being used, will these endpoints support installing only one? Or is it "install all Stacklok hooks"?

rfcs/THV-0036-skills-observability-hooks.md

+              skill.name: commit-message
+              skill.version: v1.2.0
+              skill.client: claude|cursor|windsurf|cline
+              skill.status: success|failure|denied

lorr1 Feb 4, 2026

Denials are a bit of a different animal in Claude Code...I think. We can loosely capture that based on if there's a PreToolUse that does not have a matching PostToolUse.

I'll post a follow on message with more details after I investigate more.

lorr1 Feb 4, 2026

[Screenshot, 2026-02-03 6:39:31 PM: hook event log showing two permission flows]

Okay so the above shows you two flows (look at the session ID for which flow) where the model asks for permission to use a Skill. The first one is PreToolUse -> PermissionRequest -> PostToolUse with the same ID within the same time. This means I approved the request.

The second is PreToolUse -> PermissionRequest because I didn't approve. I don't have PostToolUse.

So in Claude, you don't get good denial information.

Via Claude's otel logging you can do a bit better by seeing "Hey this Skill was rejected by the user at this time and session" (see below). Doesn't really help us here as we're doing hooks but for what it's worth.

As an aside, I think Anthropic changed their policies around skills, and that they are basically in an auto-approve mode to read a skill unless you have explicit permissions for skills to be in ask or deny mode. I never get prompted for skill use anymore in the last week or so.
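
The pairing heuristic from this thread (a `PreToolUse` with a `PermissionRequest` but no matching `PostToolUse`) can be sketched as follows. The correlation ID is a hypothetical stand-in for whatever identifier ties the three events together, and the status labels are illustrative rather than the RFC's `skill.status` values.

```python
from typing import Dict, Iterable, Set, Tuple

# (hook_event_name, correlation_id); the correlation ID is hypothetical.
Event = Tuple[str, str]


def infer_statuses(events: Iterable[Event]) -> Dict[str, str]:
    """Classify each invocation by which hook events were observed for it."""
    seen: Dict[str, Set[str]] = {}
    for name, correlation_id in events:
        seen.setdefault(correlation_id, set()).add(name)

    statuses: Dict[str, str] = {}
    for correlation_id, names in seen.items():
        if "PreToolUse" not in names:
            continue  # nothing to pair against
        if "PostToolUse" in names:
            statuses[correlation_id] = "completed"
        elif "PermissionRequest" in names:
            # Asked for permission and never ran: probably a denial.
            statuses[correlation_id] = "likely_denied"
        else:
            statuses[correlation_id] = "unknown"
    return statuses
```

The result is only an inference: an invocation that is still waiting on the user, or a session that ended abruptly, looks the same as a denial, so a real implementation would probably need a timeout before labeling anything.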

rfcs/THV-0036-skills-observability-hooks.md

+              ### Data Security
+              - **Skill inputs not logged**: Only skill name, version, status, and timing are captured
+              - **No sensitive data in metrics**: Metrics contain only skill metadata, not content

lorr1 Feb 4, 2026

Some companies consider the input prompt that comes with some skills to be sensitive. It'd be in the args. But my overall opinion is to be very clear about what we are logging and not change anything yet.

lorr1 Feb 4, 2026

Oh lol you kind of address this with the next bullet.

rfcs/THV-0036-skills-observability-hooks.md

+              {
+                "version": 1,
+                "hooks": {
+                  "beforeMCPExecution": [

lorr1 Feb 4, 2026

nit: most of the configurations after claude code pertain to MCP servers, not skills.

rfcs/THV-0036-skills-observability-hooks.md


		## Open Questions

		1. Skill name extraction: How do we reliably identify skill invocations vs regular MCP tool calls across different clients? The tool name is typically "Skill" but we need to extract the actual skill name from `tool_input`. Need to validate extraction logic with real skill invocations across all clients.

lorr1 Feb 4, 2026

We also need to verify we can get skill usage from other clients. I bet you can for Cursor. We would need to investigate the others.
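
A hedged sketch of the extraction step for the Claude Code case, to make the open question concrete. The RFC states the tool name is typically "Skill" and the real name lives in `tool_input`; the candidate keys inside `tool_input` (`skill`, `name`) are guesses that would need validating against real invocations on each client.

```python
from typing import Any, Dict, Optional, Sequence


def extract_skill_name(
    payload: Dict[str, Any],
    name_keys: Sequence[str] = ("skill", "name"),  # assumed keys, unverified
) -> Optional[str]:
    """Return the skill name for a skill invocation, or None otherwise."""
    if payload.get("tool_name") != "Skill":
        return None  # a regular tool or MCP call, not a skill
    tool_input = payload.get("tool_input")
    if not isinstance(tool_input, dict):
        return None
    for key in name_keys:
        value = tool_input.get(key)
        if isinstance(value, str) and value:
            return value
    return None
```

Returning `None` rather than raising keeps the hook from failing on payload shapes it does not recognize, which matters while the per-client formats are still being confirmed.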

rfcs/THV-0036-skills-observability-hooks.md


		1. Skill name extraction: How do we reliably identify skill invocations vs regular MCP tool calls across different clients? The tool name is typically "Skill" but we need to extract the actual skill name from `tool_input`. Need to validate extraction logic with real skill invocations across all clients.

		2. THV-0034 dependency: Should this RFC block on THV-0034 (long-running server), or should we implement standalone mode first and migrate later? Standalone mode has significant limitations for enterprise use.

lorr1 Feb 4, 2026

My read is wiggle now. The standalone server may not be until March. We could start to use this internally in standalone mode before then.
