Releases: demml/scouter
v0.25.0
v0.25.0 Release Summary
What changed
This release fixes a race condition in the trace storage layer. The old deregister_table + register_table two-step left a window where concurrent queries would get "table not found." All TableProvider updates are now atomic swaps through a DashMap-backed custom catalog (TraceCatalogProvider). The summary engine also gets a refresh ticker so read pods pick up commits from the write pod without a restart.
Breaking changes
None for typical deployments: no schema changes, no migration required. One narrow API change: the ctx field on TraceSpanDBEngine is now private; use the new ctx() method instead. This only matters if you construct or test the engine directly outside the service layer.
Changes
Trace storage: atomic TableProvider swaps
The built-in DataFusion catalog (datafusion.public) was replaced with TraceCatalogProvider, backed by a DashMap. All engines (TraceSpanDBEngine, TraceSummaryDBEngine, bifrost) now call catalog.swap(table_name, provider) instead of the deregister/register two-step.
Before:

```rust
self.ctx.deregister_table(TRACE_SPAN_TABLE_NAME)?;
self.ctx.register_table(TRACE_SPAN_TABLE_NAME, updated_table.table_provider().await?)?;
```

After:

```rust
let new_provider = updated_table.table_provider().await?;
self.catalog.swap(TRACE_SPAN_TABLE_NAME, new_provider);
```

DashMap::insert() is atomic. Concurrent readers see either the old provider or the new one, never a gap between them.
TraceSpanDBEngine and TraceSummaryDBEngine share the same TraceCatalogProvider via Arc. The span engine creates it; the summary service gets it through TraceSpanService::catalog. JOIN queries between trace_spans and trace_summaries work because both tables are in the same catalog.
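To make the race concrete, here is a minimal Python sketch using a plain dict as a stand-in for the catalog (names and values are illustrative, not Scouter's API; the real code is Rust over a DashMap):

```python
# Stand-in catalog: maps table name -> provider. A plain dict is enough to
# illustrate the difference between the two update patterns.
catalog = {"trace_spans": "provider_v1"}

observed_missing = []

def concurrent_query():
    # Simulated reader running in the middle of a swap.
    observed_missing.append("trace_spans" not in catalog)

def swap_two_step(name, new_provider):
    # Old pattern: deregister, then register. A reader between the two
    # steps sees "table not found".
    del catalog[name]
    concurrent_query()
    catalog[name] = new_provider

def swap_atomic(name, new_provider):
    # New pattern: a single replacement. A reader sees either the old
    # provider or the new one, never a gap.
    catalog[name] = new_provider
    concurrent_query()

swap_two_step("trace_spans", "provider_v2")
swap_atomic("trace_spans", "provider_v3")
print(observed_missing)  # [True, False]: only the two-step pattern exposed a gap
```

The second pattern is what the DashMap-backed catalog provides: replacement is a single operation, so there is no window in which the table name is unregistered.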
Trace storage: summary engine refresh ticker
TraceSummaryDBEngine now has a background refresh loop (same as the span engine) that calls update_incremental() on the Delta table and swaps the TableProvider when a new version is found.
Refresh interval is SCOUTER_TRACE_REFRESH_INTERVAL_SECS, already in the server config. Values below 1 second are clamped up; tokio::time::interval panics on Duration::ZERO.
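The clamping amounts to a simple floor; a sketch, assuming a max-based helper (the function name is hypothetical):

```python
def effective_refresh_interval(raw_secs: int) -> int:
    # Values below 1 second are clamped up to 1, since building a ticker
    # from a zero duration is invalid (tokio's interval panics on
    # Duration::ZERO).
    return max(1, raw_secs)

print(effective_refresh_interval(0))   # 1
print(effective_refresh_interval(10))  # 10
```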
test_distributed_refresh covers the two-pod case: writer commits summaries, reader with a 1s ticker picks them up in the next query.
DataFusion session construction: get_session_with_catalog
ObjectStore has a new get_session_with_catalog(catalog_name, schema_name) method. It sets the named catalog as the SQL default, so unqualified table names and ctx.table(name) calls resolve through it instead of datafusion.public.
get_session() is unchanged. build_session_config() is now a private helper shared by both paths.
Bifrost engine: torn-write fix on refresh
The bifrost refresh path was calling table_provider() twice: once to update the write context, once for the catalog swap. If the second call failed, the write context would be updated but the catalog would not. Now there's one call, the result is shared, and the write context is only deregistered if the fetch succeeds.
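The fixed ordering can be sketched as "fetch once, then mutate" (a Python sketch with illustrative names; the real code is Rust):

```python
def refresh(fetch_provider, write_ctx: dict, catalog: dict) -> None:
    # Fetch once, share the result, and only mutate state after the
    # fetch has succeeded.
    provider = fetch_provider()      # may raise; nothing has been touched yet
    write_ctx["trace_spans"] = provider
    catalog["trace_spans"] = provider

write_ctx = {"trace_spans": "v1"}
catalog = {"trace_spans": "v1"}

def failing_fetch():
    raise RuntimeError("object store unavailable")

try:
    refresh(failing_fetch, write_ctx, catalog)
except RuntimeError:
    pass
# On failure, neither side was mutated: no torn state.
```

Because the fetch happens before any mutation, a failure leaves both the write context and the catalog exactly as they were.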
Upgrading from v0.24.0
Nothing to do. The catalog wires up at server startup. SCOUTER_TRACE_REFRESH_INTERVAL_SECS already controls both the span and summary refresh rates.
v0.24.0
v0.24.0 Release Summary
What Changed
This release introduces Bifrost, a Delta Lake-backed dataset storage and query system that turns Pydantic models into queryable tables. Combined with streamlined event queue infrastructure and expanded evaluation capabilities, v0.24.0 adds a production-grade data layer to Scouter for storing and querying high-volume records in AI applications.
Breaking Changes
None. No schema migrations, no database changes, no API removals. Existing drift, evaluation, and tracing functionality is unchanged.
Changes
Bifrost: Delta Lake Dataset Storage
Bifrost turns a Pydantic model into a Delta Lake table. You define the schema, push records through a gRPC queue, and query with SQL.
Write path:
- `DatasetProducer.insert(record)` — serializes to JSON and sends via an unbounded channel (returns immediately, sub-microsecond latency).
- Background batching — accumulates records until `batch_size` or `scheduled_delay_secs` triggers a flush.
- Arrow serialization — `DynamicBatchBuilder` converts JSON rows to an Arrow `RecordBatch`, injecting system columns automatically.
- gRPC transport — batch sent to the server as Arrow IPC bytes.
- Delta Lake — server appends to a table partitioned by `scouter_partition_date`.
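The batching step above can be sketched in miniature (the class, thresholds, and flush mechanics here are illustrative, not Scouter's internals):

```python
import time

class BatchBuffer:
    # Simplified model of the background batcher: flush when either the
    # size threshold or the scheduled delay is reached.
    def __init__(self, batch_size: int, scheduled_delay_secs: float):
        self.batch_size = batch_size
        self.delay = scheduled_delay_secs
        self.records = []
        self.last_flush = time.monotonic()
        self.flushed = []

    def insert(self, record) -> None:
        self.records.append(record)
        if (len(self.records) >= self.batch_size
                or time.monotonic() - self.last_flush >= self.delay):
            self.flush()

    def flush(self) -> None:
        if self.records:
            # Stand-in for Arrow serialization + gRPC send.
            self.flushed.append(self.records)
            self.records = []
        self.last_flush = time.monotonic()

buf = BatchBuffer(batch_size=3, scheduled_delay_secs=60.0)
for i in range(7):
    buf.insert(i)
# two full batches flushed; one record still buffered awaiting the next trigger
```

The real producer runs this loop on a background task, so `insert()` never blocks the caller.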
Read path:
- `DatasetClient.sql(query)` — sends SQL to the server via gRPC.
- DataFusion execution — full SQL support (joins, CTEs, window functions, aggregations).
- Zero-copy delivery — results returned as Arrow IPC bytes.
- Format conversion — call `.to_arrow()`, `.to_polars()`, or `.to_pandas()` to convert.
- Strict reads — `DatasetClient.read()` returns validated Pydantic model instances.
Schema validation (schema-on-write):
- Pydantic JSON Schema → Arrow schema conversion, fingerprinted.
- Fingerprint checked on every batch write.
- Schema mismatch caught before data lands.
- System columns injected automatically: `scouter_created_at` (microsecond timestamp), `scouter_partition_date` (Date32), `scouter_batch_id` (UUID v7).
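The fingerprinting idea can be sketched with stdlib hashing (the canonicalization and hash scheme below are assumptions for illustration; Scouter's actual fingerprint format is internal):

```python
import hashlib
import json

def schema_fingerprint(fields: dict) -> str:
    # Canonicalize the field mapping (sorted keys, compact separators),
    # then hash it. The same logical schema always yields the same
    # fingerprint, so a mismatch is detected on every batch write.
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

registered = schema_fingerprint({"id": "int64", "msg": "utf8"})

# Same fields in a different order produce the same fingerprint...
assert schema_fingerprint({"msg": "utf8", "id": "int64"}) == registered

# ...while any drift is caught before the batch lands.
assert schema_fingerprint({"id": "int64", "msg": "utf8", "extra": "float64"}) != registered
```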
Supported types:
- Primitives: `str`, `int`, `float`, `bool`, `datetime`, `date`
- Collections: `Optional[T]`, `List[T]` (nesting supported)
- Enums: `Enum` → `Dictionary(Int16, Utf8)`
- Nested models: `BaseModel` → `Struct(...)` (recursive, up to 32 levels)
Clients:
- `Bifrost` — unified read/write (long-lived; call `shutdown()` on exit)
- `DatasetProducer` — write only (background queue; call `shutdown()` on exit)
- `DatasetClient` — read only (stateless queries bound to a table via `TableConfig`)
All clients use gRPC transport configured via GrpcConfig. See Bifrost docs for examples.
Event Queue Refactor
Queue infrastructure refactored for clarity and maintainability:
- Queue traits and implementations reorganized in `scouter-events`.
- `DatasetQueue` added for high-throughput dataset inserts.
- Existing Kafka, RabbitMQ, and Redis adapters unchanged.
gRPC API Expansion
New gRPC endpoints for dataset operations:
- `CreateDataset` — register a table with fingerprint validation.
- `InsertBatch` — append Arrow IPC bytes to Delta Lake.
- `QueryDataset` — execute SQL and return Arrow IPC results.
- `ReadDataset` — read records matching a filter.
Protobuf definitions updated in scouter.grpc.v1.proto.
Evaluation Improvements
Agent assertions:
- `TraceAssertionTask` added — assertions on OpenTelemetry spans fetched from Delta Lake.
- `trace_id` added to agent assertion context (enables cross-span evaluation).
Test coverage:
- New `test_agent_assertion.py` with 84 lines of test cases.
- New `test_eval_orchestrator.py` with 173 lines covering offline eval orchestration.
- Trace evaluator test expanded.
Documentation
New Bifrost docs section:
- Overview — architecture and design
- Quickstart — end-to-end write and read example
- Writing Data — producer config and patterns
- Reading Data — SQL queries and format conversions
- Schema Reference — `TableConfig`, type mapping, fingerprinting
Upgrading from v0.23.0
No action required.
- Server: standard build and deployment. No database migrations. Bifrost uses object storage (local, S3, GCS, Azure) configured via `SCOUTER_STORAGE_URI`.
- Python client: standard rebuild with `make setup.project` (rebuilds the Rust extension).
- Existing workflows: drift, evaluation, and tracing work exactly as before.
To use Bifrost, define a Pydantic schema, create a TableConfig, and use Bifrost or DatasetProducer/DatasetClient to write and query data.
v0.23.0
v0.23.0 Release Summary
What Changed
This release adds distributed Delta Lake support for the trace storage engine. In multi-pod deployments, reader pods now automatically pick up new data committed by writer pods without a restart. A new configurable refresh interval controls how often each pod refreshes its in-memory Delta table snapshot from shared object storage.
Breaking Changes
None. No schema changes, no migration required. The new SCOUTER_TRACE_REFRESH_INTERVAL_SECS env var defaults to 10 and requires no action for existing deployments.
Changes
Distributed trace storage: cross-pod Delta Lake refresh
Previously, each pod's TraceSpanDBEngine loaded the Delta table snapshot once at startup. In a multi-pod deployment sharing GCS/S3, reader pods would never see data committed by the writer pod until they restarted.
The engine's actor loop now runs a periodic refresh_table() tick alongside its existing command and compaction handlers. On each tick it calls update_incremental() on a cloned DeltaTable. If the version advanced, it deregisters and re-registers the DataFusion SessionContext table provider so subsequent queries return fresh results. If the incremental update fails (empty table, transient network error), the clone is discarded and the original table state is preserved.
Key details:
| Setting | Default | Env var |
|---|---|---|
| Refresh interval | 10 seconds | SCOUTER_TRACE_REFRESH_INTERVAL_SECS |
Set lower (e.g. 5) for faster cross-pod visibility at the cost of more object-store LIST calls. Set higher to reduce overhead when read latency is not critical.
The refresh runs independently on every pod — unlike compaction, there is no control-table mutual exclusion. Each pod refreshes its own in-memory snapshot.
Trace engine: safer incremental updates
All update_incremental calls in the engine (compaction, writes, optimizations) now call update_datafusion_session() after updating the table, ensuring the DataFusion SessionContext always has the correct object store registered. This fixes a class of bugs where DeltaScan::scan() could fail to resolve file URLs after a table update in cloud-backed deployments.
CI: release workflow fix
The release workflow tag comparison step now uses ${{ github.ref_name }} instead of $GITHUB_REF_NAME, fixing an interpolation issue where the tag name was not correctly resolved in the version check step.
Upgrading from v0.22.0
No action required. The refresh interval defaults to 10 seconds. To tune it, set SCOUTER_TRACE_REFRESH_INTERVAL_SECS in your environment.
v0.22.0
v0.22.0 Release Summary
What Changed
This release ships the offline agent evaluation framework: EvalScenario, EvalScenarios, EvalRunner, and EvalOrchestrator let you define named test cases, run them against a live queue + tracer, and get structured pass/fail metrics with per-scenario breakdowns. The release also integrates AgentAssertionTask into the full offline eval pipeline and adds in-process span capture so scenarios can assert on traces without a running server.
Breaking Changes
Database migration required. A new column is added to scouter.genai_eval_record:
```sql
ALTER TABLE scouter.genai_eval_record
  ADD COLUMN IF NOT EXISTS tags TEXT[] NOT NULL DEFAULT ARRAY[]::TEXT[];
```

The migration runs automatically on server startup via sqlx.
Changes
Offline Evaluation: EvalScenario / EvalScenarios / EvalRunner
Three new types form the core of the offline scenario framework:
EvalScenario — a single named test case. Holds a query string, optional context dict, tasks (assertion, LLM judge, trace, or agent), and a pass_threshold (float 0–1, default 1.0). Each scenario gets a stable UUID7 ID on construction.
```python
from scouter.evaluate import EvalScenario, AgentAssertionTask, AgentAssertion

scenario = EvalScenario(
    name="tool_use_check",
    query="Search for recent AI papers",
    tasks=[
        AgentAssertionTask(
            id="search_called",
            assertion=AgentAssertion.tool_called("web_search"),
        )
    ],
    pass_threshold=1.0,
)
```

EvalScenarios — a collection of EvalScenario objects with internal state (datasets, contexts) populated by EvalRunner.collect_scenario_data() and results populated by EvalRunner.evaluate(). Not intended to be constructed manually; produced by EvalRunner.
EvalRunner — stateful engine that owns scenario definitions and GenAIEvalProfile references (as Arcs, same pattern as ScouterQueue). Two-phase API:
- `collect_scenario_data(queue, tracer)` — drains records and spans captured during scenario execution and associates them with scenarios by `scenario_id` tag.
- `evaluate()` — runs all tasks, returns `ScenarioEvalResults`.
Optionally call compare(baseline: ScenarioEvalResults) to produce a ScenarioComparisonResults with per-scenario deltas.
New result types:
| Type | Purpose |
|---|---|
| `EvalMetrics` | Aggregate pass rates: `overall_pass_rate`, `scenario_pass_rate`, `dataset_pass_rates` (per-alias) |
| `ScenarioResult` | Pass/fail + task results for one scenario |
| `ScenarioEvalResults` | All `ScenarioResult`s + aggregate `EvalMetrics` |
| `ScenarioDelta` | Δ pass rate between two runs for one scenario |
| `ScenarioComparisonResults` | Full comparison across all scenarios with regression/improvement classification |
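The aggregate-and-compare shape can be sketched in a few lines (illustrative only; the real types also track per-dataset-alias rates and task-level detail):

```python
def overall_pass_rate(results: dict) -> float:
    # results: scenario name -> passed? (bool). Fraction of scenarios
    # that passed.
    return sum(results.values()) / len(results)

def scenario_deltas(current: dict, baseline: dict) -> dict:
    # One delta per scenario: positive means improvement, negative
    # means regression relative to the baseline run.
    return {
        name: float(current[name]) - float(baseline[name])
        for name in current
        if name in baseline
    }

baseline = {"tool_use_check": True, "latency_check": True}
current = {"tool_use_check": True, "latency_check": False}
print(overall_pass_rate(current))          # 0.5
print(scenario_deltas(current, baseline))  # {'tool_use_check': 0.0, 'latency_check': -1.0}
```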
Offline Evaluation: EvalOrchestrator (Python)
EvalOrchestrator is a high-level Python wrapper that manages the full capture lifecycle so callers don't have to sequence enable_capture / disable_capture / collect_scenario_data manually.
```python
from scouter.evaluate import EvalOrchestrator, EvalScenario, EvalScenarios

orchestrator = EvalOrchestrator(
    scenarios=EvalScenarios(scenarios=[...]),
    queue=queue,
    tracer=tracer,
    profiles={"agent": profile},
)
results: ScenarioEvalResults = orchestrator.run(agent_fn=my_agent)
```

agent_fn is Callable[[str], str] — takes a query, returns a response string. The orchestrator:
- Enables queue capture + local span capture.
- Iterates scenarios, sets `scouter.eval.scenario_id` in OTel baggage, calls `agent_fn(scenario.query)`.
- Disables capture, calls `EvalRunner.collect_scenario_data()`, then `evaluate()`.
- Returns `ScenarioEvalResults`.
Subclass EvalOrchestrator and override execute_scenario() to handle non-string responses or add lifecycle hooks.
AgentAssertionTask: full pipeline integration
AgentAssertionTask was previously standalone (via execute_agent_assertion_tasks()). It is now fully wired into the EvalDataset pipeline:
- `EvaluationTask::AgentAssertion` variant added.
- `EvaluationTaskType::AgentAssertion` variant added (serializes as `"AgentAssertion"`).
- `TaskConfig::AgentAssertion` deserializes from stored task JSON.
- `AssertionTasks.agent: Vec<AgentAssertionTask>` — tasks are routed to this bucket when building datasets from `TasksFile`.
- `EvaluationTask::AgentAssertion` participates in `depends_on` resolution.
New supporting types:
TokenUsage — structured token count from LLM responses. Fields: input_tokens, output_tokens, total_tokens (all Optional[int]). Exposed to Python as a #[pyclass].
AgentContextBuilder (Rust-internal) — normalizes vendor-specific LLM response formats into a standard structure before assertion evaluation. Auto-detects format:
- Pre-normalized (Scouter standard shape)
- OpenAI — `choices[].message.tool_calls`, `usage`, `model`
- Anthropic — `content[]` with `ToolUseBlock`, `usage`, `model`
- Google/Gemini — `candidates[].content.parts[]` with `function_call`
- Fallback tree walk
Path limits enforced: max 512 chars per path, max 32 segments.
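The limit check amounts to two guards; a sketch, assuming dot-separated path segments (the segment syntax is an assumption, not documented here):

```python
MAX_PATH_CHARS = 512
MAX_PATH_SEGMENTS = 32

def path_within_limits(path: str) -> bool:
    # Reject paths longer than 512 chars or with more than 32 segments,
    # matching the documented limits.
    return (len(path) <= MAX_PATH_CHARS
            and len(path.split(".")) <= MAX_PATH_SEGMENTS)

print(path_within_limits("choices.0.message.tool_calls"))  # True
print(path_within_limits("a." * 40 + "b"))                 # False (41 segments)
```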
In-process span capture
A new local capture mode lets tests assert on trace spans without a running Scouter server or Delta Lake backend. Spans are buffered in memory instead of forwarded to the transport.
Buffer capacity: 20,000 spans (CAPTURE_BUFFER_MAX). Writes beyond this limit are dropped with a warning.
Rust API (Tracer):
```rust
tracer.enable_local_capture()?;
// ... instrumented code ...
let spans = tracer.drain_local_spans()?;
let by_trace = tracer.get_local_spans_by_trace_ids(vec!["abc123...".into()])?;
tracer.disable_local_capture()?; // discards buffer
```

Python API (ScouterInstrumentor):

```python
instrumentor.enable_local_capture()
# ... instrumented code ...
spans: list[TraceSpanRecord] = instrumentor.drain_local_spans()
spans_filtered = instrumentor.get_local_spans_by_trace_ids(["abc123..."])
instrumentor.disable_local_capture()
```

Module-level aliases also available: `enable_local_span_capture`, `disable_local_span_capture`, `drain_local_span_capture`.
disable_local_capture logs a warning and discards buffered spans if any remain.
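The buffer's cap-and-drop behavior can be modeled in a few lines (a toy in-memory model with a small capacity for demonstration; the real buffer lives in the Rust tracer):

```python
import warnings

class LocalSpanBuffer:
    # Simplified model of the in-process capture buffer: bounded, drops
    # writes beyond capacity with a warning, drained destructively.
    def __init__(self, capacity=20_000):
        self.capacity = capacity
        self.spans = []
        self.dropped = 0

    def push(self, span):
        if len(self.spans) >= self.capacity:
            self.dropped += 1  # writes beyond the limit are dropped
            warnings.warn("capture buffer full; span dropped")
            return
        self.spans.append(span)

    def drain(self):
        spans, self.spans = self.spans, []
        return spans

buf = LocalSpanBuffer(capacity=3)
for i in range(5):
    buf.push({"span_id": i})
# 3 spans buffered, 2 dropped; drain() empties the buffer
```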
EvalRecord: tags and trace_id stamping
Tags — EvalRecord now carries a tags: list[str] field in key=value format. Tags are persisted to PostgreSQL and returned in all query paths (get, paginated, archive).
```python
record = EvalRecord(context={"response": "..."})
record.add_tag("environment", "staging")
record.add_tag("model", "gpt-4o")
# record.tags == ["environment=staging", "model=gpt-4o"]
```

trace_id at construction — EvalRecord(trace_id="<hex>") now accepted. Previously trace_id could only be set after construction.
Automatic stamping in QueueBus — when an EvalRecord is inserted via ScouterQueue and has no trace_id, the bus checks the active OTel span context. If a valid span is active, its trace ID is stamped onto the record (both the Rust struct and the Python-side object). The Python object is updated via a mutable borrow; a warning is logged if the cast fails.
Scenario tag auto-injection — if OTel baggage contains scouter.eval.scenario_id, the bus appends "scouter.eval.scenario_id=<value>" to record.tags automatically. Tag values are validated: alphanumeric, hyphens, underscores, max 128 chars. Invalid values are dropped with a warning.
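The validation rule is a single pattern check; a sketch, assuming a regex-based implementation (the regex form is an assumption, only the rule itself is documented):

```python
import re

# Alphanumeric, hyphens, underscores, 1-128 chars, per the documented rule.
TAG_VALUE_RE = re.compile(r"^[A-Za-z0-9_-]{1,128}$")

def valid_tag_value(value: str) -> bool:
    # Invalid values are dropped (with a warning in the real bus).
    return bool(TAG_VALUE_RE.fullmatch(value))

print(valid_tag_value("0198c2f3-7b1a-7e0a-9f00-abcdef012345"))  # True (UUID-style)
print(valid_tag_value("bad value!"))                            # False
print(valid_tag_value("x" * 129))                               # False
```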
ScouterQueue: offline record capture
New methods for offline use (mirroring the local span capture API):
```python
queue.enable_capture()        # buffer EvalRecords in memory in addition to sending
queue.disable_capture()       # stop buffering and discard buffered records
queue.drain_records("alias")  # drain records from one queue by alias
queue.drain_all_records()     # drain from all queues, keyed by alias
```

Capture is off by default. Enabling it has negligible overhead; records are still forwarded to the normal transport.
GenAIEvalProfile references are now stored as Arc<GenAIEvalProfile> inside ScouterQueue.profiles, so EvalScenarios can share ownership without cloning.
shutdown() now releases the GIL during the 250ms wait periods, preventing deadlocks in multi-threaded Python programs.
Server: debug trace endpoint
A new diagnostic route returns the 10 most recent traces from the past 24 hours:
GET /scouter/trace/debug/recent
Returns the same TracePaginationResponse as the paginated trace query. Intended for local debugging and health verification; not authenticated differently from other trace routes.
Upgrading from v0.21.1
- Apply the database migration. It runs automatically on server startup. If you run migrations manually, execute:

  ```sql
  ALTER TABLE scouter.genai_eval_record
    ADD COLUMN IF NOT EXISTS tags TEXT[] NOT NULL DEFAULT ARRAY[]::TEXT[];
  ```

- No other action required. All API changes are additive. Existing `EvalRecord`, `ScouterQueue`, and tracing usage continues to work without modification.
v0.21.1
v0.21.1 Release Summary
What Changed
This release renames the GenAI evaluation result types to remove the GenAI prefix. The names GenAIEvalResults, GenAIEvalSet, GenAIEvalTaskResult, and GenAIEvalResultSet have been replaced with shorter, framework-agnostic equivalents. No behavior or schema changes are included.
Breaking Changes
All four GenAI eval result types have been renamed. Any code importing or referencing these types must be updated.
| Old name | New name |
|---|---|
| `GenAIEvalResults` | `EvalResults` |
| `GenAIEvalSet` | `EvalSet` |
| `GenAIEvalTaskResult` | `EvalTaskResult` |
| `GenAIEvalResultSet` | `EvalResultSet` |
This affects:
- Python imports from `scouter` or `scouter.evaluate`
- Type annotations referencing these classes
- Any serialized JSON being deserialized back into these types via `model_validate_json` (the type name is not embedded in the JSON payload, so existing stored results are unaffected)
Changes
GenAI evaluation — type renaming
The GenAI prefix has been dropped from four result container types. The change is purely nominal — fields, methods, and return values are identical. The rename touches the Rust crates (scouter_types, scouter_evaluate, scouter_dataframe, scouter_client, scouter_sql), PyO3 bindings, Python stubs, and the public scouter.evaluate module.
Drifter.compute_drift() return type annotation updated from GenAIEvalResultSet to EvalResultSet. EvalDataset.evaluate() return type updated from GenAIEvalResults to EvalResults.
Tracing — storage architecture documentation
Added py-scouter/docs/docs/tracing/storage-architecture.md documenting the Delta Lake + DataFusion write/query pipeline, component reference table, and the dual-actor buffer/engine design.
Upgrading from v0.21.0
- Replace all imports of the renamed types:

  ```python
  # Before
  from scouter.evaluate import (
      GenAIEvalResults,
      GenAIEvalSet,
      GenAIEvalTaskResult,
      GenAIEvalResultSet,
  )

  # After
  from scouter.evaluate import (
      EvalResults,
      EvalSet,
      EvalTaskResult,
      EvalResultSet,
  )
  ```

  The top-level `scouter` namespace export has changed the same way:

  ```python
  # Before
  from scouter import GenAIEvalResults

  # After
  from scouter import EvalResults
  ```

- Update any type annotations that reference the old names.
- No database migrations, no environment variable changes, no serialization format changes.
v0.21.0
v0.21.0 Release Summary
What Changed
v0.21.0 adds a read-side caching layer over cloud object stores and tunes the DataFusion SessionContext for higher-concurrency GCS/S3 workloads. A bug where the trace summary table skipped vacuum after compaction is fixed. An internal record type rename is propagated across all crates.
Breaking Changes
None. No schema changes, no migration required.
Changes
Object store caching layer (CachingStore)
A new CachingStore<T: ObjectStore> wrapper in scouter_dataframe caches head() responses and small get_range() reads (≤2 MB) from cloud object stores.
After Delta Lake Z-ORDER compaction, Parquet files are immutable — the same path always returns the same bytes. DataFusion issues repeated HEAD + footer range reads on every query. Without caching, each read is a separate cloud round-trip (~30–60 ms on GCS). CachingStore eliminates these by serving repeated reads from an in-process mini_moka cache.
Cache configuration:
| Setting | Default | Env var |
|---|---|---|
| Max cache size | 64 MB | SCOUTER_OBJECT_CACHE_MB |
| TTL | 1 hour | — |
| Max cacheable range read | 2 MB | — |
All mutating and streaming operations (put, delete, list, get for large ranges) pass through to the inner store uncached.
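The caching policy reduces to "small reads of immutable files are memoized"; a toy sketch (no eviction or TTL, unlike the real mini_moka-backed store; all names here are illustrative):

```python
MAX_CACHEABLE_RANGE = 2 * 1024 * 1024  # 2 MB, per the table above

class CachingRangeReader:
    # Repeated head()/small get_range() calls on immutable Parquet files
    # are served from memory; large ranges always pass through.
    def __init__(self, inner_get_range):
        self.inner_get_range = inner_get_range
        self.cache = {}
        self.round_trips = 0

    def get_range(self, path, start, end):
        key = (path, start, end)
        if end - start <= MAX_CACHEABLE_RANGE:
            if key not in self.cache:
                self.round_trips += 1
                self.cache[key] = self.inner_get_range(path, start, end)
            return self.cache[key]
        self.round_trips += 1  # uncached pass-through
        return self.inner_get_range(path, start, end)

store = CachingRangeReader(lambda p, s, e: b"x" * (e - s))
for _ in range(10):
    store.get_range("part-0001.parquet", 0, 64 * 1024)  # Parquet footer read
# ten identical footer reads, one simulated cloud round-trip
```

This is the effect described above: after compaction the same path always returns the same bytes, so DataFusion's repeated footer reads stop costing a network round-trip each.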
DataFusion SessionContext tuning
The shared SessionContext used for trace queries now includes explicit read-path and write-path settings:
| Setting | Old | New | Why |
|---|---|---|---|
| `metadata_size_hint` | 512 KB | 1 MB | Captures bloom filter + footer + column indexes in one GCS round-trip instead of the default multi-step chain |
| `bloom_filter_on_read` | default | `true` | Activates bloom filters on `trace_id` and `entity_id` to skip non-matching row groups before decoding |
| `schema_force_view_types` | default | `true` | Zero-copy `Utf8View`/`BinaryView` — prevents DataFusion from downgrading these on read-back from Parquet |
| `meta_fetch_concurrency` | 32 | 64 | Parallel HEAD stats during Delta log replay; matches `pool_max_idle_per_host` |
| `maximum_parallel_row_group_writers` | default | 4 | Concurrent row group encoding during compaction and flush |
| `maximum_buffered_record_batches_per_stream` | default | 8 | Smooths bursty reads from GCS |
Connection pool tuning
Cloud object store HTTP client settings updated:
| Setting | Old | New |
|---|---|---|
| `pool_max_idle_per_host` | 16 | 64 |
| `pool_idle_timeout` | 90s | 120s |
| Request timeout | — | 30s |
| Connect timeout | — | 5s |
Bug fix: vacuum missing after summary optimize
TraceSummaryDBEngine::run_maintenance() called optimize_table() but not vacuum_table() afterward. Compaction tombstones old Parquet files; without an immediate vacuum those files remain on storage until the next scheduled vacuum cycle.
Fixed to vacuum immediately after a successful optimize:
```rust
Ok(()) => {
    if let Err(e) = self.vacuum_table(0).await {
        error!("Post-optimize vacuum failed: {}", e);
    }
    // release task ...
}
```

This matches the existing behavior in TraceSpanDBEngine.
Internal record type rename (PR #221)
Internal record type renamed across scouter_client, scouter_drift, scouter_evaluate, scouter_events, scouter_server, and py-scouter. No public API change for Python users — stub files updated.
Upgrading from v0.20.0
No action required. All changes are additive or internal.
SCOUTER_OBJECT_CACHE_MB is optional. The default (64 MB) is appropriate for most deployments. Increase it if you have many concurrent readers querying large numbers of Parquet files.
v0.20.0
v0.20.0 Release Summary
What Changed
v0.20.0 removes PostgreSQL from the trace/span pipeline entirely. All trace ingestion, storage, querying, and maintenance now runs through DataFusion and Delta Lake only. This release also adds a distributed coordination layer for multi-pod compaction, a pre-aggregated trace summary table for fast listing, cloud storage fixes for GCS/S3/Azure, and three new tuning env vars.
Breaking Changes
The PostgreSQL trace schema is no longer used. Any tooling, migrations, or queries that read trace data from PostgreSQL will stop working after upgrading. The scouter_sql aggregator is now a thin forwarding layer; all trace reads and writes go through Delta Lake via scouter_dataframe.
Changes
Traces now fully on Delta Lake + DataFusion
PostgreSQL has been removed from the trace read/write path. The architecture is now:
```
gRPC / HTTP ingest → in-memory buffer (actor) → Delta Lake (span table + summary table)
                                                        ↑
                                              DataFusion query engine
```
The scouter_sql aggregator retains its interface for compatibility but no longer writes span data to PostgreSQL.
Trace summary table
A new trace_summaries Delta Lake table stores one row per trace with pre-computed fields:
| Column | Type | Description |
|---|---|---|
| `trace_id` | `FixedSizeBinary(16)` | Trace identifier |
| `service_name` | `Dictionary(Int32, Utf8)` | Service that produced the root span |
| `root_operation` | `Utf8` | Name of the root span |
| `start_time` / `end_time` | `Timestamp(µs, UTC)` | Trace wall-clock bounds |
| `duration_ms` | `Int64` | End-to-end latency in milliseconds |
| `span_count` | `Int64` | Total spans in the trace |
| `error_count` | `Int64` | Spans with error status |
| `search_blob` | `Utf8` | Concatenated attribute text for full-text search |
| `entity_ids` | `List<Utf8>` | Application entity IDs attached to the trace |
| `queue_ids` | `List<Utf8>` | Queue message IDs attached to the trace |
This table is partitioned by partition_date (Date32). Listing traces and applying filters no longer requires scanning the full span table.
Distributed compaction control table
A new _scouter_control Delta Lake table coordinates compaction, retention, and vacuum tasks across pods. Each task (summary_optimize, etc.) has a single row with idle/processing status, a pod_id, and a next_run_at timestamp. Locks older than 30 minutes are automatically reclaimed.
This prevents multiple pods from running conflicting Z-ORDER optimize operations simultaneously against shared object storage.
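The claim-or-skip logic per task row can be sketched as follows (a Python sketch over a dict; in Scouter this is a conditional Delta Lake update against the control table, and field names here are illustrative):

```python
LOCK_TIMEOUT_SECS = 30 * 60  # locks older than 30 minutes are reclaimed

def try_claim(row: dict, pod_id: str, now: float) -> bool:
    # A pod may take a task if it is idle, or if a previous holder's
    # lock has gone stale.
    stale = (row["status"] == "processing"
             and now - row["locked_at"] > LOCK_TIMEOUT_SECS)
    if row["status"] == "idle" or stale:
        row.update(status="processing", pod_id=pod_id, locked_at=now)
        return True
    return False

row = {"task": "summary_optimize", "status": "idle",
       "pod_id": None, "locked_at": 0.0}
print(try_claim(row, "pod-a", now=1000.0))    # True: pod-a takes the lock
print(try_claim(row, "pod-b", now=1100.0))    # False: pod-b is blocked
print(try_claim(row, "pod-b", now=2801.0))    # True: stale lock reclaimed
```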
New attribute search UDF
A custom DataFusion scalar UDF (match_attr_expr) enables full-text attribute search against the search_blob column. This replaces SQL LIKE patterns that required per-attribute column scans.
```rust
// DataFusion query predicate
match_attr_expr(col("search_blob"), lit("user_id=abc123"))
```

New trace query API routes
Two new HTTP endpoints were added to scouter-server:
- `GET /traces/:trace_id/spans` — returns all spans for a specific trace ID
- `POST /traces/spans/filter` — returns spans matching `TraceFilters` (service name, time range, attribute values, entity ID, etc.)
Typed DataFusion predicates for Parquet pruning
Query helpers ts_lit() and date_lit() emit typed Timestamp(Microsecond, UTC) and Date32 literals. These match column types exactly, enabling Parquet row-group min/max pruning and partition directory skipping without type coercion overhead.
Cloud storage fixes
- GCS / S3 / Azure: `storage_root()` now correctly extracts only the bucket name from URIs like `gs://my-bucket/path/to/prefix`. Previously it returned the full path after stripping the scheme prefix, causing object store initialization failures.
- Azure: fixed path construction for Delta table locations.
- `PassthroughLogStoreFactory` added for cloud log store registration when using GCS.
Span schema changes
Columns removed from the span table:
- `root_span_id` — derivable from the summary table
- `depth`, `span_order`, `path` — unused by the query layer
Columns added:
- `search_blob` — concatenated attribute text for UDF-based search
- `queue_ids` — list of queue message IDs
New configuration env vars
| Variable | Default | Description |
|---|---|---|
| `SCOUTER_TRACE_COMPACTION_INTERVAL_HOURS` | 24 | How often Delta Lake Z-ORDER optimize runs for trace tables |
| `SCOUTER_TRACE_FLUSH_INTERVAL_SECS` | 5 | How often the in-memory span buffer flushes to Delta Lake |
| `SCOUTER_TRACE_BUFFER_SIZE` | 10000 | Span buffer capacity before a forced flush |
Larger SCOUTER_TRACE_BUFFER_SIZE values reduce the number of small Parquet files written to cloud storage but increase the window of data that could be lost on a crash.
Upgrading from v0.19.0
- Remove any direct PostgreSQL queries against trace tables. These tables may still exist but are no longer written to.
- Set `SCOUTER_STORAGE_URI` to a writable location (local path, `s3://`, `gs://`, or `az://`). This was required in v0.19.0 for spans and is now required for summaries and the control table as well.
- On first startup, the server creates the `trace_summaries` and `_scouter_control` Delta tables automatically. No migration script is needed.
- If running multiple server replicas, all replicas must share the same `SCOUTER_STORAGE_URI`. The control table coordinates cross-pod compaction; replicas pointing at different storage paths will not coordinate.
v0.19.0
v0.18.0
What's Changed
v0.18.0 Release Notes
Server Deployment Modes
- The server binary now accepts a `--mode` flag to run HTTP only, gRPC only, or both (the default). This enables independent horizontal scaling of each protocol: deploy gRPC close to high-throughput queue producers and HTTP separately for the REST API.
Full Changelog: v0.17.1...v0.18.0
v0.17.1
- Patch to fix wheel testing