UN-3431 [FIX] Restore log_events_id on tool-run dispatch and persist without UI subscriber#1960
Conversation
…without UI subscriber

structure_tool_task lost log_events_id from both the agentic_table and structure_pipeline ExecutionContexts during the agentic_table refactor; the executor shim therefore received an empty log_events_id and bailed before publishing anything. Tool-run lines stopped reaching the workflow logs UI for every dispatch through these paths.

Two changes:
- structure_tool_task: thread log_events_id into both ExecutionContexts.
- executor_tool_shim.stream_log: gate only the PROGRESS path on log_events_id; the LOG payload now falls back to execution_id as the routing channel so logs persist to execution_log even when no websocket subscriber exists (API deployments).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
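A minimal sketch of the gating described above, assuming hypothetical class and publisher shapes (the real `ExecutorToolShim` internals may differ):

```python
class ExecutorToolShimSketch:
    """Hypothetical sketch of the stream_log change; real names may differ."""

    def __init__(self, execution_id, organization_id, log_events_id=None):
        self.execution_id = execution_id
        self.organization_id = organization_id
        self.log_events_id = log_events_id
        self.published = []  # stand-in for the real log publisher

    def stream_log(self, message):
        # LOG payload: fall back to execution_id as the routing channel so
        # rows persist to execution_log even with no UI subscriber.
        if self.execution_id and self.organization_id:
            channel = self.log_events_id or self.execution_id
            self.published.append(("LOG", channel, message))
        # PROGRESS only matters to a live UI subscriber, so it stays
        # gated on log_events_id.
        if self.log_events_id:
            self.published.append(("PROGRESS", self.log_events_id, message))
```

With no `log_events_id` (the API-deployment case), only the LOG payload is published, routed via `execution_id`; with one, both payloads route via `log_events_id`.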
No actionable comments were generated in the recent review. 🎉

📒 Files selected for processing (3)
💤 Files with no reviewable changes (1)
Summary by CodeRabbit
Walkthrough

Workflow log streaming is decoupled from progress-event publishing: task dispatch now forwards `log_events_id`.

Changes

- Workflow log publishing resilience
- Executor runtime logging & adapter-once instrumentation
Sequence Diagram

```mermaid
sequenceDiagram
    participant Dispatcher as Dispatcher
    participant Executor as Executor (worker)
    participant Shim as ExecutorToolShim
    participant LogPub as LogPublisher
    participant DB as StateStore/TaskState
    Dispatcher->>DB: read log_events_id
    Dispatcher->>Executor: dispatch task with ExecutionContext(log_events_id)
    Executor->>Shim: stream_log(message)
    alt execution_id & organization_id present
        Shim->>LogPub: publish(workflow_log, channel_id = log_events_id or execution_id)
    end
    alt log_events_id present
        Shim->>LogPub: publish(progress_event, channel_id = log_events_id)
    end
    Executor->>Shim: log_adapter_once(kind, adapter_id, adapter)
    Shim->>LogPub: stream_log("Using {kind}: `label`") [only first time per adapter_id]
```
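The `log_adapter_once` dedup in the diagram could look roughly like this; the helper shape is an assumption, and only the once-per-`adapter_id` behaviour and the `get_model_name() or adapter_id` fallback come from the review summary:

```python
class AdapterOnceLogger:
    """Hypothetical per-shim dedup for 'Using <kind>: `<label>`' lines."""

    def __init__(self, stream_log):
        self._stream_log = stream_log
        self._seen_adapter_ids = set()

    def log_adapter_once(self, kind, adapter_id, adapter):
        # Emit the adapter-identity line at most once per unique adapter id.
        if adapter_id in self._seen_adapter_ids:
            return
        self._seen_adapter_ids.add(adapter_id)
        # Fall back to the adapter id when the model name is unset,
        # avoiding a literal `None` in the UI.
        label = adapter.get_model_name() or adapter_id
        self._stream_log(f"Using {kind}: `{label}`")
```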
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes. Pre-merge checks: ✅ 5 passed.
| Filename | Overview |
|---|---|
| workers/file_processing/structure_tool_task.py | Restores missing log_events_id kwarg on two ExecutionContext dispatches (agentic_table, structure_pipeline), fixing the core regression. |
| workers/executor/executor_tool_shim.py | Moves PROGRESS publish behind log_events_id guard; LOG payload falls back to execution_id channel; adds log_adapter_once dedup helper. Logic is sound. |
| workers/executor/executors/legacy_executor.py | Consolidates noisy step logs into single-line summaries; adds log_adapter_once calls; llm.get_model_name() in summarize path is called without None-fallback, producing backtick-wrapped 'None' if model name is unset. |
| workers/executor/executors/index.py | Removes two verbose intermediate stream_log calls; clean-up only, no logic change. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant STT as structure_tool_task
    participant D as Dispatcher
    participant AT as agentic_table executor
    participant LE as legacy executor (structure_pipeline)
    participant ETS as ExecutorToolShim.stream_log
    participant LP as LogPublisher
    STT->>D: dispatch(at_ctx, log_events_id=log_events_id)
    D->>AT: execute(context)
    AT->>ETS: stream_log(msg)
    alt log_events_id present
        ETS->>LP: publish(channel=log_events_id, PROGRESS)
    end
    ETS->>LP: publish(channel=log_events_id OR execution_id, LOG)
    LP-->>ETS: persisted to execution_log
    STT->>D: dispatch(pipeline_ctx, log_events_id=log_events_id)
    D->>LE: execute(context)
    LE->>ETS: stream_log(Run config)
    alt log_events_id present
        ETS->>LP: publish(channel=log_events_id, PROGRESS)
    end
    ETS->>LP: publish(channel=log_events_id OR execution_id, LOG)
    LE->>ETS: stream_log(Pipeline completed N/M)
    ETS->>LP: publish(channel=log_events_id OR execution_id, LOG)
```
Prompt To Fix All With AI
Fix the following code review issue, proposing a concise change.
---
### Issue 1 of 1
workers/executor/executors/legacy_executor.py:2298-2300
`llm.get_model_name()` is called without a None/empty fallback here. If the adapter returns `None` or an empty string, the log line will display `` `None` `` in the UI. The `log_adapter_once` helper already handles this gracefully with `get_model() or adapter_id`; the same pattern should be applied here.
```suggestion
model_label = llm.get_model_name() or llm_adapter_id
shim.stream_log(
f"Summarizing extracted text using LLM: `{model_label}`"
)
```
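The `or` fallback in the suggestion covers both `None` and the empty string, since both are falsy in Python (the adapter-id value below is a made-up placeholder):

```python
# Both None and "" are falsy, so a single `or` expression handles an
# unset model name in either form, avoiding a backtick-wrapped `None`
# or an empty label in the UI.
for model_name in (None, ""):
    assert (model_name or "adapter-id-123") == "adapter-id-123"

# A real model name passes through unchanged.
assert ("gpt-4" or "adapter-id-123") == "gpt-4"
```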
Reviews (3): Last reviewed commit: "UN-3431 [MISC] Improve tool-run log narr..."
Reshape the per-run shim.stream_log emissions so the workflow logs UI
reads as a per-phase narrative with one start, one end, and adapter
identity surfaced exactly once per unique adapter:
- Add a non-sensitive run-config preamble at the top of
_handle_structure_pipeline: prompt count + single_pass / summarize /
challenge flags. No prompt names or text are logged.
- Introduce ExecutorToolShim.log_adapter_once(kind, adapter_id, adapter)
with a per-shim dedup set so "Using LLM/Embedding/Vector DB:
`<model>`" appears at most once per unique adapter id. Used from
_initialize_adapters, _handle_index, and the summarize path.
- Drop intermediate / redundant lines that did not add information
on their own: "Initializing text extractor", "Using text extractor"
(rolled into the start line), "Extracting text from document",
"Saving extraction metadata", "Initialized embedding and vector DB
adapters", "Indexing file", "Adding nodes to vector db".
- Collapse the index-status trio ("Document already indexed,
re-indexing" + "Indexing document for the first time" + "Indexing
document into vector store") into a single "Indexing document" /
"Re-indexing document" line driven by doc_id_found.
- Gate "Retrieving context for" and "Retrieved N chunks via RAG for"
on chunk_size > 0 so single-pass / full-context paths do not emit
a misleading retrieval line.
- Combine summarize start into one line that names the LLM model.
- Wrap dynamic identifiers (adapter labels, extractor class, prompt
names) in backticks; drop trailing "..." across all stream_log
emissions.
- Emit a final "Pipeline completed: N/M prompts answered" with a
non-null count from structured_output[OUTPUT].
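The `chunk_size > 0` gating described above can be sketched as follows; the function name and signature are hypothetical, only the gating rule is taken from the list:

```python
def emit_retrieval_logs(stream_log, chunk_size, prompt_name, chunks=None):
    """Hypothetical sketch: emit retrieval lines only when retrieval happens."""
    # Single-pass / full-context paths run with chunk_size == 0 and perform
    # no retrieval, so they emit no retrieval lines at all.
    if chunk_size <= 0:
        return
    stream_log(f"Retrieving context for `{prompt_name}`")
    if chunks is not None:
        stream_log(f"Retrieved {len(chunks)} chunks via RAG for `{prompt_name}`")
```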
Pairs with the cloud-side log cleanup PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test Results

- Summary
- Runner Tests - Full Report
- SDK1 Tests - Full Report
What

- `workers/file_processing/structure_tool_task.py`: thread `log_events_id` into the `agentic_table` and `structure_pipeline` `ExecutionContext` dispatches.
- `workers/executor/executor_tool_shim.py`: gate only the PROGRESS publish on `log_events_id`; the LOG payload now falls back to `execution_id` as the routing channel.

Why

- The `agentic_table` refactor dropped the dispatched `log_events_id`. Tool-run logs ("Pipeline step 1: …", "Processing prompt: about", etc.) never reached the workflow execution logs UI for any run that went through `structure_pipeline`, which is the dominant code path.
- `ExecutorToolShim.stream_log` additionally short-circuited at `if not self.log_events_id: return` before the LOG-payload publish, so even contexts with a valid `execution_id` + `organization_id` but no websocket subscriber (API deployments) silently dropped their `execution_log` rows.

How

- Restore the `log_events_id=log_events_id` kwarg on the two regressed `ExecutionContext` constructions; the third dispatch (`agentic_extraction`) already had it.
- In `ExecutorToolShim.stream_log`, move the PROGRESS branch behind a truthy `log_events_id` check and use `self.log_events_id or self.execution_id` as the LOG channel so persistence works without a UI subscriber. This mirrors the backend `WorkflowLog` channel-fallback behaviour.

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

- No. The PROGRESS publish is still gated on a truthy `log_events_id`, so its behaviour is unchanged. LOG persistence only gains coverage: when `log_events_id` was present, the LOG already used it as the channel; when it was absent, the code previously returned and now publishes via `execution_id` instead.

Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Zipstack/unstract-cloud#1491.

Dependencies Versions
Notes on Testing
Built `unstract/worker-unified:test` from this branch (with the cloud plugin fix overlaid) and recreated all v2 workers locally. Smoke tests to run:

- … persist to `unstract.execution_log` and stream live in the workflow logs panel.
- … `execution_log`.
- `Reading document context...` and `Running single-pass extraction with N fields...` lines should appear (covered jointly by this PR + the cloud plugin PR).
- … `execution_log` (unchanged).

Screenshots
Checklist
I have read and understood the Contribution Guidelines.