Wrapper gaps: dead duplicate framework runners, divergent async scheduler, unsafe shared singletons in `src/praisonai/praisonai`

## Scope

In-depth analysis of `src/praisonai/praisonai` (the wrapper layer). I focused only on key architectural gaps — not docs, tests, coverage, file sizes, or line counts.

After a full pass over the wrapper, three gaps stand out because they actively violate the project's stated philosophy ("simpler · DRY · multi-agent safe · async-safe · no global singletons · protocol-driven"). Each one is reproducible from the current `main`, has been read end-to-end, and the proposed fix is concrete.

---

## Gap 1 — `agents_generator.py` carries ~470 lines of dead, duplicate, broken framework-runner methods

### Where

`src/praisonai/praisonai/agents_generator.py`

Five private methods defined on `AgentsGenerator`:

| Method | Lines |
|---|---|
| `_run_autogen` | 673–751 |
| `_run_autogen_v4` | 753–859 |
| `_run_ag2` | 861–1005 |
| `_run_crewai` | 1006–1143 |
| `_run_praisonai` | 1144–EOF |

### Why this is a problem

1. **They are never called.** The execution path is `generate_crew_and_kickoff` → `self.framework_adapter.run(...)` at line 607:
   ```python
   # agents_generator.py:602-615
   from .framework_adapters.validators import assert_framework_available
   assert_framework_available(framework)
   self.logger.info(f"Using framework: {framework}")
   return self.framework_adapter.run(
       config,
       self.config_list,
       topic,
       tools_dict=tools_dict,
       agent_callback=getattr(self, 'agent_callback', None),
       task_callback=getattr(self, 'task_callback', None),
       cli_config=getattr(self, 'cli_config', None),
   )
   ```
   A repo-wide grep confirms no caller for `_run_autogen`, `_run_autogen_v4`, `_run_ag2`, `_run_crewai`, or `_run_praisonai` anywhere in `src/`. The only matching hit is the unrelated `_run_praisonai_agents` in `jobs/executor.py`.

2. **They cannot run even if called** — names are not imported in this module:
   ```python
   # agents_generator.py:688 (inside _run_autogen)
   user_proxy = autogen.UserProxyAgent(   # NameError: 'autogen' is not defined
       name="User",
       ...
   )
   ```
   ```python
   # agents_generator.py:770 (inside _run_autogen_v4)
   model_client = OpenAIChatCompletionClient(   # NameError
       model=model_config.get('model', 'gpt-5-nano'),
       ...
   )
   # 796: AutoGenV4AssistantAgent — undefined
   # 815: TextMentionTermination — undefined
   # 816: MaxMessageTermination  — undefined
   # 820: RoundRobinGroupChat    — undefined
   ```

3. **They duplicate the canonical adapter implementations.** Compare `_run_autogen` (lines 673–751) to `framework_adapters/autogen_adapter.py:29–100`:
   ```python
   # framework_adapters/autogen_adapter.py:55-89  (the real, working impl)
   import autogen
   user_proxy = autogen.UserProxyAgent(
       name="User",
       human_input_mode="NEVER",
       is_termination_msg=lambda x: ...,
       code_execution_config={"work_dir": "coding", "use_docker": False},
   )
   ...
   agents[role] = autogen.AssistantAgent(
       name=agent_name,
       llm_config=llm_config_dict,
       system_message=...,
   )
   ```
   This is character-for-character the same code as `agents_generator._run_autogen`, except the adapter version actually imports `autogen` and is wired into the registry. The same pattern holds for v0.4, AG2, CrewAI, and PraisonAI.

This directly contradicts the philosophy: *"DRY: reuse abstractions, no duplication"* and *"Protocol-driven core: protocols/hooks in core; heavy impl in wrapper/tools"*. Today, anyone fixing a CrewAI bug has to guess which copy is live (the adapter), and the dead copy slowly drifts and rots.

### Fix

Delete the five methods. No call sites, no public API impact, no shim needed. Concretely, in `src/praisonai/praisonai/agents_generator.py`:

```python
# DELETE: lines 673-EOF (the five _run_* methods)
# KEEP: generate_crew_and_kickoff(...) which already delegates to
#       self.framework_adapter.run(...) at line 607
```

That removes ~470 lines of dead code, eliminates the drift hazard, and lets `agents_generator.py` shrink to what it actually does: parse the YAML, validate, pick the adapter, hand off.

---

## Gap 2 — `async_agent_scheduler.py` claims to be a "backward-compatible re-export" but is a full duplicate implementation with silent feature drift

### Where

- `src/praisonai/praisonai/async_agent_scheduler.py` (424 lines — async)
- `src/praisonai/praisonai/scheduler/agent_scheduler.py` (554 lines — sync, canonical)
- `src/praisonai/praisonai/scheduler/shared.py` (shared helpers)

### Why this is a problem

The header of `async_agent_scheduler.py` advertises itself as a thin shim:

```python
# async_agent_scheduler.py:1-15
"""Backward-compatible re-export. Prefer `praisonai.scheduler`.

This module is deprecated. Use the canonical implementation in the
scheduler package for full functionality including async support.
"""
import warnings
warnings.warn(
    "praisonai.async_agent_scheduler is pending deprecation; it will be moved to "
    "praisonai.scheduler.async_agent_scheduler in a future release. ...",
    PendingDeprecationWarning, stacklevel=2,
)

# TODO: Once AsyncAgentScheduler is moved to scheduler package, import from there
# For now, re-export the existing implementation to avoid breaking changes
from .scheduler.shared import ScheduleParser, backoff_delay, safe_call
```

…but everything after that `TODO` is a **full, independent implementation** of `AsyncAgentExecutorInterface`, `AsyncPraisonAgentExecutor`, and `AsyncAgentScheduler` (lines 30–425). Meanwhile the sync `AgentScheduler` lives in `scheduler/agent_scheduler.py`. The two share `ScheduleParser` / `backoff_delay`, and nothing else.

That has already produced silent feature drift. The async scheduler is missing features the sync one ships today:

| Feature (sync, `scheduler/agent_scheduler.py`) | Async (`async_agent_scheduler.py`) |
|---|---|
| `timeout` per-execution kill switch (lines 44, 244–259) | ❌ absent |
| `max_cost` budget guard (lines 45, 216–220) | ❌ absent |
| Cost tracking (`self._total_cost`, lines 269–272) | ❌ absent |
| `_update_state_if_daemon()` (lines 175–210) — daemon state file updates | ❌ absent |
| `from_yaml(...)` constructor (lines 337–419) | ❌ absent |
| `from_recipe(...)` constructor (lines 441–535) | ❌ absent |
| `start_from_yaml_config()` (lines 421–438) | ❌ absent |

So `praisonai.AgentScheduler` and `praisonai.AsyncAgentScheduler` look symmetric in the public surface, but the async path silently drops budget protection, recipe support, YAML support, daemon state, and timeouts. The retry/execute loops are also separately written:

```python
# sync — scheduler/agent_scheduler.py:233-309
def _execute_with_retry(self, max_retries: int):
    self._execution_count += 1
    ...
    for attempt in range(max_retries):
        try:
            if self.timeout:
                signal.alarm(self.timeout)
                result = self._executor.execute(self.task)
                ...
            self._total_cost += estimated_cost
            self._success_count += 1
            ...
        except Exception as e:
            ...
            wait_time = backoff_delay(attempt, initial=30.0, cap=300.0)
            if self._stop_event.wait(wait_time):
                return
```

```python
# async — async_agent_scheduler.py:353-390
async def _execute_with_retry(self, max_retries: int):
    self._ensure_async_primitives()
    async with self._stats_lock:
        self._execution_count += 1
    ...
    for attempt in range(max_retries):
        try:
            result = await self._executor.execute(self.task)
            async with self._stats_lock:
                self._success_count += 1
            safe_call(self.on_success, result)
            return
        except Exception as e:
            last_exc = e
            if attempt < max_retries - 1:
                wait_time = backoff_delay(attempt)
                await asyncio.sleep(wait_time)
```

Two implementations of the same retry/stats lifecycle, differing only in `await` vs blocking calls. This is exactly what the philosophy calls out: *"Multi-agent + async safe by default"* and *"DRY: reuse abstractions"* — the wrapper should have **one** scheduler that exposes both sync and async entry points over a shared core.

### Fix

1. Move the file to its canonical home and complete the in-flight migration the TODO calls out:
   - Create `src/praisonai/praisonai/scheduler/async_agent_scheduler.py` containing the class.
   - Reduce the root `async_agent_scheduler.py` to a thin re-export, matching the existing pattern in `agent_scheduler.py:1-18`:
     ```python
     # src/praisonai/praisonai/async_agent_scheduler.py  (after fix)
     """Backward-compatible re-export. Prefer `praisonai.scheduler`."""
     import warnings
     warnings.warn(
         "praisonai.async_agent_scheduler is deprecated; "
         "use 'from praisonai.scheduler import AsyncAgentScheduler' instead.",
         DeprecationWarning, stacklevel=2,
     )
     from praisonai.scheduler.async_agent_scheduler import (  # noqa: F401
         AsyncAgentScheduler, AsyncPraisonAgentExecutor, create_async_agent_scheduler,
     )
     ```

2. Factor the execute/retry/stats lifecycle into a shared mixin in `scheduler/shared.py` so the two classes diverge only on `await` boundaries. Sketch:
   ```python
   # scheduler/shared.py  (new)
   class _ScheduleCore:
       """Shared lifecycle: stats, budget, timeout, retry policy."""
       def _begin_attempt(self): self._execution_count += 1
       def _on_success(self, result):
           self._success_count += 1
           self._total_cost += self._estimate_cost(result)
           safe_call(self.on_success, result)
       def _on_failure(self, exc):
           self._failure_count += 1
           safe_call(self.on_failure, exc)
       def _budget_exhausted(self) -> bool:
           return bool(self.max_cost and self._total_cost >= self.max_cost)
   ```
   Then `AgentScheduler` provides sync `_execute_with_retry` and `AsyncAgentScheduler` provides async `_execute_with_retry`, both calling into `_ScheduleCore`. Port `timeout`, `max_cost`, `from_yaml`, `from_recipe`, and `_update_state_if_daemon` into the shared layer so the async path picks them up for free.

---

## Gap 3 — Process-level mutable singletons in `interactive_tools.py` and `tool_resolver.py` break multi-agent and multi-project use

The philosophy is explicit: *"no global singletons, no heavy module-level work"* and *"multi-agent safe by default"*. Two concrete violations in the wrapper today.

### 3a. `_shared_runtime` is global, unlocked, and shared across all concurrent agents

`src/praisonai/praisonai/cli/features/interactive_tools.py:196-253`

```python
# ── Shared runtime singleton ────────────────────────────────────────────────
# ACP and LSP tools share ONE InteractiveRuntime to avoid duplicate LSP servers.
_shared_runtime = None
_shared_agent_tools = None


def _get_shared_runtime(config: ToolConfig):
    """Get or create a shared InteractiveRuntime instance."""
    global _shared_runtime, _shared_agent_tools

    if _shared_runtime is not None and _shared_agent_tools is not None:
        return _shared_runtime, _shared_agent_tools

    from .interactive_runtime import InteractiveRuntime, RuntimeConfig
    from .agent_tools import create_agent_centric_tools

    runtime_config = RuntimeConfig(
        workspace=config.workspace,
        lsp_enabled=config.lsp_enabled,
        acp_enabled=config.acp_enabled,
        approval_mode=config.approval_mode,
    )

    _shared_runtime = InteractiveRuntime(runtime_config)
    _shared_agent_tools = create_agent_centric_tools(_shared_runtime)
    ...
```

Two real problems:

1. **No lock.** Two agents instantiating tools concurrently both pass the `is not None` check, both call `InteractiveRuntime(...)`, the second overwrites the first — leaking an LSP server process and an ACP session that will never be stopped because `cleanup_runtime` only sees the latest reference.

2. **Config is captured on first call and silently ignored after.** Agent A calls with `workspace="/foo"`, `approval_mode="auto"`. Agent B later calls with `workspace="/bar"`, `approval_mode="manual"` — B gets back A's runtime, bound to `/foo` and `auto`, with no warning. The `config` argument becomes a lie for everyone after the first caller.

### 3b. `_default_resolver` is anchored to the CWD at first call and never invalidated

`src/praisonai/praisonai/tool_resolver.py:458-477`

```python
# Process-level lazy singleton for performance (matches profiler.py pattern)
_default_resolver: Optional[ToolResolver] = None
_default_resolver_lock = threading.Lock()

def _get_default_resolver() -> ToolResolver:
    """Process-default ToolResolver (double-checked lazy init).

    Returns cached ToolResolver that is anchored to the working directory
    at first call. Local tools.py resolution is CWD-dependent and cached
    for the lifetime of the process.

    For test isolation or multi-project CLIs, create explicit resolver
    instances instead of using this cached default.
    """
    global _default_resolver
    if _default_resolver is None:
        with _default_resolver_lock:
            if _default_resolver is None:
                _default_resolver = ToolResolver()
    return _default_resolver
```

`ToolResolver.__init__` defaults `tools_py_path` to the literal string `"tools.py"` (resolver line 62) and caches whatever it loaded on first access (lines 78–114). In any long-lived process — daemon, IDE plugin, MCP server, Jupyter session, the SDK as a library — the first `resolve_tool("foo")` call locks in *that* CWD's `tools.py`. Switch projects, change directory, run a second `praisonai` command in the same Python process, and you still get the original project's tools. The docstring acknowledges the hazard but recommends "create explicit resolver instances" — which nothing else in the wrapper does, so all internal call sites silently inherit the bug.

### Fix

For both: replace process-level globals with **`contextvars.ContextVar`** so each agent / task / request gets its own value, and the singleton becomes the explicit fallback only at the CLI edge.

`interactive_tools.py`:

```python
# Replace lines 196-253 with:
import contextvars
import threading

_runtime_var: contextvars.ContextVar = contextvars.ContextVar("interactive_runtime", default=None)
_runtime_lock = threading.Lock()  # only guards the first creation per context

def _get_shared_runtime(config: ToolConfig):
    bundle = _runtime_var.get()
    if bundle is not None:
        return bundle
    with _runtime_lock:
        bundle = _runtime_var.get()
        if bundle is None:
            from .interactive_runtime import InteractiveRuntime, RuntimeConfig
            from .agent_tools import create_agent_centric_tools
            runtime = InteractiveRuntime(RuntimeConfig(
                workspace=config.workspace,
                lsp_enabled=config.lsp_enabled,
                acp_enabled=config.acp_enabled,
                approval_mode=config.approval_mode,
            ))
            bundle = (runtime, create_agent_centric_tools(runtime))
            _runtime_var.set(bundle)
    return bundle

def cleanup_runtime():
    bundle = _runtime_var.get()
    if bundle is None:
        return
    runtime, _ = bundle
    # ... existing shutdown logic, then:
    _runtime_var.set(None)
```

Concurrent agents now each get their own runtime when running under their own `contextvars.copy_context()`; the `config` argument is honoured on every first-in-context call instead of being silently dropped.

`tool_resolver.py`:

```python
# Replace lines 458-477 with:
import contextvars

_resolver_var: contextvars.ContextVar[Optional[ToolResolver]] = contextvars.ContextVar(
    "tool_resolver", default=None,
)

def _get_default_resolver() -> ToolResolver:
    """Per-context resolver. Falls back to a fresh ToolResolver per context,
    so changing CWD between agents / requests is honoured."""
    resolver = _resolver_var.get()
    if resolver is None:
        resolver = ToolResolver()
        _resolver_var.set(resolver)
    return resolver

def reset_default_resolver() -> None:
    """Explicit invalidation hook for daemons / IDE plugins switching projects."""
    _resolver_var.set(None)
```

This keeps the fast path (cache once per context) but stops binding the entire process to whatever CWD the first caller happened to be in. Callers that already pass an explicit `resolver=` argument (the `resolve_tool` / `resolve_tools` / `has_tool` / `validate_yaml_tools` family at lines 481–547) are unaffected.

---

## Why these three

- **Gap 1** is *dead* — fixing it is pure removal, zero risk, and immediately stops new code from being added to the wrong copy.
- **Gap 2** is *drift-in-progress* — the sync scheduler is gaining features (`max_cost`, `from_recipe`, daemon state) the async one quietly lacks, so the longer it sits, the more the two diverge.
- **Gap 3** is *latent-but-load-bearing* — the singletons work fine for one-shot CLI runs (which is most of the test surface) but quietly break the exact deployment shape the project is moving toward: long-lived agents, multi-project daemons, MCP/UI servers running many concurrent sessions.

All three are scoped to `src/praisonai/praisonai`. Happy to send PRs for any/all of these if useful.

Feature (sync, `scheduler/agent_scheduler.py`)	Async (`async_agent_scheduler.py`)
`timeout` per-execution kill switch (lines 44, 244–259)	❌ absent
`max_cost` budget guard (lines 45, 216–220)	❌ absent
Cost tracking (`self._total_cost`, lines 269–272)	❌ absent
`_update_state_if_daemon()` (lines 175–210) — daemon state file updates	❌ absent
`from_yaml(...)` constructor (lines 337–419)	❌ absent
`from_recipe(...)` constructor (lines 441–535)	❌ absent
`start_from_yaml_config()` (lines 421–438)	❌ absent

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Wrapper gaps: dead duplicate framework runners, divergent async scheduler, unsafe shared singletons in `src/praisonai/praisonai` #1722

Scope

Gap 1 — `agents_generator.py` carries ~470 lines of dead, duplicate, broken framework-runner methods

Where

Why this is a problem

Fix

Gap 2 — `async_agent_scheduler.py` claims to be a "backward-compatible re-export" but is a full duplicate implementation with silent feature drift

Where

Why this is a problem

Fix

Gap 3 — Process-level mutable singletons in `interactive_tools.py` and `tool_resolver.py` break multi-agent and multi-project use

3a. `_shared_runtime` is global, unlocked, and shared across all concurrent agents

3b. `_default_resolver` is anchored to the CWD at first call and never invalidated

Fix

Why these three

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Method	Lines
`_run_autogen`	673–751
`_run_autogen_v4`	753–859
`_run_ag2`	861–1005
`_run_crewai`	1006–1143
`_run_praisonai`	1144–EOF

Uh oh!

Wrapper gaps: dead duplicate framework runners, divergent async scheduler, unsafe shared singletons in src/praisonai/praisonai #1722

Description

Scope

Gap 1 — agents_generator.py carries ~470 lines of dead, duplicate, broken framework-runner methods

Where

Why this is a problem

Fix

Gap 2 — async_agent_scheduler.py claims to be a "backward-compatible re-export" but is a full duplicate implementation with silent feature drift

Where

Why this is a problem

Fix

Gap 3 — Process-level mutable singletons in interactive_tools.py and tool_resolver.py break multi-agent and multi-project use

3a. _shared_runtime is global, unlocked, and shared across all concurrent agents

3b. _default_resolver is anchored to the CWD at first call and never invalidated

Fix

Why these three

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

Wrapper gaps: dead duplicate framework runners, divergent async scheduler, unsafe shared singletons in `src/praisonai/praisonai` #1722

Gap 1 — `agents_generator.py` carries ~470 lines of dead, duplicate, broken framework-runner methods

Gap 2 — `async_agent_scheduler.py` claims to be a "backward-compatible re-export" but is a full duplicate implementation with silent feature drift

Gap 3 — Process-level mutable singletons in `interactive_tools.py` and `tool_resolver.py` break multi-agent and multi-project use

3a. `_shared_runtime` is global, unlocked, and shared across all concurrent agents

3b. `_default_resolver` is anchored to the CWD at first call and never invalidated