Merge/refactor with tests #161

Open
howethomas wants to merge 15 commits into main from merge/refactor-with-tests

Conversation

@howethomas
Contributor

Summary

This PR integrates the refactor work with the saved test-improvement branch and brings the combined branch back to a fully passing state.

It pulls together:

  • the refactor/speckit-compliance-and-redis-abstraction changes
  • the tests/improve-coverage-from-main coverage work
  • a small follow-up fix so merged-branch tests match the refactor behavior

What changed

  • merged latest main into the refactor-based integration branch
  • merged tests/improve-coverage-from-main into that branch
  • kept the original source branches intact and pushable
  • resolved the post-merge test mismatches in:
    • conserver/links/expire_vcon/test_expire_vcon.py
    • conserver/links/openai_transcribe/tests/test_openai_transcribe_unit.py
    • tests/core/test_vcon_redis.py

Coverage and test impact

The merged branch includes the deterministic coverage work that raised suite coverage to 81% on the test branch, including new or expanded tests around:

  • API retrieval, ingress, search, DLQ, and indexing paths
  • storage helpers and Milvus branch behavior
  • OpenAI transcription helpers and control flow
  • WTF transcription helper and run-path behavior
  • Deepgram helper and unit coverage
  • SCITT helper and receipt-verification coverage
  • DataTrails link behavior and edge cases

Validation

Ran the full suite on the merge branch:

  • uv run pytest --disable-warnings
  • Result: 552 passed, 2 skipped, 1 warning

howethomas and others added 15 commits May 10, 2026 16:30
Add common/lib/queue.py with a single chokepoint for the Redis queue
patterns the conserver actually uses (LPUSH/BLPOP/RPUSH/LLEN, DLQ
naming, vCon-key TTL). Migrate every caller off raw redis_mgr.redis
access for queue ops: main.py worker loop, follower.py, the
expire_vcon link, and the tag_router link.

Redis stays the implementation; the class wraps a redis client and
defaults to the shared one. context_utils still receives the raw
client because it needs JSON ops that live outside the queue surface.

The tag_router test fixture is reworked to patch VconQueue instead of
redis_mgr.redis; the legacy mock_redis.rpush.assert_* assertions keep
working by aliasing route_to to the same child mock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
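The queue chokepoint described above might look roughly like the following sketch. The class and method names (`VconQueue`, `route_to`, `dlq_name`) and the DLQ prefix are illustrative assumptions, not the actual `common/lib/queue.py` API; a tiny in-memory stand-in replaces the real redis client so the sketch runs on its own.

```python
class VconQueue:
    """Hypothetical single chokepoint for the queue ops named above
    (LPUSH/BLPOP/RPUSH/LLEN, DLQ naming). Wraps a redis-like client."""

    def __init__(self, client):
        self.client = client  # normally the shared redis client

    @staticmethod
    def dlq_name(queue: str) -> str:
        # One canonical DLQ naming rule instead of ad-hoc strings per caller
        return f"DLQ:{queue}"

    def push(self, queue: str, vcon_id: str) -> None:
        self.client.lpush(queue, vcon_id)

    def route_to(self, queue: str, vcon_id: str) -> None:
        self.client.rpush(queue, vcon_id)

    def pop_blocking(self, queue: str, timeout: int = 0):
        item = self.client.blpop([queue], timeout=timeout)
        return item[1] if item else None

    def length(self, queue: str) -> int:
        return self.client.llen(queue)


class _FakeRedis:
    """Minimal in-memory stand-in so the sketch runs without a server."""

    def __init__(self):
        self.lists = {}

    def lpush(self, key, *vals):
        self.lists.setdefault(key, [])[:0] = list(reversed(vals))

    def rpush(self, key, *vals):
        self.lists.setdefault(key, []).extend(vals)

    def blpop(self, keys, timeout=0):
        for key in keys:
            if self.lists.get(key):
                return (key, self.lists[key].pop(0))
        return None

    def llen(self, key):
        return len(self.lists.get(key, []))


q = VconQueue(_FakeRedis())
q.push("ingress", "vcon-1")
q.route_to("ingress", "vcon-2")
```

Because `route_to` is just another method on the wrapper, a test fixture can patch `VconQueue` once instead of patching `redis_mgr.redis` in every caller, which is the reworking the tag_router fixture performs.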
The vcon-server code imports the upstream ``vcon`` package across links,
storage backends, and tests, but it was not declared in pyproject.toml.
Pin it explicitly as a core dependency.

0.9.2 fixes three vcon-speckit non-negotiables for the conserver:
- build_new() now emits "vcon": "0.4.0" per draft-ietf-vcon-vcon-core-02
- build_new() no longer initializes empty group/redacted defaults
- new add_wtf_transcription_analysis() helper places WTF transcripts in
  analysis[] (the speckit-mandated location)

These remove the need for per-link post-processing of the vCon dict on
every write path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add lib/vcon_compat.normalize_legacy_fields and call it from
VconRedis.get_vcon and get_vcon_dict so links always see spec-compliant
data even when older writers left legacy names in storage.

Mappings (per draft-ietf-vcon-vcon-core-02 and vcon-speckit):
  appended       -> amended           (top-level)
  must_support   -> critical          (top-level and per-entry)
  schema_version -> schema            (analysis, dialog)
  mimetype       -> mediatype         (dialog, attachments)
  type           -> purpose           (attachments, only when purpose unset)

The attachment ``type`` -> ``purpose`` migration is conservative:
lawful_basis attachments per draft-howe-vcon-lawful-basis legitimately
carry ``type: "lawful_basis"`` as the purpose value, so we never
overwrite an existing ``purpose`` field.

10 tests cover each rename plus the lawful_basis exception and the
empty-vcon no-op.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
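The rename table above translates fairly directly into code. This is a sketch of what `normalize_legacy_fields` plausibly does, not the actual `lib/vcon_compat` implementation; the function name comes from the commit message, everything else is an assumption.

```python
import copy

_TOP_LEVEL_RENAMES = {"appended": "amended", "must_support": "critical"}

def normalize_legacy_fields(vcon_dict: dict) -> dict:
    """Sketch of the read-side normalizer: map legacy field names to the
    draft-ietf-vcon-vcon-core-02 names without clobbering existing values."""
    v = copy.deepcopy(vcon_dict)

    # appended -> amended, must_support -> critical (top-level)
    for old, new in _TOP_LEVEL_RENAMES.items():
        if old in v and new not in v:
            v[new] = v.pop(old)

    # must_support -> critical (per-entry)
    for section in ("dialog", "analysis", "attachments"):
        for entry in v.get(section, []):
            if "must_support" in entry and "critical" not in entry:
                entry["critical"] = entry.pop("must_support")

    # schema_version -> schema (analysis, dialog)
    for entry in v.get("analysis", []) + v.get("dialog", []):
        if "schema_version" in entry and "schema" not in entry:
            entry["schema"] = entry.pop("schema_version")

    # mimetype -> mediatype (dialog, attachments)
    for entry in v.get("dialog", []) + v.get("attachments", []):
        if "mimetype" in entry and "mediatype" not in entry:
            entry["mediatype"] = entry.pop("mimetype")

    # type -> purpose, only when purpose is unset (lawful_basis exception)
    for att in v.get("attachments", []):
        if "type" in att and "purpose" not in att:
            att["purpose"] = att.pop("type")

    return v
```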
Add VconRedis._enforce_spec_on_write and call it from every
store_vcon / store_vcon_dict / *_async path. Defensive cleanup
even though vcon-lib 0.9.2 produces clean output from build_new():
a Vcon(legacy_dict) round-trip can still surface stale defaults.

Enforced on write:
- top-level "vcon": "0.4.0" (draft-ietf-vcon-vcon-core-02 §4.1.1)
- drop empty group: [] (speckit: reserved, must not be emitted empty)
- drop empty redacted: {} (speckit: omit when not redacted)

Non-empty group/redacted are preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
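The three enforced rules amount to a short, idempotent cleanup. A minimal sketch, assuming a free function rather than the actual `VconRedis._enforce_spec_on_write` method:

```python
def enforce_spec_on_write(vcon_dict: dict) -> dict:
    """Illustrative write-side cleanup per the rules listed above."""
    v = dict(vcon_dict)
    v["vcon"] = "0.4.0"        # pin the syntax version on every write
    if v.get("group") == []:   # reserved: must not be emitted empty
        del v["group"]
    if v.get("redacted") == {}:  # omit when not redacted
        del v["redacted"]
    return v
```

Note the equality checks against `[]` and `{}`: a non-empty `group` or `redacted` falls through both conditions and is preserved unchanged.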
The speckit non-negotiable for analysis/attachment bodies is that the
body is a string, with ``encoding: "json"`` when content is JSON.
vcon-lib's ``add_analysis`` currently passes dict/list bodies through
verbatim with ``encoding: "none"``, which we see in conserver/links/scitt
(receipt list) and any link that hands a dict to add_analysis.

Extend VconRedis._enforce_spec_on_write to detect dict/list bodies in
analysis[] and attachments[], serialize them via json.dumps, and set
encoding to "json". String bodies are untouched.

Adds 4 more tests covering dict body, list body, string passthrough,
and attachment body. Full suite (api+conserver+links): 216 passed,
24 skipped, zero regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
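The body-stringification step can be sketched as follows. The function name is hypothetical (in the real code this logic lives inside `VconRedis._enforce_spec_on_write`); the behavior matches the description: dict/list bodies are serialized with `json.dumps` and marked `encoding: "json"`, string bodies pass through untouched.

```python
import copy
import json

def stringify_json_bodies(vcon_dict: dict) -> dict:
    """Sketch: make analysis/attachment bodies spec-compliant strings."""
    v = copy.deepcopy(vcon_dict)
    for section in ("analysis", "attachments"):
        for entry in v.get(section, []):
            body = entry.get("body")
            if isinstance(body, (dict, list)):
                entry["body"] = json.dumps(body)
                entry["encoding"] = "json"
            # string bodies (and absent bodies) are left untouched
    return v
```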
links.transcribe is now a vendor dispatcher. Set ``options.vendor`` to
one of openai / groq / hugging_face / deepgram / whisper_builtin and
the call is delegated to the corresponding implementation. Omitting
``vendor`` falls back to ``whisper_builtin``, preserving the prior
behavior of this link (Vcon.transcribe via vcon-lib).

The four vendor-specific link modules (openai_transcribe, groq_whisper,
hugging_face_whisper, deepgram_link) are kept on disk and now carry
"DEPRECATED" docstrings. ``conserver/main.py`` gains a
``_resolve_link_module_and_options`` helper that rewrites legacy
``module: links.<vendor>_*`` config entries to ``module: links.transcribe``
with the inferred ``vendor`` injected, logging a one-time deprecation
warning per legacy module. Existing chain configs keep working with no
changes.

wtf_transcribe is intentionally NOT merged into this dispatcher: it
emits a different analysis schema (draft-howe-vcon-wtf-extension) and
has different consumer semantics.

Tests:
- conserver/links/transcribe/test_transcribe_dispatch.py: 7 tests
  cover dispatch to each vendor, unknown-vendor error, default vendor
  fallback, and whisper_builtin store/no-store paths
- common/tests/test_legacy_link_aliases.py: 8 tests cover passthrough,
  each legacy module rewrite, dispatcher-shaped options respected, and
  the None-options edge case

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
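The legacy-module rewrite helper likely reduces to a lookup table plus an option merge. A sketch under assumed names (the mapping keys and the helper signature are guesses at the `_resolve_link_module_and_options` described above, minus the one-time deprecation logging):

```python
from typing import Optional

# Assumed legacy module -> vendor mapping per the commit description
_LEGACY_VENDORS = {
    "links.openai_transcribe": "openai",
    "links.groq_whisper": "groq",
    "links.hugging_face_whisper": "hugging_face",
    "links.deepgram_link": "deepgram",
}

def resolve_link_module_and_options(module: str, options: Optional[dict]):
    """Rewrite legacy vendor-specific link entries to the dispatcher."""
    vendor = _LEGACY_VENDORS.get(module)
    if vendor is None:
        return module, options          # passthrough for non-legacy modules
    options = dict(options or {})       # handles the None-options edge case
    options.setdefault("vendor", vendor)  # dispatcher-shaped options win
    return "links.transcribe", options
```

`setdefault` is what makes already-dispatcher-shaped options "respected": a config that explicitly sets `vendor` is never overridden by the inferred one.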
- diet/__init__.py: drop the module-level "MDO THIS SHOULD PRINT"
  logger.info that was committed by accident.
- post_analysis_to_slack/__init__.py: replace the bare TODO with a
  note explaining why lib.links.filters.is_included can't be swapped
  in without a breaking option-schema change (per-analysis loop vs
  per-vCon filter; mismatched only_if shape).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diet was the last link still importing redis_mgr.redis directly for
its in-place dict manipulation (redis.json().get/set). Switch to
VconRedis.get_vcon_dict / store_vcon_dict so:

- the read-side legacy-field normalizer runs before diet processes
- the write-side spec-enforce step (vcon syntax param, empty
  group/redacted, body stringification) runs on every diet write

Tests: drop the @patch('links.diet.redis') decorator across 11 cases,
target VconRedis instead, and add a small _bridge_vcon_redis helper
that wires get_vcon_dict / store_vcon_dict side-effects back to the
legacy ``mock_json.set/get`` MagicMock so the existing assertions
(args[2] access, set/get call expectations) keep working untouched.

After this commit, no link or storage module (other than the redis
storage backend itself, which is the one allowed exception) imports
redis_mgr.redis directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
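The `_bridge_vcon_redis` test helper might be shaped like this sketch. Everything here is an assumption (the key format `vcon:{id}`, the `"$"` JSONPath argument, and the helper signature); the point is only to show how `side_effect` can mirror `VconRedis` calls onto a legacy `mock_json` MagicMock so old `args[2]` assertions keep passing.

```python
from unittest.mock import MagicMock

def bridge_vcon_redis(mock_vcon_redis: MagicMock,
                      mock_json: MagicMock,
                      store: dict) -> None:
    """Hypothetical test glue: forward VconRedis.get_vcon_dict /
    store_vcon_dict onto a legacy redis-json MagicMock."""

    def _get(vcon_id):
        mock_json.get(f"vcon:{vcon_id}")  # record the legacy-style read
        return store.get(vcon_id)

    def _set(vcon_id, vcon_dict):
        store[vcon_id] = vcon_dict
        # legacy assertions index args[2] to reach the written dict
        mock_json.set(f"vcon:{vcon_id}", "$", vcon_dict)

    mock_vcon_redis.get_vcon_dict.side_effect = _get
    mock_vcon_redis.store_vcon_dict.side_effect = _set
```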
Module-level ``pytest.mark.skipif(not os.environ.get("FOO_KEY"))``
gates (deepgram_link, groq_whisper, etc.) evaluate at collection time,
before conftest.py's autouse ``load_dotenv(".env.test")`` fixture
fires. Result: credentials in .env.test were invisible to those gates
and their tests skipped even when the key was set.

Add ``env_files = .env.test`` to pytest.ini. pytest-dotenv is already
in the dev group; it loads the file pre-collection, so the skipif
gates see the env vars. With DEEPGRAM_KEY set in .env.test the
deepgram suite moves from 10 skipped to 9 passed + 1 failed (a stale
integration-fixture URL, not related to this refactor).

conftest.py's load_dotenv is left in place — redundant but harmless.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
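For reference, the pytest.ini change described above is a one-key addition (the `env_files` option comes from the pytest-dotenv plugin; the surrounding section shown here is a minimal assumed shape, not the repo's full pytest.ini):

```ini
[pytest]
# pytest-dotenv loads this file before test collection, so module-level
# skipif gates see the credentials that conftest.py loads too late
env_files =
    .env.test
```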
.env.test was tracked in git with a fake CONSERVER_API_TOKEN and a
test DEEPGRAM_KEY. Once pytest.ini started loading it pre-collection
via pytest-dotenv, the file became the natural place to drop real
OPENAI_API_KEY / GROQ_API_KEY etc. to unlock integration tests — and
the tracked status made an accidental ``git add`` of real credentials
just one slip away.

Untrack the file (working-tree copy preserved), gitignore it, and
add .env.test.example as the tracked template with all the gate
variables documented and empty.

Any developer who already has credentials in .env.test keeps them
locally untouched; new contributors copy .env.test.example to
.env.test and fill in their own.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test_deepgram_integration_real_api pointed at
``raw.githubusercontent.com/.../server/links/hugging_face_whisper/...wav``
but the repo was reorganized to ``conserver/links/...`` at some
point — the fixture file is still committed, the URL just had the
wrong path. GitHub raw returned 404, Deepgram bubbled it up as
``DeepgramApiError: The remote server hosting the media returned a
client error: 404 Not Found``.

Update the path segment to ``conserver/links/...``. With a real
DEEPGRAM_KEY set, the test now passes end-to-end against the real
Deepgram API in ~5s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add focused unit coverage for API, storage, transcription, and SCITT paths so the suite clears the 80% target while keeping CI coverage reporting and flaky test stabilization in place.

Co-authored-by: Cursor <cursoragent@cursor.com>
Update the post-merge test expectations to match the queue abstraction, ffmpeg import shape, and legacy-field normalization so the integration branch stays green after combining refactor and coverage work.

Co-authored-by: Cursor <cursoragent@cursor.com>
@howethomas howethomas requested a review from pavanputhra May 11, 2026 18:54