Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Dec 3, 2025

Addresses PR review feedback on the embedding reconciliation feature: broken uptime calculation, brittle sparse detection, overly broad exception handling, and incorrect vector type handling.

Changes

Health Endpoint Uptime

  • Use monotonic time instead of wall clock to prevent clock drift/skew issues
  • HealthService now accepts startup_stopwatch (from time.monotonic()) instead of startup_time
  • Removed broken computed field approach; uptime_seconds is now passed explicitly

Sparse Provider Detection

  • Replace string-based class name checking with isinstance(self, SparseEmbeddingProvider)

Exception Handling

  • Narrow from except Exception to specific types: ProviderError, IndexingError, ConnectionError, TimeoutError, OSError
  • Add contextual logging: collection name, file counts, provider info

Vector Type Normalization

  • Handle Mapping types (e.g., Qdrant's NamedVectors) correctly
  • Properly detect existing vectors from dict, Mapping, list, or None representations
# Before: would miss NamedVectors objects
if not isinstance(existing_vectors, dict):
    existing_vectors = {"": existing_vectors} if existing_vectors else {}

# After: handles all Mapping types
from collections.abc import Mapping
if isinstance(existing_vectors, Mapping):
    existing_vector_names = set(existing_vectors.keys())
elif existing_vectors:
    existing_vector_names = {""}  # Single unnamed dense vector
else:
    existing_vector_names = set()

Tests

  • Added edge case tests: list vectors, empty results, missing payload, single-provider configs
  • Marked complex prime_index integration tests appropriately

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

…, vector normalization, and tests

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
@github-actions
Copy link
Contributor

github-actions bot commented Dec 3, 2025

✅ CLA Check Passed

All contributors to this PR are exempt from the CLA requirement:

  • ✅ Copilot (bot)

Knitli organization members and recognized bots do not need to sign the CLA.

Thanks for your contribution to codeweaver! 🧵

Copilot AI and others added 2 commits December 3, 2025 16:31
Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
… comments

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Dec 3, 2025

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • cas-server.xethub.hf.co
    • Triggering command: .venv/bin/pytest .venv/bin/pytest tests/unit/test_indexer_reconciliation.py -v (dns block)
    • Triggering command: .venv/bin/pytest .venv/bin/pytest tests/unit/test_indexer_reconciliation.py -v --no-cov (dns block)
    • Triggering command: .venv/bin/pytest .venv/bin/pytest tests/unit/test_indexer_reconciliation.py -v --no-cov -x (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI changed the title [WIP] Fix automatic embedding reconciliation in indexer fix: health endpoint uptime, sparse provider detection, exception handling, vector normalization Dec 3, 2025
Copilot AI requested a review from bashandbone December 3, 2025 16:40
@bashandbone bashandbone marked this pull request as ready for review December 3, 2025 16:47
Copilot AI review requested due to automatic review settings December 3, 2025 16:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses key issues in the embedding reconciliation feature including uptime calculation, provider detection, exception handling, and vector type handling. The changes improve reliability and maintainability of the reconciliation process.

  • Uses monotonic time for health endpoint uptime to prevent clock drift/skew issues
  • Replaces brittle string-based sparse provider detection with robust isinstance() checks
  • Narrows exception handling from broad Exception to specific types with enhanced logging context
  • Improves vector type normalization to handle Qdrant's Mapping types (e.g., NamedVectors)

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/codeweaver/server/server.py Updates health service initialization to pass startup_stopwatch instead of startup_time
src/codeweaver/server/health/health_service.py Changes constructor to accept startup_stopwatch and uses monotonic time for uptime calculation
src/codeweaver/server/health/models.py Removes computed field approach for uptime, replaces with explicit field passed to constructor
src/codeweaver/providers/embedding/providers/base.py Replaces string-based class name checking with isinstance(self, SparseEmbeddingProvider)
src/codeweaver/engine/indexer/indexer.py Narrows exception handling to specific types (ProviderError, IndexingError, ConnectionError, TimeoutError, OSError) with enhanced contextual logging; improves vector type detection to handle Mapping types correctly
tests/unit/test_indexer_reconciliation.py Adds comprehensive edge case tests (list vectors, empty results, missing payload, single-provider configs) and exception handling tests; marks complex integration tests appropriately

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@bashandbone bashandbone merged commit 029a852 into fix-sparse-retention Dec 3, 2025
14 checks passed
@bashandbone bashandbone deleted the copilot/sub-pr-188 branch December 3, 2025 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants