[SPRINT-02-03] Add inventory indexing audit report #90

@Gonza10V

Parent sprint: #87
Depends on: #88, #89
Recommended order: 3
Codex-ready: yes

Goal

Add an explicit inventory indexing audit report so collection indexing failures become visible and debuggable.

Background

The current indexing flow mutates indexed_plasmids, indexed_backbones, restriction_enzyme_implementations, and ligase_implementations, but it does not produce a structured summary of what was found, skipped, missing, ambiguous, or pulled. When assembly fails, it is unclear whether the root cause is missing local SBOL, missing SynBioHub references, unsupported roles, missing implementations, missing fusion sites, missing antibiotics, or selection failure.

Scope

Add a lightweight report object or dictionary. Preferred object name:

from dataclasses import dataclass, field

@dataclass
class InventoryAudit:
    source: str
    component_definitions_seen: int = 0
    module_definitions_seen: int = 0
    implementations_seen: int = 0
    engineered_plasmids_indexed: int = 0
    backbones_indexed: int = 0
    restriction_enzymes_indexed: int = 0
    ligases_indexed: int = 0
    plasmids_missing_implementation: list[str] = field(default_factory=list)
    plasmids_missing_fusion_sites: list[str] = field(default_factory=list)
    plasmids_missing_antibiotic: list[str] = field(default_factory=list)
    backbones_missing_implementation: list[str] = field(default_factory=list)
    unresolved_built_references: list[str] = field(default_factory=list)
    unsupported_roles: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)

A dict is acceptable if that fits the current code better.

Requirements

  • The audit should be returned from index_sbol_document(...) or stored as self.last_inventory_audit.
  • The audit should be serializable to JSON or have to_dict().
  • Indexing should continue where safe, while recording warnings.
  • Truly fatal indexing problems may still raise, but the error should include enough context to identify the source and the offending object.
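
The requirements above could be sketched roughly as follows. This is a minimal illustration, not the actual indexing code: the `index_plasmids` function, the dict-shaped `records`, and the IDs `pA`/`pB` are hypothetical stand-ins for the real SBOL objects, and the dataclass is trimmed to a few of the proposed fields.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class InventoryAudit:
    # Trimmed to a few fields from the proposed schema for brevity.
    source: str
    implementations_seen: int = 0
    engineered_plasmids_indexed: int = 0
    plasmids_missing_implementation: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        # asdict() recursively converts the dataclass into plain dicts and lists,
        # which json.dumps can serialize directly.
        return asdict(self)


def index_plasmids(records: list[dict], source: str) -> InventoryAudit:
    """Hypothetical stand-in for the indexing loop (not the real SBOL API)."""
    audit = InventoryAudit(source=source)
    for record in records:
        if record.get("implementation") is None:
            # Recoverable problem: record it and keep indexing.
            audit.plasmids_missing_implementation.append(record["display_id"])
            audit.warnings.append(f"{record['display_id']}: no Implementation found")
            continue
        audit.implementations_seen += 1
        audit.engineered_plasmids_indexed += 1
    return audit


audit = index_plasmids(
    [
        {"display_id": "pA", "implementation": "pA_impl"},
        {"display_id": "pB", "implementation": None},
    ],
    source="local",
)
payload = json.dumps(audit.to_dict(), sort_keys=True)
```

Storing display IDs rather than SBOL objects keeps `to_dict()` trivially JSON-safe. The same object could equally be assigned to self.last_inventory_audit instead of being returned.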

Non-goals

  • Do not introduce a full new database/index structure yet.
  • Do not rewrite route selection.
  • Do not make audit reporting depend on Notion, GitHub, PUDU, or Opentrons.

Acceptance criteria

  • Audit counts include seen ComponentDefinitions, ModuleDefinitions, and Implementations.
  • Audit counts include indexed plasmids, backbones, restriction enzymes, and ligases.
  • Missing implementation/fusion-site/antibiotic cases are recorded explicitly.
  • Unresolved Implementation.built references are recorded explicitly.
  • Audit is available after local indexing and SynBioHub indexing.
  • Unit or integration tests assert expected audit fields for a tiny fixture.
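
A test for the tiny-fixture criterion might look something like the sketch below. It uses the dict form of the audit and assumes a hypothetical helper, `audit_built_references`, plus made-up identities; the real test would call index_sbol_document(...) on an actual SBOL fixture instead.

```python
def audit_built_references(implementations: list[dict], known_identities: set[str]) -> dict:
    """Hypothetical helper: count Implementations and record unresolved `built` links."""
    audit = {"implementations_seen": 0, "unresolved_built_references": []}
    for impl in implementations:
        audit["implementations_seen"] += 1
        if impl["built"] not in known_identities:
            # Store the identity string, not the object, so the audit stays serializable.
            audit["unresolved_built_references"].append(impl["identity"])
    return audit


def test_unresolved_built_reference_is_recorded():
    # Tiny fixture: two Implementations, only one of which points at a known design.
    implementations = [
        {"identity": "impl_ok", "built": "plasmid_1"},
        {"identity": "impl_dangling", "built": "plasmid_missing"},
    ]
    audit = audit_built_references(implementations, known_identities={"plasmid_1"})

    assert audit["implementations_seen"] == 2
    assert audit["unresolved_built_references"] == ["impl_dangling"]
```

Because the test name contains neither "audit", "inventory", nor "index" only by accident of naming, a name such as this one (or a file named test_inventory_audit.py) keeps it selected by the `-k` filter in the Verification step.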

Verification

Run:

pytest -k "audit or inventory or index"
ruff check .

Codex implementation notes

  • Keep the audit schema small and stable.
  • Prefer identities/display IDs in audit entries, not raw SBOL objects.
  • Do not turn expected missing metadata into hard crashes unless downstream code cannot proceed safely.
  • Use the audit output in failing assertions so future debugging is easier.
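
One possible way to use the audit output in failing assertions is a small helper that dumps the whole audit into the assertion message. The helper names `audit_summary` and `assert_audit_count` and the sample values are assumptions for illustration only.

```python
import json


def audit_summary(audit: dict) -> str:
    # Stable, human-readable dump of the audit for use in failure messages.
    return json.dumps(audit, indent=2, sort_keys=True)


def assert_audit_count(audit: dict, key: str, expected: int) -> None:
    actual = audit.get(key)
    # On failure, the message carries the full audit, not just the mismatched count.
    assert actual == expected, (
        f"{key}: expected {expected}, got {actual}\n"
        f"full audit:\n{audit_summary(audit)}"
    )


# Hypothetical audit for a fixture whose only backbone lacks an Implementation.
sample_audit = {
    "source": "local",
    "backbones_indexed": 0,
    "backbones_missing_implementation": ["bb1"],
}
assert_audit_count(sample_audit, "backbones_indexed", 0)
```

If the count were wrong, the failure message would also show backbones_missing_implementation, which points at the likely cause without rerunning the indexer under a debugger.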
