Proposition provenance metadata #7

@jimador

Observation

When a proposition is extracted, Proposition captures the content (text, mentions, confidence) and source chunks (grounding), but not the conversational context of extraction:

  • Which turn was this extracted from? (Turn 1 vs. turn 50 matters for temporal reasoning)
  • Who said it? (System instruction, operator, user, AI response — different trust implications)
  • How was it extracted? (Initial context loading, mid-conversation incremental, revision, manual entry)

Without provenance, downstream systems can't make informed decisions about trust or authority. The metadata field could store this, but there's no standard schema — every consumer invents their own.

What DICE already has

  • grounding: List<String> — chunk IDs, but not conversational metadata
  • metadata: Map<String, Any> — could carry provenance, but no standard keys
  • ExtractionPerspective (USER, AGENT, ALL) controls whose knowledge is extracted during the extraction step, but isn't stored on the resulting proposition
  • SourceAnalysisContext — passed to PropositionPipeline.processChunk(), carries schema, entityResolver, contextId, knownEntities, relations, promptVariables — but no provenance fields
  • ConversationSource — wraps a Conversation for incremental extraction, has message indices — but provenance isn't propagated to extracted propositions

The question

Should DICE standardize provenance metadata on propositions?

Some possibilities:

  1. Standard metadata keys — define constants like dice.provenance.speakerRole, dice.provenance.extractionTurn, dice.provenance.extractionMode. Uses existing metadata map, no schema changes. Extraction pipeline populates them when information is available.

  2. PropositionProvenance data class — a structured provenance record stored in metadata or as a first-class field:

    data class PropositionProvenance(
        val extractionTurn: Int?,
        val speakerRole: SpeakerRole?,     // SYSTEM, OPERATOR, USER, ASSISTANT
        val extractionMode: ExtractionMode  // INITIAL, INCREMENTAL, REVISION, MANUAL
    )

  3. Extend SourceAnalysisContext — add provenance fields so they flow through the extraction pipeline automatically. IncrementalPropositionExtraction could derive provenance from ConversationSource (turn = message index, speaker = message role).
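
To make options 1 and 3 concrete, here is a minimal sketch of how the proposed keys could round-trip through a plain metadata map, and how provenance might be derived from a message index. Everything in it is hypothetical: ProvenanceKeys, Message, toMetadata, provenanceFromMetadata and deriveProvenance are illustrative names, not existing DICE API, and Message is only a stand-in for whatever ConversationSource actually exposes.

```kotlin
// Hypothetical sketch; none of these types exist in DICE today.
enum class SpeakerRole { SYSTEM, OPERATOR, USER, ASSISTANT }
enum class ExtractionMode { INITIAL, INCREMENTAL, REVISION, MANUAL }

data class PropositionProvenance(
    val extractionTurn: Int?,
    val speakerRole: SpeakerRole?,
    val extractionMode: ExtractionMode
)

// Option 1: the standard key names proposed in this issue.
object ProvenanceKeys {
    const val SPEAKER_ROLE = "dice.provenance.speakerRole"
    const val EXTRACTION_TURN = "dice.provenance.extractionTurn"
    const val EXTRACTION_MODE = "dice.provenance.extractionMode"
}

// Write provenance into a metadata map; unknown (null) fields are omitted.
fun PropositionProvenance.toMetadata(): Map<String, Any> = buildMap<String, Any> {
    extractionTurn?.let { put(ProvenanceKeys.EXTRACTION_TURN, it) }
    speakerRole?.let { put(ProvenanceKeys.SPEAKER_ROLE, it.name) }
    put(ProvenanceKeys.EXTRACTION_MODE, extractionMode.name)
}

// Read provenance back; returns null when no valid extraction mode is present.
fun provenanceFromMetadata(metadata: Map<String, Any>): PropositionProvenance? {
    val mode = (metadata[ProvenanceKeys.EXTRACTION_MODE] as? String)
        ?.let { name -> ExtractionMode.values().firstOrNull { it.name == name } }
        ?: return null
    val role = (metadata[ProvenanceKeys.SPEAKER_ROLE] as? String)
        ?.let { name -> SpeakerRole.values().firstOrNull { it.name == name } }
    val turn = (metadata[ProvenanceKeys.EXTRACTION_TURN] as? Number)?.toInt()
    return PropositionProvenance(turn, role, mode)
}

// Option 3: stand-in for a conversation message (assumed shape).
data class Message(val role: SpeakerRole, val text: String)

// Derive provenance from the message index: turn = index, speaker = message role.
fun deriveProvenance(
    messages: List<Message>,
    index: Int,
    mode: ExtractionMode
): PropositionProvenance =
    PropositionProvenance(index, messages.getOrNull(index)?.role, mode)
```

Storing enum names as strings keeps the map serializable without custom adapters, at the cost of a lookup on read. A first-class field (option 2) would avoid that cost but requires a schema change.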

Where provenance matters

| Provenance Signal | Impact |
|-------------------|--------|
| Speaker role      | Trust: SYSTEM/OPERATOR > USER > ASSISTANT |
| Extraction turn   | Staleness: earlier turns more established but potentially more stale |
| Extraction mode   | Confidence: MANUAL > INITIAL > INCREMENTAL |
| Source context    | Cross-context reasoning: was this imported from another conversation? |
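
As an illustration of how a downstream consumer might act on the speaker-role and extraction-mode orderings, here is a rough sketch of a comparator that ranks conflicting propositions. The names Candidate, trustRank, modeRank and byAuthority are hypothetical. The numeric ranks are arbitrary. Two placements are assumptions not stated in this issue: an unknown speaker ranks lowest, and REVISION ranks alongside INCREMENTAL.

```kotlin
// Hypothetical sketch; not existing DICE API.
enum class SpeakerRole { SYSTEM, OPERATOR, USER, ASSISTANT }
enum class ExtractionMode { INITIAL, INCREMENTAL, REVISION, MANUAL }

// Trust: SYSTEM/OPERATOR > USER > ASSISTANT. Higher means more trusted.
fun trustRank(role: SpeakerRole?): Int = when (role) {
    SpeakerRole.SYSTEM, SpeakerRole.OPERATOR -> 3
    SpeakerRole.USER -> 2
    SpeakerRole.ASSISTANT -> 1
    null -> 0 // assumption: unknown speaker ranks lowest
}

// Confidence: MANUAL > INITIAL > INCREMENTAL.
fun modeRank(mode: ExtractionMode): Int = when (mode) {
    ExtractionMode.MANUAL -> 3
    ExtractionMode.INITIAL -> 2
    ExtractionMode.INCREMENTAL -> 1
    ExtractionMode.REVISION -> 1 // assumption: the issue does not rank REVISION
}

// Stand-in for a proposition carrying provenance.
data class Candidate(
    val text: String,
    val role: SpeakerRole?,
    val mode: ExtractionMode
)

// Most authoritative first: speaker trust, then extraction-mode confidence.
val byAuthority: Comparator<Candidate> =
    compareByDescending<Candidate> { trustRank(it.role) }
        .thenByDescending { modeRank(it.mode) }
```

Calling `candidates.sortedWith(byAuthority)` would then put operator- or system-sourced propositions ahead of user- and assistant-sourced ones. Extraction turn is deliberately left out of the comparator, because the staleness trade-off above points in both directions.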
