Observation
When a proposition is extracted, Proposition captures the content (text, mentions, confidence) and source chunks (grounding), but not the conversational context of extraction:
- Which turn was this extracted from? (Turn 1 vs. turn 50 matters for temporal reasoning)
- Who said it? (System instruction, operator, user, AI response — different trust implications)
- How was it extracted? (Initial context loading, mid-conversation incremental, revision, manual entry)
Without provenance, downstream systems can't make informed decisions about trust or authority. The metadata field could store this, but there's no standard schema — every consumer invents their own.
What DICE already has
- grounding: List<String> holds chunk IDs, but no conversational metadata.
- metadata: Map<String, Any> could carry provenance, but has no standard keys.
- ExtractionPerspective (USER, AGENT, ALL) controls whose knowledge is extracted during the extraction step, but isn't stored on the resulting proposition.
- SourceAnalysisContext is passed to PropositionPipeline.processChunk() and carries schema, entityResolver, contextId, knownEntities, relations, and promptVariables, but no provenance fields.
- ConversationSource wraps a Conversation for incremental extraction and has message indices, but provenance isn't propagated to extracted propositions.
The question
Should DICE standardize provenance metadata on propositions?
Some possibilities:
- Standard metadata keys: define constants like dice.provenance.speakerRole, dice.provenance.extractionTurn, dice.provenance.extractionMode. This uses the existing metadata map, so no schema changes are needed. The extraction pipeline populates the keys when the information is available.
- PropositionProvenance data class: a structured provenance record stored in metadata or as a first-class field:

    data class PropositionProvenance(
        val extractionTurn: Int?,
        val speakerRole: SpeakerRole?,     // SYSTEM, OPERATOR, USER, ASSISTANT
        val extractionMode: ExtractionMode // INITIAL, INCREMENTAL, REVISION, MANUAL
    )

- Extend SourceAnalysisContext: add provenance fields so they flow through the extraction pipeline automatically. IncrementalPropositionExtraction could derive provenance from ConversationSource (turn = message index, speaker = message role).
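
To make the first two options concrete, here is a minimal sketch of how they could fit together: the structured record flattens itself into the existing metadata map under the proposed dice.provenance.* keys and can be read back from it. Everything here is hypothetical. ProvenanceKeys, toMetadata, fromMetadata, and the two enums are illustrative names, not part of DICE today, and the choice to omit unknown values instead of storing nulls is an assumption.

```kotlin
// Hypothetical sketch: none of these types exist in DICE today.
enum class SpeakerRole { SYSTEM, OPERATOR, USER, ASSISTANT }
enum class ExtractionMode { INITIAL, INCREMENTAL, REVISION, MANUAL }

// Option 1: standard key names for the existing metadata map.
object ProvenanceKeys {
    const val SPEAKER_ROLE = "dice.provenance.speakerRole"
    const val EXTRACTION_TURN = "dice.provenance.extractionTurn"
    const val EXTRACTION_MODE = "dice.provenance.extractionMode"
}

// Option 2: a structured record that can round-trip through the map.
data class PropositionProvenance(
    val extractionTurn: Int?,
    val speakerRole: SpeakerRole?,
    val extractionMode: ExtractionMode,
) {
    // Unknown values are left out of the map rather than stored as null (assumption).
    fun toMetadata(): Map<String, Any> = buildMap {
        extractionTurn?.let { put(ProvenanceKeys.EXTRACTION_TURN, it) }
        speakerRole?.let { put(ProvenanceKeys.SPEAKER_ROLE, it.name) }
        put(ProvenanceKeys.EXTRACTION_MODE, extractionMode.name)
    }

    companion object {
        // Returns null when the map carries no recognizable extraction mode.
        fun fromMetadata(metadata: Map<String, Any>): PropositionProvenance? {
            val mode = (metadata[ProvenanceKeys.EXTRACTION_MODE] as? String)
                ?.let { name -> ExtractionMode.values().firstOrNull { it.name == name } }
                ?: return null
            val role = (metadata[ProvenanceKeys.SPEAKER_ROLE] as? String)
                ?.let { name -> SpeakerRole.values().firstOrNull { it.name == name } }
            val turn = (metadata[ProvenanceKeys.EXTRACTION_TURN] as? Number)?.toInt()
            return PropositionProvenance(turn, role, mode)
        }
    }
}
```

With something like this, consumers that only know the metadata map and consumers that want a typed record would read the same data, which is the main argument for standardizing the keys even if the data class never becomes a first-class field.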
Where provenance matters
| Provenance Signal | Impact |
| --- | --- |
| Speaker role | Trust: SYSTEM/OPERATOR > USER > ASSISTANT |
| Extraction turn | Staleness: earlier turns more established but potentially more stale |
| Extraction mode | Confidence: MANUAL > INITIAL > INCREMENTAL |
| Source context | Cross-context reasoning: was this imported from another conversation? |
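
As a rough illustration of how a downstream consumer might act on the two orderings in the table, the sketch below encodes them as rank functions. The function names and the numeric ranks are hypothetical. The table does not place REVISION in the confidence ordering, so it is left unranked here, and ranking an unknown speaker lowest is an assumption.

```kotlin
// Hypothetical sketch: enum and function names are illustrative, not DICE API.
enum class SpeakerRole { SYSTEM, OPERATOR, USER, ASSISTANT }
enum class ExtractionMode { INITIAL, INCREMENTAL, REVISION, MANUAL }

// Trust: SYSTEM/OPERATOR > USER > ASSISTANT. Higher means more trusted.
fun trustRank(role: SpeakerRole?): Int = when (role) {
    SpeakerRole.SYSTEM, SpeakerRole.OPERATOR -> 2
    SpeakerRole.USER -> 1
    SpeakerRole.ASSISTANT -> 0
    null -> -1 // assumption: an unknown speaker ranks below every known role
}

// Confidence: MANUAL > INITIAL > INCREMENTAL. REVISION is not ranked in the table.
fun modeRank(mode: ExtractionMode): Int? = when (mode) {
    ExtractionMode.MANUAL -> 2
    ExtractionMode.INITIAL -> 1
    ExtractionMode.INCREMENTAL -> 0
    ExtractionMode.REVISION -> null
}
```

A consumer resolving two conflicting propositions could compare trustRank first and fall back to modeRank, but whether DICE should prescribe any such policy is part of the open question.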