No conversation context — each prompt/response pair is extracted in isolation #3

@NotYuSheng

Problem

format_conversations (scripts/telegram_extract.py) extracts each prompt→response exchange as an independent, single-turn pair. It has no awareness of the surrounding conversation thread, so prior exchanges are never included as context.

For a conversation like:

Other: "what did you think of the movie?"
You:   "loved it"
Other: "which part?"         ← treated as a brand-new, context-free prompt
You:   "the ending"

The second pair ("which part?", "the ending") is extracted with no knowledge of what came before, producing a nonsensical training sample.

Expected behavior

Related message pairs within the same conversation should be grouped into multi-turn samples with accumulated context, so the model learns to respond with awareness of prior turns.
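
One possible shape for this, as a minimal sketch rather than a proposed patch: walk a single conversation in order, keep a running history, and emit one sample per reply that carries every earlier turn as context. The function name build_multiturn_samples, the (speaker, text) input shape, and the role/content output format are all assumptions made for illustration; none of them come from scripts/telegram_extract.py.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: one conversation as ordered (speaker, text) tuples.
Message = Tuple[str, str]


def build_multiturn_samples(
    messages: List[Message], me: str = "You"
) -> List[Dict[str, List[Dict[str, str]]]]:
    """Emit one sample per reply by `me`, including all prior turns as context."""
    samples = []
    history: List[Dict[str, str]] = []
    for speaker, text in messages:
        role = "assistant" if speaker == me else "user"
        turn = {"role": role, "content": text}
        # Only a reply that directly follows the other person's message
        # becomes a training target.
        if role == "assistant" and history and history[-1]["role"] == "user":
            samples.append({"messages": [dict(m) for m in history] + [dict(turn)]})
        history.append(turn)
    return samples


conversation = [
    ("Other", "what did you think of the movie?"),
    ("You", "loved it"),
    ("Other", "which part?"),
    ("You", "the ending"),
]
samples = build_multiturn_samples(conversation)
# The second sample now contains all four turns, so "which part?"
# is no longer a context-free prompt.
```

This sketch leaves open how a "conversation" is delimited (for example by chat ID or by a time gap between messages) and does not merge consecutive messages from the same sender or cap the context length; those choices would need to be settled in the real implementation.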
