Problem
format_conversations (scripts/telegram_extract.py) extracts each prompt→response exchange as an independent, single-turn pair. It has no awareness of the surrounding conversation thread, so prior exchanges are never included as context.
For a conversation like:
Other: "what did you think of the movie?"
You: "loved it"
Other: "which part?" ← treated as a brand-new, context-free prompt
You: "the ending"
The second pair ("which part?", "the ending") is extracted with no knowledge of what came before, producing a nonsensical training sample.
Expected behavior
Related message pairs within the same conversation should be grouped into multi-turn samples with accumulated context, so the model learns to respond with awareness of prior turns.
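A minimal sketch of the expected grouping, for illustration only. It does not reflect the current code in scripts/telegram_extract.py: the function name build_multi_turn_samples, the (speaker, text) message shape, the "user"/"assistant" role labels, and the max_context_turns cap are all assumptions. Each reply from "You" becomes one sample that carries the preceding turns of the same conversation as context.

```python
from typing import Dict, List, Tuple

# Assumed message shape: (speaker, text), where speaker is "You" or "Other".
Message = Tuple[str, str]


def build_multi_turn_samples(
    messages: List[Message], max_context_turns: int = 6
) -> List[Dict[str, object]]:
    """Emit one sample per "You" reply, with prior turns attached as context.

    Hypothetical helper: `messages` must be the ordered messages of a
    single conversation.
    """
    if max_context_turns < 1:
        raise ValueError("max_context_turns must be at least 1")

    samples: List[Dict[str, object]] = []
    history: List[Dict[str, str]] = []

    for speaker, text in messages:
        role = "assistant" if speaker == "You" else "user"
        # Only a reply that directly follows the other party's message
        # becomes a training target.
        if role == "assistant" and history and history[-1]["role"] == "user":
            context = history[-max_context_turns:]
            samples.append({"context": list(context), "response": text})
        history.append({"role": role, "content": text})

    return samples


conversation = [
    ("Other", "what did you think of the movie?"),
    ("You", "loved it"),
    ("Other", "which part?"),
    ("You", "the ending"),
]
samples = build_multi_turn_samples(conversation)
```

With the conversation above, the second sample's context holds all three earlier turns, so "which part?" is no longer presented as a context-free prompt. Capping the context length is one possible way to keep samples bounded. A real implementation would probably also need a rule for where one thread ends and the next begins, which this sketch leaves out.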