Fix fragmented teletext subtitles by delaying flush until sentence completion #2106

Open
Apprentice2907 wants to merge 3 commits into CCExtractor:master from Apprentice2907:srt-file-just-contains-awkward-letter
Conversation

@Apprentice2907
Contributor

[FIX] Fix fragmented teletext subtitles by delaying flush until sentence completion

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.
  • I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

Problem

In some teletext broadcasts, subtitles are transmitted as incremental fragments, with only a few characters changing between consecutive pages. CCExtractor currently flushes the previous subtitle whenever the fuzzy comparison detects a difference between pages, so each fragment is written as a separate subtitle entry.

Example of incorrect output:

Ra
Pe
wee
aan !

Expected output:

Rape wee aan!

Root Cause

The issue is in process_page() in src/lib_ccx/telxcc.c. When fuzzy_memcmp() detects a difference between the previous and current teletext page, the code immediately flushes the previous subtitle via telxcc_dump_prev_page(ctx, sub). Broadcasts that update a page gradually produce many such small differences, and each one triggers a premature flush.

Solution

This patch changes the flushing logic so that the previous subtitle is flushed only if it ends with one of the sentence-terminating punctuation marks (., ?, !, :). Otherwise, the current fragment is merged into the previous buffer and nothing is flushed yet.

This allows incremental teletext fragments to be combined into a complete subtitle before being written.
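
The decision described above can be sketched in isolation as follows. This is a minimal illustration, not the code from this patch: the names ends_with_terminator(), on_page_change(), struct pending_sub and PENDING_MAX are hypothetical, and fragments are simply concatenated as received (spacing between fragments and CCExtractor's real page buffers and timing are out of scope).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PENDING_MAX 256 /* hypothetical buffer size for this sketch */

/* Hypothetical stand-in for the "previous page" text buffer. */
struct pending_sub {
    char text[PENDING_MAX];
};

/* True if text, ignoring trailing whitespace, ends with one of . ? ! : */
static bool ends_with_terminator(const char *text)
{
    size_t len = strlen(text);

    while (len > 0 && (text[len - 1] == ' ' || text[len - 1] == '\t' ||
                       text[len - 1] == '\r' || text[len - 1] == '\n'))
        len--;
    if (len == 0)
        return false;
    return strchr(".?!:", text[len - 1]) != NULL;
}

/*
 * Called when the current page differs from the previous one.
 * If the previous text is a finished sentence, copy it to out (the "flush")
 * and start a new buffer from current; return true.
 * Otherwise append current to the previous text and return false.
 */
static bool on_page_change(struct pending_sub *prev, const char *current,
                           char *out, size_t out_size)
{
    if (ends_with_terminator(prev->text)) {
        snprintf(out, out_size, "%s", prev->text);
        snprintf(prev->text, sizeof prev->text, "%s", current);
        return true;
    }
    /* strncat writes at most the given count plus a terminating NUL. */
    strncat(prev->text, current, sizeof prev->text - 1 - strlen(prev->text));
    return false;
}
```

With an initially empty buffer, feeding the fragments "Ra", "pe ", "wee " and "aan!" through on_page_change() accumulates them without flushing; the next differing page then flushes the buffered sentence as a single entry. One consequence of this design is that a subtitle which never ends in one of the four marks stays buffered until a later fragment completes it.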

Changes

  • File modified: src/lib_ccx/telxcc.c
  • Function: process_page()
  • Only the flushing condition after fuzzy_memcmp() was adjusted

Testing

Tested with:

  1. Teletext streams producing fragmented subtitles → Now merged into complete sentences
  2. Normal teletext subtitle streams → Behavior unchanged

Benefits

  • Prevents fragmented subtitle output
  • Preserves existing timing and buffering logic
  • Does not modify teletext decoding or encoder behavior
  • Minimal and safe change limited to flush logic

Contributor

@cfsmp3 left a comment

Review

This PR shares the same first 3 commits (identical SHAs) as your other PR #2108. It contains the same mix of unrelated changes:

  1. README --timestamp-map docs (already merged in #2065)
  2. JSON report format feature (~500 lines)
  3. Teletext sentence-merging behavior change

Please close this PR and work from a clean branch structure. See my review on #2108 for the full breakdown of what needs to be split into separate PRs.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
