Skip to content

Fix apostrophe issue#2108

Open
Apprentice2907 wants to merge 4 commits intoCCExtractor:masterfrom
Apprentice2907:fix-apostrophe-issue
Open

Fix apostrophe issue#2108
Apprentice2907 wants to merge 4 commits intoCCExtractor:masterfrom
Apprentice2907:fix-apostrophe-issue

Conversation

@Apprentice2907
Copy link
Contributor

@Apprentice2907 Apprentice2907 commented Feb 11, 2026

[FIX] Fix apostrophe decoding incorrectly as position changes (#2098)

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.
  • I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

Description

Fixes #2098

Problem:
Apostrophes in captions (e.g., "it's", "McDonald's") were being decoded incorrectly as position changes in SCC/CCD output, causing garbled caption formatting.

Root cause:
In the process608() function, the second byte (lo) of character pairs was being written unconditionally without validating that it's a printable character (>= 0x20). This caused control codes in the lo byte to be incorrectly processed as text characters.

Solution:
Added validation check if (lo >= 0x20) before calling handle_single(lo, context) to ensure only printable characters are processed as text, while control codes are properly ignored.

Changed file:

  • src/lib_ccx/ccx_decoders_608.c (line ~1095)

Testing:
The fix ensures that apostrophes and other special characters with byte value 0x27 are correctly handled as printable characters rather than being misinterpreted as positioning commands.

Copy link
Contributor

@cfsmp3 cfsmp3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

This PR bundles at least 4 unrelated changes into a single branch. Each needs to be its own PR:

1. README.md — --timestamp-map documentation

This is a duplicate of PR #2065, which was already merged. Remove this from your branch.

2. JSON report format (report=json) — ~500 lines

A whole new feature: report_format_t enum, print_file_report_json() function in params_dump.c, option parsing in parser.rs, header changes. This is a significant feature and needs its own PR with proper review and testing.

3. 608 apostrophe fix — ccx_decoders_608.c

The if (lo >= 0x20) guard before handle_single(lo, context) — this is the actual fix claimed in the PR title. However, note that the issue #2098 describes a problem specifically in SCC/CCD output encoding, not in 608 decoding. The apostrophe ' is 0x27, which is >= 0x20, so this check wouldn't actually filter it. The fix may still be valid for filtering null/control bytes in the lo position, but the description doesn't match the code change. This needs a separate PR with a clearer explanation of what specific input produces wrong output, and what the fix actually addresses.

4. Teletext sentence merging — telxcc.c

This is a major behavioral change: instead of flushing subtitle text when the buffer changes, it now merges fragments together until it sees sentence-ending punctuation (.?!:). This fundamentally changes teletext subtitle timing and content for every teletext stream. This absolutely needs its own PR with thorough testing against multiple teletext samples, and a clear explanation of why this change is needed.

Also:

  • Stray whitespace changes in ccx_common_option.c (empty lines) and ccx_common_option.h (broken comment formatting for scc_framerate)
  • The commit messages don't match the changes (first commit says "Document existing --timestamp-map option" but the branch also contains unrelated code)

Action required: Please split this into separate PRs — one per concern. Each should have a clear description, testing evidence, and a clean commit history.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Captions containing apostrophes may be decoded incorrectly as position changes in SCC, or other symbols in CCD

2 participants