[FEATURE] Allow output \0 terminated frames (for WebSocket streaming support) #2105

Open
pszemus wants to merge 5 commits into CCExtractor:master from pszemus:null-terminated-frames

Conversation

@pszemus
Contributor

@pszemus pszemus commented Feb 10, 2026

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.
  • I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

When streaming subtitles (particularly DVBSUB) from ccextractor to WebSocket endpoints via tools like websocat, multi-line subtitles cause issues. Each line is sent as a separate message, resulting in only the last line being visible at the receiving end.

For example, using the following pipeline:

ccextractor --udp <src_stream_address> --codec dvbsub --out=txt --stdout --forceflush | websocat ws://<endpoint-uri>

multi-line subtitle frames are sent line-by-line, losing all but the final line.

This PR introduces the --null-terminated option, which appends a null character (\0) as a frame delimiter after each complete subtitle frame (whether single or multi-line). This enables proper frame boundaries for streaming scenarios.

Then, it'll be possible to create the following pipeline:

ccextractor --udp <src_stream_address> --codec dvbsub --out=txt --null-terminated --stdout --forceflush | websocat -0 ws://<endpoint-uri>

With this change, websocat's -0 flag can properly parse complete subtitle frames using the null delimiter (see websocat documentation).
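On the receiving side, a consumer of the --null-terminated stream has to split on \0 rather than on line endings. The following is a minimal sketch of that splitting in C, not code from this PR or from websocat; split_null_frames, frame_cb and count_frame are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives one complete frame, without its trailing \0. */
typedef void (*frame_cb)(const char *frame, size_t len, void *user);

/* Example callback: counts frames through the int that user points to. */
void count_frame(const char *frame, size_t len, void *user)
{
    (void)frame;
    (void)len;
    ++*(int *)user;
}

/*
 * Scan buf[0..len) for \0 delimiters and call cb once per complete frame.
 * Returns the number of bytes consumed. Trailing bytes that are not yet
 * terminated by \0 are not reported and not counted as consumed.
 */
size_t split_null_frames(const char *buf, size_t len, frame_cb cb, void *user)
{
    assert(buf != NULL && cb != NULL);
    size_t start = 0;
    while (start < len)
    {
        const char *end = memchr(buf + start, '\0', len - start);
        if (end == NULL)
            break; /* incomplete frame: wait for more input */
        size_t frame_len = (size_t)(end - (buf + start));
        cb(buf + start, frame_len, user); /* line breaks inside the frame are kept */
        start += frame_len + 1;           /* skip the \0 delimiter */
    }
    return start;
}
```

Bytes after the last \0 are left unconsumed, so a caller reading from a pipe can keep them and call the function again once more data has arrived.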

Benefits:

  • Enables reliable WebSocket streaming of subtitles without data loss
  • Maintains backward compatibility (opt-in feature)
  • Follows established patterns for null-terminated stream processing
  • Simple, focused change that solves a real-world use case

Please compare the following two output files: with --null-terminated enabled, the newlines in multi-line subtitles are preserved and every frame ends with \0.

@pszemus pszemus force-pushed the null-terminated-frames branch from bdf3aa1 to ff9c160 Compare February 11, 2026 15:42
Contributor

@cfsmp3 cfsmp3 left a comment

Good feature with a clear real-world use case. The implementation is clean and properly wired through both C and Rust. However, the --null-terminated flag currently only works for DVB bitmap subtitles, not for text-based captions (CEA-608/708). This needs to be fixed before merging.

The problem

In src/lib_ccx/ccx_encoders_transcript.c, you replaced encoded_crlf with encoded_end_frame in only one place — the bitmap subtitle path at line 92:

// write_cc_bitmap_as_transcript() — line 92 — ✅ changed
write_wrapped(context->out->fh, context->encoded_end_frame, context->encoded_end_frame_length);

But the text subtitle path (write_cc_buffer_as_transcript) still uses encoded_crlf in two places that also need updating:

// Line 206 — ❌ not changed (end of each subtitle line)
ret = write(context->out->fh, context->encoded_crlf, context->encoded_crlf_length);

// Line 328 — ❌ not changed (end of each subtitle block)
ret = write(context->out->fh, context->encoded_crlf, context->encoded_crlf_length);

There are also lines 77 and 90, where encoded_crlf is used for parsing/splitting tokens. Those should probably stay as-is, since they're detecting line breaks within the input, not writing output.

How to verify

I tested with a CEA-608 stream:

./ccextractor input.ts --txt --stdout --null-terminated 2>/dev/null | xxd | head -30

The output contains only 0d 0a (CRLF) — zero null bytes. The flag has no effect for text-based captions.

What to fix

In src/lib_ccx/ccx_encoders_transcript.c, replace encoded_crlf with encoded_end_frame on lines 206 and 328 (the two write() calls in write_cc_buffer_as_transcript). Leave lines 77 and 90 alone — those are input parsing, not output.

Note: you'll also need to update the ret < context->encoded_crlf_length comparisons on lines 207 and 329 to use encoded_end_frame_length accordingly.
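The point about the length comparison can be illustrated with a small sketch. It assumes a POSIX write() and uses a hypothetical, trimmed-down struct frame_terminator in place of the real encoder context; only the field names encoded_end_frame and encoded_end_frame_length come from the review above, and their types here are guesses.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the encoder context: only the two
 * terminator fields named in the review are modelled. */
struct frame_terminator
{
    const unsigned char *encoded_end_frame;
    unsigned int encoded_end_frame_length;
};

/*
 * Write the frame terminator to fd.
 * Returns 0 on success, -1 on a write error or a short write.
 * The short-write check uses the same length that was passed to write(),
 * which is the pairing the review asks for.
 */
int write_end_frame(int fd, const struct frame_terminator *t)
{
    assert(t != NULL && t->encoded_end_frame != NULL);
    ssize_t ret = write(fd, t->encoded_end_frame, t->encoded_end_frame_length);
    if (ret < 0 || (size_t)ret < t->encoded_end_frame_length)
        return -1;
    return 0;
}
```

Keeping the buffer and its length together in one helper is one way to avoid comparing the result of a write against the length of a different buffer.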

@pszemus
Contributor Author

pszemus commented Feb 16, 2026

Thanks @cfsmp3, I've fixed the missing code paths.
With my test file, setting --null-terminated now changes the output from:

00000000: 5745 4c4c 2c20 4920 4755 4553 5320 594f  WELL, I GUESS YO
00000010: 5520 434f 554c 4420 5341 5920 5448 4154  U COULD SAY THAT
00000020: 0d0a 4920 4341 5245 2e2e 2e42 4543 4155  ..I CARE...BECAU
00000030: 5345 2049 2042 524f 5547 4854 2059 4f55  SE I BROUGHT YOU
00000040: 0d0a 494e 544f 2054 4849 5320 574f 524c  ..INTO THIS WORL
00000050: 442e 0d0a

to:

00000000: 5745 4c4c 2c20 4920 4755 4553 5320 594f  WELL, I GUESS YO
00000010: 5520 434f 554c 4420 5341 5920 5448 4154  U COULD SAY THAT
00000020: 0049 2043 4152 452e 2e2e 4245 4341 5553  .I CARE...BECAUS
00000030: 4520 4920 4252 4f55 4748 5420 594f 5500  E I BROUGHT YOU.
00000040: 494e 544f 2054 4849 5320 574f 524c 442e  INTO THIS WORLD.
00000050: 00
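A check like the xxd inspection above can also be done programmatically. The sketch below counts NUL bytes and CRLF pairs in a buffer; count_delimiters and struct delimiter_counts are hypothetical names for illustration and are not part of CCExtractor.

```c
#include <assert.h>
#include <stddef.h>

struct delimiter_counts
{
    size_t nul;  /* number of 0x00 bytes */
    size_t crlf; /* number of 0x0d 0x0a pairs */
};

/* Count NUL bytes and CRLF pairs in buf[0..len). */
struct delimiter_counts count_delimiters(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    struct delimiter_counts c = {0, 0};
    for (size_t i = 0; i < len; i++)
    {
        if (buf[i] == 0x00)
            c.nul++;
        else if (buf[i] == 0x0d && i + 1 < len && buf[i + 1] == 0x0a)
            c.crlf++;
    }
    return c;
}
```

Applied to the bytes of the two dumps above, a counter like this would report three CRLF pairs and no NUL bytes for the first, and three NUL bytes and no CRLF pairs for the second.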

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 0626bb5...:
Report Name    Tests Passed
Broken         13/13
CEA-708        14/14
DVB            6/7
DVD            3/3
DVR-MS         2/2
General        25/27
Hardsubx       1/1
Hauppage       3/3
MP4            3/3
NoCC           10/10
Options        81/86
Teletext       21/21
WTV            13/13
XDS            34/34

Your PR breaks these cases:

  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65...
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b...
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...

Congratulations: Merging this PR would fix the following tests:


It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 0626bb5...:
Report Name    Tests Passed
Broken         13/13
CEA-708        14/14
DVB            7/7
DVD            3/3
DVR-MS         2/2
General        27/27
Hardsubx       1/1
Hauppage       3/3
MP4            3/3
NoCC           10/10
Options        85/86
Teletext       21/21
WTV            13/13
XDS            34/34

Your PR breaks these cases:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.
