[FEATURE] Allow output \0 terminated frames (for WebSocket streaming support) #2105
pszemus wants to merge 5 commits into CCExtractor:master
cfsmp3 left a comment:
Good feature with a clear real-world use case. The implementation is clean and properly wired through both C and Rust. However, the --null-terminated flag currently only works for DVB bitmap subtitles, not for text-based captions (CEA-608/708). This needs to be fixed before merging.
The problem
In src/lib_ccx/ccx_encoders_transcript.c, you replaced encoded_crlf with encoded_end_frame in only one place, the bitmap subtitle path at line 92:
// write_cc_bitmap_as_transcript(), line 92: ✅ changed
write_wrapped(context->out->fh, context->encoded_end_frame, context->encoded_end_frame_length);
But the text subtitle path (write_cc_buffer_as_transcript) still uses encoded_crlf in two places that also need updating:
// Line 206: ❌ not changed (end of each subtitle line)
ret = write(context->out->fh, context->encoded_crlf, context->encoded_crlf_length);
// Line 328: ❌ not changed (end of each subtitle block)
ret = write(context->out->fh, context->encoded_crlf, context->encoded_crlf_length);
There are also lines 77 and 90, where encoded_crlf is used for parsing/splitting tokens. Those should probably stay as-is, since they detect line breaks within the input rather than write output.
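To see why every output write has to switch, here is a minimal sketch, assuming the option chooses the end-of-frame bytes once, when the encoder is set up. The struct is a hypothetical, trimmed-down stand-in: only the four field names come from this review, and init_end_frame is an illustrative name, not CCExtractor's actual code. Any write() that still passes encoded_crlf bypasses this choice, which matches the CRLF-only output seen for text captions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the encoder context. Only the field names are
   taken from the review; the types and layout are assumptions. */
struct transcript_ctx {
    const unsigned char *encoded_crlf;
    size_t encoded_crlf_length;
    const unsigned char *encoded_end_frame;
    size_t encoded_end_frame_length;
};

static const unsigned char CRLF[] = { '\r', '\n' };
static const unsigned char NUL_FRAME[] = { '\0' };

/* Pick the end-of-frame bytes once so every output path can simply write
   encoded_end_frame without re-checking the option. */
static void init_end_frame(struct transcript_ctx *ctx, int null_terminated)
{
    assert(ctx != NULL);
    ctx->encoded_crlf = CRLF;
    ctx->encoded_crlf_length = sizeof CRLF;
    if (null_terminated) {
        ctx->encoded_end_frame = NUL_FRAME;
        ctx->encoded_end_frame_length = sizeof NUL_FRAME;
    } else {
        /* Without the option, a frame ends exactly as before: with CRLF. */
        ctx->encoded_end_frame = ctx->encoded_crlf;
        ctx->encoded_end_frame_length = ctx->encoded_crlf_length;
    }
}
```

With this setup, a path that keeps writing encoded_crlf emits CRLF regardless of the flag, while a path that writes encoded_end_frame follows it.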
How to verify
I tested with a CEA-608 stream:
./ccextractor input.ts --txt --stdout --null-terminated 2>/dev/null | xxd | head -30
The output contains only 0d 0a (CRLF) line endings and no null bytes, so the flag has no effect on text-based captions.
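The xxd check can also be done programmatically. A minimal sketch, with hypothetical helper names: count NUL bytes and CRLF pairs in a captured buffer. Once the text path is fixed, one would expect a non-zero NUL count for text captions run with --null-terminated.

```c
#include <assert.h>
#include <stddef.h>

/* Tallies of the bytes that matter when checking --null-terminated output. */
struct terminator_counts {
    size_t nul;  /* 00 bytes (frame delimiters) */
    size_t crlf; /* 0d 0a pairs (line endings) */
};

/* Hypothetical helper: scan captured output and count both kinds of
   terminator. A 0d not followed by 0a is not counted. */
static struct terminator_counts count_terminators(const unsigned char *buf, size_t len)
{
    struct terminator_counts c = { 0, 0 };
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x00)
            c.nul++;
        else if (buf[i] == 0x0d && i + 1 < len && buf[i + 1] == 0x0a)
            c.crlf++;
    }
    return c;
}
```

Feeding it the bytes captured from the ./ccextractor command above gives the same information as scanning the xxd dump by eye.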
What to fix
In src/lib_ccx/ccx_encoders_transcript.c, replace encoded_crlf with encoded_end_frame on lines 206 and 328 (the two write() calls in write_cc_buffer_as_transcript). Leave lines 77 and 90 alone; those are input parsing, not output.
Note: you'll also need to update the ret < context->encoded_crlf_length comparisons on lines 207 and 329 to use encoded_end_frame_length accordingly.
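A sketch of the corrected write, assuming a plain POSIX file descriptor as in the write() calls quoted above; write_end_frame is a hypothetical name. The length check matters: when the line ending is the 2-byte CRLF, a successful 1-byte write of \0 compared against encoded_crlf_length would be reported as a short write.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the end-of-frame bytes and compare the result with the length of
   the buffer actually written (encoded_end_frame_length, not the CRLF
   length). Returns 0 on success, -1 on error or short write. */
static int write_end_frame(int fd, const unsigned char *encoded_end_frame,
                           size_t encoded_end_frame_length)
{
    assert(encoded_end_frame != NULL);
    ssize_t ret = write(fd, encoded_end_frame, encoded_end_frame_length);
    if (ret < 0 || (size_t)ret < encoded_end_frame_length)
        return -1;
    return 0;
}
```

Keeping the buffer and its length paired in one call makes it harder to update one without the other.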
Thanks @cfsmp3, I've fixed the missing code paths.
CCExtractor CI platform finished running the test files on Linux. Below is a summary of the test results, compared with the tests for commit 0626bb5...:
Your PR breaks these cases:
Congratulations: Merging this PR would fix the following tests:
It seems that not all tests passed completely. This indicates that the output of some files is not as expected (though it might be as you intend). Check the result page for more info.
CCExtractor CI platform finished running the test files on Windows. Below is a summary of the test results, compared with the tests for commit 0626bb5...:
Your PR breaks these cases:
Congratulations: Merging this PR would fix the following tests:
It seems that not all tests passed completely. This indicates that the output of some files is not as expected (though it might be as you intend). Check the result page for more info.
When streaming subtitles (particularly DVBSUB) from ccextractor to WebSocket endpoints via tools like websocat, multi-line subtitles cause issues. Each line is sent as a separate message, resulting in only the last line being visible at the receiving end.
For example, using the following pipeline:
multi-line subtitle frames are sent line by line, and all but the final line are lost.
This PR introduces the --null-terminated option, which appends a null character (\0) as a frame delimiter after each complete subtitle frame (single- or multi-line). This gives streaming consumers proper frame boundaries. It then becomes possible to create the following pipeline:
With this change, websocat's -0 flag can parse complete subtitle frames using the null delimiter (see the websocat documentation).
Benefits:
Please compare the following two output files. With --null-terminated enabled, newlines within multi-line subtitles are preserved and every frame ends with \0.
--out=webvtt: ccextractor_webvtt.txt
--out=txt --null-terminated: ccextractor_txt_null-terminated.txt
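A round-trip sketch of that framing. It assumes lines inside a frame are separated by CRLF and the frame is closed by a single \0 with no trailing line break; the real byte layout is whatever ccextractor_txt_null-terminated.txt contains. Both helpers are hypothetical: format_frame plays the producer, and next_frame does roughly what any reader splitting on \0 (such as websocat with -0) has to do, leaving an unterminated tail for the next read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical producer: lines joined by CRLF, frame closed by one \0.
   Returns the number of bytes written, or 0 if cap is too small. */
static size_t format_frame(unsigned char *out, size_t cap,
                           const char *const *lines, size_t n_lines)
{
    size_t used = 0;
    for (size_t i = 0; i < n_lines; i++) {
        size_t len = strlen(lines[i]);
        size_t sep = (i + 1 < n_lines) ? 2 : 0; /* CRLF only between lines */
        if (cap - used < len + sep + 1)          /* +1 keeps room for the \0 */
            return 0;
        memcpy(out + used, lines[i], len);
        used += len;
        if (sep) {
            out[used++] = '\r';
            out[used++] = '\n';
        }
    }
    if (cap - used < 1)
        return 0;
    out[used++] = '\0'; /* frame delimiter */
    return used;
}

/* Hypothetical consumer: yield the next complete \0-terminated frame that
   starts at *pos and advance *pos past the delimiter. Returns 0 when no
   complete frame is left; a partial tail stays unread so the caller can
   keep it until more bytes arrive. */
static int next_frame(const unsigned char *buf, size_t len, size_t *pos,
                      const unsigned char **frame, size_t *frame_len)
{
    assert(buf != NULL && pos != NULL);
    if (*pos >= len)
        return 0;
    const unsigned char *start = buf + *pos;
    const unsigned char *nul = memchr(start, '\0', len - *pos);
    if (nul == NULL)
        return 0;
    *frame = start;
    *frame_len = (size_t)(nul - start);
    *pos += *frame_len + 1;
    return 1;
}
```

Because the delimiter is a byte that never appears in caption text, the CRLF inside a two-line frame no longer splits it, so the receiver gets both lines as one message.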