
[RUST] Added ISDB module#2109

Open
vatsalkeshav wants to merge 8 commits intoCCExtractor:masterfrom
vatsalkeshav:rusty-isdb-decoder

Conversation

@vatsalkeshav
Contributor

@vatsalkeshav vatsalkeshav commented Feb 13, 2026

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.
  • I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

This pull request migrates src/lib_ccx/ccx_decoders_isdb.c and ccx_decoders_isdb.h to a Rust module at /src/rust/src/isdb.

  • types.rs - ISDB-specific types, enums, and constants
  • leaf.rs - Low-level helper functions, independent of the functions in mid.rs, high.rs and mod.rs
  • mid.rs - Mid-level processing functions
  • high.rs - High-level parser functions
  • mod.rs - FFI entry points on the Rust side

Tested against these streams:

  1. https://drive.google.com/file/d/0B_61ywKPmI0TMzlwZGw0MzdIc0U/view?usp=drive_link&resourcekey=0-UkWAUJcBIl_6uJL3nzCTcA
  2. https://drive.google.com/file/d/0B_61ywKPmI0TM3dKRlJ6UjI0STQ/view?usp=drive_link&resourcekey=0-zwyV8fkI_xMeM3_1zUZ_RQ

@vatsalkeshav
Contributor Author

vatsalkeshav commented Feb 13, 2026

Please note:

  • This PR introduces slight output differences when tested against both of the streams above.
    Diff for stream 1 (lines marked < are from ccextractor built from master, lines marked > are from ccextractor built from this PR):

    5691c5691
    < !
    ---
    > ! que o dia comecebem... !

    A similar diff appears for stream 2:

    183c183
    <  e
    ---
    >  em cápsulas menores, mais fáceis de ingerir.
    189c189
    <  n
    ---
    >  nas costas e musculares.
    207c207
    <  a
    ---
    >  a partir de 10 minutos

    In /src/lib_ccx/ccx_decoders_isdb.c, around lines 530-535, in the function get_text:

    if (ccx_strstr_ignorespace(text->buf, sb_text->buf))
    {
        found = CCX_TRUE;
        // See if complete string is there if not update that string
        if (!ccx_strstr_ignorespace(sb_text->buf, text->buf))
        {
            reserve_buf(sb_text, text->used);
            memcpy(sb_text->buf, text->buf, text->used);
        }
        break;
    }

    Here, text is copied into sb_text->buf, but sb_text->used is not updated. When the buffer is later flushed, only the first sb_text->used bytes are written, which would explain why just ! is output instead of the full caption.

    In src/rust/src/isdb/mid.rs, around lines 135-143, in the function get_text:

    if strstr_ignorespace(&text.buf, &sb.buf) {
        found = true;
        // See if complete string is there if not update that string
        if !strstr_ignorespace(&sb.buf, &text.buf) {
            sb.buf = text.buf.clone();
        }
        break;
    }

    Here, the entire buffer is replaced via clone(), so its length changes together with its contents, which is why the complete text is output.

  • The Rust code owns the ISDB context for its entire lifetime, so very little C<->Rust data-transfer code is needed.
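
To illustrate the get_text difference described in the first note above, here is a minimal sketch in plain Rust. It does not use the real CCExtractor types: `CBuf`, `overwrite_without_len`, `flush` and `replace_whole` are hypothetical names chosen for this example, and the flush behaviour is assumed from the description above rather than taken from the C source.

```rust
/// Hypothetical stand-in for the C buffer: `buf` is the storage and
/// `used` is the number of bytes in it that count as valid.
struct CBuf {
    buf: Vec<u8>,
    used: usize,
}

impl CBuf {
    fn new(text: &[u8]) -> Self {
        CBuf { buf: text.to_vec(), used: text.len() }
    }

    /// Models the C path: grow the storage and copy the bytes,
    /// but leave `used` at its old value.
    fn overwrite_without_len(&mut self, text: &[u8]) {
        if self.buf.len() < text.len() {
            self.buf.resize(text.len(), 0); // plays the role of reserve_buf
        }
        self.buf[..text.len()].copy_from_slice(text);
    }

    /// Models a flush that only writes the first `used` bytes (assumed).
    fn flush(&self) -> &[u8] {
        &self.buf[..self.used]
    }
}

/// Models the Rust path: the whole buffer is replaced, and a Vec
/// always carries its own length, so nothing can go stale.
fn replace_whole(sb: &mut Vec<u8>, text: &[u8]) {
    *sb = text.to_vec();
}
```

With a buffer that initially holds only `!`, `overwrite_without_len` stores the longer caption but `flush` still returns the single stale byte, while `replace_whole` yields the full text.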

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit dd29311...:
Report Name | Tests Passed
Broken | 13/13
CEA-708 | 14/14
DVB | 6/7
DVD | 3/3
DVR-MS | 2/2
General | 27/27
Hardsubx | 1/1
Hauppage | 3/3
MP4 | 3/3
NoCC | 10/10
Options | 85/86
Teletext | 21/21
WTV | 13/13
XDS | 34/34

Your PR breaks these cases:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit dd29311...:
Report Name | Tests Passed
Broken | 13/13
CEA-708 | 14/14
DVB | 6/7
DVD | 3/3
DVR-MS | 2/2
General | 25/27
Hardsubx | 1/1
Hauppage | 3/3
MP4 | 3/3
NoCC | 10/10
Options | 81/86
Teletext | 21/21
WTV | 13/13
XDS | 34/34

Your PR breaks these cases:

  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65...
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b...
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...

Congratulations: Merging this PR would fix the following tests:


It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

Contributor

@cfsmp3 cfsmp3 left a comment


Review: ISDB Rust Port

Nice work on this port. The code is well-organized across the 4 modules (types/leaf/mid/high), the C-to-Rust mapping tables in each module are very helpful, and the test suite is comprehensive.

I tested against the first sample stream you linked (the ~60min Brazilian broadcast) and the Rust output is 99.95% identical to the C version across 6300+ lines of subtitles — the few minor differences are actually slightly better output from the Rust side (less fragmented text in the rollup deduplication path).

Bug: Operator precedence in parse_caption_management_data (leaf.rs:205)

if ctx.dmf == 0x0C || ctx.dmf == 0x0D || ctx.dmf == 0x0E && pos < buf.len() {
    ctx.dc = buf[pos];

In Rust, && binds tighter than ||, so this parses as:

ctx.dmf == 0x0C || ctx.dmf == 0x0D || (ctx.dmf == 0x0E && pos < buf.len())

The pos < buf.len() bounds check only protects the 0x0E case. If dmf is 0x0C or 0x0D and the buffer happens to end right after the dmf byte, buf[pos] will panic (index out of bounds). Fix:

if (ctx.dmf == 0x0C || ctx.dmf == 0x0D || ctx.dmf == 0x0E) && pos < buf.len() {
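
The precedence issue can be reproduced in isolation. The sketch below is only an illustration: the function names are hypothetical, and `dmf`, `pos` and `len` are plain parameters standing in for `ctx.dmf`, `pos` and `buf.len()`. The third function shows one possible alternative that avoids the explicit bounds check by using `slice::get`, which returns `None` instead of panicking.

```rust
/// The condition as written in the PR: `&&` binds tighter than `||`,
/// so the bounds check only guards the 0x0E arm.
fn guarded_as_written(dmf: u8, pos: usize, len: usize) -> bool {
    dmf == 0x0C || dmf == 0x0D || dmf == 0x0E && pos < len
}

/// The condition with the suggested parentheses: the bounds check
/// applies to all three dmf values.
fn guarded_fixed(dmf: u8, pos: usize, len: usize) -> bool {
    (dmf == 0x0C || dmf == 0x0D || dmf == 0x0E) && pos < len
}

/// Hypothetical alternative: let `slice::get` do the bounds check.
/// Returns the byte at `pos` for dmf 0x0C..=0x0E, or None if the
/// buffer is too short or dmf is outside that range.
fn read_dc(dmf: u8, buf: &[u8], pos: usize) -> Option<u8> {
    if matches!(dmf, 0x0C..=0x0E) {
        buf.get(pos).copied()
    } else {
        None
    }
}
```

For `dmf == 0x0C` with `pos == len`, `guarded_as_written` is true (the case that would lead to the out-of-bounds index), while `guarded_fixed` is false and `read_dc` returns `None`.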

Minor: Wrong bytes in debug log (leaf.rs:210)

debug!("CC MGMT DATA: languages: {:?}", &buf[pos - 3..pos]);

At this point pos has been incremented by 1 past the dmf byte, so buf[pos - 3..pos] reads 3 bytes before the current position (unrelated data). The C code reads buf[0], buf[1], buf[2] at the current pointer position. The Rust equivalent should be &buf[pos..pos + 3]. This is just a log issue, not a functional bug.
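
The difference between the two slices can be sketched as follows. The helper names and the sample buffer layout are assumptions made for this example only, not taken from leaf.rs. Both helpers use `slice::get` with checked arithmetic so that a short buffer yields `None` rather than a panic, which may also be worth considering for the log statement itself.

```rust
/// Hypothetical helper: the three bytes starting at `pos`
/// (the equivalent of `&buf[pos..pos + 3]`), or None if fewer remain.
fn bytes_from(buf: &[u8], pos: usize) -> Option<&[u8]> {
    buf.get(pos..pos.checked_add(3)?)
}

/// Hypothetical helper: the three bytes ending just before `pos`
/// (the equivalent of `&buf[pos - 3..pos]`), or None if `pos < 3`
/// or `pos` is past the end of the buffer.
fn bytes_before(buf: &[u8], pos: usize) -> Option<&[u8]> {
    buf.get(pos.checked_sub(3)?..pos)
}
```

For an illustrative buffer `[b'x', b'y', 0x3F, b'p', b'o', b'r']` with `pos == 3`, `bytes_before` returns the already-consumed bytes `x`, `y`, `0x3F`, whereas `bytes_from` returns `por`.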

Note on Rust port merging timeline

Independently of this diff — we're currently working on improving our CI to include proper regression testing for Rust ports (comparing C vs Rust output against reference streams). Until that infrastructure is in place, we won't be merging Rust ports, because we lack confidence that we can catch regressions going forward. This is not a reflection of the quality of your work (which is solid), just a process/tooling gap on our side. We'll revisit once CI is ready.
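
As a rough illustration of what a C-versus-Rust output comparison could look like in its simplest form, the sketch below compares two subtitle outputs line by line and reports the percentage of identical lines. This is an assumption-laden example, not the project's CI tooling: `percent_identical` is a hypothetical helper, and it compares lines by position only, so an inserted or deleted line would shift everything after it.

```rust
/// Hypothetical helper: percentage of line positions at which the two
/// outputs are identical. Lines present in only one output count as
/// differences. Two empty outputs are treated as 100% identical.
fn percent_identical(c_output: &str, rust_output: &str) -> f64 {
    let c_lines: Vec<&str> = c_output.lines().collect();
    let rust_lines: Vec<&str> = rust_output.lines().collect();

    let total = c_lines.len().max(rust_lines.len());
    if total == 0 {
        return 100.0;
    }

    // Count positions where both outputs have the same line.
    let same = c_lines
        .iter()
        .zip(rust_lines.iter())
        .filter(|(a, b)| a == b)
        .count();

    100.0 * same as f64 / total as f64
}
```

A real regression check would more likely rely on a proper diff algorithm and on reference outputs stored alongside the sample streams.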

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com



3 participants