
Route TOC-without-page-numbers documents to the correct strategy #285

Open
Me3sP wants to merge 1 commit into VectifyAI:main from Me3sP:fix/toc-no-page-numbers-routing


Me3sP commented May 20, 2026

Problem

tree_parser only had two dispatch branches: a TOC with page numbers, or everything else. A document with a printed table of contents that lists no page numbers fell into the else branch and was handled by process_no_toc — regenerating the structure from scratch and ignoring the existing TOC entirely.

As a result, process_toc_no_page_numbers was unreachable as a primary strategy. It only ever ran as a fallback from process_toc_with_page_numbers inside meta_processor.

Fix

  • Add the missing tree_parser branch so a TOC with no page numbers is dispatched to process_toc_no_page_numbers directly, using the TOC instead of discarding it.
  • Forward start_index from meta_processor into process_toc_no_page_numbers. It previously relied on the default (1), which would index incorrectly when invoked for non-top-level nodes.
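
To illustrate why the forwarded start_index matters, here is a minimal sketch. It assumes start_index is the number given to the first page of the slice being processed; label_pages is a hypothetical helper, not a function from this repository.

```python
from typing import List, Tuple


def label_pages(pages: List[str], start_index: int = 1) -> List[Tuple[int, str]]:
    """Hypothetical helper: pair each page with its index in the full document.

    start_index is assumed to be the document-level number of pages[0].
    """
    return [(start_index + offset, text) for offset, text in enumerate(pages)]


# Pretend this slice holds document pages 40 and 41 of a non-top-level node.
sub_pages = ["text of page 40", "text of page 41"]

# Relying on the default numbers the slice from 1, which is wrong for a sub-range.
default_labels = label_pages(sub_pages)

# Forwarding the caller's start_index keeps the document-level numbering.
forwarded_labels = label_pages(sub_pages, start_index=40)
```

With the default, the slice is numbered 1 and 2. With the forwarded value it is numbered 40 and 41, which is what a caller working on a sub-range would expect.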

The existing fallback chain is preserved: process_toc_no_page_numbers still degrades to process_no_toc on low verification accuracy.
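
The three-way dispatch and the fallback chain described above can be sketched as follows. This is an illustration under assumptions, not the repository's code: choose_strategy, run_with_fallback, the accuracy_of callback, and the 0.6 threshold are all hypothetical. Only the three strategy names come from this PR.

```python
from typing import Callable

# Fallback order described in this PR: each strategy degrades to the next one.
FALLBACK = {
    "process_toc_with_page_numbers": "process_toc_no_page_numbers",
    "process_toc_no_page_numbers": "process_no_toc",
}


def choose_strategy(has_toc: bool, has_page_numbers: bool) -> str:
    """Hypothetical dispatcher with the third branch this PR adds."""
    if has_toc and has_page_numbers:
        return "process_toc_with_page_numbers"
    if has_toc:
        # New branch: use the printed TOC even though it lists no page numbers.
        return "process_toc_no_page_numbers"
    return "process_no_toc"


def run_with_fallback(
    mode: str,
    accuracy_of: Callable[[str], float],
    threshold: float = 0.6,  # assumed value, for illustration only
) -> str:
    """Walk the fallback chain until a strategy verifies well enough.

    The last strategy in the chain is accepted regardless of its accuracy.
    """
    while mode in FALLBACK and accuracy_of(mode) < threshold:
        mode = FALLBACK[mode]
    return mode


# A TOC without page numbers now starts at its own strategy...
start = choose_strategy(has_toc=True, has_page_numbers=False)
# ...and still degrades to process_no_toc when verification accuracy is low.
final = run_with_fallback(start, accuracy_of=lambda mode: 0.2)
```

Before this change, the second branch did not exist, so the same document would have started at process_no_toc.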

Impact

Additive only — no existing branch behavior changes. Documents that previously hit process_no_toc despite having a usable TOC now keep their authored structure.

tree_parser only had two branches: a TOC with page numbers, or
everything else. A document with a printed TOC that lists no page
numbers fell into the else branch and was processed with
process_no_toc, regenerating the structure from scratch and ignoring
the existing TOC entirely.

process_toc_no_page_numbers was therefore unreachable as a primary
strategy and only ran as a fallback from process_toc_with_page_numbers.

Add the missing branch so a TOC with no page numbers is dispatched to
process_toc_no_page_numbers directly. Also forward start_index from
meta_processor into process_toc_no_page_numbers, which previously
relied on the default and would index incorrectly for non-top-level
nodes.