Fix fallback logic in page chunk processing to prevent irrelevant results #3
Thank you for building such a well-structured PDF chatbot system! The codebase is clean and the three-step processing pipeline (document selection → page detection → answer generation) is really thoughtfully designed. I hope this small improvement helps make the search results even more reliable.
Problem
When JSON parsing fails in _process_page_chunk(), the current fallback returns the first page of the chunk regardless of relevance. This can produce misleading search results: users receive confident answers drawn from completely unrelated content.

Solution
- Remove the fallback to first_page when parsing fails
- Return an empty list [] instead

Impact
Fixes the edge case where parsing errors could pollute search results with irrelevant content.
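
To illustrate the change, here is a minimal sketch of the parsing step with the new fallback. It is not the actual body of _process_page_chunk(): the helper name parse_relevant_pages, the "relevant_pages" key, and the response shape are assumptions made for this example only.

```python
import json
from typing import List


def parse_relevant_pages(raw_response: str, chunk_pages: List[int]) -> List[int]:
    """Hypothetical helper: extract relevant page numbers from a model response.

    Assumes the response is a JSON object such as {"relevant_pages": [2, 3]}.
    On any parsing problem it returns [] rather than guessing a page.
    """
    try:
        parsed = json.loads(raw_response)
    except (json.JSONDecodeError, TypeError):
        # Previously the fallback would have been the first page of the chunk.
        # Returning [] keeps unparseable output out of the search results.
        return []

    if not isinstance(parsed, dict):
        return []

    pages = parsed.get("relevant_pages", [])
    if not isinstance(pages, list):
        return []

    # Keep only integer page numbers that belong to this chunk.
    allowed = set(chunk_pages)
    return [
        p for p in pages
        if isinstance(p, int) and not isinstance(p, bool) and p in allowed
    ]
```

With this shape, a caller that merges results across chunks can simply extend its list with the return value; a chunk whose response could not be parsed contributes nothing instead of an arbitrary page.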