@UdayRajSinghh

Thank you for building such a well-structured PDF chatbot system! The codebase is clean and the three-step processing pipeline (document selection → page detection → answer generation) is really thoughtfully designed. I hope this small improvement helps make the search results even more reliable.


Problem

When JSON parsing fails in _process_page_chunk(), the current fallback returns the first page of the chunk regardless of relevance. This can lead to misleading search results where users receive confident answers from completely unrelated content.

Solution

  • Remove the fallback that returns first_page when parsing fails
  • Return empty array [] instead
  • This ensures only genuinely relevant pages are included in results
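A minimal sketch of the intended behavior, assuming the model response for a chunk is expected to be a JSON array of page numbers. The helper name `parse_relevant_pages` and its signature are hypothetical and used only for illustration; the actual code inside _process_page_chunk() may be structured differently.

```python
import json
from typing import List


def parse_relevant_pages(raw_response: str) -> List[int]:
    """Hypothetical helper: extract page numbers from a model response.

    Assumes the response should be a JSON array of integers. On any
    parsing problem it returns [] rather than guessing a page.
    """
    try:
        parsed = json.loads(raw_response)
    except (json.JSONDecodeError, TypeError):
        # Previously a fallback would return the chunk's first page here.
        # Returning an empty list avoids citing a page of unknown relevance.
        return []

    if not isinstance(parsed, list):
        # Valid JSON, but not the expected array shape.
        return []

    # Keep only real integers (bool is a subclass of int, so exclude it).
    return [p for p in parsed if isinstance(p, int) and not isinstance(p, bool)]
```

With this shape, a malformed response contributes nothing to the results instead of contributing a page that was never judged relevant.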

Impact

  • Prevents false positive results in document search
  • Maintains result accuracy when some chunks fail to process
  • No breaking changes to existing functionality
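To illustrate why an empty result from one chunk should not disturb the others, here is a small sketch of combining per-chunk results. The function `merge_chunk_results` is a hypothetical stand-in for whatever aggregation the project performs, and it assumes each chunk yields a list of integer page numbers.

```python
from typing import Iterable, List


def merge_chunk_results(chunk_results: Iterable[List[int]]) -> List[int]:
    """Hypothetical aggregation: union of page numbers across chunks, sorted."""
    merged = set()
    for pages in chunk_results:
        # A chunk that failed to parse contributes [] and is a no-op here.
        merged.update(pages)
    return sorted(merged)
```

For example, if the second of three chunks fails to parse, merging `[3, 7]`, `[]`, and `[7, 12]` yields only the pages the other chunks reported.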

Fixes the edge case where parsing errors could pollute search results with irrelevant content.

- Remove misleading fallback that returns first page when JSON parsing fails
- Return empty array instead of potentially irrelevant content
- Prevents incorrect page citations in search results
@vercel

vercel bot commented Aug 24, 2025

@UdayRajSinghh is attempting to deploy a commit to the Roe AI Team on Vercel.

A member of the Team first needs to authorize it.
