Fix offset fallback and add configurable tokenizer parameter #154

Open
danielhv10 wants to merge 3 commits into VectifyAI:main from danielhv10:fix

@danielhv10

  • Fix the offset fallback: default the offset to 0 when
    calculate_page_offset returns None, preventing downstream errors
  • Add a tokenizer parameter for specifying a custom tiktoken encoding name
    independently of the model; the parameter is propagated through utils.py,
    page_index.py, page_index_md.py, config.yaml, and run_pageindex.py
  • Add pycryptodome dependency
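
The first two changes can be illustrated with a minimal sketch. It is not the code from this PR: `resolve_offset` and `count_tokens` are hypothetical helper names, and the sketch assumes the fallback amounts to substituting 0 for a `None` result and that an explicit `tokenizer` value takes precedence over the model's default encoding.

```python
from typing import Optional


def resolve_offset(calculated: Optional[int]) -> int:
    """Return the calculated page offset, or 0 when none could be determined."""
    # Hypothetical helper: `calculated` stands in for the value returned by
    # calculate_page_offset, which may be None.
    return 0 if calculated is None else calculated


def count_tokens(text: str, model: str, tokenizer: Optional[str] = None) -> int:
    """Count tokens, preferring an explicit encoding name over the model default."""
    # Imported lazily so resolve_offset can be used without tiktoken installed.
    import tiktoken

    if tokenizer:
        # Explicit encoding name, e.g. "cl100k_base" (assumed to win over the model).
        encoding = tiktoken.get_encoding(tokenizer)
    else:
        # Otherwise fall back to the encoding tiktoken associates with the model.
        encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))
```

Passing an encoding name is mainly useful when the model name is one tiktoken does not recognize, since `tiktoken.encoding_for_model` raises `KeyError` in that case.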
