When building a hierarchical tree index on moderately long documents (such as multi-page PDFs), the library executes parallel API calls without client-side throttling or concurrency limiters. When utilizing public, credit-capped, or tier-limited LLM endpoints (like OpenAI, Anthropic, or NVIDIA NIM), this unthrottled burst of concurrent requests frequently triggers HTTP Error 429 (Too Many Requests).
Furthermore, this rate-limit failure cascades into a hard crash with a KeyError (e.g., KeyError: 'toc_detected'). When the LLM API repeatedly fails with 429 errors and exhausts all retries, the completion wrapper returns an empty string "". The JSON extractor parses this empty response into an empty dictionary {}. Because the code attempts to directly access required JSON keys without validation or exception handling, it causes the entire tree index generation process to crash.
Root Cause Details
- Unthrottled asyncio.gather Calls:
  - File: pageindex/utils.py (inside generate_summaries_for_structure): Runs all page node summaries concurrently via asyncio.gather(*tasks). For long documents with dozens of pages, this instantly floods the endpoint with concurrent requests.
  - File: pageindex/page_index.py (inside process_large_node_recursively): Processes child nodes concurrently, creating nested concurrent branches of LLM requests without limits.
  - File: pageindex/page_index_md.py (inside process_large_node_recursively): Also processes child nodes concurrently under parallel tasks.
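One possible mitigation for the unthrottled asyncio.gather calls is to cap the number of in-flight requests with a semaphore. The following is a minimal sketch, not the library's code: gather_bounded and MAX_CONCURRENCY are hypothetical names, and the limit of 4 is an arbitrary assumption that would need tuning per endpoint.

```python
import asyncio

MAX_CONCURRENCY = 4  # assumed limit; tune to the endpoint's rate tier


async def gather_bounded(coros, limit=MAX_CONCURRENCY):
    """Run coroutines concurrently, but never more than `limit` at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(coro):
        # Each coroutine waits here until one of the `limit` slots is free.
        async with semaphore:
            return await coro

    # Results come back in input order, as with a plain asyncio.gather.
    return await asyncio.gather(*(run_one(c) for c in coros))
```

A call such as asyncio.gather(*tasks) could then become gather_bounded(tasks). In the recursive paths, a single semaphore shared across levels and held only around the LLM call itself (not around the recursion) would probably be needed to cap total in-flight requests without risking deadlock.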
- Fixed-Delay Retries (No Backoff/Jitter):
  - The retry loops in llm_completion and llm_acompletion use a fixed 1-second delay (time.sleep(1) / await asyncio.sleep(1)). Because all concurrent requests fail together and retry at the same instant, they collide with the rate limit again on every attempt and exhaust all retries quickly.
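A common alternative to a fixed delay is exponential backoff with random jitter, so that requests which failed together do not retry together. The sketch below is an illustration under assumptions: retry_with_backoff and backoff_delay are hypothetical helpers, the base and cap values are arbitrary, and a real integration would likely catch only the rate-limit exception (the logs show litellm.RateLimitError) rather than every Exception.

```python
import asyncio
import random


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


async def retry_with_backoff(call, max_retries=5, base=1.0, cap=30.0):
    """Await `call()` until it succeeds or `max_retries` attempts fail."""
    for attempt in range(max_retries):
        try:
            return await call()
        except Exception:  # assumption: narrow this to the rate-limit error
            if attempt == max_retries - 1:
                raise  # surface the failure instead of returning ""
            # The wait grows with each attempt and is randomized per caller.
            await asyncio.sleep(backoff_delay(attempt, base, cap))
```

Re-raising on the last attempt, rather than returning an empty string, also lets callers tell a failed request apart from an empty model response.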
- Cascading KeyError due to Unsafe JSON Parsing:
  - When all retries are exhausted, the completion wrapper returns an empty string "".
  - In pageindex/page_index.py (inside toc_detector_single_page):
        response = llm_completion(model=model, prompt=prompt)
        json_content = extract_json(response)
        return json_content['toc_detected']
    When response is empty, json_content is extracted as {} (an empty dict), and the lookup raises KeyError: 'toc_detected'.
  - This is a recurring pattern across the codebase. The same crash exists in other functions that index directly into parsed LLM JSON output without checking that the key exists:
    - check_if_toc_extraction_is_complete (in pageindex/page_index.py):
          json_content = extract_json(response)
          return json_content['completed']  # Vulnerable to KeyError: 'completed'
    - check_if_toc_transformation_is_complete (in pageindex/page_index.py):
          json_content = extract_json(response)
          return json_content['completed']  # Vulnerable to KeyError: 'completed'
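The unsafe lookups described above could be routed through a small validation helper that fails with a descriptive error instead of a bare KeyError. This is only a sketch: require_key and LLMResponseError are hypothetical names that do not exist in the library, and whether to raise, retry, or fall back to a default is a design decision left to the maintainers.

```python
class LLMResponseError(RuntimeError):
    """Raised when a parsed LLM response lacks an expected key (hypothetical)."""


def require_key(parsed, key, context=""):
    """Return parsed[key], or raise a descriptive error if it is missing."""
    if not isinstance(parsed, dict) or key not in parsed:
        raise LLMResponseError(
            f"LLM response missing {key!r} in {context or 'unknown step'}; "
            f"got {parsed!r}"
        )
    return parsed[key]
```

With such a helper, toc_detector_single_page could end in require_key(json_content, 'toc_detected', 'toc_detector_single_page'), so an exhausted-retry response reports which step failed and what was actually received.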
Environment Info
- Python Version: 3.x
- OS: Windows / Linux / macOS
- LLM Integration: LiteLLM v1.83.7
- Target Model: meta/llama-3.1-70b-instruct (reproduced using NVIDIA NIM endpoint)
Steps to Reproduce
- Configure a script to build a tree index on a multi-page PDF using a rate-limited endpoint/model (e.g. meta/llama-3.1-70b-instruct or a lower-tier OpenAI key).
- Configure tree building options to generate node summaries:
      import asyncio
      from pageindex import page_index

      result = page_index(
          doc="sample_document.pdf",
          model="openai/meta/llama-3.1-70b-instruct",
          if_add_node_id="yes",
          if_add_node_summary="yes",
          if_add_doc_description="yes"
      )
- Run the script. The initial concurrent tasks immediately exhaust the API's rate limits, fail all retries, and end in the KeyError crash.
Observed Traceback / Logs
LiteLLM completion() model= meta/llama-3.1-70b-instruct; provider = openai
[TIMESTAMP] [INFO]
[TIMESTAMP] [INFO] Retrying request to /chat/completions in 0.492651 seconds
[TIMESTAMP] [INFO] Retrying request to /chat/completions in 0.950194 seconds
[TIMESTAMP] [ERROR] Error: litellm.RateLimitError: RateLimitError: OpenAIException - Error code: 429 - {'status': 429, 'title': 'Too Many Requests'}
...
[TIMESTAMP] [ERROR] Max retries reached for prompt:
Your job is to detect if there is a table of content provided in the given text...
[TIMESTAMP] [ERROR] Failed to extract JSON: Expecting value: line 1 column 1 (char 0)
[TIMESTAMP] [ERROR] Failed to parse JSON even after cleanup
============================================================
Traceback (most recent call last):
File "reproduce_429.py", line 116, in main
result = page_index(
doc=pdf_path,
model=model_name,
if_add_node_id="yes",
if_add_node_summary="yes",
if_add_doc_description="yes"
)
File "pageindex/page_index.py", line 1121, in page_index
return page_index_main(doc, opt)
File "pageindex/page_index.py", line 1110, in page_index_main
return asyncio.run(page_index_builder())
File "Lib/asyncio/runners.py", line 195, in run
return runner.run(main)
File "Lib/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
File "Lib/asyncio/base_events.py", line 725, in run_until_complete
return future.result()
File "pageindex/page_index.py", line 1083, in page_index_builder
structure = await tree_parser(page_list, opt, doc=doc, logger=logger)
File "pageindex/page_index.py", line 1030, in tree_parser
check_toc_result = check_toc(page_list, opt)
File "pageindex/page_index.py", line 697, in check_toc
toc_page_list = find_toc_pages(start_page_index=0, page_list=page_list, opt=opt)
File "pageindex/page_index.py", line 351, in find_toc_pages
detected_result = toc_detector_single_page(page_list[i][0],model=opt.model)
File "pageindex/page_index.py", line 122, in toc_detector_single_page
return json_content['toc_detected']
~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: 'toc_detected'
============================================================