[Bug] Unthrottled concurrent LLM requests lead to HTTP 429 Rate Limits and cascading KeyError in tree generation #283

@prthm2910

Description

When building a hierarchical tree index for moderately long documents (such as multi-page PDFs), the library issues parallel API calls with no client-side throttling or concurrency limit. Against public, credit-capped, or tier-limited LLM endpoints (such as OpenAI, Anthropic, or NVIDIA NIM), this burst of concurrent requests frequently triggers HTTP 429 (Too Many Requests).
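One possible way to cap the burst is to wrap each task in an asyncio.Semaphore before handing it to asyncio.gather. The sketch below is only an illustration: gather_with_limit, fake_llm_call, and the default limit of 4 are hypothetical names and values, not part of pageindex.

```python
import asyncio


async def gather_with_limit(coros, max_concurrency=4):
    """Await all coroutines, running at most max_concurrency at a time.

    Hypothetical helper (not part of pageindex). Results come back in
    input order, as with a plain asyncio.gather.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(coro):
        # Each coroutine starts only once a semaphore slot is free.
        async with semaphore:
            return await coro

    return await asyncio.gather(*(run_one(c) for c in coros))


async def demo():
    """Run 20 fake requests and record the peak number in flight."""
    in_flight = 0
    peak = 0

    async def fake_llm_call(i):
        # Stand-in for a real LLM request (assumption for illustration).
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return i * 2

    results = await gather_with_limit(
        [fake_llm_call(i) for i in range(20)], max_concurrency=4
    )
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(f"{len(results)} results, peak concurrency {peak}")
```

A suitable limit depends on the provider tier, so it would probably need to be configurable rather than hard-coded.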

Furthermore, this rate-limit failure cascades into a hard crash with a KeyError (e.g., KeyError: 'toc_detected'). When the LLM API repeatedly returns 429 and all retries are exhausted, the completion wrapper returns an empty string "". The JSON extractor parses this empty response into an empty dictionary {}. Because the calling code accesses required JSON keys directly, without validation or exception handling, the entire tree index generation process crashes.
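The cascade described above can be reproduced in isolation, along with one possible guard. This is a minimal sketch: parse_json_or_empty is a stand-in that only mimics the described behaviour of extract_json (unparseable input becomes {}), and require_key and LLMResponseError are hypothetical names, not existing pageindex APIs.

```python
import json


class LLMResponseError(RuntimeError):
    """Raised when a parsed LLM response lacks a required key (hypothetical)."""


def parse_json_or_empty(response: str) -> dict:
    # Stand-in for the library's extract_json, mirroring only the behaviour
    # described in this issue: unparseable input yields an empty dict.
    try:
        parsed = json.loads(response)
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


def require_key(parsed: dict, key: str):
    """Return parsed[key], or raise a descriptive error instead of a bare KeyError."""
    if key not in parsed:
        raise LLMResponseError(
            f"LLM response is missing required key {key!r}; "
            "the request may have failed (for example with HTTP 429)"
        )
    return parsed[key]


if __name__ == "__main__":
    # A well-formed response passes through unchanged.
    print(require_key(parse_json_or_empty('{"toc_detected": "yes"}'), "toc_detected"))

    # An empty response (all retries exhausted) now fails with a clear message.
    try:
        require_key(parse_json_or_empty(""), "toc_detected")
    except LLMResponseError as exc:
        print(f"handled: {exc}")
```

Whether a missing key should raise, or fall back to a default such as treating the page as having no table of contents, is a design decision for the maintainers.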


Root Cause Details

  1. Unthrottled asyncio.gather Calls:

    • File: pageindex/utils.py (inside generate_summaries_for_structure):
      Runs all page node summaries concurrently via asyncio.gather(*tasks). For long documents with dozens of pages, this instantly floods the endpoint with concurrent requests.
    • File: pageindex/page_index.py (inside process_large_node_recursively):
      Processes child nodes concurrently, creating nested concurrent branches of LLM requests without limits.
    • File: pageindex/page_index_md.py (inside process_large_node_recursively):
      Likewise processes child nodes concurrently, with no limit on the number of parallel tasks.
  2. Fixed-Delay Retries (No Backoff/Jitter):

    • The retry loop in llm_completion and llm_acompletion uses a fixed 1-second delay (time.sleep(1) / await asyncio.sleep(1)). Because concurrent requests that fail together also retry together after the same delay, they hit the rate limit again on every attempt and quickly exhaust all retries.
  3. Cascading KeyError due to Unsafe JSON Parsing:

    • When all retries are exhausted, the completion wrapper returns an empty string "".

    • In pageindex/page_index.py (inside toc_detector_single_page):

      response = llm_completion(model=model, prompt=prompt)
      json_content = extract_json(response)    
      return json_content['toc_detected']

      When response is empty, extract_json returns {} (an empty dict), so the lookup raises KeyError: 'toc_detected'.

    • This pattern recurs across the codebase. Other functions index parsed LLM JSON output directly, without checking that the key exists, and can crash the same way:

      • check_if_toc_extraction_is_complete (in pageindex/page_index.py):
        json_content = extract_json(response)
        return json_content['completed']  # Vulnerable to KeyError: 'completed'
      • check_if_toc_transformation_is_complete (in pageindex/page_index.py):
        json_content = extract_json(response)
        return json_content['completed']  # Vulnerable to KeyError: 'completed'
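For the fixed-delay retry in item 2 above, a common alternative is exponential backoff with full jitter, so that requests which fail together spread out before retrying. The sketch below is an assumption-laden illustration, not the library's retry loop: call_with_backoff, backoff_delay, RetriesExhausted, and all default values are hypothetical. It also raises once retries run out instead of returning an empty string, which would keep the failure from being parsed into {}.

```python
import asyncio
import random


class RetriesExhausted(RuntimeError):
    """Raised when every attempt failed (hypothetical; not a pageindex class)."""


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


async def call_with_backoff(make_call, max_retries=5, base=1.0, cap=30.0,
                            sleep=asyncio.sleep):
    """Await make_call() up to max_retries times with jittered backoff.

    make_call is a zero-argument function returning a fresh awaitable.
    The sleep parameter exists only so the delay can be stubbed in tests.
    """
    last_error = None
    for attempt in range(max_retries):
        try:
            return await make_call()
        except Exception as exc:  # a real version would catch only rate-limit errors
            last_error = exc
            if attempt < max_retries - 1:
                await sleep(backoff_delay(attempt, base, cap))
    raise RetriesExhausted(f"gave up after {max_retries} attempts") from last_error


if __name__ == "__main__":
    calls = {"count": 0}

    async def flaky():
        # Fails twice, then succeeds (stand-in for a rate-limited endpoint).
        calls["count"] += 1
        if calls["count"] < 3:
            raise RuntimeError("429 Too Many Requests")
        return "ok"

    print(asyncio.run(call_with_backoff(flaky, base=0.01)))
```

If the provider sends a Retry-After header, honouring it would likely be better than a computed delay, though whether it is available depends on the client in use.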

Environment Info

  • Python Version: 3.x
  • OS: Windows / Linux / macOS
  • LLM Integration: LiteLLM v1.83.7
  • Target Model: meta/llama-3.1-70b-instruct (reproduced using NVIDIA NIM endpoint)

Steps to Reproduce

  1. Configure a script to build a tree index on a multi-page PDF using a rate-limited endpoint/model (e.g. meta/llama-3.1-70b-instruct or a lower-tier OpenAI key).
  2. Configure tree building options to generate node summaries:
    from pageindex import page_index
    
    result = page_index(
        doc="sample_document.pdf",
        model="openai/meta/llama-3.1-70b-instruct",
        if_add_node_id="yes",
        if_add_node_summary="yes",
        if_add_doc_description="yes"
    )
  3. Run the script. The initial concurrent tasks will immediately exhaust the API's rate limits, fail all retries, and result in the KeyError crash.

Observed Traceback / Logs

LiteLLM completion() model= meta/llama-3.1-70b-instruct; provider = openai
[TIMESTAMP] [INFO] 
[TIMESTAMP] [INFO] Retrying request to /chat/completions in 0.492651 seconds
[TIMESTAMP] [INFO] Retrying request to /chat/completions in 0.950194 seconds
[TIMESTAMP] [ERROR] Error: litellm.RateLimitError: RateLimitError: OpenAIException - Error code: 429 - {'status': 429, 'title': 'Too Many Requests'}
...
[TIMESTAMP] [ERROR] Max retries reached for prompt: 
    Your job is to detect if there is a table of content provided in the given text...
[TIMESTAMP] [ERROR] Failed to extract JSON: Expecting value: line 1 column 1 (char 0)
[TIMESTAMP] [ERROR] Failed to parse JSON even after cleanup

============================================================
Traceback (most recent call last):
  File "reproduce_429.py", line 116, in main
    result = page_index(
        doc=pdf_path,
        model=model_name,
        if_add_node_id="yes",
        if_add_node_summary="yes",
        if_add_doc_description="yes"
    )
  File "pageindex/page_index.py", line 1121, in page_index
    return page_index_main(doc, opt)
  File "pageindex/page_index.py", line 1110, in page_index_main
    return asyncio.run(page_index_builder())
  File "Lib/asyncio/runners.py", line 195, in run
    return runner.run(main)
  File "Lib/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "Lib/asyncio/base_events.py", line 725, in run_until_complete
    return future.result()
  File "pageindex/page_index.py", line 1083, in page_index_builder
    structure = await tree_parser(page_list, opt, doc=doc, logger=logger)
  File "pageindex/page_index.py", line 1030, in tree_parser
    check_toc_result = check_toc(page_list, opt)
  File "pageindex/page_index.py", line 697, in check_toc
    toc_page_list = find_toc_pages(start_page_index=0, page_list=page_list, opt=opt)
  File "pageindex/page_index.py", line 351, in find_toc_pages
    detected_result = toc_detector_single_page(page_list[i][0],model=opt.model)
  File "pageindex/page_index.py", line 122, in toc_detector_single_page
    return json_content['toc_detected']
           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: 'toc_detected'
============================================================
