[Bug] Unthrottled concurrent LLM requests lead to HTTP 429 Rate Limits and cascading KeyError in tree generation #283

When building a hierarchical tree index on moderately long documents (such as multi-page PDFs), the library issues LLM API calls in parallel with no client-side throttling or concurrency limit. On public, credit-capped, or tier-limited LLM endpoints (such as OpenAI, Anthropic, or NVIDIA NIM), this burst of concurrent requests frequently triggers **HTTP Error 429 (Too Many Requests)**.

Furthermore, this rate-limit failure cascades into a hard crash with a **`KeyError`** (e.g., `KeyError: 'toc_detected'`). When the LLM API repeatedly fails with 429 errors and exhausts all retries, the completion wrapper returns an empty string `""`. The JSON extractor parses this empty response into an empty dictionary `{}`. The calling code then accesses required JSON keys directly, without validation or exception handling, so the lookup raises and the entire tree index generation process crashes.
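
The chain described above can be reproduced without any network access, and a guarded lookup is one way to turn the bare `KeyError` into a clearer failure. The following is a sketch under stated assumptions: `extract_json_stub` is a hypothetical stand-in that only mimics the "return `{}` on failure" behavior reported here, and `get_llm_field` and `LLMResponseError` are illustrative names that do not exist in the library.

```python
import json


def extract_json_stub(text):
    """Hypothetical stand-in for extract_json: returns {} when parsing fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {}


class LLMResponseError(RuntimeError):
    """Raised when a parsed LLM reply lacks a key the caller needs."""


_MISSING = object()  # sentinel so that None can still be passed as a default


def get_llm_field(parsed, key, default=_MISSING):
    """Return parsed[key], or `default` if given, else raise a descriptive error."""
    if isinstance(parsed, dict) and key in parsed:
        return parsed[key]
    if default is not _MISSING:
        return default
    raise LLMResponseError(
        f"LLM response has no '{key}' field (empty or malformed reply)"
    )


if __name__ == "__main__":
    empty_reply = extract_json_stub("")  # what an exhausted retry loop leads to
    print(get_llm_field(empty_reply, "toc_detected", default="no"))
    try:
        get_llm_field(empty_reply, "toc_detected")
    except LLMResponseError as exc:
        print("caught:", exc)
```

Whether a missing key should fall back to a default or abort the run is a design decision for the maintainers. Raising a dedicated error at least points at the failed LLM call instead of at the dictionary lookup.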

---

## **Root Cause Details**

1. **Unthrottled `asyncio.gather` Calls:**
   - **File:** `pageindex/utils.py` (inside `generate_summaries_for_structure`):
     Runs all page node summaries concurrently via `asyncio.gather(*tasks)`. For long documents with dozens of pages, this instantly floods the endpoint with concurrent requests.
   - **File:** `pageindex/page_index.py` (inside `process_large_node_recursively`):
     Processes child nodes concurrently, creating nested concurrent branches of LLM requests without limits.
   - **File:** `pageindex/page_index_md.py` (inside `process_large_node_recursively`):
     Likewise processes child nodes concurrently, with no limit on the number of parallel tasks.

2. **Fixed-Delay Retries (No Backoff/Jitter):**
   - The retry loop in `llm_completion` and `llm_acompletion` uses a fixed 1-second delay (`time.sleep(1)` / `await asyncio.sleep(1)`). Because concurrent requests that fail together also retry together one second later, each retry wave hits the rate limit again and the retry budget is exhausted quickly.

3. **Cascading `KeyError` due to Unsafe JSON Parsing:**
   - When all retries are exhausted, the completion wrapper returns an empty string `""`.
   - In `pageindex/page_index.py` (inside `toc_detector_single_page`):
     ```python
     response = llm_completion(model=model, prompt=prompt)
     json_content = extract_json(response)
     return json_content['toc_detected']
     ```
     When `response` is empty, `extract_json` returns `{}` (an empty dict), so the lookup raises `KeyError: 'toc_detected'`.
   
   - **This is a recurring pattern across the codebase.** The same crash vulnerability exists in multiple functions that perform direct dictionary lookup on parsed LLM JSON outputs without checking key existence:
     - `check_if_toc_extraction_is_complete` (in `pageindex/page_index.py`):
       ```python
       json_content = extract_json(response)
       return json_content['completed']  # Vulnerable to KeyError: 'completed'
       ```
     - `check_if_toc_transformation_is_complete` (in `pageindex/page_index.py`):
       ```python
       json_content = extract_json(response)
       return json_content['completed']  # Vulnerable to KeyError: 'completed'
       ```
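
For the fixed-delay retry in item 2, a common alternative is exponential backoff with full jitter, so that requests which failed together do not retry at the same instant. The sketch below is not the library's retry loop: `call_with_backoff`, `backoff_delay`, and the parameters `base` and `cap` are hypothetical names, and it re-raises the last error instead of returning `""`, which is an assumption about the desired behavior.

```python
import asyncio
import random


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random):
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


async def call_with_backoff(make_call, max_retries=5, base=1.0, cap=30.0,
                            sleep=asyncio.sleep):
    """Await make_call() until it succeeds or max_retries attempts have failed."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    last_error = None
    for attempt in range(max_retries):
        try:
            return await make_call()
        except Exception as exc:  # a real version would catch only rate-limit errors
            last_error = exc
            if attempt < max_retries - 1:
                # Randomized, growing delay spreads concurrent retries apart.
                await sleep(backoff_delay(attempt, base, cap))
    # Surface the failure instead of handing an empty string to the JSON parser.
    raise last_error


if __name__ == "__main__":
    attempts = {"count": 0}

    async def flaky_call():
        # Hypothetical request that is rejected twice before succeeding.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RuntimeError("429 Too Many Requests")
        return "ok"

    print(asyncio.run(call_with_backoff(flaky_call, base=0.01)))
```

The `sleep` parameter exists only so the delay can be replaced in tests. If the server sends a `Retry-After` header, honoring it would be preferable to a computed delay.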

---

## **Environment Info**
- **Python Version:** 3.x
- **OS:** Windows / Linux / macOS
- **LLM Integration:** LiteLLM v1.83.7
- **Target Model:** `meta/llama-3.1-70b-instruct` (reproduced using NVIDIA NIM endpoint)

---

## **Steps to Reproduce**

1. Configure a script to build a tree index on a multi-page PDF using a rate-limited endpoint/model (e.g. `meta/llama-3.1-70b-instruct` or a lower-tier OpenAI key).
2. Configure tree building options to generate node summaries:
   ```python
   from pageindex import page_index

   # if_add_node_summary="yes" enables the concurrent summary calls described above.
   result = page_index(
       doc="sample_document.pdf",
       model="openai/meta/llama-3.1-70b-instruct",
       if_add_node_id="yes",
       if_add_node_summary="yes",
       if_add_doc_description="yes",
   )
   ```
3. Run the script. The initial concurrent tasks will immediately exhaust the API's rate limits, fail all retries, and result in the `KeyError` crash.

---

## **Observed Traceback / Logs**

```text
LiteLLM completion() model= meta/llama-3.1-70b-instruct; provider = openai
[TIMESTAMP] [INFO] 
[TIMESTAMP] [INFO] Retrying request to /chat/completions in 0.492651 seconds
[TIMESTAMP] [INFO] Retrying request to /chat/completions in 0.950194 seconds
[TIMESTAMP] [ERROR] Error: litellm.RateLimitError: RateLimitError: OpenAIException - Error code: 429 - {'status': 429, 'title': 'Too Many Requests'}
...
[TIMESTAMP] [ERROR] Max retries reached for prompt: 
    Your job is to detect if there is a table of content provided in the given text...
[TIMESTAMP] [ERROR] Failed to extract JSON: Expecting value: line 1 column 1 (char 0)
[TIMESTAMP] [ERROR] Failed to parse JSON even after cleanup

============================================================
Traceback (most recent call last):
  File "reproduce_429.py", line 116, in main
    result = page_index(
        doc=pdf_path,
        model=model_name,
        if_add_node_id="yes",
        if_add_node_summary="yes",
        if_add_doc_description="yes"
    )
  File "pageindex/page_index.py", line 1121, in page_index
    return page_index_main(doc, opt)
  File "pageindex/page_index.py", line 1110, in page_index_main
    return asyncio.run(page_index_builder())
  File "Lib/asyncio/runners.py", line 195, in run
    return runner.run(main)
  File "Lib/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "Lib/asyncio/base_events.py", line 725, in run_until_complete
    return future.result()
  File "pageindex/page_index.py", line 1083, in page_index_builder
    structure = await tree_parser(page_list, opt, doc=doc, logger=logger)
  File "pageindex/page_index.py", line 1030, in tree_parser
    check_toc_result = check_toc(page_list, opt)
  File "pageindex/page_index.py", line 697, in check_toc
    toc_page_list = find_toc_pages(start_page_index=0, page_list=page_list, opt=opt)
  File "pageindex/page_index.py", line 351, in find_toc_pages
    detected_result = toc_detector_single_page(page_list[i][0],model=opt.model)
  File "pageindex/page_index.py", line 122, in toc_detector_single_page
    return json_content['toc_detected']
           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: 'toc_detected'
============================================================
```

---


