Skip to content

Remove graphic elements#1965

Open
edknv wants to merge 7 commits into
NVIDIA:mainfrom
edknv:edwardk/remove-graphic-elements-0
Open

Remove graphic elements#1965
edknv wants to merge 7 commits into
NVIDIA:mainfrom
edknv:edwardk/remove-graphic-elements-0

Conversation

@edknv
Copy link
Copy Markdown
Collaborator

@edknv edknv commented May 4, 2026

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables have you ensured those are mimicked in the Helm values.yaml file.

@edknv edknv marked this pull request as ready for review May 5, 2026 00:20
@edknv edknv requested review from a team as code owners May 5, 2026 00:20
@edknv edknv requested a review from nkmcalli May 5, 2026 00:20
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 5, 2026

Greptile Summary

This PR removes the graphic elements pipeline end-to-end: the GraphicElementsActor Ray operator, NemotronGraphicElementsV1 local model, InfographicDetectionActor family, related utilities in table_and_chart.py and ocr/shared.py, CLI flags, and ~3,000 lines of associated code. The mechanical cleanup is thorough and internally consistent across graph wiring, stage registry, batch tuning, tests, and config.

  • P1 – Breaking public API removals without deprecation: ExtractParams.use_graphic_elements, ExtractParams.graphic_elements_invoke_url, InfographicParams, NemotronGraphicElementsV1, and the InfographicDetectionActor family are all removed from their respective public __all__ exports with no deprecation warnings or migration documentation. Callers constructing ExtractParams(use_graphic_elements=True) will receive a Pydantic ValidationError; callers importing InfographicParams will get an ImportError.
  • P2 – yolox_endpoints left in ingest-config.yaml without explanation: The comments explaining its purpose and port mapping were removed; the key remains and is still processed by chart/config.py.

Confidence Score: 3/5

Hold for deprecation — the removal of multiple public API symbols without a migration path will break existing user code on upgrade.

A P1 finding (multiple breaking public API removals) is present. The mechanical removal is clean and correct, but public contracts (ExtractParams fields, InfographicParams, NemotronGraphicElementsV1, detection actors) are dropped with no deprecation warnings, violating the project's explicit backward-compatibility rule and the score ceiling for a P1 finding.

nemo_retriever/src/nemo_retriever/params/models.py, nemo_retriever/src/nemo_retriever/params/init.py, nemo_retriever/src/nemo_retriever/infographic/init.py, nemo_retriever/src/nemo_retriever/model/local/init.py

Important Files Changed

Filename Overview
nemo_retriever/src/nemo_retriever/params/models.py Removes use_graphic_elements and graphic_elements_invoke_url from ExtractParams, and deletes InfographicParams entirely — breaking public API without a deprecation cycle.
nemo_retriever/src/nemo_retriever/infographic/init.py Removes detection-class exports (InfographicDetectionActor family) from __all__ and the corresponding mandatory imports — public API broken without deprecation.
nemo_retriever/src/nemo_retriever/ingest-config.yaml Removes YOLOX graphic-elements endpoint documentation from the chart section while leaving yolox_endpoints: null in place without explanation.
nemo_retriever/src/nemo_retriever/graph/ingestor_runtime.py Removes all GraphicElementsActor wiring, batch-tuning overrides, and ge_concurrency accounting from the Ray graph builder; cleanup appears complete.
nemo_retriever/src/nemo_retriever/ocr/shared.py Removes use_graphic_elements parameter from ocr_page_elements, _find_ge_detections_for_bbox helper, and the two graphic-element joining code paths; clean removal.
nemo_retriever/src/nemo_retriever/utils/table_and_chart.py Removes match_bboxes, _join_yolox_graphic_elements_and_ocr_output, process_yolox_graphic_elements, and join_graphic_elements_and_ocr_output — all specific to the removed graphic-elements pipeline.
nemo_retriever/src/nemo_retriever/pipeline/main.py Removes --use-graphic-elements and --graphic-elements-invoke-url CLI flags and their corresponding plumbing through _build_extract_params; consistent with model removal.
nemo_retriever/src/nemo_retriever/local/stages/stage999_post_mortem_analysis.py Renumbers stage sidecar keys (stage3→table_structure, stage4→ocr) and updates all downstream references consistently.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[ExtractParams\nextract_charts=True] --> B{Before PR}
    A --> C{After PR}

    B --> D{use_graphic_elements?}
    D -- Yes --> E[GraphicElementsActor\nYOLOX GE model + OCR\nenrich_graphic_elements stage]
    D -- No --> F[OCRActor\nenrich_chart stage]

    C --> G[OCRActor\nenrich_chart stage\ndirect path only]

    E -.->|REMOVED| X1[chart/chart_detection.py\nchart/shared.py\nchart/cpu_actor.py\nchart/gpu_actor.py]
    E -.->|REMOVED| X2[model/local/nemotron_graphic_elements_v1.py\ninfographic/infographic_detection.py]
    E -.->|REMOVED| X3[ExtractParams.use_graphic_elements\nExtractParams.graphic_elements_invoke_url\nInfographicParams\nNemotronGraphicElementsV1]

    style E fill:#f96,stroke:#c33
    style X1 fill:#f96,stroke:#c33
    style X2 fill:#f96,stroke:#c33
    style X3 fill:#f96,stroke:#c33
    style G fill:#9f9,stroke:#393
Loading

Comments Outside Diff (1)

  1. nemo_retriever/src/nemo_retriever/ingest-config.yaml, line 159-180 (link)

    P2 yolox_endpoints left undocumented after comment removal

    The comments explaining that yolox_endpoints in chart.endpoint_config referred to the YOLOX graphic-elements model — including the docker-compose port mapping and in-network hostname examples — were removed. The yolox_endpoints: null key remains in the file, and chart/config.py still processes it via load_chart_extractor_schema_from_dict. Without the context comment, operators configuring this file have no indication of what yolox_endpoints points to, what service to run, or what port to use. If this key is now dead config (no downstream consumer in ChartExtractorSchema), it should be removed; if it still serves a purpose, the comment should be restored.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: nemo_retriever/src/nemo_retriever/ingest-config.yaml
    Line: 159-180
    
    Comment:
    **`yolox_endpoints` left undocumented after comment removal**
    
    The comments explaining that `yolox_endpoints` in `chart.endpoint_config` referred to the YOLOX *graphic-elements* model — including the docker-compose port mapping and in-network hostname examples — were removed. The `yolox_endpoints: null` key remains in the file, and `chart/config.py` still processes it via `load_chart_extractor_schema_from_dict`. Without the context comment, operators configuring this file have no indication of what `yolox_endpoints` points to, what service to run, or what port to use. If this key is now dead config (no downstream consumer in `ChartExtractorSchema`), it should be removed; if it still serves a purpose, the comment should be restored.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
nemo_retriever/src/nemo_retriever/params/models.py:281-300
**Breaking public API removals without deprecation**

Several items that were part of the stable public API surface are removed in this PR with no deprecation cycle or migration note. Any existing user code relying on these will break at import or call time:

- `ExtractParams.use_graphic_elements` and `ExtractParams.graphic_elements_invoke_url` removed — callers that construct `ExtractParams(use_graphic_elements=True, ...)` will now receive a Pydantic `ValidationError` for unexpected fields.
- `InfographicParams` removed from `nemo_retriever.params` (`params/__init__.py`) — `from nemo_retriever.params import InfographicParams` will raise `ImportError`.
- `NemotronGraphicElementsV1` removed from `nemo_retriever.model.local` (`model/local/__init__.py`).
- `InfographicDetectionActor`, `InfographicDetectionCPUActor`, `InfographicDetectionGPUActor`, `detect_infographic_elements_v1` removed from `nemo_retriever.infographic` (`infographic/__init__.py`).

Per the `api-backward-compatibility` rule, removing or renaming public parameters requires a deprecation cycle (emit `warnings.warn(..., DeprecationWarning)` for one release before hard removal) and migration documentation.

### Issue 2 of 2
nemo_retriever/src/nemo_retriever/ingest-config.yaml:159-180
**`yolox_endpoints` left undocumented after comment removal**

The comments explaining that `yolox_endpoints` in `chart.endpoint_config` referred to the YOLOX *graphic-elements* model — including the docker-compose port mapping and in-network hostname examples — were removed. The `yolox_endpoints: null` key remains in the file, and `chart/config.py` still processes it via `load_chart_extractor_schema_from_dict`. Without the context comment, operators configuring this file have no indication of what `yolox_endpoints` points to, what service to run, or what port to use. If this key is now dead config (no downstream consumer in `ChartExtractorSchema`), it should be removed; if it still serves a purpose, the comment should be restored.

Reviews (1): Last reviewed commit: "lint" | Re-trigger Greptile

Comment thread nemo_retriever/src/nemo_retriever/params/models.py
@edknv edknv requested review from jperez999 and removed request for nkmcalli May 5, 2026 03:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant