Remove graphic elements #1965
Greptile Summary
This PR removes the graphic elements pipeline end-to-end: the
| Filename | Overview |
|---|---|
| nemo_retriever/src/nemo_retriever/params/models.py | Removes use_graphic_elements and graphic_elements_invoke_url from ExtractParams, and deletes InfographicParams entirely — breaking public API without a deprecation cycle. |
| nemo_retriever/src/nemo_retriever/infographic/__init__.py | Removes detection-class exports (InfographicDetectionActor family) from __all__ and the corresponding mandatory imports; public API broken without deprecation. |
| nemo_retriever/src/nemo_retriever/ingest-config.yaml | Removes YOLOX graphic-elements endpoint documentation from the chart section while leaving yolox_endpoints: null in place without explanation. |
| nemo_retriever/src/nemo_retriever/graph/ingestor_runtime.py | Removes all GraphicElementsActor wiring, batch-tuning overrides, and ge_concurrency accounting from the Ray graph builder; cleanup appears complete. |
| nemo_retriever/src/nemo_retriever/ocr/shared.py | Removes use_graphic_elements parameter from ocr_page_elements, _find_ge_detections_for_bbox helper, and the two graphic-element joining code paths; clean removal. |
| nemo_retriever/src/nemo_retriever/utils/table_and_chart.py | Removes match_bboxes, _join_yolox_graphic_elements_and_ocr_output, process_yolox_graphic_elements, and join_graphic_elements_and_ocr_output — all specific to the removed graphic-elements pipeline. |
| nemo_retriever/src/nemo_retriever/pipeline/main.py | Removes --use-graphic-elements and --graphic-elements-invoke-url CLI flags and their corresponding plumbing through _build_extract_params; consistent with model removal. |
| nemo_retriever/src/nemo_retriever/local/stages/stage999_post_mortem_analysis.py | Renumbers stage sidecar keys (stage3→table_structure, stage4→ocr) and updates all downstream references consistently. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[ExtractParams\nextract_charts=True] --> B{Before PR}
A --> C{After PR}
B --> D{use_graphic_elements?}
D -- Yes --> E[GraphicElementsActor\nYOLOX GE model + OCR\nenrich_graphic_elements stage]
D -- No --> F[OCRActor\nenrich_chart stage]
C --> G[OCRActor\nenrich_chart stage\ndirect path only]
E -.->|REMOVED| X1[chart/chart_detection.py\nchart/shared.py\nchart/cpu_actor.py\nchart/gpu_actor.py]
E -.->|REMOVED| X2[model/local/nemotron_graphic_elements_v1.py\ninfographic/infographic_detection.py]
E -.->|REMOVED| X3[ExtractParams.use_graphic_elements\nExtractParams.graphic_elements_invoke_url\nInfographicParams\nNemotronGraphicElementsV1]
style E fill:#f96,stroke:#c33
style X1 fill:#f96,stroke:#c33
style X2 fill:#f96,stroke:#c33
style X3 fill:#f96,stroke:#c33
style G fill:#9f9,stroke:#393
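Read as code, the flowchart reduces to a single routing decision. The following is a minimal sketch only: `select_chart_stage` and its `after_pr` switch are hypothetical and do not exist in nemo_retriever, and the returned strings simply echo the node labels in the diagram.

```python
from typing import Optional


def select_chart_stage(
    extract_charts: bool,
    use_graphic_elements: bool = False,
    *,
    after_pr: bool = True,
) -> Optional[str]:
    """Return the actor label the flowchart routes chart pages to.

    Hypothetical helper for illustration; labels mirror the diagram nodes.
    """
    if not extract_charts:
        # The diagram only covers extract_charts=True.
        return None
    if after_pr:
        # After the PR there is a single, direct path.
        return "OCRActor"
    # Before the PR, the flag chose between the two actors.
    return "GraphicElementsActor" if use_graphic_elements else "OCRActor"
```

With `after_pr=True` the `use_graphic_elements` argument has no effect, which is the behavior change the diagram highlights.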
Comments Outside Diff (1)
nemo_retriever/src/nemo_retriever/ingest-config.yaml, lines 159-180

**`yolox_endpoints` left undocumented after comment removal**

The comments explaining that `yolox_endpoints` in `chart.endpoint_config` referred to the YOLOX *graphic-elements* model, including the docker-compose port mapping and in-network hostname examples, were removed. The `yolox_endpoints: null` key remains in the file, and `chart/config.py` still processes it via `load_chart_extractor_schema_from_dict`. Without the context comment, operators configuring this file have no indication of what `yolox_endpoints` points to, what service to run, or what port to use. If this key is now dead config (no downstream consumer in `ChartExtractorSchema`), it should be removed; if it still serves a purpose, the comment should be restored.
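One way to settle whether `yolox_endpoints` is dead config is to compare the keys present under `endpoint_config` with the fields the consuming schema declares. The sketch below uses a stand-in dataclass: `ChartEndpointConfig`, its field names, and `find_unconsumed_keys` are all hypothetical, and the stand-in deliberately omits `yolox_endpoints` to show what a dead key would look like. It says nothing about what the real `ChartExtractorSchema` contains.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class ChartEndpointConfig:
    """Hypothetical stand-in for the schema that consumes chart.endpoint_config."""

    ocr_endpoints: Optional[List[Optional[str]]] = None  # assumed field name
    auth_token: Optional[str] = None  # assumed field name


def find_unconsumed_keys(config: Dict[str, Any], schema: type) -> List[str]:
    """Return config keys that the schema dataclass does not declare."""
    declared = {f.name for f in fields(schema)}
    return sorted(key for key in config if key not in declared)


# Illustrative config mirroring the `yolox_endpoints: null` entry from the review.
endpoint_config = {"yolox_endpoints": None, "ocr_endpoints": None, "auth_token": None}
print(find_unconsumed_keys(endpoint_config, ChartEndpointConfig))
```

If the same check against the real schema reports `yolox_endpoints`, the key can be dropped from the YAML; if not, the explanatory comment should come back.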
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
nemo_retriever/src/nemo_retriever/params/models.py:281-300
**Breaking public API removals without deprecation**
Several items that were part of the stable public API surface are removed in this PR with no deprecation cycle or migration note. Any existing user code relying on these will break at import or call time:
- `ExtractParams.use_graphic_elements` and `ExtractParams.graphic_elements_invoke_url` removed — callers that construct `ExtractParams(use_graphic_elements=True, ...)` will now receive a Pydantic `ValidationError` for unexpected fields.
- `InfographicParams` removed from `nemo_retriever.params` (`params/__init__.py`) — `from nemo_retriever.params import InfographicParams` will raise `ImportError`.
- `NemotronGraphicElementsV1` removed from `nemo_retriever.model.local` (`model/local/__init__.py`).
- `InfographicDetectionActor`, `InfographicDetectionCPUActor`, `InfographicDetectionGPUActor`, `detect_infographic_elements_v1` removed from `nemo_retriever.infographic` (`infographic/__init__.py`).
Per the `api-backward-compatibility` rule, removing or renaming public parameters requires a deprecation cycle (emit `warnings.warn(..., DeprecationWarning)` for one release before hard removal) and migration documentation.
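A deprecation cycle for the two removed fields could look roughly like the sketch below. It is stdlib-only and hedged accordingly: `strip_deprecated_fields` is a hypothetical helper, the warning wording is illustrative, and how it would be wired into `ExtractParams` (for example from a pre-validation hook) is an assumption not shown here.

```python
import warnings
from typing import Any, Dict

# Field names come from the review; the reason strings are illustrative wording.
_DEPRECATED_FIELDS = {
    "use_graphic_elements": "the graphic-elements pipeline has been removed",
    "graphic_elements_invoke_url": "the graphic-elements pipeline has been removed",
}


def strip_deprecated_fields(values: Dict[str, Any]) -> Dict[str, Any]:
    """Warn about deprecated keys and return a copy of `values` without them."""
    cleaned = dict(values)
    for name, reason in _DEPRECATED_FIELDS.items():
        if name in cleaned:
            warnings.warn(
                f"'{name}' is deprecated and ignored: {reason}.",
                DeprecationWarning,
                stacklevel=2,
            )
            del cleaned[name]
    return cleaned
```

Callers that still pass `use_graphic_elements=True` would then see a `DeprecationWarning` for one release instead of an immediate validation failure, and the shim can be deleted at hard removal.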
### Issue 2 of 2
nemo_retriever/src/nemo_retriever/ingest-config.yaml:159-180
**`yolox_endpoints` left undocumented after comment removal**
The comments explaining that `yolox_endpoints` in `chart.endpoint_config` referred to the YOLOX *graphic-elements* model — including the docker-compose port mapping and in-network hostname examples — were removed. The `yolox_endpoints: null` key remains in the file, and `chart/config.py` still processes it via `load_chart_extractor_schema_from_dict`. Without the context comment, operators configuring this file have no indication of what `yolox_endpoints` points to, what service to run, or what port to use. If this key is now dead config (no downstream consumer in `ChartExtractorSchema`), it should be removed; if it still serves a purpose, the comment should be restored.
Reviews (1): Last reviewed commit: "lint"