Added function to get merged cell range #35

harumiWeb · 2026-01-05T03:04:40Z

Summary by CodeRabbit

New Features
- Extract merged-cell ranges from Excel and opt to include/exclude them in outputs
- New output/filter option to control merged-cell inclusion
- Added rich example demonstrating form reconstruction with many merged cells
Documentation
- Updated READMEs and docs with merged-cell details, examples, and usage notes
- Added sample application forms and a Gantt chart example for real-world scenarios

_{✏️ Tip: You can customize this high-level summary in your review settings.}

- Pipeline/Modeling に merged_cells を追加して統合 - Engine に抽出オプションと出力フィルタ（include_merged_cells）を追加

coderabbitai · 2026-01-05T03:04:49Z

📝 Walkthrough

Walkthrough

A new feature adds merged cell range extraction and filtering across ExStruct: MergedCell model, extraction via OpenPyXL, backend protocol extension, pipeline step, engine options and output filtering, updated modeling, tests, samples, and docs reflecting merged_cells support.

Changes

Cohort / File(s)	Summary
Documentation `README.md`, `README.ja.md`, `docs/README.en.md`, `docs/README.ja.md`, `docs/release-notes/v0.3.2.md`, `mkdocs.yml`	Added/updated examples and descriptions to include merged cell ranges, renamed benchmark → example, added Example 2 and release notes, and updated navigation.
Data model & public API surface `src/exstruct/models/__init__.py`, `docs/agents/DATA_MODEL.md`, `docs/agents/FEATURE_SPEC.md`, `pyproject.toml`	Added `MergedCell` type and `SheetData.merged_cells`; bumped docs/spec versions and project version to v0.3.2.
Core extraction utilities `src/exstruct/core/cells.py`, `src/exstruct/core/backends/base.py`	New function `extract_sheet_merged_cells(Path) -> dict[str, list[MergedCell]]`; added `MergedCellData` alias and `Backend.extract_merged_cells()` method.
Backends `src/exstruct/core/backends/openpyxl_backend.py`, `src/exstruct/core/backends/com_backend.py`	Openpyxl backend implements `extract_merged_cells()` delegating to cell utils with safe-fallback; COM backend exposes method raising `NotImplementedError`.
Pipeline & integration `src/exstruct/core/pipeline.py`, `src/exstruct/core/integrate.py`, `src/exstruct/core/modeling.py`	Added include_merged_cells flag to inputs, artifacts, and resolve logic; new pipeline step `step_extract_merged_cells_openpyxl`; threaded merged_cell_data through raw collection and workbook building; `extract_workbook()` gains `include_merged_cells` param.
Engine & filtering `src/exstruct/engine.py`	`StructOptions` and `FilterOptions` gain `include_merged_cells`; engine forwards option to extraction and conditionally filters merged_cells from outputs.
Samples Forms: `sample/forms_with_many_merged_cells/en_form_sf425/`, `sample/forms_with_many_merged_cells/ja_general_form/` Gantt: `sample/gantt_chart/en/*`, (deleted) `sample/gantt_chart/sample_gantt.json`, `sample/gantt_chart/sample_gantt_json_for_llm.md`	Added merged-cell–heavy form samples (EN/JA) and notebook guidance; reorganized Gantt samples into en/ with new example script and JSON; removed older Japanese Gantt samples.
Tests — core & backends `tests/core/test_merged_cells_core.py`, `tests/backends/test_merged_cells.py`, `tests/backends/test_auto_page_breaks.py`	Added unit tests for `extract_sheet_merged_cells`, Openpyxl backend success/failure, and COM backend NotImplemented behavior; adjusted helpers to accept new flag.
Tests — pipeline & integration `tests/core/test_pipeline.py`, `tests/core/test_pipeline_fallbacks.py`, `tests/integration/test_integrate_raw_data.py`	Updated pipeline tests to expect merged-cells step presence/absence based on flag; propagated merged_cell_data in integration tests.
Tests — modeling, engine & export `tests/models/test_modeling.py`, `tests/engine/test_engine.py`, `tests/export/test_export_requirements.py`	Updated tests to include merged_cells in raw/model data; added test ensuring empty merged_cells is omitted from serialized JSON.
Planning / QA docs `docs/agents/TASKS.md`, `docs/agents/TEST_REQUIREMENTS.md`, `docs/agents/ROADMAP.md`	Converted SmartArt-related plans to merged-cell–focused tasks; added test cases (MRG-01..MRG-04), EXP-25, roadmap entries v0.3.2/v0.3.3, and coverage target updates.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Engine as ExStructEngine
    participant Pipeline as Extraction Pipeline
    participant Backend as OpenpyxlBackend
    participant Cells as Cell Utils
    participant Models as Modeling

    Client->>Engine: extract(file, mode, include_merged_cells=True)
    Engine->>Pipeline: extract_workbook(..., include_merged_cells=True)
    Pipeline->>Pipeline: resolve_extraction_inputs(...)

    alt include_merged_cells == True
        Pipeline->>Backend: extract_merged_cells(file)
        Backend->>Cells: extract_sheet_merged_cells(file)
        Cells->>Cells: iterate ws.merged_cells.ranges\ncompute (r1,c1,r2,c2) and top-left value
        Cells-->>Backend: dict[sheet -> list[MergedCell]]
        Backend-->>Pipeline: merged_cell_data
        Pipeline->>Pipeline: step_extract_merged_cells_openpyxl()
        Pipeline->>Models: collect_sheet_raw_data(..., merged_cell_data)
    else
        Pipeline->>Models: collect_sheet_raw_data(..., merged_cells=[])
    end

    Pipeline->>Models: build_sheet_data(raw.merged_cells)
    Models->>Engine: WorkbookData(sheets with merged_cells)
    alt FilterOptions.include_merged_cells == True
        Engine-->>Client: WorkbookData includes merged_cells
    else
        Engine-->>Client: WorkbookData omits merged_cells
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Dev/refactor #23 – Refactors core pipeline/backends/models that this PR extends; closely related at the code/interface level.

Poem

🐇 I hop through rows where ranges hide,
I stitch the cells that once were wide.
r1 to c2, a tidy sweep,
Merged ranges saved for LLMs to keep.
Hooray—JSON tidy, snug, and neat! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 46.55% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Added function to get merged cell range' accurately describes the primary feature introduced in this PR—merged cell extraction functionality across multiple backend implementations and data models.

✨ Finishing touches

📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov-commenter · 2026-01-05T03:05:38Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 95.83333% with 2 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/exstruct/core/cells.py	88.88%	2 Missing ⚠️

📢 Thoughts on this report? Let us know!

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (8)

sample/gantt_chart/en/sample.json_for_llm.md (1)
3-3: Optional: Consider using a proper heading level instead of emphasis.

Line 3 uses italic emphasis for the subtitle. While this is stylistically acceptable, consider using a lower-level heading (e.g., ### (Original, Redistributable Template)) for better semantic structure and accessibility.
🔎 Proposed fix
-_(Original, Redistributable Template)_
+### (Original, Redistributable Template)
src/exstruct/models/__init__.py (1)
79-87: Suggest adding Field descriptor to v for consistency.

The v field (line 86) uses a simple default value "" while all other fields in MergedCell use Field() descriptors with descriptions. For consistency and clarity, consider adding a Field descriptor with a description.
🔎 Proposed fix
-    v: str = ""
+    v: str = Field(default="", description="Representative value of the merged cell range.")
tests/export/test_export_requirements.py (1)
117-120: Parse JSON instead of substring checking for more robust assertions.

The current test performs substring matching on the JSON string ("merged_cells" not in payload), which is fragile. If "merged_cells" appeared anywhere in the JSON (e.g., in a value or nested key), the test would incorrectly fail.
🔎 Proposed refactor to parse and validate JSON structure
 def test_merged_cells_empty_is_omitted_in_sheet_json() -> None:
     sheet = SheetData(rows=[CellRow(r=1, c={"0": "v"})], merged_cells=[])
-    payload = sheet.to_json()
-    assert "merged_cells" not in payload
+    payload_json = sheet.to_json()
+    payload = json.loads(payload_json)
+    assert "merged_cells" not in payload
src/exstruct/core/backends/com_backend.py (1)
157-159: Enhance docstring to match the file's Google-style pattern.

Other methods in this file (extract_print_areas, extract_colors_map, extract_auto_page_breaks) use Google-style docstrings with Returns and/or Raises sections. The new extract_merged_cells method should follow the same pattern for consistency.
🔎 Proposed docstring enhancement
     def extract_merged_cells(self) -> MergedCellData:
-        """Extract merged cell ranges via COM (not implemented)."""
+        """Extract merged cell ranges via COM (not implemented).
+
+        Returns:
+            Mapping of sheet name to merged cell ranges.
+
+        Raises:
+            NotImplementedError: COM merged cell extraction is not yet supported.
+        """
         raise NotImplementedError("COM merged cell extraction is not implemented.")
src/exstruct/core/backends/openpyxl_backend.py (1)
91-100: Consider adding logging for consistency and debugging.

The method correctly delegates to extract_sheet_merged_cells and handles failures gracefully. However, unlike the similar extract_colors_map method (lines 67-89), this method silently swallows exceptions. Adding a debug or warning log would improve observability and align with the established pattern.
🔎 Proposed enhancement for consistency
 def extract_merged_cells(self) -> MergedCellData:
     """Extract merged cell ranges per sheet.

     Returns:
         Mapping of sheet name to merged cell ranges.
     """
     try:
         return extract_sheet_merged_cells(self.file_path)
-    except Exception:
+    except Exception as exc:
+        logger.debug(
+            "Merged cells extraction failed; returning empty dict. (%r)", exc
+        )
         return {}
tests/core/test_merged_cells_core.py (2)
8-27: Add docstrings to test functions per coding guidelines.

The helper function and test function are missing Google-style docstrings. Per the project's coding guidelines, all functions should include docstrings.
🔎 Suggested docstrings
 def _make_merged_book(path: Path) -> None:
+    """Create a test workbook with merged cell ranges.
+
+    Args:
+        path: Destination file path for the workbook.
+    """
     wb = Workbook()
 def test_extract_sheet_merged_cells_basic(tmp_path: Path) -> None:
+    """Test extraction of merged cell ranges from a workbook."""
     path = tmp_path / "merged.xlsx"
30-36: Add docstring to test function.
 def test_extract_sheet_merged_cells_empty(tmp_path: Path) -> None:
+    """Test that extraction returns empty list for workbooks without merged cells."""
     path = tmp_path / "plain.xlsx"
sample/forms_with_many_merged_cells/en_form_sf425/sample.json_for_llm.md (1)

1-149: Optional: Address markdown linting issues.

The static analysis tools flagged several markdown style issues, primarily using bold emphasis instead of proper headings (MD036) and heading level increments that skip levels (MD001). While these don't affect functionality, addressing them would improve document structure and consistency.

Since this is sample documentation, these style issues are minor and can be addressed at your discretion.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea60a35 and 9a87f42.

⛔ Files ignored due to path filters (7)

docs/assets/demo_form.ja.png is excluded by !**/*.png
docs/assets/demo_form_en.png is excluded by !**/*.png
sample/forms_with_many_merged_cells/en_form_sf425/sample.xlsx is excluded by !**/*.xlsx
sample/forms_with_many_merged_cells/ja_general_form/ja_form.xlsx is excluded by !**/*.xlsx
sample/gantt_chart/en/sample.xlsx is excluded by !**/*.xlsx
sample/gantt_chart/sample_gantt.xlsx is excluded by !**/*.xlsx
uv.lock is excluded by !**/*.lock

📒 Files selected for processing (38)

README.ja.md
README.md
docs/README.en.md
docs/README.ja.md
docs/agents/DATA_MODEL.md
docs/agents/FEATURE_SPEC.md
docs/agents/ROADMAP.md
docs/agents/TASKS.md
docs/agents/TEST_REQUIREMENTS.md
sample/basic/sample.json_for_llm.md
sample/forms_with_many_merged_cells/en_form_sf425/README.md
sample/forms_with_many_merged_cells/en_form_sf425/sample.json
sample/forms_with_many_merged_cells/en_form_sf425/sample.json_for_llm.md
sample/forms_with_many_merged_cells/ja_general_form/ja_form.json
sample/forms_with_many_merged_cells/ja_general_form/ja_form.json_for_llm.md
sample/gantt_chart/en/example.py
sample/gantt_chart/en/sample.json
sample/gantt_chart/en/sample.json_for_llm.md
sample/gantt_chart/sample_gantt.json
sample/gantt_chart/sample_gantt_json_for_llm.md
src/exstruct/core/backends/base.py
src/exstruct/core/backends/com_backend.py
src/exstruct/core/backends/openpyxl_backend.py
src/exstruct/core/cells.py
src/exstruct/core/integrate.py
src/exstruct/core/modeling.py
src/exstruct/core/pipeline.py
src/exstruct/engine.py
src/exstruct/models/__init__.py
tests/backends/test_auto_page_breaks.py
tests/backends/test_merged_cells.py
tests/core/test_merged_cells_core.py
tests/core/test_pipeline.py
tests/core/test_pipeline_fallbacks.py
tests/engine/test_engine.py
tests/export/test_export_requirements.py
tests/integration/test_integrate_raw_data.py
tests/models/test_modeling.py

💤 Files with no reviewable changes (2)

sample/gantt_chart/sample_gantt_json_for_llm.md
sample/gantt_chart/sample_gantt.json

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Add type hints to all function and method arguments and return values, must be mypy strict compliant
Use Any type only at external library boundaries (xlwings, pandas, numpy, etc.)
Return Pydantic BaseModel instances instead of dictionaries or tuples for structured data
Each function must have a single responsibility; cyclomatic complexity must not exceed 12
Organize imports in the following order: 1) standard library, 2) third-party packages, 3) exstruct internal modules
Add Google-style docstrings to all functions and classes
Do not depend on internal structures of external libraries (xlwings, pandas, numpy); normalize external entities to Pydantic models at the boundary layer
Generated code must pass mypy strict mode with zero errors
Generated code must pass Ruff linter checks (E, W, F, I, B, UP, N, C90 rules) with zero errors
Avoid God Functions (large monolithic functions that do too much)
Avoid God Objects (classes that have too many responsibilities)
Eliminate unnecessary nesting and deeply nested conditional branches
Avoid circular dependencies between modules
Write functions that are testable and avoid unnecessary side effects

Files:

tests/export/test_export_requirements.py
src/exstruct/engine.py
tests/integration/test_integrate_raw_data.py
tests/backends/test_auto_page_breaks.py
src/exstruct/models/__init__.py
tests/core/test_merged_cells_core.py
src/exstruct/core/backends/com_backend.py
src/exstruct/core/cells.py
src/exstruct/core/backends/openpyxl_backend.py
tests/backends/test_merged_cells.py
src/exstruct/core/modeling.py
tests/core/test_pipeline.py
src/exstruct/core/integrate.py
tests/models/test_modeling.py
sample/gantt_chart/en/example.py
src/exstruct/core/pipeline.py
src/exstruct/core/backends/base.py
tests/core/test_pipeline_fallbacks.py
tests/engine/test_engine.py

🧬 Code graph analysis (14)

tests/export/test_export_requirements.py (1)

src/exstruct/models/__init__.py (2)

SheetData (145-237)

CellRow (89-98)

src/exstruct/engine.py (1)

src/exstruct/__init__.py (1)

process_excel (318-398)

tests/core/test_merged_cells_core.py (2)

src/exstruct/core/cells.py (1)

extract_sheet_merged_cells (529-561)

tests/backends/test_merged_cells.py (1)

_make_merged_book (14-20)

src/exstruct/core/backends/com_backend.py (3)

src/exstruct/core/backends/base.py (1)

extract_merged_cells (41-42)

src/exstruct/core/backends/openpyxl_backend.py (1)

extract_merged_cells (91-100)

tests/backends/test_backends.py (2)

test_com_backend_extract_print_areas_handles_sheet_error (104-123)

test_com_backend_extract_colors_map_returns_none_on_failure (78-101)

src/exstruct/core/cells.py (2)

src/exstruct/models/__init__.py (1)

MergedCell (79-86)

src/exstruct/core/workbook.py (1)

openpyxl_workbook (17-56)

src/exstruct/core/backends/openpyxl_backend.py (5)

src/exstruct/core/cells.py (1)

extract_sheet_merged_cells (529-561)

src/exstruct/core/ranges.py (1)

parse_range_zero_based (25-48)

src/exstruct/core/workbook.py (1)

openpyxl_workbook (17-56)

src/exstruct/core/backends/base.py (1)

extract_merged_cells (41-42)

src/exstruct/core/backends/com_backend.py (1)

extract_merged_cells (157-159)

tests/backends/test_merged_cells.py (3)

src/exstruct/core/backends/com_backend.py (1)

ComBackend (20-159)

src/exstruct/core/backends/openpyxl_backend.py (1)

OpenpyxlBackend (26-114)

src/exstruct/models/__init__.py (1)

MergedCell (79-86)

src/exstruct/core/modeling.py (1)

src/exstruct/models/__init__.py (1)

MergedCell (79-86)

tests/core/test_pipeline.py (1)

src/exstruct/core/pipeline.py (3)

build_pre_com_pipeline (255-338)

ExtractionInputs (43-66)

ExtractionArtifacts (70-89)

src/exstruct/core/integrate.py (2)

tests/integration/test_integrate_fallback.py (2)

test_extract_workbook_fallback_includes_print_areas (38-56)

test_extract_workbook_with_links (59-79)

src/exstruct/__init__.py (1)

extract (91-125)

sample/gantt_chart/en/example.py (2)

src/exstruct/engine.py (3)

ColorsOptions (33-56)

ExStructEngine (171-571)

StructOptions (60-84)

src/exstruct/__init__.py (1)

export (128-167)

src/exstruct/core/pipeline.py (4)

src/exstruct/models/__init__.py (2)

MergedCell (79-86)

PrintArea (136-142)

src/exstruct/core/backends/openpyxl_backend.py (2)

OpenpyxlBackend (26-114)

extract_merged_cells (91-100)

src/exstruct/core/backends/base.py (1)

extract_merged_cells (41-42)

src/exstruct/core/backends/com_backend.py (1)

extract_merged_cells (157-159)

src/exstruct/core/backends/base.py (3)

src/exstruct/models/__init__.py (1)

MergedCell (79-86)

src/exstruct/core/backends/com_backend.py (1)

extract_merged_cells (157-159)

src/exstruct/core/backends/openpyxl_backend.py (1)

extract_merged_cells (91-100)

tests/engine/test_engine.py (2)

src/exstruct/models/__init__.py (1)

MergedCell (79-86)

src/exstruct/engine.py (3)

OutputOptions (149-168)

FilterOptions (101-128)

serialize (362-385)

🪛 LanguageTool

sample/forms_with_many_merged_cells/en_form_sf425/sample.json_for_llm.md

[style] ~139-~139: Try using a synonym here to strengthen your wording.
Context: ...ng the collection of information. Send comments regarding the burden estimate or any ot...

(COMMENT_REMARK)

🪛 markdownlint-cli2 (0.18.1)

sample/forms_with_many_merged_cells/ja_general_form/ja_form.json_for_llm.md

32-32: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

sample/gantt_chart/en/sample.json_for_llm.md

3-3: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

sample/forms_with_many_merged_cells/en_form_sf425/sample.json_for_llm.md

5-5: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

11-11: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

15-15: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

21-21: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

55-55: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

57-57: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

59-59: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

65-65: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

121-121: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

125-125: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

147-147: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

docs/README.ja.md

500-500: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

README.ja.md

500-500: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Codacy Static Code Analysis
GitHub Check: test (windows-latest, 3.11)

🔇 Additional comments (49)

docs/agents/ROADMAP.md (1)

39-45: ✓ LGTM — roadmap entries are well-formatted and correctly positioned.

The new version sections (v0.3.2 and v0.3.3) follow the existing document structure and language conventions. Feature descriptions are clear, and version ordering is correct.

sample/forms_with_many_merged_cells/ja_general_form/ja_form.json_for_llm.md (1)

1-110: Well-structured sample documentation.

This Markdown file provides a clear, human-readable representation of the Japanese care insurance application form. The structure, tables, and formatting are appropriate for documentation purposes.

Note: The static analysis hint about line 32 is a false positive—the bold text **（※）...** is correctly used for emphasis on a parenthetical note, not as a heading.

sample/forms_with_many_merged_cells/ja_general_form/ja_form.json (2)

217-363: Well-structured merged_cells data.

The merged_cells array is comprehensive and properly formatted with valid coordinate ranges and optional values. This provides a good test case for the merged cell extraction feature, covering various merge patterns throughout the form.

188-214: No action required—the data structure is valid.

The values -1, -2, -3, and -4 in rows 64–68 are cell values stored in column "0", not column keys. Column keys are always strings (e.g., "0", "1"). The schema explicitly allows integer values, including negative integers, so this is a valid and supported pattern.

Likely an incorrect or invalid review comment.

tests/backends/test_auto_page_breaks.py (1)

30-30: LGTM! Test scaffolding aligned with expanded API.

The new include_merged_cells parameter properly extends the test helper's signature to match the production extract_workbook interface, even though this specific test focuses on auto_page_breaks. The unused parameter is appropriate for a mock function.

src/exstruct/models/__init__.py (1)

173-175: LGTM! Properly integrated merged_cells field.

The merged_cells field is correctly defined with proper type hints, Field descriptor, and description. It follows the established pattern used for other list fields in SheetData.

docs/agents/TASKS.md (1)

1-17: LGTM! Comprehensive test coverage plan.

The task list covers all critical testing areas for the merged cells feature: backend extraction, pipeline propagation, modeling, engine filtering, and export serialization. The coverage target is clearly stated.

sample/gantt_chart/en/sample.json (1)

460-485: LGTM! Valid merged_cells sample data.

The merged_cells array correctly demonstrates the MergedCell model with:

Proper coordinate format (r1/r2 as 1-based rows, c1/c2 as 0-based columns)

Consistent horizontal merges for phase headers and title

Representative values matching the corresponding row data

This sample effectively demonstrates the merged cell feature.

sample/forms_with_many_merged_cells/en_form_sf425/README.md (1)

9-10: URL is valid and SF-425 form is available.

The EPA forms URL (https://www.epa.gov/financial/forms) is accessible and the SF-425 form is available at that location.

tests/core/test_pipeline_fallbacks.py (1)

37-37: LGTM: Consistent API extension for merged cells parameter.

The addition of include_merged_cells=None across all three test functions correctly aligns with the updated resolve_extraction_inputs signature.

Also applies to: 70-70, 111-111

src/exstruct/core/backends/openpyxl_backend.py (1)

16-16: LGTM: Clean imports for merged cells support.

The new imports are properly organized and support the merged cell extraction feature.

Also applies to: 20-20

tests/backends/test_merged_cells.py (1)

23-29: LGTM: Comprehensive test coverage for merged cell extraction.

The three test functions provide good coverage:

Successful extraction from OpenpyxlBackend

Failure handling with graceful fallback

NotImplementedError verification for ComBackend

Also applies to: 32-43, 46-49

tests/integration/test_integrate_raw_data.py (1)

59-59: LGTM: Clean integration of merged_cell_data parameter.

The addition of merged_cell_data={"Sheet1": []} correctly extends the test calls to match the updated collect_sheet_raw_data signature.

Also applies to: 94-94

src/exstruct/core/modeling.py (1)

9-9: LGTM! Clean integration of merged_cells field.

The changes correctly add the merged_cells field to SheetRawData and propagate it through to SheetData via build_sheet_data. Type hints, docstrings, and import organization all follow the project's coding guidelines.

Also applies to: 30-30, 40-40, 73-73

tests/models/test_modeling.py (1)

28-28: LGTM! Correctly accommodates the new field.

The addition of merged_cells=[] appropriately updates the test to include the new required field in SheetRawData.

docs/agents/DATA_MODEL.md (1)

3-3: LGTM! Documentation accurately reflects the new feature.

The documentation correctly describes the new MergedCell model with proper field types and coordinate system conventions (1-based rows, 0-based columns). The version bump, changelog entry, and section renumbering are all appropriate.

Also applies to: 151-167, 180-180, 248-248

tests/engine/test_engine.py (4)

17-17: LGTM!

Import of MergedCell is correctly added to support the new merged cells feature in the sample workbook.

37-37: LGTM!

The include_merged_cells parameter is correctly added to the fake extract function signature, matching the real extract_workbook API.

68-68: LGTM!

The sample workbook now includes a MergedCell instance, enabling comprehensive testing of the merged cells feature throughout the engine tests.

91-97: LGTM!

The test correctly verifies that include_merged_cells=False filters out merged cell data from serialized output. The pattern follows the existing filter tests (test_engine_serialize_filters_shapes, test_engine_serialize_filters_tables).

Note: Unlike test_engine_serialize_filters_tables (line 82), this test correctly omits the unused tmp_path parameter.

src/exstruct/core/cells.py (2)

18-18: LGTM!

Import of MergedCell alongside CellRow is correct and follows the existing import pattern.

529-561: LGTM!

Well-implemented function for extracting merged cell ranges:

Proper type hints and Google-style docstring as per coding guidelines

Correctly uses openpyxl_workbook context manager for resource safety

Handles edge cases: None merged_cells, None cell values

Column indices correctly converted from 1-based (openpyxl) to 0-based (as documented in MergedCell model: "c1: Start column (0-based)")

The implementation aligns with the MergedCell model definition in src/exstruct/models/__init__.py (lines 78-85).

src/exstruct/core/backends/base.py (3)

6-6: LGTM!

Import correctly updated to include MergedCell alongside existing model imports.

11-11: LGTM!

Type alias MergedCellData follows the established pattern of CellData and PrintAreaData.

40-42: LGTM!

The extract_merged_cells method is correctly added to the Backend protocol, following the pattern of existing methods (extract_cells, extract_print_areas). The relevant code snippets confirm that:

OpenpyxlBackend implements this method (returns {} on failure)

ComBackend raises NotImplementedError

This aligns with test requirements BE-06 and BE-07 in docs/agents/TEST_REQUIREMENTS.md.

docs/agents/TEST_REQUIREMENTS.md (5)

3-3: LGTM!

Version bump to 0.5 and explicit 78% coverage requirement are appropriate additions for tracking the merged cells feature release.

Also applies to: 7-7

51-57: LGTM!

Comprehensive test requirements for merged cells:

MRG-01: Mode-specific extraction (standard/verbose only)

MRG-02: Coordinate conventions (1-based row, 0-based column)

MRG-03: Top-left cell value handling

MRG-04: Multiple range retention

These align with the MergedCell model definition and extract_sheet_merged_cells implementation.

154-154: LGTM!

EXP-25 correctly specifies filtering behavior, consistent with the test in tests/engine/test_engine.py (line 91-97).

225-225: LGTM!

PIPE-09 correctly documents conditional pipeline step inclusion for merged cells extraction.

236-237: LGTM!

Backend test requirements:

BE-06: OpenpyxlBackend returns empty map on failure (matches openpyxl_backend.py lines 90-99)

BE-07: ComBackend raises NotImplementedError (matches com_backend.py lines 156-158)

docs/README.ja.md (5)

5-5: LGTM!

Image path format is consistent with the documentation structure.

11-12: LGTM!

Feature descriptions correctly updated to include "セル結合範囲" (merged cell ranges) in both the feature list and output mode descriptions.

338-495: LGTM!

Excellent Example 2 demonstrating the merged cells feature with a practical application form use case. The JSON excerpt shows:

merged_cells array with r1, c1, r2, c2, v fields

Real-world form data structure

LLM transformation to Markdown

This effectively showcases how ExStruct handles complex Excel structures with merged cells.

496-501: Intentional styling choice.

The static analysis hint (MD036) flags the bold text as "emphasis used instead of heading". This appears to be an intentional formatting choice to highlight the key takeaway without disrupting the document hierarchy.

509-509: LGTM!

Sample directory listing correctly updated with the new merged cells sample folder.

docs/README.en.md (5)

7-7: LGTM!

Feature description correctly expanded to include "merged cell ranges" in the comprehensive list of structured data outputs.

14-14: LGTM!

Output mode descriptions correctly updated to indicate merged cell ranges are included in standard and verbose modes.

136-137: LGTM!

Mode documentation accurately reflects the merged cells feature:

standard: includes merged cell ranges

verbose: includes merged cell ranges

This is consistent with MRG-01 in TEST_REQUIREMENTS.md.

339-552: LGTM!

Example 2 effectively demonstrates the merged cells feature using the SF-425 Federal Financial Report form. The JSON excerpt shows:

merged_cells array with coordinate fields

Values preserved from top-left cells

Integration with existing features (rows, shapes, print_areas)

The LLM reconstruction demonstrates practical value for document processing workflows.

562-562: LGTM!

Sample directory listing updated consistently with Japanese README.

src/exstruct/core/integrate.py (3)

20-20: LGTM!

Parameter include_merged_cells: bool | None = None correctly added following the established pattern of other optional feature flags (include_cell_links, include_print_areas, include_colors_map).

35-35: LGTM!

Docstring correctly documents the new parameter with consistent formatting: "Whether to include merged cell ranges; None uses mode defaults."

52-52: LGTM!

Parameter correctly threaded through to resolve_extraction_inputs, enabling the pipeline to conditionally include merged cell extraction based on the flag value.

src/exstruct/engine.py (1)

73-73: LGTM! Merged cell support follows established patterns.

The include_merged_cells flag implementation is consistent with existing feature flags throughout the engine:

StructOptions uses None for auto-resolution (light=False, others=True)

FilterOptions defaults to True for output inclusion

Filtering logic in _filter_sheet matches the pattern used for other optional fields

Proper forwarding to extract_workbook

Also applies to: 83-83, 126-128, 287-289, 359-359

README.md (2)

339-419: Check placeholder text in Example 2.

The new example effectively demonstrates merged cell handling, but the JSON and Markdown content contains placeholder characters (??? and similar). Verify whether this is intentional demo data or if actual content should be included.

Example at lines 355, 358, 371, 384-418 shows these placeholders.

429-429: Sample directory exists with expected content.

The sample/forms_with_many_merged_cells/ directory is present and contains properly organized samples including English (SF425 form) and Japanese general forms with accompanying documentation files.

README.ja.md (1)

11-12: LGTM! Japanese documentation properly mirrors English version.

The Japanese README updates are consistent with the English version:

Feature descriptions include merged cell ranges (セル結合範囲)

Example 2 structure matches English version

Terminology is properly localized

Also applies to: 152-152, 338-495, 509-509

tests/core/test_pipeline.py (1)

16-188: LGTM! Comprehensive test coverage for merged cell support.

The tests properly validate:

Conditional step inclusion based on include_merged_cells flag

Step exclusion when the flag is disabled

Default resolution behavior (None → True for standard/verbose modes)

Data flow through artifacts to final workbook structure

src/exstruct/core/pipeline.py (1)

13-22: LGTM! Complete and well-integrated pipeline support for merged cells.

The implementation demonstrates excellent adherence to coding guidelines:

Type hints present throughout (mypy strict compliant)

Google-style docstrings for all new functions and parameters

Consistent patterns with existing extraction steps

Single-responsibility functions

Proper data flow from extraction through artifacts to final workbook

Fallback path correctly handles merged cells

The merged cell extraction step is properly:

Gated by the include_merged_cells flag

Integrated into all extraction modes (light/standard/verbose)

Using the same pattern as other openpyxl-based steps

Also applies to: 35-35, 43-90, 171-236, 255-338, 470-481, 586-749

tests/backends/test_merged_cells.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

sample/forms_with_many_merged_cells/ja_general_form/notebooklm.chat.md (1)

1-167: Optional: Add table of contents for lengthy documentation.

With 167 lines across multiple Q&A sections, readers may benefit from a quick navigation aid (e.g., markdown TOC, anchor links to major sections) to locate relevant guidance more quickly. This is especially useful for end users consulting the guide while filling out the form.
tests/backends/test_merged_cells.py (1)
29-55: Optional: Consider adding docstrings to test functions.

While test names are self-documenting (pytest convention), the coding guidelines specify "Add Google-style docstrings to all functions and classes." Adding brief docstrings would ensure strict guideline compliance.
Example docstring format
def test_openpyxl_backend_extract_merged_cells(tmp_path: Path) -> None:
    """Verify OpenpyxlBackend extracts merged cell ranges correctly."""
    # ... test body

def test_openpyxl_backend_extract_merged_cells_handles_failure(
    monkeypatch: MonkeyPatch, tmp_path: Path
) -> None:
    """Verify extract_merged_cells returns empty dict on extraction failure."""
    # ... test body

def test_com_backend_extract_merged_cells_not_implemented() -> None:
    """Verify ComBackend raises NotImplementedError for merged cell extraction."""
    # ... test body
Based on coding guidelines, which specify docstrings on all functions.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a87f42 and 32f14e4.

📒 Files selected for processing (5)

docs/release-notes/v0.3.2.md
mkdocs.yml
pyproject.toml
sample/forms_with_many_merged_cells/ja_general_form/notebooklm.chat.md
tests/backends/test_merged_cells.py

✅ Files skipped from review due to trivial changes (3)

pyproject.toml
mkdocs.yml
docs/release-notes/v0.3.2.md

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Add type hints to all function and method arguments and return values, must be mypy strict compliant
Use Any type only at external library boundaries (xlwings, pandas, numpy, etc.)
Return Pydantic BaseModel instances instead of dictionaries or tuples for structured data
Each function must have a single responsibility; cyclomatic complexity must not exceed 12
Organize imports in the following order: 1) standard library, 2) third-party packages, 3) exstruct internal modules
Add Google-style docstrings to all functions and classes
Do not depend on internal structures of external libraries (xlwings, pandas, numpy); normalize external entities to Pydantic models at the boundary layer
Generated code must pass mypy strict mode with zero errors
Generated code must pass Ruff linter checks (E, W, F, I, B, UP, N, C90 rules) with zero errors
Avoid God Functions (large monolithic functions that do too much)
Avoid God Objects (classes that have too many responsibilities)
Eliminate unnecessary nesting and deeply nested conditional branches
Avoid circular dependencies between modules
Write functions that are testable and avoid unnecessary side effects

Files:

tests/backends/test_merged_cells.py

🧬 Code graph analysis (1)

tests/backends/test_merged_cells.py (3)

src/exstruct/core/backends/com_backend.py (1)

ComBackend (20-159)

src/exstruct/core/backends/openpyxl_backend.py (1)

OpenpyxlBackend (26-114)

src/exstruct/models/__init__.py (1)

MergedCell (79-86)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Codacy Static Code Analysis

🔇 Additional comments (6)

sample/forms_with_many_merged_cells/ja_general_form/notebooklm.chat.md (1)

1-167: Clear and comprehensive form-completion guidance.

This new documentation provides well-structured, Japanese-language Q&A guidance for the care insurance application form, with helpful cross-references between sections and visual organization. The intentional duplication (comparing NotebookLM output from JSON vs. PDF inputs) is appropriately labeled and adds pedagogical value.

tests/backends/test_merged_cells.py (5)

1-11: LGTM: Import organization follows guidelines.

The imports are correctly organized in the prescribed order: standard library, third-party packages, and exstruct internal modules.

14-26: LGTM: All past review feedback addressed.

The helper function now includes the return type annotation, Google-style docstring, and properly closes the workbook to prevent resource leaks. All issues from the previous review have been resolved.

29-35: LGTM: Test correctly verifies merged cell extraction.

The test properly validates that OpenpyxlBackend extracts merged cells by checking all MergedCell fields (r1, c1, r2, c2, v) against expected values.

38-49: LGTM: Error handling test is well-structured.

The test correctly verifies that extract_merged_cells returns an empty dict when the underlying extraction fails, using proper monkeypatching and type-safe mock implementation.

52-55: LGTM: NotImplementedError test validates ComBackend behavior.

The test correctly verifies that ComBackend raises NotImplementedError for merged cell extraction, consistent with the implementation shown in the relevant code snippets.

harumiWeb added 8 commits January 3, 2026 21:32

結合セル取得モデルの新規作成

85c7c93

データモデル更新

58131c7

tasks refresh

5ee5d65

- Openpyxl から結合セル範囲を抽出し、MergedCell として保持

abcdac2

- Pipeline/Modeling に merged_cells を追加して統合 - Engine に抽出オプションと出力フィルタ（include_merged_cells）を追加

テストケース追加

de6d3a2

sampleの整備

a5e0be8

readme更新

f648c90

英語版readme更新

9a87f42

harumiWeb mentioned this pull request Jan 5, 2026

Adding cell merge range data acquisition function #33

Closed

coderabbitai bot reviewed Jan 5, 2026

View reviewed changes

tests/backends/test_merged_cells.py Show resolved Hide resolved

harumiWeb added 3 commits January 5, 2026 12:33

参考資料追加

5e1e438

docstring追加

853c12b

v0.3.2 release

32f14e4

coderabbitai bot reviewed Jan 5, 2026

View reviewed changes

harumiWeb merged commit a08f4ba into main Jan 5, 2026
9 checks passed

harumiWeb deleted the dev/merge-cells branch January 5, 2026 03:44

coderabbitai bot mentioned this pull request Jan 6, 2026

feat: Compressing the amount of context in merged cell data #40

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Added function to get merged cell range #35

Added function to get merged cell range #35

Uh oh!

harumiWeb commented Jan 5, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jan 5, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

codecov-commenter commented Jan 5, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Added function to get merged cell range #35

Added function to get merged cell range #35

Uh oh!

Conversation

harumiWeb commented Jan 5, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jan 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Pre-merge checks and finishing touches

Uh oh!

codecov-commenter commented Jan 5, 2026

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

harumiWeb commented Jan 5, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 5, 2026 •

edited

Loading