-
Notifications
You must be signed in to change notification settings - Fork 14
Added function to get merged cell range #35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Pipeline/Modeling に merged_cells を追加して統合 - Engine に抽出オプションと出力フィルタ(include_merged_cells)を追加
📝 WalkthroughWalkthroughA new feature adds merged cell range extraction and filtering across ExStruct: MergedCell model, extraction via OpenPyXL, backend protocol extension, pipeline step, engine options and output filtering, updated modeling, tests, samples, and docs reflecting merged_cells support. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Engine as ExStructEngine
participant Pipeline as Extraction Pipeline
participant Backend as OpenpyxlBackend
participant Cells as Cell Utils
participant Models as Modeling
Client->>Engine: extract(file, mode, include_merged_cells=True)
Engine->>Pipeline: extract_workbook(..., include_merged_cells=True)
Pipeline->>Pipeline: resolve_extraction_inputs(...)
alt include_merged_cells == True
Pipeline->>Backend: extract_merged_cells(file)
Backend->>Cells: extract_sheet_merged_cells(file)
Cells->>Cells: iterate ws.merged_cells.ranges\ncompute (r1,c1,r2,c2) and top-left value
Cells-->>Backend: dict[sheet -> list[MergedCell]]
Backend-->>Pipeline: merged_cell_data
Pipeline->>Pipeline: step_extract_merged_cells_openpyxl()
Pipeline->>Models: collect_sheet_raw_data(..., merged_cell_data)
else
Pipeline->>Models: collect_sheet_raw_data(..., merged_cells=[])
end
Pipeline->>Models: build_sheet_data(raw.merged_cells)
Models->>Engine: WorkbookData(sheets with merged_cells)
alt FilterOptions.include_merged_cells == True
Engine-->>Client: WorkbookData includes merged_cells
else
Engine-->>Client: WorkbookData omits merged_cells
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (8)
sample/gantt_chart/en/sample.json_for_llm.md (1)
3-3: Optional: Consider using a proper heading level instead of emphasis.Line 3 uses italic emphasis for the subtitle. While this is stylistically acceptable, consider using a lower-level heading (e.g.,
### (Original, Redistributable Template)) for better semantic structure and accessibility.🔎 Proposed fix
-_(Original, Redistributable Template)_ +### (Original, Redistributable Template)src/exstruct/models/__init__.py (1)
79-87: Suggest adding Field descriptor tovfor consistency.The
vfield (line 86) uses a simple default value""while all other fields inMergedCelluseField()descriptors with descriptions. For consistency and clarity, consider adding a Field descriptor with a description.🔎 Proposed fix
- v: str = "" + v: str = Field(default="", description="Representative value of the merged cell range.")tests/export/test_export_requirements.py (1)
117-120: Parse JSON instead of substring checking for more robust assertions.The current test performs substring matching on the JSON string (
"merged_cells" not in payload), which is fragile. If "merged_cells" appeared anywhere in the JSON (e.g., in a value or nested key), the test would incorrectly fail.🔎 Proposed refactor to parse and validate JSON structure
def test_merged_cells_empty_is_omitted_in_sheet_json() -> None: sheet = SheetData(rows=[CellRow(r=1, c={"0": "v"})], merged_cells=[]) - payload = sheet.to_json() - assert "merged_cells" not in payload + payload_json = sheet.to_json() + payload = json.loads(payload_json) + assert "merged_cells" not in payloadsrc/exstruct/core/backends/com_backend.py (1)
157-159: Enhance docstring to match the file's Google-style pattern.Other methods in this file (
extract_print_areas,extract_colors_map,extract_auto_page_breaks) use Google-style docstrings withReturnsand/orRaisessections. The newextract_merged_cellsmethod should follow the same pattern for consistency.🔎 Proposed docstring enhancement
def extract_merged_cells(self) -> MergedCellData: - """Extract merged cell ranges via COM (not implemented).""" + """Extract merged cell ranges via COM (not implemented). + + Returns: + Mapping of sheet name to merged cell ranges. + + Raises: + NotImplementedError: COM merged cell extraction is not yet supported. + """ raise NotImplementedError("COM merged cell extraction is not implemented.")src/exstruct/core/backends/openpyxl_backend.py (1)
91-100: Consider adding logging for consistency and debugging.The method correctly delegates to
extract_sheet_merged_cellsand handles failures gracefully. However, unlike the similarextract_colors_mapmethod (lines 67-89), this method silently swallows exceptions. Adding a debug or warning log would improve observability and align with the established pattern.🔎 Proposed enhancement for consistency
def extract_merged_cells(self) -> MergedCellData: """Extract merged cell ranges per sheet. Returns: Mapping of sheet name to merged cell ranges. """ try: return extract_sheet_merged_cells(self.file_path) - except Exception: + except Exception as exc: + logger.debug( + "Merged cells extraction failed; returning empty dict. (%r)", exc + ) return {}tests/core/test_merged_cells_core.py (2)
8-27: Add docstrings to test functions per coding guidelines.The helper function and test function are missing Google-style docstrings. Per the project's coding guidelines, all functions should include docstrings.
🔎 Suggested docstrings
def _make_merged_book(path: Path) -> None: + """Create a test workbook with merged cell ranges. + + Args: + path: Destination file path for the workbook. + """ wb = Workbook()def test_extract_sheet_merged_cells_basic(tmp_path: Path) -> None: + """Test extraction of merged cell ranges from a workbook.""" path = tmp_path / "merged.xlsx"
30-36: Add docstring to test function.def test_extract_sheet_merged_cells_empty(tmp_path: Path) -> None: + """Test that extraction returns empty list for workbooks without merged cells.""" path = tmp_path / "plain.xlsx"sample/forms_with_many_merged_cells/en_form_sf425/sample.json_for_llm.md (1)
1-149: Optional: Address markdown linting issues.The static analysis tools flagged several markdown style issues, primarily using bold emphasis instead of proper headings (MD036) and heading level increments that skip levels (MD001). While these don't affect functionality, addressing them would improve document structure and consistency.
Since this is sample documentation, these style issues are minor and can be addressed at your discretion.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (7)
docs/assets/demo_form.ja.pngis excluded by!**/*.pngdocs/assets/demo_form_en.pngis excluded by!**/*.pngsample/forms_with_many_merged_cells/en_form_sf425/sample.xlsxis excluded by!**/*.xlsxsample/forms_with_many_merged_cells/ja_general_form/ja_form.xlsxis excluded by!**/*.xlsxsample/gantt_chart/en/sample.xlsxis excluded by!**/*.xlsxsample/gantt_chart/sample_gantt.xlsxis excluded by!**/*.xlsxuv.lockis excluded by!**/*.lock
📒 Files selected for processing (38)
README.ja.mdREADME.mddocs/README.en.mddocs/README.ja.mddocs/agents/DATA_MODEL.mddocs/agents/FEATURE_SPEC.mddocs/agents/ROADMAP.mddocs/agents/TASKS.mddocs/agents/TEST_REQUIREMENTS.mdsample/basic/sample.json_for_llm.mdsample/forms_with_many_merged_cells/en_form_sf425/README.mdsample/forms_with_many_merged_cells/en_form_sf425/sample.jsonsample/forms_with_many_merged_cells/en_form_sf425/sample.json_for_llm.mdsample/forms_with_many_merged_cells/ja_general_form/ja_form.jsonsample/forms_with_many_merged_cells/ja_general_form/ja_form.json_for_llm.mdsample/gantt_chart/en/example.pysample/gantt_chart/en/sample.jsonsample/gantt_chart/en/sample.json_for_llm.mdsample/gantt_chart/sample_gantt.jsonsample/gantt_chart/sample_gantt_json_for_llm.mdsrc/exstruct/core/backends/base.pysrc/exstruct/core/backends/com_backend.pysrc/exstruct/core/backends/openpyxl_backend.pysrc/exstruct/core/cells.pysrc/exstruct/core/integrate.pysrc/exstruct/core/modeling.pysrc/exstruct/core/pipeline.pysrc/exstruct/engine.pysrc/exstruct/models/__init__.pytests/backends/test_auto_page_breaks.pytests/backends/test_merged_cells.pytests/core/test_merged_cells_core.pytests/core/test_pipeline.pytests/core/test_pipeline_fallbacks.pytests/engine/test_engine.pytests/export/test_export_requirements.pytests/integration/test_integrate_raw_data.pytests/models/test_modeling.py
💤 Files with no reviewable changes (2)
- sample/gantt_chart/sample_gantt_json_for_llm.md
- sample/gantt_chart/sample_gantt.json
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Add type hints to all function and method arguments and return values, must be mypy strict compliant
UseAnytype only at external library boundaries (xlwings, pandas, numpy, etc.)
Return Pydantic BaseModel instances instead of dictionaries or tuples for structured data
Each function must have a single responsibility; cyclomatic complexity must not exceed 12
Organize imports in the following order: 1) standard library, 2) third-party packages, 3) exstruct internal modules
Add Google-style docstrings to all functions and classes
Do not depend on internal structures of external libraries (xlwings, pandas, numpy); normalize external entities to Pydantic models at the boundary layer
Generated code must pass mypy strict mode with zero errors
Generated code must pass Ruff linter checks (E, W, F, I, B, UP, N, C90 rules) with zero errors
Avoid God Functions (large monolithic functions that do too much)
Avoid God Objects (classes that have too many responsibilities)
Eliminate unnecessary nesting and deeply nested conditional branches
Avoid circular dependencies between modules
Write functions that are testable and avoid unnecessary side effects
Files:
tests/export/test_export_requirements.pysrc/exstruct/engine.pytests/integration/test_integrate_raw_data.pytests/backends/test_auto_page_breaks.pysrc/exstruct/models/__init__.pytests/core/test_merged_cells_core.pysrc/exstruct/core/backends/com_backend.pysrc/exstruct/core/cells.pysrc/exstruct/core/backends/openpyxl_backend.pytests/backends/test_merged_cells.pysrc/exstruct/core/modeling.pytests/core/test_pipeline.pysrc/exstruct/core/integrate.pytests/models/test_modeling.pysample/gantt_chart/en/example.pysrc/exstruct/core/pipeline.pysrc/exstruct/core/backends/base.pytests/core/test_pipeline_fallbacks.pytests/engine/test_engine.py
🧬 Code graph analysis (14)
tests/export/test_export_requirements.py (1)
src/exstruct/models/__init__.py (2)
SheetData(145-237)CellRow(89-98)
src/exstruct/engine.py (1)
src/exstruct/__init__.py (1)
process_excel(318-398)
tests/core/test_merged_cells_core.py (2)
src/exstruct/core/cells.py (1)
extract_sheet_merged_cells(529-561)tests/backends/test_merged_cells.py (1)
_make_merged_book(14-20)
src/exstruct/core/backends/com_backend.py (3)
src/exstruct/core/backends/base.py (1)
extract_merged_cells(41-42)src/exstruct/core/backends/openpyxl_backend.py (1)
extract_merged_cells(91-100)tests/backends/test_backends.py (2)
test_com_backend_extract_print_areas_handles_sheet_error(104-123)test_com_backend_extract_colors_map_returns_none_on_failure(78-101)
src/exstruct/core/cells.py (2)
src/exstruct/models/__init__.py (1)
MergedCell(79-86)src/exstruct/core/workbook.py (1)
openpyxl_workbook(17-56)
src/exstruct/core/backends/openpyxl_backend.py (5)
src/exstruct/core/cells.py (1)
extract_sheet_merged_cells(529-561)src/exstruct/core/ranges.py (1)
parse_range_zero_based(25-48)src/exstruct/core/workbook.py (1)
openpyxl_workbook(17-56)src/exstruct/core/backends/base.py (1)
extract_merged_cells(41-42)src/exstruct/core/backends/com_backend.py (1)
extract_merged_cells(157-159)
tests/backends/test_merged_cells.py (3)
src/exstruct/core/backends/com_backend.py (1)
ComBackend(20-159)src/exstruct/core/backends/openpyxl_backend.py (1)
OpenpyxlBackend(26-114)src/exstruct/models/__init__.py (1)
MergedCell(79-86)
src/exstruct/core/modeling.py (1)
src/exstruct/models/__init__.py (1)
MergedCell(79-86)
tests/core/test_pipeline.py (1)
src/exstruct/core/pipeline.py (3)
build_pre_com_pipeline(255-338)ExtractionInputs(43-66)ExtractionArtifacts(70-89)
src/exstruct/core/integrate.py (2)
tests/integration/test_integrate_fallback.py (2)
test_extract_workbook_fallback_includes_print_areas(38-56)test_extract_workbook_with_links(59-79)src/exstruct/__init__.py (1)
extract(91-125)
sample/gantt_chart/en/example.py (2)
src/exstruct/engine.py (3)
ColorsOptions(33-56)ExStructEngine(171-571)StructOptions(60-84)src/exstruct/__init__.py (1)
export(128-167)
src/exstruct/core/pipeline.py (4)
src/exstruct/models/__init__.py (2)
MergedCell(79-86)PrintArea(136-142)src/exstruct/core/backends/openpyxl_backend.py (2)
OpenpyxlBackend(26-114)extract_merged_cells(91-100)src/exstruct/core/backends/base.py (1)
extract_merged_cells(41-42)src/exstruct/core/backends/com_backend.py (1)
extract_merged_cells(157-159)
src/exstruct/core/backends/base.py (3)
src/exstruct/models/__init__.py (1)
MergedCell(79-86)src/exstruct/core/backends/com_backend.py (1)
extract_merged_cells(157-159)src/exstruct/core/backends/openpyxl_backend.py (1)
extract_merged_cells(91-100)
tests/engine/test_engine.py (2)
src/exstruct/models/__init__.py (1)
MergedCell(79-86)src/exstruct/engine.py (3)
OutputOptions(149-168)FilterOptions(101-128)serialize(362-385)
🪛 LanguageTool
sample/forms_with_many_merged_cells/en_form_sf425/sample.json_for_llm.md
[style] ~139-~139: Try using a synonym here to strengthen your wording.
Context: ...ng the collection of information. Send comments regarding the burden estimate or any ot...
(COMMENT_REMARK)
🪛 markdownlint-cli2 (0.18.1)
sample/forms_with_many_merged_cells/ja_general_form/ja_form.json_for_llm.md
32-32: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
sample/gantt_chart/en/sample.json_for_llm.md
3-3: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
sample/forms_with_many_merged_cells/en_form_sf425/sample.json_for_llm.md
5-5: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
11-11: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
15-15: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
21-21: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
55-55: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
57-57: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
59-59: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
65-65: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
121-121: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
125-125: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
147-147: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
docs/README.ja.md
500-500: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
README.ja.md
500-500: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: test (windows-latest, 3.11)
🔇 Additional comments (49)
docs/agents/ROADMAP.md (1)
39-45: ✓ LGTM — roadmap entries are well-formatted and correctly positioned.The new version sections (v0.3.2 and v0.3.3) follow the existing document structure and language conventions. Feature descriptions are clear, and version ordering is correct.
sample/forms_with_many_merged_cells/ja_general_form/ja_form.json_for_llm.md (1)
1-110: Well-structured sample documentation.This Markdown file provides a clear, human-readable representation of the Japanese care insurance application form. The structure, tables, and formatting are appropriate for documentation purposes.
Note: The static analysis hint about line 32 is a false positive—the bold text
**(※)...**is correctly used for emphasis on a parenthetical note, not as a heading.sample/forms_with_many_merged_cells/ja_general_form/ja_form.json (2)
217-363: Well-structured merged_cells data.The
merged_cellsarray is comprehensive and properly formatted with valid coordinate ranges and optional values. This provides a good test case for the merged cell extraction feature, covering various merge patterns throughout the form.
188-214: No action required—the data structure is valid.The values
-1,-2,-3, and-4in rows 64–68 are cell values stored in column"0", not column keys. Column keys are always strings (e.g.,"0","1"). The schema explicitly allows integer values, including negative integers, so this is a valid and supported pattern.Likely an incorrect or invalid review comment.
tests/backends/test_auto_page_breaks.py (1)
30-30: LGTM! Test scaffolding aligned with expanded API.The new
include_merged_cellsparameter properly extends the test helper's signature to match the productionextract_workbookinterface, even though this specific test focuses onauto_page_breaks. The unused parameter is appropriate for a mock function.src/exstruct/models/__init__.py (1)
173-175: LGTM! Properly integrated merged_cells field.The
merged_cellsfield is correctly defined with proper type hints, Field descriptor, and description. It follows the established pattern used for other list fields in SheetData.docs/agents/TASKS.md (1)
1-17: LGTM! Comprehensive test coverage plan.The task list covers all critical testing areas for the merged cells feature: backend extraction, pipeline propagation, modeling, engine filtering, and export serialization. The coverage target is clearly stated.
sample/gantt_chart/en/sample.json (1)
460-485: LGTM! Valid merged_cells sample data.The merged_cells array correctly demonstrates the MergedCell model with:
- Proper coordinate format (r1/r2 as 1-based rows, c1/c2 as 0-based columns)
- Consistent horizontal merges for phase headers and title
- Representative values matching the corresponding row data
This sample effectively demonstrates the merged cell feature.
sample/forms_with_many_merged_cells/en_form_sf425/README.md (1)
9-10: URL is valid and SF-425 form is available.The EPA forms URL (https://www.epa.gov/financial/forms) is accessible and the SF-425 form is available at that location.
tests/core/test_pipeline_fallbacks.py (1)
37-37: LGTM: Consistent API extension for merged cells parameter.The addition of
include_merged_cells=Noneacross all three test functions correctly aligns with the updatedresolve_extraction_inputssignature.Also applies to: 70-70, 111-111
src/exstruct/core/backends/openpyxl_backend.py (1)
16-16: LGTM: Clean imports for merged cells support.The new imports are properly organized and support the merged cell extraction feature.
Also applies to: 20-20
tests/backends/test_merged_cells.py (1)
23-29: LGTM: Comprehensive test coverage for merged cell extraction.The three test functions provide good coverage:
- Successful extraction from OpenpyxlBackend
- Failure handling with graceful fallback
- NotImplementedError verification for ComBackend
Also applies to: 32-43, 46-49
tests/integration/test_integrate_raw_data.py (1)
59-59: LGTM: Clean integration of merged_cell_data parameter.The addition of
merged_cell_data={"Sheet1": []}correctly extends the test calls to match the updatedcollect_sheet_raw_datasignature.Also applies to: 94-94
src/exstruct/core/modeling.py (1)
9-9: LGTM! Clean integration of merged_cells field.The changes correctly add the
merged_cellsfield toSheetRawDataand propagate it through toSheetDataviabuild_sheet_data. Type hints, docstrings, and import organization all follow the project's coding guidelines.Also applies to: 30-30, 40-40, 73-73
tests/models/test_modeling.py (1)
28-28: LGTM! Correctly accommodates the new field.The addition of
merged_cells=[]appropriately updates the test to include the new required field inSheetRawData.docs/agents/DATA_MODEL.md (1)
3-3: LGTM! Documentation accurately reflects the new feature.The documentation correctly describes the new
MergedCellmodel with proper field types and coordinate system conventions (1-based rows, 0-based columns). The version bump, changelog entry, and section renumbering are all appropriate.Also applies to: 151-167, 180-180, 248-248
tests/engine/test_engine.py (4)
17-17: LGTM!Import of
MergedCellis correctly added to support the new merged cells feature in the sample workbook.
37-37: LGTM!The
include_merged_cellsparameter is correctly added to the fake extract function signature, matching the realextract_workbookAPI.
68-68: LGTM!The sample workbook now includes a
MergedCellinstance, enabling comprehensive testing of the merged cells feature throughout the engine tests.
91-97: LGTM!The test correctly verifies that
include_merged_cells=Falsefilters out merged cell data from serialized output. The pattern follows the existing filter tests (test_engine_serialize_filters_shapes,test_engine_serialize_filters_tables).Note: Unlike
test_engine_serialize_filters_tables(line 82), this test correctly omits the unusedtmp_pathparameter.src/exstruct/core/cells.py (2)
18-18: LGTM!Import of
MergedCellalongsideCellRowis correct and follows the existing import pattern.
529-561: LGTM!Well-implemented function for extracting merged cell ranges:
- Proper type hints and Google-style docstring as per coding guidelines
- Correctly uses
openpyxl_workbookcontext manager for resource safety- Handles edge cases:
Nonemerged_cells,Nonecell values- Column indices correctly converted from 1-based (openpyxl) to 0-based (as documented in
MergedCellmodel: "c1: Start column (0-based)")The implementation aligns with the
MergedCellmodel definition insrc/exstruct/models/__init__.py(lines 78-85).src/exstruct/core/backends/base.py (3)
6-6: LGTM!Import correctly updated to include
MergedCellalongside existing model imports.
11-11: LGTM!Type alias
MergedCellDatafollows the established pattern ofCellDataandPrintAreaData.
40-42: LGTM!The
extract_merged_cellsmethod is correctly added to theBackendprotocol, following the pattern of existing methods (extract_cells,extract_print_areas). The relevant code snippets confirm that:
OpenpyxlBackendimplements this method (returns{}on failure)ComBackendraisesNotImplementedErrorThis aligns with test requirements BE-06 and BE-07 in
docs/agents/TEST_REQUIREMENTS.md.docs/agents/TEST_REQUIREMENTS.md (5)
3-3: LGTM!Version bump to 0.5 and explicit 78% coverage requirement are appropriate additions for tracking the merged cells feature release.
Also applies to: 7-7
51-57: LGTM!Comprehensive test requirements for merged cells:
- MRG-01: Mode-specific extraction (standard/verbose only)
- MRG-02: Coordinate conventions (1-based row, 0-based column)
- MRG-03: Top-left cell value handling
- MRG-04: Multiple range retention
These align with the
MergedCellmodel definition andextract_sheet_merged_cellsimplementation.
154-154: LGTM!EXP-25 correctly specifies filtering behavior, consistent with the test in
tests/engine/test_engine.py(line 91-97).
225-225: LGTM!PIPE-09 correctly documents conditional pipeline step inclusion for merged cells extraction.
236-237: LGTM!Backend test requirements:
- BE-06: OpenpyxlBackend returns empty map on failure (matches
openpyxl_backend.pylines 90-99)- BE-07: ComBackend raises NotImplementedError (matches
com_backend.pylines 156-158)docs/README.ja.md (5)
5-5: LGTM!Image path format is consistent with the documentation structure.
11-12: LGTM!Feature descriptions correctly updated to include "セル結合範囲" (merged cell ranges) in both the feature list and output mode descriptions.
338-495: LGTM!Excellent Example 2 demonstrating the merged cells feature with a practical application form use case. The JSON excerpt shows:
merged_cellsarray withr1,c1,r2,c2,vfields- Real-world form data structure
- LLM transformation to Markdown
This effectively showcases how ExStruct handles complex Excel structures with merged cells.
496-501: Intentional styling choice.The static analysis hint (MD036) flags the bold text as "emphasis used instead of heading". This appears to be an intentional formatting choice to highlight the key takeaway without disrupting the document hierarchy.
509-509: LGTM!Sample directory listing correctly updated with the new merged cells sample folder.
docs/README.en.md (5)
7-7: LGTM!Feature description correctly expanded to include "merged cell ranges" in the comprehensive list of structured data outputs.
14-14: LGTM!Output mode descriptions correctly updated to indicate merged cell ranges are included in
standardandverbosemodes.
136-137: LGTM!Mode documentation accurately reflects the merged cells feature:
standard: includes merged cell rangesverbose: includes merged cell rangesThis is consistent with MRG-01 in TEST_REQUIREMENTS.md.
339-552: LGTM!Example 2 effectively demonstrates the merged cells feature using the SF-425 Federal Financial Report form. The JSON excerpt shows:
merged_cellsarray with coordinate fields- Values preserved from top-left cells
- Integration with existing features (rows, shapes, print_areas)
The LLM reconstruction demonstrates practical value for document processing workflows.
562-562: LGTM!Sample directory listing updated consistently with Japanese README.
src/exstruct/core/integrate.py (3)
20-20: LGTM!Parameter
include_merged_cells: bool | None = Nonecorrectly added following the established pattern of other optional feature flags (include_cell_links,include_print_areas,include_colors_map).
35-35: LGTM!Docstring correctly documents the new parameter with consistent formatting: "Whether to include merged cell ranges; None uses mode defaults."
52-52: LGTM!Parameter correctly threaded through to
resolve_extraction_inputs, enabling the pipeline to conditionally include merged cell extraction based on the flag value.src/exstruct/engine.py (1)
73-73: LGTM! Merged cell support follows established patterns.The
include_merged_cellsflag implementation is consistent with existing feature flags throughout the engine:
- StructOptions uses
Nonefor auto-resolution (light=False, others=True)- FilterOptions defaults to
Truefor output inclusion- Filtering logic in
_filter_sheetmatches the pattern used for other optional fields- Proper forwarding to
extract_workbookAlso applies to: 83-83, 126-128, 287-289, 359-359
README.md (2)
339-419: Check placeholder text in Example 2.The new example effectively demonstrates merged cell handling, but the JSON and Markdown content contains placeholder characters (
???and similar). Verify whether this is intentional demo data or if actual content should be included.Example at lines 355, 358, 371, 384-418 shows these placeholders.
429-429: Sample directory exists with expected content.The
sample/forms_with_many_merged_cells/directory is present and contains properly organized samples including English (SF425 form) and Japanese general forms with accompanying documentation files.README.ja.md (1)
11-12: LGTM! Japanese documentation properly mirrors English version.The Japanese README updates are consistent with the English version:
- Feature descriptions include merged cell ranges (セル結合範囲)
- Example 2 structure matches English version
- Terminology is properly localized
Also applies to: 152-152, 338-495, 509-509
tests/core/test_pipeline.py (1)
16-188: LGTM! Comprehensive test coverage for merged cell support.The tests properly validate:
- Conditional step inclusion based on
include_merged_cellsflag- Step exclusion when the flag is disabled
- Default resolution behavior (None → True for standard/verbose modes)
- Data flow through artifacts to final workbook structure
src/exstruct/core/pipeline.py (1)
13-22: LGTM! Complete and well-integrated pipeline support for merged cells.The implementation demonstrates excellent adherence to coding guidelines:
- Type hints present throughout (mypy strict compliant)
- Google-style docstrings for all new functions and parameters
- Consistent patterns with existing extraction steps
- Single-responsibility functions
- Proper data flow from extraction through artifacts to final workbook
- Fallback path correctly handles merged cells
The merged cell extraction step is properly:
- Gated by the
include_merged_cellsflag- Integrated into all extraction modes (light/standard/verbose)
- Using the same pattern as other openpyxl-based steps
Also applies to: 35-35, 43-90, 171-236, 255-338, 470-481, 586-749
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
sample/forms_with_many_merged_cells/ja_general_form/notebooklm.chat.md (1)
1-167: Optional: Add table of contents for lengthy documentation.With 167 lines across multiple Q&A sections, readers may benefit from a quick navigation aid (e.g., markdown TOC, anchor links to major sections) to locate relevant guidance more quickly. This is especially useful for end users consulting the guide while filling out the form.
tests/backends/test_merged_cells.py (1)
29-55: Optional: Consider adding docstrings to test functions.While test names are self-documenting (pytest convention), the coding guidelines specify "Add Google-style docstrings to all functions and classes." Adding brief docstrings would ensure strict guideline compliance.
Example docstring format
def test_openpyxl_backend_extract_merged_cells(tmp_path: Path) -> None: """Verify OpenpyxlBackend extracts merged cell ranges correctly.""" # ... test body def test_openpyxl_backend_extract_merged_cells_handles_failure( monkeypatch: MonkeyPatch, tmp_path: Path ) -> None: """Verify extract_merged_cells returns empty dict on extraction failure.""" # ... test body def test_com_backend_extract_merged_cells_not_implemented() -> None: """Verify ComBackend raises NotImplementedError for merged cell extraction.""" # ... test bodyBased on coding guidelines, which specify docstrings on all functions.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
docs/release-notes/v0.3.2.mdmkdocs.ymlpyproject.tomlsample/forms_with_many_merged_cells/ja_general_form/notebooklm.chat.mdtests/backends/test_merged_cells.py
✅ Files skipped from review due to trivial changes (3)
- pyproject.toml
- mkdocs.yml
- docs/release-notes/v0.3.2.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Add type hints to all function and method arguments and return values, must be mypy strict compliant
UseAnytype only at external library boundaries (xlwings, pandas, numpy, etc.)
Return Pydantic BaseModel instances instead of dictionaries or tuples for structured data
Each function must have a single responsibility; cyclomatic complexity must not exceed 12
Organize imports in the following order: 1) standard library, 2) third-party packages, 3) exstruct internal modules
Add Google-style docstrings to all functions and classes
Do not depend on internal structures of external libraries (xlwings, pandas, numpy); normalize external entities to Pydantic models at the boundary layer
Generated code must pass mypy strict mode with zero errors
Generated code must pass Ruff linter checks (E, W, F, I, B, UP, N, C90 rules) with zero errors
Avoid God Functions (large monolithic functions that do too much)
Avoid God Objects (classes that have too many responsibilities)
Eliminate unnecessary nesting and deeply nested conditional branches
Avoid circular dependencies between modules
Write functions that are testable and avoid unnecessary side effects
Files:
tests/backends/test_merged_cells.py
🧬 Code graph analysis (1)
tests/backends/test_merged_cells.py (3)
src/exstruct/core/backends/com_backend.py (1)
ComBackend(20-159)src/exstruct/core/backends/openpyxl_backend.py (1)
OpenpyxlBackend(26-114)src/exstruct/models/__init__.py (1)
MergedCell(79-86)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (6)
sample/forms_with_many_merged_cells/ja_general_form/notebooklm.chat.md (1)
1-167: Clear and comprehensive form-completion guidance.This new documentation provides well-structured, Japanese-language Q&A guidance for the care insurance application form, with helpful cross-references between sections and visual organization. The intentional duplication (comparing NotebookLM output from JSON vs. PDF inputs) is appropriately labeled and adds pedagogical value.
tests/backends/test_merged_cells.py (5)
1-11: LGTM: Import organization follows guidelines.The imports are correctly organized in the prescribed order: standard library, third-party packages, and exstruct internal modules.
14-26: LGTM: All past review feedback addressed.The helper function now includes the return type annotation, Google-style docstring, and properly closes the workbook to prevent resource leaks. All issues from the previous review have been resolved.
29-35: LGTM: Test correctly verifies merged cell extraction.The test properly validates that OpenpyxlBackend extracts merged cells by checking all MergedCell fields (r1, c1, r2, c2, v) against expected values.
38-49: LGTM: Error handling test is well-structured.The test correctly verifies that
extract_merged_cellsreturns an empty dict when the underlying extraction fails, using proper monkeypatching and type-safe mock implementation.
52-55: LGTM: NotImplementedError test validates ComBackend behavior.The test correctly verifies that ComBackend raises NotImplementedError for merged cell extraction, consistent with the implementation shown in the relevant code snippets.
Summary by CodeRabbit
New Features
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.