Skip to content

Conversation

@harumiWeb
Copy link
Owner

@harumiWeb harumiWeb commented Dec 28, 2025

Summary by CodeRabbit

  • New Features

    • Added SmartArt extraction (nested nodes) and distinct Arrow/connector type; shapes now include a "kind" discriminator.
    • CLI rendering: export sheets as PDF and images with a safe fallback when full Excel COM is unavailable.
  • Documentation

    • Updated READMEs, docs, and samples to show SmartArt, output modes (standard/verbose), CLI rendering, and duplicated coverage note for visibility.
    • Updated schemas and examples to include Arrow, SmartArt, and SmartArtNode.

✏️ Tip: You can customize this high-level summary in your review settings.

- Arrow / SmartArt の生成時に type を付与しない
- テストの type 参照を Shape に限定
- データモデル仕様の記述を更新
…upport

- Updated README and documentation to reflect the addition of SmartArt and Arrow extraction features.
- Introduced new schemas for Arrow, SmartArt, and SmartArtNode to support structured data representation.
- Refactored JSON schema generation script to include new models.
- Improved extraction logic to handle SmartArt layouts and nested nodes.
- Enhanced output formats to include detailed metadata for shapes, arrows, and SmartArt.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 28, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Refactors shape model into three types (Shape, Arrow, SmartArt), adds SmartArt extraction and nested node construction, updates schemas, samples, docs, serialization, and tests to handle Shape | Arrow | SmartArt across extraction and I/O codepaths.

Changes

Cohort / File(s) Summary
Core models & types
src/exstruct/models/__init__.py, src/exstruct/core/modeling.py, src/exstruct/core/pipeline.py
Introduced BaseShape, added Shape, Arrow, SmartArt, SmartArtNode; changed sheet/print-area shape containers to `list[Shape
Shape extraction logic
src/exstruct/core/shapes.py
Added SmartArt detection/extraction and node-tree builders; emit SmartArt for shapes with SmartArt, Arrow for connectors; widened return types and refined connector wiring; updated angle->compass typing.
Serialization / I/O
src/exstruct/io/__init__.py
Accept/serialize Arrow and SmartArt; _filter_shapes_to_area() signature widened; dict_without_empty_values() type checks extended.
Schemas & schema generation
schemas/*.json, scripts/gen_json_schema.py
Added arrow.json, smartart.json, smartart_node.json; updated shape.json, sheet.json, print_area_view.json, workbook.json to include Arrow/SmartArt defs and union usage; gen script exports added.
Docs & guides
README.md, README.ja.md, docs/README.en.md, docs/README.ja.md, docs/agents/*, docs/schemas.md, docs/concept.md
Documented SmartArt support, CLI rendering note and fallback, added/duplicated "Note on coverage", updated feature lists, data-model docs and task/requirements entries.
Samples
sample/*, sample/smartart/*
Added SmartArt sample JSON and LLM-focused MD; updated existing samples to include kind discriminator and Arrow/shape normalization.
Tests
tests/**
Updated tests to import/use Arrow, SmartArt, SmartArtNode; added SmartArt utilities tests; adjusted assertions to isinstance checks and kind/field expectations.
Cleanup
docs/agents/CODE_REVIEW.md, docs/agents/FEATURE_SPEC.md, docs/agents/ROADMAP.md
Removed large CODE_REVIEW block and deprecated refactoring notes; added SmartArt roadmap entry and minor formatting edits.

Sequence Diagram(s)

sequenceDiagram
  participant Workbook as Workbook (xlwings/COM)
  participant Extractor as shapes.get_shapes_with_position
  participant SmartBuilder as SmartArt node builder
  participant Connector as Arrow wiring
  participant Serializer as io.dict_without_empty_values
  participant Output as CLI / JSON writer

  rect rgb(240,248,255)
  Workbook->>Extractor: read shapes from sheet (HasSmartArt?/geometry)
  end

  alt shape has SmartArt
    Extractor->>SmartBuilder: _collect/_build_smartart_tree
    SmartBuilder-->>Extractor: SmartArt(kind="smartart", layout, nodes, pos)
  else connector / line
    Extractor->>Connector: classify connector -> Arrow (direction, begin/end ids)
    Connector-->>Extractor: Arrow instance
  else normal shape
    Extractor-->>Extractor: construct Shape(kind="shape", text, pos)
  end

  Extractor->>Serializer: produce dict (includes Shape|Arrow|SmartArt)
  Serializer-->>Output: write JSON/PDF/images (CLI rendering uses COM or fallback)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hopped through sheets and found a tree,
Nodes nested deep — a SmartArt spree.
Arrows point and shapes stand proud,
JSON blooms beneath a fluffy cloud.
Hooray — our data sings with glee! 🎋

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 29.79% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'SmartArt解析構造化機能追加' (Add SmartArt parsing and structuring feature) clearly and specifically describes the main change in this pull request, which comprehensively adds SmartArt extraction, modeling, and integration throughout the codebase.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dev/smartart

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov-commenter
Copy link

codecov-commenter commented Dec 28, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 88.09524% with 15 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/exstruct/core/shapes.py 85.57% 15 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
tests/models/test_models_validation.py (1)

82-97: Consider using attribute access check pattern.

The hasattr assertions verify that Shape lacks Arrow-specific fields. While this works, accessing via model_fields would be more explicit for Pydantic models.

🔎 Alternative approach using model_fields
-    assert not hasattr(shape, "begin_id")
-    assert not hasattr(shape, "end_id")
+    assert "begin_id" not in Shape.model_fields
+    assert "end_id" not in Shape.model_fields

Note: The current hasattr approach is also valid since Pydantic BaseModel instances only expose defined fields as attributes.

src/exstruct/core/shapes.py (2)

113-142: Good use of Protocol for COM object typing.

These protocols properly abstract the xlwings/COM interface at the boundary layer, aligning with the coding guideline to normalize external entities. The @runtime_checkable decorators are present but note that they're currently unused for isinstance() checks—if runtime checking isn't needed, the decorators could be removed to reduce overhead, though this is optional.


305-352: Consider extracting repeated width/height logic.

The conditional for computing w and h is repeated identically across SmartArt, Arrow, and Shape construction:

w=int(shp.width) if mode == "verbose" or shape_type_str == "Group" else None,
h=int(shp.height) if mode == "verbose" or shape_type_str == "Group" else None,

This could be extracted into helper variables to reduce duplication and improve maintainability.

🔎 Proposed refactor
                shape_obj: Shape | Arrow | SmartArt
+               include_size = mode == "verbose" or shape_type_str == "Group"
+               w_val = int(shp.width) if include_size else None
+               h_val = int(shp.height) if include_size else None
                if has_smartart:
                    smartart_obj: _SmartArtLike | None = None
                    try:
                        smartart_obj = shp.api.SmartArt
                    except Exception:
                        smartart_obj = None
                    shape_obj = SmartArt(
                        id=shape_id,
                        text=text,
                        l=int(shp.left),
                        t=int(shp.top),
-                       w=int(shp.width)
-                       if mode == "verbose" or shape_type_str == "Group"
-                       else None,
-                       h=int(shp.height)
-                       if mode == "verbose" or shape_type_str == "Group"
-                       else None,
+                       w=w_val,
+                       h=h_val,
                        layout=_get_smartart_layout_name(smartart_obj),
                        nodes=_extract_smartart_nodes(smartart_obj),
                    )

Apply similar changes to the Arrow and Shape construction blocks.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4f7e3f and 6010edc.

⛔ Files ignored due to path filters (1)
  • sample/smartart/sample_smartart.xlsx is excluded by !**/*.xlsx
📒 Files selected for processing (37)
  • README.ja.md
  • README.md
  • docs/README.en.md
  • docs/README.ja.md
  • docs/agents/CODE_REVIEW.md
  • docs/agents/DATA_MODEL.md
  • docs/agents/EXCEL_EXTRACTION.md
  • docs/agents/FEATURE_SPEC.md
  • docs/agents/OVERVIEW.md
  • docs/agents/ROADMAP.md
  • docs/agents/TASKS.md
  • docs/agents/TEST_REQUIREMENTS.md
  • docs/concept.md
  • docs/schemas.md
  • sample/basic/sample.json
  • sample/flowchart/sample-shape-connector.json
  • sample/smartart/sample_smartart.json
  • sample/smartart/sample_smartart_for_llm.md
  • schemas/arrow.json
  • schemas/print_area_view.json
  • schemas/shape.json
  • schemas/sheet.json
  • schemas/smartart.json
  • schemas/smartart_node.json
  • schemas/workbook.json
  • scripts/gen_json_schema.py
  • src/exstruct/core/modeling.py
  • src/exstruct/core/pipeline.py
  • src/exstruct/core/shapes.py
  • src/exstruct/io/__init__.py
  • src/exstruct/models/__init__.py
  • tests/com/test_shapes_extraction.py
  • tests/core/test_mode_output.py
  • tests/core/test_shapes_positions_dummy.py
  • tests/io/test_print_area_views.py
  • tests/models/test_models_export.py
  • tests/models/test_models_validation.py
💤 Files with no reviewable changes (2)
  • docs/agents/FEATURE_SPEC.md
  • docs/agents/CODE_REVIEW.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Add type hints to all function and method arguments and return values, must be mypy strict compliant
Use Any type only at external library boundaries (xlwings, pandas, numpy, etc.)
Return Pydantic BaseModel instances instead of dictionaries or tuples for structured data
Each function must have a single responsibility; cyclomatic complexity must not exceed 12
Organize imports in the following order: 1) standard library, 2) third-party packages, 3) exstruct internal modules
Add Google-style docstrings to all functions and classes
Do not depend on internal structures of external libraries (xlwings, pandas, numpy); normalize external entities to Pydantic models at the boundary layer
Generated code must pass mypy strict mode with zero errors
Generated code must pass Ruff linter checks (E, W, F, I, B, UP, N, C90 rules) with zero errors
Avoid God Functions (large monolithic functions that do too much)
Avoid God Objects (classes that have too many responsibilities)
Eliminate unnecessary nesting and deeply nested conditional branches
Avoid circular dependencies between modules
Write functions that are testable and avoid unnecessary side effects

Files:

  • tests/com/test_shapes_extraction.py
  • tests/core/test_mode_output.py
  • scripts/gen_json_schema.py
  • src/exstruct/models/__init__.py
  • src/exstruct/core/pipeline.py
  • src/exstruct/core/modeling.py
  • tests/io/test_print_area_views.py
  • src/exstruct/core/shapes.py
  • tests/core/test_shapes_positions_dummy.py
  • tests/models/test_models_export.py
  • tests/models/test_models_validation.py
  • src/exstruct/io/__init__.py
🧠 Learnings (2)
📚 Learning: 2025-12-27T14:32:54.706Z
Learnt from: CR
Repo: harumiWeb/exstruct PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T14:32:54.706Z
Learning: Applies to **/*.py : Do not depend on internal structures of external libraries (xlwings, pandas, numpy); normalize external entities to Pydantic models at the boundary layer

Applied to files:

  • tests/com/test_shapes_extraction.py
  • scripts/gen_json_schema.py
  • src/exstruct/io/__init__.py
📚 Learning: 2025-12-27T14:32:54.706Z
Learnt from: CR
Repo: harumiWeb/exstruct PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T14:32:54.706Z
Learning: Applies to **/*.py : Return Pydantic BaseModel instances instead of dictionaries or tuples for structured data

Applied to files:

  • scripts/gen_json_schema.py
🧬 Code graph analysis (11)
tests/com/test_shapes_extraction.py (1)
src/exstruct/models/__init__.py (2)
  • Arrow (35-59)
  • Shape (28-32)
tests/core/test_mode_output.py (2)
src/exstruct/models/__init__.py (1)
  • Arrow (35-59)
tests/core/test_shapes_utils.py (1)
  • test_has_arrow (73-78)
scripts/gen_json_schema.py (1)
src/exstruct/models/__init__.py (4)
  • Arrow (35-59)
  • Shape (28-32)
  • SmartArt (69-76)
  • SmartArtNode (62-66)
src/exstruct/core/pipeline.py (1)
src/exstruct/models/__init__.py (3)
  • Arrow (35-59)
  • Shape (28-32)
  • SmartArt (69-76)
src/exstruct/core/modeling.py (1)
src/exstruct/models/__init__.py (8)
  • Arrow (35-59)
  • CellRow (79-88)
  • Chart (106-123)
  • PrintArea (126-132)
  • Shape (28-32)
  • SheetData (135-224)
  • SmartArt (69-76)
  • WorkbookData (227-290)
tests/io/test_print_area_views.py (1)
src/exstruct/models/__init__.py (5)
  • Arrow (35-59)
  • PrintArea (126-132)
  • Shape (28-32)
  • SmartArt (69-76)
  • SmartArtNode (62-66)
src/exstruct/core/shapes.py (1)
src/exstruct/models/__init__.py (4)
  • Arrow (35-59)
  • Shape (28-32)
  • SmartArt (69-76)
  • SmartArtNode (62-66)
tests/core/test_shapes_positions_dummy.py (2)
src/exstruct/models/__init__.py (1)
  • Arrow (35-59)
tests/core/test_shapes_utils.py (1)
  • test_has_arrow (73-78)
tests/models/test_models_export.py (1)
src/exstruct/models/__init__.py (7)
  • CellRow (79-88)
  • SheetData (135-224)
  • SmartArt (69-76)
  • SmartArtNode (62-66)
  • to_json (169-174)
  • to_json (235-241)
  • to_json (317-322)
tests/models/test_models_validation.py (1)
src/exstruct/models/__init__.py (5)
  • Arrow (35-59)
  • CellRow (79-88)
  • Shape (28-32)
  • SmartArt (69-76)
  • SmartArtNode (62-66)
src/exstruct/io/__init__.py (1)
src/exstruct/models/__init__.py (7)
  • Arrow (35-59)
  • CellRow (79-88)
  • Chart (106-123)
  • PrintArea (126-132)
  • PrintAreaView (293-372)
  • Shape (28-32)
  • SmartArt (69-76)
🪛 GitHub Check: Codacy Static Code Analysis
src/exstruct/core/shapes.py

[warning] 183-183: src/exstruct/core/shapes.py#L183
Try, Except, Continue detected.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (75)
docs/agents/TEST_REQUIREMENTS.md (4)

3-3: Version bump to 0.4 aligns with SmartArt feature addition.

The minor version increment is appropriate for introducing SmartArt support as a new feature.


54-54: Requirement [SHP-02a] clearly defines type field scope.

The new requirement correctly specifies that the type field is exclusive to Shape objects, establishing a clear contract for the refactored shape type hierarchy (Shape, Arrow, SmartArt).


64-69: New SmartArt extraction requirements are well-structured and testable.

The four requirements ([SHP-SA-01] through [SHP-SA-04]) provide clear, specific test criteria:

  • Mandatory layout field ensures SmartArt metadata is always present
  • Nested nodes structure with kids (not level) reflects the hierarchical nature of SmartArt
  • kind="smartart" discriminator provides unambiguous identification alongside the existing type-specific fields

64-69: Verify SmartArt requirements align with upstream data model definitions.

Confirm that the SmartArt, SmartArtNode models and their public schema (JSON/YAML/TOON output) precisely match these test requirements, particularly the layout, nodes, kids, and kind fields.

docs/agents/ROADMAP.md (1)

36-37: LGTM!

Roadmap entries appropriately document the Shape/Arrow separation and SmartArt parsing features added in v0.3.1.

docs/agents/OVERVIEW.md (2)

19-19: LGTM!

Extraction target list correctly updated to reflect the expanded shape model including Arrows and SmartArt with layout information.


27-27: LGTM!

The description accurately reflects the process_excel functionality for file/directory output.

README.ja.md (3)

7-7: LGTM!

SmartArt correctly added to the structured data output description alongside cells, shapes, charts, and other elements.


11-12: LGTM!

Output mode descriptions appropriately expanded to include SmartArt in standard and verbose modes.


399-403: LGTM!

The coverage note appropriately documents the intentional decision not to pursue exhaustive test coverage for the heuristic-based cell structure inference logic.

docs/README.en.md (2)

13-14: LGTM!

Output modes correctly document SmartArt inclusion in standard and verbose modes.


403-408: LGTM!

Coverage note section properly explains the intentional limited test coverage for heuristic-based cell inference logic.

docs/agents/EXCEL_EXTRACTION.md (1)

35-43: LGTM!

The extraction specification correctly documents:

  • The expanded shape types (Shapes / Arrows / SmartArt)
  • The type field restriction to Shape only
  • SmartArt's nested structure with layout/nodes/kids
README.md (2)

13-14: LGTM!

Output mode descriptions correctly include SmartArt in the feature list.


398-403: LGTM!

Coverage note properly documented.

docs/README.ja.md (3)

7-7: LGTM!

SmartArt correctly added to the structured data output description.


11-12: LGTM!

Output modes appropriately include SmartArt in standard and verbose descriptions.


404-408: LGTM!

Coverage note appropriately explains the intentional incomplete test coverage rationale.

schemas/print_area_view.json (5)

3-163: LGTM!

The Arrow definition correctly captures connector-specific metadata including:

  • Arrow styles for start/end points
  • Connected shape IDs (begin_id/end_id)
  • Direction as compass heading
  • Common positional fields (l, t, w, h)
  • Discriminant kind: "arrow"

409-507: LGTM!

The Shape definition correctly models normal shapes with:

  • Type field for Excel shape type name
  • Discriminant kind: "shape"
  • Common positional and dimensional fields

508-607: LGTM!

The SmartArt definition correctly models SmartArt diagrams with:

  • Layout name field
  • Nested nodes array referencing SmartArtNode
  • Discriminant kind: "smartart"
  • Required layout field in addition to common fields

660-677: LGTM!

The shapes array correctly uses anyOf to accept the union of Shape, Arrow, or SmartArt types, enabling polymorphic shape handling in PrintAreaView.


608-630: Remove the comparison to SmartArt.nodes as it also lacks a default value.

The SmartArtNode.kids property is an array without a default value. However, SmartArt.nodes similarly has no default value in the schema despite both being defined with default_factory=list in the Python model. If adding defaults to array fields is desired for consistency, this should apply to both properties, not just kids.

Likely an incorrect or invalid review comment.

docs/agents/DATA_MODEL.md (1)

3-3: LGTM! Documentation clearly describes the Shape refactoring.

The documentation effectively explains the separation of Shape into three discriminated union types (Shape, Arrow, SmartArt) with a shared BaseShape structure. The use of the kind discriminator field is well-documented, and the changelog entry properly tracks this architectural change.

Also applies to: 16-55, 138-138, 156-156, 228-228

schemas/sheet.json (1)

3-163: LGTM! Schema properly defines discriminated union types.

The JSON schema correctly implements the Shape/Arrow/SmartArt union with:

  • Proper discriminator field (kind with const values)
  • Consistent required fields across all shape types
  • Recursive SmartArtNode definition for nested structures
  • Clear separation of shape-specific fields (e.g., begin_id/end_id for Arrow, layout/nodes for SmartArt)

Also applies to: 508-630, 688-702

docs/concept.md (1)

111-111: LGTM! Documentation updated to reflect new shape types.

sample/smartart/sample_smartart_for_llm.md (1)

1-92: LGTM! Helpful sample documentation for SmartArt structures.

The examples effectively demonstrate different SmartArt layouts with both bullet lists and Mermaid flowcharts, providing clear references for LLM understanding of hierarchical and process-oriented structures.

src/exstruct/core/pipeline.py (1)

13-13: LGTM! Type annotations correctly updated for shape union types.

The import additions and ShapeData type declaration properly reflect the Shape | Arrow | SmartArt union, maintaining type safety throughout the pipeline.

Also applies to: 26-26

docs/schemas.md (1)

14-16: LGTM! Schema listing updated with new shape types.

tests/core/test_mode_output.py (2)

13-13: LGTM! Test correctly validates Arrow type with improved type safety.

The change from checking a type field to using isinstance(s, Arrow) is more type-safe and aligns with the new discriminated union approach. The assertion correctly verifies that textless shapes are Arrow instances with appropriate connector attributes.

Also applies to: 82-83


112-112: LGTM! Appropriate use of type ignore for negative test cases.

The type: ignore[arg-type] comments are correctly applied to intentionally invalid mode arguments in negative test cases.

Also applies to: 116-116

src/exstruct/core/modeling.py (1)

5-14: LGTM! Type annotations correctly updated for shape union types.

The imports and SheetRawData type declaration properly reflect the Shape | Arrow | SmartArt union, maintaining consistency with the broader refactoring. The code follows all coding guidelines with proper type hints and documentation.

Also applies to: 32-32

schemas/arrow.json (1)

1-162: Well-structured Arrow JSON Schema.

The schema correctly defines the Arrow connector metadata with proper nullable handling via anyOf, appropriate required fields (text, l, t), and a comprehensive direction enum for compass headings. The kind const discriminator enables proper union type validation.

sample/smartart/sample_smartart.json (1)

1-93: Comprehensive SmartArt sample data.

The sample effectively demonstrates three distinct SmartArt layouts (cycle, sequential process, org chart) with varying hierarchy depths. The nested node structure with text and kids fields aligns with the SmartArtNode model definition.

tests/com/test_shapes_extraction.py (4)

7-7: Correct type-aware imports for shape discrimination.

Importing Arrow and Shape separately enables proper isinstance checks throughout the tests.


74-91: Proper type discrimination for shape filtering.

The isinstance(s, Shape) checks correctly filter for regular shapes when validating text content and type properties, avoiding false matches on Arrow instances which don't have the type attribute.


102-107: Correct Arrow filtering for line/connector tests.

Using isinstance(s, Arrow) to identify connectors with arrow styles is the right approach for the refactored model structure.


119-134: Well-structured connector relationship test.

The test properly validates that connectors reference valid shape IDs and handles the case where Excel COM may fail to wire connectors by skipping gracefully.

sample/flowchart/sample-shape-connector.json (1)

1-310: Sample data correctly updated for Shape/Arrow type discrimination.

The sample file properly annotates shapes with "kind": "shape" and connectors with "kind": "arrow", aligning with the refactored model structure. Arrow entries include appropriate connector metadata (begin/end IDs, arrow styles, direction).

tests/io/test_print_area_views.py (4)

5-15: Extended imports for new shape types.

The imports correctly include Arrow, SmartArt, and SmartArtNode alongside existing models to support the expanded shape ecosystem.


21-31: Proper construction of SmartArt and Arrow test fixtures.

The SmartArt instance correctly includes nested SmartArtNode structure, and the Arrow instance with id=None exercises the optional ID case per the schema.


64-64: Shapes list expanded to include all shape types.

Including smartart_inside and arrow_inside in the shapes list ensures the print area filtering logic is tested against the full Shape | Arrow | SmartArt union.


85-86: Test assertion validates all shape kinds are present.

Checking that the filtered shapes include {"shape", "smartart", "arrow"} confirms the IO layer correctly handles the expanded shape type union.

tests/models/test_models_export.py (2)

1-8: Imports correctly organized with new model types.

The json import is properly placed in the standard library section, and SmartArt/SmartArtNode are added to the model imports.


101-126: Thorough test for SmartArt JSON serialization.

The test validates that:

  1. SmartArt's kind field serializes correctly
  2. Nested SmartArtNode structures are preserved
  3. Recursive kids arrays are properly serialized

This ensures the Pydantic model serialization handles the hierarchical SmartArt structure correctly.

scripts/gen_json_schema.py (2)

8-20: Imports correctly extended for new shape models.

Arrow, SmartArt, and SmartArtNode are properly imported from exstruct.models to enable schema generation.


50-52: Schema targets correctly expanded.

The new entries arrow, smartart, and smartart_node are properly mapped to their respective Pydantic model classes, ensuring JSON Schema generation covers all public shape types.

schemas/smartart.json (1)

1-126: Well-structured SmartArt JSON Schema with recursive node definition.

The schema correctly uses $defs for the recursive SmartArtNode type, enabling validation of arbitrarily nested hierarchies. The $ref pattern (#/$defs/SmartArtNode) properly handles self-referential node structures. Required fields (text, l, t, layout) align with the SmartArt model definition.

schemas/smartart_node.json (1)

1-29: LGTM!

The SmartArtNode schema is well-structured with proper recursive definition using $ref. The required text field and optional kids array align correctly with the corresponding Pydantic model definition.

tests/core/test_shapes_positions_dummy.py (2)

4-4: LGTM!

Import correctly added to support the new Arrow type assertions.


108-114: LGTM!

The test now properly validates that line shapes are returned as Arrow instances with the expected direction attribute. The isinstance check is more type-safe than the previous string-based type comparison.

sample/basic/sample.json (1)

35-248: LGTM!

Sample data correctly demonstrates the new shape type structure with proper kind discriminators. Shape entries use "kind": "shape" with their type field, while connector entries use "kind": "arrow" with connector-specific fields (begin_arrow_style, end_arrow_style, begin_id, end_id, direction).

schemas/workbook.json (3)

3-163: LGTM!

The Arrow definition is well-structured with proper nullable types using anyOf patterns, correct const kind discriminator, and appropriate connector-specific fields including the direction enum with 8 compass headings.


564-577: LGTM!

The anyOf pattern correctly implements the discriminated union for shapes, allowing Shape, Arrow, or SmartArt types in the shapes array.


594-716: LGTM!

SmartArt and SmartArtNode definitions are correctly structured. SmartArt requires text, l, t, and layout while keeping nodes optional (with implicit empty array default). SmartArtNode properly references itself recursively for the kids array.

src/exstruct/io/__init__.py (3)

10-19: LGTM!

Imports correctly updated to include Arrow and SmartArt from the models module, maintaining proper import grouping order.


44-55: LGTM!

The isinstance check correctly extended to handle Arrow and SmartArt types for recursive dict processing. SmartArtNode is intentionally omitted since it's only accessed as a nested structure within SmartArt.nodes and is handled by Pydantic's model_dump.


180-197: LGTM!

Function signature and internal type annotations correctly updated to accept and return the union type Shape | Arrow | SmartArt. The filtering logic correctly handles all shape variants since they share the common base properties (l, t, w, h) from BaseShape.

schemas/shape.json (1)

1-100: LGTM!

Schema correctly refactored to represent normal shapes only. The kind const discriminator enables proper union discrimination, and arrow-related properties are appropriately removed (now handled by the separate Arrow schema). The retained type field correctly captures Excel shape type names.

src/exstruct/models/__init__.py (5)

11-25: LGTM!

BaseShape correctly encapsulates common shape properties with appropriate type hints and descriptions. The # noqa: E741 comment correctly suppresses the ambiguous variable name warning for l (left offset).


28-59: LGTM!

Shape and Arrow classes are well-structured with proper inheritance from BaseShape. The Literal type for kind enables effective discriminated unions, and Arrow correctly encapsulates connector-specific fields including the 8-direction compass enum.


62-76: LGTM!

SmartArt hierarchy is correctly implemented. SmartArtNode supports recursive nesting through self-referential kids list, and SmartArt extends BaseShape with layout and nodes fields. Pydantic v2 with from __future__ import annotations handles the forward reference automatically.


141-143: LGTM!

SheetData.shapes correctly typed as a union of all shape variants, enabling type-safe handling throughout the codebase.


299-301: LGTM!

PrintAreaView.shapes consistently typed with the same union pattern as SheetData.shapes.

tests/models/test_models_validation.py (3)

5-14: LGTM!

Imports correctly updated to include the new model types for comprehensive validation testing.


17-38: LGTM!

Test comprehensively validates default values and optional fields for all three shape types. The assertions correctly verify kind discriminators and type-specific fields (layout, nodes for SmartArt; begin_arrow_style, end_arrow_style for Arrow).


70-72: LGTM!

Direction literal validation correctly moved to Arrow class where the direction field now resides.

docs/agents/TASKS.md (1)

1-34: LGTM - Well-structured task tracking document.

The task list is comprehensive and properly tracks the implementation progress for SmartArt support. One verification task remains open at line 27, which appropriately reflects the current state of the feature.

src/exstruct/core/shapes.py (7)

1-11: LGTM - Imports are well-organized.

Import order follows the coding guidelines: standard library → third-party → internal modules.


19-25: LGTM - Good type refinement.

The Literal return type and explicit cast improve type safety and satisfy mypy strict mode requirements.


180-194: Static analysis note: try-except-continue is acceptable here.

The Codacy warning about try-except-continue on line 183 is a false positive in this context. When iterating over COM objects via xlwings, individual nodes may fail to yield a Level property for various reasons (e.g., corrupted data, unsupported node types). Skipping such nodes is the correct resilient behavior.

The broad Exception catch is appropriate at this external library boundary per the coding guidelines.


197-210: LGTM - Clean tree-building algorithm.

The stack-based approach for constructing the nested SmartArtNode hierarchy from flat (level, text) tuples is correct and efficient. The algorithm properly handles arbitrary nesting depths.


259-268: Verify: SmartArt always included regardless of output mode.

When has_smartart is True, the _should_include_shape check is bypassed entirely. This means SmartArt shapes are included even in "light" mode, where other shapes are suppressed.

If this is intentional (SmartArt is considered essential content), consider adding a brief comment to document this behavior. If SmartArt should respect the output mode, the condition should be adjusted.


363-376: LGTM - Arrow-specific properties correctly guarded.

The isinstance(shape_obj, Arrow) checks properly ensure that direction, begin_arrow_style, and end_arrow_style are only set on Arrow objects, aligning with the model separation design.


412-421: LGTM - Pending connections correctly typed and resolved.

The pending_connections list is now properly typed as list[tuple[Arrow, str | None, str | None]], and the connection resolution loop correctly assigns begin_id and end_id only to Arrow objects.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/exstruct/core/shapes.py (2)

19-25: LGTM: Type hint improvement with Literal return type.

The Literal return type provides better type safety. The cast is correctly applied to satisfy the type checker.

Consider adding defensive validation for edge cases:

🔎 Optional: Add input validation for non-finite angles
 def angle_to_compass(
     angle: float,
 ) -> Literal["E", "SE", "S", "SW", "W", "NW", "N", "NE"]:
     """Convert angle to 8-point compass direction (0deg=E, 45deg=NE, 90deg=N, etc)."""
+    if not math.isfinite(angle):
+        angle = 0.0
     dirs = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]
     idx = int(((angle + 22.5) % 360) // 45)
     return cast(Literal["E", "SE", "S", "SW", "W", "NW", "N", "NE"], dirs[idx])

226-433: Consider decomposing the shape extraction function.

The get_shapes_with_position function has high cyclomatic complexity (marked with noqa: C901) and handles multiple responsibilities: iteration, filtering, SmartArt detection, shape construction, connector wiring, and position calculation. Breaking it into smaller focused functions would improve maintainability and testability.

Potential decomposition:

  • Extract shape filtering logic into a separate function
  • Extract shape object construction (lines 315-362) into a factory function
  • Extract connector wiring logic (lines 425-431) into a separate function
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6010edc and ac17a18.

📒 Files selected for processing (5)
  • README.md
  • docs/README.en.md
  • src/exstruct/core/shapes.py
  • tests/core/test_shapes_positions_dummy.py
  • tests/core/test_shapes_smartart_utils.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/README.en.md
  • tests/core/test_shapes_positions_dummy.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Add type hints to all function and method arguments and return values, must be mypy strict compliant
Use Any type only at external library boundaries (xlwings, pandas, numpy, etc.)
Return Pydantic BaseModel instances instead of dictionaries or tuples for structured data
Each function must have a single responsibility; cyclomatic complexity must not exceed 12
Organize imports in the following order: 1) standard library, 2) third-party packages, 3) exstruct internal modules
Add Google-style docstrings to all functions and classes
Do not depend on internal structures of external libraries (xlwings, pandas, numpy); normalize external entities to Pydantic models at the boundary layer
Generated code must pass mypy strict mode with zero errors
Generated code must pass Ruff linter checks (E, W, F, I, B, UP, N, C90 rules) with zero errors
Avoid God Functions (large monolithic functions that do too much)
Avoid God Objects (classes that have too many responsibilities)
Eliminate unnecessary nesting and deeply nested conditional branches
Avoid circular dependencies between modules
Write functions that are testable and avoid unnecessary side effects

Files:

  • src/exstruct/core/shapes.py
  • tests/core/test_shapes_smartart_utils.py
🧬 Code graph analysis (2)
src/exstruct/core/shapes.py (1)
src/exstruct/models/__init__.py (4)
  • Arrow (35-59)
  • Shape (28-32)
  • SmartArt (69-76)
  • SmartArtNode (62-66)
tests/core/test_shapes_smartart_utils.py (1)
src/exstruct/core/shapes.py (5)
  • _shape_has_smartart (144-153)
  • _get_smartart_layout_name (156-165)
  • _SmartArtLike (137-141)
  • _collect_smartart_node_info (168-193)
  • _extract_smartart_nodes (220-223)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: test (windows-latest, 3.11)
🔇 Additional comments (20)
README.md (3)

7-7: Past formatting issue resolved.

The missing space between "smartart," and "print" from the previous review has been corrected. Line 7 now properly reads "smartart, print areas/views" with appropriate spacing.


13-14: SmartArt integration is consistent and well-documented.

The additions of SmartArt to both the feature list and output mode descriptions align well with the extraction enhancements. The descriptions clearly delineate which modes include SmartArt (standard and verbose), providing users with appropriate guidance.


398-402: Documentation note on coverage is appropriate.

The new "Note on coverage" section adequately contextualizes the heuristic-based approach and sets realistic expectations regarding coverage gaps. This is helpful for users understanding the library's scope and limitations.

src/exstruct/core/shapes.py (11)

3-3: LGTM: Import additions are well-organized.

The new imports follow the proper organization (standard library → third-party → internal) and all imports are utilized in the implementation.

Also applies to: 5-5, 10-10


113-142: LGTM: Well-defined protocols for SmartArt COM interface.

The protocols correctly use @runtime_checkable for runtime type checks and accurately model the Excel COM API structure. The PascalCase attribute names appropriately match the external API.


144-153: LGTM: Robust SmartArt detection with proper exception handling.

The function correctly handles missing API access and SmartArt availability with appropriate exception guards.


156-165: LGTM: Defensive layout name extraction.

The function safely handles None inputs and missing layout attributes with proper fallback to "Unknown".


168-193: LGTM: Well-structured node information collection.

The function correctly iterates through nodes, safely extracts level and text with exception handling, and returns a clean list of tuples.


196-201: LGTM: Safe level extraction.

The function properly handles conversion errors by returning None.


204-217: LGTM: Efficient tree construction algorithm.

The stack-based algorithm correctly builds a hierarchical tree from flat (level, text) tuples with O(n) complexity.


220-223: LGTM: Clean orchestration function.

The function appropriately composes the collection and tree-building steps.


266-267: LGTM: Light-mode suppression now correctly handles SmartArt.

The early continue for mode == "light" at line 266 now executes before SmartArt detection at line 269, ensuring that SmartArt shapes are properly suppressed in light mode. This addresses the concern raised in the previous review about SmartArt bypassing light-mode suppression.

Based on past review comments, this change resolves the light-mode regression issue.


315-362: LGTM: Clear shape construction branching.

The three-way branching (SmartArt → Arrow → Shape) correctly constructs typed objects based on shape characteristics. The conditional width/height inclusion based on mode and shape type is consistently applied across all branches.


373-374: LGTM: Proper type guards for Arrow-specific attributes.

The isinstance(shape_obj, Arrow) checks correctly narrow the type before accessing Arrow-specific attributes like direction, begin_arrow_style, and end_arrow_style, ensuring type safety.

Also applies to: 384-386, 422-423

tests/core/test_shapes_smartart_utils.py (6)

11-66: LGTM: Well-designed test fixtures.

The dummy dataclasses effectively mock the Excel COM API structure with appropriate type hints, frozen immutability for API objects, and exception-raising variants for error path testing. The N815 suppressions correctly accommodate Excel's PascalCase naming conventions.


68-81: LGTM: Comprehensive true/false case testing.

The test correctly verifies both True and False return values for SmartArt detection using appropriate mock objects.


84-93: LGTM: Thorough exception handling verification.

The test confirms that both API access errors and HasSmartArt property errors are handled gracefully by returning False.


96-109: LGTM: Complete layout name extraction coverage.

The test validates all edge cases: None input, missing name attribute, and valid name extraction.


112-143: LGTM: Validates both collection and tree construction.

The test comprehensively verifies node info collection (including nodes without text) and the hierarchical tree structure with parent-child relationships.


146-147: LGTM: Validates None input handling.

The test confirms that passing None to _collect_smartart_node_info safely returns an empty list.

@harumiWeb
Copy link
Owner Author

@coderabbitai generate docstrings

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 28, 2025

✅ Actions performed

Initiated docstring generation; will generate only if new commits exist.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 28, 2025

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #31

coderabbitai bot added a commit that referenced this pull request Dec 28, 2025
Docstrings generation was requested by @harumiWeb.

* #30 (comment)

The following files were modified:

* `scripts/gen_json_schema.py`
* `src/exstruct/core/shapes.py`
* `src/exstruct/io/__init__.py`
* `tests/com/test_shapes_extraction.py`
* `tests/core/test_mode_output.py`
* `tests/core/test_shapes_positions_dummy.py`
* `tests/core/test_shapes_smartart_utils.py`
* `tests/io/test_print_area_views.py`
* `tests/models/test_models_export.py`
* `tests/models/test_models_validation.py`
@harumiWeb harumiWeb merged commit cdb6437 into main Dec 28, 2025
9 checks passed
@harumiWeb harumiWeb deleted the dev/smartart branch December 28, 2025 10:06
@coderabbitai coderabbitai bot mentioned this pull request Dec 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants