-
Notifications
You must be signed in to change notification settings - Fork 14
SmartArt解析構造化機能追加 #30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Arrow / SmartArt の生成時に type を付与しない - テストの type 参照を Shape に限定 - データモデル仕様の記述を更新
…upport - Updated README and documentation to reflect the addition of SmartArt and Arrow extraction features. - Introduced new schemas for Arrow, SmartArt, and SmartArtNode to support structured data representation. - Refactored JSON schema generation script to include new models. - Improved extraction logic to handle SmartArt layouts and nested nodes. - Enhanced output formats to include detailed metadata for shapes, arrows, and SmartArt.
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughRefactors shape model into three types (Shape, Arrow, SmartArt), adds SmartArt extraction and nested node construction, updates schemas, samples, docs, serialization, and tests to handle Shape | Arrow | SmartArt across extraction and I/O codepaths. Changes
Sequence Diagram(s)sequenceDiagram
participant Workbook as Workbook (xlwings/COM)
participant Extractor as shapes.get_shapes_with_position
participant SmartBuilder as SmartArt node builder
participant Connector as Arrow wiring
participant Serializer as io.dict_without_empty_values
participant Output as CLI / JSON writer
rect rgb(240,248,255)
Workbook->>Extractor: read shapes from sheet (HasSmartArt?/geometry)
end
alt shape has SmartArt
Extractor->>SmartBuilder: _collect/_build_smartart_tree
SmartBuilder-->>Extractor: SmartArt(kind="smartart", layout, nodes, pos)
else connector / line
Extractor->>Connector: classify connector -> Arrow (direction, begin/end ids)
Connector-->>Extractor: Arrow instance
else normal shape
Extractor-->>Extractor: construct Shape(kind="shape", text, pos)
end
Extractor->>Serializer: produce dict (includes Shape|Arrow|SmartArt)
Serializer-->>Output: write JSON/PDF/images (CLI rendering uses COM or fallback)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
tests/models/test_models_validation.py (1)
82-97: Consider using attribute access check pattern.The
hasattrassertions verify thatShapelacks Arrow-specific fields. While this works, accessing viamodel_fieldswould be more explicit for Pydantic models.🔎 Alternative approach using model_fields
- assert not hasattr(shape, "begin_id") - assert not hasattr(shape, "end_id") + assert "begin_id" not in Shape.model_fields + assert "end_id" not in Shape.model_fieldsNote: The current
hasattrapproach is also valid since Pydantic BaseModel instances only expose defined fields as attributes.src/exstruct/core/shapes.py (2)
113-142: Good use of Protocol for COM object typing.These protocols properly abstract the xlwings/COM interface at the boundary layer, aligning with the coding guideline to normalize external entities. The
@runtime_checkabledecorators are present but note that they're currently unused forisinstance()checks—if runtime checking isn't needed, the decorators could be removed to reduce overhead, though this is optional.
305-352: Consider extracting repeated width/height logic.The conditional for computing
wandhis repeated identically acrossSmartArt,Arrow, andShapeconstruction:w=int(shp.width) if mode == "verbose" or shape_type_str == "Group" else None, h=int(shp.height) if mode == "verbose" or shape_type_str == "Group" else None,This could be extracted into helper variables to reduce duplication and improve maintainability.
🔎 Proposed refactor
shape_obj: Shape | Arrow | SmartArt + include_size = mode == "verbose" or shape_type_str == "Group" + w_val = int(shp.width) if include_size else None + h_val = int(shp.height) if include_size else None if has_smartart: smartart_obj: _SmartArtLike | None = None try: smartart_obj = shp.api.SmartArt except Exception: smartart_obj = None shape_obj = SmartArt( id=shape_id, text=text, l=int(shp.left), t=int(shp.top), - w=int(shp.width) - if mode == "verbose" or shape_type_str == "Group" - else None, - h=int(shp.height) - if mode == "verbose" or shape_type_str == "Group" - else None, + w=w_val, + h=h_val, layout=_get_smartart_layout_name(smartart_obj), nodes=_extract_smartart_nodes(smartart_obj), )Apply similar changes to the
ArrowandShapeconstruction blocks.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
sample/smartart/sample_smartart.xlsxis excluded by!**/*.xlsx
📒 Files selected for processing (37)
README.ja.mdREADME.mddocs/README.en.mddocs/README.ja.mddocs/agents/CODE_REVIEW.mddocs/agents/DATA_MODEL.mddocs/agents/EXCEL_EXTRACTION.mddocs/agents/FEATURE_SPEC.mddocs/agents/OVERVIEW.mddocs/agents/ROADMAP.mddocs/agents/TASKS.mddocs/agents/TEST_REQUIREMENTS.mddocs/concept.mddocs/schemas.mdsample/basic/sample.jsonsample/flowchart/sample-shape-connector.jsonsample/smartart/sample_smartart.jsonsample/smartart/sample_smartart_for_llm.mdschemas/arrow.jsonschemas/print_area_view.jsonschemas/shape.jsonschemas/sheet.jsonschemas/smartart.jsonschemas/smartart_node.jsonschemas/workbook.jsonscripts/gen_json_schema.pysrc/exstruct/core/modeling.pysrc/exstruct/core/pipeline.pysrc/exstruct/core/shapes.pysrc/exstruct/io/__init__.pysrc/exstruct/models/__init__.pytests/com/test_shapes_extraction.pytests/core/test_mode_output.pytests/core/test_shapes_positions_dummy.pytests/io/test_print_area_views.pytests/models/test_models_export.pytests/models/test_models_validation.py
💤 Files with no reviewable changes (2)
- docs/agents/FEATURE_SPEC.md
- docs/agents/CODE_REVIEW.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Add type hints to all function and method arguments and return values, must be mypy strict compliant
UseAnytype only at external library boundaries (xlwings, pandas, numpy, etc.)
Return Pydantic BaseModel instances instead of dictionaries or tuples for structured data
Each function must have a single responsibility; cyclomatic complexity must not exceed 12
Organize imports in the following order: 1) standard library, 2) third-party packages, 3) exstruct internal modules
Add Google-style docstrings to all functions and classes
Do not depend on internal structures of external libraries (xlwings, pandas, numpy); normalize external entities to Pydantic models at the boundary layer
Generated code must pass mypy strict mode with zero errors
Generated code must pass Ruff linter checks (E, W, F, I, B, UP, N, C90 rules) with zero errors
Avoid God Functions (large monolithic functions that do too much)
Avoid God Objects (classes that have too many responsibilities)
Eliminate unnecessary nesting and deeply nested conditional branches
Avoid circular dependencies between modules
Write functions that are testable and avoid unnecessary side effects
Files:
tests/com/test_shapes_extraction.pytests/core/test_mode_output.pyscripts/gen_json_schema.pysrc/exstruct/models/__init__.pysrc/exstruct/core/pipeline.pysrc/exstruct/core/modeling.pytests/io/test_print_area_views.pysrc/exstruct/core/shapes.pytests/core/test_shapes_positions_dummy.pytests/models/test_models_export.pytests/models/test_models_validation.pysrc/exstruct/io/__init__.py
🧠 Learnings (2)
📚 Learning: 2025-12-27T14:32:54.706Z
Learnt from: CR
Repo: harumiWeb/exstruct PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T14:32:54.706Z
Learning: Applies to **/*.py : Do not depend on internal structures of external libraries (xlwings, pandas, numpy); normalize external entities to Pydantic models at the boundary layer
Applied to files:
tests/com/test_shapes_extraction.pyscripts/gen_json_schema.pysrc/exstruct/io/__init__.py
📚 Learning: 2025-12-27T14:32:54.706Z
Learnt from: CR
Repo: harumiWeb/exstruct PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T14:32:54.706Z
Learning: Applies to **/*.py : Return Pydantic BaseModel instances instead of dictionaries or tuples for structured data
Applied to files:
scripts/gen_json_schema.py
🧬 Code graph analysis (11)
tests/com/test_shapes_extraction.py (1)
src/exstruct/models/__init__.py (2)
Arrow(35-59)Shape(28-32)
tests/core/test_mode_output.py (2)
src/exstruct/models/__init__.py (1)
Arrow(35-59)tests/core/test_shapes_utils.py (1)
test_has_arrow(73-78)
scripts/gen_json_schema.py (1)
src/exstruct/models/__init__.py (4)
Arrow(35-59)Shape(28-32)SmartArt(69-76)SmartArtNode(62-66)
src/exstruct/core/pipeline.py (1)
src/exstruct/models/__init__.py (3)
Arrow(35-59)Shape(28-32)SmartArt(69-76)
src/exstruct/core/modeling.py (1)
src/exstruct/models/__init__.py (8)
Arrow(35-59)CellRow(79-88)Chart(106-123)PrintArea(126-132)Shape(28-32)SheetData(135-224)SmartArt(69-76)WorkbookData(227-290)
tests/io/test_print_area_views.py (1)
src/exstruct/models/__init__.py (5)
Arrow(35-59)PrintArea(126-132)Shape(28-32)SmartArt(69-76)SmartArtNode(62-66)
src/exstruct/core/shapes.py (1)
src/exstruct/models/__init__.py (4)
Arrow(35-59)Shape(28-32)SmartArt(69-76)SmartArtNode(62-66)
tests/core/test_shapes_positions_dummy.py (2)
src/exstruct/models/__init__.py (1)
Arrow(35-59)tests/core/test_shapes_utils.py (1)
test_has_arrow(73-78)
tests/models/test_models_export.py (1)
src/exstruct/models/__init__.py (7)
CellRow(79-88)SheetData(135-224)SmartArt(69-76)SmartArtNode(62-66)to_json(169-174)to_json(235-241)to_json(317-322)
tests/models/test_models_validation.py (1)
src/exstruct/models/__init__.py (5)
Arrow(35-59)CellRow(79-88)Shape(28-32)SmartArt(69-76)SmartArtNode(62-66)
src/exstruct/io/__init__.py (1)
src/exstruct/models/__init__.py (7)
Arrow(35-59)CellRow(79-88)Chart(106-123)PrintArea(126-132)PrintAreaView(293-372)Shape(28-32)SmartArt(69-76)
🪛 GitHub Check: Codacy Static Code Analysis
src/exstruct/core/shapes.py
[warning] 183-183: src/exstruct/core/shapes.py#L183
Try, Except, Continue detected.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (75)
docs/agents/TEST_REQUIREMENTS.md (4)
3-3: Version bump to 0.4 aligns with SmartArt feature addition.The minor version increment is appropriate for introducing SmartArt support as a new feature.
54-54: Requirement [SHP-02a] clearly defines type field scope.The new requirement correctly specifies that the
typefield is exclusive to Shape objects, establishing a clear contract for the refactored shape type hierarchy (Shape, Arrow, SmartArt).
64-69: New SmartArt extraction requirements are well-structured and testable.The four requirements ([SHP-SA-01] through [SHP-SA-04]) provide clear, specific test criteria:
- Mandatory
layoutfield ensures SmartArt metadata is always present- Nested
nodesstructure withkids(notlevel) reflects the hierarchical nature of SmartArtkind="smartart"discriminator provides unambiguous identification alongside the existing type-specific fields
64-69: Verify SmartArt requirements align with upstream data model definitions.Confirm that the SmartArt, SmartArtNode models and their public schema (JSON/YAML/TOON output) precisely match these test requirements, particularly the
layout,nodes,kids, andkindfields.docs/agents/ROADMAP.md (1)
36-37: LGTM!Roadmap entries appropriately document the Shape/Arrow separation and SmartArt parsing features added in v0.3.1.
docs/agents/OVERVIEW.md (2)
19-19: LGTM!Extraction target list correctly updated to reflect the expanded shape model including Arrows and SmartArt with layout information.
27-27: LGTM!The description accurately reflects the
process_excelfunctionality for file/directory output.README.ja.md (3)
7-7: LGTM!SmartArt correctly added to the structured data output description alongside cells, shapes, charts, and other elements.
11-12: LGTM!Output mode descriptions appropriately expanded to include SmartArt in
standardandverbosemodes.
399-403: LGTM!The coverage note appropriately documents the intentional decision not to pursue exhaustive test coverage for the heuristic-based cell structure inference logic.
docs/README.en.md (2)
13-14: LGTM!Output modes correctly document SmartArt inclusion in
standardandverbosemodes.
403-408: LGTM!Coverage note section properly explains the intentional limited test coverage for heuristic-based cell inference logic.
docs/agents/EXCEL_EXTRACTION.md (1)
35-43: LGTM!The extraction specification correctly documents:
- The expanded shape types (Shapes / Arrows / SmartArt)
- The
typefield restriction to Shape only- SmartArt's nested structure with layout/nodes/kids
README.md (2)
13-14: LGTM!Output mode descriptions correctly include SmartArt in the feature list.
398-403: LGTM!Coverage note properly documented.
docs/README.ja.md (3)
7-7: LGTM!SmartArt correctly added to the structured data output description.
11-12: LGTM!Output modes appropriately include SmartArt in standard and verbose descriptions.
404-408: LGTM!Coverage note appropriately explains the intentional incomplete test coverage rationale.
schemas/print_area_view.json (5)
3-163: LGTM!The
Arrowdefinition correctly captures connector-specific metadata including:
- Arrow styles for start/end points
- Connected shape IDs (begin_id/end_id)
- Direction as compass heading
- Common positional fields (l, t, w, h)
- Discriminant
kind: "arrow"
409-507: LGTM!The
Shapedefinition correctly models normal shapes with:
- Type field for Excel shape type name
- Discriminant
kind: "shape"- Common positional and dimensional fields
508-607: LGTM!The
SmartArtdefinition correctly models SmartArt diagrams with:
- Layout name field
- Nested nodes array referencing SmartArtNode
- Discriminant
kind: "smartart"- Required
layoutfield in addition to common fields
660-677: LGTM!The
shapesarray correctly usesanyOfto accept the union ofShape,Arrow, orSmartArttypes, enabling polymorphic shape handling in PrintAreaView.
608-630: Remove the comparison toSmartArt.nodesas it also lacks a default value.The
SmartArtNode.kidsproperty is an array without adefaultvalue. However,SmartArt.nodessimilarly has nodefaultvalue in the schema despite both being defined withdefault_factory=listin the Python model. If adding defaults to array fields is desired for consistency, this should apply to both properties, not justkids.Likely an incorrect or invalid review comment.
docs/agents/DATA_MODEL.md (1)
3-3: LGTM! Documentation clearly describes the Shape refactoring.The documentation effectively explains the separation of Shape into three discriminated union types (Shape, Arrow, SmartArt) with a shared BaseShape structure. The use of the
kinddiscriminator field is well-documented, and the changelog entry properly tracks this architectural change.Also applies to: 16-55, 138-138, 156-156, 228-228
schemas/sheet.json (1)
3-163: LGTM! Schema properly defines discriminated union types.The JSON schema correctly implements the Shape/Arrow/SmartArt union with:
- Proper discriminator field (
kindwith const values)- Consistent required fields across all shape types
- Recursive SmartArtNode definition for nested structures
- Clear separation of shape-specific fields (e.g.,
begin_id/end_idfor Arrow,layout/nodesfor SmartArt)Also applies to: 508-630, 688-702
docs/concept.md (1)
111-111: LGTM! Documentation updated to reflect new shape types.sample/smartart/sample_smartart_for_llm.md (1)
1-92: LGTM! Helpful sample documentation for SmartArt structures.The examples effectively demonstrate different SmartArt layouts with both bullet lists and Mermaid flowcharts, providing clear references for LLM understanding of hierarchical and process-oriented structures.
src/exstruct/core/pipeline.py (1)
13-13: LGTM! Type annotations correctly updated for shape union types.The import additions and ShapeData type declaration properly reflect the Shape | Arrow | SmartArt union, maintaining type safety throughout the pipeline.
Also applies to: 26-26
docs/schemas.md (1)
14-16: LGTM! Schema listing updated with new shape types.tests/core/test_mode_output.py (2)
13-13: LGTM! Test correctly validates Arrow type with improved type safety.The change from checking a
typefield to usingisinstance(s, Arrow)is more type-safe and aligns with the new discriminated union approach. The assertion correctly verifies that textless shapes are Arrow instances with appropriate connector attributes.Also applies to: 82-83
112-112: LGTM! Appropriate use of type ignore for negative test cases.The
type: ignore[arg-type]comments are correctly applied to intentionally invalid mode arguments in negative test cases.Also applies to: 116-116
src/exstruct/core/modeling.py (1)
5-14: LGTM! Type annotations correctly updated for shape union types.The imports and SheetRawData type declaration properly reflect the Shape | Arrow | SmartArt union, maintaining consistency with the broader refactoring. The code follows all coding guidelines with proper type hints and documentation.
Also applies to: 32-32
schemas/arrow.json (1)
1-162: Well-structured Arrow JSON Schema.The schema correctly defines the Arrow connector metadata with proper nullable handling via
anyOf, appropriate required fields (text,l,t), and a comprehensive direction enum for compass headings. Thekindconst discriminator enables proper union type validation.sample/smartart/sample_smartart.json (1)
1-93: Comprehensive SmartArt sample data.The sample effectively demonstrates three distinct SmartArt layouts (cycle, sequential process, org chart) with varying hierarchy depths. The nested node structure with
textandkidsfields aligns with theSmartArtNodemodel definition.tests/com/test_shapes_extraction.py (4)
7-7: Correct type-aware imports for shape discrimination.Importing
ArrowandShapeseparately enables properisinstancechecks throughout the tests.
74-91: Proper type discrimination for shape filtering.The
isinstance(s, Shape)checks correctly filter for regular shapes when validating text content and type properties, avoiding false matches onArrowinstances which don't have thetypeattribute.
102-107: Correct Arrow filtering for line/connector tests.Using
isinstance(s, Arrow)to identify connectors with arrow styles is the right approach for the refactored model structure.
119-134: Well-structured connector relationship test.The test properly validates that connectors reference valid shape IDs and handles the case where Excel COM may fail to wire connectors by skipping gracefully.
sample/flowchart/sample-shape-connector.json (1)
1-310: Sample data correctly updated for Shape/Arrow type discrimination.The sample file properly annotates shapes with
"kind": "shape"and connectors with"kind": "arrow", aligning with the refactored model structure. Arrow entries include appropriate connector metadata (begin/end IDs, arrow styles, direction).tests/io/test_print_area_views.py (4)
5-15: Extended imports for new shape types.The imports correctly include
Arrow,SmartArt, andSmartArtNodealongside existing models to support the expanded shape ecosystem.
21-31: Proper construction of SmartArt and Arrow test fixtures.The
SmartArtinstance correctly includes nestedSmartArtNodestructure, and theArrowinstance withid=Noneexercises the optional ID case per the schema.
64-64: Shapes list expanded to include all shape types.Including
smartart_insideandarrow_insidein the shapes list ensures the print area filtering logic is tested against the fullShape | Arrow | SmartArtunion.
85-86: Test assertion validates all shape kinds are present.Checking that the filtered shapes include
{"shape", "smartart", "arrow"}confirms the IO layer correctly handles the expanded shape type union.tests/models/test_models_export.py (2)
1-8: Imports correctly organized with new model types.The
jsonimport is properly placed in the standard library section, andSmartArt/SmartArtNodeare added to the model imports.
101-126: Thorough test for SmartArt JSON serialization.The test validates that:
- SmartArt's
kindfield serializes correctly- Nested
SmartArtNodestructures are preserved- Recursive
kidsarrays are properly serializedThis ensures the Pydantic model serialization handles the hierarchical SmartArt structure correctly.
scripts/gen_json_schema.py (2)
8-20: Imports correctly extended for new shape models.
Arrow,SmartArt, andSmartArtNodeare properly imported fromexstruct.modelsto enable schema generation.
50-52: Schema targets correctly expanded.The new entries
arrow,smartart, andsmartart_nodeare properly mapped to their respective Pydantic model classes, ensuring JSON Schema generation covers all public shape types.schemas/smartart.json (1)
1-126: Well-structured SmartArt JSON Schema with recursive node definition.The schema correctly uses
$defsfor the recursiveSmartArtNodetype, enabling validation of arbitrarily nested hierarchies. The$refpattern (#/$defs/SmartArtNode) properly handles self-referential node structures. Required fields (text,l,t,layout) align with the SmartArt model definition.schemas/smartart_node.json (1)
1-29: LGTM!The SmartArtNode schema is well-structured with proper recursive definition using
$ref. The requiredtextfield and optionalkidsarray align correctly with the corresponding Pydantic model definition.tests/core/test_shapes_positions_dummy.py (2)
4-4: LGTM!Import correctly added to support the new Arrow type assertions.
108-114: LGTM!The test now properly validates that line shapes are returned as
Arrowinstances with the expecteddirectionattribute. Theisinstancecheck is more type-safe than the previous string-based type comparison.sample/basic/sample.json (1)
35-248: LGTM!Sample data correctly demonstrates the new shape type structure with proper
kinddiscriminators. Shape entries use"kind": "shape"with theirtypefield, while connector entries use"kind": "arrow"with connector-specific fields (begin_arrow_style,end_arrow_style,begin_id,end_id,direction).schemas/workbook.json (3)
3-163: LGTM!The Arrow definition is well-structured with proper nullable types using
anyOfpatterns, correct constkinddiscriminator, and appropriate connector-specific fields including the direction enum with 8 compass headings.
564-577: LGTM!The
anyOfpattern correctly implements the discriminated union for shapes, allowingShape,Arrow, orSmartArttypes in the shapes array.
594-716: LGTM!SmartArt and SmartArtNode definitions are correctly structured. SmartArt requires
text,l,t, andlayoutwhile keepingnodesoptional (with implicit empty array default). SmartArtNode properly references itself recursively for thekidsarray.src/exstruct/io/__init__.py (3)
10-19: LGTM!Imports correctly updated to include
ArrowandSmartArtfrom the models module, maintaining proper import grouping order.
44-55: LGTM!The
isinstancecheck correctly extended to handleArrowandSmartArttypes for recursive dict processing.SmartArtNodeis intentionally omitted since it's only accessed as a nested structure withinSmartArt.nodesand is handled by Pydantic'smodel_dump.
180-197: LGTM!Function signature and internal type annotations correctly updated to accept and return the union type
Shape | Arrow | SmartArt. The filtering logic correctly handles all shape variants since they share the common base properties (l,t,w,h) fromBaseShape.schemas/shape.json (1)
1-100: LGTM!Schema correctly refactored to represent normal shapes only. The
kindconst discriminator enables proper union discrimination, and arrow-related properties are appropriately removed (now handled by the separate Arrow schema). The retainedtypefield correctly captures Excel shape type names.src/exstruct/models/__init__.py (5)
11-25: LGTM!
BaseShapecorrectly encapsulates common shape properties with appropriate type hints and descriptions. The# noqa: E741comment correctly suppresses the ambiguous variable name warning forl(left offset).
28-59: LGTM!Shape and Arrow classes are well-structured with proper inheritance from
BaseShape. TheLiteraltype forkindenables effective discriminated unions, and Arrow correctly encapsulates connector-specific fields including the 8-direction compass enum.
62-76: LGTM!SmartArt hierarchy is correctly implemented.
SmartArtNodesupports recursive nesting through self-referentialkidslist, andSmartArtextendsBaseShapewith layout and nodes fields. Pydantic v2 withfrom __future__ import annotationshandles the forward reference automatically.
141-143: LGTM!
SheetData.shapescorrectly typed as a union of all shape variants, enabling type-safe handling throughout the codebase.
299-301: LGTM!
PrintAreaView.shapesconsistently typed with the same union pattern asSheetData.shapes.tests/models/test_models_validation.py (3)
5-14: LGTM!Imports correctly updated to include the new model types for comprehensive validation testing.
17-38: LGTM!Test comprehensively validates default values and optional fields for all three shape types. The assertions correctly verify
kinddiscriminators and type-specific fields (layout,nodesfor SmartArt;begin_arrow_style,end_arrow_stylefor Arrow).
70-72: LGTM!Direction literal validation correctly moved to
Arrowclass where thedirectionfield now resides.docs/agents/TASKS.md (1)
1-34: LGTM - Well-structured task tracking document.The task list is comprehensive and properly tracks the implementation progress for SmartArt support. One verification task remains open at line 27, which appropriately reflects the current state of the feature.
src/exstruct/core/shapes.py (7)
1-11: LGTM - Imports are well-organized.Import order follows the coding guidelines: standard library → third-party → internal modules.
19-25: LGTM - Good type refinement.The
Literalreturn type and explicitcastimprove type safety and satisfy mypy strict mode requirements.
180-194: Static analysis note:try-except-continueis acceptable here.The Codacy warning about
try-except-continueon line 183 is a false positive in this context. When iterating over COM objects via xlwings, individual nodes may fail to yield aLevelproperty for various reasons (e.g., corrupted data, unsupported node types). Skipping such nodes is the correct resilient behavior.The broad
Exceptioncatch is appropriate at this external library boundary per the coding guidelines.
197-210: LGTM - Clean tree-building algorithm.The stack-based approach for constructing the nested
SmartArtNodehierarchy from flat(level, text)tuples is correct and efficient. The algorithm properly handles arbitrary nesting depths.
259-268: Verify: SmartArt always included regardless of output mode.When
has_smartartisTrue, the_should_include_shapecheck is bypassed entirely. This means SmartArt shapes are included even in"light"mode, where other shapes are suppressed.If this is intentional (SmartArt is considered essential content), consider adding a brief comment to document this behavior. If SmartArt should respect the output mode, the condition should be adjusted.
363-376: LGTM - Arrow-specific properties correctly guarded.The
isinstance(shape_obj, Arrow)checks properly ensure thatdirection,begin_arrow_style, andend_arrow_styleare only set onArrowobjects, aligning with the model separation design.
412-421: LGTM - Pending connections correctly typed and resolved.The
pending_connectionslist is now properly typed aslist[tuple[Arrow, str | None, str | None]], and the connection resolution loop correctly assignsbegin_idandend_idonly toArrowobjects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/exstruct/core/shapes.py (2)
19-25: LGTM: Type hint improvement with Literal return type.The Literal return type provides better type safety. The cast is correctly applied to satisfy the type checker.
Consider adding defensive validation for edge cases:
🔎 Optional: Add input validation for non-finite angles
def angle_to_compass( angle: float, ) -> Literal["E", "SE", "S", "SW", "W", "NW", "N", "NE"]: """Convert angle to 8-point compass direction (0deg=E, 45deg=NE, 90deg=N, etc).""" + if not math.isfinite(angle): + angle = 0.0 dirs = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"] idx = int(((angle + 22.5) % 360) // 45) return cast(Literal["E", "SE", "S", "SW", "W", "NW", "N", "NE"], dirs[idx])
226-433: Consider decomposing the shape extraction function.The
get_shapes_with_positionfunction has high cyclomatic complexity (marked withnoqa: C901) and handles multiple responsibilities: iteration, filtering, SmartArt detection, shape construction, connector wiring, and position calculation. Breaking it into smaller focused functions would improve maintainability and testability.Potential decomposition:
- Extract shape filtering logic into a separate function
- Extract shape object construction (lines 315-362) into a factory function
- Extract connector wiring logic (lines 425-431) into a separate function
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
README.mddocs/README.en.mdsrc/exstruct/core/shapes.pytests/core/test_shapes_positions_dummy.pytests/core/test_shapes_smartart_utils.py
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/README.en.md
- tests/core/test_shapes_positions_dummy.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Add type hints to all function and method arguments and return values, must be mypy strict compliant
UseAnytype only at external library boundaries (xlwings, pandas, numpy, etc.)
Return Pydantic BaseModel instances instead of dictionaries or tuples for structured data
Each function must have a single responsibility; cyclomatic complexity must not exceed 12
Organize imports in the following order: 1) standard library, 2) third-party packages, 3) exstruct internal modules
Add Google-style docstrings to all functions and classes
Do not depend on internal structures of external libraries (xlwings, pandas, numpy); normalize external entities to Pydantic models at the boundary layer
Generated code must pass mypy strict mode with zero errors
Generated code must pass Ruff linter checks (E, W, F, I, B, UP, N, C90 rules) with zero errors
Avoid God Functions (large monolithic functions that do too much)
Avoid God Objects (classes that have too many responsibilities)
Eliminate unnecessary nesting and deeply nested conditional branches
Avoid circular dependencies between modules
Write functions that are testable and avoid unnecessary side effects
Files:
src/exstruct/core/shapes.pytests/core/test_shapes_smartart_utils.py
🧬 Code graph analysis (2)
src/exstruct/core/shapes.py (1)
src/exstruct/models/__init__.py (4)
Arrow(35-59)Shape(28-32)SmartArt(69-76)SmartArtNode(62-66)
tests/core/test_shapes_smartart_utils.py (1)
src/exstruct/core/shapes.py (5)
_shape_has_smartart(144-153)_get_smartart_layout_name(156-165)_SmartArtLike(137-141)_collect_smartart_node_info(168-193)_extract_smartart_nodes(220-223)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: test (windows-latest, 3.11)
🔇 Additional comments (20)
README.md (3)
7-7: Past formatting issue resolved.The missing space between "smartart," and "print" from the previous review has been corrected. Line 7 now properly reads "smartart, print areas/views" with appropriate spacing.
13-14: SmartArt integration is consistent and well-documented.The additions of SmartArt to both the feature list and output mode descriptions align well with the extraction enhancements. The descriptions clearly delineate which modes include SmartArt (standard and verbose), providing users with appropriate guidance.
398-402: Documentation note on coverage is appropriate.The new "Note on coverage" section adequately contextualizes the heuristic-based approach and sets realistic expectations regarding coverage gaps. This is helpful for users understanding the library's scope and limitations.
src/exstruct/core/shapes.py (11)
3-3: LGTM: Import additions are well-organized.The new imports follow the proper organization (standard library → third-party → internal) and all imports are utilized in the implementation.
Also applies to: 5-5, 10-10
113-142: LGTM: Well-defined protocols for SmartArt COM interface.The protocols correctly use
@runtime_checkablefor runtime type checks and accurately model the Excel COM API structure. The PascalCase attribute names appropriately match the external API.
144-153: LGTM: Robust SmartArt detection with proper exception handling.The function correctly handles missing API access and SmartArt availability with appropriate exception guards.
156-165: LGTM: Defensive layout name extraction.The function safely handles None inputs and missing layout attributes with proper fallback to "Unknown".
168-193: LGTM: Well-structured node information collection.The function correctly iterates through nodes, safely extracts level and text with exception handling, and returns a clean list of tuples.
196-201: LGTM: Safe level extraction.The function properly handles conversion errors by returning None.
204-217: LGTM: Efficient tree construction algorithm.The stack-based algorithm correctly builds a hierarchical tree from flat (level, text) tuples with O(n) complexity.
220-223: LGTM: Clean orchestration function.The function appropriately composes the collection and tree-building steps.
266-267: LGTM: Light-mode suppression now correctly handles SmartArt.The early
continueformode == "light"at line 266 now executes before SmartArt detection at line 269, ensuring that SmartArt shapes are properly suppressed in light mode. This addresses the concern raised in the previous review about SmartArt bypassing light-mode suppression.Based on past review comments, this change resolves the light-mode regression issue.
315-362: LGTM: Clear shape construction branching.The three-way branching (SmartArt → Arrow → Shape) correctly constructs typed objects based on shape characteristics. The conditional width/height inclusion based on mode and shape type is consistently applied across all branches.
373-374: LGTM: Proper type guards for Arrow-specific attributes.The
isinstance(shape_obj, Arrow)checks correctly narrow the type before accessing Arrow-specific attributes likedirection,begin_arrow_style, andend_arrow_style, ensuring type safety.Also applies to: 384-386, 422-423
tests/core/test_shapes_smartart_utils.py (6)
11-66: LGTM: Well-designed test fixtures.The dummy dataclasses effectively mock the Excel COM API structure with appropriate type hints, frozen immutability for API objects, and exception-raising variants for error path testing. The N815 suppressions correctly accommodate Excel's PascalCase naming conventions.
68-81: LGTM: Comprehensive true/false case testing.The test correctly verifies both True and False return values for SmartArt detection using appropriate mock objects.
84-93: LGTM: Thorough exception handling verification.The test confirms that both API access errors and HasSmartArt property errors are handled gracefully by returning False.
96-109: LGTM: Complete layout name extraction coverage.The test validates all edge cases: None input, missing name attribute, and valid name extraction.
112-143: LGTM: Validates both collection and tree construction.The test comprehensively verifies node info collection (including nodes without text) and the hierarchical tree structure with parent-child relationships.
146-147: LGTM: Validates None input handling.The test confirms that passing None to
_collect_smartart_node_infosafely returns an empty list.
|
@coderabbitai generate docstrings |
✅ Actions performedInitiated docstring generation; will generate only if new commits exist. |
|
Note Docstrings generation - SUCCESS |
Docstrings generation was requested by @harumiWeb. * #30 (comment) The following files were modified: * `scripts/gen_json_schema.py` * `src/exstruct/core/shapes.py` * `src/exstruct/io/__init__.py` * `tests/com/test_shapes_extraction.py` * `tests/core/test_mode_output.py` * `tests/core/test_shapes_positions_dummy.py` * `tests/core/test_shapes_smartart_utils.py` * `tests/io/test_print_area_views.py` * `tests/models/test_models_export.py` * `tests/models/test_models_validation.py`
Summary by CodeRabbit
New Features
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.