ExcelAlchemy began as a practical response to a recurring backend problem: Excel was the delivery format, but the real work was template control, validation, data normalization, and row-level feedback.
Over time, this repository became more than a utility library. It became a place to practice and demonstrate architecture decisions in public:
- how to evolve a codebase without rewriting it from scratch
- how to isolate framework churn behind adapters
- how to remove dependencies that no longer fit the problem
- how to expose extension points without making the API noisy
The project is built around one core belief:
Excel import/export is not a file problem first. It is a contract problem first.
The “file” is only the transport. The actual system has to answer harder questions:
- What is the expected shape of the data?
- Which fields are required?
- How should users discover valid input?
- Where should validation errors be written back?
- How do we keep backend code and spreadsheet semantics aligned?
- How do we avoid hard-wiring infrastructure choices into business logic?
ExcelAlchemy answers those questions with schema-driven design.
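As a rough illustration of "contract first", the sketch below lets a declared schema, rather than the uploaded file, decide which columns must exist. `ColumnSpec`, `check_headers`, and the example labels are hypothetical names chosen for this sketch; they are not ExcelAlchemy's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical description of one expected workbook column."""

    label: str
    required: bool = True


def check_headers(expected: list[ColumnSpec], actual: list[str]) -> dict[str, list[str]]:
    """Compare uploaded headers against the declared contract."""
    known = {spec.label for spec in expected}
    # Required columns that the uploaded sheet does not contain.
    missing = [spec.label for spec in expected if spec.required and spec.label not in actual]
    # Columns in the sheet that the contract never declared.
    unexpected = [header for header in actual if header not in known]
    return {"missing": missing, "unexpected": unexpected}


contract = [ColumnSpec("Name"), ColumnSpec("Email"), ColumnSpec("Nickname", required=False)]
report = check_headers(contract, ["Name", "Phone"])
```

Here the report names `Email` as missing and `Phone` as unexpected, which is the kind of feedback a user needs before any row is processed.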
- Prefer explicit schemas over implicit conventions.
- Keep workbook metadata separate from validation-framework internals.
- Treat Excel as a contract, not a loosely structured blob.
- Keep the public API small and boring.
- Move complexity behind focused internal components.
- Prefer composition over giant coordinator classes.
- Put adapters at unstable integration boundaries.
- Depend on protocols where implementations can vary.
- Optimize for migration-friendly seams.
- Avoid hidden runtime magic.
- Make user-facing failures easy to understand.
- Keep architecture honest to the real problem domain.
- Remove dependencies that do not earn their cost.
- Use modern Python features where they reduce incidental complexity.
- Prefer typed contracts over stringly typed plumbing.
- Make storage a strategy, not a product lock-in.
- Keep tests focused on behavior and contracts.
- Modernize incrementally, not theatrically.
- Separate workbook display text from runtime error text.
- Let internationalization start with message boundaries, not with global complexity.
- Accept compatibility where it helps adoption, but isolate it.
- Document tradeoffs, not just outcomes.
- Build a library that teaches its own architecture.
ExcelAlchemy is intentionally a facade.
It exposes the user-facing workflow, but delegates internals to specialized components:
- schema extraction
- header parsing and validation
- row aggregation
- import execution
- rendering
- storage
This lets the public surface stay stable while the inside evolves.
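A minimal sketch of that facade shape, assuming two small collaborators: `ImportFacade`, `HeaderValidator`, and `RowImporter` are hypothetical names used only for illustration, and the real internal components may be split differently.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, str]


@dataclass
class HeaderValidator:
    """Checks that every expected header is present."""

    expected: list[str]

    def validate(self, headers: list[str]) -> list[str]:
        return [name for name in self.expected if name not in headers]


@dataclass
class RowImporter:
    """Passes each row to a caller-supplied handler."""

    handler: Callable[[Row], None]

    def run(self, rows: list[Row]) -> int:
        for row in rows:
            self.handler(row)
        return len(rows)


@dataclass
class ImportFacade:
    """User-facing entry point; it only coordinates the components."""

    headers: HeaderValidator
    importer: RowImporter

    def import_rows(self, headers: list[str], rows: list[Row]) -> dict[str, object]:
        missing = self.headers.validate(headers)
        if missing:
            # Stop early: rows are not imported when the header contract fails.
            return {"ok": False, "missing_headers": missing, "imported": 0}
        return {"ok": True, "missing_headers": [], "imported": self.importer.run(rows)}


saved: list[Row] = []
facade = ImportFacade(HeaderValidator(["Name", "Email"]), RowImporter(saved.append))
ok_result = facade.import_rows(["Name", "Email"], [{"Name": "Ada", "Email": "ada@example.com"}])
bad_result = facade.import_rows(["Name"], [])
```

Because callers only see `import_rows`, either collaborator can be replaced without changing the public surface.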
FieldMetaInfo is the center of workbook metadata.
It knows about:
- labels
- ordering
- required-ness
- comments
- option mappings
- date and numeric display constraints
This metadata does not belong to Pydantic internals. That separation was critical for the Pydantic v2 migration.
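To make the idea concrete, here is a sketch of workbook metadata held in a plain object that knows nothing about the validation framework. `WorkbookFieldMeta` and its attributes are hypothetical and deliberately simpler than the real FieldMetaInfo.

```python
from dataclasses import dataclass, field


@dataclass
class WorkbookFieldMeta:
    """Hypothetical workbook-facing metadata for one column."""

    label: str
    order: int
    required: bool = True
    comment: str = ""
    # Maps the text shown in the workbook to the value stored by the backend.
    options: dict[str, str] = field(default_factory=dict)

    def to_stored(self, display_value: str) -> str:
        """Translate workbook text into a backend value using the option mapping."""
        if not self.options:
            return display_value
        if display_value not in self.options:
            raise ValueError(f"{self.label}: {display_value!r} is not a valid option")
        return self.options[display_value]


status = WorkbookFieldMeta(
    label="Status", order=2, options={"Active": "active", "Disabled": "disabled"}
)
name = WorkbookFieldMeta(label="Name", order=1, comment="Full legal name")

# Column order comes from the metadata, not from model declaration order.
columns = [meta.label for meta in sorted([status, name], key=lambda meta: meta.order)]
```

Nothing in this object imports or references a validation library, which is the property that keeps a framework upgrade from touching workbook concerns.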
The project used to be more tightly coupled to Pydantic implementation details. Today the approach is different:
- Pydantic models define structure
- ExcelAlchemy extracts model shape through a small adapter layer
- runtime Excel validation remains owned by ExcelAlchemy
This is not “anti-framework”; it is a boundary decision.
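The boundary can be pictured as a small adapter that is the only code allowed to touch framework-specific field objects. To stay dependency-free, this sketch uses stdlib dataclasses as a stand-in for the modeling framework; it is not the project's actual adapter, and `FieldShape` and `extract_shape` are hypothetical names.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldShape:
    """Framework-neutral view of one model field."""

    name: str
    annotation: object
    required: bool


def extract_shape(model: type) -> list[FieldShape]:
    """Adapter: the only function that inspects framework field objects."""
    shapes = []
    for f in dataclasses.fields(model):
        has_default = (
            f.default is not dataclasses.MISSING
            or f.default_factory is not dataclasses.MISSING
        )
        shapes.append(FieldShape(f.name, f.type, not has_default))
    return shapes


@dataclass
class Employee:
    name: str
    age: int
    nickname: str = ""


shape = extract_shape(Employee)
```

The rest of the code would depend only on `FieldShape`, so swapping the underlying framework means rewriting one function rather than every caller.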
Minio is useful, but it is not the architecture.
The architecture is ExcelStorage.
That means the system can support:
- Minio-compatible object storage
- local file storage
- in-memory test doubles
- custom backends
without making those choices leak into the core workflow.
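A sketch of storage as a strategy, assuming a two-method protocol. `WorkbookStorage`, its `read`/`write` methods, and `save_result_workbook` are hypothetical and may not match the real ExcelStorage interface.

```python
from typing import Protocol


class WorkbookStorage(Protocol):
    """Hypothetical storage contract the core workflow depends on."""

    def read(self, key: str) -> bytes: ...

    def write(self, key: str, content: bytes) -> str: ...


class InMemoryStorage:
    """Test double: keeps workbook bytes in a dict."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def read(self, key: str) -> bytes:
        return self._blobs[key]

    def write(self, key: str, content: bytes) -> str:
        self._blobs[key] = content
        return f"memory://{key}"


def save_result_workbook(storage: WorkbookStorage, key: str, content: bytes) -> str:
    """Core code sees only the protocol, never a concrete backend."""
    return storage.write(key, content)


storage = InMemoryStorage()
url = save_result_workbook(storage, "result.xlsx", b"fake-bytes")
```

An object-storage or local-file backend would implement the same two methods, and `save_result_workbook` would not change.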
Runtime exceptions are aimed at developers and integrators. Workbook text is aimed at Excel users.
That is why the project now separates:
- runtime message lookup
- display message lookup
The distinction is small, but it lets each audience's text change without touching the other's.
The project now documents its locale behavior explicitly instead of leaving it as an implementation detail.
- runtime messages are English-first and stable for the 2.x line
- workbook display text supports zh-CN and en
- workbook display defaults to zh-CN
That policy is written down in docs/locale.md, so users do not have to infer it from scattered examples.
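The two lookups and the locale policy could be sketched as follows. The message keys, message texts, and function names are invented for illustration; only the policy itself (English runtime messages, zh-CN and en display text, zh-CN as the display default) comes from the description above.

```python
# Runtime messages: aimed at developers, English only.
RUNTIME_MESSAGES = {
    "header_missing": "Missing required header: {label}",
}

# Display messages: aimed at Excel users, one catalog per supported locale.
DISPLAY_MESSAGES = {
    "zh-CN": {"required_hint": "必填"},
    "en": {"required_hint": "Required"},
}

DEFAULT_DISPLAY_LOCALE = "zh-CN"


def runtime_message(key: str, **params: str) -> str:
    """Look up a developer-facing message; there is no locale parameter."""
    return RUNTIME_MESSAGES[key].format(**params)


def display_message(key: str, locale: str = DEFAULT_DISPLAY_LOCALE) -> str:
    """Look up workbook text for the requested locale."""
    return DISPLAY_MESSAGES[locale][key]


error_text = runtime_message("header_missing", label="Email")
default_hint = display_message("required_hint")
english_hint = display_message("required_hint", locale="en")
```

Keeping two catalogs means workbook text can gain locales while the strings developers see in exceptions stay the same.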
The move to src/excelalchemy eliminated misleading import behavior from repository-root execution.
That change made packaging and test semantics more honest.
Before the v2 migration, the dangerous part was not syntax changes. It was the deeper coupling between Excel metadata and Pydantic field internals.
The metadata layer was pulled apart first. That reduced migration risk dramatically.
The migration replaced older patterns with:
- model_fields
- model_validate
- an adapter layer around field access
The key win was not just “support v2”. It was making future framework upgrades less invasive.
The codebase now uses:
- type aliases
- PEP 695 generic syntax in core places
- a tighter modern Python target
This was done after narrowing the support policy. The syntax decision followed the compatibility decision, not the other way around.
pandas was mostly acting as a transport layer, not as a data analysis engine.
Replacing it with openpyxl + WorksheetTable better matched the actual workload and removed a dependency chain the project did not need.
Minio support remains available, but the project no longer treats it as the only meaningful storage model. That shift makes the library more reusable and architecturally cleaner.
Internationalization was intentionally staged:
- unify runtime errors
- introduce a message layer
- move workbook display text onto locale-aware display messages
That sequence avoided premature framework complexity.
| Concern | Earlier coupling risk | Current design |
|---|---|---|
| Field access | direct dependence on internals | adapter over stable v2 APIs |
| Excel metadata | mixed with validation details | owned by FieldMetaInfo |
| Custom validation flow | framework-driven | explicitly orchestrated |
| Migration surface | wide | narrowed |
The important lesson is not “v2 is newer”. The important lesson is that framework upgrades are easier when the framework does not own the whole architecture.
This project does not need:
- joins
- groupby pipelines
- vectorized analysis
- multi-index machinery
It does need:
- deterministic workbook IO
- cell-level error positioning
- header semantics
- light table manipulation
So the code now uses a table abstraction that matches the problem. That is a better engineering fit.
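A sketch of what "light table manipulation" with cell-level positioning can look like without a dataframe library. `SimpleTable` and `column_letter` are hypothetical and are not the WorksheetTable API; the sketch assumes headers occupy worksheet row 1.

```python
from dataclasses import dataclass


def column_letter(index: int) -> str:
    """Convert a 0-based column index to an Excel column letter (0 -> A, 26 -> AA)."""
    letters = ""
    n = index + 1
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


@dataclass
class SimpleTable:
    """Headers plus rows; just enough structure for import feedback."""

    headers: list[str]
    rows: list[list[object]]

    def column(self, header: str) -> list[object]:
        position = self.headers.index(header)
        return [row[position] for row in self.rows]

    def cell_ref(self, row_index: int, header: str) -> str:
        """A1-style reference for a data cell (data starts on worksheet row 2)."""
        return f"{column_letter(self.headers.index(header))}{row_index + 2}"


table = SimpleTable(["Name", "Email"], [["Ada", "ada@example.com"], ["Bob", ""]])

# Find every empty Email cell and report where it sits in the worksheet.
empty_cells = [
    table.cell_ref(i, "Email")
    for i, value in enumerate(table.column("Email"))
    if value == ""
]
```

Knowing the exact cell is what allows an error to be written back next to the value that caused it.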
The switch to uv was part of the broader modernization effort:
- faster setup
- simpler CI flow
- clearer local commands
- less tool sprawl
The build backend remains conservative (flit_core), while the workflow frontend is modern.
That was an intentional risk balance.
No design here is “free”. Some deliberate tradeoffs:
- The library favors explicit structure over maximum implicit flexibility.
- Workbook comments and labels are verbose by design because user guidance matters.
- The public API remains smaller than the set of available internal extension points.
- Compatibility is preserved where it reduces migration pain, but older patterns are gradually de-emphasized.
If you want the shortest path:
- Start with README.md
- Read docs/architecture.md
- Look at src/excelalchemy/core/
- Then inspect tests under tests/contracts/
That path shows both the architecture and the behavioral safety net.
ExcelAlchemy is intentionally opinionated. It is not trying to be every possible spreadsheet abstraction. Its goal is narrower and, because of that, stronger:
to make typed Excel workflows explicit, maintainable, and evolvable.