Major refactor #53

Open
sahiljhawar wants to merge 132 commits into main from bhaas/refactor_data_standard_strategies

@sahiljhawar
Contributor

This pull request makes significant improvements to the data standards and saving strategies within the el_paso package, focusing on modernization, code simplification, and clearer documentation. The main changes include replacing the legacy DataOrgStandard with the new GFZStandard, updating the data standard base class for better extensibility and type safety, refactoring the consistency checking logic, and cleaning up unused or outdated documentation and configuration.

Key changes include:

Data Standards Modernization

  • Removed the legacy DataOrgStandard and replaced it with the new, more extensible GFZStandard, updating all imports, documentation, and references accordingly. (el_paso/data_standards/data_org_standard.py removed, el_paso/data_standards/__init__.py, docs/API_reference/data_standards/gfz.md, docs/API_reference/overview.md, [1] [2] [3])
  • Refactored the DataStandard base class to use generics, improved type safety, and added a new VariableInfo structure for variable metadata. The standardization logic is now more generic and easier to extend. (el_paso/data_standard.py, L6-R109)
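
As a rough illustration of the generics-based design described above, a minimal sketch might look like the following. The `VariableInfo` fields, the `variable_info` method, the `standardize` signature, and `ToyStandard` are all assumptions made for this example and are not taken from the el_paso code.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, TypeVar


@dataclass(frozen=True)
class VariableInfo:
    """Hypothetical metadata record; the field names are assumptions."""

    unit: str
    dims: tuple[str, ...]


# Type of the variable names a standard accepts (assumed to be str-like).
NameT = TypeVar("NameT", bound=str)


class DataStandard(ABC, Generic[NameT]):
    """Sketch of a generic base class; subclasses supply the metadata lookup."""

    @abstractmethod
    def variable_info(self, name: NameT) -> VariableInfo:
        """Return the metadata the standard prescribes for `name`."""

    def standardize(self, names: list[NameT]) -> dict[NameT, VariableInfo]:
        # Generic logic lives in the base class and only relies on variable_info.
        return {name: self.variable_info(name) for name in names}


class ToyStandard(DataStandard[str]):
    """Illustrative subclass with a single, made-up variable."""

    _TABLE = {"Flux": VariableInfo("1/(cm^2 s sr keV)", ("time", "energy", "pitch_angle"))}

    def variable_info(self, name: str) -> VariableInfo:
        try:
            return self._TABLE[name]
        except KeyError:
            raise ValueError(f"unknown variable: {name}") from None
```

With this shape, adding a new standard only requires a subclass that implements the lookup; the shared standardization loop stays in the base class.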

Consistency Checking Refactor

  • Simplified and generalized the consistency checking mechanism by introducing a flexible dimension-length tracking system, replacing the previous hardcoded checks for time, pitch angle, and energy dimensions. (el_paso/data_standard.py, L58-R164)
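
One way to picture dimension-length tracking is a small helper that records the first length seen for each named dimension and rejects later variables that disagree. The class name `DimensionTracker` and its `check` method are hypothetical and only illustrate the idea; the actual implementation in el_paso/data_standard.py may differ.

```python
from __future__ import annotations


class DimensionTracker:
    """Hypothetical helper: remembers the first length seen per dimension name."""

    def __init__(self) -> None:
        self._lengths: dict[str, int] = {}

    def check(self, var_name: str, dims: tuple[str, ...], shape: tuple[int, ...]) -> None:
        """Raise ValueError if `shape` contradicts lengths recorded earlier."""
        if len(dims) != len(shape):
            raise ValueError(
                f"{var_name}: expected {len(dims)} dimensions, got {len(shape)}"
            )
        for dim, length in zip(dims, shape):
            # setdefault stores the length on first sight, otherwise returns the old one.
            expected = self._lengths.setdefault(dim, length)
            if expected != length:
                raise ValueError(
                    f"{var_name}: dimension '{dim}' has length {length}, expected {expected}"
                )
```

Because dimensions are identified by name, the same check covers time, energy, pitch angle, or any other axis without a dedicated branch for each.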

Saving Strategies and Documentation Updates

  • Renamed and updated saving strategy classes and documentation to use more descriptive and consistent names (e.g., MonthlyFileStrategy instead of MonthlyH5Strategy, GFZStrategy instead of DataOrgStrategy). (docs/API_reference/saving_strategies/gfz.md, docs/API_reference/saving_strategies/monthly.md, docs/API_reference/saving_strategies/data_org.md renamed, [1] [2])
  • Removed outdated or redundant documentation files, such as the monthly NetCDF strategy reference. (docs/API_reference/saving_strategies/monthly_netcdf.md, L1-L11)

Package Initialization and API Cleanup

  • Cleaned up the el_paso/__init__.py file by removing unnecessary imports, adding new ones for utils and typing, and updating the __all__ list to reflect the new public API. (el_paso/__init__.py, [1] [2] [3])

Developer Tooling

  • Removed the unused pyright-pretty pre-commit hook from the configuration. (.pre-commit-config.yaml, L23-L28)
  • ty is now used as the main type checker.

These changes modernize the codebase, improve maintainability, and provide a clearer and more extensible framework for data standards and saving strategies.

sahiljhawar and others added 30 commits May 4, 2026 18:30
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
… make sure mat and pickle files have the same data (fixes #45)
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
- Add `MonthlyFileStrategy` with write and load dispatch methods for h5, nc, cdf, and mat
- Implement generic append/load/merge/write flow
- Preserve custom monthly NetCDF writer and dependencies
- Harden CDF metadata writing and append loading for empty attributes/variables
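
The write/load dispatch and the generic append/load/merge/write flow mentioned in these commits could be sketched roughly as below. This is not the `MonthlyFileStrategy` implementation: the function `append_to_file`, the `WRITERS`/`LOADERS` registries, and the concatenation-based merge are assumptions, and JSON stand-ins replace the real h5, nc, cdf, and mat backends so the example runs without extra dependencies.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable

Data = dict[str, list[float]]


def _write_stub(path: Path, data: Data) -> None:
    # Placeholder for a real format-specific writer.
    path.write_text(json.dumps(data))


def _load_stub(path: Path) -> Data:
    # Placeholder for a real format-specific loader.
    return json.loads(path.read_text())


# One writer and one loader per supported suffix (all stubs in this sketch).
_SUFFIXES = (".h5", ".nc", ".cdf", ".mat")
WRITERS: dict[str, Callable[[Path, Data], None]] = {s: _write_stub for s in _SUFFIXES}
LOADERS: dict[str, Callable[[Path], Data]] = {s: _load_stub for s in _SUFFIXES}


def append_to_file(path: Path, new_data: Data) -> None:
    """Load the existing file if present, merge in new_data, and write it back."""
    suffix = path.suffix.lower()
    if suffix not in WRITERS:
        raise ValueError(f"unsupported file format: {suffix!r}")
    existing = LOADERS[suffix](path) if path.exists() else {}
    # Assumed merge rule: concatenate new samples after the stored ones.
    merged = {name: existing.get(name, []) + values for name, values in new_data.items()}
    for name, values in existing.items():
        merged.setdefault(name, values)  # keep variables absent from new_data
    WRITERS[suffix](path, merged)
```

Keeping the format-specific code behind two lookup tables means the append flow itself is written once and shared by every file type.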

3 participants