This epic serves as the parent issue for all work related to implementing and documenting the data ingest process for each provider (EMSL, JGI, NMDC, ESS-DIVE). All subtasks and deliverables under this epic must comply with the standardized folder structure, file naming conventions, and data validation requirements outlined in issue #9.
Scope
Coordinate the creation and population of the ingest directory, with one subfolder per data provider.
Ensure all ingest files conform to the latest release schema.
Split data files so that each is limited to ~25 MB, with no record spanning multiple files.
Require all data files to be formatted as JSON lists (or the agreed format as documented).
Apply the standardized naming convention: <provider>_<5-digit zero-padded number>.json (e.g., emsl_00001.json).
Document the folder structure, file formats, splitting strategy, and any tools/scripts used for ETL (to be placed in contrib/).
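The splitting, format, and naming requirements above can be sketched together in one small script. This is a minimal, hypothetical example, not the project's ETL tooling. It assumes that "~25 MB" means 25 * 1024 * 1024 bytes, that each file holds a single top-level JSON list, and that the provider prefix is lowercase (e.g., emsl, jgi, nmdc, ess-dive). The names write_chunks, check_file, MAX_BYTES and NAME_RE are illustrative only. Records are packed greedily and never split, so a single record larger than the limit ends up alone in an oversized file rather than being broken up.

```python
import json
import re
from pathlib import Path
from typing import Iterable, List

# Assumption: "~25 MB" interpreted as 25 MiB.
MAX_BYTES = 25 * 1024 * 1024
# Assumption: lowercase provider prefix, then a 5-digit zero-padded counter.
NAME_RE = re.compile(r"^[a-z0-9-]+_\d{5}\.json$")


def write_chunks(records: Iterable[dict], out_dir: Path, provider: str,
                 max_bytes: int = MAX_BYTES) -> List[Path]:
    """Greedily pack whole records into <provider>_NNNNN.json files (hypothetical helper)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    batch: List[str] = []
    size = 2  # bytes taken by the enclosing "[" and "]"

    def flush() -> None:
        nonlocal batch, size
        if not batch:
            return
        path = out_dir / f"{provider}_{len(paths) + 1:05d}.json"
        path.write_bytes(("[" + ",".join(batch) + "]").encode("utf-8"))
        paths.append(path)
        batch, size = [], 2

    for record in records:
        encoded = json.dumps(record, ensure_ascii=False)
        n = len(encoded.encode("utf-8"))
        sep = 1 if batch else 0  # comma between list items
        # Start a new file rather than split a record across two files.
        if batch and size + sep + n > max_bytes:
            flush()
            sep = 0
        batch.append(encoded)
        size += sep + n
    flush()
    return paths


def check_file(path: Path, max_bytes: int = MAX_BYTES) -> List[str]:
    """Return a list of rule violations for one ingest file (empty if it passes)."""
    problems: List[str] = []
    if not NAME_RE.match(path.name):
        problems.append("name does not match <provider>_NNNNN.json")
    if path.stat().st_size > max_bytes:
        problems.append("file exceeds size limit")
    if not isinstance(json.loads(path.read_text(encoding="utf-8")), list):
        problems.append("top-level value is not a JSON list")
    return problems
```

For example, write_chunks(records, Path("ingest/emsl"), "emsl") would produce emsl_00001.json, emsl_00002.json, and so on. Running check_file over every file in a provider subfolder gives a quick pre-submission check. Validation against the release schema is a separate step and is not covered by this sketch.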
All implementation and documentation tasks related to these requirements should be tracked as subtasks under this epic.
Acceptance Criteria
All subtasks necessary to implement the ingest process and documentation are completed.