Build MVP datasets: CSV sources in data-raw → exported data objects + Parquet #2

@dlebauer

Goal

  • Add the MVP datasets to the package so users can load them as exported data objects (per r-pkgs data guidance),
    with CSV sources versioned in git and Parquet artifacts for distribution/performance.

Scope (MVP)

  • Primary exported dataset: traitsview (packaged name for traits_and_yields_view).
  • Supporting metadata tables required to interpret records (variables, sites, species, citations, treatments, cultivars,
    entities, methods, PFT mappings, priors). Exact table list is already known from prior scoping; do not re-argue it.

Tasks

  • Place editable sources under data-raw/csv/:
    • data-raw/csv/traitsview.csv (traits_and_yields_view exported as CSV or recreated with joins)
    • data-raw/csv/<support_table>.csv (support tables)
  • Create data-raw/make-data.R that:
    • reads CSVs with explicit column types, so typing stays stable across rebuilds
    • writes exported data objects via usethis::use_data(..., overwrite = TRUE)
      • exported objects: traitsview, plus each support table as its own object (e.g., variables, sites, etc.)
    • writes Parquet copies under inst/extdata/parquet/ (e.g., inst/extdata/parquet/traitsview.parquet)
  • Ensure exported objects load with library(betydata); head(traitsview).
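
A minimal sketch of what data-raw/make-data.R could look like, assuming the readr, arrow, and usethis packages are installed and the script is run from the package root. The helper names read_source() and write_parquet_copy() are hypothetical, only three of the tables are shown, and reading every column as character is a placeholder for a real per-table column specification.

```r
# Sketch of data-raw/make-data.R (helper names are illustrative, not agreed API).
csv_dir     <- file.path("data-raw", "csv")
parquet_dir <- file.path("inst", "extdata", "parquet")

# Read one source CSV with an explicit column specification.
# ASSUMPTION: all-character columns stand in for a real readr::cols() spec.
read_source <- function(name, dir = csv_dir) {
  readr::read_csv(
    file.path(dir, paste0(name, ".csv")),
    col_types = readr::cols(.default = readr::col_character())
  )
}

# Write a Parquet copy of a data frame and return the file path invisibly.
write_parquet_copy <- function(df, name, dir = parquet_dir) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(dir, paste0(name, ".parquet"))
  arrow::write_parquet(df, path)
  invisible(path)
}

# Only build when the CSV sources are present (i.e. run from the package root).
if (dir.exists(csv_dir)) {
  traitsview <- read_source("traitsview")
  variables  <- read_source("variables")
  sites      <- read_source("sites")

  # Exported data objects: one .rda file per object under data/.
  usethis::use_data(traitsview, variables, sites, overwrite = TRUE)

  # Parquet copies for distribution.
  write_parquet_copy(traitsview, "traitsview")
  write_parquet_copy(variables, "variables")
  write_parquet_copy(sites, "sites")
}
```

Listing each object explicitly in usethis::use_data() keeps the sketch simple; the remaining support tables would follow the same pattern.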

Acceptance criteria

  • After installation, users can do:
    • library(betydata); head(traitsview)
    • library(betydata); head(variables) (and other support tables)
  • data-raw/csv/ contains the CSV sources committed to git.
  • inst/extdata/parquet/ contains Parquet artifacts for all shipped datasets.
  • A clean rebuild path exists: running source('data-raw/make-data.R') regenerates data objects and Parquet.
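
The file-level criteria above could be checked with a small base-R helper like the one below. This is a sketch under assumptions: check_artifacts() is a hypothetical name, and it relies on usethis::use_data() writing one data/<name>.rda file per object.

```r
# For every CSV source, report whether the matching .rda and Parquet files exist.
# `root` is the package root directory.
check_artifacts <- function(root = ".") {
  csv_files <- list.files(file.path(root, "data-raw", "csv"), pattern = "\\.csv$")
  datasets  <- tools::file_path_sans_ext(csv_files)

  data.frame(
    dataset = datasets,
    has_rda = file.exists(
      file.path(root, "data", sprintf("%s.rda", datasets))
    ),
    has_parquet = file.exists(
      file.path(root, "inst", "extdata", "parquet", sprintf("%s.parquet", datasets))
    ),
    stringsAsFactors = FALSE
  )
}
```

Running check_artifacts() from the package root after source('data-raw/make-data.R') should show TRUE in both columns for every dataset. In an installed package, the Parquet copies would be located with system.file("extdata", "parquet", "traitsview.parquet", package = "betydata"), since the contents of inst/ are copied to the package's top level at install time.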

Non-goals

  • No accessor-function API (avoid bety_tbl() style).

Labels

  • data (Data build, formats, objects)
  • mvp (Minimum viable product)