Implement v0.2 configuration format with date variables and comprehensive documentation #9
Implements `create_date_var()` with distribution options and a `prop_invalid` parameter for all generators.

**Core date functionality:**
- Parse SAS date formats (YYMMDD10., DATE9., etc.) with support for ranges
- Generate Date objects within specified periods
- Three distribution types:
  - uniform: flat distribution across the range (default)
  - gompertz: Gompertz survival distribution (mortality patterns)
  - exponential: exponential distribution (early events)
- Support for `prop_NA` (missing dates using NA codes)

**prop_invalid parameter (new):**
- Added to `create_date_var()`, `create_cat_var()`, `create_con_var()`
- Generates invalid/out-of-range values for testing validation pipelines
- Date variables: 1-5 years before/after the valid range
- Categorical: values outside the defined categories
- Continuous: values outside the min/max range
- Critical for testing data quality workflows

**Parser enhancements:**
- `parse_sas_date_format()`: extract min/max dates from SAS formats
- Support for MDY, YMD, DATE formats with various widths
- Handle period ranges like '01JAN2001'd to '31MAR2017'd

**Documentation:**
- New dates.qmd vignette with comprehensive examples
- Distribution comparisons and use cases
- `prop_invalid` demonstrations for all variable types
- Updated function documentation with @family tags

**Testing:**
- 250 tests passing (61 new tests added)
- Coverage for all distributions
- `prop_invalid` edge cases
- Date parsing validation

**Vignette improvements:**
- Apply `devtools::load_all()` pattern to cchs/chms/demport examples
- Consistent with PR #7 vignette standards
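To make the range handling concrete, here is a minimal sketch of extracting the min/max dates from a period range such as '01JAN2001'd to '31MAR2017'd. It is written in Python purely for illustration: `parse_sas_date_range()` is a hypothetical helper, not the package's `parse_sas_date_format()`, and it only handles the DDMONYYYY literal shown above. Month abbreviations are mapped explicitly instead of going through locale-dependent parsing.

```python
import re
from datetime import date

# Hypothetical helper for illustration; not the package's parse_sas_date_format().
# Month abbreviations are mapped explicitly so the result does not depend on locale.
_MONTHS = {m: i for i, m in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

# Matches a SAS date literal such as '01JAN2001'd (DDMONYYYY inside quotes, then d)
_SAS_DATE = re.compile(r"'(\d{2})([A-Za-z]{3})(\d{4})'d")


def parse_sas_date_range(spec):
    """Return (min_date, max_date) from a spec like "'01JAN2001'd to '31MAR2017'd"."""
    matches = _SAS_DATE.findall(spec)
    if len(matches) != 2:
        raise ValueError(f"expected two SAS date literals in {spec!r}")
    start, end = (date(int(y), _MONTHS[m.upper()], int(d)) for d, m, y in matches)
    if start > end:
        raise ValueError("range start is after range end")
    return start, end
```

Usage: `parse_sas_date_range("'01JAN2001'd to '31MAR2017'd")` returns `(date(2001, 1, 1), date(2017, 3, 31))`.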
This PR implements the v0.2 configuration format with date variables, garbage data support, and comprehensive documentation restructuring following the Divio framework.
## Overview
Implements date variable support for DemPoRT v2, along with garbage data generation for validation testing, and major documentation improvements.
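As an illustration of the garbage-data rule for dates (invalid values fall 1-5 years before or after the valid range), the sketch below replaces a share of otherwise valid dates with out-of-range ones. It is a Python sketch under stated assumptions, not the package's implementation: `add_invalid_dates()` is hypothetical, a year is approximated as 365 days, and "before" vs "after" is chosen with equal probability.

```python
import random
from datetime import date, timedelta


def add_invalid_dates(values, start, end, prop_invalid, seed=None):
    """Hypothetical sketch: swap round(prop_invalid * n) values for out-of-range dates.

    Each invalid date lies 1-5 years (approximated as 365-1825 days)
    before `start` or after `end`.
    """
    if not 0 <= prop_invalid <= 1:
        raise ValueError("prop_invalid must be between 0 and 1")
    rng = random.Random(seed)
    out = list(values)
    k = round(prop_invalid * len(out))
    # Pick k distinct positions to corrupt
    for i in rng.sample(range(len(out)), k):
        shift = timedelta(days=rng.randint(365, 5 * 365))
        out[i] = start - shift if rng.random() < 0.5 else end + shift
    return out
```

Sampling positions without replacement keeps the number of invalid values exact, which makes validation tests deterministic in size.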
## Key Changes
### New Features:
- **Date variable generation** with `create_date_var()` supporting three distributions (uniform, gompertz, exponential)
- **Survival analysis** with `create_survival_dates()` for cohort studies with temporal ordering
- **Garbage data support** via `prop_invalid` parameter across all variable types for validation testing
- **v0.2 configuration format** - expanded for dates and garbage data specifications
- **rType field** for proper R type coercion (factor, integer, numeric, Date)
- **Proportion-based generation** for all variable types via `determine_proportions()`
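The three `create_date_var()` distributions can be illustrated with inverse-CDF sampling of the day offset from the start of the range. This is a hedged Python sketch: `sample_dates()`, its `rate`/`shape` parameters and their defaults are assumptions, and the package's own parameterization may differ. Exponential and Gompertz draws are truncated to the range by drawing the uniform variate only up to the CDF value at the end of the range.

```python
import math
import random
from datetime import date, timedelta

_DISTRIBUTIONS = ("uniform", "exponential", "gompertz")


def sample_dates(start, end, n, distribution="uniform",
                 rate=0.001, shape=0.001, seed=None):
    """Hypothetical sketch (not create_date_var()): draw n dates in [start, end]."""
    if end < start:
        raise ValueError("end must not be before start")
    if distribution not in _DISTRIBUTIONS:
        raise ValueError(f"unknown distribution: {distribution!r}")
    rng = random.Random(seed)
    span = (end - start).days  # width of the range in days

    def offset_days():
        if distribution == "uniform":
            return rng.uniform(0, span)
        if distribution == "exponential":
            # Inverse CDF of Exp(rate), truncated to [0, span]: F(t) = 1 - exp(-rate * t)
            u = rng.uniform(0, -math.expm1(-rate * span))
            return -math.log1p(-u) / rate
        # Gompertz with hazard h(t) = rate * exp(shape * t), so
        # F(t) = 1 - exp(-(rate / shape) * (exp(shape * t) - 1)), truncated to [0, span]
        f_max = -math.expm1(-(rate / shape) * math.expm1(shape * span))
        u = rng.uniform(0, f_max)
        return math.log1p(-(shape / rate) * math.log1p(-u)) / shape

    # Clamp so floating-point rounding can never push a date past `end`
    return [start + timedelta(days=min(int(offset_days()), span)) for _ in range(n)]
```

With these defaults the exponential draws cluster near the start of the range ("early events"), while the Gompertz hazard grows over time, matching the mortality-pattern use case.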
### Documentation (Divio Framework):
- **Tutorials**: getting-started, tutorial-config-files, tutorial-dates, tutorial-missing-data, tutorial-garbage-data
- **How-to guides**: cchs-example, chms-example, demport-example
- **Explanation**: dates, advanced-topics
- **Reference**: reference-config
- Updated all vignette frontmatter (authors, callouts, next steps)
- Standardized heading capitalization (sentence case)
- Added "recodeflow universe" and Statistics Canada acknowledgements to README
- **Clarified mock vs synthetic data distinctions** with appropriate use cases and limitations
### Package Quality:
- Add renv for reproducible package management (R >= 4.2.0)
- Update DESCRIPTION: Juan Li as author, recodeflow contributors
- Add `.claude/AI.md` for project-specific AI development guidelines
- Add CONTRIBUTING.md with pkgdown build instructions
- Improve pkgdown reference page section descriptions
- **Add Quarto-style callout CSS** for native callout syntax with proper styling in pkgdown
### GitHub Actions:
- Automated pkgdown deployment via GitHub Actions
- Deploy `main` branch → root (/)
- Deploy `create-date-var` branch → /dev
- Deploy other branches → /preview/{branch-name}
- **Fix locale issue** (en_CA → en_US.UTF-8) for Ubuntu compatibility
- **Use r-lib/actions/setup-r-dependencies@v2** for dependency management
- Generate documentation with roxygen2::roxygenize() before build
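The branch-to-path deployment rule above can be summarized as a small function. This is a hypothetical Python sketch of the mapping only; the real logic lives in .github/workflows/pkgdown.yaml and may be expressed differently there.

```python
def deploy_path(branch):
    """Hypothetical sketch of where a branch's pkgdown site is published."""
    if branch == "main":
        return "/"  # production site at the root
    if branch == "create-date-var":
        return "/dev"  # development site
    return f"/preview/{branch}"  # every other branch gets its own preview
```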
### Bug Fixes:
- Fixes #5 - Improved 'else' handling in recEnd rules
- Fixed continuous variable missing codes to generate proper numeric values
- Fixed locale issues in date parsing for cross-platform compatibility
### Breaking Changes
⚠️ Configuration format updated to v0.2 - requires additional fields:
- `uid` - unique identifier for each variable configuration
- `rType` - R type specification (factor, integer, numeric, Date)
- `proportion` - distribution weights for categorical values
- Date variables require `role` containing "date" and `rType = "Date"`
- Date variables use `variableType = "Continuous"` (for recodeflow compatibility)
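A pre-flight check for the v0.2 requirements listed above might look like the following. It is a sketch under assumptions: each configuration row is modelled as a Python dict, `check_v02_row()` is hypothetical, and example values such as a `role` of "enabled;date" are illustrative rather than taken from the package.

```python
REQUIRED_FIELDS = ("uid", "rType", "proportion")
VALID_RTYPES = {"factor", "integer", "numeric", "Date"}


def check_v02_row(row):
    """Hypothetical sketch: return a list of v0.2 problems for one config row."""
    # New v0.2 fields must be present (values may legitimately be empty)
    problems = [f"missing field {f!r}" for f in REQUIRED_FIELDS if f not in row]
    r_type = row.get("rType")
    if r_type is not None and r_type not in VALID_RTYPES:
        problems.append(f"unsupported rType {r_type!r}")
    # Date variables: role contains "date", rType = "Date", variableType = "Continuous"
    if "date" in str(row.get("role", "")):
        if r_type != "Date":
            problems.append("date variable must have rType = 'Date'")
        if row.get("variableType") != "Continuous":
            problems.append("date variable must have variableType = 'Continuous'")
    return problems
```

Returning a list of messages rather than raising on the first problem lets a whole configuration file be reported at once.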
## Files Changed
- R/create_date_var.R - locale fix for cross-platform compatibility
- R/create_survival_dates.R - v0.2 format implementation
- .github/workflows/pkgdown.yaml - comprehensive workflow improvements
- pkgdown/extra.css - NEW: Quarto-style callout styling
- _pkgdown.yml - add CSS include, improve reference sections
- README.md - comprehensive mock data documentation
- vignettes/*.qmd - updated all 11 vignettes
- .claude/AI.md - documented debugging lessons learned
All code examples tested and verified in executable vignettes.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Branch updated from bf43e8d to d371c99.
In your README, or in another section or file in the repo, describe how the repo is organized: what goes where and why. In particular, describe the purpose of the `scripts` folder and what goes there.

See this commit.
This PR implements the v0.2 configuration format with date variables, garbage data support, and documentation restructuring.
This PR replaces PR #8, which was deleted when the master branch was renamed to main during GitHub Pages initialization.
## Current review priority
The main goal of this PR is to get the code working for use by @karimhalal, @rafdoodle, and @caitlink12.
For this purpose, the DemPoRT example in the vignette is most helpful. Options to review include:
## Future review
This PR includes several features to support the creation of mock data that require discussion and review for consideration within the overall recodeflow universe. Specifically, how should recodeflow support data types such as `dates` and `integers`? They fall within the `continuous` and `categorical` data types (the only types currently supported). Issues that arose during the development of this PR included the generation of mock:

- `date` or other temporal data types. This necessitated the creation of `sourceData` and `rType`, but other potential solutions exist for more generalized support of an expanded range of data types.