I often find myself in data-modeling situations where the existing functions in rsample for setting up a proper assessment/analysis or test/train split do not suffice.
Example: a multivariate regression problem where the numeric predictors are densely concentrated around a central region and only a few observations lie farther out, while the goal is to learn from all of the data, especially the effects that appear when moving away from that dense center.
With random test/train or assessment/analysis sampling, or even with some univariate stratification, there is a high risk of learning only the effect in the center. The risk of getting inconsistent model performance results is also higher.
I suggest adding functionality to rsample with extended sampling capabilities for these cases:
It should ensure maximum coverage of the data space in both sets of a test/train or assessment/analysis split.
The problem is addressed by calibration sampling methods.
Some of them are described here:
https://cran.r-project.org/web/packages/prospectr/vignettes/prospectr.html#duplex-duplex
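To make the idea concrete, here is a minimal, language-agnostic sketch of a DUPLEX-style split written in plain Python. It is not the prospectr implementation and not a proposed rsample API: the function name `duplex_style_split`, its `n_test` argument, the use of Euclidean distance, and the toy data are all assumptions made for illustration. The sketch puts the two most distant points into the training set, the next two most distant into the test set, and then alternately adds to each set the remaining point that is farthest from the points already in that set.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_pair(points, candidates):
    """Return the pair of candidate indices with the largest distance."""
    return max(
        combinations(candidates, 2),
        key=lambda pair: euclidean(points[pair[0]], points[pair[1]]),
    )


def farthest_from_set(points, candidates, selected):
    """Return the candidate whose nearest already-selected point is farthest away."""
    return max(
        candidates,
        key=lambda i: min(euclidean(points[i], points[j]) for j in selected),
    )


def duplex_style_split(points, n_test):
    """Hypothetical helper: split row indices into (train, test) lists.

    Assumes 2 <= n_test <= len(points) / 2 so that the alternating
    selection never runs out of points before the test set is full.
    """
    if n_test < 2 or 2 * n_test > len(points):
        raise ValueError("need 2 <= n_test <= len(points) / 2")

    remaining = list(range(len(points)))

    # Seed each set with the two most distant points still available.
    train = list(farthest_pair(points, remaining))
    remaining = [i for i in remaining if i not in train]
    test = list(farthest_pair(points, remaining))
    remaining = [i for i in remaining if i not in test]

    # Alternate: extend train, then test, by the max-min distance rule.
    while len(test) < n_test:
        i = farthest_from_set(points, remaining, train)
        train.append(i)
        remaining.remove(i)
        i = farthest_from_set(points, remaining, test)
        test.append(i)
        remaining.remove(i)

    # Everything not selected for the test set ends up in train.
    train.extend(remaining)
    return sorted(train), sorted(test)


# Toy data: six points in a dense center plus four distant observations.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (-0.1, 0.0), (0.0, -0.1),
    (5.0, 5.0), (-5.0, -5.0), (5.0, -5.0), (-5.0, 5.0),
]
train_idx, test_idx = duplex_style_split(points, n_test=3)
print("train:", train_idx)
print("test: ", test_idx)
```

On this toy data both sets receive some of the distant observations, which a purely random split does not guarantee. A real implementation would probably want a faster distance computation and a choice of metric (for example Mahalanobis distance on principal components), but that is beyond this sketch.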
Literature: