
003.12.2 Estimation - New Recalculation Flow #380

@howardwkim

Description


Recalculation Logic

  • Remove the rule that blocks recalculation for model and satellite datasets.
  • Recalculation flow:
    1. If row counts can be estimated, compute them with our estimation function.
    2. If they cannot be estimated and the dataset is cluster-only (tblDataset_Servers), skip it.
    3. Otherwise, run the standard row-count calculation.
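The three-step flow above can be sketched as a single dispatch function. This is a minimal sketch, not the backend's implementation: `recalculateRowCount` and the helper names on `deps` are hypothetical stand-ins for the estimator, the cluster-only check, and the standard count. They are shown synchronously for clarity; real database-backed helpers would likely be async and need `await`.

```javascript
// Hypothetical sketch of the recalculation flow; recalculateRowCount and the
// helper names on `deps` are illustrative, not existing backend functions.
function recalculateRowCount(dataset, deps) {
  // Step 1: use the estimate when the metadata allows one (null = not estimable).
  const estimate = deps.estimateRowCount(dataset);
  if (estimate !== null && estimate !== undefined) {
    return { status: 'estimated', rowCount: estimate };
  }

  // Step 2: cluster-only datasets are skipped rather than counted.
  if (deps.isClusterOnly(dataset)) {
    return { status: 'skipped', rowCount: null };
  }

  // Step 3: fall back to the standard row-count calculation.
  return { status: 'counted', rowCount: deps.countRowsExactly(dataset) };
}

// Example with stub helpers standing in for the real estimator and queries.
const stubs = {
  estimateRowCount: () => null,
  isClusterOnly: () => false,
  countRowsExactly: () => 42,
};
console.log(recalculateRowCount({ id: 1 }, stubs)); // { status: 'counted', rowCount: 42 }
```

Injecting the helpers keeps the ordering rule (estimate, then skip, then count) testable without a database.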

Estimation Logic and Calculation

All tables needed for this calculation should be added to controllers/catalog/fullCatalogDb.js.
mapSpatialResolutionToNumber and mapTemporalResolutionToNumber should be created on the backend. Rather than hardcoding the mapping, we should have a function that derives the number from the resolution value.
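One way to derive the number instead of hardcoding it is to parse the Spatial_Resolution text. This is a minimal sketch under an unverified assumption about the stored format (strings like "1/4° X 1/4°", "9km X 9km", or "Irregular"). `spatialResolutionToDegrees` and `KM_PER_DEGREE` are hypothetical names, and the km conversion uses the approximate length of one degree of latitude.

```javascript
// Hypothetical sketch: derive grid spacing in degrees from the Spatial_Resolution
// text instead of a hardcoded map. ASSUMPTION: values look like "1/4° X 1/4°",
// "1° X 1°", "9km X 9km" or "Irregular"; verify against tblSpatial_Resolutions.

// Approximate length of one degree of latitude in km (used only for km values).
const KM_PER_DEGREE = 111.32;

function spatialResolutionToDegrees(resolution) {
  if (typeof resolution !== 'string') return null;
  const text = resolution.trim();
  if (/^irregular$/i.test(text)) return null; // no uniform grid spacing

  // Take the first dimension of "A X B" (square cells assumed).
  const first = text.split(/\s*x\s*/i)[0];

  // Degree form: "1°", "0.5°" or a fraction such as "1/12°".
  const deg = first.match(/^(\d+(?:\.\d+)?)(?:\s*\/\s*(\d+(?:\.\d+)?))?\s*°$/);
  if (deg) {
    const denominator = deg[2] === undefined ? 1 : Number(deg[2]);
    return denominator === 0 ? null : Number(deg[1]) / denominator;
  }

  // Kilometre form: "4km" or "25 km", converted with the approximation above.
  const km = first.match(/^(\d+(?:\.\d+)?)\s*km$/i);
  if (km) return Number(km[1]) / KM_PER_DEGREE;

  return null; // unrecognised text: treat as not estimable
}
```

Degree fractions parse exactly. Kilometre values will not match the current hardcoded map, which assigns nineKm the value 0.083333 where 9 / 111.32 is about 0.081. The km handling therefore needs a decision before this replaces the map.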

cmap-react/src/Components/Visualization/ControlPanel/estimateDataSize.js is the existing code. Our rule is to avoid touching legacy code, so the new estimation function will take inspiration from this one but will be written as a new function and placed in the shared directory.
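A new shared estimator could combine per-dimension factors into a row count. This is a minimal sketch written fresh (not derived from estimateDataSize.js). It assumes a gridded dataset's row count is the product of latitude cells, longitude cells, time steps, and depth levels. `estimateRowCount` is a hypothetical name, and `null` is used to mean "cannot estimate".

```javascript
// Hypothetical sketch of a new shared estimator (written fresh, not the legacy
// estimateDataSize.js). ASSUMPTION: rows = latCells * lonCells * timeSteps *
// depthLevels, with each factor computed elsewhere; null means "cannot estimate".
function estimateRowCount({ latCells, lonCells, timeSteps, depthLevels = 1 }) {
  const factors = [latCells, lonCells, timeSteps, depthLevels];
  // Any unknown or invalid factor (e.g. an irregular resolution) disables the estimate.
  if (factors.some((f) => !Number.isFinite(f) || f < 0)) return null;
  return factors.reduce((product, f) => product * f, 1);
}

// 1/4° global grid, one year of daily data, surface only.
console.log(estimateRowCount({ latCells: 720, lonCells: 1440, timeSteps: 365 })); // 378432000
```

Returning `null` for any unknown factor lets the caller fall through to the cluster-only check and the standard count.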

  • Spatial resolution: if the resolution is not irregular, use the existing uniform-grid logic. tblSpatial_Resolutions has columns ID and Spatial_Resolution.
```javascript
// Current hardcoded mapping, shown for reference.
// spatialResolutions is a constants object defined elsewhere (not shown).
const mapSpatialResolutionToNumber = (resolution) => {
  const map = {
    [spatialResolutions.halfDegree]: 0.5,
    [spatialResolutions.quarterDegree]: 0.25,
    [spatialResolutions.twentyFifthDegree]: 0.04,
    [spatialResolutions.fourKm]: 0.041672,
    [spatialResolutions.twelfthDegree]: 0.083333,
    [spatialResolutions.oneDegree]: 1,
    [spatialResolutions.seventyKm]: 0.25,
    [spatialResolutions.nineKm]: 0.083333,
    [spatialResolutions.twentyFiveKm]: 0.23148,
    [spatialResolutions.fortyEighthDegree]: 0.020833333,
  };

  return map[resolution];
};
```
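The uniform-grid step for the spatial factor can be sketched as counting grid points along each axis, given a spacing in degrees such as the values in the mapping above. This is a minimal sketch under stated assumptions. The lat/lon extents come from catalog metadata and are the first and last grid points (cell centres), with both ends included. `countGridPoints` and `countSpatialCells` are hypothetical names.

```javascript
// Hypothetical sketch of uniform-grid counting for the spatial factor.
// ASSUMPTIONS: lat/lon min/max come from catalog metadata and are the first and
// last grid points (cell centres), spaced `degrees` apart, both ends included.
function countGridPoints(min, max, degrees) {
  if (!(degrees > 0) || !(max >= min)) return null; // irregular, unknown or bad extent
  const steps = (max - min) / degrees;
  // Round away tiny floating-point error before flooring (719.9999999 steps -> 720).
  return Math.floor(Math.round(steps * 1e6) / 1e6) + 1;
}

function countSpatialCells({ latMin, latMax, lonMin, lonMax }, degrees) {
  const latCells = countGridPoints(latMin, latMax, degrees);
  const lonCells = countGridPoints(lonMin, lonMax, degrees);
  return latCells === null || lonCells === null ? null : { latCells, lonCells };
}

// Global 1/4° grid with centres from -89.875 to 89.875 and -179.875 to 179.875.
console.log(countSpatialCells(
  { latMin: -89.875, latMax: 89.875, lonMin: -179.875, lonMax: 179.875 },
  0.25,
)); // { latCells: 720, lonCells: 1440 }
```

If the stored extents are cell edges rather than centres, the `+ 1` would overcount by one point per axis.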
  • Temporal resolution: use the existing interval logic for values such as 3-day or 8-day (this needs investigation). tblTemporal_Resolutions has columns ID and Temporal_Resolution.
```javascript
// Current hardcoded mapping, shown for reference.
// temporalResolutions is a constants object defined elsewhere (not shown).
const mapTemporalResolutionToNumber = (resolution) => {
  const map = {
    [temporalResolutions.threeMinutes]: 1,
    [temporalResolutions.sixHourly]: 1,
    [temporalResolutions.daily]: 1,
    [temporalResolutions.weekly]: 7,
    [temporalResolutions.monthly]: 30,
    [temporalResolutions.annual]: 365,
    [temporalResolutions.irregular]: null,
    [temporalResolutions.monthlyClimatology]: 30,
    [temporalResolutions.threeDay]: 3,
    [temporalResolutions.eightDayRunning]: 8,
    [temporalResolutions.eightDays]: 8,
  };

  return map[resolution];
};
```
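The temporal factor can be sketched as two steps: turn the Temporal_Resolution text into an interval in days, then count intervals over a date range. This is a minimal sketch assuming values are worded like "Daily", "Three Days", "Eight Day Running", "Six Hourly", or "Irregular" (unverified). It keeps the 30-day month and 365-day year from the mapping above and includes both ends of the range. `temporalResolutionToDays` and `countTimeSteps` are hypothetical names.

```javascript
// Hypothetical sketch: time step in days from Temporal_Resolution text, then a
// count of steps over a date range. ASSUMPTIONS: values look like "Daily",
// "Weekly", "Three Days", "Eight Day Running", "Six Hourly", "Three Minutes" or
// "Irregular" (verify against tblTemporal_Resolutions); months are 30 days and
// years 365 days, as in the map above; both ends of the range are included.
const WORD_NUMBERS = new Map([['one', 1], ['two', 2], ['three', 3], ['four', 4], ['six', 6], ['eight', 8]]);
const UNIT_MINUTES = new Map([
  ['minute', 1], ['hour', 60], ['hourly', 60], ['day', 1440], ['daily', 1440],
  ['week', 10080], ['weekly', 10080], ['month', 43200], ['monthly', 43200], ['annual', 525600],
]);
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function temporalResolutionToDays(resolution) {
  if (typeof resolution !== 'string') return null;
  const tokens = resolution.trim().toLowerCase().split(/\s+/);
  if (tokens.includes('irregular')) return null;

  // Optional leading count, written as a word ("eight") or digits ("8").
  let count = 1;
  let i = 0;
  if (WORD_NUMBERS.has(tokens[0])) {
    count = WORD_NUMBERS.get(tokens[0]);
    i = 1;
  } else if (/^\d+$/.test(tokens[0])) {
    count = Number(tokens[0]);
    i = 1;
  }

  // Unit word, singularised ("days" -> "day"); trailing words like "running" are ignored.
  const unit = (tokens[i] || '').replace(/s$/, '');
  if (!UNIT_MINUTES.has(unit)) return null;
  return (count * UNIT_MINUTES.get(unit)) / 1440;
}

function countTimeSteps(start, end, intervalDays) {
  if (!(intervalDays > 0)) return null;
  const spanDays = (new Date(end) - new Date(start)) / MS_PER_DAY;
  if (!(spanDays >= 0)) return null; // invalid or reversed dates
  return Math.floor(spanDays / intervalDays + 1e-9) + 1;
}
```

Unlike the current map, which sends sixHourly and threeMinutes to 1, this sketch returns sub-daily intervals (0.25 days for "Six Hourly"). The team should confirm which behaviour the estimate needs.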
  • Depth resolution: for Darwin and Pisces, use the depth-bin metadata in tblDarwin_Depth and tblPisces_Depth. Both are single-column tables whose column is depth_level. These are the only two datasets with depth-bin data; for all other datasets, depth requires the standard row-count calculation.
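For Darwin and Pisces, the depth factor reduces to counting depth_level values, optionally within a requested depth range. This is a minimal sketch that assumes the depth_level values have already been loaded from the tables named above. The query map keys, `countDepthLevels`, and the inclusive range are hypothetical choices.

```javascript
// Hypothetical sketch for the depth factor. ASSUMPTION: depthLevels holds the
// depth_level values already read from tblDarwin_Depth or tblPisces_Depth; the
// optional range is inclusive. null means "no depth-bin data", which sends the
// dataset to the standard row-count calculation.

// Hypothetical keys; the queries use the table and column names from this issue.
const DEPTH_LEVEL_QUERIES = {
  darwin: 'SELECT depth_level FROM tblDarwin_Depth',
  pisces: 'SELECT depth_level FROM tblPisces_Depth',
};

function countDepthLevels(depthLevels, minDepth = -Infinity, maxDepth = Infinity) {
  if (!Array.isArray(depthLevels) || depthLevels.length === 0) return null;
  return depthLevels.filter((d) => d >= minDepth && d <= maxDepth).length;
}
```

With no range given, the count equals the number of rows in the depth table, which is what a full-dataset recalculation would use.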

Cluster Logic

  • Cluster-only datasets are identified from tblDataset_Servers: a dataset is cluster-only when 'cluster' is its only server alias. This check belongs on the backend only.
    The following returns the dataset IDs whose entries in tblDataset_Servers use only the server alias ‘cluster’ and no others.
```sql
SELECT Dataset_ID
FROM tblDataset_Servers
GROUP BY Dataset_ID
HAVING
    COUNT(*) = SUM(CASE WHEN Server_Alias = 'cluster' THEN 1 ELSE 0 END);
```
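The same cluster-only rule can be expressed in memory, which may help in unit tests or when rows from tblDataset_Servers are already loaded. This sketch mirrors the SQL's `HAVING` condition. The row shape follows the table's column names, and `findClusterOnlyDatasetIds` is a hypothetical name.

```javascript
// In-memory equivalent of the SQL above. Rows are { Dataset_ID, Server_Alias },
// mirroring tblDataset_Servers; findClusterOnlyDatasetIds is a hypothetical name.
function findClusterOnlyDatasetIds(rows) {
  const totals = new Map(); // Dataset_ID -> { total, cluster }
  for (const { Dataset_ID, Server_Alias } of rows) {
    const t = totals.get(Dataset_ID) ?? { total: 0, cluster: 0 };
    t.total += 1;
    if (Server_Alias === 'cluster') t.cluster += 1;
    totals.set(Dataset_ID, t);
  }
  // Same condition as HAVING COUNT(*) = SUM(CASE WHEN ... THEN 1 ELSE 0 END).
  return [...totals].filter(([, t]) => t.total === t.cluster).map(([id]) => id);
}
```

One difference to keep in mind: depending on the database collation, the SQL comparison `Server_Alias = 'cluster'` may be case-insensitive, while `===` in JavaScript is not.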

Other

  • Review current query-time counting logic for simplification or optimization.
