I took a closer look at plot_xy(), and here are some suggestions for improvement.
- plot_xy() currently queries data from two data sources. Maybe we can make it handle more than two and produce (n choose 2) scatterplots, just as plot() does for a data frame with more than two columns (it dispatches to pairs()).
- Big issue: because it uses a simple merge(), plot_xy() drops many points when the two datasets have different time resolutions. The main problem is that the merge matches dates exactly, i.e. it performs an inner join. So even if both datasets had regular weekly measurements, if one were sampled on Sundays and the other on Mondays, merge() would return an empty dataset. Here's an example:
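library(dplyr)  # provides %>% and inner_join()
# assumes the package providing get_timeseries() and plot_xy() is already loaded
dt1 = '2015-01-01'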
dt2 = '2015-05-30'
lat1 = 28
lat2 = 38
lon1 = -71
lon2 = -50
depth1 = 0
depth2 = 100
par(mfrow=c(3,1))
dat.chl = get_timeseries(tableName = "tblCHL_REP", varName = 'CHl',
lat1, lat2, lon1, lon2, dt1, dt2, depth1, depth2)
dat.sst = get_timeseries(tableName = "tblSST_AVHRR_OI_NRT", varName = 'SST',
lat1, lat2, lon1, lon2, dt1, dt2, depth1, depth2)
dat.sss = get_timeseries(tableName = "tblSSS_NRT", varName = 'SSS',
lat1, lat2, lon1, lon2, dt1, dt2, depth1, depth2)
print(dim(dat.chl))
print(dim(dat.sss))
dat = dat.sss %>% inner_join(dat.chl, by="time")
print(dim(dat))
plot_xy(tableList = c("tblSSS_NRT", "tblCHL_REP"),
varList = c('SSS', "CHl"),
agg_var = "time",
lat1, lat2, lon1, lon2, dt1, dt2, depth1, depth2)
## Here, you can see that the separate datasets have 19 and 61 rows,
## but the combined one has only 7 because only 7 rows overlap
## exactly.
- How to overcome this difference in data resolution (if the user wants this): instead of using a simple merge() for exact date matches, we could build a synthetic cruise track over the desired time/lat/lon range and then perform colocalization. Because colocalization averages each dataset within time/lat/lon boxes along the cruise trajectory, it would not discard data whose time points don't match exactly.
- This new function could be called plot_xy_approx(), plot_xy_smooth(), or something similar.
- The existing plot_xy() could be renamed plot_xy_exact() or plot_xy_exact_match().
- I think the plotting part of plot_xy() is still useful to the user.
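For the first suggestion (more than two data sources), the plotting side could be sketched as follows. It assumes the variables have already been retrieved and aligned into one data frame with one numeric column per variable; `plot_all_pairs()` is a hypothetical helper, not an existing function in the package, and the data are made up. For a quick look, base R's `pairs(dat[vars])` draws the same combinations as a scatterplot matrix.

```r
# Hypothetical helper: one scatterplot per pair of variables, i.e. choose(n, 2) panels.
# Assumes `dat` already holds one aligned numeric column per name in `vars`.
plot_all_pairs <- function(dat, vars) {
  pair_mat <- combn(vars, 2)              # 2 x choose(n, 2); each column is an (x, y) pair
  n_pairs  <- ncol(pair_mat)
  old_par  <- par(mfrow = n2mfrow(n_pairs))  # pick a rows x cols layout for n_pairs panels
  on.exit(par(old_par))                   # restore the caller's layout afterwards
  for (k in seq_len(n_pairs)) {
    x <- pair_mat[1, k]
    y <- pair_mat[2, k]
    plot(dat[[x]], dat[[y]], xlab = x, ylab = y)
  }
  invisible(pair_mat)
}

# Usage with made-up values for three variables -> choose(3, 2) = 3 panels
set.seed(1)
dat <- data.frame(SSS = rnorm(10, 35), CHl = rlnorm(10), SST = rnorm(10, 22))
pairs_drawn <- plot_all_pairs(dat, c("SSS", "CHl", "SST"))
```

Returning the pair matrix invisibly lets a caller check which combinations were drawn without cluttering the console.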
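The Sunday/Monday scenario from the second bullet can be reproduced without any database access. This sketch uses two made-up weekly series (the values are arbitrary placeholders) and base R's `merge()`, which by default keeps only rows whose key matches exactly in both tables; `dplyr::inner_join()` behaves the same way here.

```r
# Two made-up weekly series over the same eight weeks.
# 2015-01-04 was a Sunday, so `sundays + 1` are the matching Mondays.
sundays <- seq(as.Date("2015-01-04"), by = "week", length.out = 8)
a <- data.frame(time = sundays,     SSS = 1:8)
b <- data.frame(time = sundays + 1, CHl = 1:8)

# Default merge() is an inner join on exact key equality: no date is shared.
inner <- merge(a, b, by = "time")
nrow(inner)                 # 0

# An outer join keeps every row, but each row has an NA on one side,
# so there is still nothing to put on an x-y scatterplot.
outer <- merge(a, b, by = "time", all = TRUE)
nrow(outer)                 # 16
sum(complete.cases(outer))  # 0
```

Switching to an outer join therefore does not fix the problem; some form of approximate matching is needed.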
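The colocalization idea from the third bullet could be prototyped along these lines. This is a simplified local stand-in, not the package's own colocalization: `make_synthetic_track()` and `colocalize_box()` are hypothetical helpers, the track is assumed to be a straight line through the requested range, the box half-widths (4 days, 2 degrees) are arbitrary choices, depth is ignored, and the observations are made up. Each dataset is averaged inside a time/lat/lon box around every track point, so two weekly series sampled on Sundays and Mondays pair up instead of producing an empty join.

```r
# Hypothetical helper: a straight-line synthetic cruise from (lat1, lon1) at dt1
# to (lat2, lon2) at dt2, sampled at n evenly spaced points.
make_synthetic_track <- function(dt1, dt2, lat1, lat2, lon1, lon2, n = 20) {
  data.frame(
    time = seq(as.Date(dt1), as.Date(dt2), length.out = n),
    lat  = seq(lat1, lat2, length.out = n),
    lon  = seq(lon1, lon2, length.out = n)
  )
}

# Hypothetical helper: for each track point, average every observation of `var`
# inside a box of +/- dt_tol days, +/- dlat and +/- dlon degrees; NA if the box is empty.
colocalize_box <- function(track, obs, var, dt_tol = 4, dlat = 2, dlon = 2) {
  vapply(seq_len(nrow(track)), function(i) {
    # Subtracting two Dates gives a difftime in days
    in_box <- abs(as.numeric(obs$time - track$time[i])) <= dt_tol &
      abs(obs$lat - track$lat[i]) <= dlat &
      abs(obs$lon - track$lon[i]) <= dlon
    if (any(in_box)) mean(obs[[var]][in_box]) else NA_real_
  }, numeric(1))
}

# Made-up Sunday and Monday series at a single location
sundays <- seq(as.Date("2015-01-04"), by = "week", length.out = 8)
sss_obs <- data.frame(time = sundays,     lat = 30, lon = -60, SSS = 35 + (1:8) / 10)
chl_obs <- data.frame(time = sundays + 1, lat = 30, lon = -60, CHl = (1:8) / 100)

track  <- make_synthetic_track("2015-01-04", "2015-02-23", 30, 30, -60, -60, n = 8)
paired <- data.frame(track,
                     SSS = colocalize_box(track, sss_obs, "SSS"),
                     CHl = colocalize_box(track, chl_obs, "CHl"))
sum(complete.cases(paired))   # 8 usable (SSS, CHl) pairs
plot(paired$SSS, paired$CHl, xlab = "SSS", ylab = "CHl")
```

In a real plot_xy_approx(), the track density and box sizes would presumably be user-facing arguments, and the averaging would come from the package's colocalization rather than this local loop.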