Data deduplication between IntakeESGFDataSource and LocalDataSource has not yet been implemented, and it would be good to add it. @schlunma described the problem clearly in this comment:
But the note says that "Deduplicating data found via esmvalcore.io.intake_esgf data sources and the esmvalcore.io.local data sources has not yet been implemented"?
I think the problem here is that file.name is different for local and ESGF files:
Local file:
print(file) # /work/ik1017/CMIP6/data/CMIP6/CMIP/BCC/BCC-ESM1/1pctCO2/r1i1p1f1/fx/areacella/gn/v20190613/areacella_fx_BCC-ESM1_1pctCO2_r1i1p1f1_gn.nc
print(file.name) # areacella_fx_BCC-ESM1_1pctCO2_r1i1p1f1_gn.nc
ESGF file:
print(file) # IntakeESGFDataset(name='CMIP6.CMIP.BCC.BCC-ESM1.1pctCO2.r1i1p1f1.fx.areacella.gn')
print(file.name) # CMIP6.CMIP.BCC.BCC-ESM1.1pctCO2.r1i1p1f1.fx.areacella.gn
Originally posted by @schlunma in #2936
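One possible direction, sketched here only to illustrate the mismatch: derive a comparable key from the local path so it can be matched against the dotted ESGF dataset name. The helper names `dataset_key_from_local_path` and `drop_esgf_duplicates` are hypothetical and not part of esmvalcore. The sketch assumes the local directory tree follows the layout shown in the example above, with nine facet directories directly before a `vYYYYMMDD` version directory.

```python
import re
from pathlib import PurePosixPath
from typing import Iterable, List, Optional

# Assumption: the version directory looks like "v20190613".
VERSION_RE = re.compile(r"v\d{8}")


def dataset_key_from_local_path(path: str, n_facets: int = 9) -> Optional[str]:
    """Build a dotted dataset name from the facet directories of a local path.

    Hypothetical helper: it joins the ``n_facets`` directories that precede
    the version directory, which for the path in the example above gives the
    same string as the ESGF dataset name.
    """
    parts = PurePosixPath(path).parts
    for i, part in enumerate(parts):
        if VERSION_RE.fullmatch(part) and i >= n_facets:
            return ".".join(parts[i - n_facets:i])
    # No version directory found, so no key can be derived.
    return None


def drop_esgf_duplicates(
    local_paths: Iterable[str], esgf_names: Iterable[str]
) -> List[str]:
    """Return the ESGF dataset names that have no matching local file."""
    local_keys = set()
    for path in local_paths:
        key = dataset_key_from_local_path(path)
        if key is not None:
            local_keys.add(key)
    return [name for name in esgf_names if name not in local_keys]
```

With the two values from the comment, the local path yields the same key as the ESGF name, so the ESGF entry would be dropped. A real implementation would probably compare facets or versions rather than strings, and would need to handle datasets that are split across several local files.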