The file appears to contain duplicate entries, missing structure annotations, and a likely misalignment between compound names/RTs and PubChem fields.
Observed Issues
1. Possible annotation shift in later rows
Rows 0052_00105-0052_00137 appear suspicious.
The compound name and rt fields seem to repeat earlier compounds, but the structure annotations appear to be shifted/misaligned.
Example:
0052_00105
- name: Myo-inositol
- rt: 1.1
- formula: C22H18O11
- PubChem CID: 65064
- InChIKey: starts with WMBWRE...
These structure fields appear to correspond to (-)-Epigallocatechin gallate, not Myo-inositol.
Similar mismatches repeat through 0052_00137.
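One way to screen for this kind of shift is to look for compound names that map to more than one InChIKey across the file. The sketch below is only an illustration: the column names `id`, `name`, and `inchikey` and the tab delimiter are assumptions that may not match the actual file, and the sample rows are made-up placeholders rather than RepoRT data.

```python
import csv
import io
from collections import defaultdict


def find_conflicting_names(rows, name_col="name", key_col="inchikey", id_col="id"):
    """Return {name: {inchikey: [row ids]}} for names seen with more than one InChIKey.

    Column names are assumptions; adjust them to the real header.
    """
    by_name = defaultdict(lambda: defaultdict(list))
    for row in rows:
        name = (row.get(name_col) or "").strip().lower()
        key = (row.get(key_col) or "").strip()
        if not name or not key:
            continue  # rows with missing annotations cannot be compared
        by_name[name][key].append(row.get(id_col))
    # Keep only names that are attached to two or more distinct InChIKeys.
    return {
        name: dict(keys)
        for name, keys in by_name.items()
        if len(keys) > 1
    }


# Toy data only: ids, names, and keys are placeholders, not real identifiers.
SAMPLE = """id\tname\tinchikey
row_a\tCompound X\tKEY-ONE
row_b\tCompound Y\tKEY-TWO
row_c\tCompound X\tKEY-TWO
"""

if __name__ == "__main__":
    reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
    for name, keys in find_conflicting_names(reader).items():
        print(name, keys)
```

A name flagged here is not necessarily wrong (isomers or naming variants could also trigger it), so the output is a list of rows to inspect by hand rather than a verdict.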
2. Duplicate-related comments are widespread
The comment column contains many duplicate-related notes:
- doublet: 62 rows
- removed another duplicate entry: 69 rows
- potential duplicate: 4 rows
- standardized from inchi; removed another duplicate entry: 1 row
It is unclear which entries are intended biological/analytical doublets and which should be removed or consolidated.
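For reference, counts like these can be reproduced by tallying the exact values of the comment column. This is a minimal sketch, assuming the rows are available as dictionaries with a `comment` key (for example from `csv.DictReader`); the sample rows are invented for illustration and are not taken from the dataset.

```python
from collections import Counter


def tally_comments(rows, comment_col="comment"):
    """Count how often each non-empty comment string occurs.

    Comments are compared as whole strings, so a combined note such as
    "a; b" is counted separately from "a" and "b".
    """
    counts = Counter()
    for row in rows:
        comment = (row.get(comment_col) or "").strip()
        if comment:
            counts[comment] += 1
    return counts


# Toy rows only, to show the shape of the input and output.
sample_rows = [
    {"id": "row_a", "comment": "doublet"},
    {"id": "row_b", "comment": "doublet"},
    {"id": "row_c", "comment": "potential duplicate"},
    {"id": "row_d", "comment": ""},
]

if __name__ == "__main__":
    for comment, n in tally_comments(sample_rows).most_common():
        print(f"{comment}: {n} rows")
```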
Please Check
Could you please check whether:
- rows 0052_00105-0052_00137 have shifted PubChem annotations, and
- the doublet / potential duplicate / removed another duplicate entry labels reflect the intended final state.
Thanks for maintaining RepoRT. This dataset is very useful for retention time prediction benchmarking.