develop-InMem: branch with IO into a pandas DataFrame directly from the UCI, with minimal (or no) write-backs during execution?
Goals:
Be able to send hsp2 all needed inputs from CSV(s), omitting the UCI
Eliminate the requirement of the hdf5 store for all data, allowing file storage instead?
Code:
Tasks:
Code the class IOManagerDF as a child of IOManager
Ensure all IOManager classes are compatible with the get_timeseries() function (a standalone function in the hsp2.hsp2.utilities file)
read_ts: get_timeseries() lives outside the IOManager class but expects an object that implements the IOManager interface, including a read_ts method. Overriding read_ts on IOManagerDF therefore lets the two integrate seamlessly and eliminates the redundant function in HSP2UtilitiesInMem.py
_get_in_memory(): must read the pandas df (should it also behave exactly the same as the hdf5 version, which already caches in the object?)
It currently falls back on reading from the hdf5, but it should instead look at self._input, which is the object passed in as io_combined at class creation.
Merge duplicate code into HSP2UtilitiesInMem.py, hsp2/main.py, hsp2io/io.py, and hsp2tools/readUCI.py
Run test scripts
PR
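The read_ts and _get_in_memory() tasks above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the hsp2 implementation: the class name IOManagerDFSketch, the helper get_timeseries_sketch, the single-key read_ts signature, and the idea that io_combined is a dict of DataFrames are all hypothetical stand-ins, and the real IOManager interface may differ.

```python
import pandas as pd


class IOManagerDFSketch:
    """Hypothetical stand-in for IOManagerDF; not the hsp2 class."""

    def __init__(self, io_combined):
        # Assumption: io_combined is a dict mapping a key to a DataFrame.
        self._input = io_combined
        self._in_memory = {}  # cache of frames already looked up

    def _get_in_memory(self, key):
        # Serve from the cache first, then from self._input; never touch hdf5.
        if key not in self._in_memory:
            if key not in self._input:
                raise KeyError(f"no in-memory timeseries for {key!r}")
            self._in_memory[key] = self._input[key]
        return self._in_memory[key]

    def read_ts(self, key):
        # Return a copy so callers cannot mutate the cached frame.
        return self._get_in_memory(key).copy()


def get_timeseries_sketch(io_manager, keys):
    """Standalone helper that relies only on io_manager.read_ts()."""
    return {key: io_manager.read_ts(key) for key in keys}


# Hypothetical input: one precipitation-like series keyed as "PREC".
index = pd.date_range("2000-01-01", periods=3, freq="D")
inputs = {"PREC": pd.DataFrame({"value": [0.0, 1.5, 0.25]}, index=index)}
manager = IOManagerDFSketch(inputs)
series = get_timeseries_sketch(manager, ["PREC"])
```

Because the standalone helper touches nothing but read_ts, any manager that offers the same method (hdf5-backed or DataFrame-backed) could be passed in without changing the helper.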
Questions:
Test script/UCI:
Should readUCIinMem.py replace readUCI.py entirely? (looks like it does)
Redundant/unused HDF5 class: there are classes named HDF5 defined in two places:
src/hsp2/hsp2io/hdf.py
src/hsp2/hsp2tools/HDF5.py
fgrep -iR "import from hsp2tools" src/*|grep hdf -i
Code development:
Caching:
The IO class (read hdf) caches timeseries, so a series that has already been loaded from disk does not need to be reloaded.
The KO class cannot currently store data in memory: save_timeseries pushes the data to disk, which seems to incur large overhead.
Documentation:
write_ts, read_ts
write_ts(): HSPsquared/src/hsp2/hsp2io/io.py, line 58 in 6f40cc3
@PaulDudaRESPEC @timcera
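The overhead attributed to save_timeseries writing to disk on every call could, in principle, be reduced with a write-back buffer that holds results in memory and writes each series once at the end. The sketch below only illustrates that idea. BufferedTimeseriesStore, fake_disk_write, the flush() method and the "RESULTS/RO" path are hypothetical names; only write_ts and read_ts are taken from the notes above, and their real signatures in hsp2io may differ.

```python
class BufferedTimeseriesStore:
    """Hypothetical write-back buffer; not part of hsp2."""

    def __init__(self, backend_write):
        # backend_write(path, data) stands in for the slow write to disk.
        self._backend_write = backend_write
        self._cache = {}     # latest data per path, kept in memory
        self._dirty = set()  # paths changed since the last flush

    def write_ts(self, path, data):
        # Keep the newest version in memory; defer the disk write.
        self._cache[path] = data
        self._dirty.add(path)

    def read_ts(self, path):
        # Reads are served from memory; raises KeyError if never written.
        return self._cache[path]

    def flush(self):
        # Write each changed series exactly once, then mark all as clean.
        for path in sorted(self._dirty):
            self._backend_write(path, self._cache[path])
        self._dirty.clear()


disk = {}
write_calls = []


def fake_disk_write(path, data):
    # Records every backend write so the saving can be observed.
    write_calls.append(path)
    disk[path] = list(data)


store = BufferedTimeseriesStore(fake_disk_write)
for step in range(3):
    store.write_ts("RESULTS/RO", [step, step + 1])
writes_before_flush = len(write_calls)
store.flush()
```

In this toy run three write_ts calls lead to a single backend write at flush time. The trade-off is that results stay only in memory until flush() is called, so a crash before then would lose them.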