Enable OpenVINO model caching #118
|
OpenVINO model caching affects all models and is an overdue change to handle automatically. Good find. I think the best way is to auto inject the
LMK if you have any questions. Don't worry, most of the work for this one is testing to see what happens in all the cases for each To be clear, like you found, we expect the first load to be long and subsequent loads to be fast; we also need to discover what happens when you play with settings but have not cleared the cached model files. Also, I like the env approach; we need to start making openarc more container friendly. Can you add a toggle to shut the auto cache behavior off (to save fighting configs if this ever breaks)? Again, great find and thanks for the PR. |
|
When you say I'm hesitant to place the cache beside the model. This means that the model itself can no longer be stored on a read-only filesystem. Instead I'd propose this: allow the user to specify a cache directory, then load/store the compiled model in a model-specific subdirectory of this path. For example, if the user specifies
I don't think that this is strictly true, rather, I think that compiled models are specific to |
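A minimal sketch of the "model-specific subdirectory" idea proposed above, assuming the subdirectory is derived from the model's path. The function name `model_cache_dir` and the name-plus-hash scheme are hypothetical and only illustrate the layout; the PR itself may name subdirectories differently.

```python
import hashlib
import os
from pathlib import Path


def model_cache_dir(cache_root: str, model_path: str) -> Path:
    """Return a per-model subdirectory under a user-specified cache root.

    Hypothetical scheme: "<model dir name>-<short hash of absolute path>".
    The model directory itself is never written to, so it can stay read-only.
    """
    resolved = os.path.abspath(model_path)
    # Hash the absolute path so two models with the same folder name
    # stored in different locations do not share a cache directory.
    digest = hashlib.sha256(resolved.encode("utf-8")).hexdigest()[:16]
    name = os.path.basename(resolved.rstrip(os.sep)) or "model"
    return Path(cache_root) / f"{name}-{digest}"
```

With this layout the cache root can live on a writable volume while the models are mounted read-only, which is the concern raised about placing the cache beside the model.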
Yes, exactly.
OK that framing of how to organize caching makes much more sense. in this case
During JIT creation of the model cache, hardware features are considered in ways that aren't obvious, which can change if your device supports different data types or instruction sets. In general there is a plugin system with its own compile behavior. Most concerning is this line from the docs
Ok, so you are imagining a scenario where we compile once and reuse. I think that should work; however, changing the device setting for an already compiled cache will cause the runtime to build a new cache. From the above link: "Cache files can be reused within the same Model Server version, target device, hardware, model and the model shape parameters. The Model Server, automatically detects if the cache is present and re-generates new cache files when required." Unfortunately this assertion seems underspecified vs what I have observed in practice, so LMK what you find out. I will do some tests later tonight and report back. |
|
All this sounds great, thanks! FYI I have been building the cache on one machine and running on another (absolutely identical hardware and software, with basically every single dependency version pinned from the OS up) with great success. With this setup I can even update the cache while openarc is running in most cases, and then have a fast restart to load it. It should also support multiple openarc instances on multiple identical nodes, though I haven't rolled this out yet. |
38860c3 to 92e90bb
|
Ran through a series of tests covering edge cases. Here's what I found:
@SearchSavior I probably need a minor doc update, but do you see any other issues with this work given the above? |
Signed-off-by: solidDoWant <fred.heinecke@yahoo.com>
412630a to bfb4e5b
When the OPENARC_OV_CACHE_DIR var is set, compiled models will now be cached/restored from cache. This results in an enormous decrease in peak memory requirements after first startup.
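
A rough sketch of how the env-driven behavior described above could look, assuming caching is simply off when the variable is unset or empty. The helper name `cache_properties` is hypothetical; `CACHE_DIR` is the OpenVINO property key for the model cache location, but how OpenArc actually injects it into each model's config is an assumption here.

```python
import os
from typing import Dict, Mapping, Optional


def cache_properties(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the cache-related properties to merge into a model's config.

    Returns an empty dict when OPENARC_OV_CACHE_DIR is unset or blank,
    so leaving the variable out acts as the "off" switch.
    """
    env = os.environ if env is None else env
    cache_dir = env.get("OPENARC_OV_CACHE_DIR", "").strip()
    if not cache_dir:
        return {}
    # Assumption: the resulting dict is merged into whatever config
    # is handed to OpenVINO when the model is compiled.
    return {"CACHE_DIR": cache_dir}
```

Passing the environment in as a mapping keeps the helper easy to exercise in tests without touching the real process environment.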