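The run below uses the CPU backend (`backend: CPU`) and takes about 181 s to transcribe 68.5 s of audio (RTF 2.64). I guess I need to force it onto the GPU, but how do I do that?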
(base) nvidia@gx10-9074:~/Development/vibevoice.cpp$ time ./build/bin/vibevoice-cli asr --model models/vibevoice-asr-q4_k.gguf --tokenizer models/tokenizer.gguf --audio 2p_argument.wav
asr: loaded 1644800 samples (68.53s)
[vv I] backend: CPU
[vv I] loaded models/vibevoice-asr-q4_k.gguf: 1177 tensors, 22 kv (backend=CPU)
[vv I] vibevoice_load: hidden=3584 layers=28+0 vocab=152064 scaling=0.0000 bias=0.0000
[vv I] loaded models/tokenizer.gguf: 0 tensors, 13 kv (no tensor data)
[vv I] Tokenizer: loaded 151665 tokens, 151387 merges, 14 special
[{"Start":0,"End":12.24,"Speaker":0,"Content":"I can't believe you did it again. I waited for two hours. Two hours! Not a single call, not a text. Do you have any idea how embarrassing that was? Just sitting there alone?"},{"Start":12.4,"End":23.17,"Speaker":1,"Content":"Look, I know, I'm sorry, alright? Work was a complete nightmare. My boss dropped a critical deadline on me at the last minute. I didn't even have a second to breathe, let alone check my phone."},{"Start":23.17,"End":34.24,"Speaker":0,"Content":"A nightmare? That's the same excuse you used last time. I'm starting to think you just don't care. It's easier to say work was crazy than to just admit that I'm not a priority for you anymore."},{"Start":34.24,"End":45.49,"Speaker":1,"Content":"That's not fair. Of course you're a priority. You think I enjoyed being stuck in that office, drowning in spreadsheets while knowing I was letting you down?
asr: timing load=4.6s inference=180.9s audio=68.5s RTF=2.640
real 3m6.492s
user 5m56.223s
sys 3m0.181s
(base) nvidia@gx10-9074:~/Development/vibevoice.cpp$