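Test device: Snapdragon 8 Gen 3 (Android)
Test model: sense-voice-small-q4_0.gguf
Cross-compilation on Linux succeeded; I copied the executable, the model file, and the .wav file to the Android device.
It runs on the CPU, but accuracy is very poor (speech recognition, language detection, emotion recognition, and event detection all contain errors).
Running inference without the -ng option, so that it runs on OpenCL (the OpenCL library was linked at build time), crashes.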
./sense-voice-main -m /sdcard/models/sense-voice-small-q4_0.gguf -f ./yuanshen.wav -t 4 -ng
sense_voice_small_init_from_file_with_params_no_state: loading model from '/sdcard/models/sense-voice-small-q4_0.gguf'
sense_voice_init_with_params_no_state: use gpu = 1
sense_voice_init_with_params_no_state: flash attn = 0
sense_voice_init_with_params_no_state: gpu_device = 0
ggml_opencl: selecting platform: 'QUALCOMM Snapdragon(TM)'
ggml_opencl: selecting device: 'QUALCOMM Adreno(TM) 750'
ggml_opencl: device OpenCL version: OpenCL 3.0 Adreno(TM) 750
ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: commit unknown Compiler E031.45.02.16
ggml_opencl: vector subgroup broadcast support: false
ggml_opencl: device FP16 support: true
ggml_opencl: mem base addr align: 1024
ggml_opencl: max mem alloc size: 1024 MB
ggml_opencl: SVM coarse grain buffer support: true
ggml_opencl: SVM fine grain buffer support: true
ggml_opencl: SVM fine grain system support: false
ggml_opencl: SVM atomics support: true
ggml_opencl: flattening quantized weights representation as struct of arrays (GGML_OPENCL_SOA_Q)
ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
sense_voice_init_with_params_no_state: devices = 2
sense_voice_init_with_params_no_state: backends = 2
sense_voice_model_load: version: 3
sense_voice_model_load: alignment: 32
sense_voice_model_load: data offset: 423680
sense_voice_model_load: loading model
sense_voice_model_load: n_vocab = 25055
sense_voice_model_load: n_encoder_hidden_state = 512
sense_voice_model_load: n_encoder_linear_units = 2048
sense_voice_model_load: n_encoder_attention_heads = 4
sense_voice_model_load: n_encoder_layers = 50
sense_voice_model_load: n_mels = 80
sense_voice_model_load: ftype = 2
sense_voice_model_load: vocab[25055] loaded
sense_voice_default_buffer_type: using device GPUOpenCL (QUALCOMM Adreno(TM) 750)
sense_voice_model_load: OpenCL total size = 181.86 MB
sense_voice_model_load: n_tensors: 1212
sense_voice_model_load: load SenseVoiceSmall takes 3.558000 second
sense_voice_backend_init_gpu: using GPUOpenCL backend
sense_voice_backend_init: using CPU backend
sense_voice_init_state: kv pad size = 3.67 MB
Segmentation fault (core dumped)
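Question 1: Is this behaviour on the CPU expected? Switching to the FP32 model gives just as many recognition errors.
Question 2: ggml appears to support an OpenCL backend, so why does using OpenCL crash?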
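A possible sanity check for Question 1: since both the q4_0 and FP32 models misrecognize the audio, the input file itself is worth ruling out. The sketch below reads the WAV header and compares it with 16 kHz, mono, 16-bit PCM. That target format is an assumption about what the tool expects and is not confirmed anywhere in this report; the helper names `describe_wav` and `matches_assumed_format` are hypothetical.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, bits_per_sample) read from a PCM WAV header."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate(), wf.getnchannels(), wf.getsampwidth() * 8


def matches_assumed_format(path, rate=16000, channels=1, bits=16):
    """True if the file matches the ASSUMED input format (16 kHz, mono, 16-bit).

    The defaults are an assumption for illustration, not a documented
    requirement taken from this issue.
    """
    return describe_wav(path) == (rate, channels, bits)
```

For example, `matches_assumed_format("yuanshen.wav")` run on the host before copying the file to the device would show whether the clip needs resampling or downmixing first. This only covers the input format; it says nothing about the OpenCL crash in Question 2.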
