Audio: MFCC: Use the MFCC module as compress PCM encoder with discontinuous stream by singalsu · Pull Request #10814 · thesofproject/sof

singalsu · 2026-05-26T15:46:04Z

This PR adds the following commits on top of the previous VAD PR:

audio: mfcc: switch to source/sink API, int32 output, and DTX
base_fw: advertise BESPOKE codec for MFCC compress capture
audio: mfcc: update decode tools and add Python compress scripts
tools: topology: add MFCC compress capture for jack and DMIC

A kernel PR that fixes ALSA control handling for the encoder widget type is needed to run this.

Add mfcc_vad module with A-weighted energy-based voice activity detection that operates on the Mel log spectrum produced by the MFCC component. The algorithm tracks a per-bin noise floor with instant-down and slow-rise behavior, then computes a weighted energy delta above the floor. Speech is declared when the delta exceeds a threshold (0.35 in Q9.23) with a 20-frame hangover to prevent rapid toggling. The VAD is gated on the new enable_vad flag in sof_mfcc_config.

Add struct mfcc_data_header with six int32 fields (magic, frame_number, reserved, energy, noise_energy, vad_flag) prepended to every output frame in all format paths (S16, S24, S32). This replaces the previous magic-word-only header. The header carries the VAD decision and energy values from the DSP for downstream consumers.

Extend sof_mfcc_config in user/mfcc.h with reserved16[3] padding for 32-bit alignment, and new boolean fields enable_vad, enable_dtx, update_controls, and reserved_bool[5]. The config blob size increases from 104 to 116 bytes.

Update Matlab/Octave decode scripts (decode_mel.m, decode_ceps.m, decode_all.m) and setup_mfcc.m for the expanded header and config struct. Regenerate topology2 configuration blobs (default.conf, mel80.conf) with the new blob size.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
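For readers who want to follow the VAD description above, here is a minimal floating-point sketch in Python. It is not the firmware code: the class name `MelVadSketch`, the `RISE_STEP` constant, the caller-supplied weights, and the choice to compute the delta against the already-updated floor are all assumptions made for illustration. Only the 0.35 threshold and the 20-frame hangover come from the commit message.

```python
THRESHOLD = 0.35       # from the commit message (Q9.23 in firmware, float here)
HANGOVER_FRAMES = 20   # from the commit message
RISE_STEP = 0.01       # hypothetical slow-rise increment per frame


class MelVadSketch:
    """Illustrative energy VAD over Mel log spectrum frames (not the SOF code)."""

    def __init__(self, weights):
        # Normalize the per-bin weights (standing in for the A-weighting curve).
        total = float(sum(weights))
        self.weights = [w / total for w in weights]
        self.floor = None
        self.hangover = 0

    def update(self, mel_log):
        """Process one frame and return True while speech is declared."""
        if self.floor is None:
            self.floor = list(mel_log)
        else:
            # Instant down: follow the input when it drops below the floor.
            # Slow rise: otherwise creep upward by at most RISE_STEP.
            self.floor = [min(x, f + RISE_STEP)
                          for x, f in zip(mel_log, self.floor)]
        # Weighted energy above the noise floor.
        delta = sum(w * max(x - f, 0.0)
                    for w, x, f in zip(self.weights, mel_log, self.floor))
        if delta > THRESHOLD:
            self.hangover = HANGOVER_FRAMES
            return True
        if self.hangover > 0:
            # Hold the speech decision for a while to avoid rapid toggling.
            self.hangover -= 1
            return True
        return False
```

In this sketch a single loud frame keeps the decision at speech for the following 20 quiet frames, which is the anti-toggling effect the hangover is meant to provide.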

Add sof_mel_to_text_live_dsp_vad.py that captures mel spectrogram frames from ALSA with embedded DSP VAD flag and performs live speech-to-text transcription using OpenVINO Whisper. The script buffers mel frames during speech and triggers Whisper inference when silence is detected after speech. Capture runs continuously in a separate thread during inference to avoid frame drops.

Replace the old README.txt with a comprehensive README.md that documents the MFCC tuning tools, testbench usage with run_mfcc.sh, output file formats, Matlab/Octave decode and plotting scripts, and the new live transcription workflow.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
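The buffering behavior described above (collect frames during speech, hand them off once silence follows) can be sketched as a small generator. This is an assumption about the general shape of the logic, not the script's actual code; the function name `segment_on_silence` and the flush at end of stream are hypothetical.

```python
def segment_on_silence(frames):
    """Yield one list of mel frames per speech burst.

    `frames` is an iterable of (vad_flag, mel) pairs, where vad_flag is
    truthy while the DSP reports speech.
    """
    buffered = []
    for vad_flag, mel in frames:
        if vad_flag:
            # Speech: keep collecting frames.
            buffered.append(mel)
        elif buffered:
            # First silent frame after speech: hand the burst to inference.
            yield buffered
            buffered = []
    if buffered:
        # Assumed behavior: flush a burst that is still open at end of stream.
        yield buffered
```

A consumer would pass each yielded burst to the recognizer while a separate capture thread keeps feeding `frames`, so capture does not stall during inference.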

Add IPC4 notification that sends the VAD state to user space via a switch control whenever the VAD decision changes between speech and silence. The notification is initialized during prepare and sent from the audio processing path on VAD state transitions.

The implementation follows the TDFB/sound_dose notification pattern: mfcc_ipc4.c contains the IPC4-specific notification init and send functions, while mfcc.c provides weak stubs so IPC3 builds link without the IPC4 dependencies.

Add handling for SOF_IPC4_SWITCH_CONTROL_PARAM_ID in mfcc_get_config and mfcc_set_config so the kernel driver can read back the current VAD state after receiving a notification. The switch control is read-only from the DSP side.

Both the notification init and the VAD state change detection are gated on the update_controls flag in the configuration blob struct.

Add a switch control (mixer) to the MFCC topology2 widget definition for the VAD notification.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

mfcc_reset() did not free buffers allocated by mfcc_setup(), so a stop->reset->prepare->start cycle would leak all MFCC allocations (FFT buffers, mel filterbank, DCT matrix, lifter, VAD buffers). This patch fixes the issue by calling mfcc_free_buffers() from mfcc_reset(). The pointers are set to NULL after free via a helper function mfcc_free_and_null(), so mfcc_free() won't double-free when it calls mfcc_free_buffers() again later.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

Switch from process_audio_stream to source/sink API. Add compress PCM output mode (variable-size frames, no zero padding) alongside legacy mode (full period with zero-fill). Unify all output to int32 Q9.23 regardless of source format. Remove out_data_ptr_32, mel_spectra int16 copy, mfcc_func typedef, and per-format output functions from mfcc_common/hifi3/hifi4.

Add DTX for compress mode: suppress silence frames after configurable trailing count, with optional periodic keepalive.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
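The DTX rule above can be illustrated with a short Python sketch that decides, per frame, whether the frame would be emitted. The function name `dtx_should_send`, its parameter names, and the exact keepalive phase are assumptions; the firmware may count frames differently.

```python
def dtx_should_send(vad_flags, trailing_frames, keepalive_interval=0):
    """Return one boolean per input frame: True if the frame is emitted.

    trailing_frames: silent frames still sent after speech ends.
    keepalive_interval: if nonzero, send every Nth suppressed frame.
    """
    sent = []
    silence_run = 0
    for vad in vad_flags:
        if vad:
            # Speech frames are always sent and reset the silence counter.
            silence_run = 0
            sent.append(True)
            continue
        silence_run += 1
        if silence_run <= trailing_frames:
            # Trailing silence right after speech is still sent.
            sent.append(True)
        elif keepalive_interval and \
                (silence_run - trailing_frames) % keepalive_interval == 0:
            # Periodic keepalive during a long silence.
            sent.append(True)
        else:
            sent.append(False)
    return sent
```

With `trailing_frames=2` and no keepalive, a long pause produces two trailing frames and then nothing until speech resumes, which is what makes the stream discontinuous.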

Register SND_AUDIOCODEC_BESPOKE capture in codec info TLV when CONFIG_COMP_MFCC is enabled so the kernel detects compress capture support via IPC4_SOF_CODEC_INFO.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

Update Octave decode scripts for int32 Q9.23 output and DTX gap filling. Add DTX blob generation to setup_mfcc.m.

Add Python compress capture tools: sof_mel_spectrogram_compress.py, sof_ceps_spectrogram_compress.py, sof_mel_to_text_live_compress.py. Refactor sof_mel_to_text_live_dsp_vad.py to use shared compress capture code. Add README with usage examples.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
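Because DTX drops frames, a decoder has to rebuild a continuous time axis from the frame_number header field. The sketch below shows one possible way to do that in Python by repeating the last received frame across each gap. The function name `fill_dtx_gaps` and the repeat-last-frame policy are assumptions; the actual decode scripts may fill gaps with a different value.

```python
def fill_dtx_gaps(frames):
    """Rebuild a gap-free frame list from (frame_number, values) pairs.

    Missing frame numbers are filled with a copy of the previous frame
    (assumed policy, chosen only for illustration).
    """
    out = []
    prev_num = None
    prev_vals = None
    for num, vals in frames:
        if prev_num is not None:
            # Insert one filler frame per skipped frame number.
            for _ in range(num - prev_num - 1):
                out.append(list(prev_vals))
        out.append(list(vals))
        prev_num, prev_vals = num, vals
    return out
```

Filling gaps this way keeps spectrogram plots aligned in time even though the captured stream is discontinuous.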

Add sdw-jack-audio-feature-compress.conf (PCM 48, pipeline 132) and sdw-dmic-audio-feature-compress.conf (PCM 49, pipeline 133) for compress MFCC capture with DTX blobs. Fix buffer sizes: set MFCC obs and host-copier ibs/obs to 344 bytes (24-byte header + 80 x int32). Add mel and ceps compress topology targets for MTL and ARL. Rename normal MFCC topologies to *-mfcc-mel-normal for clarity.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
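The 344-byte figure follows from the frame layout: six int32 header fields (24 bytes) plus 80 int32 values (320 bytes). A hedged Python sketch of parsing one such frame is shown here. Little-endian byte order, signed interpretation of every header field, and the helper name `parse_frame` are assumptions; the header field names and the Q9.23 data format come from the commit messages.

```python
import struct

HEADER_FIELDS = ("magic", "frame_number", "reserved",
                 "energy", "noise_energy", "vad_flag")
HEADER_FMT = "<6i"                         # six int32, little-endian assumed
HEADER_BYTES = struct.calcsize(HEADER_FMT)  # 24
NUM_BINS = 80                               # mel80 configuration
FRAME_BYTES = HEADER_BYTES + 4 * NUM_BINS   # 24 + 320 = 344
Q23_SCALE = float(1 << 23)                  # Q9.23 has 23 fractional bits


def parse_frame(buf):
    """Split one 344-byte frame into a header dict and 80 float values."""
    if len(buf) != FRAME_BYTES:
        raise ValueError("expected %d bytes, got %d" % (FRAME_BYTES, len(buf)))
    header = dict(zip(HEADER_FIELDS, struct.unpack_from(HEADER_FMT, buf, 0)))
    raw = struct.unpack_from("<%di" % NUM_BINS, buf, HEADER_BYTES)
    # Convert Q9.23 fixed point to float; header fields are left raw.
    values = [v / Q23_SCALE for v in raw]
    return header, values
```

A capture tool could read `FRAME_BYTES` at a time from the compress device and use `header["vad_flag"]` and `header["frame_number"]` to drive segmentation and gap handling.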

singalsu · 2026-05-26T16:33:10Z

Note: Running the MFCC compress topologies requires kernel patches thesofproject/linux#5647 and thesofproject/linux#5789.

singalsu · 2026-05-26T16:38:32Z

+ *
+ * \return Number of bytes actually written (limited by max_bytes).
+ */
+static size_t mfcc_sink_write_bytes(uint8_t **dst, uint8_t *buf_start,


Move this to mfcc_common.c.

singalsu · 2026-05-26T16:40:09Z

+		return -ENODATA;
+
+	/* Copy input audio from source to MFCC internal circular buffer */
+	cd->source_func(sources[0], &state->buf, &state->emph, frames, state->source_channel);


This function has grown far too big; it should be split into multiple smaller functions in mfcc_common.c.

singalsu · 2026-05-26T16:41:14Z

+
 #if CONFIG_FORMAT_S16LE
-void mfcc_source_copy_s16(struct input_stream_buffer *bsource, struct mfcc_buffer *buf,
+void mfcc_source_copy_s16(struct sof_source *source, struct mfcc_buffer *buf,


These should be in mfcc_common.c as long as there are no Xtensa HiFi versions.

singalsu added 8 commits May 25, 2026 20:31

base_fw: advertise BESPOKE codec for MFCC compress capture

d9e590a


singalsu changed the title from "q" to "Audio: MFCC: Use the MFCC module as compress PCM encoder with discontinuous stream" on May 26, 2026

singalsu mentioned this pull request May 26, 2026

ASoC: dapm: Add encoder and decoder widget types to kcontrol handling thesofproject/linux#5789

Open

singalsu wants to merge 8 commits into thesofproject:main from singalsu:mfcc_compress_encoder
