
[BCI] QVAC-17071 feat: add BCI neural signal support (variable conv1 kernel + windowed attention)#10

Open
sharmaraju352 wants to merge 7 commits into master from feat/bci-patches-v184

Conversation


@sharmaraju352 commented Apr 16, 2026

Summary

Adds two changes to whisper.cpp to support brain-computer interface (BCI) neural signal transcription. Based on v1.8.4.1.

1. Variable conv1 kernel size

  • Reads n_audio_conv1_kernel from model hparams (defaults to 3 for standard whisper models)
  • Allows BCI models to use a different first convolution kernel size

2. Windowed self-attention for encoder layers

  • Adds n_audio_window_size and n_audio_last_window_layer hparams
  • When present, encoder self-attention is restricted to a local window for layers up to last_window_layer
  • When windowed attention is active, the encoder bypasses flash attention and uses the standard softmax path (Metal flash attention does not support custom F32 masks)
  • Flash attention remains enabled for non-BCI models and for the decoder
  • Adds proper SOS token (language + transcribe) initialization for BCI models

Backward compatibility

Both changes are backward-compatible:

  • n_audio_conv1_kernel defaults to 3 (standard whisper behavior)
  • n_audio_window_size defaults to 0 and n_audio_last_window_layer defaults to -1, which disables windowed attention entirely
  • Standard whisper models are completely unaffected

Context

Required by the new @qvac/bci-whispercpp addon: tetherto/qvac#1583

Test plan

  • Standard whisper transcription still works (no regression)
  • BCI model loads and transcribes neural signals correctly on v1.8.4.1
  • Verified locally: 10.4% average WER across 5 BCI test samples (identical to v1.7.6 baseline)

Test results

BCI package (@qvac/bci-whispercpp)

  • 4/4 integration tests pass (9/9 assertions)
  • 10.4% average WER across 5 neural signal samples — identical output to v1.7.6 baseline

Standard whisper package (@qvac/transcription-whispercpp)

  • Integration: 10/10 tests pass (40/40 assertions)
  • Unit: 13/13 tests pass (57/57 assertions)
  • No regression in standard audio transcription (s16le + f32le formats verified)

@sharmaraju352 requested review from a team as code owners April 16, 2026 10:22
Raju added 5 commits April 16, 2026 19:57
Read n_audio_conv1_kernel from model hparams to allow BCI models
to use a non-standard first convolution kernel size. Standard
whisper models default to kernel size 3.

Made-with: Cursor
- Add n_audio_window_size and n_audio_last_window_layer hparams
- When present, encoder self-attention is restricted to a local window
  for layers up to last_window_layer
- Bypass flash attention when windowed mask is active (Metal FA does
  not support custom F32 masks); flash attention remains enabled for
  non-BCI models and for the decoder
- Populate window_mask data on the encoder graph (not the cross graph)
- Add proper SOS token (language + transcribe) initialization for BCI
  models

Backward-compatible: n_audio_window_size defaults to 0 and
n_audio_last_window_layer defaults to -1, disabling windowed
attention entirely for standard whisper models.

Made-with: Cursor
Made-with: Cursor
Made-with: Cursor
@sharmaraju352 force-pushed the feat/bci-patches-v184 branch from 5326bf7 to bbb3535 on April 16, 2026 14:28
Raju added 2 commits April 18, 2026 10:46
Address review feedback:

1. Guard read_safe for BCI-specific hparams (n_audio_conv1_kernel,
   n_audio_window_size, n_audio_last_window_layer) behind a
   n_mels > 256 check. Standard whisper models have n_mels <= 128
   and do not contain these fields — reading them unconditionally
   would corrupt the file position and break model loading.

2. Add explicit is_bci flag to hparams struct, set when BCI fields
   are detected during loading.

3. Use is_bci flag (instead of n_audio_window_size > 0) to guard
   the BCI-specific decoder SOS token initialization.

4. Log BCI-specific hparams when a BCI model is detected.

Made-with: Cursor
The windowed attention mask values depend only on n_ctx and
window_size, both fixed after model load. Move the O(n_ctx^2)
computation from whisper_encode_internal (called every encode)
to whisper_init_state (called once). The encode path now just
copies the precomputed data to the graph tensor.

Made-with: Cursor