feat: GPU support, hotkey, beep notifications, and clipboard copy #8

Open
hoiyada7-maker wants to merge 12 commits into parkscloud:master from hoiyada7-maker:feature/gpu-detection-fix

Conversation

@hoiyada7-maker

Summary

  • GPU detection: Switch from torch to ctranslate2; auto-register NVIDIA pip-package DLL dirs on startup so CUDA Toolkit is not required
  • CUDA error handling: Dialog when GPU runtime DLLs are missing — 'Switch to CPU' or 'Install CUDA Toolkit' instead of a silent crash
  • HuggingFace models: Custom HF Whisper model support with CTranslate2 conversion; add transformers/torch to requirements
  • VAD chunking: Variable-length audio chunking with VAD silence detection (cherry-picked from feature/korean)
  • Hotkey: Global Ctrl+Alt+R to toggle recording, configurable from Settings with live key capture
  • Beep notifications: Optional beeps for recording start, stop, and transcript save
  • Clipboard: Optional copy of transcript body to clipboard on save
  • UI: Translate all Korean strings to English; fix exc scope NameError in download error handler (Python 3.12+)

Test plan

  • GPU inference works without CUDA Toolkit installed (pip packages only)
  • CUDA error dialog appears and Switch to CPU restarts recording
  • Korean HF model downloads, converts, and transcribes correctly
  • Ctrl+Alt+R starts/stops recording; hotkey rebindable from Settings
  • Beeps play at correct moments and can be individually disabled
  • Clipboard contains transcript body after save when option is enabled
  • No Korean text visible anywhere in the UI

Generated with Claude Code

hoiyada7-maker and others added 12 commits June 1, 2026 12:57
faster-whisper uses ctranslate2 as its inference backend, not PyTorch.
The previous detection relied on `import torch` which was never listed
as a dependency, causing GPU detection to silently fall back to CPU for
all users regardless of their hardware.

Switch to `ctranslate2.get_cuda_device_count()` so detection reflects
the same CUDA stack that actually runs inference. torch is still used
opportunistically for GPU name and VRAM info when available, with a
name-based VRAM lookup table as a fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
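
The detection change described in the commit above could look roughly like the sketch below. `ctranslate2.get_cuda_device_count()` is the call named in the commit message; the helper names `cuda_device_count` and `pick_device` are hypothetical, and treating any failure as "no GPU" is an assumption of this sketch rather than the PR's confirmed behavior.

```python
def cuda_device_count() -> int:
    """Return the number of CUDA devices ctranslate2 can see, or 0."""
    try:
        # ctranslate2 is the backend faster-whisper uses for inference.
        import ctranslate2
    except ImportError:
        return 0
    try:
        return int(ctranslate2.get_cuda_device_count())
    except Exception:
        # Assumption: driver/runtime problems are treated as "no GPU".
        return 0


def pick_device() -> str:
    """Choose the inference device from the detected device count."""
    return "cuda" if cuda_device_count() > 0 else "cpu"
```

Because the check goes through the same library that runs inference, a missing torch install no longer affects the result.
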
When torch is absent, fall back to nvidia-smi for the actual GPU name
and VRAM (MiB → GB), so the UI shows the real device name instead of
the generic "CUDA Device 0" label.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
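
A minimal sketch of the nvidia-smi fallback from the commit above, assuming the tool is queried in CSV mode. The function names and the one-decimal rounding are hypothetical; the commit only states that name and VRAM are read and converted from MiB to GB.

```python
import subprocess

# CSV query without header or units, so each line is "<name>, <MiB>".
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_smi_line(line):
    """Split one CSV line into (gpu_name, vram_gb)."""
    name, mem_mib = (part.strip() for part in line.rsplit(",", 1))
    return name, round(int(mem_mib) / 1024, 1)


def query_gpu():
    """Return (name, vram_gb) for the first GPU, or None if unavailable."""
    try:
        out = subprocess.run(
            QUERY, capture_output=True, text=True, timeout=5, check=True
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return None  # nvidia-smi missing, timed out, or exited non-zero
    lines = [ln for ln in out.splitlines() if ln.strip()]
    if not lines:
        return None
    try:
        return parse_smi_line(lines[0])
    except ValueError:
        return None  # unexpected output such as "[N/A]"
```

A caller can fall back to a generic label only when `query_gpu()` returns `None`.
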
When the CUDA driver is present but CUDA Toolkit 12.x is not installed
(cublas64_12.dll etc. missing), ctranslate2 and faster-whisper crash at
runtime rather than at device detection time.

- gpu_detect: probe the CUDA runtime with a tiny StorageView allocation
  before reporting cuda_available=True; warns and falls back to CPU when
  runtime DLLs are absent.
- engine: catch RuntimeError on load() for "cannot be loaded" and
  automatically retry with device=cpu / compute_type=int8 so the app
  stays functional without a hard crash.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
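
The retry behavior described for the engine in the commit above can be sketched as follows. `load_model` is a hypothetical callable standing in for whatever constructs the model; only the "cannot be loaded" match and the `device=cpu` / `compute_type=int8` retry come from the commit message.

```python
def load_with_cpu_fallback(load_model, device="cuda", compute_type="float16"):
    """Try the requested device; retry on CPU if CUDA libraries are missing."""
    try:
        return load_model(device=device, compute_type=compute_type)
    except RuntimeError as err:
        # Only handle the missing-library case for CUDA; re-raise the rest.
        if device != "cuda" or "cannot be loaded" not in str(err):
            raise
        return load_model(device="cpu", compute_type="int8")
```

Matching on the error text is brittle, which may be one reason a later commit in this PR replaces the silent retry with an explicit error.
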
When device=cuda is configured but CUDA runtime DLLs are missing,
engine.load() now raises CudaUnavailableError instead of silently
switching to CPU.

app.py catches this on the background loader thread and posts a dialog
to the main thread offering two actions:
- "Switch to CPU" (Korean label "CPU로 변경"): updates and saves config, restarts recording on CPU
- "Install CUDA Toolkit" (Korean label "CUDA Toolkit 설치"): opens the NVIDIA download page in the browser

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
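
One way to structure the flow from the commit above is sketched here. `CudaUnavailableError` is the name used in the commit; `load_model_strict`, `start_loader`, and the event queue are hypothetical stand-ins for the PR's loader thread and its main-thread dialog posting, whose actual code is not shown.

```python
import queue
import threading


class CudaUnavailableError(RuntimeError):
    """Raised when device=cuda is configured but CUDA libraries are missing."""


def load_model_strict(load_model, device, compute_type):
    """Load the model, converting the missing-library case into a typed error."""
    try:
        return load_model(device=device, compute_type=compute_type)
    except RuntimeError as err:
        if device == "cuda" and "cannot be loaded" in str(err):
            raise CudaUnavailableError(str(err)) from err
        raise


def start_loader(load_model, device, compute_type, ui_events):
    """Load on a background thread and report the outcome through a queue."""
    def worker():
        try:
            model = load_model_strict(load_model, device, compute_type)
            ui_events.put(("loaded", model))
        except CudaUnavailableError as err:
            # The main thread would show the two-button dialog for this event.
            ui_events.put(("cuda_unavailable", str(err)))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The main thread drains `ui_events` and decides what to show, so no UI call is made from the loader thread.
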
Users who install nvidia-cublas-cu12 / nvidia-cuda-runtime-cu12 via pip
no longer need the full CUDA Toolkit. On startup, cuda_dlls.py scans all
site-packages roots for nvidia/*/bin directories and registers each one
with os.add_dll_directory() before ctranslate2 is imported.

Works for any user regardless of Python install path (user site-packages,
venv, or system), so cublas64_12.dll and friends are always discoverable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
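
The startup scan described in the commit above might look like the following sketch. The function names are hypothetical and the real `cuda_dlls.py` may discover roots differently; `os.add_dll_directory()` exists only on Windows, so the sketch is a no-op elsewhere.

```python
import os
import site
from pathlib import Path


def candidate_roots():
    """Collect system, venv, and user site-packages directories."""
    roots = []
    if hasattr(site, "getsitepackages"):
        roots.extend(site.getsitepackages())
    roots.append(site.getusersitepackages())
    # dict.fromkeys removes duplicates while keeping order.
    return [Path(r) for r in dict.fromkeys(roots)]


def find_nvidia_bin_dirs(roots):
    """Return every existing nvidia/<package>/bin directory under the roots."""
    found = []
    for root in roots:
        found.extend(sorted(p for p in Path(root).glob("nvidia/*/bin") if p.is_dir()))
    return found


def register_dll_dirs(dirs):
    """Register directories for DLL lookup; returns the handles (Windows only)."""
    if not hasattr(os, "add_dll_directory"):
        return []
    return [os.add_dll_directory(str(Path(d).resolve())) for d in dirs]
```

This would need to run before `ctranslate2` is imported, as the commit message notes, for example `register_dll_dirs(find_nvidia_bin_dirs(candidate_roots()))` at the top of the entry point.
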
…ments

Allows pip install -r requirements.txt to pull CUDA runtime DLLs
automatically, enabling GPU inference without a full CUDA Toolkit install.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… them

os.add_dll_directory() covers Python extension module loading but not
ctranslate2's internal ctypes.CDLL("cublas64_12") calls, which only
search PATH on Windows. Now both mechanisms are set so cublas64_12.dll
is found at inference time as well as at import time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
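
The PATH half of the change above can be sketched like this, assuming the same list of NVIDIA `bin` directories is available. `prepend_to_path` is a hypothetical helper, and the optional `env` argument exists only to make the sketch testable.

```python
import os


def prepend_to_path(dirs, env=None):
    """Put dirs at the front of PATH, skipping entries already present."""
    env = os.environ if env is None else env
    current = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    new = [str(d) for d in dirs if str(d) not in current]
    new = list(dict.fromkeys(new))  # drop duplicates within dirs, keep order
    env["PATH"] = os.pathsep.join(new + current)
    return env["PATH"]
```

Prepending rather than appending means the pip-installed DLLs are found before any other copy that may already be on PATH.
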
- Add HF_CUSTOM_MODELS dict with 2 Korean models:
  * SungBeom/whisper-small-ko (small-ko)
  * seastar105/whisper-medium-ko-zeroth (medium-ko-zeroth)
- Implement automatic CTranslate2 int8 conversion on first use
- Add model_manager functions: is_hf_custom_model, resolve_model_path, download_and_convert
- Update engine.load() to use resolve_model_path for local CTranslate2 models
- Enhance SettingsWindow with model download progress UI and status hints
- Skip re-download if model already converted (caching)
- Fix ct2-transformers-converter discovery for pip --user installs
- Add transformers>=4.23.0 dependency

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
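
A rough sketch of the model-manager pieces listed above. The function names `is_hf_custom_model`, `resolve_model_path`, and `download_and_convert` come from the commit; their signatures, the alias-to-repo direction of the dict, the `cache_root` parameter, and the use of `model.bin` as the "already converted" marker are assumptions made for illustration.

```python
import subprocess
from pathlib import Path

# Assumed layout: short alias -> HuggingFace repo id.
HF_CUSTOM_MODELS = {
    "small-ko": "SungBeom/whisper-small-ko",
    "medium-ko-zeroth": "seastar105/whisper-medium-ko-zeroth",
}


def is_hf_custom_model(name):
    return name in HF_CUSTOM_MODELS


def converted_dir(name, cache_root):
    return Path(cache_root) / name


def resolve_model_path(name, cache_root):
    """Custom models resolve to a local directory; others pass through."""
    if is_hf_custom_model(name):
        return str(converted_dir(name, cache_root))
    return name


def build_convert_command(name, cache_root):
    """Command line for CTranslate2's converter with int8 quantization."""
    return [
        "ct2-transformers-converter",
        "--model", HF_CUSTOM_MODELS[name],
        "--output_dir", str(converted_dir(name, cache_root)),
        "--quantization", "int8",
    ]


def download_and_convert(name, cache_root):
    """Convert on first use; skip when a converted model already exists."""
    target = converted_dir(name, cache_root)
    if not (target / "model.bin").is_file():
        subprocess.run(build_convert_command(name, cache_root), check=True)
    return str(target)
```

The converter itself needs `transformers` and `torch` installed, which matches the dependency changes elsewhere in this PR.
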
Replace fixed 30s chunks with adaptive silence-based cuts:
- Add _ChunkAccumulator: buffers audio, cuts when >=5s buffered and 1s trailing
  silence detected, or unconditionally at 30s hard cap (Whisper context window)
- Each chunk carries absolute start_time so timestamps remain accurate across
  variable-length chunks — eliminates chunk_index*30 drift from overlap
- Both-mode silence detection combined across loopback+mic (cut only when both quiet)
- Pipeline and MarkdownWriter updated to consume (chunk_index, start_time, audio) tuples
- First transcription text now appears in ~10s vs ~42s with the old fixed 30s chunks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
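
The cutting rule above (cut at >=5s buffered with 1s trailing silence, or at the 30s hard cap) can be illustrated with the simplified accumulator below. It is not the PR's `_ChunkAccumulator`: silence is judged by a plain RMS threshold per fed block instead of a real VAD, audio is a list of floats, and the class name, parameters, and threshold value are all hypothetical.

```python
import math


class ChunkAccumulator:
    """Buffers audio blocks and emits (chunk_index, start_time, audio) tuples."""

    def __init__(self, sample_rate=16000, min_s=5.0, silence_s=1.0,
                 max_s=30.0, rms_threshold=0.01):
        self.sr = sample_rate
        self.min_n = int(min_s * sample_rate)
        self.silence_n = int(silence_s * sample_rate)
        self.max_n = int(max_s * sample_rate)
        self.rms_threshold = rms_threshold
        self.buf = []
        self.quiet_n = 0    # trailing samples judged silent
        self.emitted_n = 0  # samples already handed out, for absolute times
        self.index = 0

    def feed(self, block):
        """Add one block of samples; return the list of chunks it completed."""
        rms = math.sqrt(sum(x * x for x in block) / len(block)) if block else 0.0
        self.quiet_n = self.quiet_n + len(block) if rms < self.rms_threshold else 0
        self.buf.extend(block)
        chunks = []
        while len(self.buf) >= self.max_n:  # unconditional hard cap
            chunks.append(self._cut(self.max_n))
        if len(self.buf) >= self.min_n and self.quiet_n >= self.silence_n:
            chunks.append(self._cut(len(self.buf)))
        return chunks

    def _cut(self, n):
        audio, self.buf = self.buf[:n], self.buf[n:]
        chunk = (self.index, self.emitted_n / self.sr, audio)
        self.index += 1
        self.emitted_n += n
        self.quiet_n = 0
        return chunk
```

Because `start_time` is derived from the running sample count, timestamps stay absolute no matter how long each chunk turns out to be.
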
Settings window additions:
- Hotkey: configurable global hotkey (default ctrl+alt+r) with live
  capture — click Capture then press any modifier+key combo
- Beep notifications: three independent checkboxes for recording
  start, stop, and MD file save completion (winsound.Beep)
- Clipboard: optional checkbox to copy full transcript body (no
  timestamps, no header/footer) to clipboard after MD save

App wiring:
- keyboard.add_hotkey registers/re-registers on startup, wizard
  complete, and settings save; unregistered on quit
- Hotkey callback dispatches to main thread via safe_after so
  tkinter state is never touched from the keyboard thread
- Beeps run in daemon threads to avoid blocking recording teardown
- Clipboard extraction reads the finalized MD body between header
  and --- footer marker; written to tk clipboard on main thread

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
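
The hotkey wiring above (re-register on change, unregister on quit, hand the callback to the main thread) can be sketched as below. `HotkeyBinder` is a hypothetical class; `backend` is expected to offer `add_hotkey` and `remove_hotkey` the way the `keyboard` package does, and `schedule` stands in for the app's `safe_after` helper, whose implementation is not shown in this PR description.

```python
class HotkeyBinder:
    """Keeps at most one global hotkey registered at a time."""

    def __init__(self, backend, schedule):
        self._backend = backend    # e.g. the imported `keyboard` module
        self._schedule = schedule  # stand-in for safe_after: runs fn on the UI thread
        self._handle = None

    def bind(self, combo, on_toggle):
        """Register combo, replacing any previous registration."""
        self.unbind()
        # The backend calls this lambda on its own thread; it only forwards
        # on_toggle to the scheduler and never touches UI state itself.
        self._handle = self._backend.add_hotkey(
            combo, lambda: self._schedule(on_toggle)
        )

    def unbind(self):
        """Remove the current registration, if any (call on quit)."""
        if self._handle is not None:
            self._backend.remove_hotkey(self._handle)
            self._handle = None
```

In the app this would be called as something like `binder.bind("ctrl+alt+r", toggle_recording)` on startup and again after settings are saved.
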
ct2-transformers-converter needs torch to load and convert HuggingFace
Whisper models. torch is a one-time conversion dependency only — GPU
inference continues to use ctranslate2 directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dd transformers dep

- settings_window: translate beep/clipboard checkbox labels to English
- app: translate CUDA error dialog title, body, and buttons to English
- settings_window: fix NameError in _download_bg — capture str(exc) into
  lambda default arg before it goes out of scope (Python 3.12+ behavior)
- requirements.txt: transformers was already listed; verified present

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
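
The `_download_bg` fix above follows a common pattern, shown here in isolation. In Python 3 the name bound by `except ... as exc` is unbound when the except block ends, so a callback that refers to `exc` and runs later fails with `NameError`. `make_callbacks` is a hypothetical function written only to demonstrate the difference; it is not the PR's code.

```python
def make_callbacks():
    """Return a late-bound (broken) and an early-bound (fixed) callback."""
    try:
        raise ValueError("download failed")
    except ValueError as exc:
        # Looks up `exc` when called, after the except block has cleared it.
        broken = lambda: str(exc)
        # Evaluates str(exc) now and stores it as the default argument.
        fixed = lambda msg=str(exc): msg
    return broken, fixed
```

Calling `fixed()` after the handler has finished returns the captured message, while `broken()` raises `NameError`, which is why the message is captured up front before the callback is scheduled on the UI thread.
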