feat: GPU support, hotkey, beep notifications, and clipboard copy #8
Open
hoiyada7-maker wants to merge 12 commits into
faster-whisper uses ctranslate2 as its inference backend, not PyTorch. The previous detection relied on `import torch` which was never listed as a dependency, causing GPU detection to silently fall back to CPU for all users regardless of their hardware. Switch to `ctranslate2.get_cuda_device_count()` so detection reflects the same CUDA stack that actually runs inference. torch is still used opportunistically for GPU name and VRAM info when available, with a name-based VRAM lookup table as a fallback. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
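The detection change described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual gpu_detect code: the function name `detect_cuda_device_count` is hypothetical, and the only library call used is `ctranslate2.get_cuda_device_count()`, which the commit message names.

```python
def detect_cuda_device_count():
    """Return the number of CUDA devices visible to ctranslate2, or 0.

    Hypothetical helper: it asks the inference backend itself instead of
    relying on torch, which may not be installed.
    """
    try:
        import ctranslate2
    except ImportError:
        # Without the inference backend there is nothing to run on a GPU.
        return 0
    try:
        return int(ctranslate2.get_cuda_device_count())
    except Exception:
        # Treat any driver/runtime failure during the query as "no GPU".
        return 0


if __name__ == "__main__":
    count = detect_cuda_device_count()
    print("cuda" if count > 0 else "cpu", count)
```

Querying the backend that actually runs inference means a machine with torch installed but no working ctranslate2 CUDA build is not misreported as GPU-capable.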
When torch is absent, fall back to nvidia-smi for the actual GPU name and VRAM (MiB → GB), so the UI shows the real device name instead of the generic "CUDA Device 0" label. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
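One way the nvidia-smi fallback could look is sketched below. The function names are hypothetical and the PR's real parsing may differ; the `--query-gpu` and `--format` options are standard nvidia-smi flags, and the MiB to GB conversion divides by 1024.

```python
import subprocess


def parse_nvidia_smi(output):
    """Parse 'name, memory_mib' CSV lines into (name, vram_gb) tuples."""
    gpus = []
    for line in output.strip().splitlines():
        # The memory column is last, so split on the final comma only.
        name, sep, mem = line.rpartition(",")
        if not sep:
            continue
        try:
            vram_gb = round(float(mem.strip()) / 1024, 1)  # MiB -> GB
        except ValueError:
            continue  # e.g. "[N/A]" on some devices
        gpus.append((name.strip(), vram_gb))
    return gpus


def query_nvidia_smi():
    """Return GPU info from nvidia-smi, or [] when it is unavailable."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=name,memory.total",
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, check=True, timeout=5
        )
    except (OSError, subprocess.SubprocessError):
        return []
    return parse_nvidia_smi(result.stdout)
```

Keeping the parser separate from the subprocess call makes it possible to test the parsing on machines without an NVIDIA driver.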
When the CUDA driver is present but CUDA Toolkit 12.x is not installed (cublas64_12.dll etc. missing), ctranslate2 and faster-whisper crash at runtime rather than at device detection time.
- gpu_detect: probe the CUDA runtime with a tiny StorageView allocation before reporting cuda_available=True; warn and fall back to CPU when runtime DLLs are absent.
- engine: catch RuntimeError on load() for "cannot be loaded" and automatically retry with device=cpu / compute_type=int8 so the app stays functional without a hard crash.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
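The engine-side retry could be sketched as below. This is an illustration under assumptions: `load_with_cpu_fallback` is a hypothetical name, and `model_factory` stands in for whatever constructs the model (for example `faster_whisper.WhisperModel`, which accepts `device` and `compute_type` keyword arguments).

```python
def load_with_cpu_fallback(model_factory, model_path,
                           device="cuda", compute_type="float16"):
    """Try the configured device; retry on CPU if CUDA libraries are missing.

    Returns (model, device_actually_used).
    """
    try:
        model = model_factory(model_path, device=device,
                              compute_type=compute_type)
        return model, device
    except RuntimeError as err:
        # Only the "library cannot be loaded" failure on CUDA is retried;
        # every other error is re-raised unchanged.
        if device != "cuda" or "cannot be loaded" not in str(err):
            raise
        model = model_factory(model_path, device="cpu", compute_type="int8")
        return model, "cpu"
```

Matching on the error text is brittle, so a sketch like this should be read as a stopgap rather than a robust detection method.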
When device=cuda is configured but CUDA runtime DLLs are missing, engine.load() now raises CudaUnavailableError instead of silently switching to CPU. app.py catches this on the background loader thread and posts a dialog to the main thread offering two actions:
- "CPU로 변경" (Switch to CPU): updates and saves config, restarts recording on CPU
- "CUDA Toolkit 설치" (Install CUDA Toolkit): opens the NVIDIA download page in the browser
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
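A rough sketch of this error path follows. `CudaUnavailableError` is named in the commit message; `load_or_raise`, `background_load`, and the use of `queue.Queue` as the hand-off to the main thread are assumptions made for illustration (the PR itself posts to the tkinter main thread).

```python
import queue
import threading


class CudaUnavailableError(RuntimeError):
    """Raised when device=cuda is requested but CUDA libraries cannot load."""


def load_or_raise(model_factory, model_path, device, compute_type):
    """Load the model, translating the CUDA loading failure into a typed error."""
    try:
        return model_factory(model_path, device=device,
                             compute_type=compute_type)
    except RuntimeError as err:
        if device == "cuda" and "cannot be loaded" in str(err):
            raise CudaUnavailableError(str(err)) from err
        raise


def background_load(load, events):
    """Run on the loader thread; report the outcome instead of touching the UI."""
    try:
        events.put(("loaded", load()))
    except CudaUnavailableError as err:
        # The main thread polls `events` and shows the dialog itself.
        events.put(("cuda_unavailable", str(err)))
```

Raising a dedicated exception lets the UI decide what to do, rather than having the engine change the user's configured device without asking.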
Users who install nvidia-cublas-cu12 / nvidia-cuda-runtime-cu12 via pip no longer need the full CUDA Toolkit. On startup, cuda_dlls.py scans all site-packages roots for nvidia/*/bin directories and registers each one with os.add_dll_directory() before ctranslate2 is imported. Works for any user regardless of Python install path (user site-packages, venv, or system), so cublas64_12.dll and friends are always discoverable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
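The scan described above might look roughly like this. The function names are hypothetical and this is not the PR's cuda_dlls.py; `os.add_dll_directory()` exists only on Windows, so the sketch skips registration elsewhere.

```python
import os
import site
from pathlib import Path


def site_roots():
    """Collect candidate site-packages roots (system/venv and per-user)."""
    roots = list(getattr(site, "getsitepackages", lambda: [])())
    roots.append(site.getusersitepackages())
    return roots


def find_nvidia_bin_dirs(roots):
    """Return every existing <root>/nvidia/*/bin directory."""
    found = []
    for root in roots:
        base = Path(root) / "nvidia"
        if base.is_dir():
            found.extend(sorted(p for p in base.glob("*/bin") if p.is_dir()))
    return found


def register_dll_directories(dirs):
    """Register directories for DLL lookup; a no-op outside Windows."""
    if not hasattr(os, "add_dll_directory"):
        return []
    # Keep the returned handles alive so the registrations stay in effect.
    return [os.add_dll_directory(str(Path(d).resolve())) for d in dirs]
```

This would need to run before `import ctranslate2`, for example at the top of the application's entry module.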
…ments Allows pip install -r requirements.txt to pull CUDA runtime DLLs automatically, enabling GPU inference without a full CUDA Toolkit install. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… them
os.add_dll_directory() covers Python extension module loading but not
ctranslate2's internal ctypes.CDLL("cublas64_12") calls, which only
search PATH on Windows. Now both mechanisms are set so cublas64_12.dll
is found at inference time as well as at import time.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
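The PATH half of this change could be sketched as follows. `prepend_to_path` is a hypothetical helper, and passing the environment mapping explicitly is a choice made here so the function can be exercised without modifying the real process environment.

```python
import os


def prepend_to_path(dirs, environ):
    """Put each directory at the front of PATH unless it is already listed.

    `environ` is any mutable mapping, typically os.environ.
    Returns the directories that were actually added.
    """
    existing = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    added = []
    for d in dirs:
        d = str(d)
        if d not in existing and d not in added:
            added.append(d)
    environ["PATH"] = os.pathsep.join(added + existing)
    return added
```

In the application this would be called as `prepend_to_path(dll_dirs, os.environ)` alongside the `os.add_dll_directory()` registration, so both lookup mechanisms see the same directories.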
- Add HF_CUSTOM_MODELS dict with 2 Korean models:
  * SungBeom/whisper-small-ko (small-ko)
  * seastar105/whisper-medium-ko-zeroth (medium-ko-zeroth)
- Implement automatic CTranslate2 int8 conversion on first use
- Add model_manager functions: is_hf_custom_model, resolve_model_path, download_and_convert
- Update engine.load() to use resolve_model_path for local CTranslate2 models
- Enhance SettingsWindow with model download progress UI and status hints
- Skip re-download if model already converted (caching)
- Fix ct2-transformers-converter discovery for pip --user installs
- Add transformers>=4.23.0 dependency
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
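A minimal sketch of how these pieces could fit together is shown below. The names `HF_CUSTOM_MODELS`, `is_hf_custom_model`, and `resolve_model_path` come from the commit message, but their signatures, the dict layout (short name to Hugging Face repo id), the `cache_dir` parameter, and the helpers `is_converted` and `build_convert_command` are assumptions. The converter flags shown are standard `ct2-transformers-converter` options.

```python
from pathlib import Path

# Assumed layout: short model name -> Hugging Face repository id.
HF_CUSTOM_MODELS = {
    "small-ko": "SungBeom/whisper-small-ko",
    "medium-ko-zeroth": "seastar105/whisper-medium-ko-zeroth",
}


def is_hf_custom_model(name):
    return name in HF_CUSTOM_MODELS


def resolve_model_path(name, cache_dir):
    """Custom models map to a local converted directory; others pass through."""
    if not is_hf_custom_model(name):
        return name
    return str(Path(cache_dir) / name)


def is_converted(name, cache_dir):
    """A converted CTranslate2 model directory contains model.bin."""
    return (Path(cache_dir) / name / "model.bin").is_file()


def build_convert_command(name, cache_dir):
    """Command line for a one-time int8 conversion (not executed here)."""
    return [
        "ct2-transformers-converter",
        "--model", HF_CUSTOM_MODELS[name],
        "--output_dir", resolve_model_path(name, cache_dir),
        "--quantization", "int8",
    ]
```

Checking `is_converted` before running the command is what would give the "skip re-download" caching behavior.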
Replace fixed 30s chunks with adaptive silence-based cuts:
- Add _ChunkAccumulator: buffers audio and cuts when >=5s is buffered and 1s of trailing silence is detected, or unconditionally at the 30s hard cap (Whisper context window)
- Each chunk carries an absolute start_time so timestamps remain accurate across variable-length chunks, eliminating the chunk_index*30 drift from overlap
- In both-mode, silence detection is combined across loopback+mic (cut only when both are quiet)
- Pipeline and MarkdownWriter updated to consume (chunk_index, start_time, audio) tuples
- First transcription text now appears in ~10s vs ~42s with the old fixed 30s chunks
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
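The cutting rule can be illustrated with a simplified accumulator. This is a sketch, not the PR's _ChunkAccumulator: it uses plain Python lists instead of audio arrays, a single input stream, and an assumed RMS threshold for silence; the 5s, 1s, and 30s values come from the commit message.

```python
import math


class ChunkAccumulator:
    """Buffers audio blocks and emits (chunk_index, start_time, samples)."""

    def __init__(self, sample_rate=16000, min_s=5.0, silence_s=1.0,
                 max_s=30.0, threshold=0.01):
        self.sample_rate = sample_rate
        self.min_n = int(min_s * sample_rate)
        self.silence_n = int(silence_s * sample_rate)
        self.max_n = int(max_s * sample_rate)
        self.threshold = threshold      # assumed RMS level counted as silence
        self._buf = []
        self._trailing_silence = 0      # consecutive quiet samples at the end
        self._start_sample = 0          # absolute position of the buffer start
        self._index = 0

    def push(self, block):
        """Add one block of samples; return a chunk tuple or None."""
        self._buf.extend(block)
        rms = math.sqrt(sum(x * x for x in block) / len(block)) if block else 0.0
        if rms < self.threshold:
            self._trailing_silence += len(block)
        else:
            self._trailing_silence = 0
        n = len(self._buf)
        if n >= self.max_n:
            return self._cut(self.max_n)    # hard cap, regardless of silence
        if n >= self.min_n and self._trailing_silence >= self.silence_n:
            return self._cut(n)             # long enough and currently quiet
        return None

    def _cut(self, count):
        chunk = (self._index, self._start_sample / self.sample_rate,
                 self._buf[:count])
        self._buf = self._buf[count:]       # keep any samples past the cap
        self._start_sample += count
        self._index += 1
        self._trailing_silence = 0
        return chunk
```

Because `start_time` is derived from an absolute sample counter, it stays correct however long each chunk turns out to be, which is the property the commit relies on to avoid timestamp drift.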
Settings window additions:
- Hotkey: configurable global hotkey (default ctrl+alt+r) with live capture; click Capture, then press any modifier+key combo
- Beep notifications: three independent checkboxes for recording start, stop, and MD file save completion (winsound.Beep)
- Clipboard: optional checkbox to copy the full transcript body (no timestamps, no header/footer) to the clipboard after MD save
App wiring:
- keyboard.add_hotkey registers/re-registers on startup, wizard complete, and settings save; unregistered on quit
- Hotkey callback dispatches to the main thread via safe_after so tkinter state is never touched from the keyboard thread
- Beeps run in daemon threads to avoid blocking recording teardown
- Clipboard extraction reads the finalized MD body between the header and the --- footer marker; written to the tk clipboard on the main thread
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
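The clipboard extraction step could be sketched as below. The Markdown layout is an assumption made for illustration (heading lines starting with `#`, body lines prefixed with a bracketed timestamp such as `[00:00:05]`, and a `---` line opening the footer); the PR's actual file format is not shown here, and `extract_transcript_body` is a hypothetical name.

```python
import re

# Assumed timestamp prefix, e.g. "[00:05] " or "[00:00:05] ".
TIMESTAMP = re.compile(r"^\[\d{1,2}:\d{2}(?::\d{2})?\]\s*")


def extract_transcript_body(md_text):
    """Return body text between the header and the '---' footer marker."""
    body = []
    in_header = True
    for line in md_text.splitlines():
        stripped = line.strip()
        if in_header and (not stripped or stripped.startswith("#")):
            continue                      # skip heading and blank lead-in lines
        in_header = False
        if stripped == "---":
            break                         # footer starts here
        text = TIMESTAMP.sub("", stripped)
        if text:
            body.append(text)
    return "\n".join(body)
```

The returned string is what would then be handed to the tk clipboard on the main thread.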
ct2-transformers-converter needs torch to load and convert HuggingFace Whisper models. torch is a one-time conversion dependency only — GPU inference continues to use ctranslate2 directly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dd transformers dep
- settings_window: translate beep/clipboard checkbox labels to English
- app: translate CUDA error dialog title, body, and buttons to English
- settings_window: fix NameError in _download_bg by capturing str(exc) in a lambda default argument before exc goes out of scope (Python unbinds the `except ... as exc` name when the except block exits; this applies to Python 3 in general, not only 3.12+)
- requirements.txt: transformers was already listed; verified present
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
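The NameError fix can be reproduced in isolation. The sketch below is not the PR's _download_bg code; `make_callbacks` is a hypothetical function that builds two callbacks inside an except block, one that looks up `exc` when it is finally called and one that captures the message as a default argument at definition time.

```python
def make_callbacks():
    """Build callbacks inside an except block to show the scoping difference."""
    callbacks = {}
    try:
        raise ValueError("download failed")
    except ValueError as exc:
        # Looks up `exc` only when called, after the name has been unbound.
        callbacks["late"] = lambda: str(exc)
        # Default arguments are evaluated now, so the message is preserved.
        callbacks["early"] = lambda msg=str(exc): msg
    return callbacks


if __name__ == "__main__":
    cbs = make_callbacks()
    print(cbs["early"]())
    try:
        cbs["late"]()
    except NameError as err:
        print("NameError:", err)
```

This matters for UI code because callbacks scheduled onto the main thread typically run after the except block has already finished.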
Generated with Claude Code