
Add Korean Whisper models with automatic CTranslate2 conversion #7

Open
hoiyada7-maker wants to merge 2 commits into parkscloud:master from hoiyada7-maker:feature/korean

Conversation

@hoiyada7-maker

Summary

  • small-ko (SungBeom/whisper-small-ko, 244M): small Whisper model specialized for Korean
  • medium-ko-zeroth (seastar105/whisper-medium-ko-zeroth, 769M): medium Whisper model fine-tuned on the Zeroth corpus

When one of these models is selected, it is downloaded from HuggingFace and automatically converted to CTranslate2 int8 format; later loads reuse the cached conversion.
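The download, convert, and cache flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function names match those listed under Changes, but their signatures, the `cache_root` parameter, the flat `HF_CUSTOM_MODELS` layout, the `is_converted` and `build_convert_command` helpers, and the use of `model.bin` as the "already converted" marker are assumptions. The converter flags (`--model`, `--output_dir`, `--quantization`, `--force`) are those of the `ct2-transformers-converter` CLI.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

# Repo IDs come from the PR summary; this flat dict layout is an assumption.
HF_CUSTOM_MODELS = {
    "small-ko": "SungBeom/whisper-small-ko",
    "medium-ko-zeroth": "seastar105/whisper-medium-ko-zeroth",
}


def is_hf_custom_model(name: str) -> bool:
    return name in HF_CUSTOM_MODELS


def get_hf_model_local_path(name: str, cache_root: Path) -> Path:
    # Hypothetical layout: one directory per model under the cache root.
    return cache_root / name


def is_converted(name: str, cache_root: Path) -> bool:
    # Assumption: a finished conversion is recognized by its model.bin file.
    return (get_hf_model_local_path(name, cache_root) / "model.bin").is_file()


def build_convert_command(name: str, cache_root: Path) -> List[str]:
    return [
        "ct2-transformers-converter",
        "--model", HF_CUSTOM_MODELS[name],
        "--output_dir", str(get_hf_model_local_path(name, cache_root)),
        "--quantization", "int8",
        "--force",  # overwrite a partial output directory from a failed run
    ]


def resolve_model_path(
    name: str,
    cache_root: Path,
    run: Callable[..., object] = subprocess.run,
) -> str:
    """Return a local CTranslate2 directory for custom models, else the name."""
    if not is_hf_custom_model(name):
        return name  # built-in model names pass through unchanged
    if not is_converted(name, cache_root):
        run(build_convert_command(name, cache_root), check=True)
    return str(get_hf_model_local_path(name, cache_root))
```

Passing `run` as a parameter keeps the cache check testable without network access; in real use the default `subprocess.run` invokes the converter, which downloads the model from HuggingFace on first use.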

Changes

  • constants.py: add HF_CUSTOM_MODELS (repo_id, parameter count, VRAM info)
  • model_manager.py: implement the HF model download + CTranslate2 conversion pipeline
    • is_hf_custom_model(), resolve_model_path(), get_hf_model_local_path()
    • automatic discovery of ct2-transformers-converter (including pip --user paths)
  • engine.py: load() calls resolve_model_path() → supports local CTranslate2 paths
  • settings_window.py: add a download UI for model changes
    • show a hint when a custom model is selected ("download required" / "converted")
    • on Save, if the model is not yet converted, show a progress bar and convert in the background, then save and close automatically on completion
  • requirements.txt: add transformers>=4.23.0
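As an illustration of the converter discovery mentioned above, one plausible approach is to check `PATH` first and then fall back to the per-user scripts directory that `pip install --user` writes to. The helper names `user_scripts_dir` and `find_converter` and the `extra_dirs` parameter are hypothetical; the PR's actual lookup logic is not shown on this page.

```python
import os
import shutil
import sysconfig
from pathlib import Path
from typing import Iterable, Optional

CONVERTER = "ct2-transformers-converter"


def user_scripts_dir() -> Path:
    # pip --user installs console scripts into the per-user "scripts" path,
    # e.g. ~/.local/bin on Linux or %APPDATA%\Python\PythonXY\Scripts on Windows.
    return Path(sysconfig.get_path("scripts", f"{os.name}_user"))


def find_converter(extra_dirs: Optional[Iterable[Path]] = None) -> Optional[str]:
    """Return the converter's full path, or None if it cannot be found."""
    found = shutil.which(CONVERTER)
    if found:
        return found
    # Fall back to directories that are often missing from PATH.
    dirs = list(extra_dirs) if extra_dirs is not None else [user_scripts_dir()]
    search_path = os.pathsep.join(str(d) for d in dirs)
    return shutil.which(CONVERTER, path=search_path)
```

Using `shutil.which` for the fallback as well means executable-bit checks and Windows extensions such as `.exe` are handled the same way as for the normal `PATH` lookup.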

Test Results

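✅ small-ko and medium-ko-zeroth appear in the model list
✅ small-ko download → CTranslate2 conversion completes (model.bin created)
✅ Already-converted models are skipped without re-downloading
✅ engine.load() loads the local CTranslate2 model successfully
✅ Inference confirmed to work correctly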

🤖 Generated with Claude Code

hoiyada7-maker and others added 2 commits May 30, 2026 11:07
- Add HF_CUSTOM_MODELS dict with 2 Korean models:
  * SungBeom/whisper-small-ko (small-ko)
  * seastar105/whisper-medium-ko-zeroth (medium-ko-zeroth)
- Implement automatic CTranslate2 int8 conversion on first use
- Add model_manager functions: is_hf_custom_model, resolve_model_path, download_and_convert
- Update engine.load() to use resolve_model_path for local CTranslate2 models
- Enhance SettingsWindow with model download progress UI and status hints
- Skip re-download if model already converted (caching)
- Fix ct2-transformers-converter discovery for pip --user installs
- Add transformers>=4.23.0 dependency

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Replace fixed 30s chunks with adaptive silence-based cuts:
- Add _ChunkAccumulator: buffers audio, cuts when >=5s buffered and 1s trailing
  silence detected, or unconditionally at 30s hard cap (Whisper context window)
- Each chunk carries absolute start_time so timestamps remain accurate across
  variable-length chunks — eliminates chunk_index*30 drift from overlap
- Both-mode silence detection combined across loopback+mic (cut only when both quiet)
- Pipeline and MarkdownWriter updated to consume (chunk_index, start_time, audio) tuples
- First transcription text now appears in ~10s vs ~42s with the old fixed 30s chunks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
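
The cutting rule in the second commit (cut once at least 5s is buffered and the last 1s is silent, or unconditionally at 30s) could look roughly like the sketch below. It is not the PR's `_ChunkAccumulator`: the class and method names, the RMS-threshold silence test, the `silence_rms` value, and the plain-list audio representation are all assumptions made for illustration, and the combined loopback+mic detection is left out.

```python
import math
from typing import List, Sequence, Tuple

Chunk = Tuple[int, float, List[float]]  # (chunk_index, start_time, audio)


class ChunkAccumulator:
    def __init__(self, sample_rate: int = 16000, min_seconds: float = 5.0,
                 silence_seconds: float = 1.0, max_seconds: float = 30.0,
                 silence_rms: float = 0.01) -> None:
        self.sample_rate = sample_rate
        self.min_samples = int(min_seconds * sample_rate)
        self.silence_samples = int(silence_seconds * sample_rate)
        self.max_samples = int(max_seconds * sample_rate)
        self.silence_rms = silence_rms  # assumed threshold, not from the PR
        self._buffer: List[float] = []
        self._consumed = 0  # samples already emitted; gives absolute start_time
        self._index = 0

    def _trailing_silence(self) -> bool:
        if self.silence_samples <= 0 or len(self._buffer) < self.silence_samples:
            return False
        tail = self._buffer[-self.silence_samples:]
        rms = math.sqrt(sum(s * s for s in tail) / len(tail))
        return rms < self.silence_rms

    def _cut(self, n: int) -> Chunk:
        audio = self._buffer[:n]
        del self._buffer[:n]
        chunk = (self._index, self._consumed / self.sample_rate, audio)
        self._index += 1
        self._consumed += n
        return chunk

    def push(self, samples: Sequence[float]) -> List[Chunk]:
        """Buffer new samples and return any chunks that are ready."""
        self._buffer.extend(samples)
        chunks: List[Chunk] = []
        # Hard cap: never let a chunk exceed max_seconds.
        while len(self._buffer) >= self.max_samples:
            chunks.append(self._cut(self.max_samples))
        # Adaptive cut: enough audio buffered and the tail is quiet.
        if len(self._buffer) >= self.min_samples and self._trailing_silence():
            chunks.append(self._cut(len(self._buffer)))
        return chunks
```

Because `start_time` is derived from the running count of emitted samples rather than from `chunk_index * 30`, timestamps stay correct when chunk lengths vary.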
