Intelligent bulk audio transcription with smart content filtering and automated summary generation
A specialized audio transcription automation tool designed for interview content, featuring intelligent content filtering, key moment detection, and automated cliff notes generation. Perfect for journalists, researchers, and content creators working with large volumes of interview recordings.
- Bulk Processing: Efficiently transcribe large volumes of audio files
- Smart Filtering: Automatically identify and extract key moments while filtering background noise
- Interview Detection: Recognize and highlight important discussions and quotes
- Audio Enhancement: Noise reduction and audio quality optimization
- Organized Output: Structured transcripts, key moments, and cliff notes
- Quality Control: Confidence scoring and relevance filtering
- Windows Friendly: Batch files for easy execution on Windows
- Journalists processing interview recordings
- Researchers transcribing focus groups and discussions
- Content creators extracting quotes from video audio
- Podcasters generating show notes and highlights
- Students transcribing lectures and presentations
- Clone the repository
    git clone https://github.com/cotrk/batch-transcribe-tool.git
    cd batch-transcribe-tool
- Run setup (one-time installation)
    setup.bat
- Transcribe audio files
    transcribe_rodepro.bat   # For camera audio files
    transcribe_dr10l.bat     # For recorder audio files
- Install dependencies
    pip install -r requirements.txt
- Run transcription
    python interview_transcriber.py
batch-transcribe-tool/
├── interview_transcriber.py    # Main transcription engine
├── requirements.txt            # Python dependencies
├── setup.bat                   # Windows setup script
├── transcribe_rodepro.bat      # RodePro transcription launcher
├── transcribe_dr10l.bat        # DR-10L transcription launcher
├── README.md                   # This file
└── output/                     # Generated transcriptions
    ├── transcripts/            # Full transcriptions with timestamps
    ├── key_moments/            # Extracted quotes and insights
    ├── cliff_notes/            # Concise summaries
    └── processing_summary.md   # Overall statistics
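The output layout above could be prepared ahead of a run with a few lines of Python. This is only a sketch: the helper name `ensure_output_layout` is hypothetical, and the tool itself may already create these folders on its own.

```python
from pathlib import Path

# Subfolder names taken from the layout shown above.
OUTPUT_SUBFOLDERS = ("transcripts", "key_moments", "cliff_notes")


def ensure_output_layout(output_folder):
    """Create the output folder and its subfolders if they are missing.

    Hypothetical helper, not part of interview_transcriber.py.
    """
    root = Path(output_folder)
    created = {}
    for name in OUTPUT_SUBFOLDERS:
        sub = root / name
        # parents=True also creates the output root; exist_ok avoids errors on reruns.
        sub.mkdir(parents=True, exist_ok=True)
        created[name] = sub
    return created
```

Calling `ensure_output_layout("output")` returns a mapping from each subfolder name to its path, which can then be used when writing results.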
- WAV files (primary support)
- MP3, M4A, FLAC (via librosa conversion)
- Video audio tracks (extracted automatically)
- Multiple sample rates and bit depths
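To gather files in several of these formats from one folder, a suffix check is a simple option. The sketch below is independent of the tool: `find_audio_files` is a hypothetical helper, and the extension set is an assumption based on the list above (video containers are left out). Note that Python's `glob` and `pathlib` do not expand brace patterns such as `*.{wav,mp3}`, which is why the example filters by suffix instead.

```python
from pathlib import Path

# Assumed extension set, based on the supported-formats list above.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}


def find_audio_files(folder, extensions=AUDIO_EXTENSIONS):
    """Return supported audio files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        # Compare suffixes case-insensitively so "B.MP3" is also picked up.
        if p.is_file() and p.suffix.lower() in extensions
    )
```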
# Interview_001.wav Transcript
**Duration:** 15:32
**Word Count:** 2,847
**Key Moments:** 12
## Full Transcript
[00:15] Good morning! Thanks for joining us today...
[00:22] Thank you for having me. I'm excited to share...
# Interview_001.wav - Key Moments & Quotes
**[03:45]** The most important thing I learned was that persistence matters more than talent.
*Relevance: 0.92*
**[07:23]** When we first started this project, we had no idea it would become so successful.
*Relevance: 0.78*
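The relevance values in the sample above are what the filtering step works with. As a rough illustration, the sketch below keeps only moments scoring above a threshold. The dictionary layout and the name `filter_key_moments` are assumptions for this example; the 0.6 default is the one this README gives for the relevance threshold.

```python
DEFAULT_THRESHOLD = 0.6  # default threshold stated in this README


def filter_key_moments(moments, threshold=DEFAULT_THRESHOLD):
    """Keep moments whose relevance is strictly above `threshold`.

    `moments` is assumed to be a list of dicts with a "relevance" key.
    """
    return [m for m in moments if m["relevance"] > threshold]


# The two sample moments shown above.
moments = [
    {"time": "03:45", "relevance": 0.92},
    {"time": "07:23", "relevance": 0.78},
]
```

With the default threshold both sample moments are kept. Raising it to 0.8 keeps only the first.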
# Interview_001.wav - Cliff Notes
1. **[03:45]** The most important thing I learned was that persistence matters more than talent.
2. **[05:12]** Our breakthrough came when we stopped trying to be perfect and started being authentic.
3. **[08:34]** The data shows that engagement increases by 40% when content is personalized.
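The numbered, timestamped lines in the cliff notes sample can be produced from raw offsets in seconds. This is a minimal sketch under stated assumptions: each moment is a dict with `start` (seconds) and `text` keys, and both function names are hypothetical rather than taken from the tool.

```python
def format_timestamp(seconds):
    """Format a time offset in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def cliff_notes_lines(moments):
    """Build numbered '1. **[MM:SS]** text' lines, ordered by start time."""
    ordered = sorted(moments, key=lambda m: m["start"])
    return [
        f"{i}. **[{format_timestamp(m['start'])}]** {m['text']}"
        for i, m in enumerate(ordered, start=1)
    ]
```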
Edit interview_transcriber.py to configure your audio paths:
# For RodePro camera audio
input_folder = r"your\rodepro\audio\path"
output_folder = r"your\output\path"
# For DR-10L recorder audio
input_folder = r"your\dr10l\audio\path"
output_folder = r"your\output\path"

Choose a Whisper model based on your needs:
| Model | Speed | Accuracy | Use Case |
|---|---|---|---|
| tiny | ⚡⚡⚡ | ⭐ | Quick drafts, testing |
| base | ⚡⚡ | ⭐⭐⭐ | Recommended balance |
| medium | ⚡ | ⭐⭐⭐⭐ | High-quality results |
| large | Slow | ⭐⭐⭐⭐⭐ | Best accuracy, slow |
Edit the .bat files to change models:
model_size = "medium" # Change from "base"

Adjust relevance thresholds to filter content:
# Lower threshold for more content (0.4 = more inclusive)
if relevance_score > 0.4: # Default is 0.6
# Higher threshold for less content (0.8 = very selective)
if relevance_score > 0.8:

from pathlib import Path

from interview_transcriber import InterviewTranscriber
# Initialize with custom settings
transcriber = InterviewTranscriber(
input_folder="path/to/audio",
output_folder="path/to/output",
model_size="medium"
)
# Process specific file
result = transcriber.transcribe_file(Path("interview.wav"))
if result:
transcriber.save_results(result, Path("interview.wav"))
# Process with custom pattern
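results = transcriber.process_batch("interview_*.wav")

# Process only specific days
results = transcriber.process_batch("DAY1_*.wav")
# Process multiple formats
# (Python's glob does not expand braces, so this only works if process_batch
# handles them itself; otherwise call it once per extension)
results = transcriber.process_batch("*.{wav,mp3,m4a}")

- Python 3.8+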
- Windows 10/11 (for .bat files)
- 8GB+ RAM recommended
- 10GB+ free disk space for outputs
- Internet connection for initial model download
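Two of the requirements above, the Python version and free disk space, can be checked before a long run. The sketch below uses only the standard library. `check_environment` is a hypothetical helper, and it treats the 10 GB figure as a hard check even though the list above gives it as a guideline.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)
MIN_FREE_GB = 10  # disk space figure from the requirements above


def check_environment(output_path="."):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append("Python 3.8 or newer is required")
    # disk_usage reports bytes for the filesystem containing output_path.
    free_gb = shutil.disk_usage(output_path).free / 1024**3
    if free_gb < MIN_FREE_GB:
        problems.append(f"Only {free_gb:.1f} GB free; {MIN_FREE_GB} GB recommended")
    return problems
```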
- Install Python from python.org
- Clone repository
    git clone https://github.com/cotrk/batch-transcribe-tool.git
    cd batch-transcribe-tool
- Install dependencies
    pip install -r requirements.txt
- Download Whisper models (automatic on first run)
- openai-whisper: Speech recognition
- librosa: Audio processing and enhancement
- soundfile: Audio file handling
- torch: Deep learning framework
- numpy/scipy: Numerical computing
- ffmpeg-python: Audio format conversion
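To confirm that the packages above are installed, one option is to look up their import names without actually importing them. In the sketch below, `missing_modules` is a hypothetical helper. The import names are the usual ones for these packages: `openai-whisper` is imported as `whisper` and `ffmpeg-python` as `ffmpeg`.

```python
import importlib.util

# Import names for the packages listed above (pip name and import name differ
# for openai-whisper and ffmpeg-python).
REQUIRED_MODULES = ["whisper", "librosa", "soundfile", "torch", "numpy", "scipy", "ffmpeg"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty result from `missing_modules()` suggests `pip install -r requirements.txt` completed, though it does not verify package versions.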
Error: Python is not installed or not in PATH
Solution: Install Python 3.8+ and ensure "Add to PATH" is checked during installation.
CUDA out of memory
Solution: Use a smaller model or process files individually:
model_size = "tiny" # Use smaller model

Audio enhancement failed for file.wav
Solution: Check file format and integrity. The system will fall back to original audio.
Permission denied: output folder
Solution: Ensure write access to output directory or run as administrator.
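For the CUDA out-of-memory case above, the switch to a smaller model can be automated. The following is a sketch, not the tool's behavior: `transcribe_with_fallback` and its `run` callback are hypothetical, and it assumes the failure surfaces as a `RuntimeError` whose message contains "out of memory", which is how PyTorch reports it.

```python
def transcribe_with_fallback(run, model_sizes=("medium", "base", "tiny")):
    """Call run(model_size), moving to the next smaller model on out-of-memory errors.

    Returns (model_size_used, result). `run` is a caller-supplied function.
    """
    last_error = None
    for size in model_sizes:
        try:
            return size, run(size)
        except RuntimeError as exc:
            if "out of memory" not in str(exc).lower():
                raise  # unrelated failure: do not hide it
            last_error = exc
    raise RuntimeError("all model sizes ran out of memory") from last_error
```

Here `run` would wrap whatever loads the model and transcribes one file, so the retry logic stays separate from the transcription code.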
- Use SSD storage for faster I/O
- Close other applications during processing
- Process files in smaller groups for memory constraints
- Consider GPU acceleration for large batches
- Check transcription.log in output folders for detailed error information
- Review processing_summary.md for overall results
- Each transcription includes confidence scores for quality assessment
- Open an issue for support
The system automatically tracks:
- Files processed successfully
- Processing failures and reasons
- Total word count and duration
- Key moments identified
- Average confidence scores
- Processing time per file
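A small record type is one way to accumulate statistics like those listed above. This sketch is illustrative only: the class and field names are assumptions, not the tool's internal structure, and per-file processing time is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingStats:
    """Running totals for a batch; names are illustrative."""
    succeeded: int = 0
    failures: dict = field(default_factory=dict)  # file name -> failure reason
    total_words: int = 0
    total_seconds: float = 0.0
    key_moments: int = 0
    confidences: list = field(default_factory=list)

    def record_success(self, words, seconds, moments, confidence):
        """Add one successfully processed file to the totals."""
        self.succeeded += 1
        self.total_words += words
        self.total_seconds += seconds
        self.key_moments += moments
        self.confidences.append(confidence)

    def record_failure(self, file_name, reason):
        """Remember why a file could not be processed."""
        self.failures[file_name] = reason

    @property
    def average_confidence(self):
        """Mean confidence over successful files, or 0.0 if there are none."""
        if not self.confidences:
            return 0.0
        return sum(self.confidences) / len(self.confidences)
```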
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
- Additional audio format support
- GPU acceleration improvements
- Web interface development
- Additional language models
- Performance optimizations
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for the Whisper speech recognition model
- librosa team for audio processing tools
- PyTorch team for the deep learning framework
- Email: [your-email@example.com]
- Issues: GitHub Issues
- Discussions: GitHub Discussions
If you find this tool useful, please give it a star on GitHub!
Made with ❤️ for journalists, researchers, and content creators everywhere.