Batch Transcribe Tool

🎯 Intelligent bulk audio transcription with smart content filtering and automated summary generation

A specialized audio transcription automation tool designed for interview content, featuring intelligent content filtering, key moment detection, and automated cliff notes generation. Perfect for journalists, researchers, and content creators working with large volumes of interview recordings.

✨ Features

🚀 Bulk Processing: Efficiently transcribe large volumes of audio files
🧠 Smart Filtering: Automatically identify and extract key moments while filtering background noise
🎙️ Interview Detection: Recognize and highlight important discussions and quotes
🔧 Audio Enhancement: Noise reduction and audio quality optimization
📊 Organized Output: Structured transcripts, key moments, and cliff notes
✅ Quality Control: Confidence scoring and relevance filtering
🪟 Windows Friendly: Batch files for easy execution on Windows

🎯 Perfect For

Journalists processing interview recordings
Researchers transcribing focus groups and discussions
Content creators extracting quotes from video audio
Podcasters generating show notes and highlights
Students transcribing lectures and presentations

🚀 Quick Start

Windows (Recommended)

Clone the repository

git clone https://github.com/cotrk/batch-transcribe-tool.git
cd batch-transcribe-tool

Run setup (one-time installation)
```
setup.bat
```

Transcribe audio files

transcribe_rodepro.bat    # For camera audio files
transcribe_dr10l.bat      # For recorder audio files

Cross-Platform (Python)

Install dependencies
```
pip install -r requirements.txt
```
Run transcription
```
python interview_transcriber.py
```

📁 File Structure

batch-transcribe-tool/
├── interview_transcriber.py    # Main transcription engine
├── requirements.txt            # Python dependencies
├── setup.bat                   # Windows setup script
├── transcribe_rodepro.bat      # RodePro transcription launcher
├── transcribe_dr10l.bat        # DR-10L transcription launcher
├── README.md                   # This file
└── output/                     # Generated transcriptions
    ├── transcripts/            # Full transcriptions with timestamps
    ├── key_moments/            # Extracted quotes and insights
    ├── cliff_notes/            # Concise summaries
    └── processing_summary.md   # Overall statistics

🎧 Audio Sources Supported

WAV files (primary support)
MP3, M4A, FLAC (via librosa conversion)
Video audio tracks (extracted automatically)
Multiple sample rates and bit depths
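
As a rough illustration of how a folder could be scanned for the formats listed above, here is a minimal sketch. The `find_audio_files` helper and the `AUDIO_EXTENSIONS` set are hypothetical names, not part of `interview_transcriber.py`, and video containers are left out because the extensions the tool accepts for them are not stated here.

```python
from pathlib import Path

# Extensions taken from the list above; extend as needed (assumption).
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}


def find_audio_files(folder):
    """Return supported audio files in `folder` (non-recursive), sorted by path."""
    root = Path(folder)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
```

The extension check is case-insensitive, so files such as `TAKE01.WAV` written by some recorders are picked up too.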

📊 Output Types

1. Full Transcripts

# Interview_001.wav Transcript

**Duration:** 15:32
**Word Count:** 2,847
**Key Moments:** 12

## Full Transcript

[00:15] Good morning! Thanks for joining us today...
[00:22] Thank you for having me. I'm excited to share...
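
The `[MM:SS]` prefixes in the sample above can be produced from segment start times. This is a sketch only: it assumes segments are dicts with `start` (seconds) and `text` keys, which is the shape Whisper returns in `result["segments"]`, and the function names are hypothetical. How the tool itself formats recordings longer than an hour is not specified here.

```python
def format_timestamp(seconds):
    """Format a time offset in seconds as MM:SS (minutes may exceed 59)."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def render_transcript(segments):
    """Render segments as '[MM:SS] text' lines, one per segment."""
    return "\n".join(
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    )
```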

2. Key Moments & Quotes

# Interview_001.wav - Key Moments & Quotes

⭐ **[03:45]** The most important thing I learned was that persistence matters more than talent.
   *Relevance: 0.92*

📌 **[07:23]** When we first started this project, we had no idea it would become so successful.
   *Relevance: 0.78*

3. Cliff Notes

# Interview_001.wav - Cliff Notes

1. **[03:45]** The most important thing I learned was that persistence matters more than talent.
2. **[05:12]** Our breakthrough came when we stopped trying to be perfect and started being authentic.
3. **[08:34]** The data shows that engagement increases by 40% when content is personalized.
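
One plausible way to derive cliff notes from scored moments is to keep the highest-relevance items and then list them in time order. The sketch below assumes that approach; the tool's actual selection logic may differ, and the `moments` structure (`start`, `text`, `relevance` keys) and the `build_cliff_notes` name are hypothetical.

```python
def build_cliff_notes(moments, max_items=3):
    """Pick the `max_items` highest-relevance moments, then list them in time order."""
    top = sorted(moments, key=lambda m: m["relevance"], reverse=True)[:max_items]
    top.sort(key=lambda m: m["start"])
    lines = []
    for i, m in enumerate(top, start=1):
        minutes, seconds = divmod(int(m["start"]), 60)
        lines.append(f"{i}. **[{minutes:02d}:{seconds:02d}]** {m['text']}")
    return "\n".join(lines)
```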

⚙️ Configuration

Customizing Audio Sources

Edit interview_transcriber.py to configure your audio paths:

# For RodePro camera audio
input_folder = r"your\rodepro\audio\path"
output_folder = r"your\output\path"

# For DR-10L recorder audio  
input_folder = r"your\dr10l\audio\path"
output_folder = r"your\output\path"

Model Selection

Choose a Whisper model based on your needs:

Model	Speed	Accuracy	Use Case
tiny	⚡⚡⚡	⭐	Quick drafts, testing
base	⚡⚡	⭐⭐⭐	Recommended balance
medium	⚡	⭐⭐⭐⭐	High-quality results
large	🐌	⭐⭐⭐⭐⭐	Best accuracy, slow

Edit the .bat files to change models:

model_size = "medium"  # Change from "base"
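
For reference, a model name can be validated before it is handed to Whisper. In this sketch `whisper.load_model` is the loader from the `openai-whisper` package; `KNOWN_MODELS`, `normalize_model_size`, and `load_whisper_model` are hypothetical helpers, not functions from this repository.

```python
# Model names listed in the table above. Whisper publishes other sizes as
# well (for example "small"); add them here if you use them.
KNOWN_MODELS = ("tiny", "base", "medium", "large")


def normalize_model_size(name):
    """Return a lower-cased, validated model name or raise ValueError."""
    cleaned = name.strip().lower()
    if cleaned not in KNOWN_MODELS:
        raise ValueError(f"Unknown model {name!r}; expected one of {KNOWN_MODELS}")
    return cleaned


def load_whisper_model(name="base"):
    """Load a Whisper model; the weights are downloaded on first use."""
    import whisper  # provided by the openai-whisper package

    return whisper.load_model(normalize_model_size(name))
```

Importing `whisper` inside the function keeps the validation helper usable on machines where the package is not installed yet.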

Content Filtering

Adjust relevance thresholds to filter content:

# Lower threshold for more content (0.4 = more inclusive)
if relevance_score > 0.4:  # Default is 0.6

# Higher threshold for less content (0.8 = very selective)
if relevance_score > 0.8:
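
To see the effect of a threshold before editing the script, the comparison above can be tried on a few scores. This is a sketch under the assumption that each moment carries a `relevance` value between 0 and 1; `filter_by_relevance` is a hypothetical helper.

```python
DEFAULT_THRESHOLD = 0.6  # default stated above


def filter_by_relevance(moments, threshold=DEFAULT_THRESHOLD):
    """Keep moments whose score is strictly greater than `threshold`."""
    return [m for m in moments if m["relevance"] > threshold]
```

With the two sample quotes shown earlier (0.92 and 0.78), the default of 0.6 keeps both, while 0.8 keeps only the first.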

🎛️ Advanced Usage

Custom Processing Script

from pathlib import Path

from interview_transcriber import InterviewTranscriber

# Initialize with custom settings
transcriber = InterviewTranscriber(
    input_folder="path/to/audio",
    output_folder="path/to/output",
    model_size="medium"
)

# Process specific file
result = transcriber.transcribe_file(Path("interview.wav"))
if result:
    transcriber.save_results(result, Path("interview.wav"))

# Process with custom pattern
results = transcriber.process_batch("interview_*.wav")

Batch Processing with Filters

# Process only specific days
results = transcriber.process_batch("DAY1_*.wav")

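# Process multiple formats, one pattern per format
# (assumes process_batch uses Python glob matching, which does not expand {a,b} braces)
results = [transcriber.process_batch(p) for p in ("*.wav", "*.mp3", "*.m4a")]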

🔧 Installation

System Requirements

Python 3.8+
Windows 10/11 (for .bat files)
8GB+ RAM recommended
10GB+ free disk space for outputs
Internet connection for initial model download
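
Two of the requirements above can be checked with the standard library before a long batch run. The `preflight` function below is a hypothetical sketch, not part of the tool; it does not check RAM, since the standard library has no portable way to do so.

```python
import shutil
import sys


def preflight(output_dir=".", min_free_gb=10, min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {found}"
        )
    free_gb = shutil.disk_usage(output_dir).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(
            f"Only {free_gb:.1f} GB free in {output_dir!r}, {min_free_gb} GB recommended"
        )
    return problems
```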

Manual Installation

Install Python from python.org

Clone repository

git clone https://github.com/cotrk/batch-transcribe-tool.git
cd batch-transcribe-tool

Install dependencies
```
pip install -r requirements.txt
```
Download Whisper models (automatic on first run)

Dependencies

openai-whisper: Speech recognition
librosa: Audio processing and enhancement
soundfile: Audio file handling
torch: Deep learning framework
numpy/scipy: Numerical computing
ffmpeg-python: Audio format conversion
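
If an import fails after installation, a quick check of which packages are visible to the interpreter can narrow things down. This sketch maps the pip names above to their import names (`openai-whisper` imports as `whisper`, `ffmpeg-python` as `ffmpeg`); `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Distribution name -> import name. The two differ for some packages.
REQUIRED_MODULES = {
    "openai-whisper": "whisper",
    "librosa": "librosa",
    "soundfile": "soundfile",
    "torch": "torch",
    "numpy": "numpy",
    "scipy": "scipy",
    "ffmpeg-python": "ffmpeg",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]
```

Calling `missing_packages()` returns an empty list when everything in `requirements.txt` is importable.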

🐛 Troubleshooting

Common Issues

1. Python Not Found

Error: Python is not installed or not in PATH

Solution: Install Python 3.8+ and ensure "Add to PATH" is checked during installation.

2. Memory Issues

CUDA out of memory

Solution: Use a smaller model or process files individually:

model_size = "tiny"  # Use smaller model
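
The same idea can be automated by retrying with progressively smaller models. The sketch below is an assumption about how this could be done, not the tool's behavior: `transcribe` is a hypothetical callable you supply, and the check relies on PyTorch reporting CUDA out-of-memory conditions as a `RuntimeError` whose message contains "out of memory".

```python
def transcribe_with_fallback(transcribe, audio_path, sizes=("medium", "base", "tiny")):
    """Try each model size in order, moving to the next on an out-of-memory error.

    `transcribe(audio_path, size)` is a hypothetical callable supplied by the caller.
    Returns a (size_used, result) tuple.
    """
    last_error = None
    for size in sizes:
        try:
            return size, transcribe(audio_path, size)
        except RuntimeError as exc:  # PyTorch reports CUDA OOM as a RuntimeError
            if "out of memory" not in str(exc).lower():
                raise  # unrelated failure, do not hide it
            last_error = exc
    raise RuntimeError(f"All model sizes ran out of memory: {sizes}") from last_error
```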

3. Audio Quality Issues

Audio enhancement failed for file.wav

Solution: Check the file format and integrity. The system will fall back to the original audio.

4. Permission Errors

Permission denied: output folder

Solution: Ensure you have write access to the output directory, or run as administrator.
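
Write access can be verified up front rather than discovered after a long transcription. A minimal sketch, using a hypothetical `can_write_to` helper that creates the folder if needed and then tries to create a temporary file in it:

```python
import tempfile
from pathlib import Path


def can_write_to(folder):
    """Return True if a file can be created in `folder`, creating the folder if needed."""
    try:
        path = Path(folder)
        path.mkdir(parents=True, exist_ok=True)
        with tempfile.TemporaryFile(dir=path):
            pass  # the temporary file is removed automatically on close
        return True
    except OSError:  # includes PermissionError
        return False
```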

Performance Optimization

Use SSD storage for faster I/O
Close other applications during processing
Process files in smaller groups if memory is limited
Consider GPU acceleration for large batches

Getting Help

Check transcription.log in output folders for detailed error information
Review processing_summary.md for overall results
Each transcription includes confidence scores for quality assessment
Open an issue for support
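
When several output folders are involved, the logs can be scanned in one pass. This sketch assumes `transcription.log` is a plain-text file in which each line includes its level name (as Python's `logging` module writes with a typical format string); the actual log format is not documented here, and `collect_log_problems` is a hypothetical helper.

```python
from pathlib import Path


def collect_log_problems(output_root, log_name="transcription.log"):
    """Return (log_path, line) pairs for lines mentioning ERROR or WARNING."""
    hits = []
    for log_path in sorted(Path(output_root).rglob(log_name)):
        text = log_path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            if "ERROR" in line or "WARNING" in line:
                hits.append((log_path, line.strip()))
    return hits
```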

📈 Performance Metrics

The system automatically tracks:

Files processed successfully
Processing failures and reasons
Total word count and duration
Key moments identified
Average confidence scores
Processing time per file
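
For readers who want to collect similar numbers in their own scripts, the sketch below shows one way to accumulate a subset of these metrics. It is illustrative only: the `BatchStats` class and all of its field names are hypothetical and do not describe how the tool stores its statistics.

```python
from dataclasses import dataclass, field


@dataclass
class BatchStats:
    """Running totals for one batch; all field names are hypothetical."""
    succeeded: int = 0
    failures: dict = field(default_factory=dict)  # file name -> reason
    total_words: int = 0
    total_seconds: float = 0.0
    confidences: list = field(default_factory=list)

    def record_success(self, words, seconds, confidence):
        """Add one successfully transcribed file to the totals."""
        self.succeeded += 1
        self.total_words += words
        self.total_seconds += seconds
        self.confidences.append(confidence)

    def record_failure(self, name, reason):
        """Remember why a file could not be processed."""
        self.failures[name] = reason

    @property
    def average_confidence(self):
        """Mean confidence over successful files, or 0.0 if there are none."""
        if not self.confidences:
            return 0.0
        return sum(self.confidences) / len(self.confidences)
```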

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

Areas for Contribution

Additional audio format support
GPU acceleration improvements
Web interface development
Additional language models
Performance optimizations

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

OpenAI for the Whisper speech recognition model
librosa team for audio processing tools
PyTorch team for the deep learning framework

📞 Support

📧 Email: [your-email@example.com]
🐛 Issues: GitHub Issues
💬 Discussions: GitHub Discussions

⭐ If you find this tool useful, please give it a star on GitHub!

Made with ❤️ for journalists, researchers, and content creators everywhere.
