Batch Transcribe Tool

🎯 Intelligent bulk audio transcription with smart content filtering and automated summary generation

A specialized audio transcription automation tool designed for interview content, featuring intelligent content filtering, key moment detection, and automated cliff notes generation. Perfect for journalists, researchers, and content creators working with large volumes of interview recordings.

✨ Features

  • 🚀 Bulk Processing: Efficiently transcribe large volumes of audio files
  • 🧠 Smart Filtering: Automatically identify and extract key moments while filtering background noise
  • 🎙️ Interview Detection: Recognize and highlight important discussions and quotes
  • 🔧 Audio Enhancement: Noise reduction and audio quality optimization
  • 📊 Organized Output: Structured transcripts, key moments, and cliff notes
  • ✅ Quality Control: Confidence scoring and relevance filtering
  • 🪟 Windows Friendly: Batch files for easy execution on Windows

🎯 Perfect For

  • Journalists processing interview recordings
  • Researchers transcribing focus groups and discussions
  • Content creators extracting quotes from video audio
  • Podcasters generating show notes and highlights
  • Students transcribing lectures and presentations

🚀 Quick Start

Windows (Recommended)

  1. Clone the repository

    git clone https://github.com/cotrk/batch-transcribe-tool.git
    cd batch-transcribe-tool
  2. Run setup (one-time installation)

    setup.bat
  3. Transcribe audio files

    transcribe_rodepro.bat    # For camera audio files
    transcribe_dr10l.bat      # For recorder audio files

Cross-Platform (Python)

  1. Install dependencies

    pip install -r requirements.txt
  2. Run transcription

    python interview_transcriber.py

📁 File Structure

batch-transcribe-tool/
├── interview_transcriber.py    # Main transcription engine
├── requirements.txt            # Python dependencies
├── setup.bat                   # Windows setup script
├── transcribe_rodepro.bat      # RodePro transcription launcher
├── transcribe_dr10l.bat        # DR-10L transcription launcher
├── README.md                   # This file
└── output/                     # Generated transcriptions
    ├── transcripts/            # Full transcriptions with timestamps
    ├── key_moments/            # Extracted quotes and insights
    ├── cliff_notes/            # Concise summaries
    └── processing_summary.md   # Overall statistics

🎧 Audio Sources Supported

  • WAV files (primary support)
  • MP3, M4A, FLAC (via librosa conversion)
  • Video audio tracks (extracted automatically)
  • Multiple sample rates and bit depths
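
As a rough illustration of how a batch run might pick up only the audio formats listed above, the sketch below filters a folder by file extension. The names `SUPPORTED_EXTENSIONS` and `find_audio_files` are hypothetical and not part of the tool. Video containers are left out because the list above does not say which ones are handled.

```python
from pathlib import Path
from typing import List

# Extensions taken from the list above. This set is an assumption about
# what the tool accepts, not a value read from its source code.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}


def find_audio_files(folder: Path) -> List[Path]:
    """Return the supported audio files directly inside `folder`, sorted."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

The extension check is case-insensitive, so recorder files named `TAKE01.WAV` are picked up as well.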

📊 Output Types

1. Full Transcripts

# Interview_001.wav Transcript

**Duration:** 15:32
**Word Count:** 2,847
**Key Moments:** 12

## Full Transcript

[00:15] Good morning! Thanks for joining us today...
[00:22] Thank you for having me. I'm excited to share...
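
The `[MM:SS]` prefixes in the sample above could be produced along these lines. This is a minimal sketch, assuming each segment is a dict with `start` (seconds) and `text` keys, which is the shape of the `segments` entries that openai-whisper returns. The helper names `format_timestamp` and `render_transcript` are hypothetical.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as [MM:SS] (fractions are dropped)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"[{minutes:02d}:{secs:02d}]"


def render_transcript(segments: List[Dict]) -> str:
    """Render one '[MM:SS] text' line per segment."""
    return "\n".join(
        f"{format_timestamp(seg['start'])} {seg['text'].strip()}"
        for seg in segments
    )
```

Recordings longer than an hour would show minute counts above 59 with this format; an hours field could be added if that matters for your material.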

2. Key Moments & Quotes

# Interview_001.wav - Key Moments & Quotes

⭐ **[03:45]** The most important thing I learned was that persistence matters more than talent.
   *Relevance: 0.92*

📌 **[07:23]** When we first started this project, we had no idea it would become so successful.
   *Relevance: 0.78*

3. Cliff Notes

# Interview_001.wav - Cliff Notes

1. **[03:45]** The most important thing I learned was that persistence matters more than talent.
2. **[05:12]** Our breakthrough came when we stopped trying to be perfect and started being authentic.
3. **[08:34]** The data shows that engagement increases by 40% when content is personalized.
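
One plausible way to get from scored key moments to a numbered list like the one above is to keep the highest-scoring moments and then list them in time order. The sketch below assumes that approach; the `Moment` tuple layout and `build_cliff_notes` are hypothetical and may differ from what the tool does internally.

```python
from typing import List, Tuple

# (start_seconds, text, relevance): a hypothetical record layout
Moment = Tuple[float, str, float]


def build_cliff_notes(moments: List[Moment], limit: int = 3) -> str:
    """Keep the `limit` most relevant moments and list them in time order."""
    top = sorted(moments, key=lambda m: m[2], reverse=True)[:limit]
    lines = []
    # Tuples sort by their first field, so sorted(top) is chronological.
    for number, (start, text, _score) in enumerate(sorted(top), start=1):
        minutes, secs = divmod(int(start), 60)
        lines.append(f"{number}. **[{minutes:02d}:{secs:02d}]** {text}")
    return "\n".join(lines)
```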

⚙️ Configuration

Customizing Audio Sources

Edit interview_transcriber.py to configure your audio paths:

# For RodePro camera audio
input_folder = r"your\rodepro\audio\path"
output_folder = r"your\output\path"

# For DR-10L recorder audio  
input_folder = r"your\dr10l\audio\path"
output_folder = r"your\output\path"
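
If you switch between both recorders often, keeping the two path pairs side by side may be easier than re-editing the same variables. The sketch below is one way to do that; `SOURCES` and `resolve_folders` are hypothetical names, and the paths are the placeholders from the snippet above.

```python
from pathlib import Path
from typing import Dict, Tuple

# Placeholder paths copied from the snippet above; replace with real folders.
SOURCES: Dict[str, Tuple[str, str]] = {
    "rodepro": (r"your\rodepro\audio\path", r"your\output\path"),
    "dr10l": (r"your\dr10l\audio\path", r"your\output\path"),
}


def resolve_folders(source: str) -> Tuple[Path, Path]:
    """Look up (input_folder, output_folder) for a named audio source."""
    try:
        input_folder, output_folder = SOURCES[source.lower()]
    except KeyError:
        known = ", ".join(sorted(SOURCES))
        raise ValueError(
            f"Unknown source {source!r}; expected one of: {known}"
        ) from None
    return Path(input_folder), Path(output_folder)
```

The returned pair can then be passed as the `input_folder` and `output_folder` arguments shown under Advanced Usage.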

Model Selection

Choose a Whisper model based on your needs:

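| Model  | Speed | Accuracy   | Use Case              |
|--------|-------|------------|-----------------------|
| tiny   | ⚡⚡⚡ | ⭐          | Quick drafts, testing |
| base   | ⚡⚡   | ⭐⭐⭐      | Recommended balance   |
| medium | ⚡     | ⭐⭐⭐⭐    | High-quality results  |
| large  | 🐌    | ⭐⭐⭐⭐⭐  | Best accuracy, slow   |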

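Edit the .bat files to change models. The setting is shown here in Python syntax, matching the model_size argument used under Advanced Usage: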

model_size = "medium"  # Change from "base"

Content Filtering

Adjust relevance thresholds to filter content:

# Lower threshold for more content (0.4 = more inclusive)
if relevance_score > 0.4:  # Default is 0.6

# Higher threshold for less content (0.8 = very selective)
if relevance_score > 0.8:
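
The threshold comparison above can be wrapped in a small helper so the cut-off is passed in rather than edited in place. This is a sketch only: `filter_by_relevance` and the `relevance` key are hypothetical, while the 0.6 default comes from the comment above.

```python
from typing import Dict, List

DEFAULT_THRESHOLD = 0.6  # default stated in the snippet above


def filter_by_relevance(
    moments: List[Dict], threshold: float = DEFAULT_THRESHOLD
) -> List[Dict]:
    """Keep moments whose relevance score is strictly above `threshold`."""
    return [m for m in moments if m["relevance"] > threshold]
```

With scores of 0.92, 0.78, and 0.50, a threshold of 0.4 keeps all three, the default keeps two, and 0.8 keeps only the first.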

🎛️ Advanced Usage

Custom Processing Script

from pathlib import Path  # needed for the Path(...) calls below

from interview_transcriber import InterviewTranscriber

# Initialize with custom settings
transcriber = InterviewTranscriber(
    input_folder="path/to/audio",
    output_folder="path/to/output",
    model_size="medium"
)

# Process specific file
result = transcriber.transcribe_file(Path("interview.wav"))
if result:
    transcriber.save_results(result, Path("interview.wav"))

# Process with custom pattern
results = transcriber.process_batch("interview_*.wav")

Batch Processing with Filters

# Process only specific days
results = transcriber.process_batch("DAY1_*.wav")

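# Process multiple formats: one call per pattern, since Python's glob
# does not expand brace patterns such as "*.{wav,mp3,m4a}"
patterns = ("*.wav", "*.mp3", "*.m4a")
results = {p: transcriber.process_batch(p) for p in patterns}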
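
Python's `glob` and `pathlib` modules treat braces as literal characters, so a pattern that names several extensions at once matches nothing there. If you need one combined file list, a helper along these lines collects the matches for several patterns. `glob_many` is a hypothetical name, and whether `process_batch` relies on `glob` internally is an assumption.

```python
from pathlib import Path
from typing import Iterable, List


def glob_many(folder: Path, patterns: Iterable[str]) -> List[Path]:
    """Collect files matching any of the patterns, de-duplicated and sorted."""
    matches = set()
    for pattern in patterns:
        matches.update(p for p in folder.glob(pattern) if p.is_file())
    return sorted(matches)
```

For example, `glob_many(Path("path/to/audio"), ["DAY1_*.wav", "DAY1_*.mp3"])` would return the day-one recordings in both formats.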

🔧 Installation

System Requirements

  • Python 3.8+
  • Windows 10/11 (for .bat files)
  • 8GB+ RAM recommended
  • 10GB+ free disk space for outputs
  • Internet connection for initial model download

Manual Installation

  1. Install Python from python.org
  2. Clone repository
    git clone https://github.com/cotrk/batch-transcribe-tool.git
    cd batch-transcribe-tool
  3. Install dependencies
    pip install -r requirements.txt
  4. Download Whisper models (automatic on first run)

Dependencies

  • openai-whisper: Speech recognition
  • librosa: Audio processing and enhancement
  • soundfile: Audio file handling
  • torch: Deep learning framework
  • numpy/scipy: Numerical computing
  • ffmpeg-python: Audio format conversion

🐛 Troubleshooting

Common Issues

1. Python Not Found

Error: Python is not installed or not in PATH

Solution: Install Python 3.8+ and ensure "Add to PATH" is checked during installation.
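
To confirm that the interpreter on your PATH meets the 3.8+ requirement, a check like the following can be run from the command line. `python_is_supported` is a hypothetical helper, not part of the tool.

```python
import sys

MINIMUM = (3, 8)  # minimum version listed under System Requirements


def python_is_supported(version=sys.version_info) -> bool:
    """Return True if the (major, minor) version is at least MINIMUM."""
    return tuple(version[:2]) >= MINIMUM


if __name__ == "__main__":
    status = "OK" if python_is_supported() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```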

2. Memory Issues

CUDA out of memory

Solution: Use a smaller model or process files individually:

model_size = "tiny"  # Use smaller model
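
Instead of editing the model size by hand after each failure, a script could step down through smaller models automatically. The sketch below assumes the out-of-memory condition surfaces as a `RuntimeError` when the model is loaded (PyTorch reports CUDA out-of-memory as a `RuntimeError`, although it can also occur later, during transcription). `load_with_fallback` is a hypothetical helper; with openai-whisper, `whisper.load_model` could be passed as `load`.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Model = TypeVar("Model")


def load_with_fallback(
    load: Callable[[str], Model],
    sizes: Sequence[str] = ("medium", "base", "tiny"),
) -> Tuple[str, Model]:
    """Try each model size in order and return the first one that loads."""
    last_error = None
    for size in sizes:
        try:
            return size, load(size)
        except RuntimeError as error:  # broad on purpose: covers CUDA OOM
            last_error = error
    raise RuntimeError(f"No model in {list(sizes)} could be loaded") from last_error
```

Catching `RuntimeError` is deliberately broad, so unrelated load failures also trigger the fallback; narrow it if you need to tell the cases apart.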

3. Audio Quality Issues

Audio enhancement failed for file.wav

Solution: Check the file format and integrity. The system will fall back to the original audio.
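
The fallback behavior described above follows a common pattern: attempt the enhancement, and on any failure log a warning and continue with the untouched audio. A minimal sketch of that pattern, with hypothetical names (`enhance_or_original`, the `transcription` logger) that may not match the tool's code:

```python
import logging
from typing import Callable, TypeVar

Audio = TypeVar("Audio")
logger = logging.getLogger("transcription")  # logger name is an assumption


def enhance_or_original(
    audio: Audio, enhance: Callable[[Audio], Audio], name: str
) -> Audio:
    """Return enhance(audio), or the original audio if enhancement fails."""
    try:
        return enhance(audio)
    except Exception:
        # Record the failure with its traceback, then keep going.
        logger.warning("Audio enhancement failed for %s", name, exc_info=True)
        return audio
```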

4. Permission Errors

Permission denied: output folder

Solution: Ensure write access to the output directory or run as administrator.
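
Before starting a long batch, it can help to verify that the output folder is writable, so a permission problem shows up immediately rather than after the first transcription finishes. The sketch below tries to create a temporary file in the folder; `is_writable` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def is_writable(folder: Path) -> bool:
    """Return True if a file can be created inside `folder`."""
    try:
        folder.mkdir(parents=True, exist_ok=True)
        # The temporary file is deleted as soon as the block exits.
        with tempfile.TemporaryFile(dir=folder):
            pass
        return True
    except OSError:  # includes PermissionError
        return False
```

Note that the helper creates the folder if it does not exist yet.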

Performance Optimization

  • Use SSD storage for faster I/O
  • Close other applications during processing
  • Process files in smaller groups for memory constraints
  • Consider GPU acceleration for large batches

Getting Help

  1. Check transcription.log in output folders for detailed error information
  2. Review processing_summary.md for overall results
  3. Each transcription includes confidence scores for quality assessment
  4. Open an issue for support

📈 Performance Metrics

The system automatically tracks:

  • Files processed successfully
  • Processing failures and reasons
  • Total word count and duration
  • Key moments identified
  • Average confidence scores
  • Processing time per file
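
The metrics listed above map naturally onto a small accumulator that is updated once per file. The following is a sketch of such a structure, not the tool's actual implementation; `BatchStats` and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BatchStats:
    """Running totals for one batch run (hypothetical structure)."""
    succeeded: int = 0
    failures: Dict[str, str] = field(default_factory=dict)  # filename -> reason
    total_words: int = 0
    total_seconds: float = 0.0       # summed audio duration
    key_moments: int = 0
    confidence_sum: float = 0.0
    processing_seconds: float = 0.0  # summed wall-clock processing time

    def record_success(self, words: int, duration: float, moments: int,
                       confidence: float, elapsed: float) -> None:
        self.succeeded += 1
        self.total_words += words
        self.total_seconds += duration
        self.key_moments += moments
        self.confidence_sum += confidence
        self.processing_seconds += elapsed

    def record_failure(self, filename: str, reason: str) -> None:
        self.failures[filename] = reason

    @property
    def average_confidence(self) -> float:
        return self.confidence_sum / self.succeeded if self.succeeded else 0.0

    @property
    def seconds_per_file(self) -> float:
        return self.processing_seconds / self.succeeded if self.succeeded else 0.0
```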

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

Areas for Contribution

  • Additional audio format support
  • GPU acceleration improvements
  • Web interface development
  • Additional language models
  • Performance optimizations

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenAI for the Whisper speech recognition model
  • librosa team for audio processing tools
  • PyTorch team for the deep learning framework

📞 Support


⭐ If you find this tool useful, please give it a star on GitHub!

Made with ❤️ for journalists, researchers, and content creators everywhere.
