- Overview
- Getting Started
- Core Features
- Authentication & API Configuration
- Video Processing
- Transcript Management
- AI-Powered Features
- Cost Tracking & Billing
- Export Options
- Keyboard Shortcuts
- Advanced Features
- Troubleshooting
Transcript Parser is an advanced AI-powered desktop application that converts video/audio files into searchable, editable transcripts with speaker diarization. Built with Electron, React, and Google Gemini AI, it provides professional-grade transcription with intelligent speaker identification and comprehensive editing capabilities.
- AI-Powered Transcription: Uses Google Gemini 2.5 Flash for accurate speech-to-text
- Speaker Diarization: Automatically identifies and separates different speakers
- AI Name Detection: Intelligently detects speaker names from introductions
- Real-Time Editing: Edit transcripts with full undo/redo support
- Advanced Search: Search across transcripts with highlighting
- Cost Tracking: Real-time token usage and monthly billing breakdown
- Multiple Export Formats: TXT, JSON, SRT, VTT formats
- Cross-Platform: Available for Windows, macOS, and Linux
- Download `Transcript Parser-Setup-1.0.0.exe` from the releases page
- Run the installer and follow the setup wizard
- Launch Transcript Parser from the Start Menu
- Download `Transcript Parser-Portable-1.0.0.exe`
- Run directly without installation (no admin rights required)
- Set up API Access
  - Click the settings icon (⚙️) in the top-right
  - Choose your API configuration method
  - Enter credentials as needed
- Load Your First Video
  - Click "Choose Video File" or drag & drop a video or audio file
  - Supported formats: MP4, AVI, MOV, WebM, MP3, WAV, M4A, FLAC
  - Recommended maximum file size: 2GB
Upload Methods:
- Click Button: Click "Choose Video File" button
- Drag & Drop: Drag video/audio files directly into the upload area
- Recent Files: Access recently processed files from history
Supported Formats:
- Video: `.mp4`, `.avi`, `.mov`, `.webm`
- Audio: `.mp3`, `.wav`, `.m4a`, `.flac`
Processing:
- File size: Up to 2GB (recommended)
- Conversion: Automatically converts to WebM/Opus for optimal processing
- Progress: Real-time progress bar with percentage and status updates
How It Works:
- Upload your media file
- Click "Start Transcription"
- AI processes audio and generates transcript
- Speakers are automatically identified and separated
Transcription Features:
- Automatic Speaker Detection: Identifies unique speakers
- Timestamps: Precise start/end times for each segment
- Confidence Scores: Quality indicators for each segment
- Real-Time Updates: See transcript build as processing completes
Best for: Individual users, full control
- Click Settings (⚙️) → "API Configuration"
- Select "Use Own API Key"
- Get API key from Google AI Studio
- Paste key and click "Save"
Advantages:
- Full cost control
- No access codes needed
- Direct billing through Google
Best for: Shared access, organizations
- Select "Use Access Code"
- Enter 10-digit code: `XXX-XXXX-XXX`
- Developer's API key is used automatically
Access Code Validation:
- Format: 3-4-3 digits
- Example: `123-4567-890`
- Contact your admin for codes
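The 3-4-3 format check described above can be sketched as a small validator. This is a minimal illustration, not the app's actual code: `isValidAccessCode` and `formatAccessCode` are hypothetical names, and the sketch only checks the shape of a code, not whether an admin has issued it.

```typescript
// Hypothetical helpers; the names are illustrative, not taken from the app's source.
const ACCESS_CODE_PATTERN = /^\d{3}-\d{4}-\d{3}$/;

// Returns true when the input matches the 3-4-3 digit layout, e.g. "123-4567-890".
function isValidAccessCode(input: string): boolean {
  return ACCESS_CODE_PATTERN.test(input.trim());
}

// Re-inserts the dashes when the 10 digits are typed or pasted without them.
// Returns null when the input does not contain exactly 10 digits.
function formatAccessCode(input: string): string | null {
  const digits = input.replace(/\D/g, "");
  if (digits.length !== 10) return null;
  return `${digits.slice(0, 3)}-${digits.slice(3, 7)}-${digits.slice(7)}`;
}
```

Normalizing before validating lets a user paste `1234567890` and still end up with `123-4567-890`.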
Coming soon: Pay-as-you-go with monthly billing
Playback Controls:
- ▶️ Play/Pause: Click video or press `Space`
- ⏩ Fast Forward: Press `→` or `L`
- ⏪ Rewind: Press `←` or `J`
- 🔇 Mute/Unmute: Press `M`
- 📽️ Fullscreen: Press `F`
Transcript Sync:
- Click Entry: Jump to timestamp in video
- Auto-Highlight: Current speaking segment highlighted
- Seek Bar: Visual timeline with segment markers
Advanced Controls:
- Playback speed: 0.25x to 2x
- Volume control: 0% to 100%
- Frame-by-frame navigation
Layout:
- Left Panel: Speaker analytics and statistics
- Center Panel: Transcript entries with search
- Right Panel: (Optional) Video player
Entry Information:
- Speaker name/color
- Start and end timestamps
- Confidence percentage
- Full text content
Enable Editing:
- Toggle edit mode in top toolbar
- Double-click any entry to edit
- Modify text, start time, or end time
Edit Operations:
- Text Editing:
  - Double-click entry text
  - Make changes in textarea
  - Click "Save" or press `Enter`
- Timestamp Editing:
  - Click edit icon
  - Modify start/end times (in seconds)
  - Format: Decimal (e.g., 12.5)
- Undo/Redo:
  - Undo: `Ctrl+Z` (Windows) / `Cmd+Z` (Mac)
  - Redo: `Ctrl+Shift+Z` / `Cmd+Shift+Z`
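Undo/redo of this kind is commonly built on two stacks of snapshots. The sketch below shows that general technique under the assumption that each edit produces a new snapshot of the edited value; `EditHistory` is a hypothetical class, not the app's implementation.

```typescript
// Hypothetical history container; T is whatever snapshot the editor keeps
// (for example, the text of one entry or the whole entry list).
class EditHistory<T> {
  private past: T[] = [];
  private future: T[] = [];
  private present: T;

  constructor(initial: T) {
    this.present = initial;
  }

  current(): T {
    return this.present;
  }

  // Record a new state. Any redo history is discarded, as in most editors.
  commit(next: T): void {
    this.past.push(this.present);
    this.present = next;
    this.future = [];
  }

  // Step back one edit; does nothing when there is no earlier state.
  undo(): T {
    if (this.past.length > 0) {
      this.future.push(this.present);
      this.present = this.past.pop() as T;
    }
    return this.present;
  }

  // Step forward one edit; does nothing when nothing has been undone.
  redo(): T {
    if (this.future.length > 0) {
      this.past.push(this.present);
      this.present = this.future.pop() as T;
    }
    return this.present;
  }
}
```

In this design, committing a new edit after an undo clears the redo stack, so redo never resurrects a state that conflicts with the latest change.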
Visual Indicators:
- 🔄 Edited Badge: Shows modified entries
- ✓ Save Confirmation: Visual feedback on save
- ❌ Cancel Option: Discard changes
Rename Speakers:
- Find speaker in left panel
- Click edit icon (✏️) next to speaker name
- Type new name
- Press `Enter` or click ✓
- Name updates across all entries
Speaker Colors:
- Blue, Emerald, Purple, Orange, Pink, Cyan
- Automatically assigned
- Consistent throughout transcript
Automatic Detection:
- Click "Detect Names" button (✨ Sparkles icon)
- AI analyzes first 30 entries of each speaker
- Looks for self-introduction patterns
- Returns suggestions with confidence levels
Detection Patterns:
- "My name is [name]"
- "I'm [name]"
- "This is [name]"
- "Hi, I'm [name]"
- "[name] here"
Confidence Levels:
- High: Clear, unambiguous introduction (e.g., "My name is John Smith")
- Medium: Less formal introduction (e.g., "I'm Sarah")
- Low: Ambiguous or indirect reference
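The app performs name detection with the AI model, so the following is only a rough regex approximation of the introduction patterns listed above, useful for seeing what kind of text triggers a suggestion. The function name `suggestName`, the rule table, and the mapping of each pattern to a confidence level are all assumptions made for illustration.

```typescript
type Confidence = "high" | "medium" | "low";

interface NameSuggestion {
  name: string;
  confidence: Confidence;
  evidence: string; // the exact text that matched
}

// A capitalized first name with an optional capitalized surname.
const NAME = "([A-Z][a-z]+(?: [A-Z][a-z]+)?)";

// Assumed mapping: explicit introductions rank higher than casual ones.
const INTRO_RULES: { pattern: RegExp; confidence: Confidence }[] = [
  { pattern: new RegExp(`\\b[Mm]y name is ${NAME}`), confidence: "high" },
  { pattern: new RegExp(`\\bI['’]m ${NAME}`), confidence: "medium" },
  { pattern: new RegExp(`\\b[Tt]his is ${NAME}`), confidence: "medium" },
  { pattern: new RegExp(`^${NAME} here\\b`), confidence: "low" },
];

// Returns the first matching suggestion, or null when no pattern applies.
function suggestName(text: string): NameSuggestion | null {
  for (const rule of INTRO_RULES) {
    const match = rule.pattern.exec(text);
    if (match) {
      return { name: match[1], confidence: rule.confidence, evidence: match[0] };
    }
  }
  return null;
}
```

A pattern matcher like this produces false positives (for example, "This is Great"), which is one reason suggestions are shown with an evidence quote for review rather than applied automatically.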
Review Suggestions:
- Evidence Quote: See exact text where name was detected
- Accept: Applies name to all speaker entries
- Reject: Dismisses suggestion
- Dismiss All: Remove all suggestions
Text Search:
- Type query in search box
- Live results with match count
- Highlighting in transcript entries
- Case-insensitive matching
Filters:
- Speaker Filter: Show only specific speakers
- Time Range: Filter by start/end timestamps
- Confidence: (Future) Filter by confidence score
Combined Filters:
- Search + Speaker: Find text from specific speaker
- Search + Time: Find text in time range
- All filters stack together
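Stacking filters amounts to keeping only the entries that pass every active condition. The sketch below illustrates that idea with assumed field names based on the JSON export (`speaker`, `startTime`, `endTime`, `text`); the `Filters` shape and `filterEntries` function are hypothetical.

```typescript
interface Entry {
  speaker: string;
  startTime: number; // seconds
  endTime: number;   // seconds
  text: string;
}

// Every field is optional; an omitted field means "do not filter on this".
interface Filters {
  query?: string;
  speakers?: string[];
  fromSeconds?: number;
  toSeconds?: number;
}

function filterEntries(entries: Entry[], filters: Filters): Entry[] {
  const query = (filters.query ?? "").trim().toLowerCase();
  const speakers = filters.speakers ?? [];
  return entries.filter((entry) => {
    // Case-insensitive text match.
    if (query !== "" && entry.text.toLowerCase().indexOf(query) === -1) return false;
    // Speaker filter.
    if (speakers.length > 0 && speakers.indexOf(entry.speaker) === -1) return false;
    // Time range: keep entries that overlap the requested window.
    if (filters.fromSeconds !== undefined && entry.endTime < filters.fromSeconds) return false;
    if (filters.toSeconds !== undefined && entry.startTime > filters.toSeconds) return false;
    return true;
  });
}
```

Treating the time range as an overlap test keeps an entry that starts before the window but is still being spoken inside it; whether the app does the same is not stated in this manual.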
What's Tracked:
- Input tokens (prompt + audio analysis)
- Output tokens (generated transcript)
- Total tokens per operation
- Estimated cost in USD
Cost Calculation:
- Gemini 2.5 Flash: $0.075/1M input, $0.30/1M output
- Real-time updates from API responses
- Persists across sessions (localStorage)
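The cost estimate is a straightforward per-million-token calculation. The sketch below uses the rates quoted in this manual as constants; they are assumptions for illustration and should be checked against Google's current pricing. `TokenUsage` and `estimateCostUsd` are hypothetical names.

```typescript
// Rates in USD per 1M tokens, copied from this manual (not from a live price list).
const RATES_PER_MILLION_USD = { input: 0.075, output: 0.3 };

interface TokenUsage {
  inputTokens: number;  // prompt + audio analysis
  outputTokens: number; // generated transcript
}

// Estimated cost of one operation in USD.
function estimateCostUsd(usage: TokenUsage): number {
  const inputCost = (usage.inputTokens / 1e6) * RATES_PER_MILLION_USD.input;
  const outputCost = (usage.outputTokens / 1e6) * RATES_PER_MILLION_USD.output;
  return inputCost + outputCost;
}
```

At these rates, an operation that uses 1,000,000 input tokens and 100,000 output tokens comes to about $0.105.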
- Click "Cost Summary" button (💰)
- See overview cards:
- Total Tokens Used
- Total Cost (USD)
- Total Operations
- Average Cost per Operation
Current Month Card:
- Tokens used this month
- Current month cost
- Operations count
- Highlighted in amber/gold
Historical Billing:
- All past months sorted newest first
- Monthly totals: tokens, cost, operations
- Format: "December 2024"
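A monthly breakdown like this can be produced by grouping recorded operations by calendar month and sorting the groups newest first. The sketch below assumes each operation is stored with an ISO 8601 timestamp and buckets by UTC month; the record shape and the `monthlyBreakdown` name are hypothetical.

```typescript
interface Operation {
  timestamp: string; // ISO 8601, e.g. "2024-12-19T10:30:00Z"
  tokens: number;
  costUsd: number;
}

interface MonthlyTotal {
  key: string;   // "YYYY-MM", used for sorting
  label: string; // e.g. "December 2024"
  tokens: number;
  costUsd: number;
  operations: number;
}

const MONTH_NAMES = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

function monthlyBreakdown(operations: Operation[]): MonthlyTotal[] {
  const byMonth = new Map<string, MonthlyTotal>();
  for (const op of operations) {
    const date = new Date(op.timestamp);
    const year = date.getUTCFullYear();
    const monthIndex = date.getUTCMonth();
    const key = `${year}-${String(monthIndex + 1).padStart(2, "0")}`;
    let bucket = byMonth.get(key);
    if (!bucket) {
      bucket = { key, label: `${MONTH_NAMES[monthIndex]} ${year}`, tokens: 0, costUsd: 0, operations: 0 };
      byMonth.set(key, bucket);
    }
    bucket.tokens += op.tokens;
    bucket.costUsd += op.costUsd;
    bucket.operations += 1;
  }
  const months: MonthlyTotal[] = [];
  byMonth.forEach((bucket) => { months.push(bucket); });
  // "YYYY-MM" keys sort chronologically as plain strings; reverse for newest first.
  return months.sort((a, b) => (a.key < b.key ? 1 : a.key > b.key ? -1 : 0));
}
```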
Usage by Category:
- By Model: Gemini 2.5 Flash, 1.5 Flash, etc.
- By Operation: Video Transcription, Name Detection
- Speaker labels
- Timestamps in `[HH:MM:SS]` format
- Clean, readable format
- Best for: Documentation, notes
Example:
[00:00:05] Speaker 1: Hello everyone, welcome to the meeting.
[00:00:12] Speaker 2: Thanks for having me.
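Lines like the ones above can be produced from an entry's start time in seconds. This is a minimal sketch with hypothetical helper names (`toClock`, `toTxtLine`); it assumes fractional seconds are truncated rather than rounded, which the manual does not specify.

```typescript
// Formats a time in seconds as [HH:MM:SS], dropping any fractional part.
function toClock(seconds: number): string {
  const total = Math.max(0, Math.floor(seconds));
  const hh = String(Math.floor(total / 3600)).padStart(2, "0");
  const mm = String(Math.floor((total % 3600) / 60)).padStart(2, "0");
  const ss = String(total % 60).padStart(2, "0");
  return `[${hh}:${mm}:${ss}]`;
}

// Builds one line of the plain-text export.
function toTxtLine(entry: { speaker: string; startTime: number; text: string }): string {
  return `${toClock(entry.startTime)} ${entry.speaker}: ${entry.text}`;
}
```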
- Complete structured data
- All metadata preserved
- Speakers, timestamps, confidence
- Best for: Developers, data analysis
Example:
{
"entries": [{
"id": "1",
"speaker": "Speaker 1",
"speakerNumber": 1,
"startTime": 0.0,
"endTime": 5.2,
"text": "Hello everyone",
"confidence": 0.95
}],
"speakers": [...],
"metadata": {...}
}
- Standard subtitle format
- Numbered sequences
- Timestamp format: `HH:MM:SS,mmm`
- Best for: Video subtitles, YouTube
Example:
1
00:00:00,000 --> 00:00:05,200
Speaker 1: Hello everyone
2
00:00:05,200 --> 00:00:12,000
Speaker 2: Thanks for having me
- Web video text tracks
- HTML5 compatible
- Metadata support
- Best for: Web players, accessibility
Example:
WEBVTT
00:00:00.000 --> 00:00:05.200
<v Speaker 1>Hello everyone
00:00:05.200 --> 00:00:12.000
<v Speaker 2>Thanks for having me
- Click "Export" button (📥)
- Select format from dropdown
- Choose save location
- File is generated and downloaded
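The SRT and WebVTT exports described above differ mainly in the millisecond separator (`,` versus `.`), the cue numbering, the `WEBVTT` header, and how the speaker is marked. The sketch below generates both from the same cue list; in both formats, cues are separated by a blank line in the actual file. The `Cue` shape and the function names are assumptions, not the app's code.

```typescript
interface Cue {
  speaker: string;
  startTime: number; // seconds
  endTime: number;   // seconds
  text: string;
}

// Formats seconds as HH:MM:SS<sep>mmm, where <sep> is "," for SRT and "." for WebVTT.
function toTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const pad = (value: number, width: number): string => String(value).padStart(width, "0");
  const hh = Math.floor(totalMs / 3600000);
  const mm = Math.floor((totalMs % 3600000) / 60000);
  const ss = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  return `${pad(hh, 2)}:${pad(mm, 2)}:${pad(ss, 2)}${msSeparator}${pad(ms, 3)}`;
}

// SRT: numbered cues, comma before milliseconds, speaker as a text prefix.
function toSrt(cues: Cue[]): string {
  const blocks = cues.map((cue, index) =>
    `${index + 1}\n${toTimestamp(cue.startTime, ",")} --> ${toTimestamp(cue.endTime, ",")}\n${cue.speaker}: ${cue.text}`
  );
  return blocks.join("\n\n") + "\n";
}

// WebVTT: header line, dot before milliseconds, speaker in a <v> voice tag.
function toVtt(cues: Cue[]): string {
  const blocks = cues.map((cue) =>
    `${toTimestamp(cue.startTime, ".")} --> ${toTimestamp(cue.endTime, ".")}\n<v ${cue.speaker}>${cue.text}`
  );
  return "WEBVTT\n\n" + blocks.join("\n\n") + "\n";
}
```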
| Shortcut | Action |
|---|---|
| `Ctrl/Cmd + O` | Open video file |
| `Ctrl/Cmd + S` | Save transcript |
| `Ctrl/Cmd + E` | Export transcript |
| `Ctrl/Cmd + F` | Focus search box |
| `Ctrl/Cmd + Z` | Undo edit |
| `Ctrl/Cmd + Shift + Z` | Redo edit |
| `Escape` | Clear search/filters |

| Shortcut | Action |
|---|---|
| `Space` | Play/Pause |
| `→` or `L` | Skip forward 5s |
| `←` or `J` | Skip backward 5s |
| `↑` | Volume up |
| `↓` | Volume down |
| `M` | Mute/Unmute |
| `F` | Toggle fullscreen |
| `0-9` | Jump to 0%-90% |

| Shortcut | Action |
|---|---|
| `↑/↓` | Navigate entries |
| `Enter` | Play entry timestamp |
| `Double-Click` | Edit entry (if enabled) |
| `Ctrl/Cmd + Click` | Multi-select |
Statistics Displayed:
- Total speaking time per speaker
- Percentage of total conversation
- Number of segments
- Average segment duration
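These statistics can all be derived from the entries' start and end times. The sketch below is one way to compute them, assuming that "percentage of total conversation" means a speaker's share of the summed speaking time (silence excluded); the type and function names are hypothetical.

```typescript
interface Entry {
  speaker: string;
  startTime: number; // seconds
  endTime: number;   // seconds
}

interface SpeakerStats {
  speaker: string;
  totalSeconds: number;
  percentage: number;     // share of all speaking time, 0-100
  segments: number;
  averageSeconds: number; // mean segment duration
}

function computeSpeakerStats(entries: Entry[]): SpeakerStats[] {
  const totals = new Map<string, { seconds: number; segments: number }>();
  let grandTotal = 0;
  for (const entry of entries) {
    const duration = Math.max(0, entry.endTime - entry.startTime);
    const current = totals.get(entry.speaker) ?? { seconds: 0, segments: 0 };
    current.seconds += duration;
    current.segments += 1;
    totals.set(entry.speaker, current);
    grandTotal += duration;
  }
  const stats: SpeakerStats[] = [];
  totals.forEach((total, speaker) => {
    stats.push({
      speaker,
      totalSeconds: total.seconds,
      percentage: grandTotal > 0 ? (total.seconds / grandTotal) * 100 : 0,
      segments: total.segments,
      averageSeconds: total.seconds / total.segments,
    });
  });
  return stats;
}
```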
Visual Indicators:
- Progress bars for speaking time
- Color-coded speakers
- Segment count badges
- Recently processed files
- Quick reload previous transcripts
- Automatic save on process
- Indexed for fast search
Virtual Scrolling:
- Handles 10,000+ entries smoothly
- Only renders visible entries
- Smooth 60fps scrolling
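Virtual scrolling works by computing which rows fall inside the viewport and rendering only those. The sketch below shows the core calculation under the simplifying assumption that every entry has the same fixed height; real transcript entries vary in height, so the app's approach is likely more involved. `visibleRange` and its parameters are hypothetical.

```typescript
interface VisibleRange {
  start: number; // index of the first row to render (inclusive)
  end: number;   // index after the last row to render (exclusive)
}

// overscan: extra rows rendered above and below the viewport to avoid flicker while scrolling.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  totalRows: number,
  overscan: number = 5
): VisibleRange {
  const first = Math.floor(scrollTop / rowHeight);
  const lastExclusive = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(totalRows, lastExclusive + overscan),
  };
}
```

With 10,000 entries, a 600px viewport, and 60px rows, only about 20 rows are rendered at any time, which is what keeps scrolling cost independent of transcript length.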
Progressive Loading:
- Entries load as transcription completes
- No waiting for full completion
- Real-time updates
Solution:
- Verify API key is correct
- Check key has Gemini API access enabled
- Regenerate key from Google AI Studio
- Ensure billing is enabled on Google Cloud
Possible Causes:
- File too large (>2GB)
- Unsupported format
- Poor audio quality
- API quota exceeded
Solutions:
- Compress video/audio file
- Convert to supported format (MP4, WebM)
- Improve audio quality
- Check API quota limits
Solutions:
- Update Electron app to latest version
- Check video codec compatibility
- Convert video to WebM format
- Verify file isn't corrupted
Solutions:
- Close unnecessary background apps
- Process smaller files
- Enable hardware acceleration
- Increase available RAM
Solutions:
- Check disk space
- Verify write permissions
- Choose different save location
- Check file name validity
Support Channels:
- GitHub Issues: Report bugs
- Documentation: Check implementation guides in `docs/implementation/`
- Community: (Future) Discord/Slack channels
Before Reporting:
- Check this manual
- Review error messages
- Check console logs (`Ctrl+Shift+I` in app)
- Note steps to reproduce
- Include system info (OS, version)
Minimum System Requirements:
- OS: Windows 10+, macOS 10.13+, Ubuntu 18.04+
- RAM: 4GB (8GB recommended)
- Disk: 500MB for app + space for videos
- Internet: Required for transcription API calls
Recommended:
- RAM: 8GB+ for large files
- CPU: Multi-core processor for faster processing
- SSD: For better video loading performance
- Bandwidth: Stable connection for API calls
Data Handling:
- Local Processing: Videos stay on your device
- API Transmission: Only audio data sent to Google
- No Storage: Google doesn't store your audio
- Encryption: HTTPS for all API calls
API Key Security:
- Keys stored in localStorage (encrypted)
- Never transmitted except to Google
- Rotatable at any time
- Access codes don't expose developer keys
Best Practices:
- Don't share API keys
- Rotate keys periodically
- Use access codes for teams
- Review cost usage regularly
Current Version: 1.0.0
Recent Features:
- ✨ AI-powered speaker name detection
- 📊 Monthly billing breakdown
- 💰 Real-time cost tracking
- 🎨 Enhanced UI with speaker colors
- ⌨️ Keyboard shortcuts
- 📝 Inline transcript editing
- 🔍 Advanced search and filters
Coming Soon:
- Cloud sync for transcripts
- Team collaboration features
- Custom vocabulary/terminology
- Batch processing multiple files
- AI summarization
- Translation support
Q: How accurate is the transcription? A: Accuracy depends on audio quality. Gemini 2.5 Flash provides 85-95% accuracy for clear audio with minimal background noise.
Q: Can I edit the transcript? A: Yes! Enable edit mode and double-click any entry to modify text or timestamps.
Q: What languages are supported? A: English is currently supported. Additional languages are planned via Gemini multilingual models.
Q: How much does transcription cost? A: Cost varies by audio length. Typical 1-hour meeting: ~$0.05-$0.15. Check Cost Summary for exact usage.
Q: Can I process multiple files at once? A: Not currently. Batch processing is planned for a future release.
Q: Is my data private? A: Yes. Videos never leave your device. Only audio is temporarily sent to Google's Gemini API (not stored).
Q: Can I use this offline? A: Partly. AI transcription requires an internet connection, but local playback and editing work offline.
Q: How long does transcription take? A: Typically 1-3 minutes per hour of audio, depending on network speed and API response time.
License: MIT License
Developer: Keven W. Markham
AI Model: Google Gemini 2.5 Flash
Framework: Electron + React + TypeScript
UI Library: shadcn/ui + Tailwind CSS
Third-Party Licenses:
- React (MIT)
- Electron (MIT)
- Google Generative AI SDK (Apache 2.0)
- Lucide Icons (ISC)
GitHub: transcript-parser
Issues: Report bugs or request features
Email: (Contact info)
Documentation: See docs/ folder for technical guides
Last Updated: December 19, 2024 • Version 1.0.0