A highly efficient voice-to-text tool based on the ChatGPT Web API, specifically designed for Linux (GNOME) environments. It enables a seamless "Record -> Transcribe -> Beautify -> Copy to Clipboard" workflow with a simple hotkey toggle.
This project uses uv for Python environment and dependency management.
Ensure your system has portaudio (for recording) and wl-clipboard (for clipboard operations on Wayland) installed:

    # For Debian/Ubuntu
    sudo apt install libportaudio2 libportaudiocpp0 portaudio19-dev wl-clipboard

To call the ChatGPT transcription API, you need to manually capture the Request Headers after logging in:
- Open your browser and log in to ChatGPT.
- Open Developer Tools (F12) and go to the Network tab.
- Record a short voice command on the page to trigger the transcribe API.
- Find the transcribe request and copy its Request Headers (excluding headers starting with ":").
- Create a file named .request-header.txt in the project root. You can use the provided template as a reference, ensuring it includes required fields like Authorization and Cookie.
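
The header file described above could be read along these lines. This is a minimal sketch, not the project's actual loader: it assumes the file holds one `Name: value` pair per line, and the function names `parse_request_headers` and `load_request_headers` are hypothetical.

```python
from pathlib import Path


def parse_request_headers(text: str) -> dict[str, str]:
    """Parse 'Name: value' lines into a dict (assumed file format)."""
    headers: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and headers whose names start with ":".
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only, so values such as URLs stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        headers[name.strip()] = value.strip()
    return headers


def load_request_headers(path: str = ".request-header.txt") -> dict[str, str]:
    """Read the header file and check that the required fields are present."""
    headers = parse_request_headers(Path(path).read_text(encoding="utf-8"))
    present = {name.lower() for name in headers}
    missing = [n for n in ("authorization", "cookie") if n not in present]
    if missing:
        raise ValueError(f"missing required headers: {', '.join(missing)}")
    return headers
```

Failing early on a missing Authorization or Cookie field gives a clearer error than a rejected request later on.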

Install the dependencies:

    uv sync

Run the tool:

    uv run src/main.py

- First Run: Starts recording; a notification will appear.
- Second Run: Stops recording; the system begins transcription and beautification.
- Completion: The result is automatically saved to output/output.txt and copied to your clipboard.
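
The toggle behaviour and the final save-and-copy step could be sketched as follows. This is an illustration under assumptions, not the code in src/main.py: the flag-file approach and the names `toggle_recording` and `save_and_copy` are hypothetical, and the real script may track state differently. The only external tool used is `wl-copy` from wl-clipboard, which reads the text to copy from standard input.

```python
import subprocess
from pathlib import Path


def toggle_recording(state_file: Path) -> str:
    """Flip the recording state and return the action for this run."""
    if state_file.exists():
        # A flag file from the previous run means recording is in progress.
        state_file.unlink()
        return "stop"
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.touch()
    return "start"


def save_and_copy(text: str, output_file: Path, copy_cmd=("wl-copy",)) -> None:
    """Write the result to disk, then pipe it to the clipboard command."""
    output_file.parent.mkdir(parents=True, exist_ok=True)
    output_file.write_text(text, encoding="utf-8")
    # copy_cmd is a parameter so another clipboard tool can be swapped in.
    subprocess.run(list(copy_cmd), input=text.encode("utf-8"), check=True)
```

A first call to `toggle_recording(Path("output/recording.flag"))` returns `"start"` and the next returns `"stop"`, which mirrors the first-run and second-run behaviour listed above.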

It is recommended to bind uv run /path/to/Audio-To-Text/src/main.py to a system shortcut (e.g., Ctrl + Alt + T) for the smoothest experience.
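
On GNOME, such a shortcut can be created in the Settings application or through `gsettings`. The sketch below only builds and prints the `gsettings` commands so they can be reviewed before running. It assumes the `custom0` slot is free, and the helper name `keybinding_commands` is hypothetical. Note that the first command replaces the whole list of custom shortcuts, so any existing entries would need to be merged in by hand.

```python
import shlex

SCHEMA = "org.gnome.settings-daemon.plugins.media-keys"
BASE_PATH = "/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings"


def keybinding_commands(name: str, command: str, binding: str,
                        slot: str = "custom0") -> list[list[str]]:
    """Build the gsettings calls for one custom shortcut (values must not contain ')."""
    path = f"{BASE_PATH}/{slot}/"
    entry = f"{SCHEMA}.custom-keybinding:{path}"
    return [
        # Registers the slot; this overwrites the existing list of custom shortcuts.
        ["gsettings", "set", SCHEMA, "custom-keybindings", f"['{path}']"],
        ["gsettings", "set", entry, "name", f"'{name}'"],
        ["gsettings", "set", entry, "command", f"'{command}'"],
        ["gsettings", "set", entry, "binding", f"'{binding}'"],
    ]


if __name__ == "__main__":
    # The path is the placeholder from the text above; replace it with the real one.
    commands = keybinding_commands(
        name="Audio-To-Text",
        command="uv run /path/to/Audio-To-Text/src/main.py",
        binding="<Control><Alt>t",
    )
    for cmd in commands:
        print(shlex.join(cmd))
```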

- src/main.py: Main script for toggling recording states and orchestration.
- src/transcribe.py: Handles communication with the ChatGPT API.
- src/beautify.py: Logic for text beautification and punctuation correction.
- src/record.py: Underlying recording implementation.
- output/: Directory for temporary audio files and final transcription results.