prithviyadav/LipRead

Lip Reader: End-to-End Visual Speech Recognition Model

Authors

Prithvi Yadav

Introduction

This repository contains an end-to-end visual speech recognition model, the successor to End-to-End Audio-Visual Speech Recognition with Conformers. With it, you can reach word error rates (WER) of 19.1%, 1.0%, and 0.9% on LRS3 for visual, automatic (audio-only), and audio-visual speech recognition (VSR, ASR, and AV-ASR), respectively.
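For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between the reference transcript and the predicted transcript, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name `word_error_rate` is hypothetical, and the repository's own scoring code may normalise text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row 0 of the edit-distance table: turning an empty reference
    # into hyp[:j] takes j insertions.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        # Column 0: turning ref[:i] into an empty hypothesis takes i deletions.
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.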

Preparation

  1. Clone the repository and enter it locally:
git clone https://github.com/prithviyadav/LipRead.git
cd LipRead
  2. Set up the environment:
conda create -y -n autoavsr python=3.8
conda activate autoavsr
  3. Install PyTorch, torchvision, and torchaudio by following the official installation instructions, then install the remaining packages:
pip install -r requirements.txt
conda install -c conda-forge ffmpeg
  4. Download a pre-trained model and/or language model from the model zoo and extract it to:
  • ./benchmarks/${dataset}/models

  • ./benchmarks/${dataset}/language_models

  5. [For VSR and AV-ASR] Install the RetinaFace or MediaPipe tracker.
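The directory layout from the download step can be prepared ahead of time. The following is a minimal sketch, assuming the dataset directory is named `LRS3` (the spelling is an assumption; use whatever name the model zoo specifies). The helper names `benchmark_dirs` and `ensure_benchmark_dirs` are hypothetical and not part of the repository.

```python
from pathlib import Path


def benchmark_dirs(repo_root, dataset):
    """Return the two target directories named in the Preparation steps."""
    root = Path(repo_root) / "benchmarks" / dataset
    return {
        "models": root / "models",
        "language_models": root / "language_models",
    }


def ensure_benchmark_dirs(repo_root, dataset):
    """Create the target directories if they do not exist yet."""
    dirs = benchmark_dirs(repo_root, dataset)
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs


# Example, run from the repository root ("LRS3" is an assumed dataset name):
# ensure_benchmark_dirs(".", "LRS3")
```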

Benchmark evaluation

python eval.py config_filename=[config_filename] \
               labels_filename=[labels_filename] \
               data_dir=[data_dir] \
               landmarks_dir=[landmarks_dir]
  • [config_filename] is the model configuration path, located in ./configs.

  • [labels_filename] is the labels path, located in ${lipreading_root}/benchmarks/${dataset}/labels.

  • [data_dir] and [landmarks_dir] are the directories containing the original dataset and its corresponding landmarks.

  • gpu_idx=-1 can be added to switch from cuda:0 to cpu.
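When evaluation is driven from another script, the `key=value` arguments above can be assembled programmatically. This is a sketch only: `build_eval_command` is a hypothetical helper, and the path values in the usage comment are placeholders, not files shipped with the repository.

```python
import sys


def build_eval_command(config_filename, labels_filename, data_dir,
                       landmarks_dir, use_cpu=False):
    """Assemble the eval.py invocation as an argument list for subprocess."""
    cmd = [
        sys.executable, "eval.py",
        f"config_filename={config_filename}",
        f"labels_filename={labels_filename}",
        f"data_dir={data_dir}",
        f"landmarks_dir={landmarks_dir}",
    ]
    if use_cpu:
        # gpu_idx=-1 switches from cuda:0 to cpu, as described above.
        cmd.append("gpu_idx=-1")
    return cmd


# Usage from the repository root (all paths are placeholders):
# import subprocess
# subprocess.run(build_eval_command("configs/<config>", "<labels>",
#                                   "<data_dir>", "<landmarks_dir>",
#                                   use_cpu=True), check=True)
```

Passing an argument list instead of a single shell string avoids quoting problems when paths contain spaces.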

Speech prediction

python infer.py config_filename=[config_filename] data_filename=[data_filename]
  • [data_filename] is the path to the audio/video file.

  • detector=mediapipe can be added to switch from the RetinaFace tracker to the MediaPipe tracker.
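One way to decide whether to pass `detector=mediapipe` is to check whether the MediaPipe Python package is importable. The sketch below assumes the package's import name is `mediapipe`. Both helper names are hypothetical, and preferring MediaPipe whenever it is installed is only one possible policy.

```python
import importlib.util
import sys


def detector_overrides(module_name="mediapipe"):
    """Return ['detector=mediapipe'] if the module is importable, else []."""
    if importlib.util.find_spec(module_name) is not None:
        return ["detector=mediapipe"]
    # Leave the detector option out so infer.py keeps its default tracker.
    return []


def build_infer_command(config_filename, data_filename, module_name="mediapipe"):
    """Assemble the infer.py invocation as an argument list for subprocess."""
    return [
        sys.executable, "infer.py",
        f"config_filename={config_filename}",
        f"data_filename={data_filename}",
        *detector_overrides(module_name),
    ]
```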

Mouth ROIs cropping

python crop_mouth.py data_filename=[data_filename] dst_filename=[dst_filename]
  • [dst_filename] is the path where the cropped mouth will be saved.
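To crop a whole folder of clips, each input needs a matching `dst_filename`. The sketch below mirrors the source tree under a destination directory. The helper names and the set of video suffixes are assumptions, as is the idea that `crop_mouth.py` accepts an output path in a nested directory (it may need the parent directory created first).

```python
import sys
from pathlib import Path

# Assumed set of video extensions; adjust to match your data.
VIDEO_SUFFIXES = {".mp4", ".avi", ".mov"}


def plan_crops(src_dir, dst_dir):
    """Pair every video under src_dir with a mirrored path under dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    plan = []
    for src in sorted(src_dir.rglob("*")):
        if src.is_file() and src.suffix.lower() in VIDEO_SUFFIXES:
            plan.append((src, dst_dir / src.relative_to(src_dir)))
    return plan


def crop_command(src, dst):
    """Assemble the crop_mouth.py invocation for one (src, dst) pair."""
    return [
        sys.executable, "crop_mouth.py",
        f"data_filename={src}",
        f"dst_filename={dst}",
    ]


# Usage from the repository root (directories are placeholders):
# import subprocess
# for src, dst in plan_crops("<videos>", "<mouth_rois>"):
#     dst.parent.mkdir(parents=True, exist_ok=True)
#     subprocess.run(crop_command(src, dst), check=True)
```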

Overview

We support a number of datasets for speech recognition:
