📘Introduction | 🛠️Preparation | 📊Benchmark | 🔮Inference | 🐯Overview |
Prithvi Yadav
This repository hosts the End-to-End Visual Speech Recognition model, the successor to End-to-End Audio-Visual Speech Recognition with Conformers. With this repository you can reach 19.1%, 1.0%, and 0.9% WER for automatic, visual, and audio-visual speech recognition (ASR, VSR, and AV-ASR) on LRS3.
- Clone the repository and enter it:

```shell
git clone https://github.com/prithviyadav/LipRead.git
cd LipRead
```

- Set up the environment:

```shell
conda create -y -n autoavsr python=3.8
conda activate autoavsr
```

- Install pytorch, torchvision, and torchaudio by following the instructions here, then install the remaining packages:

```shell
pip install -r requirements.txt
conda install -c conda-forge ffmpeg
```

- Download and extract a pre-trained model and/or language model from the model zoo to:

  - `./benchmarks/${dataset}/models`
  - `./benchmarks/${dataset}/language_models`
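As a sketch of that layout (the `lrs3` dataset name below is an assumption used only for illustration), creating the target directories looks like:

```shell
# Sketch of the expected model-zoo layout; `lrs3` is an assumed dataset name.
dataset=lrs3
mkdir -p "./benchmarks/${dataset}/models" "./benchmarks/${dataset}/language_models"
# Downloaded checkpoints are then extracted into these two directories.
ls -d "./benchmarks/${dataset}/models" "./benchmarks/${dataset}/language_models"
```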
- [For VSR and AV-ASR] Install RetinaFace or MediaPipe tracker.
To evaluate a trained model on a benchmark, run:

```shell
python eval.py config_filename=[config_filename] \
               labels_filename=[labels_filename] \
               data_dir=[data_dir] \
               landmarks_dir=[landmarks_dir]
```

- `[config_filename]` is the model configuration path, located in `./configs`.
- `[labels_filename]` is the labels path, located in `${lipreading_root}/benchmarks/${dataset}/labels`.
- `[data_dir]` and `[landmarks_dir]` are the directories for the original dataset and the corresponding landmarks.
- `gpu_idx=-1` can be added to switch from `cuda:0` to `cpu`.
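For illustration, a filled-in evaluation command might look like the sketch below. Every filename and directory is a placeholder (none of them ship with this snippet), so the command is composed and printed rather than executed:

```shell
# Hypothetical example; all file and directory names are placeholders.
cmd="python eval.py config_filename=./configs/[config_filename] \
labels_filename=./benchmarks/[dataset]/labels/[labels_filename] \
data_dir=/path/to/dataset \
landmarks_dir=/path/to/landmarks"
echo "$cmd"
```

Run the printed command once the dataset and landmarks are in place.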
To run inference on a single file:

```shell
python infer.py config_filename=[config_filename] data_filename=[data_filename]
```

- `[data_filename]` is the path to the audio/video file.
- `detector=mediapipe` can be added to switch from the RetinaFace tracker to the MediaPipe tracker.
To crop the mouth region from a video:

```shell
python crop_mouth.py data_filename=[data_filename] dst_filename=[dst_filename]
```

- `[dst_filename]` is the path where the cropped mouth video will be saved.
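Putting the two utilities together, a hypothetical crop-then-transcribe run could look like the sketch below; the clip names are placeholders, so the commands are printed here rather than executed:

```shell
# Hypothetical two-step pipeline; clip.mp4 and clip_mouth.mp4 are placeholder names.
crop_cmd="python crop_mouth.py data_filename=clip.mp4 dst_filename=clip_mouth.mp4"
infer_cmd="python infer.py config_filename=[config_filename] data_filename=clip_mouth.mp4"
printf '%s\n' "$crop_cmd" "$infer_cmd"
```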
We support a number of datasets for speech recognition: