
Video-processing-SLR

A comparative study of Graph Convolutional Networks (GCNs) against traditional neural networks for multimodal (video) data.

Our motivation for applying Graph Convolutional Networks to the WLASL and LSA64 datasets is their ability to capture both spatial and temporal relationships in sign language data. Their versatility in handling diverse data types, their capacity to aggregate information across multiple video frames, and their adaptability across domains make GCNs a compelling choice for sign language recognition, promising better accuracy and robustness on intricate, dynamic gestures.

Furthermore, by complementing GCNs with Recurrent Neural Networks (RNNs) for modeling temporal sequences and Convolutional Neural Networks (CNNs) for extracting spatial features, we can gain a deeper understanding of both dynamic and static elements within sign language, contributing to more precise and reliable recognition systems.

In summary, the fusion of GCNs, RNNs, and CNNs presents a promising path for advancing sign language recognition. This approach not only has the potential to improve results on the WLASL and LSA64 datasets but also has wide-reaching applications in sign language communication and accessibility.
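
As a concrete illustration of this combination, below is a minimal PyTorch sketch of a skeleton-based recognizer: a graph convolution mixes keypoint features within each frame, and a GRU models the sequence of frames. This is only an illustrative sketch, not the code in this repository; the layer sizes, the GRU head, and the adjacency normalization are all assumptions.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: mix each joint's features with its
    neighbors' via a normalized adjacency matrix, then project."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        self.register_buffer("adj", adj)        # (K, K), e.g. D^-1 (A + I)
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                       # x: (B, T, K, in_dim)
        return torch.relu(self.proj(self.adj @ x))

class GCNGRURecognizer(nn.Module):
    """Spatial GCN over skeleton joints per frame, GRU over time."""
    def __init__(self, adj, num_classes, in_dim=2, hidden=64):
        super().__init__()
        self.gcn = GCNLayer(in_dim, hidden, adj)
        self.gru = nn.GRU(adj.shape[0] * hidden, 128, batch_first=True)
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):                       # x: (B, T, K, 2) keypoints
        h = self.gcn(x)                         # (B, T, K, hidden)
        h = h.flatten(2)                        # (B, T, K * hidden)
        _, last = self.gru(h)                   # last: (1, B, 128)
        return self.head(last[-1])              # (B, num_classes) logits
```

In the fusion described above, a CNN could supply the per-joint input features in place of the raw (x, y) coordinates, adding appearance information to the skeleton graph.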

Guide to our project

For the datasets, refer to the websites below:

https://github.com/dxli94/WLASL

https://facundoq.github.io/datasets/lsa64/
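
For WLASL, the annotations ship as a single JSON file (WLASL_v0.3.json in the repository linked above) that maps each gloss to its video instances. Below is a minimal loading sketch, assuming that gloss/instances layout; the field names should be checked against the version you download.

```python
import json

# Collect (video_id, gloss) pairs for the training split.
# Field names assume the WLASL_v0.3.json layout.
with open("WLASL_v0.3.json") as f:
    entries = json.load(f)

train_samples = [
    (inst["video_id"], entry["gloss"])
    for entry in entries
    for inst in entry["instances"]
    if inst["split"] == "train"
]
print(f"{len(entries)} glosses, {len(train_samples)} training clips")
```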

The GCN models that use keypoints and frames are both in the same file, "GCN with manually selected keypoints". All other model names are self-explanatory.
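
If you need to regenerate keypoint inputs for the keypoint-based models, one common approach (an assumption here, not necessarily what this repository does) is to run a pose estimator such as MediaPipe Holistic over each frame and keep only a hand-picked subset of landmarks:

```python
import cv2
import mediapipe as mp
import numpy as np

def extract_keypoints(video_path, indices):
    """Return an array of shape (T, len(indices), 2) with the (x, y)
    coordinates of the selected pose landmarks in each video frame."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    with mp.solutions.holistic.Holistic(static_image_mode=False) as holistic:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            result = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.pose_landmarks is None:
                continue  # skip frames where no person was detected
            pts = np.array([(lm.x, lm.y)
                            for lm in result.pose_landmarks.landmark])
            frames.append(pts[indices])
    cap.release()
    return np.stack(frames)
```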
