A comparative study of Graph Convolutional Networks and traditional Neural Networks for multimodal (video) data.
Our motivation for utilizing Graph Convolutional Networks in the analysis of the WLASL and LSA64 datasets is grounded in their ability to capture both spatial and temporal relationships within sign language data. Their versatility in handling diverse data types, capacity to process information from multiple video frames, and adaptability across various domains make GCNs a compelling choice for sign language recognition systems. These networks hold significant promise for improving accuracy and robustness, especially when dealing with intricate and dynamic sign language gestures.
Furthermore, by complementing GCNs with Recurrent Neural Networks (RNNs) for modeling temporal sequences and Convolutional Neural Networks (CNNs) for extracting spatial features, we can gain a deeper understanding of both dynamic and static elements within sign language, contributing to more precise and reliable recognition systems.
In summary, the fusion of GCNs, RNNs, and CNNs presents a promising path for advancing sign language recognition. This approach not only has the potential to improve results on the WLASL and LSA64 datasets but also has wide-reaching applications in sign language communication and accessibility.
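The sketch below is an illustrative (not the repository's actual) implementation of the GCN + RNN idea described above: each video frame is treated as a graph of pose/hand keypoints, a graph convolution mixes spatial information within a frame, and a GRU models the temporal sequence of frames. The keypoint count, hidden sizes, class count, and random adjacency are placeholder assumptions.

```python
import torch
import torch.nn as nn


class GraphConv(nn.Module):
    """One graph convolution: H' = ReLU(A_hat @ H @ W)."""

    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        # Symmetrically normalized adjacency with self-loops (Kipf & Welling style).
        a = adjacency + torch.eye(adjacency.size(0))
        d_inv_sqrt = a.sum(dim=1).pow(-0.5)
        self.register_buffer("a_hat", d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :])

    def forward(self, x):                        # x: (batch, frames, keypoints, features)
        return torch.relu(self.a_hat @ self.linear(x))


class KeypointGCNRecognizer(nn.Module):
    """GCN over keypoints per frame, then a GRU over frames, then a classifier."""

    def __init__(self, adjacency, num_classes, in_dim=2, gcn_dim=64, rnn_dim=128):
        super().__init__()
        self.gcn = GraphConv(in_dim, gcn_dim, adjacency)
        self.rnn = nn.GRU(gcn_dim * adjacency.size(0), rnn_dim, batch_first=True)
        self.head = nn.Linear(rnn_dim, num_classes)

    def forward(self, keypoints):                # keypoints: (batch, frames, keypoints, 2)
        b, t, k, _ = keypoints.shape
        h = self.gcn(keypoints)                  # spatial mixing within each frame
        h = h.reshape(b, t, -1)                  # flatten keypoint features per frame
        _, last = self.rnn(h)                    # temporal modeling across frames
        return self.head(last[-1])               # class logits per video


# Illustrative usage with random data: 27 keypoints, 40 frames, 64 sign classes.
if __name__ == "__main__":
    adjacency = (torch.rand(27, 27) > 0.8).float()
    adjacency = ((adjacency + adjacency.T) > 0).float()      # make it symmetric
    model = KeypointGCNRecognizer(adjacency, num_classes=64)
    clips = torch.rand(8, 40, 27, 2)                         # (batch, T, K, xy)
    print(model(clips).shape)                                # torch.Size([8, 64])
```

In practice the adjacency would encode the skeletal connectivity of the extracted keypoints rather than a random graph, and a CNN branch over raw frames could be concatenated with the GCN features before the recurrent layer.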
For the datasets, refer to the websites below:
https://github.com/dxli94/WLASL
https://facundoq.github.io/datasets/lsa64/
The GCN models that use keypoints and the models that use frames are both in the same file, "GCN with manually selected keypoints". All model names are self-explanatory.