Article
Version 1
Preserved in Portico. This version is not peer-reviewed.
Lips Reading Using 3D Convolution and LSTM
Received: 12 December 2023 / Approved: 13 December 2023 / Online: 13 December 2023 (03:24:11 CET)
How to cite: Inamdar, R.; sundarr, K.; Khandelwal, D.; KB, A. Lips Reading Using 3D Convolution and LSTM. Preprints 2023, 2023120928. https://doi.org/10.20944/preprints202312.0928.v1
Abstract
This paper presents an approach to lipreading, delivered as a web application that generates subtitles for videos in which the speaker's mouth is visible. It opens with a literature review of the lipreading methods used over the past decade. Our method uses a deep learning model that combines a 3D convolutional network with a bidirectional LSTM to make sentence-level predictions from visual lip movements alone. The model is trained on pre-segmented lip regions, converted to animated GIFs for pre-training, and reaches an accuracy of 97%. The work offers a practical and accurate solution for real-world lipreading applications.
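To make the data flow through such a model concrete, the sketch below traces tensor shapes through a stack of 3D convolutions and spatial pooling layers, followed by a bidirectional LSTM. It is only an illustration under stated assumptions: the clip size (75 frames of 46x140 pixels), the three conv/pool stages, the channel count, and the LSTM hidden size are hypothetical values chosen for the example, not figures taken from the paper.

```python
def conv3d_out_shape(shape, kernel, stride=(1, 1, 1), padding=(0, 0, 0)):
    """Return the (frames, height, width) produced by a 3D conv or pooling window."""
    return tuple(
        (size + 2 * pad - k) // st + 1
        for size, k, st, pad in zip(shape, kernel, stride, padding)
    )


def trace_shapes(frames, height, width, stages=3, channels=64, lstm_hidden=128):
    """Trace shapes through `stages` conv+pool blocks and a bidirectional LSTM.

    All layer sizes here are assumptions made for illustration only.
    """
    shape = (frames, height, width)
    for _ in range(stages):
        # 3x3x3 convolution with padding 1 keeps the shape unchanged.
        shape = conv3d_out_shape(shape, kernel=(3, 3, 3), padding=(1, 1, 1))
        # Pooling over space only, so the number of frames is preserved.
        shape = conv3d_out_shape(shape, kernel=(1, 2, 2), stride=(1, 2, 2))
    t, h, w = shape
    # Each frame is flattened into one feature vector for the sequence model.
    features_per_frame = channels * h * w
    # A bidirectional LSTM concatenates forward and backward hidden states.
    bilstm_width = 2 * lstm_hidden
    return shape, features_per_frame, bilstm_width


if __name__ == "__main__":
    final_shape, feats, width = trace_shapes(75, 46, 140)
    print("after conv/pool stack (T, H, W):", final_shape)
    print("features per frame fed to the LSTM:", feats)
    print("BiLSTM output width per frame:", width)
```

Pooling only over the spatial axes keeps one feature vector per video frame, which is what lets the recurrent layer model the sequence of lip shapes over time.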
Keywords
deep learning; computer vision; 3D convolution; LSTM; lip reading
Subject
Computer Science and Mathematics, Computer Vision and Graphics
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.