Visual Lip Reading Dataset in Turkish

Ali Berkol; Talya Tümer-Sivri; Melike Colak; Nergis Pervan-Akman; Hamit Erdem

doi:10.20944/preprints202212.0118.v1

Submitted: 07 December 2022

Posted: 07 December 2022


Abstract

The proposed dataset was compiled from everyday Turkish words and phrases pronounced by various people in videos posted on YouTube. The purpose of the dataset is to enable detection of the spoken word by recognizing patterns in, or classifying, lip movements with supervised, unsupervised, and semi-supervised machine learning algorithms. Most lip reading datasets consist of people recorded on camera against fixed backgrounds under uniform conditions, whereas the dataset presented here consists of images suited to machine learning models developed for real-life challenges. It contains a total of 2335 instances taken from TV series, movies, vlogs, and song clips on YouTube. The images vary with factors such as the way people pronounce words, accent, speaking rate, gender, and age. Furthermore, the instances come from videos with varying camera angles, shadows, resolutions, and brightness levels, none of which were created manually. The most important contribution of our lip reading dataset is that it adds to the pool of non-synthetic Turkish datasets, which currently offers little variety. This dataset can support machine learning studies in many areas, such as the defense industry and social life.

Keywords: Lip reading; Visual speech recognition; Turkish dataset; Face parts detection

Subject: Computer Science and Mathematics - Information Systems

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
