
Lincoln School of Computer Science
University of Lincoln
Brayford Pool
Web Enquiries Tel + 44 (0)1522 882000
Minicom 01522 886055

Arabic Speech Recognition

This project investigated the feasibility of phoneme-based, large-vocabulary Arabic speech recognition (ASR). A recognizer was developed that combines a Hidden Markov Model (HMM) with a neural network (NN). During the project, a new database, an Arabic speech corpus, was also developed.

The project produced two published papers and a database that can be downloaded and used.

The abstracts of these two papers are given below:

The abstract of the first paper is:

This paper describes the creation of a new Arabic speech corpus (ASC) for Large Vocabulary Continuous Speech Recognition (LVCSR) technology. It describes the steps of creation and the process of recording the database used for the ASC. The ASC is designed to be comparable to corpora of other natural languages. The corpus contains 4740 utterances from six speakers (three male and three female). For each speaker, there are 620 statements for training and 171 statements for testing and evaluation. There are 3622 words and 27725 triphones, of which 5034 are unique.

The abstract of the second paper is:

One way to maintain decent recognition results as the vocabulary grows is to use base units rather than words. This paper presents a Large Vocabulary Continuous Speech Recognition system for Arabic that is based on triphones. In order to train and test the system, a dictionary and a 39-dimensional Mel Frequency Cepstrum Coefficient (MFCC) feature vector were computed. The computations involve a Hamming window, Fourier transformation, Average Spectral Value (ASV), the logarithm of the ASV, normalized energy, and the first- and second-order time derivatives of the 13 coefficients. A combination of a Hidden Markov Model and a neural network approach was used to model the basic temporal nature of the speech signal. The recognizer was tested with 7841 triphones. With 13 coefficients, the accuracy level is 58%. With 39 coefficients, it is 62%. With Cepstrum Mean Normalization, it is 71%. Given the small amount of available data (only 620 sentences), these results are very encouraging.
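As an illustration of how 13 static coefficients can become a 39-dimensional vector, the sketch below appends first- and second-order time derivatives and then subtracts the per-utterance mean. It is not the code used in the paper. The regression-style derivative formula, the window of two frames on each side, the function names (`deltas`, `stack_39`, `cmn`) and the random stand-in data are all assumptions made for this example.

```python
import numpy as np


def deltas(feats, width=2):
    """Regression-based time derivative over +/- `width` frames.

    `feats` has shape (n_frames, n_coeffs). Edge frames are repeated so the
    output has the same shape as the input. The formula is an assumption of
    this sketch; the paper does not state which one it used.
    """
    n_frames = feats.shape[0]
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = np.zeros(feats.shape, dtype=float)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + n_frames]
        behind = padded[width - k : width - k + n_frames]
        out += k * (ahead - behind)
    return out / denom


def stack_39(static13):
    """Concatenate static, first-derivative and second-derivative features."""
    d1 = deltas(static13)
    d2 = deltas(d1)
    return np.hstack([static13, d1, d2])


def cmn(feats):
    """Mean normalization: subtract each coefficient's mean over the utterance."""
    return feats - feats.mean(axis=0, keepdims=True)


# Stand-in for real cepstral coefficients: 100 frames of 13 random values.
rng = np.random.default_rng(0)
static13 = rng.normal(size=(100, 13))

features = cmn(stack_39(static13))
print(features.shape)
```

With real data, `static13` would come from the front end described in the abstract (windowing, Fourier transformation, log spectral values, energy); only the stacking and normalization steps are shown here.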

Contact us

Dr. Bashir Al-Diri (PhD., MSc., BSc., FHEA)
Lincoln School of Computer Science
University of Lincoln
Brayford Pool
Lincoln LN6 7TS
United Kingdom
Phone: +44 1522 837111
Fax: +44 1522 886974