APPEARANCE FEATURE EXTRACTION VERSUS IMAGE TRANSFORM-BASED APPROACH FOR VISUAL SPEECH RECOGNITION


International Journal of Computational Intelligence and Applications, Vol. 6, No. 1 (2006) 101-122
© Imperial College Press

ALAA SAGHEER
Department of Intelligent Systems, Kyushu University, 6-1 Kasuga-Koen, Kasuga, Fukuoka 816-8580, Japan
alaa@limu.is.kyushu-u.ac.jp

NAOYUKI TSURUTA
Department of Electronics Engineering and Computer Science, Fukuoka University, 8-9-1 Nanakuma, Jonan-ku, Fukuoka 814-0180, Japan
tsuruta@tl.media.fukuoka-u.ac.jp

RIN-ICHIRO TANIGUCHI
Department of Intelligent Systems, Kyushu University, 6-1 Kasuga-Koen, Kasuga, Fukuoka 816-8580, Japan
rin@limu.is.kyushu-u.ac.jp

SAKASHI MAEDA
Department of Electronics Engineering and Computer Science, Fukuoka University, 8-9-1 Nanakuma, Jonan-ku, Fukuoka 814-0180, Japan
maeda@tl.media.fukuoka-u.ac.jp

Received 10 September 2005
Revised 15 February 2006

In this paper we propose a new appearance-based system consisting of two stages: visual speech feature extraction and classification, followed by recognition of the extracted features, so that the result is a complete lip-reading system. The system employs our Hyper Column Model (HCM) approach to extract and classify the visual features and uses the Hidden Markov Model (HMM) for recognition. This paper mainly addresses the first stage, i.e. feature extraction and classification. We investigate the performance of HCM for feature extraction and classification and then compare it with the performance obtained when HCM is replaced by the Fast Discrete Cosine Transform (FDCT). Unlike FDCT, HCM extracts the entire feature set without any loss. The experiments also show that HCM is generally better than FDCT and provides a good distribution of the phonemes in the feature space for recognition purposes. For a fair comparison, two databases are used, with three different resolution sets for each database. One of the two databases is designed to include shifted and scaled objects. Experiments reveal that HCM can recover from and deal with such image restrictions, whereas the effectiveness of FDCT drops drastically, especially for new subjects.

Keywords: Visual speech recognition; feature extraction; self-organizing map; hyper column model; discrete cosine transform.

1. Introduction

Recently, visual speech recognition (or automatic lip-reading) systems have been finding their way into many application areas, such as speaker verification, multimedia telephony for the hearing impaired, and interaction with terminals and machines for the handicapped and the elderly in home health-care systems. In principle, the visual speech recognition problem comprises two stages: (1) visual speech feature extraction and classification, and (2) visual speech feature recognition. In other words, the pattern (word or sentence) to be recognized is first converted into a set of features, believed to carry the class identity of the pattern, and this set of features is then classified as one of the possible classes. Although significant advances have been made in visual speech recognition technology, it is still difficult to design a speech recognition system that generalizes well without loss of features and without image/subject restrictions.1,2 In our opinion, this is due to the large appearance variability during lip movements. In addition, differences between the appearance of the subjects, lip sizes, face features and illumination conditions cause extra difficulty.3

This paper is concerned with the first task: feature extraction and classification. Different approaches for performing this task have been reported in the literature. They can be broadly classified into three main categories:

1. Geometric-feature-based.
2. Image-transform-based.
3. Appearance-based.

The geometric-feature-based approach obtains information from geometric features of the lip, such as its height, width, color or shape, or all of them.4,5 In the image-transform-based approach, the original gray-level image containing the lip is transformed into a feature space by some image transform technique.6,7 The appearance-based approach learns the decision boundary among different articulations from training data, without any extraction of geometric features; here the features depend on the intensity values of the image pixels that contain the lip.8 The approach presented in this paper for extracting the visual features falls into the third category.

Due to the data reduction involved in the first and second categories, a considerable amount of feature-related information is lost, which may affect recognition accuracy and result in relatively poor performance.9 In contrast, the last category uses all of the available information about the object, as will be explained shortly, and so gives better recognition accuracy. Another advantage of this approach is that important features can be represented in a low-dimensional space and can often be made invariant to image transformations such as translation, scaling, rotation and lighting, where the second approach fails.7,10 The only disadvantage of the third category is that it needs a large amount of training data so that the system can faithfully extract the features from arbitrary input data.
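To make the image-transform idea (category 2) and the data reduction it entails concrete, the sketch below computes a 2-D DCT of a grayscale lip region and retains only a small block of low-frequency coefficients as the feature vector. This is an illustrative sketch, not the FDCT implementation evaluated later in this paper; it assumes SciPy is available and that the lip ROI has already been located, and the function name and the choice of an 8 × 8 coefficient block are ours.

```python
import numpy as np
from scipy.fft import dctn

def dct_features(roi, keep=8):
    """Illustrative image-transform feature extraction.

    roi  : 2-D grayscale lip region, e.g. a 128 x 128 pixel array.
    keep : side length of the retained low-frequency coefficient block.
    """
    coeffs = dctn(roi.astype(np.float64), norm="ortho")  # 2-D type-II DCT
    return coeffs[:keep, :keep].flatten()                # keep*keep features

# Example with a random stand-in for a cropped 128 x 128 lip image.
roi = np.random.rand(128, 128)
features = dct_features(roi)   # 64-dimensional feature vector
```

Discarding the high-frequency coefficients is precisely the data reduction referred to above; the appearance-based approach studied in this paper avoids it by working on the pixel intensities directly.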

Much effort has been put into proposing lip-reading systems that combine two or all of the above categories, in order to trade off the disadvantages of each individual approach.9

It follows, in general, from a variety of contributions reported in the literature that the performance of the appearance-based approach is better than that of the geometric-based approach.9,11-13 The essential target of this paper is to show that the performance of the appearance-based approach is also better than that of the image-transform-based approach. Additionally, during the development of our system, we focus on four further issues:

1. What is the appropriate set of visual units (or features) around the mouth for representing the visual information?
2. The system should extract the entire set of features without reduction.
3. The system should maintain a parametric feature space of low dimensionality, such that the distribution of each phoneme is simple and can be approximated by normal distributions.
4. How well does the system generalize, and how does it perform if the subject is shifted or scaled?

We believe that these four issues represent fundamental requirements for any visual speech recognition system, and the system proposed in this paper tries to satisfy them. To evaluate our system without bias, we conducted several experiments in which HCM17 was replaced by two different feature extraction approaches: the Self-Organizing Map18 (SOM) and the Fast Discrete Cosine Transform19 (FDCT). In separate experiments, we combined the Hidden Markov Model20 (HMM), as a feature recognizer, with each of the three approaches (an illustrative sketch of such a feature-extractor-plus-HMM pipeline is given at the end of this section). All experiments for each combination were conducted under the same conditions and on the same databases.

1.1. Related works

Deaf and hearing-impaired people can understand speech merely by reading the speaker's lips, without any acoustic information. Motivated by this ability, the problem of automatic lip-reading has been studied and a lot of work has been established in this field. Recently, with the development of computers, there has been much research on enabling computers to perform the components of lip-reading using several approaches. Luettin11,12 used HMM-based active shape models to extract an active speech feature set that includes derivative information, and compared its performance with that of a static feature set. Matthews13,14 compared three image-transform-based methods with the active appearance model (AAM) for extracting features from lip image sequences for recognition with HMMs; he utilized the DCT, the wavelet transform (WT) and principal component analysis (PCA) as the image-transform-based methods. Heckmann7 investigated different tactics for choosing the DCT coefficients to enhance feature extraction. Using an asymmetrically boosted HMM, Yin et al.15 developed an automatic visual speech feature extraction method to deal with their ill-posed multiclass sample distribution problem. Guitarte et al.16 compared the Active Shape Model (ASM) and the DCT for the feature extraction task in an embedded implementation. Hazen8 investigated several visual model structures, each of which provides a different means of defining the units of the visual classifier and the synchrony constraints between the audio and visual streams.
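As a concrete, hypothetical illustration of the two-stage pipeline described earlier in this section (feature extraction and classification followed by HMM recognition), the sketch below pairs a plain SOM, the simpler of the two appearance-based extractors compared in this paper, with one Gaussian HMM per sentence. It is not the authors' HCM/HMM implementation: the MiniSom and hmmlearn packages, the grid size, the number of HMM states and the use of best-matching-unit coordinates as per-frame features are all our assumptions.

```python
import numpy as np
from minisom import MiniSom          # pip install minisom
from hmmlearn import hmm             # pip install hmmlearn

# Stage 1: appearance-based feature extraction with a plain SOM.
# Each frame is a flattened grayscale lip image; the SOM maps it to the
# (row, col) coordinates of its best-matching unit, i.e. a 2-D feature.
def train_som(frames, grid=8):
    som = MiniSom(grid, grid, frames.shape[1], sigma=1.0, learning_rate=0.5)
    som.train_random(frames, num_iteration=5000)
    return som

def som_features(som, frames):
    return np.array([som.winner(f) for f in frames], dtype=float)

# Stage 2: recognition with one Gaussian HMM per sentence class.
def train_hmms(sequences_per_class):
    """sequences_per_class maps a sentence label to a list of feature sequences."""
    models = {}
    for label, seqs in sequences_per_class.items():
        X = np.vstack(seqs)                  # stack all training sequences
        lengths = [len(s) for s in seqs]     # per-sequence frame counts
        m = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=20)
        m.fit(X, lengths)
        models[label] = m
    return models

def recognize(models, seq):
    # The sentence whose HMM gives the highest log-likelihood wins.
    return max(models, key=lambda label: models[label].score(seq))
```

Mapping each frame to its best-matching unit gives a very low-dimensional per-frame feature in the appearance-based spirit; HCM, elaborated in Sec. 4, is the model this paper actually uses and is designed to cope with the shifted and scaled subjects described in Sec. 2.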

The rest of this paper is arranged as follows. The two databases employed in our experiments are described in Sec. 2. Section 3 gives an overview of SOM. HCM is elaborated in Sec. 4. Feature recognition by HMM is described in Sec. 5. Section 6 provides an overview of FDCT together with its recognition results. Experimental results and a comparison among the three systems are presented in Sec. 7, together with an analysis of the results. A discussion of the paper's results is given in Sec. 8. Future work and the conclusion are given in Sec. 9.

2. Database

One of the most challenging problems in the visual speech recognition domain is coping with the large variation across speakers in individual appearance and features, where lip sizes vary greatly between speakers. To accommodate this challenge, we designed our database according to a speaker-independent rule, using different speakers during the training and testing phases. This rule enables us to investigate how well the proposed system generalizes to new speakers. Our database consists of two different sets covering two different languages: Japanese and Arabic.

2.1. Sentences database

Both databases include nine sentences; each sentence consists of two words in the Japanese set and, with one exception, of three words in the Arabic set. Table 1 lists the Japanese and Arabic sentences along with their respective English meanings.

Table 1. Sentence database, Japanese (left) and Arabic (right).

  Japanese sentence        English meaning
  1. ATAMA ITAI            A headache in head
  2. SENAKA ITAI           A pain in back
  3. ONAKA SUITA           Feel hungry
  4. MUNE ITAI             A pain in chest
  5. TEACHI ITAI           A pain in limbs
  6. ATAMA OMOI            Heavy head
  7. ONAKA ITAI            A pain in stomach
  8. MUNE KURUSHI          Difficult breath
  9. TEACHI SHIBIRERU      Spasm in hand and leg

  Arabic sentence (English meaning; the Arabic text is not reproduced here)
  1. A pain in my teeth
  2. A headache in head
  3. A swelling in my back
  4. A pain in my gum
  5. The Arabic Salutation
  6. A swelling in my leg
  7. A pain in my back
  8. A swelling in my tooth
  9. A pain in my head

Each of the nine subjects (male and female) uttered all sentences once, without repetition. In order not to miss any part of the uttered sentence, the subject was requested to begin and end each sentence with silence. Each Arabic sentence consists of three words represented by 80 visual frames, whereas each Japanese sentence includes two words in 70 frames.

2.2. Image database

The Japanese database includes 5670 gray-scale images, subdivided into a training group and a test group. The training group consists of 3780 images from 6 different subjects. The test group has 1890 images from 3 Japanese subjects entirely different from those belonging to the training group. Similarly, the Arabic database includes 6480 gray-scale images, of which 4320 are reserved for the training phase and the remaining images are used for the test phase. (These counts follow directly from the per-sentence frame lengths given above; see the short check after the image-set list below.)

Images of both databases were captured in the Laboratory of Spoken Language and Image Processing, Fukuoka University, Japan, using a Sony EVI-G20 camera. Although the capturing process was performed in a natural environment, without special lighting effects, lip markers or coloring, there are some differences between the two sets.

1. Position restriction: In the Japanese set, the subject was asked to centralize his/her mouth as much as possible, as shown in Fig. 1(a). In contrast, the subject in the Arabic set was free to shift his/her mouth or scale his/her face with respect to the camera; in other words, the Arabic subject did not need to center the mouth or put it in a specific position in front of the camera. The only restriction was that the user's lips should lie inside the frame, not outside. Figure 1(b) shows samples of three different subjects, and the shifted and scaled object in each sample is easy to notice.
2. Background: In the Japanese data set the background was simple (plain), while the Arabic set uses a complex, natural environment, as shown in Figs. 1(a) and 1(b).

In order to obtain meaningful experimental results, we conduct the experiments using three different resolution sets for each database. Specifically, we use the original size and two further sizes obtained after cropping the region of interest (ROI) in the original image. The three sizes are as follows:

1. Image set 1: The resolution of each image is 160 × 120 pixels, without any cropping of the mouth area or the background, as shown in Figs. 1(a) and 1(b) for both databases.
2. Image set 2: The resolution of each image is 140 × 140 pixels and includes the ROI only, such that the rest of the image pixels, around the ROI, are white; see Fig. 1(c).
3. Image set 3: The resolution of each image is 128 × 128 pixels and includes the ROI only, such that the rest of the image pixels, around the ROI, are gray; see Fig. 1(d).
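The database sizes quoted above can be cross-checked against the per-sentence frame lengths of Sec. 2.1. The short check below is our own arithmetic, not part of the original experiments, and it assumes the Arabic set uses the same 6 training / 3 test subject split as the Japanese set.

```python
# Consistency check of the database sizes quoted in Secs. 2.1 and 2.2.
sentences = 9

# Japanese: 70 frames per sentence, 6 training subjects and 3 test subjects.
jp_train = 6 * sentences * 70          # 3780 training images
jp_test = 3 * sentences * 70           # 1890 test images
assert jp_train == 3780 and jp_test == 1890
assert jp_train + jp_test == 5670      # total Japanese images

# Arabic: 80 frames per sentence; a 6/3 subject split is assumed here.
ar_train = 6 * sentences * 80          # 4320 training images
ar_total = 9 * sentences * 80          # 6480 images in total
assert ar_train == 4320 and ar_total == 6480
print("database sizes are consistent")
```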

Fig. 1. Snapshots of different subjects for each database: (a) Japanese subjects with a plain background; (b) Arabic subjects, shifted and scaled, with a complex background; (c) 140 × 140 image set 2; (d) 128 × 128 image set 3.

The reason why we chose the latter two resolution sets is to be able to implement the fast DCT; more details are provided in Sec. 6. Also the reason that we use two colors (white a

