Facial Shape Tracking via Spatio-Temporal Cascade Shape Regression


Jing Yang, Jiankang Deng, Kaihua Zhang, Qingshan Liu (qsliu@nuist.edu.cn)
Nanjing University of Information Science and Technology, Nanjing, China

Abstract

In this paper, we develop a spatio-temporal cascade shape regression (STCSR) model for robust facial shape tracking. It differs from previous works in three aspects. First, a multi-view cascade shape regression (MCSR) model is employed to decrease the shape variance in shape regression model construction, which makes the learned regression model more robust to shape variances. Second, a time series regression (TSR) model is explored to enhance the temporal consecutiveness between adjacent frames. Finally, a novel re-initialization mechanism is adopted to effectively and accurately locate the face when it is misaligned or lost. Extensive experiments on the 300 Videos in the Wild (300-VW) dataset demonstrate the superior performance of our algorithm.

1. Introduction

Face alignment is among the most popular and well-studied problems in computer vision, with a wide range of applications such as facial attribute analysis [20], face verification [17], [28], and face recognition [31], [38], to name a few. In the past two decades, many algorithms have been proposed [6]; they can be roughly categorized as either generative or discriminative methods.

Generative methods typically optimize the shape parameters iteratively so as to best reconstruct the input image with a facial deformable model. Active Shape Models (ASMs) [10] and Active Appearance Models (AAMs) [13], [9], [21] are typical representatives of this category. In ASMs, a global shape model is constructed by applying Principal Component Analysis (PCA) to the aligned training shapes, and the appearance is then modeled locally via discriminatively learned templates.
In AAMs, the shape model has the same point-distribution form as in ASMs, while the global appearance is modeled by PCA after removing the shape variation in a canonical coordinate frame. Discriminative methods attempt to infer a face shape through a discriminative regression function that directly maps texture features to the shape. In [12], a cascaded regression method built on pose-indexed features was proposed for pose estimation with excellent performance. Cao et al. [5] combine two-level boosted regression, shape-indexed features, and a correlation-based feature selection method to make the regression more effective and efficient. Xiong et al. [32] concatenate SIFT features of each landmark as the feature and obtain the regression matrix via linear regression. In [29], a learning strategy is devised for a cascaded regression approach by considering the structure of the problem.

Although these methods have achieved much success in facial landmark localization, the problem remains unsolved when applied to facial shape tracking in real-world video, due to challenging factors such as expression, illumination, occlusion, pose, image quality, and so on. A successful facial shape tracker has at least two characteristics. On the one hand, face alignment on individual images should perform well. On the other hand, the relationship between consecutive frames should provide a solid transition. A typical line of work exploiting the relationship between consecutive frames is multi-view face tracking [8]. [11] demonstrates that a small number of view-based statistical models of appearance can represent the face from a wide range of viewing angles; the constructed model is suitable for estimating head orientation and for tracking faces through wide angle changes. In [23], S. Romdhani et al. adopt a nonlinear PCA, i.e., Kernel PCA [26], which is based on Support Vector Machines [30], for nonlinear model transformation to track profile-to-profile faces.
In [14], an online linear predictor tracker without the need for offline learning was introduced for fast simultaneous modeling and tracking. [2] proposes an incremental parallel cascade linear regression (iPar-CLR) method for face shape tracking, which

automatically tailors itself to the tracked face and becomes person-specific over time. [34] proposes the Global Supervised Descent Method (GSDM), an extension of SDM [32] that divides the search space into regions of similar gradient directions.

In this paper, we construct a spatio-temporal cascade shape regression model for robust facial shape tracking, which aims at transferring spatial-domain alignment into time-sequence alignment. A multi-view regression model is employed for robust face alignment; it greatly decreases the shape variance arising from face pose, thereby making the learned regression model more robust to shape variances. Furthermore, a time series regression model is explored for face alignment between consecutive frames, thereby enhancing the temporal consecutiveness between the alignment result in the former frame and the initialization in the latter. In addition, a novel re-initialization mechanism is adopted to effectively and accurately locate the face when it is misaligned or lost.

In summary, the main contributions are as follows: (1) We improve the cascade shape regression model by constructing a multi-view cascade shape regression, making the learned regression model more view-specific and better in generalization and robustness. (2) Our spatio-temporal cascade shape regression model is fully automatic and achieves fast speed for online facial shape tracking, even on a CPU. (3) Extensive experiments on the 300 Videos in the Wild (300-VW) dataset demonstrate the superior performance of our algorithm.

2. The proposed method

2.1. Overview

Figure 1 illustrates the proposed spatio-temporal cascade shape regression (STCSR) model for robust face shape tracking.

Figure 1. Overview of STCSR. MCSR denotes multi-view cascade shape regression. Re-initialization will be discussed in Section 2.4.

In the first frame, the JDA [7] (joint detection and alignment) face detector is utilized to initialize the system. Similarity transformation parameters (rotation, translation, and scale) are estimated from the five landmarks, and the face view (left, front, or right) is also predicted from those five landmarks. Then a multi-view cascade shape regression is employed to predict the face shape in the current frame, which will be discussed in Section 2.2. When the score of the alignment result is larger than a threshold, time series regression is performed for facial shape tracking, which will be discussed in Section 2.3. When the score of the alignment result is smaller than the threshold, a re-initialization mechanism is adopted to avoid false convergence during facial shape tracking, which will be discussed in Section 2.4.

Shape initialization from the JDA face detector and from the alignment result of the previous frame are handled under a unified framework. On images, JDA provides five facial landmarks from which the face pose is estimated. On videos, we assume that the face shape does not change abruptly between consecutive frames, so the similarity transformation parameters and the yaw angle of the t-th frame's shape can initialize the shape of the (t+1)-th frame. Based on the face pose, the algorithm selects the view-specific model and transforms the view-specific mean shape with the similarity transformation parameters.

2.2. Multi-view cascade shape regression

The main idea of the cascade shape regression model is to combine a sequence of regressors in an additive manner in order to approximate an intricate nonlinear mapping between the initial shape and the ground truth. Specifically, given a set of images {I_i} and their corresponding ground-truth shapes {S*_i}, a linear cascade shape regression model [32] is formulated as

    R^t = arg min_{R^t} Σ_i Σ_p || (S*_i − S_i^{t−1,p}) − R^t Φ(I_i, S_i^{t−1,p}) ||²,    (1)

where R^t is the linear regression matrix, which maps the shape-indexed features to the shape update, S_i^{t−1,p} stands for the intermediate shape of image I_i, t = 1, ..., T is the iteration number, Φ is the shape-indexed feature descriptor, and p counts the perturbations. Usually, the training data is augmented with multiple initializations per image, which serves as an effective method for improving the generalization capability of training.
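Each iteration of Eq. (1) is a linear least-squares fit from shape-indexed features to shape updates, applied in a cascade. The following NumPy sketch is illustrative only: the feature extractor, the data shapes, and the ridge term (mirroring the L2 regularization described in Section 3.2) are assumptions, not the authors' exact implementation.

```python
import numpy as np

def train_cascade_stage(features, shape_residuals, reg=1.0):
    """Solve one stage of Eq. (1): R = argmin ||dS - Phi R||^2 + reg*||R||^2.
    features: (N, D) shape-indexed features Phi(I, S^{t-1}).
    shape_residuals: (N, 2L) targets S* - S^{t-1}."""
    D = features.shape[1]
    # ridge-regularized normal equations
    A = features.T @ features + reg * np.eye(D)
    return np.linalg.solve(A, features.T @ shape_residuals)  # (D, 2L)

def apply_cascade(shapes, images, regressors, extract_features):
    """Run the cascade: S^t = S^{t-1} + Phi(I, S^{t-1}) R^t for each stage.
    Here features are row vectors, so the update is phi @ R."""
    for R in regressors:
        phi = np.stack([extract_features(img, s) for img, s in zip(images, shapes)])
        shapes = shapes + phi @ R
    return shapes
```

With a near-zero ridge term the stage recovers an exact linear mapping; in practice the regularization keeps the per-stage regressors stable when the feature dimension is large.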

Inspired by the subspace regression of [34], which splits the search space into regions of similar gradient directions and obtains better and more efficient convergence, we decrease the shape variation by dividing the training data into three views (right, frontal, and left); a view-specific model is then trained on each subset. We estimate the face view from five landmarks (left eye center, right eye center, nose tip, left mouth corner, right mouth corner). As shown in Figure 2, the five facial landmarks indicate the face layout, so we use their locations to estimate the view status by

    W = arg min_W Σ_i || v_i − W x_i ||²,    (2)

where v is the view status, x ∈ ℝ^{10×1} contains the locations of the five facial landmarks, and W is the regression matrix, which can be solved by the least-squares method. In the experiments, we only categorize the face views into the frontal (−15°, 15°), left (−30°, 0°), and right (0°, 30°) views, which cover all of the face poses in the 300-W training dataset. The overlaps between the frontal view and the profile views are used to make the view estimation more robust.

Figure 2. Illustrations of view-specific shape initialization.

The shape variance of each view subset is much smaller than that of the whole training set, and the mean shape of each view is much closer to the expected result, so the view-specific shape model not only decreases the shape variance but also accelerates the shape convergence.
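Solving Eq. (2) for W is an ordinary least-squares problem from the 10-dimensional landmark vector to the view status. A minimal sketch follows; the one-hot encoding of the three views is an assumption, since the paper does not specify how the view label is encoded.

```python
import numpy as np

def fit_view_regressor(landmarks, views):
    """Fit W of Eq. (2) by least squares.
    landmarks: (N, 10) flattened five-landmark coordinates x.
    views: (N,) integer labels {0: left, 1: frontal, 2: right}."""
    targets = np.eye(3)[views]                 # one-hot view encoding (assumption)
    W, *_ = np.linalg.lstsq(landmarks, targets, rcond=None)
    return W                                   # (10, 3)

def predict_view(x, W):
    """Pick the view whose regressed score is highest."""
    return int(np.argmax(x @ W))
```

A regression to a one-hot target followed by an argmax behaves like a linear classifier, which matches the role the view estimator plays here: a cheap, robust selector among three view-specific models.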
2.3. Time series regression

Performing face detection on each frame for face alignment is time-consuming. Furthermore, it tends to decrease the alignment accuracy on videos, because the initial mean shape is far from the ground-truth shape under large face pose variation. Establishing a correlation between consecutive frames is therefore of great importance. In this section, we describe three methods (box tracking, landmark tracking, and pose tracking) that link consecutive frames.

Figure 3 shows the workflow of box tracking. In this method, we build a tracker based on a face appearance model. The face location (x, y, w, h) in the current frame is estimated by the tracker, and a CSR is then performed to predict the landmark locations from the mean shape based on the shape-indexed features. This procedure is repeated until the last frame. The whole procedure connects the previous and current frames through face appearance information and overlooks the relationship between the landmarks of two consecutive frames. Such a method is obviously extremely time-consuming. Even worse, long-term tracking will cause tracking drift due to tremendous variation in the object appearance caused by illumination changes, partial occlusion, deformation, and so on.

Figure 3. Box tracking. A visual tracker is employed to predict the face location in the current frame. The initial shape is the mean shape.

Figure 4 shows the workflow of landmark tracking. In this method, we deliver the shape in the previous frame directly to the current frame as the initial shape, and MCSR is then performed to predict the landmark locations from the alignment result of the previous frame. When training the CSR on image datasets, the initial set of perturbations (ΔS) is obtained by a Monte-Carlo sampling procedure [32], in which perturbations are randomly drawn within a fixed, pre-defined range around the ground-truth shape. The direct shape-delivery approach cannot guarantee that the residual between the previous and current shapes stays within this perturbation range, and it might fail to converge to the final shape due to cumulative error on videos.

Figure 4. Landmark tracking. The shape in the previous frame is delivered directly to the current frame as the initial shape.

Figure 5 shows the workflow of pose tracking. In this method, we deliver the shape similarity transform parameters of the previous frame to the current one. The rigid-change parameters estimated from the previous shape are employed to adjust the mean shape, and the adjusted mean shape is taken as the initial shape in the current frame; MCSR is then performed to predict the landmark locations from the transformed view-specific mean shape. Compared to landmark tracking, the noise of the initial shape from the previous frame is smoothed by pose tracking, making the facial shape tracking more stable.

Figure 5. Pose tracking. Similarity transform parameters of the previous frame are delivered to the current frame. The initial shape is calculated from this information.

2.4. Re-initialization

As discussed above, MCSR is exploited to predict the landmark locations in each frame, while time series regression is employed to create a link between consecutive frames. Both steps work when the previous alignment is reliable enough to predict the current frame. If the previous alignment tends to drift, which can leave the face misaligned or lost, a novel re-initialization mechanism is adopted to effectively and accurately locate the face. In this work, we introduce a fitting score, which corresponds to the goodness of alignment. When the fitting score is lower than a set threshold (0.7), shape re-initialization is performed. For this purpose, we train an SVM classifier to differentiate between aligned and misaligned images based on the last shape-indexed features: positive samples are generated from the annotations, and negative samples are generated randomly around the ground truth. The score from the trained SVM is used as the criterion to judge the goodness of alignment. In our experiments, an alignment confidence above 0.7 is regarded as a successful landmark localization. Given a face video, if the fitting score of the previous frame's alignment is below 0.7, the face detector performs detection on the current frame; if no face is detected, the adaptive compressive tracker [19] locates the face using an appearance model built on the face appearance.

Algorithm 1 Facial shape tracking via spatio-temporal cascade shape regression
Require: the t-th image frame I_t in a face video
1:  if t = 1 then
2:      detect the face location (x_t, y_t, w_t, h_t) in the current frame
3:      predict the face shape S_t via MCSR
4:  else
5:      if score(S_{t-1}) >= 0.7 then
6:          pose tracking is employed to predict the face shape
7:      else
8:          detect the face location in the current frame
9:          if no face is detected then
10:             the adaptive compressive tracker is used to predict the face location (x_t, y_t, w_t, h_t)
11:             predict the face shape S_t via MCSR
12:         else
13:             predict the face shape S_t via MCSR
14:         end if
15:     end if
16: end if
Ensure: the face shape S_t at the t-th image frame

The main steps of our facial shape tracking are summarized in Algorithm 1.
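The control flow of Algorithm 1 can be sketched as below. The detector, tracker, alignment, and scoring callables are hypothetical placeholders standing in for JDA, the adaptive compressive tracker [19], MCSR, and the SVM fitting score.

```python
SCORE_THRESHOLD = 0.7  # fitting-score threshold from Section 2.4

def track_frame(frame, t, prev_shape, prev_score,
                detect_face, track_face, mcsr_align, fit_score):
    """One step of Algorithm 1: return (shape, score) for the t-th frame."""
    if t == 1:
        box = detect_face(frame)              # detector initializes the first frame
        shape = mcsr_align(frame, box=box)
    elif prev_score >= SCORE_THRESHOLD:
        # pose tracking: initialize from the previous frame's similarity transform
        shape = mcsr_align(frame, prev_shape=prev_shape)
    else:
        box = detect_face(frame)              # re-initialization path
        if box is None:                       # detector failed: fall back to tracker
            box = track_face(frame)
        shape = mcsr_align(frame, box=box)
    return shape, fit_score(frame, shape)
```

Note that every branch ends in the same MCSR alignment call; the branches differ only in how the initialization (a detected box, a tracked box, or the previous frame's pose) is obtained.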
3. Experiments

We test our algorithm in two scenarios. One is face alignment on images, which is initialized with the output of a face detector; the other is face alignment on videos, which is initialized by the alignment result of the previous frame.

3.1. Experimental data

Image datasets. A number of face image datasets [3, 18, 37] with different facial expression, pose, illumination, and occlusion variations have been collected for evaluating face alignment algorithms. In [24], AFW [37], LFPW [3], and HELEN [18] were re-annotated with the well-established landmark configuration of Multi-PIE [16] using the semi-supervised methodology of [25]. A new in-the-wild dataset called IBUG was also created by [24]; it covers variations such as unseen subjects, pose, expression, illumination, background, occlusion, and image quality, and aims to examine the ability of face alignment methods to handle naturalistic, unconstrained face images. In this paper, AFW, LFPW, HELEN, and IBUG are used to train the multi-view cascade shape regression model.

Video datasets. Even though comprehensive benchmarks exist for localizing facial landmarks in static images, very limited effort has been made towards benchmarking facial landmark tracking in videos [27]. 300-VW (300 Videos in the Wild) collects a large number of long facial videos recorded in the wild. Each video has a duration of about 1 minute (at 25-30 fps), and all frames have been annotated with the well-established landmark configuration of Multi-PIE [16]. 50 videos are provided for validation, and 150 facial videos are selected for testing. The dataset aims at testing the ability of current systems to fit unseen subjects, independently of variations in pose, expression, illumination, background, occlusion, and image quality. There are three test subsets of different difficulty:

Scenario 1: This scenario aims to evaluate algorithms that are suitable for facial motion analysis in laboratory and naturalistic well-lit conditions. There are 50 test videos of people recorded in well-lit conditions, displaying arbitrary expressions in various head poses but without large occlusions.

Scenario 2: This scenario aims to evaluate algorithms that are suitable for facial motion analysis in real-world human-computer interaction applications. There are 50 test videos of people recorded in unconstrained conditions, displaying arbitrary expressions in various head poses but without large occlusions.

Scenario 3: This scenario aims to assess the performance of facial landmark tracking in arbitrary conditions.

There are 50 test videos of people recorded in completely unconstrained conditions, including arbitrary illumination, occlusions, make-up, expression, head pose, etc.

3.2. Experimental settings

Data augmentation. Data augmentation serves as an effective method for improving the generalization of training. We flip all of the training data and augment each image with ten initializations. We first obtain the mean shape from all ground-truth shapes by Procrustes analysis [15]; we then train a linear regression to remove the translation and scale differences between the initial mean shape and the ground-truth shape using the location of the face rectangle. Finally, the residual distribution between the initial mean shape and the ground-truth shape is utilized to generate the other initial shapes with an identical distribution. The expectation of all of these initial shapes is therefore the mean shape.

Shape initialization. Generally, the normalized mean shape is used as the initial shape for face alignment on images. The scale and translation parameters of the initial shape are estimated from the face rectangle output by a face detector. The stability of the face detector is of great importance, because drift from the face detector affects the subsequent face alignment. On videos, the initial shape is generated from the alignment result of the previous frame, which makes face alignment more accurate thanks to the more accurate translation, scale, and face pose (yaw, pitch, roll) information inherited from the previous frame. However, in this paper we unify face alignment on images and videos through the proposed TSR model. The shape is always initialized from the five facial landmarks, which are utilized to remove rotation, translation, and scale differences and to select the view-specific models. The only difference is that the five facial landmarks are produced by the JDA face detector on images and by the previous alignment result on videos.
We compare these different shape initialization methods and report the alignment results on the IBUG dataset.

Regularization. To avoid overfitting, an additional L2 penalty term is added to the original least-squares objective function to regularize the linear projection. The regularization parameter is set to the number of training examples, according to our experiments.

Evaluation metric. Fitting performance is usually assessed by
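On 300-W/300-VW-style benchmarks, a common choice is the mean point-to-point error normalized by the inter-ocular distance; a sketch follows (the 68-point outer-eye-corner indices 36 and 45 are a common convention, assumed here rather than taken from this paper):

```python
import numpy as np

def normalized_mean_error(pred, gt, left_eye=36, right_eye=45):
    """Mean point-to-point landmark error divided by the inter-ocular
    distance. pred, gt: (L, 2) landmark arrays; the default eye indices
    assume the 68-point Multi-PIE markup."""
    iod = np.linalg.norm(gt[left_eye] - gt[right_eye])
    return float(np.mean(np.linalg.norm(pred - gt, axis=1)) / iod)
```

Normalizing by the inter-ocular distance makes the error comparable across face sizes, which is why cumulative error curves over this metric are the standard way to report results on these benchmarks.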

