A Fully Adaptive Normalized Nonlinear Gradient Descent Algorithm for Complex-Valued Nonlinear Adaptive Filters


IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 51, NO. 10, OCTOBER 2003

Andrew Ian Hanna and Danilo P. Mandic, Member, IEEE

Abstract—A fully adaptive normalized nonlinear complex-valued gradient descent (FANNCGD) learning algorithm for training nonlinear (neural) adaptive finite impulse response (FIR) filters is derived. First, a normalized nonlinear complex-valued gradient descent (NNCGD) algorithm is introduced. For rigour, the remainder of the Taylor series expansion of the instantaneous output error in the derivation of the NNCGD is made adaptive at every discrete time instant using a gradient-based approach. This results in the fully adaptive normalized nonlinear complex-valued gradient descent learning algorithm, which is suitable for nonlinear complex adaptive filtering with a general holomorphic activation function and is robust to the initial conditions. Convergence analysis of the proposed algorithm is provided both analytically and experimentally. Experimental results on the prediction of colored and nonlinear inputs show the FANNCGD outperforming other algorithms of this kind.

Index Terms—Adaptive filtering, nonlinear complex-valued filtering, normalized gradient descent, prediction.

I. INTRODUCTION

ADAPTIVE filtering techniques are an important facet of many scientific disciplines, such as communications, biomedical engineering, and the life sciences. As these areas have developed, so has the character of the processed data. The majority of these diverse data originally existed in the real domain; however, increasing amounts are rooted in the complex domain. This, in turn, has led to the development of complex-valued learning algorithms for nonlinear adaptive filters. For linear complex adaptive filtering, the complex least mean square (CLMS) algorithm [1] was developed. As the architectures of nonlinear neural network models became more involved, the complex backpropagation (CBP) algorithm was derived [2]-[5]. The complication with the CBP algorithm is finding a suitable activation function that is analytic and completely bounded in the complex plane [6]. Liouville's theorem states that "a bounded entire function in the complex domain is a constant" [6]-[8], and so, to be able to employ gradient descent-based algorithms, a fully complex activation function must be analytic and bounded almost everywhere in the complex domain, for which there are many choices. Originally, a split complex activation was used in the processing of complex-valued signals.¹ However, a split complex activation function cannot be analytic. To this cause, it is illustrated in [6] that the class of transcendental functions can be used successfully as fully complex-valued activation functions.

Manuscript received October 17, 2001; revised March 5, 2003. The associate editor coordinating the review of this paper and approving it for publication was Dr. Inbar Fijalkow.

A. I. Hanna is with the School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, Norfolk, U.K. (e-mail: aih@cmp.uea.ac.uk).

D. P. Mandic is with the Communications and Signal Processing Group of the Department of Electrical and Electronic Engineering, Imperial College of Science, Technology, and Medicine, London, U.K. (e-mail: d.mandic@ic.ac.uk).

Digital Object Identifier 10.1109/TSP.2003.816878
For practical purposes, a complex-valued activation function proposed in [3] is frequently used. For nonlinear adaptive filtering applications, a simple extension of an FIR filter is a dynamical perceptron, which is in fact an FIR filter followed by a continuous nonlinear activation function. In control theory, this is also known as a Wiener model [7], [9]. Here, we consider such a filter realized as a dynamical complex neuron, as shown in Fig. 1.

A recent result provides novel ways of normalizing the backpropagation algorithm [10]; however, for a highly ill-conditioned input correlation matrix, close-to-zero input vectors, and signals with long time correlation and large dynamical range, it is difficult to choose the parameters of the algorithm for each particular case. In this paper, we embark upon the previously derived normalized nonlinear gradient descent (NNGD) algorithm [11] for real-valued adaptive filtering and extend it to be compliant with signals in the field of complex numbers. The NNGD algorithm is a member of the class of fully adaptive normalized nonlinear gradient descent algorithms, which in the linear real-valued case have been developed in [12]-[15]. The derivation of the NNGD algorithm [7] performs a Taylor series expansion of the instantaneous output error, which is then truncated, leaving the driving terms of the algorithm. This results in a suboptimal algorithm due to the approximation of the expansion. The choice of activation function, however, has a major influence on the performance of algorithms for nonlinear filters. Therefore, based on the real-valued normalized nonlinear gradient descent algorithm, we first derive a normalized nonlinear complex-valued gradient descent (NNCGD) algorithm for a general complex-valued activation function. For rigour, we make the constant term, which is included to balance the truncated Taylor series expansion in the derivation of the NNCGD algorithm, adaptive using a gradient-based approach. This produces the fully adaptive normalized nonlinear complex-valued gradient descent (FANNCGD) algorithm, derived for a general holomorphic activation function. Experiments on the prediction of complex-valued colored and nonlinear signals show that the proposed algorithm outperforms the previously derived algorithms of this kind.

¹For a complex number z = x + jy, a split complex activation is of the form Φ(z) = Φ(x) + jΦ(y), whereas a fully complex activation function is of the form Φ(z) = Φ^r(x, y) + jΦ^i(x, y).
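To make the distinction between the two families of activation functions concrete, the following minimal sketch (Python/NumPy; not from the paper, function names are ours) contrasts a split complex tanh, which processes the real and imaginary parts independently and is therefore not analytic, with a fully complex tanh, which is holomorphic and bounded almost everywhere in the complex plane.

```python
import numpy as np

def split_tanh(z):
    # Split complex activation: a real nonlinearity applied to the real and
    # imaginary parts separately (not analytic in the complex sense).
    return np.tanh(z.real) + 1j * np.tanh(z.imag)

def fully_complex_tanh(z):
    # Fully complex (holomorphic) activation: tanh evaluated directly on the
    # complex argument; analytic and bounded almost everywhere in C.
    return np.tanh(z)

z = np.array([0.3 + 0.4j, -1.0 + 0.2j])
print(split_tanh(z))
print(fully_complex_tanh(z))
```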

Fig. 1. Nonlinear FIR filter.

II. NORMALIZED NONLINEAR COMPLEX GRADIENT DESCENT ALGORITHM

A. Nonlinear Complex Gradient Descent Algorithm

The equations that describe the nonlinear complex-valued gradient descent (NCGD) algorithm for a complex-valued dynamical perceptron, shown in Fig. 1, employed as a nonlinear FIR filter with a single output neuron are given by

    e(k) = d(k) - Φ(x^T(k)w(k))    (1)

where e(k) is the instantaneous output error of the filter at time instant k, Φ(x^T(k)w(k)) is the output of the complex-valued nonlinear activation function, d(k) is the desired output, Φ(·) is some holomorphic function that is bounded almost everywhere in the complex domain [6], and

    x(k) = [x(k-1), ..., x(k-N)]^T    (2)

denotes the complex input, as in Fig. 1. The complex weight vector is denoted by w(k) = [w_1(k), ..., w_N(k)]^T, and N is the number of tap inputs. For simplicity, we state that

    net(k) = x^T(k)w(k)    (3)

where the superscripts r and i, respectively, denote the real and imaginary parts of a complex quantity. We can then split the error term (1) into its real and imaginary parts as

    e^r(k) = d^r(k) - Φ^r(net(k))    (4)
    e^i(k) = d^i(k) - Φ^i(net(k))    (5)

where E(k) = (1/2)e(k)e*(k) = (1/2)[(e^r(k))^2 + (e^i(k))^2] is the conventional cost function of the network [1], and (·)* denotes the complex conjugate. The weight adaptation in the nonlinear complex gradient descent (NCGD) algorithm is therefore given by [3]

    w^r(k+1) = w^r(k) - η ∇_{w^r} E(k)    (6)
    w^i(k+1) = w^i(k) - η ∇_{w^i} E(k)    (7)

where η is the learning rate. The NCGD algorithm can be written in the compact form

    w(k+1) = w(k) + η e(k)(Φ'(net(k)))* x*(k)    (8)

B. Normalized Nonlinear Complex Gradient Descent Algorithm

Input signals with an unknown and possibly very large dynamical range, an ill-conditioned tap-input autocorrelation matrix, and the coupling between different signal modes slow down the learning process. In order to speed up learning, it is desirable to calculate an optimal learning rate that normalizes the model according to a minimization of the instantaneous output error at every iteration. The optimal learning rate of the NNCGD algorithm is calculated similarly to the real case [10], [11] by expanding the instantaneous output error in a Taylor series

    e(k+1) = e(k) + Σ_{i=1}^{N} [∂e(k)/∂w_i^r(k)] Δw_i^r(k) + Σ_{i=1}^{N} [∂e(k)/∂w_i^i(k)] Δw_i^i(k) + higher order terms    (9)

The higher order terms of the polynomial can be neglected if the weight change is sufficiently small [16]; however, during the training period of the algorithm this condition may not hold, and the term accounting for the higher order derivatives must therefore be adjusted automatically. However, as an online learning model, we do not know the values of e(k+1) and d(k+1) a priori.² To this cause, we truncate the expansion to include the driving terms of the algorithm, namely the weight vector, which gives (10), where T(k) denotes the truncated terms of the expansion. Since Φ is a complex function, we can apply the Cauchy-Riemann equations to give³ (11), and therefore (12).

²For instance, in unsupervised offline batch processing, the values of d(k) are still unknown.
³For a full derivation, see Appendix A.
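As a concrete illustration of (1)-(8), the sketch below (Python/NumPy; the function name and the choice of the complex tanh as the holomorphic nonlinearity are ours) performs one NCGD iteration for the dynamical perceptron of Fig. 1.

```python
import numpy as np

def ncgd_step(w, x, d, eta, phi=np.tanh, dphi=lambda s: 1.0 - np.tanh(s) ** 2):
    """One NCGD iteration for a complex dynamical perceptron (sketch of (1)-(8))."""
    net = np.dot(x, w)                             # net(k) = x^T(k) w(k), cf. (3)
    e = d - phi(net)                               # instantaneous error e(k), cf. (1)
    w_next = w + eta * e * np.conj(dphi(net)) * np.conj(x)   # compact update, cf. (8)
    return w_next, e
```

For example, starting from w = np.zeros(4, dtype=complex) and feeding successive complex tap vectors x and targets d, repeated calls to ncgd_step realize the NCGD recursion with a fixed learning rate eta.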

Fig. 2. Convergence curves for NNCGD with varying C. (a) Convergence curve using NNCGD with C = 0.1. (b) Convergence curve using NNCGD with C = 0.8.

For simplicity, we take only the first two terms of (10); substituting in (7) and (12) yields (13). For convenience, we employ the method given in [7] to solve for the learning rate η. For the output error at time instant (k+1) to be zero, the term in the square brackets in (13) must be zero, which gives the learning rate of the NNCGD algorithm as

    η(k) = 1 / (C + |Φ'(net(k))|^2 ||x(k)||_2^2)    (14)

In (14), C denotes a term added to balance the exclusion of the second- and higher-order derivatives, denoted in (10) as T(k), and of the truncated terms in (9) from the Taylor series expansion. In the real-valued NNGD algorithm [11], this term has been kept constant. The value of this term can have a substantial effect on the convergence of the nonlinear adaptive filter, and its effect will vary for different modes of application. To illustrate this, 500 independent simulations on the prediction of colored input were averaged to produce the convergence curves. The colored input was generated from complex-valued white noise n(k) with zero mean and unit variance, which was then passed through a stable AR filter described by (15). In each case, the order of the filter was N, and the nonlinearity was the complex-valued hyperbolic tangent function, defined as

    Φ(z) = tanh(βz) = (e^{βz} - e^{-βz}) / (e^{βz} + e^{-βz})    (16)

Fig. 2(a) shows the performance of the NNCGD algorithm with C = 0.1 reaching -22 dB, and Fig. 2(b) shows the NNCGD algorithm with C = 0.8 converging to -34 dB, an improvement of 12 dB, showing that the NNCGD algorithm is sensitive to the choice of C.
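A minimal sketch (Python/NumPy, not from the paper) of the normalized learning rate in (14); the helper name and the use of the tanh derivative for Φ' are ours.

```python
import numpy as np

def nncgd_learning_rate(x, net, C, dphi=lambda s: 1.0 - np.tanh(s) ** 2):
    """Normalized learning rate of (14): eta(k) = 1 / (C + |Phi'(net(k))|^2 ||x(k)||_2^2)."""
    return 1.0 / (C + np.abs(dphi(net)) ** 2 * np.sum(np.abs(x) ** 2))
```

Replacing the fixed eta in the ncgd_step sketch above with this value gives an NNCGD-style iteration for a chosen constant C.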

III. FULLY ADAPTIVE NNCGD ALGORITHM

The convergence curves in Fig. 2 clearly show a difference in performance for varying values of C, the term added to balance the exclusion of the truncated terms from (9). For this reason, it is proposed that an online adaptive term C(k) be introduced in (14), providing a fully adaptive normalized nonlinear complex gradient descent (FANNCGD) learning algorithm. The equation that defines the update of C(k) is given by

    C(k) = C(k-1) - ρ ∇_{C(k-1)} E(k)    (17)

where ρ denotes the step size of the algorithm, and therefore (18). To calculate the two partial derivative equations given in (18), it is necessary to use the Cauchy-Riemann equations to obtain⁴ (19) and (20). Writing the weight update term, excluding the learning rate, as (21), we can derive (22), where (23) and (24) are shown at the bottom of the page. Therefore, we obtain (25). The gradient of the cost function with respect to the term C(k-1) of (14), added to compensate for the truncation in (10), then becomes (26). This yields the FANNCGD learning algorithm for nonlinear FIR filters realized as dynamical perceptrons, which is given by (27), where ρ is the step size of the proposed algorithm and is chosen to be a small positive constant.

⁴For a full derivation of (19) and (20), see Appendix B.
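The closed-form expressions (19)-(26) are not reproduced in this transcription. The sketch below (Python/NumPy; entirely our own illustration) therefore obtains the gradient of E(k) with respect to C(k-1) by a direct chain rule through η(k-1) = 1/(C(k-1) + |Φ'(net(k-1))|² ||x(k-1)||²) and the weight update, which mirrors the structure of the FANNCGD recursion (27) under that assumption. The function name, the tanh nonlinearity, the default values of N, ρ, and C(0), and the positivity guard on C are all ours.

```python
import numpy as np

def fanncgd_predict(x_sig, N=4, rho=0.01, C0=0.5,
                    phi=np.tanh, dphi=lambda s: 1.0 - np.tanh(s) ** 2):
    """One-step-ahead prediction with a single complex dynamical perceptron,
    trained by a FANNCGD-style recursion: normalized rate (14), compact
    weight update (8), and an adaptive balancing term in the spirit of (17)."""
    w = np.zeros(N, dtype=complex)
    C = C0
    H_prev = np.zeros(N, dtype=complex)
    eta_prev = 0.0
    errors = []
    for k in range(N, len(x_sig)):
        x = x_sig[k - N:k][::-1]                 # tap-input vector x(k), cf. (2)
        net = np.dot(x, w)                       # net(k) = x^T(k) w(k), cf. (3)
        e = x_sig[k] - phi(net)                  # prediction error e(k), cf. (1)
        # Assumed chain-rule gradient of E(k) w.r.t. C(k-1) through eta(k-1);
        # a stand-in for the closed-form expressions (19)-(26) of the paper.
        grad_C = (eta_prev ** 2) * np.real(
            np.conj(e) * dphi(net) * np.dot(x, H_prev))
        C = max(C - rho * grad_C, 1e-6)          # update (17); positivity guard is ours
        eta = 1.0 / (C + np.abs(dphi(net)) ** 2 * np.sum(np.abs(x) ** 2))   # (14)
        H = e * np.conj(dphi(net)) * np.conj(x)  # update direction, cf. (21)
        w = w + eta * H                          # weight update, cf. (8)/(27)
        H_prev, eta_prev = H, eta
        errors.append(e)
    return w, np.array(errors)
```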

Fig. 3. Convergence curves of NNCGD and FANNCGD on colored input with the hyperbolic tangent function. (a) Convergence curves for NNCGD on colored input. (b) Convergence curve for FANNCGD on colored input.

Fig. 4. Convergence curves of NNCGD and FANNCGD on nonlinear input with the hyperbolic tangent function. (a) Convergence curves for NNCGD on nonlinear input. (b) Convergence curve for FANNCGD on nonlinear input.

IV. CONVERGENCE OF THE FANNCGD ALGORITHM

The FANNCGD algorithm determines the optimal learning rate η(k) for the class of complex-valued nonlinear gradient descent algorithms. Although the FANNCGD algorithm converges in the mean squared error for a range of values of ρ and C(0), uniform convergence as k → ∞ can be shown by (28). For this term to converge in the mean squared error sense, it must hold that (29), and therefore the range for ρ becomes (30). Substituting in the update term for C(k) from (27), we can then write (31), and solving for ρ gives (32), which are the convergence conditions for the FANNCGD algorithm. Convergence analysis for the mean error, mean squared error, and steady state conforms to the analysis in [7].

V. EXPERIMENTAL RESULTS

To investigate the performance of the FANNCGD algorithm compared with the NCGD and NNCGD algorithms, they were all applied to the problem of time-series prediction by averaging the performance curves of 500 independent simulations. For rigour, all algorithms were tested on complex-valued colored and nonlinear input signals with various complex-valued activation functions.

A. Hyperbolic Tangent Function

The algorithms were employed on single-neuron FIR complex-valued nonlinear adaptive filters for the prediction of colored and nonlinear complex-valued signals. The activation function was the complex-valued hyperbolic tangent function (16), with slope β and a tap input of size N. The input to all filters was complex-valued white noise n(k) with zero mean and unit variance, which was then passed through the stable AR filter given by (15) for the linear prediction experiment. Fig. 3 shows the performance curves for the NCGD, NNCGD, and FANNCGD algorithms on time series prediction of colored input. The quantitative measure of performance was the averaged cost function on a logarithmic scale, 10 log10 E(k). Fig. 3(a) shows the NCGD algorithm performance curve reaching -15.5 dB with a fixed learning rate. The NNCGD algorithm performance curve reached -28.8 dB and -17 dB for two values of C, respectively. The FANNCGD algorithm [see Fig. 3(b)] converged to -29 dB, which is at least as good as the best choice of C in the NNCGD algorithm.
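To reproduce the colored-input experiment, complex white noise of zero mean and unit variance can be filtered by a stable AR(4) model, as in the sketch below (Python/NumPy). The AR coefficients shown are an assumed, commonly used benchmark choice and merely stand in for (15), whose exact values are not reproduced in this transcription.

```python
import numpy as np

def colored_complex_input(n_samples, seed=0):
    """Colored complex input: circular white Gaussian noise n(k) with zero
    mean and unit variance, passed through a stable AR(4) filter (a stand-in
    for (15); coefficients are assumed for illustration)."""
    rng = np.random.default_rng(seed)
    a = np.array([1.79, -1.85, 1.27, -0.41])      # assumed AR coefficients
    n = (rng.standard_normal(n_samples)
         + 1j * rng.standard_normal(n_samples)) / np.sqrt(2)
    r = np.zeros(n_samples, dtype=complex)
    for k in range(4, n_samples):
        r[k] = np.dot(a, r[k - 4:k][::-1]) + n[k]  # r(k) = sum_i a_i r(k-i) + n(k)
    return r
```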

Fig. 5. Nonlinear complex-valued activation function Φ(z) = z/(c + (1/r)|z|). (a) Magnitude of Φ(z). (b) Phase of Φ(z).

For the second experiment, on nonlinear time series prediction, the input signal n(k) was passed through a benchmark nonlinear filter described in [7], given by (33), and the nonlinearity in the output neuron was the complex-valued hyperbolic tangent function given in (16). For the task of nonlinear prediction (33), Fig. 4(a) shows the performance curve of the NCGD algorithm reaching -23 dB, and the performance curves of the NNCGD algorithm reaching -30 dB and -49 dB for two values of C, respectively. The performance curve of the FANNCGD algorithm [see Fig. 4(b)] converged to a value of -50 dB, which is at least as good as the best performance over C in the NNCGD algorithm. Figs. 3 and 4 show that the FANNCGD algorithm reaches the best performance of the NNCGD algorithm obtained with an optimally chosen constant C. This optimal value of C in the NNCGD algorithm is not known before training, and thus the FANNCGD algorithm is a robust generalization of the NNCGD algorithm.

The simulation results have shown the FANNCGD algorithm outperforming the NNCGD algorithm for complex-valued linear and nonlinear input signals. The NNCGD algorithm can achieve optimal performance for certain input signals and a specific value of C; however, over an averaged number of simulations, the NNCGD algorithm will not obtain as high a global performance as the proposed FANNCGD algorithm.

B. Practical Complex-Valued Activation Function

It is known from Liouville's theorem [6] that a function that is analytic and nonlinear cannot be bounded on the entire complex domain. There are many choices of activation function that satisfy the desirable constraints defined in [6]; however, the proposed FANNCGD algorithm is derived for any complex-valued nonlinear function that satisfies these conditions. To further illustrate this, we employ the frequently used complex-valued function given in [3] and shown in Fig. 5

    Φ(z) = z / (c + (1/r)|z|)    (34)

where c and r are real positive constants. Although this activation function does not satisfy the Cauchy-Riemann equations, it does satisfy the constraints given in [3] and [6] under suitable conditions on its parameters. The function has the property of mapping a point z in the complex plane to a unique point on the open disc {z : |z| < r}, and the parameter c controls the slope of the activation function. The partial derivatives of (34) are given in [3] as (35) and (36).

Fig. 6 shows the performance curves for the NCGD, NNCGD, and FANNCGD algorithms on adaptive prediction of the colored input (15) with this activation function. The quantitative measure of performance was the averaged cost function on a logarithmic scale. Fig. 6(a) shows the NCGD algorithm performance curve reaching -15.9 dB with a fixed learning rate. The NNCGD algorithm performance curve reached -24.0 dB and -8.1 dB for two values of C, respectively. The FANNCGD algorithm [see Fig. 6(b)] converged to -24 dB, which is at least as good as the best choice of C in the NNCGD algorithm.
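A minimal sketch of the practical activation (34); the parameter values c = r = 1 are illustrative only, since the settings used in the paper's experiments are not reproduced in this transcription.

```python
import numpy as np

def practical_activation(z, c=1.0, r=1.0):
    """Complex-valued activation of (34): Phi(z) = z / (c + |z|/r).
    Maps any z into the open disc |Phi(z)| < r; c controls the slope.
    The values c = r = 1 are illustrative, not the paper's settings."""
    return z / (c + np.abs(z) / r)

z = np.array([0.5 + 0.5j, 3.0 - 4.0j, -10.0 + 0.0j])
print(np.abs(practical_activation(z)))   # all magnitudes stay below r = 1
```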

Fig. 6. Convergence curves of NNCGD and FANNCGD on colored input with the practical complex-valued activation function (34). (a) Convergence curves for NNCGD on colored input. (b) Convergence curves for FANNCGD on colored input.

Fig. 7. Convergence curves of NNCGD and FANNCGD on nonlinear input with the practical complex-valued activation function (34). (a) Convergence curves for NNCGD on nonlinear input. (b) Convergence curves for FANNCGD on nonlinear input.

Fig. 7 shows the performance curves for the NCGD, NNCGD, and FANNCGD algorithms on time series prediction of the nonlinear input (33) with the same activation function. Fig. 7(a) shows the performance curve of the NCGD algorithm reaching -22 dB, and the performance curves of the NNCGD algorithm reaching -34 dB and -49 dB for two values of C, respectively. The performance curve of the FANNCGD algorithm [see Fig. 7(b)] converged to a value of -50 dB, which is at least as good as the best performance over C in the NNCGD algorithm.

VI. ROBUSTNESS OF THE FANNCGD ALGORITHM

With all nonlinear stochastic models, the initial conditions can affect the performance of the system dramatically. To this cause, an experiment to investigate the robustness of the fully adaptive normalized nonlinear complex gradient descent (FANNCGD) algorithm with respect to the initial choice of C(0) was carried out on a nonlinear adaptive filter with a single dynamical perceptron, using the complex-valued activation function given in (34) as the nonlinearity. The task was time series prediction of complex-valued white noise n(k) passed through the stable AR filter described in (15). The quantitative measure of performance was the prediction gain

    R_p = 10 log10(σ̂_s^2 / σ̂_e^2)

where σ̂_s^2 denotes the variance of the expected signal, and σ̂_e^2 denotes the variance of the prediction error. Fig. 8 shows the effect of varying the initial value C(0) on the prediction gain.

Fig. 8. Prediction gain for varying values of C(0) for FANNCGD.
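The prediction gain used as the performance measure follows directly from the signal and error variances; the sketch below (Python/NumPy) is a straightforward rendering of R_p = 10 log10(σ̂_s²/σ̂_e²), and the commented driver reuses the illustrative helpers defined in the earlier sketches.

```python
import numpy as np

def prediction_gain(signal, error):
    """Prediction gain R_p = 10 log10(var(signal) / var(error)) in dB,
    the performance measure quoted in Section VI."""
    return 10.0 * np.log10(np.var(signal) / np.var(error))

# Hypothetical driver reusing the sketches above, e.g. to probe sensitivity to C(0):
#   s = colored_complex_input(5000)
#   _, e = fanncgd_predict(s, C0=0.5)
#   print(prediction_gain(s[-len(e):], e))
```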

The maximum variance of the prediction gain for this range of initial conditions is 2.5 dB, which reinforces the robustness of the proposed FANNCGD algorithm with respect to the initial conditions.

VII. CONCLUSIONS

A fully adaptive normalized nonlinear complex-valued gradient descent (FANNCGD) algorithm for training nonlinear adaptive filters realized as a dynamical perceptron has been derived. The previously derived real-valued normalized nonlinear gradient descent (NNGD) algorithm has first been extended to manage signals in the complex domain, resulting in the normalized nonlinear complex-valued gradient descent (NNCGD) algorithm. A fundamental constant term in the derivation of the NNCGD algorithm has then been made adaptive using a gradient-based approach, yielding the proposed FANNCGD algorithm.

