
Neurocomputing 74 (2011) 1564–1571
Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/neucom

Grey-box radial basis function modelling

Sheng Chen (a), Xia Hong (b), Chris J. Harris (a)
(a) School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK
(b) School of Systems Engineering, University of Reading, Reading RG6 6AY, UK

Article history: Received 28 September 2010; received in revised form 23 December 2010; accepted 2 January 2011; available online 21 March 2011. Communicated by K. Li.

Abstract

A fundamental principle in data modelling is to incorporate available a priori information regarding the underlying data generating mechanism into the modelling process. We adopt this principle and consider grey-box radial basis function (RBF) modelling capable of incorporating prior knowledge. Specifically, we show how to explicitly incorporate two types of prior knowledge: (i) the underlying data generating mechanism exhibits a known symmetric property, and (ii) the underlying process obeys a set of given boundary value constraints. The class of efficient orthogonal least squares regression algorithms can readily be applied without any modification to construct parsimonious grey-box RBF models with enhanced generalisation capability.

© 2011 Elsevier B.V. All rights reserved.

Keywords: Data modelling; Radial basis function network; Black-box model; Grey-box model; Orthogonal least squares algorithm; Symmetry; Boundary value constraint

1. Introduction

The radial basis function (RBF) network has found wide-ranging applications in diverse fields of engineering [1–17], and the class of orthogonal least squares (OLS) regression algorithms [18–22] offers powerful and efficient tools for constructing parsimonious RBF models that generalise well.
This approach is equally applicable to supervised regression [18–22] and classification [23–25], as well as to unsupervised probability density function estimation [26–28]. Like many other data modelling approaches, the RBF model constitutes a black-box data modelling approach. Adopting black-box modelling is appropriate if no a priori information exists regarding the underlying data generating mechanism. However, if prior knowledge concerning the underlying process is available, it should be incorporated into the model structure explicitly. The use of prior knowledge in data modelling often leads to enhanced modelling performance. A general discussion on learning from known prior knowledge or hints is given in [29]. A few works have exploited the symmetric properties of some underlying systems in regression applications [30,31] as well as in classification problems [32,33].

⁎ Corresponding author. E-mail addresses: sqc@ecs.soton.ac.uk (S. Chen), x.hong@reading.ac.uk (X. Hong), cjh@ecs.soton.ac.uk (C.J. Harris). 0925-2312 © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.neucom.2011.01.023

System identification has a long history of investigating grey-box based techniques, and some studies on how to incorporate a priori system knowledge into the model structure can be found in [34–39]. For linear system identification, the work [34] has shown how to translate crucial physical knowledge, such as process stability and signs of stationary gains, into linear inequality constraints on the black-box model to yield the grey-box model class, in which a Bayesian approach is adopted to associate the physical knowledge with a prior distribution. The authors of [36] have proposed an approach which can potentially incorporate the system knowledge naturally into the linear-in-the-parameters nonlinear black-box model. They argue that, instead of a black-box polynomial expansion, various nonlinear functions or bases can be adopted to form an extended model set, and the choices of nonlinear bases may be determined from physical knowledge of the system to be modelled. The study [37] has emphasised that, for practical nonlinear engineering systems, some of the underlying physical parameters are usually known a priori and, therefore, a grey-box nonlinear model should be adopted to explicitly utilise the a priori system knowledge. The works [38,39] have further refined the concept of the extended model set [36] and have proposed a novel eng-genes framework which chooses the activation functions of neural network nodes, or nonlinear bases, to reflect the physical reality of the process to be modelled. It should be emphasised, however, that there does not exist a generic grey-box model which can represent any a priori system knowledge. How to incorporate prior knowledge to form a grey-box model is highly problem dependent and is really an art. But there exist some desired objectives in using a grey-box model. Firstly, by

incorporating a priori information regarding the underlying process to be modelled, better generalisation performance should be achieved. Secondly, grey-box modelling should not result in increased computational complexity. For example, when developing a grey-box RBF model, it is highly desirable that the existing learning algorithms for the black-box RBF model can readily be used, so that one is not forced to derive new learning algorithms. In this contribution, we specifically consider two types of a priori information. In the first type of data modelling problems, the underlying data generating mechanism exhibits a known symmetric property, and we introduce the symmetric RBF (SRBF) model that is guaranteed to possess the known symmetry. For the second type of applications, the underlying process obeys a set of given boundary value constraints (BVCs), and we adopt the novel BVC-RBF structure which automatically meets the given BVCs. All the learning algorithms originally derived for the black-box RBF model can be applied to these two grey-box RBF models without the need for any modification. In particular, the class of OLS learning algorithms [18–22] provides efficient means of building parsimonious grey-box RBF models with improved generalisation performance.

The remainder of this contribution is structured as follows. Section 2 summarises black-box RBF modelling based on the class of efficient OLS learning algorithms. The two grey-box RBF models are derived in Sections 3 and 4, respectively, by incorporating a priori knowledge of a symmetric property and of a given set of BVCs. Our conclusions are offered in Section 5.

2. Black-box RBF modelling

Given the training data set D_K = {x(k), y(k)}_{k=1}^K, where x(k) = [x_1(k) ⋯ x_m(k)]^T ∈ R^m is the input vector and y(k) ∈ R is the desired output for x(k), the data are generated by the unknown nonlinear data generating mechanism with the nonlinear mapping f: R^m → R as

  y(k) = f(x(k)) + e(k)   (1)

where e(k) is the observation noise. The RBF model of the form

  ŷ^(M)(k) = f̂^(M)(x(k)) = Σ_{i=1}^{M} θ_i p_i(x(k); σ)   (2)

is constructed from the training data D_K to realise the underlying data generating mechanism f: R^m → R, where M is the number of RBF units, and each RBF basis

  p_i(x; σ) = φ(‖x − c_i‖; σ)   (3)

is specified by its centre vector c_i ∈ R^m, the RBF variance σ² and the chosen basis function φ(·). This is a black-box modelling approach, as no prior knowledge regarding f is required and everything is learnt from the data, which are inherently stochastic owing to the observation noise. The class of efficient OLS learning algorithms [18–22] has been developed to construct the RBF model from the training data D_K.

Use every data point x(k) as a candidate RBF centre and assume that a common RBF variance σ² is obtained separately via cross validation. Then the resulting K-term RBF model over the training data (x(k), y(k)) ∈ D_K can be expressed as

  y(k) = (p^(K)(k))^T θ_K + e^(K)(k)   (4)

where e^(K)(k) = y(k) − ŷ^(K)(k) is the K-term modelling error, θ_K = [θ_1 ⋯ θ_K]^T is the RBF weight vector, and p^(K)(k) = [p_1(k) ⋯ p_K(k)]^T with

  p_i(k) = φ(‖x(k) − x(i)‖; σ), 1 ≤ i ≤ K   (5)

Furthermore, the model (4) over the training data set D_K can be expressed as

  y = P_K θ_K + e^(K)   (6)

by introducing the notations y = [y(1) ⋯ y(K)]^T, e^(K) = [e^(K)(1) ⋯ e^(K)(K)]^T and

  P_K = [p_1 ⋯ p_K]   (7)

with p_i = [p_i(1) ⋯ p_i(K)]^T. Note that p_k is the k-th column of P_K, while (p^(K)(k))^T denotes the k-th row of P_K.

Let an orthogonal decomposition of the regression matrix P_K be P_K = W_K A_K with

  A_K = ⎡ 1  a_{1,2}  ⋯  a_{1,K}   ⎤
        ⎢ 0  1        ⋱  ⋮         ⎥
        ⎢ ⋮     ⋱    ⋱  a_{K−1,K} ⎥
        ⎣ 0  ⋯    0      1        ⎦   (8)

and the orthogonal regression matrix

  W_K = [w_1 ⋯ w_K]   (9)

that satisfies w_i^T w_j = 0 if i ≠ j. Then the regression model (6) can be written equivalently as

  y = W_K g_K + e^(K)   (10)

where the weight vector g_K satisfies the relationship A_K θ_K = g_K. Similar to (4), y(k) can be modelled by

  y(k) = (w^(K)(k))^T g_K + e^(K)(k)   (11)

where w^(K)(k) = [w_1(k) ⋯ w_K(k)]^T is the k-th row of W_K.

The OLS forward selection procedure chooses model terms one by one from the full K-term candidate set. Specifically, after the (n−1)-th stage of the subset selection, the selected model contains n−1 model columns while the candidate pool contains the remaining K−n+1 candidate columns, as illustrated in the following:

  [ w_1 w_2 ⋯ w_{n−1} | p_n p_{n+1} ⋯ p_K ]

where the columns to the left of the bar are the selected model terms and those to the right form the candidate pool. At the n-th stage of the subset selection, one model term is selected from the candidate pool as the n-th model term to add to the selected subset model. This selected model term w_n should maximally improve the modelling performance of the n-term subset model according to some specified criterion.

2.1. D-optimality enhanced ROLS algorithm

In the D-optimality enhanced regularised OLS (ROLS) algorithm, the criterion for the subset model selection is the combined regularised training mean square error (MSE) and D-optimality criterion [21] defined by

  J_CRD(g_K, λ_K, β) = (e^(K))^T e^(K) + g_K^T Λ_K g_K − β Σ_{n=1}^{K} log(w_n^T w_n)   (12)

where λ_K = [λ_1 ⋯ λ_K]^T is the regularisation parameter vector, Λ_K = diag{λ_1, ⋯, λ_K}, and β is the D-optimality weighting. The local regularisation term g_K^T Λ_K g_K in the criterion (12) enhances the generalisation and sparseness of the selected model [40], while the D-optimality criterion, the last term in J_CRD(g_K, λ_K, β), prevents the selection of an oversized ill-posed model and reduces the parameter estimate variances [21,41].

Denote the value of J_CRD for the selected (n−1)-term subset model as J_CRD^(n−1). Then at the n-th stage of selection, the selected model term is the one that minimises the combined criterion

  J_CRD^(n) = J_CRD^(n−1) − g_n² (w_n^T w_n + λ_n) − β log(w_n^T w_n)   (13)
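The forward selection of Section 2.1 can be sketched in a few lines. The following is our own illustrative Python sketch, not the authors' implementation: it assumes a precomputed candidate regression matrix, a single fixed regularisation parameter λ (the paper updates the individual λ_n via the evidence procedure), and deflation-style modified Gram–Schmidt orthogonalisation; the function name and toy data are ours.

```python
import numpy as np

def ols_d_optimality(P, y, lam=1e-3, beta=1e-6):
    """Greedy D-optimality enhanced ROLS subset selection (sketch).

    P   : (K, K) candidate regression matrix, one column per candidate term
    y   : (K,) desired output vector
    lam : common regularisation parameter (fixed here for simplicity)
    beta: D-optimality weighting
    Returns the indices of the selected columns, in selection order.
    """
    K = P.shape[1]
    W = P.astype(float).copy()      # working copy, orthogonalised in place
    selected, J = [], float(y @ y)  # J^(0): output energy before selection
    for _ in range(K):
        best = None
        for i in range(K):
            if i in selected:
                continue
            w = W[:, i]
            wtw = w @ w
            if wtw < 1e-12:         # numerically dependent candidate
                continue
            g = (w @ y) / (wtw + lam)
            # criterion change as in eq. (13); negative means J decreases
            dJ = -g * g * (wtw + lam) - beta * np.log(wtw)
            if best is None or dJ < best[0]:
                best = (dJ, i)
        if best is None or best[0] >= 0.0:   # stop when J would rise
            break
        dJ, n = best
        J += dJ
        selected.append(n)
        # modified Gram-Schmidt: deflate the remaining candidates
        wn = W[:, n]
        for i in range(K):
            if i not in selected:
                W[:, i] -= (wn @ W[:, i]) / (wn @ wn) * wn
    return selected

# Toy sanity run: y is a combination of columns 2 and 0 of an identity
# dictionary, so exactly those columns are selected before the loop stops.
sel = ols_d_optimality(np.eye(5), np.array([1.0, 0.0, 2.0, 0.0, 0.0]))
# sel == [2, 0]
```

The stopping test mirrors the automatic termination behaviour described by the paper: selection ends as soon as no candidate can decrease the combined criterion.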

As shown in [21], with an appropriately chosen value for β, there exists an "optimal" subset model size M ≪ K such that, for n ≤ M, the criterion J_CRD^(n) decreases as n increases, while

  J_CRD^(M) < J_CRD^(M+1)   (14)

Thus, the subset model selection is automatically terminated, yielding an M-term RBF model. The regularisation parameters can be updated using the evidence procedure [21,22,40]. The detailed algorithm can be found in [21] and will not be repeated here. In particular, when no regularisation is employed, i.e. λ_n = 0 for all n, this algorithm reduces to the D-optimality assisted OLS algorithm presented in [42].

2.2. ROLS algorithm based on LOO statistics

It is highly desirable to select model terms by directly optimising the model generalisation performance, instead of the training performance. Model generalisation can be evaluated by the test performance on data not used in training the model, and a commonly used cross-validation method is leave-one-out (LOO) cross validation [43,44]. The idea of LOO cross validation is as follows. Remove the k-th data point from the training set D_K = {x(k), y(k)}_{k=1}^K, and use the remaining K−1 data points D_K \ (x(k), y(k)) to identify the n-term model, denoted by ŷ^(n,−k). The test error on the single data point not used in training is

  e^(n,−k)(k) = y(k) − ŷ^(n,−k)(k)   (15)

Repeating this procedure for each k leads to the LOO test MSE for the n-term model

  J_LOO^(n) = (1/K) Σ_{k=1}^{K} (e^(n,−k)(k))²   (16)

which is a generalisation measure for the model ŷ^(n) identified using the whole D_K [43,44]. For linear-in-the-weights models, such as the model (6), the above steps of LOO cross validation are virtual: the LOO test errors can be generated, without actually splitting the training data set sequentially and repeatedly estimating the associated models, by applying the Sherman–Morrison–Woodbury theorem [43,44].

In particular, the use of the equivalent orthogonal model (10) leads to an efficient computation of the LOO test MSE [22,45]. This is because the LOO error can be expressed as [44]

  e^(n,−k)(k) = e^(n)(k) / η^(n)(k)   (17)

where the n-term modelling error e^(n)(k) and the associated LOO error weighting η^(n)(k) can be calculated recursively according to [22,45]

  e^(n)(k) = e^(n−1)(k) − w_n(k) g_n   (18)

  η^(n)(k) = η^(n−1)(k) − w_n²(k) / (w_n^T w_n + λ_n)   (19)

As shown in [45], the LOO test MSE has the following desired property: there exists an "optimal" subset size M ≪ K such that, for n ≤ M, the criterion J_LOO^(n) decreases as n increases, while

  J_LOO^(M) < J_LOO^(M+1)   (20)

Thus, the subset model selection is automatically terminated, yielding an M-term RBF model. The detailed ROLS algorithm based on the LOO test MSE can be found in [22].

3. Symmetric RBF modelling

Consider again the training data set D_K = {x(k), y(k)}_{k=1}^K that is generated by the underlying system (1). The system mapping f: R^m → R is unknown. However, the system f is known to possess the odd symmetry

  f(−x) = −f(x)   (21)

This a priori information may come from known physical laws governing the system. For example, from physics, the underlying optimal discriminant function or detector for binary digital signals has this odd symmetry [32]. Although we consider the odd symmetry in this contribution, the even symmetry can be treated in a similar way. In fact, our approach can be extended to deal with more complex symmetric properties, such as those encountered in complex-valued digital signal detection [33].

3.1. Symmetric RBF network

Our goal is to construct the RBF model (2) from the data D_K to discover the underlying data generating mechanism f. In defence of the black-box RBF model with the standard RBF node (3), it has a good learning capability and should be able to approximate the underlying system f well. Thus, f̂^(M) learnt from the training data set D_K alone should approximately possess the odd symmetry. However, this is not guaranteed, particularly when the training data D_K are noisy. Since the underlying system is known to possess the odd symmetry (21), we would like the model to possess the same odd symmetry, namely,

  f̂^(M)(−x) = −f̂^(M)(x)   (22)

Furthermore, we would like to exploit the prior knowledge (21) to improve the modelling efficiency as well.

To explicitly incorporate the prior knowledge (21), we adopt the following symmetric RBF (SRBF) node

  p_i(x; σ) = φ(‖x − c_i‖; σ) − φ(‖x + c_i‖; σ)   (23)

With this symmetric node structure, the prior information is naturally incorporated into the model structure and the resulting SRBF model is guaranteed to have the same odd symmetry as the underlying system. Moreover, this grey-box RBF model with the symmetric node structure (23) has the same regression modelling form as the black-box RBF model discussed in Section 2. Therefore, we do not need to develop any new learning algorithm for this grey-box RBF model. Instead, the class of OLS learning algorithms [18–22] can readily be used to identify a parsimonious SRBF model based on D_K.

3.2. A symmetric modelling example

The system to be identified was given by

  f(x_1, x_2) = 10 [ sin(x_1 − 5) sin(x_2 − 5) / ((x_1 − 5)(x_2 − 5)) − sin(x_1 + 5) sin(x_2 + 5) / ((x_1 + 5)(x_2 + 5)) ]   (24)

This system has the odd symmetry, and f(x_1, x_2) is plotted in Fig. 1(a) using a grid of 90,601 points. The training data set D_K contained 961 noisy data points, as shown in Fig. 1(b), where the system noise e(k) was a white Gaussian noise with variance σ_e² = 0.16. The basis function was chosen to be the Gaussian function. The ROLS algorithm based on the LOO test MSE, summarised in Section 2.2 (also see [22]), was used to automatically identify both the conventional RBF and SRBF models. The RBF variance σ² = 8.0 was determined separately using cross validation. A separate test data set of K_test = 961 noisy data points was also generated to compute the test MSE according to

  MSE = E[(y(k) − ŷ^(M)(k))²] = (1/K_test) Σ_{k=1}^{K_test} (y(k) − ŷ^(M)(k))²   (25)
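The guarantee (22) can be checked numerically: negating x in the SRBF node (23) swaps its two terms, so every weighted sum of such nodes is odd regardless of the learnt weights. A small self-contained sketch (our own code; the centres and weights are arbitrary, and σ² = 8.0 is taken from the example):

```python
import numpy as np

def gaussian(r2, sigma2):
    """Gaussian basis phi(.) evaluated on a squared distance."""
    return np.exp(-r2 / (2.0 * sigma2))

def srbf_node(x, c, sigma2):
    """Symmetric RBF node of eq. (23): phi(||x - c||) - phi(||x + c||)."""
    return (gaussian(np.sum((x - c) ** 2), sigma2)
            - gaussian(np.sum((x + c) ** 2), sigma2))

def srbf_model(x, centres, theta, sigma2):
    """SRBF model: a weighted sum of symmetric nodes, as in eq. (2)."""
    return sum(t * srbf_node(x, c, sigma2) for t, c in zip(theta, centres))

# Arbitrary (made-up) centres and weights: the symmetry does not depend on them.
rng = np.random.default_rng(0)
centres = rng.uniform(-15.0, 15.0, size=(10, 2))
theta = rng.normal(size=10)
x = np.array([3.0, -7.0])

# Odd symmetry (22) holds by construction, to machine precision:
assert abs(srbf_model(-x, centres, theta, 8.0)
           + srbf_model(x, centres, theta, 8.0)) < 1e-12
```

Note also that each symmetric node, and hence the whole SRBF model, is exactly zero at the origin, as any odd function must be.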

Fig. 1. (a) The underlying symmetric function f(x_1, x_2) shown on the grid of 90,601 points, and (b) the 961 noisy training data points.

The generalisation performance was also evaluated with the mean modelling error (MME)

  MME = E[(f(x_1, x_2) − f̂(x_1, x_2))²]   (26)

by averaging over the grid of 90,601 points, where f̂(x_1, x_2) denotes the identified model mapping.

Table 1. Performance comparison between the conventional RBF and SRBF models for the symmetric system identification example.

  Model   Model size   Training MSE   Test MSE   Test MME
  RBF     105          0.1543         0.2047     0.0294
  SRBF    68           0.1566         0.1839     0.0093

Table 1 compares the performance of the two RBF models obtained. Fig. 2(a) and (b) show the modelling error f(x_1, x_2) − f̂(x_1, x_2) on the grid of 90,601 points for the two obtained models, respectively. It can be seen that, by incorporating the prior information, the SRBF model offers a significantly better generalisation performance; specifically, its test MME is three times smaller than that of the standard RBF model. It is also interesting to compare the efficiency of model construction for the two models. For the class of OLS learning algorithms [18–22], the complexity of selecting an M-term model from the K-term candidate set is well known to be

  C = (M + 1) · O(K²)   (27)

where O(K²) stands for the order of K². For the SRBF model we obtained M = 68, while M = 105 was arrived at for the black-box RBF model. Thus, for this example, the complexity of the SRBF model construction is only 65% of that of the standard RBF model construction.

Fig. 2. (a) The modelling error f(x_1, x_2) − f̂(x_1, x_2) of the standard RBF model, and (b) the modelling error f(x_1, x_2) − f̂(x_1, x_2) of the SRBF model, for the symmetric system identification example.

By incorporating the prior information naturally, we also improve the efficiency of the model construction procedure. Finally, the prediction complexities of the two models are approximately the same. This is because, although the SRBF unit (23) requires more computation than the standard RBF unit (3), the SRBF model has fewer RBF units and, therefore, the computational requirements for evaluating a test data point are roughly equal for the two models.

4. BVC-RBF modelling

Again consider the identification of the unknown system f of (1) using the RBF model (2) based on the noisy training data set D_K. In addition, the unknown system mapping f is known to satisfy a set of L BVCs given by

  f(x_j) = d_j, 1 ≤ j ≤ L   (28)

where x_j ∈ R^m and d_j ∈ R are known. These BVCs may represent the fact that, in some critical regions, there is complete knowledge about the system. For example, at some boundary points x_j, the behaviour of the process is completely determined by the known physical laws that govern the process. Note that the sensor observations at these points x_j are, however, stochastic because

of the observation noise. Thus, from the noisy D_K, the BVCs (28) may not be seen clearly.

4.1. BVC-RBF network

Since the BVCs (28) are critical to the underlying system f to be identified, any identified model f̂^(M) is required to strictly meet these BVCs, that is,

  f̂^(M)(x_j) = d_j, 1 ≤ j ≤ L   (29)

It is obvious that the black-box RBF model with the node structure (3) cannot be guaranteed to satisfy the known set of BVCs. The conventional way of incorporating the BVCs (29) as a set of equality constraints in the learning complicates the resulting optimisation problem and dramatically increases the learning complexity. The novel BVC-RBF network model proposed in [46] has the capacity of satisfying the given BVCs automatically, without any added algorithmic complexity or computational cost. The BVC-RBF model derived in [46] takes the form

  ŷ^(M)(k) = f̂^(M)(x(k)) = Σ_{i=1}^{M} θ_i p_i(x(k); σ) + q(x(k))   (30)

with the novel RBF node structure

  p_i(x; σ) = h(x) φ(‖x − c_i‖; σ)   (31)

where

  h(x) = ( Π_{j=1}^{L} ‖x − x_j‖ )^{1/L}   (32)

is the geometric mean of the distances from the data sample x to the set of boundary points x_j, 1 ≤ j ≤ L. The function q(x) is known as the offset function, and takes the form

  q(x) = Σ_{j=1}^{L} γ_j e^{−‖x − x_j‖²/τ}   (33)

where τ is a positive scalar, and γ_L = [γ_1 ⋯ γ_L]^T is the set of parameters obtained by solving the set of linear equations q(x_j) = d_j, 1 ≤ j ≤ L, as follows:

  γ_L = Q_L^{−1} d_L   (34)

where d_L = [d_1 ⋯ d_L]^T and

  Q_L = ⎡ 1                  e^{−‖x_1−x_2‖²/τ}  ⋯  e^{−‖x_1−x_L‖²/τ} ⎤
        ⎢ e^{−‖x_2−x_1‖²/τ}  1                  ⋯  e^{−‖x_2−x_L‖²/τ} ⎥
        ⎢ ⋮                                     ⋱  ⋮                 ⎥
        ⎣ e^{−‖x_L−x_1‖²/τ}  e^{−‖x_L−x_2‖²/τ}  ⋯  1                 ⎦   (35)

In case (35) is ill-conditioned, a regularisation technique can be applied to the above solution.

It is easy to verify that, with this BVC-RBF model, the BVCs (29) are automatically satisfied. To elaborate further, we note the following features of the BVC-RBF structure:

1. The BVC-RBF nodes (31) have the property of zero forcing at the boundary points x_j, 1 ≤ j ≤ L, so the adjustable RBF weights θ_i have no effect on the summation term in (30) at any of the boundary points.
2. The term q(x) passes all the predetermined boundary values, f(x_j) = q(x_j) = d_j, 1 ≤ j ≤ L; it is completely determined by the BVCs (28) and does not contain any adjustable parameters dependent on D_K.
3. Over the input range, the set of smooth BVC-RBF nodes p_i(x; σ) has diverse local responses, and makes a non-zero adjustable contribution towards modelling f(x) via the adjustable parameters θ_i, which are learnt based on the training set D_K.

The above properties 1 and 2 of the BVC-RBF nodes (31) and the offset function (33) are illustrated in Fig. 3 for a one-dimensional function f(x) with the two BVCs f(0.1) = 2 and f(0.5) = 3.

With this BVC-RBF model, no constrained optimisation is needed. In fact, define the desired output vector for training this grey-box RBF model as

  ỹ = [y(1) − q(x(1)), y(2) − q(x(2)), ⋯, y(K) − q(x(K))]^T   (36)

where (x(k), y(k)) ∈ D_K, 1 ≤ k ≤ K. Then the learning of this grey-box RBF model with the node structure (31) and the offset function (33) takes the same regression modelling form as the black-box RBF model discussed in Section 2. Thus, the class of OLS learning algorithms [18–22] can readily be applied to identify a parsimonious BVC-RBF model from the noisy training data D_K.

Fig. 3. (a) Five BVC-RBF nodes with zero forcing at the two boundary points, and (b) the offset function q(x), for the one-dimensional function f(x) with the two BVCs f(0.1) = 2 and f(0.5) = 3.

4.2. A BVC modelling example

A 31 × 31 meshed data set f(x_1, x_2), as depicted in Fig. 4(a), was generated using the Matlab command membrane.m for the third eigenfunction of the L-shaped membrane, defined over the unit square input region (x_1, x_2) ∈ [0, 1]². In Fig. 4(b), the required L = 120 BVCs, given by the coordinates {(x_1, x_2), f(x_1, x_2)}, are marked by the cross points at the corresponding (x_1, x_2) locations.
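Constructing the offset function (33) amounts to one L × L linear solve, (34)–(35), and the zero-forcing factor (32) vanishes at every boundary point by construction. The following sketch is our own illustration using the two hypothetical one-dimensional BVCs of Fig. 3, f(0.1) = 2 and f(0.5) = 3, with an assumed τ = 0.2; the function names are ours:

```python
import numpy as np

def offset_function(bvc_x, bvc_d, tau):
    """Build q(x) of eq. (33) by solving Q_L gamma = d_L (eqs. (34)-(35))."""
    bvc_x = np.atleast_2d(bvc_x).astype(float)
    # Q_L[i, j] = exp(-||x_i - x_j||^2 / tau); the diagonal is 1
    d2 = np.sum((bvc_x[:, None, :] - bvc_x[None, :, :]) ** 2, axis=2)
    gamma = np.linalg.solve(np.exp(-d2 / tau), np.asarray(bvc_d, dtype=float))
    def q(x):
        r2 = np.sum((bvc_x - np.asarray(x, dtype=float)) ** 2, axis=1)
        return float(np.exp(-r2 / tau) @ gamma)
    return q

def h(x, bvc_x):
    """Zero-forcing factor of eq. (32): geometric mean of boundary distances."""
    bvc_x = np.atleast_2d(bvc_x).astype(float)
    dists = np.linalg.norm(bvc_x - np.asarray(x, dtype=float), axis=1)
    return float(np.prod(dists) ** (1.0 / len(dists)))

# Two hypothetical one-dimensional BVCs, as in Fig. 3: f(0.1) = 2, f(0.5) = 3
bx = np.array([[0.1], [0.5]])
bd = np.array([2.0, 3.0])
q = offset_function(bx, bd, tau=0.2)

assert abs(q([0.1]) - 2.0) < 1e-9   # q interpolates the BVCs exactly
assert abs(q([0.5]) - 3.0) < 1e-9
assert h([0.1], bx) == 0.0          # every node (31) vanishes at a BVC point
```

Since h zero-forces the trainable summation and q interpolates the boundary values, the constraints (29) hold for any weight vector, which is why no constrained optimisation is needed.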

Fig. 4. (a) The underlying function f(x_1, x_2) shown on the grid of 961 points, (b) the L = 120 BVCs, x_j for 1 ≤ j ≤ L, marked as cross points, (c) the 961 noisy training data points, and (d) the prediction f̂(x_1, x_2) of the resulting BVC-RBF model.

Table 2. Performance comparison between the conventional RBF and BVC-RBF models for the BVC system identification example.

  Model     Model size   Training MSE (inside D_K)   Test MME (inside boundary)   Test MME (on boundary)
  RBF       42           1.2254 × 10⁻⁴               4.6043 × 10⁻⁵                8.5540 × 10⁻⁵
  BVC-RBF   34           9.8634 × 10⁻⁵               1.8230 × 10⁻⁵                5.1462 × 10⁻¹¹

The noisy training data set D_K was generated by adding a white Gaussian noise of variance σ_e² = 0.01² to f(x_1, x_2), and D_K is plotted in Fig. 4(c). We used all the data points of D_K that were inside the boundary as the training samples and applied the D-optimality aided OLS regression algorithm, discussed in Section 2.1 (also see [42]), to construct both the standard RBF and BVC-RBF models. The basis function φ(·) was chosen to be Gaussian, and the RBF variance σ² = 0.2 was determined separately based on cross validation. For the offset function (33), τ = 0.2 was found to be appropriate. The D-optimality weighting for the combined cost function (12) was chosen to be β = 10⁻⁶.

Table 2 compares the performance of the conventional RBF model with that of the novel BVC-RBF model, where the sizes of the two models were automatically determined by the learning algorithm. Fig. 5(a) and (b) depict the modelling error f(x_1, x_2) − f̂(x_1, x_2) of the two obtained models, respectively, where f̂ denotes the identified model mapping. The resulting BVC-RBF model is also shown in Fig. 4(d). From Table 2, it can be seen that the BVC-RBF model has a much better generalisation performance than the black-box RBF model. Specifically, the MME calculated inside the boundary marked by the cross points in Fig. 4(b) is more than two times smaller than that of the conventional RBF model. More significantly, the MME of the BVC-RBF model calculated on the boundary is effectively zero, confirming that all the L = 120 BVCs are strictly met by the BVC-RBF model. By contrast, the black-box RBF model cannot satisfy these BVCs strictly. The results obtained also confirm that model construction is more efficient for the BVC-RBF model, as a smaller model size was achieved for this grey-box RBF model. Similar to the case of SRBF modelling, it can be argued that the prediction complexities of the conventional RBF and BVC-RBF models are approximately the same.

5. Conclusions

In this contribution, we have discussed the art of incorporating prior knowledge to form an appropriate grey-box RBF model. Two types of a priori information have been considered. In the first case, the underlying data generating mechanism exhibits a known symmetry property, while in the second case, the underlying

process obeys a set of boundary value constraints. The novel SRBF model and the BVC-RBF model have been proposed, respectively, to incorporate these two types of a priori information naturally. The existing state-of-the-art RBF learning methods for the black-box RBF model can readily be applied to construct these two grey-box RBF models efficiently, without any modification or added algorithmic complexity and computational cost. This contribution has clearly demonstrated that incorporating appropriate prior knowledge naturally into the model structure leads to a better generalisation performance, a smaller model size and a reduced complexity in model construction.

Fig. 5. (a) The modelling error f(x_1, x_2) − f̂(x_1, x_2) of the standard RBF model, and (b) the modelling error f(x_1, x_2) − f̂(x_1, x_2) of the BVC-RBF model, for the BVC system identification example.

References

[1] S. Chen, S.A. Billings, C.F.N. Cowan, P.M. Grant, Non-linear systems identification using radial basis functions, Int. J. Syst. Sci. 21 (12) (1990) 2513–2539.
[2] J.A. Leonard, M.A. Kramer, Radial basis function networks for classifying process faults, IEEE Control Systems Mag. 11 (3) (1991) 31–38.
[3] S. Chen, B. Mulgrew, P.M. Grant, A clustering technique for digital communications channel equalization using radial basis function networks, IEEE Trans. Neural Networks 4 (4) (1993) 570–579.
[4] A. Caiti, T. Parisini, Mapping ocean sediments by RBF networks, IEEE J. Oceanic Eng. 19 (4) (1994) 577–582.
[5] D. Gorinevsky, A. Kapitanovsky, A. Goldenberg, Radial basis function network architecture for nonholonomic motion planning and control of free-flying manipulators, IEEE Trans. Robotics Autom. 12 (3) (1996) 491–496.
[6] I. Cha, S.A. Kassam, RBFN restoration of nonlinearly degraded images, IEEE Trans. Image Process. 5 (6) (1996) 964–975.
[7] M. Rosenblum, L.S. Davis, An improved radial basis function network for visual autonomous road following, IEEE Trans. Neural Networks 7 (5) (1996) 1111–1120.
[8] J.A. Refaee, M. Mohandes, H. Maghrabi, Radial basis function networks for contingency analysis of bulk power systems, IEEE Trans. Power Syst. 14 (2) (1999) 772–778.
[9] S. Muraki, T. Nakai, Y. Kita, K. Tsuda, An attempt for coloring multichannel MR imaging data, IEEE Trans. Visualization Comput. Graphics 7 (3) (2001) 265–274.
[10] R. Mukai, V.A. Vilnrotter, P. Arabshahi, V. Jamnejad, Adaptive acquisition and tracking for deep space array feed antennas, IEEE Trans. Neural Networks 13 (5) (2002) 1149–1162.
[11] C.-T. Su, T. Yang, C.-M. Ke, A neural-network approach for semiconductor wafer post-sawing inspection, IEEE Trans. Semicond. Manuf. 15 (2) (2002) 260–266.
[12] Y. Li, N. Sundararajan, P. Saratchandran, Z. Wang, Robust neuro-H∞ controller design for aircraft auto-landing, IEEE Trans. Aerosp. Electron. Syst. 40 (1) (2004) 158–167.
[13] M.-J. Lee, Y.-K. Choi, An adaptive neurocontroller using RBFN for robot manipulators, IEEE Trans. Ind. Electron. 51 (3) (2004) 711–717.
[14] S.X. Ng, M.-S. Yee, L. Hanzo, Coded modulation assisted radial basis function aided turbo equalization for dispersive Rayleigh-fading channels, IEEE Trans. Wireless Commun. 3 (6) (2004) 2198–2206.
[15] Y.-J. Oyang, S.-C. Hwang, Y.-Y. Ou, C.-Y. Chen, Z.W. Chen, Data classification with radial basis function networks based on a novel kernel density estimation algorithm, IEEE Trans. Neural Networks 16 (1) (2005) 225–236.
[16] N. Acir, I. Oztu

