IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. XXX, NO. XXX, XXX 2019


Deep Fuzzy Cognitive Maps for Interpretable Multivariate Time Series Prediction

Jingyuan Wang, Zhen Peng, Xiaoda Wang, Chao Li, and Junjie Wu

Abstract—The Fuzzy Cognitive Map (FCM) is a powerful model for system state prediction and interpretable knowledge representation. Recent years have witnessed the tremendous efforts devoted to enhancing the basic FCM, such as introducing temporal factors, uncertainty or fuzzy rules to improve interpretation, and introducing fuzzy neural networks or Wavelets to improve time series prediction. But how to achieve high-precision yet interpretable prediction in cross-domain real-life applications remains a great challenge. In this paper, we propose a novel FCM extension called Deep FCM for multivariate time series forecasting, in order to take both the advantage of FCM in interpretation and the advantage of deep neural networks in prediction. Specifically, to improve the predictive power, Deep FCM leverages a fully connected neural network to model connections (relationships) among concepts in a system, and a recurrent neural network to model unknown exogenous factors that have influences on system dynamics. Moreover, to foster model interpretability encumbered by the embedded deep structures, a partial derivative-based approach is proposed to measure the connection strengths between concepts in Deep FCM. An Alternate Function Gradient Descent algorithm is then proposed for parameter inference. The effectiveness of Deep FCM is validated over four publicly available datasets with the presence of seven baselines. Deep FCM indeed provides an important clue to building interpretable predictors for real-life applications.

Index Terms—Fuzzy Cognitive Maps, Time Series Prediction, Deep Neural Networks, Interpretable Prediction

I. INTRODUCTION

The Fuzzy Cognitive Map (FCM) is a flexible and powerful model for system state prediction and interpretable knowledge representation [1].
The FCM model describes a system with multiple interactive components (i.e., concepts) as a weighted directed graph, where the vertices denote system components and the edges denote the interactions between components. Since the knowledge about a system is represented as a graph with clear interactive relationships, FCM is deemed naturally interpretable for system dynamics and has been widely adopted in many interpretation-sensitive prediction applications, such as public policy making [2], business management [3], health care diagnosis [4], and behavioral analysis [5].

J. Wang is with the School of Computer Science and Engineering, Beijing Advanced Innovation Center for Big Data and Brain Computing, and State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China.
Z. Peng is with the School of Economics & Management, Beijing Institute of Petrochemical Technology, Beijing 102617, China.
X. Wang and C. Li are with the School of Computer Science and Engineering, and MOE Engineering Research Center of ACAT, Beihang University, Beijing 100191, China.
J. Wu is with the School of Economics and Management, and Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China.
Corresponding author: Z. Peng, e-mail: zhenpeng@bipt.edu.cn

Interpretable knowledge representation and high-performance prediction, unfortunately, are often mutually exclusive. The FCM model is no exception. In the basic FCM model, the graph edges can only describe static and linear relationships, which degrades the prediction performance of FCM in many complex real-world applications, especially when compared with neural network-based deep learning models [6].
Deep learning models, however, are often criticized for their black-box nature, with incomprehensible variables in deep layers, let alone explanations of the influence relationships among those variables. How to improve the capability of FCM in non-linear dynamics prediction while keeping its interpretation advantage, or in other words, how to achieve satisfactory interpretable prediction, remains an open and essential problem.

In the literature, many extensions have been proposed to enhance the performance of the basic FCM model in terms of interpretation and prediction. For instance, temporal factors are introduced into the FCM framework to model dynamic relationships, resulting in Dynamical Cognitive Networks, Fuzzy Time Cognitive Maps and Evolutionary Fuzzy Cognitive Maps [7]–[9]. Fuzzy Grey Cognitive Maps, Intuitionistic Fuzzy Cognitive Maps, and Rough Cognitive Maps are proposed to model uncertain relationships among system components [10]–[12]. Rule-based Fuzzy Cognitive Maps and Extended Fuzzy Cognitive Maps adopt logic rules to express non-linear relationships in FCM [13], [14]. Fuzzy neural networks and the wavelet transform are also adopted to improve the performance of the FCM framework in time series forecasting applications [15]–[18]. While the above studies have indeed improved the basic FCM model in different aspects, there is still a great need for a new FCM model that achieves high-precision yet interpretable prediction in cross-domain real-life applications, where non-linear complex dynamics with unknown exogenous factors are commonly seen.

In this paper, the advantage of deep neural network models in high-performance prediction is introduced into the interpretable FCM framework to build a novel interpretable predictor called Deep FCM (or DFCM for short).
Our model is designed for the task of multivariate time series forecasting. It extends the basic FCM to a general framework, which consists of a fully connected neural network to model non-linear and non-monotonic influences among system concepts, and a recurrent neural network (RNN) to model unknown exogenous factors that have latent influence on system dynamics. An Alternate Function Gradient Descent algorithm is then carefully designed for efficient parameter inference of Deep FCM with built-in deep neural networks. In this way, Deep FCM is equipped with much greater power than the basic FCM in time series prediction.

Beyond prediction, we also adopt a partial derivative-based method to measure the connection strength between each pair of system concepts. This is to ensure that the excellent interpretability of the basic FCM is not undermined by the black-box nature of the deep neural network components in Deep FCM. In this way, our model can achieve improved performance in multivariate time series prediction while keeping the interpretability of the FCM framework. That is why we call Deep FCM an interpretable prediction model.

The effectiveness of Deep FCM is verified over four publicly available datasets obtained from different application domains, and is compared with seven competitive baselines. The experimental results show that Deep FCM can indeed achieve much better performance in system state prediction. Meanwhile, the non-linear concept relationships in complex real-life systems can be accurately captured and clearly interpreted by the partial derivative-based method. We also verify the effectiveness of the RNN component in modelling periodical exogenous factors, which indeed improves the prediction power of Deep FCM.

II. RELATED WORKS

A. Fuzzy Cognitive Maps

In the basic FCM framework [1], a system consisting of several interactive components is described by three elements: Concepts, Activation States, and Relationships. Concepts represent components in a system, activation states represent states of components, and relationships represent influences among components.

Fig. 1. An illustration of fuzzy cognitive maps.

As illustrated in Fig. 1, the FCM models a system with $I$ concepts as a weighted directed graph. We denote the $i$-th graph vertex as $c_i$, which is used to express the $i$-th concept in the system.
The edge weight from vertex $i$ to vertex $j$ is denoted as $w_{ij}$, which expresses the relationship of $c_i$ to $c_j$. The value of $w_{ij}$ is in the range of $[-1, 1]$. Moreover, each concept in FCM has a fuzzy activation state $a_i \in [0, 1]$: $a_i = 1$ means $c_i$ is completely activated and $a_i = 0$ means it is completely inactive. The activation state of $c_i$ is a dynamic time series $\{a_i^{(1)}, \ldots, a_i^{(t)}, \ldots, a_i^{(T)}\}$, where $a_i^{(t)}$ is the state at time $t$. The state $a_i^{(t+1)}$ in FCM is influenced by the states of other concepts at time $t$ as

$$ a_i^{(t+1)} = \varphi\Big(a_i^{(t)} + \sum_{j \neq i} w_{ji}\, a_j^{(t)}\Big), \tag{1} $$

where the function $\varphi(\cdot)$ is a membership function that fuzzifies the activation states into $[0, 1]$, and the value of $w_{ji}$ is in the range of $[-1, 1]$ [19].

In real-world applications, the activation levels $\{a_i^{(t)}\}$ are observable time series, and the relationships $w_{ij}$ are unknown knowledge to be learnt from the observable activation levels. Given random initial values, the Differential Hebbian Learning (DHL) algorithm adjusts $w_{ij}$ using the observable data at time $t$ as follows:

$$ w_{ij}^{(t+1)} = w_{ij}^{(t)} + \lambda(t)\Big(\Delta a_i^{(t)}\, \Delta a_j^{(t)} - w_{ij}^{(t)}\Big), \tag{2} $$

where $\Delta a_i^{(t)} = a_i^{(t)} - a_i^{(t-1)}$, and $\lambda(t) = 0.1\,(1 - t/(1.1q))$ is a dynamic learning rate, with the parameter $q$ adopted to ensure that $w_{ij} \in [-1, 1]$. The value of $w_{ij}^{(t+1)}$ is iteratively updated until convergence or until some stopping criterion is met. The DHL algorithm has many improved versions, such as NHL (Nonlinear Hebbian learning) [20] and AHL (Active Hebbian learning) [21]. Moreover, evolutionary optimization methods are also adopted to learn $W$, such as the real-coded genetic algorithm (RCGA) [22] and particle swarm optimization (PSO) [23].

B. Extensions of FCM

Despite its great success, the basic FCM still has some limitations. Firstly, relationships in many real-world systems are highly nonlinear and non-monotonic; however, the relationships modeled by the basic FCM are linear and monotonic. In the literature, many FCM extensions have been proposed to overcome this drawback [24].
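As a concrete illustration of the basic FCM in Section II-A, the following minimal sketch implements the state update of Eq. (1) and one DHL step of Eq. (2) in plain Python. The function names `fcm_step` and `dhl_step` are hypothetical, and the sigmoid is only one possible choice for the membership function $\varphi$, assumed here for illustration.

```python
import math


def sigmoid(z):
    """One possible membership function phi (an assumption of this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


def fcm_step(a, w):
    """Eq. (1): a_i(t+1) = phi(a_i(t) + sum over j != i of w[j][i] * a_j(t)).

    a is the list of activation states at time t; w[j][i] is the weight of
    the edge from concept j to concept i.
    """
    n = len(a)
    return [
        sigmoid(a[i] + sum(w[j][i] * a[j] for j in range(n) if j != i))
        for i in range(n)
    ]


def dhl_step(w, a_prev, a_curr, t, q):
    """Eq. (2): one DHL update of every off-diagonal weight w[i][j]."""
    lam = 0.1 * (1.0 - t / (1.1 * q))  # dynamic learning rate lambda(t)
    n = len(a_curr)
    delta = [a_curr[i] - a_prev[i] for i in range(n)]  # Delta a_i at time t
    return [
        [
            w[i][j] if i == j
            else w[i][j] + lam * (delta[i] * delta[j] - w[i][j])
            for j in range(n)
        ]
        for i in range(n)
    ]
```

Calling `fcm_step` repeatedly rolls the system state forward, while `dhl_step` would be applied once per observed time step to adapt the weight matrix.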
One main stream of these extensions uses non-linear tools, such as logic rules, to describe complex concept relationships. For instance, RBFCM uses qualitative fuzzy rules to replace the quantitative mathematical description of relationships [13], and FRI-FCM uses fuzzy IF-THEN rules to express non-linear relationships [25], [26]. The other stream of extensions introduces uncertainty into relationships. For instance, FGCM uses grey system theory to handle highly uncertain relationships in incomplete and small datasets [10], iFCM introduces intuitionistic fuzzy sets to handle the hesitancy in human decision making [11], BDD-FCM replaces absolute linguistic terms with belief degree distributions to describe uncertain relationships [27], and rough sets are introduced by RCM to represent the diversity of the relationships among concepts [12].

The second limitation of the basic FCM is that its relationships are static, which hinders its application in dynamic systems. To overcome this drawback, temporal factors are introduced into FCM by many studies. For instance, DCN introduces a dynamic function into FCM relationships [7], [28], DRFCM adopts a reinforced learning procedure to update relationships dynamically [29], EFCM proposes asynchronous updates of the variables to handle the dynamics of concept interactions [9], and TAFCM uses timed automata to model dynamic relationships between concepts [30]. A common idea of these works is to model relationships as a function of time.

In summary, while the above-mentioned studies perform excellently in adapting FCM to nonlinear and dynamic relationships, a drawback of these extension models is that users have to design complicated logic rules and uncertain or dynamic functions according to specific applications. In contrast, the representational learning capacity of neural networks frees users from designing complicated rules. There is much room for improving the adaptability of the FCM framework to different application scenarios, and one possible way is to leverage the representational learning ability of neural networks.

TABLE I
MATH NOTATIONS

Notation            Definition
$c_i$               The $i$-th concept in a system.
$a_i$               The fuzzy activation state of the concept $i$. The instance at time $t$ is denoted as $a_i^{(t)}$.
$a$                 The fuzzy activation state vector of all concepts (the system state). The instance at time $t$ is denoted as $a^{(t)}$.
$w_{ij}$            The relationship of $c_i$ to $c_j$, which is a constant in basic FCM but is a function of $a$ in DFCM.
$u_i(t)$            The function to model the influence of exogenous factors to $a_i$, named the u-function.
$f_i(a)$            The function to model the relationship of the system state $a$ to $a_i$, named the f-function.
$y_{(m,k)}$         The output of the $m$-th neuron in the $k$-th hidden layer of an f-function. $y^{(t)}_{(m,k)}$ is an instance at time $t$.
$v_{(nm,k)}$        The weight of the $n$-th input of the $m$-th neuron in the $k$-th hidden layer of the f-function.
$r_{ij}$            The general relationship of the concept $c_i$ to $c_j$, which is a function of $a$ in DFCM.
$w_{ij}(a_k)$       A general strength measurement of the causal relationship of the concept $c_i$ to $c_j$.

Fig. 2. The framework of Deep FCM. (a) Basic FCM. (b) Deep FCM.

C. FCM for Time Series Predictions

The capabilities of FCM in time series modeling have already been widely acknowledged. Ref. [31] proposed a framework that first transforms the states of a univariate time series into fuzzy sets and then uses a basic FCM model to predict the time series. Ref. [32] uses historical states in a moving window as concepts of basic FCM models to predict the future state of a univariate time series. Ref.
[32] proposed a mechanism to dynamically optimize the FCM structure, membership functions and moving window size in time series prediction. Ref. [33] improved the framework of [31] by using fuzzy C-means to transform time series into information granules. Ref. [34] adopted the ARIMA model to improve the performance of FCMs in time series prediction. In order to handle large-scale nonstationary time series, Wavelet-HFCM [15] applies the redundant Haar wavelet transform to decompose a univariate time series into multivariate time series, and uses ridge regression to train FCM models for forecasting.

The FCM model has also been applied to the multivariate time series prediction problem, where the time series of the individual variables are considered as states of concepts in a system [16], [35], [36]. Papageorgiou et al. [35] proposed a modified error function to optimize the performance of FCM in multi-step multivariate time series prediction. Froelich et al. [36] proposed a dynamic optimization for FCM parameter and structure selection in multivariate time series prediction. Papageorgiou et al. [16] proposed a two-stage prediction model which uses evolutionary FCMs to select the most important attributes as inputs of an ANN that makes the time series prediction.

Neural network structures have also been adopted by FCM for time series prediction. Ref. [17] implements FCM based on a fuzzy neural network for time series prediction, and Ref. [18] adopts a similar FCM structure to model chaotic time series. However, in order to infer and express relationships between concepts, the structures of the fuzzy neural networks in [17] and [18] are strictly limited to small numbers of layers.

III. DEEP FUZZY COGNITIVE MAPS

In this section, we propose the Deep Fuzzy Cognitive Maps model (Deep FCM or DFCM for short) for multivariate state time series prediction and influence analysis among system concepts. Fig. 2 shows the framework of Deep FCM. The notations used in Deep FCM are listed in Table I.

A. Time Series Fuzzification

Given a system consisting of a group of concepts, we denote the original time series of a concept $j$ as $x_j = \{x_j^{(1)}, \ldots, x_j^{(t)}, \ldots, x_j^{(T)}\}$, $x_j^{(t)} \in \mathbb{R}$, $\forall j, t$. Deep FCM first adopts a z-score normalization to preprocess the raw state value $x_j^{(t)}$ as

$$ z_j^{(t)} = \frac{x_j^{(t)} - \mu_j}{\sigma_j}, \tag{3} $$

where $\mu_j$ and $\sigma_j$ are the mean and standard deviation of $x_j$. Next, Deep FCM uses a sigmoid membership function to fuzzify the normalized time series $z_j^{(t)}$ into $a_j^{(t)} \in (0, 1)$ as follows:

$$ a_j^{(t)} = \varphi\big(z_j^{(t)}\big) = \frac{1}{1 + e^{-z_j^{(t)}}}, \tag{4} $$

where $\varphi(\cdot)$ is the sigmoid membership function, which has a range of $(0, 1)$. When $z_j^{(t)} \to +\infty$, $a_j^{(t)} \to 1$, indicating the active state, and when $z_j^{(t)} \to -\infty$, $a_j^{(t)} \to 0$, indicating the inactive state. When $z_j^{(t)} \in (-\infty, +\infty)$, $a_j^{(t)} \in (0, 1)$, which indicates a state that is active to a certain degree. In this way, the original time series values of concepts are represented by the fuzzy values of activation levels in $[0, 1]$.

Given a fuzzy activation state $a_j^{(t+1)} \in [0, 1]$ predicted by DFCM, we use the following function to defuzzify it into the raw value of the concept $c_j$:

$$ x_j^{(t+1)} = \varphi^{-1}\big(a_j^{(t+1)}\big) \cdot \sigma_j + \mu_j. \tag{5} $$

By default, the input time series $x_j$ are assumed not to be crisp values. For the case $x_j^{(t)} \in \{0, 1\}$, we skip the normalization and fuzzification steps and directly set $a_j^{(t)} = x_j^{(t)}$. Here $a_j^{(t)} = 1$ indicates the active state and $a_j^{(t)} = 0$ indicates the inactive state.

B. Modelling Nonlinear Influence

One drawback of the basic FCM is its weak capacity for modeling nonlinear relationships. To deal with this, Deep FCM extends Eq. (1) of the basic FCM to a general form as

$$ a_j^{(t+1)} = \varphi\Big(u_j(t) + f_j\big(a^{(t)}\big)\Big). \tag{6} $$

Here, the function $f_j(\cdot)$ is used to model the relationship of $a^{(t)}$ to $a_j$, where $a^{(t)} = (a_1^{(t)}, \ldots, a_i^{(t)}, \ldots, a_I^{(t)})$ denotes the activation states of all concepts (i.e., the system state) at time $t$. The function $u_j(t)$ is used to model the influence of unknown exogenous factors on $a_j$. We name the two functions the f-function and the u-function, respectively. In Eq. (6), the sum of the f-function and the u-function is fuzzified by a sigmoid membership function $\varphi$ to generate $a_j^{(t+1)}$. Obviously, when $u_j(t) = 0$ and $f_j(a^{(t)}) = \sum_{i=1}^{I} w_{ij}\, a_i^{(t)}$ with $w_{jj} = 1$, DFCM degenerates to the basic FCM. In other words, the basic FCM is a special case of DFCM.

The neural network is a powerful model with universal approximation capability [37]. The f-functions of DFCM are implemented by feedforward neural networks [6]. Specifically, we define $f_j(a^{(t)})$ in Eq. (6) as a feedforward neural network with $K$ hidden layers. The number of neurons in the layer $k$ is denoted by $M_k$. In the time slice $t$, the output of the $m$-th
In the time slice t, the output of the m-th(t)neuron in the k-th layer, i.e., y(m,k) , is generated by Mk 1(t)y(m,k) ReLU X(t)v(nm,k) y(n,k 1) ,(7)n 1where v(nm,k) is the connection weight from the neuron n inthe layer k 1 to the neuron m in the layer k. ReLU(·) isa Rectified Linear Unit (ReLU) activation function, which isdefined as ReLU(z) z, z 0.0, z 0(8)As mentoined in Ref. [6] the ReLU activation function haveadvanced performance. Moreover, ReLU can also ensurefj (a) 0 at the origin a1 · · · ai · · · aI 0,which is consistent with the basic FCM.4(t)(t)In the input layer of fj (a(t) ), we set y(n,0) an . In theoutput layer, we calculate a predictive output as(t 1)y(K 1) MKX(t)v(n1,K 1) y(n,K) fj (a(t) ).(9)n 1In the f -function, we do not include a bias term in neuronsand do not applied the ReLU activation to the output layerneither. Both of the two treatments are to ensure that DFCM,a deep-structure enhanced FCM, is consistent with the basicFCM. As shown in Eq. (1), it is obvious that the expressionof the basic FCM does not contain bias terms, which allows(t 1)ai 0 at the origin a1 · · · ai · · · aI 0. Inorder to ensure that DFCM has the same feature, we did notinclude bias terms in DFCM, which implies that fj (a) 0at the origin a1 · · · aI 0. In addition, it is easy to noteP(t)(t)that the term ai j6 i wji aj is in the range of ( , ).Therefore, in order to ensure fj (a) ( , ), we did notapply the ReLU activation to the output layer, whose outputwould be in the range of [0, ). Given the above treatments, itPI(t)is obvious that when ui (t) 0 and fi (a(t) ) i 1 wji ajwith wjj 1, the expression of DFCM in Eq. (6) degeneratesto the form of the basic FCM in Eq. (1).C. Modelling Exogenous FactorsThe exogenous factors in the deep FCM refer to thoseexogenous factors that have influence to the system state a butcan not be predefined and directly measured. 
Let us take forexample the Deep FCM for a road transportation system (moredetails can be found in the experimental section). In this case,the road segments can be modeled as concepts and whetherthe segment congest can be modeled as activation states.The traffic speeds of near road segments can influence eachother, which form relationships among concepts and can bemodeled by f -functions. However, the traffic speeds are alsoinfluenced by some exogenous factors, such as the commutingpatterns of residents, traffic controls, important events and soon. Because the states of these exogenous factors cannotbe directly measured, we cannot use the predefined FCMconcepts to describe them. How to handle these factors isan age-old challenge for system modelling and attributionanalysis studies [38]. In the literature of FCM, influencesof exogenous factors are often modeled as static constantinputs [39], [40]. Our DFCM model is designed for time seriesprediction tasks, so we pay special attention to time-relatedfactors and introduce a LSTM-based u-function to capture theexogenous factors with time dependence.In the deep FCM framework defined in Eq. (6), the influenceof exogenous factors to aj are indirectly measured by a ufunction as (t 1)uj (t) ϕ 1 aj fj a(t) ,(10)i.e., the component of aj that cannot be modeled by the systeminternal relationships through the f -function. The deep FCMadopts a Recurrent Neural Network (RNN) to implement theu-function asuj (t) RNN (t, mod(t, τ ), uj (t 1)) ,(11)

IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. XXX, NO. XXX, XXX 2019which has three inputs: the time stamp t, the time stamp tmodulo a period length τ , and the history state uj (t 1).mod(a, b) is a function to calculate a modulo b.We design the u-function in the form of Eq. (11) basedon three considerations: i) uj (t) is a function of time stampt since the influence of exogenous factors usually changewith time. ii) In many scenarios, exogenous factors exhibitperiodicity, such as one day, one week, one month and thelike, so the time stamp t modulo a period length τ is alsoadopted as an input. iii) Moreover, the dynamics of exogenousfactors usually have “memory”, i.e., depend on their historicalstates. Therefore, we use Recurrent Neural Networks (RNN) toimplement the u-function where the historical state uj (t 1) isadopted as an input. In practice, the version of RNN in DFCMis Long Short-Term Memory (LSTM) [41]. The calculation ofuj (t) starts from t 2, where the input uj (t 1) is set as(2)uj (1) ϕ 1 (aj ) fj (a(1) ).5In practices, P (a) and P (a k ) are unknown. A practicablemethod is to use frequency to approximate probability. Forak in a small interval [α, β], we assume there are M samplesfalling in this range. According to the Large Number Law, ther̄ij for ak [α, β] can be calculated approximately asr̄ij (ak ) 1M(O)The overall influence r̄ij(O)r̄ij Xrij (ak , a k ) .can then be approximated asM 1 Xrij a(m) ,M m 1(C)(O)The biggest advantage of FCM lies in its ability in uncovering concept relationships in a complex system. This advantageis also called as interpretability of FCM. 
The basic FCM useswij to measure the strength of relationships between concepts.The value of wij has the following interpretations: wij 0 means concept ci has no influence at all toconcept cj ;wij (0, 1] means ci has a positive influence to cj ;wij [ 1, 0) means ci has a negative influence to cj .From a computational perspective, the basic FCM also regards wij as “the level of aj ’s increase when ai increases”.Analogously, we proposed a partial derivative-based methodto measure the relationship of ai to aj in DFCM as:rij (a) lim ai 0 fj (a)fj (ai , a i ) fj (ai ai , a i ) ,ai (ai ai ) ai(12)where a i denotes all the elements of a except ai . Thefunction rij (a) expresses the degree of fj ’s increasing whenai increases ai , given the system state a as a condition. Tounderstand the necessity of introducing the condition, we canthink about how human’s body weight influences his healthwith a given age — ideal body weights are different fordifferent ages.Note that the function rij (ak ) here is also a function of theactivation states of unconcerned concepts: a k . To removethe impact of the unconcerned concepts, we calculate theexpectation of rij (ak ) for all possible values of a k asZr̄ij (ak ) P (a k )rij (ak , a k ) dσ ,(13)Da kwhereR P (a k ) is the probability density function of a k ,and Da · dσ is an integral over all possible value of a k . kFurthermore, the overall influence of ai to aj is calculated asthe expectation of rij for all possible values of a, i.e.,(O)r̄ijZ P (a)rij (a) dσ .Da (O) Tanh r̄ij,(17)where Tanh(·) is in the form ofD. Measuring Concept Relationships (16)where a(m) is the system state of the m-th sample.The FCM framework requires the values of relationships tobe in the range of [ 1, 1], so we use the hyperbolic tangentfunction to resize r̄ij aswij (ak ) Tanh (r̄ij (ak )) , wij (15)ak [α,β](14)Tanh(z) ez e z.ez e z(18)(C)The function wij (ak ) is called the Conditional RelationshipStrength of wij w.r.t. 
$a_k$, which is used to express how the relationship of $c_i$ to $c_j$ changes with the system state $a_k$. The variable $w_{ij}^{(O)}$ is called the Overall Relationship Strength of $c_i$ to $c_j$, which is used to express the overall relationship strength of $c_i$ to $c_j$.

The remaining problem is to calculate the partial derivative $\partial f_j(a) / \partial a_i$ in Eq. (12) given a deep structure. According to the chain rule, the partial derivative of $f_j$ with respect to the input $y_{(m,k)}$ in the layer $k$ can be recursively expressed as

$$ \frac{\partial f_j}{\partial y_{(m,k)}} = \sum_{n} \frac{\partial f_j}{\partial y_{(n,k+1)}} \cdot \frac{\partial y_{(n,k+1)}}{\partial y_{(m,k)}}. \tag{19} $$

Based on the definition in Eq. (7), the partial derivative $\partial y_{(n,k+1)} / \partial y_{(m,k)}$ is calculated as

$$ \frac{\partial y_{(n,k+1)}}{\partial y_{(m,k)}} = v_{(mn,k+1)}\, \varphi'\Big(\sum_{m'=1}^{M_k} v_{(m'n,k+1)}\, y_{(m',k)}\Big), \tag{20} $$

where $\varphi'$ is the derivative of the ReLU activation function. Note that in the input layer, $a_i = y_{(i,0)}$. In this way, $r_{ij}$ can be recursively calculated for any given system state $a$.

IV. PARAMETER INFERENCE

A. Objective Function

Letting $z_j^{(t)} = (x_j^{(t)} - \mu_j)/\sigma_j$, i.e., the Z-score of $x_j^{(t)}$, our Deep FCM defined in Eq. (6) can be rewritten as

$$ \hat{z}_j^{(t)} = f_j\big(a^{(t)} \mid \theta_f\big) + u_j\big(t \mid \theta_u\big), \tag{21} $$

where $\hat{z}_j^{(t)}$ denotes the predicted value of $z_j^{(t)}$, and $\theta_f$, $\theta_u$ denote the parameters of $f_j$ and $u_j$, respectively.

Because $\hat{z}_j^{(t)}$ contains the influence of all concepts and exogenous factors, the residual between $\hat{z}_j^{(t)}$ and $z_j^{(t)}$ should be a random error. We assume $e_j = \hat{z}_j^{(t)} - z_j^{(t)}$ is a zero-mean Gaussian noise as

$$ e_j \sim \mathcal{N}(0, \sigma_e^2), \quad \forall j, \tag{22} $$

where $\sigma_e^2$ is the variance. Given a training data set $Z_j = \{z_j^{(1)}, \ldots, z_j^{(t)}, \ldots, z_j^{(T)}\}$, the likelihood of $Z_j$ given the DFCM parameters $\theta_f$, $\theta_u$ is expressed as

$$ P(Z_j \mid \theta_f, \theta_u) = \prod_{t=1}^{T} \mathcal{N}\big(\hat{z}_j^{(t)}, \sigma_e^2\big) = \prod_{t=1}^{T} \frac{1}{\sigma_e \sqrt{2\pi}} \exp\left(-\frac{\big(z_j^{(t)} - \hat{z}_j^{(t)}\big)^2}{2\sigma_e^2}\right). \tag{23} $$

The negative log likelihood of $Z_j$ can be formulated as

$$ -\ln P(Z_j \mid \theta_f, \theta_u, \sigma_e^2) \propto \sum_{t=1}^{T} \big(z_j^{(t)} - \hat{z}_j^{(t)}\big)^2. \tag{24} $$

We use the Maximum Likelihood Estimation (MLE) method to infer the parameters $\theta_f$ and $\theta_u$, which is equivalent to minimizing the loss function defined as

$$ L(\theta_f, \theta_u) = \frac{1}{2} \sum_{t=1}^{T} \Big(f_j\big(a^{(t)} \mid \theta_f\big) + u_j\big(t \mid \theta_u\big) - z_j^{(t)}\Big)^2. \tag{25} $$

B. Principle of AFGD Algorithm

The biggest difference between DFCM and the basic FCM is that DFCM contains deep neural network components, namely the f-function and the u-function. However, the traditional FCM training algorithms, including the Hebbian-like methods and the evolutionary optimizations, cannot be directly used to train deep neural networks, which motivates us to find a new training method, named Alternate Function Gradient Descent (AFGD), to optimize the objective in Eq. (25). Since the Back-Propagation (BP) algorithm [6] is an effective algorithm for deep neural network training, we design the AFGD algorithm based on BP to train the DFCM model.

AFGD learns the parameters $\theta_f$ and $\theta_u$ via an iterative approach. Specifically, at the $q$-th round of iteration, AFGD updates $\theta_f$ and $\theta_u$ such that the following equations hold:

$$ \begin{aligned} f_j^{(t)}\big(\theta_f^{(q)}\big) &= f_j^{(t)}\big(\theta_f^{(q-1)}\big) - \eta_f \cdot \left.\frac{\partial L(f_j)}{\partial f_j}\right|_{f_j = f_j^{(t)}(\theta_f^{(q-1)})}, \\ u_j^{(t)}\big(\theta_u^{(q)}\big) &= u_j^{(t)}\big(\theta_u^{(q-1)}\big) - \eta_u \cdot \left.\frac{\partial L(u_j)}{\partial u_j}\right|_{u_j = u_j^{(t)}(\theta_u^{(q-1)})}, \end{aligned} \tag{26} $$

where we denote $f_j^{(t)}(\theta_f^{(q)})$ as shorthand for $f_j(a^{(t)} \mid \theta_f^{(q)})$ and $u_j^{(t)}(\theta_u^{(q)})$ as shorthand for $u_j(t \mid \theta_u^{(q)})$, $\theta_f^{(q)}$ and $\theta_u^{(q)}$ are the parameters learnt in the $q$-th iteration, and $\eta_f$, $\eta_u$ are two updating parameters representing the learning rates. Eq. (26) ensures that the parameters are updated along the negative gradient direction of the loss function with respect to the functions $f_j$ and $u_j$, so we name our algorithm Alternate Function Gradient Descent.

According to the derivations in Appendix A, Eq. (26) is equivalent to the following parameter iteration functions:

$$ f_j\big(a^{(t)} \mid \theta_f^{(q)}\big) = z_j^{(t)} - u_j\big(t \mid \theta_u^{(q-1)}\big), \tag{27} $$

$$ u_j\big(t \mid \theta_u^{(q)}\big) = z_j^{(t)} - f_j\big(a^{(t)} \mid \theta_f^{(q)}\big). \tag{28} $$

Algorithm 1 The AFGD algorithm.
Input: Training dataset $D = \{(t, a^{(t)}, z_j^{(t)})\}_{t=1}^{T}$.
Initialize $\theta_f$, $\theta_u$ randomly.
Initialize $y_j^{(t)} = z_j^{(t)}$ for $t \in [1, T]$.
repeat
    $\theta_f \leftarrow \mathrm{BP}\big(f_j(\theta_f), \{(a^{(t)}, y_j^{(t)})\}_{t=1}^{T}\big)$
    $y_j^{(t)} \leftarrow z_j^{(t)} - f_j(a^{(t)} \mid \theta_f)$ for $t \in [1, T]$
    $\theta_u \leftarrow \mathrm{BP}\big(u_j(\theta_u), \{(t, y_j^{(t)})\}_{t=1}^{T}\big)$
    $y_j^{(t)} \leftarrow z_j^{(t)} - u_j(t \mid \theta_u)$ for $t \in [1, T]$
until convergence
return $\theta_f$, $\theta_u$

Algorithm 2 The BP algorithm.
Input: $\mathrm{BP}\big(g(\theta), \{(x^{(t)}, y^{(t)})\}_{t=1}^{T}\big)$, where $g(\theta)$ is a neural network, and $D = \{(x^{(t)}, y^{(t)})\}_{t=1}^{T}$ is a training dataset.
repeat
    for all $(x^{(t)}, y^{(t)}) \in D$ do
        $\mathrm{Loss}(\theta) \leftarrow \big(y^{(t)} - g(x^{(t)}, \theta)\big)^2$
        $\theta \leftarrow \theta - \lambda\, \partial \mathrm{Loss}(\theta) / \partial \theta$
    end for
until convergence
return $\theta$

Eqs. (27) and (28) can be intuitively understood as follows: the u-function and the f-function alternately use each other's prediction residuals to train their parameters, which is why we call our algorithm Alternate Function Gradient Descent. The prediction residuals of the f-function are the influences that cannot be modeled by the internal FCM concepts, i.e., the influences of exogenous factors, and therefore should be absorbed by the u-function.
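The alternation described by Eqs. (27) and (28) and Algorithm 1 can be illustrated with a deliberately simplified sketch. It is not the paper's implementation: the f-function is replaced by a linear, bias-free map and the LSTM-based u-function by a lookup table with one value per phase `t mod period`, so that the example runs with the Python standard library alone. The names `afgd` and `predict`, the hyperparameters, and the zero-based time index are all assumptions made for illustration.

```python
import random


def predict(w, u, a, t):
    """z-hat = f(a) + u(t): linear bias-free f plus a periodic table u."""
    return sum(wi * ai for wi, ai in zip(w, a)) + u[t % len(u)]


def afgd(states, targets, period, rounds=30, inner=10, lr=0.1, seed=0):
    """Alternately fit f and u to each other's residuals.

    states[t] is the system state a(t) and targets[t] is the z-score that
    should be predicted from it (both indexed from 0 in this sketch).
    """
    rng = random.Random(seed)
    n, T = len(states[0]), len(states)
    w = [rng.uniform(-0.1, 0.1) for _ in range(n)]       # theta_f
    u = [rng.uniform(-0.1, 0.1) for _ in range(period)]  # theta_u
    for _ in range(rounds):
        # Train f on the residual left by u, as in Eq. (27).
        y = [targets[t] - u[t % period] for t in range(T)]
        for _ in range(inner):
            for t in range(T):
                err = sum(wi * ai for wi, ai in zip(w, states[t])) - y[t]
                # Gradient step on 0.5 * err**2 with respect to w.
                w = [wi - lr * err * ai for wi, ai in zip(w, states[t])]
        # Train u on the residual left by f, as in Eq. (28).
        y = [targets[t] - sum(wi * ai for wi, ai in zip(w, states[t]))
             for t in range(T)]
        for _ in range(inner):
            for t in range(T):
                u[t % period] -= lr * (u[t % period] - y[t])
    return w, u
```

In a fuller implementation, each inner loop would be one call to a back-propagation trainer for the corresponding neural network, in the manner of Algorithm 2; the outer alternation would stay the same.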

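The relationship measures of Section III-D, Eqs. (12) and (15)-(17), can likewise be sketched in a few lines. This sketch estimates the partial derivative numerically with a central finite difference rather than with the chain-rule recursion of Eqs. (19)-(20), which is a substitution made purely to keep the example short. The function names and the toy f-function in the usage note are hypothetical.

```python
import math


def relationship(f, a, i, eps=1e-6):
    """Finite-difference estimate of r_ij(a) = d f_j(a) / d a_i (Eq. (12))."""
    up = list(a)
    up[i] += eps
    down = list(a)
    down[i] -= eps
    return (f(up) - f(down)) / (2.0 * eps)


def overall_strength(f, samples, i):
    """Overall Relationship Strength: tanh of the sample mean of r_ij
    over all observed system states (Eqs. (16) and (17))."""
    mean_r = sum(relationship(f, a, i) for a in samples) / len(samples)
    return math.tanh(mean_r)


def conditional_strength(f, samples, i, k, lo, hi):
    """Conditional Relationship Strength: tanh of the mean of r_ij over the
    samples whose a_k lies in [lo, hi] (Eqs. (15) and (17)).
    Returns None when no sample falls in the interval."""
    selected = [a for a in samples if lo <= a[k] <= hi]
    if not selected:
        return None
    mean_r = sum(relationship(f, a, i) for a in selected) / len(selected)
    return math.tanh(mean_r)
```

For example, with a toy `f = lambda a: 2 * a[0] - a[1] ** 2`, the overall strength of concept 0 is `tanh(2)` regardless of the samples, while the strength of concept 1 depends on where the samples lie, which is the state dependence that the conditional measure is meant to expose.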