Incremental Learning Algorithms And Applications


Alexander Gepperth, Barbara Hammer. Incremental learning algorithms and applications. European Symposium on Artificial Neural Networks (ESANN), 2016, Bruges, Belgium. HAL Id: hal-01418129, submitted on 16 Dec 2016.

Incremental learning algorithms and applications

Alexander Gepperth (1), Barbara Hammer (2)

(1) U2IS, ENSTA ParisTech, Inria, Université Paris-Saclay, 828 Bvd des Maréchaux, 91762 Palaiseau Cedex, France
(2) Bielefeld University, CITEC centre of excellence, Universitätsstrasse 21-23, D-33594 Bielefeld, Germany

This research/work was supported by the Cluster of Excellence Cognitive Interaction Technology 'CITEC' (EXC 277) at Bielefeld University, which is funded by the German Research Foundation (DFG).

Abstract. Incremental learning refers to learning from streaming data, which arrive over time, with limited memory resources and, ideally, without sacrificing model accuracy. This setting fits different application scenarios where lifelong learning is relevant, e.g. due to changing environments, and it offers an elegant scheme for big data processing by means of its sequential treatment. In this contribution, we formalise the concept of incremental learning, we discuss particular challenges which arise in this setting, and we give an overview of popular approaches, their theoretical foundations, and applications which have emerged in recent years.

1 What is incremental learning?

Machine learning methods offer particularly powerful technologies to infer structural information from given digital data; still, the majority of current applications are restricted to the classical batch setting: data are given prior to training, hence meta-parameter optimisation and model selection can be based on the full data set, and training can rely on the assumption that the data and their underlying structure are static. Lifelong learning, in contrast, refers to the situation of continuous model adaptation based on a constantly arriving data stream [38, 149]. This setting is present whenever systems act autonomously, such as in autonomous robotics or driving [156, 5, 112, 65]. Further, online learning becomes necessary in any interactive scenario where training examples are provided based on human feedback over time [134]. Finally, many digital data sets, albeit static, can become so big that they are de facto dealt with as a data stream, i.e. one incremental pass over the full data set [116]. Incremental learning investigates how to learn in such a streaming setting. It comes in various forms in the literature, and the use of the term is not always consistent. Therefore, we first give a meaning to the terms online learning, incremental learning, and concept drift, paying particular attention to the supervised learning paradigm.

1.1 Online learning methods

In supervised learning, data D = ((x_1, y_1), (x_2, y_2), (x_3, y_3), ..., (x_m, y_m)) are available with input signals x_i and outputs y_i. The task is to infer a model M ≈ p(y|x) from such data. Machine learning algorithms are often trained in a batch mode, i.e. they use all examples (x_i, y_i) at the same time, irrespective of their (temporal) order, to perform, e.g., a model optimisation step.
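The difference between the batch mode just described and the online schemes discussed below can be made concrete with a small sketch. The following Python snippet is purely illustrative (it is not taken from the paper; data and function names are freely chosen): it fits a logistic regression model on the full data set and, alternatively, updates the same kind of model one example at a time by stochastic gradient descent.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_gradient_step(w, X, y, lr=0.1):
    # full-batch step: uses all examples (x_i, y_i) at once, irrespective of their order
    grad = X.T @ (sigmoid(X @ w) - y) / len(y)
    return w - lr * grad

def online_update(w, x_t, y_t, lr=0.1):
    # online step: adapts the model immediately after seeing the single example (x_t, y_t)
    return w - lr * (sigmoid(x_t @ w) - y_t) * x_t

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))                 # toy stream of 2-d inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # binary labels

w_batch = np.zeros(2)
for _ in range(100):                           # batch mode: repeated passes over all data
    w_batch = batch_gradient_step(w_batch, X, y)

w_online = np.zeros(2)
for x_t, y_t in zip(X, y):                     # online mode: a single pass, one example at a time
    w_online = online_update(w_online, x_t, y_t)

The online variant never stores past examples; this property is what the incremental setting discussed below makes explicit.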

Challenge 1: Online model parameter adaptation. In many application examples, data D are not available beforehand; rather, examples arrive over time, and the task is to infer a reliable model M_t after every time step, based only on the example (x_t, y_t) and the previous model M_{t-1}. This is realised by online learning approaches, which use training samples one by one, without knowing their number in advance, to optimise their internal cost function. There is a continuum of possibilities here, ranging from fully online approaches that adapt their internal model immediately upon processing a single sample, over so-called mini-batch techniques that accumulate a small number of samples, to batch learning approaches, which store all samples internally. Online learning is easily achieved by stochastic optimisation techniques such as online back-propagation, but there are also extensions of the support vector machine (SVM) [164]. Prototype-based models such as vector quantisation, radial basis function networks (RBF), supervised learning vector quantisation, and self-organising maps (SOM) all naturally realise online learning schemes since they rely on an (approximate) stochastic gradient technique [140, 115, 15, 83]. Second-order numeric optimisation methods and advanced optimisation schemes can be extended as well, such as variational Bayes, convex optimisation, second-order perceptron learning based on higher-order statistics in primal or dual space, and online realisations of the quasi-Newton Broyden-Fletcher-Goldfarb-Shanno technique [114, 117, 49, 125, 62]. Stochastic optimisation schemes can also be developed for non-decomposable cost functions [80]. Further, lazy learners such as k-nearest neighbour (k-NN) methods lend themselves to online scenarios by design [140]. Interestingly, online learning has very early on been accompanied by exact mathematical investigations [162].

1.2 Incremental learning methods

Incremental learning refers to online learning strategies which work with limited memory resources. This rules out approaches which essentially work in batch mode for the inference of M_t by storing all examples up to time step t in memory; rather, incremental learning has to rely on a compact representation of the already observed signals, such as efficient statistics of the data, an alternative compact memory model, or an implicit data representation in terms of the model parameters themselves. At the same time, it has to provide accurate results for all relevant settings, despite its limited memory resources.

Challenge 2: Concept drift. Incremental learning shares quite a number of challenges with online learning, with memory limitations adding quite a few extras. One prominent problem is the fact that, when the temporal structure of data samples is taken into account, one can observe changes in data statistics over time, i.e. the samples (x_i, y_i) are not i.i.d. Changes in the data distribution over time are commonly referred to as concept drift [88, 157, 33, 126]. Different types of concept drift can be distinguished: changes in the input distribution p(x) only, referred to as virtual concept drift or covariate shift, or changes in the underlying functionality itself, p(y|x), referred to as real concept drift. Further, concept drift can be gradual or abrupt; in the latter case one often uses the term concept shift. The term local concept drift characterises changes of the data statistics only in a specific region of the data space [157]. A prominent example is the addition of a new, visually dissimilar object class to a classification problem. Real concept drift is problematic since it leads to conflicts in the classification, for example when a new but visually similar class appears in the data: this will in any event have an impact on classification performance until the model can be re-adapted accordingly.
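To make the distinction between the drift types concrete, the following illustrative sketch (not from the paper; all names and thresholds are freely chosen) generates a data stream whose input distribution p(x) shifts slowly throughout (virtual drift), and whose labelling rule p(y|x) changes abruptly half-way through (real concept drift, i.e. concept shift).

import numpy as np

rng = np.random.default_rng(1)

def drifting_stream(T=2000, real_drift_at=1000):
    # virtual drift: the mean of p(x) moves slowly while the labelling rule stays fixed,
    # so only the class proportions change; real drift: at t = real_drift_at the rule itself changes
    for t in range(T):
        mean = np.array([2.0 * t / T, 0.0])
        x = rng.normal(loc=mean, scale=1.0)
        if t < real_drift_at:
            y = float(x[0] + x[1] > 1.0)       # fixed concept: p(y|x) unchanged
        else:
            y = float(x[0] - x[1] > 1.0)       # new concept: abrupt real drift
        yield x, y

for t, (x_t, y_t) in enumerate(drifting_stream()):
    pass                                       # here M_t would be inferred from (x_t, y_t) and M_{t-1}

Under the virtual drift alone, a model of p(y|x) remains valid even though the data statistics change; after the drift point, the old decision rule actively conflicts with the new labels.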

Challenge 3: The stability-plasticity dilemma. In particular for noisy environments or concept drift, a second challenge lies in the question of when and how to adapt the current model. A quick update enables a rapid adaptation to new information, but old information is forgotten equally quickly. Conversely, adaptation can be performed slowly, in which case old information is retained longer but the reactivity of the system is decreased. The dilemma behind this trade-off is usually called the stability-plasticity dilemma, a well-known constraint for artificial as well as biological learning systems [113]. Incremental learning techniques which adapt learned models to concept drift only in those regions of the data space where concept drift actually occurs offer a partial remedy to this problem. Many online learning methods alone, albeit dealing with limited resources, are not able to solve this dilemma since they exhibit so-called catastrophic forgetting [108, 132, 45, 44, 103] even when the new data statistics do not invalidate the old ones. One approach to deal with the stability-plasticity dilemma consists in enhancing the learning rules with explicit meta-strategies for when and how to learn. This is at the core of popular incremental models such as ART networks [77, 56], of meta-strategies to deal with concept drift such as the just-in-time classifier JIT [3], and of hybrid online/offline methods [120, 43]. One major ingredient of such strategies is a confidence estimate of the actual model prediction, obtained e.g. via statistical tests, efficient surrogates, or some notion of self-evaluation [8, 43, 78]. Such techniques can be extended to complex incremental schemes for interactive learning or learning with scaffolding [130, 84].
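The trade-off can already be seen in the simplest possible incremental model, a running mean with exponential forgetting; its learning rate directly controls the balance between plasticity and stability. The sketch below is illustrative only (signal and rates are freely chosen).

import numpy as np

def exponential_forgetting(stream, lr):
    # incremental estimate of a drifting mean; no past samples are stored
    m, trace = 0.0, []
    for x in stream:
        m += lr * (x - m)
        trace.append(m)
    return np.array(trace)

rng = np.random.default_rng(2)
signal = np.concatenate([rng.normal(0.0, 1.0, 500),    # stationary phase
                         rng.normal(3.0, 1.0, 500)])   # abrupt shift of the true mean

plastic = exponential_forgetting(signal, lr=0.5)    # follows the shift quickly, but noisy estimates
stable = exponential_forgetting(signal, lr=0.01)    # smooth estimates, but slow to react to the shift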
Challenge 4: Adaptive model complexity and meta-parameters. For incremental learning, model complexity must be variable, since it is impossible to estimate the model complexity in advance if the data are unknown. Depending on the occurrence of concept drift events, an increased model complexity might become necessary. On the other hand, the overall model complexity is usually bounded from above by the limited available resources, which requires an intelligent reallocation of resources whenever this limit is reached. Quite a number of approaches propose intelligent adaptation methods for the model complexity, such as incremental architectures [166], self-adjustment of the number of basic units in extreme learning machines [31, 177] or prototype-based models [77, 144, 98], incremental basis function selection for a sufficiently powerful data representation [23], or self-adjusting cluster numbers in unsupervised learning [79]. Such strategies can be put into the more general context of self-evolving systems, see e.g. [92] for an overview. An adaptive model complexity is not only mandatory whenever concept drift, and hence a possibly changing required model complexity, is present; it can also dramatically speed up learning in batch scenarios, since it often renders tedious model selection superfluous.

In batch learning, not only the model complexity but also essential meta-parameters such as the learning rate, the degree of error, regularisation constants, etc. are determined prior to training. Often, time-consuming cross-validation is used in batch learning, whereby first promising results on how to automate this process have been presented [155]. However, these are not suited for incremental learning scenarios: concept drift turns critical meta-parameters such as the learning rate into model parameters, since their choice has to be adapted according to the (changing) data characteristics. Due to this fact, incremental techniques often rely on models with robust meta-parameters (such as ensembles), or they use meta-heuristics for adapting these quantities during training.
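One simple way to realise an adaptive model complexity is a nearest-prototype learner that inserts a new unit whenever an incoming example is poorly covered or misclassified, up to a fixed resource limit. The following class is an illustrative sketch (names, thresholds, and the insertion rule are freely chosen; it is not one of the cited methods).

import numpy as np

class GrowingPrototypeClassifier:
    # nearest-prototype classifier whose number of units grows with the stream, bounded from above
    def __init__(self, insert_threshold=2.0, max_prototypes=100, lr=0.1):
        self.protos, self.labels = [], []
        self.insert_threshold = insert_threshold
        self.max_prototypes = max_prototypes
        self.lr = lr

    def _nearest(self, x):
        d = [np.linalg.norm(x - p) for p in self.protos]
        i = int(np.argmin(d))
        return i, d[i]

    def predict(self, x):
        i, _ = self._nearest(x)
        return self.labels[i]

    def partial_fit(self, x, y):
        x = np.asarray(x, dtype=float)
        if not self.protos:
            self.protos.append(x.copy())
            self.labels.append(y)
            return
        i, dist = self._nearest(x)
        if (self.labels[i] != y or dist > self.insert_threshold) and len(self.protos) < self.max_prototypes:
            # increase model complexity: allocate a new unit for the poorly covered region
            self.protos.append(x.copy())
            self.labels.append(y)
        elif self.labels[i] == y:
            # otherwise adapt the winning prototype towards the example
            self.protos[i] += self.lr * (x - self.protos[i])

Once the cap max_prototypes is reached, a full implementation would have to reallocate resources, e.g. by merging or removing the least useful units.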

Challenge 5: Efficient memory models. Due to their limited resources, incremental learning models have to store the information provided by the observed data in a compact form. This can be done via suitable system invariants (such as the classification error for explicit drift detection models [33]), via the model parameters in implicit form (such as prototypes for distance-based models [63]), or via an explicit memory model [96, 98]. Some machine learning models offer a seamless transfer between model parameters and memory models, such as prototype- or exemplar-based models, which store the information in the form of typical examples [63]. Explicit memory models can rely on a finite window of characteristic training examples, or represent the memory in the form of a parametric model. In both settings, a careful design of the memory adaptation is crucial, since it directly mirrors the stability-plasticity dilemma [96, 98].
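A minimal explicit memory model combines a finite window of recent examples with a compact parametric summary of older data. The sketch below is illustrative (the buffer size and the choice of class means as the summary are assumptions, not a method from the paper).

from collections import deque
import numpy as np

class WindowPlusPrototypeMemory:
    # bounded memory: a sliding window of recent examples plus one running mean per class
    def __init__(self, window_size=200):
        self.window = deque(maxlen=window_size)
        self.proto_sum, self.proto_count = {}, {}

    def add(self, x, y):
        if len(self.window) == self.window.maxlen:
            # the example about to be evicted is folded into its class prototype
            x_old, y_old = self.window[0]
            self.proto_sum[y_old] = self.proto_sum.get(y_old, 0.0) + np.asarray(x_old, dtype=float)
            self.proto_count[y_old] = self.proto_count.get(y_old, 0) + 1
        self.window.append((x, y))

    def prototypes(self):
        # compact representation of everything that no longer fits into the window
        return {c: self.proto_sum[c] / self.proto_count[c] for c in self.proto_sum}

How aggressively the window forgets and how coarse the summary is are exactly the design choices through which the stability-plasticity dilemma reappears here.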
Challenge 6: Model benchmarking. There exist fundamentally different possibilities to assess the performance of incremental learning algorithms:

(1) Incremental vs. non-incremental: In particular in the absence of concept drift, the aim of learning consists in the inference of the stationary distribution p(y|x) for typical data characterised by p(x). This setting occurs e.g. whenever incremental algorithms are used for big data sets, where they compete with often parallelised batch algorithms. In such settings, the method of choice evaluates the classification accuracy of the final model M_t on a test set or within a cross-validation. While incremental learning should attain results in the same range as batch variants, one must take into account that incremental learners deal with restricted knowledge due to their streaming data access. It has been shown, as an example, that incremental clustering algorithms cannot reach the same accuracy as batch versions if restricted in terms of their resources [2].

(2) Incremental vs. incremental: When facing concept drift, different cost functions can be of interest. Virtual concept drift aims for the inference of a stationary model p(y|x) under a drifting input distribution p(x). In such settings, the robustness of the model when evaluated on test data which follow a possibly skewed distribution is of interest. Such settings can easily be generated, e.g. by enforcing imbalanced label distributions for test and training data [73]. Whenever real concept drift is present, the online behaviour of the classification error ||M_t(x_{t+1}) - y_{t+1}|| for the next data point is usually the method of choice (see the sketch after this list); a simple average of these errors can be accompanied by a detailed inspection of the overall shape of the online error, since it provides insight into the rates of convergence, e.g. for abrupt concept drift.

(3) Formal guarantees on the generalisation behaviour: Since many classical algorithms such as the simple perceptron or large-margin methods have been proposed as online algorithms, there exists an extensive body of work investigating their learning behaviour, convergence speed, and generalisation ability, classically relying on the assumption of data being i.i.d. [162]. Some results weaken the i.i.d. assumption, e.g. requiring only exchangeability [146]. Recently, popular settings such as learning a (generalised) linear regression have been accompanied by convergence guarantees for arbitrary distributions p(x) by taking a game-theoretic point of view: in such settings, the classifier M_t and the training example x_{t+1} can be chosen in an adversarial manner, still allowing fast convergence rates in relevant situations [158, 131, 87, 151]. The approach [117] even provides first theoretical results for real concept drift, i.e. not only the input distribution but also the conditional distribution p(y|x) may follow mild changes.
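The online error used in setting (2) is usually computed prequentially, i.e. each incoming example is first used for testing and then for training. The loop below is an illustrative sketch; model is assumed to be any object exposing predict and partial_fit methods (an interface chosen here for convenience, not prescribed by the paper).

import numpy as np

def prequential_evaluation(model, stream, window=100):
    # test-then-train: the error on (x_{t+1}, y_{t+1}) is recorded before the example updates M_t
    errors = []
    for x_t, y_t in stream:
        try:
            err = float(model.predict(x_t) != y_t)
        except Exception:
            err = 1.0                     # the model may be unable to predict before its first update
        errors.append(err)
        model.partial_fit(x_t, y_t)
    errors = np.array(errors)
    kernel = np.ones(window) / window
    # the sliding average exposes the shape of the online error, e.g. a spike after abrupt drift
    return errors, np.convolve(errors, kernel, mode="valid")

Feeding the drifting stream and the growing prototype learner from the earlier sketches into this loop would be expected to show an error spike at the drift point, followed by renewed convergence.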

2 Incremental learning models

Incremental learning comes in various forms in the literature, and the use of the term is not always consistent; for some settings, as an example, a memory limitation cannot be guaranteed, or models are designed for stationary distributions only. We will give an overview of popular models in this context, focusing mostly on supervised methods due to their popularity. Online or incremental learning techniques have also been developed for alternative tasks such as clustering [109, 91], dimensionality reduction [25, 12, 24, 6, 123, 93], feature selection and data representation [179, 59, 42, 173, 27, 72], reinforcement learning [11, 60], and mining and inference [54, 129].

Explicit treatment of concept drift. Dealing with concept drift at execution time constitutes a challenging task [88, 157, 33, 126]. There exist different techniques to address concept drift, depending on its type. Mere concept shift is often addressed by so-called passive methods, i.e. learning technologies which smoothly adapt model parameters such that the current distribution is reliably represented by the model. Rapid concept changes, however, often require active methods, which detect concept drift and react accordingly. Virtual concept drift, which concerns the input distribution only, can easily occur e.g. due to highly imbalanced classes over time. One popular state-of-the-art technology accounts for this fact by so-called importance weighting, i.e. strategies which explicitly or implicitly reweight the observed samples such that a greater robustness is achieved [81, 10, 73]. Alternatively, concept shift can have its reason in novelty within the data or even new classes. Such settings can naturally be incorporated into local models, provided they offer an adaptive model complexity [144, 43, 56, 133, 100]. Real concept drift can be detected by its effect on characteristic features of the model, such as the classification accuracy. Such quantitative features can be accompanied by statistical tests which judge the significance of their change and hence indicate concept drift. Tests can rely on well-known statistics such as the Hoeffding bound [48], or alternatively on suitable distances such as the Hellinger distance, which can measure changes in the value distributions of such characteristic features. When integrated into robust classifiers such as ensemble techniques, this yields models which can simultaneously deal with different types of drift [16].
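An active scheme of the kind just described monitors the online error rate and signals drift when it deviates significantly from the best rate observed so far. The detector below uses a Hoeffding-style confidence margin and is an illustrative sketch only; it is not the specific test of [48] or the ensemble method of [16].

import math

class ErrorRateDriftDetector:
    # signals real concept drift when the running error rate exceeds the best observed rate
    # by a Hoeffding-style confidence margin
    def __init__(self, delta=0.05, min_samples=30):
        self.delta = delta
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n, self.errors = 0, 0.0
        self.best_rate = float("inf")

    def update(self, error):
        # error: 1.0 if the last prediction was wrong, 0.0 otherwise; returns True on detected drift
        self.n += 1
        self.errors += error
        rate = self.errors / self.n
        if self.n >= self.min_samples and rate < self.best_rate:
            self.best_rate = rate
        if self.n < self.min_samples:
            return False
        eps = math.sqrt(math.log(1.0 / self.delta) / (2.0 * self.n))   # Hoeffding bound for n samples
        if rate > self.best_rate + eps:
            self.reset()       # drift detected: the classifier would typically be re-initialised here
            return True
        return False

On detection, an active method would typically discard or heavily down-weight the current model and retrain on the most recent examples.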

Support vector machines and generalised linear models. Several incremental SVM models exist [164]. Some rely on heuristics, such as retraining a model with all support vectors plus a new "incremental" batch of data [35, 152], but without theoretical guarantees. Others incorporate modifications of the SVM cost function to facilitate incrementality [141], possibly also controlling the model complexity [58, 57]; still, their resources are not strictly limited. As an alternative, adiabatic SVM training has been proposed, i.e. presenting one example at a time while maintaining the relevant optimality conditions on all previously seen examples. However, this requires all previously seen samples to be stored, although the approach can considerably simplify SVM training. Ensemble learning algorithms based on SVMs [127, 164] achieve incremental learning by training new classifiers for new batches of data and combining all existing classifiers only for decision making. Another hybrid scheme combines an SVM classifier with a prototype-based data representation, whereby the latter can be designed as an online model based on which training examples for the SVM can be generated [169]. Alternatively, SVMs can directly be trained in primal space, where online learning is immediate [22]. Online versions have also been proposed for more complex generalised linear models such as Gaussian process regression [53, 110], whereby none of these models can yet easily deal with concept drift.

Connectionist models. As the problem of catastrophic forgetting was first remarked for multilayer perceptrons (MLP) [108, 132], it is hardly surprising that there exists significant work on how to avoid it in connectionist systems. Initial consensus traced catastrophic forgetting back to their distributed information representation [46]. Indeed, localist connectionist models such as radial basis function (RBF) networks can work reliably in incremental settings [133, 100], whereby care has to be taken to guarantee their generalisation performance [147]. Both capabilities are combined in semi-distributed representations. A number of algorithmic mo…
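The primal-space online training mentioned above for SVMs can be sketched as stochastic sub-gradient descent on the regularised hinge loss (a Pegasos-style update); the code below is a generic illustration under that assumption, not the specific algorithm of [22].

import numpy as np

def online_linear_svm(stream, dim, lam=0.01):
    # one stochastic sub-gradient step on the regularised hinge loss per example;
    # labels y are in {-1, +1} and no past samples are stored
    w = np.zeros(dim)
    for t, (x, y) in enumerate(stream, start=1):
        lr = 1.0 / (lam * t)                        # decaying learning rate
        if y * (w @ x) < 1.0:
            w = (1.0 - lr * lam) * w + lr * y * x   # sub-gradient of hinge loss plus L2 regulariser
        else:
            w = (1.0 - lr * lam) * w                # only the regulariser contributes
    return w

The decaying learning rate implicitly assumes a stationary distribution; under concept drift it would freeze the model, which illustrates why such models do not yet easily deal with drift.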

