LiBRe: A Practical Bayesian Approach to Adversarial Detection

Zhijie Deng¹, Xiao Yang¹, Shizhen Xu², Hang Su¹, Jun Zhu¹*
¹ Dept. of Comp. Sci. and Tech., BNRist Center, Institute for AI, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University, Beijing, 100084, China
² RealAI
{dzj17,yangxiao19}@mails.tsinghua.edu.cn, shizhen.xu@realai.ai, {suhangss,dcszj}@tsinghua.edu.cn

Abstract

Despite their appealing flexibility, deep neural networks (DNNs) are vulnerable to adversarial examples. Various adversarial defense strategies have been proposed to resolve this problem, but they typically demonstrate restricted practicability owing to insurmountable compromises on universality, effectiveness, or efficiency. In this work, we propose a more practical approach, Lightweight Bayesian Refinement (LiBRe), in the spirit of leveraging Bayesian neural networks (BNNs) for adversarial detection. Empowered by the task and attack agnostic modeling under Bayes principle, LiBRe can endow a variety of pre-trained task-dependent DNNs with the ability of defending against heterogeneous adversarial attacks at a low cost. We develop and integrate advanced learning techniques to make LiBRe appropriate for adversarial detection. Concretely, we build the few-layer deep ensemble variational and adopt the pre-training & fine-tuning workflow to boost the effectiveness and efficiency of LiBRe. We further provide a novel insight to realise adversarial detection-oriented uncertainty quantification without inefficiently crafting adversarial examples during training. Extensive empirical studies covering a wide range of scenarios verify the practicability of LiBRe. We also conduct thorough ablation studies to evidence the superiority of our modeling and learning strategies.¹

1. Introduction

The blooming development of deep neural networks (DNNs) has brought great success in extensive industrial applications, such as image classification [23], face recognition [9] and object detection [49].
However, despite their promising expressiveness, DNNs are highly vulnerable to adversarial examples [56, 19], which are generated by adding human-imperceptible perturbations to clean examples to deliberately cause misclassification, partly due to their non-linear and black-box nature. The threats from adversarial examples have been witnessed in a wide spectrum of practical systems [51, 12], raising an urgent requirement for advanced techniques to achieve robust and reliable decision making, especially in safety-critical scenarios [13].

[Figure 1 panel labels: benign, adversarial, deterministic layers, module, task-dependent prediction, accept/reject]
Figure 1: Given a pre-trained DNN, LiBRe converts its last few layers (excluding the task-dependent output head) to be Bayesian, and reuses the pre-trained parameters. Then, LiBRe launches several-round adversarial detection-oriented fine-tuning to render the posterior effective for prediction and meanwhile appropriate for adversarial detection. In the inference phase, LiBRe estimates the predictive uncertainty and task-dependent predictions of the input concurrently, where the former is used for adversarial detection and determines the fidelity of the latter.

* Corresponding author.
¹ Code at

Though an increasing number of methods have been developed to tackle adversarial examples [41, 67, 25, 18, 66], they are not free of problems. On one hand, as one of the most popular adversarial defenses, adversarial training [41, 67] introduces adversarial examples into training to explicitly tailor the decision boundaries, which, however, adds training overhead and typically degrades predictive performance on clean examples. On the other hand, adversarial detection methods bypass the drawbacks of modifying the original DNNs by deploying a workflow to detect the adversarial examples ahead of decision making, by virtue of auxiliary classifiers [43, 18, 66, 5] or designed statistics [14, 39].
Yet, they are usually developed for specific tasks (e.g., image classification [66, 31, 18]) or for specific adversarial attacks [38], lacking the flexibility to generalize effectively to other tasks or attacks.

By regarding the adversarial example as a special case of out-of-distribution (OOD) data, Bayesian neural networks (BNNs) have shown promise in adversarial detection [14, 37, 53]. In theory, the predictive uncertainty acquired under Bayes principle suffices for detecting heterogeneous adversarial examples in various tasks. However, in practice, BNNs without a sharpened posterior often present systematically worse performance than their deterministic counterparts [60]; also, relatively low-cost Bayesian inference methods frequently suffer from mode collapse and hence unreliable uncertainty [15]. BNNs' requirement of more expertise for implementation and more effort for training than DNNs further undermines their practicability.

In this work, we aim to develop a more practical adversarial detection approach by overcoming the aforementioned issues of BNNs. We propose Lightweight Bayesian Refinement (LiBRe), depicted in Fig. 1, to reach a good balance among predictive performance, quality of uncertainty estimates and learning efficiency. Concretely, LiBRe follows the stochastic variational inference pipeline [2], but is empowered by two non-trivial designs:

(i) To achieve efficient learning with high-quality outcomes, we devise the Few-lAyer Deep Ensemble (FADE) variational, which is reminiscent of Deep Ensemble [30], one of the most effective BNN methods, and meanwhile inspired by the scalable last-layer Bayesian inference [28]. Namely, FADE only performs deep ensemble in the last few layers of a model due to their crucial role in determining model behaviour, while keeping the other layers deterministic. To encourage various ensemble candidates to capture diverse function modes, we develop a stochasticity-injected learning principle for FADE, which also helps reduce the gradient variance of the parameters.

(ii) To further ease and accelerate the learning, we propose a Bayesian refinement paradigm, where we initialize the parameters of FADE with the parameters of its pre-trained deterministic counterpart, thanks to the high alignment between FADE and point estimate.
We then perform fine-tuning to constantly improve the FADE posterior. These designs make the whole learning procedure analogous to training a standard DNN, freeing the end users from the piecemeal details of Bayesian learning.

As revealed by [22], the uncertainty quantification purely acquired from Bayes principle may be unreliable for perceiving adversarial examples, thus it is indispensable to pursue an adversarial detection-oriented uncertainty correction. For universality, we place no assumption on the adversarial examples to detect, so we cannot take the common strategy of integrating the adversarial examples crafted by specific attacks into detector training [39]. Alternatively, we cheaply create uniformly perturbed examples and demand high predictive uncertainty on them during Bayesian refinement to make the model sensitive to data with any style of perturbation. Though such a correction renders the learned posterior slightly deviated from the true Bayesian one, it can significantly boost adversarial detection performance.

The task and attack agnostic designs enable LiBRe to quickly and cheaply endow a pre-trained task-dependent DNN with the ability to detect various adversarial examples when facing new tasks, as testified by our empirical studies in Sec 5. Furthermore, LiBRe has significantly higher inference (i.e., testing) speed than typical BNNs thanks to the adoption of a lightweight variational. We can achieve further speedup by exploring the potential of parallel computing, giving rise to inference speed close to the DNN in the same setting. Extensive experiments in scenarios ranging from image classification and face recognition to object detection confirm these claims and testify to the superiority of LiBRe. We further perform thorough ablation studies to deeply understand the adopted modeling and learning strategies.

2. Related Work

Detecting adversarial examples to bypass their safety threats has attracted increasing attention recently.
Many works aim at distinguishing adversarial examples from benign ones via an auxiliary classifier applied on statistical features [18, 66, 5, 7, 63]. [21] introduces an extra class in the classifier for adversarial examples. Some recent works exploit neighboring statistics to construct more powerful detection algorithms: [31] fits a Gaussian mixture model of the network responses, and resorts to the Mahalanobis distance for adversarial detection in the inference phase; [39] introduces the more advanced local intrinsic dimensionality to describe the distance distribution and observes better results. RCE [46] is developed with the promise of leading to an enhanced distance between adversarial and normal images for kernel density [14] based detection. However, most of the aforementioned methods are restricted to the classification scope, and the detectors trained against certain attacks may not effectively generalize to unseen attacks [38].

Bayesian deep learning [20, 59, 2, 1, 35, 26] provides us with a more theoretically appealing way to perform adversarial detection. However, though the existing BNNs manage to perceive adversarial examples [14, 48, 53, 37, 47, 32], they are typically limited in terms of training efficiency, predictive performance, etc., and thus cannot effectively scale up to real-world settings. More severely, the uncertainty estimates given by the BNNs for adversarial examples are not always reliable [22], owing to the lack of particular designs for adversarial detection. In this work, we address these issues with elaborated techniques and establish a more practical adversarial detection approach.

3. Background

In this section, we motivate Lightweight Bayesian Refinement (LiBRe) by briefly reviewing the background of adversarial defense, and then describe the general workflow of Bayesian neural networks (BNNs).

3.1. Adversarial Defense

Typically, let $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$ denote a collection of $n$ training samples with $x_i \in \mathbb{R}^d$ and $y_i \in \mathcal{Y}$ as the input data and label, respectively. A deep neural network (DNN) parameterized by $w \in \mathbb{R}^p$ is frequently trained via maximum a posteriori estimation (MAP):

$$\max_{w} \; \frac{1}{n} \sum_{i=1}^{n} \log p(y_i \mid x_i; w) + \frac{1}{n} \log p(w), \tag{1}$$

where $p(y \mid x; w)$ refers to the predictive distribution of the DNN model. By setting the prior $p(w)$ as an isotropic Gaussian, the second term amounts to the L2 (weight decay) regularizer with a tunable coefficient $\lambda$ in optimization. Generally speaking, the adversarial example corresponding to $(x_i, y_i)$ against the model is defined as

$$x_i^{adv} = x_i + \arg\min_{\delta_i \in \mathcal{S}} \log p(y_i \mid x_i + \delta_i; w), \tag{2}$$

where $\mathcal{S} = \{\delta : \|\delta\| \le \epsilon\}$ is the valid perturbation set with $\epsilon > 0$ as the perturbation budget and $\|\cdot\|$ as some norm (e.g., $\ell_\infty$). Extensive attack methods have been developed with the promise of solving the above minimization problem [19, 40, 4, 57], based on gradients or not.

The central goal of adversarial defense is to protect the model from making undesirable decisions for the adversarial examples $x_i^{adv}$. A representative line of work approaches this objective by augmenting the training data with on-the-fly generated adversarial examples and forcing the model to yield correct predictions on them [41, 67]. But their limited training efficiency and compromised performance on clean data pose a major obstacle for real-world adoption. As an alternative, adversarial detection methods focus on distinguishing the adversarial examples from the normal ones so as to bypass the potentially harmful outcomes of making decisions for adversarial examples [43, 5, 39]. However, satisfactory transferability to unseen attacks and tasks beyond image classification remains elusive [38].

3.2. Bayesian Neural Networks

In essence, the problem of distinguishing adversarial examples from benign ones can be viewed as a specialized out-of-distribution (OOD) detection problem of particular concern in safety-sensitive scenarios: with the model trained on the clean data, we expect to identify the adversarial examples from a shifted data manifold, though the shift magnitude may be subtle and human-imperceptible. In this sense, we naturally introduce BNNs into the picture, owing to their principled OOD detection capacity along with the same flexibility for data fitting as DNNs.

Modeling and training. Typically, a BNN is specified by a parameter prior $p(w)$ and an NN-instantiated data likelihood $p(\mathcal{D} \mid w)$. We are interested in the parameter posterior $p(w \mid \mathcal{D})$ instead of a point estimate as in DNN. It is known that precisely deriving the posterior is intractable owing to the high non-linearity of neural networks. Among the wide spectrum of approximate Bayesian inference methods, variational BNNs are particularly attractive due to their close resemblance to standard backprop [20, 2, 36, 54, 55, 52, 45]. Generally, in variational BNNs, we introduce a variational distribution $q(w \mid \theta)$ with parameters $\theta$ and maximize the evidence lower bound (ELBO) for learning (scaled by $1/n$):

$$\max_{\theta} \; \mathbb{E}_{q(w \mid \theta)} \left[ \frac{1}{n} \sum_{i=1}^{n} \log p(y_i \mid x_i; w) \right] - \frac{1}{n} D_{KL}\big(q(w \mid \theta) \,\|\, p(w)\big). \tag{3}$$

Inference. The obtained posterior $q(w \mid \theta)$² offers us the opportunity to predict robustly. For computational tractability, we usually estimate the posterior predictive via:

$$p(y \mid x, \mathcal{D}) = \mathbb{E}_{q(w \mid \theta)} \left[ p(y \mid x; w) \right] \approx \frac{1}{T} \sum_{t=1}^{T} p(y \mid x; w^{(t)}), \tag{4}$$

where $w^{(t)} \sim q(w \mid \theta)$, $t = 1, \dots, T$ denote the Monte Carlo (MC) samples. In other words, the BNN assembles the predictions yielded by all likely models to make more reliable and calibrated decisions, in stark contrast to the DNN which only cares about the most probable parameter point.

Measure of uncertainty. For adversarial detection, we are interested in the epistemic uncertainty which is indicative of covariate shift. A superior choice of uncertainty metric is the softmax variance given its previous success for adversarial detection in image classification [14] and insightful theoretical support [53].
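As a rough illustration of the Monte Carlo estimate in Eq. (4) and of a softmax-variance score, the sketch below uses a toy linear classifier. Everything in it is an assumption made for illustration: the helper names, the Gaussian jitter standing in for samples $w^{(t)} \sim q(w \mid \theta)$, and the convention of averaging the per-class variance over classes are not taken from the paper.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_logits(x, weights):
    """Toy stand-in for a network: one weight row per class, logits = W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def mc_predictive(x, weight_samples):
    """Eq. (4): average p(y | x; w^(t)) over the T parameter samples."""
    probs = [softmax(linear_logits(x, w)) for w in weight_samples]
    num_classes = len(probs[0])
    return [sum(p[k] for p in probs) / len(probs) for k in range(num_classes)]


def softmax_variance(x, weight_samples):
    """Per-class variance of the softmax outputs across samples, averaged over
    classes (an assumed convention, chosen here only for illustration)."""
    probs = [softmax(linear_logits(x, w)) for w in weight_samples]
    mean = mc_predictive(x, weight_samples)
    T, K = len(probs), len(mean)
    per_class = [sum((p[k] - mean[k]) ** 2 for p in probs) / T for k in range(K)]
    return sum(per_class) / K


def draw_weight_samples(base, num_samples, scale, rng):
    """Hypothetical sampler: Gaussian jitter around `base` plays the role of
    w^(t) ~ q(w | theta)."""
    return [[[w + rng.gauss(0.0, scale) for w in row] for row in base]
            for _ in range(num_samples)]


rng = random.Random(0)
base_weights = [[1.0, -0.5], [-1.0, 0.5], [0.2, 0.2]]  # 3 classes, 2 features
samples = draw_weight_samples(base_weights, num_samples=20, scale=0.3, rng=rng)
x = [0.8, -0.4]
print("predictive:", mc_predictive(x, samples))
print("softmax variance:", softmax_variance(x, samples))
```

When all parameter samples coincide, the averaged prediction reduces to a single softmax and the variance score is zero, which is the deterministic DNN case.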
However, the softmax output of the model may be less attractive during inference (e.g., in open-set face recognition), let alone that not all computer vision tasks can be formulated as pure classification problems (e.g., object detection). To make the metric faithful and readily applicable to diverse scenarios, we consider the predictive variance of the hidden feature $z$ corresponding to $x$, by mildly assuming the information flow inside the model to be $x \to z \to y$. We utilize an unbiased variance estimator and summarize the variance of all coordinates of $z$ into a scalar via:

$$U(x) = \frac{1}{T-1} \left[ \sum_{t=1}^{T} \big\| z^{(t)} \big\|_2^2 - T \left( \Big\| \frac{1}{T} \sum_{t=1}^{T} z^{(t)} \Big\|_2^2 \right) \right], \tag{5}$$

where $z^{(t)}$ denotes the features of $x$ under parameter sample $w^{(t)} \sim q(w \mid \theta)$, $t = 1, \dots, T$, with $\|\cdot\|_2$ as the $\ell_2$ norm. It is natural to simultaneously make predictions and quantify uncertainty via Eq. (4) and Eq. (5) when testing.

² We use $q(w \mid \theta)$ interchangeably with $p(w \mid \mathcal{D})$ in the following where there is no ambiguity.

4. Lightweight Bayesian Refinement

Despite their theoretical appeal, BNNs are seldom adopted for real-world adversarial detection, owing to a wide range of concerns about their training efficiency, predictive performance, quality of uncertainty estimates, and inference speed. In this section, we provide detailed and novel strategies to relieve these concerns and build the practical Lightweight Bayesian Refinement (LiBRe) framework.

Variational configuration. At the core of variational BNNs lies the configuration of the variational distribution. The recent surge of variational Bayes has enabled us to leverage mean-field Gaussian [2], matrix-variate Gaussian [36, 54], multiplicative normalizing flows [37] and even implicit distributions [33, 52] to build expressive and flexible variational distributions. However, on one side, there is evidence to suggest that more complex variationals are commonly accompanied by less user-friendly and less scalable inference processes; on the other side, more popular and more approachable variationals like mean-field Gaussian, low-rank Gaussian [15] and MC Dropout [17] tend to concentrate on a single mode in the function space, rendering the yielded uncertainty estimates unreliable [15].

Deep Ensemble [30], a powerful alternative to BNNs, builds a set of parameter candidates $\theta = \{w^{(c)}\}_{c=1}^{C}$, which are separately trained to account for diverse function modes, and uniformly assembles their corresponding predictions for inference. In a probabilistic view, Deep Ensemble builds the variational $q(w \mid \theta) = \frac{1}{C} \sum_{c=1}^{C} \delta(w - w^{(c)})$ with $\delta$ as the Dirac delta function. Yet, obviously, optimizing the parameters of such a variational is computationally prohibitive [30]. Motivated by the success of last-layer Bayesian inference [28], we propose to only convert the last few layers of the feature extraction module of a DNN, e.g., the last residual block of ResNet-50 [23], to be Bayesian layers whose parameters take the deep ensemble variational. Formally, breaking down $w$ into $w_b$ and $w_{\neg b}$, which denote the parameters of the tiny Bayesian sub-module and the other parameters in the model respectively, we devise the Few-lAyer Deep Ensemble (FADE) variational:

$$q(w \mid \theta) = \frac{1}{C} \sum_{c=1}^{C} \delta\big(w_b - w_b^{(c)}\big)\, \delta\big(w_{\neg b} - w_{\neg b}^{(0)}\big), \tag{6}$$

where $\theta = \{w_{\neg b}^{(0)}, w_b^{(1)}, \dots, w_b^{(C)}\}$.
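To make Eq. (5) and Eq. (6) concrete, here is a minimal sketch on a toy two-layer model. It is an illustration under stated assumptions, not the authors' implementation: the dictionary layout, the helper names, and the Gaussian jitter used to make the $C$ candidates differ are all hypothetical.

```python
import random


def matvec(matrix, vec):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def relu(vec):
    return [max(0.0, v) for v in vec]


def make_fade(w_shared, w_block, num_candidates, jitter, rng):
    """theta = {shared deterministic weights, C candidates for the Bayesian block}.
    The Gaussian jitter is a hypothetical way to make the candidates differ."""
    candidates = [[[w + rng.gauss(0.0, jitter) for w in row] for row in w_block]
                  for _ in range(num_candidates)]
    return {"shared": w_shared, "candidates": candidates}


def sample_candidate_index(fade, rng):
    """Drawing w ~ q(w | theta) in Eq. (6) amounts to picking c uniformly."""
    return rng.randrange(len(fade["candidates"]))


def features(fade, x, c):
    """z^(c): deterministic layers first, then the c-th Bayesian candidate."""
    hidden = relu(matvec(fade["shared"], x))
    return relu(matvec(fade["candidates"][c], hidden))


def feature_variance(feature_samples):
    """Eq. (5): unbiased variance of z over T samples, summed over coordinates."""
    T = len(feature_samples)
    dim = len(feature_samples[0])
    sum_sq_norms = sum(sum(v * v for v in z) for z in feature_samples)
    mean = [sum(z[k] for z in feature_samples) / T for k in range(dim)]
    sq_norm_of_mean = sum(m * m for m in mean)
    return (sum_sq_norms - T * sq_norm_of_mean) / (T - 1)


rng = random.Random(0)
fade = make_fade(w_shared=[[0.5, -0.2], [0.1, 0.9], [-0.3, 0.4]],
                 w_block=[[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]],
                 num_candidates=5, jitter=0.2, rng=rng)
x = [1.0, 2.0]
zs = [features(fade, x, c) for c in range(5)]
print("U(x) =", feature_variance(zs))
```

Because the shared weights carry a single value, only the small block has to be stored and evaluated $C$ times. In `feature_variance`, identical feature samples give a score of zero, and disagreement among candidates raises it.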
Intuitively, FADE will strikingly ease and accelerate the learning, permitting Bayesian inference to scale up to deep architectures trivially.

ELBO maximization. Given the FADE variational, we develop an effective and user-friendly implementation for learning. Assuming, as in the MAP estimation for DNNs, an isotropic Gaussian prior, the second term of the ELBO in Eq. (3) boils down to weight decay regularizers with coefficients $\lambda$ on $w_{\neg b}^{(0)}$ and $\lambda / C$ on $w_b^{(c)}$, $c = 1, \dots, C$, which can be easily implemented inside the optimizer.³ Then, we only need to explicitly deal with the first term in the ELBO. Analytically estimating the expectation in this term is feasible but may hinder different parameter candidates from exploring diverse function modes (as they may undergo similar optimization trajectories). Thus, we advocate maximizing a stochastic estimate of it on top of stochastic gradient ascent:

$$\max_{\theta} \; \mathcal{L} = \frac{1}{|\mathcal{B}|} \sum_{(x_i, y_i) \in \mathcal{B}} \log p\big(y_i \mid x_i; w_b^{(c)}, w_{\neg b}^{(0)}\big), \tag{7}$$

where $\mathcal{B}$ is a stochastic mini-batch, and $c$ is drawn from $\mathrm{unif}\{1, C\}$, i.e., the uniform distribution over $\{1, \dots, C\}$.

³ The derivation is based on relaxing the Dirac distribution as a Gaussian with small variance. See Sec 3.4 of [16] for detailed derivation insights.

However, intuitively, $\nabla_{w_{\neg b}^{(0)}} \mathcal{L}$ exhibits high variance across iterations due to its correlation with the varying choice of $c$, which is harmful to convergence (see Sec 5.4 and [27]). To disentangle such correlation, we propose to replace the batch-wise parameter sample $w_b^{(c)}$ with instance-wise ones $w_b^{(c_i)}$, $c_i \overset{i.i.d.}{\sim} \mathrm{unif}\{1, C\}$, $i = 1, \dots, |\mathcal{B}|$, which ensures that $w_{\neg b}^{(0)}$ comprehensively considers the variable behaviour of the Bayesian sub-module at every iteration. Formally, we solve the following problem for training:

$$\max_{\theta} \; \mathcal{L}^{*} = \frac{1}{|\mathcal{B}|} \sum_{(x_i, y_i) \in \mathcal{B}} \log p\big(y_i \mid x_i; w_b^{(c_i)}, w_{\neg b}^{(0)}\big). \tag{8}$$

Under such a learning criterion, each Bayesian parameter candidate accounts for a stochastically assigned, separate subset of $\mathcal{B}$.
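The difference between the batch-wise estimate of Eq. (7) and the instance-wise estimate of Eq. (8) can be sketched as follows. This is a toy illustration under assumptions: the Bayesian sub-module is reduced to a linear classification head, the deterministic layers are omitted (the inputs play the role of their output), and all function names are hypothetical.

```python
import math
import random


def log_prob(logits, label):
    """log softmax(logits)[label], computed stably."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return logits[label] - log_norm


def head_logits(x, weights):
    """Toy Bayesian sub-module: a linear head, one weight row per class."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def batchwise_objective(batch, candidates, rng):
    """Eq. (7): one candidate index c shared by the whole mini-batch."""
    c = rng.randrange(len(candidates))  # 0-based stand-in for unif{1, C}
    total = sum(log_prob(head_logits(x, candidates[c]), y) for x, y in batch)
    return total / len(batch), [c] * len(batch)


def instancewise_objective(batch, candidates, rng):
    """Eq. (8): an independent candidate index c_i for every instance."""
    assignment = [rng.randrange(len(candidates)) for _ in batch]
    total = sum(log_prob(head_logits(x, candidates[c]), y)
                for (x, y), c in zip(batch, assignment))
    return total / len(batch), assignment


rng = random.Random(0)
candidates = [[[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(3)]
              for _ in range(4)]  # C = 4 candidates, 3 classes, 2 features
batch = [([rng.uniform(-1, 1), rng.uniform(-1, 1)], rng.randrange(3))
         for _ in range(8)]
value, assignment = instancewise_objective(batch, candidates, rng)
print("L* estimate:", value)
print("candidate per instance:", assignment)
```

With a single candidate ($C = 1$) the two estimates coincide. With several candidates, the instance-wise version lets every iteration touch a mixture of candidates, so the shared parameters see the sub-module's varied behaviour in each step.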
Such stochasticity will be injected into the gradient ascent dynamics and serves as an implicit regularization [42], leading $\{w_b^{(c)}\}_{c=1}^{C}$ to investigate diverse weight sub-spaces and ideally diverse function modes. Compared to Deep Ensemble [30], which depends on random initialization to avoid mode collapse, our approach is more theoretically motivated and more economical.

Though computing $\mathcal{L}^{*}$ involves the same FLOPS as computing $\mathcal{L}$, there is a barrier to making the computation compatible with modern autodiff libraries and time-saving: de facto computational kernels routinely process a batch given shared parameters, while estimating $\mathcal{L}^{*}$ needs the kernels to embrace instance-specialized parameters in the Bayesian sub-module. In the spirit of parallel computing, we resort to group convolution, batch matrix multiplication, etc. to address this issue. The resultant computation burden is negligibly more than that of the original DNN thanks to the support of powerful backends like cuDNN [6] for these operators and the tiny size of the Bayesian sub-module.

Adversarial example free uncertainty correction. It is a straightforward observation that the above designs of the BNN are OOD data agnostic, leaving the ability to detect adversarial examples solely endowed by the rigorous Bayes principle. Nevertheless, as a special category of OOD data, adversarial examples hold several special characteristics, e.g., the close resemblance to benign data and the strong offensiveness against the behaviour of black-box deep models, which may easily destroy the uncertainty based adversarial detection [22]. A common strategy to address this issue is to incorporate adversarial examples crafted by specific attacks into detector training [39], which, however, is costly and may limit the learned models from generalizing to unseen attacks. Instead, we propose an adversarial example free uncertainty correction strategy by considering a superset of the adversarial examples. We feed uniformly perturbed training instances (which encompass all kinds of adversarial examples) into the BNN and demand relatively high predictive uncertainty on them. Formally, with $\epsilon_{train}$ as the training perturbation budget, we perturb a mini-batch of data via

$$\tilde{x}_i = x_i + \delta_i, \quad \delta_i \overset{i.i.d.}{\sim} \mathcal{U}(-\epsilon_{train}, \epsilon_{train})^{d}, \quad i = 1, \dots, |\mathcal{B}|. \tag{9}$$

Then we calculate the uncertainty measure $U$ cheaply with $T = 2$ MC samples, and regularize the outcome via solving the following margin loss:

$$\max_{\theta} \; \mathcal{R} = \frac{1}{|\mathcal{B}|} \sum_{(x_i, y_i) \in \mathcal{B}} \min\Big( \big\| \tilde{z}_i^{(c_{i,1})} - \tilde{z}_i^{(c_{i,2})} \big\|_2^2, \; \gamma \Big), \tag{10}$$

where $\tilde{z}_i^{(c_{i,j})}$ refers to the features of $\tilde{x}_i$ given parameter sample $w^{(c_{i,j})} = \{w_b^{(c_{i,j})}, w_{\neg b}^{(0)}\}$, with $c_{i,j} \overset{i.i.d.}{\sim} \mathrm{unif}\{1, C\}$ and $c_{i,1} \neq c_{i,2}$, $i = 1, \dots, |\mathcal{B}|$, $j = 1, 2$. $\gamma$ is a tunable threshold. Surprisingly, this regularization remarkably boosts the adversarial detection performance (see Sec 5.4).

Efficient learning by refining pre-trained DNNs. Though from-scratch BNN training is feasible, recent work demonstrates that it probably incurs worse predictive performance than a fairly trained DNN [60]. Therefore, given the alignment between the posterior parameters $\theta = \{w_{\neg b}^{(0)}, w_b^{(1)}, \dots, w_b^{(C)}\}$ and their DNN counterparts, we suggest performing cost-effective Bayesian refinement upon a pre-trained DNN model, which renders our workflow more appropriate for large-scale learning.

With the pre-trained DNN parameters denoted as $w^{\dagger} = \{w_b^{\dagger}, w_{\neg b}^{\dagger}\}$, we initialize $w_{\neg b}^{(0)}$ as $w_{\neg b}^{\dagger}$ and $w_b^{(c)}$ as $w_b^{\dagger}$ for $c = 1, \dots, C$. Continuing from this, we fine-tune the variational parameters to maximize $\mathcal{L}^{*} + \alpha \mathcal{R}$⁴ under weight decay regularizers with suitable coefficients to realise adversarial detection-oriented posterior inference. The whole algorithmic procedure is presented in Algorithm 1.

Algorithm 1: Lightweight Bayesian Refinement
  Input: pre-trained DNN parameters $w^{\dagger}$, weight decay coefficient $\lambda$, threshold $\gamma$, trade-off coefficient $\alpha$
  Initialize $\{w_b^{(c)}\}_{c=1}^{C}$ and $w_{\neg b}^{(0)}$ based on $w^{\dagger}$
  Build optimizers $opt_b$ and $opt_{\neg b}$ with weight decay $\lambda / C$ and $\lambda$ for $\{w_b^{(c)}\}_{c=1}^{C}$ and $w_{\neg b}^{(0)}$ respectively
  for epoch = 1, 2, ..., E do
    for mini-batch $\mathcal{B} = \{(x_i, y_i)\}_{i=1}^{|\mathcal{B}|}$ in $\mathcal{D}$ do
      Estimate the log-likelihood $\mathcal{L}^{*}$ via Eq. (8)
      Uniformly perturb the clean data via Eq. (9)
      Estimate the uncertainty penalty $\mathcal{R}$ via Eq. (10)
      Backward the gradients of $\mathcal{L}^{*} + \alpha \mathcal{R}$ via autodiff
      Perform 1-step gradient ascent with $opt_b$ & $opt_{\neg b}$

⁴ $\alpha$ refers to a trade-off coefficient.

Such a practical and economical refinement benefits significantly from the prevalence of open-source DNN model zoos, and promises to maintain non-degraded predictive performance thanks to the well-evaluated pre-training & fine-tuning workflow.

Inference speedup. After learning, a common criticism of BNNs is their requirement for longer inference time than DNNs. This is because BNNs leverage a collection of MC samples to marginalize the posterior for prediction and uncertainty quantification, as shown in Eq. (4) and Eq. (5). However, this problem is largely alleviated in our approach thanks to the adoption of the FADE variational. The main part of the model remains deterministic, allowing us to perform forward propagation only once to reach the entry of the Bayesian sub-module. In the Bayesian sub-module, we expect to take all the $C$ parameter candidates into account for prediction to thoroughly exploit their heterogeneous predictive behaviour, i.e., $T = C$. Naively calculating the outcomes under each parameter candidate $w_b^{(c)}$ sequentially is viable, but we can achieve further speedup by unleashing the potential of parallel computing.
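As a concrete illustration of the uncertainty correction in Eq. (9) and Eq. (10), the sketch below perturbs each input uniformly and scores the disagreement between two distinct candidates, capped at $\gamma$. It is a minimal sketch under assumptions: `toy_features` is a hypothetical feature extractor, the budget $8/255$ is an arbitrary illustrative value, and drawing the pair of indices without replacement is used as an equivalent way of obtaining two uniform indices conditioned on being distinct.

```python
import random


def uniform_perturb(x, eps_train, rng):
    """Eq. (9): add i.i.d. U(-eps_train, eps_train) noise to every coordinate."""
    return [v + rng.uniform(-eps_train, eps_train) for v in x]


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def uncertainty_penalty(inputs, feature_fn, num_candidates, eps_train, gamma, rng):
    """Eq. (10): batch mean of min(||z~^(c1) - z~^(c2)||^2, gamma)."""
    total = 0.0
    for x in inputs:
        x_tilde = uniform_perturb(x, eps_train, rng)
        # Two distinct candidate indices, uniform over all ordered pairs.
        c1, c2 = rng.sample(range(num_candidates), 2)
        z1, z2 = feature_fn(x_tilde, c1), feature_fn(x_tilde, c2)
        total += min(squared_distance(z1, z2), gamma)
    return total / len(inputs)


def toy_features(x, c):
    """Hypothetical feature extractor: candidate c scales the input by (c + 1)."""
    return [(c + 1) * v for v in x]


rng = random.Random(0)
inputs = [[0.2, 0.4], [0.9, 0.1], [0.5, 0.5]]
R = uncertainty_penalty(inputs, toy_features, num_candidates=4,
                        eps_train=8 / 255, gamma=0.5, rng=rng)
print("R =", R)
```

With $T = 2$, Eq. (5) evaluates to $\frac{1}{2}\|z^{(1)} - z^{(2)}\|_2^2$, so the squared distance inside the `min` is the two-sample uncertainty up to a constant factor. The cap at $\gamma$ means an instance stops contributing gradient once its candidates already disagree by more than the threshold.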
Take the convolution layer at the front of the Bayesian sub-module as an example (we abuse some notation here): given a batch of features $x_{in} \in \mathbb{R}^{b \times i \times h \times w}$ and $C$ convolution kernels $w^{(c)} \in \mathbb{R}^{o \times i \times k \times k}$, $c = 1, \dots, C$, we first repeat $x_{in}$ along the channel dimension $C$ times, getting $x'_{in} \in \mathbb{R}^{b \times Ci \times h \times w}$, and concatenate $\{w^{(c)}\}_{c=1}^{C}$ as $w' \in \mathbb{R}^{Co \times i \times k \times k}$. Then, we estimate the outcomes in parallel via group convolution: $x'_{out} = \mathrm{conv}(x'_{in}, w', \mathrm{groups} = C)$, and the outcome corresponding to $w^{(c)}$ is $x^{(c)}_{out} = x'_{out}[:, co - o : co, \dots]$. The cooperation between the FADE variational and the above strategy makes our inference time close to that of the DNNs in the same setting (see Sec 5.4), while only our approach enjoys the benefits of Bayes principle and is able to achieve robust adversarial detection.

5. Experiments

To verify whether LiBRe can quickly and economically equip pre-trained DNNs with principled adversarial detection ability in various scenarios, we perform extensive empirical studies covering ImageNet classification [8], open-set face recognition [64], and object detection [34] in this section.

General setup. We fetch the pre-trained DNNs available online, and inherit all their settings for the Bayesian refinement unless otherwise stated. We use $C = 20$ candidates for FADE across scenarios. The FADE posterior is generally employed for the parameters of the last convolution block (e.g., the last residual block for ImageNet and face tasks or the feature output heads for object detection). We take the immediate output of the Bayesian sub-module as $z$ for estimating feature variance uncertainty.

Attacks. We adopt some popular attacks to craft adversarial examples under $\ell_2$ and $\ell_\infty$ threat models, including fast gradient sign method (FGSM) [19], basic iterative method (BIM) [29], projected gradient descent method (PGD) [40], momentum iterative method (MIM) [10], Carlini & Wagner's method (C&W) [4], diverse inputs method (DIM) [62], and translation-invariant method (TIM) [11]. We set the perturbation budget as $\epsilon = 16/255$. We set the step size as $1/255$ and the number of steps as 20 for all the iterative methods. When attacking BNNs, the minimization goal in Eq. (2) refers to the posterior predictive in Eq. (4) with $T = 20$. More details are deferred to the Appendix.

Baselines. Given the fact that many of the recent adversarial detection methods focus on sp…

[Table 1 (values not recoverable from the extraction). Rows: MAP, MC dropout [17], LMFVI, MFVI, LiBRe. Columns: prediction accuracy; AUROC of adversarial detection under model transfer.]
Table 1: Left: comparison on accuracy. Right: comparison on AUROC of adversarial detection under model transfer. (ImageNet)

[Table 2 (values not recoverable from the extraction). Methods listed: KD [14], LID [39], MC dropout.]
Table 2: Comparison on AUROC of adversarial detection for regular attacks. (ImageNet)

5.1. ImageNet Classification

We first check the adversarial detection effectiveness of LiBRe on ImageNet. We utilize the ResNet-50 [23] architecture with weight decay coefficient $\lambda = 10^{-4}$, and set the uncertainty threshold $\gamma$ as 0.5 according to the observation that the normal samples usually have feature variance uncertainty below 0.5. We set $\alpha = 1$ without tuning. We uniformly sample a training perturbation budget $\epsilon_{train} \in [\frac{\epsilon}{2}, 2\epsilon]$ at every iteration. We perform fine-tuning for $E = 6$ epochs with the learning rate of $\{w_b^{(c)}\}_{c=1}^{C}$ annealing from $10^{-3}$ to $10^{-4}$ with a cosine schedule and that of $w_{\neg b}^{(0)}$ fixed as $10^{-4}$.

To defend against regular attacks, KD and LID require training a separate detector for every attack under the supervision of the adversarial examples from that attack. Thus, to show the best performance of KD and LID, we test the trained detectors only on their corresponding adversarial examples. By contrast, LiBRe, MC dropout, LMFVI, and MFVI do not rely on specific attacks for training, and thus have the potential to detect any (unseen) attack, which is more flexible yet more challenging. That said, they can be trivially applied to detect the adversarial examples under model transfer, which are crafted against a surrogate ResNet-152 DNN but are used to attack the trained models, to further assess the generalization ability of these defences.

The results are presented in Table 1 and Table 2. We also illustrate the uncertainty of normal and adversarial examples assigned by LiBRe and a baseline in Fig. 2. It is an immediate observation that LiBRe preserves non-degraded prediction accuracy compared to its refinement starting point MAP, and meanwhile demonstrates near-perfect capacity for detecting adversarial examples. The superiority of LiBRe is especially apparent under the more difficult model transfer paradigm. The results in Fig. 2 further testify to the ability of LiBRe to assign higher uncertainty to adversarial examples to distinguish them from the normal ones. Although KD and the gold standard, LID, obtain full knowledge of the models and the attacks, we can still see evident margins between their worst-case⁵ results and that of LiBRe.

⁵ The worst case is of much more concern than the average for assessing robustness.
