TAaMR: Targeted Adversarial Attack Against Multimedia Recommender Systems


Tommaso Di Noia (Politecnico di Bari, tommaso.dinoia@poliba.it)
Daniele Malitesta (Politecnico di Bari, daniele.malitesta@poliba.it)
Felice Antonio Merra (Politecnico di Bari, felice.merra@poliba.it)

Abstract—Deep learning classifiers are hugely vulnerable to adversarial examples, and their existence has raised cybersecurity concerns in many tasks, with an emphasis on malware detection, computer vision, and speech recognition. While there is a considerable effort to investigate attacks and defense strategies in these tasks, only limited work explores the influence of targeted attacks on the input data (e.g., images, textual descriptions, audio) used in multimedia recommender systems (MR). In this work, we examine the consequences of applying targeted adversarial attacks against the product images of a visual-based MR. We propose a novel adversarial attack approach, called Targeted Adversarial Attack against Multimedia Recommender Systems (TAaMR), to investigate how the behavior of an MR changes when the images of a category of less recommended products (e.g., socks) are perturbed so that the deep neural classifier misclassifies them towards the class of more recommended products (e.g., running shoes), with slight image alterations that are barely perceptible to humans. We explore the TAaMR approach by studying the effect of two targeted adversarial attacks (i.e., FGSM and PGD) against the input pictures of two state-of-the-art MR (i.e., VBPR and AMR). Extensive experiments on two real-world fashion recommendation datasets confirm the effectiveness of TAaMR in changing the recommendation lists while preserving the original human judgment of the perturbed images.

Index Terms—Adversarial Machine Learning, Recommender Systems

I. INTRODUCTION

Deep Neural Networks (DNN) serve as core components of many real-world systems performing different AI tasks such as image classification [1], object detection [2], speech recognition [3], and malware detection [4]. However, recent studies have demonstrated that a malicious user, the adversary, can modify the classification behavior of a trained deep neural classifier by attaching human-imperceptible adversarial noise to inputs at prediction time [5]. A famous example in the computer vision domain is the misclassification of a slightly mutated STOP traffic signal into another one by a DNN classifier installed in a self-driving car system [6]. Moreover, recent research has proved that adversaries may be able to generate adversarial examples that are misclassified into a chosen target class, performing so-called targeted adversarial attacks [7], [8].

The power of DNNs in providing latent representations (features) of input data in a supervised and unsupervised way has recently been exploited in the application domain of recommender systems (RS).
They act as a primary component in several real-world online product retailers (e.g., Amazon [9]) and multimedia content providers (e.g., Netflix [10]) by furnishing users with personalized recommendations that simplify the identification of the products best suiting their preferences in catalogs with millions of potential alternatives. Besides, the availability of several types of multimedia content for products/services (e.g., images [11], [12], videos [13], soundtracks [14]) helps RS produce better-personalized recommendations.

Recommendation engines are prone to performance alteration by malicious users who may be able to poison the training data with hand-engineered, or machine-learning optimized, fake user profiles (shilling profiles). This attack scenario, comprehensively studied in the literature [15]-[17], is different from the adversarial machine learning one, since adversarial perturbations are computed in an optimized way to be human-imperceptible. Recently, adversarial attacks (and defenses) have been studied in RS [18], with a primary focus on the evaluation of adversarial perturbations applied to recommender model embeddings [19]. Indeed, this approach is also used in [20], the first work to explore the effectiveness of adversarial attacks against MR (i.e., a visual-based recommender model). Differently from our work, the authors of [20] investigated the performance worsening caused by untargeted perturbations of the input images.

In this work, we show how the performance of an MR can be effectively modified by an adversary who injects targeted adversarially perturbed images into a visual-based recommender system (e.g., VBPR [12]). The proposed attack approach, named Targeted Adversarial Attack against Multimedia Recommender Systems (TAaMR), explores attack situations where the adversary's goal is to perturb images of a low recommended category of products (e.g., the 20th most recommended) so that they are misclassified by the deep classifier towards a target, more recommended category (e.g., the 1st/2nd).

In brief, this work aims at addressing the following research questions:

RQ1 Can targeted adversarial attacks against images of a low recommended category of products be exploited to modify the recommendation lists of multimedia recommender systems, in terms of the probability of that category being more recommended?

RQ2 What are the effects of the adversarial perturbations applied to the attacked images in terms of human perception?

To answer the previous questions, we have performed an extended experimental evaluation to investigate the effects of TAaMR on two real-world visual-based fashion datasets (Amazon Men and Amazon Women) where visual features are predominant. The code is available at https://github.com/sisinflab/TAaMR.

II. PROBLEM DEFINITION

Definition 1 (Recommendation Problem). Let U and I denote a set of users and items in a RS, respectively, and let c : U x I -> R be a utility function. The Recommendation Problem is defined as

$$\forall u \in \mathcal{U}, \quad \hat{i}_u = \arg\max_{i \in \mathcal{I}} c(u, i) \qquad (1)$$

where î_u is the recommended item for the user u.

The discovery of a utility function able to predict how much a user (e.g., an e-commerce client) will like an unknown item (e.g., a product for sale) is the central task of the recommendation problem. We further define S, a |U| x |I| real-valued matrix, as the user-item feedback matrix (UFM), where each entry s_ui is a 0/1-valued feedback (e.g., review, rating) that represents a historical interaction of user u with item i.

Definition 2 (Learned Image Feature). Let X be a set of images and F a trained DNN model (e.g., a CNN). Let L be the number of layers of F, let f^l be the output of the l-th layer of F with l in {0, 1, ..., L-1}, and F(.) = f^{L-1}. Given an image x in X, the Learned Image Feature of x extracted at layer l is defined as f^l(x).

Given an item i associated with an image x_i, we represent its features f_i by extracting the output f^e of a layer e. In this work, we select e as one of the layers placed immediately after the convolutional part, since at that point the stacked convolutions have extracted the high-level image features that are used in a multimedia Recommendation Problem.

Adversarial attacks can be untargeted or targeted. An untargeted adversarial strategy is only interested in a perturbation of the input x into x* such that the predicted class F(x*) is different from the original one F(x).

Definition 3 (Untargeted Adversarial Attack). An Adversarial Attack against a DNN (e.g., a CNN) is the process of finding an adversarial example x* by solving the following constrained optimization problem:

$$\min_{x^*} d(x, x^*) \quad \text{such that} \quad F(x^*) \neq F(x), \;\; d(x, x^*) \leq \epsilon \qquad (2)$$

where x is a clean sample for F, d(.) is a distance metric, and ε >= 0 is the perturbation budget.

Untargeted adversarial attacks construct adversarial samples (i.e., images) by maximizing the classification cost function related to the original (source) class. On the other side, a targeted strategy is interested in changing the original class of the perturbed example to a specific new one t.

Definition 4 (Targeted Adversarial Attack). Let C be the set of classes of a classifier F. Let c in C be the source class such that F(x) = c, and t in C be a target class with t != c. A Targeted Adversarial Attack finds the adversarial example x* as follows:

$$\min_{x^*} d(x, x^*) \quad \text{such that} \quad F(x^*) = t, \;\; d(x, x^*) \leq \epsilon \qquad (3)$$

Targeted adversarial attacks generate adversarial images by minimizing the classification loss function with respect to the target class. Based on the previous problem definitions, we propose a novel attack strategy against a visual-based recommender system, called Targeted Adversarial Attack against Multimedia Recommender Systems (TAaMR).
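The following sketch illustrates Definition 2 in practice: the learned image feature f_i is taken from a layer placed right after the convolutional part of a pretrained CNN. The choice of a torchvision ResNet50 and of its pooled output as layer e is an assumption made for illustration, not necessarily the extractor used in TAaMR.

```python
# Minimal sketch of Definition 2: extracting the learned image feature f^e(x)
# from a layer placed right after the convolutional part of a pretrained CNN.
# The ResNet50 backbone and the global-average-pooling output used as layer e
# are illustrative assumptions, not necessarily the extractor used in the paper.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

cnn = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
cnn.eval()

# Keep everything up to (and including) the global average pooling, i.e. drop
# the final fully-connected classifier: the output is the feature f^e(x).
feature_extractor = torch.nn.Sequential(*list(cnn.children())[:-1])

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def learned_image_feature(image_path: str) -> torch.Tensor:
    """Return the D-dimensional feature f_i for the product image x_i."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        f = feature_extractor(x)          # shape: (1, D, 1, 1)
    return f.flatten(1).squeeze(0)        # shape: (D,)
```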
III. APPROACH

In the description of our approach to perturbing the input images of multimedia recommenders (MR) with targeted adversarial attacks, we first introduce the core multimedia recommender model, and then we define the adversarial threat model. Finally, we discuss a novel metric proposed to evaluate the impact of the attack on the recommendation lists. In Fig. 1 we visually represent the approach.

[Fig. 1: Overview of TAaMR. A targeted adversarial attack perturbs product images before feature extraction; the MR preference predictor and preference sorting stages then produce the top-N recommendations.]

A. Multimedia Recommender

The core component of TAaMR is the multimedia recommender. A multimedia recommender model solves the Recommendation Problem by estimating user preferences on unknown items/products, combining pure collaborative filtering information (i.e., users' interactions) with multimedia-based features. As shown in Fig. 1, the proposed approach investigates the class of MR that performs the preference prediction task by integrating features extracted from deep neural models (e.g., a CNN in the case of a visual-based MR).

The deep feature extractor component (F) is the vulnerability point exploited by an adversary to gain direct influence on the preference predictor component of the MR. The basic intuition of TAaMR is that an adversary could tamper with the Recommendation Problem by abusing the well-documented vulnerability of several DNNs to targeted adversarial examples. Indeed, TAaMR simulates targeted attacks on images of low recommended categories towards highly popular target categories.
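As a complement to Fig. 1, the sketch below shows how the preference predictor and preference sorting stages could be wired together to produce a top-N list from precomputed item features; the score function is a placeholder, and all names are illustrative rather than taken from the TAaMR codebase.

```python
# Illustrative sketch of the MR pipeline in Fig. 1: given per-item visual
# features and a (hypothetical) preference predictor, score all items for a
# user and sort them to obtain the top-N recommendation list. Concrete
# predictors (VBPR, AMR) are described in Sec. IV.
import numpy as np

def top_n_recommendations(user_id: int,
                          item_features: np.ndarray,   # shape: (num_items, D)
                          predict_score,               # callable (u, f_i) -> float
                          interacted: set,             # items the user already knows
                          n: int = 10) -> list:
    scores = np.array([predict_score(user_id, f) for f in item_features])
    ranking = np.argsort(-scores)                      # sort descending by score
    # Recommend only items the user has not interacted with yet.
    return [int(i) for i in ranking if int(i) not in interacted][:n]
```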

B. Adversary Threat Model

Before diving into the examination of the consequences of a source-target misclassification attack [7] (or targeted attack [8]) on an MR, we outline the adversary threat model based on the guidelines proposed by Carlini et al. [21]. The adversary's assumptions are:

- Adversary goal: the adversary is interested in misclassifying images of a low suggested category of products from their source class (e.g., socks) towards a target one (e.g., t-shirts).
- Adversary knowledge: we assume a white-box setting, since the adversary holds full knowledge of the parameters of the feature extraction model used to estimate the targeted perturbation. Additionally, the adversary has complete access to the MR input image features altered by the performed attack. Furthermore, she can extract all the recommendation lists used to identify the source and target classes of TAaMR.
- Adversary capability: we restrict the adversary capability to l-infinity-norm constrained perturbations.

C. Impact on Recommendations

Prior research in adversarial machine learning has focused on the evaluation of adversarial attacks in discrediting the classification accuracy of 'victim' classifiers [22] (e.g., CNNs), and on the validation of defense strategies [21], [22]. To the best of our knowledge, there are no metrics to examine the impact of targeted adversarial attacks against MR. As evidence, the closest line of research to our work [20] evaluates the effects of untargeted attacks as the reduction of recommendation accuracy metrics [23], [24].

In this work, we aim to fill this gap by proposing a Hit Ratio-based metric, named Category Hit Ratio (CHR@N), to study the fraction of items of the attacked category (i.e., items whose images have been adversarially perturbed) that appear in the top-N recommendation lists.

Definition 5 (Category Hit Ratio). Let c in C be a category (class), and let I_c = {i in I | F(x_i) = c}. The Category Hit Ratio@N is defined as

$$\mathrm{CHR@}N(I_c, \mathcal{U}) = \frac{1}{N \cdot |\mathcal{U}|} \sum_{u \in \mathcal{U}} \sum_{i \in I_c \setminus I_u^+} \mathrm{hit}(i, u) \qquad (4)$$

where I_u^+ is the set of items the user u has already interacted with, and hit(i, u) is a 0/1-valued function that is 1 when the item i, classified within the class c, is inside the top-N recommendation list of the user u, and 0 otherwise.
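A minimal sketch of how CHR@N (Eq. 4) could be computed from precomputed top-N lists follows; the data structures (dictionaries keyed by user id) are assumptions made for illustration.

```python
# Minimal sketch of Category Hit Ratio (Eq. 4). It assumes the top-N lists
# have already been computed for every user (e.g., with the pipeline sketched
# above); variable names are illustrative.
def category_hit_ratio(top_n_lists: dict,      # user_id -> list of N item ids
                       category_items: set,    # I_c: items classified in class c
                       interacted: dict,       # user_id -> set of items in I_u^+
                       n: int) -> float:
    hits, num_users = 0, len(top_n_lists)
    for u, top_n in top_n_lists.items():
        candidates = category_items - interacted.get(u, set())  # I_c \ I_u^+
        hits += sum(1 for i in top_n if i in candidates)
    return hits / (n * num_users)
```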
IV. EXPERIMENTAL EVALUATION

We evaluated TAaMR on two real-world datasets in the recommendation domain. Firstly, we present the experimental setup; then we discuss the experimental results. We have carried out an extensive set of experiments to answer the research questions raised in Section I, i.e., whether it is possible to (i) modify the MR recommendation lists by perturbing input images with targeted attacks, and (ii) evaluate the visual appearance of the perturbed product images.

TABLE I: Dataset statistics. |U|, |I|, and |S| represent the number of users, items, and feedback records, respectively.

    Dataset         |U|       |I|       |S|
    Amazon Men      26,155    82,630    193,365
    Amazon Women    18,514    76,889    137,929

A. Evaluation Settings

1) Dataset: We executed experiments on two popular datasets extracted from Amazon.com [11], [25]. We considered the "Clothing, Shoes and Jewelry" categories for men and women, named Amazon Men and Amazon Women, since several works demonstrated the significant impact of visual features on users' choices in the fashion domain [12], [26]. Table I shows the dataset statistics after the pre-processing steps applied to each dataset. As a first step, we downloaded all the available images from Amazon.com based on the URLs published within the available metadata (http://jmcauley.ucsd.edu/data/amazon/). Then, we converted every user rating into a 0/1-valued interaction, and we kept all the users with at least five interactions (|I_u^+| >= 5) to discard cold users. Furthermore, we produced a smaller version of Amazon Women to make the number of product images comparable with Amazon Men.

2) Adversarial Attacks: We took into account two state-of-the-art adversarial attacks, i.e., the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD).

Fast Gradient Sign Method (FGSM) [27] focuses on the speed of adversarial example generation. As a matter of fact, it generates an adversarial version of the attacked image in only one step. Given a clean input image x, a target class t, a model F with parameters θ, and a perturbation coefficient ε, the targeted adversarial image x* is

$$x^* = x - \epsilon \cdot \mathrm{sign}(\nabla_x L_F(\theta, x, t)) \qquad (5)$$

where ∇_x L_F(θ, x, y) is the gradient of the loss function of F with respect to the input, and sign(.) is the sign function.

Projected Gradient Descent (PGD) [28] iteratively applies FGSM with a perturbation budget α (i.e., the step size) smaller than ε. The attack algorithm works similarly to FGSM, but after each perturbation step the temporary attacked image is clipped so that it remains in an ε-neighborhood of the clean image x. The described approach is an extended and more effective version [29] of the Basic Iterative Method (BIM) proposed by Kurakin et al. [30]. Indeed, PGD differs from BIM in that PGD starts from uniform random noise as the initial perturbation of the clean image x. The implemented version executes 10 iterations. For the implementation of both algorithms, we adopted the Python library CleverHans [31]. Both attacks have been executed in their targeted version.
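Below is a hedged sketch of the two targeted attacks described above, written directly in PyTorch instead of CleverHans (which the paper relies on) so that it stays self-contained; the model interface, the cross-entropy loss, the [0, 1] pixel range, and the step sizes are assumptions.

```python
# Hedged sketch of the targeted attacks of Sec. IV-A2, written in plain
# PyTorch rather than through CleverHans (used in the paper). `model` is any
# differentiable classifier returning logits; epsilon, alpha, and the 10
# iterations mirror the description above but are assumptions here.
import torch
import torch.nn.functional as F_nn

def fgsm_targeted(model, x, target, epsilon):
    """One-step targeted FGSM (Eq. 5): step against the gradient of the loss
    w.r.t. the target class to push the prediction towards `target`."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F_nn.cross_entropy(model(x_adv), target)
    loss.backward()
    x_adv = x_adv - epsilon * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()      # assumes pixels scaled to [0, 1]

def pgd_targeted(model, x, target, epsilon, alpha, steps=10):
    """Iterative targeted attack with random start and l-infinity projection
    onto the epsilon-ball around the clean image after every step."""
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F_nn.cross_entropy(model(x_adv), target)
        loss.backward()
        x_adv = x_adv - alpha * x_adv.grad.sign()                      # descend towards t
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon)  # project onto eps-ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```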

3) Recommender Models: We studied TAaMR effectiveness on the following MR: Visual Bayesian Personalized Ranking (VBPR) and Adversarial Multimedia Recommendation (AMR).

Visual Bayesian Personalized Ranking (VBPR) [12] is a state-of-the-art multimedia recommender model designed to integrate visual features (learned via CNNs) into a latent factor model (BPR-MF [32]). The fundamental idea of VBPR is that a user might be influenced by the visual appearance of product images (e.g., a T-shirt picture on Amazon.com).

The preference predictor of VBPR (see Fig. 1) is built on top of a matrix factorization (MF) model [33]. Given the D-dimensional visual feature f_i of an item i, a matrix E of size A x D that transforms f_i into a smaller A-dimensional latent representation, and a user u who did not interact with i, the preference score ŝ_ui is calculated as:

$$\hat{s}_{ui} = b_{ui} + p_u^T q_i + \alpha_u^T (E f_i) + \beta^T f_i \qquad (6)$$

where p_u and q_i are K-dimensional (K << |U|, |I|) latent representations of user u and item i, b_ui is the sum of the global offset, user, and item bias components, and β is a parameter representing the overall effect of the visual features on users' preferences.

VBPR estimates the model parameters θ by minimizing a pairwise ranking loss [32]. The basic intuition is that, given a triplet (u, i, j) of a user u, an interacted item i in I_u^+ and a not-interacted item j in I_u^-, where I_u^+ and I_u^- are, respectively, the sets of interacted and not-interacted items of the user u, the preference score ŝ_ui should be higher than ŝ_uj. Let T = {(u, i, j) | u in U, i in I_u^+, j in I_u^-} be the set of triplets; then the VBPR optimization problem consists in minimizing the following objective function:

$$L_{VBPR} = \sum_{(u,i,j) \in \mathcal{T}} -\ln \sigma(\hat{s}_{ui} - \hat{s}_{uj}) + \lambda \|\theta\|_2^2 \qquad (7)$$

where λ is the regularization coefficient of the L2-norm of the model parameters, and σ(.) is the sigmoid function.

Adversarial Multimedia Recommendation (AMR) [20] integrates VBPR with the adversarial training procedure for RS [19] to make the model more robust to an adversarial perturbation Δ_i applied to the i-th image feature f_i. The preference prediction function is determined as:

$$\hat{s}^{adv}_{ui} = b_{ui} + p_u^T q_i + \alpha_u^T \big(E (f_i + \Delta_i)\big) + \beta^T (f_i + \Delta_i) \qquad (8)$$

where Δ_i, a 1 x D real-valued vector, is the optimal adversarial perturbation that maximizes (7). Based on the FGSM-like adversarial attack proposed in [19], the matrix of adversarial perturbations on the image features, Δ^adv of size |I| x D, is evaluated as:

$$\Delta^{adv} = \eta \, \frac{\Gamma}{\|\Gamma\|} \quad \text{where} \quad \Gamma = \nabla_{\Delta} L_{VBPR}(\mathcal{T} \mid \hat{\theta}) \qquad (9)$$

where θ̂ represents the fixed model parameters, η is the coefficient controlling the magnitude of the feature perturbation, and Δ is the zero-initialized perturbation matrix.

To reduce the impact of the introduced perturbations, AMR learns the parameters θ by minimizing the objective function:

$$L_{AMR} = \sum_{(u,i,j)\in\mathcal{T}} -\ln\sigma(\hat{s}_{ui} - \hat{s}_{uj}) - \gamma \ln\sigma(\hat{s}^{adv}_{ui} - \hat{s}^{adv}_{uj}) = L_{VBPR}(\mathcal{T} \mid \theta) + \gamma \, \underbrace{L_{VBPR}(\mathcal{T} \mid \theta, \Delta^{adv})}_{\text{adversarial regularizer}} \qquad (10)$$

where γ is a weight coefficient controlling the impact of the adversarial regularizer.

We have trained VBPR for 4000 epochs, storing the model parameters at the 2000-th epoch, i.e., the point where AMR starts a further 2000 epochs of adversarial training based on (10). The VBPR and AMR hyper-parameters are set based on the configuration proposed in [20]; in particular, the parameters of the adversarial regularizer are set to γ = 0.1 and η = 1.
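To make Eqs. 6 and 7 concrete, the following sketch implements a VBPR-style preference predictor and its pairwise BPR objective; the embedding sizes, the bias terms, and the regularization weight are illustrative assumptions and do not reproduce the authors' exact configuration.

```python
# Hedged sketch of the VBPR preference predictor (Eq. 6) and the pairwise BPR
# objective (Eq. 7). Parameter shapes and initialization are illustrative
# assumptions, not the exact configuration used in the paper.
import torch
import torch.nn as nn

class VBPRScorer(nn.Module):
    def __init__(self, num_users, num_items, K=64, A=64, D=2048):
        super().__init__()
        self.p = nn.Embedding(num_users, K)       # collaborative user factors p_u
        self.q = nn.Embedding(num_items, K)       # collaborative item factors q_i
        self.alpha = nn.Embedding(num_users, A)   # visual user factors alpha_u
        self.E = nn.Linear(D, A, bias=False)      # projection E: R^D -> R^A
        self.beta = nn.Linear(D, 1, bias=False)   # visual bias term beta^T f_i
        self.b_item = nn.Embedding(num_items, 1)  # item bias (part of b_ui)

    def forward(self, u, i, f_i):                 # f_i: (batch, D) image features
        s = (self.p(u) * self.q(i)).sum(-1)
        s = s + (self.alpha(u) * self.E(f_i)).sum(-1)
        s = s + self.beta(f_i).squeeze(-1) + self.b_item(i).squeeze(-1)
        return s                                   # \hat{s}_{ui}

def bpr_loss(model, u, i_pos, j_neg, f_pos, f_neg, lam=1e-4):
    """Eq. 7: -ln sigma(s_ui - s_uj) plus an L2 penalty on the parameters."""
    diff = model(u, i_pos, f_pos) - model(u, j_neg, f_neg)
    reg = sum((p ** 2).sum() for p in model.parameters())
    return -torch.nn.functional.logsigmoid(diff).sum() + lam * reg
```

AMR then reuses the same scorer on perturbed features f_i + Δ_i (Eq. 8) and adds the resulting pairwise term as the adversarial regularizer of Eq. 10.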
4) Visual Evaluation Metrics: We need to evaluate the visual distortion between the original and the attacked images, since they are presented to real online customers. There exist several quality metrics to measure the amount of distortion between two images (i.e., x and x*), categorized as subjective and objective [34]. The former involves quality judgments from actual human users, which are not always easy to collect. Conversely, the latter aims at mathematically mimicking a human evaluation. In this work, we studied the following objective metrics: Peak Signal-To-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and a Perceptual Similarity Metric (PSM).

Peak Signal-To-Noise Ratio (PSNR) [35] is a more easily interpretable, logarithmic version of the Mean Squared Error (MSE) and is defined as:

$$\mathrm{PSNR}(x, x^*) = 10 \log_{10} \left( \frac{P^2}{\mathrm{MSE}(x, x^*)} \right) \qquad (11)$$

where P is the maximum pixel value (e.g., P = 255 for 8-bit images). The higher the PSNR value, the lower the distortion between x and x*. Typically, it ranges between 20 and 50 dB.

Structural Similarity Index (SSIM) [36] is an objective metric based on the assumption that humans are sensitive to the structure of the image, an aspect that PSNR (as well as MSE) is not always able to capture [36]. The approach calculates local SSIM indexes over smaller windows of the original images and finally takes the average. Given two corresponding image windows w and w* extracted respectively from x and x*, the SSIM between w and w* is:

$$\mathrm{SSIM}(w, w^*) = \frac{(2\mu_w \mu_{w^*} + k_1)(2\sigma_{ww^*} + k_2)}{(\mu_w^2 + \mu_{w^*}^2 + k_1)(\sigma_w^2 + \sigma_{w^*}^2 + k_2)} \qquad (12)$$

where μ_v and σ_v are the mean and the standard deviation of the window v, and σ_ww* is the covariance between w and w*.
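The following is a small sketch of how PSNR (Eq. 11) and SSIM (Eq. 12) could be computed between a clean and a perturbed product image using scikit-image; the window size and stabilizing constants are the library defaults, which may differ from the exact settings used in the paper.

```python
# Hedged sketch of the objective visual metrics of Sec. IV-A4, relying on
# scikit-image implementations of Eq. 11 (PSNR) and Eq. 12 (SSIM). Window
# size and stabilizing constants are left at the library defaults.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def visual_distortion(x: np.ndarray, x_adv: np.ndarray) -> dict:
    """x and x_adv are HxWx3 uint8 images (clean and adversarial versions)."""
    psnr = peak_signal_noise_ratio(x, x_adv, data_range=255)
    ssim = structural_similarity(x, x_adv, channel_axis=-1, data_range=255)
    return {"PSNR(dB)": psnr, "SSIM": ssim}
```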
