SUPPLEMENTARY MATERIAL
AOT: Appearance Optimal Transport Based Identity Swapping for Forgery Detection

Hao Zhu 1,2, Chaoyou Fu 1,3, Qianyi Wu 4, Wayne Wu 4, Chen Qian 4, Ran He 1,3†
1 NLPR & CEBSIT & CRIPAC, CASIA   2 Anhui University
3 University of Chinese Academy of Sciences   4 SenseTime Research
haozhu96@gmail.com   {…ianchen}@sensetime.com
Equal contribution. † Corresponding author.
34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.

Appendix

A Implementation Details

A.1 Data Preparation

The training data consists only of the target faces and the reenacted faces. The target faces are directly extracted from the original FF++ [9] (900 videos) and DPF-1.0 [4] (10 identities). We then leverage DFL [8] and FSGAN [6] to produce the reenacted faces using identities that do not appear among the target faces. Our quantitative experiments are conducted on the remaining videos of FF++ and DPF-1.0.

For preprocessing, we first detect 106 facial landmarks in each video, then crop the face area and resize it to 256×256 resolution. To obtain the PNCC and normals, we use 3DDFA [15] to estimate the 3D mesh of each face, and render the mesh and the corresponding PNCC and normal codes to images via a neural renderer [5].

A.2 Training Strategies

We use PyTorch [7] to implement our model. In the training phase, the model is trained for 200K iterations on two NVIDIA 1080Ti GPUs with a batch size of 16. We use the Adam optimizer for the relighting generator with β1 = 0.5, β2 = 0.999, and weight decay 0.0002, and the RMSprop optimizer for the Mix-and-Segment Discriminator (MSD), Ω, and Ψ with β = 0.9. The learning rates of both the Adam and RMSprop optimizers are set to 0.0002. In L_total, we set λ1 = 120, λ2 = 1, λ3 = 90, and λ4 = 1. The full training procedure is summarized in Algorithm 1 (a PyTorch sketch of this optimizer setup follows A.3).

A.3 Network Architectures

The full architecture is shown in Fig. S1.
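For reference, the optimizer configuration described in A.2 can be written as the short PyTorch sketch below. The module definitions are placeholders standing in for the actual networks, and mapping the paper's RMSprop "β = 0.9" to PyTorch's smoothing constant `alpha` is our interpretation; only the hyperparameter values come from A.2.

```python
import itertools
import torch
import torch.nn as nn

# Placeholder modules standing in for the real networks; names are illustrative only.
relighting_generator = nn.Conv2d(3, 3, 3, padding=1)
msd = nn.Conv2d(3, 1, 3, padding=1)
omegas = [nn.Linear(64, 64) for _ in range(3)]   # stand-ins for Omega_1..Omega_3
psis = [nn.Linear(64, 1) for _ in range(3)]      # stand-ins for Psi_1..Psi_3

lr = 2e-4  # learning rate shared by both optimizers (A.2)

# Adam for the relighting generator: beta1 = 0.5, beta2 = 0.999, weight decay = 0.0002.
optim_g = torch.optim.Adam(relighting_generator.parameters(),
                           lr=lr, betas=(0.5, 0.999), weight_decay=2e-4)

# RMSprop for the MSD, Omega_i, and Psi_i; the paper's "beta = 0.9" is used here as
# RMSprop's smoothing constant, which PyTorch calls `alpha` (our assumption).
optim_d = torch.optim.RMSprop(
    itertools.chain(msd.parameters(),
                    *(m.parameters() for m in omegas),
                    *(m.parameters() for m in psis)),
    lr=lr, alpha=0.9)

# Loss weights used in L_total.
lambda1, lambda2, lambda3, lambda4 = 120.0, 1.0, 90.0, 1.0
```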

Algorithm 1 Training algorithm.
Require: {X_r}^N, {X_t}^N.
Require: Initialize Ω_i, Ψ_i, PerceptualEncoder, Decoder, and MSD with parameters θ_i, ω_i, α, β, γ, respectively.
 1: while not converged do
 2:     Sample mini-batch {x_r}
 3:     Sample mini-batch {x_t}
 4:     // Forward: Encoder
 5:     F^1_{X_r}, F^2_{X_r}, F^3_{X_r}, F^4_{X_r} ← PerceptualEncoder({x_r})
 6:     F^1_{X_t}, F^2_{X_t}, F^3_{X_t}, F^4_{X_t} ← PerceptualEncoder({x_t})
 7:     // Update: NOTPE
 8:     for i = 1, 2, 3 do
 9:         for j = 1, ..., n_i do
10:             Sample v_r^j ∼ F^i_{X_r}
11:             Sample v_t^j ∼ F^i_{X_t}
12:             g_{ω_i} ← ∇_{ω_i} [ (1/m) Σ_j Ψ_i(Ω_i(v_r^j | X_r, X_t)) − (1/m) Σ_j Ψ_i(v_t^j) ]
13:             ω_i ← ω_i + α · RMSProp(ω_i, g_{ω_i})
14:             ω_i ← CLIP(ω_i, −c, c)
15:         end for
16:     end for
17:     for i = 1, 2, 3 do
18:         for j = 1, ..., n_i do
19:             Sample v_r^j ∼ F^i_{X_r}
20:             g_{θ_i} ← ∇_{θ_i} (1/m) Σ_j Ψ_i(Ω_i(v_r^j | X_r, X_t))
21:             θ_i ← θ_i − α · RMSProp(θ_i, g_{θ_i})
22:         end for
23:     end for
24:     // Forward: NOTPE
25:     for i = 1, 2, 3 do
26:         F^i_Y ← Ω_i(F^i_{X_r} | X_r, X_t)
27:     end for
28:     // Forward: Decoder
29:     Y_{t,t} ← Decoder(F^1_{X_t}, F^2_{X_t}, F^3_{X_t}, F^4_{X_t})
30:     Y_{r,t} ← Decoder(F^1_Y, F^2_Y, F^3_Y, F^4_{X_r})
31:     // Update: MSD
32:     M_r ← RandomMaskGenerator()
33:     Y_mix ← MIX(Y_{t,t}, Y_{r,t}, M_r)
34:     g_γ ← ∇_γ [ (1/m) Σ M_r ⊙ MSD(Y_mix) − (1/m) Σ (1 − M_r) ⊙ MSD(Y_mix) ]
35:     γ ← γ + α · Adam(γ, g_γ)
36:     // Update: Perceptual Encoder, Decoder
37:     g_β ← ∇_β (1/m) Σ L_total(Y_{t,t}, Y_{r,t}, X_t)
38:     β ← β − α · Adam(β, g_β)
39: end while
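The inner NOTPE updates in Algorithm 1 follow a WGAN-style recipe: the potential Ψ_i is updated with RMSProp and then weight-clipped to [−c, c], after which the transport map Ω_i is updated against the potential. A minimal PyTorch sketch of one such update at a single scale is shown below; the two small networks, the feature dimensions, and the sampled vectors are simplified placeholders, not the paper's actual architectures.

```python
import torch
import torch.nn as nn

feat_dim, m, c, lr = 64, 256, 0.01, 2e-4

# Placeholder networks for one scale i: omega transports reenacted-face feature vectors,
# psi is the potential (critic) used to compare them with target-face feature vectors.
omega = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
psi = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))
opt_omega = torch.optim.RMSprop(omega.parameters(), lr=lr)
opt_psi = torch.optim.RMSprop(psi.parameters(), lr=lr)

# Stand-ins for per-pixel feature vectors sampled from F^i_{X_r} and F^i_{X_t}.
v_r = torch.randn(m, feat_dim)
v_t = torch.randn(m, feat_dim)

# Potential (critic) step: widen the gap between transported and target samples,
# then clip the weights to [-c, c] as in Algorithm 1.
psi_loss = -(psi(omega(v_r)).mean() - psi(v_t).mean())
opt_psi.zero_grad()
psi_loss.backward()
opt_psi.step()
with torch.no_grad():
    for p in psi.parameters():
        p.clamp_(-c, c)

# Transport map step: push the transported samples toward the target distribution
# as scored by the potential (only omega is updated here).
omega_loss = psi(omega(v_r)).mean()
opt_omega.zero_grad()
omega_loss.backward()
opt_omega.step()
```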

Figure S1: The detailed pipeline of our proposed model (panels: Perceptual Encoder, NOTFE blocks, Face Decoder, Relighting Generator, and Mix-and-Segment Discriminator (MSD), with their layer configurations).

B Compared Baselines

B.1 Face Swapping Methods

DeepFaceLab. DeepFaceLab (DFL) [8] requires retraining the model for each source identity, which means we need to train a separate DFL model for each video. Note that DFL provides many options to tune its results; in practice, we use the options reported in Table S1.

FSGAN. FSGAN [6] is a landmark-guided, subject-agnostic method. We use the latest models provided by the authors.

B.2 Appearance Transfer Methods

Poisson Blending. Poisson blending is a classical image harmonization method. We use the OpenCV implementation with the cv2.NORMAL_CLONE flag (a minimal sketch is given below).

Deep Image Harmonization (DIH) [13].3 DIH is a deep-learning-based image harmonization method that captures both the context and the semantic patterns of the images rather than relying on hand-crafted features.

3 DIH: https://github.com/wasidennis/DeepHarmonization
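The Poisson blending baseline can be reproduced with OpenCV's seamless cloning. A minimal sketch follows; the file names and the face-region mask are illustrative assumptions, not settings from the paper.

```python
import cv2
import numpy as np

# Illustrative inputs: a swapped face crop and the target frame (file names are placeholders).
swapped = cv2.imread("swapped_face.png")   # source to be blended
target = cv2.imread("target_frame.png")    # destination image

# Blend region: here the whole crop except a small border, so the clone stays inside the frame.
mask = np.zeros(swapped.shape[:2], dtype=np.uint8)
mask[10:-10, 10:-10] = 255

# Blend the swapped face into the target frame around its center,
# using the NORMAL_CLONE flag as described in B.2.
h, w = target.shape[:2]
center = (w // 2, h // 2)
blended = cv2.seamlessClone(swapped, target, mask, center, cv2.NORMAL_CLONE)
cv2.imwrite("poisson_blended.png", blended)
```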

Table S1: Options of DeepFaceLab.

Training options
  resolution            224      gan power              –
  face type             f        true face power        –
  models opt on gpu     True     face style power       –
  archi                 dfhd     bg style power         –
  ae dims               256      ct mode                –
  e dims                64       clipgrad               –
  d dims                64       pretrain               –
  d mask dims           22       autobackup hour        –
  masked training       True     write preview history  –
  eyes prio             False    target iter            –
  lr dropout            False    random flip            –
  random warp           True     batch size             –

Merging options
  mask mode               learned
  erode mask modifier     5
  blur mask modifier      5
  motion blur power       0
  output face scale       1
  color transfer mode     rct
  sharpen mode            none
  blursharpen amount      0
  super resolution power  1
  image denoise power     0
  bicubic degrade power   0
  color degrade power     0

Style Transfer for Headshot Portraits (STHP) [10].4 STHP allows users to easily produce style-transferred results. It transfers the multi-scale local statistics of a reference portrait onto another.

WCT2.5 WCT2 is a state-of-the-art photorealistic style transfer method. We use the unpool 'cat5' option and the pretrained models.

C Additional Experiments

C.1 Noise Analysis

Furthermore, we verify our results with photo forgery analysis methods: noise analysis, error level analysis, level sweep, and luminance gradient.6 As shown in Fig. S2, our framework reduces the noise (Fig. S2 (a, b)) and preserves the appearance of the target images (Fig. S2 (c, d)).

Figure S2: Noise analysis with photo forensics algorithms, comparing Target, DFL, and Ours under (a) noise analysis, (b) error level analysis, (c) level sweep, and (d) luminance gradient. Our method not only reduces noise (a, b), but also better preserves appearance (c, d).

4 STHP: https://people.csail.mit.edu/yichangshih/portrait_web/
5 WCT2: https://github.com/clovaai/WCT2
6 Photo forensics: https://29a.ch/photo-forensics/
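As a reference for the checks in C.1, error level analysis (ELA) can be reproduced with a few lines of PIL: re-save the image as JPEG at a fixed quality and amplify the residual. The quality and amplification values below are common defaults, not settings from the paper.

```python
import io
from PIL import Image, ImageChops

def error_level_analysis(path, quality=90, scale=15):
    """Re-save the image as JPEG and return the amplified residual (ELA map)."""
    original = Image.open(path).convert("RGB")

    # Re-compress at a known quality and reload.
    buf = io.BytesIO()
    original.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    resaved = Image.open(buf)

    # Pixel-wise difference, amplified so compression artifacts become visible.
    diff = ImageChops.difference(original, resaved)
    return diff.point(lambda v: min(255, v * scale))

# Illustrative usage; the file names are placeholders.
ela_map = error_level_analysis("face.png")
ela_map.save("face_ela.png")
```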

Figure S3: The mixed results (columns: Target, Swapped, Mix Mask, Mixed, and Predicted Mask).

C.2 Results of the Mix-and-Segment Discriminator

We provide more examples of the mixed results. As shown in Fig. S3, we mix the target faces and the swapped faces using the mix mask. It is difficult to tell the real patches from the fake patches.

C.3 Feature Visualization

To give intuitive results, we visualize the features at different scales by using PCA to reduce their dimensions to 3-dimensional vectors. In the latent space, the pixel distributions are more balanced under different lighting conditions, as shown in Fig. S4.

Figure S4: Visualization of the features at different scales (Image, F1, F2, F3).
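The visualization in C.3 projects each per-pixel feature vector to three dimensions and displays the result as an RGB image. A minimal scikit-learn sketch is given below; the feature tensor is a random placeholder standing in for one of the encoder scales.

```python
import numpy as np
from sklearn.decomposition import PCA

def visualize_features(feat):
    """Project a (C, H, W) feature map to 3 channels with PCA and rescale to [0, 1]."""
    c, h, w = feat.shape
    pixels = feat.reshape(c, -1).T                      # (H*W, C): one vector per pixel
    proj = PCA(n_components=3).fit_transform(pixels)    # (H*W, 3)
    lo, hi = proj.min(0), proj.max(0)
    proj = (proj - lo) / (hi - lo + 1e-8)               # normalize each channel to [0, 1]
    return proj.reshape(h, w, 3)

# Placeholder feature map standing in for one encoder scale F^i.
feature_map = np.random.randn(256, 64, 64).astype(np.float32)
rgb = visualize_features(feature_map)                   # (64, 64, 3) image for display
```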

Table S2: Inference speed comparison.

  Method        FPS
  Poisson       3.891
  DIH           1.247
  STHP          1.686
  WCT2          2.817
  AOT (ours)    12.821

C.4 Speed Comparison

Furthermore, as reported in Table S2, our framework achieves the highest FPS among the compared methods, which means it introduces the least computational burden. All experiments were conducted on Ubuntu 16.04 with an Intel i7-7700K CPU and an NVIDIA 1060 GPU (a generic timing sketch is given after Fig. S5).

C.5 Forgery Detection

We report the binary detection accuracy of two video classification baselines, I3D [1] and TSN [14], on the hidden set provided by DeeperForensics-1.0 [4]. We first train the baselines on the four manipulated subsets of FF++ [9], produced by DeepFakes [2], Face2Face [11], FaceSwap [3], and NeuralTextures [12] (green bars in Fig. S5). We then add 100 manipulated videos produced by our method to the training set; all detection accuracies improve with the addition of our data (blue bars).

Figure S5: Forgery detection results.
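The FPS numbers in Table S2 correspond to per-frame wall-clock inference time. A generic timing harness of the kind used for such measurements is sketched below; the model, input size, and warm-up/run counts are placeholders, and the explicit CUDA synchronization is our addition for fair GPU timing.

```python
import time
import torch

def measure_fps(model, example, n_warmup=10, n_runs=100, device="cuda"):
    """Average frames-per-second of a model on a single example (placeholder harness)."""
    model = model.to(device).eval()
    example = example.to(device)
    with torch.no_grad():
        for _ in range(n_warmup):            # warm-up iterations are excluded from timing
            model(example)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_runs):
            model(example)
        if device == "cuda":
            torch.cuda.synchronize()
    return n_runs / (time.perf_counter() - start)

# Example usage with a stand-in network and a 256x256 face crop.
net = torch.nn.Conv2d(3, 3, 3, padding=1)
print(f"{measure_fps(net, torch.randn(1, 3, 256, 256), device='cpu'):.3f} FPS")
```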

Figure S6: Comparison results with DFL (columns: Target, DFL, Poisson Blending, DIH, STHP, WCT2, Ours).

Figure S7: Comparison results with FSGAN (columns: Target, FSGAN, Poisson Blending, DIH, STHP, WCT2, Ours).

References

[1] João Carreira and Andrew Zisserman. Quo vadis, action recognition? A new model and the Kinetics dataset. In Conference on Computer Vision and Pattern Recognition, pages 4724–4733, 2017.
[2] Deepfakes. https://github.com/deepfakes/faceswap. Accessed: 2020.4.
[3] FaceSwap. https://github.com/.../tree/master/dataset/FaceSwapKowalski. Accessed: 2020.4.
[4] Liming Jiang, Wayne Wu, Ren Li, Chen Qian, and Chen Change Loy. DeeperForensics-1.0: A large-scale dataset for real-world face forgery detection. arXiv preprint arXiv:2001.03024, 2020.
[5] Hiroharu Kato, Yoshitaka Ushiku, and Tatsuya Harada. Neural 3D mesh renderer. In Conference on Computer Vision and Pattern Recognition, 2018.
[6] Yuval Nirkin, Yosi Keller, and Tal Hassner. FSGAN: Subject agnostic face swapping and reenactment. In International Conference on Computer Vision, pages 7184–7193, 2019.
[7] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8024–8035, 2019.
[8] Ivan Perov, Daiheng Gao, Nikolay Chervoniy, Kunlin Liu, Sugasa Marangonda, Chris Umé, Mr. Dpfks, Carl Shift Facenheim, Luis RP, Jian Jiang, Sheng Zhang, Pingyu Wu, Bo Zhou, and Weiming Zhang. DeepFaceLab: A simple, flexible and extensible face swapping framework. arXiv preprint arXiv:2005.05535, 2020.
[9] Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Nießner. FaceForensics++: Learning to detect manipulated facial images. In International Conference on Computer Vision, pages 1–11, 2019.
[10] YiChang Shih, Sylvain Paris, Connelly Barnes, William T. Freeman, and Frédo Durand. Style transfer for headshot portraits. ACM Transactions on Graphics, 33(4):148, 2014.
[11] Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian Theobalt, and Matthias Nießner. Face2Face: Real-time face capture and reenactment of RGB videos. In Conference on Computer Vision and Pattern Recognition, pages 2387–2395, 2016.
[12] Justus Thies, Michael Zollhöfer, and Matthias Nießner. Deferred neural rendering: Image synthesis using neural textures. CoRR, abs/1904.12356, 2019.
[13] Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, and Ming-Hsuan Yang. Deep image harmonization. In Conference on Computer Vision and Pattern Recognition, pages 3789–3797, 2017.
[14] Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool. Temporal segment networks: Towards good practices for deep action recognition. In European Conference on Computer Vision, pages 20–36, 2016.
[15] Xiangyu Zhu, Xiaoming Liu, Zhen Lei, and Stan Z. Li. Face alignment in full pose range: A 3D total solution. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(1):78–92, 2017.
