Joint Discriminative and Generative Learning for Person Re-identification

Zhedong Zheng 1,2, Xiaodong Yang 1, Zhiding Yu 1, Liang Zheng 3, Yi Yang 2, Jan Kautz 1
1 NVIDIA, 2 CAI, University of Technology Sydney, 3 Australian National University
* Work done during an internship at NVIDIA Research.

Abstract

Person re-identification (re-id) remains challenging due to significant intra-class variations across different cameras. Recently, there has been a growing interest in using generative models to augment training data and enhance the invariance to input changes. The generative pipelines in existing methods, however, stay relatively separate from the discriminative re-id learning stages. Accordingly, re-id models are often trained in a straightforward manner on the generated data. In this paper, we seek to improve learned re-id embeddings by better leveraging the generated data. To this end, we propose a joint learning framework that couples re-id learning and data generation end-to-end. Our model involves a generative module that separately encodes each person into an appearance code and a structure code, and a discriminative module that shares the appearance encoder with the generative module. By switching the appearance or structure codes, the generative module is able to generate high-quality cross-id composed images, which are online fed back to the appearance encoder and used to improve the discriminative module. The proposed joint learning framework renders significant improvement over the baseline without using generated data, leading to state-of-the-art performance on several benchmark datasets.

1. Introduction

Person re-identification (re-id) aims to establish identity correspondences across different cameras. It is often approached as a metric learning problem [52], where one seeks to retrieve images containing the person of interest from non-overlapping cameras given a query image. This is challenging in the sense that images captured by different cameras often contain significant intra-class variations caused by changes in background, viewpoint, human pose, etc. As a result, designing or learning representations that are as robust as possible against intra-class variations has been one of the major targets in person re-id.

Figure 1: Examples of generated images on Market-1501 by switching appearance or structure codes. Each row and column corresponds to a different appearance and structure, respectively.

Convolutional neural networks (CNNs) have recently become increasingly predominant choices in person re-id thanks to their strong representation power and ability to learn invariant deep embeddings. Current state-of-the-art re-id methods widely formulate the task as a deep metric learning problem [12, 53], or use classification losses as proxy targets to learn deep embeddings [22, 38, 40, 47, 52, 55]. To further reduce the influence of intra-class variations, a number of existing methods adopt part-based matching or ensembles to explicitly align and compensate for the variations [34, 36, 45, 50, 55].

Table 1: Description of the information encoded in the latent appearance and structure spaces.
Appearance space: clothing/shoes color, texture and style, other id-related cues, etc.
Structure space: body size, hair, carrying, pose, background, position, viewpoint, etc.

Another possibility to enhance robustness against input variations is to let the re-id model potentially "see" these variations (particularly intra-class variations) during training. With recent progress in generative adversarial networks (GANs) [10], generative models have become appealing choices to introduce additional augmented data for free [54]. Despite their different forms, the general considerations behind these methods are "realism": generated images should possess good quality to close the domain gap between synthesized scenarios and real ones; and "diversity": generated images should contain sufficient diversity to adequately cover unseen variations. Within this context, some prior works have explored unconditional GANs and human-pose-conditioned GANs [9, 16, 26, 30, 54] to generate pedestrian images to improve re-id learning. However, a common issue behind these methods is that their generative pipelines are typically presented as standalone models, which are relatively separate from the discriminative re-id models. Therefore, the optimization target of a generative module may not be well aligned with the re-id task, limiting the gain from the generated data.

In light of the above observation, we propose a learning framework that jointly couples discriminative and generative learning in a unified network called DG-Net. Our strategy towards achieving this goal is to introduce a generative module whose encoders decompose each pedestrian image into two latent spaces: an appearance space that mostly encodes appearance and other identity-related semantics, and a structure space that encloses geometry- and position-related structural information as well as other additional variations. We refer to the encoded features in the two spaces as "codes". The properties captured by the two latent spaces are summarized in Table 1. The appearance encoder is also shared with the discriminative module, serving as the re-id learning backbone. This design leads to a single unified framework that subsumes the following interactions between the generative and discriminative modules: (1) the generative module produces synthesized images that are taken to refine the appearance encoder online; (2) the encoder, in turn, influences the generative module with improved appearance encoding; and (3) both modules are jointly optimized, given the shared appearance encoder.

We formulate the image generation as switching the appearance or structure codes between two images. Given any pair of images with the same or different identities, one is able to generate realistic and diverse intra-/cross-id composed images by manipulating the codes. An example of such composed image generation on Market-1501 [51] is shown in Figure 1. Our design of the generative pipeline not only leads to high-fidelity generation, but also yields substantial diversity given the combinatorial compositions of existing identities. Unlike the unconditional GANs [16, 54], our method allows more controllable generation with better quality.
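To make the code-swapping idea concrete, the following PyTorch-style sketch outlines how two encoders and a decoder could be wired so that exchanging codes between two images yields a cross-id composed image. This is an illustrative outline only, not the released DG-Net implementation; the layer sizes and module names (Ea, Es, G, id_head) are assumptions that merely follow the notation used later in Section 3.

```python
import torch
import torch.nn as nn

class DGNetSketch(nn.Module):
    """Minimal sketch of the DG-Net generative wiring (hypothetical layer sizes)."""
    def __init__(self, num_ids=751):
        super().__init__()
        # E_a: appearance encoder, also the re-id backbone shared with the discriminative module
        self.Ea = nn.Sequential(nn.Conv2d(3, 64, 3, 2, 1), nn.ReLU(),
                                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                nn.Linear(64, 128))                     # appearance code a (vector)
        # E_s: structure encoder, takes a gray-scale input and keeps spatial resolution
        self.Es = nn.Sequential(nn.Conv2d(1, 32, 3, 2, 1), nn.ReLU())   # structure code s (feature map)
        # G: decoder that fuses (a, s) back into an image
        self.G = nn.Sequential(nn.Conv2d(32 + 128, 64, 3, 1, 1), nn.ReLU(),
                               nn.Upsample(scale_factor=2),
                               nn.Conv2d(64, 3, 3, 1, 1), nn.Tanh())
        # re-id classification head on top of the shared appearance encoder
        self.id_head = nn.Linear(128, num_ids)

    def decode(self, a, s):
        # broadcast the appearance vector over the structure map and decode
        a_map = a[:, :, None, None].expand(-1, -1, s.size(2), s.size(3))
        return self.G(torch.cat([s, a_map], dim=1))

    def forward(self, x_i, x_j_gray):
        a_i = self.Ea(x_i)             # appearance code of image i
        s_j = self.Es(x_j_gray)        # structure code of image j (gray-scale input)
        x_gen = self.decode(a_i, s_j)  # cross-id image: appearance of i, structure of j
        return x_gen, self.id_head(a_i)
```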
Unlike the pose-guided generation methods [9, 26, 30], our method does not require any additional auxiliary data, but takes advantage of the existing intra-dataset pose variations as well as other diversities beyond pose.

This generative module design specifically serves our discriminative module to better make use of the generated data. For one pedestrian image, by keeping its appearance code and combining it with different structure codes, we can generate multiple images that retain the clothing and shoes but change pose, viewpoint, background, etc. As demonstrated in each row of Figure 1, these images correspond to the same clothing dressed on different people. To better capture such composed cross-id information, we introduce "primary feature learning" via a dynamic soft labeling strategy. Alternatively, we can keep one structure code and combine it with different appearance codes to produce various images, which maintain the pose, background, and some identity-related fine details but alter clothes and shoes. As shown in each column of Figure 1, these images form an interesting simulation of the same person wearing different clothes and shoes. This creates an opportunity for further mining the subtle identity attributes that are independent of clothing, such as carrying, hair, body size, etc. Thus, we propose the complementary "fine-grained feature mining" to learn additional subtle identity properties.

To our knowledge, this work provides the first framework that is able to integrate discriminative and generative learning end-to-end in a single unified network for person re-id. Extensive qualitative and quantitative experiments show that our image generation compares favorably against the existing ones, and more importantly, our re-id accuracy consistently outperforms the competing algorithms by large margins on several benchmarks.

2. Related Work

A large family of person re-id research focuses on metric learning losses. Some methods combine identification loss with verification loss [46, 53], while others apply triplet loss with hard sample mining [5, 12, 32]. Several recent works employ pedestrian attributes to enforce more supervision and perform multi-task learning [25, 35, 42]. Alternatives harness pedestrian alignment and part matching to leverage the human structure prior. One common practice is to split input images or feature maps horizontally to take advantage of local spatial cues [22, 38, 48].

Figure 2: A schematic overview of DG-Net. (a) Our discriminative re-id learning module is embedded in the generative module by sharing the appearance encoder Ea. The dashed black line denotes that the input image to the structure encoder Es is converted to gray-scale. The red line indicates that the generated images are online fed back to Ea. Two objectives are enforced in the generative module: (b) self-identity generation with the same input identity and (c) cross-identity generation with different input identities. (d) To better leverage the generated data, the re-id learning involves primary feature learning and fine-grained feature mining.

In a similar manner, pose estimation is incorporated into learning local features [34, 36, 45, 50, 55]. Apart from pose, human parsing is used in [18] to enhance spatial matching. In comparison, our DG-Net relies only on a simple identification loss for re-id learning and requires no extra auxiliary information such as pose or human parsing for image generation.

Another active research line is to utilize GANs to augment training data. In [54], Zheng et al. first introduce the use of an unconditional GAN to generate images from random vectors. Huang et al. proceed in this direction with WGAN [1] and assign pseudo labels to generated images [16]. Li et al. propose to share weights between the re-id model and the discriminator of the GAN [24]. In addition, some recent methods make use of pose estimation to conduct pose-conditioned image generation. A two-stage generation pipeline is developed in [27] based on pose to refine generated images. Similarly, pose is also used in [9, 26, 30] to generate images of a pedestrian in different poses to make the learned features more robust to pose variance. Siarohin et al. achieve better pose-conditioned image generation by using a nearest neighbor loss to replace the traditional ℓ1 or ℓ2 loss [33]. All these methods set image generation and re-id learning as two disjointed steps, while our DG-Net integrates the two tasks end-to-end in a unified network.

Meanwhile, some recent studies also exploit synthetic data for style transfer of pedestrian images to compensate for the disparity between the source and target domains. CycleGAN [58] is applied in [8, 57] to transfer pedestrian image style from one dataset to another. StarGAN [6] is used in [56] to generate pedestrian images with different camera styles. Bak et al. [3] employ a game engine to render pedestrians under various illumination conditions. Wei et al. [44] take semantic segmentation to extract foreground masks in assisting style transfer. In contrast to the global style transfer, we aim to manipulate appearance and structure details to facilitate more robust re-id learning.

3. Method

As illustrated in Figure 2, DG-Net tightly couples the generative module for image generation and the discriminative module for re-id learning. We introduce two image mappings, self-identity generation and cross-identity generation, to synthesize high-quality images that are online fed into re-id learning. Our discriminative module involves primary feature learning and fine-grained feature mining, which are co-designed with the generative module to better leverage the generated data.

3.1. Generative Module

Formulation. We denote the real images and identity labels as $X = \{x_i\}_{i=1}^N$ and $Y = \{y_i\}_{i=1}^N$, where $N$ is the number of images, $y_i \in [1, K]$, and $K$ indicates the number of classes or identities in the dataset. Given two real images $x_i$ and $x_j$ in the training set, our generative module generates a new pedestrian image by swapping the appearance or structure codes of the two images. As shown in Figure 2, the generative module consists of an appearance encoder $E_a: x_i \rightarrow a_i$, a structure encoder $E_s: x_j \rightarrow s_j$, a decoder $G: (a_i, s_j) \rightarrow x_j^i$, and a discriminator $D$ to distinguish between generated images and real ones. In the case $i = j$, the generator can be viewed as an auto-encoder, so $x_i^i \approx x_i$. Note: for generated images, we use the superscript to denote the real image providing the appearance code and the subscript to indicate the one offering the structure code, while real images only have a subscript as the image index. Compared to the appearance code $a_i$, the structure code $s_j$ maintains more spatial resolution to preserve geometric and positional properties. However, this may result in a trivial solution in which $G$ only uses $s_j$ but ignores $a_i$ in image generation, since decoders tend to rely on the feature with more spatial information. In practice, we convert the input images of $E_s$ into gray-scale to drive $G$ to leverage both $a_i$ and $s_j$. We enforce two objectives for the generative module: (1) self-identity generation to regularize the generator, and (2) cross-identity generation to make generated images controllable and match the real data distribution.

Self-identity generation. As illustrated in Figure 2(b), given an image $x_i$, the generative module first learns how to reconstruct $x_i$ from itself. This simple self-reconstruction task serves as an important regularization for the whole generation. We reconstruct the image using the pixel-wise $\ell_1$ loss:

$L_{recon}^{img1} = \mathbb{E}\big[\lVert x_i - G(a_i, s_i)\rVert_1\big]$.  (1)

Based on the assumption that the appearance codes of the same person in different images are close, we further propose another reconstruction task between any two images of the same identity. In other words, the generator should be able to reconstruct $x_i$ through an image $x_t$ with the same identity $y_i = y_t$:

$L_{recon}^{img2} = \mathbb{E}\big[\lVert x_i - G(a_t, s_i)\rVert_1\big]$.  (2)

This same-identity but cross-image reconstruction loss encourages the appearance encoder to pull appearance codes of the same identity together so that intra-class feature variations are reduced. In the meantime, to force the appearance codes of different identities to stay apart, we use an identification loss to distinguish different identities:

$L_{id}^{s} = \mathbb{E}\big[-\log\big(p(y_i \mid x_i)\big)\big]$,  (3)

where $p(y_i \mid x_i)$ is the predicted probability that $x_i$ belongs to the ground-truth class $y_i$ based on its appearance code.
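For concreteness, the sketch below shows how the three self-identity objectives in Eqs. (1)-(3) could be computed in PyTorch. It assumes the hypothetical DGNetSketch module from the earlier sketch and standard library calls only; it is an illustration of the losses, not the authors' training code.

```python
import torch.nn.functional as F

def self_identity_losses(model, x_i, x_i_gray, x_t, y_i):
    """Eqs. (1)-(3): self-reconstruction, same-id cross-image reconstruction, id loss.

    x_i      : real image providing the appearance code a_i
    x_i_gray : gray-scale version of x_i fed to the structure encoder E_s
    x_t      : another image with the same identity as x_i (y_t = y_i)
    """
    a_i = model.Ea(x_i)            # appearance code of x_i
    a_t = model.Ea(x_t)            # appearance code of x_t (same identity)
    s_i = model.Es(x_i_gray)       # structure code of x_i (gray-scale input)

    # Eq. (1): pixel-wise L1 self-reconstruction, x_i ~ G(a_i, s_i)
    loss_img1 = F.l1_loss(model.decode(a_i, s_i), x_i)

    # Eq. (2): same-identity, cross-image reconstruction, x_i ~ G(a_t, s_i)
    loss_img2 = F.l1_loss(model.decode(a_t, s_i), x_i)

    # Eq. (3): identification loss on the appearance code of the real image
    loss_id_s = F.cross_entropy(model.id_head(a_i), y_i)

    return loss_img1, loss_img2, loss_id_s
```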
Cross-identity generation. Different from self-identity generation, which works with image reconstruction using the same identity, cross-identity generation focuses on image generation with different identities. In this case, there is no pixel-level ground-truth supervision. Instead, we introduce latent code reconstruction based on the appearance and structure codes to control such image generation. As shown in Figure 2(c), given two images $x_i$ and $x_j$ of different identities $y_i \neq y_j$, the generated image $x_j^i = G(a_i, s_j)$ is required to retain the information of the appearance code $a_i$ from $x_i$ and the structure code $s_j$ from $x_j$, respectively. We should then be able to reconstruct the two latent codes after encoding the generated image:

$L_{recon}^{code1} = \mathbb{E}\big[\lVert a_i - E_a(G(a_i, s_j))\rVert_1\big]$,  (4)

$L_{recon}^{code2} = \mathbb{E}\big[\lVert s_j - E_s(G(a_i, s_j))\rVert_1\big]$.  (5)

Similar to self-identity generation, we also enforce an identification loss on the generated image based on its appearance code to keep the identity consistent:

$L_{id}^{c} = \mathbb{E}\big[-\log\big(p(y_i \mid x_j^i)\big)\big]$,  (6)

where $p(y_i \mid x_j^i)$ is the predicted probability of $x_j^i$ belonging to the ground-truth class $y_i$ of $x_i$, the image that provides the appearance code in generating $x_j^i$. Additionally, we employ an adversarial loss to match the distribution of the generated images to the real data distribution:

$L_{adv} = \mathbb{E}\big[\log D(x_i) + \log\big(1 - D(G(a_i, s_j))\big)\big]$.  (7)

Discussion. By using the proposed generation mechanism, we enable the generative module to learn appearance and structure codes with explicit and complementary meanings and to generate high-quality pedestrian images based on the latent codes. This largely eases the generation complexity. In contrast, the previous methods [9, 16, 26, 30, 54] have to learn image generation either from random noise or by managing the pose factor only, which makes it hard to manipulate the outputs and inevitably introduces artifacts. Moreover, due to using the latent codes, the variations in our generated images are explainable and constrained within the existing contents of real images, which also ensures the generation realism. In theory, we can generate $O(N \times N)$ different images by sampling various image pairs, resulting in a much larger online generated training sample pool than the ones with $O(2 \times N)$ images offline generated in [16, 30, 54].
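The cross-identity objectives in Eqs. (4)-(7) can be sketched in the same style. The discriminator D is assumed to output a probability in (0, 1), and the rgb_to_gray helper and the split of the adversarial objective into a discriminator part and a non-saturating generator part are illustrative choices on our side, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def rgb_to_gray(x):
    # simple luminance conversion so E_s sees a 1-channel input, as in the paper
    r, g, b = x[:, 0:1], x[:, 1:2], x[:, 2:3]
    return 0.299 * r + 0.587 * g + 0.114 * b

def cross_identity_losses(model, D, x_i, x_j_gray, y_i):
    """Eqs. (4)-(7): latent-code reconstruction, id consistency, and adversarial loss."""
    a_i = model.Ea(x_i)              # appearance code from x_i
    s_j = model.Es(x_j_gray)         # structure code from x_j (different identity)
    x_gen = model.decode(a_i, s_j)   # generated cross-id image x^i_j

    # Eqs. (4)-(5): re-encode the generated image and reconstruct both latent codes
    loss_code_a = F.l1_loss(model.Ea(x_gen), a_i)
    loss_code_s = F.l1_loss(model.Es(rgb_to_gray(x_gen)), s_j)

    # Eq. (6): identity consistency -- x^i_j should keep the identity y_i of x_i
    loss_id_c = F.cross_entropy(model.id_head(model.Ea(x_gen)), y_i)

    # Eq. (7): adversarial objective; D is trained to maximize it, G to minimize it.
    eps = 1e-6
    d_real, d_fake = D(x_i), D(x_gen)
    loss_adv_d = -(torch.log(d_real + eps) + torch.log(1.0 - d_fake.detach() + eps)).mean()
    loss_adv_g = -torch.log(d_fake + eps).mean()   # non-saturating generator form

    return loss_code_a, loss_code_s, loss_id_c, loss_adv_d, loss_adv_g
```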

3.2. Discriminative Module

Our discriminative module is embedded in the generative module by sharing the appearance encoder as the backbone for re-id learning. In accordance with the images generated by switching either appearance or structure codes, we propose primary feature learning and fine-grained feature mining to better take advantage of the online generated images. Since the two tasks focus on different aspects of the generated images, we branch out two lightweight headers on top of the appearance encoder for the two types of feature learning, as illustrated in Figure 2(d).

Primary feature learning. It is possible to treat the generated images as training samples similar to the existing work [16, 30, 54]. But the inter-class variations in the cross-id composed images motivate us to adopt a teacher-student type supervision with dynamic soft labeling. We use a teacher model to dynamically assign a soft label to $x_j^i$, depending on its compound appearance and structure from $x_i$ and $x_j$. The teacher model is simply a baseline CNN trained with identification loss on the original training set. To train the discriminative module for primary feature learning, we minimize the KL divergence between the probability distribution $p(\cdot \mid x_j^i)$ predicted by the discriminative module and the probability distribution $q(\cdot \mid x_j^i)$ predicted by the teacher:

$L_{prim} = \mathbb{E}\Big[-\sum_{k=1}^{K} q(k \mid x_j^i)\,\log\dfrac{p(k \mid x_j^i)}{q(k \mid x_j^i)}\Big]$,  (8)

where $K$ is the number of identities. In comparison with the fixed one-hot label [30, 59] or static smoothing label [54], this dynamic soft labeling fits better in our case, as each synthetic image is formed by the visual contents from two real images. In the experiments, we show that a simple baseline CNN serving as the teacher model is reliable enough to provide the dynamic labels and improve the performance.

Fine-grained feature mining. Beyond the direct usage of generated data for learning primary features, an interesting alternative, made possible by our specific generation pipeline, is to simulate the change of clothing for the same person, as shown in each column of Figure 1. When training on images organized in this manner, the discriminative module is forced to learn the fine-grained id-related attributes (such as hair, hat, bag, body size, and so on) that are independent of clothing. We view the images generated by combining one structure code with different appearance codes as belonging to the same class as the real image providing the structure code. To train the discriminative module for fine-grained feature mining, we enforce an identification loss on this particular categorization:

$L_{fine} = \mathbb{E}\big[-\log\big(p(y_j \mid x_j^i)\big)\big]$.  (9)

This loss imposes additional identity supervision on the discriminative module in a multi-tasking way. Moreover, unlike the previous works using manually labeled pedestrian attributes [25, 35, 42], our approach performs automatic fine-grained attribute mining by leveraging the synthetic images. Furthermore, compared to the hard sampling policies applied in [12, 32], there is no need to explicitly search for the hard training samples that usually possess fine-grained details, since our discriminative module learns to attend to the subtle identity properties.
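The two discriminative losses can be sketched as follows. The teacher is assumed to be a frozen baseline CNN that outputs identity logits, and prim_head / fine_head stand for the two lightweight branches mentioned above; all names here are illustrative, not the authors' released API.

```python
import torch
import torch.nn.functional as F

def discriminative_losses(model, teacher, prim_head, fine_head, x_gen, y_j):
    """Eq. (8): KL-based primary feature learning with dynamic soft labels.
       Eq. (9): fine-grained feature mining with the structure-provider's identity y_j."""
    feat = model.Ea(x_gen)   # shared appearance backbone applied to the generated image

    # Eq. (8): the teacher assigns a soft label q(.|x^i_j); the student distribution p
    # comes from the primary branch. kl_div(log_p, q) computes KL(q || p).
    with torch.no_grad():
        q = F.softmax(teacher(x_gen), dim=1)        # dynamic soft label from the baseline CNN
    log_p = F.log_softmax(prim_head(feat), dim=1)
    loss_prim = F.kl_div(log_p, q, reduction="batchmean")

    # Eq. (9): the generated image is assigned the class of the image providing the structure code
    loss_fine = F.cross_entropy(fine_head(feat), y_j)

    return loss_prim, loss_fine
```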
