
EmojiGAN: learning emojis distributions with a generative model

Thang Doan (Desautels Faculty of Management, McGill University)
Bogdan Mazoure (Department of Mathematics & Statistics, McGill University)
Saibal Ray (Desautels Faculty of Management, McGill University)

These authors contributed equally.

Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 273–279, Brussels, Belgium, October 31, 2018. © 2018 Association for Computational Linguistics.

Abstract

Generative models have recently experienced a surge in popularity due to the development of more efficient training algorithms and increasing computational power. Models such as generative adversarial networks (GANs) have been successfully used in various areas such as computer vision, medical imaging, style transfer and natural language generation. Adversarial nets were recently shown to yield results in the image-to-text task, where, given a set of images, one has to provide their corresponding text description. In this paper, we take a similar approach and propose an image-to-emoji architecture, which is trained on data from social networks and can be used to score a given picture using ideograms. We show empirical results of our algorithm on data obtained from the most influential Instagram accounts.

1 Introduction

The spike in the amount of user-generated visual and textual data shared on social platforms such as Facebook, Twitter, Instagram, Pinterest and many others luckily coincides with the development of efficient deep learning algorithms (Perozzi et al., 2014; Pennacchiotti and Popescu, 2011; Goyal et al., 2010). As humans, we can not only share our ideas and thoughts through any imaginable media, but also use social networks to analyze and understand complex interpersonal relations. Researchers have access to a rich set of metadata (Krizhevsky, 2012; Liu et al., 2015) on which various computer vision (CV) and natural language processing (NLP) algorithms can be trained.

For instance, recent work in the area of image captioning aims to provide a short description (i.e. caption) of a much larger document or image (Dai et al., 2017; You et al., 2016; Pu et al., 2016). Such methods excel at conveying the dominant idea of the input. On the other hand, we use ideograms, also popular under the names of emojis or pictographs, as a natural amalgam between annotation and summarization tasks. Note that, in this work, we use the terms emoji, ideogram and pictograph interchangeably to represent the intersection of these three domains. Ideograms bridge together the textual and visual spaces by representing groups of words with a concise illustration. They can be seen as surrogate functions which convey, up to a degree of accuracy, reactions of social media users. Furthermore, because each emoji has a corresponding text description, there is a direct mapping from ideograms onto the word space.

In this paper, we model the distribution of emojis conditioned on an image with a deep generative model. We use generative adversarial networks (GANs) (Goodfellow et al., 2014), which are notoriously known to be harder to train than other distributional models such as variational auto-encoders (VAEs) (Kingma and Welling, 2013) but tend to produce sharper results on computer vision tasks.

2 Related Work and Motivation

Since the release of word2vec by Mikolov and colleagues in 2013 (Mikolov et al., 2013), vector representations of language entities have become more popular than traditional encodings such as bag-of-words (BOW) or n-grams (NG). Because word2vec operations preserve the original semantic meaning of words, concepts like word similarity and synonymy are well-defined in the new space and correspond to the closest neighbors of a point according to some metric.
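For illustration, here is a minimal gensim sketch of this nearest-neighbour view of similarity; the pretrained-model path and the query words are our own illustrative assumptions, not part of the original paper:

# A quick illustration of word similarity as nearest neighbours in
# word2vec space, using gensim. The model path is an assumption; any
# word2vec-format file works.
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# Synonym-like words are simply the closest points under cosine similarity.
print(kv.most_similar("happy", topn=5))
print(kv.similarity("happy", "glad"))  # similarity is a well-defined scalar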

The aforementioned word representation was followed by doc2vec (Le and Mikolov, 2014). Originally, doc2vec was meant to efficiently encode collections of words as a whole. However, since empirical results suggest a similar performance for both algorithms, researchers tend to opt for the simpler and more interpretable word2vec model.

One of the most recent and most interesting vector embeddings is emoji2vec (Eisner et al., 2016). It consists of more than 1,600 symbol-vector pairs, each associating a Unicode character to a real 300-dimensional vector. The abundance of pictographs such as emojis on social communication platforms suggests that word-only analyses are limited in their scope to capture the full scale of interactions between individuals. Emojis' biggest advantage is their universality: no information is lost due to faulty translations, mistyped characters or even slang words. In fact, emojis were designed to be more concise and expressive than words. They have, however, been shown to suffer from varying interpretations which depend on factors such as viewing the pictograph on an iPhone or a Google Pixel (Miller et al., 2016). This in turn implies that the subject of conversation highly impacts the choice of media (text or emoji) picked by the user (Kelly and Watts, 2015). Reducing a whole media item, such as a public post or an advertisement image, to a single emoji would almost certainly mean losing the richness of the information, which is why we suggest to instead model visual media as a conditional distribution over the emojis that users employ to score the image.

Deep neural models have previously been used to analyse pictographic data: (Cappallo et al., 2015) used them to assign the most likely emoji to a picture, (Felbo et al., 2017) predicted the prevalent emotion of a sentence and (Zhao and Zeng, 2017) used recurrent neural networks (RNNs) to predict the emoji which best describes a given sentence. We build on top of this work to propose EmojiGAN, a model meant to generate realistic emojis based on an image. Since we are interested in modeling a distribution over image-emoji tuples, it is reasonable to represent it using generative adversarial networks. They have been shown to successfully memorize distributions over both text and images. For example, a GAN can be coupled with RNNs in order to generate realistic images based on an input sentence (Reed et al., 2016).

We train our algorithm on emoji-picture pairs obtained from various advertisement posts on Instagram. A practical application of our method is to analyze the effects of product advertisement on Instagram users. Previous works attempted to predict the popularity of Instagram posts by using surrogate signals such as the number of likes or followers (Almgren et al., 2016; De et al., 2017). Others used social media data in order to model the popularity of fashion industry icons (Park et al., 2016). A thorough inspection of clothing styles around the world has also been conducted (Matzen et al., 2017).

3 Proposed Approach

3.1 Generative Adversarial Networks

Generative adversarial networks (GANs) (Goodfellow et al., 2014) have recently gained huge popularity as a blackbox unsupervised method of learning some target distribution. Taking roots in game theory, their training process is framed as a two-player zero-sum game where a generator network G tries to fool a discriminator network D by producing samples closely mimicking the distribution of interest. In this work, we use the Wasserstein GAN (Arjovsky et al., 2017), a variant of the original GAN which uses the Wasserstein metric in order to avoid problems such as mode collapse.
The generator and the discriminator are gradually improved through either alternating or simultaneous gradient descent minimization of the loss function defined as:

$$\min_G \max_D \; \mathbb{E}_{x \sim f_X(x)}[D(x)] - \mathbb{E}_{x \sim G(z)}[D(x)] - p(\lambda), \qquad (1)$$

where $p(\lambda) = \lambda\,(\|\nabla_{\tilde{x}} D(\tilde{x})\|_2 - 1)^2$, $\tilde{x} = \varepsilon x + (1-\varepsilon)G(Z)$, $\varepsilon \sim \mathrm{Uniform}(0, 1)$, and $Z \sim f_Z(z)$. This gradient-penalized loss (Gulrajani et al., 2017) is now widely used to enforce the Lipschitz continuity constraint. Note that setting $\lambda = 0$ recovers the original WGAN objective.
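A minimal PyTorch sketch of the gradient-penalized critic loss of Eq. (1), following Gulrajani et al. (2017); the networks D and G, the batch shapes and the default penalty coefficient are placeholders rather than the authors' exact implementation:

# Sketch of the WGAN-GP critic loss of Eq. (1). D, G and shapes are
# illustrative placeholders.
import torch

def wgan_gp_critic_loss(D, G, x_real, z, lam=10.0):
    m = x_real.size(0)
    x_fake = G(z).detach()
    # x_tilde = eps * x + (1 - eps) * G(z), with eps ~ Uniform(0, 1).
    eps = torch.rand(m, *([1] * (x_real.dim() - 1)), device=x_real.device)
    x_tilde = (eps * x_real + (1.0 - eps) * x_fake).requires_grad_(True)
    grads = torch.autograd.grad(D(x_tilde).sum(), x_tilde,
                                create_graph=True)[0]
    # p(lambda) = lam * (||grad_{x_tilde} D(x_tilde)||_2 - 1)^2, batch-averaged.
    penalty = lam * ((grads.view(m, -1).norm(2, dim=1) - 1.0) ** 2).mean()
    # The critic ascends E[D(x_real)] - E[D(G(z))] - p(lambda); negating
    # gives a loss to minimize with a standard optimizer.
    return -(D(x_real).mean() - D(x_fake).mean() - penalty)

Note that the penalty is computed on the interpolates only, which enforces the 1-Lipschitz constraint softly instead of by weight clipping.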

3.2 Choice of embedding

Multiple embeddings have been proposed to encode language entities such as words, ideograms, sentences and even documents. A more recent successor of word2vec, emoji2vec aims to encode groups of words represented by visual symbols (i.e. ideograms or emojis). This representation is a fine-tuned version of word2vec which was trained on roughly 1,600 emojis to output a 300-dimensional real-valued vector. We experimented with both word2vec and emoji2vec by encoding each emoji through a sum of the word2vec representations of its textual description. We observed that both word2vec and emoji2vec embeddings yielded only a mild amount of similarity for most emojis. Moreover, dealing with groups of words requires designing a recurrent layer in the architecture, which can be cumbersome and yield suboptimal results, as opposed to restricting the generator network to only Unicode characters. Bearing this in mind, we decided to use the emoji2vec embedding in all of our experiments.

3.3 Learning a skewed distribution

Just like in text analysis, some emojis (mostly emotions such as love, laughter, sadness) occur more frequently than domain-specific pictographs (for example, country flags). The distribution over emojis is hence highly skewed and multimodal. Since such imbalance can lead to a considerable reduction in variance, also known as mode collapse, we propose to re-weight each backward pass with coefficients obtained through either of the following schemes:

1. term frequency-inverse document frequency (tf-idf) weights, a classical approach used in natural language processing (Salton and Buckley, 1988);

2. exponentially-smoothed raw frequencies:

$$w_s(e) = \frac{\exp(-k\,\mathrm{freq}(e))}{\sum_{i=1}^{N}\exp(-k\,\mathrm{freq}(e_i))}, \qquad k \ge 0, \qquad (2)$$

where $k$ is a smoothing constant, $\mathrm{freq}(e) = \mathrm{count}(e)/N$ is the frequency of emoji $e$ and $N$ is the total number of emojis.

4 Experiments

4.1 Data collection

We used the (soon to be deprecated) Instagram API to collect posts from top influencers within the following categories: fashion, fitness, health and weight loss; we believe that user data across those domains share similar patterns. Here, influencers are defined as accounts with the highest combined count of followers, posts and user reactions; 166 influencers were selected from various ranking lists put together by Forbes and Iconosquare. The final dataset has 80,000 (image, pictograph) tuples and covers a total of 753 distinct symbols.

4.2 Architecture

Inspired by (Reed et al., 2016), we performed experiments using the following architecture: the generator has 4 convolutional layers with kernels of size 4 which output a 4 × 4 feature matrix, followed by a fully connected layer; the discriminator is identical to G but outputs a scalar softmax instead of a 300-dimensional vector. The structure of both D and G is shown in Fig. 1.

Figure 1: Illustration of how EmojiGAN learns a distribution. The generator learns the conditional distribution of emojis given a set of pictures while the discriminator assigns a score to each generated emoji.

4.3 Algorithm

Our method relies on the conditional version of WGAN-GP, which accepts fixed-size (64 × 64 × 3) RGB image tensors. Our approach is presented in Algorithm 1.

Algorithm 1 Conditional Wasserstein GAN
Input: tuples of emojis and images (X, Y), the gradient penalty coefficient λ, the number of critic iterations per generator iteration n_critic, the batch size m, the learning rate lr and a weight vector w.
Initialization: initialize generator parameters θ_G0 and critic parameters θ_D0.
for epoch = 1, ..., N do
    {updating the discriminator}
    for t = 1, ..., n_critic do
        Sample {x_i}_{i=1}^m ⊂ X, {y_i}_{i=1}^m ⊂ Y, {z_i}_{i=1}^m ~ N(0, 1), {ε_i}_{i=1}^m ~ U[0, 1]
        x̃_i ← ε_i x_i + (1 − ε_i) G(z_i | y_i)
        L^(i) ← D(G(z_i | y_i) | y_i) − D(x_i | y_i) + λ (‖∇_{x̃_i} D(x̃_i | y_i)‖ − 1)^2
        θ_D ← Adam(∇_{θ_D} Σ_{i=1}^m w_i L^(i), lr)
    end for
    {updating the generator}
    for n = 1, ..., n_gen do
        Sample a batch {z_i}_{i=1}^m ~ N(0, 1)
        θ_G ← Adam(∇_{θ_G} Σ_{i=1}^m w_i L^(i), lr)
    end for
end for
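For concreteness, a minimal PyTorch sketch of one weighted critic update from Algorithm 1; the conditional modules D(x, y) and G(z, y), the latent size and all shapes are illustrative assumptions, not the authors' released code:

# Sketch of one weighted discriminator update from Algorithm 1.
import torch

def critic_step(D, G, opt_D, x, y, w, lam=10.0, z_dim=100):
    # x: real emoji2vec vectors (m, 300); y: conditioning images
    # (m, 3, 64, 64); w: per-sample weights from Section 3.3, shape (m,).
    m = x.size(0)
    z = torch.randn(m, z_dim, device=x.device)   # z_i ~ N(0, 1)
    eps = torch.rand(m, 1, device=x.device)      # eps_i ~ U[0, 1]
    x_fake = G(z, y).detach()                    # detach: critic step only
    x_tilde = (eps * x + (1.0 - eps) * x_fake).requires_grad_(True)
    grads = torch.autograd.grad(D(x_tilde, y).sum(), x_tilde,
                                create_graph=True)[0]
    gp = (grads.view(m, -1).norm(2, dim=1) - 1.0) ** 2
    # Per-sample loss L^(i), re-weighted by w_i to counter the skew.
    loss_i = D(x_fake, y).view(m) - D(x, y).view(m) + lam * gp
    loss = (w * loss_i).mean()
    opt_D.zero_grad()
    loss.backward()
    opt_D.step()
    return loss.item()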

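The weight vector w above can be precomputed from raw emoji counts via Eq. (2). A small NumPy sketch, under one reading of the equation (freq(e) taken as an emoji's share of all occurrences; the counts are made up):

# Sketch of the exponentially-smoothed frequency weights of Eq. (2).
import numpy as np

def smoothed_weights(counts, k=5.0):
    # k >= 0 controls how strongly popular emojis are damped.
    freq = counts / counts.sum()   # freq(e): share of all occurrences
    scores = np.exp(-k * freq)     # frequent emojis get smaller scores
    return scores / scores.sum()   # normalize over the N emojis

# Hypothetical counts: one dominant emoji and two rare ones.
print(smoothed_weights(np.array([9000.0, 600.0, 400.0])))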
5 Results

A series of experiments were conducted on the data collected from Instagram. The best architecture was selected through cross-validation and a hyperparameter grid search, and has been discussed previously. The training process used minibatch alternating gradient descent with the popular Adam optimizer (Kingma and Ba, 2014) with a learning rate lr = 0.0001 and β1 = 0.1, β2 = 0.9. We trained both G and D until convergence, after approximately 10 epochs. Empirically, we saw that the exponentially-smoothed raw frequency weights (2) performed better than the tf-idf weights.

In order to assess how closely the generator network approximates the true data distribution, we first sampled 750 images and obtained their respective emoji distributions by performing 50 forward passes through G. The mode, that is, the most frequent observation in the sample, of the resulting distribution is considered the most representative pictograph for the given image. We used t-SNE on the image tensor in order to visualize both the image and the emoji spaces (see Fig. 2). The purpose of this experiment was to assess whether two entities close to each other in the image space also yield similar emojis.

Figure 2: Visualization of t-SNE-reduced images and their corresponding most frequent pictographs (emojis). The most popular emoji for each picture was obtained by sampling 50 observations from the generator and taking the mode of the sample. Note that even this technique has a stochastic outcome, meaning that if an image has a rather flat distribution, its mode will not be consistent across runs. The described behaviour can be observed in the upper right area of both space representations.

The top right corner of both clouds exposes a shortcoming of the algorithm: if the distribution is flat (i.e. is multimodal), even large samples will yield different modes just by chance. This phenomenon is clearly present throughout the cloud of pictographs: four identical images yield three distinct emojis. On the other hand, the two remaining examples correctly capture the presence of two people in a single photo (middle section), as well as an expression of amazement (bottom section).
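A sketch of this mode-pictograph procedure: draw 50 emoji vectors per image from G and keep the most frequent nearest emoji2vec entry. G, the emoji2vec matrix and the images are placeholders, and decoding a generated vector to its nearest embedding is our assumption about the decoding step:

# Sample 50 emoji vectors per image and take the mode of the decoded ids.
import torch
import torch.nn.functional as F

def mode_emoji_ids(G, images, emoji_vecs, n_samples=50, z_dim=100):
    # images: iterable of (3, 64, 64) tensors; emoji_vecs: (num_emojis, 300).
    modes = []
    for img in images:
        y = img.unsqueeze(0).expand(n_samples, -1, -1, -1)
        z = torch.randn(n_samples, z_dim)
        gen = G(z, y)                                  # (n_samples, 300)
        # Nearest emoji2vec entry by cosine similarity for each draw.
        sims = F.normalize(gen, dim=1) @ F.normalize(emoji_vecs, dim=1).T
        ids = sims.argmax(dim=1)
        # The mode of the draws is the representative pictograph.
        modes.append(torch.mode(ids).values.item())
    return modes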

The performance of generative models is difficult to assess numerically, especially when it comes to emojis. Indeed, the Fréchet Inception Distance (Heusel et al., 2017) is often used to score generated images but, to the best of our knowledge, no such measure exists for ideograms. As an alternative way to assess the performance of our method, we plotted the true and generated distributions over 30 randomly chosen emojis for 1000 random images (see Fig. 3). While our algorithm relied on raw (i.e. uncleaned and unprocessed) data, we still observe a reasonable match between both distributions.

Figure 3: True and fitted distributions over 30 randomly sampled emojis for 500 randomly sampled images. Probabilities are normalized by the maximal element of the set.

Fig. 4 reports the fitted distribution of the top 10 most frequent observations for three randomly sampled images. The top image represents a fashion model in an outfit; our model correctly captures the concepts of woman, love, and overall positive emotion in the image. However, EmojiGAN can struggle with filtering out unrealistic emojis (in this case, pineapple and pig nose) for images with very few distinct ideograms. The bottom subfigure outlines another very common problem seen in GANs: mode collapse. While the generated emoji fits the context of the image, the variance in this case is nearly zero and results in G learning a Dirac distribution at the most frequent observation.

The middle image also suffers from the above problems (the sunset pictograph dominates the distribution). We note how algorithms based on unfiltered data from social networks are prone to ethical fallacies, as illustrated in the middle image. This situation is reminiscent of the infamous Microsoft chatbot Tay, which started to pick up racist and sexist language after being trained on uncensored tweets and had to be shut down (Neff and Nagy, 2016). We ourselves experienced a similar behaviour when assessing the performance of EmojiGAN. One plausible explanation of this phenomenon would be that, while derogatory comments are quite rare, the introduction of exponential weights or similar scores, in the hope of preventing mode collapse to the most popular emoji, has the side effect of overfitting the least frequent pictographs.

Figure 4: Emojis sampled for some Instagram posts: observe the mode collapse in the bottom subfigure as opposed to the more equally spread-out distributions.

6 Conclusion and Discussion

In this work, we proposed a new way of modeling social media posts through a generative adversarial network over pictographs. EmojiGAN managed to learn the emoji distribution for a set of given images and generate realistic pictographic representations from a picture. While the issue of noisy predictions still remains, our approach can be used as an alternative to classical image annotation methods. Using a modified attention mechanism (Xu et al., 2015) would be a stepping stone to correctly modeling the context-dependent connotations (Jibril and Abdullah, 2013) of emojis. However, the biggest concern is of an ethical nature: training any algorithm on raw data obtained from social networks without filtering offensive and derogatory ideas is itself a debate (Islam et al., 2016; Davidson et al., 2017).

Future work on the topic should start with a thorough analysis of the algebraic properties of emoji2vec, similar to (Arora et al., 2016). For example, new Unicode formats support emoji composition, which is reminiscent of traditional word embeddings' behaviour and could be explicitly incorporated into a learning algorithm. Finally, the ethical concerns behind deep learning without limits are not specific to our algorithm but are rather a community-wide discourse. It is thus important to work together with AI safety research groups in order to ensure that novel methods developed by researchers learn our better side.

References

Khaled Almgren, Jeongkyu Lee, et al. 2016. Predicting the future popularity of images on social networks. In Proceedings of the 3rd Multidisciplinary International Social Networks Conference on SocialInformatics 2016, Data Science 2016, page 15. ACM.

Martín Arjovsky, Soumith Chintala, and Léon Bottou. 2017. Wasserstein GAN. CoRR, abs/1701.07875.

Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski. 2016. Linear algebraic structure of word senses, with applications to polysemy. arXiv preprint arXiv:1601.03764.

Spencer Cappallo, Thomas Mensink, and Cees G. M. Snoek. 2015. Image2emoji: Zero-shot emoji prediction for visual media. In Proceedings of the 23rd ACM International Conference on Multimedia, MM '15, pages 1311–1314, New York, NY, USA. ACM.

Bo Dai, Dahua Lin, Raquel Urtasun, and Sanja Fidler. 2017. Towards diverse and natural image descriptions via a conditional GAN. CoRR, abs/1703.06029.

Thomas Davidson, Dana Warmsley, Michael W. Macy, and Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. CoRR, abs/1703.04009.

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Günter Klambauer, and Sepp Hochreiter. 2017. GANs trained by a two time-scale update rule converge to a Nash equilibrium. CoRR, abs/1706.08500.

Aylin Caliskan Islam, Joanna J. Bryson, and Arvind Narayanan. 2016. Semantics derived automatically from language corpora necessarily contain human biases. CoRR, abs/1608.07187.

Tanimu Ahmed Jibril and Mardziah Hayati Abdullah. 2013. Relevance of emoticons in computer-mediated communication contexts: An overview. Asian Social Science, 9(4):201.

Ryan Kelly and Leon Watts. 2015. Characterising the inventive appropriation of emoji as relationally meaningful in mediated close personal relationships. In Experiences of Technology Appropriation: Unanticipated Users, Usage, Circumstances, and Design.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

Alex Krizhevsky. 2012. Learning multiple layers of features from tiny images.


American Revolution Lapbook Cut out as one piece. You will first fold in the When Where side flap and then fold like an accordion. You will attach the back of the Turnaround square to the lapbook and the Valley Forge square will be the cover. Write in when the troops were at Valley Forge and where Valley Forge is located. Write in what hardships the Continental army faced and how things got .