Transfer Meets Hybrid: A Synthetic Approach For Cross .


Transfer Meets Hybrid: A Synthetic Approach for Cross-Domain Collaborative Filtering with Text

Guangneng Hu, Department of Computer Science and Engineering, Hong Kong University of Science and Technology, njuhgn@gmail.com
Yu Zhang, Department of Computer Science and Engineering, Hong Kong University of Science and Technology, yu.zhang.ust@gmail.com
Qiang Yang, Department of Computer Science and Engineering, Hong Kong University of Science and Technology, qyang@cse.ust.hk

ABSTRACT
Collaborative Filtering (CF) is the key technique for recommender systems. CF exploits user-item behavior interactions (e.g., clicks) only and hence suffers from the data sparsity issue. One research thread is to integrate auxiliary information such as product reviews and news titles, leading to hybrid filtering methods. Another thread is to transfer knowledge from source domains such as improving the movie recommendation with the knowledge from the book domain, leading to transfer learning methods. In real-world applications, a user registers for multiple services across websites. Thus it motivates us to exploit both auxiliary and source information for recommendation in this paper. To achieve this, we propose a Transfer Meeting Hybrid (TMH) model for cross-domain recommendation with unstructured text. The proposed TMH model attentively extracts useful content from unstructured text via a memory network and selectively transfers knowledge from a source domain via a transfer network. On two real-world datasets, TMH shows better performance in terms of three ranking metrics by comparing with various baselines. We conduct thorough analyses to understand how the text content and transferred knowledge help the proposed model.

CCS CONCEPTS
• Information systems → Personalization.

KEYWORDS
Recommender Systems; Collaborative Filtering; Deep Learning

ACM Reference Format:
Guangneng Hu, Yu Zhang, and Qiang Yang. 2019. Transfer Meets Hybrid: A Synthetic Approach for Cross-Domain Collaborative Filtering with Text. In Proceedings of the 2019 World Wide Web Conference (WWW’19), May 13–17, 2019, San Francisco, CA, USA. ACM, New York, NY, USA, 8 pages.

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
WWW ’19, May 13–17, 2019, San Francisco, CA, USA
© 2019 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.

1 INTRODUCTION
Recommender systems are widely used in various domains and e-commerce platforms, such as recommending products to buy at Amazon and videos to watch on Youtube. Collaborative Filtering (CF) is an effective approach based on an intuition that if users rated items similarly in the past then they are likely to rate items similarly in the future. Matrix Factorization (MF) techniques are its main cornerstone [26, 35] since they can learn latent factors for users and items. Recently, neural networks like multilayer perceptrons (MLP) are used to learn non-linear interaction functions from data [10, 17]. Both MF and neural CF suffer from the data sparsity and cold-start issues.

One solution is to integrate CF with the content information, leading to hybrid methods. Items are usually associated with unstructured text like the news articles and product reviews. Additional information alleviates the data sparsity issue and is essential for recommendation beyond user-item interactions. For application domains like recommending research papers and news articles, the unstructured text associated with an item is its text content [1, 47]. Other domains like recommending products, the unstructured text associated with the item is its user reviews which justify the rating behavior of consumers [21, 33, 56].
Recently, neural networks have been proposed to exploit the item content. For example, memory networks [43] are used to model item reviews [19], or to model a user’s neighbors who rated the same items as this user [11].

Another solution is to transfer knowledge from relevant domains, and cross-domain recommendation techniques address such problems [3, 27, 37]. In real-world applications, a user typically registers with multiple service systems to meet different information needs. For example, a user installs applications in an app store and reads news from another website. This gives us an opportunity to improve the recommendation performance in the target service (or all services) by learning across domains. In the above example, we can represent the app installation feedback using a binary matrix whose entries indicate whether a user has installed an app. Similarly, we use another binary matrix to indicate whether a user has read a news article. Typically these two matrices are highly sparse, and it is beneficial to learn them simultaneously. This idea is sharpened into the Collective Matrix Factorization (CMF) approach [42], which jointly factorizes these two matrices by sharing the user latent factors. It combines CF on a target domain and another CF on an auxiliary domain, enabling knowledge transfer [36, 54]. In terms of neural networks, given two activation maps from two tasks, the cross-stitch network [34] and its sparse variant [22] learn linear combinations of both input activations and feed these combinations as input to the successive layers, hence enabling knowledge transfer between two domains.

These two threads motivate us to exploit both content and cross-domain information for recommendation in this paper. To capture text content and to transfer cross-domain knowledge, we propose a novel neural model, TMH, for cross-domain recommendation with unstructured text. TMH can not only attentively extract useful content via a memory network but also selectively transfer knowledge across domains by a novel transfer network. A shared layer of feature interactions is stacked on top to couple the high-level representations learned from both networks. On real-world datasets, TMH shows better performance in various settings. We conduct thorough analyses to understand how the content and transferred knowledge help the proposed model.

Our contributions are summarized as follows:
• The proposed model is a novel deep model that transfers cross-domain knowledge for recommendation with unstructured text by using an attention-based neural network. (Sec. 4)
• We interpret the memory networks to attentively exploit the text content to match word semantics with user preferences.
• The transfer network can selectively transfer source items with the guidance of target user-item interactions by the attentive weights.
• Our model alleviates cold-user and cold-item start issues, and outperforms various baselines on real-world datasets. (Sec. 5)

2 RELATED WORKS
We review related works on three topics: collaborative filtering, hybrid methods, and cross-domain recommendation.

Collaborative Filtering. Recommender systems aim at learning user preferences on unknown items from their past history. Content-based recommendations are based on the matching between user profiles and item descriptions. It is difficult to build the profile for each user when there is little or no content. CF alleviates this issue by predicting user preferences based on the user-item interaction behavior, agnostic to the content [9]. Latent factor models learn feature vectors for users and items mainly based on MF [26], which has probabilistic interpretations [35]. Factorization machines (FM) can mimic MF [40].
To address the data sparsity, an item-item matrix called Shifted Positive Pointwise Mutual Information (SPPMI) is constructed from the user-item interaction matrix in the CoFactor model [28]. It then simultaneously factorizes the interaction matrix and the SPPMI matrix in a shared item latent space, enabling the usage of co-click information to regularize the learning of the user-item matrix. In contrast, we use independent unstructured text and source domain information to alleviate the data sparsity issue in the user-item matrix. Neural networks are proposed to push the learning of feature vectors towards non-linear representations, including the Neural Network Matrix Factorization (NNMF) and MultiLayer Perceptron (MLP) [10, 17]. The basic MLP architecture is extended to regularize the factors of users and items via social and geographical information [51]. Other neural approaches learn from the explicit feedback for the rating prediction task [5, 56]. We focus on learning from the implicit feedback for top-N recommendation [50].

Hybrid Filtering. Items are usually associated with content information such as unstructured text (e.g., abstracts of articles and reviews of products). CF approaches can be extended to exploit the content information [1, 47, 48] and user reviews [16, 20, 33]. Combining matrix factorization and topic modelling techniques (e.g., Topic MF) is an effective way to integrate ratings with item contents [2, 29, 33]. Item reviews justify the rating behavior of a user, and item ratings are associated with their attributes hidden in reviews [14]. Topic MF methods combine latent item factors in ratings with latent topics in reviews [2, 33]. The behavior factors and topic factors are aligned with a link function such as the softmax transformation in the Hidden Factors and hidden Topics (HFT) model [33] or an offset deviation in the Collaborative Topic Regression (CTR) model [47].
The CTR model assumes the latent item vector learnt from the interaction data is close to the corresponding topic proportions learnt from the text content, but allows them to diverge from each other if necessary. Additional sources of information, including knowledge graphs [49, 55], are integrated into CF to alleviate the data sparsity issue. Convolutional Networks (CNNs) have been used to extract features from audio signals for music recommendation [45] and from images for product and multimedia recommendation [6, 16]. Autoencoders are used to learn intermediate representations from text [48, 53]. Recurrent networks [1] and convolutional networks [5, 24, 56] can exploit the word order when learning the text representations. Memory networks can reason with an external memory [43]. Due to the capability of neurally learnt word embeddings to address the problems of word sparseness and the semantic gap, a memory module can be used to model item content [19] or the neighborhood of users [11]. Memory networks can learn to match word semantics with the specific user. We follow this thread by using neural networks to attentively extract important information from text content.

Cross-domain Recommendation. Cross-domain recommendation [3] is an effective technique to alleviate the data sparsity issue. A class of MF-based methods has been applied to cross-domain recommendation. Typical methods include the CMF approach, which jointly factorizes two rating matrices by sharing the latent user factors and hence enables knowledge transfer. CMF has its heterogeneous variants [37], and codebook transfer [27]. The coordinate system transfer can exploit heterogeneous feedbacks [38, 52]. Multiple source domains [32] and multi-view learning [13] are also proposed for integrating information from several domains. Transfer Learning (TL) aims at improving the performance of the target domain by exploiting knowledge from source domains [36].
Similar to TL, Multitask Learning (MTL) is to leverage useful knowledge in multiple related tasks to help each other [4, 54]. The cross-stitch network [34] and its sparse variant [22] enable information sharing between two base networks for each domain in a deep way. Robust learning is also considered during knowledge transfer [15]. These methods treat knowledge transfer as a global process with shared global parameters and do not match source items with the specific target item given a user. We follow this research thread by using neural networks to selectively transfer knowledge from the source items. We introduce a transfer network to exploit the source domain knowledge.

3 A BASIC NEURAL CF NETWORK
We adopt a Feedforward Neural Network (FFNN) as the base neural CF model to parameterize the interaction function [7, 8, 17]:

$$f(x_{ui} \mid P, Q, \theta_f) = \phi_o(\ldots(\phi_1(x_{ui}))\ldots), \tag{1}$$

where the input $x_{ui} = [P^T x_u, Q^T x_i] \in \mathbb{R}^{2d}$ is concatenated from the embeddings of user $u$ and item $i$ and is a projection of their one-hot encodings $x_u \in \{0, 1\}^m$ and $x_i \in \{0, 1\}^n$ via embedding matrices $P \in \mathbb{R}^{m \times d}$ and $Q \in \mathbb{R}^{n \times d}$, respectively. The output and hidden layers are computed by $\phi_o$ and $\{\phi_l\}$ in the FFNN. The numbers of users and items are $m$ and $n$, respectively. The rating $r_{ui}$ is 1 if user $u$ has an interaction with item $i$ and 0 otherwise.

The base network consists of four modules with the information flow from the input $(u, i)$ to the output $\hat{r}_{ui}$ as follows.

Input: $(u, i) \to \mathbb{1}_u, \mathbb{1}_i$. This module encodes user-item interaction indices. We adopt the one-hot encoding. It takes user $u$ and item $i$, and maps them into one-hot encodings $\mathbb{1}_u \in \{0, 1\}^m$ and $\mathbb{1}_i \in \{0, 1\}^n$ where only the element corresponding to that index is 1 and all others are 0.

Embedding: $\mathbb{1}_u, \mathbb{1}_i \to x_{ui}$. This module firstly embeds the one-hot encodings into continuous representations $x_u = P^T \mathbb{1}_u$ and $x_i = Q^T \mathbb{1}_i$ by embedding matrices $P$ and $Q$ respectively, and then concatenates them as $x_{ui} = [x_u, x_i]$, to be the input of the following building blocks.

Hidden layers: $x_{ui} \to z_{ui}$. This module takes the continuous representations from the embedding module and then transforms them through several layers to a final latent representation $z_{ui} = (\ldots(\phi_1(x_{ui}))\ldots)$. This module consists of hidden layers to learn nonlinear interactions between users and items.

Output: $z_{ui} \to \hat{r}_{ui}$. This module predicts the score $\hat{r}_{ui}$ for the given user-item pair based on the representation $z_{ui}$ from the last layer of multi-hop module. Since we focus on one-class collaborative filtering, the output is the probability that the input pair is a positive interaction. This can be achieved by a softmax layer: $\hat{r}_{ui} = \phi_o(z_{ui}) = \frac{1}{1 + \exp(-h^T z_{ui})}$, where $h$ is the parameter.

4 THE PROPOSED TMH MODEL
The architecture of the TMH model is illustrated in Fig. 1.

Figure 1: The architecture of TMH. (Diagram labels: user $u$ and target item $i$ with embeddings $x_u$, $x_i$ from $P$ and $Q$; source items $j_1, \ldots, j_s$ with embeddings $x_{j_1}, \ldots, x_{j_s}$ from $H$; dot products, softmax weights $\alpha_1, \ldots, \alpha_s$, sum, ReLU; shared map; prediction $\hat{r}_{ui}$ and loss against $r_{ui}$.)

Matching Word Semantics with User Preferences. We adapt a memory network (MNet) to integrate unstructured text since it can learn to match word semantics with user preferences [12, 19, 23, 44]. The MNet consists of one internal memory matrix $A \in \mathbb{R}^{L \times 2d}$, where $L$ is the vocabulary size (typically $L = 8{,}000$ after processing [47]) and $2d$ is the dimension of each memory slot, and one external memory matrix $C$ with the same dimensions as $A$. The function of the two memory matrices works as follows.

Given a document $d_{ui} = (w_1, w_2, \ldots, w_l)$ corresponding to the $(u, i)$ interaction, we form the memory slots $m_k \in \mathbb{R}^{2d}$ by mapping each word $w_k$ into an embedding vector with matrix $A$, where $k = 1, \ldots, l$ and the length of the longest document is equal to the memory size. We form a preference vector $q^{(ui)}$ corresponding to the given document $d_{ui}$ and the user-item interaction $(u, i)$ where each element encodes the relevance of user $u$ to these words given item $i$ as: $q_k^{(ui)} = x_u^T m_k^{(u)} + x_i^T m_k^{(i)}$, $k = 1, \ldots, l$, where we split $m_k = [m_k^{(u)}, m_k^{(i)}]$ into the user part $m_k^{(u)}$ and the item part $m_k^{(i)}$. Then, we compute the attentive weights over words for a given user-item interaction to infer the importance of each word’s unique contribution: $p_k^{(ui)} = \mathrm{Softmax}(q_k^{(ui)}) = \frac{\exp(\beta q_k^{(ui)})}{\sum_{k'} \exp(\beta q_{k'}^{(ui)})}$, where the parameter $\beta$ is introduced to stabilize the numerical computation and can amplify or attenuate the precision of the attention like a temperature [18]. We set $\beta = d^{-1/2}$ by scaling along with the dimensionality [46].

We construct the high-level representations by interpolating the external memories with the attentive weights as the output:

$$o_{ui} = \sum_k p_k^{(ui)} c_k, \tag{2}$$

where the external memory slot $c_k \in \mathbb{R}^d$ is another embedding vector for word $w_k$ by mapping it with matrix $C$.

Selecting Source Items to Transfer. We propose a novel transfer network (TNet) which can selectively transfer source knowledge for a specific target item in a coarse-to-fine way. Given the source items $[j]^u = (j_1, j_2, \ldots, j_s)$ with which the user $u$ has interacted in the source domain, TNet learns a transfer vector $c_{ui} \in \mathbb{R}^d$ to capture the relations between the target item $i$ and source items given the user $u$. The similarities between target item $i$ and source items can be computed by their dot products: $a_j^{(i)} = x_i^T x_j$, $j = 1, \ldots, s$, where $x_j \in \mathbb{R}^d$ is the embedding for the source item $j$ by an embedding matrix $H \in \mathbb{R}^{n_S \times d}$. This score computes the compatibility between the target item and the source items consumed by the user. Then, we normalize the similarity scores to be a probability distribution over source items: $\alpha_j^{(i)} = \mathrm{Softmax}(a_j^{(i)})$. Finally the transfer vector is a weighted sum of the corresponding source item embeddings:

$$c_{ui} = \mathrm{ReLU}\Big(\sum_j \alpha_j^{(i)} x_j\Big), \tag{3}$$

where we introduce non-linearity on the transfer vector by the rectified linear unit $\mathrm{ReLU}(x) = \max(0, x)$.

Putting It All Together. We firstly use a simple neural CF model (CFNet) which has one hidden layer to learn a nonlinear representation for the user-item interaction:

$$z_{ui} = \mathrm{ReLU}(W x_{ui} + b), \tag{4}$$

where $W$ and $b$ are the weight and bias parameters in the hidden layer. Usually the dimension of $z_{ui}$ is half of that of $x_{ui}$ in a typical tower-pattern architecture.

The outputs from the three individual networks can be viewed as high-level features of the content text, source domain knowledge, and the user-item interaction. They come from different feature spaces learned by different networks. Thus, we use a shared layer on top of all the features: $\hat{r}_{ui} = \frac{1}{1 + \exp(-h^T y_{ui})}$, where $h$ is the parameter. The joint representation, $y_{ui} = [W_o o_{ui}, W_z z_{ui}, W_c c_{ui}]$, is concatenated from the linearly mapped outputs of the individual networks, where matrices $W_o, W_z, W_c$ are the corresponding linear mapping transformations.

Learning. Due to the nature of the implicit feedback and the task of item recommendation, the squared loss $(\hat{r}_{ui} - r_{ui})^2$ may not be suitable since it is usually used for rating prediction. Instead, we adopt the binary cross-entropy loss: $\mathcal{L} = -\sum_{(u,i) \in S} \big[ r_{ui} \log \hat{r}_{ui} + (1 - r_{ui}) \log(1 - \hat{r}_{ui}) \big]$, where the training samples $S = R_T^+ \cup R_T^-$ are the union of the observed target interaction matrix and randomly sampled negative pairs. Usually, $|R_T^+| = |R_T^-|$ and we do not perform a predefined negative sampling in advance since this can only generate a fixed training set of negative samples. Instead, we generate negative samples during each epoch, enabling diverse and augmented training sets of negative examples to be used. The objective function can be optimized by stochastic gradient descent (SGD) and its variants like the adaptive moment (Adam) method [25].

Complexity. In the model parameters $\Theta$, the embedding matrices $P$, $Q$ and $H$ contain a large number of parameters since they depend on the input size of users and (target and source) items, and their scale is hundreds of thousands. Typically, the number of words, i.e., the vocabulary size, is $L = 8{,}000$ [47]. The dimension of embeddings is typically $d = 100$. Since the architecture follows a tower pattern, the dimension of the outputs of the three individual networks is also limited within hundreds. In total, the size of model parameters is linear with the input size and is close to the size of typical latent factors models [42] and neural CF approaches [17] with a hidden layer. During training, we compute the outputs of the three individual networks in parallel using mini-batch stochastic optimization which can be trained efficiently by back-propagation. TMH is scalable to the number of the training data. It can easily update when new data examples come by just feeding them into the training mini-batch. Thus, TMH c
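
The forward pass of Section 4 (memory network, transfer network, CFNet and the shared output layer) together with the binary cross-entropy loss can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the authors' implementation: the toy sizes, the random initialisation, the function names `mnet`, `tnet`, `predict` and `bce`, and the choice to give the external memory `C` shape L×d (so that each slot $c_k$ lies in $\mathbb{R}^d$, as in Eq. (2)) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical, for illustration only).
m, n, n_s = 5, 7, 6      # users, target items, source items
L, d = 20, 4             # vocabulary size, embedding dimension

# Embedding matrices, named as in the text; values are random placeholders.
P = rng.normal(scale=0.1, size=(m, d))        # user embeddings
Q = rng.normal(scale=0.1, size=(n, d))        # target item embeddings
H = rng.normal(scale=0.1, size=(n_s, d))      # source item embeddings
A = rng.normal(scale=0.1, size=(L, 2 * d))    # internal memory
C = rng.normal(scale=0.1, size=(L, d))        # external memory (assumed L x d)

# CFNet hidden layer and shared output layer parameters.
W = rng.normal(scale=0.1, size=(d, 2 * d))    # maps x_ui (2d) to z_ui (d)
b = np.zeros(d)
W_o, W_z, W_c = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
h = rng.normal(scale=0.1, size=3 * d)


def softmax(x):
    x = x - x.max()                           # shift for numerical stability
    e = np.exp(x)
    return e / e.sum()


def mnet(x_u, x_i, words):
    """Eq. (2): attention over the words of document d_ui."""
    M = A[words]                              # memory slots m_k, shape (l, 2d)
    q = M[:, :d] @ x_u + M[:, d:] @ x_i       # q_k = x_u^T m_k^(u) + x_i^T m_k^(i)
    p = softmax(q / np.sqrt(d))               # beta = d^(-1/2)
    return p @ C[words], p                    # o_ui and the attention weights


def tnet(x_i, source_items):
    """Eq. (3): attention over the user's source-domain items."""
    X = H[source_items]                       # x_j, shape (s, d)
    alpha = softmax(X @ x_i)                  # a_j = x_i^T x_j, then softmax
    return np.maximum(0.0, alpha @ X), alpha  # c_ui and the attention weights


def predict(u, i, words, source_items):
    x_u, x_i = P[u], Q[i]                     # row lookup equals P^T one_hot(u)
    o_ui, _ = mnet(x_u, x_i, words)
    c_ui, _ = tnet(x_i, source_items)
    z_ui = np.maximum(0.0, W @ np.concatenate([x_u, x_i]) + b)   # Eq. (4)
    y_ui = np.concatenate([W_o @ o_ui, W_z @ z_ui, W_c @ c_ui])
    return 1.0 / (1.0 + np.exp(-h @ y_ui))    # shared output layer


def bce(r, r_hat, eps=1e-12):
    """Binary cross-entropy for a single (u, i) pair."""
    return -(r * np.log(r_hat + eps) + (1 - r) * np.log(1 - r_hat + eps))


# One observed interaction: user 0, target item 2, a four-word document,
# and three source items the user interacted with (all indices made up).
r_hat = predict(u=0, i=2, words=[3, 8, 8, 15], source_items=[1, 4, 5])
loss = bce(1.0, r_hat)
```

In a training loop, `bce` would be summed over the observed pairs and over negative pairs resampled at each epoch, and the parameters would be updated by a gradient-based optimizer; both steps are omitted here to keep the sketch short.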

