
Please note that we have since conducted a more thorough literature survey: Joeran Beel, Bela Gipp, Stefan Langer, and Corinna Breitinger. "Research Paper Recommender Systems: A Literature Survey." International Journal on Digital Libraries (2015): 1-34. doi:10.1007/s00799-015-0156-0.

Research Paper Recommender System Evaluation: A Quantitative Literature Survey

Joeran Beel (Docear, Magdeburg, Germany), Stefan Langer (Docear, Magdeburg, Germany), Marcel Genzmehr (Docear, Magdeburg, Germany), Bela Gipp (Univ. of California, Berkeley, USA), Corinna Breitinger (Univ. of California, Berkeley, USA), Andreas Nürnberger (OvGU, FIN, ITI, Magdeburg, Germany, andreas.nuernberger@ovgu.de)

ABSTRACT
Over 80 approaches for academic literature recommendation exist today. These approaches were introduced and evaluated in more than 170 research articles, as well as patents, presentations, and blogs. We reviewed these approaches and found most evaluations to contain major shortcomings. Of the approaches proposed, 21% were not evaluated. Among the evaluated approaches, 19% were not evaluated against a baseline. Of the user studies performed, 60% had 15 or fewer participants or did not report on the number of participants. Information on runtime and coverage was rarely provided. Due to these and several other shortcomings described in this paper, we conclude that it is currently not possible to determine which recommendation approaches for academic literature are the most promising. However, there is little value in the existence of more than 80 approaches if the best performing approaches are unknown.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - information filtering.

General Terms
Measurement, Algorithms, Performance, Experimentation

Keywords
Research paper recommender systems, evaluation, comparative study, recommender systems, survey

RepSys '13, October 12, 2013, Hong Kong, China. Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM 978-1-4503-2465-6/13/10.

1. INTRODUCTION
Recommender systems for research papers are becoming increasingly popular. In the past 14 years, over 170 research articles, patents, web pages, etc. were published in this field. Interpolating from the number of articles published this year, we estimate 30 new publications to appear in 2013 (Figure 1). Recommender systems for research articles are useful applications that, for instance, help researchers keep track of their research field. The more recommendation approaches are proposed, the more important their evaluation becomes in order to determine the best approaches and their individual strengths and weaknesses.

[Figure 1: Published papers per year¹ (y-axis: number of papers; x-axis: year)]

Evaluating recommender systems requires a definition of what constitutes a good recommender system and of how this should be measured. There is general consensus on what makes a good recommender system and on the methods used to evaluate recommender systems [1,11,62]. However, at least in related research fields, authors often do not adhere to evaluation standards. For instance, three quarters of the evaluations published in the User Modeling and User-Adapted Interaction (UMAI) journal were not statistically significant and often had serious shortcomings [2]. These results raise the question whether researchers in the field of research paper recommender systems ignore evaluation standards in the same way as authors publishing in the UMAI journal.

In the remainder of this paper, we describe the main features that contribute to a 'good', i.e. high-quality, recommender system, and the methods used to evaluate recommender systems. We then present our research objective and methodology, and conclude with the results and a discussion.

1.1 Features of Recommender System Quality

1.1.1 Accuracy
The first factor that contributes to a good recommender is its accuracy, i.e. its capacity to satisfy the individual user's information need [62]. Information needs vary among users due to different backgrounds and knowledge [3], preferences and goals [4], and contexts [108]. One user may be interested in the most recent research papers on mind mapping, while another may be interested in the first publication introducing recommender systems, or in the most popular medical research on lung cancer, but only in a given language, etc. Items that satisfy the information need are "relevant" to the user [62]. Accordingly, a good recommender system is one that recommends (the most) relevant items. To do so, a recommender system must first identify its users' information needs and then identify the items that satisfy those needs. How well a recommender system performs this task is reflected by its accuracy: the more relevant and the fewer irrelevant items it recommends, the more accurate it is.

A prerequisite for achieving high accuracy is high coverage of the available items [5]. Coverage describes how many of the papers in the recommender's database can be recommended with the recommendation approach. For text-based approaches, coverage is usually 100%. For many citation-based approaches, coverage is significantly lower, because only a fraction of all documents is cited and can hence be recommended [58].

¹ Based on the papers we reviewed for this article. Numbers for 2013 were estimated by interpolating from the number of articles published until our survey was conducted (late April 2013).
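As a concrete illustration of the coverage notion above, coverage can be computed directly as the share of the catalogue that an approach is able to recommend at all. The following Python sketch is our own illustration, not code from any of the reviewed systems; the function and variable names are hypothetical.

    def coverage(recommendable_ids, catalogue_ids):
        """Share of the catalogue the approach can recommend at all.

        recommendable_ids: set of item ids the approach is able to score,
                           e.g. only the cited papers for a citation-based approach.
        catalogue_ids:     set of all item ids in the recommender's database.
        """
        if not catalogue_ids:
            return 0.0
        return len(recommendable_ids & catalogue_ids) / len(catalogue_ids)

    # Hypothetical corpus: a text-based approach can score every paper,
    # while a citation-based approach is limited to the cited subset.
    all_papers = set(range(1000))
    cited_papers = set(range(300))
    print(coverage(all_papers, all_papers))    # 1.0  (text-based)
    print(coverage(cited_papers, all_papers))  # 0.3  (citation-based)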

1.1.2 User Satisfaction
The second factor that contributes to a good recommender system is its ability to provide "satisfaction" to the user [6]. At first glance, one may assume that an accurate recommender system, i.e. one that recommends the most relevant items, satisfies the user. However, many additional factors influence user satisfaction. One of these factors is serendipity [9,60]. If milk were recommended to a customer in a supermarket, this could be a very accurate recommendation, but not a satisfying one [60]: milk is an obvious product to buy in a supermarket. Therefore, most customers would be more satisfied with more diverse recommendations (which should still be accurate to some extent). Users may also be dissatisfied with accurate recommender systems if they must wait too long to receive recommendations [62], the presentation is unappealing [11], the labeling of recommendations is suboptimal, or recommendations are given for commercial reasons [7]². User satisfaction may also differ by demographics - older users tend to be more satisfied with recommendations than younger users [8]. In addition, costs can play a role. Typically, recommender systems are free, but some systems charge users a fee or are only available as part of subscription packages. One example is the reference manager Mendeley, which offers its recommender system Mendeley Suggest only to its premium users. The time a user must invest before receiving recommendations may also influence user satisfaction. Some systems expect users to specify their interests manually; in other systems, users' interests are inferred automatically, which significantly reduces the user's required time commitment. The mentioned factors are only a small selection; there are many more factors influencing whether a user is satisfied with a recommender system [9,11].

1.1.3 Satisfaction of the Recommendation Provider
The third factor contributing to a good recommender system is its ability to satisfy the recommendation provider. Typically, it is assumed that providers of recommender systems are satisfied when their users are satisfied, but this is not always the case. One interest of providers is keeping costs low, where costs may be measured in terms of labor, disk storage, memory, CPU power, and traffic [11]. As such, a good recommender system may also be defined as one that can be developed, operated, and maintained at low cost. Other providers, e.g. publishers, may have the goal of generating a profit from the recommender system [61]. With this goal, a publisher would prefer to recommend items with higher profit margins even if user satisfaction were not as high. A news website might have the goal of keeping readers on the site as long as possible [61]; in that case, a recommender would preferably suggest longer articles even if shorter articles might result in higher user satisfaction.

In most situations, there will be a trade-off between the three factors. For instance, clustering strongly reduces runtimes, and hence costs, but also decreases accuracy [10]; and when the primary goal is to generate revenue, user satisfaction may suffer. Of course, user satisfaction should never be too low, because then users might ignore the recommendations completely.

1.2 Evaluating Methods
Knowing the three features contributing to a good recommender system - recommendation accuracy, user satisfaction, and provider satisfaction - leads to the question how these three features are to be quantified and compared. Aspects related to time and money, such as runtime, costs, and revenue, can easily be measured and are thus not covered in detail in the remainder of this paper. To measure a recommender's accuracy and to gauge user satisfaction, three evaluation methods are commonly used: user studies, online evaluations, and offline evaluations [11]³.

In user studies, users explicitly rate recommendations generated with different algorithms, and the algorithm with the highest average rating is judged the best algorithm [11]. In online evaluations, recommendations are shown to users as they use the real-world system [11]. Users do not rate recommendations; rather, the system observes how often users accept a recommendation. Acceptance is typically measured by click-through rate (CTR), i.e. the ratio of clicked recommendations⁴. To compare two algorithms, recommendations are created using each algorithm and the CTRs of the algorithms are compared (A/B test). Offline evaluations use pre-compiled offline datasets from which some information has been removed. The recommendation algorithms are then analyzed with regard to their ability to recommend the removed information.

Which of the three evaluation methods is most suitable is still under debate. Typically, offline evaluations are considered suitable for pre-selecting a set of promising algorithms, which are subsequently evaluated in online evaluations or in a user study [11]. However, there is serious criticism of offline evaluations [60-65,106,111].

1.3 Further Considerations
Another important factor in evaluating recommender systems is the baseline against which an algorithm is compared. Knowing that a certain algorithm has a CTR of, for example, 8% is not useful if the CTRs of alternative approaches are unknown. Therefore, novel approaches should be compared against a baseline representative of the state of the art. Only then is it possible to quantify whether a novel approach is better than the state of the art and by what margin. Additionally, a statistically significant number of participants is crucial for the validity of user studies, as is sufficient information on algorithm complexity and runtime, the use of representative datasets, and several other factors [11]. Only if all these factors are considered will an evaluation produce valid results that allow identifying the best recommendation approaches. Of course, it is also important that researchers publish all relevant details about their evaluations and approaches, so that others can verify the validity of the conducted evaluations and implement the approaches.

² Identical recommendations that were labeled once as organic and once as commercial influenced user satisfaction ratings despite having equal relevance.
³ We ignore the provider's satisfaction in the remainder, since this type of satisfaction usually relates to numbers that are easy to measure, e.g. revenue or costs.
⁴ Aside from clicks, other user behavior can be monitored, for example the number of times recommendations were downloaded, printed, cited, etc.
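To make the offline evaluation protocol described in Section 1.2 concrete, the sketch below hides one known-relevant item per user and checks whether a recommender returns it among its top-k suggestions. This is only a minimal illustration under our own assumptions; "recommend" is a hypothetical callable, not an interface exposed by any of the surveyed systems.

    def offline_leave_one_out(recommend, user_libraries, k=10):
        """Hide the last item of each user's library and test whether the
        recommender places it in its top-k list.

        recommend(profile, k) -> list of item ids (hypothetical callable).
        user_libraries: dict mapping user id -> list of collected item ids.
        Returns the hit rate over all users with at least two items.
        """
        hits, evaluated = 0, 0
        for user, items in user_libraries.items():
            if len(items) < 2:
                continue                      # nothing left to form a profile
            hidden, profile = items[-1], items[:-1]
            hits += int(hidden in recommend(profile, k))
            evaluated += 1
        return hits / evaluated if evaluated else 0.0

    # Toy recommender and data, only to make the sketch runnable.
    def popularity_recommender(profile, k):
        return list(range(k))                 # always suggests items 0..k-1

    libraries = {"u1": [13, 7, 2], "u2": [5, 1], "u3": [9]}
    print(offline_leave_one_out(popularity_recommender, libraries, k=10))  # 1.0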

2. RESEARCH OBJECTIVE & METHODOLOGY
The research objective we pursued was to examine the validity of the evaluations performed for existing research paper recommender systems. In reviewing the literature, we assess how suitable existing evaluations are for identifying the most promising research paper recommender systems. To achieve this objective, we conducted a quantitative analysis of the status quo. We seek to answer the following questions:

1. To what extent do authors perform user studies, online evaluations, and offline evaluations? (Section 3.1)
2. How many participants do user studies have? (Section 3.2)
3. Against which baselines are approaches compared? (Section 3.3)
4. Do authors provide information about an algorithm's runtime and computational complexity? (Section 3.4)
5. Which metrics are used for algorithm evaluation, and do different metrics provide similar rankings of the algorithms? (Section 3.5)
6. Which datasets are used for offline evaluations? (Section 3.6)
7. Are results comparable among different evaluations based on different datasets? (Section 3.7)
8. How consistent are online and offline evaluations? Do they provide the same, or at least similar, rankings of the evaluated approaches? (Section 3.8)
9. Do authors provide sufficient information to re-implement their algorithms or replicate their experiments? (Section 3.9)

To identify the status quo, we reviewed 176 papers, including a few patents, presentations, blogs, and websites, covering 89 research paper recommendation approaches⁵ [14-56,58,59,66-100,102-110]. We distinguish between papers and approaches because one approach is often presented or evaluated in several papers. For instance, there are three papers on the recommender system Papyres, and all cover different aspects of the same system [12,13,74]. Therefore, we count Papyres as one recommendation approach. To cite an approach for which more than one paper exists, we subjectively selected the most representative paper. For our analysis, we also 'combined' the content of all papers relating to one approach: if an approach was evaluated with an online evaluation in one paper and with an offline evaluation in another, we say that the approach was evaluated with both online and offline evaluations. Space restrictions keep us from providing an exhaustive bibliography of the 176 papers reviewed, so we only cite the 89 approaches, i.e. one representative paper for each approach.

Papers were retrieved using Google Scholar, the ACM Digital Library, and SpringerLink by searching for [paper article citation] [recommender recommendation] [system systems] and downloading all articles that were relevant for research paper recommendations⁶. In a second step, the bibliography of each article was examined. When an entry in the bibliography pointed to an article not yet downloaded, the cited article was also downloaded and inspected for relevant entries in its bibliography.

3. RESULTS

3.1 Evaluation Methods
19 approaches (21%) were not evaluated [14-26], or were evaluated using system-specific or uncommon and convoluted methods [27-31,93]. These 19 approaches are ignored in the remaining analysis. Of the remaining 70 approaches, 48 (69%) were evaluated with an offline evaluation, 24 (34%) with a user study [66-74,76,77,79,81,82,87,102-108,110], five (7%) were evaluated in real-world systems with an online evaluation [53-56,68], and two (3%) were evaluated using a qualitative user study [84,85] (Table 1)⁷.

Interesting in this context is the low number of online evaluations (7%) and the prevalence of offline evaluations (69%). Despite active experimentation in the field of research paper recommender systems, we observed that many researchers have no access to real-world systems with which to evaluate their approaches, and researchers who do often do not use them. For instance, C. Lee Giles and his co-authors, who are among the largest contributors in the field [57-59,94,96,99,100], could have conducted online experiments with their academic search engine CiteSeer. However, they chose primarily to use offline evaluations. The reason for this may be that offline evaluations are more convenient than online evaluations or user studies: results are available within minutes or hours, not within days or weeks as is the case for online evaluations and user studies. However, as stated, offline evaluations are subject to various criticisms [60-65,106,111].

Table 1: Evaluation methods⁷
            Offline   User Study   Online   Qualitative
Absolute    48        24           5        2
Relative    69%       34%          7%       3%

3.2 Number of Participants in User Studies
Four of the 24 user studies (17%) were conducted with fewer than five participants [66,67,102,104]. Another four studies had five to ten participants [77,79,103,110]. Three studies had 11-15 participants [68,81,87], and another four studies had 16-50 participants [69-71,105]. Only six studies (25%) were conducted with more than 50 participants [72-74,106-108]. Three studies failed to mention the number of participants [75,76,82] (Table 2). Given these findings, we conclude that most user studies were not large enough to arrive at meaningful conclusions about algorithm quality.

Table 2: Number of participants in user studies
Participants   <5    5-10   11-15   16-50   >50   n/a
Absolute       4     4      3       4       6     3
Relative       17%   17%    13%     17%     25%   13%

⁵ We use the term 'approach' not only for distinct recommendation concepts such as content-based or collaborative filtering, but also for minor variations in recommendation algorithms.
⁶ The relevance judgment was made manually, based on the title and, if in doubt, the abstract.
⁷ Some approaches were evaluated with several methods at the same time; therefore, percentages do not add up to 100%.
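Since so few of the reviewed evaluations report statistical significance, it is worth recalling how cheaply significance can be checked in an online A/B test: a two-proportion z-test on the click-through rates is usually sufficient. The helper below is a generic textbook test with made-up numbers; it is not a procedure taken from any of the surveyed papers.

    from math import erf, sqrt

    def compare_ctr(clicks_a, shown_a, clicks_b, shown_b):
        """Two-proportion z-test comparing the CTRs of algorithms A and B.
        Returns (ctr_a, ctr_b, two-sided p-value), using a normal approximation."""
        ctr_a, ctr_b = clicks_a / shown_a, clicks_b / shown_b
        pooled = (clicks_a + clicks_b) / (shown_a + shown_b)
        se = sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
        z = (ctr_a - ctr_b) / se if se else 0.0
        p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
        return ctr_a, ctr_b, p_value

    # Hypothetical A/B test: 8% vs. 6% CTR on 2,000 impressions each.
    print(compare_ctr(160, 2000, 120, 2000))  # p is roughly 0.01, i.e. significant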

3.3 Baselines
Thirteen of the evaluated approaches (19%) were not evaluated against a baseline (Table 3) [77-88,102]. The usefulness of these evaluations is low, because knowing that an algorithm achieves a certain CTR in certain circumstances allows no conclusion about how it compares against other algorithms. Another 50 approaches (71%) were evaluated against trivial baselines, such as simple content-based filtering without any sophisticated adjustments. These trivial baselines do not represent the state of the art and are not helpful for deciding which of the 89 approaches are most promising. This is particularly true since different approaches were not evaluated against the same simple baselines. Even for a simple content-based approach there are many variables, such as whether stop words are filtered, if and which stemmer is applied, and from which document section (title, abstract, etc.) the text is extracted. This means that almost all approaches were compared against different baselines. Only seven authors (10%) evaluated their approaches against state-of-the-art approaches proposed by other researchers in the field. Only these seven evaluations allowed drawing some conclusions about which approaches may perform best. These authors, however, compared their approaches only against some state-of-the-art approaches; it remains unclear how they would have performed against the remaining ones⁸.

Table 3: Baselines
            No Baseline   Simple Baseline   State-of-the-Art Baseline
Absolute    13            50                7
Relative    19%           71%               10%

3.4 Runtimes & Computational Complexity
Only eight approaches (11%) provided information on runtime. Runtime information, however, is crucial. In one comparison, the runtimes of two approaches differed by a factor of 600 [100]. For many developers, an algorithm requiring 600 times more CPU power than another would probably not be an option. While this example is extreme, runtimes frequently differed by a factor of five or more, which can also affect decisions on algorithm selection. Computational complexity was reported by even fewer evaluations. Computational complexity may be less relevant for researchers but is highly relevant for providers of recommender systems, since it is important for estimating the long-term suitability of an algorithm. An algorithm may perform well for a few users but might not scale well. Hence, algorithms with, for example, exponentially increasing complexity will most likely not be applicable in practice.

3.5 Use of Offline Evaluation Metrics
Of the 48 offline evaluations, 33 approaches (69%) were evaluated with precision (Table 4). Recall was used for eleven approaches (23%), F-measure for six approaches (13%), and NDCG for six approaches (13%). Seven approaches (15%) were evaluated using other measures [88-91,97,98,105]. Overall, the results of the different measures correlated highly - that is, algorithms that performed well on precision also performed well on, for instance, NDCG.

Table 4: Evaluation measures⁷
            Precision   Recall   F-Measure   NDCG   MRR   Other
Absolute    33          11       6           6      4     7
Relative    69%         23%      13%         13%    8%    15%

3.6 Use of Datasets
Researchers used different datasets to conduct their offline evaluations (Table 5). Fourteen approaches (29%) were evaluated using data from CiteSeer, and five approaches (10%) were evaluated using papers from ACM. Other data sources included CiteULike (10%), DBLP (8%), and a variety of others, many not publicly available (52%). Even when data originated from the same source, this did not guarantee that the same dataset was used. For instance, fourteen approaches used data from CiteSeer, but no single 'CiteSeer dataset' exists. Authors collected CiteSeer data at different times and pruned the datasets differently. Some authors removed documents with fewer than two citations from the corpus [92], others those with fewer than three citations [107], and others those with fewer than four citations [93]. One study removed all papers with fewer than ten or more than 100 citations, and all papers citing fewer than 15 or more than 50 papers [94]. Of the original dataset of 1,345,249 papers, only 81,508 remained - about 6%. The question arises how representative results based on such a heavily pruned dataset can be.

Table 5: Data sources
            CiteSeer   ACM   CiteULike   DBLP   Others
Absolute    14         5     5           4      25
Relative    29%        10%   10%         8%     52%

In conclusion, it is safe to say that no two studies performed by different authors used the same dataset. This raises the question to what extent results based on different datasets are comparable.

⁸ It is interesting to note that in all published papers with an evaluation against a baseline, at least one of the proposed approaches performed better than the baseline(s). It never occurred that a paper reported on a non-effective approach. This invited a search for possible explanations. First, authors may intentionally select baselines such that their approaches appear favorable. Second, the simple baselines used in most evaluations achieve relatively unrefined results, so that any alternative easily performs better. Third, authors do not report their failures, which ties in with the fourth point: journals and conferences typically do not accept publications that report on failures.
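For reference, the offline metrics counted in Table 4 (Section 3.5) can all be computed from a ranked recommendation list and a set of relevant items. The following implementations follow the standard textbook definitions with binary relevance; they are our own sketch and not code from any of the evaluated approaches.

    from math import log2

    def precision_recall_f1(ranked, relevant, k):
        """Precision, recall, and F-measure of the top-k recommendations."""
        hits = sum(1 for item in ranked[:k] if item in relevant)
        p = hits / k
        r = hits / len(relevant) if relevant else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        return p, r, f1

    def mrr(ranked_lists, relevant_sets):
        """Mean reciprocal rank of the first relevant item over several users."""
        total = 0.0
        for ranked, relevant in zip(ranked_lists, relevant_sets):
            rank = next((i + 1 for i, item in enumerate(ranked) if item in relevant), None)
            total += 1.0 / rank if rank else 0.0
        return total / len(ranked_lists)

    def ndcg(ranked, relevant, k):
        """NDCG@k with binary relevance."""
        dcg = sum(1.0 / log2(i + 2) for i, item in enumerate(ranked[:k]) if item in relevant)
        idcg = sum(1.0 / log2(i + 2) for i in range(min(len(relevant), k)))
        return dcg / idcg if idcg else 0.0

    # Toy example: items 2 and 5 are relevant; the algorithm ranks item 5 first.
    ranked, relevant = [5, 9, 2, 7, 1], {2, 5}
    print(precision_recall_f1(ranked, relevant, k=3))  # (0.67, 1.0, 0.8)
    print(mrr([ranked], [relevant]))                   # 1.0
    print(ndcg(ranked, relevant, k=3))                 # about 0.92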

3.7 Universality of Offline Datasets
Seven approaches were evaluated on several offline datasets [95-100,110]. The analysis of these seven evaluations confirms a well-known finding: results from one dataset do not allow any conclusions about the absolute performance achievable on another dataset. For instance, an algorithm that achieved a recall of 4% on an IEEE dataset achieved a recall of 12% on an ACM dataset [110]. However, the analysis also showed that the relative performance of different algorithms remained quite stable across datasets: algorithms performing well on one dataset (compared to some baselines) also performed well on other datasets (compared to the same baselines). Dataset combinations included CiteSeer and posts from various blogs [97], CiteSeer and Web-kd [98], CiteSeer and CiteULike [100], CiteSeer and Eachmovie [99], and IEEE, ACM, and ScienceDirect [110]. Results differed notably in only one study, and even there the ranking of the algorithms remained stable [100] (see Table 6). In this paper, the proposed approach (CTM) performed best on both datasets, with an MRR of 0.529 and 0.467 respectively. Three of the four baselines performed similarly on the CiteSeer dataset (all with an MRR between 0.238 and 0.288). However, on the CiteULike dataset the TM approach performed four times as well as CRM. This means that if TM had been compared only with CRM, the rankings would have been similar on the CiteSeer dataset but different on the CiteULike dataset. As mentioned, no such variations in the rankings were observed for the other reviewed evaluations.

Table 6: MRR of different recommendation approaches on the CiteSeer and CiteULike datasets
Rank   Approach    CiteSeer   CiteULike
1      CTM         0.529      0.467
2      TM          0.288      0.285
3      cite-LDA    0.285      0.143
4      CRM         0.238      0.072
5      link-LDA    0.028      0.013

Overall, a sample size of seven is small, but it gives at least some indication that the impact of the chosen dataset is rather low. This finding is interesting because in other fields it has been observed that different datasets lead to different results [101].

3.8 Consistency of Offline Evaluations and User Studies
Six approaches were evaluated using an offline evaluation in addition to a user study [102-107]. Of these six evaluations, one did not compare its approach against any baseline [102]. The remaining five evaluations reported non-uniform results. In two cases, results from the offline evaluations were similar to the results of the user studies [103,105]. However, these user studies had only five and 19 participants respectively, so the results should be interpreted with some skepticism. Three other studies reported that the results of the offline evaluations contradicted the results of the user studies [104,106,107]. Two of these studies had more than 100 participants; the third had only two. These findings indicate that results from user studies and offline evaluations do not necessarily correlate, which calls into question the validity of offline evaluations in general [111].

Interestingly, the three studies with the most participants were all conducted by the authors of TechLens [105-107], who are also the only authors in the field of research paper recommender systems discussing the potential shortcomings of offline evaluations [108]. It seems that other researchers in this field are not aware of the problems associated with offline evaluations, although there has been quite a discussion.

3.9 Sparse Information on Algorithms
Many authors provided sparse information on the exact workings of their proposed approaches. Hence, replicating their evaluations, or re-implementing their approaches, for example to use them as a baseline, is hardly possible. For instance, one set of authors stated that they had created content-based user models based on a user's documents, but did not explain from which document section (title, abstract, keywords, body, etc.) the text was taken. However, taking text from the title, the abstract, or the body makes a significant difference [109,110].

4. SUMMARY & OUTLOOK
The review of 176 publications has shown that no consensus exists on how to evaluate and compare research paper recommender approaches. This leads to the unsatisfying situation that, despite the many evaluations, the individual strengths and weaknesses of the proposed approaches remain largely unknown. Of the 89 reviewed approaches, 21% were not evaluated. Of the evaluated approaches, 19% were not evaluated against a baseline. Almost all evaluations that compared against a baseline compared against trivial baselines. Only 10% of the reviewed approaches were compared against at least one state-of-the-art approach. In addition, runtime information was provided for only 11% of the approaches, despite this information being crucial for assessing an algorithm's practicability; in one case, runtimes differed by a factor of 600. Details on the proposed algorithms were often sparse, which makes re-implementation difficult in many cases. Only five approaches (7%) were evaluated using online evaluations. The majority of authors conducted offline evaluations (69%). The most frequent sources for offline datasets were CiteSeer (29%), ACM (10%), and CiteULike (10%). However, the majority of evaluations (52%) were conducted using other datasets, and even the datasets from CiteSeer, ACM, and CiteULike differed, since they were all fetched at different times and pruned differently. Because of the different datasets used, individual study outcomes are not comparable. Of the approaches evaluated with a user study (34%), the majority of these studies (58%) had fewer than 16 participants. In addition, user studies sometimes contradicted the results of offline evaluations. These observations question the validity of offline evaluations and demand further research.

Given these circumstances, identifying the most promising approaches for recommending research papers is not possible, and neither is replicating most of the evaluations. We consider this a major problem for the advancement of research paper recommender systems. Researchers cannot evaluate their novel approaches against a state-of-the-art baseline because no state-of-the-art baseline exists. Similarly, providers of academic services who wish to implement a recommender system have no way of knowing which of the 89 approaches they should implement.

We suggest the following three points of action to ensure that the best research paper recommender approaches can be determined:

1. Discuss the suitability of offline evaluations for evaluating research paper recommender systems (we have already started this with the preliminary conclusion that offline evaluations are unsuitable in many cases for evaluating research paper recommender systems [111]).
2. Re-evaluate existing approaches, ideally in real-world systems, with suitable baselines and a sufficient number of study participants.

If these actions are not taken, researchers will continue to evaluate their approaches without comparable results, and although many more approaches will come to exist, it will remain unknown which are most promising for practical application, or against which new approaches should be compared.

