
Keyword Extraction and Clustering for Document Recommendation in Conversations

Maryam Habibi and Andrei Popescu-Belis

Abstract—This paper addresses the problem of keyword extraction from conversations, with the goal of using these keywords to retrieve, for each short conversation fragment, a small number of potentially relevant documents, which can be recommended to participants. However, even a short fragment contains a variety of words, which are potentially related to several topics; moreover, using an automatic speech recognition (ASR) system introduces errors among them. Therefore, it is difficult to infer precisely the information needs of the conversation participants. We first propose an algorithm to extract keywords from the output of an ASR system (or a manual transcript for testing), which makes use of topic modeling techniques and of a submodular reward function which favors diversity in the keyword set, to match the potential diversity of topics and reduce ASR noise. Then, we propose a method to derive multiple topically-separated queries from this keyword set, in order to maximize the chances of making at least one relevant recommendation when using these queries to search over the English Wikipedia. The proposed methods are evaluated in terms of relevance with respect to conversation fragments from the Fisher, AMI, and ELEA conversational corpora, rated by several human judges. The scores show that our proposal improves over previous methods that consider only word frequency or topic similarity, and represents a promising solution for a document recommender system to be used in conversations.

Keywords—Keyword extraction, topic modeling, meeting analysis, information retrieval, document recommendation.

I. INTRODUCTION

Humans are surrounded by an unprecedented wealth of information, available as documents, databases, or multimedia resources. Access to this information is conditioned by the availability of suitable search engines, but even when these are available, users often do not initiate a search, because their current activity does not allow them to do so, or because they are not aware that relevant information is available. We adopt in this paper the perspective of just-in-time retrieval, which answers this shortcoming by spontaneously recommending documents that are related to users' current activities. When these activities are mainly conversational, for instance when users participate in a meeting, their information needs can be modeled as implicit queries that are constructed in the background from the pronounced words, obtained through real-time automatic speech recognition (ASR).

(Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org. The authors are with the Idiap Research Institute and École Polytechnique Fédérale de Lausanne (EPFL), Rue Marconi 19, CP 592, 1920 Martigny, Switzerland; andrei.popescu-belis@idiap.ch.)
These implicit queries are used to retrieve and recommend documents from the Web or a local repository, which users can choose to inspect in more detail if they find them interesting.

The focus of this paper is on formulating implicit queries to a just-in-time retrieval system for use in meeting rooms. In contrast to explicit spoken queries that can be made in commercial Web search engines, our just-in-time retrieval system must construct implicit queries from conversational input, which contains a much larger number of words than a query. For instance, in the example discussed in Section V-B4 below, in which four people put together a list of items to help them survive in the mountains, a short fragment of 120 seconds contains about 250 words, pertaining to a variety of domains, such as 'chocolate', 'pistol', or 'lighter'. What would then be the most helpful 3–5 Wikipedia pages to recommend, and how would a system determine them?

Given the potential multiplicity of topics, reinforced by potential ASR errors or speech disfluencies (such as 'whisk' in this example), our goal is to maintain multiple hypotheses about users' information needs, and to present a small sample of recommendations based on the most likely ones. Therefore, we aim to extract a relevant and diverse set of keywords, cluster them into topic-specific queries ranked by importance, and present users with a sample of results from these queries. The topic-based clustering decreases the chances of including ASR errors into the queries, and the diversity of keywords increases the chances that at least one of the recommended documents answers a need for information, or can lead to a useful document when following its hyperlinks. For instance, while a method based on word frequency would retrieve the Wikipedia pages 'Light', 'Lighting', and 'Light My Fire' for the above-mentioned fragment, users would prefer a set such as 'Lighter', 'Wool' and 'Chocolate'.

Relevance and diversity can be enforced at three stages: when extracting the keywords; when building one or several implicit queries; or when re-ranking their results. The first two approaches are the focus of this paper. Our recent experiments with the third one, published separately [1], show that re-ranking the results of a single implicit query cannot improve users' satisfaction with the recommended documents. Previous methods for formulating implicit queries from text (discussed extensively in Section II-B) rely on word frequency or TFIDF weights to rank keywords and then select the highest-ranking ones [2], [3]. Other methods perform keyword extraction using topical similarity [4]–[6], but do not set a topic diversity constraint.

In this paper, we introduce a novel keyword extraction technique from ASR output, which maximizes the coverage of

potential information needs of users and reduces the number of irrelevant words. Once a set of keywords is extracted, it is clustered in order to build several topically-separated queries, which are run independently, offering better precision than a larger, topically-mixed query. Results are finally merged into a ranked set before showing them as recommendations to users.

The paper is organized as follows. In Section II-A we review existing just-in-time retrieval systems and the policies they use for query formulation. In Section II-B we discuss previous methods for keyword extraction. In Section III we describe the proposed technique for implicit query formulation, which relies on a novel topic-aware diverse keyword extraction algorithm (III-A) and a topic-aware clustering method (III-B). Section IV introduces the data and our method for comparing the relevance of sets of keywords or recommended documents using crowdsourcing. In Section V we present and discuss the experimental results on keyword extraction and document recommendation. We also exemplify the results on one conversation fragment given in Appendix A.

II. STATE OF THE ART: JUST-IN-TIME RETRIEVAL AND KEYWORD EXTRACTION

Just-in-time retrieval systems have the potential to bring a radical change in the process of query-based information retrieval. Such systems continuously monitor users' activities to detect information needs, and pro-actively retrieve relevant information. To achieve this, the systems generally extract implicit queries (not shown to users) from the words that are written or spoken by users during their activities. In this section, we review existing just-in-time retrieval systems and the methods they use for query formulation. In particular, we introduce our Automatic Content Linking Device (ACLD) [7], [8], a just-in-time document recommendation system for meetings, for which the methods proposed in this paper are intended. In Section II-B, we discuss previous techniques for keyword extraction from a transcript or text.

A. Query Formulation in Just-in-Time Retrieval Systems

One of the first systems for document recommendation, referred to as query-free search, was the Fixit system [9], an assistant to an expert diagnostic system for the products of a specific company (fax machines and copiers). Fixit monitored the state of the user's interaction with the diagnostic system, in terms of the positions in a belief network built from the relations among symptoms and faults, and ran background searches on a database of maintenance manuals to provide additional support information related to the current state.

The Remembrance Agent [10], [11], another early just-in-time retrieval system, is closer in concept to the system considered in this paper. The Remembrance Agent was integrated into the Emacs text editor, and ran searches at regular time intervals (every few seconds) using a query that was based on the latest words typed by the user, for instance using a buffer of 20–500 words ranked by frequency. The Remembrance Agent was extended to a multimodal context under the name of Jimminy, a wearable assistant that helped users with taking notes and accessing information when they could not use a standard computer keyboard, e.g. while discussing with another person [12]. Using TFIDF for keyword extraction, Jimminy augmented these keywords with features from other modalities, for example the user's position and the name of their interlocutor(s).

The Watson just-in-time retrieval system [13] assisted users with finding relevant documents while writing or browsing the Web.
Watson built a single query based on a more sophisticated mechanism than the Remembrance Agent, by taking advantage of knowledge about the structure of the written text, e.g. by emphasizing the words mentioned in the abstract or written with larger fonts, in addition to word frequency. The Implicit Queries (IQ) system [14], [15] generated context-sensitive searches by analyzing the text that a user is reading or composing. IQ automatically identified important words to use in a query using TFIDF weights. Another query-free system was designed for enriching television news with articles from the Web [16]. Similarly to IQ or Watson, queries were constructed from the ASR output using several variants of TFIDF weighting, also considering the previous queries made by the system.

Other real-time assistants are conversational: they interact with users to answer their explicit information needs or to provide recommendations based on their conversation. For instance, Ada and Grace are twin virtual museum guides [17], which interact with visitors to answer their questions, suggest exhibits, or explain the technology that makes them work. A collaborative tourist information retrieval system [18], [19] interacts with tourists to provide travel information such as weather conditions, attractive sites, holidays, and transportation, in order to improve their travel plans. MindMeld (http://www.expectlabs.com/mindmeld/) is a commercial voice assistant for mobile devices such as tablets, which listens to conversations between people and shows related information from a number of Web-based information sources, such as local directories. MindMeld improves the retrieval results by adding the users' location information to the keywords of the conversation obtained using an ASR system. As far as is known, the system uses state-of-the-art methods for language analysis and information retrieval [20].

In collaboration with other researchers, we have designed the Automatic Content Linking Device (ACLD) [7], [8], which is a just-in-time retrieval system for conversational environments, especially intended to be used jointly by a small group of people in a meeting. The system constantly listens to the meeting and prepares implicit queries from words recognized through ASR. A selection of the retrieved documents is recommended to users. Before the solutions proposed in this paper, the ACLD modeled users' information needs as a set of keywords extracted at regular time intervals, by matching the ASR output against a list of keywords fixed before the meeting. We showed that this method outperforms the use of the entire set of words from a conversation fragment as an implicit query [21]. Moreover, experiments with the use of semantic similarity between a conversation fragment and documents as a criterion for recommendation have shown that, although this improves

relevance, its high computation cost makes it impractical for just-in-time retrieval from a large repository [22, 4.12].

These findings motivated us to design an innovative keyword extraction method for modeling users' information needs from conversations. As mentioned in the introduction, since even short conversation fragments include words potentially pertaining to several topics, and the ASR transcript adds additional ambiguities, a poor keyword selection method leads to non-informative queries, which often fail to capture users' information needs, thus leading to low recommendation relevance and user satisfaction. The keyword extraction method proposed here accounts for a diversity of hypothesized topics in a discussion, and is accompanied by a clustering technique that formulates several topically-separated queries.

B. Keyword Extraction Methods

Numerous methods have been proposed to automatically extract keywords from a text, and they are applicable also to transcribed conversations. The earliest techniques used word frequencies [2] and TFIDF values [3], [23] to rank words for extraction. Alternatively, words have been ranked by counting pairwise word co-occurrence frequencies [24]. These approaches do not consider word meaning, so they may ignore low-frequency words which together indicate a highly salient topic. For instance, the words 'car', 'wheel', 'seat', and 'passenger' occurring together indicate that automobiles are a salient topic even if each word is not itself frequent [25].

To improve over frequency-based methods, several ways to use lexical semantic information have been proposed. Semantic relations between words can be obtained from a manually constructed thesaurus such as WordNet, from Wikipedia, or from an automatically-built thesaurus using latent topic modeling techniques such as LSA, PLSA, or LDA. For instance, keyword extraction has used the frequency of all words belonging to the same WordNet concept set [4], while the Wikifier system [5] relied on Wikipedia links to compute another substitute to word frequency. Hazen also applied topic modeling techniques to audio files [26]. In another study, he used PLSA to build a thesaurus, which was then used to rank the words of a conversation transcript with respect to each topic using a weighted point-wise mutual information scoring function [27]. Moreover, Harwath and Hazen utilized PLSA to represent the topics of a transcribed conversation, and then ranked words in the transcript based on topical similarity to the topics found in the conversation [6]. Similarly, Harwath et al. extracted the keywords or key phrases of an audio file by directly applying PLSA to the links among audio frames obtained using segmental dynamic time warping, and then using a mutual information measure to rank the key concepts in the form of audio file snippets [28]. A semi-supervised latent concept classification algorithm using LDA topic modeling was presented by Celikyilmaz and Hakkani-Tur for multi-document information extraction [29].

To consider dependencies among selected words, word co-occurrence has been combined with PageRank [30], and additionally with WordNet [31] or with topical information [32]. For instance, Riedhammer et al. considered the dependencies among surrounding words by merging n-gram information obtained from WordNet with word frequency, in order to extract keywords from a meeting transcript [33].
To reduce the effect of noise in meeting environments, this method removed all n-grams which appear only once or are covered by longer n-grams with the same frequencies. However, as shown empirically in [30], [32], such approaches have difficulties modeling long-range dependencies between words related to the same topic. In another study, part-of-speech information and word clustering techniques were used for keyword extraction [34], and this information was later added to TFIDF so as to consider both word dependency and semantic information [35]. In a recent paper, a word clustering technique was introduced by [36] based on the word2vec vector space representation of words, in which the dependencies between each word and its surrounding words are modeled using a neural network language model [37], [38]. However, although they considered topical similarity and dependency among words, the above methods did not explicitly reward diversity and therefore might miss secondary topics in a conversation fragment.

Supervised machine learning methods have been used to learn models for extracting keywords. This approach was first introduced by Turney [39], who combined heuristic rules with a genetic algorithm. Other learning algorithms such as Naive Bayes [40], Bagging [41], or Conditional Random Fields [42] have been used to improve accuracy. These approaches, however, rely on the availability of in-domain training data, and the objective functions they use for learning do not consider the diversity of keywords.

III. FORMULATION OF IMPLICIT QUERIES FROM CONVERSATIONS

We propose a two-stage approach to the formulation of implicit queries. The first stage is the extraction of keywords from the transcript of a conversation fragment for which documents must be recommended, as provided by an ASR system (Subsection III-A). These keywords should cover as much as possible the topics detected in the conversation, and if possible avoid words that are obviously ASR mistakes. The second stage is the clustering of the keyword set in the form of several topically-disjoint queries (Subsection III-B).

A. Diverse Keyword Extraction

We propose to take advantage of topic modeling techniques to build a topical representation of a conversation fragment, and then select content words as keywords by using topical similarity, while also rewarding the coverage of a diverse range of topics, inspired by recent summarization methods [43], [44]. The benefit of diverse keyword extraction is that the coverage of the main topics of the conversation fragment is maximized. Moreover, in order to cover more topics, the proposed algorithm will select a smaller number of keywords from each topic. This is desirable for two reasons. First, as we will see in Section III-B on keyword clustering, this will lead to more dissimilar implicit queries, thus increasing the variety of retrieved documents. Second, if words which are in reality ASR noise create a main topic in the fragment,

then the algorithm will choose a smaller number of these noisy keywords compared to algorithms which ignore diversity.

The proposed method for diverse keyword extraction proceeds in three steps, represented schematically in Figure 1 (a first version of this method appeared in [45]). First, a topic model is used to represent the distribution over the abstract topics z of each word w, noted p(z|w). The abstract topics are not pre-defined manually but are represented by latent variables using a generative topic modeling technique. These topics occur in a collection of documents – preferably one that is representative of the domain of the conversations. Second, these topic models are used to determine weights for the abstract topics in each conversation fragment, represented by β_z. These steps are described in Section III-A1. Finally, the keyword list W = {w_1, ..., w_k} which covers a maximum number of the most important topics is selected by rewarding diversity, using an original algorithm introduced in this section.

[Fig. 1. The three steps of the proposed keyword extraction method: (1) topic modeling over the transcript, yielding p(z|w_1), ..., p(z|w_n); (2) representation of the main topics of the transcript by their weights β_z; (3) diverse selection of the best k keywords W = {w_1, ..., w_k}, covering all the main topics with high probability.]

1) Modeling Topics in Conversations: Topic models such as Probabilistic Latent Semantic Analysis (PLSA) or Latent Dirichlet Allocation (LDA) [46] can be used as off-line topic modeling techniques to determine the distribution over the topics z of each word w, noted p(z|w), from a large amount of training documents. LDA implemented in the Mallet toolkit [47] is used in this paper because it does not suffer from the overfitting issue of PLSA, as discussed in [46].

When a conversation fragment is considered for keyword extraction, its topics are weighted, each by β_z, which is obtained by averaging over all probabilities p(z|w_i) of the N words w_i spoken in the fragment:

\beta_z = \frac{1}{N} \sum_{1 \le i \le N} p(z|w_i) \qquad (1)

2) Diverse Keyword Extraction Problem: The goal of the keyword extraction technique with maximal topic coverage is formulated as follows. If a conversation fragment t mentions a set of topics Z, and each word w from the fragment t can evoke a subset of the topics in Z, then the goal is to find a subset of k unique words S ⊆ t, with |S| = k, which maximizes the number of covered topics.

This problem is an instance of the maximum coverage problem, which is known to be NP-hard. If the coverage function is submodular and monotone non-decreasing (a function F is submodular if, for all A ⊆ B ⊆ T \ {t}, F(A ∪ {t}) − F(A) ≥ F(B ∪ {t}) − F(B), i.e. it has diminishing returns, and is monotone non-decreasing if, for all A ⊆ B, F(A) ≤ F(B)), a greedy algorithm can find an approximate solution guaranteed to be within (1 − 1/e) ≈ 0.63 of the optimal solution in polynomial time [48].

To achieve our goal, we define the contribution of a topic z with respect to each set of words S ⊆ t of size k by summing over all probabilities p(z|w) of the words in the set. Afterward, we propose a reward function, for each set S and topic z, to model the contribution of the set S to the topic z. Finally, we select one of the sets S ⊆ t which maximizes the cumulative reward values over all the topics. The whole procedure is formalized below.

3) Definition of a Diverse Reward Function: We introduce r_{S,z}, the contribution towards topic z of the keyword set S selected from the fragment t:

r_{S,z} = \sum_{w \in S} p(z|w) \qquad (2)

We propose the following reward function for each topic, where β_z represents the weight of topic z over all the words of the fragment, to assign a higher weight to topics with higher value, and λ is a parameter between 0 and 1. This is a submodular function with diminishing returns when r_{S,z} increases, as proved in Appendix B.

f : r_{S,z} \mapsto \beta_z \cdot r_{S,z}^{\lambda} \qquad (3)

Finally, the keyword set S ⊆ t is chosen by maximizing the cumulative reward function over all the topics, formulated as follows:

R(S) = \sum_{z \in Z} \beta_z \cdot r_{S,z}^{\lambda} \qquad (4)

In this equation, if candidate keywords which are in fact ASR errors (insertions or substitutions) are associated with topics with lower β_z, as is most often the case, the probability of their selection by the algorithm will be reduced, because their contribution to the reward will be small.
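As a concrete illustration of Equations (1)–(4), the following minimal Python sketch computes the topic weights β_z and the cumulative reward R(S) from per-word topic distributions p(z|w). It assumes these distributions are already available (e.g. inferred with an LDA model such as the Mallet implementation cited above, which is not shown here) and stored as dictionaries; all function and variable names are illustrative, not taken from the authors' code.

```python
def topic_weights(fragment, p_zw):
    """Eq. (1): beta_z = (1/N) * sum of p(z|w_i) over the N words of the fragment."""
    n = len(fragment)
    beta = {}
    for w in fragment:
        for z, p in p_zw[w].items():
            beta[z] = beta.get(z, 0.0) + p / n
    return beta

def coverage(keywords, p_zw, z):
    """Eq. (2): r_{S,z}, the contribution of the keyword set S towards topic z."""
    return sum(p_zw[w].get(z, 0.0) for w in keywords)

def cumulative_reward(keywords, beta, p_zw, lam):
    """Eq. (4): R(S) = sum_z beta_z * r_{S,z}^lambda, submodular for 0 < lambda <= 1."""
    return sum(b * coverage(keywords, p_zw, z) ** lam
               for z, b in beta.items())
```

With lam = 1 the reward reduces to plain topical similarity, while lowering lam toward 0 flattens the per-topic gain, so that adding a second keyword from an already covered topic pays less than covering a new topic.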

Since the class of submodular functions is closed under non-negative linear combinations [48], R(S) is a monotone non-decreasing submodular function. If λ = 1, the reward function is linear and only measures the topical similarity of words with the main topics of t. However, when 0 < λ < 1, as soon as a word is selected from a topic, other words from the same topic start having diminishing gains as candidates for selection. Therefore, decreasing the value of λ strengthens the diversity constraint, which increases the chance of selecting keywords from secondary topics. As these words may reduce the overall relevance of the keyword set, it is essential to find a value of the hyper-parameter λ which leads to the desired balance between relevance and diversity in the keyword set.

From a different perspective, the definition of R(S) in Equation 4 can be seen as the dot product in the topic space between the weights β_z, obtained from the topic probabilities given the fragment t, and the reward function over the sums of topic probabilities r_{S,z}^λ, with a scaling exponent λ and identical coefficients over all topics. However, despite what this apparent similarity suggests, the use of cosine similarity for R(S) would not lead to an appropriate definition, because it would not provide a monotone non-decreasing submodular function. Indeed, if vector length normalization is introduced in R(S), as for cosine similarity, then we can show that R(S) is no longer monotone submodular, e.g. on the second example in the following subsection.

4) Examples: We will illustrate the motivation for our definition of R(S) on the following example. Let us consider a situation with four words w_1, w_2, w_3, w_4. The goal is to select two of them as keywords which cover the main topics presented by these four words. Suppose that each word can be related to two topics z_1 and z_2. The probability of topic z_1 for words w_1 and w_2 is 1, and for words w_3 and w_4 it is zero, and vice versa for topic z_2. Therefore, β_{z_1} = β_{z_2} = 0.5. For two sample sets S_1 = {w_1, w_2} and S_2 = {w_1, w_3}, the cumulative rewards are respectively R(S_1) = 0.5 · (1 + 1)^λ + 0.5 · 0^λ and R(S_2) = 0.5 · 1^λ + 0.5 · 1^λ. Since R(S_1) < R(S_2) for 0 < λ < 1 (for instance, with λ = 0.75, R(S_1) = 0.5 · 2^0.75 ≈ 0.84 while R(S_2) = 1), the keyword set S_2, which covers the two main topics, is selected. If λ = 1, then the cumulative rewards of the two sets S_1 and S_2 are equal, which does not guarantee the selection of the set covering both topics.

The example above yields the desirable values of R(S) regardless of whether the dot product or the cosine similarity is used for the definition of R(S) in Equation 4. However, this is not always the case. In the example shown in Table I (to which we will refer again below), if we consider A = {w_5}, B = {w_3, w_5} and λ = 0.75, then A ⊆ B but R(A) = 0.76 > R(B) = 0.70 if cosine similarity is used, hence this version of R(S) would not be monotone non-decreasing. If we add keyword w_4 to both keyword sets A and B, then R(A ∪ {w_4}) − R(A) = 0.02 < R(B ∪ {w_4}) − R(B) = 0.09, hence R(S) would not have the diminishing returns property either, if cosine similarity were used.

5) Comparison with the Function Used for Summarization: We are inspired by recent work on extractive summarization methods [43], [44] in defining a monotone submodular function for keyword extraction which maximizes the number of covered topics. That work proposed a square root function as a reward function for the selection of sentences, to cover the maximum number of concepts of a given document. Note that in their work, this function rewards diversity by increasing the gain of selecting a sentence including a concept that was not yet covered by a previously selected sentence. However, we propose a reward function for diverse selection of keywords as a power function with a scaling exponent between 0 and 1, and a coefficient corresponding to the weight of each topic conveyed in the fragment.
Therefore, we generalize the square root function with a constant coefficient equal to one for all concepts, as proposed by [43] and [44]. In our reward function, the scaling exponent between 0 and 1 enforces diversity by decreasing the reward of keyword selection from a topic when the number of keywords representing that topic increases, and by increasing the reward of selecting keywords from the topics which are not yet covered. In contrast to the summarization techniques proposed by [43], [44], which add a separate term to account for the relevance and the coverage of the main concepts of the given text by summary sentences, we use a coefficient corresponding to the weight of the topics conveyed in the fragment.

6) Finding the Optimal Keyword Set: To maximize R(S) in polynomial time under the cardinality constraint |S| ≤ k, we present a greedy algorithm shown as Algorithm 1. In the first step of the algorithm, S is empty. At each subsequent step, the algorithm selects, among the words of the conversation fragment not yet selected (w ∈ t \ S), the one which has the maximum similarity to the main topics of the fragment and also maximizes the coverage of the topics with respect to the previously selected keywords in S. This is expressed as h(w, S) = Σ_{z∈Z} β_z [p(z|w) + r_{S,z}]^λ, where p(z|w) is the contribution of word w ∈ t \ S to topic z, which is added to r_{S,z}, the contribution of the set S to topic z. The algorithm updates the set S by adding the word w ∈ t \ S which maximizes h(w, S). This procedure continues until k keywords have been selected from the fragment t.

Algorithm 1: Diverse keyword extraction.
    Input: a given text t, a set of topics Z, the number of keywords k
    Output: a set of keywords S
    S ← ∅
    while |S| < k do
        S ← S ∪ {argmax_{w ∈ t\S} h(w, S)}, where h(w, S) = Σ_{z∈Z} β_z [p(z|w) + r_{S,z}]^λ
    end
    return S
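Algorithm 1 can be transcribed almost line by line into Python. The sketch below reuses the hypothetical topic_weights and coverage helpers from the earlier snippet; it is an illustration of the greedy procedure under those assumptions, not the authors' released implementation.

```python
def diverse_keywords(fragment, p_zw, k, lam=0.75):
    """Algorithm 1: greedily add the word maximizing h(w, S) until k keywords are chosen."""
    beta = topic_weights(fragment, p_zw)
    selected = []                     # the set S, kept in selection order
    while len(selected) < k:
        remaining = [w for w in set(fragment) if w not in selected]
        if not remaining:
            break

        def h(w):
            # h(w, S) = sum_z beta_z * [p(z|w) + r_{S,z}]^lambda
            return sum(b * (p_zw[w].get(z, 0.0) + coverage(selected, p_zw, z)) ** lam
                       for z, b in beta.items())

        selected.append(max(remaining, key=h))
    return selected
```

Each iteration scans the remaining words and recomputes r_{S,z} from scratch, which costs O(|t| · |Z| · |S|) per step; caching r_{S,z} incrementally would be the natural optimization, at the price of some clarity.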

7) Illustration of the Greedy Algorithm: We will exemplify the mechanism of the proposed algorithm using a simple example. Let us consider a conversation fragment with five words, each represented over four topics. The distributions of topics for each word are given in Table I. The topics are thus weighted as follows: β_{z1} = 0.42, β_{z2} = 0.20, β_{z3} = 0.06, and β_{z4} = 0.32. We run the algorithm to extract two keywords out of five for λ ∈ {.75, 1}. In other words, λ = 1 selects words based on their topical similarity to the main topics of the conversation, while λ = .75 considers both topical diversity and similarity for keyword extraction.

TABLE I. SAMPLE INPUT TO THE GREEDY ALGORITHM

Word    p(z1|·)    p(z2|·)    p(z3|·)    p(z4|·)
w1      1.00       0.00       0.00       0.00
w2      0.90       0.00       0.10       0.00
w3      0.00       0.00       0.20       0.80
w4      0.10       0.90       0.00       0.00
w5      0.10       0.10       0.00       0.80

Initially, S is empty. The reward values h(w, ∅) for all words and λ ∈ {.75, 1} are shown in Table II. In the first step of the algorithm, w1 (the best representative of topic z1) is added to the set S for both values of λ. In the second step, the h(w, {w1}) values are computed for the remaining unselected words and both λ values, as shown in Table II. According to these values, λ = 1 selects w2, a second word from the topic z1. However, λ = .75 selects w5 as the second keyword (the best representative of topic z4, the second main topic of the conversation fragment), because it rewards topical diversity in the keyword set.

TABLE II. THE h(w, S
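Running the earlier sketches on the data of Table I reproduces this illustration: with k = 2, the greedy selection returns w1 then w2 for λ = 1, but w1 then w5 for λ = .75. The dictionary below simply encodes Table I, using the same hypothetical helpers as before.

```python
p_zw = {
    'w1': {'z1': 1.00},
    'w2': {'z1': 0.90, 'z3': 0.10},
    'w3': {'z3': 0.20, 'z4': 0.80},
    'w4': {'z1': 0.10, 'z2': 0.90},
    'w5': {'z1': 0.10, 'z2': 0.10, 'z4': 0.80},
}
fragment = list(p_zw)  # one occurrence of each of the five words

print(topic_weights(fragment, p_zw))
# beta_z: z1 = 0.42, z2 = 0.20, z3 = 0.06, z4 = 0.32 (up to floating-point rounding)
print(diverse_keywords(fragment, p_zw, k=2, lam=1.0))   # ['w1', 'w2']: similarity only
print(diverse_keywords(fragment, p_zw, k=2, lam=0.75))  # ['w1', 'w5']: rewards diversity
```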
