Frequent Pattern Mining - Charu Aggarwal

1y ago
10 Views
2 Downloads
2.39 MB
85 Pages
Last View : 15d ago
Last Download : 3m ago
Upload by : Brady Himes
Transcription

Frequent Pattern Mining

Charu C. Aggarwal Jiawei HanEditorsFrequent Pattern Mining2123

EditorsCharu C. AggarwalIBM T. J. Watson Research CenterYorktown HeightsNew YorkUSAJiawei HanUniversity of Illinois at Urbana-ChampaignUrbanaIllinoisUSAISBN 978-3-319-07820-5ISBN 978-3-319-07821-2 (eBook)DOI 10.1007/978-3-319-07821-2Springer Cham Heidelberg New York Dordrecht LondonLibrary of Congress Control Number: 2014944536 Springer International Publishing Switzerland 2014This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of thematerial is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,broadcasting, reproduction on microfilms or in any other physical way, and transmission or informationstorage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodologynow known or hereafter developed. Exempted from this legal reservation are brief excerpts in connectionwith reviews or scholarly analysis or material supplied specifically for the purpose of being entered andexecuted on a computer system, for exclusive use by the purchaser of the work. Duplication of thispublication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’slocation, in its current version, and permission for use must always be obtained from Springer. Permissionsfor use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable toprosecution under the respective Copyright Law.The use of general descriptive names, registered names, trademarks, service marks, etc. in this publicationdoes not imply, even in the absence of a specific statement, that such names are exempt from the relevantprotective laws and regulations and therefore free for general use.While the advice and information in this book are believed to be true and accurate at the date of publication,neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors oromissions that may be made. The publisher makes no warranty, express or implied, with respect to thematerial contained herein.Printed on acid-free paperSpringer is part of Springer Science Business Media (www.springer.com)

PrefaceThe field of data mining has four main “super-problems” corresponding to clustering,classification, outlier analysis, and frequent pattern mining. Compared to the otherthree problems, the frequent pattern mining model for formulated relatively recently.In spite of its shorter history, frequent pattern mining is considered the marqueeproblem of data mining. The reason for this is that interest in the data mining fieldincreased rapidly soon after the seminal paper on association rule mining by Agrawal,Imielinski, and Swami. The earlier data mining conferences were often dominatedby a large number of frequent pattern mining papers. This is one of the reasons thatfrequent pattern mining has a very special place in the data mining community. Atthis point, the field of frequent pattern mining is considered a mature one.While the field has reached a relative level of maturity, very few books coverdifferent aspects of frequent pattern mining. Most of the existing books are eithertoo generic or do not cover frequent pattern mining in an exhaustive way. A needexists for an exhaustive book on the topic that can cover the different nuances in anexhaustive way.This book provides comprehensive surveys in the field of frequent pattern mining.Each chapter is designed as a survey that covers the key aspects of the field of frequentpattern mining. The chapters are typically of the following types: Algorithms: In these cases, the key algorithms for frequent pattern mining areexplored. These include join-based methods such as Apriori, and pattern-growthmethods. Variations: Many variations of frequent pattern mining such as interesting patterns, negative patterns, constrained pattern mining, or compressed patterns areexplored in these chapters. Scalability: The large sizes of data in recent years has led to the need for big dataand streaming frameworks for frequent pattern mining. Frequent pattern miningalgorithms need to be modified to work with these advanced scenarios. Data Types: Different data types lead to different challenges for frequent patternmining algorithms. Frequent pattern mining algorithms need to be able to workwith complex data types, such as temporal or graph data.v

viPreface Applications: In these chapters, different applications of frequent pattern miningare explored. These includes the application of frequent pattern mining methodsto problems such as clustering and classification. Other more complex algorithmsare also explored.This book is, therefore, intended to provide an overview of the field of frequentpattern mining, as it currently stands. It is hoped that the book will serve as a usefulguide for students, researchers, and practitioners.

Contents12An Introduction to Frequent Pattern Mining . . . . . . . . . . . . . . . . . . . . .Charu C. Aggarwal1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Frequent Pattern Mining Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . .2.1 Frequent Pattern Mining with the Traditional SupportFramework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.2 Interesting and Negative Frequent Patterns . . . . . . . . . . . . . . . .2.3 Constrained Frequent Pattern Mining . . . . . . . . . . . . . . . . . . . . .2.4 Compressed Representations of Frequent Patterns . . . . . . . . . .3 Scalability Issues in Frequent Pattern Mining . . . . . . . . . . . . . . . . . . .3.1 Frequent Pattern Mining in Data Streams . . . . . . . . . . . . . . . . .3.2 Frequent Pattern Mining with Big Data . . . . . . . . . . . . . . . . . . .4 Frequent Pattern Mining with Advanced Data Types . . . . . . . . . . . . .4.1 Sequential Pattern Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.2 Spatiotemporal Pattern Mining . . . . . . . . . . . . . . . . . . . . . . . . . .4.3 Frequent Patterns in Graphs and Structured Data . . . . . . . . . . .4.4 Frequent Pattern Mining with Uncertain Data . . . . . . . . . . . . . .5 Privacy Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 Applications of Frequent Pattern Mining . . . . . . . . . . . . . . . . . . . . . . .6.1 Applications to Major Data Mining Problems . . . . . . . . . . . . . .6.2 Generic Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7 Conclusions and Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Frequent Pattern Mining Algorithms: A Survey . . . . . . . . . . . . . . . . . .Charu C. Aggarwal, Mansurul A. Bhuiyan and Mohammad Al Hasan1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Join-Based Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.1 Apriori Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.2 DHP Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.3 Special Tricks for 2-Itemset Counting . . . . . . . . . . . . . . . . . . . .113467788991010111112131313141419192223242728vii

viii34Contents2.4 Pruning by Support Lower Bounding . . . . . . . . . . . . . . . . . . . . .2.5 Hypercube Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Tree-Based Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.1 AIS Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.2 TreeProjection Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.3 Vertical Mining Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4 Recursive Suffix-Based Growth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.1 The FP-Growth Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.2 Variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5 Maximal and Closed Frequent Itemsets . . . . . . . . . . . . . . . . . . . . . . . .5.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.2 Frequent Maximal Itemset Mining Algorithms . . . . . . . . . . . . .5.3 Frequent Closed Itemset Mining Algorithms . . . . . . . . . . . . . . .6 Other Optimizations and Variations . . . . . . . . . . . . . . . . . . . . . . . . . . .6.1 Row Enumeration Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.2 Other Exploration Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . .7 Reducing the Number of Passes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.1 Combining Passes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.2 Sampling Tricks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.3 Online Association Rule Mining . . . . . . . . . . . . . . . . . . . . . . . . .8 Conclusions and Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . rn-Growth Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Jiawei Han and Jian Pei1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 FP-Growth: Pattern Growth for Mining Frequent Itemsets . . . . . . . .3 Pushing More Constraints in Pattern-Growth Mining . . . . . . . . . . . .4 PrefixSpan: Mining Sequential Patterns by Pattern Growth . . . . . . .5 Further Development of Pattern Growth-Based Pattern MiningMethodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .65Mining Long Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Feida Zhu1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 A Pattern Lattice Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4 Pattern Enumeration Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.1 Breadth-First Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.2 Depth-First Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5 Row Enumeration Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 Pattern Merge Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.1 Piece-wise Pattern Merge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6668727477787983838486878788899293

Contentsix6.2 Fusion-style Pattern Merge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 987 Pattern Traversal Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1018 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103567Interesting Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Jilles Vreeken and Nikolaj Tatti1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Absolute Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.1 Frequent Itemsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.2 Tiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.3 Low Entropy Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Advanced Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4 Static Background Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.1 Independence Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.2 Beyond Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.3 Maximum Entropy Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.4 Randomization Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5 Dynamic Background Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.1 The General Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.2 Maximum Entropy Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.3 Tile-based Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.4 Swap Randomization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 Pattern Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.1 Itemsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.2 Tiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.3 Swap Randomization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Negative Association Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Luiza Antonie, Jundong Li and Osmar Zaiane1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Negative Patterns and Negative Association Rules . . . . . . . . . . . . . . .3 Current Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4 Associative Classification and Negative Association Rules . . . . . . . .5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Constraint-Based Pattern Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Siegfried Nijssen and Albrecht Zimmermann1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.1 Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Level-Wise Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149152

xContents3.1 Generic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Depth-First Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.1 Basic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.2 Constraint-based Itemset Mining . . . . . . . . . . . . . . . . . . . . . . . .4.3 Generic Frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.4 Implementation Considerations . . . . . . . . . . . . . . . . . . . . . . . . . .5 Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .153154154155158159159162162Mining and Using Sets of Patterns through Compression . . . . . . . . . .Matthijs van Leeuwen and Jilles Vreeken1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Foundations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.1 Kolmogorov Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.2 MDL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.3 MDL in Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Compression-based Pattern Models . . . . . . . . . . . . . . . . . . . . . . . . . . .3.1 Pattern Models for MDL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.2 Code Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.3 Instances of Compression-based Models . . . . . . . . . . . . . . . . . .4 Algorithmic Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.1 Candidate Set Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.2 Direct Mining of Patterns that Compress . . . . . . . . . . . . . . . . . .5 MDL for Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.1 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.2 A Dissimilarity Measure for Datasets . . . . . . . . . . . . . . . . . . . . .5.3 Identifying and Characterizing Components . . . . . . . . . . . . . . .5.4 Other Data Mining Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.5 The Advantage of Pattern-based Models . . . . . . . . . . . . . . . . . .6 Challenges Ahead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.1 Toward Mining Structured Data . . . . . . . . . . . . . . . . . . . . . . . . .6.2 Generalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.3 Task- and/or User-specific Usefulness . . . . . . . . . . . . . . . . . . . .7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .165489Frequent Pattern Mining in Data Streams . . . . . . . . . . . . . . . . . . . . . . .Victor E. Lee, Ruoming Jin and Gagan Agrawal1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.1 Frequent Pattern Mining: Definition . . . . . . . . . . . . . . . . . . . . . .2.2 Data Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.3 Frequent Item Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Frequent Itemset Mining Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 91192193193194194195196199200201201202203204

Contents1011xi3.1 Mining the Full Data Stream . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.2 Recently Frequent Itemsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.3 Closed and Maximal Itemsets . . . . . . . . . . . . . . . . . . . . . . . . . . .3.4 Mining Data Streams with Uncertain Data . . . . . . . . . . . . . . . .4 Mining Patterns Other than Itemsets . . . . . . . . . . . . . . . . . . . . . . . . . .4.1 Subsequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.2 Subtrees and Semistructured Data . . . . . . . . . . . . . . . . . . . . . . . .4.3 Subgraphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .206209214216216217218219219220Big Data Frequent Pattern Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .David C. Anastasiu, Jeremy Iverson, Shaden Smith and GeorgeKarypis1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Frequent Pattern Mining: Overview . . . . . . . . . . . . . . . . . . . . . . . . . . .2.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.2 Basic Mining Methodologies . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Paradigms for Big Data Computation . . . . . . . . . . . . . . . . . . . . . . . . .3.1 Principles of Parallel Algorithms . . . . . . . . . . . . . . . . . . . . . . . .3.2 Shared Memory Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.3 Distributed Memory Systems . . . . . . . . . . . . . . . . . . . . . . . . . . .4 Frequent Itemset Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.1 Memory Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.2 Work Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.3 Dynamic Load Balancing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.4 Further Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5 Frequent Sequence Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.1 Serial Frequent Sequence Mining . . . . . . . . . . . . . . . . . . . . . . . .5.2 Parallel Frequent Sequence Mining . . . . . . . . . . . . . . . . . . . . . .6 Frequent Graph Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.1 Serial Frequent Graph Mining . . . . . . . . . . . . . . . . . . . . . . . . . . .6.2 Parallel Frequent Graph Mining . . . . . . . . . . . . . . . . . . . . . . . . .7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .225Sequential Pattern Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Wei Shen, JianyongWang and Jiawei Han1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Apriori-based Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.1 Horizontal Data Format Algorithms . . . . . . . . . . . . . . . . . . . . . .3.2 Vertical Data Format Algorithms . . . . . . . . . . . . . . . . . . . . . . . . .4 Pattern Growth Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.1 FreeSpan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.2 PrefixSpan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50250252255256261261263264264268271271272

xii1213Contents5Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.1 Closed Sequential Pattern Mining . . . . . . . . . . . . . . . . . . . . . . . .5.2 Multi-level, Multi-dimensional Sequential Pattern Mining . . .5.3 Incremental Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.4 Hybrid Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.5 Approximate Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.6 Top-k Closed Sequential Pattern Mining . . . . . . . . . . . . . . . . . .5.7 Frequent Episode Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 Conclusions and Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .274274276277278279279280281281Spatiotemporal Pattern Mining: Algorithms and Applications . . . . . .Zhenhui Li1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Basic Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.1 Spatiotemporal Data Collection . . . . . . . . . . . . . . . . . . . . . . . . .2.2 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.3 Background Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Individual Periodic Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.1 Automatic Discovery of Periodicity in Movements . . . . . . . . . .3.2 Frequent Periodic Pattern Mining . . . . . . . . . . . . . . . . . . . . . . . .3.3 Using Periodic Pattern for Location Prediction . . . . . . . . . . . . .4 Pairwise Movement Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.1 Similarity Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.2 Generic Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.3 Behavioral Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.4 Semantic Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5 Aggregate Patterns over Multiple Trajectories . . . . . . . . . . . . . . . . . .5.1 Frequent Trajectory Pattern Mining . . . . . . . . . . . . . . . . . . . . . .5.2 Detection of Moving Object Cluster . . . . . . . . . . . . . . . . . . . . . .5.3 Trajectory Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .283Mining Graph Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Hong Cheng, Xifeng Yan and Jiawei Han1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Frequent Subgraph Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.2 Apriori-Based Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.3 Pattern-Growth Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.4 Closed and Maximal Subgraphs . . . . . . . . . . . . . . . . . . . . . . . . .2.5 Mining Subgraphs in a Single Graph . . . . . . . . . . . . . . . . . . . . .2.6 The Computational Bottleneck . . . . . . . . . . . . . . . . . . . . . . . . . 00302304304307307308308309310311311313

Contents14xiii3Mining Significant Graph Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.2 gboost: A Branch-and-Bound Approach . . . . . . . . . . . . . . . . . . .3.3 gPLS: A Partial Least Squares Regression Approach . . . . . . . .3.4 LEAP: A Structural Leap Search Approach . . . . . . . . . . . . . . . .3.5 GraphSig: A Feature Representation Approach . . . . . . . . . . . . .4 Mining Representative Orthogonal Graphs . . . . . . . . . . . . . . . . . . . . .4.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.2 Randomized Maximal Subgraph Mining . . . . . . . . . . . . . . . . . .4.3 Orthogonal Representative Set Generation . . . . . . . . . . . . . . . .5 Mining Dense Graph Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.1 Cliques and Quasi-Cliques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.2 K-Core and K-Truss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.3 Other Dense Subgraph Patterns . . . . . . . . . . . . . . . . . . . . . . . . . .6 Mining Graph Patterns in Streams . . . . . . . . . . . . . . . . . . . . . . . . . . . .7 Mining Graph Patterns in Uncertain Graphs . . . . . . . . . . . . . . . . . . . .8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36336Uncertain Frequent Pattern Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Carson Kai-Sang Leung1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 The Probabilistic Model for Mining Expected Support-BasedFrequent Patterns from Uncertain Data . . . . . . . . . . . . . . . . . . . . . . . .3 Candidate Generate-and-Test Based Uncertain Frequent PatternMining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4 Hyperlinked Structure-Based Uncertain Frequent Pattern Mining . .5 Tree-Based Uncertain Frequent Pattern Mining . . . . . . . . . . . . . . . . .5.1 UF-growth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.2 UFP-growth . . . . . . . . . . . . . . . . . .

Imielinski, and Swami. The earlier data mining conferences were often dominated by a large number of frequent pattern mining papers. This is one of the reasons that frequent pattern mining has a very special place in the data mining community. At this point, the field of frequent pattern mining is considered a mature one.

Related Documents:

Outlier Analysis Second Edition Charu C. Aggarwal IBM T. J. Watson Research Center Yorktown Heights, New York November 25, 2016 PDF Downloadable from http://rd .

1 200001 yuvansh mangla dr manoj kumar 80 10002 . 58 200058 sankalp agarwal samir agarwal 30 10112 59 200059 amaya jitender kathuria 70 10099 60 200060 ridhvi aggarwal manu aggarwal 80 10172 . 97 200097 nityam aggarwal nand kishore aggarwal 50 10490 98 200098 vivika gupta rohit gupta 60 10501

Step 5: Mine frequent items whose backings are not less than the Min-sup and push the frequent items into the frequent item-set L1. Step 6:Produce the frequent1-itemset L1and bitmap Pos. Algorithm2. Frequent k length sequence mining and (k 1) length sequence generation Input: Bitmap Pos for k length sequences, minimal support min-sup

CS2208A PRINCIPLES OF PROGRAMMING LANGUAGES L T P C 3 0 0 3 OBJECTIVES: . Charu C. Aggarwal, ―Social Network Data Analytics‖, Springer; 2014 4. Giles, Mark Smith, John Yen, ―Advances in Social Network Mining and Analysis‖, Springer, 2010. 5. Guandong Xu , Yanchun Zhang and Lin Li, ―Web Mining and Social Networking - Techniques

Graph Mining and Graph Kernels An Introduction to Graph Mining Graph Pattern Explosion Problem ! If a graph is frequent, all of its subgraphs are frequent the Apriori property! An n-edge frequent graph may have 2n subgraphs!! In the AIDS antiviral screen dataset with 400 compounds, at the su

Frequent Itemset Mining Frequent itemset mining: frequent set of items in a transaction data set First proposed by Agrawal, Imielinski, and Swami in SIGMOD 1993 SIGMOD Test of Time Award 2003 “This paper started a field of research. In addition to containing an innovative algorithm, its subject matter bro

DATA MINING What is data mining? [Fayyad 1996]: "Data mining is the application of specific algorithms for extracting patterns from data". [Han&Kamber 2006]: "data mining refers to extracting or mining knowledge from large amounts of data". [Zaki and Meira 2014]: "Data mining comprises the core algorithms that enable one to gain fundamental in

An Introduction to Modal Logic 2009 Formosan Summer School on Logic, Language, and Computation 29 June-10 July, 2009 ; 9 9 B . : The Agenda Introduction Basic Modal Logic Normal Systems of Modal Logic Meta-theorems of Normal Systems Variants of Modal Logic Conclusion ; 9 9 B . ; Introduction Let me tell you the story ; 9 9 B . Introduction Historical overview .