A DATA MINING-BASED APPROACH FOR INVESTIGATING THE RELATIONSHIP BETWEEN DNA REPAIR GENES AND AGEING

Thesis submitted in accordance with the requirements of the University of Liverpool for the degree of Master in Philosophy

by

Alex A. Freitas

January 2011

ABSTRACT

There is a clear motivation for ageing research, since ageing is the greatest risk factor for many diseases, including most types of cancer. Arguably, another strong motivation for ageing research is that, despite the considerable progress in this area over the last two decades, ageing is still, to a large extent, a poorly understood process, especially in humans.

The vast majority of biogerontology research is still based on “wet lab” experiments done with simpler organisms, owing to the problems associated with performing ageing-related experiments with humans. In contrast, this thesis proposes a data mining approach, based on classification algorithms, for analysing data about human DNA repair genes and their relationship to ageing. The classification algorithms (more precisely, decision tree induction and Naive Bayes algorithms) were applied to datasets prepared specifically for this research by adapting and integrating data from several bioinformatics resources, namely: (a) the GenAge database of ageing-related genes; (b) a web site with a comprehensive list of human DNA repair genes; (c) Uniprot, a centralized repository of richly annotated data about proteins; (d) the HPRD (Human Protein Reference Database); and (e) the Gene Ontology, a controlled vocabulary for describing gene or protein functions. Some experiments also used a separate dataset including gene expression data.

The aim of applying classification algorithms to these datasets was to produce classification models that identify which gene properties are most effective in discriminating ageing-related DNA repair genes from other types of genes. These other genes were mainly non-ageing-related DNA repair genes, but in some experiments they also included genes whose protein products interact with DNA repair genes.
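Decision tree induction algorithms, such as those named above, grow a tree by repeatedly choosing the attribute that best separates the classes, and the attribute chosen at the root is often read as the most discriminating gene property. The sketch below illustrates that idea with information gain, the splitting criterion of ID3. It is a simplification: J4.8 (Weka's implementation of C4.5) uses the gain ratio and CART uses the Gini index. The attribute names echo gene properties mentioned in this abstract but are hypothetical, and the toy values are invented for illustration, not taken from the thesis datasets.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Drop in class entropy obtained by partitioning the rows on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def best_root_attribute(rows, labels):
    """Return (attribute, gain) for the attribute with the highest information gain."""
    gains = {attr: information_gain(rows, labels, attr) for attr in rows[0]}
    best = max(gains, key=gains.get)
    return best, gains[best]


# Hypothetical binary attributes (1 = property present, 0 = absent), named after
# properties mentioned in the abstract. The toy values are invented for
# illustration and are NOT data from the thesis.
ATTRIBUTES = ["interacts_with_WRN", "interacts_with_XRCC5", "response_to_chemical_stimulus"]
toy_values = [
    (1, 1, 1), (1, 0, 1), (1, 1, 0), (0, 1, 1),  # four genes labelled "ageing"
    (0, 0, 1), (0, 0, 0), (0, 1, 0), (0, 0, 0),  # four genes labelled "non_ageing"
]
rows = [dict(zip(ATTRIBUTES, values)) for values in toy_values]
labels = ["ageing"] * 4 + ["non_ageing"] * 4

best, gain = best_root_attribute(rows, labels)
print(f"root attribute: {best} (information gain {gain:.3f} bits)")
```

On these toy rows, every gene with `interacts_with_WRN = 1` is in the "ageing" class, so that attribute yields the largest entropy reduction and would be placed at the root; the other two attributes split each class 3:1 and gain much less.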
A related goal of this research was to analyse the automatically built classification models from two perspectives, namely: (a) measuring the predictive accuracy (or “generalization ability”) of those models from a data mining perspective; and (b) interpreting the meaning of the main gene properties relevant for classification in those models, in the light of biological knowledge about DNA repair genes and the process of ageing.

In summary, the main gene properties that were found effective in discriminating ageing-related DNA repair genes from other types of genes (mainly non-ageing-related DNA repair genes) in the datasets created in this research are as follows. The protein products of ageing-related DNA repair genes tend to interact with a considerably larger number of proteins, and they are much more likely to interact with WRN (a protein whose defect causes Werner's progeroid syndrome) and XRCC5 (KU80, a key protein in the initiation of DNA double-strand break repair by the error-prone non-homologous end joining pathway). Ageing-related DNA repair genes are also more likely to be involved in response to chemical stimulus and, to a lesser extent, in response to endogenous stimulus or oxidative stress, and they are more likely to have high expression in T lymphocytes.
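Predictive accuracy, the first of the two perspectives above, has to be estimated on genes that were not used to build the model. The thesis describes its own procedure in Section 3.4 (Measuring predictive accuracy); the sketch below merely assumes stratified k-fold cross-validation, a common choice, for illustration. J4.8, CART and Naive Bayes are not reimplemented here: a hypothetical one-level decision stump (`fit_stump`) stands in for the classifier, and `stratified_folds`, `cross_validated_accuracy`, the attribute name and the toy data are all invented.

```python
from collections import Counter


def stratified_folds(labels, k):
    """Split instance indices into k folds, dealing each class round-robin
    so that every fold keeps roughly the overall class proportions."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds


def cross_validated_accuracy(rows, labels, fit, k=5):
    """Estimate predictive accuracy: each fold is used once as the test set
    while a model is trained on the remaining k-1 folds."""
    correct = 0
    for test_indices in stratified_folds(labels, k):
        held_out = set(test_indices)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        predict = fit(train_rows, train_labels)
        correct += sum(predict(rows[i]) == labels[i] for i in test_indices)
    # Every instance is tested exactly once, so divide by the dataset size.
    return correct / len(labels)


def fit_stump(attribute):
    """Hypothetical stand-in classifier: a one-level decision stump that predicts
    the majority training class among genes sharing the same attribute value."""
    def fit(train_rows, train_labels):
        fallback = Counter(train_labels).most_common(1)[0][0]
        groups = {}
        for row, label in zip(train_rows, train_labels):
            groups.setdefault(row[attribute], []).append(label)
        majority = {v: Counter(ls).most_common(1)[0][0] for v, ls in groups.items()}
        return lambda row: majority.get(row[attribute], fallback)
    return fit


# Invented toy data (NOT from the thesis): one noisy gene in each class.
wrn_values = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
rows = [{"interacts_with_WRN": v} for v in wrn_values]
labels = ["ageing"] * 5 + ["non_ageing"] * 5

accuracy = cross_validated_accuracy(rows, labels, fit_stump("interacts_with_WRN"), k=5)
print(f"5-fold cross-validated accuracy: {accuracy:.2f}")
```

Stratification puts one gene of each class in every fold. The two noisy genes end up in the same test fold and are both misclassified, so the estimate is 8/10 = 0.8, which is lower than the stump's accuracy on its own training data.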

CONTENTS

ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGMENTS
DECLARATION
CHAPTER 1 – INTRODUCTION
1.1 WHAT IS AGEING?
1.1.1 Defining ageing
1.1.2 Ageing at the cellular and tissue levels
1.1.3 The motivation for ageing research
1.2 THEORIES OF AGEING
1.2.1 Evolutionary theories of ageing
1.2.2 DNA damage theory of ageing
1.3 PROGEROID SYNDROMES
1.3.1 An overview of progeroid syndromes
1.3.1.1 Werner syndrome (WS)
1.3.1.2 Hutchinson-Gilford progeroid syndrome (HGPS)
1.3.1.3 Trichothiodystrophy (TTD)
1.3.1.4 Cockayne syndrome (CS)
1.3.1.5 Ataxia telangiectasia (AT)
1.3.1.6 Rothmund-Thomson (RT) syndrome
1.3.1.7 Xeroderma pigmentosum (XP)
1.3.2 On the relevance of progeroid syndromes to the study of human ageing
1.4 DNA DAMAGE
1.4.1 Two major sources of DNA damage
1.4.1.1 Oxidative damage
1.4.1.2 Damage induced by ultraviolet (UV) radiation
1.4.2 An overview of major types of DNA damage
1.4.2.1 Depurination and depyrimidination
1.4.2.2 Deamination
1.4.2.3 Abasic (AP) sites
1.4.2.4 DNA strand breaks
1.4.2.5 Cyclobutane pyrimidine dimers (CPDs)

1.5 DNA REPAIR
1.5.1 Base excision repair (BER)
1.5.2 Nucleotide excision repair (NER)
1.5.3 Repair of double-strand breaks
1.5.3.1 Homologous recombination (HR)
1.5.3.2 Non-homologous end joining (NHEJ)
1.5.4 Mismatch repair
1.6 OBJECTIVES
CHAPTER 2 – BIOINFORMATICS AND DATA MINING
2.1 BIOLOGICAL DATABASES
2.1.1 GenAge
2.1.2 Other ageing-related databases
2.1.3 Uniprot
2.1.4 HPRD (Human Protein Reference Database)
2.2 GENE ONTOLOGY (GO)
2.2.1 The motivation for the gene ontology
2.2.2 The basic structure of the gene ontology
2.3 ANALYSING AGEING-RELATED GENE OR PROTEIN NETWORKS
2.3.1 Types of interactions and reference organisms in ageing-related networks
2.3.2 Analysing ageing-related gene or protein networks
2.4 CONCEPTS AND PRINCIPLES OF DATA MINING
2.4.1 Basic concepts of data mining
2.4.2 The classification task of data mining
2.4.2.1 Overfitting and underfitting
2.4.2.2 Classification versus clustering
2.5 CLASSIFICATION METHODS USED IN THIS RESEARCH
2.5.1 Decision tree induction
2.5.2 Naive Bayes
2.6 RELATED WORK ON PREDICTING PROTEIN FUNCTION WITH CLASSIFICATION METHODS
CHAPTER 3 – DATASET CREATION AND EXPERIMENTAL SET UP
3.1 CREATING DATASETS WITH TWO CLASSES AND MULTIPLE ATTRIBUTE TYPES
3.1.1 Creating two classes: ageing-related vs. non-ageing-related DNA repair
3.1.2 Creating the predictor attribute type of DNA repair
3.1.3 Creating a predictor attribute measuring the rate of evolutionary change (Ka/Ki ratio)
3.1.4 Creating a set of predictor attributes representing GO terms
3.1.5 Creating a set of attributes representing protein-protein interaction information
3.1.6 Removing duplicate data instances
3.1.7 Dataset specifications
3.2 CREATING A DATASET WITH TWO CLASSES AND GENE EXPRESSION ATTRIBUTES
3.3 CREATING DATASETS WITH FOUR CLASSES AND MULTIPLE ATTRIBUTE TYPES

3.3.1 Creating the four classes to be predicted
3.3.2 Creating the predictor attributes
3.3.3 Dataset specifications
3.4 MEASURING PREDICTIVE ACCURACY
3.5 STATISTICAL SIGNIFICANCE
CHAPTER 4 – COMPUTATIONAL RESULTS AND DISCUSSION
4.1 RESULTS AND DISCUSSION FOR DATASETS WITH TWO CLASSES AND MULTIPLE ATTRIBUTE TYPES
4.1.1 Results for the J4.8 decision tree induction algorithm
4.1.2 Results for the CART decision tree induction algorithm
4.1.3 Results for the Naive Bayes algorithm
4.1.4 Discussion on predictive patterns extracted from the decision trees
4.1.4.1 Discussion on attributes chosen as root nodes in the decision trees
4.1.4.2 Issues on selecting and interpreting rules extracted from decision trees
4.1.4.3 Discussion on selected rules extracted from decision trees
4.2 RESULTS AND DISCUSSION FOR DATASETS WITH TWO CLASSES AND GENE EXPRESSION ATTRIBUTES
4.2.1 Predictive accuracies for J4.8, CART and Naive Bayes algorithms
4.2.2 Interpreting a rule extracted from the decision tree built by J4.8
4.2.3 Integrating results for gene expression and other types of predictor attributes
4.3 RESULTS AND DISCUSSION FOR DATASETS WITH FOUR CLASSES AND MULTIPLE ATTRIBUTE TYPES
