FROM GENE EXPRESSION TO MOLECULAR PATHWAYS

THESIS SUBMITTED FOR THE DEGREE OF “DOCTOR OF PHILOSOPHY”

BY

Dana Pe’er

SUBMITTED TO THE SENATE OF THE HEBREW UNIVERSITY

NOVEMBER 2003

This work was carried out under the supervision of Nir Friedman.

Abstract

Molecular networks involving interacting proteins, RNA, and DNA molecules underlie the major functions of living cells. DNA microarrays probe, at a genome-wide scale, how gene expression changes to perform complex coordinated tasks in adaptation to a changing environment. In this dissertation we address the challenge of reconstructing molecular pathways and gene regulation from gene expression data. Our goal is to automatically infer regulatory relations between genes, as well as other types of molecular interactions. To answer this challenge, we develop probabilistic graphical models of the biological system. We offer three such models, together with algorithms to learn them automatically from gene expression data. Our models and learning algorithms are based on the assumption that statistical correlation might indicate molecular or genetic interaction. We offer a systematic evaluation of each of the methods presented, culminating in experimental validation of novel predictions automatically generated by one of our models.

Acknowledgements

I would like to express my deepest gratitude to my mentor, Nir Friedman. Nir is a genuine role model, and while I have had many teachers, Nir’s mark is the most profound. Nir initiated me into the discipline of machine learning in graphical models and continuously taught me the most important scientific skills: how to dive deep into messy data and surface with simple models that address the question at hand, always striving to understand the connection between data, model and reality. Few have these skills and I was privileged to learn from a true master; I leave Nir with much yet to learn. Nir’s contribution to the research in this thesis is fundamental, from the basic idea of coupling Bayesian networks with gene networks to little comments that made my presentation so much clearer.

I have spent a total of ten terrific years as a student at the Hebrew University and this has been a significant chapter in my life. During this time, many teachers have molded me into the researcher I am today. I would like to thank Avi Wigderson for patiently teaching me the rigors of problem solving. Avi is a mental giant and I was most privileged to brainstorm with him and learn how he takes a hard problem apart into little bits he can understand. I would like to thank Shmuel Peleg for teaching me that research should first and foremost be fun. Shmuel taught me that if one does not find enjoyment and passion in the problem at hand, it is probably the wrong problem to be working on. I rarely left his office without a smile. I was especially fortunate to have an adopting “mother” and “father”, Daphna Weinshall and Noam Nisan, in the Computer Science department. While never my official mentors, they took me under their wing, providing guidance, many rewarding discussions and emotional support. In addition, I would like to thank Noam for bringing to my attention α modular functions and their connection to the MinReg algorithm.
I would like to thank Daphna for actively fighting to make my years at the university more comfortable, be it easing the prerequisites when I transferred from mathematics or easing my TA workload as a new mother.

Good science is always the joint effort of many people and the research in this thesis is no exception. This thesis could never have happened without Aviv Regev, my scientific partner, biology tutor and dearest friend. My research is the result of a close and synergistic collaboration with Aviv, working with whom is an absolute joy and pleasure. Aviv transformed me from a naïve computer scientist into a semi-biologist, teaching me so much more than biology along the way. In addition to sharing her wisdom and many unique insights, Aviv gave me endless support and backing. During the toughest and lowest points, Aviv was always there to stop me from quitting by infecting me with her energetic enthusiasm and leading me to believe in myself. There are simply no words to express my gratitude to her.

I would like to give many thanks to all my co-authors on the works presented here: Michal Linial, the first biologist who dared believe our ideas might have merit; Iftach Nachman, who shared with me the first steps of this research; and Gal Elidan, who brought order and efficiency to the chaos in which I used to work. It was a wonderful pleasure to work with Amos Tanay on our ‘underground’ MinReg project, and the speed with which he programmed some of our ideas never ceased to surprise me. Amos has great scientific vision and I cherish the many hours we spent brainstorming over coffee. My intense collaboration with Eran Segal has been very fruitful and led to great science. Eran, I very much admire your ability and stamina. I feel very privileged to have worked with Daphne Koller, a brilliant scientist; I learned much from our many insightful discussions.

Lots of thanks to all my lab mates at the Computational Biology group and the Machine Learning group at the Hebrew University. It was marvelous to belong to a group with such great academic cooperation and social atmosphere: full of seminars, reading groups, or just hallway discussions; beach parties, dinners, and hiking trips. Specifically, I would like to thank my office mate Matan Ninio, who fed me well, almost as often as he distracted me. Matan was always helpful, from the countless times he aided me with system-related issues to the laborious work of printing this thesis and submitting it for me.

I would also like to thank the many people who gave me the support and technical backing so I could focus on my research. I thank the Ministry of Science, Israel, for the Eshkol fellowship awarded to me and the Higher Education Council, Israel, for additional financing. I thank the System group at the Computer Science Department for consistently providing the best and most reliable computer support possible. I thank the administrative staff at the Computer Science department for all their help and support, shielding me from the bureaucratic jungle that lay beyond our department. I would also like to thank Laura Garwin. Sometimes help comes unexpectedly: when my laptop crashed at critical stages of writing this thesis, Laura (at the time a stranger), out of pure kindness and generosity, lent me her personal laptop and hosted me in a wonderful office at the Bauer Center for Genomic Research.

During the course of my PhD studies, the two most important events of my life occurred: the births of Inbar and Carmel.
I would like to thank my two most beloved daughters for distracting me and granting me joy and happiness of a magnitude I never knew before. I apologize to them; it is Inbar and Carmel who have paid the heaviest price for this thesis, during the endless hours I worked away from them. I hope you understand and forgive. I thank Rocha, my mother-in-law, for the countless hours she took care of the girls, giving me more time to work. While Rocha was with my girls, I could peacefully work, knowing they were getting the best of love and care. I thank my brother Michael for caring so much and for his constant reminders that there is so much more to life than research. I would like to thank Bat-Sheva for being available at any hour of the day or night for a relaxing walk and an opportunity to wind down.

I am grateful to both my parents, Mara and Aaron, for being such wonderful and supportive parents. They nurtured my curiosity, creativity and passion for understanding from the earliest age. I started my studies in the Mathematics department at the Hebrew University, where both my parents met, and the completion of this thesis gives me a great feeling of fulfillment. Dad, thank you for attempting to teach me Cantor’s diagonal proof from preschool (that was a wee bit early), for carefully correcting the English of this entire thesis, and for everything in between. Mom, you have been and will always be my role model; you are my very inspiration to excel, and I aspire to be like you.

Last, I dedicate this thesis to my better half, Itsik. I am endlessly indebted and grateful to Itsik for everything. My love, thank you for helping me in all aspects of my research. I thoroughly enjoyed our scientific discussions that occurred at all times of the day and in all forms of dress. Many of your comments have been invaluable to my work. Thank you for your help with all my manuscripts, including this one. Thank you for your unconditional love in my worst moments and for being a strong pillar of support in my most desperate moments. Thank you for making my victories more memorable by sharing them with you; this victory could never have happened without all your encouragement and help.

Contents

1 Introduction
  1.1 Biological Background
  1.2 Microarrays
  1.3 Previous work
  1.4 Our approach
  1.5 Road Map
2 Bayesian Networks Primer
  2.1 Model Semantics
  2.2 The Graph structure: Independence, Dependence and Causality
    2.2.1 d-separation
    2.2.2 Equivalence Classes
    2.2.3 Causality
  2.3 Learning Bayesian Networks
    2.3.1 Parameter Estimation
    2.3.2 Structure Learning
3 Bayesian Network Models for Biological Interactions
  3.1 Overview
  3.2 Illustrative Example
  3.3 Extracting Features
  3.4 In Silico Experiment
  3.5 Biological Features
    3.5.1 Gene Mates
    3.5.2 Separators
    3.5.3 Hubs
  3.6 Subnetworks
    3.6.1 Constructing Subnetworks
    3.6.2 Biological Subnetworks
  3.7 Systematic Evaluation
    3.7.1 Statistical Robustness
    3.7.2 Comparison to Literature
    3.7.3 Comparison to Other Methods
  3.8 Discussion
4 Computational Methods for Learning Bayesian Networks
  4.1 The “Sparse Candidate” Algorithm
    4.1.1 Outline of Algorithm
    4.1.2 Choosing Candidate Sets
    4.1.3 Learning with Small Candidate Sets
    4.1.4 Empirical Results
  4.2 Modeling Mutations
    4.2.1 Modeling an Intervention
    4.2.2 Scoring with Mutations
    4.2.3 Inferring causality with mutational data
    4.2.4 Empirical results
    4.2.5 Discussion
5 Focusing on Regulation - MinReg
  5.1 A Regulation Graph
  5.2 Learning
    5.2.1 Optimization Problem
    5.2.2 MinReg Algorithm
  5.3 Technical Details
    5.3.1 Performance Guarantee
    5.3.2 MinReg Implementation
  5.4 Annotating Regulators
  5.5 Biological Results
  5.6 Systematic Evaluation
    5.6.1 Robustness and Cross Validation
    5.6.2 The Importance of Candidate Regulators
    5.6.3 Simulated Data
  5.7 Discussion
6 Module Networks - Reconstructing Regulatory Modules
  6.1 From Bayesian Network to Module Network
  6.2 From Module Network to Regulatory Module
    6.2.1 Algorithmic Overview
  6.3 Biological Results
    6.3.1 Selected Modules
    6.3.2 Global View
    6.3.3 Experimental Validation
  6.4 Definition and Scoring
    6.4.1 Formal Definition
    6.4.2 Bayesian Scoring
    6.4.3 Likelihood Function
    6.4.4 Priors and the Bayesian Score
  6.5 Learning Algorithm
    6.5.1 Structure Search Step
    6.5.2 Module Assignment Search Step
    6.5.3 Algorithm Summary
    6.5.4 Learning with Regression Trees
  6.6 Systematic Evaluation
    6.6.1 Cross Validation
    6.6.2 Gene Assignments
  6.7 Discussion
7 Discussion
  7.1 Summary
  7.2 Comparing the Methods
  7.3 From Gene Expression to Transcriptional Regulation
  7.4 Future Prospects
Bibliography

Chapter 1

Introduction

Molecular networks involving interacting proteins, RNA, and DNA molecules underlie the major functions of living cells. Different metabolic, signaling and transcriptional levels are integrated to maintain a working cell. Deciphering the organization of molecular networks, their function and behavior under different conditions is a major goal of molecular cell biology. The availability of complete genomic sequences, combined with robotics, computing and material sciences, has led to the development of high-throughput assays that probe cells at a new, genome-wide scale. For instance, DNA microarrays [55, 90] can measure the mRNA levels of an entire genome in a single experiment. A major promise of such high-throughput methods is that they will enable us to reconstruct how tens of thousands of genes and proteins work together in interconnected networks to orchestrate the basic functions of life.

In this dissertation we address the challenge of reconstructing molecular pathways and gene regulation from gene expression data. Our goal is to automatically infer regulatory relations between genes, as well as other types of molecular interactions. To answer this challenge, we develop probabilistic models of the biological system. A model is a simplification of the underlying system that captures the primary phenomena we are interested in and explains how these lead to the observations we make through our assays. We focus on probabilistic models that use stochasticity to account for measurement noise, variability in the biological system, and aspects of the system that are not captured by the model.
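To make the notion of a probabilistic model concrete, here is a minimal sketch, assuming a single hypothetical regulator-target pair whose log-expression levels follow a linear-Gaussian relationship. The names `log_prob_state`, `INTERCEPT`, `SLOPE` and `SIGMA` and their values are illustrative choices, not parameters taken from this thesis; the Gaussian noise term stands in for measurement noise, biological variability and everything the model leaves out.

```python
import math


def gaussian_logpdf(x: float, mean: float, sd: float) -> float:
    """Log-density of a normal distribution N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


# Hypothetical parameters: the target's expected log-expression rises
# linearly with the regulator's; SIGMA lumps together measurement noise,
# biological variability and any aspect the model does not capture.
INTERCEPT, SLOPE, SIGMA = 0.0, 1.5, 0.4


def log_prob_state(regulator: float, target: float) -> float:
    """Joint log-density of a (regulator, target) state.

    The regulator is modeled as N(0, 1); the target, given the regulator,
    as N(INTERCEPT + SLOPE * regulator, SIGMA^2).
    """
    return (gaussian_logpdf(regulator, 0.0, 1.0)
            + gaussian_logpdf(target, INTERCEPT + SLOPE * regulator, SIGMA))


consistent = log_prob_state(1.0, 1.4)     # target high when regulator is high
inconsistent = log_prob_state(1.0, -1.4)  # target low despite a high regulator
print(consistent > inconsistent)  # prints True
```

Because the noise is explicit, such a model rules out no state; it only ranks states by probability, so a target that moves with its regulator is far more probable than one that moves against it.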
In this thesis we formulate a number of such models, develop algorithms to learn the structure of these models from data, and provide a systematic biological analysis of the resulting models.

1.1 Biological Background

We begin with a brief overview of the basic concepts of molecular biology; the interested reader is referred to molecular biology textbooks [2] for more information. Cells are the fundamental working units of every living system. To a large extent, cells are made of proteins, which determine the shape and structure of the cell. In addition, other proteins serve as machines that perform many of life’s functions, including molecular recognition and catalysis.

Figure 1.1: The central dogma of molecular biology.

DNA is the organism’s blueprint: it contains the instructions for the synthesis and regulation of proteins. The instructions for a particular protein are coded on a segment of DNA called a gene. The central dogma of molecular biology states that information flows from DNA through RNA to protein (see Figure 1.1). Thus, protein is synthesized from DNA in the following two-step process:

1. DNA → RNA: Transcription is the process by which RNA polymerase copies a gene onto an mRNA (messenger RNA) sequence, using the DNA sequence as a template. The process by which genes are transcribed into mRNA that is present and operating in the cell is termed gene expression.
2. RNA → Protein: In the subsequent process, called translation, a protein factory called the ribosome synthesizes the protein according to the information coded in the mRNA.

A key observation is that while each cell contains the same copy of the organism’s DNA, gene expression (and subsequently protein expression) can vary drastically, both temporally and spatially. To control gene expression, specialized proteins called transcription factors bind to the DNA and either enhance or inhibit the transcription of specific genes. These transcription factors often work together in different combinations to ensure that the correct amount of each gene is transcribed. We note that transcription factors are themselves proteins and are thus subject to transcriptional control.

Transcription factors are by no means the only control over gene expression. Biological regulation is extremely diverse and involves different mechanisms at many layers. Before transcription occurs, proteins regulate the structure of the DNA itself and determine whether a transcription factor can bind to the gene-specific regulatory sites. Once the mRNA molecule is transcribed, other mechanisms regulate its editing and transport to the ribosome, thus controlling whether it gets translated into protein. For a given gene, the total amount of mRNA is regulated not only by the transcription (creation) of mRNA, but also by the degradation of mRNA. Regulation continues even after the protein is translated: a large part of biological regulation is via post-translational modifications that determine a protein’s activity.

1.2 Microarrays

In recent years, technical breakthroughs in spotting hybridization probes and advances in genome sequencing led to the development of DNA microarrays, which consist of many species of probes, either oligonucleotides or cDNA, immobilized in a predefined organization on a solid surface. Using DNA microarrays, researchers are now able to measure the abundance of thousands of mRNA targets simultaneously [26, 64], providing a “genomic” viewpoint of gene expression.

Microarray technology is based on DNA hybridization: a process in which a DNA strand binds to its unique complementary strand. A set of probes (known sequences) is fixed to a surface and placed in interaction with a set of fluorescently tagged targets (unknown sequences). After hybridization, the fluorescently lit spots indicate the identity of the targets, and the intensity of the fluorescence signal correlates with the quantity of each target. Due to different hybridization affinities between clones and the fact that an unknown amount of cDNA is fixed for each probe, we cannot directly associate the hybridization level with a quantitative amount of transcript.
Instead, cDNA microarray experiments compare a reference pool and a target pool. Typically, green is used to label the reference pool, representing the baseline level of expression, and red is used to label the target sample, in which the cells were treated with some condition of interest. We hybridize the mixture of reference and target pools and read a green signal when our condition reduces the expression level and a red signal when our condition increases it (see Figure 1.2).

A genome-wide measurement of transcription is called an expression profile and provides us with a complete list of genes whose transcription level is affected in our condition. Biologically speaking, what we measure is how the expression of each gene changes to perform complex coordinated tasks in adaptation to a changing environment. In our context, while transcriptional regulation directly changes the measured mRNA levels, other factors, such as proteins and their activity, are not observed by microarrays. Furthermore, due to biological variation and a multi-step experimental protocol, these data are very noisy and can fluctuate up to two-fold between repeated experiments.

In order to obtain a wide variety of profiles, reflecting different active pathways, various perturbations (e.g., mutations [51]) and treatments (e.g., heat shock [40]) are employed. The outcome is a matrix that associates an expression level with each gene (row) and condition (column). In our setting, this expression matrix contains thousands of genes and hundreds of conditions. Our goal is to uncover molecular interactions, most notably regulation, from these data.

Figure 1.2: An image of a microarray. Each spot represents a different gene. (Diagram labels: Reference DNA, Target DNA, Label, Hybridize.)

1.3 Previous work

The first attempts to analyze these data identified a list of differentially expressed genes for each condition or treatment. Since current technology is very noisy and typical datasets contain only 2-5 repeats of each condition, even this simple task is not trivial. Early works [49] defined differential expression as a two-fold or greater change in expression. Developing statistically robust tests to determine which genes are differentially expressed remains an active area of research.

Currently, the most popular analysis method is clustering. Clustering of the genes is used to identify sets of genes that behave similarly (i.e., have similar expression patterns) over a set of experiments [3, 30] (see Figure 1.3). Clustering provides an intuitive way to organize and visualize the data. Furthermore, clustering facilitates the functional annotation of uncharacterized genes: if an uncharacterized gene belongs to a cluster dominated by genes of some function, the unknown gene could possibly have a similar function. While clustering has successfully expanded our understanding of important biological processes (including the cell cycle [30], cancer [3], and metabolism [51]), it does not address our challenge of uncovering the underlying gene network of interactions.

Previously, a number of regulatory models have been suggested. The most realistic of such models are stochastic networks [68]. While these directly model many of the actual details of the regulatory machinery, they are extremely complex and can only deal with small-scale networks. For more global applications, simplified and abstract models are required.
A few such models have been suggested, all based on the following basic idea: the regulatory network is a directed graph G. Each node in G corresponds to a specific gene that behaves according to some deterministic function of its parents in G. These include Boolean network models [89, 1], where each gene is either on or off depending on some Boolean function of its parents, and linear models [99, 27], where each gene is modeled as a continuous linear function of its parents. In order to simplify the complexity of such models, it is typically assumed that G is acyclic and of bounded in-degree. While these methods have had partial success on simulated data, none of them has had any success when applied to real biological data.

Figure 1.3: Clustering gene expression data. Each row corresponds to a gene and each column corresponds to a microarray sample, i.e., all the spots on the microarray in Figure 1.2 appear as a column in this figure. To the left is the unclustered input matrix; to the right is the matrix after clustering reordered the rows and columns. (Axis labels: Genes, Experiments.)

1.4 Our approach

Recall that our goal is to reconstruct molecular networks representing processes such as gene regulation. To answer this challenge, we adopt a systems perspective of the cell and its components, and attempt to build models of this system. Our measurements observe the system at different states, which can be defined in terms of the concentration of active proteins and metabolites in the various compartments, the concentration of different mRNA molecules in the cytoplasm, etc. Our basic assumption is that the components in the cell do not work in isolation. Rather, they affect each other through a wide variety of interactions. The key point is that the components affect each other in a consistent fashion. Thus, if we consider a random sampling of the system, some states are more probable than others. For example, Gal4 is a transcription factor which strongly activates the galactose pathway genes; therefore, if Gal4 is overexpressed in some state, it is likely that other galactose pathway genes are also overexpressed.

We treat measurements of the cell’s components (e.g., gene expression measurements) as random variables, and thus the likelihood of a cell state can be specified by the joint probability distribution over these variables. Representing measurements as random variables allows us to account for measurement noise, variability in the biological system, and aspects of the system that are not captured by the model.

In this dissertation, due to issues of data availability, we only observe the level of mRNA expression for each of the genes. Therefore, we resort to a partial view which projects the activity of the entire cell onto gene expression profiles. In our model, each gene is associated with a random variable that represents the measurement of its expression. We use the term genes interchangeably to represent both the biological genes and the random variables that represent them in our model. We stress that the basic approach described here for gene expression data can be easily extended to other data types (e.g., protein levels) as these become available. For example, when more direct measurements of transcription factor activity become available, these can be easily incorporated as random variables in the model and can greatly enhance the resulting reconstruction.

Our goal is to estimate the joint probability distribution over gene expression and understand its structural features from data. Our reconstruction of pathway structure is based on the following idea: molecular interactions between the genes sometimes generate corresponding statistical dependencies between the random variables that represent them. Using Gal4 as an example: Gal4 activates the transcription of other galactose genes, thus creating a correlation in their expression. The learning algorithms presented in this dissertation detect consistent statistical dependencies and reconstruct a model that explains them, i.e., a model that could have generated the observed data. Our approach is global: we fit a model to data by studying the joint probability distribution over the entire gene set. Once we define such a model, its interpretation is as important an issue as the learning algorithm.
An important question that will be repeatedly addressed throughout the dissertation is: what types of molecular relations create statistical dependencies in gene expression profiles?

A large part of this dissertation focuses on regulatory relations. Our ability to detect regulatory relations relies on the assumption that the
