Enriching For Direct Regulatory Targets In Perturbed Gene .


Open Access
Method
Genome Biology 2004, Volume 5, Issue 4, Article R29

Enriching for direct regulatory targets in perturbed gene-expression profiles
Susannah G Tringe*‡, Andreas Wagner† and Stephanie W Ruby*

Addresses: *Department of Molecular Genetics and Microbiology, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA. †Department of Biology, University of New Mexico, Albuquerque, NM 87131, USA. ‡Current address: DOE Joint Genome Institute, 2800 Mitchell Drive, Bldg 400, Walnut Creek, CA 94596, USA.

Correspondence: Stephanie W Ruby. E-mail: sruby@unm.edu

Published: 30 March 2004
Received: 27 November 2003
Revised: 29 January 2004
Accepted: 12 February 2004

Genome Biology 2004, 5:R29
The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2004/5/4/R29

© 2004 Tringe et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.

Abstract
Here we build on a previously proposed algorithm to infer direct regulatory relationships using gene-expression profiles from cells in which individual genes are deleted or overexpressed. The updated algorithm can process networks containing feedback loops, incorporate positive and negative regulatory relationships during network reconstruction, and utilize data from double mutants to resolve ambiguous regulatory relationships. When applied to experimental data the reconstruction procedure preferentially retains direct transcription factor-target relationships.

Background
Gene-expression studies, using cDNA or oligonucleotide arrays, hold promise for elucidating the structure of genetic regulatory networks. A wealth of computational techniques have been proposed for extracting regulatory relationships from these data, many of which rely on correlated expression patterns to identify temporally co-regulated genes (reviewed in [1,2]). While these methods often detect important patterns, they cannot definitively identify the targets of transcriptional regulators.

Another approach to identifying regulatory targets involves perturbing gene activity by deleting or overexpressing a transcription factor, and analyzing the effects on the gene-expression profile. However, transcripts affected in such experiments include those of both direct and indirect targets of the perturbed gene, and in some cases the latter may dominate. Various methods have been used to identify the direct targets among the affected genes, including promoter sequence examination and/or genome-wide location analysis [3,4]. In an earlier article, one of us proposed pooling data from a complete set of single-mutant gene-expression profiles to reconstruct a tentative network, then enriching for direct targets by paring the network down to the simplest acyclic directed graph (digraph) consistent with the available data [5].

Acyclic networks, by definition, lack feedback pathways through which genes can regulate their own activity. As feedback pathways are known to exist in regulatory networks, the previously proposed algorithm also included a procedure by which it could be applied to any network, even one with cycles. This procedure transforms the network into an equivalent acyclic digraph, called a condensation, before reconstruction. The algorithm thus bypasses the cyclic components and reconstructs the acyclic portion of the network. The structure of the feedback pathways themselves, however, cannot be determined from steady-state single-mutant data [5].

To improve the ability of our algorithm to reconstruct all types of regulatory pathway, we drew from the traditional genetic approach of epistasis analysis. 'Epistasis' describes a phenomenon in which an allele of one gene can influence the phenotypic expression of an allele of another gene [6]. For example, an altered allele of a downstream gene in a biological pathway may block the effects of a mutation further upstream, thereby changing the outcome of a biological process. Such epistatic relationships can therefore be used to determine the order of gene function, or ascertain that two gene products act in parallel, independent pathways [7,8]. In epistasis analysis, genes involved in the process of interest are systematically perturbed; phenotypes of double mutants, with two genes perturbed, are compared to those of single mutants with only one perturbed gene. If the phenotype of a double mutant is different from either of the related single mutants, the two genes are presumed to act independently of each other. However, if the double mutant resembles one or the other single mutant, the genes are likely to participate in an ordered pathway and the gene whose mutant phenotype dominates is placed downstream of the other. This type of analysis has proved highly informative in the study of genetic, metabolic and signaling networks, suggesting that the inclusion of double-mutant data in genetic network analysis could greatly improve the accuracy of the network reconstruction.

Here we extend the capabilities of a genetic network reconstruction algorithm [5] to improve its performance and broaden its applicability. First, we implement a preprocessing step to accommodate feedback loops. Second, we modify the algorithm to consider positive and negative regulatory relationships when generating the reconstruction. Third, we utilize data from double mutants to resolve cyclical structures and to identify nontranscriptional or redundant regulatory relationships. The performance of these modified versions of the algorithm is then tested in multiple ways.
We use synthetic networks to assess the ability of the cycle-accommodating algorithm to tolerate incomplete or noisy data, and to examine the potential improvement achieved by the incorporation of double-mutant data. Finally, we test the improved algorithm on published expression data from the budding yeast Saccharomyces cerevisiae and compare our results with transcription factor binding profiles.

Results
Graph theoretical framework
In this work we represent the genetic regulatory network as a directed graph or digraph, G, and all discussion of graphs here refers to digraphs. A digraph consists of nodes, which in this case correspond to genes, and directed edges, which in our model point from regulator to target. A graph can be represented by a diagram (Figure 1a) of nodes and edges. An alternative representation that fully defines the graph is the adjacency list, Adj(G), in which each node is listed along with the nodes to which it is connected by a directed edge (Figure 1b). In the context of a genetic regulatory network, the adjacency list of a gene includes all the genes it directly influences: for example, the genes whose promoters are bound by a transcription factor. The accessibility list, Acc(G), of the digraph lists each node along with all nodes that can be reached along a directed path of any length from that node (Figure 1c). For a genetic regulatory network, the accessibility list includes all genes whose transcription can be influenced by a gene, directly or indirectly. (For a more thorough discussion of digraphs, see [9].)

The genes whose transcript levels change when a gene is deleted or perturbed constitute an accessibility list for that gene. Reconstructing a genetic regulatory network from gene-expression data, therefore, is equivalent to determining an adjacency list based on an accessibility list [5]. Because an accessibility list does not define a unique graph, the algorithm seeks the minimum equivalent (or most parsimonious) graph, in which the number of edges is minimized.
A unique most parsimonious graph, which provides a core set of edges that are present in all graphs sharing the accessibility list, exists by definition for an acyclic graph [5,10]. The algorithm obtains the simplest network that can explain the observations by a process that initially connects each perturbed gene to all genes affected by its perturbation, then prunes away edges, called shortcuts, which connect one node with another node already accessible via a directed path.

Cycles present a special problem for the reconstruction algorithm in that a graph with cycles does not possess a unique minimum equivalent graph. A cycle is a closed path in a digraph that begins and ends on the same node and crosses at least one other node (for example, Figure 1a, nodes 7, 8, and 9). It is impossible to reconstruct the edges in a cycle on the basis of single-mutant data: all genes in a cycle have identical accessibility lists, so they are effectively equivalent. Such a group of nodes, in which each node can be reached from every other node, is called a strong component. A multinode strong component may contain one or many cycles, whereas each node not contained within a cycle is a strong component unto itself. Every graph has an equivalent acyclic graph [10], or condensation [9], in which each strong component is represented by a single node (Figure 1d). By mapping the tentative network onto this acyclic equivalent, the algorithm circumvents the problem of cycles [10]. This mapping is achieved by examining each perturbed gene and scanning its accessibility list for any reciprocally regulating genes, which are then assigned to the same component.

Extensions of the algorithm
While the previous paper [5] presented a basic procedure for reconstructing a network, several factors limit its applicability to gene-expression data. Here we address these shortcomings with a number of extensions.
These modifications accommodate cycles in such a way that the error tolerance of the algorithm can be assessed, they distinguish between positive and negative regulation, and they incorporate information from double-mutant gene-expression profiles into the final reconstruction.
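The graph-theoretical terms used above (adjacency list, accessibility list, strong component, condensation) can be made concrete with a minimal Python sketch. The function names and the five-gene toy network are hypothetical illustrations, not the network of Figure 1 and not the authors' implementation. Strong components are found as the text describes, by scanning each gene's accessibility list for reciprocally accessible genes.

```python
from typing import Dict, FrozenSet, Set

Graph = Dict[int, Set[int]]


def accessibility(adj: Graph) -> Graph:
    """Acc(G): every node reachable from each node by a path of length >= 1."""
    nodes = set(adj) | {t for targets in adj.values() for t in targets}
    acc: Graph = {}
    for start in nodes:
        seen: Set[int] = set()
        stack = list(adj.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adj.get(node, ()))
        acc[start] = seen
    return acc


def strong_components(acc: Graph) -> Dict[int, FrozenSet[int]]:
    """Assign mutually accessible (reciprocally regulating) genes to one component."""
    comp: Dict[int, FrozenSet[int]] = {}
    for i in sorted(acc):
        if i in comp:
            continue
        members = frozenset({i} | {j for j in acc[i] if i in acc[j]})
        for m in members:
            comp[m] = members
    return comp


def condensation(adj: Graph, comp: Dict[int, FrozenSet[int]]):
    """Acyclic equivalent: one node per strong component, edges between components."""
    cond = {c: set() for c in set(comp.values())}
    for i, targets in adj.items():
        for j in targets:
            if comp[i] != comp[j]:
                cond[comp[i]].add(comp[j])
    return cond


# Hypothetical toy network: 1 -> 2 -> 3 -> 4 -> 2 (a cycle) and 4 -> 5.
toy_adj: Graph = {1: {2}, 2: {3}, 3: {4}, 4: {2, 5}}
toy_acc = accessibility(toy_adj)
toy_comp = strong_components(toy_acc)
toy_cond = condensation(toy_adj, toy_comp)
```

In this toy case genes 2, 3 and 4 share one accessibility list and collapse into a single node of the condensation, which is why single-mutant data alone cannot order the edges among them.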

Figure 1
Graphical representation of genetic regulatory networks. (a) A sample regulatory network; (b) its adjacency list; (c) its accessibility list; and (d) its condensation. (e) The reconstruction of this network, mapped onto the original nodes. Circles represent nodes, or genes, and arrows represent edges.

Accommodating cycles
In the previous paper, the error tolerance of the algorithm when reconstructing networks containing cycles was not examined. To compare graphs with different numbers of strong components, we devised a method of generating a reconstruction in which each node again represents one gene. In the reconstruction, any mutually regulating pairs of genes in the network are mapped onto the same strong component [5]. We have added a step to expand each strong component into its constituent genes by adding direct connections from each node in the component to all other nodes in the component, and between each node in the component and all nodes adjacent to the component (Figure 1e). This maps the reconstruction back onto the original set of nodes and allows it to be compared, edge by edge, to the original network. We chose to treat the components in this way because alternative approaches result in the undesirable situation of a single network having multiple possible reconstructions [10,11]. While our method can result in a number of extra edges in the reconstruction that are not part of the real network (false-positive edges), it does result in a unique reconstruction that minimizes the number of correct relationships missed (false-negative edges).

Positive and negative regulation
The original algorithm represents the regulatory network as a simple directed graph, in which the edges have neither magnitude nor sign. However, real genetic regulatory relationships can be either activating or repressing, and can vary in strength; failure to take this information into account could result in erroneous reconstruction. Although the strength of an interaction is difficult to determine from microarray data, it is simple to assess whether a regulatory influence is activating or repressing. Moreover, it is straightforward to incorporate this information into the reconstruction algorithm.

Mutant gene-expression data can be represented by an M × N accessibility matrix P(G), where M is the number of genes perturbed and N is the number of genes in the network. Each matrix element p_ij = 1 if there is an edge from node i to node j, and p_ij = 0 if no edge is present [5,9]. We modified this matrix such that p_ij = 1 if there is a positive regulatory relationship, and p_ij = -1 if there is a negative regulatory relationship. Thus, if the transcript level of gene j goes up when gene i is deleted, then gene i negatively regulates gene j and the matrix element p_ij = -1. Inspection reveals that any indirect regulatory pathway will have a value equal to the product of the intermediate edges, so the extended algorithm only prunes an edge, by converting the matrix element to zero, if this condition is met (Figure 2a, lines 15-19). For example, if the two intermediate edges both have a positive sign, the original algorithm will remove the shortcut regardless of its sign (Figure 2b), but the extended algorithm will only prune the edge if it is also positive (Figure 2c). Furthermore, an edge will not be pruned if the mediating node is a multigene strong component that contains some edges with negative sign, because edges to and from these components have ambiguous values.

Figure 2
(a)
     1  for all nodes i of G
     2    for all nodes j ∈ Acc(i)
     3      Adj(i,j) = Acc(i,j)
     4  for all nodes i of G
     5    if node i has not been visited
     6      call PRUNE_ACC(i)
     7    end if
     8  PRUNE_ACC(i)
     9    for all nodes j ∈ Acc(i)
    10      if Acc(j) = Ø
    11        declare j as visited
    12      else
    13        call PRUNE_ACC(j)
    14      end if
    15    for all nodes j ∈ Acc(i)
    16      for all nodes k ∈ Adj(j)
    17        if k ∈ Acc(i) and Acc(i,k) = Acc(i,j)*Acc(j,k)
    18          delete k from Adj(i)
    19        end if
    20    declare node i as visited
    21  end PRUNE_ACC(i)
Edge-removal criteria. (a) Pseudocode of the algorithm including positive and negative regulation. Acc(i) and Adj(i) indicate the accessibility and adjacency lists for gene i, respectively, and Acc(i,j) indicates the value (+1 or -1) of the edge from i to j. (b) The original algorithm will pare away any edge connecting two nodes that already have a pathway between them. (c) Algorithm taking positive and negative regulation into account will only pare away an edge if its sign is equal to the product of the signs of the remaining edges in the pathway.

Double-mutant data
Reconstructions generated by the algorithm using data from single mutants may contain a number of unresolved strong components. Double-mutant data, from strains in which two genes have been perturbed, should allow the reconstruction of edges within these strong components. We therefore developed an algorithm (Figure 3a) that uses double-mutant data to refine the reconstruction generated with single-mutant data.

New accessibility lists for genes in a double mutant are generated by comparing the gene-expression profile of the double mutant to that of each single mutant. For example, in a simple three-gene cycle (Figure 3b), comparing the expression profile of a double mutant in which two genes have been deleted to that of a single mutant can reveal indirect relationships (Figure 3c). To incorporate this information, a reconstruction is first generated based on the single-mutant data alone (Figure 3b, bottom right), in which strong components are fully connected as described earlier. Information from the double mutants is then used to remove connections that are not supported by the data. If, in the reconstruction, a gene k is a member of the adjacency list of gene i, Adj(i), but not in the accessibility list of gene i in the presence of a mutation in gene j, Acc-j(i), then the connection from gene i to gene k is probably indirect. It is removed from the reconstruction as long as k is a member of Acc(j), meaning that gene j could be mediating the interaction (Figure 3a, lines 5-7). In this manner, data from each of the double mutants are used successively to refine the reconstruction (Figure 3c). To fully resolve the structure of cycles that are subcomponents of a larger graph, double-mutant data for all pairs of genes in, or immediately adjacent to, each multinode strong component are needed. This procedure is successful as long as there are not multiple cycles within the component. If there is more than one redundant, but indirect, pathway from one gene to another, the two genes will appear to be directly connected in this analysis.

Double-mutant data can similarly be used to identify redundant or nontranscriptional regulatory relationships [7], and we have extended the algorithm to reconstruct these types of relationships (Figure 3a,d,e). If two genes, i and j, have redundant or overlapping regulatory effects on a third gene, k, the transcript level of k may be unchanged in each of the single mutants but altered in the double mutant. This type of relationship can be inferred when Acc-j(i) contains members that are absent from the single-mutant accessibility list Acc(i) (Figure 3d). In such a case, the algorithm adds connections from both gene i and gene j to gene k (Figure 3a, lines 12-15). This could represent a case, for example, where either of two transcription factors can bind the same site in the promoter

Figure 3
(a)
     4  REFINE(i,j)
     5    for all nodes k ∈ Adj(i)
     6      if k ∉ Acc-j(i) and k ∈ Acc(j)
     7        delete k from Adj(i)
     8        if j ∈ Acc(i)
     9          add j to Adj(i)
    10        end if
    11      end if
    12    for all nodes l ∈ Acc-j(i)
    13      if l ∉ Acc(i)
    14        add l to Adj(i) and Adj(j)
    15      end if
Refining network structure with double-mutant data. (a) Pseudocode of the extension utilizing double-mutant data. Acc-j(i) indicates the accessibility list of gene i in the absence of gene j. i, j, k, and l are arbitrary indices for genes in the network. (b) An example of a three-gene cycle (top), its single-mutant accessibility lists (bottom left) and a reconstruction based on that data (bottom right). (c) The double-mutant accessibility lists for the cycle in (b) and the reconstruction process. For each set of double-mutant data (left), edges revealed to be indirect are removed from the reconstruction (right). The notation [1] 2 indicates the accessibility list of gene 2 in a strain in which gene 1 is already perturbed. (d) A network in which genes 1 and 2 redundantly regulate gene 3 (right), and single-mutant and double-mutant accessibility lists for the network (left). (e) A ne
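The two pseudocode listings (Figure 2a and Figure 3a) can be sketched in Python roughly as follows. This is an illustrative reading of the figures, not the authors' code: the names `prune_signed`, `refine` and `acc_dm` are hypothetical, the signed version assumes an acyclic (already condensed) network, recursion skips nodes already visited (an assumption, since the pseudocode recurses unconditionally), and the double-mutant lists in the example are invented to match a three-gene cycle 1 → 2 → 3 → 1.

```python
from typing import Dict, Set, Tuple

SignedLists = Dict[int, Dict[int, int]]   # gene -> {target: +1 or -1}
Lists = Dict[int, Set[int]]               # gene -> set of targets


def prune_signed(acc: SignedLists) -> SignedLists:
    """Figure 2a sketch: drop a shortcut i->k only if its sign equals the
    product of the signs along an indirect route i->j->k."""
    adj: SignedLists = {i: dict(targets) for i, targets in acc.items()}
    visited: Set[int] = set()

    def prune(i: int) -> None:
        for j in acc.get(i, {}):
            if not acc.get(j):           # j affects nothing: nothing to prune
                visited.add(j)
            elif j not in visited:       # assumption: skip nodes already done
                prune(j)
        for j in acc.get(i, {}):
            for k in list(adj.get(j, {})):
                if k in acc[i] and acc[i][k] == acc[i][j] * acc[j][k]:
                    adj[i].pop(k, None)
        visited.add(i)

    for i in acc:
        if i not in visited:
            prune(i)
    return adj


def refine(adj: Lists, acc: Lists,
           acc_dm: Dict[Tuple[int, int], Set[int]], i: int, j: int) -> None:
    """Figure 3a sketch (lines 4-15); acc_dm[(i, j)] stands for Acc-j(i)."""
    acc_i_without_j = acc_dm[(i, j)]
    for k in sorted(adj[i]):             # iterate over a snapshot of Adj(i)
        if k not in acc_i_without_j and k in acc[j]:
            adj[i].discard(k)            # i -> k is probably indirect via j
            if j in acc[i]:
                adj[i].add(j)
    for l in acc_i_without_j:
        if l not in acc[i]:              # effect seen only in the double mutant
            adj[i].add(l)
            adj[j].add(l)


# Hypothetical three-gene cycle: start from the fully connected component.
cycle_acc: Lists = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {1, 2, 3}}
cycle_adj: Lists = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
cycle_dm = {(1, 2): set(), (1, 3): {2}, (2, 1): {3},
            (2, 3): set(), (3, 1): set(), (3, 2): {1}}
for (gene_i, gene_j) in sorted(cycle_dm):
    refine(cycle_adj, cycle_acc, cycle_dm, gene_i, gene_j)
```

Under these assumed inputs, the successive `refine` calls reduce the fully connected component to the three edges of the cycle. The same function adds edges from both regulators when a target responds only in the double mutant, which corresponds to the redundant case described for Figure 3d.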
