Bioinformatics Proteomics Lecture 11


Bioinformatics – Proteomics
Lecture and practice
Prof. László Poppe
BME Department of Organic Chemistry and Technology
24.11.2015.

Structural genomics, proteomics, biology
The revolution in biology (new research methods, new approaches): complete genomes, biomolecular structure determination, bioinformatics, and high-throughput procedures for characterizing biological samples (microarray techniques).
Driving forces:
- Genome sequencing projects (7,000 genomes known, others in progress)
- Automated structure determination projects (Protein Structure Initiative, PSI)
Pre-genomic era ("classic bioinformatics"): mainly bioinformatics methods based on homology (BLAST, PSI-BLAST, threading, etc.).
Post-genomic era: a whole set of new methods not based on homology. New bioinformatics.

Structural genomics, proteomics, biology
New disciplines
Genomics
- Genome: the complete set of genes (DNA) of a species.
- Genomics: the study of the genome using the full genetic information (not only individual genes or groups of genes).
- Functional genomics: assigning function to genes by genomic methods (experimental and computational [in silico] procedures).
- Structural genomics: exploring the spatial structure (computationally and experimentally) of the proteins encoded in the genome, and using these structures (e.g. in functional genomics).
Further terms and disciplines related to biological information
- Proteome: all proteins present / expressed in a cell (in a given state).
- Proteomics: investigation of the proteome (mostly experimental).
- Transcriptome / transcriptomics: the mRNA population / its investigation.
- Metabolome / metabolomics: the metabolic network / its investigation.
The "revolution of omics": studies of other complex biological systems.

Structural genomics, proteomics, biology

The biological function
The classic meaning of function:
- Molecular function (e.g. which reaction a specific protein catalyzes, or which other molecule binds to it).
The extended ("post-genomic") meaning of function:
- Contextual / cellular function (the position of the specific protein in the interaction network of the cell).
Figure: molecular function vs. contextual / cellular function.
- Molecular function: the function of protein A is the formation of P from S.
- Contextual / cellular function: the function of protein A is defined by its position in the interaction network of the other cell proteins.

Structural genomics
Flowchart: routes from protein sequence to FUNCTION.
- Pre-genomic era: protein sequence → sequence similarity search (e.g. BLAST) → homologous sequences → grouping, data collection; threading, ab initio structure prediction → 3D structure → structure motifs, active-site libraries.
- Post-genomic era: search in genome databases → genome data → phylogenetic profiles, Rosetta stones, neighbouring genes, correlated gene expression.

Structural genomics
Post-genomic bioinformatics methods:
Pure computation:
- Phylogenetic profiles
- Rosetta stone method
- Neighbouring genes
Experimental, but computer-aided:
- Correlated gene expression

Huynen MA, Bork P, Proc Natl Acad Sci U S A. 1998, 95(11), 5849-5856.
Pellegrini M, et al., Proc Natl Acad Sci U S A. 1999, 96(8), 4285-4288.
Structural genomics – Phylogenetic profiles
Phylogenetic profiles: study of the occurrence of specific genes in different organisms (requires knowledge of the complete genomes).
The same or very similar (or fully or nearly complementary) phylogenetic profiles indicate a probable functional relationship between the genes (identical profiles mean that the genes always occur together).
The more complete genomes are available for analysis, the more reliable the result.
BUT: certain evolutionary phenomena disrupt the analysis:
- Gene function redundancy (several genes with the same function);
- Gene replacement by another gene that is not orthologous to the original gene;
- Horizontal gene transfer (DNA transfer between organisms);
- Gene deletion in certain organisms.
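The profile comparison described above can be sketched in a few lines of Python. This is only an illustration: the gene names, the six-genome presence/absence vectors and the `max_mismatch` tolerance are invented for the example and do not come from the lecture or the cited papers.

```python
def hamming(p, q):
    """Number of genomes in which the two genes differ in presence/absence."""
    return sum(a != b for a, b in zip(p, q))


def profile_relation(p, q, max_mismatch=0):
    """Classify two phylogenetic profiles (tuples of 0/1, one entry per genome)."""
    d = hamming(p, q)
    if d <= max_mismatch:
        return "similar"        # genes (nearly) always occur together
    if len(p) - d <= max_mismatch:
        return "complementary"  # one gene is present (nearly) only where the other is absent
    return "unrelated"


# Hypothetical profiles over six fully sequenced genomes (1 = gene present).
profiles = {
    "geneA": (1, 0, 1, 1, 0, 1),
    "geneB": (1, 0, 1, 1, 0, 1),
    "geneC": (0, 1, 0, 0, 1, 0),
    "geneD": (1, 1, 0, 1, 0, 0),
}

print(profile_relation(profiles["geneA"], profiles["geneB"]))
print(profile_relation(profiles["geneA"], profiles["geneC"]))
print(profile_relation(profiles["geneA"], profiles["geneD"]))
```

With more genomes the profiles get longer, so a chance match becomes less likely, which is the point made above about reliability.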

Enright AJ, et al., Nature 1999, 402(6757), 86-90.
Marcotte EM, et al., Nature 1999, 402(6757), 83-86.
Yanai I, et al., Proc Natl Acad Sci U S A. 2001, 98(14), 7940-7945.
Structural genomics – Rosetta-stone method
Domain-fusion method
Two distinct proteins of one organism may occur as a fusion protein (a single polypeptide chain) in other organisms.
If two proteins are expressed as a fusion protein in some organism, a functional relationship between them is likely (the fusion can arise because the proximity of the two proteins is beneficial for their functions).
The fusion protein is a kind of Rosetta stone: from the known function of one domain, the unknown function of the other domain can be inferred.
Figure: protein A and protein B; A-B fusion protein in another organism.
BUT: there are "promiscuous" domains that fuse with numerous other proteins.
The Rosetta Stone provided three versions of an ancient text to the researchers: Egyptian demotic script, Greek text and Egyptian hieroglyphs. As the Greek language was well known, this stone was the key to deciphering hieroglyphics.
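A minimal sketch of the domain-fusion idea, assuming each protein has already been reduced to a list of domain identifiers. The identifiers, the protein names and the `max_partners` cut-off for promiscuous domains are all hypothetical. This is not the procedure of the cited papers, only an illustration of the principle.

```python
from collections import Counter
from itertools import combinations


def fused_domain_pairs(proteins):
    """Collect every pair of domains found on the same polypeptide chain."""
    pairs = set()
    for domains in proteins:
        for a, b in combinations(sorted(set(domains)), 2):
            pairs.add((a, b))
    return pairs


def rosetta_links(target, reference_proteins, max_partners=2):
    """Link separate target proteins whose domains are fused elsewhere.

    target: dict protein name -> domain id (single-domain proteins)
    reference_proteins: domain compositions of proteins from other organisms
    max_partners: domains with more fusion partners are treated as promiscuous
    """
    pairs = fused_domain_pairs(reference_proteins)
    partners = Counter()
    for a, b in pairs:
        partners[a] += 1
        partners[b] += 1
    links = []
    for (name1, dom1), (name2, dom2) in combinations(sorted(target.items()), 2):
        key = tuple(sorted((dom1, dom2)))
        if (key in pairs
                and partners[dom1] <= max_partners
                and partners[dom2] <= max_partners):
            links.append((name1, name2))
    return links


# Hypothetical data: D1 and D2 are fused in another organism; D9 is promiscuous.
target = {"protA": "D1", "protB": "D2", "protC": "D3", "protD": "D9"}
reference = [("D1", "D2"), ("D9", "D3"), ("D9", "D4"), ("D9", "D5"), ("D6",)]

print(rosetta_links(target, reference))
print(rosetta_links(target, reference, max_partners=10))
```

The first call drops the protC/protD link because D9 fuses with three different partners. Raising `max_partners` restores it, which shows how sensitive the prediction is to the treatment of promiscuous domains.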

DeRisi JL, et al., Science 1997, 278(5338), 680-686.
Wu LF, et al., Nat Genet. 2002, 31(3), 255-265.
Structural genomics – Adjacent genes
If two genes are located next to each other on the chromosome in most organisms, a functional relationship between them is likely.
Operons are frequent in prokaryotes (several functionally related genes located one after another under a common promoter).
Operons are less common in eukaryotes, but gene adjacency is still characteristic.
Figure: location of three investigated genes in bacterial chromosomes.
Conclusion: [...] and [...] are functionally related with high probability.
BUT: neighbourhood does not necessarily mean a functional relationship.
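As a rough illustration of the adjacency criterion, the sketch below counts in how many genomes two genes are direct neighbours. The gene orders are invented, chromosomes are treated as linear lists (circular chromosomes are ignored for simplicity), and `max_gap` is an assumed parameter.

```python
def adjacency_fraction(gene_a, gene_b, genomes, max_gap=1):
    """Fraction of genomes containing both genes in which they are neighbours.

    genomes: dict organism -> list of gene names in chromosomal order
    max_gap: largest index difference still counted as "adjacent"
    """
    hits = 0
    total = 0
    for order in genomes.values():
        if gene_a in order and gene_b in order:
            total += 1
            if abs(order.index(gene_a) - order.index(gene_b)) <= max_gap:
                hits += 1
    return hits / total if total else 0.0


# Hypothetical gene orders in four organisms.
genomes = {
    "org1": ["x", "a", "b", "c"],
    "org2": ["b", "a", "y", "c"],
    "org3": ["c", "z", "a", "b"],
    "org4": ["a", "w", "c"],
}

print(adjacency_fraction("a", "b", genomes))
print(adjacency_fraction("a", "c", genomes))
```

Genes "a" and "b" are neighbours in every genome that contains both, while "a" and "c" never are. A high fraction is only a hint, in line with the caveat that neighbourhood does not necessarily mean a functional relationship.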

Structural genomics, proteomics, biology
Aims of structural genomics
- To determine the structures of all proteins encoded in the genome
- To identify functions using structural information (in this sense, functional genomics is a part of structural biology)
Structure determination of proteins
- Classic approach: first identify the function of a protein, then determine its structure experimentally (protein X-ray crystallography / NMR)
- Structural genomics approach: first obtain the structures of the proteins (preferably of all proteins), then investigate their functions (also with the aid of their structures)

Structural genomics – Correlated gene expression
Genes that are always expressed with the same pattern under the same conditions are likely to be functionally related: analysis and evaluation of DNA microchip (microarray) data.
E.g.: synchronized yeast cells (all at the same point of the cell cycle)
- (a) Sampling over two cell cycles (every ten minutes), preparing cDNA from the mRNA, then hybridizing the samples on DNA chips containing all (6000) yeast genes: determination of the expression level of each gene.
- (b) Clustering (grouping) the genes that show significant fluctuation in their expression levels (409 out of 6000) according to the correlation of their temporal expression patterns (red: high expression, blue: low expression). The tree (dendrogram) shows this hierarchical grouping.
- The 409 genes were classified into five main groups (c) according to their expression behavior (d).
Figure panels: simple clustering; hierarchical clustering.
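The grouping step can be illustrated with a small sketch that computes Pearson correlations between expression time series and merges genes whose correlation exceeds a threshold (single-linkage grouping). This is a simplification of the hierarchical clustering shown on the slide. The expression values and the 0.9 threshold are invented, and flat (constant) profiles are assumed to have been filtered out beforehand, as in the yeast example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_groups(expression, threshold=0.9):
    """Single-linkage grouping: genes joined by any correlation >= threshold."""
    genes = sorted(expression)
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the group representative (with path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if pearson(expression[g], expression[h]) >= threshold:
                parent[find(h)] = find(g)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values())


# Hypothetical expression levels at six time points.
expression = {
    "g1": [1, 5, 1, 5, 1, 5],
    "g2": [2, 10, 2, 10, 2, 10],
    "g3": [5, 1, 5, 1, 5, 1],
    "g4": [6, 2, 6, 2, 6, 2],
}

print(correlated_groups(expression))
```

Genes g1 and g2 oscillate in phase and end up in one group, while g3 and g4 form the second group. Correlation ignores absolute expression level, so g2 groups with g1 despite its doubled amplitude.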

Structural genomics – Combined methods
Combining the purely computational (in silico) methods of functional genomics with experimental data on correlated gene expression is the most efficient approach.

Experimental structural genomics/biology
The diversity of protein structures
- The number of different folds is estimated at between 1,000 and 100,000.
- The PDB currently contains approx. 110,000 structures, but they are structurally highly redundant, representing approx. 1200-1500 different folds. Most newly determined structures also have a known fold.
- Only about 15-25 % of the proteins encoded in complete genomes show a sufficient level of homology to proteins of known structure.
Experimental structural genomics
- Aim of structural genomics: to select target proteins from the genomes whose experimentally determined 3D structures cover the whole sequence space, so that every other protein falls within homology modeling distance (about 20% sequence identity) of one of them. Each protein structure thus becomes predictable by homology modeling.
- Systematic structure determination projects are under way, for example the Protein Structure Initiative: http://www.nigms.nih.gov/Initiatives/PSI
BUT: non-expressible proteins, membrane proteins and hard-to-crystallize proteins may cause problems.

PSI - Protein Structure Initiative

Structural genomics/biology
Binding site sequence motifs
Identification of sequence patterns related to a given local structure
E.g. several ATP- or GTP-binding proteins (e.g. ATP synthase, myosin heavy chain, helicases, thymidine kinase, the G-protein alpha subunit, etc.) include the following consensus sequence: [A or G]XXXXGK[S or T]. This sequence (the P loop) forms a mobile loop between the alpha-helical and beta-sheet domains of the protein in question, regardless of the general fold of the protein.
See: (a) GTP in the P loop of the H-Ras signal protein (PDB 1qra); (b) ATP in the P loop of a protein kinase (PDB 1aq2).
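Searching for such a consensus pattern is a simple regular-expression task. The sketch below writes the motif as `[AG].{4}GK[ST]` and scans a short fragment assumed to correspond to the N-terminus of H-Ras. The function name is made up for the example, and overlapping matches are not reported.

```python
import re

# P-loop consensus: A or G, any four residues, G, K, then S or T.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(sequence):
    """Return (1-based start position, matched segment) for each motif hit."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(sequence)]


# Fragment assumed to be the N-terminal region of H-Ras (one-letter code).
hras_fragment = "MTEYKLVVVGAGGVGKSALTIQLIQ"

print(find_p_loops(hras_fragment))
```

A hit only shows that the sequence pattern is present. Whether the segment really forms a nucleotide-binding loop has to be checked against the structure.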

Structural genomics/biology
Convergent and divergent evolution
Homology is often difficult to identify from the sequence alone, because sequence varies much faster than 3D structure. Convergent and divergent evolution are therefore sometimes difficult to distinguish.
In some instances, spatial conservation can be observed at functional sites, while the functionally important amino acids show little or no sequence identity. In such cases, the distinction between convergent and divergent evolution can be difficult. For example, benzoylformate decarboxylase (BFD) and pyruvate decarboxylase (PDC) exhibit only approx. 21% sequence identity, but their folds are practically identical. The catalytic amino acid residues are conserved in 3D space, but not in sequence.
It is possible that the two proteins evolved independently and converged on a similar chemical solution for alpha-keto acid decarboxylation. The high similarity of their folds may mean, however, that they evolved from a common ancestor protein and their functions diverged. The low degree of sequence identity does not allow these two possibilities to be distinguished.
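For reference, a sequence identity figure such as the 21% quoted above is computed from a pairwise alignment. The sketch below uses one common convention, identical residues divided by the number of columns without gaps. Other denominators are in use, and the aligned fragments are invented rather than taken from BFD or PDC.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(a, b) for a, b in zip(aln_a, aln_b) if a != gap and b != gap]
    if not aligned:
        return 0.0
    matches = sum(a == b for a, b in aligned)
    return 100.0 * matches / len(aligned)


# Hypothetical aligned fragments: six gap-free columns, five of them identical.
print(round(percent_identity("ACDE-FGH", "ACNEKF-H"), 1))
```

Because the value depends on the alignment and on the chosen denominator, identities near 20% are best read as approximate.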

Structural genomics/biology
Structure families
Members of structural superfamilies often have related biochemical functions
A superfamily is (not strictly) defined as a set of homologous proteins with similar 3D structure and similar, but not necessarily identical, biochemical function. Almost every superfamily shows some functional diversity arising from local sequence differences and/or domain exchanges. Within an enzyme superfamily, substrate diversity is frequent, while the chemistry of the reaction is highly conserved (see the MIO-containing ammonia-lyases HAL, PAL, TAL).
In many enzyme superfamilies, the sequence position of the catalytic groups may differ from member to member, despite their identical function within the protein. These variations sometimes make it difficult or even impossible to classify a protein into a superfamily based on sequence alignments alone. Although some members of a superfamily can be similar in sequence, structural and functional similarity is the basis on which a given protein is classified into a superfamily. Within every superfamily there are families whose members show close functional relationship and significant sequence identity (50 %).
Figure labels: HAL, PAL, TAL.

Structural genomics/biology
Convergent evolution
The four superfamilies of serine proteases: an example of convergent evolution
Serine proteases belong to several structural superfamilies, which vary considerably in sequence and general fold but are very similar in the relative positions of the amino acids of their catalytic triads (Ser - His - Glu/Asp) within their active centers.
Each serine protease superfamily has many members, but no sequence or structural similarity exists between the superfamilies. Within each superfamily, the positions of the catalytic triad amino acids in the sequence may vary, but their final locations in the tertiary structure are very similar.
Presumably, the formation of a similar active site is the result of convergent evolution, while within each superfamily divergent evolution resulted in different proteases that are very similar in structure but have different substrate specificities.
Figure: representatives of two superfamilies of serine proteases (chymotrypsin, subtilisin).

Christianson CV, et al., J Am Chem Soc. 2007, 129, 15744-15745.
Structural genomics/biology
Characterization of an active site by substrate analogues
Structure of tyrosine 2,3-aminomutase crystallized with an inhibitor, as an example of experimental determination of the active site.

Structural genomics/biology
Localization of an active site by crystallization with solvents
Structure of subtilisin in 100 % acetonitrile
The organic solvent (green) binds to only a few locations on the surface of the protein, including the active site (left center of the figure). The red spheres are water molecules that remain bound despite the 100% concentration of the water-miscible solvent (they are structural waters, essential for maintaining the active structure of the protein).
Thermolysin structure with different solvents
The different binding sites of thermolysin, based on crystal structures soaked in different solvents. From the similar space occupied by the various solvents, the binding site can be clearly identified. The active site also contains bound zinc (gray) and calcium (black) ions.

Röther D, et al., Eur. J. Biochem. 2001, 268, 6011–6019.
Structural genomics/biology
Functional study of an active site by point mutations
E.g.: point mutations of amino acids within the active site of histidine ammonia-lyase (HAL) made it possible to deduce the importance of each amino acid in catalysis.

Proteomics databases and programs - ExPASy
Expasy Tools: http://www.expasy.org

Proteomics databases and programs - ExPASy
Expasy Tools: http://www.expasy.org/proteomics

ExPASy – protein identification
Expasy Tools: http://www.expasy.org/proteomics

Prediction of protein properties - ProtParam
ProtParam: http://www.expasy.org/protparam/

ExPASy – protein sequences
Expasy Tools: http://www.expasy.org/proteomics (section: protein sequences and identification)

ExPASy – sequence search / alignment
Expasy Tools: http://www.expasy.org/proteomics (section: similarity search / alignment)

ExPASy – protein structure
Expasy Tools: http://www.expasy.org/proteomics (section: protein structure)

ExPASy – protein structure families
Expasy Tools: http://www.expasy.org/proteomics (section: families, patterns and profiles)

ExPASy – post-translational modifications
Expasy Tools: http://www.expasy.org/proteomics (section: post-translational modification)

ExPASy – MS and 2-DE data
Expasy Tools: http://www.expasy.org/proteomics (section: mass spectrometry and 2-DE data)

ExPASy / Swiss-2DPage - 2-DE data
Expasy / Swiss-2DPage: http://world-2dpage.expasy.org/swiss-2dpage/

Bioinformatics databases/tools - GQuery
GQuery: http://www.ncbi.nlm.nih.gov/gquery

NCBI - Gene
NCBI Gene: http://www.ncbi.nlm.nih.gov/gene

NCBI - Genome
Genome: http://www.ncbi.nlm.nih.gov/genome

NCBI – Genome (E. coli)
Genome: http://www.ncbi.nlm.nih.gov/genome/167

GOLD – Genome project database
GOLD: http://genomesonline.org/cgi-bin/GOLD

NCBI - Structure
NCBI Structure: http://www.ncbi.nlm.nih.gov/structure

NCBI - Taxonomy
NCBI Taxonomy: http://www.ncbi.nlm.nih.gov/taxonomy

NCBI - BioSystems
NCBI: http://www.ncbi.nlm.nih.gov/BioSystems
