Structural Bioinformatics EvoEF2: Accurate And Fast Energy Function For .


Bioinformatics, 36(4), 2020, 1135–1142
doi: 10.1093/bioinformatics/btz740
Advance Access Publication Date: 7 October 2019
Original Paper

Structural bioinformatics

Xiaoqiang Huang1, Robin Pearce1 and Yang Zhang1,2,*
1Department of Computational Medicine and Bioinformatics and 2Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
*To whom correspondence should be addressed.
Associate Editor: Arne Elofsson
Received on August 21, 2019; revised on September 19, 2019; editorial decision on September 20, 2019; accepted on September 25, 2019

Abstract

Motivation: The accuracy and success rate of de novo protein design remain limited, mainly due to the parameter over-fitting of current energy functions and their inability to discriminate incorrect designs from correct designs.

Results: We developed an extended energy function, EvoEF2, for efficient de novo protein sequence design, based on a previously proposed physical energy function, EvoEF. Remarkably, EvoEF2 recovered 32.5%, 47.9% and 22.3% of all, core and surface residues for 148 test monomers, and was generally applicable to protein–protein interaction design, as it recapitulated 30.9%, 42.4%, 31.3% and 21.4% of all, core, interface and surface residues for 88 test dimers, significantly outperforming EvoEF on the native sequence recapitulation.
We further used I-TASSER to evaluate the foldability of the 148 designed monomer sequences, where all of them were predicted to fold into structures with high fold- and atomic-level similarity to their corresponding native structures, as demonstrated by the fact that 87.8% of the predicted structures shared a root-mean-square-deviation less than 2 Å to their native counterparts. The study also demonstrated that the usefulness of physical energy functions is highly correlated with the parameter optimization processes, and EvoEF2, with parameters optimized using sequence recapitulation, is more suitable for computational protein sequence design than EvoEF, which was optimized on thermodynamic mutation data.

Availability and implementation: The source code of EvoEF2 and the benchmark datasets are freely available at […].
Contact: zhng@umich.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction

Computational protein design aims to create new protein molecules that adopt specific folds and perform desirable biological functions by using effective computational sampling, scoring and searching techniques. Since scoring functions play a central role in discriminating correct designs from incorrect designs in protein design algorithms, the development of effective and efficient energy functions is of critical importance for improving the accuracy of protein design algorithms. In previous studies, we developed an automatic protein design protocol, EvoDesign (Pearce et al., 2019), based on the combination of fold-level evolutionary profiles derived from multiple sequence alignments of structural analogs and an atomic-level physical energy function. Constraining the sequence selection space using evolutionary profiles showed improved performance over many other algorithms that only utilize physics- or knowledge-based energy functions (Huang et al., 2013; Kuhlman and Baker, 2000; Tian et al., 2015).
Our previous studies showed that EvoDesign can yield very high success rates when designing new thermostable monomer proteins (Mitra et al., 2013; Shultis et al., 2015) and protein–protein interactions (PPIs) (Shultis et al., 2019).

Although EvoDesign has many advantages, it still has several limitations. First, it must obtain reliable, structurally-derived evolutionary profiles, which requires obtaining a sufficient number of structural analogs. In previous studies (Mitra et al., 2013; Shultis et al., 2019), a relatively large number ( 10) of structural analogs were always identified for the target scaffolds of design interest. However, we have recently found that for many newly released targets, an insufficient number of structural analogs could be identified, which can reduce the effectiveness of evolution-based design. In these situations, the design procedure should be performed using the physical energy component only. In previous work, we developed the EvoEF energy function to assist protein design (Pearce et al., 2019). EvoEF was rigorously evaluated on thermodynamic mutation data and it outperformed FoldX (Guerois et al., 2002) on two large sets of experimental protein stability change (ΔΔG_stability) and protein–protein binding free energy change (ΔΔG_bind) data, with a 3–5 times faster running speed. However, the performance of EvoEF alone on de novo sequence design had never been examined in the situation where the evolutionary profile information was unreliable.

© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
Downloaded from stract/36/4/1135/5582267 by University of Michigan user on 26 May 2020
EvoEF2: accurate and fast energy function for computational protein design

In this study, we first tested EvoEF's ability to perform de novo protein sequence design using a simulated annealing Monte Carlo procedure (Kirkpatrick et al., 1983). We found that EvoEF only yielded overall sequence recapitulation rates of 16.8% for the 148 test monomers and 15.6% for the 88 test PPIs, which was much worse than the results for some other protein design algorithms like Rosetta (Saunders and Baker, 2005), Medusa (Ding and Dokholyan, 2006) and even FoldX (Bazzoli et al., 2011), thereby demonstrating the inability of EvoEF to produce native-like sequences or perform protein sequence design. Since our ultimate goal is to use EvoEF for protein design in addition to ΔΔG estimation, we extended EvoEF to EvoEF2 by introducing four new energy terms, including terms for disulfide bonds, amino acid propensities, Ramachandran biases and rotamer probabilities, the weights of which were systematically re-optimized through protein sequence design simulations. The benchmark experiments showed that EvoEF2 was much more effective at generating native-like sequences for given protein scaffolds for both monomer and PPI design, yielding overall native sequence recapitulation rates of 32.5% for the 148 monomers and 30.9% for the 88 PPIs. The sequence recovery performance of EvoEF2 was comparable to those obtained by the state-of-the-art Rosetta (Saunders and Baker, 2005) and Medusa (Ding and Dokholyan, 2006) algorithms. Furthermore, the foldability of the designed sequences for the 148 monomer proteins in the test set was assessed using the leading protein structure modeling software, I-TASSER (Yang et al., 2015), where each pair of predicted and native structures for all 148 designs were found to possess the same fold with TM-scores >0.5 and root-mean-square-deviations (RMSDs) <4 Å; these results were much better than those obtained in a previous large-scale assessment on 52 single-domain proteins (Bazzoli et al., 2011). Moreover, 87.8% and 87.1% of the designs were predicted to fold within 2 Å or with TM-scores >0.9 to the native structures, suggesting that the EvoEF2 designs were of high quality. Despite the fact that EvoEF2 was optimized for sequence design, it also performed reasonably well on ΔΔG estimation. Nevertheless, the results showed that, based on the thermodynamic data estimation, EvoEF, which was specifically optimized for this task, might be more appropriate than EvoEF2 for ΔΔG estimation.

2 Materials and methods

2.1 Dataset construction

Monomer Dataset. X-ray determined monomer structures were collected from the datasets used in previous side-chain packing studies (Krivov et al., 2009), and protein design simulations (Mitra et al., 2013). Structures with missing main-chain atoms (N, Cα, C and O) were discarded, and protein chains with more than 300 amino acids were excluded for fast protein design simulations. CD-HIT (Fu et al., 2012) was then used to cluster the remaining dataset with a sequence identity cutoff of 30%, and the representative protein was selected from each cluster to construct a set of 370 monomers. 60% of these structures (222 monomers) were randomly chosen as the training set, while the other 148 structures were used for testing. To compare the protein design results on X-ray and NMR structures, 29 monomers that had both X-ray and >10 NMR models were used (Schneider et al., 2009).

Dimer Dataset. X-ray determined dimer structures were collected from our previous work for EvoEF's benchmark tests (Pearce et al., 2019), from the dimers used by Sharabi et al. to optimize ORBIT for protein–protein interface design (Sharabi et al., 2011a, b) and from the dimers used by Cui et al. to compare the subunit interfaces of heterodimers and homodimers (Zhanhua et al., 2005). The dimers were filtered and clustered using similar criteria as the monomer datasets (Fu et al., 2012), where dimers whose shortest chains had more than 300 amino acids were excluded for the sake of rapid design simulations. Following this procedure, 120 heterodimers and 100 homodimers were selected; 60% of them (72 heterodimers and 60 homodimers) were randomly selected for training, while the other 48 heterodimers and 40 homodimers were used for testing.

ΔΔG_stability and ΔΔG_bind Datasets. Two sets of non-redundant experimental thermodynamic data (3989 ΔΔG_stability entries from 210 monomers and 2204 ΔΔG_bind entries from 177 dimers), which were collected in a previous study (Pearce et al., 2019), were used to assess the ability of EvoEF and EvoEF2 to predict the thermodynamic changes upon mutation.

2.2 Energy function and protein design

EvoEF was first proposed and implemented in our evolutionary profile-based protein design protocol, EvoDesign (Pearce et al., 2019). In general, EvoEF consists of five energy terms:

$$E_{\mathrm{EvoEF}} = E_{\mathrm{VDW}} + E_{\mathrm{ELEC}} + E_{\mathrm{HB}} + E_{\mathrm{DESOLV}} - E_{\mathrm{REF}} \tag{1}$$

Here, $E_{\mathrm{VDW}}$, $E_{\mathrm{ELEC}}$, $E_{\mathrm{HB}}$, $E_{\mathrm{DESOLV}}$ and $E_{\mathrm{REF}}$ represent the total van der Waals, electrostatic, hydrogen bonding, desolvation and reference energy terms for a protein system, respectively. The protein reference energy term, $E_{\mathrm{REF}}$, is used to model the energy of the protein in the unfolded state and it is calculated as the sum of amino acid-specific reference energy values (Pearce et al., 2019). The five terms were preserved in EvoEF2 and four new terms were introduced to make it capable of tackling more difficult design cases. The complete EvoEF2 energy function is written as:

$$E_{\mathrm{EvoEF2}} = E_{\mathrm{VDW}} + E_{\mathrm{ELEC}} + E_{\mathrm{HB}} + E_{\mathrm{DESOLV}} + E_{\mathrm{SS}} + E_{\mathrm{AAPP}} + E_{\mathrm{RAMA}} + E_{\mathrm{ROT}} - E_{\mathrm{REF}} \tag{2}$$

Here, $E_{\mathrm{SS}}$ describes the disulfide-bonding interactions, $E_{\mathrm{AAPP}}$ represents the energy for calculating amino acid propensities at given backbone (φ/ψ) angles, $E_{\mathrm{RAMA}}$ is the Ramachandran term for choosing specific backbone angles (φ/ψ) given a particular amino acid and $E_{\mathrm{ROT}}$ is the energy term for modeling the rotamer probabilities from the rotamer library.

The details of the mathematical formulas for the EvoEF and EvoEF2 energy terms and the parameterization of EvoEF2 are described in Supplementary Materials S1–S3, respectively. We extended the EvoDesign Monte Carlo pipeline (Pearce et al., 2019) to test the ability of EvoEF and EvoEF2 to perform protein design and the detailed procedure is described in Supplementary Material S4. In general, the design procedure was very fast; for instance, it took less than 15 min to completely design a protein that was about 200 amino acids long.

2.3 Definition of core, surface and interface residues

The core and surface residues were defined using criteria similar to (Kortemme et al., 2003; Kuhlman and Baker, 2000). Specifically, we defined core residues as those positions that had more than 20 Cβ atoms within 10 Å of the Cβ atom of the residue of interest, while the surface residues were required to have less than 15 Cβ atoms within the same region. Cα atoms were counted for glycine. In protein–protein interfaces, a residue was denoted as an interface residue if at least one of its atoms was within 5 Å of the other chain.

3 Results

3.1 Recapitulation of native monomer sequences

The ability to recapitulate native sequences for given protein scaffolds has been regarded as an important in silico benchmark test of protein design algorithms (Ding and Dokholyan, 2006; Kuhlman and Baker, 2000; Leaver-Fay et al., 2013). For this purpose, the native sequence recapitulation rate is defined as the ratio of the number of designed residues that are identical to the naturally occurring amino acids at the corresponding design positions to the number of total design positions. Usually the higher the rate is, the more likely an algorithm can produce native-like protein sequences.

We first examined the ability of EvoEF to recapitulate native sequences on a set of 148 monomer scaffolds, where the backbones were fixed and the results are summarized in Table 1. Overall, the native amino acid types were selected for 16.8% of the total design

positions, while a much higher percentage, 28.4%, of native residues were recapitulated in protein cores. As a control, we found that the native sequence recapitulation rates using random selection were around 5% for the overall protein and the core residues, suggesting that EvoEF was significantly better than random for sequence design. However, for surface residues, the sequence recapitulation rate was only 7.4%, which was quite close to random, indicating that EvoEF could not recover the surface residues effectively. Compared with several previous complete sequence design studies, the ability of EvoEF to recapitulate native sequences was not, in general, as good as some other protein design algorithms such as Rosetta (Kuhlman and Baker, 2000), Medusa (Ding and Dokholyan, 2006) and FoldX (Bazzoli et al., 2011), which achieved overall native sequence recapitulation rates ranging from 24% to 33% on different datasets.

To improve the ability of EvoEF to produce native-like sequences, we extended EvoEF into EvoEF2 by introducing four new energy terms and re-optimizing the weights and reference energies through protein sequence design simulations. The comparison of the results for recapitulation of native residues using EvoEF and EvoEF2 is shown in Table 1. Overall, the native sequence recapitulation rates for EvoEF2 were much higher than those for EvoEF.

Table 1. Summary of native sequence recapitulation results from designing 148 monomers using EvoEF and EvoEF2

Residues    EvoEF2 #id/#nat
All         0.325
Core        0.479
Surface     0.222

Note: #nat, number of native residues; #id, number of residues with recapitulated identities. (The #nat, #id and EvoEF columns are not recoverable from this transcription.)
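The core/surface criteria of Section 2.3 and the recapitulation rate reported in Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the EvoEF2 implementation: the function names, the "intermediate" label for residues that are neither core nor surface, and the inclusive 10 Å cutoff are assumptions, and each residue is represented by a single coordinate (Cβ, or Cα for glycine).

```python
import math

def count_neighbors(coords, i, radius=10.0):
    """Count other residues whose representative atom (C-beta, or C-alpha
    for glycine) lies within `radius` angstroms of residue i.
    The inclusive cutoff (<=) is an assumption."""
    return sum(
        1
        for j, c in enumerate(coords)
        if j != i and math.dist(coords[i], c) <= radius
    )

def classify(n_neighbors, core_min=20, surface_max=15):
    """Core: more than 20 neighbors; surface: fewer than 15.
    'intermediate' is a hypothetical label for everything in between."""
    if n_neighbors > core_min:
        return "core"
    if n_neighbors < surface_max:
        return "surface"
    return "intermediate"

def recapitulation_rate(native, designed, positions=None):
    """Fraction of design positions where the designed residue is identical
    to the native one. `positions` restricts the count to a subset of
    indices (for example, core positions only)."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if positions is None:
        positions = range(len(native))
    positions = list(positions)
    if not positions:
        raise ValueError("no design positions given")
    identical = sum(1 for p in positions if native[p] == designed[p])
    return identical / len(positions)
```

Calling `recapitulation_rate(native, designed, core_positions)` with the indices labeled "core" by `classify` gives a core-only rate analogous to the Core row of Table 1.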
32.5% of all designed residues were recapitulated by EvoEF2, while a much higher number, 47.9%, of the native core residues were correctly selected; both ratios were close to those reported in the work for Rosetta's benchmark on 42 monomers (Saunders and Baker, 2005) using Dunbrack's backbone-dependent rotamer library without adding subrotamers (33.0% and 47.7% for overall and core residues, respectively). Figure 1 illustrates an example of a well-recovered protein core (PDB ID: 1ZEQ), where 13 out of the 14 core residues were successfully recapitulated, not only in identity but also with close conformations to the crystal residue side-chains. The only incorrectly predicted residue was isoleucine at position 11, which is chemically similar to the native valine anyway but with an extra methylene group. These results indicate that EvoEF2 not only recapitulates the residues at a sequence-level, but also recovers the atomic-level physical interactions, which is key for successful protein design. Moreover, utilizing the extended EvoEF2 energy function, 22.2% of the surface residues were recovered, which is about a 3-fold higher rate than that obtained by the original EvoEF program. The recapitulation statistics for all 20 amino acids in all, core and surface positions for the 148 test proteins are listed in Supplementary Table S1. Overall, the hydrophobic, aliphatic residues, with the exception of methionine and cysteine, were recapitulated at higher rates. Glycine and proline were the two best recovered residues, probably due to their unique side-chain structures and the fact that they are frequently found in special conformations (e.g. turns and kinks) in protein structures. Methionine and cysteine were not favored partly because the well depth of the van der Waals attractive energy is weak for sulfur atoms in the CHARMM19 (Brooks et al., 1983) atom parameters.
Many cysteine residues were involved in disulfide bonds in the test proteins, and although an energy term was introduced to explicitly account for disulfide bonding, it could not always recover the native-like disulfide bond geometries, in part due to the absence of crystal-like cysteine rotamer conformations. Compared with phenylalanine, the lower recapitulation rates for tyrosine and tryptophan were likely due to the penalties incurred by buried hydroxyl and amide groups in the protein core.

Fig. 1. An illustrative example of an Escherichia coli periplasmic protein involved in copper and silver binding (PDB ID: 1ZEQ) redesigned based on the EvoEF2 energy function. (A) Comparison between the native and designed sequences, where the sequence identity was 31.2%. The identical residues are highlighted using darker colors and the core residues are labeled with '*'. (B) Comparison of the native and designed core residues. The protein scaffold is shown in cartoon, and the native and design core residues are shown in sticks with different colors

Comparison of the results for EvoEF and EvoEF2 shows that not only were the total recapitulation rates improved in the new energy function, but the specific ratios for each amino acid type in the designed cores were also closer to those found in the native cores, except those for aspartic acid and serine (Supplementary Table S1), probably because aspartic acid was overdesigned by EvoEF while serine was underdesigned by EvoEF2 in protein core regions. For example, the total number of aspartic acid and serine residues present in the cores of all 148 native monomers was 119 and 278, respectively. But the number of aspartic acid and serine residues present in the designed cores was 885 and 292, respectively, for EvoEF, and 150 and 112, respectively, for EvoEF2.
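To make the over- and under-design of aspartic acid and serine concrete, the core counts quoted above can be turned into designed-to-native ratios. The short sketch below uses only the counts given in the text; the dictionary layout and the helper name `design_ratio` are illustrative choices, not part of EvoEF2.

```python
# Core-residue counts quoted in the text for the 148 test monomers.
native_core = {"ASP": 119, "SER": 278}
designed_core = {
    "EvoEF": {"ASP": 885, "SER": 292},
    "EvoEF2": {"ASP": 150, "SER": 112},
}

def design_ratio(designed_count, native_count):
    """Ratio of designed to native occurrences; values above 1 indicate
    over-design and values below 1 indicate under-design."""
    return designed_count / native_count

ratios = {
    method: {
        aa: round(design_ratio(count, native_core[aa]), 2)
        for aa, count in counts.items()
    }
    for method, counts in designed_core.items()
}

for method, per_aa in ratios.items():
    print(method, per_aa)
```

A ratio far above 1 for aspartic acid under EvoEF and well below 1 for serine under EvoEF2 corresponds to the over- and under-design described in the paragraph above.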
Another important finding is that, whether EvoEF or EvoEF2 was used, the native sequence recovery rate for core residues was much higher than the rate for surface residues, which is consistent with the findings of previous computational studies (Gainza et al., 2012; Kuhlman and Baker, 2000) and may suggest that the protein core is more evolutionarily conserved and its sequence space is more highly constrained than the surface. As a comparison, the native sequence recapitulation results for the design of the 222 training proteins are presented in Supplementary Table S2. The overall recapitulation rates and the amino acid-specific ratios for both the training and test sets were almost identical, suggesting that over-fitting may not be a problem for the EvoEF and EvoEF2 energy weights.

In some studies, only proteins with high-resolution X-ray structures (≤2.0 Å) and small sizes were selected to parameterize and test their protein design algorithms. Here, structures with resolutions >2.0 Å and medium sizes (e.g. up to 300 amino acids) were also included in the EvoEF2 benchmark set. We believe that the use of larger and more diverse datasets can make our algorithm more robust and applicable to low-resolution structures or even models. In Supplementary Figure S1, we show the sequence identity between the 370 native and designed monomer proteins as a function of protein structure resolution and length; both the training and test proteins were used for statistical analysis because no over-fitting was observed. It appeared that a weak negative correlation between resolution and sequence identity existed, with a Pearson correlation coefficient (PCC) of –0.24. However, this might be due to the small number of low-resolution structures in the dataset, as in fact there were only 34 structures whose resolutions were >2.0 Å. If we excluded the 34 structures, the PCC for the group with resolution ≤2.0 Å was only –0.081, suggesting that the sequence identity of the designs is likely to be independent of the structure resolution. Additionally, the PCC between sequence identity and protein length for the 370 structures was 0.084, indicating that there does not exist a strong correlation between sequence identity and protein length. Therefore, we conclude that the EvoEF2 energy function may be applicable to a diverse range of structures.

3.2 Importance of the new energy terms

The optimized weights and reference energies are presented in Supplementary Tables S3 and S4. The optimized weights for the new energy terms, E_SS, E_AAPP, E_RAMA and E_ROT, were 2.72, 0.59, 0.42 and 0.35 (Supplementary Table S3), respectively, suggesting that the new terms play a role in the sequence design process. To examine to what extent these terms are useful for sequence design, we tested the native sequence recapitulation performance of EvoEF2 by disabling each of these terms, while holding the others constant. Removal of any new term led to a decrease in the overall native sequence recapitulation rate compared to the complete EvoEF2 energy function, but their contributions were not identical (Supplementary Figure S2). In general, disabling the disulfide bonding, amino acid propensity and Ramachandran terms individually only caused a moderate decrease in performance, but disabling the Dunbrack rotamer probability term alone led to a substantial decrease in the sequence recovery rate. More specifically, including the disulfide bonding term in EvoEF2 recovered only about twice as many cysteines as the energy function with this term excluded. This improvement was not as large as we expected, which is probably due to the strict geometries employed for modeling disulfide bonding interactions and the absence of native-like cysteine rotamers in the non-expanded rotamer library (Shapovalov and Dunbrack, 2011). Furthermore, a plausible reason for the fact that the amino acid propensity and Ramachandran terms had a small effect on the designs was that their roles were likely to be largely and implicitly considered by some other terms, such as the van der Waals packing interactions in a local environment. The Dunbrack rotamer probability term was crucial for treating rotamers with different side-chain conformations differently, and exclusion of this term caused a significant decrease in performance and posed a severe challenge to the other physics-based energy terms. As expected, disabling the four terms simultaneously dramatically weakened the native sequence recapitulation performance and therefore we concluded that the extended terms are important for protein design.

3.3 Foldability assessment of the designed sequences

Although native sequence recapitulation is an important metric for evaluating the performance of protein design algorithms (Alford et al., 2017; Kuhlman and Baker, 2000; Leaver-Fay et al., 2013), high native sequence similarity does not always guarantee the designs are of high quality and foldable. To further examine the design quality, we used the state-of-the-art protein structure prediction suite, I-TASSER (Yang et al., 2015), to test the foldability of the designed sequences and to examine how close the predicted models were to the native scaffold structures. The designed sequences with the lowest EvoEF2 free energies for each of the aforementioned 148 test monomers were modeled by I-TASSER in order to assess their foldability. A test protein was defined as foldable if the designed sequence was predicted to fold into a structure with a TM-score to the native scaffold structure greater than a specified TM-score threshold, where a TM-score >0.5 indicates that two structures share a similar fold topology (Xu and Zhang, 2010). Alternatively, RMSD was also used to calculate the similarity between two structures (Bazzoli et al., 2011) and, generally, two structures share a similar fold when the RMSD is less than 4 Å. Supplementary Table S5 presents the TM-scores and RMSDs between the I-TASSER models for the designed sequences and their corresponding native scaffold structures for the 148 proteins. We found that all 148 designed proteins were predicted to fold into structures with TM-scores >0.5 and RMSDs <4 Å to their native counterparts. All of the designs shared a sequence identity between 20% and 50% to their native sequences; 33.7% (50/148) were located in the so-called 'twilight zone' (Rost, 1999) with sequence identities ranging from 20% to 30%, while the other 66.3% (98/148) would be more likely to be recognized as sequence homologs to their corresponding naturally occurring sequences.

In Figure 2, the TM-scores and RMSDs are illustrated as a function of sequence identity for the 148 test monomers, where 87.1% (129/148) of the designs had TM-scores >0.9 to their native structures. Alternatively, 87.8% (130/148) of the designs were predicted to have RMSDs <2 Å to their native structures, which is a reasonable upper bound for regarding a protein design case as successful (Dahiyat and Mayo, 1997; Kuhlman et al., 2003). The results presented here are much better than a previous protein design study performed using FoldX, where 77% of the 52 tested single-domain monomers were recovered at an RMSD threshold of 2 Å (Bazzoli et al., 2011).

Three examples are illustrated in Figure 3 that compare the I-TASSER predicted models to the native scaffolds. The designed sequence based on an outer membrane protein (PDB ID: 2FI9) shared the highest overall sequence identity (47.4%) to the native, and, as

Fig. 2. TM-scores (A) and RMSDs (B) of the predicted I-TASSER models to the native crystal structures as a function of sequence identity between the native sequences and those designed using EvoEF2

Fig. 3. Comparison of the native structures and the I-TASSER models of the designed sequences for three example proteins designed using EvoEF2
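The foldability criteria of Section 3.3 amount to thresholding two per-design numbers, the TM-score and the RMSD to the native scaffold. The sketch below is a minimal illustration under stated assumptions: the `rmsd` helper expects coordinates that are already superposed (it performs no optimal alignment), the comparisons are taken as strict inequalities, and the `(tm_score, rmsd)` pairs in the usage example are hypothetical values, not data from Supplementary Table S5.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized, already
    superposed coordinate lists (no optimal superposition is done here)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def fraction_passing(models, tm_min=None, rmsd_max=None):
    """Fraction of (tm_score, rmsd) pairs with TM-score above `tm_min`
    and RMSD below `rmsd_max`; a threshold of None is not applied."""
    if not models:
        raise ValueError("no models given")
    passed = 0
    for tm_score, deviation in models:
        if tm_min is not None and not tm_score > tm_min:
            continue
        if rmsd_max is not None and not deviation < rmsd_max:
            continue
        passed += 1
    return passed / len(models)

# Hypothetical (TM-score, RMSD in angstroms) pairs for four designs.
example_models = [(0.95, 1.2), (0.91, 1.8), (0.72, 3.1), (0.55, 3.9)]

same_fold = fraction_passing(example_models, tm_min=0.5, rmsd_max=4.0)
high_tm = fraction_passing(example_models, tm_min=0.9)
low_rmsd = fraction_passing(example_models, rmsd_max=2.0)
```

With real data, `fraction_passing(models, tm_min=0.9)` and `fraction_passing(models, rmsd_max=2.0)` would correspond to the 129/148 and 130/148 counts quoted in the text.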

3.4 Sequence design of NMR scaffolds

Since EvoEF2 performed very well on X-ray structures, it was also of great interest to examine its sequence design ability on NMR structures, as there are many proteins that only have experimentally solved NMR structures. To compare the sequence design performance of EvoEF2 on NMR and X-ray scaffolds, 29 monomer proteins collected by Schneider et al. (2009) were selected for design, where all 29 proteins had both NMR and X-ray structures available. Here, it is worth mentioning that these structures had sequence identities <30% to the proteins from the aforementioned training and test sets. The information for the 29 proteins is presented in Supplementary Table S6, where each of them had more than 10 NMR models. The free energy of the designs as a function of the sequence identity between the designed and native sequences for all 29 structure pairs is illustrated in Supplementary Figure S3. For NMR structures, the sequence identities were widely distributed, from 5.5% (PDB ID: 1BC4), which was close to random, to as high as 35.3% (PDB ID: 1UF0). On average, the native sequence recovery rates were consistently higher for the X-ray structures (Supplementary Figure S4a), and the native amino acids were recapitulated less frequently when NMR structures were used as the scaffolds. Similar observations were reported for Rosetta by Kuhlman and Baker (2000) and Schneider et al. (2009). Therefore, it seems that X-ray structures are preferred by Rosetta (Kuhlman and Baker, 2000; Schneider et al., 2009) and EvoEF2. Nevertheless, for 6 out of the 29 cases, comparable or even higher recovery rates were achieved for the best NMR models than the corresponding X-ray scaffolds (Supplementary Figure S4b), suggesting that NMR structures are not always bad templates for protein design (Schneider et al., 2009).
Consequently, in cases where an X-ray structure is not available, an NMR structure should be tested as a scaffold candidate.

3.5 Recapitulation of native PPI sequences

PPIs play important roles in the biological processes of cells, and non-synonymous single nucleotide polymorphisms, especially those occurring at protein interfaces, may cause various human diseases (Brender and Zhang, 2015; Xiong et al., 2017). Designing novel proteins/peptides targeting PPIs involved in diseases is of great value (Shultis et al., 2019), but progress in this field has not been extensively demonstrated due to difficulty in accurately modeling novel functions and interactions. In previous studies, most protein design algorithms were optimized and tested using monomers, and the transferability of an energy function optimized on monomers to PPI modeling is under debate. For instance, Sharabi et al. (2011a, b) showed that the original ORBIT algorithm that was optimized for monomer design was not sufficiently good at recovering residues at protein–protein interfaces and the reweighted algorithm optimized using dimer interfaces yielded better results for PPI design. However, Kortemme et al. (2003) suggested that the Rosetta energy function optimized on monomers was generally applicable to the prediction of specificity for PPIs, as demonstrated by their tests where, for the majority of the positions, the most frequently predicted amino acids were the naturally occurring residues. A limitation of these benchmark studies is that the PPI desi


3y ago

752 Views

Dr. Ram Manohar Lohiya National Law University, Lucknow

2. Health and Medicine Law 3. Int. Commercial Arbitration 4. Law and Agriculture IXth SEMESTER 1. Consumer Protection Law 2. Law, Science and Technology 3. Women and Law 4. Land Law (UP) Xth SEMESTER 1. Real Estate Law 2. Law and Economics 3. Sports Law 4. Law and Education **Seminar Courses Xth SEMESTER (i) Law and Morality (ii) Legislative .

3y ago

505 Views

Case Law Update by Victor P. Valmus Family Law uarterly

Family Law uarterly Official Publication of the Cobb County Family Law Section The Cobb Case Law Update The Cobb Family Law uarterlyJune, 201 The Cobb Family Law Quarterly June, 2014 In this Edition Business Valuation and Reporting in Matrimonial Disputes by Marc L. Effron, CPA/ CFF, JD, CVA and Kevin P. Couillard, ASA, CFA

1y ago

119 Views

Board Beans Collection - BOARD BEANS - Board Beans

Catan Family 3 4 4 Checkers Family 2 2 2 Cherry Picking Family 2 6 3 Cinco Linko Family 2 4 4 . Lost Cities Family 2 2 2 Love Letter Family 2 4 4 Machi Koro Family 2 4 4 Magic Maze Family 1 8 4 4. . Top Gun Strategy Game Family 2 4 2 Tri-Ominos Family 2 6 3,4 Trivial Pursuit: Family Edition Family 2 36 4

2y ago

390 Views

Companies Law - Cayman Islands dollar

Law 1 of 1971-15th December, 1970 Law 7 of 2000- 20th July, 2000 Law 7 of 1973-28th June, 1973 Law 5 of 2001-20th April, 2001 Law 24 of 1974-22nd November, 1974 Law 10 of 2001-25th May, 2001 Law 25 of 1975-9th December, 1975 Law 29 of 2001-26th September, 2001 Law 19 of 1977-10th November, 1977 Law 46 of 2001-14th January, 2002

3y ago

463 Views

It’s the Law!

ciples stated in Boyle’s Law, Charles’ Law, Gay-Lussac’s Law, Henry’s Law, and Dalton’s Law. Students will be able to explain the application of Boyle’s Law, Charles’ Law, Gay-Lussac’s Law, Henry’s Law, and Dalton’s Law to observations or events related to SCUBA diving. MateriaLs None audio/visuaL MateriaLs None teachinG tiMe

2y ago

386 Views

WHAT LAW IS ? An Introduction to Law

common law system civil law system!! sources of law in civil law !! a1. primary: statutes (written law) enacted by legislative power are the principal source of law. ! a2. two subsidiary sources of law: ! a2.1 administrative regulations a.2.2 customs!! ! sources of law in common law !!! b1. two primary sources of

2y ago

395 Views

Intermediate Law Law and You Worksheet 3: Australian law - Home Affairs

4. There are different kinds of law to deal with different kinds of problems. Four important kinds of law are civil law, criminal law, family law and administrative law. Civil law deals with disputes between individuals; for example, if someone sells you goods that are faulty, or that cause you injury or damage, you can take that person to court.

4m ago

116 Views

What is Family Law? - Courts and Tribunals Judiciary

What is family law? After all, the law of inheritance is usually thought of as a branch of property law and thus a matter for the Chancery rather than the Family Division. And family 1 Changing families: family law yesterday, today and tomorrow - a view from south of the Border [2018] Fam Law 538, 542-3.

1y ago

133 Views

Domestic Violence and Family Law in Papua New Guinea

Family law in PNG Family law deals with issues relating to family and domestic relationships. Major topics covered by family law include marriage, divorce, child maintenance, prop - erty claims following separation and the custody and adoption of children (Jessep and Luluaki 1985:11). Much of PNG's family law legislation was adopted as

1y ago

131 Views

Faculty of Juridical, Social and Political Sciences Year .

Law L Law IV 8 Drept procesual civil II / Civil Procedure Law II 5 Law L Law IV 8 Dreptul comerțului internațional / International ommercial Law 4 Law L Law IV 8 riminalistică / Forensics 4 Law L Law IV 8 Practică de cercetare pentru elaborarea lucrării de lincență(3 săptămân

2y ago

391 Views

Ohm ’s Law

Ohm ’s Law Ohm's law states that, in an electrical circuit, the current passing through most materials is directly proportional to the potential difference applied across them. 3-1—3-3: Ohm ’s Law Formulas There are three forms of Ohm’s Law: I V/R V IR R V/I where:File Size: 1MBPage Count: 40Explore furtherOhm's Law Quiz MCQs with Answers Ohm Lawohmlaw.comOhm’s Law Worksheet - Basic Electricity - All About omohms law worksheet - eering.orgOhm’s Law Worksheet - Richmond County School Systemwww.rcboe.orgOhm's Law with Examples - Physics Problems with Solutions ended to you b

2y ago

302 Views

Family Law for the Future — An Inquiry into the Family Law .

Review of the Family Law System On 27 September 2017, the Australian Law Reform Commission received Terms of Reference to undertake an inquiry into the family law system. On behalf of the Members of the Commission involved in this Inquiry, and in accordance with the Australian Law

3y ago

142 Views

Practice Material - Family - Law Society of British Columbia

The Law Society's . Report of the Family Law Task Force: Best Practice Guidelines for Law-yers Practising Family Law. Family law has undergone significant changes over the past several years, and more changes are underway. 2. It is important to verify that your legal knowledge and re-sources are current. For example, note these changes:

1y ago

129 Views

Structural Bioinformatics EvoEF2: Accurate And Fast Energy Function For .

It looks like you're using an ad-blocker