Stochastic Simulation Of The Kinetics Of Multiple .


Stochastic Simulation of the Kinetics of Multiple Interacting Nucleic Acid Strands

Thesis by
Joseph Malcolm Schaeffer

In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

California Institute of Technology
Pasadena, California
2013
(Defended September 28, 2012)

© 2013
Joseph Malcolm Schaeffer
All Rights Reserved

Acknowledgements

Thanks to my advisor Erik Winfree, for his enthusiasm, expertise, and encouragement. The models presented here are due in a large part to helpful discussions with Niles Pierce, Robert Dirks, Justin Bois, and Victor Beck. Two undergraduates, Chris Berlind and Joshua Loving, did summer research projects based on Multistrand and in the process helped build and shape the simulator. There are many people who have used Multistrand and provided very helpful feedback for improving the simulator, especially Josh Bishop, Nadine Dabby, Jonathan Othmer, and Niranjan Srinivas. Nadine Dabby was also invaluable for her feedback and discussions while writing the thesis. Thanks also to the many past and current members of the DNA and Natural Algorithms group for providing a stimulating environment in which to work.

Funding for this work was provided by National Science Foundation grants DMS-0506468 and CCF-0832824, and the Gordon and Betty Moore Foundation through the Caltech Programmable Molecular Technology Initiative.

There are many medical professionals to which I owe my good health while writing this thesis, especially Dr. Jeanette Butler, Dr. Mariel Tourani, Cathy Evaristo, and the staff of the Caltech Health Center, especially Alice, Divina, and Jeannie.

I want to acknowledge all my family and friends for their support. A journey is made all the richer for having good company, and I would not have made it nearly as far without all the encouragement.

Finally, I must thank my wife Lorian, who has been with me every step of this journey and has shared all the high points and low points with her endless love and support.

Abstract

DNA nanotechnology is an emerging field which utilizes the unique structural properties of nucleic acids in order to build nanoscale devices, such as logic gates, motors, walkers, and algorithmic structures. Predicting the structure and interactions of a DNA device requires good modeling of both the thermodynamics and the kinetics of the DNA strands within the system. The kinetics of a set of DNA strands can be modeled as a continuous time Markov process through the state space of all secondary structures. The primary means of exploring the kinetics of a DNA system is by simulating trajectories through the state space and aggregating data over many such trajectories.

We expand on previous work by extending the thermodynamics and kinetics models to handle multiple strands in a fixed volume, and show that the new models are consistent with previous models. We developed data structures and algorithms that allow us to take advantage of local properties of secondary structure, improving the efficiency of the simulator so that we can handle larger systems. The new kinetic parameters in our model were calibrated by analyzing simulator results on experimental systems that measure basic kinetic rates of various processes. Finally, we apply the new simulator to explore a case study on toehold-mediated four-way branch migration.

Contents

Acknowledgements
Abstract
List of Figures
List of Tables

1 Introduction

2 System
  2.1 Strands
  2.2 Complex Microstate
  2.3 System Microstate

3 Energy
  3.1 Energy of a System Microstate
  3.2 Energy of a Complex Microstate
  3.3 Computational Considerations
  3.4 Choice of Units

4 Kinetics
  4.1 Basics
  4.2 Unimolecular Transitions
  4.3 Bimolecular Transitions
  4.4 Transition Rates
  4.5 Unimolecular Rate Models
  4.6 Bimolecular Rate Model

5 Thermodynamic Equivalence Between the Multistrand and NUPACK Models
  5.1 Introduction
  5.2 Definitions
  5.3 Proof
    5.3.1 $Q^{j}_{kin}$ and $Q_{j}$
    5.3.2 Composing $Q_{kin}$ from $Q^{j}_{kin}$
  5.4 Conclusion

6 The Simulator: Multistrand
  6.1 Data Structures
    6.1.1 Energy Model
    6.1.2 The Current State: Loop Structure
    6.1.3 Reachable States: Moves
  6.2 Algorithms
    6.2.1 Move Selection
    6.2.2 Move Update
    6.2.3 Move Generation
    6.2.4 Energy Computation
  6.3 Time Complexity

7 Multistrand: Output and Analysis
  7.1 Trajectory Mode
    7.1.1 Testing: Energy Model
    7.1.2 Testing: Kinetics Model
  7.2 Macrostates
    7.2.1 Common Macrostates
  7.3 Transition Mode
  7.4 First Passage Time Mode
    7.4.1 Comparing Sequence Designs
    7.4.2 Systems with Multiple Stop Conditions
  7.5 Fitting Chemical Reaction Equations
    7.5.1 Fitting Full Simulation Data to the $k_{eff}$ Model
  7.6 First Step Mode
    7.6.1 Fitting the First Step Model
    7.6.2 Analysis of First Step Model Parameters

8 Calibration of Kinetic Parameters
  8.1 $k_{uni}$ Calibration
  8.2 $k_{bi}$ Calibration
  8.3 Discussion
    8.3.1 Choice of Experimental Paper for Determining $k_{uni}$
    8.3.2 Nonlinearity of $k_{bi}$ Calibrations
    8.3.3 Other Substrates

9 Case Study: Toehold-Mediated Four-Way Branch Migration
  9.1 Background
  9.2 Mechanism
  9.3 Simulation
  9.4 Discussion

A Strand Orderings for Pseudoknot-Free Representations
  A.1 Representation Theorem

Bibliography

List of Figures

3.1 Secondary Structure Loop Decomposition
4.1 Adjacent Microstate Diagram
6.1 Representation of Secondary Structures
6.2 Move Data Structure
6.3 Move Update Example
6.4 Move Generation Example
6.5 Full Comparison vs Kinfold 1.0
7.1 Trajectory Data
7.2 Three-Way Branch Migration System
7.3 Trajectory Output after 0.01 s Simulated Time
7.4 Trajectory Output after 0.05 s Simulated Time
7.5 Example Macrostate
7.6 Hairpin Folding Pathways
7.7 First Passage Time Data, Design B
7.8 First Passage Time Data, Design A
7.9 First Passage Time Data, Sequence Design Comparison
7.10 First Passage Time Data, 6 Base Toeholds
7.11 Starting Complexes and Strand Labels
7.12 Final Complexes and Strand Labels
8.1 Zippering Mechanism
9.1 Four-Way Branch Migration Mechanism
9.2 Toehold-Mediated Four-Way Branch Migration Parameterization
9.3 Four-Way Branch Migration Mechanism, Start and Stop States
9.4 Bimolecular Success Rate vs Total Toehold Length
9.5 Comparison of Experimental and Simulated Rates for Toehold Mediated Four-Way Branch Migration
A.1 Polymer Graph Representation
A.2 Polymer Graph Changes (Break Move)
A.3 Polymer Graph Changes (Join Move)

List of Tables

7.1 Two Branch Migration Sequences
7.2 Distance Metric Examples
7.3 Transition States in Hairpin Pathway
7.4 Transition Pathways via Transition Mode Simulation
7.5 Transition Pathway Statistics
7.6 Transition Pathway Statistics, 100 Trajectories
7.7 First Passage Time Data
8.1 Average $f(\Delta G_{step})$ for Forward Zippering Steps
8.2 Calibrated $k_{uni}$ Parameters
8.3 Bimolecular Association Rate $k_{hyb}$ Parameters
8.4 Calibrated $k_{bi}$ Parameters
8.5 Comparison of Calibrated Parameters
9.1 Sequences for Four-Way Branch Migration Domains
9.2 Toehold-Mediated Branch Migration, Raw Simulation Results

Chapter 1
Introduction

DNA nanotechnology is an emerging field that utilizes the unique structural properties of nucleic acids in order to build nanoscale devices, such as logic gates [23], motors [4, 1], walkers [24, 1, 26], and algorithmic structures [18, 31]. These devices are built out of DNA strands whose sequences have been carefully designed in order to control their secondary structure, that is, the hydrogen bonding state of the bases within the strand (called “base-pairing”). This base-pairing is used to not only control the physical structure of the device, but also to enable specific interactions between different components of the system, such as allowing, for example, a DNA walker to take steps along a prefabricated track. Predicting the structure and interactions of a DNA device requires good modeling of both the thermodynamics and the kinetics of the DNA strands within the system. Thermodynamic models can be used to make equilibrium predictions for these systems, allowing us to look at questions like “Is the walker-track interaction a well-formed and stable molecular structure?”, while kinetics models allow us to predict the non-equilibrium dynamics, such as “How quickly will the walker take a step?” While the thermodynamics of multiple interacting DNA strands is a well-studied model [6], which allows for both analysis and design of DNA devices [34, 7], previous work on secondary structure kinetics models only explored the kinetics of how a single strand folds on itself [8].

The kinetics of a set of DNA strands can be modeled as a continuous time Markov process through the state space of all secondary structures. Due to the exponential size of this state space it is computationally intractable to obtain an analytic solution for most problem sizes of interest. Thus the primary means of exploring the kinetics of a DNA system is by simulating trajectories through the state space and aggregating data over many such trajectories.
We present here the Multistrand kinetics simulator, which extends the previous work [8] by using the multiple strand thermodynamics model [6] (a core component for calculating transition rates in the kinetics model), adding new terms to the thermodynamics model to account for stochastic modeling considerations, and by adding new kinetic moves that allow bimolecular interactions between strands. Furthermore, we prove that this new kinetics and thermodynamics model is consistent with the prior work on multiple strand thermodynamics models [6].

The Multistrand simulator is based on the Gillespie algorithm [9] for generating statistically correct trajectories of a stochastic Markov process. We developed data structures and algorithms that take advantage of local properties of secondary structures. These algorithms enable the efficient reuse of the basic objects that form the system, such that only a very small part of the state’s neighborhood information needs to be recalculated with every step. A key addition was the implementation of algorithms to handle the new kinetic steps that occur between different DNA strands, without increasing the time complexity of the overall simulation. These improvements led to a reduction in the worst case time complexity of a single step and also to additional improvements in the average case time complexity.

What data does the simulation produce? At the very simplest, the simulation produces a full kinetic trajectory through the state space: the exact states it passed through, and the time at which it reached them. A small system might produce trajectories that pass through hundreds of thousands of states, and that number increases rapidly as the system gets larger. Going back to our original question, the type of information a researcher hopes to get out of the data could be very simple: “How quickly does the walker take a step?”, with the implied question of whether it’s worth it to actually purchase the particular DNA strands composing the walker to perform an experiment, or go back to the drawing board and redesign the device.
One way to acquire that type of information is to look at the first time in the trajectory where we reached the “walker took a step” state, and record that information for a large number of simulated trajectories in order to obtain a useful answer. We designed and implemented new simulation modes that allow the full trajectory data to be condensed as it’s generated into only the pieces the user cares about for their particular question. This analysis tool also required the development of flexible ways to talk about states that occur in trajectory data; if someone wants data on when the walker took a step, we have to be able to express that in terms of the Markov process states which meet that condition.
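The idea of simulating trajectories of a continuous time Markov process and recording the first time a stop condition is met can be sketched in a few lines of Python. This is only an illustrative sketch, not Multistrand's implementation: the three-state model, its state names, and its rates are made up for the example, and the function name `gillespie_first_passage` is hypothetical. The loop itself follows the standard Gillespie scheme of drawing an exponential waiting time from the total exit rate and then choosing a move with probability proportional to its rate.

```python
import random


def gillespie_first_passage(rates, start, is_stop, rng, max_time=float("inf")):
    """Simulate one trajectory of a continuous time Markov process.

    rates maps each state to a list of (next_state, rate) pairs.
    Returns (trajectory, first_passage_time); the time is None if the stop
    condition was not reached (dead end, or max_time exceeded).
    """
    state, t = start, 0.0
    trajectory = [(t, state)]
    while not is_stop(state):
        moves = rates.get(state, [])
        total = sum(rate for _, rate in moves)
        if total == 0.0:
            return trajectory, None  # no moves available from this state
        # The waiting time is exponential with the total exit rate.
        t += rng.expovariate(total)
        if t > max_time:
            return trajectory, None
        # Pick a move with probability proportional to its rate.
        threshold = rng.random() * total
        for next_state, rate in moves:
            threshold -= rate
            if threshold < 0.0:
                break
        state = next_state
        trajectory.append((t, state))
    return trajectory, t


# Hypothetical toy model (assumed rates, arbitrary time units).
TOY_RATES = {
    "bound": [("partial", 5.0)],
    "partial": [("bound", 20.0), ("stepped", 1.0)],
    "stepped": [],
}


def mean_first_passage(n_trajectories, seed=0):
    """Average the first passage time to 'stepped' over many trajectories."""
    rng = random.Random(seed)
    times = []
    for _ in range(n_trajectories):
        _, fpt = gillespie_first_passage(
            TOY_RATES, "bound", lambda s: s == "stepped", rng
        )
        times.append(fpt)
    return sum(times) / len(times)
```

For this toy model the mean first passage time can also be solved by hand from the two linear equations for the expected time from each state, which gives 5.2 time units, so the simulated average offers a quick sanity check. In a real secondary structure state space no such closed form is practical, which is why aggregation over many trajectories is used.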

Chapters 1, 2, 3, 4, 6, 7, and Appendix A originally appeared in my Master’s thesis. Chapter 5 is a completely new proof of equivalence between the thermodynamics model we develop in Chapter 3 and the NUPACK model. Chapter 8 discusses how we calibrate the kinetics parameters $k_{uni}$ and $k_{bi}$ which were introduced in Chapter 4. Chapter 9 is a case study on using the simulator to explore a toehold-mediated four-way branch migration. The experimental data $k_1^{fit}$ in Chapter 9 is from Dabby, et al. [5] on which I am a co-author. I performed simulations using Multistrand for that work, which appear in that paper and are presented in more detail here.

Found in the Master’s thesis but not here are two appendices which describe the software design for the data structures and algorithms used in the simulator, as well as a different proof of equivalence between our thermodynamics model and the NUPACK model.

Chapter 2
System

We are interested in simulating nucleic acid molecules (DNA or RNA) in a stochastic regime; that is to say that we have a discrete number of molecules in a fixed volume. This regime is found in experimental systems that have a small volume with a fixed count of each molecule present, such as the interior of a cell. We can also apply this to experimental systems with a larger volume (such as a test tube) when the system is well mixed, as we can pick a fixed (small) volume and deal with the expected counts of each molecule within it, rather than the whole test tube.

To discuss the modeling and simulation of the system, we need to be very careful to define the components of the system, and what comprises a state of the system within the simulation.

2.1 Strands

Each DNA molecule to be simulated is represented by a strand. Our system then contains a set of strands $\Psi$, where each strand $s \in \Psi$ is defined by $s = (id, label, sequence)$. A strand’s id uniquely identifies the strand within the system, while the sequence is the ordered list of nucleotides that compose the strand.

Two strands could be considered identical if they have the same sequence. However, in some cases it is convenient to make a distinction between strands with identical sequences. For example, if one strand were to be labeled with a fluorophore, it would no longer be physically identical to another with the same sequence but no fluorophore. Thus, the label is used to designate whether two strands are identical. We define two strands as being identical if they have the same labels and sequences. In most cases this distinction between the label and the sequence is not used, so it will be explicitly noted when it is important.

2.2 Complex Microstate

A complex is a set of strands connected by base pairing (secondary structure). We define the state of a complex by $c = (ST, \pi, BP)$, called the “complex microstate”. The components are a nonempty set of strands $ST \subseteq \Psi$, an ordering $\pi$ on the strands $ST$, and a list of base pairings $BP = \{(i_j \cdot k_l) \mid \text{base } i \text{ on strand } j \text{ is paired to base } k \text{ on strand } l, \text{ and } j \le l, \text{ with } i < k \text{ if } j = l\}$, where we note that “strand $l$” refers to the strand occurring in position $l$ in the ordering $\pi$. Note that we require a complex to be “connected”: there is no proper subset of strands in the complex for which the base pairings involving those strands do not involve at least one base outside that subset.
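The definitions of a strand and a complex microstate can be made concrete with a small data structure sketch. The Python below is an illustration under stated assumptions, not the simulator's actual data structures: the class names, the method names, the 0-based indexing, and the choice to store the strand set $ST$ and its ordering $\pi$ together as one tuple are all hypothetical. Each base pair is stored as a tuple (i, j, k, l), meaning base i on the strand at position j is paired to base k on the strand at position l.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Strand:
    id: int          # unique within the system
    label: str       # distinguishes, e.g., a fluorophore-tagged copy
    sequence: str    # ordered nucleotides

    def identical_to(self, other: "Strand") -> bool:
        """Strands are identical when both label and sequence match."""
        return self.label == other.label and self.sequence == other.sequence


# (i, j, k, l): base i on strand position j pairs with base k on position l.
BasePair = Tuple[int, int, int, int]


@dataclass(frozen=True)
class ComplexMicrostate:
    strands: Tuple[Strand, ...]       # the set ST, listed in the order pi
    base_pairs: FrozenSet[BasePair]   # the list BP

    def is_canonical(self) -> bool:
        """Every pair satisfies j <= l, with i < k when j == l."""
        return all(j < l or (j == l and i < k)
                   for (i, j, k, l) in self.base_pairs)

    def is_connected(self) -> bool:
        """No proper subset of strands is paired only within itself."""
        n = len(self.strands)
        if n == 0:
            return False  # a complex needs at least one strand
        parent = list(range(n))

        def find(x: int) -> int:
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path compression
                x = parent[x]
            return x

        # Union the two strands joined by each base pair.
        for (_, j, _, l) in self.base_pairs:
            parent[find(j)] = find(l)
        return len({find(x) for x in range(n)}) == 1
```

The connectivity check uses a union-find over strand positions: two strands end up in the same group exactly when a chain of base pairs links them, so a single group corresponds to the “connected” requirement. The sketch does not check sequence complementarity or that each base is paired at most once.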
