Make Parallel Programs Reliable With Stable Multithreading


Junfeng Yang, Heming Cui, Jingyue Wu, John Gallagher, Chia-Che Tsai, Huayang Guo, Yang Tang, Gang Hu
Department of Computer Science, Columbia University

ABSTRACT

Our accelerating computational demand and the rise of multicore hardware have made parallel programs increasingly pervasive and critical. Yet, these programs remain extremely difficult to write, test, analyze, debug, and verify. In this article, we provide our view on why parallel programs, specifically multithreaded programs, are difficult to get right. We present a promising approach we call stable multithreading to dramatically improve reliability, and summarize our last four years' research on building and applying stable multithreading systems.

1 Introduction

Reliable software has long been the dream of many researchers, practitioners, and users. In the last decade or so, several research and engineering breakthroughs have greatly improved the reliability of sequential programs (or the sequential aspect of parallel programs). Successful examples include Coverity's source code analyzer [7], Microsoft's Static Driver Verifier [3], the Valgrind memory checker [17], and certified operating systems and compilers [20].

However, the same level of success has not yet propagated to parallel programs. These programs are notoriously difficult to write, test, analyze, debug, and verify, much harder than their sequential versions. Experts consider reliable parallelism "something of a black art" [9] and one of the grand challenges in computing [1, 18]. Unsurprisingly, widespread parallel programs are plagued with insidious concurrency bugs [15], such as data races (concurrent accesses to the same memory location with at least one write) and deadlocks (threads circularly waiting for resources). Some of the worst of these bugs have killed people in the Therac-25 incidents and caused the 2003 Northeast blackout. Our study also reveals that these bugs may be exploited by attackers to violate the confidentiality, integrity, and availability of critical systems [24].

In recent years, two technology trends have made the challenge of reliable parallelism more urgent. The first is the rise of multicore hardware. The speed of a single processor core is limited by fundamental physical constraints, forcing processors into multicore designs. Thus, developers must resort to parallel code for best performance on multicore processors. The second is our accelerating computational demand. Many physical-world computations, such as search and social networking, are now computerized and hosted in the cloud. These computations are massive, run on "big data," and serve millions of users and devices. To cope with these massive computations, almost all services employ parallel programs to boost performance.

If reliable software is an overarching challenge of computer science, reliable parallelism is surely the keystone. This keystone challenge has drawn decades of research efforts, producing numerous ideas and systems, ranging from new hardware, new programming languages, and new programming models to tools that detect, debug, avoid, and fix concurrency bugs. As usual, new hardware, languages, or models take years, if not forever, to adopt. Tools are helpful, but they tend to attack derived problems, not the root cause.

Over the past four years, we have been attacking fundamental, open problems in making shared-memory multithreaded programs reliable. We target these programs because they are the most widespread type of parallel programs.
Many legacy programs, such as the Apache web server and the MySQL database, are multithreaded. New multithreaded programs are being written every day. This prevalence of multithreading is unsurprising given its wide support from many platforms (e.g., Linux, Windows, and MacOS) and languages (e.g., Java and C++11). For the same reasons, we believe multithreading will remain prevalent in the foreseeable future.

Unlike sequential programs, multithreaded programs are nondeterministic: repeated executions of the same multithreaded program on the same input may yield different (e.g., correct vs. buggy) behaviors, depending on how the threads interleave. Conventional wisdom has long blamed this nondeterminism for the challenges in reliable multithreading [13]. For instance, testing becomes less effective: a program may run correctly on an input in the testing lab because the interleavings tested happen to be correct, but executions on the same exact input may still fail in the field when the program hits a buggy, untested interleaving. To eliminate nondeterminism, several groups of brilliant researchers have dedicated significant effort to building deterministic multithreading systems [2, 4, 6, 8, 12, 14, 19].

Unfortunately, as explained later in this article, nondeterminism is responsible for only a small piece of the puzzle. Its cure, determinism, is neither sufficient nor necessary for reliability. Worse, determinism sometimes harms reliability rather than improving it.

We believe the challenges in reliable multithreading have a rather quantitative root cause: multithreaded programs have too many possible thread interleavings, or schedules. For complete reliability, all schedules must be correct. Unfortunately, ensuring this requires a great deal of resources and effort simply because the set of schedules is enormous. For instance, testing all possible schedules demands astronomical CPU time. (See §2 for a detailed discussion.)

We attacked this root cause by asking: are all the exponentially many schedules necessary? Our study reveals that many real-world programs can use a small set of schedules to efficiently process a wide range of inputs [10]. Leveraging this insight, we envision a new approach we call stable multithreading (SMT) that exponentially reduces the set of schedules for reliability. We have built three systems: TERN [10] and PEREGRINE [11], two compiler and runtime systems to address key challenges in implementing SMT systems; and an effective program analysis framework atop SMT to demonstrate key benefits of SMT [22]. These systems are designed to be compatible with existing hardware, operating systems, thread libraries, and programming languages to simplify adoption. They are also mostly transparent to developers to save manual labor. We leave it for future work to create new SMT programming models and methods that help active development.

Our initial results with these systems are promising. Evaluation on a diverse set of widespread multithreaded programs, including Apache and MySQL, shows that TERN and PEREGRINE dramatically reduce schedules. For instance, they reduce the number of schedules needed by the parallel compression utility PBZip2 down to two schedules per thread count, regardless of the file contents. Their overhead is moderate, less than 15% for most programs. Our program analysis framework enables the construction of many program analyses with precision and coverage unmatched by their counterparts. For instance, a race detector we built found previously unknown bugs in extensively checked code with almost no false bug reports.

In the remainder of this article, we first present our view on why multithreaded programs are hard to get right. We then describe our SMT approach, its benefits, and the three SMT systems we built. We finally present some results and conclude.

2 Why Are Multithreaded Programs So Hard to Get Right?

We start with preliminaries, then describe the challenges caused by nondeterminism and by too many schedules. We then explain why nondeterminism is a lesser cause than too many schedules.

2.1 Preliminaries: Inputs, Schedules, and Buggy Schedules

To ease discussion, we use input to broadly refer to the data a program reads from its execution environment, including not only the data read from files and sockets, but also command line arguments, return values of external functions such as gettimeofday, and any external data that can affect program execution. We use schedule to broadly refer to the (partially or totally) ordered set of communication operations in a multithreaded execution, including synchronizations (e.g., lock and unlock operations) and shared memory accesses (e.g., load and store instructions to shared memory). Of all the schedules, most are correct, but some are buggy and cause various failures such as program crashes, incorrect computations, and deadlocked executions. Consider the toy program below:

    // thread 1              // thread 2
    lock(l);                 lock(l);
    *p = ...;                p = NULL;
    unlock(l);               unlock(l);

The schedule in which thread 2 gets the lock before thread 1 is buggy and causes a dereference-of-NULL failure. Consider another example. The toy program below has data races on balance:

    // thread 1              // thread 2
    // deposit 100           // withdraw 100
    t = balance + 100;
                             balance = balance - 100;
    balance = t;

The schedule with the statements executed in the order shown is buggy and corrupts balance.
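To make the race concrete, here is a minimal, self-contained sketch of the balance example using POSIX threads (our own illustrative code, not taken from the paper's systems); whether the final balance is correct depends entirely on how the two threads interleave:

    #include <pthread.h>
    #include <stdio.h>

    int balance = 1000;            /* shared, unsynchronized: data race */

    void *deposit(void *arg) {
        int t = balance + 100;     /* read balance */
        balance = t;               /* write back; a withdraw in between is lost */
        return NULL;
    }

    void *withdraw(void *arg) {
        balance = balance - 100;
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, deposit, NULL);
        pthread_create(&t2, NULL, withdraw, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* Correct schedules print 1000; the buggy interleaving prints 1100. */
        printf("balance = %d\n", balance);
        return 0;
    }

Compiled with a command such as "gcc -pthread" and run repeatedly, this program may print either value, which is exactly the schedule-dependent behavior discussed above.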
2.2 Challenges Caused by Nondeterminism

A multithreaded program is nondeterministic because, even with the same program and input, different schedules may still lead to different behaviors. For instance, the two toy programs in the previous subsection do not always run into the bugs. Except for the schedules described, all their other schedules lead to correct executions.

This nondeterminism appears to raise many challenges, especially in testing and debugging. Suppose an input can execute under n schedules. Testing n - 1 schedules is not enough for complete reliability because the single untested schedule may still be buggy. An execution in the field may hit this untested schedule and fail. Debugging is challenging, too. To reproduce a field failure for diagnosis, the exact input alone is not enough. Developers must also manage to reconstruct the buggy schedule out of n possibilities.

Figure 1a depicts the traditional multithreading approach. Conceptually, it is a many-to-many mapping, where one input may execute under many schedules because of nondeterminism, and many inputs may execute under one schedule because a schedule fixes the order of the communication operations but allows the local computations to operate on any input data.

2.3 Challenges Caused by Too Many Schedules

To make a multithreaded program completely reliable, we must ensure that no schedules are buggy, either manually by thinking really hard or automatically by applying tools. Either way, the number of schedules determines the amount of effort and resources needed.

Unfortunately, a typical multithreaded program has an enormous number of schedules. For a single input, the number of schedules is asymptotically exponential in the schedule length. For instance, given n threads competing for a lock, each order of lock acquisitions forms a schedule, easily yielding n! total schedules. (A lock implementation that grants locks to threads based on arrival order, such as a ticket lock, does not reduce the set of schedules, because the threads may arrive at the lock operations in n! different orders.) Aggregated over all inputs, the number of schedules is even greater.

Finding a few buggy schedules among these exponentially many schedules thus raises a series of "needle-in-a-haystack" challenges. For instance, to write correct multithreaded programs, developers must carefully synchronize their code to weed out the buggy schedules. As usual, humans err when they must scrutinize many possibilities to locate corner cases. Various forms of testing tools suffer, too. Stress testing is the common method for (indirectly) testing schedules, but it often redundantly tests the same schedules while missing others. Recent work has made testing more powerful by systematically enumerating schedules for bugs [16], but we seriously lack the resources to cover more than a tiny fraction of all exponentially many schedules.

2.4 Determinism is Overrated

Researchers have built several systems to make multithreading deterministic, hoping to address the challenges raised by nondeterminism. Yet, little has been done to solve the needle-in-a-haystack challenges caused by too many schedules. We believe the community has charged nondeterminism with more than its share of the guilt and overlooked the main culprit: multithreaded programs simply have too many schedules. We argue that determinism, the cure for nondeterminism, is neither sufficient nor necessary for reliability.

Determinism ⇏ reliability. Determinism provides only a narrow type of repeatability: it guarantees that executions are repeatable if both the program and the input remain exactly the same, and has no jurisdiction otherwise. However, developers often expect repeatability for executions on slightly varied programs or inputs. For instance, adding a printf statement purely for debugging should in principle not make the bug disappear. Unfortunately, determinism does not provide these expected repeatabilities, and sometimes even undermines them [2, 5, 10].

[Figure 1: Different multithreading approaches, each mapping inputs to schedules: (a) traditional, (b) deterministic, (c) stable (deterministic), (d) stable (nondeterministic). Red stars represent buggy schedules.]

To illustrate, consider some existing deterministic systems that force a multithreaded program to always use the same—but arbitrary—schedule to process the same input. Figure 1b depicts such a system. This arbitrary mapping destabilizes program behaviors over multiple inputs. A slight input change, as slight as a bit flip, may force a program to discard a correct schedule and (ad)venture into a vastly different, buggy schedule.

This instability is counterintuitive at the least, and actually raises new reliability challenges. For instance, as confirmed in our experiments [10], testing one input provides little assurance on very similar inputs, even though the input differences do not invalidate the tested schedule. Debugging now requires every bit of the bug-inducing input, including not only the data a user typed, but also environment variables, shared libraries, etc. A different user name? An error report that doesn't include credit card numbers? The bug may never be reproduced, regardless of how many times developers retry, because the schedule chosen by the deterministic system for the altered input happens to be correct. In addition to inputs, these deterministic systems destabilize program behaviors over minor code changes as well, so adding a printf for debugging may cause the bug to deterministically disappear.

Another problem with an arbitrary mapping as in Figure 1b is that the number of all possible schedules remains enormous (asymptotically, as enormous as the minimum of (1) the number of all possible inputs and (2) the number of all possible schedules). Thus, the needle-in-a-haystack challenges remain. For instance, testing all schedules may now require testing all inputs, another difficult challenge we have no idea how to solve.

For those curious minds, deterministic multithreading systems may be implemented in several ways. A frequent scheme is to schedule threads based on logical clocks [4, 19], instead of physical clocks, which change nondeterministically across executions. Specifically, each thread maintains a logical clock that ticks based on the code the thread has run. For instance, if a thread has completed 50 load instructions, its clock ticks by 50. Moreover, threads communicate only when their logical clocks have deterministic values (e.g., the smallest across the logical clocks of all threads [19]). In short, local executions tick logical clocks deterministically, and logical clocks force threads to communicate deterministically. By induction, a multithreaded execution becomes deterministic. It is straightforward to see that a slight input change or an additional printf statement may change the number of completed load instructions, thus altering the schedule and destabilizing program behaviors.
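As a rough illustration of this scheme, consider the simplified sketch below (our own, not the actual implementation of any of the cited systems). A deterministic runtime might wrap every communication operation so that a thread proceeds only when its logical clock is the globally smallest, with ties broken by thread id:

    #include <pthread.h>

    #define NTHREADS 4

    static long clocks[NTHREADS];                       /* per-thread logical clocks */
    static pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;

    /* Called by instrumentation after local work, e.g., tick(tid, 50)
     * after a thread completes 50 load instructions. */
    void tick(int tid, long n) {
        pthread_mutex_lock(&m);
        clocks[tid] += n;
        pthread_cond_broadcast(&cv);                    /* clocks changed; recheck turns */
        pthread_mutex_unlock(&m);
    }

    static int my_turn(int tid) {
        for (int i = 0; i < NTHREADS; i++)
            if (clocks[i] < clocks[tid] ||
                (clocks[i] == clocks[tid] && i < tid))
                return 0;                               /* another thread goes first */
        return 1;
    }

    /* A thread may perform a communication operation (lock, shared store, ...)
     * only when its logical clock is the smallest.  Because clocks advance
     * deterministically with the code executed, the order of communication
     * operations is the same in every execution on the same input. */
    void deterministic_communicate(int tid, void (*op)(void)) {
        pthread_mutex_lock(&m);
        while (!my_turn(tid))
            pthread_cond_wait(&cv, &m);
        op();                                           /* perform the communication */
        clocks[tid]++;                                  /* communication also ticks the clock */
        pthread_cond_broadcast(&cv);
        pthread_mutex_unlock(&m);
    }

The sketch makes the fragility concrete: the schedule depends only on the logical clock values, so changing how much code a thread runs (say, by adding a printf) changes the clocks and, with them, the deterministic schedule.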
For instance, to reproduce afield failure given an input, developers can easily afford to searchfor one out of only two schedules.3Shrink the Haystack with Stable MultithreadingMotivated by the limitations of determinism and the needle-in-ahaystack challenges caused by exponentially many schedules, weinvestigated a central research question: are all the exponentiallymany schedules necessary? A schedule is necessary if it is the onlyone that can (1) process specific inputs or (2) yield good performance under specific scenarios. Removing unnecessary schedulesfrom the haystack would make the needles easier to find.We investigated this question on a diverse set of popular multithreaded programs, ranging from server programs such as Apache,to desktop utilities such as the aforementioned PBZip2, to parallelimplementations of computation-intensive algorithms such as fastFourier transformation. These programs use diverse synchronization primitives such as locks, semaphores, condition variables, andbarriers. Our investigation reveals two insights:First, for many programs, a wide range of inputs share the sameequivalent class of schedules. Thus, one schedule out of the classsuffices to process the entire input range. Intuitively, an input oftencontains two types of data: (1) metadata that controls the communication of the execution, such as the number of threads to spawn;and (2) computational data that the threads locally compute on. Aschedule fixes the metadata in the input, but it allows the computational data to vary. Thus, it can process any input that has the samemetadata. For instance, consider the aforementioned PBZip2 whichsplits an input file among multiple threads, each compressing onefile block. The communication, i.e., which thread gets which fileblock, is independent of the thread-local compression. As long asthe number of threads remains the same, PBZip2 can use two schedules (one if the file can be evenly divided and another otherwise) tocompress any file, regardless of the file data.This loose coupling of inputs and schedules is not unique toPBZip2; many other programs also exhibit this property. Table 1shows a sample of our findings. The programs shown includethree real-world programs, Apache, PBZip2, and aget (a parallel file download utility) and five implementations of computationintensive algorithms from two widely used benchmark suites, Stanford’s SPLASH2 and Princeton’s PARSEC.Second, the overhead of enforcing a schedule on different inputsis low. Presumably, the exponentially many schedules allow theruntime system to react to various timing factors and select a mostefficient schedule. However, results from the SMT systems we built

This loose coupling of inputs and schedules is not unique to PBZip2; many other programs also exhibit this property. Table 1 shows a sample of our findings. The programs shown include three real-world programs, Apache, PBZip2, and aget (a parallel file download utility), and five implementations of computation-intensive algorithms from two widely used benchmark suites, Stanford's SPLASH2 and Princeton's PARSEC.

    Program        Purpose                  Constraints on inputs sharing schedules
    Apache         Web server               For a group of typical HTTP GET requests, same cache status
    PBZip2         Compression              Same number of threads
    aget           File download            Same number of threads, similar file sizes
    barnes         N-body simulation        Same number of threads, same values of two configuration variables
    fft            Fast Fourier transform   Same number of threads
    lu             Matrix decomposition     Same number of threads, similar sizes of matrices and blocks
    blackscholes   Option pricing           Same number of threads, number of options no less than number of threads
    swaptions      Swaption pricing         Same number of threads, number of swaptions no less than number of threads

Table 1: Constraints on inputs sharing the same equivalence class of schedules. For each program listed, one schedule out of the class suffices to process any input satisfying the constraints shown in the third column.

Second, the overhead of enforcing a schedule on different inputs is low. Presumably, the exponentially many schedules allow the runtime system to react to various timing factors and select a most efficient schedule. However, results from the SMT systems we built invalidated this presumption. With carefully designed schedule representations (§4.2), our systems incurred less than 15% overhead enforcing schedules for most evaluated programs (§6). We bel
