Grimoire: Synthesizing Structure While Fuzzing


Grimoire: Synthesizing Structure while Fuzzing
Tim Blazytko, Cornelius Aschermann, Moritz Schlögel, Ali Abbasi, Sergej Schumilo, Simon Wörner, and Thorsten Holz, Ruhr-Universität Bochum
https://www.usenix.org/conference/usenixsecurity19/presentation/blazytko
This paper is included in the Proceedings of the 28th USENIX Security Symposium, August 14-16, 2019, Santa Clara, CA, USA. ISBN 978-1-939133-06-9. Open access to the Proceedings of the 28th USENIX Security Symposium is sponsored by USENIX.

GRIMOIRE: Synthesizing Structure while Fuzzing

Tim Blazytko, Cornelius Aschermann, Moritz Schlögel, Ali Abbasi, Sergej Schumilo, Simon Wörner and Thorsten Holz
Ruhr-Universität Bochum, Germany

Abstract

In the past few years, fuzzing has received significant attention from the research community. However, most of this attention was directed towards programs without a dedicated parsing stage. In such cases, fuzzers which leverage the input structure of a program can achieve a significantly higher code coverage compared to traditional fuzzing approaches. This advancement in coverage is achieved by applying large-scale mutations in the application's input space. However, this improvement comes at the cost of requiring expert domain knowledge, as these fuzzers depend on structured input specifications (e.g., grammars). Grammar inference, a technique which can automatically generate such grammars for a given program, can be used to address this shortcoming. Such techniques usually infer a program's grammar in a pre-processing step and can miss important structures that are uncovered only later during normal fuzzing.

In this paper, we present the design and implementation of GRIMOIRE, a fully automated coverage-guided fuzzer which works without any form of human interaction or pre-configuration; yet, it is still able to efficiently test programs that expect highly structured inputs. We achieve this by performing large-scale mutations in the program input space using grammar-like combinations to synthesize new highly structured inputs without any pre-processing step. Our evaluation shows that GRIMOIRE outperforms other coverage-guided fuzzers when fuzzing programs with highly structured inputs. Furthermore, it improves upon existing grammar-based coverage-guided fuzzers.
Using GRIMOIRE, we identified 19 distinct memory corruption bugs in real-world programs and obtained 11 new CVEs.

1 Introduction

As the amount of software impacting the (digital) life of nearly every citizen grows, effective and efficient testing mechanisms for software become increasingly important. The publication of the fuzzing framework AFL [65] and its success at uncovering a huge number of bugs in highly relevant software has spawned a large body of research on effective feedback-based fuzzing. AFL and its derivatives have largely conquered automated, dynamic software testing and are used to uncover new security issues and bugs every day. However, while great progress has been achieved in the field of fuzzing, many hard cases still require manual user interaction to generate satisfying test coverage. To make fuzzing available to more programmers and thus scale it to more and more target programs, the amount of expert knowledge that is required to fuzz effectively should be reduced to a minimum. Therefore, it is an important goal for fuzzing research to develop fuzzing techniques that require less user interaction and, in particular, less domain knowledge to enable more automated software testing.

Structured Input Languages. One common challenge for current fuzzing techniques are programs which process highly structured input languages, such as interpreters, compilers, text-based network protocols or markup languages. Typically, such inputs are consumed by the program in two stages: parsing and semantic analysis. If parsing of the input fails, deeper parts of the target program (containing the actual application logic) fail to execute; hence, bugs hidden "deep" in the code cannot be reached. Even advanced feedback fuzzers such as AFL are typically unable to produce diverse sets of syntactically valid inputs. This leads to an imbalance: these programs are among the most relevant attack surface in practice, yet they currently cannot be fuzzed effectively.
Prominent examples are browsers, which parse a multitude of highly structured inputs, ranging from XML or CSS to JavaScript and SQL queries. Previous approaches to address this problem are typically based on manually provided grammars or seed corpora [2, 14, 45, 52]. On the downside, such methods require human experts to (often manually) specify the grammar or suitable seed corpora, which becomes next to impossible for applications with undocumented or proprietary input specifications. An orthogonal line of work tries to utilize advanced program analysis techniques to automatically infer grammars

[4, 5, 25]. Typically performed as a pre-processing step, such methods are used for generating a grammar that guides the fuzzing process. However, since this grammar is treated as immutable, no additional learning takes place during the actual fuzzing run.

Our Approach. In this paper, we present a novel, fully automated method to fuzz programs with a highly structured input language, without the need for any human expert or domain knowledge. Our approach is based on two key observations: First, we can use code coverage feedback to automatically infer structural properties of the input language. Second, the precise and "correct" grammars generated by previous approaches are actually unnecessary in practice: since fuzzers have the virtue of high test case throughput, they can deal with a significant amount of noise and imprecision. In fact, in some programs (such as Boolector) with a rather diverse set of input languages, the additional noise even benefits the fuzz testing. In a similar vein, there are often program paths which can only be accessed by inputs outside of the formal specification, e.g., due to incomplete or imprecise implementations or error handling code.

Instead of using a pre-processing step, our technique is directly integrated into the fuzzing process itself. We propose a set of generalizations and mutations that resemble the inner workings of a grammar-based fuzzer, without the need for an explicit grammar. Our generalization algorithm analyzes each newly found input and tries to identify substrings of the input which can be replaced or reused in other positions. Based on this information, the mutation operators recombine fragments from existing inputs. Overall, this results in synthesizing new, structured inputs without prior knowledge of the underlying specification.

We have implemented a prototype of the proposed approach in a tool called GRIMOIRE¹.
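The generalization and gap-filling mutation just described can be sketched in a few lines. This is a deliberately simplified model (fixed chunk granularity, a caller-supplied coverage oracle), not the paper's actual algorithm:

```python
import random

def generalize(inp: bytes, execute, chunk_size: int = 1):
    """For each chunk of a newly found input, test whether deleting it
    preserves the input's coverage; if so, mark it as a gap (None) that
    later mutations may fill with fragments learned from other inputs."""
    target_cov = execute(inp)
    chunks = [inp[i:i + chunk_size] for i in range(0, len(inp), chunk_size)]
    result = list(chunks)                      # bytes chunk, or None for a gap
    for i in range(len(chunks)):
        trial = b"".join(c for j, c in enumerate(result)
                         if c is not None and j != i)
        if execute(trial) == target_cov:       # coverage kept: chunk not needed
            result[i] = None
    return result

def splice_into_gaps(generalized, fragments, rng: random.Random) -> bytes:
    """Large-scale mutation: fill every gap with a learned fragment."""
    return b"".join(c if c is not None else rng.choice(fragments)
                    for c in generalized)
```

For instance, with a toy coverage oracle that only tracks which parser keywords an input reaches, generalizing `b"if(a)##"` keeps the structural tokens `if`, `(` and `)` while the variable name and the trailing padding become gaps.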
GRIMOIRE does not need any specification of the input language and operates in an automated manner without requiring human assistance; in particular, without the need for a format specification or seed corpus. Since our techniques make no assumptions about the behavior of the program or its environment, GRIMOIRE can easily be applied to closed-source targets as well.

To demonstrate the practical feasibility of our approach, we perform a series of experiments. In a first step, we select a diverse set of programs for a comparative evaluation: we evaluate GRIMOIRE against other fuzzers on four scripting language interpreters (mruby, PHP, Lua and JavaScriptCore), a compiler (TCC), an assembler (NASM), a database (SQLite), a parser (libxml) and an SMT solver (Boolector), demonstrating that our approach can be applied in many different scenarios without requiring any kind of expert knowledge, such as an input specification. The evaluation results show that our approach outperforms all existing coverage-guided fuzzers; in the case of Boolector, GRIMOIRE finds up to 87% more coverage than the baseline (REDQUEEN). Second, we evaluate GRIMOIRE against state-of-the-art grammar-based fuzzers. We observe that in situations where an input specification is available, it is advisable to use GRIMOIRE in addition to a grammar fuzzer to further increase the test coverage found by grammar fuzzers. Third, we evaluate GRIMOIRE against current state-of-the-art approaches that use automatically inferred grammars for fuzzing and found that we can significantly outperform such approaches. Overall, GRIMOIRE found 19 distinct memory corruption bugs that we manually verified. We responsibly disclosed all of them to the vendors and obtained 11 CVEs. During our evaluation, the next best fuzzer found only 5 of these bugs.

¹ A grimoire is a magical book that recombines magical elements to formulas. Furthermore, it has the same word stem as the Old French word for grammar, namely gramaire.
In fact, GRIMOIRE found more bugs than all five other fuzzers combined.

Contributions. In summary, we make the following contributions:

- We present the design, implementation and evaluation of GRIMOIRE, an approach to fully automatically fuzz highly structured formats with no human interaction.
- We show that even though GRIMOIRE is a binary-only fuzzer that needs no seeds or grammar as input, it still outperforms many fuzzers that make significantly stronger assumptions (e.g., access to seeds, grammar specifications and source code).
- We found and reported multiple bugs in various common projects such as PHP, gnuplot and NASM.

2 Challenges in Fuzzing Structured Languages

In this section, we briefly summarize essential information paramount to the understanding of our approach. To this end, we provide an overview of different fuzzing approaches, while focusing on their shortcomings and open challenges. In particular, we describe those details of AFL (e.g., code coverage) that are necessary to understand our approach. Additionally, we explain how fuzzers explore the state space of a program and how grammars aid the fuzzing process.

Generally speaking, fuzzing is a popular and efficient software testing technique used to uncover bugs in applications. Fuzzers typically operate by producing a large number of test cases, some of which may trigger bugs. By closely monitoring the runtime execution of these test cases, fuzzers are able to locate inputs causing faulty behavior. In an abstract view, one can consider fuzzing as randomly exploring the state space of the application. Typically, most totally random inputs are rejected early by the target application and

do not visit interesting parts of the state space. Thus, in our abstract view, the state space has interesting and uninteresting regions. Efficient fuzzers somehow have to ensure that they avoid uninteresting regions most of the time. Based on this observation, we can divide fuzzers into three broad categories, namely: (a) blind, (b) coverage-guided and (c) hybrid fuzzers, as explained next.

2.1 Blind Fuzzing

The simplest form of a fuzzer is a program which generates a stream of random inputs and feeds it to the target application. If the fuzzer generates inputs without considering the internal behavior of the target application, it is typically referred to as a blind fuzzer. Examples of blind fuzzers are RADAMSA [29], PEACH [14], Sulley [45] and ZZUF [32]. To obtain new inputs, fuzzers traditionally can build on two strategies: generation and mutation.

Fuzzers employing the former approach have to acquire a specification, typically a grammar or model, of an application's expected input format. Then, a fuzzer can use the format specification to generate novel inputs in a somewhat efficient way. Additionally, in some cases, a set of valid inputs (a so-called corpus) might be required to aid the generation process [46, 58].

On the other hand, fuzzers which employ a mutation-based strategy require only an initial corpus of inputs, typically referred to as seeds. Further test cases are generated by randomly applying various mutations on initial seeds or novel test cases found during fuzzing runs. Examples of common mutators include bit flipping, splicing (i.e., recombining two inputs) and repetitions [14, 29, 32]. We call these mutations small-scale mutations, as they typically change small parts of the program input.

Blind fuzzers suffer from one major drawback: they either require an extensive corpus or a well-designed specification of the input language to provide meaningful results.
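For illustration, the small-scale mutations named above (bit flips, splicing, repetitions) can be sketched in a few lines of Python. This is a generic sketch of the idea, not the implementation of any particular fuzzer:

```python
import random

def bit_flip(data: bytes, n_flips: int = 1) -> bytes:
    """Flip random bits in the input (a classic small-scale mutation)."""
    buf = bytearray(data)
    for _ in range(n_flips):
        pos = random.randrange(len(buf) * 8)
        buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)

def splice(a: bytes, b: bytes) -> bytes:
    """Recombine two inputs: keep a prefix of one and a suffix of the other."""
    cut_a = random.randrange(1, len(a))
    cut_b = random.randrange(1, len(b))
    return a[:cut_a] + b[cut_b:]

def repeat(data: bytes) -> bytes:
    """Duplicate a random chunk of the input in place (a repetition)."""
    start = random.randrange(len(data))
    end = random.randrange(start + 1, len(data) + 1)
    chunk = data[start:end]
    return data[:end] + chunk + data[end:]
```

Note that none of these operators knows anything about the input format, which is precisely why blind mutation rarely produces syntactically valid inputs for structured languages.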
If a program feature is not represented by either a seed or the input language specification, a blind fuzzer is unlikely to exercise it. In our abstract, state space-based view, this can be understood as blindly searching the state space near the seed inputs, while failing to explore interesting neighborhoods, as illustrated in Figure 1(a). To address this limitation, the concept of coverage-guided fuzzing was introduced.

Figure 1: Different fuzzers exploring distinct areas in state space. (a) Blind mutational fuzzers mostly explore the state space near the seed corpus; they often miss interesting states (shaded area) unless the seeds are good. (b) Coverage-guided fuzzers can learn new inputs (arrows) close to existing seeds; however, they are often unable to skip large gaps. (c) Programs with highly structured input formats typically have large gaps in the state space; current feedback and hybrid fuzzers have difficulties finding other interesting islands using local mutations. (d) By introducing an input specification, fuzzers can generate inputs in interesting areas and perform large-scale mutations that allow jumping between islands of interesting states.

2.2 Coverage-guided Fuzzing

Coverage-guided fuzzers employ lightweight program coverage measurements to trace how the execution path of the application changes based on the provided input (e.g., by tracking which basic blocks have been visited). These fuzzers use this information to decide which input should be stored or discarded to extend the corpus. Therefore, they are able to evolve inputs that differ significantly from the original seed corpus while at the same time exercising new program features. This strategy allows the fuzzer to gradually explore the state space of the program as it uncovers new paths. This behavior is illustrated in Figure 1(b). The most prominent example of a coverage-guided fuzzer is AFL [65].
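The AFL-style feedback loop can be condensed into a short skeleton. This is a deliberately minimal model (random scheduling, a caller-supplied `execute` that returns the set of covered edges); real fuzzers like AFL add input prioritization, deterministic mutation stages and a compact coverage bitmap:

```python
import random
from typing import Callable, List, Set

def fuzz_loop(seeds: List[bytes],
              execute: Callable[[bytes], Set[int]],
              mutate: Callable[[bytes], bytes],
              max_iters: int = 10000):
    """Minimal coverage-guided loop: pick, mutate, execute, keep on new coverage."""
    queue = list(seeds)                      # the input queue
    global_coverage: Set[int] = set()        # the global coverage map
    for seed in queue:
        global_coverage |= execute(seed)
    for _ in range(max_iters):
        parent = random.choice(queue)        # pick an input from the queue
        child = mutate(parent)               # apply a mutation
        edges = execute(child)               # run the target, observe coverage
        if edges - global_coverage:          # new edge seen: store the input
            queue.append(child)
            global_coverage |= edges
    return queue, global_coverage
```

Because inputs that trigger new edges are fed back into the queue, the corpus can drift far away from the original seeds, which is exactly the behavior illustrated in Figure 1(b).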
Following the overwhelming success of AFL, various more efficient coverage-guided fuzzers such as ANGORA [12], QSYM [64], T-FUZZ [47] or REDQUEEN [3] were proposed.

From a high-level point of view, all these AFL-style fuzzers can be broken down into three different components: (i) the input queue stores and schedules all inputs found so far, (ii) the mutation operations produce new variants of scheduled inputs and (iii) the global coverage map is used to determine whether a new variant produced novel coverage (and thus should be stored in the queue).

From a technical point of view, this maps to AFL as follows: Initially, AFL fills the input queue with the seed inputs. Then, it runs in a continuous fuzzing loop composed of the following steps: (1) pick an input from the input queue, then (2) apply multiple mutation operations on it. After each mutation, (3) execute the target application with the mutated input. If new coverage was triggered by the input, (4) save it back to the queue. To determine whether new coverage was triggered,

AFL compares the results of the execution with the values in the global coverage map.

This global coverage map is filled as follows: AFL shares a memory area of the same size as the global coverage map with the fuzzing target. During execution, each transition between two basic blocks is assigned a position inside this shared memory. Every time the transition is triggered, the corresponding entry (one byte) in the shared memory map is incremented. To reduce the overhead incurred by large program traces, the shared coverage map has a fixed size (typically 2^16 bytes). While this might introduce collisions, empirical evaluation has shown that the performance gains make up for the loss in precision [66].

After the target program terminates, AFL compares the values in the shared map to all previous runs stored in the global coverage map. To check if a new edge was executed, AFL applies so-called bucketing. During bucketing, each entry in the shared map is rounded to a power of 2 (i.e., at most a single bit is set in each entry). Then, a simple binary operation is used to check if any new bits are present in the shared map (but not the global map). If any new bit is present, the input is stored in the queue. Furthermore, all new bits are also set to 1 in the global coverage map. We distinguish between new bits and new bytes. If a new bit is set to 1 in a byte that was previously zero, we refer to it as a new byte. Intuitively, a new byte corresponds to new coverage, while a new bit only illustrates that a known edge was triggered more often (e.g., more loop iterations were observed).

Example 1. Consider some execution a while after starting the fuzzer run for a program represented by its Control-Flow Graph (CFG) in Figure 2 (a). Assume that the fictive execution of an input causes a loop between B and C to be executed 10 times. Hence, the shared map is updated as shown in (b), reflecting the fact that the edges A→B and C→D were executed only once, while the edges B→C and C→B were encountered 10 (0b1010) times. In (c), we illustrate the final bucketing step. Note how 0b1010 is put into the bucket 0b1000, while 0b0001 is moved into the one identified by 0b0001. Finally, AFL checks whether the values encountered in this run triggered unseen edges in (d). To this end, we compare the shared map to the global coverage map and update it accordingly (see (e)), setting bits set in the shared but not the global coverage map. As visualized in (f), a new bit was set for two entries, while a new byte was found for one. This means that the edge A→B... rather, the edge C→D was previously unseen; thus, the input used for this example triggered new coverage.

While coverage-guided fuzzers significantly improve upon blind fuzzers, they can only learn from new coverage if they are able to guess an input that triggers a new path in the program. In certain cases, such as multi-byte magic values, guessing an input necessary to trigger a different path is highly unlikely. These kinds of situations occur if there is a significant gap between interesting areas in the state space and existing mutations are unlikely to cross the uninteresting gap. The program displayed in Figure 1(b) illustrates a case with only one large gap in the program space; thus, this program is well-suited for coverage-guided fuzzing. However, current mutation-based coverage-guided fuzzers struggle to explore the whole state space because the island in the lower right is never reached. To overcome this limitation, hybrid fuzzers were introduced; these combine coverage-guided fuzzing with more in-depth program analysis techniques.

2.3 Hybrid Fuzzing

Hybrid fuzzers typically combine coverage-guided fuzzing with program analysis techniques such as symbolic execution, concolic execution or taint tracking.
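Stepping back to the coverage map mechanics above, the bucketing and the new-bit/new-byte distinction walked through in Example 1 can be modelled in a few lines. This is a simplification: AFL's real buckets are hand-tuned count ranges rather than exact powers of two, and the check is a fast byte-wise operation:

```python
def bucket(count: int) -> int:
    """Round a hit count down to a power of two, so at most one bit is set."""
    if count == 0:
        return 0
    b = 1
    while b * 2 <= count:
        b *= 2
    return min(b, 128)          # map entries are one byte wide

def has_new_coverage(shared, global_map):
    """Compare one run's shared map against the global map.
    Returns (new_bit, new_byte) and updates global_map in place."""
    new_bit = new_byte = False
    for i, count in enumerate(shared):
        bits = bucket(count)
        if bits & ~global_map[i]:           # a bit set here but not globally
            new_bit = True
            if global_map[i] == 0:          # entry was untouched: new edge
                new_byte = True
            global_map[i] |= bits
    return new_bit, new_byte
```

Replaying Example 1 with this model: hit counts [1, 10, 10, 1] for the edges A→B, B→C, C→B and C→D bucket to [0b0001, 0b1000, 0b1000, 0b0001]; against a global map that has never seen C→D, the last entry yields both a new bit and a new byte, so the input is kept.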
As noted above, fast and cheap fuzzing techniques can uncover the bulk of the easy-to-reach code. However, they struggle to trigger program paths that are highly unlikely to be hit by chance. On the other hand, symbolic or concolic execution does not move through the state space randomly. Instead, these techniques use an SMT solver to find inputs that trigger the desired behavior. Therefore, they can cover hard-to-reach program locations. Still, as a consequence of the precise search technique, they struggle to explore large code regions due to significant overhead.

By combining fuzzing and reasoning-based techniques, one can benefit from the strengths of each individual technique, while avoiding the drawbacks. Purely symbolic approaches have proven difficult to scale. Therefore, most current tools such as SAGE [21], DRILLER [54] or QSYM [64] use concolic execution instead. This mostly avoids the state explosion problem

