Combining Static And Dynamic Analysis For Top-down Communication Trace .

1y ago
9 Views
2 Downloads
1.96 MB
26 Pages
Last View : 13d ago
Last Download : 3m ago
Upload by : Maleah Dent
Transcription

6th Annual MVAPICH User Group (MUG) MeetingCombining Static and Dynamic Analysisfor Top-down Communication Trace CompressionJidong ZhaiTsinghua University

Communication Traces CommunicationTraces highly useful:lCommunicationperformance analysis Bottleneck/Bug detection Platform selection Communication optimizationlDesign future HPC systemsID MPI TYPE1. MPI Send2. MPI RecvSIZE SRC DEST TIME20B 10 16 10us10B 12 10 12us3. MPI Send4. MPI Recv20B10B1011121010us11us5. MPI Send6. MPI Recv7. MPI Send20B10B20B10141011101412us15us11us8. MPI Recv 10B161013usSegment of MPI communication traceMUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression2

Analyzing Communication PatternsCommunication patterns extracted from MPI communication traces.[1] Ruini Xue, Xuezheng Liu, M ing Wu, Zhenyu Guo, Wenguang Chen, Weimin Zheng, Zheng Zhang,Geoffrey M . Voelker: M PIW iz: subgroup reproducible replay of mpi applications. PPOPP 2009: 251-260.MUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression3

Predicting Future HPC SystemsNetworkParameters etc.CommunicationTraces Trace-drivenSimulatorLeading-edge applicationComputationTracesPredicted PerformanceLeading-edge system[1] A. Snavely, L. Carrington, N. Wolter, J. Labarta, R. Badia, and A. Purkayastha. A framework forapplication performance modeling and prediction. In SC ’02, pages 1–17, 2002.MUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression4

Tracing Communication: Challenges Problem: Traces à HugelCaused by growing Problem size job scale (# of processes)lExample: ASCI SMG2000, 64*64*32 Problem size,22,538 Processes à 5 TB Traces[1] Huge traces bringllPressure on storage systemInterference with user application execution[1] B. J. N. Wylie, M . Geimer, and F. Wolf, “Performance measurement and analysis of large-scale parallelapplications on leadership computing systems,” Sci. Program., vol. 16, no. 2-3, pp. 167–181, Apr. 2008.MUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression5

Solution: Trace Compression Significant redundancyllWithin a process: iterative timesteps (loops)Across processes: similar behavior (SPMD) Existing tools: Bottom-up approach ScalaTrace[1], ScalaTrace-2[2] Examine traces at runtime to discover repeatedpatterns Two-phase: intra-process inter-process[1] M . Noeth, F. M ueller, M . Schulz, and B. de Supinski, “Scalable compression and replay ofcommunication traces in massively parallel environments,” in IPDPS’07, 2007.[2] X. Wu and F. M ueller, “Elastic and scalable tracing and accurate replay of non-deterministic events,”in ICS’13, 2013.MUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression6

Bottom-up 1: Intra-Process Compression Identify repeated pattern within one processRepeated patternabc100*bc200*P0bc100*P1 100*200*P2aP 3 N*(b, c)200*P N-1PNCommunication traces after intra-process compressionP0P1P2P3P N-1PNCommunication traces for each process.MUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression7

Bottom-up 2: Inter-Process Compression Compare traces in horizontal dimension to identify repeatedpatterns100*200*N/2*PP0 0100*N/2*PP1 1 200*P2P3100*200*P N-1PNCommunication traces after inter-processintra-process compressioncompressionMUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression8

Limitations of Bottom-Up Compression Intra-Process:llExpensive with complex communication patternsNested loop structures[1]1 for (i 0; i 10; i )2 for (j 0; j i; j ) {3if (j%2 0)4M PI Isend( );5else6M PI Irecv( );7 }8 M PI Waitall( );9 M PI Reduce( );10 }M PI Program snippet12345678910111213MPI Isend( )MPI Waitall( )MPI Reduce( )MPI Isend( )MPI Irecv( )MPI Waitall( )MPI Reduce( )MPI Isend( )MPI Irecv( )MPI Isend( )MPI Waitall( )MPI Reduce( ) Communication trace segment[1] Q. Xu, J. Subhlok, and N. Hammen, “Efficient discovery of loop nests in execution traces,” in M ASCOTS’10,2010, pp. 193–202.MUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression9

Limitations of Bottom-Up Compression Inter-Process:llEven more expensive, not scalableComplexity for two processes à O(n2 )4*MPI Send3*MPI Recv5*MPI Bcast3*MPI Recv comm. traces of process P03*MPI Isend4*MPI Irecv5*MPI Wait3*MPI Barrier comm. traces of process P14*MPI Send3*MPI Recv5*MPI Bcast3*MPI Recv 3*MPI Isend4*MPI Irecv5*MPI Wait3*MPI Barrier Compressed traces of P0 and P1MUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression10

Our Approach: CYPRESS Top-down techniquel Combines static and dynamic analysisl Static analysis phase Extract program communication structure tree– Loops, branches etc.lDynamic analysis phase Use this tree as a template “Fill in” runtime information– Loop countMUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression11

Overview of CYPRESSProgram CSTAn MPI ProgramIntra-ProceduralAnalysisAn MPI BinaryInter-ProceduralAnalysisA Compiler Plug-inStatic Analysis ModuleProgram communication structure tree (CST)A HPC systemDynamic Analysis ModuleCompressed communication tracesMUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression12

CYPRESS: Advantages Much fasterllTop-down “Big picture” Redundancy expected Cut the guessingMore efficient Knows “where to stop” ScalablelllSignificant improvement to Interprocess compressionA better worst-case complexityMore affordable for large-scaleanalysis0:Root1:Loop2:Br3:M PI Send4:Br8:Br6:Loop5:M PI Recv9:M PI Reduce7:M PI Bcast?MUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression13

CYPRESS: Static AnalysisProgram CSTAn MPI ProgramIntra-ProceduralAnalysisAn MPI ProgramInter-ProceduralAnalysisA Compiler Plug-inStatic Analysis ModuleProgram communication structure tree (CST)A HPC systemDynamic Analysis ModuleCompressed communication tracesMUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression14

Communication Structure Tree (CST)lOrdered tree Program communication structure Non-leaf nodes:– Loop– Branch Leaf nodes:– MPI communication calls» MPI Send, MPI RecvMUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression15

Build Communication Structure Tree1 int main(){2 for (i 0; i k; i ){3if (myid % 2 0)4M PI Send(buf, size, M PI INT,5myid 1, 0, M PI COM M W ORLD);6else7M PI Recv(buf, size, M PI INT,8myid-1, 0, M PI COM M WORLD, &status);9bar();10 }11 foo();12 if (myid % 2 0)13M PI Reduce(sbuf, rbuf, 1, M PI INT,14M PI SUM , root, comm);15 }16 int bar(){17 for(k 0; k n; k )18M PI Bcast(buf, size, M PI INT, root, 1819M PI COM M WORLD);20 }21 int foo(){22 for(j 0; j m; j )23sum j;24 }0:Root1:Loop2:Br4:Br3:M PI Send7:foo()6:bar()8:Br9:M PI Reduce5:M PI RecvLocal CST0:Root1:Loop2:Br3:M PI Send4:Br8:Br6:Loop5:M PI Recv9:M PI Reduce7:M PI BcastGlobal CSTMUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression16

CYPRESS: Dynamic AnalysisProgram CSTAn MPI ProgramIntra-ProceduralAnalysisAn MPI ProgramInter-ProceduralAnalysisA Compiler Plug-inStatic Analysis ModuleProgram communication structure tree (CST)A HPC systemDynamic Analysis ModuleCompressed communication tracesMUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression17

Intra-Process Compression “Fill in” dynamic information into CSTllRecord loop count and branch taken statusPer-node linked list for MPI event details0:Root K ß 0,K,1 ß K ß2:Br3:M PI Sendà 1:Loop4:Br5:M PI Recvà 8:Br6:Loop9:M PI Reduce7:M PI Bcastpara.à para.Fill in dynamic information into the CST.MUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression18

Inter-Process Compression Compare corresponding nodes of the treescompare nodes0:Root K ß 1:Loop 0,K,1 ß K ß2:Br3:MPI Send4:Br5:MPI Recv8:Br6:Loop K ß9:MPI Reduce7:MPI BcastA compressed trace tree for P0 0:Root null ß null ß2:Br3:MPI Send1:Loop4:Br5:MPI Recv8:Br6:Loop9:MPI Reduce7:MPI Bcast A compressed trace tree for P1MUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression19

BenchmarksllExperiments8 NPB programs: BT, CG, DT, EP, FT, LU, MG, SPReal Application: LESlie3d PlatformlExplorer-100 HPC cluster 2*Intel Xeon X5670/48GB Memory Infiniband network, MVAPICH-2 Alternative systemslllGzip (offline compression)ScalaTrace (online, bottom-up)ScalaTrace-2 (online, bottom-up) MetricsllCompression effectiveness (trace size reduction)Compression overheadMUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression20

Compressed Trace Size100 0000100 00BT100 00FT100 0100100101164121100 0000256CG100 0064400128256512LU100 0000100 001001001164128100 00256512128256512256512256400MG100 0000DT100 064100 0010010010114864100 00128256128100 0000EP100 064SP100 0010010010116412825651264121Compressed communication trace size in log scale (KB).MUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression21

Intra-Process Compression Overhead80%BT8%Tim e o ve rh e ad (% )Tim e o ve rh e ad (% )10%6%4%2%0%64121256LU60%40%20%0%40064128N u m o f P ro ce sse s100 0%CG15%10%5%0%64128256MG100 %10%1%51264N u m o f P ro ce sse sFT8%40%6%4%2%0%64128256N u m o f P ro ce sse sScalaTrace:ScalaTrace-2:CYPRESS:128256512N u m o f P ro ce sse sTim e o ve rh e ad (% )Tim e o ve rh e ad (% )10%512N u m o f P ro ce sse sTim e o ve rh e ad (% )Tim e o ve rh e ad (% )20%256512SP30%20%10%0%6451.05%9.1%1.58% 5X over ScalaTrace-2121256400N u m o f P ro ce sse sMUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression22

Inter-Process Compression OverheadInter-process overhead (Sec.)Sc alaT raceSc alaT race 2C ypre ss100 01001010. 2MG64121256400SPInter-process trace compression overhead in log-scale (second)Average inter-process compression overhead:ScalaTrace: 170.69%ScalaTrace-2: 30.3%CYPRESS:3.29% 9X over ScalaTrace-2MUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression23

Overall Compression Overhead Overhead breakdownDynamic PhaseIntra-Process TraceScalaTrace-2CYPRESSStatic ionOverhead (Sec.)0.250.110.050.090.050.160.090.11Extra compilation time incurred by Cypress for each program (second)MUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression24

Conclusion CYPRESS: Top-Down trace compressionllLeverage static information to facilitate more effectiveand efficient trace compressionReduce compression overhead Intra-process by 5X, inter-process by 9XlStrong lossless compression performance Take-away Message: static information usefulProvides reliable insightRelatively cheap, with scale-independent cost Analyze memory access patterns:lllSpindle: Informed Memory Access Monitoring. USENIX ATC 2018.MUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression25

Thanks!Contributors: Jianfei Hu, Liwei Chen, Jidong Zhai,XiongchaoTang, Xiaosong Ma, Wenguang ChenMUG'18-Combining Static and 26 Analysis for Top-down Communication Trace Compression26

l Combines static and dynamic analysis l Static analysis phase Extract program communication structure tree - Loops, branches etc. l Dynamic analysis phase Use this tree as a template "Fill in" runtime information - Loop count MUG'18-Combining Static and 26 Analysis for Top -down Communication Trace Compression 11

Related Documents:

Combining Static and Dynamic Analysis for Bug Finding ( ) Abstract to analyze all paths (-) Even impossible ones ChristophCsallner(4) Model with superset of possible paths Example: compiler Warn about bug'' on impossible path False bug warning. Combining Static and Dynamic

simulation method. A dynamic fault tree usually consists of static gates and dynamic gates. The unique function of dynamic gates is depicting interactions in a complex system, which cannot be realized by static gates. In order to understand fault tree better, we apply static fault tree and dynamic fault tree in risk analysis of di erent areas .

variation of the load is small, hence static analysis is sufficient. However, in case of off-shore structures (oil rigs), high rise buildings subjected to lateral loads (wind, earth quake) dynamic effects of the load must be explored for knowing the exact safety and reliability of the structure. Comparison between static and dynamic analysis:

Abstract: Static analysis relies on features extracted without executing code, while dynamic analysis extracts features based on execution (or emulation). In general, static analysis is more efficient, while dynamic analysis can be more informative, particularly in cases where the code is obfuscated. Static analysis of an Android application

Static analysis checks 20 Stages of static analysis Control flow analysis. Checks for loops with multiple exit or entry points, finds unreachable code, etc. Data use analysis. Detects uninitialized variables, variables written twice without an intervening assignment, variables which are declared but never used, etc. Interface analysis.

Dec 06, 2018 · Dynamic Strategy, Dynamic Structure A Systematic Approach to Business Architecture “Dynamic Strategy, . Michael Porter dynamic capabilities vs. static capabilities David Teece “Dynamic Strategy, Dynamic Structure .

5 Vulnerability Analysis Techniques Static analysis Analysis performed before a program starts execution Works mainly on source code Binary static analysis techniques are rather limited Not very effective in practice, so we won't discuss in depth Dynamic analysis Analysis performed by executing the program Key challenge: How to generate input for execution?

dimensional structure of a protein, RNA species, or DNA regulatory element (e.g. a promoter) can provide clues to the way in which they function but proof that the correct mechanism has been elucid-ated requires the analysis of mutants that have amino acid or nucleotide changes at key residues (see Box 8.2). Classically, mutants are generated by treating the test organism with chemical or .