USING MACHINE LEARNING FOR VLSI TESTABILITY AND RELIABILITY


Mark Ren, Miloni Mehta

TAKE-HOME MESSAGES
- Machine learning can improve approximate solutions for hard problems.
- Machine learning can accurately predict and replace brute-force methods for computationally expensive problems.

VLSI TESTABILITY AND RELIABILITY
[Figure: testing across chip, wafer, and years; pass/fail]
NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

PART 1: Testability Prediction and Test Point Insertion with Graph Convolutional Network (GCN)
Mark Ren, Brucek Khailany, Harbinder Sikka, Lijuan Luo, Karthikeyan Natarajan
Yuzhe Ma, Bei Yu
"High Performance Graph Convolutional Networks with Applications in Testability Analysis", to appear in Proceedings of the Design Automation Conference, 2019

PART 2: Full Chip FinFET Self-Heat Prediction using Machine Learning
Miloni Mehta, Chi Keung Lee, Chintan Shah, Kirk Twardowski

PART 1 OUTLINE
- Introduction
- Learning model for testability analysis and enhancement
- Practical issues
  - Scalability
  - Data imbalance

HOW DO WE TEST A CHIP?
[Figure: input patterns are applied to a chip with a stuck-at-0 fault; the output patterns are compared against golden patterns]

TESTABILITY PROBLEM
[Figure: gate B's stuck-at-0 faults are unobservable because its inputs are almost always 0, making B difficult-to-test (DT); with an inserted test point (TP) register, B's faults become observable]

MOTIVATION
Test point insertion problem: pick the smallest number of test points to achieve the largest testability enhancement
- Number of test points → chip area cost
- Number of test patterns → test time
A hard problem; only approximate solutions exist
- Commercial solution: Synopsys TetraMAX
Can we improve it with machine learning?
- Predict testability
- Select test points

ML-BASED TESTABILITY PREDICTION
Given a circuit, predict which gate outputs are difficult-to-test (DT)
- Gate features: [logic level, SCOAP C0, SCOAP C1, SCOAP OB]
- Gate label: DT (0 or 1), generated by TetraMAX
Input features → output classification (via ML model):
  N1: 0,0,1,1 → 0
  N2: 1,0,1,0 → 1
  N3: 2,0,1,1 → 0

BASIC MACHINE LEARNING MODELING
- Flatten each gate's neighborhood into a fixed feature vector over its fanin and fanout cones: F(a) → [Fa, F1, F2, ..., F10]
- Feed the vector to classic ML models (LR, RF, SVM, MLP) to predict "a is DT" / "a is not DT"
- Does not fully leverage the inductive bias of the circuit structure

GRAPH CONVOLUTIONAL NETWORK (GCN)
Each node aggregates its neighbors' features (mean or sum), then encodes the result (R^4 → R^32 linear layer with ReLU)
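A single GCN propagation step pairs neighbor aggregation with an encoding layer. A minimal NumPy sketch of this idea, with a toy graph and random weights (the shapes mirror the deck's R^4 → R^32 layer; everything here is illustrative, not the trained model):

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One GCN step: mean-aggregate neighbor features (self-loops
    included in adj), then encode with a linear layer plus ReLU."""
    deg = adj.sum(axis=1, keepdims=True)             # node degrees
    aggregated = (adj @ feats) / np.maximum(deg, 1)  # mean aggregation
    return np.maximum(aggregated @ weight, 0.0)      # encode + ReLU

# Toy 3-node graph with self-loops; 4 input features -> 32 hidden.
adj = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]], dtype=float)
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4))
weight = rng.normal(size=(4, 32))
hidden = gcn_layer(adj, feats, weight)
print(hidden.shape)  # (3, 32)
```

Stacking K such layers lets each node's embedding see its K-hop neighborhood, which is what the later "number of layers" experiments vary.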

GCN-BASED TESTABILITY PREDICTION
- Three graph convolution layers, each a weighted sum followed by ReLU: R^4 → R^32 → R^64 → R^128
- Followed by fully connected layers (64, 64, 128, 2) producing the DT / non-DT classification

ACCURACY IMPACT OF GCN LAYERS (K)
[Figure: training accuracy (%) and testing accuracy (%) over 271 epochs for K = 1, 2, 3]

EMBEDDING VISUALIZATION
Embeddings look more discriminative as the number of stages increases.
[Figure: embedding visualizations for K = 1, 2, 3]

MODEL COMPARISON ON BALANCED DATASET
- Compared the 3-layer GCN with basic ML modeling: LR, RF, MLP, SVM
- Baselines use N = 500 nodes in the fanin cone and 500 nodes in the fanout cone, 1000 nodes in total
- Fewer than 1000 nodes influence each node, so the baselines are comparable
- GCN has the best accuracy (93%)
[Figure: precision, recall, F1 score, and accuracy for each model]

TEST POINT INSERTION WITH GCN MODEL
An iterative process, enabled by the GCN model, selects TPs on the circuit graph:
- Run the GCN model to obtain TP candidates
- Estimate each candidate's impact: the number of DTs reduced in the fanin cone of the TP
- Select a new TP, modify the graph to insert it, and repeat until done, yielding the final TPs
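The iterative selection loop can be sketched as a greedy procedure. In this toy sketch, a precomputed map from each candidate TP to the DT nodes it fixes stands in for the GCN-predicted impact in the fanin cone; all names and data are illustrative:

```python
def greedy_test_point_insertion(dt_nodes, tp_effects, budget):
    """Iteratively pick the test point that removes the most remaining
    difficult-to-test (DT) nodes. tp_effects maps TP -> set of DTs it
    fixes (a stand-in for the GCN-estimated impact)."""
    remaining = set(dt_nodes)
    selected = []
    for _ in range(budget):
        best_tp = max(tp_effects,
                      key=lambda tp: len(tp_effects[tp] & remaining),
                      default=None)
        if best_tp is None or not (tp_effects[best_tp] & remaining):
            break  # no candidate reduces the DT count further
        remaining -= tp_effects[best_tp]   # "graph modification" step
        selected.append(best_tp)
    return selected, remaining

tps, left = greedy_test_point_insertion(
    {"n1", "n2", "n3", "n4"},
    {"tpA": {"n1", "n2"}, "tpB": {"n2"}, "tpC": {"n3"}},
    budget=3)
print(tps, left)
```

In the real flow, each iteration would re-run the GCN on the modified graph instead of consulting a static map, since inserting a TP changes the testability of nearby gates.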

TEST POINT INSERTION RESULTS COMPARISON
Machine learning can improve approximate solutions for hard problems:
11% fewer test points with 6% fewer test patterns at the same coverage vs TetraMAX.
[Figure: test point reduction and test pattern reduction (%) across four designs]

MODEL SCALABILITY
Choices of model implementation:
- Batch processing: recursion
- Full graph: sparse matrix multiplication
  E_k = ReLU((A · E_{k-1}) · W_k)
Tradeoff: memory vs speed
Throughput: 1M nodes/second on a Volta GPU
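The full-graph formulation E_k = ReLU((A E_{k-1}) W_k) can be sketched with a sparse adjacency matrix; keeping A sparse makes one layer cost proportional to the edge count rather than the square of the node count, which is what makes million-node graphs tractable. A small sketch (toy graph, random weights):

```python
import numpy as np
from scipy.sparse import csr_matrix

def gcn_propagate(adj_sparse, emb, weight):
    """Full-graph propagation E_k = ReLU((A @ E_{k-1}) @ W_k),
    with A stored as a sparse CSR matrix."""
    return np.maximum((adj_sparse @ emb) @ weight, 0.0)

# Toy 4-node graph with 5 directed edges; embeddings R^4 -> R^8.
rows, cols = [0, 1, 1, 2, 3], [1, 0, 2, 3, 2]
adj = csr_matrix((np.ones(5), (rows, cols)), shape=(4, 4))
rng = np.random.default_rng(1)
emb = rng.normal(size=(4, 4))
weight = rng.normal(size=(4, 8))
out = gcn_propagate(adj, emb, weight)
print(out.shape)
```

The batch-processing (recursive) alternative trades this one big multiply for per-node neighborhood gathers, using less memory at lower throughput.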

MULTI-GPU TRAINING
- The training dataset contains multi-million-gate designs that cannot fit on one GPU
- Data parallelism: each GPU computes one design/graph, with the model replicated across GPUs
- Leverages the PyTorch DataParallel module
- Trained with 4 Tesla V100 GPUs on a DGX-1

IMBALANCE ISSUE
It is very common to have far more non-DTs (negative class) than DTs (positive class); the imbalance ratio can exceed 100x.

Classifier 1: OK precision, low recall (Recall: 10.5%, Precision: 59.8%)
           Predict: 0   Predict: 1
Fact: 0    133576       290
Fact: 1    3681         432

Classifier 2: high recall, low precision (Recall: 97.3%, Precision: 11.0%)
           Predict: 0   Predict: 1
Fact: 0    100919       32927
Fact: 1    114          4069
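The precision and recall figures follow directly from the confusion-matrix cells (the TN/FP splits below are as reconstructed from the transcription):

```python
def precision_recall(tn, fp, fn, tp):
    """Precision = TP/(TP+FP); recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Confusion matrices for the two example classifiers.
p1, r1 = precision_recall(tn=133576, fp=290, fn=3681, tp=432)
p2, r2 = precision_recall(tn=100919, fp=32927, fn=114, tp=4069)
print(f"classifier 1: precision {p1:.1%}, recall {r1:.1%}")
print(f"classifier 2: precision {p2:.1%}, recall {r2:.1%}")
```

Note that with a 100x imbalance, plain accuracy is useless (predicting all-negative already scores above 99%), which is why the slides track precision and recall instead.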

MULTI-STAGE CLASSIFICATION
- The networks at the initial stages only filter out negative data points with high confidence (high recall, low precision)
- Positive predictions are sent to the network at the next stage
[Figure: cascade of Network 1 → Network 2 → Network 3]
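The cascade logic can be sketched in a few lines: a sample is predicted positive only if every stage passes it on, while each stage discards confident negatives. Models and thresholds here are illustrative stand-ins:

```python
def cascade_predict(x, stages):
    """Multi-stage classification: stages is a list of (model, threshold)
    pairs, each tuned for high recall. A sample is positive only if
    every stage passes it on."""
    for model, threshold in stages:
        if model(x) < threshold:
            return 0   # filtered out as a confident negative
    return 1           # survived all stages -> predicted DT

# Toy stages: identity score functions with increasingly strict thresholds.
stages = [(lambda x: x, 0.2), (lambda x: x, 0.5), (lambda x: x, 0.8)]
print([cascade_predict(x, stages) for x in (0.1, 0.6, 0.9)])  # [0, 0, 1]
```

Because early stages see the full imbalanced population and later stages see a progressively more balanced subset, precision improves stage by stage while recall degrades only slightly, which matches the per-stage numbers on the next slide.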

MULTI-STAGE CLASSIFICATION RESULT
Balanced recall and precision.

Stage 1 (Recall: 97.3%, Precision: 11.0%)
           Pred: 0   Pred: 1
Fact: 0    100919    32927
Fact: 1    114       4069

Stage 2 (Recall: 94.6%, Precision: 39.1%)
           Pred: 0   Pred: 1
Fact: 0    26935     5992
Fact: 1    221       3848

Stage 3 (Recall: 92.0%, Precision: 81.8%)
           Pred: 0   Pred: 1
Fact: 0    5207      785
Fact: 1    309       3539

Overall (Recall: 86.0%, Precision: 81.8%)
           Pred: 0   Pred: 1
Fact: 0    133061    785
Fact: 1    574       3539

PART 1 - SUMMARY
- Machine learning can improve VLSI design testability beyond the existing solution, thanks to the predictive power of the ML model
- Graph-based models are suitable for VLSI problems
- Practical issues such as scalability and data imbalance need to be dealt with

PART 2: Full Chip FinFET Self-Heat Prediction using Machine Learning
Miloni Mehta, Chi Keung Lee, Chintan Shah, Kirk Twardowski

VLSI TESTABILITY AND RELIABILITY
[Figure: testing across chip, wafer, and years; pass/fail]

SEMICONDUCTOR RELIABILITY
Source: eliability/

RELIABILITY: DEVICE SELF-HEAT (SH)
- Active power in transistors is dissipated as heat to the surroundings
- FinFETs are more sensitive to SH than planar devices
Why do we care?
- Exacerbates electromigration (EM) on interconnects; lowers the EM rating factor (Imax)
- Shifts the transistor threshold voltage (Vt)
- Time-dependent dielectric breakdown (TDDB)
[Figure: 16ff EM limit reduction vs temperature, 90-170 °C]

SH METHODOLOGIES SO FAR
- No sign-off tool can handle full-chip SH analysis
- Limitation of Spice simulation: impractical to run on billions of transistors
- 2D look-up table (LUT) approach:
  - Based on frequency and capacitive loading for different clock drivers
  - Teams review high-power-density cells
  - Reduces run time by more than 90% over full Spice simulations
  - Pessimistic with respect to Spice
[Figure: 2D LUT vs Spice temperature (°C) comparison]

SELF-HEAT TRENDS
- Larger cell size → lower SH
- Higher resistance → lower SH (nonlinear)
- Higher capacitive loading → higher SH
- Higher frequency → higher SH
[Figure: normalised SH and predicted SH vs R/C (1e12) and vs frequency]

MOTIVATION TO USE ML
- Identify problematic cells in the design without exhaustive Spice simulations
- Complex relationship between design and SH
- Design database available for several projects; reusability across projects
Focus:
- Clock inverters and buffers
- Quick, easy, light-weight
- Rank cells above a certain SH threshold for thorough analysis

MACHINE LEARNING MODEL
[Flow: get attributes from PrimeTime (X) and simulate in HSPICE (Y_training) → generate ML model → validation: select test data, simulate in HSPICE (Y_test), predict on the test set (Y_pred), compare predicted vs Spice → if acceptable, ready for deployment; otherwise iterate]

DATASET SELECTION
- Cover a wide range of frequencies
- Cover different types and sizes of standard cells
- Prevent duplication in training data due to replicated partitions/chiplets
- Include outliers in the design chosen
- Labels obtained through Spice simulations (supported by foundry Spice models)
- The TSMC 16nm FinFET training model used 4300 training samples with 9 features

DNN REGRESSOR MODEL
- Input layer: 9 features per cell → 3 hidden layers → output layer predicting self-heat (Y_n)
- Features: output capacitance, frequency, cell size, net resistance, input slew, output slew, number of output loads, input capacitance of loads, average transition on loads
- Cost: Σ (Y_pred − Y)²

MINIMIZING COST FUNCTION
- Gradient descent with the Adam optimizer, which has an adaptive learning rate
- Exponential Linear Unit (ELU) used as the activation function
- 300,000 training steps
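The regressor setup (9 input features, ELU activations, squared-error cost minimized by gradient descent) can be sketched in miniature with NumPy. This is an illustrative toy, not the production model: one hidden layer instead of three, synthetic data instead of Spice labels, and plain gradient descent standing in for Adam:

```python
import numpy as np

def elu(z, alpha=1.0):
    """Exponential Linear Unit, the activation used in the deck."""
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

def elu_grad(z, alpha=1.0):
    return np.where(z > 0, 1.0, alpha * np.exp(z))

rng = np.random.default_rng(0)
# Synthetic stand-in data: 9 features per cell, a scalar SH-like target.
X = rng.normal(size=(256, 9))
y = (0.5 * X[:, 0] + 0.1 * X[:, 1] ** 2).reshape(-1, 1)

# One hidden layer for brevity (the deck's network has three).
W1 = rng.normal(scale=0.3, size=(9, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.3, size=(16, 1)); b2 = np.zeros(1)

def mse():
    return float(((elu(X @ W1 + b1) @ W2 + b2 - y) ** 2).mean())

mse0 = mse()  # cost before training
lr = 0.01     # plain gradient descent stands in for Adam here
for step in range(2000):
    z1 = X @ W1 + b1
    h = elu(z1)
    err = h @ W2 + b2 - y             # d(cost)/d(pred), up to a factor of 2
    gW2 = h.T @ err / len(X); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * elu_grad(z1)  # backprop through the ELU layer
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print(f"MSE before: {mse0:.3f}, after: {mse():.3f}")
```

Mean rather than summed squared error is used here; the two differ only by a constant scale absorbed into the learning rate.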

RESULTS
- Xavier CPU: 2000 validation samples
- Good correlation between the DNN prediction and Spice SH
- Average error vs Spice: 6.5%
- MSE: 0.05
[Figure: predicted SH vs Spice SH scatter]

QUANTITATIVE BENEFITS
- Trained model deployed for inference on millions of clock cells
- Training time: 37 minutes (on a DGX-1)
- Inference time: 1 minute → 99% of cells filtered out from Spice simulations
- Top 1000 prediction results simulated and verified; small clock-tree cells had the highest SH
- Outlier detection improved inference by 2.65% on Turing

COMPARISON TO PRIOR WORK
[Figure: comparison by instance count]

PART 2 - SUMMARY
- FinFET self-heat is a growing reliability concern
- Proposed a supervised ML model using a DNN
  - Accurately predicts self-heat
  - 100x runtime improvement
- Showed techniques to select a representative dataset for training
- Model deployed on the Xavier and Turing projects
- Use ML techniques to improve productivity and solve challenging problems in VLSI

