DL-BASED INDUSTRIAL INSPECTION (DEFECT SEGMENTATION)


DL-BASED INDUSTRIAL INSPECTION (DEFECT SEGMENTATION)
Peter Pyun, Ph.D. / Andrew Liu, Ph.D.

Relevant links: Defect Segmentation, Nvidia Industrial Inspection White Paper 5c949687a2e3a90445b8431f
Using U-Net and the public DAGM dataset (with the Nvidia T4 GPU and TRT5), it shows a 23.5x performance boost with T4/TRT5 compared to CPU-TF.

AGENDA
Industrial Defect Inspection
Nvidia GPU Cloud (NGC) Docker images
DL model set up - U-Net
Data preparation
Defect segmentation - precision/recall
Automatic Mixed Precision (AMP)
GPU accelerated inferencing - TF-TRT & TRT

INDUSTRIAL DEFECT INSPECTION

Industrial Inspection Use-cases
Verticals: display panel, automotive, manufacturing
Examples: panel, PCB, CPU socket, battery surface defects (electric car, mobile phone), foundry/wafer, IC packaging

2 Main Scenarios - Industrial/Manufacturing Inspection
With AOI (automated optical inspection)
Without AOI

NVIDIA DEEP LEARNING PLATFORM
Data (curated/annotated) feeds AI training in the data center (DGX, Tesla, NGC docker containers); the trained DNN is deployed for AI inferencing at the edge through TensorRT (optimizer + runtime) on Tesla/Turing, DRIVE AGX, and Jetson AGX.

NGC DOCKER IMAGES

Benefits for the Deep Learning Workflow
High-level benefits and feature set:
- Single software stack
- Develop once, deploy anywhere
- Scale across teams of practitioners (developer, DevOps, QC)

Defect classification workflow: rapid prototyping for production with NGC
Pre-training / training: TensorFlow (NGC optimized docker image: NGC TensorFlow) on V100 / DGX-1V; used in the industrial inspection white paper.
Inference: TF-TRT / TensorRT (NGC TensorFlow, NGC TensorRT containers) on DGX-1/2, V100, T4.

MODEL SET UP

DL FOR DEFECT INSPECTION
Supervised: classification (defect / non-defect), segmentation mask
Unsupervised: learning from the image itself (no labels)

FROM LITERATURE: CNN/LENET (2016)
Source: Design of Deep Convolutional Neural Network Architectures for Automated Feature Extraction in Industrial Inspection, D. Weimer et al., 2016

FROM LITERATURE: CNN/LENET (2016)
Coarse segmentation results - can we do better?
Source: Design of Deep Convolutional Neural Network Architectures for Automated Feature Extraction in Industrial Inspection, D. Weimer et al., 2016

U-Net structure (architecture diagram): a contracting encoder and an expanding decoder built from 3x3 Conv2D + ReLU blocks, 2x2 MaxPool downsampling, 2x2 Conv2DTranspose upsampling, and copy-and-concatenate skip connections, operating on the 512x512 grayscale input.

KERAS-TF IMPLEMENTATION - ENCODING (convolution)
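The convolution code shown on this slide is not reproduced in the transcription; below is a minimal sketch of one U-Net encoder (downsampling) block in Keras-TF, assuming the 3x3 Conv2D + ReLU and 2x2 MaxPool building blocks from the architecture slide. Function names and filter counts are illustrative, not the presenters' exact code.

    # Minimal sketch of a U-Net encoder block in Keras (tf.keras).
    # Filter counts and names are illustrative, not the original slide's code.
    import tensorflow as tf
    from tensorflow.keras import layers

    def encoder_block(x, filters):
        # Two 3x3 convolutions with ReLU, then 2x2 max pooling for downsampling.
        c = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
        c = layers.Conv2D(filters, 3, padding='same', activation='relu')(c)
        p = layers.MaxPooling2D(pool_size=(2, 2))(c)
        return c, p  # c is kept for the skip connection, p feeds the next block

    inputs = tf.keras.Input(shape=(512, 512, 1))   # DAGM images are 512x512 grayscale
    skip1, x = encoder_block(inputs, 16)
    skip2, x = encoder_block(x, 32)
    skip3, x = encoder_block(x, 64)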

KERAS-TF IMPLEMENTATION - DECODING (deconvolution)
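Again the slide's code is not transcribed; a matching decoder (upsampling) block, assuming the 2x2 Conv2DTranspose and copy-and-concatenate skip connections from the U-Net diagram, might look like the sketch below (a continuation of the encoder sketch above; names and filter counts are illustrative).

    # Minimal sketch of a U-Net decoder block in Keras (tf.keras),
    # continuing the encoder sketch above. Names and filter counts are illustrative.
    import tensorflow as tf
    from tensorflow.keras import layers

    def decoder_block(x, skip, filters):
        # 2x2 transposed convolution doubles the spatial resolution, then the
        # corresponding encoder feature map is concatenated (skip connection).
        u = layers.Conv2DTranspose(filters, 2, strides=2, padding='same')(x)
        u = layers.concatenate([u, skip])
        u = layers.Conv2D(filters, 3, padding='same', activation='relu')(u)
        u = layers.Conv2D(filters, 3, padding='same', activation='relu')(u)
        return u

    x = decoder_block(x, skip3, 64)
    x = decoder_block(x, skip2, 32)
    x = decoder_block(x, skip1, 16)
    # Per-pixel defect probability map (1 channel, sigmoid for the 2-class case).
    outputs = layers.Conv2D(1, 1, activation='sigmoid')(x)
    model = tf.keras.Model(inputs, outputs)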

Image segmentation on medical images: the same process across various use cases
Data Science Bowl 2016: MRI image, left ventricle, heart disease
Data Science Bowl 2017: CT image, nodule, lung cancer
Data Science Bowl 2018: image, nuclei, drug discovery

Many others, in different verticals
Surveillance: human, anomaly detection
Autonomous car: road space, space for the self-driving car
Drone: path space, navigation

MANUFACTURING: Defect Inspection

DATA PREPARATION

DATASET FOR INDUSTRIAL OPTICAL INSPECTION
DAGM (from the German Association for Pattern Recognition) 007/prizes.html

DAGM DATASET (example images labeled Pass / NG)

DAGM DETAILS
- Original images are 512 x 512 grayscale
- Output is a tensor of size 512 x 512 x 1
- Each pixel belongs to one of two classes
- 6 defect classes
- Training set consists of 100 defect images
- Validation set consists of 50 defect images
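As a data-preparation sketch, loading 512x512 grayscale images and their binary label masks into the shapes the model expects could look like the following; the directory layout and file names are hypothetical, not the DAGM distribution's exact structure.

    # Hedged sketch: load DAGM-style 512x512 grayscale images and binary masks.
    # The directory layout and file names are hypothetical.
    import glob
    import os
    import numpy as np
    from PIL import Image

    def load_pairs(image_dir, label_dir):
        images, masks = [], []
        for path in sorted(glob.glob(os.path.join(image_dir, '*.png'))):
            name = os.path.basename(path)
            img = np.asarray(Image.open(path).convert('L'), dtype=np.float32) / 255.0
            lbl = np.asarray(Image.open(os.path.join(label_dir, name)).convert('L'), dtype=np.float32)
            images.append(img[..., np.newaxis])                           # 512 x 512 x 1 input
            masks.append((lbl > 0).astype(np.float32)[..., np.newaxis])   # two classes per pixel
        return np.stack(images), np.stack(masks)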

DAGM EXAMPLES WITH LABELS

Dice Metric (IoU-like) for an unbalanced dataset
Metric to compare the similarity of two samples: Dice = 2*Anl / (An + Al), where
- An is the area of the contour predicted by the network
- Al is the area of the contour from the label
- Anl is the intersection of the two, i.e. the area of the contour that is predicted correctly by the network
1.0 means a perfect score. This measures more accurately how well we are predicting the contour against the label. We can simply count pixels to get the respective areas.
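Since every pixel belongs to one of two classes, the areas can indeed be obtained by counting pixels; a small NumPy sketch of the Dice score as defined above (array names and the epsilon are illustrative):

    # Dice score between a binary prediction mask and the binary label mask,
    # computed by counting pixels (both arrays are 512 x 512, values 0/1).
    import numpy as np

    def dice_score(pred, label, eps=1e-7):
        a_nl = np.sum(pred * label)   # intersection area (correctly predicted contour)
        a_n = np.sum(pred)            # area of the predicted contour
        a_l = np.sum(label)           # area of the label contour
        return (2.0 * a_nl) / (a_n + a_l + eps)   # 1.0 means a perfect score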

LEARNING CURVES

U-NET / DAGM FOR INDUSTRIAL INSPECTION
DAGM merged binary classification dataset: 6000 defect-free images, 132 defect images.
Challenges: not all deviations from the texture are necessarily defects.

DEFECT SEGMENTATION - PRECISION/RECALL

FINAL DECISION

DEFECT VS NON-DEFECT BY THRESHOLDING
The segmentation model outputs a NumPy array of class probabilities for each class (example: 2 classes) over the 512x512 query image.
Thresholding: declare a pixel as defect (white) if its probability is higher than the threshold (e.g. 0.5).
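A minimal sketch of this thresholding step; here the probability map is a random placeholder standing in for the model output (in practice it would come from something like model.predict on the query image):

    # Turn a per-pixel defect probability map into a binary defect mask.
    import numpy as np

    # Placeholder: in practice prob_map = model.predict(query_image[np.newaxis, ...])[0, ..., 0]
    prob_map = np.random.rand(512, 512).astype(np.float32)

    threshold = 0.5                                       # tune per application
    defect_mask = (prob_map > threshold).astype(np.uint8) # 1 = defect (white), 0 = background
    print(defect_mask.sum(), "pixels declared defect")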

INFERENCE PIPELINE
Domain expertise is involved in the decision making (not a black box).
Edge: the inspection machine runs TF-TRT & TensorRT on T4 / V100.
Data center / cloud: DGX server / V100 running TF-TRT & TensorRT.
Inference produces result metadata (defect pattern/ratio, defect level, defect region size, defect counts); domain criteria determine the final decision (defect vs. non-defect) and the resulting precision/recall.
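One way to produce the per-image metadata listed above (defect counts, defect region sizes) from the binary mask is connected-component labeling; a sketch using scipy follows. The decision rule and the minimum region size are application-specific placeholders, not criteria taken from the deck.

    # Derive defect counts and region sizes from the thresholded mask, then apply
    # a domain criterion. min_region_size is an application-specific placeholder.
    import numpy as np
    from scipy import ndimage

    def decide(defect_mask, min_region_size=50):
        labeled, num_regions = ndimage.label(defect_mask)              # connected components
        sizes = ndimage.sum(defect_mask, labeled, range(1, num_regions + 1))
        large = [s for s in sizes if s >= min_region_size]             # ignore tiny speckles
        metadata = {'defect_counts': len(large), 'defect_region_sizes': large}
        return ('defect' if large else 'non-defect'), metadata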

(Example) Precision/Recall diagram

(Example) Simple binary anomaly detector
Threshold on the probability of defect: a higher value makes it harder for the classifier to assign the defect class.
Higher threshold: FP lower, so precision (TP/(TP+FP)) higher; FN higher, so recall (TP/(TP+FN)) lower.
TP: True Positive, FP: False Positive, FN: False Negative, TN: True Negative.
(In the diagram, the red arrow means moving the defect-probability threshold to a higher value.)
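A small sketch of this trade-off: sweep the threshold over held-out scores and count TP/FP/FN/TN. The score and label arrays here are randomly generated placeholders, not the experiment's data.

    # Precision/recall as a function of the defect-probability threshold.
    import numpy as np

    def precision_recall(y_true, y_score, threshold):
        y_pred = (y_score >= threshold).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        tn = np.sum((y_pred == 0) & (y_true == 0))
        precision = tp / max(tp + fp, 1)
        recall = tp / max(tp + fn, 1)
        false_alarm = fp / max(fp + tn, 1)
        return precision, recall, false_alarm

    # Dummy data for illustration only.
    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=1000)
    y_score = np.clip(y_true * 0.7 + rng.normal(0.3, 0.2, size=1000), 0.0, 1.0)

    for t in (0.5, 0.6, 0.7, 0.8, 0.9):
        print(t, precision_recall(y_true, y_score, t))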

Precision/Recall Results
Experimental results verify the precision/recall trade-off. Domain expert knowledge is involved: choose the threshold according to your application and business needs.
(Table: TP, FN and FP counts with the resulting precision at several thresholds.)
Choice: threshold 0.8 for high precision (0.9925) and a small FP rate (0.0011).

Precision/Recall - reducing false positives
Precision TP/(TP+FP): 99.25%
Recall TP/(TP+FN): 96.38%
False alarm rate FP/(FP+TN): 0.11%

Predicted \ Actual    defect         defect-free
defect                99.25% (TP)    0.75% (FP)
defect-free           0.55% (FN)     99.45% (TN)

* sensitivity = recall = true positive rate; specificity = true negative rate = TN/(TN+FP); false alarm rate = false positive rate

Final decision = defect segmentation (U-Net + thresholding)

AUTOMATIC MIXED PRECISION FOR U-NET ON V100

TENSOR CORES FOR DEEP LEARNING
Mixed precision implementation using Tensor Cores on Volta and Turing GPUs.
Tensor Cores: a revolutionary technology that accelerates AI performance by enabling efficient mixed-precision implementation; they accelerate large matrix multiply-and-accumulate operations in a single operation.
Mixed precision technique: the combined use of different numerical precisions in a computational method; the focus is on the FP16 and FP32 combination.
Benefits:
- Decreases the required amount of memory, enabling training of larger models or training with larger mini-batches.
- Shortens training or inference time by lowering the required resources through the use of lower precision.
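As a small illustration of why the FP16/FP32 combination (and loss scaling) matters, not the presenters' code: in pure FP16 a small update can be rounded away, while scaling the small value up and accumulating in FP32 preserves it.

    # Why mixed precision keeps FP32 accumulations and scales small values:
    # FP16 has roughly 3 decimal digits of precision near 1.0.
    import numpy as np

    w = np.float16(1.0)
    g = np.float16(1e-4)
    print(w + g)                        # 1.0 -> the small update is lost in FP16

    scale = 1024.0
    g_scaled = np.float16(g * scale)    # scale the small value into FP16's usable range
    w32 = np.float32(w) + np.float32(g_scaled) / scale
    print(w32)                          # ~1.0001 -> preserved when accumulated in FP32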

Automatic Mixed Precision
Easy to use, greater performance, and a boost in productivity.
- Insert two lines of code to introduce Automatic Mixed Precision in your training layers for up to a 3x performance improvement.
- The Automatic Mixed Precision feature uses a graph optimization technique to determine which operations run in FP16 and which in FP32.
- Available in TensorFlow, PyTorch and MXNet via our NGC Deep Learning Framework Containers.
More details: sion
Unleash the next-generation AI performance and get to market faster!

Enable Automatic Mixed Precision
Add just a few lines of code, get up to 3x speedup.

TensorFlow:
    os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'
or, through the NGC container environment:
    export TF_ENABLE_AUTO_MIXED_PRECISION=1

PyTorch (Apex AMP):
    model, optimizer = amp.initialize(model, optimizer)
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()

MXNet:
    amp.init()
    amp.init_trainer(trainer)
    with amp.scale_loss(loss, trainer) as scaled_loss:
        autograd.backward(scaled_loss)

More details: sion

U-Net AMP performance boost

Training performance (17% boost)
# GPUs | Precision                       | Training (imgs/sec) | Training time | Speedup
1      | FP32                            | 89                  | 7m44s         | 1.00
1      | Automatic Mixed Precision (AMP) | 104                 | 6m40s         | 1.17

Inference performance (30% boost)
# GPUs | Precision                       | Inference (imgs/sec) | Speedup
1      | FP32                            | 228                  | 1.00
1      | Automatic Mixed Precision (AMP) |                      |

Examples/blob/master/TensorFlow/Segmentation/UNet
Courtesy of Jonathan Dekhtiar, Alex Fit-Flora at Nvidia

GPU-ACCELERATED INFERENCING

Defect classification workflow: rapid prototyping for production with NGC
Pre-training / training: TensorFlow (NGC optimized docker image: NGC TensorFlow) on V100 / DGX-1V; used in the industrial inspection white paper.
Inference: TF-TRT / TensorRT (NGC TensorFlow, NGC TensorRT containers) on DGX-1/2, V100, T4.

TensorRT workflow (diagram): the trained model is imported into TensorRT via ONNX or the format called UFF (Universal Framework Format), then optimized into a runtime engine.
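A hedged sketch of building a TensorRT engine from a UFF export with the TensorRT 5 Python API; the input/output tensor names ('input_1', 'sigmoid/Sigmoid'), the file name and the batch/workspace settings are assumptions for a Keras U-Net export, not values from the deck.

    # Sketch: build a TensorRT 5 engine from a UFF file (Python API).
    # Tensor names, shapes and the file name are assumptions, not from the slides.
    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.UffParser() as parser:
        parser.register_input('input_1', (1, 512, 512))   # CHW for the 512x512 grayscale input
        parser.register_output('sigmoid/Sigmoid')          # assumed output node name
        parser.parse('unet.uff', network)
        builder.max_batch_size = 16
        builder.max_workspace_size = 1 << 30               # 1 GB of build workspace
        builder.fp16_mode = True                           # use Tensor Cores where possible
        engine = builder.build_cuda_engine(network)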

TensorRT Integrated With TensorFlow
Speed up TensorFlow inference with TensorRT optimizations.
- Speed up TensorFlow model inference with TensorRT through new TensorFlow APIs
- Simple API to use TensorRT within TensorFlow easily
- Sub-graph optimization with fallback offers the flexibility of TensorFlow and the optimizations of TensorRT
- Optimizations for FP32, FP16 and INT8, with automatic use of Tensor Cores
developer.nvidia.com/tensorrt

    # Apply TensorRT optimizations
    trt_graph = trt.create_inference_graph(
        frozen_graph_def,
        output_node_name,
        max_batch_size=batch_size,
        max_workspace_size_bytes=workspace_size,
        precision_mode=precision)

    # INT8-specific graph conversion
    trt_graph = trt.calib_graph_to_infer_graph(calibGraph)

V100/TRT4 Inference Results on U-Net
TF-TRT for fast prototyping, TRT for maximum performance: 8.6x speed-up with native TRT (FP16 precision).

Inference method           GPU-TF   TF-TRT   TRT
FP32   images/sec          141.8    236.1    1079.8
FP32   perf. increase      1        1.7      7.6
FP16*  images/sec          N/A      297.4    1219.7
FP16*  perf. increase      1        2.1      8.6

FP16*: mixed precision using Tensor Cores in the V100 GPU

TESLA T4: WORLD'S MOST ADVANCED SCALE-OUT GPU
320 Turing Tensor Cores, 2,560 CUDA Cores
65 FP16 TFLOPS, 130 INT8 TOPS, 260 INT4 TOPS
16 GB, 320 GB/s
70 W
Deep learning training & inference, HPC workloads, video transcode, remote graphics

TensorRT 5 & TensorRT Inference Server
Turing support: world's most advanced inference accelerator, up to 40x faster performance on Turing Tensor Cores.
Optimizations & APIs: new optimizations and flexible INT8 APIs, new INT8 workflows, Windows & CentOS support.
Inference server: TensorRT Inference Server maximizes GPU utilization and runs multiple models on a node.
Free download for members of the NVIDIA Developer Program at developer.nvidia.com/tensorrt

T4/TRT5 Inference Results on U-Net
TF-TRT for fast prototyping, TRT for maximum performance: 23.5x speed-up with native TRT (INT8 precision).

SUMMARY
Challenges -> What this delivers
- Training and inference environments are hard to build, maintain, and share. -> Use NGC Docker images.
- Model optimization and throughput speed-up. -> TF-TRT or TensorRT.
- With so many deep learning models out there, how do you choose the right one? -> If your dataset and requirements fit a scenario like ours, the U-Net model is a great choice for the segmentation task.
- An inference serving architecture is hard to develop. -> NGC-ready TRTIS (TensorRT Inference Server), open sourced and easy to set up.

Thank You

Appendix

TensorRT INTEGRATED WITH TensorFlow
TRT4 delivers 8x faster inference; available in TensorFlow 1.7.
(Chart: inference throughput for CPU (FP32), V100 (FP32), and V100 Tensor Cores (TensorRT).)
* Min CPU latency measured was 83 ms. It is not 7 ms.
CPU: Skylake Gold 6140, 2.5 GHz, Ubuntu 16.04, 18 CPU threads. Volta V100 SXM; CUDA 9.0.176 (driver 384.111). Batch size: CPU 1, TF-GPU 2, TF-TRT 16 with 6 ms latency.

INFERENCE SERVER ARCHITECTURE
Available with monthly updates.
- Models supported: TensorFlow GraphDef/SavedModel, TensorFlow + TensorRT GraphDef, TensorRT plans, Caffe2 NetDef (ONNX import)
- Multi-GPU support
- Concurrent model execution
- Server endpoints: HTTP REST API / gRPC
- Python/C++ client libraries

TESLA PRODUCT FAMILY
Tesla V100 (scale-up): supercomputing, DL training & inference, machine learning, video, graphics. Form factors: HGX-2 baseboard (16x V100 with NVSwitch/NVLink), V100 SXM2, V100 PCIe (2-slot).
Tesla T4 (scale-out): DL inference & training, machine learning, video, graphics. Form factor: T4 PCIe (low profile).
HGX-2: V100 & NVSwitch heat sink included but not shown.

NEW TURING TENSOR CORE
Multi-precision for AI inference & scale-out training: 65 TFLOPS FP16, 130 TeraOPS INT8, 260 TeraOPS INT4

TensorRT 5 Supports Turing GPUs
Fastest inference using mixed precision (FP32, FP16, INT8) and Turing Tensor Cores.
- Speed up recommender, speech, video and translation in production
- Optimized kernels for mixed-precision (FP32, FP16, INT8) workloads on Turing GPUs
- Up to 40x faster inference for apps vs. CPU-only platforms
- MPS maximizes utilization with multiple separate inference processes
developer.nvidia.com/tensorrt
