A New Direct Connected Component Labeling And Analysis . - Nvidia

1y ago

4 Views

2 Downloads

2.36 MB

29 Pages

Last View : 1m ago

Last Download : 3m ago

Upload by : Pierre Damon

Report this link

Download PDF

Transcription

A new Direct Connected Component Labelingand Analysis Algorithm for GPUsArthur Hennequin1,2 , Lionel Lacassagne1LIP6, Sorbonne University, CNRS, FranceLHCb experiment, CERN, Switzerland 21GTC 2019 March 21st1 / 29

What are Connected Component Labeling and Analysis ?Connected Components Labeling (CCL) consists in assigning a unique number(label) to each connected component of a binary imageConnected Components Analysis (CCA) consists in computing some featuresassociated to each connected component like the bounding box [xmin ,xmax ] x[ymin ,ymax ], the sum of pixels S, the sums of x and y coordinates Sx, Sy1gray level imagebinary level image(segmentation by(motion detection)2connected componentlabelingconnected componentanalysis seems easy for a human being that has a global view of the image but, ill-posed problem: the computer has only a local view around a pixel(neighborhood) important in computer vision for pattern recognition, motion detection .2 / 29

Two classes of CCL algorithms multi-pass iterative algorithmsIIIcompute the local positive min over a 3 3 neighborhooduntil stabilization : the number of iterations depends on the datanot predictable, nor suited for embedded systems two-pass direct algorithmsIIIIfirst pass temporary label creation and equivalence buildingneed an equivalence table to memorize the connectivity between labelsthen transitive closure of the tree associated to the equivalence tablesecond pass label relabeling on CPU, scalar algorithms are all direct and can be parallelized on SIMD CPU, until 2019, all SIMD algorithms are iterative, except 1 on GPU, until 2018, all algorithms are iterative, except 3Why so few direct algorithms on GPU and SIMD ? because extremely complex to design (not suited for SIMD nor GPU)3 / 29

Direct algorithms are based on Union-Find structureAlgorithm 1: Rosenfeld labeling algorithmAlgorithm 2: Find(e,T )for i 0 : h 1 dofor j 0 : w 1 doif I [i][j] 6 0 thene1 E [i 1][j]e2 E [i][j 1]if (e1 e2 0) thenne ne 1ex neelser1 Find(e1 , T )r2 Find(e2 , T )ex min (r1 , r2 )if (r1 6 0 and r1 6 ex ) then T [r1 ] exif (r2 6 0 and r2 6 ex ) then T [r2 ] exwhile T [e] 6 e doe T [e]elseex 0E [i][j] exreturn e // the root of the treeAlgorithm 3: Union(e1 ,e2 ,T )r1 Find(e1 , T )r2 Find(e2 , T )if (r1 r2 ) thenT [r2 ] r1elseT [r1 ] r2Algorithm 4: Transitive Closurefor i 0 : ne doT [e] T [T [e]]Parallel algorithms do: sparse addressing scatter/gather SIMD instructions (AVX512/SVE) concurrent min computation recursive atomic min instruction (CUDA)4 / 29

Classic direct algorithm: Rosenfeld (1966)Rosenfeld algorithm is the first 2-pass algorithm with an equivalence table when two labels belong to the same component, an equivalence is created andstored into the equivalence table T for example, there is an equivalence between 2 and 3 (stair pattern) and between 4and 2 (concavity pattern) stair and concavity are the only two patterns generator of equivalence here, background in gray and foreground in white11111000101100010111011111111111binary image of pixels1111100010440004032203222222image of labelse 0 1 2 3 4T[e] 0 1 2 2 2equivalence table2222predecessorpixelsp1predecessorlabelse1p2 pxe2 excurrent pixelcurrent labelimage of pixelsimage of labels111110001022000202220222222212 ex121 1 exstairconcavitypatterns generatorof equivalence2222image of labels after relabeling3124equivalence trees5 / 29

Parallel State-of-the-art Parallel Light Speed Labeling[1](L. Cabaret, L. Lacassagne, D. Etiemble) (2018)IIIparallel algorithm for CPUbased on RLE (Run Length Encoding) to speed up processing and savesmemory accessescurrent fastest CCA algorithm on CPU Distanceless Label Propagation[2](L. Cabaret, L. Lacassagne, D. Etiemble) (2018)Idirect CCL algorithm for GPU Playne-Equivalence[3](D. P. Playne, K.A. Hawick) (2018)IIdirect CCL algorithm for GPU (2D and 3D versions)based on the analysis of local pixels configuration to avoid unnecessary andcostly atomic operations to save memory accesses.6 / 29

Equivalence merge function & concurrency issueThe direct CCL algorithms rely on Union-Find to manage equivalences.A parallel merge operation can lead to concurrency issues:314414314411443411444142444212341244 1st example (top-left): no concurrency, T[3] 1, T[4] 1 2nd example (top-right): no concurrency, T[3] 1, T[4] 2 3rd example (bottom-left): non-problematic concurrency, T[4] 1, T[4] 1 4th example (bottom-right): concurrency issue, T[4] 1, T[4] 2II4 can’t be equal to 1 and 2 4 has to point to 1 and 2 has to point to 1 too.7 / 29

Equivalence merge function (aka recursive Union)The merge function, introduced by Playne and Hawick, solves the concurrencyissues by iteratively merging labels using atomic operationsAlgorithm 5: merge(L, e1 , e2 )while e1 6 e2 and e1 6 L[e1 ] doe1 L[e1 ]// root of e1while e1 6 e2 and e2 6 L[e2 ] doe2 L[e2 ]// root of e2while e1 6 e2 doif e1 e2 then swap(e1 , e2 )e3 atomicMin(L[e1 ], e2 )if e3 e1 then e1 e2else e1 e3// recursive minBy definition, e3 L[e1 ], so: if e3 e1 : no concurrent write, update of L is successful, terminates the loop if e3 e1 : concurrent write, L was updated by another thread, need tomerge e3 and e28 / 29

Hardware Accelerated algorithm : HA4Analysis of state-of-the-art weaknesses: vertical borders (non-coalescent memory accesses) expensive atomic operationsAnalysis of state-of-the-art strengths: equivalence table embedded in the image (Cabaret, Playne) merge function (Komura [4] Playne) segments labeling (Light Speed Labeling) necessary condition to merge two equivalence trees (Playne)Figure 1: All possible 4 pixels configurations. Only (f) need to merge labels. (Playne)9 / 29

Hardware Accelerated: HA4The algorithm is divided into 3 kernels: strip labeling: the image is split intohorizontal strips of 4 rows. Each strip isprocessed by a block of 32 4 threads(one warp per row). Only the head ofsegment is labeled border merging: to merge the labels onthe horizontal borders between strips relabeling / features computation: topropagate the label of each segment tothe pixels or to compute the featuresassociated to the connected components10 / 29

Example – Strip labeling initialization (Step #0)The 8 8 image is divided into 2 strips of 8 4 pixels, warp size 8Initial strip labeling: only the head of each segment (start node)is labeled with an unique label equal to its linear address: L[k] k with k y width x warning: label numbering starts at 0, not 100 01 2 3 41 8183 24260 32342 483 567122 161 405 662043475462(a) Initialization11 / 29

Example – Strip labeling (Step #1)After initialization: detection of merging nodes using necessary conditions in each thread update of start nodes onlyStrips’ segments are now labeled00 01 2 3 41 85 66701 2 3 4121 020182 812263 16180 32340 32323 5643471 32542 40623 48(b) Strip labeling063 242 48762 161 405 60 01234343286403416124843201856475447542662(c) Strip labeledHere, a CC spanning over several strips is represented by 3 disjoint trees of labels12 / 29

Example – Border merging (Step #2)Same merging operations on border nodes only. All the segments are correctlylabeled. A CC spanning to several strips is represented by 1 tree.01 2 3 40 05 600 0761 02 8123 16180 32321 3261 0122 8123 16180 03234472 403 48543 48(d) Border merging01234344754086403416124843267(e) Border merged32185 6061 32342 40201 2 3 45647546232864034161248432018265647546213 / 29

Example – Re-Labeling / Analysis (Step #3)In the final step only, each start node (blue) flattens its equivalence tree to Label the image: broadcast the label to the whole segment to Analyse the image: accumulate features into global memory using atomicsexample of features associated to segment [x0 , x1 [ at line y :I00 0S x1 x0 ,1 2 3 40 01 0Sy S y0 ,0Sx 1 2 3 40 05 60700 000001 000002 0012[x1 (x1 1) (x0 (x0 1)]5 6070000000002 003 00000003 00000000 00000000 00000000001 0001 002 00000002 00000003 00000003 0000000FindRoot5662685412160484718204340 32 3426Relabeling14 / 29

Implementation details: Grid-stride loop first weakness of previous GPU algorithms is the vertical border merging: thenon-coalescent memory accesses are slower we used the grid-stride loop [5] design pattern to divide the image in stripsinstead of tileskernel Classic(width)x blockDim.x blockIdx.x threadIdx.xif x width then// do stuff.kernel Grid stride loop(width)for x threadIdx.x to width by blockDim.x do// do stuff.Benefits: thread reuse: less thread creation. Helps to amortize the cost of threadcreation/destruction thread context is preserved: the loop ensures that pixels are processed in aspecific order and allows to reuse previously computed values15 / 29

Implementation details: horizontal data exchangeAll threads working on the same row are from the same warp, CUDA Warp-LevelPrimitives [6] can be used to directly exchange data from threads registers ballot sync primitive returns a 32-bit bitmask based on the value of aboolean within each thread (1 bit per thread) shfl sync primitive exchanges a 32-bit value between any pair of threads ina warp. Each thread specifies a thread ID to read and a value to share16 / 29

Implementation details: segments each thread needs to find its distance to the segment’s start node distance to the end is also needed for features computation bitwise operations can accelerate the computation of these distances (tx thread number)01 2 3 45 67pixels 01 1 1 00 10start distance0 1 20end distance3 2 111operator start distance(pixels, tx)return clz( (pixels (32 tx))) // clz Count Leading Zerosoperator end distance(pixels, tx)return ffs( (pixels (tx 1))) // ffs Find First Set17 / 29

Implementation details: vertical data exchange classic way of optimizing memory accesses: copying data from global toshared memory shared memory is divided in 32 banks: same bank memory accesses atdifferent addresses get serialized [7]txpixelsy01234567sharedmemorypixelsy-118 / 29

Implementation details: vertical data exchange for each row, we store the bitmasks of the 32 neighbor pixels in differentbanks store: no serialization, load: lsy-119 / 29

One final optimization. two pixels directly next to each other either belong to the same segment orhave a different color we can assign a thread two pixels instead of one. 32-bit 64-bit bitmask: modified distance operators. new version: HA464tx010121 0 1 031 041 0operator start distance64(pixels, tx)b get bit tx of pixelstxb tx breturn clzll( (pixels (64 txb)))operator end distance64(pixels, tx)b get bit tx of pixelstxb tx breturn ffsll( (pixels (txb 1)))20 / 29

Benchmark of CCL and CCA algorithms random 2048x2048 (2k) images of varying density (0% - 100%), granularity(1 - 16, granularity 4 close to natural image complexity) percolation threshold: transition from many smalls CCs to few larges CCsII8C: density 45%4C: density 64%21 / 29

Comparison of CCL algorithms on Jetson TX2 comparison with 2state-of-the-art algorithms[Playne, Cabaret] Cabaret and Playne losetime updating all thetemporary labels thanks to the use ofsegments, HA4’s processingtime decreases after thepercolation thresholdd 64%(a) Playne(b) Cabaret(c) HA432 (ccl)(d) HA464 (ccl) HA464 is 2 faster inaverage than Playne andCabaret CCL throughput: 1.2 Gpx/s(HA464 , 2k, g 4)22 / 29

Comparison of CCA algorithms on Jetson TX2 HA464 CCA: labeling kernel is replaced by on-the-fly analysis kernel other algorithms: features computation kernel after relabeling kernel 7 features: S, Sx, Sy, xmin , ymin , xmax , ymax 1.1 Gpx/s (HA464 , 2k, g 4)(a) Playne(b) Cabaret(a) HA432(b) HA46423 / 29

Performance of CCL on Jetson AGX & V100Latest results on Volta architecture: AGX: 4.6 Gpx/s (HA464 , 2k, g 4) V100: 27.0 Gpx/s (HA464 , 2k, g 4)(a) HA432 Jetson AGX(b) HA464 Jetson AGX(c) HA432 V100(d) HA464 V10024 / 29

Performance of CCA on Jetson AGX & V100Latest results on Volta architecture: AGX: 3.4 Gpx/s (HA464 , 2k, (S, Sx, Sy, xmin , ymin , xmax , ymax ), g 4) V100: 14.9 Gpx/s (HA464 , 2k, (S, Sx, Sy, Sx2 , Sy2 ), g 4)(a) HA432 Jetson AGX(b) HA464 Jetson AGX(c) HA432 V100(d) HA464 V10025 / 29

Observations for Jetson AGX & V100 strong scalability for CCL weak scalability for CCA (concurrent accesses in atomic operations) some features are faster to compute than others: the first statisticalmoments, computed with atomic addition, are faster than the boundingboxes computed with atomic min and max(a) HA464 (cca) V100 (S, Sx, Sy, xmin , ymin , xmax , ymax ) (b) HA464 (cca) V100 (S, Sx, Sy, Sx2 , Sy2 )26 / 29

Conclusion two new algorithms for 4-connectivity connected component processing onGPU:IICCL 2 faster than State-of-the-ArtCCA new on GPU introduced a new way to efficiently reduce the number of global memoryaccesses using segments, combined with low-level intrinsics HA464 ready for realtime embedded processing.I CCL throughput: 4.6 Gpix/s on AGX (1920x1080: 2208 fps) orI CCA throughput: 3.4 Gpix/s on AGX (1920x1080: 1615 fps) future works:I Design 8-connectivity versions on GPUsI Improve CCA by implementing different merging strategies Algorithm and benchmarks published at DASIP 2018 [8]27 / 29

Thank you!28 / 29

References IL. Cabaret, L. Lacassagne, and D. Etiemble, “Parallel Light Speed Labeling for connected componentanalysis on multi-core processors,” Journal of Real Time Image Processing, no. 15,1, pp. 173–196, 2018.L. Cabaret, L. Lacassagne, and D. Etiemble, “Distanceless label propagation: an efficient directconnected component labeling algorithm for GPUs,” in IEEE International Conference on ImageProcessing Theory, Tools and Applications (IPTA), pp. 1–8, 2017.D. P. Playne and K. Hawick, “A new algorithm for parallel connected-component labelling on GPUs,”IEEE Transactions on Parallel and Distributed Systems, 2018.Y. Komura, “Gpu-based cluster-labeling algorithm without the use of conventional iteration: applicationto swendsen-wang multi-cluster spin flip algorithm,” Computer Physics Communications, pp. 54–58,2015.M. .Y. Lin and V. Grover, el-primitives/,”2018.M. Harris, -cuda-cc/,” 2013.A. Hennequin, L. Lacassagne, L. Cabaret, and Q. Meunier, “A new Direct Connected ComponentLabeling and Analysis Algorithms for GPUs,” in DASIP, (Porto, Portugal), Oct. 2018.29 / 29

(label) to each connected component of a binary image Connected Components Analysis(CCA) consists in computing some features associated to each connected component like the bounding box [x min,x max] x [y min,y max], the sum of pixels S, the sums of x and y coordinates Sx, Sy gray level image binary level image (segmentation by (motion detection)

Related Documents:

Connected Component Workbench - BISON ACADEMY

Connected Component Workbench Connected Component Workbench is the software used to program Allen Bradley Mico-810 PLC's. Copies can be . Start Connected Components Wizard. You should get the following screen NDSU Connected Component Workbench ECE 461 JSG 1 rev August 13, 2016.

32 Views

1y ago

LSC COMMUNICATIONS PENSION PLAN

RR Donnelley Component R.R. Donnelley Printing Companies Component Haddon Component Banta Employees Component Banta Book Group Component Banta Danbury Component Banta Specialty Converting Component Moore Wallace Component (other than Cardinal Brands Benefit and Check Printers Benefit) Cardinal Brands Benefit of the Moore Wallace Component

36 Views

2y ago

TMS Component Pack Quick Start

Note that when a new component or component update has been released that is part of the TMS Component pack, there might be some delay before the full TMS Component pack update is available that contains this new component or component update. This is due to the far more extensive testing and build procedure for the complete TMS Component Pack.

15 Views

1y ago

Custom Components for Steel - freeit.free.fr

Since we have a custom component in the model we can open the Custom component editor. Edit custom 1. Select the User_end_plate component symbol. component 2. Right-click and select Edit custom component. The Custom component editor opens along with the Custom component editor toolbar, the Custom component browser and four views of the custom .

21 Views

1y ago

ECONOMIC IMPACT TRAINING SEPTEMBER 15, 2017 - Oklahoma

Example of Direct Airport Tenant Impacts Direct Employment Direct Payroll Direct Spending Direct Economic Activity 8 304,000 200,000 504,000 Direct Employment Direct Payroll Direct Spending Direct Economic Activity Tenant #1 2 70,000 51,000 121,000 Tenant #2 1 40,000 32,000 72,000 Tenant #3 5 194,000 117,000 311,000

51 Views

1y ago

Connect:Direct for UNIX Administration Guide

started guide. The Connect:Direct F ile Agent Help contains instruct ions for configuring File Agent. direct Connect:Direct for UNIX Administration Guide Connect:Direct for UNIX Administration Guide Connect:Direct for UNIX Administration Guide Connect:Direct for UNIX Administration Guide . Connect:Direct for UNIX Administration Guide

20 Views

9m ago

Modul - Fakultas Ekonomi dan Bisnis Islam

akuntansi musyarakah (sak no 106) Ayat tentang Musyarakah (Q.S. 39; 29) لًََّز ãَ åِاَ óِ îَخظَْ ó Þَْ ë Þٍجُزَِ ß ا äًَّ àَط لًَّجُرَ íَ åَ îظُِ Ûاَش

320 Views

2y ago

Alex Rider 3 - Skeleton Key

MI6 adventure, Alex Rider is recruited right off the soccer field to check out some suspicious goings-on at Wimbledon. This assignment catapults him into a series of life-threatening episodes, such as coming face to face with a great white shark, dodging bullets as he dives off a burning boat, and being tied to a conveyor belt that is moving toward the jaws of a gigantic grindstone in an .

101 Views

3y ago

Recent Views

Technological Revolutions and Stock Prices National Bureau of Economic .

stock prices up, but the discount rate eﬀect prevails eventually, pushing the stock prices down. The resulting pattern in the new economy stock prices looks like a bubble but it obtains under rational expectations through a general equilibrium eﬀect. The bubble-like pattern in stock prices arises in part due to an ex post selection bias.

1y ago

121 Views

Dynamic correlation between stock market and oil prices: The case of .

between stock market and oil prices is still growing. Nevertheless, there are very few studies on the dynamic correlation between these two markets. A first approach on the dynamic co-movements between oil prices and stock markets was performed by Ewing and Thomson (2007), using the cyclical components of oil prices and stock prices.

1y ago

122 Views

Prices Effective January 1, 2020 Machine Prices and Speci cations

Prices Effective January 1, 2020 Machine Prices and Speci cations Prices Effective January 1, 2020 ZERO TURN-4 SERIES REVISED MAY 18, 2020. Machine Prices and Secications Prices Eectie anuar , 2 ZT1. Prices F.O.B. Selma, Alabama and Subject to Change Without Notice. ESTATE SERIES

1y ago

119 Views

Vanguard U.S. Stock Index Small-Capitalization Funds .

† Stock market risk, which is the chance that stock prices overall will decline. Stock markets tend to move in cycles, with periods of rising prices and periods of falling prices.The Fund’s target index tracks a subset of the U.S. stock market, which could cause the Fund to perform di

2y ago

260 Views

Forecasting Prices on the Stock Exchange Using a Trading System

Forecasting prices in stock markets is a matter of great interest both in the academic field and in business. The forecasting of stock prices and stock returns is possible using various techniques and methods. Many researchers study price trends in stock markets with the help of artificial neural networks [1-2] or fuzzy-trends [3, 4]. The

1y ago

132 Views

A Hybrid Prediction Method for Stock Price Using LSTM and . - Hindawi

the relationship between stock prices and these factors. Although these factors will temporarily change the stock price, in essence, these factors will be reﬂected in the stock price and will not change the long-term trend of the stock price. erefore, stock prices can be predicted simply with historical data.

1y ago

159 Views

Columbus,Ohio 1890

Slicing Steaks 3563 Beef Tender, Select In Stock 3852 Angus XT Shoulder Clod, Choice In Stock 3853 Angus XT Chuck Roll, Choice 20/up In Stock 3856 Angus XT Peeled Knuckle In Stock 3857 Angus XT Inside Rounds In Stock 3858 Angus XT Flats, Choice In Stock 3859 Angus XT Eye Of Round, Choice In Stock 3507 Point Off Bnls Beef Brisket, Choice In Stock

2y ago

268 Views

Determine your Pricing Point Taxonomy to Help you Pricing Strategy: Using

Lowering of prices to match competitor prices can be done on a more precise level with Taxonomy. Price matching can be done based on the comparison of further classification via Taxonomy. Stock Availability: In stock/ Out of stock When both the competitor and us have the same product in stock, we ought to markdown to match prices when

1y ago

117 Views

COVID-19 and Energy COVID-19 and the Oil Price - Stock Market Nexus .

stock market. 1. Introduction Oil prices play a key role in stock market performance of oil-importing economies. A decline in oil prices reduces the cost of production and increases economic growth (Narayan et al., 2014). The effect of this is a rise in stock prices due to higher future earnings and dividends (Filis, 2010; Jones &

1y ago

125 Views

Relationship between Financial Ratios in the Stock Prices of .

projected financial ratios. Miri and Abraham (2010) linear and non-linear relationship between the ratio of stock prices in the financial and non-metallic minerals industry Tehran Stock Exchange for the years 2003 to 2007 were reviewed. The results showed that linear and nonlinear relationships between financial ratios and stock prices there is no

10m ago

99 Views

Stock prediction using a Hidden Markov Model versus a Long Short-Term .

the close price of a stock is used when training and predicting stock prices. All data was retrieved from Yahoo. Historical stock prices from 1 January 1990 until 1 June 2019 results in approximately 7000 data points (trading days) per stock. Data preparation In order to create enough data for the two models to

1y ago

121 Views

Learning from Peers Stock Prices and Corporate Investment - USI

stock prices then an increase in the -rm s own stock price informativeness reduces the sensitivity of its investment to its peer stock price (prediction 1). Indeed, as the signal conveyed by its own . stock price (prediction 2), but not otherwise. The same prediction holds for an increase in the correlation of the fundamentals of a -rm .

1y ago

128 Views

Do stock prices respond to changes in corporate income tax rates?

In the event studies, I regress stock returns on market returns and other factors over a time span well before the events of a tax change, creating a model of how the stock returns behave. Then I use the deviation of stock prices from the model's prediction around the events of the tax change to establish the stock's abnormal returns.

1y ago

117 Views

The Impact of Persian News on Stock Returns Through Text Mining Techniques

Persian news - on the stock prices has been neglected. Consequently, this study aimed to fill this gap. To this aim, the stock index values were collected from the Tehran Stock Exchange along with the . Stock market prediction is a way to understand the future fluctuations of a company's stock price (Jishag et al., 2020). Generally, two .

1y ago

225 Views

Buying Your First Stock - Stock-Trak

Stock Market Game Time: 15 Minutes Requires: StockTrak Curriculum , Computer Access Buying Your First Stock This lesson is an introduction to buying a stock. Students will be introduced to basic vocabulary that is involved with a buying and owning a stock. Stu-dents will be going through the entire process of buying a stock from looking

1y ago

164 Views

A New Direct Connected Component Labeling And Analysis . - Nvidia

It looks like you're using an ad-blocker