A Novel Multi-CPU/GPU Collaborative Computing Framework for SGD-based Matrix Factorization


A Novel Multi-CPU/GPU Collaborative Computing Framework for SGD-based Matrix Factorization
Yizhi Huang*†, Yinyan Long†, Yan Liu*, Shuibing He‡, Yang Bai*, Renfa Li*
50th International Conference on Parallel Processing (ICPP), August 9-12, 2021, Virtual Chicago, IL

Outline
- Background and Motivation
- Design and Implementation
- Evaluation

Background
- Matrix factorization (MF) helps recommender systems predict users' preferences for products: the rating matrix $R$ (users' ratings of items) is factorized into a user feature matrix $P$ and an item feature matrix $Q$, each with feature dimension $k$, so that the predicted rating matrix is $R \approx P^\top Q$.
- SGD-based MF updates the feature matrices $P$ and $Q$ by stochastic gradient descent. For each observed rating $r_{i,j}$, the local loss is
  $\ell(p_i, q_j) = (r_{i,j} - p_i^\top q_j)^2 + \lambda_P \|p_i\|^2 + \lambda_Q \|q_j\|^2$,
  and each iteration applies
  $p_i \leftarrow p_i - \eta \, \partial\ell/\partial p_i$, $q_j \leftarrow q_j - \eta \, \partial\ell/\partial q_j$.
- Each score $r$ is used to update two k-dimensional vectors $p$ and $q$, so the work grows with the number of ratings: we need to accelerate SGD-based MF.
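To make the update rule concrete, here is a minimal NumPy sketch of one SGD pass over the observed ratings (a sketch, not the paper's optimized kernel; the name `sgd_mf_epoch`, the learning rate `lr`, and the regularizers `lam_p`, `lam_q` are illustrative):

```python
import numpy as np

def sgd_mf_epoch(ratings, P, Q, lr=0.005, lam_p=0.02, lam_q=0.02):
    """One SGD epoch over the observed ratings.

    ratings: iterable of (i, j, r) triples from the rating matrix R.
    P: (m, k) user feature matrix; Q: (n, k) item feature matrix.
    Each rating r updates the two k-dimensional vectors P[i] and Q[j].
    """
    for i, j, r in ratings:
        err = r - P[i] @ Q[j]                     # prediction error
        p_old = P[i].copy()                       # keep old p_i for q_j's update
        P[i] += lr * (err * Q[j] - lam_p * P[i])  # p_i <- p_i - lr * dloss/dp_i
        Q[j] += lr * (err * p_old - lam_q * Q[j]) # q_j <- q_j - lr * dloss/dq_j

# Tiny usage example with random data:
rng = np.random.default_rng(0)
m, n, k = 100, 50, 8
P = 0.1 * rng.standard_normal((m, k))
Q = 0.1 * rng.standard_normal((n, k))
obs = [(rng.integers(m), rng.integers(n), rng.uniform(1, 5)) for _ in range(1000)]
for _ in range(20):
    sgd_mf_epoch(obs, P, Q)
```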

Observation: the Under-utilized CPUs
- Many computing nodes have multiple CPUs and GPUs.
- Existing research prefers to manage the GPUs for computing; the CPUs' computing power is easily overlooked.
- Is it possible to cooperate with the CPUs to accelerate SGD-based MF?

Observation
Measured time cost for SGD-based MF versus hardware price:

  Processor(s)                              Time Cost (s)   Price (USD)
  Intel Xeon Gold 6242 (CPU)                5.449           2573
  RTX 2080 (GPU)                            2.21            699
  RTX 2080S (GPU)                           1.93            699
  Tesla V100 (GPU)                          1.499           9000
  6242 + 2080 (CPU/GPU, good collaboration) 1.745           3272
  6242 + 2080S (CPU/GPU, good collaboration)1.592           3272

- The performance of high-end GPUs does not increase linearly with price.
- Cooperative computing of CPU and GPU may bring a good price/performance ratio.

Challenges
An unbalanced load leads to a short-board (bottleneck) effect:

  Configuration                      Time Cost (s)
  6242-2080S (unbalanced data)       4.252        (bad collaboration)
  6242-2080S (bad communication)     2.566        (bad collaboration)
  6242-2080S                         1.592        (good collaboration)
  6242-2080                          1.745        (good collaboration)

- How to uniformly manage and transparently use heterogeneous CPUs and GPUs?
- How to design an appropriate data distribution?
- How to optimize communication between CPUs and GPUs?

With $R_{m \times n} \approx P_{m \times k} Q_{k \times n}$, the naive communication cost is
$(m + n) \cdot k \cdot \mathrm{sizeof(float)} \cdot \mathit{Iterations} / B_{bus}$.
For Netflix: $m = 480{,}190$, $n = 17{,}771$, $k = 128$, 20 iterations, cost $\approx 0.4$ s.
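As a sanity check on that estimate, a few lines of Python reproduce the 0.4 s figure (the bus bandwidth value is an assumption, roughly the effective rate of PCIe 3.0 x16):

```python
m, n, k = 480_190, 17_771, 128
iterations, sizeof_float = 20, 4     # FP32
bus_bandwidth = 12e9                 # bytes/s, assumed ~PCIe 3.0 x16 effective

bytes_moved = (m + n) * k * sizeof_float * iterations
print(bytes_moved / bus_bandwidth)   # ~0.42 s
```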

Outline
- Background and Motivation
- Design and Implementation
- Evaluation

Our Solution: HCC-MF
- Problem 1: How to make heterogeneous CPUs and GPUs transparent to the user? -> A general framework that unifies the abstraction and workflow.
- Problem 2: How to distribute data to each heterogeneous CPU/GPU so that the whole system is more efficient? -> A time cost model for guiding data distribution, plus two data partition strategies for different synchronization overhead conditions.
- Problem 3: How to optimize communication between CPUs and GPUs? -> Communication optimization strategies that reduce the amount of data transmitted and use computation to overlap communication.

HCC-MF
- Heterogeneous CPUs/GPUs are abstracted into worker processes, managed by a server process.
- Shared memory is used as the communication (COMM) channel between processes, with per-worker push and pull buffers.
- The rating matrix is split into row grids; the server's data manager assigns grids to workers according to the time cost model and the data partition strategy, and the workers asynchronously run SGD-based MF (ASGD) on their grids.
- Workers: pull the latest feature data, compute, then push their updates.
- Server: synchronizes the shared item feature matrix across the $p$ workers, e.g. $Q = \sum_{i=1}^{p} Q_i / p$.
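The pull-compute-push loop can be sketched as follows (a minimal single-machine sketch with Python threads and a lock standing in for the shared-memory COMM channel; `Server`, `worker_step`, and the averaging rule are illustrative, not HCC-MF's actual API):

```python
import threading
import numpy as np

class Server:
    """Holds the shared item matrix Q; workers pull and push replicas."""
    def __init__(self, Q):
        self.Q, self.lock, self.pushed = Q, threading.Lock(), []

    def pull(self):
        with self.lock:
            return self.Q.copy()

    def push(self, Q_local):
        with self.lock:
            self.pushed.append(Q_local)

    def sync(self):
        # Synchronization step: average the workers' replicas, Q = sum(Q_i)/p
        with self.lock:
            self.Q = sum(self.pushed) / len(self.pushed)
            self.pushed = []

def worker_step(server, P_rows, grid, lr=0.005, lam=0.02):
    """One pull-compute-push round on this worker's row grid of R."""
    Q = server.pull()                        # 1. pull the latest Q
    for i, j, r in grid:                     # 2. ASGD over the local ratings
        err = r - P_rows[i] @ Q[j]
        p_old = P_rows[i].copy()
        P_rows[i] += lr * (err * Q[j] - lam * P_rows[i])
        Q[j] += lr * (err * p_old - lam * Q[j])
    server.push(Q)                           # 3. push the updated replica
```

In the real framework each worker is a separate process bound to one CPU or GPU, and the push/pull buffers live in shared memory rather than in a Python object.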


Time Cost Model
$T = \max_i(T_i) + T_{sync}$
- $T_i$ is worker i's per-iteration time (pull + computing + push); lower-order terms are omitted, and workers of the same type have similar $T_i$.
- Can sync be ignored?
[Timeline: the server runs pull-push synchronization while workers 0-4 each pull, compute, and push.]
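A hedged sketch of this cost model as code (the per-rating constant of $16k+4$ bytes and the bandwidth terms follow the partition slides below; `comp_bw` and `bus_bw` are assumed effective bandwidths, and all names are illustrative):

```python
def worker_time(x_i, nnz, k, m, n, comp_bw, bus_bw, sizeof=4):
    """Per-iteration time of worker i under the model T_i = compute + transfer.

    x_i: fraction of the nnz ratings assigned to worker i.
    compute: each rating touches ~(16k + 4) bytes (read/write the two
             k-dim feature vectors in FP32, plus the rating itself).
    transfer: pulling and pushing the k-dim feature rows over the bus.
    """
    compute = x_i * nnz * (16 * k + 4) / comp_bw
    transfer = 2 * k * (m + n) * sizeof / bus_bw
    return compute + transfer

def total_time(worker_times, t_sync):
    # T = max_i(T_i) + T_sync
    return max(worker_times) + t_sync
```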

Data Partition for Load Balance
Model worker i's per-iteration time as a linear function of its data share $x_i$:
$T_i(x_i) = x_i \cdot nnz \cdot (16k+4)/B_i + 2k(m+n)/B_{bus,i} = a_i x_i + b_i$.
The partition objective is to minimize the slowest worker:
$\theta(x) = \min \max_i (a_i x_i + b_i)$.
- Assuming $B_i$ is a constant function of $x_i$, $\theta$ reaches its minimum when $a_1 x_1 + b_1 = a_2 x_2 + b_2 = \dots = a_p x_p + b_p$.
- DP0 solves this system in closed form, giving each worker a share roughly inversely proportional to its estimated per-unit time $a_i$ (see the sketch below).
- Can DP0 really guarantee load balance?
[Timeline: under the model, the computing phases of workers 0-4 finish together.]
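A small sketch of the DP0 idea: solve $a_1 x_1 + b_1 = \dots = a_p x_p + b_p$ subject to $\sum_i x_i = 1$ (the helper `dp0_partition` is hypothetical; the closed form follows from those two constraints, not from the paper's exact notation):

```python
def dp0_partition(a, b):
    """Split data so all workers finish together under T_i = a_i*x_i + b_i.

    Setting a_i*x_i + b_i = theta for all i and sum(x_i) = 1 gives
    theta = (1 + sum(b_j/a_j)) / sum(1/a_j),  x_i = (theta - b_i) / a_i.
    """
    inv = [1 / ai for ai in a]
    theta = (1 + sum(bi / ai for ai, bi in zip(a, b))) / sum(inv)
    return [(theta - bi) / ai for ai, bi in zip(a, b)]

# Example: the fast worker (small a_i) gets most of the ratings.
print(dp0_partition(a=[1.0, 4.0], b=[0.1, 0.1]))  # -> [0.8, 0.2]
```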

Data Partition for Load Balance (cont.)
In practice, DP0 is not balanced:
- The assumption that $B_i$ is constant in $x_i$ is not true.
- The runtime performance may not be ignored.
However, if the change in $x$ is small, $T$ can still be regarded as locally linear. DP1 (Algorithm 1) therefore starts from DP0's assignment and differentially refines it over a few profiling iterations, re-fitting the linear model from measured times until the workers finish together (see the sketch below).
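One way to realize that refinement loop (a sketch under the stated local-linearity assumption; `measure_times` is a hypothetical profiling hook, and the damping factor is illustrative rather than Algorithm 1's exact update rule):

```python
def dp1_refine(x, measure_times, rounds=5, damping=0.5):
    """Differentially rebalance shares x from measured per-worker times.

    measure_times(x) runs one (or a few) profiling iterations and returns
    the list of measured T_i. While the perturbation stays small, T_i is
    roughly linear in x_i, so shares are nudged toward equal finish times.
    """
    for _ in range(rounds):
        t = measure_times(x)
        target = sum(t) / len(t)              # everyone should hit the mean
        x = [xi * (1 + damping * (target - ti) / ti) for xi, ti in zip(x, t)]
        s = sum(x)
        x = [xi / s for xi in x]              # renormalize to a full split
    return x
```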

Data Partition: Hiding Synchronization
With synchronization included, the total time becomes
$T = \max_i \left( x_i \cdot nnz \cdot (16k+4)/B_i + 2k(m+n)/B_{bus,i} \right) + 3tk(m+n)/B_{server}$,
where $t$ is a nonlinear function of $x$, which makes the objective function difficult to solve directly. Instead:
- Use DP1 to balance the computational overhead of each worker: $T_1 = T_2 = \dots = T_p$.
- Use computation to hide the synchronization overhead: DP2 skews DP1's partition so that the workers' computation overlaps the server's synchronization.
[Timeline: with DP2, the server's synchronization phases run concurrently with workers 0-4 computing.]


Reduce Data Transmission
- Rows (columns) of the rating matrix are independent of each other, so each worker keeps its own rows of the user matrix P locally and only the item matrix Q needs to be transmitted.
- The data range of the rating matrix is limited, so Q can be transmitted as FP16 data (half precision) instead of FP32.
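A minimal sketch of the FP16 trick with NumPy (with bounded ratings such as 1-5, the feature values stay in a range where half precision is tolerable on the wire; the round trip shown is illustrative, not the framework's buffer code):

```python
import numpy as np

# Item matrix Q at Netflix-like size: n = 17,771 items, k = 128 features.
Q = np.random.default_rng(0).standard_normal((17_771, 128)).astype(np.float32)

wire = Q.astype(np.float16)           # half the bytes on the bus
print(Q.nbytes, wire.nbytes)          # 9,098,752 vs 4,549,376 bytes

Q_received = wire.astype(np.float32)  # widen back before computing
print(np.abs(Q - Q_received).max())   # small quantization error (~1e-3)
```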

Overlap Communication
Use multiple asynchronous computing-transmission streams in each worker:
- GPU: the copy engine transfers one data grid while the cores compute on another.
- CPU: spare threads and free memory bandwidth handle transfers alongside computation.
- SoC: the copy engine in the iGPU.
A sketch of the pipelining idea follows below.
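The same pipelining idea in portable form (a Python sketch using a background thread as a stand-in for a GPU copy engine; on a real GPU this role is played by asynchronous copies on a separate stream):

```python
import threading
import queue

def pipeline(grids, transfer, compute):
    """Overlap communication and computation across data grids.

    While grid g is being computed, grid g+1 is already in flight:
    a background thread plays the role of the GPU's copy engine.
    """
    ready = queue.Queue(maxsize=1)    # one grid in flight at a time

    def copy_engine():
        for g in grids:
            ready.put(transfer(g))    # async "memcpy" of the next grid
        ready.put(None)               # end-of-stream marker

    threading.Thread(target=copy_engine, daemon=True).start()
    while (data := ready.get()) is not None:
        compute(data)                 # overlaps with the next transfer

# Usage sketch (load_grid and run_sgd are hypothetical callbacks):
# pipeline(range(8), transfer=load_grid, compute=run_sgd)
```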

Outline
- Background and Motivation
- Design and Implementation
- Evaluation

Evaluation Setup

  Item       Content
  Hardware   2x Intel Xeon Gold 6242, Nvidia RTX 2080S, Nvidia RTX 2080
  Dataset    Netflix; Yahoo! Music R1, R2, R1*; MovieLens-20M
  Baseline   FPSGD and cuMF_SGD, as implemented by us

- We do not change the core idea of the baseline algorithms in our implementation.
- We optimized the code to make the baselines execute faster.
- We use the baselines as the kernels running on the workers.

Overall Performance
- Same convergence rate as the baselines, with faster training speed.
[Figure: convergence curves over training time on Netflix, R1, and R2.]

Data Partition Evaluation
- DP0 can only guarantee load balancing on similar processors.
- DP1 guarantees load balance on all processors: -12.2% training time on Netflix with 4 workers, -10% on R2 with 4 workers.
- DP2 can hide the synchronization overhead: -12.1% on R1* with 4 workers.

Communication Optimization
- Without any communication optimization, the communication overhead offsets the benefits brought by parallelism.
- Transmitting only Q achieves better results, but its effectiveness depends on the shape of the rating matrix.
- The transmission performance of half-q (FP16 Q) is more than twice that of full-precision Q.

Conclusion
HCC-MF: a heterogeneous multi-CPU/GPU collaborative computing framework for SGD-based matrix factorization.
- A unified workflow with transparent use of heterogeneous CPUs/GPUs.
- Data distribution algorithms for different synchronization conditions.
- Optimized communication between CPUs and GPUs.
Limitations (under study):
- Communication overhead can be further optimized.
- The server can become a bottleneck.

Thank You
Yizhi Huang
huangyizhi@hnu.edu.cn
