Big Data Welcome - PSC


Welcome to the XSEDE Big Data Workshop
John Urbanic
Parallel Computing Scientist
Pittsburgh Supercomputing Center
Copyright 2021

Who are we?
Your hosts: Pittsburgh Supercomputing Center

Who am I?
John Urbanic, Parallel Computing Scientist, Pittsburgh Supercomputing Center
What I mostly do: parallelize codes with MPI, OpenMP, OpenACC, Hybrid; Big Data; Machine Learning

XSEDE HPC Monthly Workshop Schedule
January 21       HPC Monthly Workshop: OpenMP
February 19-20   HPC Monthly Workshop: Big Data
March 3          HPC Monthly Workshop: OpenACC
April 7-8        HPC Monthly Workshop: Big Data
May 5-6          HPC Monthly Workshop: MPI
June 2-5         Summer Boot Camp
August 4-5       HPC Monthly Workshop: Big Data
September 1-2    HPC Monthly Workshop: MPI
October 6-7      HPC Monthly Workshop: Big Data
November 3       HPC Monthly Workshop: OpenMP
December 1-2     HPC Monthly Workshop: Big Data
January 12       HPC Monthly Workshop: OpenMP
February 2-3     HPC Monthly Workshop: Big Data
March 2          HPC Monthly Workshop: OpenACC
April 5-6        HPC Monthly Workshop: Big Data
May 4-5          HPC Monthly Workshop: MPI
June 1-4         Summer Boot Camp

HPC Monthly Workshop Philosophy
o Workshops as long as they should be.
o You have real lives in different time zones that don't come to a halt.
o Learning is a social process.
o This is not a MOOC.
o But this is also not the Wide Area Classroom, so bear with us.

Tuesday, February 2
Welcome
A Brief History of Big Data
Intro to Spark
Lunch Break
More Spark and Exercises
Intro to Machine Learning
Adjourn

Wednesday, February 3
11:00  Machine Learning: A Recommender System
1:00   Lunch break
2:00   Deep Learning with TensorFlow
5:00   The Big Picture
5:30   Adjourn

We do this all the time, but...
o This is a very ambitious agenda.
o We are going to cover the guts of a semester course.
o We may get a little casual with the agenda.
o The reasons we can attempt this now:
  o Tools have reached the point (Spark and TF) where you can do some powerful things at a high level.
  o Worked last time. Feedback is very positive.

Biggest Potential For Disappointment
o We absolutely, definitely, without question, wish we had more hands-on exercise time.
o This is by design and demand. The topics we cover are all greatly requested, and attempts to delete any of them provoke outrage in our surveys. This demand has compressed our hands-on sessions.
o One solution is for you to use the remainder of our short days to do further work.
o We also assume you will use your extended access to do exercises. Usually this is just a bonus.
o Use your time wisely, and ask questions relentlessly.

Resources

Our TAs
Questions from the audience
On-line talks (no "pop ups"): bit.ly/XSEDEWorkshop

Copying code from PDFs is very error prone. Subtle things like substituting "–" for "-" are maddening. I have provided online copies of the codes in a directory that we shall shortly visit. I strongly suggest you copy from there if you are in a cut/paste mood.

The YouTube Channel Has Arrived!
Due to overwhelming demand, and a lot of editing, we have begun to post workshop videos on the XSEDE Monthly Workshop Training Channel: XSEDETraining. They will be incrementally appearing in the coming months. Subscribe and give us feedback.

Getting Time on Bridges

Code of Conduct
XSEDE has an external code of conduct which represents our commitment to providing an inclusive and harassment-free environment in all interactions regardless of race, age, ethnicity, national origin, language, gender, gender identity, sexual orientation, disability, physical appearance, political views, military service, health status, or religion. The code of conduct extends to all XSEDE-sponsored events, services, and interactions.

Code of Conduct: https://www.xsede.org/codeofconduct

Contact:
Event organizer: Tom Maiden (tmaiden@psc.edu)
XSEDE ombudspersons:
Linda Akli, Southeastern Universities Research Association (akli@sura.org)
Lizanne Destefano, Georgia Tech (lizanne.destefano@ceismc.gatech.edu)
Ken Hackworth, Pittsburgh Supercomputing Center (hackworth@psc.edu)
Bryan Snead, Texas Advanced Computing Center (jbsnead@tacc.utexas.edu)
Anonymous reporting form available at https://www.xsede.org/codeofconduct.

Terminology Statement
In line with XSEDE's Code of Conduct, XSEDE is committed to providing training events that foster inclusion and show respect for all. This commitment applies not only to how we interact during the event, it also applies to the training materials and presentation. It is not XSEDE's position to use, condone, or promote offensive terminology.

XSEDE instructors strive to keep inclusive language at the forefront. In the event that we have included inappropriate materials, verbal or written, please let us know at terminology@xsede.org.

While XSEDE has no control over external third-party documentation, we are taking steps to effect change by contacting the relevant organizations; we hope this will be addressed by all third parties soon.

If you see any terminology concerns in the following presentation or slides, we want to know! Please contact the Terminology Task Force: terminology@xsede.org

Check your email for the post-event survey.
Surveys are conducted by an external evaluation team. XSEDE staff will not know who said what. If you have questions regarding the evaluation, please contact: Lorna Rivera, lorna.rivera@gatech.edu, or Lizanne DeStefano, ldestefano6@gatech.edu.

Bridges system overview (diagram callouts):
o 748 HPE Apollo 2000 (128GB) compute nodes: Java & Python; distributed training, Spark, etc.
o 4 HPE Integrity Superdome X (12TB) compute nodes and 42 HPE ProLiant DL580 (3TB) compute nodes: large memory
o 32 HPE Apollo 2000 (128GB) GPU nodes (32 RSM nodes), each with 2 NVIDIA Tesla P100 GPUs: ML, inferencing, DL development, Spark, HPC AI (Libratus)
o 16 HPE Apollo 2000 (128GB) GPU nodes (16 RSM nodes), each with 2 NVIDIA Tesla K80 GPUs: simulation (including AI-enabled)
o Maximum-scale deep learning: NVIDIA DGX-2 and 9 HPE Apollo 6500 Gen10 nodes, 88 NVIDIA Tesla V100 GPUs
o 12 HPE ProLiant DL380 database nodes
o 6 HPE ProLiant DL360 web server nodes: user interfaces for AIaaS, BDaaS
o 20 Storage Building Blocks implementing the parallel Pylon storage system (10 PB usable): project & community datasets
o 2 gateway nodes, 4 MDS nodes, 2 front-end nodes, 2 boot nodes, 8 management nodes
o Purpose-built Intel Omni-Path Architecture topology for data-intensive HPC: 6 "core" Intel OPA edge switches (fully interconnected, 2 links per switch), 20 "leaf" Intel OPA edge switches, robust paths to parallel storage, Intel OPA cables

Bridges Virtual Tour: https://psc.edu/bvt

Bridges Hardware

Type      | RAM        | #   | CPU / GPU / SSD                                                              | Server
          | 12 TB (b)  | 2   | 16 Intel Xeon E7-8880 v3 (18c, 2.3/3.1 GHz, 45MB LLC)                       | HPE Integrity Superdome X
          | 12 TB (c)  | 2   | 16 Intel Xeon E7-8880 v4 (22c, 2.2/3.3 GHz, 55MB LLC)                       | HPE Integrity Superdome X
          | 3 TB (b)   | 8   | 4 Intel Xeon E7-8860 v3 (16c, 2.2/3.2 GHz, 40 MB LLC)                       | HPE ProLiant DL580
          | 3 TB (c)   | 34  | 4 Intel Xeon E7-8870 v4 (20c, 2.1/3.0 GHz, 50 MB LLC)                       | HPE ProLiant DL580
          | 128 GB (b) | 752 | 2 Intel Xeon E5-2695 v3 (14c, 2.3/3.3 GHz, 35MB LLC)                        | HPE Apollo 2000
          | 128 GB (b) | 16  | 2 Intel Xeon E5-2695 v3, 2 NVIDIA Tesla K80                                 | HPE Apollo 2000
          | 128 GB (c) | 32  | 2 Intel Xeon E5-2683 v4 (16c, 2.1/3.0 GHz, 40MB LLC), 2 NVIDIA Tesla P100   | HPE Apollo 2000
GPU-AI    | 1.5 TB (d) | 1   | 16 NVIDIA V100 32GB SXM2, 2 Intel Xeon Platinum 8168, 8 x 3.84 TB NVMe SSDs | NVIDIA DGX-2 delivered by HPE
GPU-AI    | 192 GB (d) | 9   | 8 NVIDIA V100, 2 Intel Xeon Gold 6148, 2 x 3.84 TB NVMe SSDs                | HPE Apollo 6500 Gen10
DB-s      | 128 GB (b) | 6   | 2 Intel Xeon E5-2695 v3, SSD                                                | HPE ProLiant DL360
DB-h      | 128 GB (b) | 6   | 2 Intel Xeon E5-2695 v3, HDDs                                               | HPE ProLiant DL380
Web       | 128 GB (b) | 6   | 2 Intel Xeon E5-2695 v3                                                     | HPE ProLiant DL360
Other (a) | 128 GB (b) | 16  | 2 Intel Xeon E5-2695 v3                                                     | HPE ProLiant DL360, DL380
Other (a) | 64 GB (b)  | 4   | 2 Intel Xeon E5-2683 v3 (14c, 2.0/3.0 GHz, 35MB LLC)                        |
Other (a) | 64 GB (c)  | 4   | 2 Intel Xeon E5-2683 v3                                                     |
Other (a) | 96 GB (d)  | 2   | 2 Intel Xeon                                                                |
Other (a) | 128 GB (b) | 5   | 2 Intel Xeon E5-2680 v3 (12c, 2.5/3.3 GHz, 30 MB LLC)                       |
Other (a) | 256 GB (c) | 15  | 2 Intel Xeon E5-2680 v4 (14c, 2.4/3.3 GHz, 35 MB LLC)                       | Supermicro X10DRi

Total RAM: 286.5 TB

a. Other nodes: front end (2), management/log (8), boot (4), MDS (4)
b. DDR4-2133
c. DDR4-2400
d. DDR4-2666

Getting Connected
The first time you use your account sheet, you must go to apr.psc.edu to set a password. You may already have done so; if not, we will take a minute to do this shortly.

We will be working on bridges.psc.edu. Use an ssh client (a PuTTY terminal, for example) to ssh to the machine.

At this point you will be on a login node. It will have a name like "login001" or "login006". This is a fine place to edit and compile codes. However, we must be on compute nodes to do actual computing. We have designed Bridges to be the world's most interactive supercomputer. We generally only require you to use the batch system when you want to. Otherwise, you get your own personal piece of the machine. For this workshop we will use

interact

to get a regular node of the type we will be using with Spark. You will then see a name like "r251" on the command line to let you know you are on a regular node. Likewise, to get a GPU node, use

interact -gpu

This will be for our TensorFlow work tomorrow. You will then see a prompt like "gpu32".

Some of you may follow along in real time as I explain things; some of you may wait until exercise time; and some of you may really not get into the exercises until after we wrap up tomorrow. It is all good.
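Put together, a first session might look like the following. This is only a sketch: the login and compute node names are examples, and the exact prompts you see will differ.

ssh username@bridges.psc.edu   # lands you on a login node, e.g. login001
interact                       # request a regular compute node; prompt becomes something like r251
exit                           # leave the compute node when you are done
interact -gpu                  # request a GPU node (for tomorrow's TensorFlow work); prompt becomes something like gpu32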

Modules
We have hundreds of packages on Bridges. They each have many paths and variables that need to be set for their own proper environment, and they are often conflicting. We shield you from this with the wonderful modules command or containers. You can load the two packages we will be using as:

Spark
module load spark

TensorFlow
module load singularity
singularity shell --nv /pylon5/containers/ngc/tensorflow 20.02-tf2-py3.sif
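After the Spark module is loaded, the interactive Python Spark shell that ships with Spark is the quickest way to confirm the environment works. This is a sketch rather than a workshop-specific sequence; pyspark and the throwaway job below are standard Spark.

module load spark
pyspark                                # starts the interactive Python Spark shell, with a SparkContext named sc
>>> sc.parallelize(range(10)).sum()    # a trivial Spark job; should print 45
>>> exit()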

Editors
For editors, we have several options:
emacs
vi
nano: use this if you aren't familiar with the others
For this workshop, you can actually get by just working from the various command lines.

Programming Language
o We have to pick something.
o Pick the best domain language: Python.
o But not "Pythonic".
o I try to write generic pseudo-code.
o If you know Java or C or R, etc., you should be fine.

Warning! Warning!
Several of the packages we are using are very prone to throw warnings about the JVM or some Python dependency. We've stamped most of them out, but don't panic if a warning pops up here or there. In our other workshops we would not tolerate so much as a compiler warning, but this is the nature of these software stacks, so consider it good experience.
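To give a feel for the level of code involved, here is a minimal PySpark sketch, a classic word count. It is not taken from the workshop materials; the application name and the input file "sample.txt" are placeholders, and it assumes a Spark environment such as the one "module load spark" provides.

from pyspark import SparkContext

# Create a Spark context (the pyspark shell gives you one as "sc" automatically).
sc = SparkContext(appName="WordCountSketch")

# Read a text file, split it into words, and count occurrences of each word.
counts = (sc.textFile("sample.txt")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

# Print the ten most frequent words.
for word, n in counts.takeOrdered(10, key=lambda kv: -kv[1]):
    print(word, n)

sc.stop()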

Our Setup For This Workshop
After you copy the files from the training directory, you will have a BigData directory of your own. Datasets, and also cut-and-paste code samples, are in here.

Preliminary Exercise
Let's get the boring stuff out of the way now.

Log on to apr.psc.edu and set an initial password if you have not.

Log on to Bridges:
ssh username@bridges.psc.edu

Copy the Big Data exercise directory from the training directory to your home directory:
cp -r training/BigData .
(note the ".", it is important)

Edit a file to make sure you can do so. Use emacs, vi or nano (if the first two don't sound familiar).

Start an interactive session:
interact
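A quick way to confirm the copy worked before starting the interactive session. This is a sketch: the listing is generic, and "somefile" stands for whichever file you find inside BigData.

ls BigData              # you should see the datasets and code samples
nano BigData/somefile   # open any file, make a trivial change, save, and exit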

