HPC Basics and Introduction to the Xanadu Cluster


HPC basics and introduction to the Xanadu cluster
Vijender Singh
Computational Biology Core

A non-textbook introduction

Development of models begins at small scale. Working on your laptop is convenient and simple, but the actual analysis is slow. "Scaling up" typically means a small server or a fast multicore desktop; this adds speed, but for very large models the gain is not significant. Single machines don't scale up forever.

For larger problems and models, a different approach is required.

High-performance computing (HPC)
High-performance computing involves many distinct computer processors working together on the same problem/calculation. Large problems/calculations are divided into smaller parts and distributed among the many computers. An HPC cluster is a set of quasi-independent computers coordinated by a central scheduler.

Typical HPC cluster (diagram): users connect to a submit/head/login node; a central scheduler such as SGE (Sun Grid Engine), PBS (Portable Batch System), or SLURM (Simple Linux Utility for Resource Management) dispatches work to the compute nodes; /home and data storage are shared across the cluster.

HPC architecture: Xanadu (queue/partition diagram): users log in through xanadu-submit-ext or xanadu-submit-int to a login node (4 CPUs, 8 GB RAM); compute nodes xanadu01-31 are grouped into the general, himem, gpu, and xeon partitions, each node with 32-48 CPUs and 128-512 GB RAM.

Performance comes at a price: complexity. Applications must be written specifically to take advantage of distributed computing, and debugging becomes more of a challenge.


Accessing Xanadu

Get an account: request one through the CBC website (bioinformatics.uconn.edu), or google search: CBC UCONN.

Connecting to Xanadu

(Xanadu architecture diagram: users connect via xanadu-submit-ext / xanadu-submit-int on the cam.uchc.edu domain to a login node with 4 CPUs and 8 GB RAM; compute nodes in the general, himem, gpu, and xeon partitions have 32-48 CPUs and 128-512 GB RAM.)

Mac: open Terminal (Applications > Utilities).

Windows
PuTTY: https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html or google search "putty".
Open PuTTY; it will open window 1.
1. Provide the host name, e.g. username@xanadu-submit-ext.cam.uchc.edu
2. Expand the SSH tab and select X11 (shown in window 2)
3. Enable X11 forwarding by selecting it (window 2)
4. Scroll up the left panel and select Session (window 1)
5. Name your session, e.g. BBC cluster, and click the Save button to save it.
6. Your session name should appear in the saved sessions. Double-click on your session name to connect to the server with an SSH session.

Login on the submit-int node (Storrs and outside the Health Centre): use the VPN (Pulse Secure).
1. Open Pulse Secure
2. Add a new connection
3. Set the Server URL to: vpn.uchc.edu/cam
4. Save
5. Connect and log in with your CAM ID and password
(VPN guide: ...t-via-vpn-client-2/)

Connecting to Xanadu
Login (using the terminal on a Mac):
xanadu-submit-int node: ssh user_name@xanadu-submit-int.cam.uchc.edu
xanadu-submit-ext node: ssh user_name@xanadu-submit-ext.cam.uchc.edu
(Screenshot: logged on to the ext-submit node.)
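As a minimal sketch (assuming your account name is user_name and you want X11 forwarding, mirroring the PuTTY X11 setup above), a Mac or Linux terminal login could look like this:

# log in to the external submit node; -X enables X11 forwarding for graphical tools
ssh -X user_name@xanadu-submit-ext.cam.uchc.edu
# once connected, the shell prompt runs on the login node; confirm with:
hostname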

(Diagram: after ssh-ing to a submit node, the scheduler dispatches work to the compute nodes.)

srun --pty bash : start an interactive session on a compute node
sbatch myscript.sh : submit a batch script to the scheduler

Common Linux commands
pwd : present working directory
cd destination : change directory to destination
cd : change directory to HOME
ls : list contents of directory
cp source/file destination/file : copy file from source into the destination folder
mv source/file destination/file : move file from source to the destination folder
mv name name2 : rename file from name to name2
touch filename : create an empty file named filename
mkdir directory : make a directory
rm file : delete file
rm -r directory : delete a directory and its contents
cat : print the contents of a file
less : view the contents of a file; scroll, q to quit
head -10 file : first 10 lines of file
tail -10 file : last 10 lines of file
Resources: http://linuxcommand.org/writing_shell_scripts.php
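A short worked example putting a few of these together (the directory and file names are hypothetical, just to show the commands in sequence):

mkdir project            # make a new directory
cd project               # move into it
touch notes.txt          # create an empty file
ls                       # list the directory contents
cp notes.txt backup.txt  # copy the file
mv backup.txt old.txt    # rename the copy
head -10 old.txt         # show the first 10 lines (nothing here, the file is empty)
cd                       # return to HOME
rm -r project            # remove the directory and its contents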

Text editors
Notepad, NANO, VIM, EMACS
(launch with: vim, nano, emacs)

Demo: Nano and VIM

Software/tools/packages on the cluster
Environment Modules: the Environment Modules package provides for the dynamic modification of a user's environment via module files.
module avail : list modules that are available
module load modulefile : load the module into the user environment
module list : list modules that are loaded
module unload modulefile : unload a module from the user environment
module display modulefile : display information on a module
module swap [modulefile1] modulefile2 : switch loaded modulefile1 with modulefile2
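For example, loading a tool before a run might look like the following (the module name and version are illustrative, taken from the array-job example later in this document; check module avail for what is actually installed):

module avail                  # see what is installed
module load fastqc/0.11.5     # load a specific version
module list                   # confirm it is loaded
module display fastqc/0.11.5  # see what the module file changes (PATH, etc.)
module unload fastqc/0.11.5   # remove it from the environment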

Demo: module

Xanadu resources: partitions (output listing nodes in the mix, idle, and allocated states).


Partition: general

Partition: himem

Summary of the nodes in the Xanadu cluster

Interactive session

srun --pty bash : start an interactive session on a compute node (shown on the architecture diagram).


Interactive session
screen -S screen_name : start a screen session
srun --pty bash : start an interactive session
hostname : confirm the interactive session (you should be on a compute node)
(run your code/commands: wget/ftp/others)
Ctrl-A D : detach the active screen
screen -r NNNN : reattach a detached screen
screen -ls : list all screens
screen -X -S NNNN quit : kill or quit a screen session
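Put together, a typical interactive workflow might look like this (the screen name and download URL are placeholders):

screen -S download        # start a named screen session on the login node
srun --pty bash           # inside the screen, request an interactive shell on a compute node
hostname                  # should print a compute node name, not the submit node
wget https://example.org/data.fastq.gz   # run the long transfer/command here
# press Ctrl-A then D to detach; the command keeps running
screen -ls                # later, list sessions to find the name/PID
screen -r download        # reattach to check progress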

Composing a script for the cluster
A script has two parts:
1. Resource request: number of CPUs, expected computing duration, amount of RAM or disk space, etc.
2. Job commands: the tasks that must be done and the software that must be run.

Resource request:
#!/bin/bash
#SBATCH -J myscript
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 1
#SBATCH -p general
#SBATCH --mail-type=END
#SBATCH --mail-user=first.last@uconn.edu
#SBATCH -o myscript-%j.out
#SBATCH -e myscript-%j.err

#SBATCH -J myscript : the name of your job
#SBATCH -n 1 : request the number of tasks
#SBATCH -N 1 : request that the cores are all on one node. Only change this if you know your code uses a message-passing protocol like MPI. SLURM makes no assumptions on this parameter: if you request more than one task (-n) and forget this parameter, your job may be scheduled across nodes, and unless your job is MPI (multi-node) aware it will run slowly, oversubscribed on the master node and wasting resources on the other(s).
#SBATCH -c 1 : request the number of cores for your job
#SBATCH -p general : the SLURM partition (in this instance the general partition) under which the script will run
#SBATCH --mail-user=first.last@uconn.edu : email to which the notification should be sent
#SBATCH --mail-type=END : mailing options indicating the state of the job; in this instance a notification is sent at the end
#SBATCH -o myscript-%j.out : file to which the standard output will be appended
#SBATCH -e myscript-%j.err : file to which the standard error will be appended

More on resource requests:
#!/bin/bash
#SBATCH --time=10-01:00:00            # days-hh:mm:ss
#SBATCH --job-name=masurca_KG
#SBATCH --mail-user=user@uconn.edu
#SBATCH --mail-type=ALL
#SBATCH --comment="dataset with jump libraries"
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 8
#SBATCH --mem-per-cpu=10240           # 10 GB
# or: #SBATCH --mem=100G
#SBATCH -o filterGTF-%j.output
#SBATCH -e filterGTF-%j.error

Job commands:
These are regular Linux/module commands, e.g. echo "Hello World".

Final script:
#!/bin/bash
#SBATCH --job-name=myscript
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 1
#SBATCH --partition=general
#SBATCH --mail-type=END
#SBATCH --mail-user=first.last@uconn.edu
#SBATCH -o myscript-%j.out
#SBATCH -e myscript-%j.err

hostname
echo "Hello World"

Save the script as myscript.sh.

Submit the script:
sbatch myscript.sh

Monitor jobs:
squeue
(Example output columns: JOBID, PARTITION, NAME, USER, ST, TIME, NODES, NODELIST(REASON). The listing shows jobs from several users in states such as CG (completing), PD (pending, with reasons like Resources or Priority), and R (running) on nodes such as xanadu-20, xanadu-21, xanadu-23, xanadu-24, and xanadu-[30-33].)

squeue -j jobIDNUMBER
e.g. squeue -j 301013
(Output: JOBID 301013, PARTITION amd, NAME ProtMasN, USER vsingh, ST R, TIME 21:49:06, NODES 1, NODELIST xanadu-24.)

JOB STATE CODES
Jobs typically pass through several states in the course of their execution. The typical states are PENDING, RUNNING, SUSPENDED, COMPLETING, and COMPLETED. An explanation of each state follows.
CA CANCELLED : job was explicitly cancelled by the user or system administrator; the job may or may not have been initiated.
CD COMPLETED : job has terminated all processes on all nodes with an exit code of zero.
CF CONFIGURING : job has been allocated resources and is waiting for them to become ready for use (e.g. booting).
CG COMPLETING : job is in the process of completing; some processes on some nodes may still be active.
F FAILED : job terminated with a non-zero exit code or other failure condition.
NF NODE_FAIL : job terminated due to failure of one or more allocated nodes.
PD PENDING : job is awaiting resource allocation.
PR PREEMPTED : job terminated due to preemption.
R RUNNING : job currently has an allocation.
ST STOPPED : job has an allocation, but execution has been stopped with the SIGSTOP signal; CPUs have been retained by this job.
S SUSPENDED : job has an allocation, but execution has been suspended and CPUs have been released for other jobs.
TO TIMEOUT : job terminated upon reaching its time limit.

squeue -u userID
e.g. squeue -u vsingh
(Output: ProtMasN and ProtMasW jobs for user vsingh: two PD (pending, Priority) and two R (running, on xanadu-24 and xanadu-20).)

squeue -u userID -t PENDING
e.g. squeue -u vsingh -t PENDING
(Output: the two pending jobs, 301086 and 301089, in the himem partition.)

squeue -u userID -t RUNNING

scontrol show jobid jobid
e.g. scontrol show jobid 301086
JobId=301086 JobName=ProtMasNoSOAP
UserId=vsingh(183147) GroupId=domain users(10000) MCS_label=N/A
Priority=5262 Nice=0 Account=pi-wegrzyn QOS=general
JobState=PENDING Reason=Priority Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2017-06-28T15:50:23 EligibleTime=2017-06-28T15:50:23
StartTime=2018-06-27T14:06:17 EndTime=Unknown Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=himem AllocNode:Sid=xanadu-submit-ext:32674
ReqNodeList=(null) ExcNodeList=(null) NodeList=(null)
NumNodes=1-1 NumCPUs=30 NumTasks=30 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=30,mem=512000,node=1 Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
MinCPUsNode=1 MinMemoryNode=500G MinTmpDiskNode=0 Features=(null) Gres=(null)
Reservation=(null) OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/CAM/vsingh/protea_repens/scripts/assemble_protea_config_noSOAPassembly.sh
WorkDir=/home/CAM/vsingh/protea_repens/scripts
StdErr=/home/CAM/vsingh/protea_repens/LogFiles/ProtMasNoSOAP-301086.error
StdIn=/dev/null
StdOut=/home/CAM/vsingh/protea_repens/LogFiles/ProtMasNoSOAP-301086.output
Power=

scontrol show jobid -dd jobid : show even more detail for a job

Script submission and other commands
sbatch myscript.sh : submit a script for execution
squeue : status of jobs currently running on the cluster (all users)
squeue -j jobIDNUMBER : status of the job with that job ID
squeue -u UserID : status of all the jobs submitted by a user
scancel jobID : delete the job with that job ID
scancel jobID_index : delete one task of an array job
scancel -u UserID : delete all the jobs of a user
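A quick end-to-end example of these commands (the job ID shown is made up):

sbatch myscript.sh        # prints something like: Submitted batch job 301200
squeue -u $USER           # see your jobs; note the JOBID column
squeue -j 301200          # watch just this job
scancel 301200            # cancel it if something is wrong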

Script for array jobs
#!/bin/bash
#SBATCH --mail-user=user@uconn.edu
#SBATCH --mail-type=ALL
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --array=1-1002%100
#SBATCH --output=fastqc_%A_%a.out

hostname
cd /NGSseq/data
module load fastqc/0.11.5
echo "SLURM_JOBID: " $SLURM_JOBID
echo "SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID
echo "SLURM_ARRAY_JOB_ID: " $SLURM_ARRAY_JOB_ID
arrayfile=$(ls | awk -v line=$SLURM_ARRAY_TASK_ID '{if (NR == line) print $0}')
fastqc $arrayfile

#SBATCH --array=1-1002%100 : this line creates 1002 jobs, but instructs SLURM to limit the number of simultaneously running jobs to 100. This avoids swamping the queue and shares the bursting level with others in the group.
#SBATCH --output=fastqc_%A_%a.out : this creates 1002 files to catch stdin, stdout and stderr for each respective job in the array. If the array job ID is 23678, there will be 1002 files from fastqc_23678_1.out to fastqc_23678_1002.out.
SLURM_JOBID : starts at the SLURM job ID and increases with each array task
SLURM_ARRAY_JOB_ID : the SLURM job ID of the array
SLURM_ARRAY_TASK_ID : the array task index, 1-1002
The arrayfile line lists all the files in the directory (/NGSseq/data), picks the one whose line number matches the task ID, and runs it through the fastqc application.
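Assuming the script above is saved as fastqc_array.sh (a hypothetical name), submitting and checking it follows the same pattern as any batch job:

cd /NGSseq/data
ls | wc -l                 # count the input files; this should match the --array range (1002 here)
sbatch fastqc_array.sh     # one sbatch call launches the whole array
squeue -u $USER            # array tasks appear as JOBID_TASKID, e.g. 23678_1, 23678_2, ...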

Some important information about Xanadu storage:
User space : /home/CAM/username or /home/FCAM/username
Lab/group space : (request)
Archiving data : (request)
Collaborative projects : /linuxshare/projects (request)
/scratch : please do not use this directory as long-term storage; the disk gets cleaned up regularly.

Data transfer to/from Xanadu (wget/ftp/sftp/scp)
DON'T: do not initiate transfers on the submit nodes.
DO: start an interactive session; use scp for small files; for large files use Globus.
Tutorial: ...nts/tutorials/ (section: Data transfer)
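For the small-file case, a hedged sketch run from your own machine (the file names and user_name are placeholders; the /home/CAM path is the user space listed above):

# copy a small file from your laptop to your Xanadu home directory
scp results.txt user_name@xanadu-submit-ext.cam.uchc.edu:/home/CAM/user_name/
# copy a file back from Xanadu into the current local directory
scp user_name@xanadu-submit-ext.cam.uchc.edu:/home/CAM/user_name/results.txt .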

Reporting issues or submitting requests: ...nical-issues/
Email: cbcsupport@uconn.edu

Cluster etiquette
Do not run code on the head node.
Do not ssh directly into a compute node.
Do not submit a large number of jobs without testing.
Do not hog resources.
Do monitor your jobs periodically.
Do monitor your disk usage: do not fill up the whole disk with unnecessary output files from your runs.
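For the disk-usage point, a couple of standard commands are enough (the target shown is just your own home area):

du -sh ~                  # total size of your home directory
du -sh ~/* | sort -h      # per-directory breakdown, largest last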

top: process status codes
D : uninterruptible sleep
R : running
S : sleeping
T : traced or stopped
Z : zombie
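To check only your own processes, for instance inside an interactive session, something like this works (replace user_name with your own account name):

top -u user_name                        # interactive view of your processes; q to quit
ps -u user_name -o pid,stat,etime,comm  # one-shot snapshot showing the same state codes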

Thank you
Institute for Systems Genomics: Computational Biology Core
bioinformatics.uconn.edu
