End-to-End Workflow Monitoring with Panorama 360


George Papadimitriou, University of Southern California

Panorama 360: Project Overview
- Leverage the Pegasus WMS to structure, execute, and monitor workflow execution
- Characterize performance: instrument data capture, summarize data, and publish results
- Create an open-access common repository for storing end-to-end workflow performance and resource data captured using a variety of tools (open for external contributors)
- Apply and develop ML techniques for workflow performance analysis and infrastructure troubleshooting
- Record findings, distill best practices, and share and refine them with other program teams

https://panorama360.github.io

Data Sources: Application and Infrastructure
- Pegasus Stampede events about the workflow and its status
- Pegasus Kickstart Online collects resource usage traces in real time, with sampling intervals as low as 1 second
- Darshan collects file access statistics (e.g., POSIX, MPI-IO) during execution
- Globus collects transfer statistics and general information about each transfer (throughput, file transfer errors, etc.)

Pegasus Kickstart Online Traces
- In the Panorama branch, pegasus-kickstart supports fine-grained monitoring by invoking a helper tool called pegasus-monitor.
- pegasus-monitor can pull resource usage statistics for running workflow tasks at a predefined time interval; the minimum interval is 1 second.
- The statistics are read from the /proc entry of the running task and include:
  - Number of processes and threads
  - stime and utime
  - Bytes read and written
  - iowait
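To make the /proc mechanism concrete, here is an illustrative Python sketch (not the actual pegasus-monitor code) that extracts utime, stime, and the thread count from a /proc/&lt;pid&gt;/stat line:

```python
# Illustrative only: parse a /proc/<pid>/stat line for a few of the fields
# that a monitor like pegasus-monitor samples.
def parse_proc_stat(stat_line):
    # Field 2 (comm) can contain spaces, so split after its closing paren.
    rest = stat_line.rsplit(")", 1)[1].split()
    # Counting from field 3 (state) as rest[0]:
    # utime is field 14, stime is field 15, num_threads is field 20.
    return {
        "utime_ticks": int(rest[11]),
        "stime_ticks": int(rest[12]),
        "num_threads": int(rest[17]),
    }
```

Times are reported in clock ticks; a real monitor would divide by `sysconf(_SC_CLK_TCK)` to get seconds.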

Darshan
- Darshan is a lightweight, application-level I/O profiling tool for HPC that captures statistics on the behavior of HPC I/O operations.
- It captures data for each file opened by the application, including I/O operation counts, common I/O access sizes, cumulative timers, etc.
- I/O behavior is captured for the POSIX I/O, MPI-IO, HDF5, and Parallel netCDF data interface layers.
- It also captures a set of job-level characteristics, such as the number of application processes, the job's start and end times, and the unique job identifier provided by the scheduler.
- In Panorama we only expose accumulated performance data from the STDIO and POSIX modules.

Globus
- Globus is a research data management service built on top of GridFTP. It can be used to transfer files for your own computations or to share files with the community.
- For every transfer request, Globus creates logs containing transfer statistics, such as:
  - Request and completion time
  - Source and destination
  - Transfer rate
  - Number of failures

Data Sources: Problems
- They are scattered across multiple locations (e.g., execution site, cloud service, Pegasus logs).
- They don't contain metadata about the workflow, so it is very hard to locate and match them later.
- The captured data don't share a common format:
  - Pegasus Kickstart logs are in XML format
  - Pegasus Stampede events are in JSON format
  - Pegasus Kickstart Online logs are in JSON format
  - Globus logs are in JSON format
  - Darshan logs are in a binary format
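To illustrate the format mismatch, here is a hedged Python sketch (not Panorama code; the element and attribute names are invented for illustration) that normalizes an XML-style usage record into the JSON shape the other tools already emit:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical Kickstart-style XML record (attribute names are invented;
# real Kickstart records are much richer).
XML_RECORD = '<usage utime="5.2" stime="0.8" maxrss="104857600"/>'

def xml_usage_to_json(xml_text):
    """Normalize an XML usage record into a JSON document."""
    elem = ET.fromstring(xml_text)
    return json.dumps({k: float(v) for k, v in elem.attrib.items()}, sort_keys=True)
```

This kind of per-source normalization is what makes a common repository feasible downstream.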

Data Collection: End-to-End Workflow Execution Monitoring
- Pegasus, apart from planning and running the workflow, orchestrates the data collection.
- A message queueing system is used to decouple the publishers from the datastore.
- Flexible search and visualization engines are used to explore the data.

Data Collection: Architecture Overview

Data Collection: Tool Enhancements and New Tools
- pegasus-monitord: extended with a JSON output format, the ability to pick up job-related monitoring messages, and publishing to AMQP endpoints
- pegasus-transfer: extended to support Globus transfers and to publish statistics in JSON format to AMQP endpoints
- pegasus-darshan: a wrapper around darshan-parser that pushes Darshan logs in JSON format to pegasus-monitord

Visualization: Detailed Workflow and Job Characteristics

Visualization: Time Series Data of Workflow Performance
- Cumulative time spent in user and system execution
- Bytes read and written
- Average CPU utilization
- Number of compute threads

Repository: Organization

ElasticSearch Index    Description
panorama transfer      Globus logs
panorama kickstart     Pegasus Kickstart Online traces
panorama stampede      Workflow events and Darshan logs

Repository: Open Access
norama.isi.edu

How to Deploy: Prerequisites
- HTCondor 8.6: https://research.cs.wisc.edu/htcondor/downloads/
- Pegasus Panorama: compile from source, or use the pre-compiled binaries
- Docker 17.02: https://docs.docker.com/install/
- Docker Compose

How to Deploy: Monitoring Backend (RabbitMQ, ELK Stack)
- On a host that has Docker and Docker Compose installed, clone the deployment repository.
- Change to the cloned directory and execute: docker-compose up -d

How to Deploy: Checking Services (RabbitMQ, ELK Stack)
The host should now have RabbitMQ, Elasticsearch, Logstash, and Kibana running as Docker containers with their service ports exposed. Try to access them:
- RabbitMQ: http://<hostname-or-ip>:15672
- Elasticsearch: http://<hostname-or-ip>:9200
- Logstash: http://<hostname-or-ip>:9600
- Kibana: http://<hostname-or-ip>:5601
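As a small sanity-check helper, here is a Python sketch (not part of Panorama; the hostname in the usage example is a placeholder) that builds the health-check URLs for the four services from the ports listed above:

```python
# Service ports exposed by the monitoring backend described above.
SERVICE_PORTS = {
    "rabbitmq": 15672,       # management UI
    "elasticsearch": 9200,   # REST API
    "logstash": 9600,        # monitoring API
    "kibana": 5601,          # web UI
}

def service_urls(host):
    """Return the http://host:port endpoint for every monitoring service."""
    return {name: f"http://{host}:{port}" for name, port in SERVICE_PORTS.items()}
```

For example, `service_urls("monitor.example.org")["kibana"]` yields `http://monitor.example.org:5601`, which you could then open in a browser or probe with curl.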

How to Deploy: Enabling Stampede Events
To get pegasus-monitord to publish all of its events to an AMQP endpoint in JSON format, three properties must be set in the workflow's properties file (e.g., "pegasus.properties"):
- pegasus.monitord.encoding json
- pegasus.catalog.workflow.amqp.url amqp://[username:password]@<hostname>[:port]/<exchange-name>
- pegasus.catalog.workflow.amqp.events stampede.*
More about Stampede events: https://pegasus.isi.edu/documentation/stampede_wf_events.php
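The "Example" screenshot from the slide is not in the text; a minimal sketch of the three properties, with placeholder host, credentials, and exchange name, might look like:

```properties
pegasus.monitord.encoding             json
pegasus.catalog.workflow.amqp.url     amqp://panorama:secret@monitor.example.org:5672/panorama
pegasus.catalog.workflow.amqp.events  stampede.*
```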

How to Deploy: Enabling Transfer Events
To get pegasus-transfer to publish transfer statistics from the Globus Transfer Service to an AMQP endpoint in JSON format, two profiles must be set in the workflow's site catalog (e.g., "sites.xml"), under the site where pegasus-transfer will be invoked (e.g., "local"):
- env.PEGASUS_TRANSFER_PUBLISH 1
- env.PEGASUS_AMQP_URL amqp://[username:password]@<hostname>[:port]/<exchange-name>
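A hedged sites.xml sketch of the two profiles (the site attributes, host, credentials, and exchange name are placeholders; the profile keys come from the slide, with underscores restored):

```xml
<site handle="local" arch="x86_64" os="LINUX">
  <!-- Enable publishing of Globus transfer statistics -->
  <profile namespace="env" key="PEGASUS_TRANSFER_PUBLISH">1</profile>
  <profile namespace="env" key="PEGASUS_AMQP_URL">amqp://panorama:secret@monitor.example.org:5672/panorama</profile>
</site>
```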

How to Deploy: Enabling Kickstart Online Traces
To get pegasus-kickstart to publish traces of resource usage statistics to an AMQP endpoint in JSON format, two profiles must be set in the workflow's site catalog (e.g., "sites.xml") under the compute site:
- pegasus.gridstart.arguments -m <interval-in-seconds>
- env.KICKSTART_MON_URL rabbitmq://[USERNAME:PASSWORD]@<hostname>[:port]/api/exchanges/<exchange-name>/publish
Alternatively, to customize the monitoring interval per computational task, the profile can be set in the workflow's transformation catalog (e.g., "tx.txt").
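A hedged sites.xml sketch of the compute-site profiles (site handle, host, credentials, exchange name, and the 10-second interval are placeholder values):

```xml
<site handle="condorpool" arch="x86_64" os="LINUX">
  <!-- Sample resource usage every 10 seconds (interval is an example value) -->
  <profile namespace="pegasus" key="gridstart.arguments">-m 10</profile>
  <profile namespace="env" key="KICKSTART_MON_URL">rabbitmq://panorama:secret@monitor.example.org:15672/api/exchanges/panorama/publish</profile>
</site>
```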

How to Deploy: Enabling Kickstart Online Traces (MPI Jobs)
- MPI jobs are usually not launched by pegasus-kickstart, so adding the gridstart.arguments profile has no effect.
- We can work around this by using a wrapper script for the MPI job that invokes pegasus-monitor directly.
- We still need to specify KICKSTART_MON_URL in the site catalog.
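The example from the slide is not in the text; the following is only a sketch of the shape such a wrapper might take. pegasus-monitor's actual command-line interface is not shown in the source, so the invocation, launcher arguments, and application name below are all placeholders, not a working configuration.

```shell
#!/bin/bash
# Hypothetical MPI-job wrapper (all names and arguments are placeholders).
# KICKSTART_MON_URL is expected in the environment via the site catalog profile.

# Start the resource-usage monitor alongside the job
# (consult the Panorama documentation for the real invocation).
pegasus-monitor &
MONITOR_PID=$!

# Launch the actual MPI application ("my_mpi_app" is a placeholder).
mpirun -np 64 ./my_mpi_app "$@"
STATUS=$?

# Stop the monitor and propagate the application's exit status.
kill "$MONITOR_PID" 2>/dev/null
exit $STATUS
```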

How to Deploy: Enabling Darshan Logs (MPI Jobs)
- If your MPI application wasn't compiled and statically linked with Darshan's library, set a profile in the transformation catalog that adds the library's path to LD_PRELOAD.
- Launch the application using a wrapper script which, as post-job steps:
  - builds the Darshan log path from the environment variables
  - invokes pegasus-darshan with the log files as input
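The "Transformation Catalog" screenshot from the slide is not in the text; a hedged sketch of such an entry in Pegasus's text-format transformation catalog might look like the following (the transformation name, site handle, and paths are placeholders; the LD_PRELOAD profile is the part the slide calls for):

```text
tr my_mpi_app {
    site condorpool {
        pfn "/usr/bin/my_mpi_app"
        arch "x86_64"
        os "linux"
        type "INSTALLED"
        profile env "LD_PRELOAD" "/usr/lib64/libdarshan.so"
    }
}
```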

How to Deploy: Enabling Darshan Logs (MPI Jobs, continued)
- pegasus-darshan outputs a monitoring payload on stdout, which is picked up by pegasus-monitord, which in turn publishes it to the AMQP endpoint.
- This can also be used as a generic way of adding new tools to this architecture.


George Papadimitriou
Computer Science PhD Student, University of Southern California
- GitHub: https://github.com/Panorama360
- Website: https://panorama360.github.io

Pegasus
Automate, recover, and debug scientific computations.
- Website: http://pegasus.isi.edu
- Get started: Pegasus users mailing list
- Pegasus online office hours: held bi-monthly, on the second Friday of the month, where we address user questions and apprise the community of new developments
