Elasticsearch At Fermilab - INDICO-FNAL (Indico)

Elasticsearch at Fermilab
Kevin Retzke
30 Sept 2019

Overview

Fermilab has one production Elasticsearch cluster, operated by the Scientific Computing Division “Landscape” project for grid, job, and data transfer monitoring.

- 19 data nodes (old grid workers) with 100 TiB
- Three dedicated masters
- Two dedicated “client” nodes: all queries and ingest go through these
- Two “frontend” nodes (behind HAProxy) with:
  - Grafana (six instances for different user groups)
  - Kibana
  - GraphQL API - “Lens”
  - HTTP data collection endpoint - “Ingest”
  - Apache httpd proxy handling routing and Shibboleth (SAML) SSO authentication
  - Graphite time-series database
  - Prometheus (internal service monitoring)

Deployment

- Elasticsearch 6.8 basic (free) license
- Deployed in Docker containers with docker-compose, local disk bind-mounted into container
- 40 TiB NAS disk NFS mounted for daily snapshots
- Index lifecycle maintenance with Curator
- Things we’d like to test:
  - Native lifecycle management
  - Rollup API
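
As a rough illustration of the age-based index maintenance that a tool like Curator is typically used for, the sketch below selects daily indices older than a retention window. The index naming scheme (`<prefix>YYYY.MM.DD`), the `jobs-` prefix, the retention period, and the function name are all assumptions made for the example; none of them come from the Landscape configuration.

```python
from datetime import date, timedelta


def expired_indices(names, today, keep_days, prefix="jobs-"):
    """Return the daily indices that are older than keep_days.

    Assumes (hypothetically) that indices are named '<prefix>YYYY.MM.DD'.
    Names that do not follow that pattern are ignored.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            year, month, day = (int(p) for p in name[len(prefix):].split("."))
            index_day = date(year, month, day)
        except ValueError:
            continue  # suffix is not a valid date, so leave the index alone
        if index_day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    indices = ["jobs-2019.09.01", "jobs-2019.09.29", "jobs-bad"]
    print(expired_indices(indices, date(2019, 9, 30), keep_days=7))
```

A real setup would pass the selected names to a snapshot or delete step; this sketch only covers the selection logic.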

Security

- ReadonlyREST (free license) on client nodes
- Write access limited to Landscape-operated nodes
- Read access on-site
- Kibana admin and write access via LDAP groups
- httpd/Shibboleth proxy limits Kibana access to logged-in users
- iptables limits in-cluster communication to Elasticsearch nodes
- Things we’d like to test/experiment with:
  - ES native security now that it’s included in basic
  - Open Distro

Data Sources

- HTCondor
  - Job, slot, other classads - condorbeat
  - event log - filebeat
  - job history - filebeat
- IFDH (data movement client) events - rsyslog
- SLURM jobcomp history - ingest
- Rucio transfer and deletion events - ingest
- dCache billing events - direct to Kafka
- Service container logs - logspout

Data Pipelines

- All data goes into “ingest” Kafka topics
  - Some services report directly
  - Beats go through a Logstash server (moving direct to Kafka)
  - External services talk to the “Ingest” public HTTP service (next slide)
- Data processing with Logstash or Python apps using the Faust library
- Enriched data goes to “digest” topics
- Logstash “store” processes read from Kafka and write to Elasticsearch (or other)
- All data pipeline services are run on a single machine in Docker containers with docker-compose
  - Moving to OpenShift/OKD Kubernetes cluster soon
- Kafka cluster is three old grid workers, with Kafka and Zookeeper running in Docker containers

Ingest Service

- Libbeat-based server that implements a limited set of Elasticsearch API write endpoints
- Allows services that can already talk to Elasticsearch (e.g. SLURM) to use that functionality, but gives us fine-grained control over where the data goes
- Logstash process routes data to Kafka topics based on the index and type that data was written to
- Simple HTTP API for others to implement
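
The routing idea above (choose a Kafka topic from the index and type a client wrote to) can be sketched in a few lines. This is only an illustration, not the Ingest service itself, which the slides describe as Libbeat-based with Logstash doing the routing. The topic naming scheme `<prefix>.<index>.<type>`, the function name, and the sample index and type names are assumptions. The input is the standard Elasticsearch bulk format: an action line followed by a document line.

```python
import json


def route_bulk(body, topic_prefix="ingest"):
    """Group the documents of an Elasticsearch bulk (NDJSON) body by topic name.

    The topic name '<prefix>.<index>.<type>' is a hypothetical scheme.
    Only 'index' and 'create' actions are accepted, mirroring the idea of
    exposing a limited set of write endpoints.
    """
    lines = [ln for ln in body.splitlines() if ln.strip()]
    routed = {}
    i = 0
    while i < len(lines):
        action = json.loads(lines[i])
        # An action line is a one-key object: {"index": {...metadata...}}
        (op, meta), = action.items()
        if op not in ("index", "create"):
            raise ValueError(f"unsupported bulk action: {op}")
        if i + 1 >= len(lines):
            raise ValueError("action line without a document line")
        doc = json.loads(lines[i + 1])
        topic = f"{topic_prefix}.{meta['_index']}.{meta.get('_type', '_doc')}"
        routed.setdefault(topic, []).append(doc)
        i += 2
    return routed


if __name__ == "__main__":
    sample = (
        '{"index": {"_index": "slurm-jobs", "_type": "jobcomp"}}\n'
        '{"jobid": 1}\n'
    )
    print(route_bulk(sample))
```

Rejecting every action other than `index` and `create` keeps the write surface small, which is one way to get the kind of control the slide mentions.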

Data Access

- Primary use of data is curated Grafana dashboards
- Kibana for data exploration and ad-hoc visualization
  - Users can request access to save visualizations
- Direct read access to Elasticsearch allowed but discouraged outside Landscape
- GraphQL API provides programmatic access to job data without needing to know Elasticsearch topic and field details, which allows us to change or move data

Lens GraphQL API

- Written in Go with gqlgen https://gqlgen.com/
- In-memory shared cache of Elasticsearch queries using groupcache
- Web-based schema documentation and query builder (public) https://landscape.fnal.gov/lens
- Combines data from several index patterns. Minimum queries are made to provide only the data requested
- Allows us to change index patterns, mapping, fields, etc. without affecting users
- Success story: allowed POMS (production job workflow management tool) to remove all job monitoring and job status database
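
To show why a shared query cache helps an API like this, here is a minimal single-process sketch of one idea groupcache is known for: concurrent requests for the same key share a single load instead of each hitting the backend. It is written in Python rather than Go, it is not how Lens is implemented, and the class and parameter names are hypothetical. It has no eviction, expiry, or distribution across peers.

```python
import threading


class QueryCache:
    """In-memory cache where concurrent requests for one key share one load."""

    def __init__(self, loader):
        self._loader = loader      # function: key -> value (e.g. runs a search)
        self._lock = threading.Lock()
        self._values = {}          # key -> completed result
        self._inflight = {}        # key -> Event that is set when the load ends

    def get(self, key):
        with self._lock:
            if key in self._values:
                return self._values[key]
            event = self._inflight.get(key)
            leader = event is None
            if leader:
                # First caller for this key performs the load.
                event = threading.Event()
                self._inflight[key] = event
        if leader:
            try:
                value = self._loader(key)
                with self._lock:
                    self._values[key] = value
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()  # wake any callers waiting on this key
            return value
        # Another caller is already loading this key: wait for it to finish.
        event.wait()
        with self._lock:
            if key in self._values:
                return self._values[key]
        # The other caller's load failed, so try again as a fresh request.
        return self.get(key)


if __name__ == "__main__":
    cache = QueryCache(lambda query: f"result for {query}")
    print(cache.get("jobs by user"))
    print(cache.get("jobs by user"))  # served from memory, loader not called
```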

Lens API Example

[Figure: Query and Response panels]

Elasticsearch Monitoring and Alerting

- Cluster health and status exposed with a Prometheus exporter
- Collected by Prometheus servers running on “frontend” nodes
- Monitoring in Grafana dashboards with alerts on key metrics
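
For readers unfamiliar with what an exporter produces, the sketch below parses a few lines of Prometheus text exposition format and reports the cluster colour. The metric name `elasticsearch_cluster_health_status` with a `color` label is an assumption about the exporter in use (the slides do not name it), and the sample text is made up for the example. The parser covers only the simple `name{labels} value` form.

```python
import re

# name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse Prometheus text-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def cluster_color(samples, metric="elasticsearch_cluster_health_status"):
    """Return the 'color' label of the health sample whose value is 1, if any."""
    for name, labels, value in samples:
        if name == metric and value == 1.0:
            return labels.get("color")
    return None


if __name__ == "__main__":
    # Hypothetical scrape output, for illustration only.
    scrape = (
        'elasticsearch_cluster_health_status{cluster="demo",color="green"} 1\n'
        'elasticsearch_cluster_health_status{cluster="demo",color="red"} 0\n'
    )
    print(cluster_color(parse_samples(scrape)))
```

In practice the alert rule would live in Prometheus or Grafana rather than in a script. The point here is only the shape of the data being alerted on.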

Issues

- Mysterious unresponsive master nodes causing monitoring (including Kibana!) to time out (on 5, not seen since upgrade to 6)
- Painful and time-consuming upgrade from 5 to 6 due to many breaking changes (mainly mapping type removal). Expect 7 will be just as bad since some deprecated behavior is being removed.
- Nodes crashing (SIGILL) after restart while loading index state off disk (ongoing)
- Mapping explosions (notably with job classads)
- I miss joins
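
A mapping explosion happens when documents carry many distinct attribute names, since each new name becomes a field in the index mapping. Elasticsearch caps this with `index.mapping.total_fields.limit`, which defaults to 1000. One way to spot the problem before indexing is to count the distinct field paths in a sample of documents, as in the sketch below. The function names and sample attributes are hypothetical. The count is only an approximation of what Elasticsearch tracks, because it counts leaf paths and does not descend into lists.

```python
def field_paths(doc, prefix=""):
    """Return the set of dotted leaf-field paths in a nested dict."""
    paths = set()
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: recurse and extend the dotted path.
            paths |= field_paths(value, prefix=path + ".")
        else:
            paths.add(path)
    return paths


def count_distinct_fields(docs):
    """Count distinct leaf-field paths seen across a sample of documents."""
    seen = set()
    for doc in docs:
        seen |= field_paths(doc)
    return len(seen)


if __name__ == "__main__":
    # Hypothetical classad-like documents with user-defined attributes.
    sample = [
        {"JobStatus": 2, "Env": {"A": 1}},
        {"JobStatus": 4, "Env": {"B": 1}},
        {"Custom_X": "y"},
    ]
    print(count_distinct_fields(sample))
```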
