Using Apache Spark, Apache Kafka And Apache Cassandra


Using Apache Spark, Apache Kafka and Apache Cassandra to Power Intelligent Applications

Apache Cassandra is well known as the database of choice for powering the most scalable, reliable architectures available. Apache Spark is the state-of-the-art advanced and scalable analytics engine. Apache Kafka is the leading stream processing engine for scale and reliability.

Deployed together, these technologies give developers the building blocks needed to build reliable, scalable and intelligent applications that adapt based on the data they collect.

This paper discusses the use cases, architectural patterns and operational considerations for deploying these technologies to deliver intelligent applications.

Use Cases

Internet of Things

At the core of an IoT application there is a stream of regular observations from (potentially) a large number of devices or items with embedded electronics (e.g. switches, sensors, tags). A stream of IoT data is just “big data”, but analysing that big data in a way that drives actions, recommendations, or provides information is where the application delivers value.

Apache Cassandra is extremely well suited to receiving and storing streams of data. Its always-on availability matches the constant stream of data sent by devices, ensuring your application is always able to store data. In addition, its native storage formats are well suited to efficiently storing and using time series data such as that produced by IoT devices. The scalability of Apache Cassandra means you can be assured that your datastore will scale smoothly as the number of devices and the stream of data grow.

The powerful analytics capabilities and distributed architecture of Apache Spark make it the perfect engine to help you make sense of, and make decisions based on, the data you’re receiving from your IoT devices. Spark’s stream processing can quickly determine answers from short-term views of your data as it’s received. For analysis running over longer time periods, the Spark Cassandra connector enables Spark to efficiently access data stored in Cassandra to perform analysis.
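To make the time series point more concrete, the sketch below models one common way such data is laid out: readings are grouped into partitions and kept ordered by timestamp within each partition. It is plain Python rather than Cassandra code, and the one-partition-per-device-per-day rule, the `TimeSeriesStore` class and the device names are all assumptions chosen for illustration; the paper does not prescribe a particular data model.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def partition_key(device_id: str, ts: datetime) -> Tuple[str, str]:
    """Hypothetical bucketing rule: one partition per device per UTC day.

    Expects a timezone-aware timestamp.
    """
    return (device_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d"))


class TimeSeriesStore:
    """In-memory stand-in for a table partitioned by (device, day) and
    ordered by timestamp inside each partition."""

    def __init__(self) -> None:
        self._partitions: Dict[Tuple[str, str], List[Tuple[datetime, float]]] = defaultdict(list)

    def write(self, device_id: str, ts: datetime, value: float) -> None:
        rows = self._partitions[partition_key(device_id, ts)]
        rows.append((ts, value))
        rows.sort(key=lambda row: row[0])  # keep each partition in time order

    def read_day(self, device_id: str, day: str) -> List[Tuple[datetime, float]]:
        """Return one device-day of readings, oldest first."""
        return list(self._partitions.get((device_id, day), []))


store = TimeSeriesStore()
late = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
early = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
next_day = datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)
store.write("sensor-1", late, 21.5)
store.write("sensor-1", early, 20.0)   # arrives out of order
store.write("sensor-1", next_day, 19.0)
```

Reading a single device-day then touches only one partition, which is the property that keeps this kind of query cheap as the number of devices grows.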

In this context, Apache Kafka is often used as a reliable message buffer. In many IoT scenarios, the flow of data from devices is constant and the devices have very limited capacity to buffer data in the event the central processing service is unavailable. Events from the devices can be written to Kafka when first received and then picked up and processed by the downstream applications. This ensures events are not lost even if the processing elements of the central system become backed up or suffer downtime.

In addition, using Kafka in this manner makes it easy to add further consumers of the event stream to the system. For example, your initial implementation may have a simple application that just saves data to Cassandra for later use, and you then add a second application that performs real-time processing on the event stream. Kafka Streams may also be used as an alternative to Spark Streaming for real-time stream processing.
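The buffering behaviour described above can be illustrated with a small in-memory model. This is not the Kafka client API: `EventLog`, the group names and the sample readings are hypothetical, and the sketch assumes a single partition and that a new consumer group starts reading from the earliest retained event (in Kafka this depends on the consumer's `auto.offset.reset` setting).

```python
from typing import Dict, List


class EventLog:
    """In-memory stand-in for a single-partition topic: an append-only
    list of events plus one read offset per consumer group."""

    def __init__(self) -> None:
        self._events: List[dict] = []
        self._offsets: Dict[str, int] = {}

    def append(self, event: dict) -> int:
        """Store an event and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, group: str, max_events: int = 100) -> List[dict]:
        """Return unread events for this group and advance its offset."""
        start = self._offsets.get(group, 0)  # assumption: new groups start at offset 0
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch


log = EventLog()
log.append({"device": "sensor-1", "temp": 21.5})
log.append({"device": "sensor-2", "temp": 19.0})

# The first application drains the buffer and would save the events to Cassandra.
saved = log.poll("cassandra-writer")

# An event arrives while no consumer is running; it simply waits in the log.
log.append({"device": "sensor-1", "temp": 22.0})

# A second application is added later and reads the same events independently.
realtime = log.poll("realtime-processor")

# The first application catches up on what it missed.
saved += log.poll("cassandra-writer")
```

Because each group tracks its own offset, adding the second consumer requires no change to the producer or to the first consumer.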

Financial Services

The pressures on financial services companies to gain a technological edge in data processing are coming not only from the competition but also from consumers. Gaining a competitive edge requires systems that can collect and quickly analyse vast streams of data. Consumers expect that the systems they interact with will be instantly up to date, always available and, increasingly, aware of the context of all their previous interactions and related information.

Addressing these joint pressures while containing technology costs requires the adoption of new-generation architectural patterns and technologies. Apache Cassandra, Apache Kafka and Apache Spark are ideally placed to form the core of such an architecture. The applicability of these technologies in financial services has been proven many times by leading organisations such as ING and UBS.

One common application we see for Cassandra in financial services is as a persistent cache to support high-volume client requests. In particular, we see this requirement with banks implementing the Payment Services Directive (PSD2) in the EU. This leverages Cassandra’s extreme reliability and built-for-the-cloud architecture to enable financial services organisations to deliver always-on service and avoid the high cost of scaling their legacy (often mainframe) architectures to meet increased client interaction needs. Spark is often included in this architecture to enrich the cached data with sophisticated analysis of trends and patterns in the data, enabling user-facing applications to make this analysis available with interactive response times. Kafka often sits in this picture as a message bus connecting the core processing system to multiple downstream consumers.

Others

The two use cases above are great examples where we see regular adoption of Spark, Kafka and Cassandra. However, there are many other business problems where the three technologies can combine to provide an ideal solution. Some examples that we have seen include:

Ad-Tech

Relying on the low-latency (low double-digit ms) responsiveness and always-on availability of Cassandra to make online advertising placement decisions backed by deep analysis calculated with Spark. Massive flows of inbound events and information can be managed with Kafka.

Application Monitoring

We use a combination of Spark and Cassandra in our own monitoring system that monitors close to 1500 servers. Cassandra seamlessly handles a steady stream of writes with metrics data while Spark is used to calculate regular roll-ups to allow viewing summarised data over long time periods. Kafka acts as a centralisation point for the messages and also as a message buffer.

Inventory Management, particularly in travel

Use Cassandra to track inventory records and Spark to analyse available inventory to determine dynamic pricing, capacity trends, etc.
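As an illustration of the roll-ups mentioned under Application Monitoring, the sketch below summarises raw metric samples into hourly buckets. It is plain Python with no Spark dependency; the hourly granularity, the `hourly_rollup` function and the sample data are assumptions for the example, not a description of the monitoring system itself. In Spark the same shape of calculation would typically be a grouped aggregation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Sample = Tuple[str, int, float]  # (metric name, epoch seconds, value)


def hourly_rollup(samples: Iterable[Sample]) -> Dict[Tuple[str, int], Dict[str, float]]:
    """Summarise samples into count/min/max/avg per (metric, hour start)."""
    acc = defaultdict(lambda: {"count": 0, "sum": 0.0,
                               "min": float("inf"), "max": float("-inf")})
    for name, ts, value in samples:
        bucket = acc[(name, ts - ts % 3600)]  # truncate timestamp to the hour
        bucket["count"] += 1
        bucket["sum"] += value
        bucket["min"] = min(bucket["min"], value)
        bucket["max"] = max(bucket["max"], value)
    return {
        key: {"count": b["count"], "min": b["min"], "max": b["max"],
              "avg": b["sum"] / b["count"]}
        for key, b in acc.items()
    }


raw = [("cpu", 0, 10.0), ("cpu", 1800, 30.0), ("cpu", 3600, 50.0)]
rollups = hourly_rollup(raw)
```

Storing only the summarised rows for older periods is what keeps long-range views cheap to read.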

Architectural Patterns

Batch Updating

At a more rudimentary level, many Cassandra applications need periodic batch processing for data maintenance. While this can include summarisation, it can also include requirements like implementing complex data expiry rules. Running these batches through a single-threaded (or single-machine) batch engine will not scale to the same extent your Cassandra cluster will. Implementing these batch jobs in Spark provides not only a pre-built set of libraries to assist with development of the data processing functionality but also the frameworks to automatically scale the jobs and execute processing logic on the same servers where the data is stored.

Stream Enrichment

For most applications, a strong design will store in a single Cassandra table all of the information required to service a particular read request (i.e. the data will be highly denormalised). In some cases this denormalisation process will require calculating or looking up additional data to add to a stream before the stream of data is saved. Using Spark Streaming to process data before saving to Cassandra provides a scalable and reliable technology base to implement this pattern. Kafka Streams is an alternative engine for implementing this form of stream enrichment.
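The stream enrichment pattern can be sketched in a few lines of plain Python. The `enrich` function, the `DEVICE_INFO` lookup table and the field names are hypothetical; in a real deployment the lookup and the write would be performed by Spark Streaming or Kafka Streams against real reference data and a Cassandra table.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical reference data that a read request would otherwise have to join in.
DEVICE_INFO: Dict[str, dict] = {
    "sensor-1": {"site": "warehouse-a", "unit": "celsius"},
}


def enrich(events: Iterable[dict], reference: Dict[str, dict]) -> Iterator[dict]:
    """Add looked-up fields to each event before it is saved, so the stored
    row already contains everything needed to serve a read."""
    for event in events:
        extra = reference.get(event["device"], {})
        # Flag events with no reference data instead of dropping them.
        yield {**event, **extra, "enriched": bool(extra)}


incoming = [
    {"device": "sensor-1", "temp": 21.5},
    {"device": "sensor-9", "temp": 18.0},  # unknown device
]

# Stand-in for the denormalised table the enriched rows would be written to.
table: List[dict] = list(enrich(incoming, DEVICE_INFO))
```

Keeping unmatched events (flagged rather than discarded) is one possible design choice; the right behaviour depends on the application.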

Lambda Architecture

The Lambda Architecture is an increasingly popular architectural pattern for handling massive quantities of data through a combination of stream and batch processing. With the Lambda Architecture, you maintain a short-term, volatile view of your data (the speed layer) and a longer-term, more permanent view (the batch layer), with a service to join views across the two (the serving layer).

With Spark and Cassandra, you have the key architectural building blocks you need to implement the Lambda Architecture. Spark Streaming is an ideal engine for implementing the speed layer of the architecture (potentially with results stored in TTL’d tables in Cassandra), while Spark can also be used to perform the longer-term batch calculations and store results in Cassandra.

Kappa Architecture

The Kappa Architecture takes the next step from the Lambda Architecture, removing the batch layer and treating the stream of events as the immutable record of system state. Stream processing maintains summary views as the stream is processed. If the logic of the summary views needs to change, the stream processing logic is updated and the saved streams are reprocessed. The Kappa Architecture removes the need to maintain the separate stream and batch logic required by the Lambda Architecture.

Once again, the combination of Spark and Cassandra gives you the architectural components you need to implement the Kappa Architecture. Spark Streaming is an ideal processing engine to undertake the calculations needed on the stream of data. Apache Cassandra can be used both as the long-term, immutable store of the data stream and as a store for the results of the stream calculations that are used by the serving layer. An alternative is to use Apache Kafka as your immutable event store and Apache Cassandra as the store for the materialized views calculated from these events.
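The core idea of the Kappa Architecture, rebuilding a view by replaying the immutable event stream through updated logic, can be shown with a minimal plain-Python sketch. The event shape, the `build_view` helper and both update functions are hypothetical; a production system would hold the events in Kafka or Cassandra and run the logic in a stream processor.

```python
from typing import Callable, Dict, Iterable, List

View = Dict[str, float]


def build_view(events: Iterable[dict], update: Callable[[View, dict], None]) -> View:
    """Fold the whole event stream into a summary view using the given logic."""
    view: View = {}
    for event in events:
        update(view, event)
    return view


def total_spend(view: View, event: dict) -> None:
    """Version 1 of the logic: net amount per customer."""
    view[event["customer"]] = view.get(event["customer"], 0.0) + event["amount"]


def total_spend_excluding_refunds(view: View, event: dict) -> None:
    """Version 2 of the logic: ignore negative (refund) amounts."""
    if event["amount"] > 0:
        total_spend(view, event)


# The immutable record of system state; it is never edited, only appended to.
EVENT_LOG: List[dict] = [
    {"customer": "alice", "amount": 30.0},
    {"customer": "alice", "amount": -10.0},
    {"customer": "bob", "amount": 5.0},
]

view_v1 = build_view(EVENT_LOG, total_spend)
# The requirement changes: swap the logic and reprocess the same saved stream.
view_v2 = build_view(EVENT_LOG, total_spend_excluding_refunds)
```

Only one code path exists for computing the view, which is the maintenance saving Kappa offers over keeping separate stream and batch implementations.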

Operations

Operating as part of a mission-critical application is the normal mode of operation for Cassandra, and there is a well-established body of knowledge about how to operate Cassandra to achieve the highest levels of availability. Although Kafka is a little newer, it is also widely operated at the highest levels of scale and reliability.

Spark, on the other hand, is often run to provide an analytics environment for use by a small number of data scientists. In this situation, reliability and predictable performance are not as critical as when Spark is deployed as a component of a production application. This section of the paper describes some of the considerations to be applied when deploying Spark for production usage.

Management Environment

The key to reliable operation of any technology is to have a solid overall management environment, including aspects such as:

- Automated (or at least well controlled) deployment and configuration management.
- High quality testing of new configurations prior to deployment.
- Backup and disaster recovery procedures.
- Appropriate monitoring, and systems and people that are paying attention to what is being reported by that monitoring.
- Rigorous incident response procedures and well-trained staff.

None of these items is specific to Kafka, Spark or Cassandra. However, introducing production usage of these technologies will require examination of each of these areas to ensure they remain fit for purpose with the introduction of new architectural components and applications.

High Availability

One specific area to be considered is high availability architecture (ensuring your overall service continues to run even when components fail). Cassandra is effectively high-availability by default: if you use multiple machines and a basic, competent setup you will have a high-availability cluster. Of course, there is more you can do to reach the very highest levels of availability. Kafka follows a somewhat similar architecture and has similar considerations in terms of distributing data across multiple replicas and placing replicas in multiple availability zones.

For Spark, more detailed consideration is required. Spark by default is resilient to the failure of worker processes, with work being automatically redistributed to running workers should a worker fail. However, the Spark Master and Driver require further consideration. Apache Spark has a built-in capability to make the Spark Master highly available by using an Apache ZooKeeper cluster to control the election of which machine will be the active Master at any point in time.

For the Driver component (which submits jobs to the cluster), it is possible to configure Spark to automatically restart jobs that fail. To enable this, the job must be submitted in cluster mode (--deploy-mode cluster) and with the --supervise flag set. As this will restart failed jobs from scratch, it is necessary to ensure your jobs are idempotent when using this functionality.
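The following sketch shows the two halves of the Driver restart advice: the submission flags, and what idempotency means for a job that may be rerun from scratch. The `spark-submit` options `--master`, `--deploy-mode cluster` and `--supervise` are real; the master URL, the application path, and the `save_result` and `run_job` helpers are hypothetical, and the dictionary is only a stand-in for a Cassandra table (where a write to an existing primary key overwrites the row).

```python
from typing import Dict, List, Tuple


def spark_submit_args(master_url: str, app_path: str) -> List[str]:
    """Build a spark-submit command line that asks the cluster to restart
    the driver if it exits with a failure."""
    return [
        "spark-submit",
        "--master", master_url,
        "--deploy-mode", "cluster",  # the driver runs inside the cluster
        "--supervise",               # restart the driver on failure
        app_path,
    ]


Table = Dict[Tuple[str, str], float]


def save_result(table: Table, key: Tuple[str, str], value: float) -> None:
    """Idempotent write: a keyed overwrite, so repeating it changes nothing."""
    table[key] = value


def run_job(table: Table) -> None:
    """Hypothetical job body; safe to rerun because every write is keyed."""
    save_result(table, ("cpu", "2024-01-01"), 42.0)
    save_result(table, ("mem", "2024-01-01"), 17.5)


# Hypothetical master URL and application path.
args = spark_submit_args("spark://master-host:7077", "/opt/jobs/rollup.jar")

results: Table = {}
run_job(results)
after_first_run = dict(results)
run_job(results)  # simulate a supervised restart from scratch
```

A job that instead appended rows or incremented counters would produce different results after a restart, which is exactly what the idempotency requirement is meant to rule out.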

Monitoring

It is important for any production system to have quality monitoring in place to help detect and diagnose problems. This starts with basic operating-system level monitoring of metrics such as CPU load and free disk space. It should then extend to monitoring that the expected system processes are running.

For Cassandra and Kafka, a broad range of metrics is available out of the box and is sufficient to monitor usage of Cassandra and Kafka for the vast majority of use cases. It will likely be necessary to tune alerting thresholds for your application, but the important metrics to monitor are fairly standard and well known.

Spark also provides built-in monitoring capabilities, including a UI that allows you to review the progress of your jobs. However, given the extremely diverse nature of workloads that Spark can handle, it will likely also be necessary to implement error handling and reporting as part of your production jobs in addition to relying on the native Spark metrics.

Workload Isolation

One of the unique advantages of Cassandra is its ability to provide workload isolation through its native multi-data-centre architecture support. By setting up two Cassandra “data centres” in the same physical data centre (or cloud provider region), you can isolate the load of your Spark analytic reads to a single data centre, ensuring the processing capacity and response times of your online processes are minimally impacted when batch processing runs in Spark.
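A rough sketch of what the workload isolation setup might look like in configuration terms follows. It only builds strings and a settings dictionary; nothing is sent to a cluster. The data centre names `online` and `analytics`, the keyspace name and the replication factors are hypothetical. The Spark Cassandra Connector property names are given as they appear in recent connector releases and should be checked against the version you run (older releases spell the first one `spark.cassandra.connection.local_dc`).

```python
from typing import Dict


def keyspace_cql(keyspace: str, replicas_per_dc: Dict[str, int]) -> str:
    """CQL that replicates a keyspace to each logical data centre."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


def spark_read_conf(analytics_dc: str) -> Dict[str, str]:
    """Connector settings that keep Spark reads inside one data centre."""
    return {
        # Connect only to nodes in the analytics data centre.
        "spark.cassandra.connection.localDC": analytics_dc,
        # A DC-local consistency level avoids involving the online replicas.
        "spark.cassandra.input.consistency.level": "LOCAL_ONE",
    }


cql = keyspace_cql("metrics", {"online": 3, "analytics": 3})
conf = spark_read_conf("analytics")
```

Online applications would connect to the `online` data centre with their own DC-local consistency level, so that neither workload reads from the other's replicas.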

Conclusion

Cassandra, Kafka and Spark form a powerful combination for many use cases. However, architecting and running distributed technologies at scale and with the highest levels of reliability and security requires a specialist environment including tools such as monitoring, management processes, and skilled and experienced staff.

Instaclustr’s focus is the provision of the world’s best managed environment for running open-source, distributed technologies reliably, at scale. We bring to your application a proven management platform and over 13 million node hours of experience running these technologies in production.

Brought to you by instaclustr.com
