Using Azure Data Services For Modern Data Applications


Using Azure Data Services for Modern Data Applications
Lara Rubbelke, Principal SDE, Microsoft
Allan Mitchell, Cloud Architect, elastabytes

Outcomes
We want you to leave here understanding:
- key concepts for modern database architectures
- database / datastore types
- reasons to go explore

Agenda
- Context of Lambda Architecture
- Ingestion
- Hot Path Processing
- Cold Path Processing
- Staging
- Enrichment and Serving

4 Questions


What Will Happen?

Lambda Architecture

The Old and the New Data Processing

Lambda Architecture – High Level View
[Diagram: all data feeds batch precomputation of partial aggregates into batch views, while real-time data is processed as a stream to increment real-time views.]

Lambda Architecture – Detailed View
[Diagram: event decoration and event processing feed an analytical store; data movement/sync connects to interactive serving and consumption.]

Lambda Architecture – Detailed View
[Diagram: the same view annotated with the lifecycle stages Ingestion, Processing, Staging, and Serving and Consumption.]

Cortana Intelligence Gallery Demo

Modern Data Lifecycle: Ingestion → Processing → Staging → Serving, with Enrichment and Curation spanning the stages.

Modern Data Lifecycle
- Ingestion: Event Hubs, IoT Hubs, Service Bus, Kafka
- Processing: HDInsight, ADLA, Storm, Spark, Stream Analytics
- Staging: ADLS, Azure Storage, Azure SQL DB
- Serving: ADLS, Azure DW, Azure SQL DB, HBase, Cassandra, Azure Storage, Power BI
- Enrichment and Curation: Azure Data Factory, Azure ML

Modern Data Lifecycle (Ingestion stage): Event Hubs, IoT Hubs, Service Bus, Kafka

Ingestion
- Data pushed to a broker for further processing: typically indicative of streaming approaches.
- Data pushed to storage for further processing: best alignment with many existing systems. Use scheduled jobs to synchronize or move data. Typical "batch" processing.

Ingestion Options
- IoT Hub: bi-directional communication
- Event Hub: device-to-cloud mass ingestion
- Service Bus: complex filters and processing rules
- Kafka: common OSS integration
- RabbitMQ: more common OSS integration, run in IaaS
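Brokered ingestion services such as Event Hubs spread load by hashing a partition key (for example, a device ID) onto a fixed set of partitions, so all events from one device land on the same partition and stay in order. A minimal sketch of that idea in plain Python; the partition count and hash choice here are illustrative assumptions, not the service's actual algorithm:

```python
import hashlib

PARTITION_COUNT = 4  # Event Hubs fixes the partition count at creation; 4 is an arbitrary example


def partition_for(partition_key: str, partition_count: int = PARTITION_COUNT) -> int:
    """Map a partition key to a stable partition index (illustrative, not the real hash)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partition_count


# Events from the same device always hash to the same partition,
# which is what preserves per-device ordering downstream.
assert partition_for("device-42") == partition_for("device-42")
assert 0 <= partition_for("device-7") < PARTITION_COUNT
```

Because the mapping is deterministic, a consumer reading one partition sees a consistent, ordered subset of devices without any coordination with other consumers.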

IoT Hub

Azure IoT Suite: IoT Hub
- Connect millions of devices to a partitioned application back-end; devices are not servers
- Use IoT Hub to enable secure bi-directional comms: device-to-cloud and cloud-to-device
- Durable messages (at-least-once semantics); delivery receipts, expired messages, device communication errors
- Individual device identities and credentials
- Single device-cloud connection for all communications (C2D, D2C)
- Natively supports AMQP, HTTP; designed for extensibility to custom protocols
- Device SDKs available for multiple platforms (e.g. RTOS, Linux, Windows); multi-platform Service SDK
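At-least-once semantics mean a consumer may see the same message more than once, so downstream processing is usually made idempotent, for example by skipping message IDs that were already handled. A hypothetical sketch in plain Python; the message shape and the in-memory `seen` set are illustrative (a real consumer would persist its dedup state):

```python
def process_once(messages, handler):
    """Apply handler to each message at most once, keyed by message id (at-least-once dedup sketch)."""
    seen = set()
    results = []
    for msg in messages:
        if msg["id"] in seen:
            continue  # duplicate redelivery: already handled, skip it
        seen.add(msg["id"])
        results.append(handler(msg))
    return results


# A redelivered message (id 1 appears twice) is handled only once.
msgs = [{"id": 1, "temp": 21.5}, {"id": 2, "temp": 22.0}, {"id": 1, "temp": 21.5}]
out = process_once(msgs, lambda m: m["temp"])
assert out == [21.5, 22.0]
```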

Setup
- Retention period
- Event Hub endpoint
- C2D settings

Device provisioning
- Many systems involved (IoT Hub, device registry, ERPs, …)
- Device identity (composite devices, many concerns)
1. Device provisioned at manufacturing into the system
2. Device connects for the first time and gets associated to its regional data center (bootstrapped)
3. As a result of customer interactions the device is activated
4. Devices can be deactivated for security and other reasons
5. A device can also be de-provisioned at end-of-life or decommission
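The five provisioning steps amount to a small lifecycle state machine. A sketch in plain Python, with state names and the exact set of allowed transitions chosen for illustration rather than taken from any Azure API:

```python
# Allowed lifecycle transitions, following the five steps on the slide (names are illustrative).
TRANSITIONS = {
    "provisioned": {"bootstrapped"},          # first connection: associated to regional data center
    "bootstrapped": {"activated"},            # activated through customer interaction
    "activated": {"deactivated", "retired"},  # security deactivation, or decommission
    "deactivated": {"activated", "retired"},  # may be reactivated, or retired for good
    "retired": set(),                         # de-provisioned at end-of-life; terminal
}


def transition(state: str, new_state: str) -> str:
    """Move a device to new_state, rejecting transitions the lifecycle does not allow."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state


state = "provisioned"
for step in ("bootstrapped", "activated", "deactivated", "retired"):
    state = transition(state, step)
assert state == "retired"
```

Encoding the transitions explicitly makes the security-relevant rules (for instance, that a retired device can never be reactivated) easy to audit.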

IoT Hub: opening up the channels

Modern Data Lifecycle (Processing stage): Stream Analytics, Storm, Spark Streaming

Processing
- Hot path: multiple processing "instances" can work off of the broker. Must think about the end use: indexing for operational dashboards, transformation/curation for persistence, spooling to an alternate store.
- Cold path: multiple processing "instances" can work on top of disaggregated storage. Lots of options abound depending on the need: relational engines, Big Data solutions (in-memory or not), other.

Bounded vs. Unbounded Processing

Stream Options: Azure Stream Analytics, Spark Streaming, Storm

Azure Stream Analytics

What is Azure Stream Analytics?
- Cost-effective event processing engine
- SQL-like syntax
- Naturally integrated with Azure IoT Hub and Event Hubs

Azure Stream Analytics: End-to-End Architecture Overview
- Temporal semantics, guaranteed delivery, guaranteed up-time
- Event inputs: Event Hub, IoT Hub, Azure Blob
- Reference data: Azure Blob, Azure SQL DB
- Transform: temporal joins, filter, aggregates, projections, windows, etc.; enrich, correlate
- Outputs: SQL Azure, Azure Blobs, Event Hub, ADLS, Power BI, Azure Storage

Input sources for a Stream Analytics Job
- Currently supported input data streams are Azure Event Hub, Azure IoT Hub and Azure Blob Storage. Multiple input data streams are supported.
- Advanced options let you configure how the job will read data from the input blob (which folders to read from, when a blob is ready to be read, etc.).
- Reference data is usually static or changes very slowly over time. Must be stored in Azure Blob Storage. Cached for performance.

Outputs for Stream Analytics Jobs
Data stores currently supported as outputs:
- Azure Blob storage: creates log files with temporal query results; ideal for archiving
- Azure Table storage: more structured than blob storage, easier to set up than SQL database, and durable (in contrast to Event Hub)
- SQL database: stores results in an Azure SQL Database table; ideal as a source for traditional reporting and analysis
- Event Hub: sends an event to an event hub; ideal for generating actionable events such as alerts or notifications
- Service Bus Queue: sends an event on a queue; ideal for sending events sequentially
- Service Bus Topics: sends an event to subscribers; ideal for sending events to many consumers
- PowerBI.com: ideal for near-real-time reporting
- DocumentDb: ideal if you work with JSON and object graphs

ASA: Three Types of Windows
- Every window operation outputs events at the end of the window
- The output of the window will be a single event based on the aggregate function used; the event will have the timestamp of the window
- All windows have a fixed length
Window types:
- Tumbling window: aggregate per time interval
- Hopping window: scheduled overlapping windows
- Sliding window: windows constantly re-evaluated
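The tumbling and hopping semantics above can be sketched in plain Python: a tumbling window partitions time into non-overlapping fixed intervals, while a hopping window of the same size advances by a smaller hop, so an event can fall into several overlapping windows. This illustrates the semantics only; it is not how ASA evaluates queries.

```python
def tumbling_windows(events, size):
    """Group (timestamp, value) events into non-overlapping windows of `size` seconds."""
    windows = {}
    for ts, value in events:
        start = (ts // size) * size  # each event belongs to exactly one window
        windows.setdefault(start, []).append(value)
    return windows


def hopping_windows(events, size, hop):
    """Group events into overlapping windows of `size` seconds, starting every `hop` seconds."""
    windows = {}
    for ts, value in events:
        # an event falls into every window whose [start, start + size) interval covers it
        start = (ts // hop) * hop
        while start > ts - size:
            if start >= 0:
                windows.setdefault(start, []).append(value)
            start -= hop
    return windows


# Tumbling: each event lands in exactly one 5-second window.
assert tumbling_windows([(0, "a"), (3, "b"), (5, "c"), (11, "d")], 5) == {
    0: ["a", "b"], 5: ["c"], 10: ["d"]}
# Hopping (size 10, hop 5): windows overlap, so the event at t=7 appears in two windows.
assert hopping_windows([(3, "a"), (7, "b")], 10, 5) == {0: ["a", "b"], 5: ["b"]}
```

A sliding window would instead re-evaluate on every arriving event; the hopping case degenerates to it as the hop shrinks toward zero.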

Multiple Steps, Multiple Outputs
- A query can have multiple steps to enable pipeline execution
- A step is a sub-query defined using WITH ("common table expression")
- The only query outside of the WITH keyword is also counted as a step
- Can be used to develop complex queries more elegantly by creating an intermediary named result
- Each step's output can be sent to multiple output targets using INTO

WITH Step1 AS (
    SELECT Count(*) AS CountTweets, Topic
    FROM TwitterStream PARTITION BY PartitionId
    GROUP BY TumblingWindow(second, 3), Topic, PartitionId
),
Step2 AS (
    SELECT Avg(CountTweets)
    FROM Step1
    GROUP BY TumblingWindow(minute, 3)
)
SELECT * INTO Output1 FROM Step1
SELECT * INTO Output2 FROM Step2
SELECT * INTO Output3 FROM Step2

Azure Stream Analytics: where the smarts happen

Modern Data Lifecycle (Staging stage): ADLS, Azure Storage, Azure SQL DB

Batch Processing Options: Azure Data Lake, HDInsight, Spark

Azure Data Lake
[Diagram: the Azure Data Lake Analytics service, partners and HDInsight sit over YARN; the Azure Data Lake Store is HDFS, open-standards based. Integrated analytics and storage; fully managed; easy to use ("dial for scale"); U-SQL; proven at scale. Analyze data of any size, shape or speed, from applications, devices, relational sources, web, sensors, social, clickstream and video.]

HDInsight
Patterns / What Works:
- Batch processing: anything that requires map and reduce, joins, lots of aggregation
- Multiple schemas on the same data
Anti-Pattern / Danger:
- Complex transactional needs
- Fast
- Granular security requirements
- Not a relational database replacement; not fast

ADL Store: Unlimited Scale
- Optimized for analytics and IoT systems
- Each file in ADL Store is sliced into blocks, distributed across multiple data nodes in the backend storage system
- With a sufficient number of backend storage data nodes, files of any size can be stored
- Backend storage runs in the Azure cloud, which has virtually unlimited resources
[Diagram: an Azure Data Lake Store file split into blocks spread across backend storage data nodes]
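The slicing idea can be sketched in a few lines: cut a byte stream into fixed-size blocks and spread them across data nodes. The block size and the round-robin placement policy here are illustrative assumptions, not ADL Store's actual parameters:

```python
def slice_into_blocks(data: bytes, block_size: int):
    """Split a file's bytes into fixed-size blocks (the last block may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, node_count: int):
    """Assign each block to a data node round-robin (illustrative placement policy)."""
    placement = {node: [] for node in range(node_count)}
    for i, block in enumerate(blocks):
        placement[i % node_count].append(block)
    return placement


file_bytes = b"x" * 10
blocks = slice_into_blocks(file_bytes, 4)
assert [len(b) for b in blocks] == [4, 4, 2]   # two full blocks plus a short tail
assert b"".join(blocks) == file_bytes          # concatenating the blocks recovers the file
placement = place_blocks(blocks, 3)
assert sum(len(v) for v in placement.values()) == len(blocks)
```

Because no single node ever has to hold a whole file, file size is bounded only by the total capacity of the node pool, which is the "unlimited scale" claim on the slide.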

ADL Store: High Availability and Reliability
- Azure maintains 3 replicas of each data object per region, across three fault and upgrade domains
- Each create or append operation on a replica is replicated to the other two
- Writes are committed to the application only after all replicas are successfully updated
- Read operations can go against any replica
- Provides 'read-after-write' consistency; data is never lost or unavailable even under failures
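The commit rule above, where a write succeeds only once every replica has applied it, is what yields read-after-write consistency: whichever replica a subsequent read lands on already has the data. A toy model in plain Python, with in-memory lists standing in for replicas (this is a sketch of the invariant, not ADL Store's replication protocol):

```python
class ReplicatedStore:
    """Toy model: a write commits only after every replica applies it (read-after-write)."""

    def __init__(self, replica_count: int = 3):
        self.replicas = [[] for _ in range(replica_count)]

    def append(self, record) -> bool:
        # Replicate to every replica before acknowledging the write to the caller.
        for replica in self.replicas:
            replica.append(record)
        return True  # committed: all replicas hold the record

    def read(self, replica_index: int):
        # Any replica may serve reads; after a committed write they are all identical.
        return list(self.replicas[replica_index])


store = ReplicatedStore()
store.append("event-1")
# Read-after-write: every replica already contains the committed record.
assert all(store.read(i) == ["event-1"] for i in range(3))
```

The trade-off is write latency: acknowledging only after all replicas respond makes writes as slow as the slowest replica, which is the price of letting reads hit any replica without coordination.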

ADL Store: Enterprise Grade Security
- Auditing, alerting, access control, all from within a single web-based portal
- Azure Active Directory integration for identity and access management

Apache Spark – A Unified Framework
A unified, open source, parallel data processing framework for Big Data analytics.
[Diagram: Spark SQL (interactive queries), Spark Streaming and the other Spark libraries sit on the Spark Core Engine, which runs on YARN, Mesos or the standalone scheduler.]
Intro to Apache Spark (Brain-Friendly Tutorial): https://www.youtube.com/watch?v=rvDpBTV89AM

What makes Spark fast?
[Diagram: a MapReduce-style pipeline reads from and writes to HDFS between every stage; Spark avoids those intermediate HDFS round-trips by keeping data in memory across stages.]

Spark (Preview) on Azure HDInsight
Fully managed service:
- 100% open source Apache Spark and Hadoop bits
- Latest releases of Spark
- Fully supported by Microsoft and Hortonworks
- 99.9% Azure Cloud SLA
Coming soon, advanced enterprise features:
- Integration with Azure Data Lake Store
- Role-based security and audit
- Encryption at rest and in transit
- Certifications: PCI, in addition to existing ISO 27018, SOC, HIPAA, EU-MC

Optimized for Data Scientist Productivity
- On-demand compute: dynamically scale the cluster to 1000s of cores to compress the time of the ML job
- Coming soon: auto-scale during job execution, or time-based
- Tools for experimentation and development: Jupyter Notebooks (Scala, Python, automatic data visualizations); IntelliJ plugin (integrated job submission, remote debugging); ODBC connector for Power BI, Tableau, Excel, etc.
- R on Spark: ML algorithms in R parallelized using Spark; R Studio

Basic building blocks
- Resilient Distributed Datasets (RDDs): the lowest-level set of objects representing data; can be stored in memory or on disk across a cluster.
- DataFrame: a higher-level abstraction API; a distributed collection of rows organized into named columns; an RDD with schema and optimizations.
- Dataset API: an extension of Spark's DataFrame API that supports static typing and user functions that run directly on existing JVM types (such as user classes); compile-time type safety with optimizations; "preview".
- Spark 2.0 will unify these APIs.
Structuring Spark: DataFrames, Datasets, and Streaming: https://www.youtube.com/watch?v=i7l3JQRx7Qw

RDDs vs DataFrames vs DataSets: which one to use?
[Link truncated in transcription: …ng-spark-dataframes-datasets-and-streaming]

Spark Cluster Architecture
[Diagram: the driver program's SparkContext connects through a cluster manager to worker nodes, which read from HDFS.]

RDDs: Transformations and Actions
[Diagram: transformations chain RDDs into new RDDs; actions turn an RDD into a value. Obviously does not apply to persistent RDDs.]
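The distinction sketched above, where transformations build new RDDs lazily and an action forces computation and returns a value, can be illustrated with Python generators, which are likewise lazy until consumed. This mimics the evaluation model only; it is not Spark's API:

```python
calls = []


def numbers():
    """Yield 0..4, recording when each element is actually computed."""
    for n in range(5):
        calls.append(n)
        yield n


# "Transformations": build a lazy pipeline (filter evens, then square); nothing runs yet.
pipeline = (n * n for n in numbers() if n % 2 == 0)
assert calls == []  # lazy: no elements have been computed so far

# "Action": consuming the pipeline triggers the whole chain and returns a value.
result = sum(pipeline)
assert result == 0 + 4 + 16
assert calls == [0, 1, 2, 3, 4]  # source elements were only computed at action time
```

In Spark the same laziness lets the engine see the whole transformation chain before running it, so it can pipeline stages and avoid materializing intermediate RDDs; a persistent (cached) RDD opts out, which is the caveat on the slide.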

Developing Spark Apps with Notebooks

Jupyter
- Interactive web-based notebook (full list here)
- Jupyter Qt console
- Jupyter terminal console
- Notebook viewer (nbviewer)

Integration with BI Reporting Tools

Modern Data Lifecycle (Serving stage): ADLS, Azure DW, Azure SQL DB, HBase, Cassandra, Azure Storage, Power BI; Enrichment and Curation: Azure Data Factory, Azure ML

Serving: dashboards, interactive exploration

Serving
- Hot path: typically serving here is constrained, to particular access patterns and to dashboard scenarios; optimized for reducing "observation latency".
- Cold path: typically serving here can be unconstrained; still used for dashboards, and also for data exploration and ML, interactive BI, and often broad sharing.

Azure DW and Analytical Workloads
- Store large volumes of data
- Consolidate disparate data into a single location
- Shape, model, transform and aggregate data
- Perform query analysis across large datasets
- Ad-hoc reporting across large data volumes
- All using simple SQL constructs

ADL and SQLDW

Pattern: Compute consumption

Pattern: SaaS customer isolation

Logical overview

Distributed queries

Azure Data Warehouse: where the smarts happen

Interactive and Exploration

Interactive and Dashboards

Modern Data Lifecycle
- Ingestion: Event Hubs, IoT Hubs, Service Bus, Kafka
- Processing: HDInsight, ADLA, Storm, Spark, Stream Analytics
- Staging: ADLS, Azure Storage, Azure SQL DB
- Serving: ADLS, Azure DW, Azure SQL DB, HBase, Cassandra, Azure Storage, Power BI
- Enrichment and Curation: Azure Data Factory, Azure ML

We want you to leave here understanding:
- key concepts for modern database architectures
- database / datastore types
- reasons to go explore

