Using Azure Data Services for Modern Data Applications
Lara Rubbelke, Principal SDE, Microsoft
Allan Mitchell, Cloud Architect, elastabytes
Outcomes
We want you to leave here understanding:
- Key concepts for modern database architectures
- Database / datastore types
- Reasons to go explore
Agenda
- Context of Lambda Architecture
- Ingestion
- Hot Path Processing
- Cold Path Processing
- Staging
- Enrichment and Serving
4 Questions
What Will Happen?
Lambda Architecture
The Old and the New Data Processing
Lambda Architecture – High Level View
- Batch path: all data → precompute views (partial aggregates) → batch views
- Speed path: real-time data → process stream / increment views → real-time views
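The two paths above converge at query time: the serving layer combines precomputed batch views with the increments accumulated on the real-time path. A minimal Python sketch of that merge (all names and numbers are hypothetical, purely to illustrate the idea):

```python
# Minimal sketch of the Lambda serving-layer merge (names hypothetical).
# Batch views hold precomputed totals from the cold path; real-time views
# hold increments accumulated since the last batch run.

def merge_views(batch_view: dict, realtime_view: dict) -> dict:
    """Combine a precomputed batch view with a real-time delta view."""
    merged = dict(batch_view)
    for key, increment in realtime_view.items():
        merged[key] = merged.get(key, 0) + increment
    return merged

batch_view = {"sensor-1": 1000, "sensor-2": 750}   # from the cold path
realtime_view = {"sensor-1": 12, "sensor-3": 3}    # from the hot path
print(merge_views(batch_view, realtime_view))
# {'sensor-1': 1012, 'sensor-2': 750, 'sensor-3': 3}
```

The point of the split is that the batch view can be rebuilt from scratch at leisure, while the real-time view only ever has to cover the window since the last rebuild.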
Lambda Architecture – Detailed View
Event decoration → event processing (message analytics) → analytical store → data movement / sync → interactive serving and consumption
Mapped onto the lifecycle stages: Ingestion → Processing → Staging → Serving and Consumption
Cortana Intelligence Gallery Demo
Modern Data Lifecycle
Ingestion → Processing → Staging → Serving, with Enrichment and Curation throughout
Modern Data Lifecycle
- Ingestion: Event Hubs, IoT Hub, Service Bus, Kafka
- Processing: HDInsight, ADLA, Storm, Spark, Stream Analytics
- Staging: ADLS, Azure Storage, Azure SQL DB
- Serving: ADLS, Azure DW, Azure SQL DB, HBase, Cassandra, Azure Storage, Power BI
- Enrichment and Curation: Azure Data Factory, Azure ML
Modern Data Lifecycle – Ingestion: Event Hubs, IoT Hub, Service Bus, Kafka
Ingestion
- Data pushed to a broker for further processing: typically indicative of streaming approaches.
- Data pushed to storage for further processing: best alignment with many existing systems; use scheduled jobs to synchronize or move data; typical "batch" processing.
Ingestion Options
- IoT Hub: bi-directional communication
- Event Hub: device-to-cloud mass ingestion
- Service Bus: complex filters and processing rules
- Kafka: common OSS integration
- RabbitMQ: another common OSS option, run in IaaS
IoT Hub
Azure IoT Suite: IoT Hub
- Connect millions of devices to a partitioned application back end; devices are not servers
- Use IoT Hub to enable secure bi-directional communications: device-to-cloud and cloud-to-device
- Durable messages (at-least-once semantics); delivery receipts, expired messages, device communication errors
- Individual device identities and credentials; a single device-to-cloud connection for all communications (C2D, D2C)
- Natively supports AMQP and HTTP; designed for extensibility to custom protocols
- Device SDKs available for multiple platforms (e.g. RTOS, Linux, Windows); multi-platform service SDK
Setup: retention period, Event Hub endpoint, C2D settings
Device Provisioning
Many systems involved (IoT Hub, device registry, ERPs, …); device identity (composite devices, many concerns)
1. Device provisioned at manufacturing into the system
2. Device connects for the first time and gets associated to its regional data center (bootstrapped)
3. As a result of customer interactions, the device is activated
4. Devices can be deactivated for security and other reasons
5. A device can also be de-provisioned at end-of-life or decommission
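The five steps above are effectively a small state machine. A hedged Python sketch of the lifecycle (the state names and transition rules here are illustrative, not an IoT Hub or device-registry API):

```python
# Illustrative device-lifecycle state machine for the steps above.
# State names and allowed transitions are hypothetical, not IoT Hub concepts.

ALLOWED = {
    "provisioned":    {"bootstrapped"},                  # first connection
    "bootstrapped":   {"activated"},                     # customer interaction
    "activated":      {"deactivated", "decommissioned"},
    "deactivated":    {"activated", "decommissioned"},   # can be re-enabled
    "decommissioned": set(),                             # terminal state
}

class Device:
    def __init__(self, device_id: str):
        self.device_id = device_id
        self.state = "provisioned"   # set at manufacturing

    def transition(self, new_state: str) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state} -> {new_state} not allowed")
        self.state = new_state

d = Device("dev-001")
d.transition("bootstrapped")
d.transition("activated")
d.transition("deactivated")
print(d.state)  # deactivated
```

Modeling the lifecycle explicitly makes the security point concrete: deactivation is reversible, decommissioning is not.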
IoT Hub: opening up the channels
Modern Data Lifecycle – Processing: Stream Analytics, Storm, Spark Streaming
Processing
- Multiple processing "instances" can work off of the broker. Must think about the end use: indexing for operational dashboards; transformation/curation for persistence; spooling to an alternate store.
- Multiple processing "instances" can work on top of disaggregated storage. Lots of options abound depending on the need: relational engines; Big Data solutions (in-memory or not); other.
Bounded vs. Unbounded Processing
Stream Options
- Azure Stream Analytics
- Spark Streaming
- Storm
Azure Stream Analytics
What is Azure Stream Analytics?
- Cost-effective event processing engine
- SQL-like syntax
- Naturally integrated with Azure IoT Hub and Event Hubs
Azure Stream Analytics – End-to-End Architecture Overview
- Temporal semantics, guaranteed delivery, guaranteed uptime
- Event inputs: Event Hub, IoT Hub, Azure Blob
- Reference data: Azure Blob, Azure SQL DB
- Transform: temporal joins, filters, aggregates, projections, windows, etc.; enrich and correlate
- Outputs: SQL Azure, Azure Blobs, Event Hub, ADLS, Power BI, Azure Storage
Input Sources for a Stream Analytics Job
- Currently supported input data streams are Azure Event Hub, Azure IoT Hub, and Azure Blob Storage. Multiple input data streams are supported.
- Advanced options let you configure how the job will read data from the input blob (which folders to read from, when a blob is ready to be read, etc.).
- Reference data is usually static or changes very slowly over time. It must be stored in Azure Blob Storage and is cached for performance.
Outputs for Stream Analytics Jobs
Currently supported output data stores:
- Azure Blob storage: creates log files with temporal query results; ideal for archiving
- Azure Table storage: more structured than blob storage, easier to set up than SQL Database, and durable (in contrast to Event Hub)
- SQL Database: stores results in an Azure SQL Database table; ideal as a source for traditional reporting and analysis
- Event Hub: sends an event to an event hub; ideal for generating actionable events such as alerts or notifications
- Service Bus Queue: sends an event on a queue; ideal for sending events sequentially
- Service Bus Topics: sends an event to subscribers; ideal for sending events to many consumers
- PowerBI.com: ideal for near-real-time reporting
- DocumentDB: ideal if you work with JSON and object graphs
ASA: Three Types of Windows
- Every window operation outputs events at the end of the window
- The output of the window is a single event based on the aggregate function used; the event has the timestamp of the window
- All windows have a fixed length
Window types:
- Tumbling window: aggregate per time interval
- Hopping window: scheduled overlapping windows
- Sliding window: window constantly re-evaluated
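To make the window types concrete, here is a small Python simulation of tumbling vs. hopping windows over timestamped events. This is not ASA code; ASA expresses the same idea declaratively with `TumblingWindow(...)` / `HoppingWindow(...)`, and the event times and sizes below are invented:

```python
# Simulating tumbling vs. hopping count-per-window aggregation.
# Pure-Python illustration; event times (seconds) are made up.
from collections import defaultdict

events = [0.5, 1.2, 2.8, 3.1, 3.9, 5.5]

def tumbling(events, size):
    """Non-overlapping windows: each event lands in exactly one window."""
    counts = defaultdict(int)
    for t in events:
        start = int(t // size) * size
        counts[start] += 1
    return dict(counts)

def hopping(events, size, hop):
    """A new window of length `size` starts every `hop` seconds, so
    windows overlap and an event can be counted more than once."""
    counts = defaultdict(int)
    for t in events:
        start = int(t // hop) * hop
        while start > t - size:        # every window [start, start+size) holding t
            counts[start] += 1
            start -= hop
    return {s: c for s, c in counts.items() if s >= 0}

print(tumbling(events, 2))     # {0: 2, 2: 3, 4: 1}
print(hopping(events, 4, 2))   # {0: 5, 2: 4, 4: 1} - events counted in overlaps
```

A sliding window would go one step further: it is re-evaluated at every event arrival rather than on a fixed schedule.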
Multiple Steps, Multiple Outputs
- A query can have multiple steps to enable pipeline execution
- A step is a sub-query defined using WITH ("common table expression")
- The only query outside of the WITH keyword is also counted as a step
- Can be used to develop complex queries more elegantly by creating an intermediary named result
- Each step's output can be sent to multiple output targets using INTO

    WITH Step1 AS (
        SELECT Count(*) AS CountTweets, Topic
        FROM TwitterStream PARTITION BY PartitionId
        GROUP BY TumblingWindow(second, 3), Topic, PartitionId
    ),
    Step2 AS (
        SELECT Avg(CountTweets)
        FROM Step1
        GROUP BY TumblingWindow(minute, 3)
    )
    SELECT * INTO Output1 FROM Step1
    SELECT * INTO Output2 FROM Step2
    SELECT * INTO Output3 FROM Step2
Azure Stream Analytics: where the smarts happen
Modern Data Lifecycle – Staging: ADLS, Azure Storage, Azure SQL DB
Batch Processing
- Azure Data Lake
- HDInsight
- Spark
Azure Data Lake
- Analytics Service: integrated analytics and storage; fully managed; easy to use ("dial for scale"); U-SQL; proven at scale
- HDInsight: analyze data of any size, shape, or speed (YARN)
- Store: open-standards based (HDFS)
- Data sources: applications, devices, relational, web, sensors, social, clickstream, video
- Partner ecosystem
HDInsight
Patterns / what works:
- Batch processing
- Anything that requires map and reduce, joins, lots of aggregation
- Multiple schemas on the same data
Anti-patterns / danger:
- Anything that requires complex transactional needs, speed, or granular security
- Not a relational database replacement
- Not fast
ADL Store: Unlimited Scale
- Optimized for analytics and IoT systems
- Each file in ADL Store is sliced into blocks, distributed across multiple data nodes in the backend storage system
- With a sufficient number of backend storage data nodes, files of any size can be stored
- Backend storage runs in the Azure cloud, which has virtually unlimited resources
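The slicing described above can be sketched in a few lines. The block size, node count, and round-robin placement below are invented for illustration; real ADL Store block management is internal to the service:

```python
# Hypothetical sketch: slice a file into fixed-size blocks and spread
# them round-robin across backend data nodes. Block size and node count
# are invented; ADL Store handles placement internally.

def place_blocks(file_bytes: bytes, block_size: int, num_nodes: int):
    blocks = [file_bytes[i:i + block_size]
              for i in range(0, len(file_bytes), block_size)]
    # round-robin assignment: block i lands on node i % num_nodes
    placement = {i: i % num_nodes for i in range(len(blocks))}
    return blocks, placement

data = b"x" * 1000                       # a 1000-byte "file"
blocks, placement = place_blocks(data, block_size=256, num_nodes=3)
print(len(blocks))                       # 4 (three full blocks + one partial)
print(placement)                         # {0: 0, 1: 1, 2: 2, 3: 0}
```

Because capacity scales with the number of data nodes rather than with any single disk, this is what makes "files of any size" possible.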
ADL Store: High Availability and Reliability
- Azure maintains 3 replicas of each data object per region, across three fault and upgrade domains
- Each create or append operation on a replica is replicated to the other two
- Writes are committed to the application only after all replicas are successfully updated
- Read operations can go against any replica
- Provides 'read-after-write' consistency; data is never lost or unavailable even under failures
ADL Store: Enterprise-Grade Security
- Auditing, alerting, and access control, all from within a single web-based portal
- Azure Active Directory integration for identity and access management
Apache Spark: A Unified Framework
A unified, open source, parallel data processing framework for Big Data analytics
- Libraries: Spark SQL (interactive queries), Spark Streaming (stream processing)
- Spark Core Engine
- Cluster managers: YARN, Mesos, standalone scheduler
Intro to Apache Spark (Brain-Friendly Tutorial): https://www.youtube.com/watch?v=rvDpBTV89AM
What Makes Spark Fast?
(Diagram: successive jobs repeatedly read from HDFS and write back to HDFS between stages; these disk round trips are what Spark avoids.)
Spark (Preview) on Azure HDInsight
- Fully managed service: 100% open source Apache Spark and Hadoop bits; latest releases of Spark; fully supported by Microsoft and Hortonworks; 99.9% Azure Cloud SLA
- Coming soon, advanced enterprise features: integration with Azure Data Lake Store; role-based security and audit; encryption at rest and in transit; certifications: PCI in addition to existing ISO 27018, SOC, HIPAA, EU-MC
Optimized for Data Scientist Productivity
- On-demand compute: dynamically scale the cluster to 1000s of cores to compress the time of the ML job; coming soon: auto-scale during job execution or time-based
- Tools for experimentation and development: Jupyter Notebooks (Scala, Python, automatic data visualizations); IntelliJ plugin (integrated job submission, remote debugging); ODBC connector for Power BI, Tableau, Excel, etc.
- R on Spark: ML algorithms in R parallelized using Spark; R Studio
Basic Building Blocks
- Resilient Distributed Datasets (RDDs): lowest-level set of objects representing data; can be stored in memory or on disk across a cluster
- DataFrame: higher-level abstraction API; a distributed collection of rows organized into named columns; an RDD with schema and optimizations
- Dataset API: extension of Spark's DataFrame API that supports static typing and user functions that run directly on existing JVM types (such as user classes); compile-time type safety with optimizations; "Preview"; Spark 2.0 will unify these APIs
Structuring Spark: DataFrames, Datasets, and Streaming: https://www.youtube.com/watch?v=i7l3JQRx7Qw
RDDs vs. DataFrames vs. Datasets: which one to use?
(Reference: …ng-spark-dataframes-datasets-and-streaming)
Spark Cluster Architecture
(Diagram: a Driver Program hosting the SparkContext connects to a Cluster Manager, which coordinates multiple Worker Nodes over HDFS.)
RDDs: Transformations and Actions
(Diagram: a chain of transformations RDD → RDD → …, followed by an action that produces a value. Obviously this does not apply to persistent RDDs.)
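The lazy-transformation / eager-action split can be illustrated without a Spark cluster: Python generator pipelines behave the same way, in that nothing executes until a terminal operation asks for a value. This is a conceptual analogy only, not Spark code:

```python
# Conceptual analogy: Spark transformations are lazy (like generator
# pipelines); actions force evaluation (like sum/list). Not Spark code.

log = []

def tag(x):
    log.append(x)        # records when an element is actually computed
    return x * 2

data = range(5)
doubled = (tag(x) for x in data)            # "transformation": nothing runs yet
evens = (x for x in doubled if x % 4 == 0)  # still nothing has run
print(log)                                  # [] - the pipeline is only a plan

result = sum(evens)                         # "action": triggers the pipeline
print(result)                               # 12 (elements 0, 4, 8 pass the filter)
print(log)                                  # [0, 1, 2, 3, 4] - now computed
```

In Spark the same laziness lets the engine see the whole chain before running it and optimize accordingly; persisting an RDD (as the slide notes) is what breaks recomputation out of this pattern.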
Developing Spark Apps with Notebooks
Jupyter
- Interactive web-based notebook (full list here)
- Jupyter Qt console
- Jupyter terminal console
- Notebook viewer (nbviewer)
Integration with BI Reporting Tools
Modern Data Lifecycle – Serving: ADLS, Azure DW, Azure SQL DB, HBase, Cassandra, Azure Storage, Power BI; Enrichment and Curation: Azure Data Factory, Azure ML
Serving: dashboards, interactive, exploration
Serving
- Hot path: serving here is typically constrained: constrained to particular access patterns and to dashboard scenarios; optimized for reducing "observation latency".
- Cold path: serving here can typically be unconstrained: still used for dashboards; used for data exploration and ML; used for interactive BI; often used for broad sharing.
Azure DW and Analytical Workloads
- Store large volumes of data
- Consolidate disparate data into a single location
- Shape, model, transform, and aggregate data
- Perform query analysis across large datasets
- Ad-hoc reporting across large data volumes
- All using simple SQL constructs
ADL and SQLDW
Pattern: Compute consumption
Pattern: SaaS customer isolation
Logical overview
Distributed queries
Azure Data Warehouse: where the smarts happen
Interactive and Exploration
Interactive and Dashboards
Modern Data Lifecycle (recap)
- Ingestion: Event Hubs, IoT Hub, Service Bus, Kafka
- Processing: HDInsight, ADLA, Storm, Spark, Stream Analytics
- Staging: ADLS, Azure Storage, Azure SQL DB
- Serving: ADLS, Azure DW, Azure SQL DB, HBase, Cassandra, Azure Storage, Power BI
- Enrichment and Curation: Azure Data Factory, Azure ML
We want you to leave here understanding:
- Key concepts for modern database architectures
- Database / datastore types
- Reasons to go explore