Data Warehouse And OLAPData Warehouse And OLAP

1y ago
13 Views
2 Downloads
1.14 MB
31 Pages
Last View : 3m ago
Last Download : 3m ago
Upload by : Isobel Thacker
Transcription

Data Warehouse and OLAP Week 5 1

Midterm I Friday, March 4 Scope – Homework assignments 1 – 4 – Open book

Team Homework Assignment #7 R Read d pp. 121 – 139, 139 146 – 150 off the h text book. b k Do Examples 3.8, 3.10 and Exercise 3.4 (b) and (c). Prepare for the results of the homework assignment. Due date – beginning of the lecture on Friday March11th.

Topics Definition of data warehouse Multidimensional data model Data warehouse architecture From data warehousing to data mining

What is Data Warehouse? (1) A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and that usually resides at a single site A data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise need to make strategic decisions

What is Data Warehouse? (2) Data warehouses provide on on‐line line analytical processing (OLAP) tools for the interactive analysis of multidimensional data of varied granularities, which facilitate effective data generalization and data mining Many other data mining functions, such as association, classification, prediction, and clustering, can be b integrated i t t d with ith OLAP operations ti tto enhance h interactive mining of knowledge at multiple levels of abstraction

What is Data Warehouse? (3) A decision support database that is maintained separately from the organization’s operational database “A data warehouse is a subject‐oriented, integrated, time‐variant, and nonvolatile collection of data in support of management’s decision‐making process [Inm96].”—W. H. Inmon

Data Warehouse Framework data mining Figure 1.7 Typical framework of a data warehouse for AllElectronics 8

Data Warehouse is S bj Subject-Oriented Oi d Organized around major subjects, such as customer, product, sales, etc. Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process

Data Warehouse is I t Integrated t d Constructed by integrating multiple, heterogeneous data sources – relational databases, flat files, on‐line transaction records Data cleaning and data integration techniques are applied. applied – Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources E.g., Hotel price: currency, tax, breakfast covered, etc.

D t W Data Warehouse h i Time is Ti Variant V i t The time horizon for the data warehouse is significantly longer g than that of operational p systems y – Operational database: current value data – Data warehouse data: provide information from a historical perspective (e.g., past 5‐10 years) Every key structure in the data warehouse – Contains an element of time, explicitly or implicitly

D t W Data Warehouse h i Nonvolatile is N l til A physically separate store of data transformed from the operational environment Operational update of data does not occur in the data warehouse environment – Does not require transaction processing, recovery, and concurrency control mechanisms – Requires R i only l two t operations ti i d in data t accessing: i initial loading of data and access of data

OLTP vs. vs OLAP Table 3.1 Comparison between OLTP and OLAP 13

Why Separate is Data Warehouse Needed? (1) Why not perform on‐line analytical processing directly on operational databases instead of spending additional time and resources to construct a separate data warehouse?

Why Separate is Data Warehouse N d d? (2) Needed? ( ) High g p performance for both systems y – DBMS— tuned for OLTP: searching for particular records, indexing, hashing, concurrency control, recovery – Warehouse—tuned for OLAP: complex OLAP queries, multidimensional view, consolidation (summarization and aggregation) ti )

Topics Definition of data warehouse Multidimensional data model Data warehouse architecture From data warehousing to data mining

From Tables and Spreadsheets to D C Data Cubes b A data warehouse is based on a multidimensional data model This model views data in the form of a data cube A data cube allows data to be modeled and viewed in multiple dimensions

From Tables and Spreadsheets to Data C b (1) Cubes A data cube is defined by facts and dimensions – Facts are data which data warehouse focus on Fact tables contain numeric measures (such as dollars sold) and keys to each of the related dimension tables – Dimensions are perspectives with respect to fact Dimension tables describe the dimension with attributes. For example, item (item name, brand, type), or time(day, time(day week week, month, month quarter, quarter year)

Figure 1.6. Fra agments of o relations from m a relatio onal datab base for AlllElectronic cs 19

From Tables and Spreadsheets t Data to D t Cubes C b (2) dimensions Facts (numerical measures) Table 3.2 A 2-D view of sales data for AllElectronics according to the di dimensions i ti time and d item, it where h the th sales l are from f branches b h located l t d in i the city of Vancouver. The measure displayed is dollar sold (in thousands). 20

From Tables and Spreadsheets t Data to D t Cubes C b (3) Table 3.3 A 3-D view of sales data for AllElectronics according to the dimensions time, item, and location. The measure displayed is dollar sold (in thousands). 21

From Tables and Spreadsheets t Data to D t Cubes C b (4) Figure 3.1 A 3-D data cube representation of the data in Table 3.3, according to the dimensions time, item, and location. The measure displayed is dollar sold (in thousands). 22

From Tables and Spreadsheets t Data to D t Cubes C b (5) Figure 3.2 A 4-D data cube representation, according to the dimensions time, item, location, and supplier. The measure displayed is dollar sold dollar sold (in thousands). 23

Cuboid A data cube is a lattice of cuboids The total number of cuboids The apex cuboid The base cuboid 24

Figure g 3.14 Lattice of cuboids, making g up p a 3-D data cube. Each cuboid represents a different group-by. The base cuboid contains the three dimensions city, item, and year. 25

The Curse of Dimensionality How many cuboids are there in a n‐dimensional data cube? How many cuboids are there in a n‐dimensional n dimensional data cube and each dimension (i) has the number of level, (Li)? 26

Conceptual Modeling of Data W h Warehouses M Modeling d li data d t warehouses: h di dimensions i & measures – Star schema: A fact table in the middle connected to a set of dimension tables – Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake – Fact F t constellations: t ll ti M lti l fact Multiple f t tables t bl share h di dimension i tables, viewed as a collection of stars, therefore called ggalaxyy schema or fact constellation

Star Schema time time keyy day day of the week month quarter q year item Sales Fact Table time key time key item key item key item name brand type supplier type branch key branch branch key branch name branch type location key dollars sold unit sold location location key street cit city province or street country Figure 3.4 Star schema of a data warehouse for sales. 28

Snowflake Schema time time key day day of the week month quarter year supplier item Sales Fact Table time key item key item key item name brand type supplier key supplier key supplier key supplier type branch key location key location key branch branch key branch name branch type b h t dollars sold units sold location location key street city city city key city province or street country Figure 3.4 Snowflake schema of a data warehouse for sales. 29

Fact Constellation Shipping Fact Table time time key day day of the week month quarter year item key Sales Fact Table time key item key k branch key item item key item name brand type supplier type branch key branch name branch type branch type dollars sold unit sold shipper key from location to location dollars sold unit shipped pp location key location key branch time key ti k location location key street city shipper province or street country shipper hi shipper key shipper name location key location key shipper type Figure 3.5 Fact constellation schema of a data warehouse for sales and shipping. 30

Exercise Exercise 3 3.5 5 (a) – page 153 31

The time horizon for the data warehouse is significantly longer than that of operational systems - Operational database: current value data - Data warehouse data: provide information from a historical perspective (e.g., past 5‐10 years) Every key structure in the data warehouse

Related Documents:

Management under Master Data Define Warehouse Numbers. 2. Check the warehouse number assignment in Customizing for Extended Warehouse Management under Master Data Assign Warehouse Numbers. 3. Check the warehouse number control in Customizing for Extended Warehouse Management under Master Data Define Warehouse Number Control.

1.3 Common Data Warehouse Tasks 1-4 1.4 Data Warehouse Architectures 1-5 1.4.1 Data Warehouse Architecture: Basic 1-5 1.4.2 Data Warehouse Architecture: with a Staging Area 1-6 1.4.3 Data Warehouse Architecture: with a Staging Area and Data Marts 1-6 2 Data Warehousing Logical Design 2.1 Logical Versus Physical Design in Data Warehouses 2-1

location: fort worth, tx warehouse status: approved county: tarrant warehouse capacity: 85,000 warehouse code: 853007 001 location(s) warehouse name: eugene b smith & company , inc license type: unlicensed location: galveston, tx warehouse status: approved county: galveston warehouse capacity: 37,180 warehouse code: 858054 001 location(s)

location: fort worth, tx warehouse status: approved county: tarrant warehouse capacity: 85,000 warehouse code: 853007 001 location(s) warehouse name: eugene b smith & company , inc license type: unlicensed location: galveston, tx warehouse status: approved county: galveston warehouse capacity: 37,180 warehouse code: 858054 001 location(s)

Inventory data Warehouse Outgoing Inventory IoT Cloud gathers warehouse inventory data from Warehouse IoT Cloud gathers dispatched inventory data from Warehouse . Based on the warehouse floor design, budget, type of industry and materials , suitable option or combination of options possible to choose.

The following table maps standard data-warehouse concepts to those in BigQuery: Data warehouse BigQuery Data warehouse The BigQuery service replaces the typical hardware setup for a traditional data warehouse. That is, it serves as a collective home for all analytical data in an organization. Data mart Datasets are collections of tables that .

business value of a data warehouse is for the business owners. In Section 2.3 we discuss different data models and the major building blocks in a data warehouse. In Section 2.4 we discuss different operations required to implement a data warehouse. In Section 2.5 we discuss how we can use the data warehouse for reporting, and we summarize in Sec-

Accretion in Astrophysics: Theory and Applications Solutions to Problem Set I (Ph. Podsiadlowski, SS10) 1 Luminosity of a Shakura-Sunyaev (SS) Disk In lecture we derived the following expression for the effective temperature, Te ff as a function of radial distance from the central compact star: Teff " 3GMM 8πσr3 #1/4 1 q r0/r 1/4 where σ is the Stefan-Boltzmann constant. a.) The .