DATA VAULT MODELING GUIDE - The Hans Blog DATA .

3y ago
21 Views
2 Downloads
780.23 KB
16 Pages
Last View : 1m ago
Last Download : 3m ago
Upload by : Allyson Cromer
Transcription

DATA VAULT MODELING GUIDEIntroductory Guide to Data Vault ModelingGENESEE ACADEMY, LLC2012Authored by: Hans Hultgren

DATA VAULT MODELING GUIDEIntroductory Guide to Data Vault ModelingF o rwardData Vault modeling is most compelling when applied to an enterprise data warehouseprogram (EDW). Several key decisions concerning the type of program, related projects, andthe scope of the broader initiative are then answered by this designation. In short, theorganization contemplating this initiative is committing to an integrated, non-volatile, timevariant and business key driven data warehouse program.The data vault principles are specifically well suited for such a program and – when appliedconsistently – can provide the organization with some very compelling benefits. Theseinclude auditability, agility, adaptability, alignment with the business, and support foroperational data warehousing initiatives.DATA VAULT MODELING GUIDE 5/15/2012To gain these benefits however, the organization will need to commit to both EDW programlevel factors as well as specific data vault modeling patterns, rules and methods. This guidepresents data vault modeling in the context of the EDW.1

I nde xDATA VAULT MODELING GUIDE 5/15/2012FORWARD.1INDEX .2THE EDW P ROGRAM .3THE DATA VAULT FUNDAMENTALS.4MODELING WITH THE DATA VAULT.6THINK DIFFERENTLY.7THE BUSINESS KEY .9BUSINESS KEY ALIGNMENT. 10ARCHITECTURE . 12SAMPLE DATA VAULT MODEL. 13HYBRID TABLES . 14APPLYING THE DATA VAULT . 14FINAL NOTE . 152

The EDW P ro gramThe Enterprise Data Warehousing (EDW) Program represents the ongoing data warehousingactivities of the organization. These activities will include the maintenance functions of thedata warehouse in addition to the continuous flow of incremental projects related to theenterprise data warehouse. These incremental projects are comprised ofa) Adapting to new data sources from internal new systems, external integrations,and from acquisitions, andb) Absorbing changes to existing sources including new tables, new attributes, newdomain values, new formats and new rules, andc) Adapting to new business rules concerning the alignment, grain, cardinality anddomain values of business keys as well as changes to the relationships betweenthem, andd) Accommodating new downstream delivery requirements including new subjectareas, new business rules, additional regulatory and other compliance reportingand changes to operational latency requirements.DATA VAULT MODELING GUIDE 5/15/2012For this reason, the EDW itself is not designated a “project” (there is no discernablebeginning and end, and no pre-determined set of specific goals).3In a broader sense, this program can be defined as the BI Function or BI Program within anorganization. To be clear however, this is not simply the group that owns the OLAP tools.This is the higher level view of all data warehousing and business intelligence (DWBI) withinthe organization which includes the business intelligence competency center or BICC, theEDW or CDW team, the related governance components and the environment both technicaland organizational. The success of a DWBI program depends on an organizationalcommitment and a corporate BI culture.It is precisely in this context where the data vault approach is the most valuable. So theData Vault EDW is defined first and foremost by the enterprise wide, long term DWBIprogram – from a technical architecture perspective and from an organizational culturalalignment perspective as well.

The Dat a Va ult F undamentalsThe data vault consists of three core components, the Hub, Link and Satellite. Above allother DV Program rules and factors, the commitment to the consistency and integrity ofthese constructs is paramount to a successful DV Program.The Hub represents a Core Business Concept such as Customer, Vendor, Sale or Product.The Hub table is formed around the Business Key of this concept and is established the firsttime a new instance of that business key is introduced to the EDW. It may require a multiplepart key to assure an enterprise wide unique key however the cardinality of the Hub must be1:1 with a single instance of the business concept. The Hub contains no descriptiveinformation and contains no FKs. The Hub consists of the business key only, with awarehouse machine sequence id, a load date/time stamp and a record source.Fig. 1 Hu bFig. 2 Li n kNotice the similarity between the Hub and the Link. Both represent the first time a corebusiness concept (Hub) or natural business relationship (Link) is introduced to the DW.DATA VAULT MODELING GUIDE 5/15/2012A Link represents a natural business relationships between business keys and isestablished the first time this new unique association is presented to the EDW. It canrepresent an association between several Hubs and sometimes other Links. It doesmaintain a 1:1 relationship with the unique and specific business defined associationbetween that set of keys. Just like the Hub, it contains no descriptive information. The Linkconsists of the sequence ids from the Hubs and Links that it is relating only, with awarehouse machine sequence id, a load date/time stamp and a record source.4

The Satellite contains the descriptive information (context) for a business key. There canbe several Satellites used to describe a single business key (or association of keys) howevera Satellite can only describe one key (Hub or a Link). There is a good amount of flexibilityafforded the modelers in how they design and build Satellites. Common approaches includeusing the subject area, rate of change, source system, or type of data to split out contextand design the Satellites. The Satellite is keyed by the sequence id from the Hub or Link towhich it is attached plus the date/time stamp to form a two part key.Note that the Satellite then is the only construct that manages time slice data (datawarehouse historical tracking of values over time).Fig. 3 Sat el l i t eA Satellite does not have a Sequence ID of its own and in fact cannot have a different keythan the Hub or Link sequence to which it is attached. Further, a Satellite does not have anyforeign key constraints (no snow-flaking, branching or bridging).DATA VAULT MODELING GUIDE 5/15/2012These three constructs are the building blocks for the DV EDW. Together they can be usedto represent all integrated data from the organization. The Hubs are the business keys, theLinks represent all relationships and the Satellites provide all the context and changes overtime.5Fig. 4 D at a Vau l t M odel

When we look at the Hub and Link together, they form the backbone or “Skeletal Structure”of the model. This backbone model represents a 1:1 relationship with core BusinessConcepts and their natural business relationships.Fig. 5 Bac kbon e or Skel et al St ru c t u reNote that all context (descriptive information) and all history are found in the Satellites.Mo de ling w it h t he Da ta Va ultThe process of modeling with the Data Vault is closely aligned with business analysis. Thefirst step is to identify the Hubs for the given subject area. Once the Hubs are defined wenext model the natural business relationships between these Hubs. Then we design andadd the Satellites to provide context to these constructs.TASK1.1Identify Business Concepts1.2Establish EWBK for Hubs1.3Model Hubs2.1Identify Natural Business Relationships2.2Analyze Relationships Unit of Work2.3Model Links3.1Gather Context Attributes to Define Keys3.2Establish Criteria & Design Satellites3.3Model SatellitesFig. 6 St eps t o model i n g wi t h D at a Vau l tThis process is not concerned with separating facts from dimensions, or from separatingmaster entities from events or transactions. The focus is squarely on core businessDATA VAULT MODELING GUIDE 5/15/2012STEP6

concepts – and their unique business keys. In that regard, all of the above are candidatesfor Hubs. For example events including transactions are modeled as Hubs.Think Dif ferentlyModeling with Data Vault requires us to think differently. Most of us first learned 3NFmodeling for operational databases. To manage third normal form, all attributes in an Entitymust depend directly on the key of that Entity. So the context attributesthat describe a customer (last name, first name, address, city, state,postal code, home phone, mobile phone, etc.) must be placed in theCustomer Entity where the key uniquely identifies an instance of acustomer. If we included attributes that do not depend on the key of thatentity then we would not be in 3rd normal form. Likewise if we placedsome of the attributes that depend on that key into another entity thenagain we would no longer be in 3rd normal form.At some point we may have also learned how to model using dimensionalmodeling techniques. Though different modeling constructs and otherrules for modeling, the concept of including context attributes inside atable with a key for those attributes remains the same. A ConformedDimension requires that context attributes depend on the key of thatDimension. Again if we move out attributes depending on a dimension keyto some other construct then we no longer have a conformed dimension.DATA VAULT MODELING GUIDE 5/15/2012Shown here is a Customer Entity in 3NF where we can see the BusinessKey (Customer Code), the relationship (Customer Class SID) and all theFig. 7 3NFcontext in the form of all remaining attributes in the table. Notice that this is one tableincluding all of this these components.7Fig. 8 D at a Vault CustomerCu s t omerWith Data Vault modeling we separatethe business keys from the relationshipsfrom the context. All of the businesskeys are modeled as Hubs, allrelationships and associations aremodeled as Links, and all context andhistory is provided for through theSatellites. Shown here we can see thatthe Business Key (Customer Code) is ina Hub (H Customer), the relationship(Customer Class ID) is in a Link(L Customer Cust Class), and thecontext is modeled in several Satellites.

Look back to the 3NF model and now consider that all of the same information (the samecomponents of data) about “Customer” are represented fully in both models. Interestinglyboth models represent a dependency on a single business key. Actually if we draw circlesaround each of these models we can see that what is inside each circle is a representationof the same single business key, the same set of attributes and the same relationship.Notice here that the reaching out from the “Customer”to the Customer Class is modeled through arelationship with a FK inside the 3NF circle.The same is true for the DVcircle in that reaching out from“Customer” to Customer Class ismodeled though a relationship(Link) with a FK in that Link andon the perimeter of the circle.Fig. 9 3NF M odelThis means that a) all things in eithercircle are dependent on a singlebusiness key, b) relationships passthrough the circle directly from tablewith the BK, c) the only grain shift ineither circle is based on Date/Timestamp for the purpose of trackinghistory.Fig. 10 D at a Vau l t M odelHINT: As you progress with Data Vault Modeling, this view of thinking differently willbecome more and more important. We tend to see tables the same way we havealways see them. For this reason, we tend to re-combine keys with relationships withcontext. But as soon as you do, you actually stop vaulting and return to other formsof modeling. So before you change the grain of a satellite, include a relationship FKin a Satellite or Hub, please consider the above circles analysis and reconsider.DATA VAULT MODELING GUIDE 5/15/2012It is important to think of the DVcircle in the same way as the 3NFcircle.8

The Bus iness K e yAt the core of the Data Vault is the Hub which we refer to as the business key. Perhaps themost important initial step in modeling a DV EDW is to identify and thoughtfully design thesebusiness keys. To begin with, a Business Key is representative of the core business entitylike “customer” or “product” for example. In addition, the BK also represents event basedkeys such as “sale” or “transfer”. In this way, the design process for the Data Vault does notconcern itself with the differences between the person/place/thing type entities and theevent type entities. To put this another way, we are not concerned with differentiatingDimensions from Facts but rather are focusing on identifying Business Keys which canrepresent either.This approach is then different from traditional approaches for modeling operationalsystems or data marts. The closest comparison would be to consider our efforts in definingMaster Data elements for an MDM initiative. In this case as well, the focus is on the coreterms used in managing the business.Since the DV Program is organizational in scope, the business keys should also strive to bemeaningful across the enterprise. So our quest for these keys should result in EnterpriseWide Business Keys (EWBKs). Note also that the keys arriving from specific source systemsare typically not fully aligned with these EWBKs. For this reason, we do not place too muchemphasis on the keys represented in any particular source system.NOTE: Since we are typically dealing with hundreds of sources, each commonly subject toupdates and changes, we should not plan to model our EDW using keys driven by a subset ofthese source systems.DATA VAULT MODELING GUIDE 5/15/2012The process of identifying and modeling these EWBKs is then closer to a businessrequirements gathering process than a source system analysis. Balancing the various inputfactors, with an emphasis on the business versus the technical, effectively summarizes thebest practices for this process.9Fig. 11 In put factors for designing the DV EDW Business Keys

The primary inputs for this process include:First the business process designs, business interviews, existing data marts,business metadata, process metadata, a business glossary if is exists, semanticmodels, logical models, information models and the master data managementartifacts (if existing) and the industry reference model (to the extent certaincomponents are aligned) and thenSecond the technical designs, source databases, source metadata, applicationsystem (guides, manuals, designs) and actual source system data.Note that the EWBK should be a key that transcends time and withstands the replacementof any specific source system. The source system keys will then require some form ofalignment to match up with their related target EWBKs. This alignment will often be at oddswith the fully raw and auditable characteristics of persisted source system loads. In thepast we have either resolved this alignment on the way to the marts, or more commonly,created a cleansed “gold” record within the four walls of the data warehouse itself. Theformer solution leads to silos and anomalies while the latter can compromise auditabilityand user acceptance.Bus iness K e y AlignmentFor the one side we know that we cannot rely on leaving the raw details only in the stagingarea or in our archives. We do need to have all data loaded into the EDW to be a trueauditable “mirror” of the sources. See the bottom left “Raw Keys” in figure 5 below.Fig. 12 D V E DW Key AlignmentDATA VAULT MODELING GUIDE 5/15/2012Because the DV EDW absorbs all data all of the time and maintains full traceability back tosource system feeds, the data warehouse must not lose resolution on these auditablesystems of record. At the same time integration around the business key – the EWBK – is acore function of the DV EDW. So the EDW today has a built-in challenge related to dataintegration – the alignment of the Business Keys (enterprise-wide) with the Raw & Auditablecomponents of the Data Vault.10

On the other side we know that the enterprise data warehouse (EDW) must be aligned withthe organizational view of the business keys/terms (EWBKs). See the top right “BusinessKeys” in figure 5.The integration of these Raw keys with the EWBKs represents a core function of the EDWtoday. In effect, we have been boxed in by the upstream requirements (build a DW thatincludes all data at the atomic level and with full traceability) and the downstreamrequirements (align the Business Keys with the organization at the enterprise level usingbusiness terminology).NOTE: We cannot rely on having these transformations happen in the Mart Staging or DataMart layers as a) the Mart Staging is not intended to be persisted and b) the Data Marts aredepartmental in scope (not Enterprise Wide). Not sharing this business key alignmentthrough the EDW will result in a failure to integrate around the true business keys and willresult in the downstream inconsistencies common to data silos.The extent to which the sources are already aligned with the EWBKs will determine thescope of integration and alignment that must occur in the DV EDW. See figure 6 below.Fig. 13 D V E D W Sc ope of Key A l i gn men tDATA VAULT MODELING GUIDE 5/15/2012The alignment of these keys is facilitated through Links. As can be seen in figure 5 on theprevious page, the Raw Key of “Person” is aligned with the Business Key of “Customer”using a link structure.11Note that the naming conventions are not adequate in and of themselves to warrant theseparation of Raw and Business key Hubs. If in fact Person and Customer meant the exactsame thing to the business and were in fact true business term synonyms, then the rawsystem load of Person records could populate the Customer Hub directly. However, in thiscase the Person Raw key does in fact mean something different than the CustomerBusiness key. In this case we assume that there are business rules at play – for example aPerson record is determined to be a Customer if they were involved in a Sale transaction,there was a non-zero purchase price, the transaction was successfully completed, and theSale was not cancelled. As you can see here, the raw auditable load is to the Person andthe business aligned load is to the Customer.Those tables that are loaded using this type of business processing must be identified as“sysgen” record source records (generated by us through a business rule driven process).

These components of the DV EDW are often referred to as business data warehouse (BDW)or business data vault (BDV) components.NOTE: Business logic in the BDW or BDV can take many forms and can relate to many typesof transformations. The logic specifically targeting the alignment of raw and business keys isa subset of this area and often referred to as t

afforded the modelers in how they design and build Satellites. Common approaches include using the subject area, rate of change, source system, or type of data to split out context and design the Satellites. The Satellite is keyed by the sequence id from the Hub or Link to which it is attached plus the date/time stamp to form a two part key.

Related Documents:

Published by Vault, Inc. o Vault Guide to Leveraged Finance. Published by Vault, Inc. o Vault Career Guide to Private Wealth Management. Published by Vault, Inc. o Vault Guide to Sales and Trading. Published by Vault, Inc. o Vault Guide to the Top 50 Banking Employers. Published by Vault, Inc. o Vault Guide

May 02, 2018 · D. Program Evaluation ͟The organization has provided a description of the framework for how each program will be evaluated. The framework should include all the elements below: ͟The evaluation methods are cost-effective for the organization ͟Quantitative and qualitative data is being collected (at Basics tier, data collection must have begun)

Silat is a combative art of self-defense and survival rooted from Matay archipelago. It was traced at thé early of Langkasuka Kingdom (2nd century CE) till thé reign of Melaka (Malaysia) Sultanate era (13th century). Silat has now evolved to become part of social culture and tradition with thé appearance of a fine physical and spiritual .

5.2.2.1 Oracle Key Vault Program with Batching 5-4 5.2.2.2 Detailed Oracle Key Vault Program 5-6 5.3 Oracle Key Vault Program Environment 5-8 5.4 Oracle Key Vault Program Connection 5-9 5.5 Oracle Key Vault Program Session 5-9. Part II . Oracle Key Vault Client C SDK API Reference. 6 . Oracle Key Vault Datatypes and Structures

On an exceptional basis, Member States may request UNESCO to provide thé candidates with access to thé platform so they can complète thé form by themselves. Thèse requests must be addressed to esd rize unesco. or by 15 A ril 2021 UNESCO will provide thé nomineewith accessto thé platform via their émail address.

̶The leading indicator of employee engagement is based on the quality of the relationship between employee and supervisor Empower your managers! ̶Help them understand the impact on the organization ̶Share important changes, plan options, tasks, and deadlines ̶Provide key messages and talking points ̶Prepare them to answer employee questions

Dr. Sunita Bharatwal** Dr. Pawan Garga*** Abstract Customer satisfaction is derived from thè functionalities and values, a product or Service can provide. The current study aims to segregate thè dimensions of ordine Service quality and gather insights on its impact on web shopping. The trends of purchases have

Vault Cache and Virtual Vault Vault Cache and Virtual Vault are built-in features of Enterprise Vault focused on making the end user experience as seamless as possible. Vault Cache provides users with an intuitive, fast way to access their archived email whether connected to the network or not, by caching a copy of archived messages locally.