Informatica MDM Match Best Practices

2y ago

22 Views

2 Downloads

969.74 KB

29 Pages

Last View : 27d ago

Last Download : 3m ago

Upload by : Vicente Bone

Report this link

Download PDF

Transcription

Informatica MDMMatch best practices Anuvinda KulkarniLead Support Engineer, MDM GCS Team

Housekeeping Tips Today’s Webinar is scheduled for 1 hour The session will include a webcast and then your questions will be answered live at the end of the presentation All dial-in participants will be muted to enable the speakers to present without interruption Questions can be submitted to “All Panelists" via the Q&A option and we will respond at the end of the presentation The webinar is being recorded and will be available to view on our INFASupport YouTube channel and Success Portal.The link will be emailed as well. Please take time to complete the post-webinar survey and provide your feedback and suggestions for upcoming topics.2 Informatica. Proprietary and Confidential.

Feature Rich Success PortalBootstrap trial andPOC CustomersEnriched CustomerOnboardingexperience Informatica. Proprietary and Confidential.Product LearningPaths and WeeklyExpert SessionsInformaticaConcierge withChatbot integrationsTailored training andcontentrecommendations

More InformationSuccess Portalhttps://success.informatica.com4Communities &Supporthttps://network.informatica.com Informatica. Proprietary and ces-and-training/informaticauniversity.html

Safe HarborThe information being provided today is for informational purposes only. Thedevelopment, release, and timing of any Informatica product or functionalitydescribed today remain at the sole discretion of Informatica and should not berelied upon in making a purchasing decision.Statements made today are based on currently available information, which issubject to change. Such statements should not be relied upon as arepresentation, warranty or commitment to deliver specific products orfunctionality in the future.5 Informatica. Proprietary and Confidential.

Agenda6 Introduction to MDM matching Walk through an example of match rules setup Match rules setup and tuning phases Phase 1: Data discovery and analysis Phase 2: Define Fuzzy Match Key, Key Width, Match Paths, Match Columns Phase 3: Setup match rules: do’s and don'ts Phase 4: A dry run of the match job using draft rules; review match results Phase 5: Tune match rules with exact columns Phase 6: Review final match results Tuning Process Server and Base Object properties Tuning cmxcleanse.properties Tuning the database Q&A Informatica. Proprietary and Confidential.

Introduction to MDM matching The match process helps consolidate records coming from multiple sources. There are 2 ways to do so: Batch API (SOAP - searchMatch, BES REST - action match) Two types of matching: Fuzzy ExactKey Level & Search LevelOptional match ruleset filter & Exact match columnsMatch Purpose & Match Levels7 Informatica. Proprietary and Confidential.

Walk through an example of match rules setupConsider the following data model:Expected match results:8 Informatica. Proprietary and Confidential.

Match rules setup and tuning phases9 Informatica. Proprietary and Confidential.

Phase 1: Data discovery and analysis Auditing Get a reasonable-sized sample of data that best represents real or production-like data Understand what needs to be considered for matching Identify fields that will contribute to the match process, including Fuzzy Match Key Quality and Profiling Ensure data completeness (e.g. Person records have both First Name and Last Name) Ensure data accuracy (e.g. gender field has only gender values) Use tools like Informatica Data Profiler, pattern analysis (SQL queries) Standardization Standardize/format data as much as possible (e.g. Junior to JR, case & trim for exact fields) Avoid non-ASCII characters Use data quality tools such as MDM cleanse functions or Informatica Data Quality Use an address cleansing tool to standardize and clean addresses10 Informatica. Proprietary and Confidential.

Phase 2. Define Fuzzy Match Key Any one of the following can be defined. Multiple fuzzy match keys are not supportedFuzzy Match KeyUsagePerson NameData contains only IndividualsOrganization NameData contains only Organizations, or if data contains both Individuals and OrganizationsAddress Part1Data has addresses that need to be consolidated First Name-only or Last Name-only fuzzy key (Person Name) can cause high number of candidates causing performance impact NULLs in fuzzy match key column produce null keys (K under SSA KEY). They are potential candidates for each other Initials in First or Last Name can cause high number of candidates Irrelevant “noise words” in fuzzy match key column produce null keys11 Informatica. Proprietary and Confidential.

Phase 2. Define Key Width How big is the dataset? How important is match quality VS performance? Wider key has higher chance of finding a search candidate, but it will lower the overall performance Limited - Tradeoff between match quality and disk space. May cause fewer match candidates but faster searches.Use if disk space is limited or if data volume is extremely large Preferred - Single key per BO record. Might result in fewer match candidates Standard - Most appropriate; balances reliability and space usage Extended - Might result in more match candidates at the cost of longer processing time togenerate keys. Works best: Data set not extremely large Data set not complete Sufficient resources are available (disk space) For e.g., the SSA keys for names ‘ASHLEY ROSENBERG’ and ‘ASHLEY ROSEN BERG’ fallwithin the same SSA range for each of the width types, so they are possible matchcandidates for each other12 Informatica. Proprietary and Confidential.

Phase 2. Define Match Paths and Match Columns Enable ‘Allow missing child records’ helps matching on parent recordsthat do not have child records in the child base object.13 Informatica. Proprietary and Confidential.

Phase 3. Setup Match rules: Match LevelTypicalConservativeLooseAppropriate for most matchesTighter than Typical, causing undermatchingMore matches than Typical, causing overmatching. Good to use this in a match rule for manual merges SSA Workbench tool, available as part of MDM Resource Kit, helps decide on the appropriate match level Demo to look at how records “Arim Bose” matches with “Arim Gore” along with their addresses, using different match levels and theirscores U(Undecided)/R(Rejected) are considered as rejected matches in MDM14 Informatica. Proprietary and Confidential.

Phase 3. Setup Match rules: Search LevelNarrow Most stringent, faster, undermatching Correct and complete datasets andhighly matchy datasetsTypical Apt for most match rulesets15 Informatica. Proprietary and Confidential.Exhaustive More match candidates than Typical,more time, overmatching Smaller, less complete, less reliabledatasetsExtreme More match candidates than Exhaustive,much more time, overmatching Datasets that are even less reliable andless complete

Phase 3. Setup Match rules: Match Purpose For data with both Organizations and Individuals, use appropriate match purpose based on the party/customer type If there is no customer type indicator, you can use Organization. Or use Division as match purpose for mixed data types If you are trying to identify matches for people where address is important to determine if two records are for the same person, you can useResident match purpose Different match purposes available:16 Informatica. Proprietary and Confidential.

Phase 3. Setup Match rules:– do’s and don'ts Start with rules that will provide the tightest matches Fuzzy match rules are evaluated first, followed by exact match rules For each fuzzy match rule, exact columns are evaluated first. Use exact match columns when you can. Saves fuzzy calls made to SSA Exact match rules are processed almost exclusively on the database. If database performance is not sufficient, convert them to Filteredmatch rules. Comes with trade-off between match quality and performance Run SQL queries on exact match columns to find rough estimate of potential candidates returned Loose filters will pass more potential candidates to SSA, creating more work and decreasing performance.Examples of tight filters – Id, Date Of Birth, Postal Code. Loose filters – City, State Avoid subtype match; makes multiple SSA calls for each type. Use a match path filter instead Use filter on root path filter to exclude records from match, instead of filtering on match rule level. Saves those records from beingtokenized and thus will not participate in match17 Informatica. Proprietary and Confidential.

Phase 4: A dry run of the match job using draft rules Avoid having tighter match rules during this phase. Below example has ex postalCode as exact This will give you a feel of how fuzzy name and address matches look like. Gives you an idea on the quality of matches.Helps assess any underlying data issues18 Informatica. Proprietary and Confidential.

Phase 4: Review match results from the dry run Match results: Run a query against MTCH table group by match rule; helps reviserules that have gained less matches Make a copy of MTCH table for each iteration Review undermatches VS overmatchesE.g. “Time Inc” did not match with “Time Warner Cable Inc” as theiraddresses are different Use SSA Workbench to know why certain records matched anddid not match SSA workbench tool also helps to make adjustments on accept limits To change accept limits in MDM:19 Informatica. Proprietary and Confidential.

Phase 5: Tune match rules with exact columns Introduce unique identifies (as exact match column) to further qualify matches and to further tighten the rules If there’s no unique identifier, then use exact column such as DateOfBirth Prevent performance issues by including at least one exact match column in each match rule Use several identical match rules with varying exact match columns20 Informatica. Proprietary and Confidential.

Phase 6: Review match results What to review? STRP table If there’s a large set of data (outliers; for e.g. records more than 50K) residing between a set of SSA keysSELECT DISTINCT ROWID OBJECT, DATA COUNT,SSA DATA, DATA ROW FROM C PARTY STRP WHERE SSA KEY BETWEEN‘YBJ ’ AND ‘YBLVZZZZ’ AND INVALID IND 0 ORDER BY ROWID OBJECT, DATA ROW Match key distribution21 Informatica. Proprietary and Confidential.

Phase 6: Review final match results (continued) Review cleanse server logRanger5 Matching TCan:167038393 Tgr:167038393 TSSA:98428393 TM:0 TR:1 Cur RI:1800219 Cur Range:YBJ to YBLVZZZZ CompsPerRange:167043000Ranger7 Matching TCan:173858958 Tgr:173858958 TSSA:104410228 TM:14 TR:1 Cur RI:1802487 Cur Range:YBJ to YBLVZZZZ CompsPerRange:173862000[RangerManger] [INFO ] com.siperian.mrm.util.threads.ThreadMonitor: RangerProducer Candidates Read:2020666[RangerManger] [INFO ] com.siperian.mrm.util.threads.ThreadMonitor: MatchGatherer received 6544[RangerManger] [INFO ] com.siperian.mrm.util.threads.ThreadMonitor: RangeSorter Sorting: Recs in:2,020,666 with 13,621,427 ranges.SortManager: Ranges in: 13,621,427 Sorted Ranges out: 81,000 file Count: 137 Sort Count: 1263[RangerManger] [INFO ] com.siperian.mrm.util.threads.ThreadMonitor: run minutes:778 Max minutes:2880What does this mean? It is processing 2,020,666 records and that those records produced 13,621,427 search ranges that need to be evaluated to complete thematching It has currently only processed 81,000 of the ranges yet. It has taken 778 minutes to do that Range: YBJ to YBLVZZZZ keeps appearing on the log and is a potential hotspot. The comparison count (CompsPerRange) is over167 million and counting This job will take a long time to complete. Maybe there’s a high frequency word (e.g. ‘Medical’) in the data within this range. Clean up thisdata22 Informatica. Proprietary and Confidential.

Phase 6: Review final match results (continued) Review if matches are slow Slow DB readRanger0 Matching TCan:156020763 Tgr:156020763 TSSA:2188740 TM:2165385 TR:186577 Cur RI:7511404 Cur Range:YKMGBBQ to YKMGBBQ/CompsPerRange:160Ranger0 Matching TCan:160711852 Tgr:160711852 TSSA:2268773 TM:2244506 TR:193761 Cur RI:9600897 Cur Range:YKVA VA to YKVA VA/CompsPerRange:2196(156,020,763 – 160,711,852) 4,691,089 total number of candidates read from DB from one minute to another – a low count could indicate a potentialdatabase or network issue – expect millions High number of candidates going to SSA; poor exact match columns are usedTCan:70056898 Tgr:70056898 TSSA:1,023,821 TM:1012237 TR:88571 Cur RI:7502399 Cur Range:S M to S M/ZZZZ CompsPerRange:236474Note: How to track progress of a match job? KB - .aspx -- Helps determine theapproximate time taken by the job to run and complete eventually23 Informatica. Proprietary and Confidential.

Tuning Process Server and Base Object propertiesPropertyUsageThreads for Cleanse OperationsTo achieve parallelismNumber of rows per match job batch cycleStart with 10% of volume of records to be matched and adjust upwardsMaximum matches for manual consolidationIncrease it as needed to avoid match job failureMax Elapsed Match MinutesDefault is 20. Increase only if match rules and data is complexDynamic Match Analysis Threshold (DMAT)Helps improve performance when large ranges are causing it [2015-03-13 20:16:05,306] [RangerManger] [INFO ] com.siperian.mrm.util.threads.ThreadMonitor: Dist:Ranger4Matching TCan:56892103 Tgr:50659217 TSSA:12285983 TM:7230 TR:22165 Cur RI:100207398 CurRange:OG to OGZZZZZZ CompsPerRange:97999 .[2015-03-13 22:09:52,611] [RangerManger] [INFO ] com.siperian.mrm.util.threads.ThreadMonitor: Dist:Ranger4Matching TCan:82741526 Tgr:73413451 TSSA:21394992 TM:7250 TR:22165 Cur RI:99784575 CurRange:OG to OGZZZZZZ CompsPerRange:2594753825 Million comparisonsAnalyze the data to assess why a given search range contains a large count ; maybe matchy dataSetting the DMAT level too low may cause under matchingNote: Any DMAT changes on Production should be reviewed with Informatica GCS24 Informatica. Proprietary and Confidential.

Tuning .distributed matchSet to 1 to enable. Default is 0 (disabled)cmx.server.match.file loadSet to true to use an intermediate file to load data. Set to false for direct data load. Default is true for Oracleand IBM DB2 environments. Default is false for Microsoft SQL Server environmentscmx.server.match.loader batch sizeDefault is 1000, when file load is setMaximum number of insert statements to send to the database during direct load of the match process25 Informatica. Proprietary and Confidential.

Tuning the database Exact rules are converted to SQL queries based on exact match columns in the match rule and their matchpaths. Look for CREATE/INSERT for T MLE and T MT tables If you find the exact match query running slow, query related to T MLE or T MT Ensure all tables in the exact match query are analyzed Create index on one or more exact match columns26 Informatica. Proprietary and Confidential.

References MDM Fuzzy Match Deep Dive https://www.youtube.com/watch?v T6x24bMnP8&feature youtu.be How to configure SSA Workbench on MDM Resource Kit - https://youtu.be/Jp2gcFgE 5Q How to use SSA Name3 workbench - https://youtu.be/lLwTHA0SnY4 How to track progress of a Match job in MDM .aspx27 Informatica. Proprietary and Confidential.

28 Informatica. Proprietary and Confidential.

Thank You

Match rules setup and tuning phases Phase 1: Data discovery and analysis Phase 2: Define Fuzzy Match Key, Key Width, Match Paths, Match Columns . Use tools like Informatica Data Profiler, pattern analysis (SQL queries) . If database performance is not sufficient, convert them to . Filtered.

Related Documents:

Informatica PowerCenter (Version 9.1.0)

PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange, Informatica On Demand, Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Pro

82 Views

2y ago

U s e r G u i d e - Informatica

Jun 14, 2019 · Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange Informatica

25 Views

2y ago

Informatica PowerCenter (Version 9.0.1) - Data & Co

Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange and Informatica .

26 Views

2y ago

Getting Started - Informatica

39 Views

2y ago

Informatica Database Download Manager - 1.0.0 - User …

20 Views

2y ago

Informatica PowerCenter (Version 9.0.1)

32 Views

2y ago

1 0 . 1 I n f o r m a t i c a P o w e r C e n t e r

PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange Informatica On Demand, Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Processing, Ultra Messaging, . Informatica Master Data .

33 Views

1y ago

G u i d e I n s t a l l a t i o n a n d U p g r a d e - Informatica

Informatica Dynamic Data Masking Installation and Upgrade Guide . Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange Informatica On Demand, Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Processing, Ultra Messaging,

9 Views

1y ago

Recent Views

AUTOMOTIVE INDUSTRY ANALYSIS REPORT and GUIDE

3.1 General Outlook of the Automotive Industry in the World 7 3.2 Overview of the Automotive Industry in Turkey 10 3.3 Overview of the Automotive Industry in TR42 Region 12 4 Effects of COVID-19 Outbreak on the Automotive Industry 15 5 Trends Specific to the Automotive Industry 20 5.1 Special Trends in the Automotive Industry in the World 20

1y ago

86 Views

Automotive Pathway Automotive Services Fundamentals

Automotive Pathway Automotive Services Fundamentals Course Number: IT11 Prerequisite: None Aligned Industry Credential: S/P2- Safety and Pollution Prevention and SP2- Mechanical and Pollution Prevention Description: This course introduces automotive safety, basic automotive terminology, system & component identification, knowledge and int

2y ago

228 Views

Articulation Agreements: College of Applied Technologies .

Hernando High School FL Automotive . Central Nine Career Center IN Automotive Elkhart Area Career Center IN Automotive . Kokomo Area Career Center IN Automotive North Lawrence Vo-Tech IN MLR Porter County Career Center IN Automotive Richmond High School IN Automotive Southeastern Career

2y ago

376 Views

Automotive Basics - Auto Upkeep

Automotive Basics - Course Description "Automotive Basics includes knowledge of the basic automotive systems and the theory and principles of the components that make up each system and how to service these systems. Automotive Basics includes applicable safety and environmental rules and regulations. In Automotive Basics, students will gain

1y ago

197 Views

Automotive Automotive Automotive - HSBC Bank Malaysia

This Merchant list is subject to change from time to time. Merchant(s) who are terminated from the Instalment program after the published date might still be reflected in this list. HSBC Cardholder(s) are advised to confirm the availability of HSBC Card Instalment Plan with the merchant. Automotive Automotive Automotive

1y ago

173 Views

On the Road: U.S. Automotive Parts Industry Annual Assessment

Table 12: Acquisitions of U.S. Automotive Parts Companies (SIC 3714) Table 13: Automotive Parts Exports, 2000-2010 Table 14: Automotive Parts Imports, 2000-2010 . Automotive parts consumption is linked to the demand for new vehicles, since roughly 70 percent of U.S. automotive parts production is for Original Equipment (OE) products. .

10m ago

72 Views

EMC TEST SYSTEMS FOR AUTOMOTIVE

AUTOMOTIVE EMC TEST SYSTEMS FOR AUTOMOTIVE ELECTRONICS AUTOMOTIVE EMC TEST SYSTEMS FOR AUTOMOTIVE ELECTRONICS Step 1 Step 2 Step 3: Set the parameters Step 4: Active test. Load dump pulses have high pulse energy, which can be highly destructive to electrical or electronic equipment. The LD 200N series simulates these pulses with high energy in a range of up to 1.2 seconds. The LD 200N .

3y ago

266 Views

Automotive Manufacturing - Select Georgia

Jobs created by Georgia’s automotive-related locations Toyo Tire North America Manufacturing and expansions in the last three years 32,000 Automotive-related engineers and production workers in Georgia Sources: EMSI 2020.3, press releases and Automotive Database, Georgia Power Community & Economic Development, 2020 Automotive Manufacturing

2y ago

166 Views

#1 OSAT for Automotive Packaging and Test

We Know Automotive Amkor has extensive experience with automotive process requirements shipping billions of units every year for automotive applications. Our packages meet or exceed automotive quality, reliability, burn-in and safe launch plan criteria. Amkor also has failure analysis, tri-temp test and statistical process capability in all .

1y ago

145 Views

Ipsos Automotive Center of Excellence

Global Automotive Center of Excellence -2014 Ipsos Automotive 9 Automotive Center of Excellence As global automotive markets get more sophisticated, they require vehicle manufacturers to offer the most relevant market propositions to match consumer needs. There is greater value than ever before for a global research partner, who understands

1y ago

126 Views

All about automotive engineering in a pocketbook The 8th edition has .

Automotive Automotive Handbook Handbook All about automotive engineering in a pocketbook The 8th edition has been revised and extended. Automotive Handbook Reference handbook for academic and personal use. ISBN 978--7680-4851-3 Contents - central themes Basic principles: physics, materials, machine parts, joining and bonding techniques

1y ago

135 Views

Brochure: Advanced Flash Storage Solutions for Automotive Applications

iNAND Automotive Embedded Flash Drives (EFDs) are designed to support the harsh environments, high reliability and quality required by the automotive industry. The automotive iNAND product portfolio supports both UFS and e.MMC interfaces in a small 11.5x13mm package with a wide range of capacities to provide automotive OEMs and Tier-1

1y ago

161 Views

Industry Skills Forecast and Proposed Schedule of Work Automotive

Executive summary The Automotive Retail, Service and Repair (AUR) and Automotive Manufacturing (AUM) Training Packages are critical elements in the Vocational Education and Training (VET) system, playing central roles in the training of learners that engage in the automotive industries. A productive and valuable Automotive Training

1y ago

134 Views

Automotive Programs Student Handbook - SCCIowa

include a basic knowledge of all facets of the automotive repair industry, followed by classroom practice and drills of basic skills utilized in the automotive repair industry. The curriculum includes an internship experience in an automotive repair business. The curriculum is evaluated and revised as automotive repair needs change in the industry.

10m ago

72 Views

automotIve

automotive manufacturers worldwide. Those companies that take a forward-thinking approach will gain a competitive advantage and secure a leadership position in a realigned automotive value chain. At Seco, we partner with OEMs and other vehicle-based organisations around the globe to help automotive manufacturers overcome their

3y ago

145 Views

Informatica MDM Match Best Practices

It looks like you're using an ad-blocker