Informatica MDM Match Best Practices

2y ago
22 Views
2 Downloads
969.74 KB
29 Pages
Last View : 27d ago
Last Download : 3m ago
Upload by : Vicente Bone
Transcription

Informatica MDMMatch best practices Anuvinda KulkarniLead Support Engineer, MDM GCS Team

Housekeeping Tips Today’s Webinar is scheduled for 1 hour The session will include a webcast and then your questions will be answered live at the end of the presentation All dial-in participants will be muted to enable the speakers to present without interruption Questions can be submitted to “All Panelists" via the Q&A option and we will respond at the end of the presentation The webinar is being recorded and will be available to view on our INFASupport YouTube channel and Success Portal.The link will be emailed as well. Please take time to complete the post-webinar survey and provide your feedback and suggestions for upcoming topics.2 Informatica. Proprietary and Confidential.

Feature Rich Success PortalBootstrap trial andPOC CustomersEnriched CustomerOnboardingexperience Informatica. Proprietary and Confidential.Product LearningPaths and WeeklyExpert SessionsInformaticaConcierge withChatbot integrationsTailored training andcontentrecommendations

More InformationSuccess Portalhttps://success.informatica.com4Communities &Supporthttps://network.informatica.com Informatica. Proprietary and ces-and-training/informaticauniversity.html

Safe HarborThe information being provided today is for informational purposes only. Thedevelopment, release, and timing of any Informatica product or functionalitydescribed today remain at the sole discretion of Informatica and should not berelied upon in making a purchasing decision.Statements made today are based on currently available information, which issubject to change. Such statements should not be relied upon as arepresentation, warranty or commitment to deliver specific products orfunctionality in the future.5 Informatica. Proprietary and Confidential.

Agenda6 Introduction to MDM matching Walk through an example of match rules setup Match rules setup and tuning phases Phase 1: Data discovery and analysis Phase 2: Define Fuzzy Match Key, Key Width, Match Paths, Match Columns Phase 3: Setup match rules: do’s and don'ts Phase 4: A dry run of the match job using draft rules; review match results Phase 5: Tune match rules with exact columns Phase 6: Review final match results Tuning Process Server and Base Object properties Tuning cmxcleanse.properties Tuning the database Q&A Informatica. Proprietary and Confidential.

Introduction to MDM matching The match process helps consolidate records coming from multiple sources. There are 2 ways to do so: Batch API (SOAP - searchMatch, BES REST - action match) Two types of matching: Fuzzy ExactKey Level & Search LevelOptional match ruleset filter & Exact match columnsMatch Purpose & Match Levels7 Informatica. Proprietary and Confidential.

Walk through an example of match rules setupConsider the following data model:Expected match results:8 Informatica. Proprietary and Confidential.

Match rules setup and tuning phases9 Informatica. Proprietary and Confidential.

Phase 1: Data discovery and analysis Auditing Get a reasonable-sized sample of data that best represents real or production-like data Understand what needs to be considered for matching Identify fields that will contribute to the match process, including Fuzzy Match Key Quality and Profiling Ensure data completeness (e.g. Person records have both First Name and Last Name) Ensure data accuracy (e.g. gender field has only gender values) Use tools like Informatica Data Profiler, pattern analysis (SQL queries) Standardization Standardize/format data as much as possible (e.g. Junior to JR, case & trim for exact fields) Avoid non-ASCII characters Use data quality tools such as MDM cleanse functions or Informatica Data Quality Use an address cleansing tool to standardize and clean addresses10 Informatica. Proprietary and Confidential.

Phase 2. Define Fuzzy Match Key Any one of the following can be defined. Multiple fuzzy match keys are not supportedFuzzy Match KeyUsagePerson NameData contains only IndividualsOrganization NameData contains only Organizations, or if data contains both Individuals and OrganizationsAddress Part1Data has addresses that need to be consolidated First Name-only or Last Name-only fuzzy key (Person Name) can cause high number of candidates causing performance impact NULLs in fuzzy match key column produce null keys (K under SSA KEY). They are potential candidates for each other Initials in First or Last Name can cause high number of candidates Irrelevant “noise words” in fuzzy match key column produce null keys11 Informatica. Proprietary and Confidential.

Phase 2. Define Key Width How big is the dataset? How important is match quality VS performance? Wider key has higher chance of finding a search candidate, but it will lower the overall performance Limited - Tradeoff between match quality and disk space. May cause fewer match candidates but faster searches.Use if disk space is limited or if data volume is extremely large Preferred - Single key per BO record. Might result in fewer match candidates Standard - Most appropriate; balances reliability and space usage Extended - Might result in more match candidates at the cost of longer processing time togenerate keys. Works best: Data set not extremely large Data set not complete Sufficient resources are available (disk space) For e.g., the SSA keys for names ‘ASHLEY ROSENBERG’ and ‘ASHLEY ROSEN BERG’ fallwithin the same SSA range for each of the width types, so they are possible matchcandidates for each other12 Informatica. Proprietary and Confidential.

Phase 2. Define Match Paths and Match Columns Enable ‘Allow missing child records’ helps matching on parent recordsthat do not have child records in the child base object.13 Informatica. Proprietary and Confidential.

Phase 3. Setup Match rules: Match LevelTypicalConservativeLooseAppropriate for most matchesTighter than Typical, causing undermatchingMore matches than Typical, causing overmatching. Good to use this in a match rule for manual merges SSA Workbench tool, available as part of MDM Resource Kit, helps decide on the appropriate match level Demo to look at how records “Arim Bose” matches with “Arim Gore” along with their addresses, using different match levels and theirscores U(Undecided)/R(Rejected) are considered as rejected matches in MDM14 Informatica. Proprietary and Confidential.

Phase 3. Setup Match rules: Search LevelNarrow Most stringent, faster, undermatching Correct and complete datasets andhighly matchy datasetsTypical Apt for most match rulesets15 Informatica. Proprietary and Confidential.Exhaustive More match candidates than Typical,more time, overmatching Smaller, less complete, less reliabledatasetsExtreme More match candidates than Exhaustive,much more time, overmatching Datasets that are even less reliable andless complete

Phase 3. Setup Match rules: Match Purpose For data with both Organizations and Individuals, use appropriate match purpose based on the party/customer type If there is no customer type indicator, you can use Organization. Or use Division as match purpose for mixed data types If you are trying to identify matches for people where address is important to determine if two records are for the same person, you can useResident match purpose Different match purposes available:16 Informatica. Proprietary and Confidential.

Phase 3. Setup Match rules:– do’s and don'ts Start with rules that will provide the tightest matches Fuzzy match rules are evaluated first, followed by exact match rules For each fuzzy match rule, exact columns are evaluated first. Use exact match columns when you can. Saves fuzzy calls made to SSA Exact match rules are processed almost exclusively on the database. If database performance is not sufficient, convert them to Filteredmatch rules. Comes with trade-off between match quality and performance Run SQL queries on exact match columns to find rough estimate of potential candidates returned Loose filters will pass more potential candidates to SSA, creating more work and decreasing performance.Examples of tight filters – Id, Date Of Birth, Postal Code. Loose filters – City, State Avoid subtype match; makes multiple SSA calls for each type. Use a match path filter instead Use filter on root path filter to exclude records from match, instead of filtering on match rule level. Saves those records from beingtokenized and thus will not participate in match17 Informatica. Proprietary and Confidential.

Phase 4: A dry run of the match job using draft rules Avoid having tighter match rules during this phase. Below example has ex postalCode as exact This will give you a feel of how fuzzy name and address matches look like. Gives you an idea on the quality of matches.Helps assess any underlying data issues18 Informatica. Proprietary and Confidential.

Phase 4: Review match results from the dry run Match results: Run a query against MTCH table group by match rule; helps reviserules that have gained less matches Make a copy of MTCH table for each iteration Review undermatches VS overmatchesE.g. “Time Inc” did not match with “Time Warner Cable Inc” as theiraddresses are different Use SSA Workbench to know why certain records matched anddid not match SSA workbench tool also helps to make adjustments on accept limits To change accept limits in MDM:19 Informatica. Proprietary and Confidential.

Phase 5: Tune match rules with exact columns Introduce unique identifies (as exact match column) to further qualify matches and to further tighten the rules If there’s no unique identifier, then use exact column such as DateOfBirth Prevent performance issues by including at least one exact match column in each match rule Use several identical match rules with varying exact match columns20 Informatica. Proprietary and Confidential.

Phase 6: Review match results What to review? STRP table If there’s a large set of data (outliers; for e.g. records more than 50K) residing between a set of SSA keysSELECT DISTINCT ROWID OBJECT, DATA COUNT,SSA DATA, DATA ROW FROM C PARTY STRP WHERE SSA KEY BETWEEN‘YBJ ’ AND ‘YBLVZZZZ’ AND INVALID IND 0 ORDER BY ROWID OBJECT, DATA ROW Match key distribution21 Informatica. Proprietary and Confidential.

Phase 6: Review final match results (continued) Review cleanse server logRanger5 Matching TCan:167038393 Tgr:167038393 TSSA:98428393 TM:0 TR:1 Cur RI:1800219 Cur Range:YBJ to YBLVZZZZ CompsPerRange:167043000Ranger7 Matching TCan:173858958 Tgr:173858958 TSSA:104410228 TM:14 TR:1 Cur RI:1802487 Cur Range:YBJ to YBLVZZZZ CompsPerRange:173862000[RangerManger] [INFO ] com.siperian.mrm.util.threads.ThreadMonitor: RangerProducer Candidates Read:2020666[RangerManger] [INFO ] com.siperian.mrm.util.threads.ThreadMonitor: MatchGatherer received 6544[RangerManger] [INFO ] com.siperian.mrm.util.threads.ThreadMonitor: RangeSorter Sorting: Recs in:2,020,666 with 13,621,427 ranges.SortManager: Ranges in: 13,621,427 Sorted Ranges out: 81,000 file Count: 137 Sort Count: 1263[RangerManger] [INFO ] com.siperian.mrm.util.threads.ThreadMonitor: run minutes:778 Max minutes:2880What does this mean? It is processing 2,020,666 records and that those records produced 13,621,427 search ranges that need to be evaluated to complete thematching It has currently only processed 81,000 of the ranges yet. It has taken 778 minutes to do that Range: YBJ to YBLVZZZZ keeps appearing on the log and is a potential hotspot. The comparison count (CompsPerRange) is over167 million and counting This job will take a long time to complete. Maybe there’s a high frequency word (e.g. ‘Medical’) in the data within this range. Clean up thisdata22 Informatica. Proprietary and Confidential.

Phase 6: Review final match results (continued) Review if matches are slow Slow DB readRanger0 Matching TCan:156020763 Tgr:156020763 TSSA:2188740 TM:2165385 TR:186577 Cur RI:7511404 Cur Range:YKMGBBQ to YKMGBBQ/CompsPerRange:160Ranger0 Matching TCan:160711852 Tgr:160711852 TSSA:2268773 TM:2244506 TR:193761 Cur RI:9600897 Cur Range:YKVA VA to YKVA VA/CompsPerRange:2196(156,020,763 – 160,711,852) 4,691,089 total number of candidates read from DB from one minute to another – a low count could indicate a potentialdatabase or network issue – expect millions High number of candidates going to SSA; poor exact match columns are usedTCan:70056898 Tgr:70056898 TSSA:1,023,821 TM:1012237 TR:88571 Cur RI:7502399 Cur Range:S M to S M/ZZZZ CompsPerRange:236474Note: How to track progress of a match job? KB - .aspx -- Helps determine theapproximate time taken by the job to run and complete eventually23 Informatica. Proprietary and Confidential.

Tuning Process Server and Base Object propertiesPropertyUsageThreads for Cleanse OperationsTo achieve parallelismNumber of rows per match job batch cycleStart with 10% of volume of records to be matched and adjust upwardsMaximum matches for manual consolidationIncrease it as needed to avoid match job failureMax Elapsed Match MinutesDefault is 20. Increase only if match rules and data is complexDynamic Match Analysis Threshold (DMAT)Helps improve performance when large ranges are causing it [2015-03-13 20:16:05,306] [RangerManger] [INFO ] com.siperian.mrm.util.threads.ThreadMonitor: Dist:Ranger4Matching TCan:56892103 Tgr:50659217 TSSA:12285983 TM:7230 TR:22165 Cur RI:100207398 CurRange:OG to OGZZZZZZ CompsPerRange:97999 .[2015-03-13 22:09:52,611] [RangerManger] [INFO ] com.siperian.mrm.util.threads.ThreadMonitor: Dist:Ranger4Matching TCan:82741526 Tgr:73413451 TSSA:21394992 TM:7250 TR:22165 Cur RI:99784575 CurRange:OG to OGZZZZZZ CompsPerRange:2594753825 Million comparisonsAnalyze the data to assess why a given search range contains a large count ; maybe matchy dataSetting the DMAT level too low may cause under matchingNote: Any DMAT changes on Production should be reviewed with Informatica GCS24 Informatica. Proprietary and Confidential.

Tuning .distributed matchSet to 1 to enable. Default is 0 (disabled)cmx.server.match.file loadSet to true to use an intermediate file to load data. Set to false for direct data load. Default is true for Oracleand IBM DB2 environments. Default is false for Microsoft SQL Server environmentscmx.server.match.loader batch sizeDefault is 1000, when file load is setMaximum number of insert statements to send to the database during direct load of the match process25 Informatica. Proprietary and Confidential.

Tuning the database Exact rules are converted to SQL queries based on exact match columns in the match rule and their matchpaths. Look for CREATE/INSERT for T MLE and T MT tables If you find the exact match query running slow, query related to T MLE or T MT Ensure all tables in the exact match query are analyzed Create index on one or more exact match columns26 Informatica. Proprietary and Confidential.

References MDM Fuzzy Match Deep Dive https://www.youtube.com/watch?v T6x24bMnP8&feature youtu.be How to configure SSA Workbench on MDM Resource Kit - https://youtu.be/Jp2gcFgE 5Q How to use SSA Name3 workbench - https://youtu.be/lLwTHA0SnY4 How to track progress of a Match job in MDM .aspx27 Informatica. Proprietary and Confidential.

28 Informatica. Proprietary and Confidential.

Thank You

Match rules setup and tuning phases Phase 1: Data discovery and analysis Phase 2: Define Fuzzy Match Key, Key Width, Match Paths, Match Columns . Use tools like Informatica Data Profiler, pattern analysis (SQL queries) . If database performance is not sufficient, convert them to . Filtered.

Related Documents:

PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange, Informatica On Demand, Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Pro

Jun 14, 2019 · Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange Informatica

Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange and Informatica .

Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange Informatica

Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange Informatica

Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange and Informatica

PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange Informatica On Demand, Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Processing, Ultra Messaging, . Informatica Master Data .

Informatica Dynamic Data Masking Installation and Upgrade Guide . Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange Informatica On Demand, Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Processing, Ultra Messaging,