Deliver Trusted Data by Leveraging ETL Testing


Cognizant 20-20 Insights | December 2014

Data-rich organizations seeking to assure data quality can systemize the validation process by leveraging automated testing to increase coverage, accuracy and competitive advantage, thus boosting credibility with end users.

Executive Summary

Quality assurance teams typically perform extract, transform and load (ETL) testing with SQL scripting in conjunction with eyeballing the data on Excel spreadsheets. This process can take a huge amount of time and is error prone due to human intervention. It is also tedious, because the same test SQL scripts must be executed repeatedly to validate the data, and the assorted, voluminous nature of that data can lead to defect leakage. To test the data effectively, the tester needs advanced database skills that include writing complex join queries and creating stored procedures, triggers and SQL packages.

Manual methods of data validation can also impact project schedules and undermine end-user confidence in data delivery (i.e., delivering data to users via flat files or on Web sites). Moreover, data quality issues can undercut competitive advantage and have an indirect impact on the long-term viability of a company and its products.

Organizations can overcome these challenges by mechanizing the data validation process. But that raises an important question: How can this be done without spending extra money? The answer led us to consider Informatica's ETL testing tool. This white paper demonstrates how Informatica can be used to automate the data testing process. It also illustrates how this tool can help QE&A teams reduce the number of hours spent on their activities, increase coverage and achieve 100% accuracy in validating the data. This means that organizations can deliver complete, repeatable, auditable and trustworthy test coverage in less time without extending basic SQL skill sets.

Data Validation Challenges

Consistency in the data received for ETL is a perennial challenge. Typically, data received from various sources lacks commonality in how it is formatted and provided, and big data only makes the issue more pressing. Just a few years ago, 10 million records of data was considered a big deal. Today, the volume of data stored by enterprises can be in the range of billions and trillions of records.
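To make the manual routine described above concrete, the following is a minimal sketch of the kind of validation queries a tester re-runs by hand after every load, with the outputs then eyeballed in a spreadsheet. The table and column names (stg_customer, dw_customer, customer_id, customer_name) are hypothetical placeholders, not taken from this paper.

```sql
-- Typical hand-run validation checks (hypothetical tables and columns).

-- 1. Row-count reconciliation between the source staging table and the target.
SELECT
    (SELECT COUNT(*) FROM stg_customer) AS source_rows,
    (SELECT COUNT(*) FROM dw_customer)  AS target_rows;

-- 2. Business keys loaded more than once into the target.
SELECT customer_id, COUNT(*) AS occurrences
FROM dw_customer
GROUP BY customer_id
HAVING COUNT(*) > 1;

-- 3. Mandatory columns that arrived as NULL.
SELECT COUNT(*) AS missing_names
FROM dw_customer
WHERE customer_name IS NULL;
```

Re-running scripts like these after every load, and comparing the outputs by eye, is exactly the repetition that the automation described below removes.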

Quick Take

Addressing Organizational Data Quality Issues

Our experimentation with automated data validation for a U.S.-based client revealed that by mechanizing the data validation process, data quality issues can be completely eradicated. Automating data validation brings the following value additions:

- Provides a data validation platform that is workable and sustainable for the long term.
- Offers a tailored, project-specific framework for data quality testing.
- Reduces the turnaround time of each test execution cycle.
- Simplifies the test management process by simplifying the test approach.
- Increases test coverage along with greater accuracy of validation.

The Data Release Cycle and Internal Challenges

This client releases product data sets on a periodic basis, typically monthly. As a result, the data volume in each release is huge. One product suite has seven different products under its umbrella, and data is released in three phases per month. Each phase has more than 50 million records to be processed from each product within the suite. Due to manual testing, the turnaround time for each phase used to be three to five days, depending on the number of tasks involved in each phase.

Production release of quality data is a huge undertaking by the QE&A team, and it was a big challenge to satisfy business owners by reducing time-to-market (i.e., the time from processing the data once it is received to releasing it to the market). By using various automation methods, we were able to reduce time-to-market from between three and five days to between one and three days (see Figure 1).

Figure 1: Data Release Cycle. Over a three-day cycle, the data is received, the data update is prepared and ETL is applied (ETL/DB team); functional data validation and sign-off of the test data take place in the QA environment (QA team); and functional data validation and sign-off in the production environment (UAT) precede the release to production (PMO and functional managers).

Reasons for the accretion of voluminous data include:

- Executive management's need to focus on data-driven decision-making by using business intelligence tools.
- Company-wide infrastructure changes such as data center migrations.
- Mergers and acquisitions among data-producing companies.
- Business owners' need to gain greater insight into streamlining production, reducing time-to-market and increasing product quality.

If the data is abundant and comes from multiple sources, there is a chance junk data is present. The odds are also high that the assortment contains excessive duplication, null sets and redundant data, and mishandling can lead to data loss.

Organizations must overcome these challenges by having appropriate solutions in place to avoid credibility issues. For data warehousing and migration initiatives, data validation therefore plays a vital role in ensuring overall operational effectiveness. But operational improvements are never without their challenges, including:

- Data validation is significantly different from conventional ways of testing. It requires more advanced scripting skills across multiple database servers such as Microsoft SQL Server 2008, Sybase IQ, Vertica, Netezza, etc.
- Heterogeneity in the data sources leads to mishandling of the interrelationships between multiple data source formats.
- During application upgrades, it must be verified that the older application repository data is the same as the data in the new repository (a minimal comparison sketch appears at the end of this section).
- SQL query execution is tedious and cumbersome because the same queries must be executed repeatedly.
- Test scenarios are missed due to manual execution of queries.
- Total accuracy may not always be possible.
- Strict supervision is required with each test, and the time taken for execution varies from one person to another.

A Business-Driven Approach to Data Validation

To meet the business demand for data validation, we have developed a surefire and comprehensive solution that can be utilized in areas such as data warehousing, data extraction, transformation, loading, database testing and flat-file validation.

The Informatica tool that is used for the ETL process can also be used as a validation tool to verify the business rules associated with the data. It can significantly reduce manual effort and increase ETL productivity by lowering costs, thereby improving the bottom line.

Our Data Validation Procedures as a Framework

Four methods are required to implement a one-stop solution for addressing data quality issues (see Figure 2).

Figure 2: Data Validation Methods. The four methods are Informatica data validation, DB stored procedures, macros and Selenium.

The ETL process entails numerous stages, and it can be difficult to adopt a testing schedule given the manual effort required. The quality assurance team needs progressive elaboration (i.e., continuous improvement of key processes) to standardize the process due to complex architectures and multilayered designs.
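The upgrade-related challenge above (confirming that an old and a new application repository hold the same data) reduces to a set comparison that most SQL servers can express directly. Below is a minimal sketch, assuming both repositories are reachable from one session and share a table layout; the object names (old_repo.dbo.orders, new_repo.dbo.orders) are hypothetical.

```sql
-- EXCEPT returns rows present in the first result set but absent from the
-- second, so two empty result sets mean the repositories hold identical data.
-- All object names are hypothetical placeholders.

-- Rows in the old repository that are missing or changed in the new one.
SELECT order_id, order_date, order_amount
FROM old_repo.dbo.orders
EXCEPT
SELECT order_id, order_date, order_amount
FROM new_repo.dbo.orders;

-- Rows that exist only in the new repository (unexpected additions).
SELECT order_id, order_date, order_amount
FROM new_repo.dbo.orders
EXCEPT
SELECT order_id, order_date, order_amount
FROM old_repo.dbo.orders;
```

The same pattern serves for any source-versus-target reconciliation in which both sides can be queried.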

Each method has its own adoption procedures. High-level details include the following:

Informatica Data Validation

The following activities are required to create an Informatica data validation framework (see Figure 3):

- Accrue business rules from product/business owners based on their expectations.
- Convert business rules into test scenarios and test cases.
- Derive the expected results of each test case associated with each scenario.
- Write a SQL query for each of the test cases, and update the SQL test cases in input files (test case basic info, SQL query).
- Compile all SQL queries as a package or test build.
- Create Informatica workflows to execute the queries and update the results in the respective SQL tables (a sketch of this pass/fail bookkeeping appears at the end of this section).
- Trigger Informatica workflows to execute the jobs and send e-mail notifications with the validation results.

Figure 3: A Data Validation Framework (Pictorial View). Source files pass through ETL into a staging area (SQL Server), where transformation rules are applied and flat files are exported; test cases with expected results are executed against the staged data, test case results (pass/fail) are updated in QA DB tables, and a test case validation report is produced before the data reaches Web production and its external and internal end users.

Validate Comprehensive Data with Stored Procedures

The following steps are required for data validation using stored procedures (see Figure 4):

- Prepare validation test scenarios.
- Convert test scenarios into test cases.
- Derive the expected results for all test cases.
- Write stored procedure-compatible SQL queries that represent each test case.
- Store all validation Transact-SQL statements in a single execution plan, called a "stored procedure."
- Execute the stored procedure whenever any data validation is carried out.
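Both methods above funnel their outcomes into QA database tables. The following is a minimal Transact-SQL sketch of that bookkeeping, not the paper's actual schema and not Informatica's Data Validation Option syntax: a hypothetical results table (qa_test_results), a stored procedure (usp_validate_fact_sales) that bundles a few illustrative checks into a single execution plan, and the call that a workflow, scheduler or tester would trigger.

```sql
-- A minimal sketch assuming a SQL Server-style environment; every table,
-- column and procedure name here is a hypothetical illustration.

CREATE TABLE qa_test_results (
    test_case_id   VARCHAR(20),
    test_case_desc VARCHAR(200),
    expected_value INT,
    actual_value   INT,
    result         VARCHAR(4),               -- 'PASS' or 'FAIL'
    executed_at    DATETIME DEFAULT GETDATE()
);
GO

-- All validation statements kept in a single execution plan ("stored procedure").
CREATE PROCEDURE usp_validate_fact_sales
AS
BEGIN
    SET NOCOUNT ON;

    -- Test 01: accuracy check, no negative sales amounts.
    INSERT INTO qa_test_results (test_case_id, test_case_desc, expected_value, actual_value, result)
    SELECT 'Test 01', 'Accuracy: no negative sales amounts', 0, COUNT(*),
           CASE WHEN COUNT(*) = 0 THEN 'PASS' ELSE 'FAIL' END
    FROM dw_fact_sales
    WHERE sales_amount < 0;

    -- Test 02: duplicate check, each product/period combination loads exactly once.
    INSERT INTO qa_test_results (test_case_id, test_case_desc, expected_value, actual_value, result)
    SELECT 'Test 02', 'No duplicate product/period keys', 0, COUNT(*),
           CASE WHEN COUNT(*) = 0 THEN 'PASS' ELSE 'FAIL' END
    FROM (
        SELECT product_key
        FROM dw_fact_sales
        GROUP BY product_key, period_key
        HAVING COUNT(*) > 1
    ) AS dups;

    -- Test 03: data period check, no future-dated records.
    INSERT INTO qa_test_results (test_case_id, test_case_desc, expected_value, actual_value, result)
    SELECT 'Test 03', 'Data period check: no future-dated records', 0, COUNT(*),
           CASE WHEN COUNT(*) = 0 THEN 'PASS' ELSE 'FAIL' END
    FROM dw_fact_sales
    WHERE period_date > GETDATE();
END;
GO

-- Run whenever a data validation is carried out (by a workflow, a scheduler
-- or a tester); any 'FAIL' rows can then drive e-mail notifications.
EXEC usp_validate_fact_sales;
SELECT * FROM qa_test_results WHERE result = 'FAIL';
```

Each additional business rule becomes one more INSERT...SELECT inside the procedure, which keeps the whole suite repeatable and auditable from the results table.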

Figure 4: Validating with Stored Procedures. A stored procedure executes a series of test cases (Test 01 through Test 06) against a Sybase IQ fact table, covering accuracy, minimum period, maximum period, time check, data period check and change values (compare the Transact-SQL sketch in the previous section).

In addition:

- Build a test suite that contains multiple test builds according to test scenarios.
- Maintain a framework containing multiple test suites.
- Execute the automation test suite per the validation requirement.
- Analyze the test results and share them with project stakeholders.

Salient Solution Features, Benefits Secured

The following features and benefits of our framework were reinforced by a recent client engagement (see the Quick Take sidebar, "Fixing an Information Services Data Quality Issue").

Core Features

- Compatible with all database servers.
- Zero manual intervention for the execution of validation queries.
- 100% efficiency in validating large-scale data.
- Reconciliation of production activities with the help of automation.
- Reduced level of effort and resources required to perform ETL testing.

One-to-One Data Comparison Using Macros

The following activities are required to handle data validation with macros (see Figure 5):

- Prepare validation test scenarios.
- Convert test scenarios into test cases.
- Derive a list of expected results for each test scenario.
- Specify input parameters for a given test scenario, if any.
- Write a macro to carry out validation work for one-to-one data comparisons (a SQL analogue of such a comparison is sketched at the end of this section).

Figure 5: Applying Macro Magic to Data Validation. A macro is added and executed to compare marketing files (Data 1) one-to-one against the Sybase IQ data warehouse (Data 2); each comparison passes or fails before the data moves on to publication.

Selenium Functional Testing Automation

The following are required to perform data validation (see Figure 6):

- Prepare validation test scenarios.
- Convert test scenarios into test cases.
- Derive an expected result for each test case.
- Specify input parameters for a given test scenario.
- Derive test configuration data for setting up the QA environment.
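The macros described above compare flat files field by field in a spreadsheet. Purely as an illustration of the same one-to-one comparison idea, and not the macro itself, the sketch below assumes the marketing file has been bulk-loaded into a hypothetical staging table and is compared key by key against a warehouse table; every name is a placeholder.

```sql
-- A rough SQL analogue of the one-to-one comparison performed by the macros,
-- assuming the flat file has been loaded into stg_marketing_file.
-- NULLs in compared columns would need ISNULL handling in a production version.

SELECT
    COALESCE(f.record_id, w.record_id) AS record_id,
    CASE
        WHEN w.record_id IS NULL              THEN 'Missing in warehouse'
        WHEN f.record_id IS NULL              THEN 'Missing in file'
        WHEN f.product_name <> w.product_name THEN 'Product name mismatch'
        WHEN f.list_price   <> w.list_price   THEN 'List price mismatch'
        ELSE 'Match'
    END AS comparison_result
FROM stg_marketing_file AS f
FULL OUTER JOIN dw_marketing AS w
    ON f.record_id = w.record_id
WHERE w.record_id IS NULL
   OR f.record_id IS NULL
   OR f.product_name <> w.product_name
   OR f.list_price   <> w.list_price;
```

Rows that match stay out of the result set; anything returned is a fail, mirroring the pass/fail flow of Figure 5.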

Figure 6: Facilitating Functional Test Automation. Selenium automation performs functional validation of the Web-based application GUI; a pass allows release to production for GUI users, while a fail blocks the release.

Additional core features:

- Comprehensive test coverage ensures lower business risk and greater confidence in data quality.
- Remote scheduling of test activities (a scheduling sketch appears after The Bottom Line below).

Benefits Reaped

- Test case validation results are delivered in user-friendly formats such as .csv, .xlsx and HTML.
- Validation results are stored for future purposes.
- Test cases and SQL scripts are reused for regression testing.
- No scope for human error.
- Supervision isn't required while executing test cases.
- 100% accuracy in test case execution at all times.
- Easy maintenance of SQL scripts and related test cases.
- No variation in the time taken to execute test cases.
- 100% reliability of testing and its coverage.

The Bottom Line

As the digital age proceeds, it is very important for organizations to progressively elaborate their processes with suitable information and awareness to drive business success. Hence, business data collected from various operational sources is cleansed and consolidated per the business requirement to separate signal from noise. This data is then stored in a protected environment for an extended time period.

Fine-tuning this data helps facilitate performance management, tactical and strategic decisions and the execution thereof for business advantage. Well-organized business data enables and empowers business owners to make well-informed decisions, and these decisions have the capacity to drive competitive advantage for an organization. On average, organizations lose $8.2 million annually due to poor data quality, according to industry research on the subject. A study by B2B research firm SiriusDecisions shows that by following best practices in data quality, a company can boost its revenue by 66%.¹ And market research by InformationWeek found that 46% of those surveyed believe data quality is a barrier that undermines business intelligence mandates.²

Hence, it is safe to assume poor data quality is undercutting many enterprises, yet few have taken the necessary steps to avoid jeopardizing their businesses. By implementing the types of data testing frameworks discussed above, companies can improve their processes by reducing the time taken for ETL. This, in turn, will dramatically reduce their time-to-market turnaround and support the management mantra of "under-promise and over-deliver." Moreover, few organizations need to spend extra money on these frameworks, given that existing infrastructure is being used. This has a direct positive impact on a company's bottom line, since no additional overhead is required to hire new human resources or add infrastructure.
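The remote-scheduling and unattended-execution points listed above can be met with whatever scheduler an organization already runs; the paper itself triggers runs through Informatica workflows. As one possible illustration on Microsoft SQL Server, a SQL Server Agent job could invoke the hypothetical validation procedure from the earlier sketch every night. Job, schedule and database names below are assumptions.

```sql
-- One possible scheduling setup, assuming SQL Server Agent is available and
-- the hypothetical usp_validate_fact_sales procedure lives in a database
-- named QA_DB. All names here are illustrative.
USE msdb;
GO

EXEC dbo.sp_add_job
    @job_name = N'Nightly data validation';

EXEC dbo.sp_add_jobstep
    @job_name      = N'Nightly data validation',
    @step_name     = N'Run validation stored procedure',
    @subsystem     = N'TSQL',
    @database_name = N'QA_DB',
    @command       = N'EXEC dbo.usp_validate_fact_sales;';

EXEC dbo.sp_add_schedule
    @schedule_name     = N'Daily at 2 AM',
    @freq_type         = 4,        -- daily
    @freq_interval     = 1,        -- every day
    @active_start_time = 020000;   -- 02:00:00

EXEC dbo.sp_attach_schedule
    @job_name      = N'Nightly data validation',
    @schedule_name = N'Daily at 2 AM';

EXEC dbo.sp_add_jobserver
    @job_name = N'Nightly data validation';
GO
```

Once scheduled, each run appends to the results table, providing the stored, auditable history of test executions noted in the benefits list.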

Quick Take

Fixing an Information Services Data Quality Issue

This client is a leading U.S.-based financial services provider for real estate professionals. Its offerings include comprehensive data, analytical and other related services. Powerful insight gained from this knowledge provides the perspective necessary to identify, understand and take decisive action to effectively solve key business challenges.

Challenges Faced

This client faced major challenges in the end-to-end quality assurance of its data, as the majority of the company's test procedures were manual. Because of these manual methods, the turnaround time, or time-to-market, of the data was greater than its competitors'. As such, the client wanted a long-term solution to overcome this.

The Solution

Our QE&A team offered various solutions. The focus areas were database and flat-file validations. As explained above, database testing was automated by using Informatica and other conventional methods, such as the creation of stored procedures, while macros were used for validating the flat files.

Benefits

- 50% reduction in time-to-market.
- 100% test coverage.
- 82% automation capability.
- Highly improved data quality.

Figure 7 illustrates the breakout of each method used and its contribution to the entire QE&A process.

Figure 7: Breakout of the validation methods' contributions to the overall QE&A process. Manual testing accounts for 18%, with automated methods (DB stored procedures, DB Informatica, Selenium and macros) covering the remaining 82% in shares of 38%, 30%, 11% and 3%.

Looking Ahead

In the Internet age, where data is considered a business currency, organizations must capitalize on their return on investment in the most efficient way to maintain their competitive edge. Hence, data quality plays a pivotal role when making strategic decisions.

The impact of poor information quality on a business can be measured along four dimensions: increased costs, decreased revenues, decreased confidence and increased risk. Thus it is crucial for any organization to implement a foolproof solution in which a company can use its own product to validate the quality and capabilities of that product; in other words, adopting an "eating your own dog food" ideology.

Having said that, it is necessary for any data-driven business to focus on data quality, as poor quality has a high probability of becoming a major bottleneck.

Footnotes

¹ "Data Quality Best Practices Boost Revenue by 66 Percent."

² Douglas Henschen, "Research: 2012 BI and Information Management."

References

- CoreLogic U.S., Technical & Product Management: IT infrastructural support and business knowledge on the data.
- Cognizant Quality Engineering & Assurance (QE&A) and Enterprise Information Management (EIM): ETL QE&A architectural setup and QE&A best practices.
- Ravi Kalakota and Marcia Robinson, E-Business 2.0: Roadmap for Success, "Chapter Four: Thinking E-Business Design — More Than a Technology" and "Chapter Five: Constructing the E-Business Architecture-Enterprise Apps."
- Jonathan G. Geiger, "Data Quality Management, The Most Critical Initiative You Can Implement," Intelligent Solutions, Inc., Boulder, CO, www2.sas.com/proceedings/sugi29/098-29.pdf.
- www.informatica.com/in/etl-testing/ (an article on Informatica's proprietary Data Validation Option, available in its Data Integration tool).
- www.expressanalytics.net/index.php?option=com_content&view=article&id=10&Itemid=8 (literature on the importance of the data warehouse and business intelligence).
- http://spotfire.tibco.com/blog/?p=7597 (understanding the benefits of data warehousing).
- (Significance of data warehousing and data mining in business.)
- (About the CoreLogic company.)
- (A white paper on dataTestPro, a proprietary tool by Cognizant used for automating the data validation process.)

About the Author

Vijay Kumar T V is a Senior Business Analyst on Cognizant's QE&A team within the company's Banking and Financial Services Business Unit. He has 11-plus years of experience in business analysis, consulting and quality engineering/assurance. Vijay has worked in various industry segments such as retail, corporate, core banking, rental and mortgage, and has an analytic background, predominantly in the areas of data warehousing and business intelligence. His expertise involves automating data warehouse and business intelligence test practices to align with the client's strategic business goals. Vijay has also worked with a U.S.-based client on product development, business process optimization and business requirement management. He holds a bachelor's degree in mechanical engineering from Bangalore University and a post-graduate certificate in business management from XLRI, Xavier School of Management. Vijay can be reached at Vijay-20.Kumar-20@cognizant.com.

About Cognizant

Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world's leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 75 delivery centers worldwide and approximately 199,700 employees as of September 30, 2014, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500, and is ranked among the top performing and fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.

World Headquarters
500 Frank W. Burr Blvd.
Teaneck, NJ 07666 USA
Phone: +1 201 801 0233
Fax: +1 201 801 0243
Toll Free: +1 888 937 3277
Email: inquiry@cognizant.com

European Headquarters
1 Kingdom Street
Paddington Central
London W2 6BD
Phone: +44 (0) 20 7297 7600
Fax: +44 (0) 20 7121 0102
Email: infouk@cognizant.com

India Operations Headquarters
#5/535, Old Mahabalipuram Road
Okkiyam Pettai, Thoraipakkam
Chennai, 600 096 India
Phone: +91 (0) 44 4209 6000
Fax: +91 (0) 44 4209 6060
Email: inquiryindia@cognizant.com

Copyright 2014, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission of Cognizant. The information contained herein is subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.

