Multivariate Data Analysis - GBV

633.25 KB
13 Pages
Uploaded by: Aliana Wahl
Transcription

Multivariate Data Analysis, 6th Edition
An introduction to Multivariate Analysis, Process Analytical Technology and Quality by Design
Kim H. Esbensen and Brad Swarbrick, with contributions from Frank Westad, Pat Whitcombe and Mark Anderson
Bring data to life

Contents
Preface
Chapter 1. Introduction to multivariate analysis
1.1 The world is multivariate
1.2 Indirect observations and correlations
1.3 Data must carry useful information
1.4 Variance, covariance and correlation
1.5 Causality vs correlation
1.6 Hidden data structures: correlations again
1.7 Multivariate data analysis vs multivariate statistics
1.8 Main objectives of multivariate data analysis
1.8.1 Data description (exploratory data structure modelling)
1.8.2 Discrimination and classification
1.8.3 Regression and prediction
1.9 Multivariate techniques as geometric projections
1.9.1 Geometry, mathematics, algorithms
1.10 The grand overview in multivariate data analysis
1.11 References
Chapter 2: A review of some fundamental statistics
2.1 Terminology
2.2 Definitions of some important measurements and concepts
2.2.1 The mean
2.2.2 The median
2.2.3 The mode
2.2.4 Variance and standard deviation
2.3 Samples and representative sampling
2.3.1 An example from the pharmaceutical industry
2.4 The normal distribution and its properties
2.4.1 Graphical representations

2.5 Hypothesis testing
2.5.1 Significance, risk and power
2.5.2 Defining an appropriate risk level
2.5.3 A general guideline for applying formal statistical tests
2.5.4 A Test for Equivalence of Variances: The F-test
2.5.5 Tests for equivalence of means
2.6 An introduction to time series and control charts
2.7 Joint confidence intervals and the need for multivariate [...]
2.8 Chapter summary
2.9 References
Chapter 3: Theory of Sampling [...]
3.1 [...]
3.2 Heterogeneity
3.2.1 Constitutional heterogeneity (CH)
3.2.2 Distributional heterogeneity (DH)
3.3 Sampling error vs practical sampling
3.4 Total Sampling Error (TSE): Fundamental Sampling Principle (FSP)
3.5 Sampling Unit Operations (SUO)
3.6 Replication experiment: quantifying sampling errors
3.7 TOS in relation to multivariate data analysis
3.8 Process sampling: variographic analysis
3.8.1 Appendix A. Terms and definitions used in the TOS literature
3.9 References
Chapter 4: Fundamentals of principal component analysis (PCA)
4.1 Representing data as a matrix
4.2 The variable space: plotting objects in p-dimensions
4.2.1 Plotting data in 1-D and 2-D space
4.2.2 The variable space and dimensions
4.2.3 Visualisation in 3-D (or more)
4.3 Plotting objects in variable [...]
4.4 Example: plotting a set [...]

4.5 The first principal component
4.5.1 Maximum variance directions
4.5.2 The first principal component as a least squares fit
4.6 Extension to higher-order principal components
4.7 Principal component models: scores and loadings
4.7.1 Maximum number of principal components
4.7.2 PC model centre
4.7.3 Introducing loadings: relations between X and PCs
4.7.4 Scores: coordinates in PC space
4.7.5 Object residuals
4.8 Objectives of PCA
4.9 Score plot: object relationships
4.9.1 Interpreting score plots
4.9.2 Choice of score plots
4.10 The loading plot: variable relationships
4.10.1 Correlation loadings
4.10.2 Comparison of scores and loading plots
4.10.3 The 1-dimensional loading plot
4.11 Example: city temperatures in Europe
4.11.1 Introduction
4.11.2 Plotting data and deciding on the validation scheme
4.11.3 PCA results and interpretation
4.12 Principal component models
4.12.1 The PC model
4.12.2 Centring
4.12.3 Step by step calculation of PCs
4.12.4 A preliminary comment on the algorithm: NIPALS
4.12.5 Residuals: the E-matrix
4.12.6 Residual variance
4.12.7 Object residuals
4.12.8 The total squared object residual
4.12.9 Explained/residual variance plots
4.12.10 How many PCs to use?
4.12.11 A note on the number of PCs
4.12.12 A doubtful case: using external evidence
4.12.13 Variable residuals
4.12.14 More about variances: modelling error variance

4.13 Example: interpreting a PCA model (peas)
4.13.1 Purpose
4.13.2 Data set
4.13.3 Tasks
4.13.4 How to do it
4.13.5 Summary
4.14 PCA modelling: the NIPALS algorithm
4.15 Chapter summary
4.16 References
Chapter 5: Preprocessing
5.1 Introduction
5.2 Preprocessing of discrete data
5.2.1 Variable weighting and scaling
5.2.2 Logarithm transformation
5.2.3 Averaging
5.3 Preprocessing of spectroscopic [...]
[...] transformations [...]
5.3.3 Normalisation
5.3.4 Baseline correction
5.3.5 Derivatives
5.3.6 Correcting multiplicative effects in spectra
5.3.7 Other general preprocessing methods
5.4 Practical aspects of preprocessing
5.4.1 Scatter effects plot
5.4.2 Detailed example: preprocessing gluten-starch mixtures
5.5 Chapter summary
5.6 References
6. Principal Component Analysis (PCA): in practice
6.1 The PCA overview
6.2 PCA: Step by Step
6.3 Interpretation of PCA models
6.3.1 [...]
6.3.2 Interpretation of score plots: look for patterns
6.3.3 Interpretation of loading plots: look for important variables

6.4 Example: alcohol in water analysis
6.5 PCA: what can go wrong?
6.5.1 Is there any information in the data set?
6.5.2 Too few PCs are used in the model
6.5.3 Too many PCs are used in the model
6.5.4 Outliers which are truly due to erroneous data were not removed
6.5.5 Outliers that contain important information were removed
6.5.6 The score plots were not explored sufficiently
6.5.7 Loadings were interpreted with the wrong number of PCs
6.5.8 Too much reliance on the standard diagnostics in the computer program without thinking for [...]
6.6 Outliers [...]
[...] Mahalanobis distance
6.6.4 Influence [...]
6.7 [...] plot and PCA projection
6.7.1 Multivariate projection
6.7.2 Validation scores
6.8 Exercise: detecting outliers (Troodos)
6.8.1 Purpose
6.8.2 Dataset
6.8.3 Analysis
6.8.4 Summary
6.9 Summary: PCA in practice
6.10 References
7. Multivariate calibration
7.1 Multivariate modelling (X, Y): the calibration stage
7.2 Multivariate modelling (X, Y): the prediction stage
7.3 Calibration set requirements (training set)
7.4 Introduction to validation
7.4.1 Test set validation
7.4.2 Other validation methods
7.4.3 Modelling error
7.5 Number of components/factors ([...]ity)

7.6 Univariate regression (y x) and MLR
7.6.1 Univariate regression
7.6.2 Multiple linear regression, [...]
[...]
7.8 Principal component regression
7.8.1 PCA scores in MLR
7.8.2 Are all the possible PCs needed?
7.8.3 Example: prediction of multiple components in an alcohol mixture
7.8.4 Weaknesses of PCR
7.9 PLS-regression (PLSR)
7.9.1 PLSR: a powerful alternative to PCR
7.9.2 PLSR (X, Y): initial comparison with PCA(X), PCA(Y)
7.9.3 PLS-NIPALS algorithm
7.9.4 PLSR with one or more Y-variables
7.9.5 Interpretation of PLS models
7.9.6 Loadings (p) and loading weights (w)
7.9.7 The PLS1 NIPALS algorithm
7.10 Example: interpretation of PLS1 (octane in gasoline) part 1: model development
7.10.1 Purpose
7.10.2 Data set
7.10.3 Tasks
7.10.4 Initial data considerations
7.10.5 Always perform an initial PCA
7.10.6 Regression analysis
7.10.7 Assessment of loadings vs loading weights
7.10.8 Assessment of regression coefficients
7.10.9 Always use loading weights for model building and [...]
7.10.10 Predicted vs reference plot
7.10.11 Regression analysis of octane (Part 1) summary
7.10.12 A short discourse on [...]
[...] Residuals in [...]
7.10.16 Hotelling's T² statistic
7.10.17 Influence plots for regression models
7.10.18 Always check the raw data!
7.10.19 Which objects should be removed?
7.10.20 Residuals in Y

7.11 Error measures
7.11.1 Calculating the SEL for a reference method
7.11.2 Further estimates of model precision
7.11.3 X-Y relation outlier plots (T vs U [...])
7.11.4 [...] of PLS1 (octane in gasoline) Part 2: advanced [...]
7.11.5 Sample elimination
7.11.6 Variable elimination
7.11.7 X-Y relationship outlier plot
7.12 Prediction using multivariate models
7.12.1 Projected scores
7.12.2 Prediction influence plots
7.12.3 Y-deviation
7.12.4 Inlier statistic
7.12.5 Example: interpretation of PLS1 (octane in gasoline) Part 3: prediction
7.13 Uncertainty estimates, significance and stability: Martens' uncertainty test
7.13.1 Uncertainty estimates in regression coefficients, b
7.13.2 Rotation of perturbed models
7.13.3 Variable [...]
[...] stability
[...] example using data from paper manufacturing
[...] gluten in starch calibration
7.13.7 Raw data model
7.13.8 MSC data model
7.13.9 EMSC data model
7.13.10 mEMSC data model
7.13.11 Comparison of results
7.14 PLSR and PCR multivariate calibration: in practice
7.14.1 What is a "good" or "bad" model?
7.14.2 Signs of unsatisfactory data models: a useful checklist
7.14.3 Possible reasons for bad modelling or validation results
7.15 Chapter summary
7.16 References
8. Principles of Proper Validation (PPV)
8.1 Introduction
8.2 The Principles of Validation: overview
8.3 Data quality: data representative[...]

8.4 Validation objectives
8.4.1 Test set validation: a necessary and sufficient paradigm
8.4.2 Validation in data analysis and chemometrics
8.5 Fallacies and abuse of the central limit theorem
8.6 Systematics of cross-validation
8.7 Data structure display via t-u plots
8.8 Multiple validation approaches
8.9 Verdict on training set splitting and many other myths
8.10 Cross-validation does have a role: category and model comparisons
8.11 Cross-validation vs test set validation in practice
8.12 Visualisation of validation is everything
8.13 Final remark on several test sets
8.14 Conclusions
8.15 References
9. Replication, replicates, but of what?
9.1 Introduction
9.2 Understanding uncertainty
9.3 The Replication Experiment (RE)
9.4 RE consequences for validation
9.5 Replication applied to analytical method development
9.6 Analytical vs sampling bias
9.7 References
10. An introduction to multivariate classification
10.1 Supervised or unsupervised, that is the question!
10.2 Principles of unsupervised classification and clustering
10.2.1 k-Means clustering
10.3 Principles of supervised classification
10.4 Graphical interpretation of classification results
10.4.1 The Coomans' plot
10.5 Partial least squares discriminant analysis (PLS-DA)
10.5.1 Multivariate classification using class differences, PLS-DA

10.6 Linear Discriminant Analysis (LDA)
10.7 Support vector machine classification
10.8 Advantages of SIMCA over traditional methods and new methods
10.9 Application of supervised classification methods to authentication of vegetable oils using FTIR
10.9.1 Data visualisation [...]
[...]CA model [...]
[...]plication to a test set
[...] diagnostics
10.9.5 Developing a PLS-DA method and application to a test set
10.9.6 Developing a PCA-LDA method and application to a test set
10.9.7 Developing a SVMC method and application to a test set
10.9.8 Conclusions from the Vegetable Oil classification
10.10 Chapter summary
10.11 References
Chapter 11. Introduction to Design of Experiment (DoE) Methodology
11.1 Experimental design
11.1.1 Why is experimental design useful?
11.1.2 The ad hoc approach
11.1.3 The traditional approach: vary one variable at a time
11.1.4 The alternative approach
11.2 Experimental design in practice
11.2.1 Define stage
11.2.2 Design stage
11.2.3 Analyse stage
11.2.4 Improve stage
11.2.5 The concept of factorial designs
11.2.6 Full factorial designs
11.2.7 Naming convention
11.2.8 Calculating effects when there are many experiments
11.2.9 The concept of fractional factorial designs
11.2.10 Confounding
11.2.11 Types of variables encountered in DoE
11.2.12 Ranges of variation for experimental [...]
[...]
11.2.14 [...]sation
11.2.15 Blocking in designed experiments

11.2.16 Types of experimental design
11.2.17 Which optimisation design to choose in practice
11.2.18 Important effects
11.2.19 Hierarchy [...]
[...] sum of squares (SS[...])
11.2.22 Sum of squares regression (SS[...])
11.2.23 Residual sum of squares (SS[...])
11.2.24 Model degrees of freedom (ν)
11.2.25 Example: building the ANOVA table for a 2³ full factorial design
11.2.26 Supplementary statistics
11.2.27 Pure error and lack of fit assessment
11.2.28 Graphical tools used for assessing designed experiments
11.2.29 Model interpretation plots
11.2.30 The chemical process as a fractional factorial design
11.2.31 An introduction to constrained designs
11.3 Chapter summary
11.4 References
Chapter 12. Factor rotation and multivariate curve resolution: introduction to multivariate data analysis, tier II
12.1 Simple structure
12.2 PCA rotation
12.3 Orthogonal rotation methods
12.3.1 Varimax rotation
12.3.2 Quartimax rotation
12.3.3 Equimax [...]
12.3.4 Parsimax [...]
[...] of rotated PCA results
[...] PCA rotation applied to NIR data of fish
12.5 An introduction to [...]
12.5.1 What is multivariate curve resolution?
12.5.2 How multivariate curve resolution works
12.5.3 Data types suitable for MCR
12.6 Constraints in [...]
12.6.2 [...]modality constraints
12.6.3 Closure constraints
12.6.4 Other constraints

12.6.5 Ambiguities and constraints in MCR
12.7 Algorithms used in [...] curve resolution
[...] factor analysis (EFA)
[...] curve resolution-alternating least squares (MCR-ALS)
[...] Initial estimates for MCR-ALS
12.7.4 Computational parameters of MCR-ALS
12.7.5 Tuning the sensitivity of the analysis to pure components
12.8 Main results of MCR
12.8.1 Residuals
12.8.2 Estimated [...]
[...] Practical use of estimated concentrations and spectra and quality checks
12.8.5 Outliers and noisy variables in MCR
12.9 MCR applied to fat analysis of fish
12.10 Chapter summary
12.11 References
Chapter 13. Process analytical technology (PAT) and its role in the quality by design (QbD) initiative
13.1 Introduction
13.2 The Quality by Design (QbD) initiative
13.2.1 The International Conference on Harmonisation (ICH) guidance
13.2.2 US FDA process validation guidance
13.3 Process analytical technology (PAT)
13.3.1 At-line, online, inline or offline: what is the difference?
13.3.2 Enablers of PAT
13.4 The link between QbD and PAT
13.5 Chemometrics: the glue that holds QbD and PAT together
13.5.1 A new approach to batch process understanding: relative time [...]
[...]cation [...] prediction-prediction [...] hierarchies
[...] pharmaceutical manufacturing: the embodiment of QbD and PAT
13.6 An introduction to multivariate statistical process control (MSPC)
13.6.1 Aspects of data fusion
13.6.2 Multivariate statistical process control (MSPC) principles
13.6.3 Total process measurement system quality control (TPMSQC)

13.7 Model lifecycle management
13.7.1 The iterative model building cycle
13.7.2 A general procedure for model updating
13.7.3 Summary of model lifecycle management
13.8 Chapter summary
13.9 References

6.7.1 Multivariate projection
6.7.2 Validation scores
6.8 Exercise: detecting outliers (Troodos)
6.8.1 Purpose
6.8.2 Dataset
6.8.3 Analysis
6.8.4 Summary
6.9 Summary: PCA in practice
6.10 References
7. Multivariate calibration
7.1 Multivariate modelling (X, Y): the calibration stage
7.2 Multivariate .
