Statistical Analysis With Latent Variables
User’s Guide
Linda K. Muthén
Bengt O. Muthén
Following is the correct citation for this document:
Muthén, L.K. and Muthén, B.O. (1998-2017). Mplus User’s Guide. Eighth Edition. Los Angeles, CA: Muthén & Muthén
Copyright 1998-2017 Muthén & Muthén
Program Copyright 1998-2017 Muthén & Muthén
Version 8
April 2017
The development of this software has been funded in whole or in part with Federal funds from the National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, under Contract No. N44AA52008 and Contract No. N44AA92009.
Muthén & Muthén
3463 Stoner Avenue
Los Angeles, CA 90066
Tel: (310) 391-9971
Fax: (310) 391-8971
Web: www.StatModel.com
Support@StatModel.com
TABLE OF CONTENTS
Chapter 1: Introduction
Chapter 2: Getting started with Mplus
Chapter 3: Regression and path analysis
Chapter 4: Exploratory factor analysis
Chapter 5: Confirmatory factor analysis and structural equation modeling
Chapter 6: Growth modeling, survival analysis, and N = 1 time series analysis
Chapter 7: Mixture modeling with cross-sectional data
Chapter 8: Mixture modeling with longitudinal data
Chapter 9: Multilevel modeling with complex survey data
Chapter 10: Multilevel mixture modeling
Chapter 11: Missing data modeling and Bayesian analysis
Chapter 12: Monte Carlo simulation studies
Chapter 13: Special features
Chapter 14: Special modeling issues
Chapter 15: TITLE, DATA, VARIABLE, and DEFINE commands
Chapter 16: ANALYSIS command
Chapter 17: MODEL command
Chapter 18: OUTPUT, SAVEDATA, and PLOT commands
Chapter 19: MONTECARLO command
Chapter 20: A summary of the Mplus language
PREFACE
We started to develop Mplus in 1995 with the goal of providing researchers with powerful new statistical modeling techniques. We saw a wide gap between new statistical methods presented in the statistical literature and the statistical methods used by researchers in substantively-oriented papers. Our goal was to help bridge this gap with easy-to-use but powerful software. Version 1 of Mplus was released in November 1998; Version 2 was released in February 2001; Version 3 was released in March 2004; Version 4 was released in February 2006; Version 5 was released in November 2007; Version 6 was released in April 2010; and Version 7 was released in September 2012. After four expansions of Version 7 during the last five years, we are now proud to present the new and unique features of Version 8. With Version 8, we have gone a considerable way toward accomplishing our goal, and we plan to continue to pursue it in the future.
The new features that have been added between Version 7 and Version 8 would never have been accomplished without two very important team members, Tihomir Asparouhov and Thuy Nguyen. It may be hard to believe that the Mplus team has only two programmers, but these two programmers are extraordinary. Tihomir has developed and programmed sophisticated statistical algorithms to make the new modeling possible. Without his ingenuity, they would not exist. His deep insights into complex modeling issues and statistical theory are invaluable. Thuy has developed the post-processing graphics module, the Mplus editor and language generator, and the Mplus Diagrammer based on a framework designed by Delian Asparouhov. In addition, Thuy has programmed the Mplus language and is responsible for producing new release versions, testing, and keeping control of the entire code, which has grown enormously. Her unwavering consistency, logic, and steady and calm approach to problems keep everyone on target. We feel fortunate to work with such a talented team.
Not only are they extremely bright, but they are also hard-working, loyal, and always striving for excellence. Mplus Version 8 would not have been possible without them.
Another important team member is Michelle Conn. Michelle was with us at the beginning, when she was instrumental in setting up the Mplus office, and returned fifteen years ago. Michelle wears many hats: Chief Financial Officer, Office Manager, and Sales Manager, among others. She was the driving force behind the design of the new shopping cart. With the vastly increased customer base, her efficiency in multi-tasking and calm under pressure are much appreciated. Noah Hastings joined the Mplus team in 2009. He is responsible for testing the Graphics Module and the Mplus Diagrammer, creating the pictures of the models in the example chapters of the Mplus User’s Guide, keeping the website updated, and providing assistance to Bengt with presentations, papers, and our book. He has proven to be a most trustworthy and valuable team member.
We would also like to thank all of the people who have contributed to the development of Mplus in past years. These include Stephen Du Toit, Shyan Lam, Damir Spisic, Kerby Shedden, and John Molitor.
Initial work on Mplus was supported by SBIR contracts and grants from NIAAA that we acknowledge gratefully. We thank Bridget Grant for her encouragement in this work.
Linda K. Muthén
Bengt O. Muthén
Los Angeles, California
April 2017
CHAPTER 1
INTRODUCTION
Mplus is a statistical modeling program that provides researchers with a flexible tool to analyze their data. Mplus offers researchers a wide choice of models, estimators, and algorithms in a program that has an easy-to-use interface and graphical displays of data and analysis results. Mplus allows the analysis of both cross-sectional and longitudinal data, single-level and multilevel data, data that come from different populations with either observed or unobserved heterogeneity, and data that contain missing values. Analyses can be carried out for observed variables that are continuous, censored, binary, ordered categorical (ordinal), unordered categorical (nominal), counts, or combinations of these variable types. In addition, Mplus has extensive capabilities for Monte Carlo simulation studies, where data can be generated and analyzed according to most of the models included in the program.
The Mplus modeling framework draws on the unifying theme of latent variables. The generality of the Mplus modeling framework comes from the unique use of both continuous and categorical latent variables. Continuous latent variables are used to represent factors corresponding to unobserved constructs, random effects corresponding to individual differences in development, random effects corresponding to variation in coefficients across groups in hierarchical data, frailties corresponding to unobserved heterogeneity in survival time, liabilities corresponding to genetic susceptibility to disease, and latent response variable values corresponding to missing data.
Categorical latent variables are used to represent latent classes corresponding to homogeneous groups of individuals, latent trajectory classes corresponding to types of development in unobserved populations, mixture components corresponding to finite mixtures of unobserved populations, and latent response variable categories corresponding to missing data.
THE Mplus MODELING FRAMEWORK
The purpose of modeling data is to describe the structure of data in a simple way so that it is understandable and interpretable. Essentially, the modeling of data amounts to specifying a set of relationships between variables. The figure below shows the types of relationships that can be modeled in Mplus. The rectangles represent observed variables. Observed variables can be outcome variables or background variables. Background variables are referred to as x; continuous and censored outcome variables are referred to as y; and binary, ordered categorical (ordinal), unordered categorical (nominal), and count outcome variables are referred to as u. The circles represent latent variables. Both continuous and categorical latent variables are allowed. Continuous latent variables are referred to as f. Categorical latent variables are referred to as c.
The arrows in the figure represent regression relationships between variables. Regression relationships that are allowed but not specifically shown in the figure include regressions among observed outcome variables, among continuous latent variables, and among categorical latent variables. For continuous outcome variables, linear regression models are used. For censored outcome variables, censored (tobit) regression models are used, with or without inflation at the censoring point. For binary and ordered categorical outcomes, probit or logistic regression models are used. For unordered categorical outcomes, multinomial logistic regression models are used. For count outcomes, Poisson and negative binomial regression models are used, with or without inflation at the zero point.
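The outcome-specific regression models listed above differ mainly in how a linear predictor is turned into a distribution for the outcome. The Python sketch below illustrates a few of these mappings for a single linear predictor `eta` (for example, an intercept plus a slope times x). The function names, the ordinal threshold convention P(u <= k) = F(tau_k - eta), and the choice of the last nominal category as the reference are illustrative assumptions, not Mplus internals. Censored, inflated, and negative binomial models are omitted to keep it short.

```python
import math


def logistic_prob(eta: float) -> float:
    """Binary outcome, logit link: P(u = 1 | x) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def probit_prob(eta: float) -> float:
    """Binary outcome, probit link: P(u = 1 | x) = Phi(eta), the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def ordered_logit_probs(eta: float, thresholds: list[float]) -> list[float]:
    """Ordinal outcome with categories 0..K, assuming increasing thresholds.

    Assumed convention: P(u <= k | x) = logistic(tau_k - eta).
    Category probabilities are differences of adjacent cumulative probabilities.
    """
    cumulative = [logistic_prob(tau - eta) for tau in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


def multinomial_logit_probs(etas: list[float]) -> list[float]:
    """Nominal outcome: one linear predictor per non-reference category.

    The last category is treated as the reference (its predictor is fixed at 0).
    """
    scores = list(etas) + [0.0]
    top = max(scores)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def poisson_pmf(k: int, eta: float) -> float:
    """Count outcome, log link: u ~ Poisson(lam) with lam = exp(eta)."""
    lam = math.exp(eta)
    return math.exp(-lam) * lam ** k / math.factorial(k)
```

For example, `ordered_logit_probs(0.3, [-1.0, 0.5, 2.0])` returns four category probabilities that sum to 1. Increasing `eta` shifts probability mass toward the higher categories.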
Models in Mplus can include continuous latent variables, categorical latent variables, or a combination of continuous and categorical latent variables. In the figure above, Ellipse A describes models with only continuous latent variables. Ellipse B describes models with only categorical latent variables. The full modeling framework describes models with a combination of continuous and categorical latent variables. The Within and Between parts of the figure above indicate that multilevel models that describe individual-level (within) and cluster-level (between) variation can be estimated using Mplus.
MODELING WITH CONTINUOUS LATENT VARIABLES
Ellipse A describes models with only continuous latent variables. Following are models in Ellipse A that can be estimated using Mplus:
• Regression analysis
• Path analysis
• Exploratory factor analysis
• Confirmatory factor analysis
• Item response theory modeling
• Structural equation modeling
• Growth modeling
• Discrete-time survival analysis
• Continuous-time survival analysis
• Time series analysis
Observed outcome variables can be continuous, censored, binary, ordered categorical (ordinal), unordered categorical (nominal), counts, or combinations of these variable types.
Special features available with the above models for all observed outcome variable types are:
• Single or multiple group analysis
• Missing data under MCAR, MAR, and NMAR and with multiple imputation
• Complex survey data features including stratification, clustering, unequal probabilities of selection (sampling weights), subpopulation analysis, replicate weights, and finite population correction
• Latent variable interactions and non-linear factor analysis using maximum likelihood
• Random slopes
• Individually-varying times of observation
• Linear and non-linear parameter constraints
• Indirect effects including specific paths
• Maximum likelihood estimation for all outcome types
• Bootstrap standard errors and confidence intervals
• Wald chi-square test of parameter equalities
• Factor scores and plausible values for latent variables
MODELING WITH CATEGORICAL LATENT VARIABLES
Ellipse B describes models with only categorical latent variables. Following are models in Ellipse B that can be estimated using Mplus:
• Regression mixture modeling
• Path analysis mixture modeling
• Latent class analysis
• Latent class analysis with covariates and direct effects
• Confirmatory latent class analysis
• Latent class analysis with multiple categorical latent variables
• Loglinear modeling
• Non-parametric modeling of latent variable distributions
• Multiple group analysis
• Finite mixture modeling
• Complier Average Causal Effect (CACE) modeling
• Latent transition analysis and hidden Markov modeling including mixtures and covariates
• Latent class growth analysis
• Discrete-time survival mixture analysis
• Continuous-time survival mixture analysis
Observed outcome variables can be continuous, censored, binary, ordered categorical (ordinal), unordered categorical (nominal), counts, or combinations of these variable types. Most of the special features listed above are available for models with categorical latent variables. The following special features are also available:
• Analysis with between-level categorical latent variables
• Tests to identify possible covariates not included in the analysis that influence the categorical latent variables
• Tests of equality of means across latent classes on variables not included in the analysis
• Plausible values for latent classes
MODELING WITH BOTH CONTINUOUS AND CATEGORICAL LATENT VARIABLES
The full modeling framework includes models with a combination of continuous and categorical latent variables. Observed outcome variables can be continuous, censored, binary, ordered categorical (ordinal), unordered categorical (nominal), counts, or combinations of these variable types. Following are models in the full modeling framework that can be estimated using Mplus:
• Latent class analysis with random effects
• Factor mixture modeling
• Structural equation mixture modeling
• Growth mixture modeling with latent trajectory classes
• Discrete-time survival mixture analysis
• Continuous-time survival mixture analysis
Most of the special features listed above are available for models with both continuous and categorical latent variables. The following special features are also available:
• Analysis with between-level categorical latent variables
• Tests to identify possible covariates not included in the analysis that influence the categorical latent variables
• Tests of equality of means across latent classes on variables not included in the analysis
MODELING WITH COMPLEX SURVEY DATA
There are two approaches to the analysis of complex survey data in Mplus. One approach is to compute standard errors and a chi-square test of model fit taking into account stratification, non-independence of observations due to cluster sampling, and/or unequal probability of selection. Subpopulation analysis, replicate weights, and finite population correction are also available. With sampling weights, parameters are estimated by maximizing a weighted loglikelihood function. Standard error computations use a sandwich estimator. For this approach, observed outcome variables can be continuous, censored,
binary, ordered categorical (ordinal), unordered categorical (nominal), counts, or combinations of these variable types.
A second approach is to specify a model for each level of the multilevel data, thereby modeling the non-independence of observations due to cluster sampling. This is commonly referred to as multilevel modeling. The use of sampling weights in the estimation of parameters, standard errors, and the chi-square test of model fit is allowed. Both individual-level and cluster-level weights can be used. With sampling weights, parameters are estimated by maximizing a weighted loglikelihood function. Standard error computations use a sandwich estimator. For this approach, observed outcome variables can be continuous, censored, binary, ordered categorical (ordinal), unordered categorical (nominal), counts, or combinations of these variable types.
The multilevel extension of the full modeling framework allows random intercepts and random slopes that vary across clusters in hierarchical data. Random slopes include the special case of random factor loadings. These random effects can be specified for any of the relationships of the full Mplus model for both independent and dependent variables and both observed and latent variables. Random effects representing across-cluster variation in intercepts and slopes or individual differences in growth can be combined with factors measured by multiple indicators on both the individual and cluster levels. In line with SEM, regressions among random effects, among factors, and between random effects and factors are allowed.
The two approaches described above can be combined. In addition to specifying a model for each level of the multilevel data, thereby modeling the non-independence of observations due to cluster sampling, standard errors and a chi-square test of model fit are computed taking into account stratification, non-independence of observations due to cluster sampling, and/or unequal probability of selection.
When there is clustering due to both primary and secondary sampling stages, the standard errors and chi-square test of model fit are computed taking into account the clustering due to the primary sampling stage, and clustering due to the secondary sampling stage is modeled.
Most of the special features listed above are available for modeling of complex survey data.
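The combination of a weighted loglikelihood and a sandwich estimator described above can be illustrated in the simplest possible case, estimating a single mean. The Python sketch below assumes a normal model with a fixed variance, so maximizing the weighted loglikelihood gives the weighted mean. It then forms a cluster-robust sandwich variance by summing the weighted score contributions within each cluster. The function name is hypothetical. Stratification, finite population correction, and small-sample adjustments are deliberately left out, so this illustrates the idea and does not reproduce the Mplus computation.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_sandwich(y, w, cluster):
    """Weighted ML estimate of a mean with a cluster-robust sandwich standard error.

    Model assumption: y_i ~ Normal(mu, sigma^2) with sigma^2 fixed. Maximizing the
    weighted loglikelihood sum_i w_i * log f(y_i; mu) gives mu_hat = sum(w*y) / sum(w).

    Sandwich pieces (sigma^2 cancels and is dropped):
      bread A = sum_i w_i
      meat  B = sum over clusters of (sum_{i in cluster} w_i * (y_i - mu_hat))**2
      Var(mu_hat) = B / A**2
    """
    if not (len(y) == len(w) == len(cluster)):
        raise ValueError("y, w, and cluster must have the same length")
    total_w = sum(w)
    mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Sum the weighted score contributions within each cluster, so that
    # observations in the same cluster are not treated as independent.
    cluster_scores = defaultdict(float)
    for yi, wi, ci in zip(y, w, cluster):
        cluster_scores[ci] += wi * (yi - mu_hat)

    meat = sum(s * s for s in cluster_scores.values())
    return mu_hat, sqrt(meat / total_w ** 2)
```

For example, take y = [1, 2, 3, 4] with unit weights. Treating each observation as its own cluster gives a standard error of sqrt(5/16), about 0.559. Grouping the observations as clusters [0, 0, 1, 1] gives sqrt(0.5), about 0.707. The difference shows how positively correlated residuals within a cluster inflate the standard error.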
MODELING WITH MISSING DATA
Mplus has several options for the estimation of models with missing data. Mplus provides maximum likelihood estimation under MCAR (missing completely at random), MAR (missing at random), and NMAR (not missing at random) for continuous, censored, binary, ordered categorical (ordinal), unordered categorical (nominal), counts, or combinations of these variable types (Little & Rubin, 2002). MAR means that missingness can be a function of observed covariates and observed outcomes. For censored and categorical outcomes using weighted least squares estimation, missingness is allowed to be a function of the observed covariates but not the observed outcomes (Asparouhov & Muthén, 2010a). When there are no covariates in the model, this is analogous to pairwise present analysis. Non-ignorable missing data (NMAR) modeling is possible using maximum likelihood estimation where categorical outcomes are indicators of missingness and where missingness can be predicted by continuous and categorical latent variables (Muthén, Jo, & Brown, 2003; Muthén et al., 2011).
In all models, missingness is not allowed for the observed covariates because they are not part of the model. The model is estimated conditional on the covariates, and no distributional assumptions are made about the covariates. Covariate missingness can be modeled if the covariates are brought into the model and distributional assumptions such as normality are made about them. With missing data, the standard errors for the parameter estimates are computed using the observed information matrix (Kenward & Molenberghs, 1998). Bootstrap standard errors and confidence intervals are also available with missing data.
Mplus provides multiple imputation of missing data using Bayesian analysis (Rubin, 1987; Schafer, 1997). Both the unrestricted H1 model and a restricted H0 model can be used for imputation. Multiple data sets generated using multiple imputation can be analyzed using a special feature of Mplus.
Parameter estimates are averaged over the set of analyses, and standard errors are computed using the average of the standard errors over the set of analyses and the between-analysis parameter estimate variation (Rubin, 1987; Schafer, 1997). A chi-square test of overall model fit is provided (Asparouhov & Muthén, 2008c; Enders, 2010).
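The pooling step described above follows Rubin's (1987) combining rules. The minimal Python sketch below handles a single parameter. The pooled estimate is the mean of the m estimates. The total variance adds the within-analysis variance to the between-analysis variance of the estimates inflated by (1 + 1/m). In Rubin's formulation, the within-analysis variance is the mean of the squared standard errors. The function name is hypothetical, and degrees-of-freedom adjustments and the chi-square test of model fit are not shown.

```python
import statistics
from math import sqrt


def pool_rubin(estimates, std_errors):
    """Combine one parameter across m imputed-data analyses with Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need at least two analyses and one standard error per estimate")
    pooled = statistics.fmean(estimates)                     # Q-bar: mean of the estimates
    within = statistics.fmean(se ** 2 for se in std_errors)  # W: mean squared standard error
    between = statistics.variance(estimates)                 # B: sample variance, m - 1 denominator
    total = within + (1.0 + 1.0 / m) * between               # T: total variance
    return pooled, sqrt(total)
```

For example, three analyses with estimates 1, 2, and 3, each with standard error 0.5, give a pooled estimate of 2. The pooled standard error is sqrt(0.25 + (4/3) * 1), which is much larger than 0.5 because the estimates disagree across imputations.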
ESTIMATORS AND ALGORITHMS
Mplus provides both Bayesian and frequentist inference. Bayesian analysis uses Markov chain Monte Carlo (MCMC) algorithms. Posterior distributions can be monitored by trace and autocorrelation plots. Convergence can be monitored by the Gelman-Rubin potential scale reduction using parallel computing in multiple MCMC chains. Posterior predictive checks are provided.
Frequentist analysis uses maximum likelihood and weighted least squares estimators. Mplus provides maximum likelihood estimation for all models. With censored and categorical outcomes, an alternative weighted least squares estimator is also available. For all types of outcomes, robust estimation of standard errors and robust chi-square tests of model fit are provided. These procedures take into account non-normality of outcomes and non-independence of observations due to cluster sampling. Robust standard errors are computed using the sandwich estimator. Robust chi-square tests of model fit are computed using mean and mean-and-variance adjustments as well as a likelihood-based approach. Bootstrap standard errors are available for most models. The optimization algorithms use one or a combination of the following: Quasi-Newton, Fisher scoring, Newton-Raphson, and the Expectation Maximization (EM) algorithm.
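The potential scale reduction (PSR) mentioned above compares the variation within parallel MCMC chains with the variation between them. Values close to 1 suggest that the chains are sampling from the same distribution. The Python sketch below uses the textbook Gelman-Rubin form for one parameter. It assumes m chains of equal length n with burn-in already discarded. The function name is hypothetical, and the PSR formula and convergence cutoff that Mplus uses may differ in detail. Treat this as an illustration of the idea, not a reproduction of Mplus output.

```python
import statistics
from math import sqrt


def potential_scale_reduction(chains):
    """Textbook Gelman-Rubin potential scale reduction for a single parameter.

    chains: list of m >= 2 sequences of posterior draws, all of the same length n >= 2.
    """
    m = len(chains)
    if m < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    if n < 2 or any(len(chain) != n for chain in chains):
        raise ValueError("chains must all have the same length, at least 2")

    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average of the within-chain sample variances (assumed > 0 here)
    within = statistics.fmean(statistics.variance(chain) for chain in chains)
    # B: n times the sample variance of the chain means
    between = n * statistics.variance(chain_means)
    # V-hat: pooled estimate of the posterior variance
    pooled_var = (n - 1) / n * within + between / n
    return sqrt(pooled_var / within)
```

For example, two chains with the same spread but centered at 0.5 and 10.5 (`[[0, 1, 0, 1], [10, 11, 10, 11]]`) give a PSR of roughly 12.3, far above 1. This signals that the chains have not converged to a common distribution.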