Spectral Multivariate Calibration With Wavelength .

2y ago
43 Views
2 Downloads
406.43 KB
8 Pages
Last View : 1d ago
Last Download : 3m ago
Upload by : Dani Mulvey
Transcription

Spectral Multivariate Calibration with Wavelength SelectionUsing Variants of Tikhonov RegularizationJOSHUA OTTAWAY, JOHN H. KALIVAS,* and ERIK ANDRIESDepartment of Chemistry, Idaho State University, Pocatello, Idaho 83209 (J.O., J.H.K.); Department of Mathematics, Central New MexicoCommunity College, Albuquerque, New Mexico 87106 (E.A.); and Center for Advanced Research Computing, University of New Mexico,Albuquerque, New Mexico 87106 (E.A.)Tikhonov regularization (TR) is a general method that can be used toform a multivariate calibration model and numerous variants of it exist,including ridge regression (RR). This paper reports on the uniqueflexibility of TR to form a model using full wavelengths (RR), individuallyselected wavelengths, or multiple bands of selected wavelengths. Of thesethree TR variants, the one based on selection of wavelength bands is foundto produce lower prediction errors. As with most wavelength selectionalgorithms, the model vector magnitude indicates that this error reductioncomes with a potential increase in prediction uncertainty. Results arepresented for near-infrared, ultraviolet–visible, and synthetic spectraldata sets. While the focus of this paper is wavelength selection, the TRmethods are generic and applicable to other variable-selection situations.Index Headings: Tikhonov regularization; Wavelength selection; Variableselection; Multivariate calibration.INTRODUCTIONMultivariate calibration models used to analyze spectroscopic data have the general formy ¼ Xbð1Þwhere X contains m calibration samples measured at nwavelengths, b is an n 3 1 model vector, and y is an m 3 1vector containing the analytical information for the analyte,such as concentration. To predict future samples, Eq. 1 mustfirst be solved for b, typically accomplished mathematically byb̂ ¼ Xþ y, where Xþ is a generalized inverse of X. There areseveral approaches to calculating Xþ, the most common ofwhich are ridge regression (RR), partial least squares (PLS),and principle component regression (PCR).1–3 The method ofRR is actually a form of Tikhonov regularization (TR)4,5 asfurther described in the Mathematics section. With an estimateof b, a new sample spectrum is predicted for its analyteconcentration by y ¼ xtb̂, where x denotes a column vector ofthe spectral responses.While the methods of RR, PLS, and PCR can be used withall wavelengths measured (full spectral methods where n . m),reduced prediction errors are common when care is taken toselect wavelengths spanning useful analyte predictive information.6–23 However, this prediction error reduction comeswith a tradeoff of potential prediction variance inflation.15,18,22,24,25 The methods of RR, PLS, or PCR can beused when wavelengths are selected such that n m or n , m.When the latter is true, the method of multiple linear regression(MLR) can also be used.1–3Methods of wavelength selection are usually one of twomodes. One is to select individual wavelengths and the other isReceived 27 May 2010; accepted 8 September 2010.* Author to whom correspondence should be sent. E-mail: kalijohn@isu.edu.1388Volume 64, Number 12, 2010to determine wavelength intervals (bands).13,21,23 Wavelengthselection algorithms such as genetic algorithms or simulatedannealing commonly necessitate lengthy iterative sequentialprocesses involving wavelength selection, model formingusing RR, PLS, or another method, and prediction testing ofthe selected wavelengths.6,14–16 These and other optimizationalgorithms often require user-set operating parameters. Alteringthese parameters for a specific algorithm can result in differentsubsets of wavelengths being selected and, hence, the dilemmaof selecting wavelengths with chance correlations.13,17–19 Thelikelihood of this dilemma occurring increases as the ratio ofwavelengths to samples increases.20A common approach to wavelength selection is to rankwavelengths according to a merit that reflects importance andthen using an empirically determined cutoff to retain the finalset.6–8,21,26,27 One measure for ranking is the magnitude ofcoefficients in a full-wavelength model vector, such as wouldbe obtained from PLS. In this case, multiple bands ofwavelengths are commonly selected, i.e., adjacent wavelengthsoften obtain nearly equivalent model coefficients. References 6,21, 23, and 28 recently reviewed many of these methods aswell as interval PLS.29An alternative to wavelength selection algorithms arevariants of TR with built-in wavelength selection processes.These variants of TR select wavelengths and build modelssimultaneously where individual wavelengths and/or multiplebands of wavelengths can be selected. Basically, the fullwavelength TR model coefficients can have values at or nearzero. These wavelengths have nearly no affect on predictionand are essentially considered non-selected wavelengths.Because TR models are formed in a systematic process, theL-curve approach is useful to determine the final wavelengthselected model.3–5,30 The L-curve approach to meta-parameterselection is well documented. Briefly, a measure reflective ofprediction variance is plotted against a prediction-biasdiagnostic and the best set of wavelengths are those for modelsin the corner of the resulting L-curve, i.e., models with anacceptable bias/variance tradeoff. Many wavelength selectionalgorithms only use a prediction bias information criterion toidentify acceptable wavelengths, i.e., the prediction-biasdiagnostic is used to guide the algorithm in selectingwavelengths. Algorithmic optimization of only a bias diagnostic such as the root mean square error of calibration (RMSEC)or root mean square error of validation (RMSEV) typicallyresults in over-fitting.13,22,31–33 The TR wavelength-selectionvariants require determination of a model meta-parameter(tuning parameter) for an acceptable bias/variance tradeoff.This is no different than other biased modeling methods suchas PLS or PCR in full-wavelength modes or in wavelengthselection studies requiring a meta-parameter determination toidentify the final model.0000–0000/10/6412-1388 2.00/0Ó 2010 Society for Applied SpectroscopyAPPLIED SPECTROSCOPY

Studied in this paper are three variants of TR used to formmodels with full wavelengths or selected wavelengths(individual wavelengths and/or wavelength bands). The Lcurve is used to select the final models. The three TRapproaches include (1) a model vector two-norm (L2)minimization criterion for a full-wavelength model, (2) amodel vector L2 minimization criterion that includes a prioriinformation to weight model coefficients favoring selection ofmultiple wavelength intervals, and (3) a model vector onenorm (L1) minimization criterion that may or may not include apriori information to weight model coefficients favoringindividual wavelength selection. Simulated spectral data areassessed to verify algorithm operations. Following this,ultraviolet–visible and near-infrared spectral data sets areevaluated.MATHEMATICSRidge Regression. Ridge regression model vectors for Eq. 1obtained by TR are those that satisfyminðjjXb yjj22 þ k2 jjbjj22 Þð2Þelements being spectral noise estimates of the respectivewavelengths.38,39 Using spectral noise estimates is a form ofwavelength selection as noisy wavelengths obtain modelcoefficients at or near zero.This approach of wavelength selection is again studied inthis paper for comparison to using a full-wavelength RRdetermined model vector on the diagonal of M where the ithdiagonal element is 1/jb̂ij for the ith RR model wavelengthcoefficient. The idea is that using a priori information aboutexpected model coefficient magnitudes and, hence, possibly ameasure of importance, a greater emphasis can be put on keywavelengths (bands) and less emphasis on those wavelengthswith near-zero or small RR model coefficients; therefore, abetter model vector should result. Other studies have found ituseful to select wavelengths based on the magnitudes of PLSmodel coefficients.7–9,26 Because the spectral noise structure isnot always known, using a full-wavelength model vector on thediagonal of M is more feasible.Using Expression 5 will henceforth be referred to as TR 2 inthis paper. When M ¼ I in Expression 5, TR 2 will be referredto as RR for distinguishing the full-wavelength form of TR.Computation of the models is obtained from the followingwhere jj jj indicates the vector norm and the subscript 2 definesthe 2-norm (L2), also termed the Euclidean norm, and k is themeta-parameter controlling the weight given to the second termrelative to the first term. The 2-norm for b is computed by jjbjj2¼ ðRni¼1 b2i Þ1 2 . For Expression 2, RR models are obtained byvarying the k meta-parameter and computing respective modelsusingand the L-curve approach is used to determine k.Tikhonov Regularization in L1 (LASSO TR). Anothergeneral form of TR found useful in calibration maintenanceand transfer studies isb̂ ¼ ðXt X þ k2 IÞ 1 Xt yminðjjXb yjj22 þ sjjMb yM jj11 Þð3Þwhere I denotes the identity matrix.The well documented L-curve approach is valuable indetermining a value for k and, hence, the final model.3–5,30 TheL-curve has also been used to determine the number of basisvectors for PLS and PCR,3,34 variable selection for MLR,30 andin TR calibration maintenance and transfer studies.35–37To form the RR L-curve and select k, model vector 2-norms(the greater the magnitude, the greater the likelihood of overfitting and increased uncertainty in the prediction) are plottedagainst the respective RMSEC values (a bias indicator). In sucha plot, an L-shaped curve is formed and the better modelsreside in the corner region of the L-curve with acceptabletradeoffs in the plotted criteria. Other model diagnostics, suchas R2, can be plotted.34Tikhonov Regularization in L2 (TR 2). A more generalform of Expression 2 found useful for calibration maintenanceand transfer35–37 isminðjjXb yjj22 þ k2 jjMb yM jj22 Þð4Þwhere M and yM represent a set of spectra and analyte valuesmeasured under conditions different than the calibrationsamples in X and y. When M ¼ I and yM ¼ 0, Expression 4reduces to RR in Expression 2 and TR is said to be in standardform.4 A key variant of Expression 4 isminðjjXb yjj22 þ k2 jjMbjj22 Þð5ÞThe M matrix has been set to derivative operators inExpression 5 for computing a smoothed model vector38 andM has also been set to a diagonal matrix with diagonalb̂ ¼ ðXt X þ k2 Mt MÞ 1 Xt yð6Þð7Þwhere the subscript 1 indicates the vector 1-norm (L1) and srepresents the weight given to the second term.35,37 The 1-normfor b is computed by jjbjj1 ¼ Rni¼1 jbij. If only wavelengthselection is desired, then Expression 7 is simplified tominðjjXb yjj22 þ sjjMbjj11 Þð8Þwhere, as with Expression 5, M will be set to a diagonal matrixwith spectral noise structure estimates or a previous RR modelvector (1/jb̂ij). When M ¼ I, Expression 8 further reduces to thevariable selection method known as the least absoluteshrinkage and selection operator (LASSO),2,40 albeit theapproach was developed earlier.41,42 When M in Expression8 is set to a diagonal matrix (M 6¼ I), the minimization hasbeen termed adaptive LASSO.2,43 Using Expression 8 in thispaper will be referred to as LASSO TR whether M ¼ I or M is adiagonal matrix with either the spectral noise structure or 1/jb̂ijfrom a RR model.Tikhonov regularization in the LASSO format with M ¼ Ihas been used for wavelength selection and found to producelower prediction errors than full-wavelength models.11,12 InRef. 12, the L-curve approach with the L1 norm of the modelvectors was used instead of the L2 norm to determine anacceptable model. Such an L-curve will be used in this study.Reported in this study is the first application of LASSO TRwith M 6¼ I for the specific intent of wavelength selection.The least angle regression (LAR) algorithm44 is one ofseveral algorithms that can be used for LASSO. At eachiteration, a non-zero model vector coefficient in b is added to orremoved from the model vector in the previous iteration. Thevalue s ¼ 0 forms the classical least squares solution (modelAPPLIED SPECTROSCOPY1389

vector in the last iteration of the LAR algorithm). In this paper,the LAR algorithm is used to solve Expression 8 with themodification that the columns of X are not scaled to mean zeroand standard deviation of one (autoscaled) as originallysuggested for LAR.44 It was found that with autoscaled data,inappropriate wavelengths are selected, i.e., wavelengths withthe best sensitivity are not selected for the data sets used in thispaper as all wavelengths are seen to have equal sensitivity afterautoscaling. When M ¼ I, the LAR algorithm can be directlyused as this is LASSO. However, when M is a diagonal matrix,then processes described in Refs. 4, 30, and 45 are used totransform the data and Expression 8 is now written asminðjjX̄b̄ ȳjj22 þ sjjb̄jj11 Þð9Þwhich can now be solved by the LAR algorithm. The barindicates transformed data and the LAR-estimated modelvector b̄ˆ must be transformed back to the estimated modelvector b̂ that is used to obtain predicted concentrations. Whilethis transformation process is new to using LASSO TR, thetransformation has been used with TR 2, PLS, and PCR inorder to obtain smooth model vectors using derivativeoperators for M. The transformation has also been appliedwith penalty weights in M based on spectroscopic noise.38 Thetransformation process was also recently used with TR writtenas Expression 7 for calibration maintenance and transfer.35,37EXPERIMENTALApparatus and Algorithms. MatLab 7.8 (The Math Works,Natick, MA) programs for RR, TR 2, and LASSO TRalgorithms as described in the Mathematics section werewritten by the authors. All programs were run on an Intel Core2 Quad personal computer.Data Sets. Two simulated data sets are studied to test thealgorithms and understand how the algorithms work, i.e., whattype of wavelengths are the TR variants selecting (selectivityversus sensitivity). These two simulated data sets havepreviously been used.12,38,39 A near-infrared spectroscopicdata set also previously studied12 is evaluated in this paper.Also assessed is a visible spectroscopic inorganic data set. Allsamples are mean centered relative to the calibration samples inX and y. As in previous work, the simulated and near-infrareddata sets are split by first sorting samples according toconcentration magnitudes in y and then every other sample wasplaced into the validation set, with the remaining samplesforming the calibration set. The same procedure was followedfor the inorganic data set except all replicate spectra for asample are placed in the validation set and similarly for thecalibration set.Simulated Set 1. The simulated wavelength selection dataset used in Refs. 12, 38, and 39 is again examined in this study.The broad-banded spectral data set is based on a Gaussiancurve to simulate the pure-component analyte spectrum at unitconcentration over 50 wavelength units while using a secondGaussian curve to simulate an overlapping pure-componentinterferent spectrum at unit concentration (see Fig. 1a).Random concentrations ranging from 0 to 1 over 66 samplesare used to form respective mixture spectra from the product ofconcentrations with pure-component spectra. Random homoscedastic noise from a normal distribution with a zero meanand standard deviation of one was added to spectra at 1%maximum peak height of the respective mixture spectra1390Volume 64, Number 12, 2010FIG. 1. Simulated set 1 model vectors using RR or TR 2 (solid lines) andLASSO TR (dotted line). (a) M ¼ I (RR), (b) diagonal of M set to the RRmodel vector (TR 2), and (c) diagonal of M set to the spectral noise structure(TR 2). Pure-component spectra at unit concentration without noise added forthe anlayte (solid line) and the interferent (dash line) shown in (a).followed by additional noise at 3% maximum peak height tomixture spectra from wavelengths 20 through 30 (the spectraloverlap region).Simulated Set 2. The simulated wavelength selection dataset from Ref. 12 is also reexamined in this study. Narrowpeaked pure-component spectra at unit concentration werecreated using Gaussian peaks five wavelength units wide with aone-wavelength baseline separating each peak across 61wavelengths (see Fig. 2a). Analyte peaks are centered at everysixth wavelength from 4 to 52. Interferent peaks are centered at

moisture values.47 Spectra were measured from 1100 to 2498nm at 2-nm intervals on a near-infrared spectrometer. Everytenth recorded wavelength was used, yielding 70 wavelengths.RESULTS AND DISCUSSIONSimulated Set 1. When using M ¼ I, values tabulated inTable I show that LASSO TR produces a lower RMSEV thanRR. This comes with a tradeoff with a potential increase inprediction variance as indicted by the increase in the model 2norm of the regression vector, which agrees with previouswork on this data set.38,39When coefficients from a full-wavelength RR model areused as 1/jb̂ij for the diagonal elements of M, results presentedin Table I reveal that TR 2 reduces the prediction errorcompared to using M ¼ I for RR. Additionally, predictionresults from TR 2 are improved over LASSO TR with M ¼ I orthe diagonal of M set to the same full-wavelength RR model.Compared to the RR model vector shown in Fig. 1a, the modelfrom TR 2 plotted in Fig. 1b based on the diagonal of M set tothe RR model vector is more similar to the LASSO TR modelvectors plotted in Figs. 1a and 1b with M ¼ I or the diagonal ofM set to the RR model vector, respectively. Thus, TR 2 is ableto eliminate unnecessary wavelengths, creating a model vectorsimilar to the wavelength selected LASSO TR models. Figures1a and 1b also show that there is little difference in the LASSOTR model vectors with M formed as the identity matrix or asthe diagonal matrix.FIG. 2. Simulated set 2 model vectors using RR or TR 2 (solid lines) andLASSO TR (dotted line). (a) M ¼ I (RR) and (b) diagonal of M set to the RRmodel vector (TR 2). Pure-component spectra at unit concentration withoutnoise added for the anlayte (solid line) and the interferent (dash line) in (a). Anoffset of 0.3 has been added to the analyte for visual clarity.wavelengths 22, 28, 34, 40, and 52, producing four peaks withperfect selectivity for the analyte, centered at wavelengths 4,10, 16, and 46. Sixty-six sample concentrations for bothanalyte and interferent were created from the absolute values ofrandom numbers with a normal distribution of a mean of zeroand a standard deviation of one. Mixture spectra are theproduct of the concentrations with the pure-component spectra.Random heteroscedastic noise with a normal distribution ofmean zero and standard deviation of one was added to themixture spectra at 1% of respective wavelength spectral values.Inorganic. The inorganic data set is a three-componentsystem of Co II, Cr III, and Ni II described in Ref. 46. Based ona three-level, three-factor calibration design, 26 samples wereprepared. Five replicate spectra were obtained for each samplein randomized blocks creating 128 spectra after removing twospectra as outliers due to an offset. Absorbances were measuredfrom 350 to 650 nm at 2-nm intervals with a diode arrayspectrophotometer, yielding spectra with 151 wavelengths.Pure-component and mixture spectra are shown in Fig. 3. Thestandard deviation at each wavelength for each spectrum is alsoprovided. For the purposes of this study Cr was analyzed. Thenoise structure for M is the mean of the provided calibrationstandard deviations.Corn. The corn data set as used in Ref. 12 is reexamined inthis study and consists of 80 corn samples with referenceFIG. 3. (a) Pure-component spectra of Co (solid line), Cr (dashed line), and Ni(dotted line) and (b) mixture spectra.APPLIED SPECTROSCOPY1391

TABLE I. Simulated data set 1 results.MethodRRTR 2TR 2LASSO TRLASSO TRLASSO TRMRMSEVR2jjb̂jj2k or sIRR modelNoiseIRR 271.6361.64

selection; Multivariate calibration. INTRODUCTION Multivariate calibration models used to analyze spectroscop-ic data have the general form y ¼ Xb ð1Þ where X contains m calibration samples measured at n wavelengths, b is an n 3 1 model vector, and y is an m 3 1 vector containing the analy

Related Documents:

Multivariate calibration has received significant attention in analytical chemistry, particularly in spectroscopy. Martens and Naesl provide an excellent general reference on multivariate calibration. Examples of multivariate calibration in a spectroscopic context are associated w

6.7.1 Multivariate projection 150 6.7.2 Validation scores 150 6.8 Exercise—detecting outliers (Troodos) 152 6.8.1 Purpose 152 6.8.2 Dataset 152 6.8.3 Analysis 153 6.8.4 Summary 156 6.9 Summary:PCAin practice 156 6.10 References 157 7. Multivariate calibration 158 7.1 Multivariate modelling (X, Y): the calibration stage 158 7.2 Multivariate .

Multivariate Calibration Quick Guide 3 You are now ready to setup the calibration model. Select the Soybean Oil project node in the Project explorer. Choose New Multivariate Calibration from the Quantify menu. The calibration wizard opens and guides you through

Quantify – Multivariate Calibration Overview Calibration Model Development Steps Load calibration spectra Merge all spectra into one data view Create a project (and a folder) Add labels with concentrations (Label Editor) Start the calibration wizard Create and optimize a calibration

with chemometrics of multivariate calibration (Che Man et al., 2010). Two multivariate calibrations commonly used are partial least square (PLS) and principal component regression (PCR). Both calibration methods are based on reduction of spectral dat

Multivariate calibration Multivariate calibration was initially conducted by preprocessing raw data of scanned spectral into five types of UV-Vis spectral such as original, first derivative, second derivative, SNV, and SG. The purpose of spectral preprocessing was to improve the subsequen

Multivariate calibration models are secondary analytical techniques in that they require primary analytical data for calibration. Thus, a robust multivariate calibration model requires a sufficient number of representative sampl

In astrophysics, we use ideas from the various parts of physics - electromagnetism, gravitation, theory of matter, mechanics, quantum theory - to explain what we can see. It’s like being a detective. There is what we observe (the evidence) and there is piecing it together (the thinking). The first year, and a major part of the second year, cover skills and the fundamental principles. The .