User’s Guide to the Weighted-Multiple-Linear Regression Program (WREG version 1.0)


By Ken Eng, Yin-Yu Chen, and Julie E. Kiang

Techniques and Methods 4–A8

U.S. Department of the Interior
U.S. Geological Survey

U.S. Department of the Interior
KEN SALAZAR, Secretary

U.S. Geological Survey
Marcia K. McNutt, Director

U.S. Geological Survey, Reston, Virginia: 2009

For more information on the USGS, the Federal source for science about the Earth, its natural and living resources, natural hazards, and the environment, visit http://www.usgs.gov or call 1–888–ASK–USGS.

For an overview of USGS information products, including maps, imagery, and publications, visit http://www.usgs.gov/pubprod.

To order this and other USGS information products, visit http://store.usgs.gov.

Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Although this report is in the public domain, permission must be secured from the individual copyright owners to reproduce any copyrighted materials contained within this report.

On the cover: Examples of plots resulting from analysis of data with the weighted-multiple-linear regression (WREG) program (version 1.0).

Suggested citation:
Eng, Ken, Chen, Yin-Yu, and Kiang, J.E., 2009, User’s guide to the weighted-multiple-linear-regression program (WREG version 1.0): U.S. Geological Survey Techniques and Methods, book 4, chap. A8, 21 p. (Also available at http://pubs.usgs.gov/tm/tm4a8.)

Contents

Introduction
Multiple-Linear Regression
Independent and Dependent-Variable Transformations
Estimation of Multiple-Linear-Regression Parameters
Ordinary Least Squares (OLS)
Weighted Least Squares (WLS)
Generalized Least Squares (GLS)
Performance Metrics
Model- and Time-Sampling Errors
Coefficient of Determination, R2, R2adj, and R2pseudo
Leverage and Influence Statistics
Significance of Regression Parameters
Definition of Regions
Use of the WREG Program
Program Requirements
Installation
Input
Running the Program
Set Up Model
Select Variables
Select Transformations
Model Selection
GUI Outputs
Regression Summary
Residuals Versus Estimated Flow Characteristics
Leverage Values Versus Observations
Influence Values Versus Observations
Output Files
ConventionalOLS.txt, ConventionalWLS.txt, and ConventionalGLS.txt
RegionofInfluenceOLS.txt, RegionofInfluenceWLS.txt, and InvXLX.txt
SSres.txt and SStot.txt
EventLog.txt
Other Program Notes
Acknowledgments
References

Figures

1. Examples of input file SiteInfo.txt in text tab-delimited format
2. Example of input file FlowChar.txt in text tab-delimited format
3. Example of input file LP3G.txt
4. Example of input file LP3K.txt
5. Example of input file LP3s.txt
6. Example of input file UserWLS.txt
7. An example of a USGS########.txt file for streamflow-gaging station 09183000
8. Example of WREG window used to select variables to be used in the regression
9. Examples of WREG window for selecting transformations
10. Example of WREG window for selecting the regression
11. Example of WREG window for selecting parameters of the smoothing function for correlation as a function of distance between streamflow-gaging stations
12. Example of WREG window for selecting option to include uncertainty in skew
13. WREG window showing the regression results for an OLS regression using the parameters and transformations specified in figure 9
14. Examples of plots
15. Example of output file ConventionalOLS.txt
16. Example of output file ConventionalWLS.txt
17. Example of output file ConventionalGLS.txt
18. Example of output file RegionofInfluenceOLS.txt
19. Example of regression model equation shown by RegressionModel.txt
20. Example of InvXLX.txt output file
21. Examples of SSres.txt and SStot output files

Tables

1. WREG input files
2. WREG output files

Conversion Factors

Multiply                  By        To obtain
Length
inch (in.)                2.54      centimeter (cm)
inch (in.)                25.4      millimeter (mm)
foot (ft)                 0.3048    meter (m)
mile (mi)                 1.609     kilometer (km)
mile, nautical (nmi)      1.852     kilometer (km)
yard (yd)                 0.9144    meter (m)

By Ken Eng, Yin-Yu Chen,¹ and Julie E. Kiang

¹ Former U.S. Geological Survey volunteer.

Introduction

Streamflow is not measured at every location in a stream network. Yet hydrologists, State and local agencies, and the general public still seek to know streamflow characteristics, such as mean annual flow or flood flows with different exceedance probabilities, at ungaged basins. The goals of this guide are to introduce the user to the weighted-multiple-linear regression (WREG) program and to provide the theoretical background for program features. The program is intended to be used to develop a regional estimation equation for streamflow characteristics that can be applied at an ungaged basin, or to improve the corresponding estimate at continuous-record streamflow gages (henceforth referred to simply as gages) with short records. The regional estimation equation results from a multiple-linear regression that relates observable basin characteristics, such as drainage area, to streamflow characteristics (for example, Thomas and Benson, 1970; Giese and Mason, 1993; Ries and Friesz, 2000; Eng and others, 2005; Eng and others, 2007a; Eng and others, 2007b; Kenney and others, 2007; Funkhouser and others, 2008).

The general multiple-linear regression for estimating a streamflow characteristic can be given by

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \delta_i, \tag{1}$$

where
$y_i$ is the streamflow characteristic (dependent variable),
$x_{ik}$ are basin characteristics (independent variables),
$i$ (= 1, 2, 3, ..., n) is the index for gage i,
$k$ is the number of basin characteristics,
$\beta_0$, $\beta_1$, $\beta_2$, and $\beta_k$ are the regression parameters, and
$\delta_i$ is the model error.

A critical issue in regional analyses is to understand the various sources of variability and error in the data. By understanding these sources, a user can select the appropriate approaches to estimate the regression parameters in equation 1, and the appropriate network of gages forming the region used to develop an estimate.

Three approaches to estimate regression parameters are provided in the WREG program: ordinary-least-squares (OLS), weighted-least-squares (WLS), and generalized-least-squares (GLS). All three approaches are based on the minimization of the sum of squares of differences between the gage values and the line or surface defined by the regression. The OLS approach is appropriate for many problems if the $\delta_i$ values are all independent of one another and have the same variance. Streamflow characteristics are estimated at gages using the available length of streamflow record. Because the length of record varies among gages, the precision of these estimates also varies, meaning that different $\delta_i$ values will have different variances.

A way to address the variation in the precision of estimated streamflow characteristics at each gage is to weight gages differently using WLS or GLS. A WLS approach weights each gage to reflect the precision of the estimated streamflow characteristic at that gage. An additional issue is that concurrent flows observed at different gages in a region exhibit cross correlations. If these correlations are not represented in a regional analysis, the regression parameters are less precise, and estimators of precision are inaccurate. A regional analysis that accounts for the precision of estimated streamflow characteristics, and the cross correlations among these characteristics, is known as GLS.

In addition to the variation and error in the data, the regression parameters in equation 1 are affected by the choice of a network of gages forming a region. In a “conventional” regression, a region can be defined in several ways before a multiple-linear-regression study is initiated, such as by political boundaries or by physiographic boundaries. Within the context of “conventional” regressions, regions can also be defined during the regression study by using geographic information as an independent variable in the regression. Such regions can be defined using a variety of criteria, such as geographic grouping of similar residuals from an overall regression (Wandle, 1977), use of watershed boundaries (Neely, 1986), or physiographic characteristics. When performing a conventional analysis, a user of WREG must define the regions before using the program. WREG allows the user to perform conventional regressions using OLS, WLS, or GLS.

An alternative to the conventional approach of pre-defining regions using political or physiographic boundaries is to define a region for each location of interest. This “region-of-influence” (RoI) regression approach defines a region and associated
multiple-linear regression for every ungaged basin (for example, Acreman and Wiltshire, 1987; Burns, 1990; Tasker and others, 1996; Merz and Blöschl, 2005; Eng and others, 2005; Eng and others, 2007a; Eng and others, 2007b). A regression is formed on a subset of gages for which the values of independent variables are, by some measure, closest to those at the ungaged basin of interest. Although the WREG program allows testing of RoI regressions, the application of RoI regression to ungaged basins must be accomplished using other programs, such as the National Streamflow Statistics (NSS) Program (Ries, 2006).

The first part of this report provides an overview of the multiple-linear regression techniques that are employed by WREG. It is followed by a step-by-step guide to the actual use of the program, including a description of the input files, the use of the graphical user interface, and an explanation of the output files.

Multiple-Linear Regression

In practice, the dependent variable, $y_i$, in equation 1 is an estimate, $\hat{y}_i$, often obtained from a limited sample size at each gage. The associated time-sampling error for the ith gage, $\eta_i$, is defined by

$$\eta_i = \hat{y}_i - y_i. \tag{2}$$

Substituting equation 2 into equation 1 gives

$$\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i, \tag{3}$$

where $\varepsilon_i = \delta_i + \eta_i$ ($\delta_i$ as given by equation 1).

The $\eta_i$ values from gages close together will generally be correlated, because the finite sample of observed streamflows at one gage temporally overlaps the sample from another and temporal variations of streamflows are spatially correlated. Thus, the cross correlation between $\eta_i$ and $\eta_j$ for gages i and j will depend upon the cross correlation of concurrent flows at the two gages, and the number of concurrent years of record included in the dataset.

For a collection of gages with associated dependent and independent variables, equation 3 can be conveniently written in matrix notation as

$$\hat{\mathbf{Y}} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \tag{4}$$

where

$$\hat{\mathbf{Y}} = \begin{bmatrix}\hat{y}_1\\ \hat{y}_2\\ \vdots\\ \hat{y}_n\end{bmatrix},\quad
\mathbf{X} = \begin{bmatrix}1 & x_{11} & \cdots & x_{1k}\\ 1 & x_{21} & \cdots & x_{2k}\\ \vdots & \vdots & & \vdots\\ 1 & x_{n1} & \cdots & x_{nk}\end{bmatrix},\quad
\boldsymbol{\beta} = \begin{bmatrix}\beta_0\\ \beta_1\\ \vdots\\ \beta_k\end{bmatrix},\quad
\boldsymbol{\varepsilon} = \begin{bmatrix}\varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_n\end{bmatrix}, \tag{5}$$

and the total error, $\varepsilon$, is a random variable with a mean equal to zero and variance equal to $\sigma_\varepsilon^2$.

Independent and Dependent-Variable Transformations

The independent and dependent variables used in equation 5 can be transformed to obtain a linear relationship between the $\hat{\mathbf{Y}}$ and $\mathbf{X}$ values. Common transformations include log (base 10), log (natural), and addition or subtraction of a constant. A user of WREG must choose appropriate transformations in the graphical user interface (GUI) before a multiple-linear regression is performed. A general transformation equation used by WREG is given as

[Equation 6]

where
$V$ is the dependent or independent variable to be transformed,
$V_{new}$ is the transformed independent or dependent variable,
$f$ is either the log (base 10), log (natural), or exponential function, or a transformation can be omitted, and
$C_1$, $C_2$, $C_3$, and $C_4$ are constants entered by the user.

Use of equation 6 for transforming variables is further discussed in the section of this report titled Select Transformations.

Estimation of Multiple-Linear-Regression Parameters

Following transformations of the dependent and independent variables, the transformed variables are used in WREG to estimate multiple-linear-regression parameters. When using any of the least squares regression approaches (OLS, WLS, or GLS), the regression parameters are estimated by

$$\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^T \boldsymbol{\Lambda}^{-1} \mathbf{X}\right)^{-1} \mathbf{X}^T \boldsymbol{\Lambda}^{-1} \hat{\mathbf{Y}}, \tag{7}$$

where
$\mathbf{X}^T$ is the transpose of matrix $\mathbf{X}$, and
$\boldsymbol{\Lambda}^{-1}$ is the inverse of the weighting matrix $\boldsymbol{\Lambda}$ ($\mathbf{I} = \boldsymbol{\Lambda}^{-1}\boldsymbol{\Lambda}$, where $\mathbf{I}$ is the identity matrix).

The $\boldsymbol{\Lambda}$ matrix is constructed differently for OLS, WLS, and GLS, as described in the following sections. Once $\hat{\boldsymbol{\beta}}$ is determined, it can be used to compute the regression estimate of $\hat{y}$ at the ith gage, $\hat{y}_{iR}$, as

$$\hat{y}_{iR} = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + \cdots + \hat{\beta}_k x_{ik}. \tag{8}$$

However, a user should first check if the regression is adequate.

The estimators of $\boldsymbol{\Lambda}$ used in WREG for WLS and GLS approaches are applicable only to frequency-based streamflow characteristics. Alternative estimators of $\boldsymbol{\Lambda}$ to those presented in this manual can be explored using a user-defined option in WREG (see section UserWLS.txt) for non-frequency-based streamflow characteristics, such as flow-duration exceedances.

Ordinary Least Squares (OLS)

For the OLS approach, $\boldsymbol{\beta}$ is estimated by (for example, Montgomery and others, 2001)

$$\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^T \boldsymbol{\Lambda}_{OLS}^{-1} \mathbf{X}\right)^{-1} \mathbf{X}^T \boldsymbol{\Lambda}_{OLS}^{-1} \hat{\mathbf{Y}}, \tag{9}$$

where

$$\boldsymbol{\Lambda}_{OLS} = \mathbf{I}. \tag{10}$$

The OLS approach is suitable for estimating regression parameters when there is no variation in the precision of calculated dependent variables among gages, and the errors in equation 5 are independent of each other.

Weighted Least Squares (WLS)

For the WLS approach, $\boldsymbol{\beta}$ is estimated by (for example, Tasker, 1980)

$$\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^T \boldsymbol{\Lambda}_{WLS}^{-1} \mathbf{X}\right)^{-1} \mathbf{X}^T \boldsymbol{\Lambda}_{WLS}^{-1} \hat{\mathbf{Y}}, \tag{11}$$

where $\boldsymbol{\Lambda}_{WLS}$ is the covariance matrix used to determine weights.

The components of the $\boldsymbol{\Lambda}_{WLS}$ matrix are a function of the type and source of the dependent variable. As with the OLS approach, the WLS approach is suitable when the errors in equation 5 are independent. However, for the WLS approach, weights in the weighting matrix are assigned so that gages that have more “reliable” estimates of streamflow characteristics have larger weights.

For streamflow characteristics calculated from a log-Pearson Type III frequency analysis (Bulletin 17B of the Interagency Advisory Committee on Water Data, 1982), Tasker (1980) provides a method for estimation of $\boldsymbol{\Lambda}_{WLS}$ that is used by WREG for this option:

[Equation 12]

where

[Equation 13]

and

[Equation 14]

The terms in equations 12 to 14 are the model-error variance; the record length for the ith gage, $m_i$; the observed mean-square error (MSE) of estimate using the ordinary-least-squares approach; the arithmetic average of the log-Pearson Type III deviates for all gages in the regression; and the arithmetic average of the skew values at all gages (either at-gage skew, $g$, or weighted skew, $G_w$; explained below).

The log-Pearson Type III deviate values are a function of probability of exceedance and $g$ (Interagency Advisory Committee on Water Data, 1982). A further term is the arithmetic average of the standard deviation of the annual time series of the streamflow characteristic, estimated by regression. This “sigma regression” is determined by OLS regression of the standard deviation of the annual time series at each gage against basin characteristics at each gage (Tasker and Stedinger, 1989),

$$\sigma_i = \alpha_0 + \beta_{\sigma 1} x_{i1} + \beta_{\sigma 2} x_{i2} + \cdots + \beta_{\sigma k} x_{ik} + \varepsilon_\sigma, \tag{15}$$

where
$\sigma_i$ is the standard deviation of the annual time series of the streamflow characteristic for the ith gage,
$x_{ik}$ is the kth basin characteristic for the ith gage,
$\alpha_0$, $\beta_{\sigma 1}$, $\beta_{\sigma 2}$, and $\beta_{\sigma k}$ are parameters, and
$\varepsilon_\sigma$ is the model error for the sigma regression.

The $\sigma_i$ values are a required input into the WREG program as discussed in section LP3s.txt.

The weighted skew for the ith gage is given by (Bulletin 17B of the Interagency Advisory Committee on Water Data, 1982)

[Equation 16]

where $G_{R,i}$ is the regional skew estimate applicable to the ith gage, and

[Equation 17]

where the two terms are the estimated mean square error of the skew value at the gage and the estimated mean square error of the regional skew values.

A variety of methods are available to determine $G_R$ values (Bulletin 17B of the Interagency Advisory Committee on Water Data, 1982). Either $g$ or $G_w$ values are required input to the WREG program as discussed in section LP3G.txt.

An alternative estimator to equation 13 is presented by Stedinger and Tasker (1986). It is demonstrated to be more precise than equation 13, but their study did not include a mix of approaches to compute streamflow characteristics at partial-record-stream gages (Funkhouser and others, 2008). Use of equations 12 to 14 within the WREG program allows future versions to account for this mix of approaches for partial-record-stream gages.
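The estimator in equations 7, 9, and 11 can be illustrated with a short sketch for the special case of a diagonal weighting matrix, which covers OLS (equal variances) and WLS (one variance per gage). This is a minimal pure-Python sketch and not the WREG implementation: the function names, the sample data, and the constants are hypothetical, and `gage_variances` assumes a simple variance of the form "model-error variance plus a constant divided by record length" rather than reproducing equations 12 to 14.

```python
from typing import List, Sequence


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def weighted_least_squares(x_rows: Sequence[Sequence[float]],
                           y: Sequence[float],
                           variances: Sequence[float]) -> List[float]:
    """Return [b0, b1, ..., bk] from (X^T L^-1 X) b = X^T L^-1 y, with L = diag(variances)."""
    design = [[1.0] + list(row) for row in x_rows]  # leading 1 carries the intercept
    p = len(design[0])
    weights = [1.0 / v for v in variances]          # diagonal of the inverse weighting matrix
    xtwx = [[sum(w * d[r] * d[c] for w, d in zip(weights, design)) for c in range(p)]
            for r in range(p)]
    xtwy = [sum(w * d[r] * yi for w, d, yi in zip(weights, design, y)) for r in range(p)]
    return solve_linear_system(xtwx, xtwy)


def gage_variances(record_lengths: Sequence[float],
                   model_error_variance: float,
                   c1: float) -> List[float]:
    """ASSUMED form: variance_i = model_error_variance + c1 / m_i (illustration only)."""
    return [model_error_variance + c1 / m for m in record_lengths]


# Hypothetical data: log10 drainage area (x) and log10 flow characteristic (y).
x_rows = [[1.0], [1.5], [2.0], [2.5]]
y = [1.3, 1.7, 2.1, 2.9]                 # the last gage sits off the trend of the others
record_lengths = [40, 35, 50, 5]         # the last gage also has the shortest record

ols_beta = weighted_least_squares(x_rows, y, [1.0] * len(y))  # equal variances: OLS
wls_beta = weighted_least_squares(
    x_rows, y, gage_variances(record_lengths, model_error_variance=0.01, c1=0.5))
print("OLS:", ols_beta)
print("WLS:", wls_beta)
```

In this made-up example the short-record gage receives a smaller weight, so the WLS slope is pulled less far toward it than the OLS slope is. A full GLS estimate would need a non-diagonal weighting matrix and a general matrix inverse, which this sketch does not attempt.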

Generalized Least Squares (GLS)

For streamflow characteristics calculated from a log-Pearson Type III frequency analysis, a GLS approach described by Stedinger and Tasker (1985) builds on the WLS approach by accounting for both correlated streamflows and time-sampling errors. This GLS approach estimates the $\boldsymbol{\beta}$ values by

$$\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^T \boldsymbol{\Lambda}_{GLS}^{-1} \mathbf{X}\right)^{-1} \mathbf{X}^T \boldsymbol{\Lambda}_{GLS}^{-1} \hat{\mathbf{Y}}, \tag{18}$$

where $\boldsymbol{\Lambda}_{GLS}$ is a matrix containing the estimates of the covariances of $\varepsilon_i$ among gages.

The main diagonal elements of $\boldsymbol{\Lambda}_{GLS}$ thus include a part associated with the model error, $\delta_i$, and all elements include the effect of the time-sampling error, $\eta_i$. In Tasker and Stedinger (1989), $\boldsymbol{\Lambda}_{GLS}$ is estimated by

[Equation 19]

where
$i$ and $j$ are indices of locations of gages in the region of interest,
$G_i$ and $G_j$ are skew values equal to either $g$ or $G_w$ (equation 16) values for gages i and j,
$m_i$ and $m_j$ are record lengths for gages i and j,
$m_{ij}$ is the concurrent record length for gages i and j, and
$\rho_{ij}$ is an estimated value for the cross-correlation of the time series of flow values used to calculate the streamflow characteristic at gages i and j.

Values of the cross-correlation are estimated approximately by (Tasker and Stedinger, 1989)

[Equation 20]

where $d_{ij}$ is the distance between gages i and j in miles, and $\theta$ and $\alpha$ are dimensionless parameters estimated from data as discussed in section Model Selection.

The model-error variance in equation 19 and the $\boldsymbol{\beta}$ values in equation 18 are jointly determined by iteratively searching for a nonnegative solution to (Stedinger and Tasker, 1985)

[Equation 21]

Equation 19 does not account for error associated with estimating $G$. Depending on the actual magnitude of errors in estimation of skew, this additional error may unduly influence the estimation of $\boldsymbol{\beta}$. Griffis and Stedinger (2007) proposed an approach to account for the uncertainty in the skew estimates in $\boldsymbol{\Lambda}_{GLS}$, and this approach is used as an option by WREG.
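The distance-based smoothing of cross-correlation (equation 20) can be sketched as follows. The sketch assumes the functional form $\rho_{ij} = \theta^{d_{ij}/(\alpha d_{ij} + 1)}$, which is the form commonly attributed to Tasker and Stedinger (1989); the function names, the sample distances, and the values of theta and alpha are hypothetical and are not WREG defaults.

```python
from typing import List, Sequence


def cross_correlation(distance_miles: float, theta: float, alpha: float) -> float:
    """ASSUMED smoothing function: rho = theta ** (d / (alpha * d + 1))."""
    if distance_miles < 0:
        raise ValueError("distance must be nonnegative")
    return theta ** (distance_miles / (alpha * distance_miles + 1.0))


def correlation_matrix(distances: Sequence[Sequence[float]],
                       theta: float, alpha: float) -> List[List[float]]:
    """Build a matrix of smoothed cross-correlations from pairwise gage distances."""
    n = len(distances)
    return [[1.0 if i == j else cross_correlation(distances[i][j], theta, alpha)
             for j in range(n)]
            for i in range(n)]


# Hypothetical pairwise distances (miles) among three gages.
distances = [
    [0.0, 12.0, 80.0],
    [12.0, 0.0, 65.0],
    [80.0, 65.0, 0.0],
]
rho = correlation_matrix(distances, theta=0.98, alpha=0.01)
for row in rho:
    print([round(value, 3) for value in row])
```

Under this assumed form, the correlation equals one at zero distance and, for theta between zero and one, decreases with distance toward a floor of theta raised to the power 1/alpha, so the two parameters control how quickly and how far the modeled correlation decays.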
As implemented in WREG, the skew-uncertainty option assumes that weighted skews are provided by the user, and so it should be activated only when weighted skews were used. The modified $\boldsymbol{\Lambda}_{GLS}$ matrix, $\boldsymbol{\Lambda}_{GLS,skew}$, is given by

[Equation 22]

where the two derivative terms are the partial derivatives for gages i and j calculated from the Kite (1975; 1976) approximation for $K$ given as

[Equation 23]

where $z_p$ is the standard normal deviate corresponding to probability $p$.

The $COV[g_i, g_j]$ in equation 22 is the covariance between the skew values at gages i and j, and is given by

[Equation 24]

where the first term is estimated by (Martins and Stedinger, 2002)

[Equation 25]

and the second term is equal to one if the quantity it depends on is positive and to minus one if that quantity is negative.

The $Var(g_i)$ and $Var(g_j)$ in equation 24 are approximated by (Griffis and Stedinger, 2009)

[Equation 26]

where

[Equations 27, 28, and 29]

Equation 19 is a simplified version of equation 22 that assumes the skew is without error. Equation 19 is provided in the WREG program to reproduce previous studies that do not use equation 22.

Performance Metrics

The WREG program reports multiple performance metrics for multiple-linear regressions, depending upon the options (OLS, WLS, or GLS) selected. Specific metrics are reported either in the GUI or the output files of WREG.

Model- and Time-Sampling Errors

For conventional OLS and RoI regressions, the residual errors, $e_i$, are computed as

$$e_i = \hat{y}_i - \hat{y}_{iR}, \tag{30}$$

where $\hat{y}_{iR}$ is the estimated $\hat{y}$ provided by the regression (see equation 8).

The residual mean-square error (MSE) is computed as

[Equation 31]

where $e_i$ is calculated from equation 30.

The MSE metric does not distinguish the proportion of total error, $\varepsilon_i$, that is composed of model error, $\delta_i$, and time-sampling error, $\eta_i$. WLS and GLS regression provide estimates of the model-error variance, $\sigma_\delta^2$, which is the same as the MSE only if the time-sampling error variance, $\sigma_\eta^2$, is equal to zero. For conventional regressions using WLS and GLS, WREG reports the average variance of prediction, AVP, as the performance metric (Tasker and Stedinger, 1986), given by

[Equation 32]

where $\mathbf{x}_p$ is a vector containing the values of the independent variables of the pth gage augmented by a value of one.

When $\hat{y}$ corresponds to the logarithm of the variable of interest, equation 32 can be reported as a percentage of the predicted value.
When expressed in this way, the metric is known as the average standard error of prediction, $S_p$ (Aitchison and Brown, 1957, modified for use of common logarithms), and is given by

[Equation 33]

The standard model error as a percentage of the observed value can be calculated by substituting the model-error variance for AVP in equation 33. The WREG program reports both $S_p$ and the standard model error for WLS and GLS regressions.

For RoI regression, a regression is developed for each ungaged basin of interest. An overall performance metric reported by the WREG program for RoI regressions using OLS, WLS, and GLS is a root mean square error, RMSE(%). This metric is similar to, but not the same as, the prediction error sum of squares, PRESS, performance metric (for example, Montgomery and others, 2001). Every gage is treated in turn as an ungaged basin and a regression is developed for that site. Equations 30 and 31 are then used to calculate a mean-square error value, $MSE_{RoI}$, that is used in place of AVP in equation 33 to give a root mean square error of prediction expressed as a percentage of the observed value, RMSE(%), given by (Eng and others, 2005; Eng and others, 2007a)

[Equation 34]
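As an illustration of how a variance in log (base 10) units becomes a percentage, the sketch below assumes the standard lognormal conversion of Aitchison and Brown (1957) rewritten for common logarithms, $100\sqrt{e^{(\ln 10)^2 v} - 1}$, where $v$ stands for AVP, the model-error variance, or $MSE_{RoI}$. The function name and the sample variances are hypothetical and are not taken from WREG output.

```python
import math


def percent_standard_error(variance_log10: float) -> float:
    """Convert a variance in log10 units to a standard error in percent.

    ASSUMED form: 100 * sqrt(exp(ln(10)**2 * variance) - 1), the lognormal
    conversion adapted to common logarithms.
    """
    if variance_log10 < 0:
        raise ValueError("variance must be nonnegative")
    return 100.0 * math.sqrt(math.exp(math.log(10.0) ** 2 * variance_log10) - 1.0)


# Hypothetical variances, all in squared log10 units.
avp = 0.030                   # average variance of prediction
model_error_variance = 0.022  # model-error variance from a WLS or GLS fit
mse_roi = 0.041               # leave-one-out mean-square error from an RoI test

print("Sp (%):", round(percent_standard_error(avp), 1))
print("Standard model error (%):", round(percent_standard_error(model_error_variance), 1))
print("RMSE (%):", round(percent_standard_error(mse_roi), 1))
```

The same conversion is reused for all three quantities because the text substitutes the model-error variance or $MSE_{RoI}$ for AVP in equation 33; only the input variance changes.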

Coefficient of Determination, R2, R2adj, and R2pseudo

A metric reported by WREG for determining the proportion of the variation in the dependent variable explained by the independent variables in OLS regressions is the coefficient of determination, $R^2$ (Montgomery and others, 2001), given as

$$R^2 = 1 - \frac{SS_r}{SS_T}, \tag{35}$$

where

$$SS_T = \sum_{i=1}^{n} \left(\hat{y}_i - \bar{y}\right)^2, \tag{36}$$

and

$$SS_r = \sum_{i=1}^{n} \left(\hat{y}_i - \hat{y}_{iR}\right)^2, \tag{37}$$

where
$\bar{y}$ is the arithmetic mean of all $\hat{y}_i$ values,
$SS_T$ is the total sum of squares, which measures the total variability in the observations, and
$SS_r$ is the residual sum of squares.

$SS_T$ and $SS_r$ values are provided as WREG output files and can be used to calculate an adjusted coefficient of determination, $R^2_{adj}$, given as

$$R^2_{adj} = 1 - \frac{SS_r/(n-k-1)}{SS_T/(n-1)}. \tag{38}$$

$R^2_{adj}$ adjusts for the number of independent variables used in the regression.

For WLS and GLS regressions, a more appropriate performance metric than $R^2$ or $R^2_{adj}$ is the $R^2_{pseudo}$ described by Griffis and Stedinger (2007). Unlike the $R^2$ metric in equation 35 and $R^2_{adj}$ in equation 38, $R^2_{pseudo}$ is based on the variability in the dependent variable explained by the regression, after removing the effect of the tim

regressions and RoI regressions. For non-RoI regressions, leverage, $h$, for the ith gage is given as

[Equation]

where $\boldsymbol{\Lambda}$ is equal to either $\boldsymbol{\Lambda}_{OLS}$, $\boldsymbol{\Lambda}_{WLS}$, $\boldsymbol{\Lambda}_{GLS}$, or $\boldsymbol{\Lambda}_{GLS,skew}$.

The leverage metric for RoI regressions using OLS, WLS, or GLS is given by (Eng and others, 2007b),
