Introduction to the Design and Analysis of Experiments
Violet R. Syrotiuk
School of Computing, Informatics, and Decision Systems Engineering

Complex Engineered Systems
What makes an engineered system complex?
– Its large size.
– Humans control their structure, operation, and evolution over time.
Examples of complex engineering networks:
– The power grid.
– Transportation networks.
– Computer networks (e.g., the Internet, etc.).

Experimentation
Experimentation is a way to improve our understanding of complex engineered systems (CESs).
Experiments are used widely for, e.g.:
– Process characterization and optimization.
– Improving the reliability and performance of products and processes.
– Product/process design and development.
– Achieving product and process robustness.

“All experiments are designed experiments; some are poorly designed, some are well-designed.”
George E. P. Box

Engineering Experiments
[Figure: A General Model of a Process/System†]
† “Design and Analysis of Experiments,” by Douglas C. Montgomery, Wiley, 8th edition, 2013.

Four Eras in the History of DoE
The agricultural origins, 1918-1940s:
– R.A. Fisher and his co-workers.
– Profound impact on agricultural science.
– Factorial designs, ANOVA.
The first industrial era, 1951-late 1970s:
– Box and Wilson, response surfaces.
– Applications in the chemical and process industries.

Four Eras in the History of DoE (cont’d)
The second industrial era, late 1970s-1990:
– Quality improvement initiatives in many companies.
– Taguchi and robust parameter design, process robustness.
The modern era, beginning circa 1990.

Experimentation in CENs
From a recent workshop report†:
“The science of experiment design is widely used in science and engineering disciplines, but is often ignored in the study of complex engineered networks. This in turn has led to a shortage of simulations that we can believe in, of experiments driven by empirical data, and of results that are statistically illuminating and reproducible in this field.”
† Networking and Information Technology Research and Development (NITRD), Large Scale Networking (LSN), Workshop Report on Complex Engineered Networks, September 2012.

Factorial Designs
In a factorial experiment, all possible combinations of factor levels are tested.
The golf experiment:
– Type of driver.
– Type of ball.
– Walking versus riding.
– Type of beverage.
– Time of round, etc.

The Experimental Design
An experiment is given by an N × k array.
– The k columns correspond to the factors.
Each factor F_i, 1 ≤ i ≤ k, has a set of levels L_i.
Each of the N rows corresponds to a test in which each factor F_i is set to a level in L_i.
For the two-factor factorial experiment:
Test  Ball  Driver
1     B     O
2     B     R
3     T     O
4     T     R
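The N × k array of a full factorial can be generated mechanically from the level sets. The following is a minimal Python sketch, assuming the two-factor golf example with the level codes B/T for the ball and O/R for the driver used in the table above; the function name `full_factorial` and the dictionary layout are illustrative choices and do not come from the slides.

```python
from itertools import product


def full_factorial(levels):
    """Return one row (a dict) per combination of factor levels.

    `levels` maps each factor name to its list of levels, so the result
    has N = product of the level counts rows and k = len(levels) columns.
    """
    names = list(levels)
    return [dict(zip(names, combo))
            for combo in product(*(levels[name] for name in names))]


# Hypothetical encoding of the two-factor golf experiment.
golf_levels = {"Ball": ["B", "T"], "Driver": ["O", "R"]}
design = full_factorial(golf_levels)

for test, row in enumerate(design, start=1):
    print(test, row["Ball"], row["Driver"])
```

Adding a third factor to `golf_levels` (for example walking versus riding) doubles the number of rows, which is why full factorials grow quickly with k.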

Factorial Designs with Several Factors

A Fractional Factorial

Statistical Rigour
Understanding common statistical methods is invaluable in being able to represent results coherently and accurately.
When measuring a system that does not have fixed behaviour:
– Perform multiple measurements (replicates).
– Statistical error arises from variation that is uncontrolled; it is generally unavoidable.

Sample Distributions
The relationship between the measurements of centrality (mean, median, and mode) give hints about the distribution of the data collected.

[Figure 1. Sample distributions. Panels: A: Normal Distribution; B: Bimodal Distribution; C: Exponential Distribution; each panel marks the mean, median, and mode. The relationship between the mean, median, and mode give hints about the distribution of the data collected. In a normal distribution, the mean is representative of the data set, while in an exponential distribution, the mode and median are more representative. In a bimodal distribution, no single metric accurately describes the data.]

In general, if the mean and median are rather close, but the mode is vastly different (or there are two candidates for the mode), a bimodal or multi-modal distribution is suggested (see Figure 1b). As described above in Section 3.2.3, the standard deviation of a bimodal distribution can be quite large, which can serve as a check on the assumption that a distribution is normal.

It is important to note that these guidelines are not fool-proof; comparing the mean, median, and mode can only suggest the type of distribution from which data was collected. Unfortunately, there is no rule of thumb that always works, and when in doubt, the best course of action is to plot the data, look at it, and try to determine what is happening.

It is critical to select the appropriate metric of centrality in order to properly present data. “No mathematical rule can tell us which measure of central tendency will be most appropriate for any particular problem. Proper decisions rest upon knowledge of all factors in a given case, and upon basic honesty” [Gould96].

6.2 Expressing Variation

Measures of centrality are not sufficient to completely describe a data set. It is often helpful to include a measure of the variance of the data. A small variance implies that the mean is a good representative of the data, whereas a large variance implies that it is a poor one. In the papers we surveyed, we found that fewer than 15% of experiments included some measure of variance.

The most commonly used measure of variance is the standard deviation, which is a measure of how widely spread the data points are. As a rule of thumb, in a normal distribution, about 2/3 of the data falls within one standard deviation of the mean (in either direction, on the horizontal axis). 95% of the data falls within two standard deviations of the mean, and three standard deviations account for more than 99% of the data.

For example, in Figure 1a, which follows a normal distribution, the mean, median, and mode are equal, and the standard deviation is approximately 40% of the mean. However, in Figure 1b, which shows a bimodal distribution, the mean, median, and mode are quite different, and the standard deviation is 75% of the mean. Figure 1c shows an exponential distribution where the median and mode are close, but rather different than the mean. (We discuss techniques for determining the distribution of a data set in Section 6.3.)

Another metric for analyzing the usefulness of the mean in an experiment is the margin of error.

Expressing Variation
Measures of centrality are not sufficient to completely describe a data set.
It is often helpful to include a measure of the variance of the data.
– A small variance implies that the mean is a good representative of the data, whereas a large variance implies that it is a poor one.
The most commonly used measure of variance is the standard deviation.
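To make the point about small versus large variance concrete, here is a small Python sketch using only the standard library. The two sets of replicate measurements are hypothetical numbers chosen so that both have the same mean; the helper name `summarize` is likewise an illustrative assumption.

```python
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation, stdev as a fraction of the mean)."""
    m = statistics.mean(samples)
    s = statistics.stdev(samples)  # sample standard deviation, n - 1 denominator
    return m, s, s / m


# Hypothetical replicate latency measurements in ms; both sets average 100 ms.
steady = [100, 101, 99, 100, 102, 98]
noisy = [100, 160, 40, 130, 70, 100]

for label, data in (("steady", steady), ("noisy", noisy)):
    m, s, ratio = summarize(data)
    print(f"{label}: mean={m:.1f} stdev={s:.2f} ({ratio:.0%} of mean)")
```

Reporting the mean alone would make the two data sets look identical; the standard deviation shows that the mean describes the first set well and the second poorly.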

Margin of Error
Another metric for analyzing the usefulness of the mean is the margin of error.
– The margin of error expresses a range of values about the mean in which there is a high level of confidence that the true value falls.
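One common way to compute such a range is a confidence interval for the mean. The sketch below is a minimal illustration under stated assumptions: it uses the normal approximation (reasonable for large samples; for a handful of replicates a Student's t critical value would be the usual substitute), the data are hypothetical, and the function name `margin_of_error` is not from the slides.

```python
import math
from statistics import NormalDist, mean, stdev


def margin_of_error(samples, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for the mean."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * stdev(samples) / math.sqrt(len(samples))


# Hypothetical latency replicates in ms (100 measurements).
latencies_ms = [98, 102] * 50
center = mean(latencies_ms)
moe = margin_of_error(latencies_ms)
print(f"{center:.1f} ms +/- {moe:.2f} ms")
```

Because the half-width shrinks with the square root of the sample size, quadrupling the number of replicates roughly halves the margin of error.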

Graphing and Error Margins
The value of the error margins depict the results in completely different ways.†

[Figure 2. Graphing and Error Margins. Panels: A: Latency Improvement without error margins; B: Latency Improvement with small error margins; C: Latency Improvement with large error margins. Each panel plots Latency (ms) against Number of packets for two series, Graph1 and Graph2. The value of the error margins will depict the results in completely different ways.]

† C. Small, N. Ghosh, H. Saleeb, M. Seltzer, Harvard Computer Science Group, Technical Report TR-16-97.

[...] the margin of error would be four percent. Assuming that this margin of error had been computed for a 0.05 level of significance, then if the experiment were repeated 100 times, 95 of those times the observed latency would be within four percent of the value computed in the corresponding experiment.

Figure 2 is an example of the importance of showing the margin of error. In our example, Figure 2a is put forward to support a claim that a new technique has reduced latency by 10%. However, this graph does not include any indication of the margin of error, or confidence intervals on the data. If the margin of error is small, as in Figure 2b, it is reasonable to believe that latency has been reduced. Figure 2c, however, shows a margin of error that is as large as the stated improvement. The 10% reduction in latency falls within the error bars, and might have arisen from experimental error.

It is very useful to be able to place results in the context of an error margin, and it is essential to be able to do so when trying to determine the value of a new technique.

A related problem, which appears when measurements are taken, is mistaking measurement precision for measurement accuracy. For example, on many versions of Unix, gettimeofday() returns the current time in microseconds (its precision), but is only updated every ten milliseconds (its accuracy). Timing measurements taken using gettimeofday() on these systems will be rounded up (or down) to nearest multiple of ten milliseconds. In situations such as these, it is critical to be aware not only of how precise a measurement is, but also how accurate. On a system with a 10ms clock granularity, it is a waste of time to attempt to make distinctions at the microsecond level.

6.3 Probability Distributions and Testing

As stated above, normal distributions are commonly found in nature, but rarely found in computer science. When measuring experimental systems, one is more likely to encounter other types of distributions. Unfortunately, it is not a trivial task to correctly identify which distribution best models a given a dataset. From a statistical point of view, an estimate of the mean and standard [...] but without knowing the actual distribution, it is impossible to calculate the true mean and standard deviation. Fortunately, there are simple methods for determining the distribution of a dataset.

Plotting a histogram of the values in a sampled data set is easy way to get an idea of what type of distribution the data follows. Figure 1 shows examples of several common distributions with noticeably different shapes. Normal distributions (Figure 1a) are common in the natural sciences, and often represent the characteristics of repeated samples of a homogenous population. As mentioned in Section 6.1, skewed distributions often occur when some phenomenon limits either the high or low values in a distribution. Personal income is an example of a skewed distribution. An exponential distribution (Figure 1c) might be seen when modeling a continuous memoryless system, such as inter-arrival time of net-

Probability Distributions & Testing
Plotting a histogram of the values in a sampled data set is an easy way to get an idea of what type of distribution the data follows.
The χ² test can be used to determine if sampled data follows a specific distribution.
– χ² can be used to obtain a p-value from a family of χ² distributions; the larger the p-value, the more consistent the sampled data is with the candidate distribution.
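As a rough illustration of the χ² goodness-of-fit idea, the sketch below compares hypothetical binned counts against a uniform candidate distribution. It is standard-library only, so the p-value uses a closed form that is valid only when the degrees of freedom are even; a statistics package would normally be used for the general case. The function names and the data are assumptions made for this example.

```python
import math


def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum over bins of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value_even_df(stat, df):
    """Upper-tail probability of a chi-square variable.

    Uses exp(-x/2) * sum_{j < df/2} (x/2)^j / j!, which holds only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = stat / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j)
                                 for j in range(df // 2))


# Hypothetical histogram: 100 samples in 5 bins, candidate distribution uniform.
observed = [18, 22, 20, 25, 15]
expected = [20] * 5
stat = chi_square_statistic(observed, expected)
p_value = chi_square_p_value_even_df(stat, df=len(observed) - 1)
print(f"chi-square = {stat:.2f}, p-value = {p_value:.3f}")
```

A p-value this far above a conventional threshold such as 0.05 gives no reason to reject the uniform candidate; it does not prove that the data are uniform.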

Basic Statistical Concepts
Hypothesis testing: a statistical hypothesis is a statement either about the parameters of a probability distribution or the parameters of a model.
H0: μ1 = μ2 (null hypothesis)
H1: μ1 ≠ μ2 (alternative hypothesis)
If the null hypothesis is rejected when it is true, a type I error has occurred.
If the null hypothesis is not rejected when it is false, a type II error has been made.
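The meaning of a type I error can be checked by simulation. The sketch below is one possible illustration, not a method taken from the slides: it repeatedly draws two samples from the same normal distribution (so H0: μ1 = μ2 is true by construction), applies a two-sample z-test with a known standard deviation, and counts how often H0 is wrongly rejected. The sample size, seed, and function names are all assumptions.

```python
import math
import random


def z_statistic(sample1, sample2, sigma):
    """Two-sample z statistic for H0: mu1 = mu2 with known common sigma."""
    n1, n2 = len(sample1), len(sample2)
    diff = sum(sample1) / n1 - sum(sample2) / n2
    return diff / (sigma * math.sqrt(1 / n1 + 1 / n2))


def type_one_error_rate(trials=2000, n=20, sigma=1.0, z_crit=1.96, seed=1):
    """Fraction of trials that reject H0 even though it is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(10.0, sigma) for _ in range(n)]
        b = [rng.gauss(10.0, sigma) for _ in range(n)]  # same mean: H0 holds
        if abs(z_statistic(a, b, sigma)) > z_crit:
            rejections += 1
    return rejections / trials


rate = type_one_error_rate()
print(f"observed type I error rate: {rate:.3f}")
```

With a critical value of 1.96 the observed rate should land near the 0.05 significance level. Giving the two samples different means instead would turn the same loop into an estimate of the type II error rate (the fraction of trials that fail to reject).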

Analysis of Variance (ANOVA)
Analysis of the fixed effects model.
– Estimation of model parameters.
Model adequacy checking.
– The normality assumption.
– Residuals. Plots in time, versus fitted values, versus other variables.
Practical interpretation of results.
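For a single factor, the fixed effects analysis reduces to comparing the variation between treatment means with the variation within treatments. The following Python sketch computes the F ratio and the residuals for a small, made-up data set; the function name, the data, and the quoted critical value (taken from standard F tables for 2 and 6 degrees of freedom at the 0.05 level) are assumptions for illustration.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F ratio, residuals) for a single-factor fixed effects model."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Between-treatment and within-treatment (error) sums of squares.
    ss_treatments = sum(len(g) * (m - grand_mean) ** 2
                        for g, m in zip(groups, group_means))
    residuals = [[y - m for y in g] for g, m in zip(groups, group_means)]
    ss_error = sum(r ** 2 for res in residuals for r in res)

    ms_treatments = ss_treatments / (k - 1)
    ms_error = ss_error / (n_total - k)
    return ms_treatments / ms_error, residuals


# Hypothetical responses for three levels of one factor, three replicates each.
groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]
f_ratio, residuals = one_way_anova(groups)

F_CRITICAL_05_2_6 = 5.14  # tabulated value, assumed here rather than computed
print(f"F = {f_ratio:.1f}, reject H0 of equal means: {f_ratio > F_CRITICAL_05_2_6}")
```

The returned residuals are the quantities one would plot against time, fitted values, or other variables when checking model adequacy.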

Response Surface Methodology Framework
Factor screening.
Finding the region of the optimum.
Modelling and optimization of the response.

Other Aspects of RSM
Robust parameter design and process robustness studies.
– Find levels of controllable variables that optimize mean response and minimize variability in the response transmitted from “noise” variables.
– Original approaches due to Taguchi (1980s).
– Modern approach based on RSM.

Summary
There is much known about designing and analyzing experiments!
– Follow good practices, to improve repeatability and reproducibility of your experiments.
