Stat 342 - Wk 13: Plotting And Final Exam Prep. - SFU

Upload by : Kairi Hasson

Stat 342 - Wk 13: Plotting and final exam prep.

Unfinished from last week:
- Binary responses (proc logistic)
- Ordinal and multinomial responses (proc logistic)

New this week:
- ODS Graphics vs SAS/Graph
- Graphing examples (KDE, LOESS, Interaction, ROC)
- Exam topics
- TA evaluation

Stat 342 Notes. Week 12 Page 1 / 58

The basic syntax of PROC LOGISTIC follows the same patterns as the GLM and GLMSELECT procedures. However, random effects don't work with PROC LOGISTIC.

    proc logistic data=copenhagen;
        class categorical predictors;
        model response = explanatory / options;
        freq varname;
        output out=dataset var=newname;
    run;

For predicting the level of neighbour contact:

    proc logistic data=copenhagen;
        class housing influence satisfaction;
        freq n;
        model contact = housing influence satisfaction;
    run;


Here, everything is done predicting the chance of 'high', which was decided by SAS, and it may not have been the category that we wanted to have as our 'yes' category. To change this, define the 'yes' category that you want with the EVENT= option in the MODEL statement.

    model contact(event='low') = housing influence satisfaction;

Significance levels are the same; estimates are 'reversed' (their signs flip).

Don't confuse logistic models with logical models.

Model selection is done with the SELECTION= option in the MODEL statement, after the slash. The DETAILS option tells SAS to report on the entire model selection process, not just the end result.

    proc logistic data=copenhagen;
        class housing influence satisfaction;
        freq n;
        model contact = housing influence satisfaction
              / selection=stepwise details;
    run;

The ODDSRATIO statement shows the odds ratio (and the confidence interval of the odds ratio) of each category compared to the baseline for your selected variable.

    proc logistic data=copenhagen;
        class housing influence satisfaction;
        freq n;
        model contact = housing influence satisfaction;
        oddsratio housing;
    run;


There is more than one link function. A link function converts a probability, which is bounded by [0,1], into something that is unbounded. This is important because logistic regression is, at its basic mechanics, doing something very similar to linear regression, and linear regression depends on the response being a continuous variable that can, theoretically, take any value.

The default link function is the 'logit' link, which is the one we use to put log-odds in the place of probability. One common alternative is the 'probit' link, which uses the inverse CDF of the normal distribution instead. The theory behind why one link is selected over any other is graduate-level theory (in Generalized Linear Models), so for now I recommend using the default 'logit' most of the time.
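To see what the two links do to a probability, here is a minimal sketch in a data step. The dataset name links and the variable names logit_p and probit_p are made up for illustration; PROBIT() is the SAS function for the standard normal quantile.

```sas
/* Illustration only: map a few probabilities onto the unbounded scale. */
data links;
    do p = 0.05, 0.25, 0.5, 0.75, 0.95;
        logit_p  = log(p / (1 - p));   /* logit link: the log-odds of p */
        probit_p = probit(p);          /* probit link: standard normal quantile of p */
        output;
    end;
run;

proc print data=links noobs;
run;
```

Both columns are 0 at p = 0.5 and head towards plus or minus infinity as p approaches 1 or 0, which is why either one can sit on the left side of a regression-style equation.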

If there are numerical issues (e.g. failure to converge, nonsense summary data) with the logit link, you can treat probit as an 'alternate mode' of logistic regression, which may have better luck.

    proc logistic data=copenhagen;
        class housing influence satisfaction;
        freq n;
        model contact = housing influence satisfaction / link=probit;
    run;

Why deal with link functions at all? Because there's another procedure called PROC PROBIT, which is like PROC LOGISTIC, but is older with fewer features. To avoid having to learn an outdated proc, if you ever have to use a probit link instead of a logit link, just use the LINK= option in the MODEL statement of PROC LOGISTIC.

We can also get additional summary data, such as Nagelkerke's R-squared (a logistic version of the regular R-squared) and the confidence limits of the odds ratios, with the RSQUARE and CLODDS= options, respectively.

    proc logistic data=copenhagen;
        class housing influence satisfaction;
        freq n;
        model contact = housing influence satisfaction / rsquare clodds=wald;
    run;


But what if there are more than 2 levels?

If you are trying to make predictions about a categorical response with more than two levels, there's one thing you have to ask before going any further.

Do the categories I wish to predict form a natural ordering (e.g. None, Low, Medium, High, Extreme), or are they just nominal, unordered categories (e.g. Cat, Dog, Dragon, Capybara)?

If the data is ordered, you can use PROC LOGISTIC to conduct an ORDINAL LOGISTIC REGRESSION. Just code your categories into integers {1,2,...,k} and use those coded categories as your response.

    data copenhagen;
        set copenhagen;
        sat_lvl = 1;
        if satisfaction = 'medium' then sat_lvl = 2;
        if satisfaction = 'high' then sat_lvl = 3;
    run;

The logistic procedure will understand that each integer value is an ordered category.

    proc logistic data=copenhagen;
        class housing influence contact;
        freq n;
        model sat_lvl = housing influence contact / rsquare;
        oddsratio housing;
    run;

With ordinal responses, all the effect sizes refer to the log-odds of any given response being in the 'next category up'. Each response category after the first has its own intercept.

If there is no natural ordering to the categories, you can use the generalized logit to do a logistic regression on several categorical responses together. To do this, use the LINK= option and set it to 'glogit'.

    proc logistic data=copenhagen;
        class contact influence satisfaction;
        freq n;
        model housing = contact influence satisfaction / link=glogit;
    run;

The results show the effect of each variable on the log-odds of any observation having the listed response (compared to the 'baseline' response).

ODS Graphics: More entertaining than Nickelback

ODS vs SAS/Graph

Like other statistical software (R, SPSS, JMP), SAS has a base and packages that are installed on top of it. SAS University Edition only comes with some of these packages, like SAS/IML, which makes it great for programming and research. However, it's missing SAS/Graph, which means that many of the plotting functions in our textbook aren't available.

For our purposes, there are two graphical systems: ODS (Output Delivery System) and SAS/Graph.

In ODS: proc sgplot, proc sgscatter, proc sgpanel, proc kde, ...

In SAS/Graph: proc gplot, proc gcontour, proc gchart, proc g3d, proc gmap, ...

KDE stands for Kernel Density Estimation. It's used to make a smooth estimate of the probability density of a distribution from the points in a data set.

The most commonly used method of KDE is to make a normal curve centred at each data point and add up the densities. The densities are divided by the sample size N, so that the probability density still integrates to 1. By default, the standard deviations are the same around each point.
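The "normal curve at each point, then add up" description can be sketched by hand in a data step. This is only an illustration of the idea, not what PROC KDE does internally: the five data values, the bandwidth h = 0.5, the grid range, and all names (kde_by_hand, obs, grid, density) are made up.

```sas
/* Illustration only: a hand-rolled KDE on five made-up points. */
data kde_by_hand;
    array obs[5] _temporary_ (1.2 1.9 2.4 3.1 4.8);   /* made-up observations */
    h = 0.5;                                          /* assumed bandwidth (the normal SD) */
    do grid = 0 to 6 by 0.1;
        density = 0;
        do i = 1 to dim(obs);
            /* normal curve centred at obs[i], divided by N so the total area is 1 */
            density = density + pdf('normal', grid, obs[i], h) / dim(obs);
        end;
        output;
    end;
    drop i;
run;

proc sgplot data=kde_by_hand;
    series x=grid y=density;
run;
```

Re-running with a smaller or larger h shows the curve getting bumpier or smoother.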

KDE is a quick way to compare the distribution of different continuous variables.

    proc kde data=ds;
        univar x y / plots=densityoverlay;
    run;

KDE is like the continuous version of a histogram.

    proc kde data=ds;
        univar x y / plots=histdensity;
    run;

There are a couple of viable options for the plots you can produce with the Kernel Density Estimation procedure, but it's simplest to print them all if you don't mind the output. This PLOTS=ALL option works for MANY procedures that use ODS graphics.

    proc kde data=ds;
        univar x y / plots=all;
    run;

In a histogram, we would select (manually or automatically) 'bin widths' for each bar. In KDE, we select the standard deviation around each point (or more generally, the 'bandwidth', in case we're using a kernel other than Normal).

The 'bwm' option scales the default, automatically generated bandwidth by a 'BandWidth Multiplier'. A lower bandwidth creates a curve that fits the data more exactly, and a higher bandwidth creates more smoothing.

    proc kde data=ds;
        univar x (bwm=0.5) x (bwm=1) x (bwm=2) / plots=densityoverlay;
    run;

For bivariate plots (contours, 3D surface), use a BIVAR statement.

    proc kde data=ds;
        bivar x y / plots=all;
    run;

Similar to the contour plot is the heatmap, which is a discrete version of the contour plot. This is especially appropriate when one of the variables is categorical/ordinal, or when both variables are whole numbers only.

    proc sgplot data=ds;
        heatmap y=z x=x;
    run;

The colour scheme can be changed, but the default is blue to red.

Heatmaps are meant for hundreds or thousands of data points, not n = 30.

SAS doesn't have the best reputation for graphics.

LOESS Curves

With scatterplots and regression, we have learned to make lines and curves of best fit by specifying a model in advance and assessing it.

Sometimes it would be easier to see something more exploratory first, by letting SAS draw the pattern between two variables before we specify what kind of model we want.

LOESS, or LOcal regrESSion (also called LOWESS: LOcally WEighted Scatterplot Smoothing), is a system that allows you to do that.

Like KDE, it smooths out a function at each observation point, and takes the average. Also like KDE, it relies on a bandwidth setting (called the smoothing parameter in loess), which is usually determined automatically.

Unlike KDE, the variable being averaged is the value of Y at each level of X, rather than simply the number of points at X.

Try these.

    proc sgplot data=ds;
        loess y=z x=x;
    run;

    proc sgplot data=ds;
        loess y=y x=x;
    run;


Now try with the smoothing parameter.

    proc sgplot data=ds;
        loess y=y x=x / smooth=0.5;
    run;

    proc sgplot data=ds;
        loess y=y x=x / smooth=0.2;
    run;


Interaction Plot

One important graphical option that was overlooked when we were doing regression was the interaction plot.

An interaction plot gives the mean response (y-axis) for different combinations of two explanatory variables (x-axis and colour).

If the lines are close to parallel, the effects of the two explanatory variables are additive, and there is no evidence of an interaction.

If the lines have different slopes, especially if they cross, then an interaction term may be warranted.

When there are many categories in one or both explanatory variables, some crossing over will happen by random chance. This doesn't necessarily mean that an interaction term will improve the model.

    proc glm data=ds;
        class block block2;
        model z = block block2;
    run;
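If you want to draw the same kind of picture directly, one alternative (not from the notes, just a sketch) is to plot the group means yourself with PROC SGPLOT. It assumes the same ds dataset with response z and categorical variables block and block2.

```sas
/* Sketch: mean of z at each level of block, one coloured line per level of block2. */
proc sgplot data=ds;
    vline block / response=z stat=mean group=block2 markers;
run;
```

Roughly parallel lines suggest additive effects; clearly different slopes suggest an interaction term is worth testing.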

At least one of the explanatory variables (the colour one) should be categorical, or else you will end up with a different colour for each observed value of the continuous variable.

    proc glm data=ds plots=all;
        model z = x y;
    run;

    proc glm data=ds plots=all;
        class block;
        model z = x block;
    run;

Receiver-Operator Curves

Another option for logistic regression is the receiver operating characteristic curve (ROC curve).

These are popular graphs in medical statistics because they can be used to determine the best cutoff to detect some binary response (such as disease status).

    proc logistic data=ds plots(only)=roc;
        model block2 = x y;
    run;

For reference, the best cutoffs are far above the diagonal line. Any cutoff point touching the diagonal line is literally as good as guessing.

ROC curves are also given a measurement of the general quality of the data used to make the logistic regression: this is called the Area Under Curve (AUC).

Higher AUC is better.
- An AUC of 0.5 is as good as the null model (no predictors).
- An AUC of 1 is perfect: the explanatory variables can perfectly predict the response.
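One way to turn "far above the diagonal" into a number is to save the ROC points and rank the cutoffs by their height above the diagonal (sensitivity minus false positive rate, often called Youden's index). This is a sketch rather than the method from the notes: it assumes the same ds dataset, and roc_pts and youden are made-up names. OUTROC= is a MODEL statement option whose output dataset includes _PROB_, _SENSIT_ and _1MSPEC_.

```sas
/* Sketch: save the ROC points, then list the cutoffs highest above the diagonal. */
proc logistic data=ds plots(only)=roc;
    model block2 = x y / outroc=roc_pts;   /* roc_pts: hypothetical dataset name */
run;

data roc_pts;
    set roc_pts;
    youden = _sensit_ - _1mspec_;   /* vertical distance above the diagonal */
run;

proc sort data=roc_pts;
    by descending youden;
run;

proc print data=roc_pts (obs=5);
    var _prob_ _sensit_ _1mspec_ youden;
run;
```

The _PROB_ column is the predicted-probability cutoff for each point. The AUC itself is reported as the statistic "c" in the association table of the PROC LOGISTIC output.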

For future reference:
- Basic ODS Graphics Examples (214 page PDF textbook) by Warren F. grstat/9.4/en/PDF/odsbasicg.pdf
- Advanced ODS Graphics Examples (250 page PDF textbook) also by Warren F. grstat/9.4/en/PDF/odsadvg.pdf

Some parting comments: Certification Exams

The stats department will pay for any of its own students to get their first PASS in a SAS certification exam. As in, the chair will reimburse the $90 exam fee to anyone in the department that presents proof of a passing mark.

If you're planning on taking this, I highly recommend starting with the base programmer certification to get more of the fundamentals that were overlooked in this class.

The textbooks for Base Programmer and Advanced Programmer are available through the SFU library by searching or following this link:
http://search.lib.sfu.ca/?q=sas%20certification%20prep%20guide

After base programmer, 'Statistical Business Analyst' is a good next step, because it matches up nicely with the skills you've learned in this course and other applied statistics / regression / linear modelling courses in the department.

Passing any one of these exams also puts your name in a database that SAS uses for hiring directly (e.g. in their Canada HQ in Toronto, or their world HQ campus at Cary, North Carolina) or for their many clients.

It also gives you an Acclaim badge that connects directly to your LinkedIn profile (I think).

Some parting comments: Stat 342 Exam and Practice Exam

The final exam is Dec 11, 3:30pm to 6:30pm. Yes, that means it's a Sunday.

The exam is at WMC (West Mall Centre), 3260. The room is in the general area of Tim Horton's (I think), but it's a bit hard to find because the door number isn't highly visible. Either come early or make sure you know where it is in advance.

The final exam will be structured a lot like the midterm exam, and will be slightly harder. This just means I won't be giving as many free marks for things like including 'run' at the end of every proc.

You will be allowed to bring a cheat sheet like last time, and this time you can use both sides of the paper.

About 80% of the exam will be on the material AFTER midterm 1.

Topics include:
- Loading and saving data with proc import, export.
- Loading data with the datalines command in a data step.
- Making new variables with the data step.
- Operations in PROC IML
- Continuous data with PROC UNIVARIATE, PROC MEANS
- Anova and Regression models with PROC GLM
- Model selection with PROC GLMSELECT
- Logistic regression models with PROC LOGISTIC
- Basics of plotting

