The R Book

Second Edition
Michael J. Crawley
Imperial College London at Silwood Park, /index.htm
A John Wiley & Sons, Ltd., Publication

This edition first published 2013
© 2013 John Wiley & Sons, Ltd

Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

Crawley, Michael J.
The R book / Michael J. Crawley. – 2e.
pages cm
Includes bibliographical references and index.
ISBN 978-0-470-97392-9 (hardback)
1. R (Computer program language) 2. Mathematical statistics–Data processing. I. Title.
QA276.45.R3C73 2013
519.50285 5133–dc23
2012027339

A catalogue record for this book is available from the British Library.

ISBN: 978-0-470-97392-9

Set in 10/12pt Times by Aptara Inc., New Delhi, India.

Chapters

Preface
1 Getting Started
2 Essentials of the R Language
3 Data Input
4 Dataframes
5 Graphics
6 Tables
7 Mathematics
8 Classical Tests
9 Statistical Modelling
10 Regression
11 Analysis of Variance
12 Analysis of Covariance
13 Generalized Linear Models
14 Count Data
15 Count Data in Tables
16 Proportion Data
17 Binary Response Variables
18 Generalized Additive Models
19 Mixed-Effects Models
20 Non-Linear Regression
21 Meta-Analysis
22 Bayesian Statistics
23 Tree Models
24 Time Series Analysis
25 Multivariate Statistics
26 Spatial Statistics
27 Survival Analysis
28 Simulation Models
29 Changing the Look of Graphics
References and Further Reading
Index

Detailed Contents

Preface

1 Getting Started
1.1 How to use this book
1.1.1 Beginner in both computing and statistics
1.1.2 Student needing help with project work
1.1.3 Done some R and some statistics, but keen to learn more of both
1.1.4 Done regression and ANOVA, but want to learn more advanced statistical modelling
1.1.5 Experienced in statistics, but a beginner in R
1.1.6 Experienced in computing, but a beginner in R
1.1.7 Familiar with statistics and computing, but need a friendly reference manual
1.2 Installing R
1.3 Running R
1.4 The Comprehensive R Archive Network
1.4.1 Manuals
1.4.2 Frequently asked questions
1.4.3 Contributed documentation
1.5 Getting help in R
1.5.1 Worked examples of functions
1.5.2 Demonstrations of R functions
1.6 Packages in R
1.6.1 Contents of packages
1.6.2 Installing packages
1.7 Command line versus scripts
1.8 Data editor
1.9 Changing the look of the R screen
1.10 Good housekeeping
1.11 Linking to other computer languages

2 Essentials of the R Language
2.1 Calculations
2.1.1 Complex numbers in R
2.1.2 Rounding
2.1.3 Arithmetic
2.1.4 Modulo and integer quotients

2.1.5 Variable names and assignment
2.1.6 Operators
2.1.7 Integers
2.1.8 Factors
2.2 Logical operations
2.2.1 TRUE and T with FALSE and F
2.2.2 Testing for equality with real numbers
2.2.3 Equality of floating point numbers using all.equal
2.2.4 Summarizing differences between objects using all.equal
2.2.5 Evaluation of combinations of TRUE and FALSE
2.2.6 Logical arithmetic
2.3 Generating sequences
2.3.1 Generating repeats
2.3.2 Generating factor levels
2.4 Membership: Testing and coercing in R
2.5 Missing values, infinity and things that are not numbers
2.5.1 Missing values: NA
2.6 Vectors and subscripts
2.6.1 Extracting elements of a vector using subscripts
2.6.2 Classes of vector
2.6.3 Naming elements within vectors
2.6.4 Working with logical subscripts
2.7 Vector functions
2.7.1 Obtaining tables of means using tapply
2.7.2 The aggregate function for grouped summary statistics
2.7.3 Parallel minima and maxima: pmin and pmax
2.7.4 Summary information from vectors by groups
2.7.5 Addresses within vectors
2.7.6 Finding closest values
2.7.7 Sorting, ranking and ordering
2.7.8 Understanding the difference between unique and duplicated
2.7.9 Looking for runs of numbers within vectors
2.7.10 Sets: union, intersect and setdiff
2.8 Matrices and arrays
2.8.1 Matrices
2.8.2 Naming the rows and columns of matrices
2.8.3 Calculations on rows or columns of the matrix
2.8.4 Adding rows and columns to the matrix
2.8.5 The sweep function
2.8.6 Applying functions with apply, sapply and lapply
2.8.7 Using the max.col function
2.8.8 Restructuring a multi-dimensional array using aperm
2.9 Random numbers, sampling and shuffling
2.9.1 The sample function
2.10 Loops and repeats
2.10.1 Creating the binary representation of a number
2.10.2 Loop …

2.10.3 The slowness of loops
2.10.4 Do not ‘grow’ data sets by concatenation or recursive function calls
2.10.5 Loops for producing time series
2.11 Lists
2.11.1 Lists and lapply
2.11.2 Manipulating and saving lists
2.12 Text, character strings and pattern matching
2.12.1 Pasting character strings together
2.12.2 Extracting parts of strings
2.12.3 Counting things within strings
2.12.4 Upper- and lower-case text
2.12.5 The match function and relational databases
2.12.6 Pattern matching
2.12.7 Dot . as the ‘anything’ character
2.12.8 Substituting text within character strings
2.12.9 Locations of a pattern within a vector using regexpr
2.12.10 Using %in% and which
2.12.11 More on pattern matching
2.12.12 Perl regular expressions
2.12.13 Stripping patterned text out of complex strings
2.13 Dates and times in R
2.13.1 Reading time data from files
2.13.2 The strptime function
2.13.3 The difftime function
2.13.4 Calculations with dates and times
2.13.5 The difftime and as.difftime functions
2.13.6 Generating sequences of dates
2.13.7 Calculating time differences between the rows of a dataframe
2.13.8 Regression using dates and times
2.13.9 Summary of dates and times in R
2.14 Environments
2.14.1 Using with rather than attach
2.14.2 Using attach in this book
2.15 Writing R functions
2.15.1 Arithmetic mean of a single sample
2.15.2 Median of a single sample
2.15.3 Geometric mean
2.15.4 Harmonic mean
2.15.5 Variance
2.15.6 Degrees of freedom
2.15.7 Variance ratio test
2.15.8 Using variance
2.15.9 Deparsing: A graphics function for error bars
2.15.10 The switch function
2.15.11 The evaluation environment of a function
2.15.12 Scope
2.15.13 Optional …

2.15.14 Variable numbers of arguments (...)
2.15.15 Returning values from a function
2.15.16 Anonymous functions
2.15.17 Flexible handling of arguments to functions
2.15.18 Structure of an object: str
2.16 Writing from R to file
2.16.1 Saving your work
2.16.2 Saving history
2.16.3 Saving graphics
2.16.4 Saving data produced within R to disc
2.16.5 Pasting into an Excel spreadsheet
2.16.6 Writing an Excel readable file from R
2.17 Programming tips

3 Data Input
3.1 Data input from the keyboard
3.2 Data input from files
3.2.1 The working directory
3.2.2 Data input using read.table
3.2.3 Common errors when using read.table
3.2.4 Separators and decimal points
3.2.5 Data input directly from the web
3.3 Input from files using scan
3.3.1 Reading a dataframe with scan
3.3.2 Input from more complex file structures using scan
3.4 Reading data from a file using readLines
3.4.1 Input a dataframe using readLines
3.4.2 Reading non-standard files using readLines
3.5 Warnings when you attach the dataframe
3.6 Masking
3.7 Input and output formats
3.8 Checking files from the command line
3.9 Reading dates and times from files
3.10 Built-in data files
3.11 File paths
3.12 Connections
3.13 Reading data from an external database
3.13.1 Creating the DSN for your computer
3.13.2 Setting up R to read from the …

4 Dataframes
4.1 Subscripts and indices
4.2 Selecting rows from the dataframe at random
4.3 Sorting dataframes
4.4 Using logical conditions to select rows from the dataframe
4.5 Omitting rows containing missing values, NA
4.5.1 Replacing NAs with zeros
4.6 Using order and !duplicated to eliminate pseudoreplication

4.7 Complex ordering with mixed directions
4.8 A dataframe with row names instead of row numbers
4.9 Creating a dataframe from another kind of object
4.10 Eliminating duplicate rows from a dataframe
4.11 Dates in dataframes
4.12 Using the match function in dataframes
4.13 Merging two dataframes
4.14 Adding margins to a dataframe
4.15 Summarizing the contents of dataframes

5 Graphics
5.1 Plots with two variables
5.2 Plotting with two continuous explanatory variables: Scatterplots
5.2.1 Plotting symbols: pch
5.2.2 Colour for symbols in plots
5.2.3 Adding text to scatterplots
5.2.4 Identifying individuals in scatterplots
5.2.5 Using a third variable to label a scatterplot
5.2.6 Joining the dots
5.2.7 Plotting stepped lines
5.3 Adding other shapes to a plot
5.3.1 Placing items on a plot with the cursor, using the locator function
5.3.2 Drawing more complex shapes with polygon
5.4 Drawing mathematical functions
5.4.1 Adding smooth parametric curves to a scatterplot
5.4.2 Fitting non-parametric curves through a scatterplot
5.5 Shape and size of the graphics window
5.6 Plotting with a categorical explanatory variable
5.6.1 Boxplots with notches to indicate significant differences
5.6.2 Barplots with error bars
5.6.3 Plots for multiple comparisons
5.6.4 Using colour palettes with categorical explanatory variables
5.7 Plots for single samples
5.7.1 Histograms and bar charts
5.7.2 Histograms
5.7.3 Histograms of integers
5.7.4 Overlaying histograms with smooth density functions
5.7.5 Density estimation for continuous variables
5.7.6 Index plots
5.7.7 Time series plots
5.7.8 Pie charts
5.7.9 The stripchart function
5.7.10 A plot to test for normality
5.8 Plots with multiple variables
5.8.1 The pairs function
5.8.2 The coplot function
5.8.3 Interaction …

5.9 Special plots
5.9.1 Design plots
5.9.2 Bubble plots
5.9.3 Plots with many identical values
5.10 Saving graphics to file
5.11 Summary

6 Tables
6.1 Tables of counts
6.2 Summary tables
6.3 Expanding a table into a dataframe
6.4 Converting from a dataframe to a table
6.5 Calculating tables of proportions with prop.table
6.6 The scale function
6.7 The expand.grid function
6.8 The model.matrix function
6.9 Comparing table and …

7 Mathematics
7.1 Mathematical functions
7.1.1 Logarithmic functions
7.1.2 Trigonometric functions
7.1.3 Power laws
7.1.4 Polynomial functions
7.1.5 Gamma function
7.1.6 Asymptotic functions
7.1.7 Parameter estimation in asymptotic functions
7.1.8 Sigmoid (S-shaped) functions
7.1.9 Biexponential model
7.1.10 Transformations of the response and explanatory variables
7.2 Probability functions
7.3 Continuous probability distributions
7.3.1 Normal distribution
7.3.2 The central limit theorem
7.3.3 Maximum likelihood with the normal distribution
7.3.4 Generating random numbers with exact mean and standard deviation
7.3.5 Comparing data with a normal distribution
7.3.6 Other distributions used in hypothesis testing
7.3.7 The chi-squared distribution
7.3.8 Fisher’s F distribution
7.3.9 Student’s t distribution
7.3.10 The gamma distribution
7.3.11 The exponential distribution
7.3.12 The beta distribution
7.3.13 The Cauchy distribution
7.3.14 The lognormal distribution
7.3.15 The logistic distribution
7.3.16 The log-logistic …

7.3.17 The Weibull distribution
7.3.18 Multivariate normal distribution
7.3.19 The uniform distribution
7.3.20 Plotting empirical cumulative distribution functions
7.4 Discrete probability distributions
7.4.1 The Bernoulli distribution
7.4.2 The binomial distribution
7.4.3 The geometric distribution
7.4.4 The hypergeometric distribution
7.4.5 The multinomial distribution
7.4.6 The Poisson distribution
7.4.7 The negative binomial distribution
7.4.8 The Wilcoxon rank-sum statistic
7.5 Matrix algebra
7.5.1 Matrix multiplication
7.5.2 Diagonals of matrices
7.5.3 Determinant
7.5.4 Inverse of a matrix
7.5.5 Eigenvalues and eigenvectors
7.5.6 Matrices in statistical models
7.5.7 Statistical models in matrix notation
7.6 Solving systems of linear equations using matrices
7.7 Calculus
7.7.1 Derivatives
7.7.2 Integrals
7.7.3 Differential equations

8 Classical Tests
8.1 Single samples
8.1.1 Data summary
8.1.2 Plots for testing normality
8.1.3 Testing for normality
8.1.4 An example of single-sample data
8.2 Bootstrap in hypothesis testing
8.3 Skew and kurtosis
8.3.1 Skew
8.3.2 Kurtosis
8.4 Two samples
8.4.1 Comparing two variances
8.4.2 Comparing two means
8.4.3 Student’s t test
8.4.4 Wilcoxon rank-sum test
8.5 Tests on paired samples
8.6 The sign test
8.7 Binomial test to compare two proportions
8.8 Chi-squared contingency tables
8.8.1 Pearson’s chi-squared
8.8.2 G test of …

8.8.3 Unequal probabilities in the null hypothesis
8.8.4 Chi-squared tests on table objects
8.8.5 Contingency tables with small expected frequencies: Fisher’s exact test
8.9 Correlation and covariance
8.9.1 Data dredging
8.9.2 Partial correlation
8.9.3 Correlation and the variance of differences between variables
8.9.4 Scale-dependent correlations
8.10 Kolmogorov–Smirnov test
8.11 Power analysis
8.12 …

9 Statistical Modelling
9.1 First things first
9.2 Maximum likelihood
9.3 The principle of parsimony (Occam’s razor)
9.4 Types of statistical model
9.5 Steps involved in model simplification
9.5.1 Caveats
9.5.2 Order of deletion
9.6 Model formulae in R
9.6.1 Interactions between explanatory variables
9.6.2 Creating formula objects
9.7 Multiple error terms
9.8 The intercept as parameter 1
9.9 The update function in model simplification
9.10 Model formulae for regression
9.11 Box–Cox transformations
9.12 Model criticism
9.13 Model checking
9.13.1 Heteroscedasticity
9.13.2 Non-normality of errors
9.14 Influence
9.15 Summary of statistical models in R
9.16 Optional arguments in model-fitting functions
9.16.1 Subsets
9.16.2 Weights
9.16.3 Missing values
9.16.4 Offsets
9.16.5 Dataframes containing the same variable names
9.17 Akaike’s information criterion
9.17.1 AIC as a measure of the fit of a model
9.18 Leverage
9.19 Misspecified model
9.20 Model checking in R
9.21 Extracting information from model objects
9.21.1 Extracting information by name
9.21.2 Extracting information by list …

9.21.3 …ng components of the model using …
9.21.4 Using lists with models
9.22 The summary tables for continuous and categorical explanatory variables
9.23 Contrasts
9.23.1 Contrast coefficients
9.23.2 An example of contrasts in R
9.23.3 A priori contrasts
9.24 Model simplification by stepwise deletion
9.25 Comparison of the three kinds of contrasts
9.25.1 Treatment contrasts
9.25.2 Helmert contrasts
9.25.3 Sum contrasts
9.26 Aliasing
9.27 Orthogonal polynomial contrasts: contr.poly
9.28 Summary of statistical …

10 Regression
10.1 Linear regression
10.1.1 The famous five in R
10.1.2 Corrected sums of squares and sums of products
10.1.3 Degree of scatter
10.1.4 Analysis of variance in regression: SSY = SSR + SSE
10.1.5 Unreliability estimates for the parameters
10.1.6 Prediction using the fitted model
10.1.7 Model checking
10.2 Polynomial approximations to elementary functions
10.3 Polynomial regression
10.4 Fitting a mechanistic model to data
10.5 Linear regression after transformation
10.6 Prediction following regression
10.7 Testing for lack of fit in a regression
10.8 Bootstrap with regression
10.9 Jackknife with regression
10.10 Jackknife after bootstrap
10.11 Serial correlation in the residuals
10.12 Piecewise regression
10.13 Multiple regression
10.13.1 The multiple regression model
10.13.2 Common problems arising in multiple …

11 Analysis of Variance
11.1 One-way ANOVA
11.1.1 Calculations in one-way ANOVA
11.1.2 Assumptions of ANOVA
11.1.3 A worked example of one-way ANOVA
11.1.4 Effect sizes
11.1.5 Plots for interpreting one-way ANOVA
11.2 Factorial experiments
11.3 Pseudoreplication: Nested designs and split plots

11.3.1 Split-plot experiments
11.3.2 Mixed-effects models
11.3.3 Fixed effect or random effect?
11.3.4 Removing the pseudoreplication
11.3.5 Derived variable analysis
11.4 Variance components analysis
11.5 Effect sizes in ANOVA: aov or lm?
11.6 Multiple comparisons
11.7 Multivariate analysis of variance

12 Analysis of Covariance
12.1 Analysis of covariance in R
12.2 ANCOVA and experimental design
12.3 ANCOVA with two factors and one continuous covariate
12.4 Contrasts and the parameters of ANCOVA models
12.5 Order matters in summary.aov

13 Generalized Linear Models
13.1 Error structure
13.2 Linear predictor
13.3 Link function
13.3.1 Canonical link functions
13.4 Proportion data and binomial errors
13.5 Count data and Poisson errors
13.6 Deviance: Measuring the goodness of fit of a GLM
13.7 Quasi-likelihood
13.8 The quasi family of models
13.9 Generalized additive models
13.10 Offsets
13.11 Residuals
13.11.1 Misspecified error structure
13.11.2 Misspecified link function
13.12 Overdispersion
13.13 Bootstrapping a GLM
13.14 Binomial GLM with ordered categorical …

14 Count Data
14.1 A regression with Poisson errors
14.2 Analysis of deviance with count data
14.3 Analysis of covariance with count data
14.4 Frequency distributions
14.5 Overdispersion in log-linear models
14.6 Negative binomial errors

15 Count Data in Tables
15.1 A two-class table of counts
15.2 Sample size for count data
15.3 A four-class table of counts
15.4 Two-by-two contingency tables
15.5 Using log-linear models for simple contingency tables

15.6 The danger of contingency tables
15.7 Quasi-Poisson and negative binomial models compared
15.8 A contingency table of intermediate complexity
15.9 Schoener’s lizards: A complex contingency table
15.10 Plot methods for contingency tables
15.11 Graphics for count data: Spine plots and spinograms

16 Proportion Data
16.1 Analyses of data on one and two proportions
16.2 Count data on proportions
16.3 Odds
16.4 Overdispersion and hypothesis testing
16.5 Applications
16.5.1 Logistic regression with binomial errors
16.5.2 Estimating LD50

