STATISTICS 462 – Summer 2016 Homework 3
DUE Friday, July 15th
Unless otherwise stated, you can use R for any of the calculations, but make sure you include your code. Your code should not be a copy of anyone else’s! Any code you turn in should be well organized and commented so the grader can understand your answers.
All programming questions should be submitted to the dropbox on ANGEL for this assignment as a .pdf file using the naming convention HWNum_FirstInitialLastName.pdf. For example, John Doe would submit a file titled HW1_JDoe.pdf for the first assignment. Your answers to programming questions should include both code and a description of your results. I recommend using R Markdown for writing up your answers. A template for writing up an assignment in R Markdown can be found on ANGEL. R Markdown files can be compiled directly within RStudio. Alternatively, answers may be written in a Word document or in LaTeX and then converted to a .pdf file.
Non-coding questions can either be written and submitted in the same file as your coding questions using LaTeX typesetting (see https://latex-project.org/intro.html) or they may be handwritten and turned in separately during class.
1. Download the data1.Rdata dataset from ANGEL and load it into R using load("data1.Rdata"). The data contain two columns, x and y.
(a) Perform an exploratory data analysis (EDA) for this data.
(b) Fit a simple linear regression model to this data using y as the response and x as the covariate. Report your estimates.
(c) Assuming the stronger set of SLR assumptions presented in class, perform any relevant diagnostics to test those model assumptions. Comment on what you observe.
(d) Which modeling assumptions may be violated for this data?
(e) Describe and apply techniques presented in class to alleviate the modeling violations that you found. Justify all modeling steps.
(f) Present the estimates for your final model and interpret the regression parameters.
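The steps in question 1 can be sketched in R roughly as follows. This is a minimal sketch under stated assumptions, not an answer key: because data1.Rdata is only available on ANGEL, the block simulates a hypothetical stand-in data frame (the name sim_dat and its log-linear generating model are invented for illustration) and saves it to a temporary .Rdata file so that load() can be demonstrated. It also assumes the "stronger" SLR assumptions include normally distributed errors, and uses MASS::boxcox() as one possible transformation tool; whether that matches the techniques presented in class is an assumption.

```r
## Hypothetical stand-in for data1.Rdata: y is generated with multiplicative
## (log-normal) errors, so its spread grows with its mean
set.seed(462)
n <- 100
sim_dat <- data.frame(x = runif(n, min = 1, max = 10))
sim_dat$y <- exp(0.5 + 0.3 * sim_dat$x + rnorm(n, sd = 0.3))

## Save to a temporary .Rdata file and load it back, as with load("data1.Rdata");
## load() restores the saved objects and invisibly returns their names
path <- tempfile(fileext = ".Rdata")
save(sim_dat, file = path)
rm(sim_dat)
restored <- load(path)
print(restored)
dat1 <- get(restored[1])

## (a) Exploratory data analysis: numeric summaries, correlation, scatterplot
summary(dat1)
cor(dat1$x, dat1$y)
plot(y ~ x, data = dat1, main = "Scatterplot of y against x")

## (b) Simple linear regression with y as the response and x as the covariate
fit <- lm(y ~ x, data = dat1)
coef(summary(fit))  # estimates, standard errors, t values, p-values

## (c) Diagnostics, assuming the stronger assumptions include normal errors
par(mfrow = c(2, 2))
plot(fit)  # residuals vs fitted, normal Q-Q, scale-location, leverage
par(mfrow = c(1, 1))
shapiro.test(residuals(fit))  # formal test of residual normality

## (e) One possible remedy: choose a power transform of y via Box-Cox.
## MASS is a recommended package shipped with R; boxcox() plots the
## profile log-likelihood of lambda (lambda near 0 suggests log(y))
bc <- MASS::boxcox(fit, lambda = seq(-1, 1, by = 0.05))
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat
fit_log <- lm(log(y) ~ x, data = dat1)
plot(fit_log, which = 1)  # re-check residuals vs fitted after transforming

## (f) If log-scale errors are symmetric about zero, a one-unit increase in x
## multiplies the median of y by exp(slope)
exp(coef(fit_log)["x"])
```

Simulating data with a known structure makes it easy to see what the diagnostics look like when the constant-variance assumption fails; the same calls apply unchanged to the real data once it is loaded.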
2. Load the automobile metrics data from "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data". You can use the following R code to load the data.
# Read the whitespace-delimited UCI file; "?" marks missing horsepower values
dat <- read.table(paste("https://archive.ics.uci.edu/ml/",
                        "machine-learning-databases/", "auto-mpg/auto-mpg.data", sep = ""),
                  header = FALSE,
                  colClasses = c("numeric", "numeric", "numeric", "numeric",
                                 "numeric", "numeric", "numeric", "numeric", "factor"),
                  na.strings = "?")
# Assign descriptive column names
colnames(dat) <- c("mpg", "cylinders", "displacement", "horsepower", "weight",
                   "acceleration", "modyr", "origin", "name")
(a) Do some brief EDA for this data.
(b) Fit a simple linear regression model with mpg as the response and horsepower as the predictor.
(c) Repeat parts (c) – (f) from question 1 for this data.
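The question 2 steps can be sketched the same way. So that the example runs without a network connection, it uses R's built-in mtcars data as a stand-in (an assumption: mtcars is not the UCI auto-mpg data, and its hp column only plays the role of horsepower). With the real dat, horsepower contains NA values coming from "?", which lm() drops under R's default na.action setting. The log-log model below is one candidate remedy chosen for illustration, not the required answer.

```r
## Stand-in data: built-in mtcars, with hp playing the role of horsepower
data(mtcars)

## (a) Brief EDA
summary(mtcars[, c("mpg", "hp")])
plot(mpg ~ hp, data = mtcars, main = "mpg against horsepower")

## (b) SLR with mpg as the response and horsepower as the predictor
fit_hp <- lm(mpg ~ hp, data = mtcars)
coef(summary(fit_hp))

## (c) Residuals vs fitted values: a curved pattern here would suggest the
## straight-line mean function is inadequate
plot(fitted(fit_hp), residuals(fit_hp),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

## (e) One candidate fix: log both variables, then re-check the diagnostics
fit_loglog <- lm(log(mpg) ~ log(hp), data = mtcars)
plot(fit_loglog, which = 1:2)  # residuals vs fitted and normal Q-Q

## (f) In a log-log model the slope is an elasticity: a 1% increase in
## horsepower is associated with roughly a (slope)% change in mpg
coef(fit_loglog)["log(hp)"]
```

Swapping mtcars for dat and hp for horsepower gives the corresponding analysis on the assignment data.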