Lab 9: Linear regression#

Simple linear regression#

Simulating linear regression data#

x = rnorm(100, mean=3, sd=1)
error = rnorm(100, mean=0, sd=2)
y = 2.1+1.25*x+error

data=data.frame(cbind(x,y))

Fitting the linear regression model#

result = lm(data$y~data$x)
summary(result)
Call:
lm(formula = data$y ~ data$x)

Residuals:
   Min     1Q Median     3Q    Max 
-4.492 -1.564  0.035  1.184  6.029 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.3126     0.6830   4.850 4.64e-06 ***
data$x        0.8620     0.2122   4.062 9.82e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.962 on 98 degrees of freedom
Multiple R-squared:  0.1441,	Adjusted R-squared:  0.1354 
F-statistic:  16.5 on 1 and 98 DF,  p-value: 9.818e-05

Residual plot#

par(mfrow=c(2,2))
plot(x,y,pch=16)
abline(result,col="brown")
plot(result$residuals)
abline(a=0,b=0, col="brown")
hist(result$residuals)
qqnorm(result$residuals, pch=16)
qqline(result$residuals, col = "brown")
_images/64664cb5c1feec85af9077a6fe7a4f686d33ab0cf623bbd8e86838b2e1ff8149.png

Checking normality#

shapiro.test(result$residuals)
	Shapiro-Wilk normality test

data:  result$residuals
W = 0.99047, p-value = 0.7028

Prediction for new observations#

x_new = 0.7
y_new = result$coefficients[1]+result$coefficients[2]*x_new
y_new
(Intercept): 3.91606027399812

Fitting quadratic relationships#

x = rnorm(100, mean=0, sd=1)
y = 2.1+1.25*x^2+error
data=data.frame(cbind(x,y))
result = lm(data$y~data$x)
summary(result)
Call:
lm(formula = data$y ~ data$x)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.6360 -1.5817 -0.0843  1.5278  8.2568 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.2918     0.2470  13.327   <2e-16 ***
data$x       -0.4565     0.2494  -1.831   0.0702 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.454 on 98 degrees of freedom
Multiple R-squared:  0.03306,	Adjusted R-squared:  0.02319 
F-statistic: 3.351 on 1 and 98 DF,  p-value: 0.07021

Multiple linear regression#

Generating data#

x1 = rnorm(100, mean=3, sd=1)
x2 = rnorm(100, mean=2.5, sd=2.1)

error = rnorm(100, mean=0, sd=2)
y = 2.1 + 1.25*x1 - 3*x2 + error

data=data.frame(cbind(y,x1,x2))
plot(data)
_images/224a9295ecd608c31bcf029c0eb2e3cbff7ccd5370f60ea043c54be3fb6c896a.png

Fitting the model#

result = lm(y~x1+x2,data=data)
summary(result)
Call:
lm(formula = y ~ x1 + x2, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.8136 -1.5113  0.0226  1.4080  5.0071 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.3971     0.7364   1.897   0.0608 .  
x1            1.3777     0.2347   5.870 6.08e-08 ***
x2           -2.8028     0.1018 -27.523  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.022 on 97 degrees of freedom
Multiple R-squared:  0.8898,	Adjusted R-squared:  0.8876 
F-statistic: 391.8 on 2 and 97 DF,  p-value: < 2.2e-16

Residual plot#

par(mfrow=c(2,2))
plot(result$residuals)
abline(a=0,b=0, col="red")
hist(result$residuals)
qqnorm(result$residuals, pch=16)
qqline(result$residuals, col = "brown")
_images/7958a6d43c3fe2c5011af592eaddbaac6622f05e472dc7954defeeba1c8ed18d.png

Checking the normality assumption#

shapiro.test(result$residuals)
	Shapiro-Wilk normality test

data:  result$residuals
W = 0.99217, p-value = 0.8337

Logistic regression#

Generating data#

age <- round(runif(100, 18, 80))
log_odds = -2.2 + 0.02*age
p = 1/(1 + exp(-log_odds))
y <- rbinom(n = 100, size = 1, prob = p)

Fitting the logistic model#

mod <- glm(y ~ age, family = "binomial")
summary(mod)
Call:
glm(formula = y ~ age, family = "binomial")

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.0593  -0.7440  -0.5259  -0.3979   2.2962  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.27148    0.86797  -3.769 0.000164 ***
age          0.03734    0.01500   2.490 0.012787 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 102.791  on 99  degrees of freedom
Residual deviance:  95.974  on 98  degrees of freedom
AIC: 99.974

Number of Fisher Scoring iterations: 4