11 Day 11 (June 15)

11.1 Announcements

Assignment 3 is posted and due Friday
Vote on if tomorrow should be a work day?
Recommended reading
- Chapters 3 and 4 (pgs 33 - 58) in Linear Models with R
Questions and comments from journal
- “I am still trying to understand how we know if the estimated slopes from a linear model are reliable.”
- “It seems like results from models can then lead to more questions. Are those questions then incorporated into the model to account for that new question or does it lead to needing a new model entirely?”
- Comment on questioning model-based estimates or conclusions (return to deer example).

11.2 Maximum Likelihood Estimation

Maximum likelihood estimation for the linear model
- Assume that \(\mathbf{y}\sim \text{N}(\mathbf{X}\boldsymbol{\beta},\sigma^2\mathbf{I})\)
- Write out the likelihood function \[\mathcal{L}(\boldsymbol{\beta},\sigma^2)=\prod_{i=1}^n\frac{1}{\sqrt{2\pi\sigma^2}}\textit{e}^{-\frac{1}{2\sigma^2}(y_{i} - \mathbf{x}_{i}^{\prime}\boldsymbol{\beta})^2}.\]
- Maximize the likelihood function
  1. First take the natural log of \(\mathcal{L}(\boldsymbol{\beta},\sigma^2)\), which gives \[\mathcal{l}(\boldsymbol{\beta},\sigma^2)=-\frac{n}{2}\text{ log}(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_{i} - \mathbf{x}_{i}^{\prime}\boldsymbol{\beta})^2.\]
  2. Recall that \(\sum_{i=1}^{n}(y_{i} - \mathbf{x}_{i}^{\prime}\boldsymbol{\beta})^2 = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{\prime}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\)
  3. Maximize the log-likelihood function \[\underset{\boldsymbol{\beta}, \sigma^2}{\operatorname{argmax}} -\frac{n}{2}\text{ log}(2\pi\sigma^2) - \frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{\prime}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\]
- Visual for data \(\mathbf{y}=(0.16,2.82,2.24)'\) and \(\mathbf{x}=(1,2,3)'\)

Three ways to do it in program R

Using matrix calculus and algebra

# Data 
y <- c(0.16,2.82,2.24)
X <- matrix(c(1,1,1,1,2,3),nrow=3,ncol=2,byrow=FALSE)
n <- length(y)

# Maximum likelihood estimation using closed from estimators
beta <- solve(t(X)%*%X)%*%t(X)%*%y 
sigma2 <- (1/n)*t(y-X%*%beta)%*%(y-X%*%beta)

beta

##       [,1]
## [1,] -0.34
## [2,]  1.04

sigma2

##        [,1]
## [1,] 0.5832

Using modern (circa 1970’s) optimization techniques

# Maximum likelihood estimation estimation using the Nelder-Mead algorithm
y <- c(0.16,2.82,2.24)
X <- matrix(c(1,1,1,1,2,3),nrow=3,ncol=2,byrow=FALSE)

negloglik <- function(par){
beta <- par[1:2]
sigma2 <- par[3]
-sum(dnorm(y,X%*%beta,sqrt(sigma2),log=TRUE))
}
optim(par=c(0,0,1),fn=negloglik,method = "Nelder-Mead")

## $par
## [1] -0.3397162  1.0398552  0.5831210
## 
## $value
## [1] 3.447978
## 
## $counts
## function gradient 
##      150       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL

Using modern and user friendly statistical computing software

# Maximum likelihood estimation using lm function
# Note: estimate of sigma2 is not the MLE! 
df <- data.frame(y = c(0.16,2.82,2.24),x = c(1,2,3))
m1 <- lm(y~x,data=df)

coef(m1)

## (Intercept)           x 
##       -0.34        1.04

summary(m1)$sigma^2

## [1] 1.7496

summary(m1)

## 
## Call:
## lm(formula = y ~ x, data = df)
## 
## Residuals:
##     1     2     3 
## -0.54  1.08 -0.54 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -0.3400     2.0205  -0.168    0.894
## x             1.0400     0.9353   1.112    0.466
## 
## Residual standard error: 1.323 on 1 degrees of freedom
## Multiple R-squared:  0.5529, Adjusted R-squared:  0.1057 
## F-statistic: 1.236 on 1 and 1 DF,  p-value: 0.4663

Live example

11.3 Confidence intervals for paramters

Example

Northern bobwhite quail

y <- c(63, 68, 61, 44, 103, 90, 107, 105, 76, 46, 60, 66, 58, 39, 64, 29, 37,
27, 38, 14, 38, 52, 84, 112, 112, 97, 131, 168, 70, 91, 52, 33, 33, 27,
18, 14, 5, 22, 31, 23, 14, 18, 23, 27, 44, 18, 19)
year <- 1965:2011
df <- data.frame(y = y, year = year)
plot(x = df$year, y = df$y, xlab = "Year", ylab = "Annual count", main = "",
col = "brown", pch = 20, xlim = c(1965, 2040))

Is the population size really decreasing?
Write out the model we should use answer this question
How can we assess the uncertainty in our estimate of \(\beta_1\)?
Confidence intervals in R

m1 <- lm(y~year,data=df)
coef(m1)

## (Intercept)        year 
##  2356.48797    -1.15784

confint(m1,level=0.95)

##                 2.5 %       97.5 %
## (Intercept) 929.80699 3783.1689540
## year         -1.87547   -0.4402103

Note \[\hat{\boldsymbol{\beta}}\sim\text{N}(\boldsymbol{\beta},\sigma^2(\mathbf{X}^{\prime}\mathbf{X})^{-1})\] and let \[\hat{\beta_1}\sim\text{N}(\beta_1,\sigma^{2}_{\beta_1})\] where \(\sigma^{2}_{\beta_1}= \sigma^2\mathbf{c}^{\prime}(\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{c}\) and \(\mathbf{c}\equiv(0,1,0,0,\ldots,0)^{\prime}\). In R we can extract \(\sigma^2(\mathbf{X}^{\prime}\mathbf{X})^{-1}\) using vcov().

vcov(m1)

##             (Intercept)         year
## (Intercept) 501753.2825 -252.3792372
## year          -252.3792    0.1269513

Note that

diag(vcov(m1))^0.5

## (Intercept)        year 
## 708.3454542   0.3563023

corresponds to the column Std. Error from summary()

summary(m1)

## 
## Call:
## lm(formula = y ~ year, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -45.333 -20.597  -9.754  14.035 117.929 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 2356.4880   708.3455   3.327  0.00176 **
## year          -1.1578     0.3563  -3.250  0.00219 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 33.13 on 45 degrees of freedom
## Multiple R-squared:  0.1901, Adjusted R-squared:  0.1721 
## F-statistic: 10.56 on 1 and 45 DF,  p-value: 0.00219

When \(\sigma_{\beta_1}\) is known \[\hat{\beta_1}\sim\text{N}(\beta_1,\sigma^{2}_{\beta_1})\] \[\hat{\beta_1} - \beta_1 \sim\text{N}(0,\sigma^{2}_{\beta_1})\] \[\frac{\hat{\beta_1} - \beta_1}{\sigma_{\beta_1}} \sim\text{N}(0,1)\]
When \(\sigma_{\beta_1}\) is estimated \[\frac{\hat{\beta_1} - \beta_1}{\hat{\sigma}_{\beta_1}} \sim\text{t}(\nu)\] where \(\nu = n - p\)
Deriving the confidence interval for \(\hat{\beta_1}\) \[\text{P}(a < \frac{\hat{\beta_1} - \beta_1}{\hat{\sigma}_{\beta_1}} < b) = 1-\alpha\] \[\text{P}(a\hat{\sigma}_{\beta_1} < \hat{\beta_1} - \beta_1 < b\hat{\sigma}_{\beta_1}) = 1-\alpha\] \[\text{P}(-\hat{\beta_1} + a\hat{\sigma}_{\beta_1} < - \beta_1 < -\hat{\beta_1} + b\hat{\sigma}_{\beta_1}) = 1-\alpha\] \[\text{P}(\hat{\beta_1} - b\hat{\sigma}_{\beta_1} < \beta_1 < \hat{\beta_1} - a\hat{\sigma}_{\beta_1}) = 1-\alpha\]
What is \(\text{P}(a < \frac{\hat{\beta_1} - \beta_1}{\hat{\sigma}_{\beta_1}} < b) = 1-\alpha\)? \[\text{P}(a < z < b) = \int_{a}^{b}[z|\nu]dz\]
“By hand” in R

nu <- length(df$y) - 2
a <- qt(p = 0.025,df = nu)
a

## [1] -2.014103

b <- qt(p = 0.975,df = nu)
b

## [1] 2.014103

beta1.hat <- coef(m1)[2]
sigma.beta1.hat <- (diag(vcov(m1))^0.5)[2]

beta1.hat - b*sigma.beta1.hat

##     year 
## -1.87547

beta1.hat - a*sigma.beta1.hat

##       year 
## -0.4402103

confint(m1)[2,]

##      2.5 %     97.5 % 
## -1.8754696 -0.4402103