Exercise 1

Show that the score test statistic for the global null hypothesis

\[H_0\!: \beta_2 = \ldots = \beta_k = 0\]

in the logistic regression model has the form

\[U = (\hat{\pi}_0 (1 - \hat{\pi}_0))^{-1} (Y - \hat{\pi}_0 \mathbb{1})^T X (X^TX)^{-1} \boldsymbol{X}^T(Y - \hat{\pi}_0 \mathbb{1})\]

where \(\hat{\pi}_0 = \frac{1}{n} \sum_{i=1}^n Y_i\) and \(\mathbb{1} = (1, \ldots, 1)^T\).

Under \(H_0\), the restricted ML fit assigns every observation the constant probability \(\hat{\pi}_0\); evaluating the Fisher information and the score at the restricted estimate \(\tilde{\beta}\) gives

\[\begin{aligned} \boldsymbol{F}^{-1}(\tilde{\beta}) &= \left(\boldsymbol{X}^T \boldsymbol{X}\right)^{-1} \frac{1}{\hat{\pi}_0 (1 - \hat{\pi}_0 )} \qquad \boldsymbol{s}(\tilde{\beta}) = \sum_{i=1}^n \boldsymbol{x}_i^T (Y_i - \hat{\pi}_0) \\[1.5em] U &= \boldsymbol{s}(\tilde{\beta})^T \boldsymbol{F}^{-1}(\tilde{\beta}) \boldsymbol{s}(\tilde{\beta}) \\ &= \frac{1}{\hat{\pi}_0 (1 - \hat{\pi}_0 )} \left(\sum_{i=1}^n (Y_i - \hat{\pi}_0) \boldsymbol{x}_i\right) \left(\boldsymbol{X}^T \boldsymbol{X}\right)^{-1} \left(\sum_{j=1}^n \boldsymbol{x}_j^T (Y_j - \hat{\pi}_0)\right) \\ &= (\hat{\pi}_0 (1 - \hat{\pi}_0 ))^{-1} \sum_{j=1}^n \sum_{i=1}^n (Y_i - \hat{\pi}_0) \boldsymbol{x}_i \left(\boldsymbol{X}^T \boldsymbol{X}\right)^{-1} \boldsymbol{x}_j^T (Y_j - \hat{\pi}_0) \\ &= (\hat{\pi}_0 (1 - \hat{\pi}_0 ))^{-1} (\boldsymbol{Y} - \hat{\pi}_0 \mathbb{1})^T \boldsymbol{X} \left(\boldsymbol{X}^T \boldsymbol{X}\right)^{-1} \boldsymbol{X}^T (\boldsymbol{Y} - \hat{\pi}_0 \mathbb{1}) \end{aligned}\]
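
As a quick plausibility check (an addition, not part of the exercise), the generic quadratic form \(\boldsymbol{s}^T \boldsymbol{F}^{-1} \boldsymbol{s}\) and the closed form above can be compared on simulated data; the sample size, seed, and data-generating choices below are made up purely for illustration:

# Numerical check of the closed form on simulated data (illustrative only)
set.seed(1)
n <- 200
Xsim <- cbind(1, matrix(rnorm(n * 2), n, 2))  # intercept plus 2 covariates
Ysim <- rbinom(n, 1, 0.4)
pi0 <- mean(Ysim)

# Generic form: s^T F^{-1} s, both evaluated under H_0
s <- t(Xsim) %*% (Ysim - pi0)
F0 <- pi0 * (1 - pi0) * t(Xsim) %*% Xsim
U_generic <- as.numeric(t(s) %*% solve(F0) %*% s)

# Closed form derived above
r <- Ysim - pi0
U_closed <- as.numeric(
  t(r) %*% Xsim %*% solve(t(Xsim) %*% Xsim) %*% t(Xsim) %*% r
) / (pi0 * (1 - pi0))

all.equal(U_generic, U_closed)  # TRUE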

Exercise 2

In the dataset education.csv, the variable education is a count variable. Run a Poisson regression, treating education as the response variable and the other variables in the dataset as covariates.

education <- read.csv("data/education.csv")

Assume that \(\mathtt{education}_i\) is \(Po(\lambda_i)\)-distributed with

\[\begin{aligned} \log(\lambda_i) =& \beta_0 + \beta_1 \cdot \mathtt{age}_i + \beta_2 \cdot \mathtt{electric}_i + \beta_3 \cdot \mathtt{tv}_i + \\ & \beta_4 \cdot \mathtt{urban}_i + \beta_5 \cdot \mathtt{married}_i + \beta_6 \cdot \mathtt{firsthalf}_i + \\ & \beta_7 \cdot \mathtt{radio}_i + \beta_8 \cdot \mathtt{bicycle}_i + \beta_9 \cdot \mathtt{knowmethod}_i + \beta_{10} \cdot \mathtt{usemethod}_i \end{aligned}\]

Carry out the Fisher scoring algorithm manually (with \(\epsilon = 10^{-8}\)), i.e.:

a)

First, compute a least squares estimate for \(\boldsymbol{\beta}\). This estimate serves as the starting value \(\boldsymbol{\hat{\beta}}^{(0)}\) for the algorithm.

library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)
library(knitr)
library(kableExtra)

Y <- education$education

# Design matrix: prepend an intercept column, drop the row index X
# and the response education
X <- education %>%
  mutate(intercept = 1) %>%
  select(intercept, everything(), -X, -education) %>%
  as.matrix()

# Least squares estimator; solve() computes the matrix inverse
pois_kq <- solve(t(X) %*% X) %*% t(X) %*% Y

pois_kq %>%
  round(4) %>%
  as.data.frame() %>%
  tibble::rownames_to_column(var = "beta") %>%
  kable(
    booktabs = TRUE, linesep = "", escape = FALSE, 
    col.names = c("Term", "LS $\\hat{\\beta}_j$")
  ) %>%
  kable_styling(protect_latex = TRUE, position = "center")
| Term | LS \(\hat{\beta}_j\) |
|------|---------------------:|
| intercept | 5.9615 |
| age | -0.1234 |
| electric | 1.7281 |
| tv | 2.3767 |
| urban | 0.5179 |
| married | -0.8734 |
| firsthalf | -0.5201 |
| radio | 1.3816 |
| bicycle | -0.0356 |
| knowmethod | 1.5076 |
| usemethod | 1.4506 |
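
As a cross-check (an addition, not part of the exercise): since X already contains the intercept column, the same starting values can be reproduced with lm():

# Cross-check: lm() on the prebuilt design matrix (note `- 1`, since X
# already carries the intercept column) reproduces pois_kq
pois_kq_lm <- coef(lm(Y ~ X - 1))
all.equal(unname(pois_kq_lm), as.numeric(pois_kq))  # TRUE up to numerical tolerance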

b)

Iteratively compute \(\boldsymbol{\hat{\beta}}^{(k+1)} = \boldsymbol{\hat{\beta}}^{(k)} + \boldsymbol{F}^{-1}(\boldsymbol{\hat{\beta}}^{(k)}) s(\boldsymbol{\hat{\beta}}^{(k)})\) until \(||\boldsymbol{\hat{\beta}}^{(k+1)} - \boldsymbol{\hat{\beta}}^{(k)}|| / ||\boldsymbol{\hat{\beta}}^{(k)}|| < \epsilon\).

The score function (SM3-Teil6, slide 6) for \(\phi = 1\) and \(\omega_i = 1\):

\[ \boldsymbol{s}(\beta) = \sum_{i=1}^n \boldsymbol{x}_i^T \cdot \left(Y_i - \exp(\boldsymbol{x}_i \beta)\right) \]

In matrix notation, which is easier to compute:

\[ \boldsymbol{s}(\beta) = \boldsymbol{X}^T (\boldsymbol{Y} - \exp(\boldsymbol{X} \beta)) \]

# Score function: s(beta) = X^T (Y - exp(X beta))
scorefun <- function(betahat) {
  t(X) %*% (Y - exp(X %*% betahat))
}

# Example:
scorefun(pois_kq)
##                  [,1]
## intercept   -18461664
## age        -408395894
## electric    -15026609
## tv          -14765643
## urban       -16796616
## married      -4387332
## firsthalf    -6943467
## radio       -18001788
## bicycle      -7078881
## knowmethod  -18407425
## usemethod   -14928496

The Fisher information (SM3-Teil6, slide 6), again for \(\phi = 1\) and \(\omega_i = 1\):

\[\begin{aligned} \boldsymbol{F}(\beta, \phi) &= \phi^{-1} \sum_{i=1}^n \boldsymbol{x}_i^T \boldsymbol{x}_i \exp(\boldsymbol{x}_i \beta) \omega_i \\ &= \sum_{i=1}^n \boldsymbol{x}_i^T \boldsymbol{x}_i \exp(\boldsymbol{x}_i \beta) \end{aligned}\]

In matrix notation:

\[ \boldsymbol{F}(\beta, \phi = 1) = \boldsymbol{X}^T \mathrm{diag}(\exp(\boldsymbol{X} \beta)) \boldsymbol{X} \]

# Fisher information: F(beta) = X^T diag(exp(X beta)) X
fisherinf <- function(betahat) {
  t(X) %*% diag(as.numeric(exp(X %*% betahat))) %*% X
}

# Example (the first 4 columns):
fisherinf(pois_kq)[, 1:4]
##            intercept        age  electric        tv
## intercept   18486854  409040072  15031871  14769470
## age        409040072 9633835108 335501007 328769337
## electric    15031871  335501007  15031871  13540524
## tv          14769470  328769337  13540524  14769470
## urban       16811607  371732522  14346578  14073318
## married      4397555  121979294   3856052   3756971
## firsthalf    6956078  152795117   5681067   5583981
## radio       18021561  398778605  14765202  14595178
## bicycle      7086278  158427937   6237885   6124165
## knowmethod  18432217  408102124  14987254  14724619
## usemethod   14944749  345887318  12279468  11963419
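
Both functions can be sanity-checked against numerical derivatives of the Poisson log-likelihood, since the score is its gradient and, with the canonical log link, the Fisher information equals the negative Hessian. A minimal sketch, assuming the numDeriv package is installed (not used elsewhere in this exercise):

# Poisson log-likelihood up to the additive constant -sum(log(Y!))
loglik <- function(beta) sum(Y * (X %*% beta) - exp(X %*% beta))

beta_test <- rep(0, ncol(X))  # arbitrary, numerically tame test point
all.equal(as.numeric(scorefun(beta_test)), numDeriv::grad(loglik, beta_test),
          tolerance = 1e-6)
all.equal(fisherinf(beta_test), -numDeriv::hessian(loglik, beta_test),
          tolerance = 1e-6, check.attributes = FALSE)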

Fisher scoring helper functions: the Euclidean norm and a function to compare two successive iteration steps, i.e. the implementation of

\[||\boldsymbol{\hat{\beta}}^{(k+1)} - \boldsymbol{\hat{\beta}}^{(k)}|| / ||\boldsymbol{\hat{\beta}}^{(k)}|| < \epsilon\]

# Euclidean norm ||x||
euklidnorm <- function(x) {
  sqrt(sum(x^2))
}

# Relative change between two successive estimates, plus convergence flag
coef_compare <- function(beta1, beta2, tol = 1e-8) {
  diff <- euklidnorm(beta2 - beta1) / euklidnorm(beta1)
  tibble(diff = diff, converged = diff < tol)
}

# Next (here: first) iteration step
# betak1 <- pois_kq + solve(fisherinf(pois_kq)) %*% scorefun(pois_kq)
# coef_compare(pois_kq, betak1)

The actual Fisher scoring algorithm:

\[\boldsymbol{\hat{\beta}}^{(k+1)} = \boldsymbol{\hat{\beta}}^{(k)} + \boldsymbol{F}^{-1}(\boldsymbol{\hat{\beta}}^{(k)}) s(\boldsymbol{\hat{\beta}}^{(k)})\]

# Compute beta_(k+1) from beta_k
iterate_step <- function(beta) {
  beta + solve(fisherinf(beta)) %*% scorefun(beta)
}

# Fisher scoring:
# Empty data structure for the results
betas <- list(coef = list(), status = list())

# Initialize with the least squares estimate
betas$coef[1] <- list(pois_kq)

# Run 14 iteration steps, storing the intermediate results
for (k in 1:14) {
  betas$coef[k + 1] <- list(iterate_step(betas$coef[[k]]))
  betas$status[k + 1] <- list(coef_compare(betas$coef[[k]], betas$coef[[k + 1]]))
}
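
Hard-coding 14 steps works here because we already know when the algorithm converges; a more general variant iterates until the criterion fires. A minimal sketch (the maxit guard is an assumption, not part of the exercise):

# Iterate until convergence instead of a fixed number of steps
fisher_scoring <- function(beta0, tol = 1e-8, maxit = 50) {
  beta_old <- beta0
  for (k in seq_len(maxit)) {
    beta_new <- iterate_step(beta_old)
    if (euklidnorm(beta_new - beta_old) / euklidnorm(beta_old) < tol) {
      return(list(beta = beta_new, iterations = k))
    }
    beta_old <- beta_new
  }
  warning("no convergence within maxit steps")
  list(beta = beta_old, iterations = maxit)
}

# fisher_scoring(pois_kq)$iterations should report 14, matching the table below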

# Table of the iteration steps
fisher_scoring_results <- purrr::map_df(1:15, ~{
  betas$status[[.x]] %>%
    bind_cols(as_tibble(t(betas$coef[[.x]]))) %>%
    mutate(k = .x - 1)
}) %>%
  select(k, diff, converged, everything())

fisher_scoring_results %>%
  mutate(diff = format(diff, digits = 4, scientific = TRUE)) %>%
  kable(booktabs = TRUE, digits = 3) %>%
  kable_styling(
    position = "center", latex_options = c("scale_down", "striped")
  ) %>%
  row_spec(row = 15, bold = TRUE)
| k | diff | converged | intercept | age | electric | tv | urban | married | firsthalf | radio | bicycle | knowmethod | usemethod |
|---|------|-----------|-----------|-----|----------|----|----|---------|-----------|-------|---------|------------|-----------|
| 0 | NA | NA | 5.961 | -0.123 | 1.728 | 2.377 | 0.518 | -0.873 | -0.520 | 1.382 | -0.036 | 1.508 | 1.451 |
| 1 | 1.365e-01 | FALSE | 4.980 | -0.123 | 1.725 | 2.374 | 0.516 | -0.872 | -0.519 | 1.374 | -0.035 | 1.502 | 1.448 |
| 2 | 1.487e-01 | FALSE | 4.028 | -0.123 | 1.716 | 2.365 | 0.511 | -0.870 | -0.517 | 1.352 | -0.035 | 1.488 | 1.442 |
| 3 | 1.543e-01 | FALSE | 3.157 | -0.121 | 1.694 | 2.342 | 0.496 | -0.863 | -0.511 | 1.297 | -0.033 | 1.450 | 1.426 |
| 4 | 1.404e-01 | FALSE | 2.476 | -0.118 | 1.635 | 2.283 | 0.461 | -0.844 | -0.495 | 1.162 | -0.029 | 1.353 | 1.384 |
| 5 | 1.174e-01 | FALSE | 2.171 | -0.108 | 1.493 | 2.139 | 0.380 | -0.796 | -0.454 | 0.886 | -0.021 | 1.136 | 1.275 |
| 6 | 1.836e-01 | FALSE | 2.265 | -0.087 | 1.213 | 1.840 | 0.241 | -0.684 | -0.363 | 0.513 | -0.007 | 0.776 | 1.031 |
| 7 | 2.405e-01 | FALSE | 2.208 | -0.056 | 0.827 | 1.374 | 0.116 | -0.477 | -0.224 | 0.286 | -0.001 | 0.498 | 0.641 |
| 8 | 2.814e-01 | FALSE | 1.797 | -0.033 | 0.495 | 0.871 | 0.086 | -0.260 | -0.122 | 0.259 | 0.000 | 0.489 | 0.359 |
| 9 | 2.332e-01 | FALSE | 1.559 | -0.026 | 0.307 | 0.484 | 0.096 | -0.158 | -0.095 | 0.272 | 0.001 | 0.524 | 0.287 |
| 10 | 1.009e-01 | FALSE | 1.516 | -0.025 | 0.253 | 0.318 | 0.100 | -0.140 | -0.093 | 0.275 | 0.000 | 0.532 | 0.282 |
| 11 | 1.260e-02 | FALSE | 1.514 | -0.025 | 0.249 | 0.297 | 0.101 | -0.139 | -0.093 | 0.276 | 0.000 | 0.532 | 0.282 |
| 12 | 1.581e-04 | FALSE | 1.514 | -0.025 | 0.249 | 0.296 | 0.101 | -0.139 | -0.093 | 0.276 | 0.000 | 0.532 | 0.282 |
| 13 | 2.324e-08 | FALSE | 1.514 | -0.025 | 0.249 | 0.296 | 0.101 | -0.139 | -0.093 | 0.276 | 0.000 | 0.532 | 0.282 |
| 14 | 1.655e-15 | TRUE | 1.514 | -0.025 | 0.249 | 0.296 | 0.101 | -0.139 | -0.093 | 0.276 | 0.000 | 0.532 | 0.282 |
# Plot the coefficients' progress across the iteration steps
fisher_scoring_results %>%
  select(-converged, -diff) %>%
  gather(coef, value, intercept:usemethod) %>%
  mutate(coef = factor(coef, levels = rownames(pois_kq)[order(-pois_kq)])) %>%
  ggplot(aes(x = k, y = value, color = coef, fill = coef)) +
  geom_path(show.legend = FALSE) +
  geom_point(shape = 21, size = 2, color = "black") +
  scale_x_continuous(breaks = 0:14) +
  scale_color_viridis_d(
    aesthetics = c("color", "fill"),
    begin = .3,
    guide = guide_legend(nrow = 2)
  ) +
  labs(
    title = "Fisher scoring iteration steps",
    subtitle = "Coefficient values at each step",
    caption = "Terms sorted by starting value",
    x = "Schritt (k)", y = latex2exp::TeX("$\\hat{\\beta}_j^{(k)}$"),
    fill = "Term (j)"
  )

c)

Run the Poisson regression with statistical software and compare the result with that from b).

# glm() with IWLS; `- X` drops the row index column from the CSV
modpois <- glm(education ~ . - X, family = poisson(), data = education)

tibble(
  Term = names(coef(modpois)),
  # Row 14 of fisher_scoring_results holds the estimate after step k = 13
  Manuell = as.numeric(fisher_scoring_results[14, 4:14]),
  glm = coef(modpois),
  `|diff|` = format(abs(glm - Manuell), digits = 2, scientific = TRUE)
) %>%
  kable(
    booktabs = TRUE, linesep = "",
    digits = 4
  ) %>%
  kable_styling(position = "center")
| Term | Manuell | glm | \|diff\| |
|------|--------:|----:|---------:|
| (Intercept) | 1.5145 | 1.5145 | 1.6e-08 |
| age | -0.0252 | -0.0252 | 1.2e-11 |
| electric | 0.2495 | 0.2495 | 2.9e-12 |
| tv | 0.2964 | 0.2964 | 9.3e-12 |
| urban | 0.1005 | 0.1005 | 2.7e-11 |
| married | -0.1389 | -0.1389 | 2.4e-11 |
| firsthalf | -0.0934 | -0.0934 | 6.6e-12 |
| radio | 0.2757 | 0.2757 | 6.4e-11 |
| bicycle | -0.0005 | -0.0005 | 1.6e-11 |
| knowmethod | 0.5321 | 0.5321 | 1.6e-08 |
| usemethod | 0.2822 | 0.2822 | 5.8e-11 |

We obtain the same coefficients to at least the 8th decimal place, although glm() needed only 5 iteration steps while we needed 14. But still.
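
The iteration count reported by glm() can be read directly from the fitted object:

# Number of IWLS iterations glm() used (5, as noted above)
modpois$iter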