Power, Errors, and Breakdown of Null Hypothesis Significance Testing

Published

October 14, 2025

Type I vs. Type II errors, why do we care?

A type-I error occurs when one incorrectly rejects the null hypothesis even though it is true. A common analogy is a court ruling a defendant guilty even though they are innocent. The type-I error conditions on the null hypothesis being true, and its probability is usually denoted \(\alpha\); the usual threshold is 0.05. We see \(\alpha\) in pink below.

A type-II error occurs when one incorrectly fails to reject the null hypothesis even though the alternative hypothesis is true. A common analogy is a court failing to convict a guilty person. The type-II error conditions on the alternative hypothesis being true, and its probability is usually denoted \(\beta\); the usual threshold is 0.20. We see \(\beta\) in blue below.

x <- seq(-3, 6, 0.01)

# Null distribution: central t with 100 df
plot(x, dt(x, df = 100), cex = 0.3, xlab = "t", ylab = "density")

# Critical value for a one-sided test at alpha = 0.05
crit <- qt(0.95, df = 100)

# Shade alpha: area under the null distribution beyond the critical value
x_alpha <- seq(crit, 6, 0.01)
polygon(c(x_alpha, rev(x_alpha)),
        c(rep(0, length(x_alpha)), rev(dt(x_alpha, df = 100))),
        col = adjustcolor('pink', alpha = 0.7), border = NA)

# Alternative distribution: noncentral t with ncp = 3
points(x, dt(x, df = 100, ncp = 3), cex = 0.3)

# Shade beta: area under the alternative below the critical value,
# i.e. where we fail to reject even though the alternative is true
x_beta <- seq(-3, crit, 0.01)
polygon(c(x_beta, rev(x_beta)),
        c(rep(0, length(x_beta)), rev(dt(x_beta, df = 100, ncp = 3))),
        col = adjustcolor('blue', alpha = 0.3), border = NA)

What is power?

Looking at the type-II error more in depth: given that the alternative hypothesis is true, we can rewrite \(P(\text{fail to reject } H_0 \mid H_A) = \beta\) as \(P(\text{reject } H_0 \mid H_A) = 1 - \beta\). This is also known as power! In other words, it is the probability of rejecting the null hypothesis given that the alternative hypothesis is true.
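
As a sanity check, we can compute the power of the test pictured above directly, using the same null (central t with 100 df) and alternative (noncentral t with ncp = 3) distributions:

# Power of the one-sided test above: the probability, under the
# alternative (ncp = 3), of falling beyond the null's 0.95 quantile
crit <- qt(0.95, df = 100)
1 - pt(crit, df = 100, ncp = 3)  # roughly 0.9 for these settings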

What factors influence power?

  • Significance level: As the \(\alpha\) threshold decreases (becomes more strict), the critical value moves further into the tail, shrinking the rejection region under the alternative hypothesis and thereby decreasing power. There is a trade-off between type-I error control and power.

  • Treatment effect: Let \(\delta\) denote the difference between the null and alternative hypotheses. Given a fixed critical value \(T\), a small \(\delta\) places far less of the alternative distribution inside the rejection region than a large \(\delta\) would. If we interpret \(\delta\) as a treatment effect, this means small treatment effects are hard to detect.

  • Population variance: The variance of our test statistic is controlled by the population variance and the sample size. If the population variance is large, the null and alternative distributions spread out and overlap more, leaving less of the alternative distribution beyond the critical value. The net effect is to reduce power.

  • Sample size: If we increase our sample size, the variance of both the null and alternative distributions shrinks. The critical value shifts toward 0 and the alternative distribution concentrates around \(\delta\). Thus, a larger sample size means higher power; the sketch after this list varies each of these factors one at a time.
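
To see each factor in action, here is a minimal sketch using base R's power.t.test() for a two-sample t-test. The baseline values (n = 50 per group, effect delta = 0.5, sd = 1) are illustrative assumptions, not numbers from this post.

# Baseline power
power.t.test(n = 50, delta = 0.5, sd = 1, sig.level = 0.05)$power

# Stricter alpha -> lower power
power.t.test(n = 50, delta = 0.5, sd = 1, sig.level = 0.01)$power

# Smaller treatment effect -> lower power
power.t.test(n = 50, delta = 0.25, sd = 1, sig.level = 0.05)$power

# Larger population variance -> lower power
power.t.test(n = 50, delta = 0.5, sd = 2, sig.level = 0.05)$power

# Larger sample size -> higher power
power.t.test(n = 200, delta = 0.5, sd = 1, sig.level = 0.05)$power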

Violations of linear regression assumptions

Linearity Recall that our estimates for \(\beta\) are found by minimizing the least squares criterion \[ \min_{\beta_0, \beta_1} \sum_{i=1}^n (Y_i - [\beta_0 + \beta_1 X_i])^2 \] The assumption built into the least squares solution is that the model is linear. Solving this, we get \[ \vec{\hat{\beta}} = (X^T X)^{-1} X^T Y \] However, if the assumption of linearity is violated, then these estimates will be biased. Because the estimates are biased, our predicted values \(\hat{Y} = HY\), our residuals \(\vec{\epsilon} = (I-H)Y\) (where \(H = X(X^T X)^{-1} X^T\) is the hat matrix), and our variance estimate \(\vec{\epsilon}^T \vec{\epsilon} / (n-p)\) will all be biased as well.
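
As a quick illustration of the closed-form solution, here is a sketch with simulated data (the true coefficients 1 and 2 are arbitrary choices) comparing the normal-equations formula to lm():

# Simulate data that actually is linear
set.seed(42)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

# Design matrix with an intercept column
X <- cbind(1, x)

# Closed-form least squares: (X^T X)^{-1} X^T y
beta_hat <- solve(t(X) %*% X, t(X) %*% y)
beta_hat

# Matches the estimates from lm()
coef(lm(y ~ x))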

Homoskedasticity This assumption is that the variance of the errors is the same for every observation. It implies that \[ Var(Y) = Var(\vec{\epsilon}) = \sigma^2 I \] where \(I\) is the identity matrix. Additionally, the variance of our \(\beta\)’s can be written as

\[ Var(\hat{\beta}) = \sigma^2 (X^T X)^{-1} \] If the error variances are not all equal, the error covariance matrix is instead a diagonal matrix with different entries, like the one below:

\[ \begin{bmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \ddots & 0 & 0 \\ 0 & 0 & \sigma_{n-1}^2 & 0 \\ 0 & 0 & 0 & \sigma_n^2 \end{bmatrix} \]

This means that not only is our estimate of the error variance biased, but the variance of our estimates \(\hat{\beta}\) and their standard errors will be biased as well.

Independence We saw above that \[ Var(\hat{\beta}) = \sigma^2 (X^T X)^{-1} \] However, if there is dependence between the errors, we have to account for it with a covariance matrix \(\Sigma\), and the variance no longer simplifies: \[ Var(\hat{\beta}) = (X^T X)^{-1} X^T \Sigma X (X^T X)^{-1} \] Thus if we violate the assumption of independence, the variance and standard errors of our estimates will be biased.
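
The sandwich formula above applies whenever \(Var(\vec{\epsilon}) = \Sigma \neq \sigma^2 I\), whether the cause is dependence or unequal variances. As an illustrative sketch, the unequal-variance case is the easiest to estimate: plugging \(\hat{\Sigma} = diag(e_i^2)\) (squared residuals, the HC0 estimator) into the formula gives robust standard errors we can compare against the naive ones. All simulation values here are assumptions for illustration.

# Independent but unequal-variance errors: sd grows with x
set.seed(7)
n <- 200
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n, sd = 0.5 * x)

X <- cbind(1, x)
fit <- lm(y ~ x)
e <- resid(fit)

# Naive variance: sigma^2 (X^T X)^{-1}
XtX_inv <- solve(t(X) %*% X)
V_naive <- sum(e^2) / (n - 2) * XtX_inv

# Sandwich variance: (X^T X)^{-1} X^T Sigma X (X^T X)^{-1},
# with Sigma estimated by diag(e^2)
V_sandwich <- XtX_inv %*% t(X) %*% diag(e^2) %*% X %*% XtX_inv

sqrt(diag(V_naive))     # naive standard errors
sqrt(diag(V_sandwich))  # robust standard errors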

Normally distributed The final assumption is that the errors are normally distributed, \(\vec{\epsilon} \sim N(0, \sigma^2 I)\). Normality is what justifies the exact t and F distributions used for tests and confidence intervals on \(\hat{\beta}\) in finite samples; if it fails, those reference distributions are only approximate, although the central limit theorem makes the approximation reasonable for large samples.
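
A quick diagnostic for this assumption is a normal Q-Q plot of the residuals. Here is a sketch with deliberately skewed (centered exponential) errors, so the points bend away from the reference line:

# Fit a model whose errors are right-skewed rather than normal
set.seed(3)
x <- rnorm(100)
y <- 1 + 2 * x + (rexp(100) - 1)
fit <- lm(y ~ x)

# Systematic curvature away from the line signals non-normal residuals
qqnorm(resid(fit))
qqline(resid(fit))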

What happens to our confidence intervals?

What can happen when the model assumptions are violated? Which steps of the usual inference procedure become invalid? What does it mean for a confidence interval to be invalid, and how does that play out step by step?

What happens to our type-I and type-II errors?
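
One concrete way to see a confidence interval become invalid is by simulation: generate data that violate homoskedasticity, construct the usual 95% CI for the slope, and check how often it covers the truth. This is a sketch with assumed parameter values; the empirical coverage typically lands below the nominal 95%, which is an inflated type-I error in disguise.

# Heteroskedastic errors: the textbook 95% CI for the slope under-covers
set.seed(1)
n_sims <- 2000
n <- 100
true_slope <- 2
covered <- logical(n_sims)

for (s in seq_len(n_sims)) {
  x <- runif(n, 0, 10)
  y <- 1 + true_slope * x + rnorm(n, sd = 0.5 * x)
  ci <- confint(lm(y ~ x), "x", level = 0.95)
  covered[s] <- ci[1] <= true_slope && true_slope <= ci[2]
}

mean(covered)  # empirical coverage, typically below 0.95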

Additional Resources