MSE 125 — Applied Statistics
Wednesday, April 22, 2026
in the early 1990s, AIDS was the leading killer of Americans aged 25–44
the first drug worked — until it stopped working
what next?
CD4 = the white blood cells HIV destroys
rising count \Rightarrow immune system recovering
Hammer et al., NEJM 1996
observed difference of means = 33.3 - (-17.1) = \mathbf{50.4} CD4 cells
treatment gains ≈ 33 cells — control loses ≈ 17 — AZT alone is failing these patients
estimand, estimator, estimate
you’ve been playing this game since week 1:
| lec | estimand | estimator |
|---|---|---|
| 4 | pop’n mean \mu | sample mean \bar X |
| 5 | pop’n coefficients \beta | OLS \hat\beta |
| 7 | pop’n accuracy | test-set accuracy |
today — estimand \mu_T - \mu_C, estimator \bar X_T - \bar X_C, estimate = \mathbf{50.4} CD4 cells
new question: how precise is the estimate?
population distribution \mathcal{X}
the distribution of a single observation drawn from the population
ACTG 175 has two populations, two distributions:
the 1607 treatment patients and 532 controls in our trial are samples from \mathcal{X}_T and \mathcal{X}_C
LLN: as n \to \infty, the sample mean and sample SD converge to their population counterparts
\bar X_n \;\longrightarrow\; \mu, \qquad s_n \;\longrightarrow\; \sigma
you proved this in MS&E 120 — now we’ll use it
so plug in the sample to estimate \mathcal{X}’s parameters:
\hat\mu = \bar X_n, \qquad \hat\sigma = s_n
ACTG treatment group: \hat\mu_T = \bar X_T = 33.3 CD4 cells — our estimate of \mu_T
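A quick simulation makes the plug-in idea concrete. This is a minimal sketch that assumes a hypothetical Normal population with \mu = 30 and \sigma = 125. These are illustrative values, not ACTG 175 data, and the helper name `plug_in` is ours. As n grows, \hat\mu and \hat\sigma should settle onto the true values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population X ~ Normal(MU, SIGMA); illustrative values, not ACTG 175 data.
MU, SIGMA = 30.0, 125.0

def plug_in(n):
    """Draw n iid observations from X and return the plug-in estimates (mu_hat, sigma_hat)."""
    x = rng.normal(MU, SIGMA, size=n)
    return x.mean(), x.std(ddof=1)   # sample mean and sample SD s_n

# The estimates wander for small n and settle near (30, 125) as n grows (LLN).
for n in [10, 1_000, 100_000]:
    mu_hat, sigma_hat = plug_in(n)
    print(f"n={n:>7}: mu_hat={mu_hat:8.2f}  sigma_hat={sigma_hat:7.2f}")
```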
population mean
the difference of the two means is often what we care about
under randomization, that difference reads as the drug’s causal effect
red dashed = group mean
the overlap is bigger than the gap
sub-trial 1: mean = 48.7
sub-trial 2: mean = 35.8
sub-trial 3: mean = 31.8
three sub-trials, three different answers — sampling variation in action
sampling distribution
the distribution of values a statistic would take if we could repeat the study many times, each time with a fresh sample from the population
the three sub-trials above are three draws from this distribution
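In a simulation we *can* see the sampling distribution, because we choose the population ourselves. The sketch below assumes hypothetical Normal treatment and control populations with SD 125 and a true difference of 50. These are not the real ACTG distributions, and `one_study` is an illustrative helper. Rerunning a 50-vs-50 trial 5,000 times traces out the spread of \bar X_T - \bar X_C, which theory puts at 125 \cdot (1/50 + 1/50)^{1/2} = 25.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical populations (NOT the ACTG distributions): both Normal with SD 125,
# true difference of means MU_T - MU_C = 50.
MU_T, MU_C, SIGMA = 33.0, -17.0, 125.0

def one_study(n_t, n_c):
    """Simulate one fresh trial and return its estimate mean(T) - mean(C)."""
    x_t = rng.normal(MU_T, SIGMA, size=n_t)
    x_c = rng.normal(MU_C, SIGMA, size=n_c)
    return x_t.mean() - x_c.mean()

# Repeat the study many times; the collection of estimates is the sampling distribution.
estimates = np.array([one_study(50, 50) for _ in range(5_000)])

# Theory for this setup: center 50, SD = 125 * sqrt(1/50 + 1/50) = 25.
print(f"center {estimates.mean():.1f}, spread {estimates.std():.1f}")
```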

the sampling distribution is what we want, but we never see it directly: we have one sample, not many
how do we estimate it?
resample the data you have
our one sample is the best picture we have of the population
treat it as if it were the population
draw new samples from it — with replacement
some patients land in a resample twice or more, others not at all: that’s “with replacement”
each resample is a plausible “alternative trial we might have run”
works for any statistic — that’s the power
np.random.choice(..., replace=True) = one bootstrap resample; list comprehension runs it B times and stacks into an array
B = number of bootstrap replications (here, 10,000) — a separate knob from the dataset size n
boot.mean() = 50.4 ← centered at observed difference
boot.std() = 5.6 ← ≈ standard error of the estimator
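The resampling loop described above might look like the sketch below. It makes a few choices of its own:

* It uses NumPy's `Generator.choice` instead of the legacy `np.random.choice`. The resampling behavior is the same.
* It resamples each arm separately, so n_T and n_C stay fixed. This is one reasonable design choice, not necessarily the slide's.
* It runs on synthetic Normal data, because the ACTG measurements are not reproduced here.

`bootstrap_diff_means` is a hypothetical helper name.

```python
import numpy as np

def bootstrap_diff_means(x_t, x_c, B=10_000, seed=0):
    """Bootstrap distribution of mean(x_t) - mean(x_c).

    Each replicate resamples each arm separately, with replacement,
    keeping the arm sizes n_T and n_C fixed.
    """
    rng = np.random.default_rng(seed)
    x_t, x_c = np.asarray(x_t, dtype=float), np.asarray(x_c, dtype=float)
    return np.array([
        rng.choice(x_t, size=x_t.size, replace=True).mean()
        - rng.choice(x_c, size=x_c.size, replace=True).mean()
        for _ in range(B)   # B bootstrap replications, a separate knob from n
    ])

# Usage on synthetic data (hypothetical, NOT the ACTG 175 measurements):
gen = np.random.default_rng(42)
x_t = gen.normal(33, 125, size=1607)
x_c = gen.normal(-17, 125, size=532)
boot = bootstrap_diff_means(x_t, x_c)
print(f"observed {x_t.mean() - x_c.mean():.1f}, boot mean {boot.mean():.1f}, boot SD {boot.std():.2f}")
```

Resampling within arms mirrors the design, because a replicate trial would again enroll 1607 treatment and 532 control patients.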
10,000 resamples mapping out the shape of plausible values
the bootstrap distribution: our third distribution today, after \mathcal{X} and the sampling distribution of the estimator
the estimate changes every time you draw a new sample
the estimand stays fixed
the bootstrap just gave us 10,000 estimates — mapping out that variation
95% confidence interval
intuition: a range of plausible values for the estimand
formal: the output of a procedure that, across repeated studies, produces intervals containing the estimand 95% of the time
percentile method: the 2.5th to 97.5th percentile of the bootstrap distribution
⚠ not “estimand has a 95% chance of being in [a, b]” — estimand is fixed, CI varies across studies
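Here is a sketch of the percentile method. It assumes the bootstrap draws already sit in a NumPy array, and `percentile_ci` is a hypothetical helper. The demo bootstraps the mean of synthetic data (not ACTG) only to have something to feed it.

```python
import numpy as np

def percentile_ci(boot, level=0.95):
    """Percentile-method CI: the central `level` share of the bootstrap draws."""
    tail = 100 * (1 - level) / 2            # 2.5 for a 95% interval
    lo, hi = np.percentile(boot, [tail, 100 - tail])
    return lo, hi

# Demo on synthetic data (hypothetical, not ACTG 175): bootstrap the mean of 500 draws.
rng = np.random.default_rng(2)
x = rng.normal(50, 125, size=500)
boot = np.array([rng.choice(x, size=x.size, replace=True).mean() for _ in range(10_000)])

lo, hi = percentile_ci(boot)
print(f"95% CI: [{lo:.1f}, {hi:.1f}]   contains zero: {lo <= 0 <= hi}")
```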
Q: does the 95% CI include zero?
what would it mean for the drug if the CI did include zero?
95% CI: [39.6, 61.3]
entirely above zero: a zero effect is not among the plausible values, so the drug really does raise CD4 counts
what the bootstrap captures: sampling uncertainty, i.e. which patients happened to show up
what it doesn’t capture: systematic shifts such as seasonality, a new competitor, or a marketing campaign mid-trial
before trusting a CI for a decision, ask: is the uncertainty that matters the kind the bootstrap captures?
the bootstrap distribution looks normal
red curve = Normal density with the bootstrap mean (50.4) and SD (5.6): nearly a perfect fit
central limit theorem (CLT): if you average many independent draws from a population distribution \mathcal{X}, the average is approximately normal once the sample size is large enough
the sample mean is bell-shaped even if \mathcal{X} isn’t
the bootstrap distribution approximates the sampling distribution of a difference of sample means, so it is bell-shaped too
if X_1, \ldots, X_n \overset{\text{iid}}{\sim} \mathcal{X} with finite mean \mu and finite variance \sigma^2, then for large n:
\bar{X}_n \;\sim\; \text{Normal}\!\left(\mu, \frac{\sigma}{n^{1/2}}\right) \quad \text{(approximately, for large $n$)}
\sim = “distributed as”, and Normal(mean, SD) takes the SD (not the variance) as its second argument; the parenthetical keeps us honest that the match is asymptotic
\mathcal{X} does not need to be normal: any population distribution with finite variance works
iid plausible for ACTG? patients are different people (independence); sampled from the same \mathcal{X} (identical distribution)
LLN vs CLT: LLN says \bar X_n converges to \mu; CLT says how fast (n^{-1/2}) and in what shape (normal)
the CLT is the upgrade from this morning’s LLN plug-in
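To watch the CLT at work, here is a sketch with a deliberately non-normal population. It assumes a hypothetical Exponential(1) population, which has \sigma = 1 and skewness 2. The SD of \bar X_n should track 1/n^{1/2}. For this population the skewness of \bar X_n is exactly 2/n^{1/2}, so it should fade toward the normal's 0. `sample_means` and `skewness` are illustrative helpers.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical, deliberately skewed population: Exponential with mean 1 (sigma = 1, skewness = 2).
def sample_means(n, reps=10_000):
    """Approximate the sampling distribution of X̄_n with `reps` independent sample means."""
    return rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

def skewness(v):
    """Sample skewness: 0 for a symmetric (e.g. normal) distribution."""
    z = (v - v.mean()) / v.std()
    return (z ** 3).mean()

for n in [1, 5, 50, 500]:
    means = sample_means(n)
    # CLT: SD of X̄_n ≈ sigma / n^{1/2}; for this population the skewness of X̄_n is 2 / n^{1/2}.
    print(f"n={n:>3}: SD={means.std():.3f} (CLT: {1 / np.sqrt(n):.3f})   skew={skewness(means):.2f}")
```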
earlier: three sub-trials of 50 patients gave estimates 49, 36, 32
at 500 patients per sub-trial — 10× larger — how much would the three estimates vary?
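notation in one place: n = size of the full dataset, m = size of each sample drawn from it, B = number of bootstrap resamples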
draw samples of size m from CD4 data
four panels: population, m=10, m=50, m=500
CLT \Rightarrow \text{SE}(\bar X) = \dfrac{\sigma}{m^{1/2}}
in practice: we don’t know \sigma — substitute the sample SD s
\widehat{\text{SE}}(\bar X) = \dfrac{s}{m^{1/2}}
| m | SE from formula | SE from simulation |
|---|---|---|
| 10 | 39.6 | 39.6 |
| 50 | 17.7 | 17.8 |
| 500 | 5.6 | 5.6 |
formula and simulation agree — CLT isn’t just a theorem, it’s a tool
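The formula-vs-simulation comparison can be reproduced in spirit with a few lines. The sketch assumes a synthetic right-skewed stand-in for the CD4 data: a hypothetical Gamma sample with SD near 125. It draws 10,000 samples of each size m with replacement and compares the spread of their means to s / m^{1/2}. The printed numbers will not match the ACTG table exactly.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical stand-in for the CD4 data (NOT the ACTG values): right-skewed Gamma draws, SD near 125.
data = rng.gamma(shape=2.0, scale=88.0, size=2_139)
s = data.std(ddof=1)                         # sample SD, substituted for the unknown sigma

for m in [10, 50, 500]:
    # Simulation: 10,000 samples of size m (with replacement) and the spread of their means.
    means = rng.choice(data, size=(10_000, m), replace=True).mean(axis=1)
    print(f"m={m:>3}: formula s/m^(1/2) = {s / np.sqrt(m):5.1f}   simulation = {means.std():5.1f}")
```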
if the bootstrap distribution is normal, we don’t need 10,000 resamples
\hat{\theta} \pm 1.96 \cdot \widehat{\text{SE}}
for one mean: \widehat{\text{SE}} = s / n^{1/2}
for a difference: variances add (groups are independent, thanks to randomization)
\widehat{\text{SE}} = \left(\frac{s_T^2}{n_T} + \frac{s_C^2}{n_C}\right)^{1/2}
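The two formulas above combine into a closed-form interval. This is a minimal sketch of the normal-formula CI for a difference of means, \hat\theta \pm 1.96 \cdot \widehat{\text{SE}}. `normal_ci_diff` is a hypothetical helper name, and the demo arms are synthetic (not ACTG 175 data), using the trial's arm sizes.

```python
import numpy as np

def normal_ci_diff(x_t, x_c, z=1.96):
    """Normal-approximation CI for mean(x_t) - mean(x_c): estimate ± z * SE_hat."""
    x_t = np.asarray(x_t, dtype=float)
    x_c = np.asarray(x_c, dtype=float)
    est = x_t.mean() - x_c.mean()
    # Independent groups: the variances of the two sample means add.
    se = np.sqrt(x_t.var(ddof=1) / x_t.size + x_c.var(ddof=1) / x_c.size)
    return est, se, (est - z * se, est + z * se)

# Demo on synthetic arms (hypothetical, not ACTG 175 data) with the trial's arm sizes.
rng = np.random.default_rng(7)
x_t = rng.normal(33, 125, size=1607)
x_c = rng.normal(-17, 125, size=532)
est, se, (lo, hi) = normal_ci_diff(x_t, x_c)
print(f"estimate {est:.1f}, SE {se:.2f}, 95% CI [{lo:.1f}, {hi:.1f}]")
```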
| approach | \widehat{\text{SE}} | 95% CI |
|---|---|---|
| bootstrap, 10,000 resamples | 5.6 | [39.6, 61.3] |
| normal formula | 5.6 | [39.6, 61.2] |
both rows estimate the true SE, hence the hat: \widehat{\text{SE}}
they agree — so why do we teach both?
advantages:
the real reason to teach the formula: the questions it answers that the bootstrap can’t
before ACTG 175 enrolled a patient, NIH had to answer: how many patients?
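z_{\alpha/2} = 1.96 (two-sided 5% critical value), z_\beta = 0.84 (80th percentile of N(0,1), i.e. 80% power); \sigma = 150 is the assumed SD of the outcome, \Delta = 50 the effect the trial should detect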
n \;=\; \frac{2\sigma^2 \,(z_{\alpha/2} + z_\beta)^2}{\Delta^2} \;=\; \frac{2 \cdot 150^2 \cdot (1.96 + 0.84)^2}{50^2} \;\approx\; 141.1 \;\Rightarrow\; 142 \text{ per arm (round up)}
formula assumes equal group sizes — trial design choice
bootstrap can’t do this — no data yet to resample
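The planning formula is easy to wrap in a function. The sketch assumes a two-sided test and equal arms, and it takes the z values from the standard library's `statistics.NormalDist`. `n_per_arm` is a hypothetical name. The exact z values (1.95996 and 0.84162), rather than the rounded 1.96 and 0.84, give about 141.3 patients per arm. Sample sizes are conventionally rounded up.

```python
import math
from statistics import NormalDist

def n_per_arm(sigma, delta, alpha=0.05, power=0.80):
    """Per-arm n for a two-sided, two-sample comparison of means with equal arm sizes."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
    z_beta = NormalDist().inv_cdf(power)            # about 0.84
    n = 2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2
    return math.ceil(n)                             # round up to whole patients

print(n_per_arm(sigma=150, delta=50))   # 142  (raw value about 141.3)
```

Because \Delta enters squared in the denominator, halving the target effect roughly quadruples the required n.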
for each statistic, predict: does the normal formula work, work only marginally, or fail?
then classify:
our CLT result (and s / m^{1/2}) is about means, not medians
the median’s bootstrap distribution is lumpier and wider, with no simple closed-form SE
m=20 with right-skewed prices: the bootstrap distribution itself is skewed, so a symmetric normal CI would mislead
m=500: CLT has kicked in
when the bootstrap distribution looks wrong, the CI is a warning — not an answer
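To try the exercise yourself, the sketch below bootstraps the mean and the median of hypothetical right-skewed "prices". These are lognormal draws, not real data. For each statistic it compares the percentile CI with a normal CI built from the bootstrap SE. A large skew value means the two intervals should disagree. `bootstrap` and `skewness` are illustrative helpers.

```python
import numpy as np

rng = np.random.default_rng(5)

def bootstrap(x, stat, B=10_000):
    """Bootstrap distribution of stat(x): B resamples of x, drawn with replacement."""
    idx = rng.integers(0, x.size, size=(B, x.size))   # row b = indices of resample b
    return stat(x[idx], axis=1)

def skewness(v):
    """Sample skewness: near 0 when the bootstrap distribution is symmetric."""
    z = (v - v.mean()) / v.std()
    return (z ** 3).mean()

# Hypothetical right-skewed "prices" (lognormal draws), not real data.
for m in [20, 500]:
    prices = rng.lognormal(mean=3.0, sigma=1.0, size=m)
    for name, stat in [("mean", np.mean), ("median", np.median)]:
        boot = bootstrap(prices, stat)
        pct = np.percentile(boot, [2.5, 97.5])                        # percentile CI
        normal = stat(prices) + np.array([-1.96, 1.96]) * boot.std()  # normal CI from bootstrap SE
        print(f"m={m:>3} {name:>6}: skew={skewness(boot):5.2f}  "
              f"percentile={pct.round(1)}  normal={normal.round(1)}")
```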
the same tools scale beyond clinical trials — here’s one you’ll face in industry
A/B test: half your users see the old checkout, half see the new
the results are in:
ship it? defend:
one dataset, one statistic, quantified uncertainty
