Final Exam — What to Expect
MSE 125 — Spring 2026
The final exam is cumulative over Lec 1–16 (the causal inference week and Ch 20 fairness are enrichment material and are excluded). It is closed book, no devices, no AI, 90 minutes, printed in black and white, single version.
The 8 unit quizzes and their practice quizzes are your primary study resource — the final reuses the question types they introduced. Two new archetypes show up only on the final; this handout introduces them so they’re not a surprise on exam day.
Structure
| Section | Points | Time | What it tests |
|---|---|---|---|
| 1. Tool literacy | 25 | ~22 min | 8 MC + 5 fill-in. Quick decisions: which test, which model, which CV protocol, which plot. A formula strip at the top of the section gives Bonferroni, expected FP, recall/precision, \(R^2\), residual. |
| 2. Interpretation & EDA | 35 | ~33 min | 3 problems with figures: regression-table interpretation, EDA plot critique, classification + threshold reasoning. |
| 3. Diagnose & supervise | 40 | ~35 min | 3 longer problems. Starts with the AI code review (new archetype, 15 pts), then diagnose-the-phenomenon (new archetype, 12 pts), then unsupervised interpretation (12 pts). |
No calculator. All arithmetic is doable on paper. Black-and-white printing — every figure distinguishes lines by linestyle, marker, and label, never by color.
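For reference while studying, the quantities named on the formula strip have the following standard textbook forms. This is a sketch under assumptions: the notation (\(m\) tests at level \(\alpha\), confusion-matrix counts TP/FP/FN, fitted values \(\hat{y}_i\)) is not taken from the exam, and the strip itself may be worded differently.

\[
\alpha_{\text{Bonferroni}} = \frac{\alpha}{m}, \qquad
\mathbb{E}[\text{FP}] = m\,\alpha \ \text{(if all } m \text{ nulls are true)}
\]
\[
\text{recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}, \qquad
\text{precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}
\]
\[
e_i = y_i - \hat{y}_i, \qquad
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
\]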
Sample item 1 — AI code review
You asked an AI agent to predict customer churn from a labelled dataset. It returned this code:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2)
model = LogisticRegression().fit(X_tr, y_tr)
print("Accuracy:", accuracy_score(y_tr, model.predict(X_tr)))
(a) Name the bug in one short phrase.
(b) What single sanity check would have caught it?
(a) The reported accuracy is training accuracy, not test accuracy. Line: accuracy_score(y_tr, model.predict(X_tr)) — both arguments come from the training partition. The model is being evaluated on data it was fit on.
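(b) Any of: compute accuracy on X_te, y_te and compare it with the training figure; treat a training accuracy well above the held-out accuracy as a warning sign (training accuracy is usually the higher of the two); use cross-validation to estimate generalization.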
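The checks in (b) can be sketched as follows. This is a minimal illustration, not the graded answer: the handout does not supply X and y, so a synthetic dataset from make_classification stands in for the churn data, and the random_state and max_iter values are assumptions added only to make the run reproducible.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Hypothetical stand-in for the churn data (not from the handout).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Training accuracy: the number the buggy script printed.
train_acc = accuracy_score(y_tr, model.predict(X_tr))

# Test accuracy: evaluated on the held-out partition the model never saw.
test_acc = accuracy_score(y_te, model.predict(X_te))

# 5-fold cross-validated accuracy on the training partition as a second estimate.
cv_acc = cross_val_score(
    LogisticRegression(max_iter=1000), X_tr, y_tr, cv=5
).mean()

print(f"train={train_acc:.3f}  test={test_acc:.3f}  cv={cv_acc:.3f}")
```

Printing the three numbers side by side makes the comparison explicit: if only the first one is reported, there is no way to tell whether the model generalizes.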
Sample item 2 — Diagnose the phenomenon
“You ran a two-sample \(t\)-test comparing means between two groups and got \(p = 0.001\), but you only had \(n = 4\) observations per group.” Name two plausible causes of this \(p\)-value, and how you’d check each.
Any two of (each with a check):
- The \(t\)-test’s normality assumption cannot be verified at \(n=4\), and the sample is too small for the central limit theorem to compensate. Check: run a permutation test on the difference in means and compare the \(p\)-value.
- An outlier is driving the result. Check: plot the raw data; remove the most extreme observation and re-run the test.
- The standard error estimate is unreliable at \(n=4\). Check: bootstrap the difference in means and look at the bootstrap CI’s width vs. the observed difference.
- The effect is real and very large. Check: report the effect size (Cohen’s \(d\)) in addition to the \(p\)-value; if it’s enormous, the small \(n\) is consistent with a real, large effect that doesn’t need many observations to be detected.
The exam awards credit for any two defensible causes with sensible checks. The exact list above is one set of right answers.
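Three of the checks listed above (permutation test, bootstrap CI, Cohen’s \(d\)) can be sketched in a few lines of NumPy. The two groups below are hypothetical numbers chosen for illustration, not exam data, and at \(n = 4\) the bootstrap interval is itself only a rough guide. One thing the sketch makes visible: with 4 observations per group there are only \(\binom{8}{4} = 70\) relabelings, so an exact two-sided permutation test cannot return a \(p\)-value below \(2/70 \approx 0.029\).

```python
from itertools import combinations

import numpy as np

# Hypothetical data with n = 4 per group (not from the handout).
a = np.array([5.1, 5.3, 5.2, 5.4])
b = np.array([6.0, 6.2, 6.1, 6.3])
pooled = np.concatenate([a, b])
observed = a.mean() - b.mean()

# Exact permutation test: enumerate all C(8, 4) = 70 ways to relabel the groups.
diffs = []
for idx in combinations(range(len(pooled)), len(a)):
    mask = np.zeros(len(pooled), dtype=bool)
    mask[list(idx)] = True
    diffs.append(pooled[mask].mean() - pooled[~mask].mean())
diffs = np.array(diffs)

# Two-sided p-value; the small tolerance guards against float round-off.
# For these fully separated groups this equals 2/70.
p_perm = np.mean(np.abs(diffs) >= abs(observed) - 1e-12)

# Bootstrap percentile CI for the difference in means (resample within group).
rng = np.random.default_rng(0)
boot = np.array([
    rng.choice(a, size=len(a)).mean() - rng.choice(b, size=len(b)).mean()
    for _ in range(5000)
])
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

# Cohen's d using the pooled standard deviation (equal group sizes).
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
cohens_d = observed / pooled_sd

print(f"permutation p = {p_perm:.4f}")
print(f"bootstrap 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print(f"Cohen's d = {cohens_d:.2f}")
```

Comparing the permutation \(p\)-value with the \(t\)-test’s \(p = 0.001\) shows how much of the reported significance rests on the normality assumption rather than on the data alone.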
Recommended preparation
- Re-do the practice quizzes 1–8 under the real quiz time constraint (10 min each). These remain your primary study resource.
- Re-read your wrong answers on quizzes 1–8 specifically. The final reuses types of question, not specific scenarios.
- Take the parallel-form practice final under simulated exam conditions: 90-minute timer, no devices, no notes other than the formula strip in Section 1. The practice exam mirrors the real final’s structure exactly with fresh datasets throughout.
- Practice with the two sample items above. They’re representative of the two new archetypes (AI code review, diagnose-the-phenomenon).
- Skim the “for the quiz” callouts in each chapter — those are direct hints to what’s testable.