Project
Table of contents
- Overview
- Deliverables and deadlines
- Team formation
- Required TA meeting (before May 1)
- Proposal (due Fri May 1)
- Midterm report (due Fri May 15)
- Final report and peer evaluations (due Mon Jun 8, 9:00 AM)
- In-person presentation (May 27 – Jun 12)
- Grading
- Report outline
- FAQs
Overview
The MS&E 125 project provides hands-on experience with the data science pipeline: asking research questions, identifying datasets, cleaning and analyzing data, and communicating findings. Teams of 2–4 students prepare a written report and give an in-person presentation.
Any data-driven investigation is fair game. Past teams have studied athletic performance, gender inequality, restaurant quality, music success, gentrification, and standardized testing.
Deliverables and deadlines
| Deliverable | Due | Format |
|---|---|---|
| Group formation | Fri Apr 17 | Project Group Form |
| TA meeting | Before May 1 | 15-min slot with course staff |
| Proposal | Fri May 1 | Project Proposal Form |
| Midterm report | Fri May 15 | 4-page PDF |
| Final report + peer eval | Mon Jun 8, 9:00 AM | 8-page PDF + individual peer form |
| In-person presentation | May 27 – Jun 12 (scheduled) | ~20 min per team |
Team formation
Project teams consist of 2, 3, or 4 students. Submit your team via the Project Group Form by Friday, April 17. If you need help finding teammates, post on Ed.
We expect effort to scale with team size: a 4-person team should produce roughly twice the depth or breadth of a 2-person team. Peer evaluations factor into individual grades.
Required TA meeting (before May 1)
Every team must meet with a member of the course staff for 15 minutes before the proposal deadline. At least two team members must attend. Sign up for a slot on the scheduling sheet.
Proposal (due Fri May 1)
After your TA meeting, submit your proposal via the Project Proposal Form. Your proposal should describe your research question, planned datasets, and intended methods. The course staff will provide detailed comments on it.
Midterm report (due Fri May 15)
A 4-page submission. The bulk should be extensive exploratory data analysis of your dataset(s). Include at least one analysis using methods taught before the due date, plus plans for additional analyses using methods covered later. Address all comments from your proposal.
Final report and peer evaluations (due Mon Jun 8, 9:00 AM)
An 8-page submission (references and plots included in the page count). Aim to employ three or more statistical methods from the course. Address all comments from your midterm report.
Each team member submits an individual peer evaluation. Peer evaluations will be considered in grading.
In-person presentation (May 27 – Jun 12)
Each team gives a ~20-minute presentation to the course staff, scheduled during the final two weeks of the quarter (May 27 – Jun 12). Sign-up slots will be posted in week 8. All team members must attend.
The presentation should cover your research question, key findings, and one methodological choice you found interesting or surprising. Slides are optional — walking through your report is fine.
Grading
The project is 30% of the course grade. Components:
| Component | What we look for |
|---|---|
| Proposal | Clear question, feasible plan, suitable data |
| Midterm report | Thorough EDA, at least one analysis, responsiveness to feedback |
| Final report | Multiple methods applied thoughtfully, clear writing, addressed feedback |
| Presentation | Can explain and defend your work in person |
| Peer evaluation | Honest assessment of team contributions |
The project is intentionally open-ended. As long as your team has spent sufficient time collecting, cleaning, exploring, and analyzing data — and has addressed feedback on your proposal and midterm report — you should receive high marks. If you have concerns about your project’s direction, visit office hours.
Report outline
This outline works for many projects. Yours may differ.
Introduction and motivation — Research questions, why they matter, hypothesis, brief summary of results.
Relevant work — Who else has studied this question? How does your project relate to or extend existing work?
Data and methods — Data sources, cleaning steps, methods used, and how each method addresses your research questions. For most teams, the core will be EDA followed by linear and/or logistic regression.
Results and discussion — Findings and interpretation. This is typically the longest section. Aim for 4–6 well-designed figures (not more than 8). Only include plots that advance your narrative.
Conclusion — To what extent did you answer your research question? What would you do with more time?
FAQs
Can we collect our own data? Yes. If using a survey, share a draft with the course staff before publishing. Effective survey design takes longer than expected — plan ahead.
Can reports exceed 8 pages? You may include an appendix of supplementary results, but the teaching team may not review material beyond 8 pages. Include only results that serve your narrative.
Can reports be double-spaced? Use whatever spacing tells your story effectively. Well-designed figures with informative captions are often more valuable than additional text.
What if my project is not novel? Repeating and extending an existing analysis is a strong approach. Use a fresher dataset, add a new method, or apply the same techniques to a different domain.
How can I make my project more interesting? Frame it as a decision problem: what actions could a business, government agency, or NGO take based on your findings? Which actions have the best return on investment?
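What if my data was sampled in a biased way? Consider reweighting. Identify demographic or contextual variables whose distribution may differ between your sample and the target population, then weight (or resample) your observations so that each group's share in the sample matches its share in the population. This requires a trustworthy external source, such as census figures, for the population shares.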
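As a rough illustration of reweighting, the sketch below applies post-stratification to a single grouping variable: each respondent is weighted by the ratio of their group's population share to its sample share. Everything in it is made up for illustration, including the function names, the age groups, the survey values, and the population shares. A real project would take the shares from an external source and would likely need to adjust for several variables at once.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per sampled unit: population share / sample share of its group."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    missing = set(counts) - set(population_shares)
    if missing:
        raise ValueError(f"no population share for groups: {sorted(missing)}")
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_mean(values, weights):
    """Weighted average of values, normalized by the total weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical survey: four respondents under 30, one respondent 30 or older.
age_group = ["under_30", "under_30", "under_30", "under_30", "30_plus"]
hours_online = [6.0, 5.0, 7.0, 6.0, 2.0]

# Assumed population shares (invented numbers, not real census data).
population_shares = {"under_30": 0.4, "30_plus": 0.6}

weights = poststratification_weights(age_group, population_shares)
raw_mean = sum(hours_online) / len(hours_online)
adjusted_mean = weighted_mean(hours_online, weights)

print(f"raw mean:      {raw_mean:.2f}")       # prints 5.20
print(f"adjusted mean: {adjusted_mean:.2f}")  # prints 3.60
```

In this toy sample the under-30 group is overrepresented (80% of respondents versus an assumed 40% of the population), so its responses are down-weighted and the adjusted mean moves toward the older respondent's value. Note that a group with very few respondents receives a large weight, which makes the adjusted estimate noisy; that limitation is worth discussing in your report.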