Project

Table of contents

  1. Overview
  2. Deliverables and deadlines
  3. Team formation
  4. Required TA meeting (before May 1)
  5. Proposal (due Fri May 1)
  6. Midterm report (due Fri May 15)
  7. Final report and peer evaluations (due Mon Jun 8, 9:00 AM)
  8. In-person presentation (May 27 – Jun 12)
  9. Grading
  10. Report outline
  11. FAQs

Overview

The MS&E 125 project provides hands-on experience with the data science pipeline: asking research questions, identifying datasets, cleaning and analyzing data, and communicating findings. Teams of 2–4 students prepare a written report and give an in-person presentation.

Any data-driven investigation is fair game. Past teams have studied athletic performance, gender inequality, restaurant quality, music success, gentrification, and standardized testing.

Deliverables and deadlines

| Deliverable | Due | Format |
| --- | --- | --- |
| Group formation | Fri Apr 17 | Project Group Form |
| TA meeting | Before May 1 | 15-min slot with course staff |
| Proposal | Fri May 1 | Project Proposal Form |
| Midterm report | Fri May 15 | 4-page PDF |
| Final report + peer eval | Mon Jun 8, 9:00 AM | 8-page PDF + individual peer form |
| In-person presentation | May 27 – Jun 12 (scheduled) | ~20 min per team |

Team formation

Project teams consist of 2, 3, or 4 students. Submit your team via the Project Group Form by Friday, April 17. If you need help finding teammates, post on Ed.

We expect effort to scale with team size: a 4-person team should produce roughly twice the depth or breadth of a 2-person team. Peer evaluations factor into individual grades.

Required TA meeting (before May 1)

Every team must meet with a member of the course staff for 15 minutes before the proposal deadline. At least two team members must attend. Sign up for a slot on the scheduling sheet.

Proposal (due Fri May 1)

After your TA meeting, submit your proposal via the Project Proposal Form. Your proposal should describe your research question, planned datasets, and intended methods. The course staff will provide detailed comments.

Midterm report (due Fri May 15)

A 4-page submission. The bulk should be extensive exploratory data analysis of your dataset(s). Include at least one analysis using methods taught before the due date, plus plans for additional analyses using methods covered later. Address all comments from your proposal.

Final report and peer evaluations (due Mon Jun 8, 9:00 AM)

An 8-page submission (references and plots included in the page count). Aim to employ three or more statistical methods from the course. Address all comments from your midterm report.

Each team member submits an individual peer evaluation. Peer evaluations will be considered in grading.

In-person presentation (May 27 – Jun 12)

Each team gives a ~20-minute presentation to the course staff, scheduled during the final two weeks of the quarter (May 27 – Jun 12). Sign-up slots will be posted in week 8. All team members must attend.

The presentation should cover your research question, key findings, and one methodological choice you found interesting or surprising. Slides are optional — walking through your report is fine.

Grading

The project is 30% of the course grade. Components:

| Component | What we look for |
| --- | --- |
| Proposal | Clear question, feasible plan, suitable data |
| Midterm report | Thorough EDA, at least one analysis, responsiveness to feedback |
| Final report | Multiple methods applied thoughtfully, clear writing, addressed feedback |
| Presentation | Can explain and defend your work in person |
| Peer evaluation | Honest assessment of team contributions |

The project is intentionally open-ended. As long as your team has spent sufficient time collecting, cleaning, exploring, and analyzing data — and has addressed feedback on your proposal and midterm report — you should receive high marks. If you have concerns about your project’s direction, visit office hours.

Report outline

This outline works for many projects. Yours may differ.

Introduction and motivation — Research questions, why they matter, hypothesis, brief summary of results.

Relevant work — Who else has studied this question? How does your project relate to or extend existing work?

Data and methods — Data sources, cleaning steps, methods used, and how each method addresses your research questions. For most teams, the core will be EDA followed by linear and/or logistic regression.

Results and discussion — Findings and interpretation. This is typically the longest section. Aim for 4–6 well-designed figures (not more than 8). Only include plots that advance your narrative.

Conclusion — To what extent did you answer your research question? What would you do with more time?

FAQs

Can we collect our own data? Yes. If using a survey, share a draft with the course staff before publishing. Effective survey design takes longer than expected — plan ahead.

Can reports exceed 8 pages? You may include an appendix of supplementary results, but the teaching team may not review material beyond 8 pages. Include only results that serve your narrative.

Can reports be double-spaced? Use whatever spacing tells your story effectively. Well-designed figures with informative captions are often more valuable than additional text.

What if my project is not novel? Repeating and extending an existing analysis is a strong approach. Use a fresher dataset, add a new method, or apply the same techniques to a different domain.

How can I make my project more interesting? Frame it as a decision problem: what actions could a business, government agency, or NGO take based on your findings? Which actions have the best return on investment?

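What if my data was sampled in a biased way? Consider reweighting. Identify demographic or contextual variables whose distribution may differ between your sample and the target population, then weight (or resample) observations so that the sample's distribution on those variables matches the population's.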
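
One common form of reweighting is post-stratification: each observation is weighted by the ratio of its group's population share to its group's sample share. The sketch below is illustrative only. The function names, the group labels, the outcome values, and the 50/50 population shares are all made-up assumptions, and a real project would take population shares from a trusted external source such as a census table.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per sampled unit so that the weighted group
    shares match the (assumed known) population shares."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    weights = []
    for group in sample_groups:
        sample_share = counts[group] / n
        # Over-represented groups get weights below 1, under-represented above 1.
        weights.append(population_shares[group] / sample_share)
    return weights


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical survey: 8 undergraduates and 2 graduate students responded,
# but the target population is assumed to be split 50/50.
groups = ["undergrad"] * 8 + ["grad"] * 2
hours = [6, 7, 5, 6, 8, 6, 7, 5, 10, 12]  # made-up outcome values
population_shares = {"undergrad": 0.5, "grad": 0.5}

weights = poststratification_weights(groups, population_shares)
raw_mean = sum(hours) / len(hours)
adjusted_mean = weighted_mean(hours, weights)

print(f"raw mean: {raw_mean:.3f}")
print(f"reweighted mean: {adjusted_mean:.3f}")
```

In this toy example the reweighted mean is higher than the raw mean because graduate students, who report more hours, are under-represented in the sample. Reweighting corrects only for the variables you adjust on; it cannot fix bias from variables you did not measure.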