
Project

Table of contents

  1. Overview
  2. Deliverables and deadlines
  3. Team formation
    1. Sharing the work fairly
  4. Required TA meeting (before May 1)
  5. Sample Project Ideas
  6. Proposal (due Fri May 1)
  7. Midterm report (due Fri May 15)
  8. Final report and peer evaluations (due Mon Jun 8, 11:59 PM)
  9. In-person presentation (May 26 – Jun 10)
  10. Grading
  11. Report outline
  12. FAQs

Overview

The MS&E 125 project provides hands-on experience with the data science pipeline: asking research questions, identifying datasets, cleaning and analyzing data, and communicating findings. Teams of 2–4 students prepare a written report and give an in-person presentation.

Any data-driven investigation is fair game. Past teams have studied athletic performance, gender inequality, restaurant quality, music success, gentrification, and standardized testing.

Deliverables and deadlines

| Deliverable | Due | Format |
| --- | --- | --- |
| Group formation | Fri Apr 17 | Project Group Form |
| TA meeting | Before May 1 | 15-min slot with course staff |
| Proposal | Fri May 1 | Project Proposal Form |
| Midterm report | Fri May 15 | 4-page PDF |
| Final report + peer eval | Mon Jun 8, 11:59 PM | 8-page PDF + individual peer form |
| In-person presentation | May 26 – Jun 10 (scheduled) | ~20 min per team |

Team formation

Project teams consist of 2, 3, or 4 students. Submit your team via the Project Group Form by Friday, April 17. If you need help finding teammates, post on Ed.

We expect effort to scale with team size: a 4-person team should produce roughly twice the depth or breadth of a 2-person team.

Sharing the work fairly

Individual project grades may differ materially from the team grade — upward for students who carry the team, and downward for students whose contributions are limited. We assess individual contributions using several signals: peer evaluations submitted with the final report, division-of-labor plans submitted to course staff (see below), the content of TA meetings, the in-person presentation, and — where applicable — version-control history.

Division-of-labor plans. If you are concerned that your team is not sharing the work fairly, you may send a proposed division of labor to course staff at any time, with all teammates on cc. The plan should list each team member by name and the specific tasks they are responsible for, with deadlines. Teammates have 72 hours to reply — confirming assent or proposing substantive revisions. A non-response is treated as concurrence. We consult the final plan in assigning individual grades.

Dividing labor effectively. We recommend dividing work by analysis, not by stage. Plans that assign data processing, modeling, and writing to different people tend to fail because each task blocks the next. Identify three or four distinct research questions and assign each team member full ownership of one — from data preparation through EDA, modeling, and writing the corresponding section. Work on integrative sections (introduction, conclusion) synchronously and in person.

Where to raise concerns. If you have concerns that cannot be addressed by submitting a division-of-labor plan, come to office hours or contact your project TA directly.

Required TA meeting (before May 1)

Every team must meet with a member of the course staff for 15 minutes before the proposal deadline. At least two team members must attend. Sign up for a slot on the scheduling sheet.

Sample Project Ideas

We have written up a few sample project ideas in a proposal-style format similar to what you will submit. They are meant to illustrate appropriate scope, structure, and level of detail. You are welcome to use them as a starting point and adapt the structure for your own proposal: https://docs.google.com/document/d/1DjuEORRuyBPhwOmkIjO7m6r5swlpjt0HRnsxSRQCbdg/edit?tab=t.0

Proposal (due Fri May 1)

After your TA meeting, submit your proposal via the Project Proposal Form. Your proposal should describe your research question, planned datasets, and intended methods. The course staff will provide detailed comments.

Midterm report (due Fri May 15)

A 4-page submission. The bulk should be extensive exploratory data analysis of your dataset(s). Include at least one analysis using methods taught before the due date, plus plans for additional analyses using methods covered later. Address all comments from your proposal.

Final report and peer evaluations (due Mon Jun 8, 11:59 PM)

An 8-page submission (references and plots included in the page count). Aim to employ three or more statistical methods from the course. Address all comments from your midterm report.

Each team member submits an individual peer evaluation. The form asks you to estimate each teammate’s percentage contribution to the project, describe their specific contributions, and flag any concerns. Peer evaluations are confidential and factor materially into individual project grades.

In-person presentation (May 26 – Jun 10)

Each team gives a ~20-minute presentation to the course staff, scheduled during the final two weeks of the quarter (May 26 – Jun 10). Sign-up slots will be posted in week 8. All team members must attend.

The presentation should cover your research question, key findings, and one methodological choice you found interesting or surprising. Visual aids are optional: you may prepare slides or a poster, or simply walk through your report.

Sign up for a slot on the scheduling sheet.

Grading

The project is 30% of the course grade. Components:

| Component | What we look for |
| --- | --- |
| Proposal | Clear question, feasible plan, suitable data |
| Midterm report | Thorough EDA, at least one analysis, responsiveness to feedback |
| Final report | Multiple methods applied thoughtfully, clear writing, addressed feedback |
| Presentation | Can explain and defend your work in person |
| Peer evaluation | Honest assessment of team contributions; used to adjust individual grades up or down from the team grade |

The project is intentionally open-ended. As long as your team has spent sufficient time collecting, cleaning, exploring, and analyzing data — and has addressed feedback on your proposal and midterm report — you should receive high marks. If you have concerns about your project’s direction, visit office hours.

Report outline

This outline works for many projects. Yours may differ.

Introduction and motivation — Research questions, why they matter, hypothesis, brief summary of results.

Relevant work — Who else has studied this question? How does your project relate to or extend existing work?

Data and methods — Data sources, cleaning steps, methods used, and how each method addresses your research questions. For most teams, the core will be EDA followed by linear and/or logistic regression.

Results and discussion — Findings and interpretation. This is typically the longest section. Aim for 4–6 well-designed figures (not more than 8). Only include plots that advance your narrative.

Conclusion — To what extent did you answer your research question? What would you do with more time?
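To make the "Data and methods" item in the outline above more concrete, here is a minimal sketch of the "EDA followed by linear regression" pattern using only the Python standard library. The variable names and the numbers are made up for illustration, and the helper functions (`summarize`, `fit_line`, `r_squared`) are hypothetical rather than course-provided code. A real project would use real data and most likely a data-analysis library.

```python
import statistics

def summarize(values):
    """Basic EDA summary for one numeric column."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    y_bar = statistics.mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Made-up toy data, for illustration only.
study_hours = [1.0, 2.0, 3.0, 4.0, 5.0]
quiz_score = [2.1, 3.9, 6.2, 7.8, 10.0]

summary = summarize(quiz_score)                      # step 1: explore
intercept, slope = fit_line(study_hours, quiz_score) # step 2: model
r2 = r_squared(study_hours, quiz_score, intercept, slope)

print(summary)
print(f"score = {intercept:.2f} + {slope:.2f} * hours (R^2 = {r2:.3f})")
```

In a report, the summary statistics would typically appear alongside plots in the EDA portion, and the fitted coefficients would be interpreted in terms of the research question rather than simply listed.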

FAQs

Can we collect our own data? Yes. If using a survey, share a draft with the course staff before publishing. Effective survey design takes longer than expected — plan ahead.

Can reports exceed 8 pages? You may include an appendix of supplementary results, but the teaching team is not guaranteed to review material beyond 8 pages. Include only results that serve your narrative.

Can reports be double-spaced? Use whatever spacing tells your story effectively. Well-designed figures with informative captions are often more valuable than additional text.

What if my project is not novel? Repeating and extending an existing analysis is a strong approach. Use a fresher dataset, add a new method, or apply the same techniques to a different domain.

How can I make my project more interesting? Frame it as a decision problem: what actions could a business, government agency, or NGO take based on your findings? Which actions have the best return on investment?

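What if my data was sampled in a biased way? Consider reweighting. Identify demographic or contextual variables whose distribution may differ between your sample and the target population, then weight (or resample) observations so that each group’s share in your analysis matches its share in the target population.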
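As a rough illustration of the reweighting idea in the last answer, the sketch below computes post-stratification weights: each observation is weighted by its group's population share divided by its group's sample share. This is the weighting variant, not the resampling one. The group labels, population shares, and survey values are invented for the example, and `poststratification_weights` and `weighted_mean` are hypothetical helper names.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_shares):
    """One weight per observation: population share / sample share of its group."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    missing = set(counts) - set(population_shares)
    if missing:
        raise ValueError(f"no population share for groups: {sorted(missing)}")
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]

def weighted_mean(values, weights):
    """Mean of values where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Invented toy survey: undergrads are over-represented (4 of 5 responses),
# while the assumed target population is split 50/50.
groups = ["undergrad", "undergrad", "undergrad", "undergrad", "grad"]
hours = [6.0, 7.0, 8.0, 7.0, 4.0]
population_shares = {"undergrad": 0.5, "grad": 0.5}

weights = poststratification_weights(groups, population_shares)
raw_mean = sum(hours) / len(hours)
adjusted_mean = weighted_mean(hours, weights)

print(f"raw mean: {raw_mean:.2f}, reweighted mean: {adjusted_mean:.2f}")
```

The reweighted mean equals the average of the two group means under the assumed 50/50 split, so the over-sampled group no longer dominates the estimate. Weights like these correct only for the variables you stratify on, so the choice of variables and the source of the population shares should be stated in your report.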