
Project

Table of contents

  1. Description
  2. Timeline
  3. Required TA meeting (Before April 28)
  4. Project proposal (Due April 28)
  5. Project midterm report (Due May 12)
  6. Final report and peer evaluations (Due June 9)
  7. Grading
  8. Other FAQs
    1. What are the expectations for larger teams?
    2. Can we collect our own data?
    3. Can project reports be longer than 8 pages?
    4. Can reports be double spaced?
    5. Is there a presentation component to the project?
    6. Is there a rough outline of what you’re looking for in the report?

Description

The MS&E 125 project provides hands-on experience with key steps of the data science pipeline:

  • Asking research questions
  • Identifying dataset(s) to help you answer your questions
  • Cleaning, exploring, and analyzing datasets using tools from 125 and beyond
  • Synthesizing and compiling your results in a report

You are free to pursue any topic related to applied statistics. In previous years, teams have considered athletic performance, gender inequality, farming practices, restaurant quality, music success, gentrification, and standardized testing, just to name a few. Any data-driven investigation is fair game.

At the end of the quarter, each team of 2-4 students will prepare an 8-page written report. References and plots should be included in the 8 pages.

Timeline

Friday, April 14: Project group form due

Before April 28: Required 15-minute meeting with project group and course staff

Friday, April 28: Project proposal due

Friday, May 12: Project midterm report (4 pages)

Friday, June 9: Project final report (8 pages) and peer evaluations

Required TA meeting (Before April 28)

To help assess the feasibility and suitability of your project, you are required to sign up for a 15-minute meeting slot with a member of the course staff.

It is ideal if all members of your group can attend the meeting. At least two members must be present. This meeting is a required part of the project.

Project proposal (Due April 28)

After discussing your project idea with the course staff, you should prepare and submit your project proposal. The course staff will provide detailed comments on your proposal.

Project midterm report (Due May 12)

The midterm report is a 4-page submission. The bulk of the report should be extensive exploratory data analysis of one or more datasets that you will use for your project. You should conduct one or more analyses using methods taught before the midterm report due date. You should also detail plans for additional analyses that will incorporate statistical methods learned later in the course. The comments on your project proposal should be fully addressed in the midterm report. The course staff will provide detailed comments on your midterm report.

Final report and peer evaluations (Due June 9)

The final report is an 8-page submission. In the final report, you should aim to employ three or more statistical methods learned in the course, such as bootstrapping, linear regression, and regularized logistic regression. The comments on your midterm report should be fully addressed in the final report. In addition to the final report, peer evaluations should be submitted individually by each member of the project group. Peer evaluations will be considered in the grading process for the final report.
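
As a rough illustration of one of the methods named above, here is a minimal sketch of a percentile bootstrap confidence interval for a mean. It is written in plain Python as an assumption for illustration only; the function name `bootstrap_ci`, its parameters, and the sample values are all made up and are not part of the course materials.

```python
import random
import statistics

def bootstrap_ci(values, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample the data with replacement and recompute the statistic each time.
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the estimates.
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return estimates[lower_index], estimates[upper_index]

# Made-up measurements, used only to show the call.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
low, high = bootstrap_ci(sample)
print(f"sample mean: {statistics.mean(sample):.2f}")
print(f"approximate 95% interval: ({low:.2f}, {high:.2f})")
```

In a report, an interval like this would typically accompany a point estimate so that readers can judge its uncertainty.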

Grading

The project is intentionally open-ended and is graded holistically. Given the unique challenges faced by each team, there isn’t a one-size-fits-all rubric. Some teams will spend more time collecting complex data and have simpler analyses, while others will pursue more complex analyses of data that’s already clean.

As long as there’s evidence that your team has spent sufficient time collecting, cleaning, exploring, and analyzing your data, and has taken into consideration the comments on your proposal and midterm report, you should receive high marks.

If you have concerns about the specific directions of your project, please see a member of the teaching staff during office hours. We’re happy to lead you in the right direction!

Other FAQs

What are the expectations for larger teams?

Project teams must consist of two, three, or four students.

While it can be difficult to precisely quantify project output, we expect a 4-person group to produce a project that requires twice the effort/hours of a 2-person project. This can take the form of increased depth and/or breadth.

Upon completing the project, we’ll also ask each student to evaluate the contributions of their team members, and we’ll consider these peer reviews when determining final grades.

Can we collect our own data?

Yes! Many past students have used surveys to answer their research questions.

If you plan to create a survey, be sure to share a copy of your survey with the course staff before publishing it. You’ll also want to publish your survey well before the project deadline.

Effective survey design can take much longer than expected, so it’s not a good option for a last-minute project!

Can project reports be longer than 8 pages?

You’re welcome to include an appendix of additional relevant results, but we can’t guarantee that the teaching team will review anything beyond 8 pages. Please make sure not to just dump all of your extraneous findings and plots in an appendix unless there’s a good reason to include them. While research papers often have just 3-5 main plots, researchers will often produce hundreds of plots over the course of a project that the public never sees.

Can reports be double spaced?

You should include as much text as needed to fully tell the story of your project. If you can do that with double spaced text, that’s fine. Keep in mind that plots/figures will take up a lot of space in your report, and relevant/thoughtfully-designed plots/figures/captions are arguably more important than the main text.

Is there a presentation component to the project?

No. If you think a supplementary video will help us better understand the findings in your report, please discuss it with the course staff before the project deadline.

Is there a rough outline of what you’re looking for in the report?

As mentioned above, the project is intentionally open-ended and doesn’t have a fixed rubric. That being said, here is a sample outline that works for many projects. Keep in mind that your outline may differ!

Introduction and motivation

  • What are your research questions?

  • Why are these questions interesting?

  • What’s your hypothesis?

  • What’s the brief summary of your results?

Relevant work

  • Who else has tried to answer your questions?

  • Were they successful?

  • How does your project relate to or build on existing work?

  • You should be able to recycle a lot of your proposal submission in this section!

Data and methods

  • How will you go about answering your research questions?

  • What data sources will you use?

  • What methods will you use, and how will they answer your research questions?

  • For most groups, the ideal methods will be extensive exploratory data analysis, followed by linear and/or logistic regression. If this is the case for your project, use this section to describe what you will plot, the structure of your regressions, and how your regressions will answer your research questions.
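
To make "the structure of your regressions" concrete, here is a minimal sketch of a one-predictor linear regression fit by ordinary least squares. It is plain Python chosen for illustration; the function `fit_line`, the variable names, and the numbers are hypothetical, and a real project would more likely use a statistics library.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data in which the outcome is exactly 1 + 2 * predictor.
predictor = [0.0, 1.0, 2.0, 3.0, 4.0]
outcome = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(predictor, outcome)
print(f"slope: {slope:.2f}, intercept: {intercept:.2f}")
```

When describing a regression in this section, it helps to state the outcome, the predictors, and which coefficient speaks to which research question.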

Results and discussion

  • What are your findings?

  • How do you interpret those findings?

  • This will probably be your longest section, so spend the most time here.

  • You shouldn’t have more than 8 total plots+figures in your results. 4-6 is a good target. Only include plots if they are relevant to your story.

  • Spend sufficient time making your plots pretty! With a tool like ChatGPT/CodeSquire, it’s a lot easier to figure out how to clean up plots. Aim to make your plots as clean as a plot you might see in a professional news outlet. You’re welcome to discuss potential improvements to your plots during office hours.

Conclusion

  • To what extent did you answer your research question?

  • Was your hypothesis correct?

  • With infinite time and resources, how would you go about better answering your research question?