Final Project
Overview
For your project you will choose a real-world question that you are interested in and will use statistical methods that we are learning in class. At the end you’ll prepare a report of your findings.
To give you time for this project, approximately 8 hours normally devoted to lecture and HW have been removed from the last week and a half of class. You are therefore expected to spend about 8 hours on this project.
Elements of your project
Your project needs to include the following elements:
- Hypothesis. State your question as a hypothesis. Note that your hypothesis must be one that can be answered using the hypothesis tests covered in this course.
- Dataset. Find a dataset online (or gather your own data) to test your hypothesis. There is no minimum or maximum size, but the dataset must be large enough to test your hypothesis. If your dataset is small, you will need to use tools that are appropriate for small datasets. A useful starting point for finding datasets is here.
- Descriptive statistics. Describe your data in the most appropriate ways. This will likely include qualitative descriptions (e.g. shape of distribution, skewness, etc.), quantitative descriptions (e.g. mean, standard deviation, median, etc.), and figure(s) (e.g. histogram, boxplot, probability distribution plot, and/or cumulative distribution plot). Choose the figure(s) most appropriate for your dataset.
- Inferential statistics. Compute a confidence interval and test your hypothesis.
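As a rough illustration of the descriptive and inferential elements above, here is a minimal Python sketch that uses only the standard library. The sample values, the null value `mu0`, and the function names `describe`, `z_interval`, and `z_test` are all hypothetical and chosen only for illustration. The interval and test use the normal (z) approximation, which assumes a reasonably large sample. A small dataset would likely call for a t-based or other small-sample method instead, and you should use whichever test from the course actually fits your hypothesis.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample: made-up values standing in for your own data.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2, 4.7, 5.5, 5.9, 5.3,
          5.0, 5.4, 6.1, 5.7, 4.8, 5.2, 5.6, 5.5, 5.3, 5.9,
          5.1, 5.8, 5.4, 5.0, 5.7, 5.6, 5.2, 5.5, 5.3, 5.4]


def describe(data):
    """Return basic summary statistics (central tendency and spread)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample SD (n - 1 denominator)
        "q1": q1,
        "q3": q3,
    }


def z_interval(data, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for the mean."""
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))  # standard error
    z = NormalDist().inv_cdf(0.5 + confidence / 2)      # two-sided critical value
    return mean - z * se, mean + z * se


def z_test(data, mu0):
    """Two-sided large-sample z-test of H0: population mean equals mu0."""
    se = statistics.stdev(data) / math.sqrt(len(data))
    z = (statistics.mean(data) - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Assumed null value, for illustration only.
mu0 = 5.0

print(describe(sample))
print("95% CI for the mean:", z_interval(sample))
print("z statistic and p-value:", z_test(sample, mu0))
```

The p-value is then compared against the significance level you chose beforehand, and the conclusion is stated in terms of your original question.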
Report of your findings
Your report needs to include all elements of your project (hypothesis, explanation of dataset, descriptive statistics, inferential statistics) and follow the IMRaD style, including an abstract (for guidance and helpful resources on this style, see here). Keep it to 2 pages.
Deliverables
- Proposal. The proposal for this project is a draft of the introduction to your report. It should clearly answer the following two questions:
- What question will you answer? Phrase your question as a hypothesis.
- What dataset will you use to answer the question? You must have obtained the actual data.
You should discuss your project directly with a TA or me. After meeting with us, upload a PDF of your proposal to Learning Suite. The TAs will record which students they meet with.
- Final Report. As mentioned above, the report should follow the IMRaD format. All figures and tables (tables are optional) should follow the proper format. The rubric for the report is shown below.
- (3 pts): The dataset is appropriate for testing the hypothesis
- (7 pts): The distribution of the dataset is described using at least one figure and description. The figures may include a boxplot, histogram, probability mass function, and/or cumulative distribution function. The description includes summary statistics describing central tendency and spread.
- (10 pts): Hypothesis test. The test performed is appropriate for answering the stated hypothesis. The test was performed correctly. The test is interpreted correctly.
- (10 pts): The abstract, introduction, methods, results/discussion, figures, and tables are clear, concise, and convincing.