Final Project
Overview
For your project you will choose a real-world question that interests you and answer it using the statistical methods we are learning in class. At the end you’ll prepare a report of your findings.
To give you time for this project, approximately 8 hours usually devoted to lecture and HW have been removed from the last week and a half of class. You are therefore expected to spend about 8 hours on this project.
Elements of your project
Your project needs to include the following elements:
- Hypothesis. State your question as a hypothesis. Note that your hypothesis must be one that can be answered using the hypothesis tests covered in this course.
- Dataset. Find a dataset online (or gather your own data) to test your hypothesis. There is no minimum or maximum size for your dataset. However, if it is very large, your sample approaches the full population and the framework of hypothesis testing is no longer really relevant. If your dataset is small, be sure it satisfies the assumptions of the t-distribution. One useful starting point for finding datasets is here.
- Descriptive statistics. Describe your data in the most appropriate ways. This will likely include qualitative descriptions (e.g. shape of distribution, skewness, etc.), quantitative descriptions (e.g. mean, standard deviation, median, etc.), and figure(s) (e.g. histogram, boxplot, probability distribution plot, and/or cumulative distribution plot). Choose the figure(s) most appropriate for your data set.
- Inferential statistics. Compute a confidence interval and test your hypothesis.
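As a rough illustration of the descriptive and inferential elements above, here is a minimal sketch of a one-sample analysis using only the Python standard library. Everything specific in it is an assumption made for illustration: the sample values are made up, the hypothesized mean `mu_0` is arbitrary, the helper name `summarize` is hypothetical, and the critical value 2.262 is the two-sided 95% value for 9 degrees of freedom read from a t-table. Your own project may call for a different test entirely.

```python
import math
import statistics

# Hypothetical sample of 10 measurements (made-up numbers, not real data).
sample = [4.8, 5.1, 5.4, 4.9, 5.6, 5.0, 5.3, 5.2, 4.7, 5.5]
mu_0 = 5.0       # hypothesized population mean (assumed for illustration)
t_crit = 2.262   # two-sided 95% critical value for df = 9, from a t-table


def summarize(data, mu_0, t_crit):
    """Descriptive statistics, a confidence interval, and a one-sample t statistic."""
    n = len(data)
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)      # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)           # standard error of the mean
    margin = t_crit * se             # half-width of the confidence interval
    t_stat = (mean - mu_0) / se      # test statistic for H0: mu = mu_0
    return {
        "n": n,
        "mean": mean,
        "median": median,
        "sd": sd,
        "ci": (mean - margin, mean + margin),
        "t_stat": t_stat,
        "reject_h0": abs(t_stat) > t_crit,   # two-sided decision at alpha = 0.05
    }


result = summarize(sample, mu_0, t_crit)
for name, value in result.items():
    print(f"{name}: {value}")
```

Note that the two-sided test and the confidence interval agree by construction: the null hypothesis is rejected exactly when `mu_0` falls outside the interval. The critical value must match your own sample size and significance level.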
Report of your findings
Your report needs to include all elements of your project (hypothesis, explanation of dataset, descriptive statistics, inferential statistics) and follow the IMRaD style, including an abstract (for guidance and helpful resources on this style, see here). Keep it to 2 pages.
Deliverables
- Proposal. The proposal for this project is a draft of the introduction to your report. It should clearly answer the following two questions:
- What question will you answer? Phrase your question as a hypothesis.
- What dataset will you use to answer the question? You must have obtained the actual data.
You should discuss your project directly with a TA or me. We will record a score for students we meet with.
- Final Report. As mentioned above, the report should follow the IMRaD format. All figures and tables (tables are optional) should follow the proper format. The rubric for the report is shown below.
- (3 pts): The dataset is appropriate for testing the hypothesis
- (7 pts): The distribution of the dataset is described using at least one figure and description. The figures may include a boxplot, histogram, probability mass function, and/or cumulative distribution function. The description includes summary statistics describing central tendency and spread.
- (10 pts): Hypothesis test. The test performed is appropriate for answering the stated hypothesis. The test was performed correctly. The test is interpreted correctly.
- (10 pts): The abstract, introduction, methods, results/discussion, figures, and tables are clear, concise, and convincing.
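The histogram and cumulative distribution figures mentioned in the rubric are built from simple counts. The sketch below shows one way to compute the numbers behind them with the Python standard library. The scores are made-up values, and the helper names `histogram_counts` and `ecdf` are hypothetical, chosen only for this example. You would still pass the results to whatever plotting tool you prefer.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical integer scores (made-up numbers, not real data).
scores = [62, 71, 75, 78, 80, 83, 85, 88, 91, 97]


def histogram_counts(data, bin_width, start):
    """Map each bin's lower edge to the number of observations in
    the half-open bin [edge, edge + bin_width). Intended for integer data."""
    counts = Counter(start + ((x - start) // bin_width) * bin_width for x in data)
    return dict(sorted(counts.items()))


def ecdf(data):
    """Return the empirical CDF: F(x) = fraction of observations <= x."""
    xs = sorted(data)
    n = len(xs)

    def F(x):
        return bisect_right(xs, x) / n

    return F


bins = histogram_counts(scores, bin_width=10, start=60)
F = ecdf(scores)
print(bins)
print(F(80))
```

The bin width and starting edge are choices you make; try a few and keep the version that shows the shape of your distribution most honestly.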