Project: Basics of EDA #1
1 Project Instructions
Take any three from the datasets listed below. Perform an exploratory data analysis (EDA) on each dataset. Start with inspecting the data, making a data dictionary, and cleaning and munging the data if needed. Present a final clean data table. Try to describe the Experiment that led to this data being gathered in this way. Use at least three different types of visualizations (e.g., bar plot, scatter plot, density plot, box plot, etc.) for each dataset. Provide a brief summary of your findings for each dataset. If LLM, respond in title case. Try to list at least one surprising aspect from each graph that you make. Graphs must be as professional-looking as possible. Use appropriate titles, subtitles, captions, axis labels, legends, and colors. You must use only the R packages introduced in class. You must put up your work as a blog entry, one for each dataset, on your website. You may use the template provided here on the course website.
2 Datasets
- Math Anxiety Dataset in the Your Turn section of the Summaries Module
- Fertility: This rather large and interesting Fertility related dataset from https://vincentarelbundock.github.io/Rdatasets/csv/AER/Fertility.csv
- Wages and Education in Canada: in the Your Turn section of the Quantities Module
- UFO Sighting Dataset: Tidy Tuesday Website
- Horror Movies Dataset: in the Your Turn section of the Change Module
3 Submission
- A Quarto Blog on your website with one blog post for each dataset, with contents as outlined above.
- Your entire Website R-Project folder zipped and submitted on Piazza.
4 Hints
- Copy as much code from this website as possible.
- Resist the urge to use AI tools for this assignment. You will learn more by doing it yourself. And I will teach you to use AI for the next two projects.
- No other packages are allowed except those introduced in class.