A brief introduction to the 'Tidy Tuesday' project
A different dataset every Tuesday
A number of my posts source data via the ‘Tidy Tuesday’ project, so I thought it would make sense to provide some further information on it. Every Tuesday a new dataset is released, and people are encouraged to wrangle the data and create a visualisation using the R tidyverse (although other code-based approaches are also welcome). Participants can post their code and output on Twitter under the #TidyTuesday hashtag. The project was co-founded by Thomas Mock in 2018.
The Tidy Tuesday GitHub repository is an excellent starting point for learning more about the project. It contains some background to the project and guidelines for participants, along with all the weekly datasets.
Owing to the popularity of the project, an R package, tidytuesdayR, was also developed. It allows easy access to the datasets from within R. For example, to access the original craft beer dataset I used in this post, I can list all the Tidy Tuesday datasets and then load the relevant one using its Tidy Tuesday date.
# Load tidy tuesday library
library(tidytuesdayR)
# Obtain all the available tidy tuesday datasets
# Check first that enough GitHub API requests remain (rate limit not reached)
# Note, I don't execute this code here as it returns many, many rows
if (rate_limit_check(quiet = TRUE) > 10) {
  all_available_datasets <- tt_available()
  print(all_available_datasets)
}
The craft beer dataset was used for Tidy Tuesday on July 10th 2018. Therefore, the dataset can be imported via the tidytuesdayR package using this date.
# Using the tidy tuesday date sourced from the table above load the dataset
# Check first that enough GitHub API requests remain (rate limit not reached)
if (rate_limit_check(quiet = TRUE) > 10) {
  craft_beer_data <- tt_load("2018-07-10")
  head(craft_beer_data$week15_beers)
}
##
## Downloading file 1 of 1: `week15_beers.xlsx`
## # A tibble: 6 x 8
## count abv ibu id name style brewery_id ounces
## <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 1 0.05 NA 1436 Pub Beer American Pale L~ 408 12
## 2 2 0.066 NA 2265 Devil's Cup American Pale A~ 177 12
## 3 3 0.071 NA 2264 Rise of the Phoenix American IPA 177 12
## 4 4 0.09 NA 2263 Sinister American Double~ 177 12
## 5 5 0.075 NA 2262 Sex and Candy American IPA 177 12
## 6 6 0.077 NA 2261 Black Exodus Oatmeal Stout 177 12
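Once a dataset is loaded, it behaves like any other data frame. As a minimal sketch of a first wrangling step, the code below rebuilds a small data frame from the six rows printed above and computes the mean ABV per brewery using base R. The names beers_head and abv_by_brewery are my own for illustration, and the style column is left out because its values are truncated in the printed output. With the full dataset, the same call would be applied to craft_beer_data$week15_beers instead.

```r
# First six rows of week15_beers, copied by hand from the printed output above
# (style is omitted because its values are truncated in that output)
beers_head <- data.frame(
  abv        = c(0.05, 0.066, 0.071, 0.09, 0.075, 0.077),
  name       = c("Pub Beer", "Devil's Cup", "Rise of the Phoenix",
                 "Sinister", "Sex and Candy", "Black Exodus"),
  brewery_id = c(408, 177, 177, 177, 177, 177),
  ounces     = 12
)

# Mean ABV per brewery using base R (no extra packages required)
abv_by_brewery <- aggregate(abv ~ brewery_id, data = beers_head, FUN = mean)
print(abv_by_brewery)
```

Only two breweries appear in these six rows, so the result has two rows. On the full dataset the same approach gives one row per brewery.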
I also found this R Shiny app useful for browsing submissions people have made under the #TidyTuesday hashtag on Twitter. It doesn’t look like it has been updated in a while, but it is still very interesting.
Another good resource is this list of YouTube videos by David Robinson. In each video, David takes a Tidy Tuesday dataset and live-codes his analysis and visualisations in R.
If you are searching for inspiration for a small personal data project, I recommend looking through the datasets in the Tidy Tuesday GitHub repository and checking out the submissions based on those datasets under the #TidyTuesday hashtag on Twitter. If you are not on Twitter, you can access the raw tweets via the GitHub repository. See this article for details on how to do that.