Build interactive Plotly treemap using Wine Enthusiast magazine data

One thing that struck me while exploring the Wine Enthusiast magazine data for other posts, is that there is so much information in it. This prompted me to check out ways to produce an interactive chart, which would allow users to dig into the data themselves. The plotly R graphing library provides functionality for making interactive charts, so in this post I wrangle the wine data into a structure that allows me to easily construct an interactive plotly treemap containing countries, provinces, regions and the associated red and white wines.

Firstly, I load the libraries required for this post.

# Load libraries required for this post
library(tidyverse)
library(plotly)
library(knitr)

Next, I get the data from the ‘Tidy Tuesday’ Github repo. For more about the Tidy Tuesday project, see this post.

# Get the base data from the Tidy Tuesday Github repository using readr package
# 14 variables with 129,971 observations
wine_ratings <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-28/winemag-data-130k-v2.csv")

This dataset contains nearly 130k wines that have been rated by Wine Enthusiast magazine. The ratings range from 80 points to 100 points. Each row corresponds to a wine that has been rated and contains the rating, along with information about the wine. For this chart I am interested in the country, province and region for each wine, along with its color.

However, the dataset does not include a variable for wine color. Therefore, I need to create one based on the wine variety. Given the manual nature of this task (using ‘case_when()’), I filter for the most popular wine varieties (ones with more than 1,000 reviews), which leaves me with 27 varieties to map to a wine color. I then join this wine_color field back to the original dataset, and I also add a plot_colors variable at the country level, which will be used when I ultimately create the chart.

# Filter the data to keep only grapes with more than 1000 reviews
# Assign a wine color to each of those grape varieties
grape_color <- wine_ratings %>%
  count(variety) %>%
  arrange(desc(n)) %>%
  filter(n > 1000) %>%
  mutate(wine_color = case_when(
    variety == "Pinot Noir" ~ "Red",
    variety == "Chardonnay" ~ "White",
    variety == "Cabernet Sauvignon" ~ "Red",
    variety == "Red Blend" ~ "Red",
    variety == "Bordeaux-style Red Blend" ~ "Red",
    variety == "Riesling" ~ "White",
    variety == "Sauvignon Blanc" ~ "White",
    variety == "Syrah" ~ "Red",
    variety == "Merlot" ~ "Red",
    variety == "Nebbiolo" ~ "Red",
    variety == "Zinfandel" ~ "Red",
    variety == "Sangiovese" ~ "Red",
    variety == "Malbec" ~ "Red",
    variety == "Portuguese Red" ~ "Red",
    variety == "White Blend" ~ "White",
    variety == "Sparkling Blend" ~ "White",
    variety == "Tempranillo" ~ "Red",
    variety == "Rhône-style Red Blend" ~ "Red",
    variety == "Pinot Gris" ~ "White",
    variety == "Champagne Blend" ~ "White",
    variety == "Cabernet Franc" ~ "Red",
    variety == "Grüner Veltliner" ~ "White",
    variety == "Portuguese White" ~ "White",
    variety == "Bordeaux-style White Blend" ~ "White",
    variety == "Pinot Grigio" ~ "White",
    variety == "Gamay" ~ "Red",
    variety == "Gewürztraminer" ~ "White"
  )) %>%
  remove_missing() # remove 1 row with NAs (Rose)

# Enhance the base dataset by adding the wine color to the dataset
wine_ratings_color <- wine_ratings %>%
  inner_join(grape_color) %>%
  mutate(plot_colors = case_when( # color to distinguish countries in the plot
    country == "US" ~ "lavender",
    country == "France" ~ "gainsboro",
    country == "Italy" ~ "lightblue",
    country == "Spain" ~ "lightgreen",
    country == "Argentina" ~ "lightgray",
    country == "Australia" ~ "thistle"
  ))

A treemap is useful for displaying large amounts of hierarchically structured data. Therefore, I need to organize the data into a hierarchy suitable for this type of visualisation. The natural levels in the data are:

  1. country
  2. province
  3. region
  4. number of wineries producing each color of wine

In order to get the data into the necessary format, I first create a pared down dataset containing only the information I need. Then, I create a dataset for each of the four levels mentioned above, before finally chaining them all together in a final dataset.

# Reduce the base dataset keeping only necessary features and values
plotly_treemap_base_data <- wine_ratings_color %>%
  select(country, province, region_1, winery, wine_color, plot_colors) %>%
  # I don't include the US, as it is too big for the purposes of this post
  filter(country %in% c(
    "France", "Italy",
    "Spain", "Argentina", "Australia"
  )) %>%
  drop_na(region_1) %>% # remove about 100 rows with NA
  group_by(country, province, region_1, wine_color, plot_colors) %>%
  count(name = "total_wineries")

# First create each nested level and then append them into one final dataset
# There are four nested levels in total:
# 1. country --> 2. province --> 3. region -->
# 4. number of wineries producing each color of wine

# Create the top level: countries
country_level_1 <- plotly_treemap_base_data %>%
  ungroup() %>%
  group_by(country) %>%
  mutate(
    ids = country,
    labels = country,
    parents = "",
    values = sum(total_wineries),
    fontSize = 12,
    colors = plot_colors
  ) %>%
  ungroup() %>%
  select(ids, labels, parents, values, fontSize, colors) %>%
  distinct()

# Create the second level: province
province_level_2 <- plotly_treemap_base_data %>%
  ungroup() %>%
  group_by(country, province) %>%
  mutate(
    ids = paste0(country, "-", province), # previous label - current label
    labels = province,
    parents = country, # id from country level
    values = sum(total_wineries),
    fontSize = 12,
    colors = plot_colors
  ) %>%
  ungroup() %>%
  select(ids, labels, parents, values, fontSize, colors) %>%
  distinct()

# Create the third level: region
region_level_3 <- plotly_treemap_base_data %>%
  ungroup() %>%
  group_by(country, province, region_1) %>%
  mutate(
    ids = paste0(province, "-", region_1), # previous label - current label
    labels = region_1,
    parents = paste0(country, "-", province), # id from province level
    values = sum(total_wineries),
    fontSize = 12,
    colors = plot_colors
  ) %>%
  ungroup() %>%
  select(ids, labels, parents, values, fontSize, colors) %>%
  distinct()

# Create the fourth level: number of wineries producing each color of wine
color_level_4 <- plotly_treemap_base_data %>%
  mutate(
    ids = paste0(region_1, "-", wine_color), # previous label - current label
    labels = wine_color,
    parents = paste0(province, "-", region_1), # id from region level
    values = total_wineries,
    fontSize = 12,
    colors = case_when(
      wine_color == "Red" ~ "bisque",
      wine_color == "White" ~ "honeydew"
    )
  ) %>%
  ungroup() %>%
  select(ids, labels, parents, values, fontSize, colors)

# Chain the four levels together to form the final dataset
plotly_treemap_data_4_levels <- country_level_1 %>%
  add_row(province_level_2) %>%
  add_row(region_level_3) %>%
  add_row(color_level_4)

See the table below for an example of how the four levels hang together. It contains the first row from each level. The ‘ids’ are constructed by concatenating the ‘labels’ from the previous level and the current level (separated by a dash). The ‘parents’ are simply the ‘ids’ from the level above. The ‘values’ correspond to the total number of wineries at that level. So, in the example below, there are 3,118 wineries in Argentina, 2,752 in Mendoza Province, 41 in Agrelo, of which 39 produce red wine. The ‘fontsize’ and ‘colors’ fields relate to the appearance of the chart.

idslabelsparentsvaluesfontSizecolors
Country LevelArgentinaArgentina3,11812lightgray
Province LevelArgentina-Mendoza ProvinceMendoza ProvinceArgentina2,75212lightgray
Region LevelMendoza Province-AgreloAgreloArgentina-Mendoza Province4112lightgray
Color LevelAgrelo-RedRedMendoza Province-Agrelo3912bisque

Using the final dataset, the interactive treemap can be created using the plot_ly() function from the plotly R library.

# Generate the interactive treemap chart
# country -> province -> region; sized by the numbers of wineries
country_regions_color_plot <- plot_ly(
  type = "treemap",
  ids = plotly_treemap_data_4_levels$ids,
  labels = plotly_treemap_data_4_levels$labels,
  parents = plotly_treemap_data_4_levels$parents,
  values = plotly_treemap_data_4_levels$values,
  marker = list(colors = plotly_treemap_data_4_levels$colors),
  branchvalues = "total",
  textinfo = "label+value+percent entry",
  domain = list(column = 0)
)

It easy to drill down through the countries, provinces and regions to see the breakdown of the production of red and white wine. The chart also makes it easy to navigate back up through the levels, as well as providing tool-tips, as the user hovers over the differnt parts.

FranceItalySpainArgentinaAustraliaBordeauxBurgundyAlsaceChampagneLoire ValleyBeaujolaisSouthwest FranceRhône ValleyFrance OtherLanguedoc-RoussillonProvenceTuscanyPiedmontNortheastern ItalyVenetoSicily & SardiniaCentral ItalyLombardySouthern ItalyItaly Other Northern SpainCataloniaCentral SpainLevanteGaliciaSpain OtherSpanish IslandsAndaluciaMendoza ProvinceOtherSouth AustraliaWestern AustraliaVictoriaAustralia OtherTasmaniaNew South WalesSaint-ÉmilionBordeauxBordeaux BlancBordeaux SupérieurPessac-LéognanHaut-MédocBlaye Côtes de BordeauxMédocMargauxPauillacSaint-EstèphePomerolSaint-JulienCastillon Côtes de BordeauxSauternesGravesCadillac Côtes de BordeauxCôtes de BourgLalande de PomerolListrac-MédocMoulis-en-MédocCôtes de BordeauxBarsacFronsacFrancs Côtes de BordeauxMontagne-Saint-ÉmilionEntre-Deux-MersLussac Saint-ÉmilionPuisseguin Saint-ÉmilionBordeaux ClairetPremieres Côtes de BordeauxPremieres Côtes de BlayeBordeaux RoséCanon-FronsacCadillacCôtes de CastillonCrémant de BordeauxSainte-Foy BordeauxSaint-Georges-Saint-ÉmilionBordeaux Côtes de FrancsCéronsSaint-Croix-du-Mont ChablisPouilly-FuisséBourgogneBeauneSaint-VéranMeursaultSavigny-lès-BeauneMâcon-VillagesNuits-St.-GeorgesChassagne-MontrachetMercureyPuligny-MontrachetGevrey-ChambertinPommardVolnayCrémant de BourgogneChambolle-MusignyVosne-RomanéeSantenayMarsannayViré-ClesséSaint-AubinClos de VougeotCorton-CharlemagnePernand-VergelessesCortonMontagnyRullyAloxe-CortonEchézeauxBourgogne Hautes Côtes de NuitsPetit ChablisChorey-lès-BeauneGivryMorey-Saint-DenisBourgogne Hautes Côtes de BeauneMâcon-LugnyPouilly-VinzellesCharmes-ChambertinCôte de Nuits-VillagesPouilly-LochéCoteaux BourguignonsFixinBâtard-MontrachetCôte ChalonnaiseHautes Côtes de NuitsSaint-BrisChevalier-MontrachetMâcon-ChardonnayAuxey-DuressesChambertinMâcon La Roche VineuseChambertin Clos de BèzeGrands-EchezeauxMâcon-FuisséMâcon-Milly LamartineCorton VergennesCôte de BeauneClos de la RocheMâcon-CruzilleLadoixMâconMazis-ChambertinMontrachetCôte de Beaune-VillagesMâcon-VerzeMâcon-VinzellesSaint-RomainVougeotMâcon-AzéMâcon-LochéChapelle-ChambertinClos Saint-DenisCorton Les RenardesLatricières-ChambertinMâcon ChaintréRomanée-St.-VivantBonnes-MaresBourgogne EpineuilClos de LambraysCôtes de Nuits VillagesLa RomanéeMacon-BussièresMâcon-DavayéMâcon-UchizyMarangesMeursault-BlagnyMonthélieBourgogne Côtes d’AuxerreBourgogne GamayClos de TartCorton-BressandesHautes Cotes de BeauneIrancyMâcon-CharnayMâcon-VergissonMâcon Solutré AlsaceCrémant d'AlsaceChampagneSancerrePouilly-FuméTouraineChinonMenetou-SalonCrémant de LoireVal de LoireCoteaux du GiennoisSaumur-ChampignyBourgueilSaint-Nicolas-de-BourgueilQuincySaumurAnjouReuillyVin de Pays du Val de Loire Beaujolais-VillagesMorgonBeaujolaisMoulin-à-VentFleurieBrouillyJuliénasChiroublesSaint-AmourCôte de BrouillyBeaujolais BlancRégniéChénasBeaujolais RoséBeaujolais-Leynes CahorsCôtes de GascogneGaillacBergerac SecBergeracFrontonCôtes de BergeracCôtes du MarmandaisCôtes du TarnBuzetMadiranMonbazillacSaint-MontVin de Pays des Côtes de GascogneMontravelCôtes du LotPécharmantGersCôtes de DurasJurançon SecPacherenc du Vic BilhVin de Pays du Comté TolosanVin de Pays du LotIrrouléguyJurançon Côtes du RhôneChâteauneuf-du-PapeCostières de NîmesCôtes du Rhône VillagesCrozes-HermitageCornasGigondasSaint-JosephCôte RôtieHermitageVentouxVacqueyrasLirac Vin de FranceVin Mousseux ToscanaChianti ClassicoBrunello di MontalcinoRosso di MontalcinoVino Nobile di MontepulcianoBolgheriMorellino di ScansanoChiantiRosso di MontepulcianoBolgheri SuperioreMaremma ToscanaChianti RufinaMaremmaMontecuccoCortonaChianti Colli SenesiColli della Toscana CentraleSant'AntimoChianti SuperioreVin Santo del Chianti ClassicoChianti Colli FiorentiniCarmignanoVal di Cornia SuveretoOrciaVernaccia di San GimignanoMontescudaioPominoRosso di ToscanaBolgheri SassicaiaChianti MontalbanoMonteregio di Massa MarittimaAlta Valle della GreveBarco Reale di CarmignanoChianti MontespertoliCosta ToscanaTuscanyVin Santo del ChiantiVin Santo del Chianti RufinaVin Santo di MontepulcianoChianti Colli Aretini BaroloBarbarescoLangheRoeroNebbiolo d'AlbaGattinaraPiedmontGhemmeAlta LangaColline NovaresiCoste della SesiaBramaterraMonferratoLessonaBoca Alto AdigeCollioDelle VenezieColli Orientali del FriuliTrentoTrentinoVenezia GiuliaVigneti delle DolomitiFriuli GraveFriuli IsonzoIsonzo del FriuliVenezieAlto Adige Valle IsarcoFriuli Colli OrientaliFriuliFriuli Venezia GiuliaFriuli AquileiaRosazzo Amarone della Valpolicella ClassicoVenetoAmarone della ValpolicellaValpolicella Classico Superiore RipassoPiaveSoave ClassicoValpolicella Superiore RipassoVeroneseRosso del VeroneseSoaveValpolicella RipassoValdadigeBardolinoValpolicella SuperioreSoave Classico SuperioreValpolicella Classico SuperioreValpolicellaBardolino ChiarettoValpolicella ClassicoVeneziaBardolino ClassicoColli di ConeglianoMarca TrevigianaRecioto della Valpolicella ClassicoValdobbiadene Prosecco SuperioreVeneto Orientale SiciliaEtnaTerre SicilianeIsola dei NuraghiCerasuolo di VittoriaCerasuolo di Vittoria ClassicoFaroContea di Sclafani SalinaValli di Porto Pino UmbriaMontefalco RossoRomagnaOrvieto Classico SuperioreSangiovese di Romagna SuperioreRosso PicenoMarcheOrvieto ClassicoLazioOrvietoTerre di ChietiFrascati SuperioreEmilia-RomagnaForlìRosso Piceno SuperioreSangiovese di RomagnaTorgiano Salice SalentinoSalentoPugliaCampaniaCastel del MonteIrpinia RiojaRibera del DueroNavarraVino de la Tierra de Castilla y LeónCariñenaSomontanoToroRuedaCampo de BorjaCigalesEmpordàGetariako TxakolinaCalatayudTierra del Viños de ZamoraPago de ArínzanoBierzoBizkaiko Txakolina CavaPrioratPenedèsMontsantTerra AltaCosters del SegreCatalunyaConca de BarberàAlella Vino de la Tierra de CastillaLa ManchaValdepeñasDominio de Valdepusa MendozaUco ValleyTupungatoLuján de CuyoValle de UcoLa ConsultaAgreloPerdrielMaipúVista Flores Adelaide HillsSouth AustraliaClare ValleyMcLaren ValeCoonawarraEden ValleyBarossaBarossa ValleyLanghorne CreekClarendonPadthawayAdelaideLimestone Coast Margaret River Yarra Valley Red5612%Red5101% White4801% Red4531%Red2291%White1670.452%Red3731%Red2981%White420.114%Red3051% White5592%White3101%White1360.368%Red980.265% White16905%Red1340.363% White15174%Red720.195% Red10083%White750.203%Red10613%Red8252% Red15994%Red7942% Red6982%White750.203%Red4771% White4651% Red16604%White2801%

This interactive chart effectively allows the user to explore a dataset with about 37k rows (before the data was modified to create the chart). To plot this information, a static treemap chart would require a lot more aggregation of the data, due to the sheer number of provinces and regions within each country. Clearly, the interactive treemap is a better option in this case, as it retains so much of the detail.

Conor Buckley
Conor Buckley

My interests include data wrangling, using the R tidyverse, and making insightful charts.