As an avid board game player and collector, I have an ambition to someday design a board game. This led me to search for existing data on board games, and I stumbled upon a dataset from the BoardGameGeek (BGG) website. BGG is the largest online collection of board game data, and the online board game community voluntarily contributes reviews and ratings to the website. The purpose of this EDA is to understand board game players better. I hoped to gain insight into what types of board games are rated highly and purchased more often. This understanding would then inform the design of my own board game in the future.
I predicted that playtime, complexity, and domain affect rating average. I predicted that strategy games are highly rated compared to other types of games due to the intellectual challenge they provide. In addition, I predicted that medium complexity and average playtime would be rated highly. I was also curious about the mechanics of the game and whether I would find trends for this variable. For example, do the highest rated games involve mechanics such as dice rolling or storytelling?
The data used in this analysis can be found at: BOARDGAMEGEEK DATASET ON BOARD GAMES
This data set contains all ranked games as of February 2021 from the BGG database. After inspecting the data, I decided to drop the columns that would not be useful for the analysis, such as ID, Name, Min Age, and BGG Rank. I also dropped outliers by eliminating games with fewer than 146 users (below the 25th percentile), games with a 'Year Published' value below 0, and games with a play time above 10,000 minutes (based on the boxplots for Year Published and Play Time). I converted Year Published and Owned Users from floats to integers. Finally, I dropped the nulls. Below is a data dictionary of the variables that I used in my analysis:
| Variable | Description | Data Type |
|---|---|---|
| Year Published | Year published | int64 |
| Min Players | Minimum number of players recommended | int64 |
| Max Players | Maximum number of players recommended | int64 |
| Play Time | Playing time | int64 |
| Users Rated | Number of users that rated the game | int64 |
| Rating Average | Average rating received by the game | float64 |
| Complexity Average | Average complexity value of the game | float64 |
| Owned Users | Number of BGG registered owners of the game | int64 |
| Mechanics | Mechanics used by the game | object |
| Domains | Board game domains that the game belongs to | object |
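The cleaning steps described above the data dictionary can be sketched roughly as follows. This is a minimal sketch under assumptions, not the exact code I ran: `clean_boardgames` and the toy `raw` frame are hypothetical, and the text does not say which user count the 146 cutoff applies to, so `Owned Users` is assumed here (the tables further down keep games with fewer than 146 ratings). Nulls are dropped before the integer conversion because NaN values cannot be cast to `int64`.

```python
import pandas as pd


def clean_boardgames(raw: pd.DataFrame,
                     users_col: str = "Owned Users",  # assumption, see lead-in
                     min_users: int = 146) -> pd.DataFrame:
    """Hypothetical helper that mirrors the cleaning steps described in the text."""
    # Drop columns that are not used in the analysis (skip any that are absent).
    df = raw.drop(columns=["ID", "Min Age", "BGG Rank"], errors="ignore")
    # Drop rows with missing values first, since NaN cannot be cast to int.
    df = df.dropna()
    # Remove the outliers named in the text.
    keep = (
        (df[users_col] >= min_users)
        & (df["Year Published"] >= 0)
        & (df["Play Time"] <= 10_000)
    )
    df = df[keep]
    # Convert the float columns to integers.
    df = df.astype({"Year Published": "int64", "Owned Users": "int64"})
    return df.reset_index(drop=True)


# Toy data invented for illustration only (not taken from the BGG dataset).
raw = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Name": ["A", "B", "C", "D", "E"],
    "Year Published": [2015.0, -3000.0, 2019.0, 2001.0, None],
    "Min Age": [8, 8, 10, 12, 8],
    "Play Time": [60, 30, 60000, 45, 30],
    "Owned Users": [500.0, 900.0, 700.0, 100.0, 300.0],
    "BGG Rank": [1, 2, 3, 4, 5],
})
cleaned = clean_boardgames(raw)
```

Only game "A" survives in this toy example: the others fail the year, play time, owner count, or null check.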
!pip install plotly
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import pandas as pd
boardgames = pd.read_csv("boardgames_clean.csv")
boardgames.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8623 entries, 0 to 8622
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Name                8623 non-null   object
 1   Year Published      8623 non-null   int64
 2   Min Players         8623 non-null   int64
 3   Max Players         8623 non-null   int64
 4   Play Time           8623 non-null   int64
 5   Users Rated         8623 non-null   int64
 6   Rating Average      8623 non-null   float64
 7   Complexity Average  8623 non-null   float64
 8   Owned Users         8623 non-null   int64
 9   Mechanics           8623 non-null   object
 10  Domains             8623 non-null   object
dtypes: float64(2), int64(6), object(3)
memory usage: 741.2+ KB
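After cleaning the data, I used visualizations to understand the relationships between variables. The pair plot and correlation matrix below show that Rating Average has a moderate positive correlation with Complexity Average (0.52) and weak positive correlations with Play Time (0.22) and Users Rated (0.20).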
columns = ['Play Time','Users Rated','Rating Average','Complexity Average']
columns
sns.pairplot(boardgames[columns])
['Play Time', 'Users Rated', 'Rating Average', 'Complexity Average']
<seaborn.axisgrid.PairGrid at 0x123c08f40>
columns = ['Play Time','Users Rated','Rating Average','Complexity Average']
df_corr = boardgames[columns]
corrmat = df_corr.corr()
corrmat
f, ax = plt.subplots(figsize = (12, 12))
sns.heatmap(corrmat, vmax = .8, square = True, annot = True, cmap = 'BuGn', linewidths = .5 )
| | Play Time | Users Rated | Rating Average | Complexity Average |
|---|---|---|---|---|
| Play Time | 1.000000 | -0.049129 | 0.215150 | 0.358473 |
| Users Rated | -0.049129 | 1.000000 | 0.197286 | -0.004224 |
| Rating Average | 0.215150 | 0.197286 | 1.000000 | 0.523193 |
| Complexity Average | 0.358473 | -0.004224 | 0.523193 | 1.000000 |
<AxesSubplot:>
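To read the correlation matrix with rating in mind, it can help to pull out only the Rating Average column and sort it by strength. The sketch below is illustrative: `rating_correlations` is a hypothetical helper, and the toy frame is invented rather than taken from the BGG data. It relies on pandas' default Pearson correlation, the same default that `corr()` uses above.

```python
import pandas as pd


def rating_correlations(df: pd.DataFrame,
                        target: str = "Rating Average") -> pd.Series:
    """Correlation of each numeric column with the target, strongest first."""
    # Keep numeric columns only so text columns such as Name are ignored.
    corr = df.select_dtypes("number").corr()[target].drop(target)
    # Sort by absolute value so strong negative correlations also rank high.
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)


# Toy data invented for illustration only.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Play Time": [30, 120, 60, 90, 45],
    "Complexity Average": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Rating Average": [6.0, 6.5, 7.0, 7.5, 8.0],
})
ranked = rating_correlations(toy)
```

On the real data, `rating_correlations(boardgames[columns])` should list Complexity Average first, in line with the matrix above.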
The top 8 board games by rating average (those rated above 8.9) are shown below. The board game with the highest rating is Wings of the Motherland, with a rating average of 9.31013. The first bubble chart shows the positive correlation between complexity and rating average. The second bubble chart shows that most of the highest rated board games are new, published in or after 2018. The most common domains are thematic games and wargames. The most common mechanics are dice rolling, hexagon grid, and grid movement.
top8boardgames = boardgames[boardgames['Rating Average'] > 8.9]
top8boardgames
| | Name | Year Published | Min Players | Max Players | Play Time | Users Rated | Rating Average | Complexity Average | Owned Users | Mechanics | Domains |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1954 | Core Space | 2019 | 1 | 6 | 180 | 648 | 8.92881 | 3.0476 | 1139 | Cooperative Game, Dice Rolling, Die Icon Resol... | Thematic Games |
| 2508 | Arena: The Contest | 2019 | 1 | 8 | 90 | 600 | 8.99374 | 2.7895 | 1094 | Action Queue, Cooperative Game, Dice Rolling, ... | Thematic Games |
| 2639 | Dungeon Universalis | 2019 | 1 | 6 | 180 | 454 | 8.98585 | 4.1017 | 697 | Action Points, Action Queue, Card Drafting, Co... | Thematic Games |
| 3387 | Roads to Gettysburg II: Lee Strikes North | 2018 | 1 | 2 | 1200 | 155 | 8.94271 | 3.6154 | 664 | Dice Rolling, Hexagon Grid, Simulation | Wargames |
| 4989 | 1985: Under an Iron Sky | 2018 | 2 | 6 | 8640 | 90 | 9.11889 | 4.2727 | 363 | Dice Rolling, Hexagon Grid, Simulation, Zone o... | Wargames |
| 5075 | Primer: The Gamer's Source for Battles from th... | 2013 | 2 | 6 | 120 | 58 | 9.14079 | 4.0000 | 255 | Dice Rolling, Hexagon Grid, Line of Sight, Mov... | Wargames |
| 5340 | Wings of the Motherland | 2019 | 2 | 8 | 240 | 79 | 9.31013 | 4.3636 | 211 | Dice Rolling, Grid Movement, Hexagon Grid, Mov... | Wargames |
| 6129 | 1985: Deadly Northern Lights | 2020 | 2 | 4 | 5000 | 36 | 8.99167 | 4.6250 | 155 | Dice Rolling, Grid Movement, Hexagon Grid, Mov... | Wargames |
(top8boardgames['Owned Users'].sum())/(boardgames['Owned Users'].sum())*100
0.018351727084723927
pd.Series(' '.join(top8boardgames['Mechanics']).split()).value_counts()
Dice 8
Rolling, 8
Game, 7
/ 7
Points, 5
Grid, 5
Hexagon 5
Movement 4
Movement, 4
Campaign 3
of 3
Simulation 3
Grid 3
Action 3
Cooperative 3
Modular 3
Mission 3
Scenario 3
Board, 3
Control 2
Simulation, 2
Queue, 2
Zone 2
Role 2
Player 2
Variable 2
Point 2
Die 1
Semi-Cooperative 1
Line 1
Sight, 1
Icon 1
Combat 1
Results 1
Ratio 1
Card 1
Resolution, 1
Playing 1
Drafting, 1
Powers 1
Placement, 1
Tile 1
to 1
Elimination, 1
Playing, 1
Set-up 1
Table, 1
dtype: int64
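The counts above are per word, because the joined string is split on whitespace, so "Dice" and "Rolling," are tallied separately and "Grid" is shared between Hexagon Grid and Grid Movement. A possible alternative is to split each Mechanics entry on commas and count whole mechanic names. This is a sketch, not what I ran above: `count_mechanics` is a hypothetical helper, and it assumes that no mechanic name contains a comma itself, which appears to hold in this dataset (for example "Deck Bag and Pool Building"). The toy rows reuse three untruncated Mechanics values from the tables in this post.

```python
import pandas as pd


def count_mechanics(df: pd.DataFrame, column: str = "Mechanics") -> pd.Series:
    """Count whole mechanic names instead of individual words."""
    return (
        df[column]
        .str.split(",")   # one list of mechanic names per game
        .explode()        # one row per (game, mechanic) pair
        .str.strip()      # remove the space left after each comma
        .value_counts()
    )


# Three complete Mechanics strings as they appear in the tables.
toy = pd.DataFrame({
    "Mechanics": [
        "Dice Rolling, Hexagon Grid, Simulation",
        "Hexagon Grid, Simulation",
        "Hexagon Grid",
    ]
})
mechanic_counts = count_mechanics(toy)
```

Calling `count_mechanics(top8boardgames)` would then report entries such as "Dice Rolling" and "Hexagon Grid" as single mechanics.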
fig = px.scatter(top8boardgames, x = "Complexity Average", y = "Rating Average", color = 'Name',
size = 'Rating Average', size_max = 60, text = 'Name',
title='Rating Average and Complexity for Top 8 Board Games',
color_discrete_sequence = px.colors.qualitative.Bold)
fig.update_layout(xaxis_title = 'Complexity Average',
yaxis_title = 'Rating Average');
fig.show();
fig = px.scatter(top8boardgames, x = "Year Published", y = "Rating Average", color = 'Name',
size = 'Rating Average', size_max = 60, text = 'Name',
title='Rating Average and Year Published for Top 8 Board Games',
color_discrete_sequence = px.colors.qualitative.Bold)
fig.update_layout(xaxis_title = 'Year Published',
yaxis_title = 'Rating Average');
fig.show();
The top 10 board games by number of owners are shown below. The most owned board game is Pandemic, with 155,312 owners. The first bubble chart shows a positive relationship between complexity and rating average, and the second shows a positive relationship between year published and rating average. The most common domains are strategy games and family games. The most frequent mechanic terms are drafting and hand management, followed by the words "variable" and "set" (the counts below are per word, not per mechanic). Among these games, the more complex and the newer titles tend to be rated higher. A possible explanation could be that newer games have more ratings.
mostownedboardgames = boardgames[boardgames['Owned Users'] > 85000]
mostownedboardgames
| | Name | Year Published | Min Players | Max Players | Play Time | Users Rated | Rating Average | Complexity Average | Owned Users | Mechanics | Domains |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Terraforming Mars | 2016 | 1 | 5 | 120 | 64864 | 8.43254 | 3.2406 | 87099 | Card Drafting, Drafting, End Game Bonuses, Han... | Strategy Games |
| 16 | 7 Wonders Duel | 2015 | 2 | 2 | 30 | 60302 | 8.10645 | 2.2261 | 94343 | Card Drafting, Drafting, Set Collection, Sudde... | Strategy Games |
| 60 | 7 Wonders | 2010 | 2 | 7 | 30 | 84371 | 7.75152 | 2.3290 | 112410 | Card Drafting, Drafting, Hand Management, Set ... | Family Games, Strategy Games |
| 92 | Codenames | 2015 | 2 | 8 | 15 | 67688 | 7.61810 | 1.2885 | 107682 | Communication Limits, Memory, Push Your Luck, ... | Party Games |
| 97 | Dominion | 2008 | 2 | 4 | 30 | 78089 | 7.61898 | 2.3568 | 101839 | Deck Bag and Pool Building, Delayed Purchase, ... | Strategy Games |
| 98 | Pandemic | 2008 | 2 | 4 | 45 | 102214 | 7.60608 | 2.4115 | 155312 | Action Points, Cooperative Game, Hand Manageme... | Family Games, Strategy Games |
| 173 | Ticket to Ride | 2004 | 2 | 5 | 60 | 71611 | 7.42458 | 1.8487 | 97463 | Card Drafting, End Game Bonuses, Hand Manageme... | Family Games |
| 177 | Carcassonne | 2000 | 2 | 5 | 45 | 101853 | 7.41951 | 1.9126 | 149337 | Area Majority / Influence, Map Addition, Tile ... | Family Games |
| 283 | Love Letter | 2012 | 2 | 4 | 20 | 56013 | 7.23461 | 1.1922 | 92896 | Hand Management, Player Elimination, Score-and... | Family Games |
| 394 | Catan | 1995 | 3 | 4 | 120 | 101510 | 7.15261 | 2.3213 | 154531 | Dice Rolling, Hexagon Grid, Income, Modular Bo... | Family Games, Strategy Games |
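The table above comes from a hand-picked cutoff (more than 85,000 owners) that happens to leave exactly ten games. An alternative that does not depend on choosing the cutoff is to select by rank with `DataFrame.nlargest`. The sketch below is illustrative only: `top_n` is a hypothetical helper, and the toy frame reuses five Name and Owned Users pairs from the table.

```python
import pandas as pd


def top_n(df: pd.DataFrame, column: str, n: int) -> pd.DataFrame:
    """Return the n rows with the largest values in the given column."""
    # nlargest sorts in descending order; ties are resolved by first appearance.
    return df.nlargest(n, column)


# Five games and their owner counts, copied from the table above.
toy = pd.DataFrame({
    "Name": ["Terraforming Mars", "7 Wonders", "Pandemic", "Carcassonne", "Catan"],
    "Owned Users": [87099, 112410, 155312, 149337, 154531],
})
top3 = top_n(toy, "Owned Users", 3)
```

With the full data, `top_n(boardgames, 'Owned Users', 10)` would return the ten most owned games regardless of how the owner counts are distributed, and the same helper works for `'Rating Average'`.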
pd.Series(' '.join(mostownedboardgames['Mechanics']).split()).value_counts()
Drafting, 7
Hand 6
Management, 6
Variable 5
Set 5
..
Team-Based 1
Deck 1
Bag 1
Pool 1
Production, 1
Length: 74, dtype: int64
fig = px.scatter(mostownedboardgames, x = "Complexity Average", y = "Rating Average", color = 'Name',
size = 'Rating Average', size_max = 60, text = 'Name',
title='Rating Average and Complexity for Most Owned Board Games',
color_discrete_sequence = px.colors.qualitative.Bold)
fig.update_layout(xaxis_title = 'Complexity Average',
yaxis_title = 'Rating Average');
fig.show();
fig = px.scatter(mostownedboardgames, x = "Year Published", y = "Rating Average", color = 'Name',
size = 'Rating Average', size_max = 60, text = 'Name',
title='Rating Average and Year Published for Most Owned Board Games',
color_discrete_sequence = px.colors.qualitative.Bold)
fig.update_layout(xaxis_title = 'Year Published',
yaxis_title = 'Rating Average');
fig.show();
The most complex board games, with a complexity average above 4.5, are shown below. The most common domains are wargames and strategy games. The most common mechanics are hexagon grid and dice rolling.
complexboardgames = boardgames[boardgames['Complexity Average'] > 4.5]
complexboardgames
| | Name | Year Published | Min Players | Max Players | Play Time | Users Rated | Rating Average | Complexity Average | Owned Users | Mechanics | Domains |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 67 | On Mars | 2020 | 1 | 4 | 150 | 5781 | 8.29831 | 4.6337 | 10719 | Area Majority / Influence, Contracts, Delayed ... | Strategy Games |
| 68 | Lisboa | 2017 | 1 | 4 | 120 | 6693 | 8.18437 | 4.5667 | 10212 | Area Majority / Influence, Card Drafting, Hand... | Strategy Games |
| 321 | Advanced Squad Leader | 1985 | 2 | 2 | 480 | 3490 | 7.98739 | 4.7252 | 5739 | Critical Hits and Failures, Dice Rolling, Grid... | Wargames |
| 554 | Arkwright | 2014 | 2 | 4 | 240 | 1999 | 7.85987 | 4.5727 | 3458 | Commodity Speculation, Investment, Simulation,... | Strategy Games |
| 602 | Feudum | 2017 | 2 | 5 | 180 | 2726 | 7.67260 | 4.5758 | 5155 | Action Queue, Area Majority / Influence, Hand ... | Strategy Games |
| 1244 | Magic Realm | 1979 | 1 | 16 | 240 | 1979 | 7.19084 | 4.5267 | 3787 | Action Queue, Dice Rolling, Events, Modular Bo... | Strategy Games, Thematic Games |
| 1271 | World in Flames | 1985 | 2 | 7 | 6000 | 1338 | 7.59906 | 4.6304 | 2646 | Area Movement, Delayed Purchase, Dice Rolling,... | Wargames |
| 1301 | 1862: Railway Mania in the Eastern Counties | 2013 | 1 | 8 | 300 | 665 | 8.35649 | 4.5469 | 1872 | Auction/Bidding, Market, Network and Route Bui... | Strategy Games |
| 1403 | Here I Stand: 500th Anniversary Edition | 2017 | 2 | 6 | 360 | 588 | 8.60607 | 4.5357 | 1716 | Campaign / Battle Card Driven, Dice Rolling, H... | Wargames |
| 1499 | 1817 | 2010 | 3 | 7 | 540 | 482 | 8.75231 | 4.6825 | 791 | Auction/Bidding, Loans, Market, Network and Ro... | Strategy Games |
| 1502 | High Frontier (Third Edition) | 2017 | 1 | 5 | 240 | 646 | 8.28039 | 4.7027 | 1482 | Auction/Bidding, Simulation, Variable Player P... | Strategy Games, Thematic Games |
| 2152 | High Frontier 4 All | 2020 | 1 | 5 | 240 | 360 | 8.83153 | 4.7423 | 1647 | Auction/Bidding, Deck Bag and Pool Building, H... | Strategy Games |
| 2212 | Pacific War: The Struggle Against Japan 1941-1945 | 1985 | 2 | 2 | 6000 | 576 | 7.67296 | 4.5269 | 1728 | Dice Rolling, Hexagon Grid, Simulation | Wargames |
| 2853 | Advanced Third Reich | 1992 | 2 | 6 | 2480 | 730 | 6.88701 | 4.5328 | 1578 | Dice Rolling, Grid Movement, Hexagon Grid, Sce... | Wargames |
| 2948 | 18OE: On the Rails of the Orient Express | 2014 | 2 | 8 | 720 | 267 | 8.12689 | 4.5417 | 584 | Auction/Bidding, Network and Route Building, S... | Strategy Games |
| 3512 | A World at War | 2003 | 1 | 8 | 2880 | 325 | 7.55554 | 4.8447 | 1307 | Hexagon Grid | Wargames |
| 4100 | Europa Universalis | 1993 | 1 | 6 | 3600 | 339 | 6.85814 | 4.9000 | 785 | Area Movement, Dice Rolling, Events, Movement ... | Wargames |
| 4503 | Red Storm: The Air War Over Central Germany, 1987 | 2019 | 1 | 2 | 720 | 106 | 8.43632 | 4.5263 | 599 | Dice Rolling, Grid Movement, Hexagon Grid, Sim... | Wargames |
| 4944 | La Grande Guerre 14-18 | 1999 | 2 | 6 | 3600 | 103 | 7.82427 | 4.9130 | 262 | Dice Rolling, Grid Movement, Hexagon Grid, Sec... | Wargames |
| 4974 | Wacht Am Rhein | 2005 | 1 | 4 | 240 | 127 | 7.40236 | 4.6207 | 568 | Hexagon Grid, Simulation | Wargames |
| 5193 | Triumph of Chaos v.2 (Deluxe Edition) | 2019 | 2 | 2 | 600 | 74 | 8.32027 | 4.9286 | 337 | Campaign / Battle Card Driven, Dice Rolling, M... | Wargames |
| 5238 | Second Front | 1994 | 2 | 2 | 1440 | 118 | 7.31525 | 4.6000 | 376 | Hexagon Grid | Wargames |
| 5255 | Advanced European Theater of Operations | 2001 | 2 | 5 | 360 | 90 | 7.58667 | 4.7500 | 232 | Dice Rolling, Hexagon Grid, Simulation | Wargames |
| 5714 | Edelweiss: The Struggle in the Caucasus | 1989 | 2 | 2 | 360 | 76 | 7.30000 | 4.6000 | 351 | Hexagon Grid, Ratio / Combat Results Table, Zo... | Wargames |
| 5768 | War in the Pacific (Second Edition) | 2006 | 2 | 6 | 480 | 83 | 7.64927 | 4.7778 | 343 | Hexagon Grid, Simulation | Wargames |
| 5887 | Korsun Pocket: Little Stalingrad on the Dnepr | 1979 | 2 | 6 | 6000 | 50 | 7.80300 | 4.5385 | 178 | Dice Rolling, Grid Movement, Hexagon Grid, Mov... | Wargames |
| 6129 | 1985: Deadly Northern Lights | 2020 | 2 | 4 | 5000 | 36 | 8.99167 | 4.6250 | 155 | Dice Rolling, Grid Movement, Hexagon Grid, Mov... | Wargames |
| 6182 | War in the Pacific: The Campaign Against Imper... | 1978 | 1 | 6 | 360 | 84 | 7.05952 | 4.7692 | 337 | Dice Rolling, Hexagon Grid, Simulation | Wargames |
| 6304 | America in Flames | 1998 | 2 | 6 | 360 | 90 | 6.54185 | 4.5333 | 403 | Hexagon Grid | Wargames |
| 6372 | Bloody Omaha: D-Day 1944 | 2009 | 2 | 2 | 180 | 42 | 7.95714 | 4.6364 | 174 | Hexagon Grid, Secret Unit Deployment, Simulation | Wargames |
| 6382 | Patton in Flames | 2000 | 2 | 3 | 120 | 62 | 6.96935 | 4.9091 | 355 | Hexagon Grid | Wargames |
| 6618 | Advanced Pacific Theater of Operations | 2009 | 2 | 5 | 360 | 39 | 7.66410 | 4.6364 | 198 | Dice Rolling, Hexagon Grid, Simulation | Wargames |
| 6643 | Prelude to Disaster: The Soviet Spring Offensive | 1992 | 2 | 2 | 180 | 56 | 6.91786 | 4.5556 | 290 | Dice Rolling, Grid Movement, Hexagon Grid, Mov... | Wargames |
| 6707 | Home Before the Leaves Fall: The Marne Campaig... | 1997 | 2 | 4 | 360 | 56 | 6.67857 | 4.6250 | 223 | Hexagon Grid | Wargames |
| 6958 | Renegade Legion: Prefect | 1992 | 2 | 2 | 1800 | 66 | 6.56818 | 4.7500 | 304 | Dice Rolling, Grid Movement, Hexagon Grid, Mov... | Wargames |
| 7002 | D-Day at Iwo Jima | 2018 | 1 | 2 | 360 | 52 | 6.71808 | 4.8889 | 322 | Campaign / Battle Card Driven, Dice Rolling, H... | Wargames |
| 7886 | Killer Angels | 1984 | 2 | 2 | 300 | 35 | 5.71429 | 4.5556 | 196 | Dice Rolling, Hexagon Grid, Simulation | Wargames |
| 8360 | Air War: Modern Tactical Air Combat | 1977 | 2 | 4 | 120 | 303 | 5.27673 | 4.6557 | 819 | Dice Rolling, Grid Movement, Hexagon Grid, Sce... | Wargames |
| 8433 | The Eagle and the Sun | 1991 | 2 | 2 | 0 | 65 | 3.64000 | 4.8000 | 176 | Chit-Pull System, Dice Rolling, Grid Movement,... | Wargames |
pd.Series(' '.join(complexboardgames['Mechanics']).split()).value_counts()
/ 27
Hexagon 27
Grid, 22
Rolling, 21
Dice 21
..
Role 1
Playing, 1
Selection, 1
Powers, 1
Team-Based 1
Length: 110, dtype: int64
The purpose of this EDA was to determine which of the variables have a positive relationship with rating average. Based on this analysis, there is a moderate positive relationship between complexity average and rating average (a correlation of 0.52). Interestingly, the top 8 board games by rating vary in complexity, with complexity averages ranging from about 2.8 to 4.6. Most of the highest rated games are wargames and thematic games, and they involve dice rolling, hexagon grids, and grid movement. The total number of owners for the top 8 board games by rating makes up about 0.02% of the total number of owners overall. This suggests that the top 8 board games are niche games.
For the top 10 most owned board games, there is a positive relationship between year published and rating average. The top 10 most owned board games are fairly new, with 7 of the 10 published in or after 2008. This was also the case for the top 8 board games by rating, with 7 of the 8 published in or after 2018. A possible explanation could be better marketing in recent years or the rise of review culture due to technology. The most owned board games include strategy games and family games and involve drafting and cards.
The most complex board games are mostly wargames, and their most common mechanics are hexagon grid and dice rolling.
If I were to design a board game that would hypothetically be popular among gamers and be rated highly by the community, I would have to take into account complexity and play time, since these variables have a moderate and a weak positive relationship with rating average, respectively. If my goal is to design a board game that would belong in the most owned category, I would design a strategy game or family game that involves drafting and cards. If my goal is to design a niche board game that would appeal to a specific subset of the board game community, I would design a wargame that involves hexagons, grids, and dice.