Analysis of Board Game Data to Design a Perfect Board Game

Sheila Acar

09 March 2022

Introduction

As an avid board game player and collector, I have an ambition to design a board game someday. This led me to search for existing data on board games, and I came across a dataset from the BoardGameGeek (BGG) website. BGG is the largest online collection of board game data, and the online board game community voluntarily contributes reviews and ratings to the site. The purpose of this exploratory data analysis (EDA) is to understand board game players better. I hoped to gain insight into which types of board games are rated highly and purchased more often, an understanding that would help me design a board game in the future.

I predicted that play time, complexity, and domain affect rating average. Specifically, I expected strategy games to be rated more highly than other types of games because of the intellectual challenge they provide, and I expected games with medium complexity and average play time to be rated highly. I was also curious whether I would find trends in game mechanics. For example, do the highest-rated games involve mechanics such as dice rolling or storytelling?


Data Explained

The data used in this analysis can be found at: BOARDGAMEGEEK DATASET ON BOARD GAMES

This dataset contains all ranked games in the BGG database as of February 2021. After inspecting the data, I dropped the columns that would not be useful for the analysis, such as ID, Name, Min Age, and BGG Rank. I also removed outliers: games rated by fewer than 146 users (below the 25th percentile), games with a Year Published value below 0, and games with a play time above 10,000 minutes (based on the boxplots for Year Published and Play Time). I converted Year Published and Owned Users from floats to integers. Finally, I dropped rows with null values. Below is a data dictionary of the variables used in my analysis:
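The cleaning steps above can be sketched in pandas as follows. This is a minimal illustration, not the code used for this analysis: it assumes the BGG data is already loaded into a DataFrame with the column names from the data dictionary, the `raw` DataFrame is made-up data standing in for the real file, and `clean` is a hypothetical helper. The 146-user cutoff is passed in as a parameter rather than recomputed from the 25th percentile.

```python
import pandas as pd

# Columns dropped as not useful for the analysis (from the text above).
DROP_COLS = ["ID", "Name", "Min Age", "BGG Rank"]


def clean(df: pd.DataFrame, min_users: int = 146) -> pd.DataFrame:
    """Hypothetical sketch of the cleaning steps; not the original analysis code.

    min_users defaults to 146, the 25th-percentile cutoff reported in the text.
    """
    out = df.drop(columns=DROP_COLS, errors="ignore")
    # Keep games with enough ratings, a non-negative year, and play time up to 10,000 minutes.
    out = out[
        (out["Users Rated"] >= min_users)
        & (out["Year Published"] >= 0)
        & (out["Play Time"] <= 10_000)
    ]
    # Nulls must be gone before casting floats to int64, so drop them first here.
    out = out.dropna()
    out = out.astype({"Year Published": "int64", "Owned Users": "int64"})
    return out.reset_index(drop=True)


# Made-up rows that exercise each filter (not real BGG data).
raw = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Year Published": [2019.0, -500.0, 2010.0, 2015.0, 2018.0, 2020.0],
    "Play Time": [120, 60, 12_000, 90, 45, 30],
    "Users Rated": [500, 300, 800, 100, 250, 400],
    "Rating Average": [8.1, 6.5, 7.0, 6.9, 7.4, 7.2],
    "Owned Users": [900.0, 400.0, 1200.0, 150.0, None, 300.0],
    "Min Age": [12, 8, 14, 10, 10, 8],
    "BGG Rank": [10, 50, 20, 90, 30, 40],
})

cleaned = clean(raw)
print(cleaned)
```

Dropping nulls before the cast is deliberate in this sketch: pandas refuses to cast a float column that still contains NaN to int64.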

Variable | Description | Data Type
Year Published | Year published | int64
Min Players | Minimum number of players recommended | int64
Max Players | Maximum number of players recommended | int64
Play Time | Playing time (minutes) | int64
Users Rated | Number of users that rated the game | int64
Rating Average | Average rating received by the game | float64
Complexity Average | Average complexity value of the game | float64
Owned Users | Number of BGG registered owners of the game | int64
Mechanics | Mechanics used by the game | object
Domains | Board game domains that the game belongs to | object

Results

Correlations

After cleaning the data, I used visualizations to understand the relationships between variables. Correlation observations:
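One way to back the visual impressions with numbers is to compute the Pearson correlation of each numeric variable with Rating Average. The sketch below assumes a cleaned DataFrame with the data dictionary's column names; `rating_correlations` is a hypothetical helper and the `toy` values are invented, so its output says nothing about the real BGG data.

```python
import pandas as pd


def rating_correlations(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of each numeric column with Rating Average, strongest first."""
    numeric = df.select_dtypes(include="number")  # skips text columns like Mechanics
    corr = numeric.corr()["Rating Average"].drop("Rating Average")
    return corr.sort_values(ascending=False)


# Invented values for illustration only.
toy = pd.DataFrame({
    "Year Published": [2001, 2010, 2015, 2020],
    "Play Time": [30, 90, 60, 120],
    "Rating Average": [6.0, 7.0, 8.0, 9.0],
    "Complexity Average": [1.5, 2.5, 3.5, 4.5],
    "Mechanics": ["Dice Rolling", "Drafting", "Hand Management", "Hexagon Grid"],
})

print(rating_correlations(toy))
```

Pearson correlation only captures linear association, so a value near zero does not rule out a curved relationship such as "medium complexity rates best".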

The Top 8 Board Games based on Rating Average

The top 8 board games based on rating average are shown below. The board game with the highest rating is Primer: The Gamer's Source for Battles from the Age of Reason, with a rating average of 9.14079. The first bubble chart shows the positive correlation between complexity and rating average. The second bubble chart shows that most of the highest-rated board games are recent, published after 2018. The most common domains are thematic games and wargames. The most common mechanics are dice rolling, grid, and hexagon movement.
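Selecting the top-rated games and tallying their domains and mechanics can be sketched as below. It assumes that Mechanics and Domains store several labels in one comma-separated string (for example "Dice Rolling, Hexagon Grid"); `count_tags` is a hypothetical helper, the data is invented, and `n=3` stands in for the top 8 used above.

```python
import pandas as pd


def count_tags(series: pd.Series) -> pd.Series:
    """Count labels in comma-separated strings such as "Dice Rolling, Hexagon Grid"."""
    return series.str.split(",").explode().str.strip().value_counts()


# Invented games for illustration; not the real top-rated list.
toy = pd.DataFrame({
    "Rating Average": [9.1, 8.7, 6.2, 8.9, 5.5],
    "Mechanics": [
        "Dice Rolling, Hexagon Grid",
        "Dice Rolling, Grid Movement",
        "Hand Management",
        "Hexagon Grid, Dice Rolling",
        "Drafting",
    ],
    "Domains": ["Wargames", "Thematic Games", "Family Games", "Wargames", "Family Games"],
})

top = toy.nlargest(3, "Rating Average")  # the text uses the top 8
mech_counts = count_tags(top["Mechanics"])
domain_counts = count_tags(top["Domains"])
print(mech_counts)
print(domain_counts)
```

Splitting and exploding first means a game with three mechanics contributes one count to each, rather than being counted as a single combined label.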

The Top 10 Board Games based on Owned Users

The top 10 board games based on owned users are shown below. The most owned board game is Pandemic, with 155,312 owners. The first bubble chart shows the positive correlation between complexity and rating average. The second bubble chart shows a positive relationship between year published and rating average. The most common domains are strategy games and family games. The most common mechanics are drafting, hand management, and variable set. The visualizations suggest that more complex games tend to receive higher ratings and that newer games tend to be rated higher than older ones. One possible explanation for the second trend is that newer games have more ratings.
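The same idea applies to ownership: take the most-owned games and check whether Year Published and Rating Average move together within that group. This is a hedged sketch with invented numbers, a hypothetical helper `top_owned_year_rating`, and `n=3` in place of the top 10; a correlation on so few games is only a rough cross-check of what the bubble chart shows, not a statistical test.

```python
import pandas as pd


def top_owned_year_rating(df: pd.DataFrame, n: int = 10) -> tuple[pd.DataFrame, float]:
    """Return the n most-owned games and the Year Published / Rating Average correlation among them."""
    top = df.nlargest(n, "Owned Users")
    return top, top["Year Published"].corr(top["Rating Average"])


# Invented values for illustration only.
toy = pd.DataFrame({
    "Year Published": [2008, 1995, 2015, 2017, 2000],
    "Rating Average": [7.6, 5.0, 7.9, 8.1, 6.0],
    "Owned Users": [150_000, 1_000, 90_000, 50_000, 200],
})

top, r = top_owned_year_rating(toy, n=3)
print(top)
print(f"Pearson r (year vs. rating, top 3 by owners): {r:.2f}")
```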

Complex Board Games

The most complex board games, with a complexity above 4.5, are shown below. The most common domains are war games and strategy games. The most common mechanics are hexagon grid and dice rolling.
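Filtering for the most complex games and finding their most frequent domains and mechanics might look like this, again assuming comma-separated label strings. `most_common_tags` is a hypothetical helper built on `collections.Counter`, and the data is made up.

```python
from collections import Counter

import pandas as pd


def most_common_tags(series: pd.Series, k: int = 2) -> list[tuple[str, int]]:
    """Return the k most frequent labels across comma-separated strings."""
    counts = Counter(tag.strip() for cell in series for tag in cell.split(","))
    return counts.most_common(k)


# Invented games for illustration only.
toy = pd.DataFrame({
    "Complexity Average": [4.6, 4.8, 2.0, 4.7, 3.9],
    "Domains": ["Wargames", "Wargames, Strategy Games", "Family Games",
                "Strategy Games", "Wargames"],
    "Mechanics": ["Hexagon Grid, Dice Rolling", "Hexagon Grid", "Drafting",
                  "Dice Rolling, Simulation", "Dice Rolling"],
})

complex_games = toy[toy["Complexity Average"] > 4.5]  # strictly above 4.5, as in the text
top_domains = most_common_tags(complex_games["Domains"])
top_mechanics = most_common_tags(complex_games["Mechanics"])
print(top_domains)
print(top_mechanics)
```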

Summary

The purpose of this EDA was to determine which variables have a positive relationship with rating average. Based on this analysis, there is a positive relationship between complexity average and rating average. Interestingly, the top 8 board games by rating vary in complexity, with complexity averages ranging from 2.7 to 4.6. Most of the highest-rated games are war games and thematic games and involve dice rolling, grid, and hexagon mechanics. The combined number of owners of the top 8 board games by rating makes up only 0.01% of all owners in the dataset. This suggests that the top 8 board games are niche games.
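The ownership share quoted above is a simple ratio: the owners of the top-rated games divided by the owners of all games, times 100. The sketch below assumes the cleaned DataFrame's Rating Average and Owned Users columns; `owner_share` is a hypothetical name and the toy numbers are invented.

```python
import pandas as pd


def owner_share(df: pd.DataFrame, n: int = 8) -> float:
    """Percentage of all owners that belong to the n highest-rated games."""
    top = df.nlargest(n, "Rating Average")
    return top["Owned Users"].sum() / df["Owned Users"].sum() * 100


# Invented values for illustration only.
toy = pd.DataFrame({
    "Rating Average": [9.0, 8.5, 7.0, 6.5],
    "Owned Users": [10, 30, 4_000, 5_960],
})

share = owner_share(toy, n=2)
print(f"{share:.2f}% of owners")
```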

For the top 10 most owned board games, there is a positive relationship between year published and rating average. The most owned games are fairly new, with 7 of the top 10 published in or after 2008. The same holds for the top 8 board games by rating, with 7 of the 8 published after 2018. A possible explanation could be better marketing in recent years or the rise of an online review culture driven by technology. The most owned board games include strategy games and family games and involve drafting and cards.

The most complex board games are mostly war games, and their most common mechanics are hexagon grid and dice rolling.

If I were to design a board game that would hypothetically be popular among gamers and rated highly by the community, I would have to take complexity and play time into account, since these variables have a moderate positive relationship with rating average. If my goal were to design a board game that belongs among the most owned, I would design a strategy game or family game that involves drafting and cards. If my goal were to design a niche board game that appeals to a specific subset of the board game community, I would design a war game that involves hexagons, grids, and dice.