The source of the file is IEEE or Institute of Electrical and Electronics Engineers (https://ieee-dataport.org/open-access/boardgamegeek-dataset-board-games). According to their website, "IEEE is the world’s largest technical professional organization dedicated to advancing technology for the benefit of humanity." The dataset contains data collected on board games from the BoardGameGeek (BGG) website in Februay 2021. boardgames.info()The voluntary online community shares reviews and ratings on the website, and the data set contains all ranked games as of February 2021.
import pandas as pd
boardgames = pd.read_excel('/Users/sheilaacar/Documents/BGG_Data_Set.xlsx')
boardgames.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 20343 entries, 0 to 20342 Data columns (total 14 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 ID 20327 non-null float64 1 Name 20343 non-null object 2 Year Published 20342 non-null float64 3 Min Players 20343 non-null int64 4 Max Players 20343 non-null int64 5 Play Time 20343 non-null int64 6 Min Age 20343 non-null int64 7 Users Rated 20343 non-null int64 8 Rating Average 20343 non-null float64 9 BGG Rank 20343 non-null int64 10 Complexity Average 20343 non-null float64 11 Owned Users 20320 non-null float64 12 Mechanics 18745 non-null object 13 Domains 10184 non-null object dtypes: float64(5), int64(6), object(3) memory usage: 2.2+ MB
boardgames.head()
| ID | Name | Year Published | Min Players | Max Players | Play Time | Min Age | Users Rated | Rating Average | BGG Rank | Complexity Average | Owned Users | Mechanics | Domains | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 174430.0 | Gloomhaven | 2017.0 | 1 | 4 | 120 | 14 | 42055 | 8.79244 | 1 | 3.8604 | 68323.0 | Action Queue, Action Retrieval, Campaign / Bat... | Strategy Games, Thematic Games |
| 1 | 161936.0 | Pandemic Legacy: Season 1 | 2015.0 | 2 | 4 | 60 | 13 | 41643 | 8.61278 | 2 | 2.8405 | 65294.0 | Action Points, Cooperative Game, Hand Manageme... | Strategy Games, Thematic Games |
| 2 | 224517.0 | Brass: Birmingham | 2018.0 | 2 | 4 | 120 | 14 | 19217 | 8.66337 | 3 | 3.9129 | 28785.0 | Hand Management, Income, Loans, Market, Networ... | Strategy Games |
| 3 | 167791.0 | Terraforming Mars | 2016.0 | 1 | 5 | 120 | 12 | 64864 | 8.43254 | 4 | 3.2406 | 87099.0 | Card Drafting, Drafting, End Game Bonuses, Han... | Strategy Games |
| 4 | 233078.0 | Twilight Imperium: Fourth Edition | 2017.0 | 3 | 6 | 480 | 14 | 13468 | 8.69649 | 5 | 4.2219 | 16831.0 | Action Drafting, Area Majority / Influence, Ar... | Strategy Games, Thematic Games |
I decided to drop the columns that would not be useful for the analysis such as ID and Users Rated.
boardgames = boardgames.drop(columns=['ID','Users Rated'])
I dropped outliers with few owned users. The 25th percentile for owned users is 146. I decided that these games are unpopular, compared to the mean of 1408 owned users. As a result, I decided to drop the games with less than 146 users.
boardgames.drop(boardgames[(boardgames['Owned Users'] < 146)].index, inplace=True)
I noticed from the descriptive statistics that the max for play time is 6000 minutes. I wanted to inspect this further by creating a boxplot of the data. Based on the boxplot, I decided to drop the rows with play time above 10000.
boardgames.boxplot(grid=False, rot=45, column = ['Play Time'])
<Axes: >
boardgames.drop(boardgames[(boardgames['Play Time'] > 10000)].index, inplace=True)
I dropped missing values from the dataframe and dropped duplicates.
boardgames = boardgames.dropna(inplace = False)
boardgames.info()
boardgames.shape
<class 'pandas.core.frame.DataFrame'> Index: 8623 entries, 0 to 20342 Data columns (total 12 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Name 8623 non-null object 1 Year Published 8623 non-null float64 2 Min Players 8623 non-null int64 3 Max Players 8623 non-null int64 4 Play Time 8623 non-null int64 5 Min Age 8623 non-null int64 6 Rating Average 8623 non-null float64 7 BGG Rank 8623 non-null int64 8 Complexity Average 8623 non-null float64 9 Owned Users 8623 non-null float64 10 Mechanics 8623 non-null object 11 Domains 8623 non-null object dtypes: float64(4), int64(5), object(3) memory usage: 875.8+ KB
(8623, 12)
boardgames.drop_duplicates()
| Name | Year Published | Min Players | Max Players | Play Time | Min Age | Rating Average | BGG Rank | Complexity Average | Owned Users | Mechanics | Domains | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Gloomhaven | 2017.0 | 1 | 4 | 120 | 14 | 8.79244 | 1 | 3.8604 | 68323.0 | Action Queue, Action Retrieval, Campaign / Bat... | Strategy Games, Thematic Games |
| 1 | Pandemic Legacy: Season 1 | 2015.0 | 2 | 4 | 60 | 13 | 8.61278 | 2 | 2.8405 | 65294.0 | Action Points, Cooperative Game, Hand Manageme... | Strategy Games, Thematic Games |
| 2 | Brass: Birmingham | 2018.0 | 2 | 4 | 120 | 14 | 8.66337 | 3 | 3.9129 | 28785.0 | Hand Management, Income, Loans, Market, Networ... | Strategy Games |
| 3 | Terraforming Mars | 2016.0 | 1 | 5 | 120 | 12 | 8.43254 | 4 | 3.2406 | 87099.0 | Card Drafting, Drafting, End Game Bonuses, Han... | Strategy Games |
| 4 | Twilight Imperium: Fourth Edition | 2017.0 | 3 | 6 | 480 | 14 | 8.69649 | 5 | 4.2219 | 16831.0 | Action Drafting, Area Majority / Influence, Ar... | Strategy Games, Thematic Games |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 20337 | Trouble | 1965.0 | 2 | 4 | 45 | 4 | 3.79191 | 20339 | 1.0541 | 4962.0 | Roll / Spin and Move | Children's Games |
| 20339 | Bingo | 1530.0 | 2 | 99 | 60 | 5 | 2.85331 | 20341 | 1.0455 | 1533.0 | Betting and Bluffing, Bingo, Pattern Recognition | Party Games |
| 20340 | Candy Land | 1949.0 | 2 | 4 | 30 | 3 | 3.17792 | 20342 | 1.0779 | 5788.0 | Roll / Spin and Move | Children's Games |
| 20341 | Chutes and Ladders | -200.0 | 2 | 6 | 30 | 3 | 2.85567 | 20343 | 1.0201 | 4400.0 | Dice Rolling, Grid Movement, Race, Roll / Spin... | Children's Games |
| 20342 | Tic-Tac-Toe | -1300.0 | 2 | 2 | 1 | 4 | 2.67544 | 20344 | 1.1580 | 1374.0 | Paper-and-Pencil, Pattern Building | Abstract Games, Children's Games |
8623 rows × 12 columns
boardgames.to_csv('/Users/sheilaacar/Documents/boardgames_clean.csv', header=True, index=False)