CS50 Final Project - Neural Benchmark

Video Demonstration: https://youtu.be/0P9fLhxqWTw

This repository is my final project, created as a capstone for the CS50x 2022 course. It contains a small arcade of five Pygame mini-games in which you play against CNN-based models that recognize your drawings and handwriting (letters, digits, doodles).

Features

  • Five games: Hangman, Number Memory, Wordle-style, SpeedType, SpeedDraw
  • Three TensorFlow CNNs (letters, digits, doodles) trained on public datasets
    • Validation Accuracy
      • Digit model: 99.5%
      • Letter model: 86%
      • Doodle model: 86%
  • Local accounts & highscore saving (SQLite)
  • Smooth drawing canvas with OpenCV processing

Quickstart

# Clone the repository and start the game:
git clone https://github.com/Jackryd/NeuralBenchmark.git
cd NeuralBenchmark

# create & activate a virtual env (recommended)
python -m venv .venv
# Windows: .venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate

pip install -r requirements.txt
python app.py

# If you want to try retraining one of the models:
python model_training/doodle_guesser.py
# but note that for this you will need to download the data from Kaggle

Game Controls

  • Left-click: draw
  • Right-click: submit
  • Space: back to menu
  • Esc: quit

Models

This repository includes pre-trained models that work out of the box, but you can of course add your own. The game looks for the following model files:

  • models/doodle_model.h5
  • models/letter_model.h5
  • models/number_model.h5
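A minimal sketch of how the game might locate and load these files (the mapping and function names here are illustrative, not the repo's actual code; loading requires TensorFlow):

```python
import os

# Hypothetical mapping from game mode to the model file the game expects
MODEL_PATHS = {
    "doodle": os.path.join("models", "doodle_model.h5"),
    "letter": os.path.join("models", "letter_model.h5"),
    "number": os.path.join("models", "number_model.h5"),
}

def load_model_for(mode):
    """Load the Keras model for a game mode (requires TensorFlow)."""
    from tensorflow.keras.models import load_model  # imported lazily
    path = MODEL_PATHS[mode]
    if not os.path.exists(path):
        raise FileNotFoundError(f"Missing model file: {path}")
    return load_model(path)
```

Dropping a differently-trained `.h5` file at one of these paths is enough to swap in your own model, as long as it accepts the same 28x28x1 input shape.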

How everything works

The Model Architectures

Three TensorFlow models have been created:

  • one for predicting doodles (human drawings of, e.g., apples, ambulances, and axes),
  • one for predicting hand-drawn digits,
  • and one for predicting hand-drawn letters.

They all use the same neural network architecture: three convolutional layers, chosen for their effectiveness in image recognition tasks. Each convolutional layer uses a ReLU activation function and is followed by a MaxPooling2D layer, which downsamples the feature map and in turn reduces the amount of computation needed. The output is then flattened and fed into dense layers, the last of which applies a softmax function. This squashes the logits into a probability distribution over all the categories, which is the model's prediction.

Here is the code for the letter model which takes a 28x28 monochrome matrix and outputs a discrete probability distribution over all the letters in the alphabet.

# Letter-predicting model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
# Input shape of 28x28x1 - 28x28 pixels with a single colour channel (no RGB)
# 3 convolutional layers using ReLU, each followed by a max pooling layer
model.add(Conv2D(32, (3, 3), padding="same", activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# Values flattened and fed into a dense network
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
# Final 26-output dense layer with a softmax activation, squashing the
# values into a probability distribution over the alphabet
model.add(Dense(26, activation='softmax'))
model.summary()

Data Processing

To work with the CNN models, the images for all tasks were converted into monochrome 28x28 bitmaps. The data sources, however, differ per model:

  • The doodle model uses the Quick, Draw! Doodle Recognition Challenge dataset from Google Research. Instead of using all 345 labels, only 19 were used to keep the model lightweight and quick. The data is distributed as vector (stroke) representations, which were converted into bitmaps and resized to 28x28 pixels to work with the CNN.
  • The number prediction model uses a digit recognition dataset from Kaggle.
  • The letter prediction model uses a Kaggle handwritten letter dataset. Interestingly, all the I's in the dataset were written as: [image of an I]. However, many people write I's as straight lines. To fix this, a subset of the 1's from the digit dataset was added, since many of them are simply straight lines. This was enough for the model to learn that straight lines can be I's.
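The canvas-to-bitmap conversion described above can be sketched without any external dependencies. The repository uses OpenCV for this processing; the block-averaging below is a NumPy-only illustration of the same downsampling idea, with illustrative names and sizes:

```python
import numpy as np

def canvas_to_bitmap(canvas, size=28):
    """Downsample a square binary drawing canvas to a size x size
    grayscale bitmap by block-averaging (the repo uses OpenCV's
    resize; this is a dependency-free sketch of the same idea)."""
    h, w = canvas.shape
    assert h == w and h % size == 0, "sketch assumes an exact multiple"
    block = h // size
    # Average each block x block patch into one output pixel
    small = canvas.reshape(size, block, size, block).mean(axis=(1, 3))
    # Values stay in [0, 1]; add the channel axis the CNN expects
    return small.astype("float32")[..., np.newaxis]

canvas = np.zeros((280, 280))
canvas[100:180, 130:150] = 1.0  # a vertical stroke, e.g. the digit 1
bitmap = canvas_to_bitmap(canvas)
print(bitmap.shape)  # (28, 28, 1)
```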

Model Training

During training, all the models used categorical crossentropy, since all tasks are multiclass classification, and the Adam optimizer, chosen for its strong general performance. The models were trained with a batch size of 32; example training curves are shown below.
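To make the loss choice concrete: categorical crossentropy compares the softmax output to the one-hot label and penalizes confident wrong answers heavily. A small NumPy illustration of the loss itself (not the repo's training script):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(-(y_true * np.log(y_pred)).sum(axis=1).mean())

# One-hot label for class 2 out of 4, and two candidate predictions
y_true = np.array([[0.0, 0.0, 1.0, 0.0]])
confident = np.array([[0.01, 0.01, 0.97, 0.01]])
unsure = np.array([[0.25, 0.25, 0.25, 0.25]])
print(categorical_crossentropy(y_true, confident))  # ~0.030
print(categorical_crossentropy(y_true, unsure))     # ~1.386
```

A confident correct prediction gives a loss near zero; a uniform guess over 4 classes gives -ln(0.25) ≈ 1.386.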

Doodle model: accuracy & loss curves (image)

Digit model: accuracy & loss curves (image)

The games

In all the games, the user draws on the screen. The game tracks the mouse's x and y coordinates, and while the left mouse button is held, a small rectangle is drawn at those coordinates (where the specific game's rules allow it). In most games the player also right-clicks to "submit" a drawing, which prompts the appropriate model. Below is a description of how the different games work.

  • Hangman! The player draws letters and submits them. The screen is then cleared; if the letter exists in the word, the player is told where, but if it is incorrect the player loses a life.
  • Number Memory! This game is played in rounds: in the first round a single number is briefly shown on screen, and the player must draw and submit it. If the number is correctly identified, the player moves on to the next level with two numbers, and so on. This continues until the player guesses incorrectly.
  • Wordle! A 5x5 grid of black squares is drawn on the screen, and the player must guess which letter belongs at each position. If a submitted letter is in the right place, its square turns green; if it is in the word but in the wrong square, it turns yellow; otherwise it turns grey. If the player guesses the correct word, they win; if they cannot guess the word in time, the correct word is displayed and the player loses.
  • SpeedType! The player draws inside a white rectangle displayed on the screen. At the top of the screen is the sentence the player must write; below the rectangle, the current letter to write and the elapsed time are shown. The player draws a letter and submits it; if the model predicts correctly, the player moves on to the next letter, otherwise they keep rewriting that letter until it is recognized. When the sentence is finished, the game ends and the player can save a words-per-minute score.
  • SpeedDraw! SpeedDraw works the same way as SpeedType, except that it uses the doodle-recognition model instead of the letter-recognition model.
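The per-frame draw/submit logic shared by the games can be sketched in a dependency-free way. The real game uses Pygame's event loop and mouse state; the names below are illustrative, not the repo's actual functions:

```python
# Sketch of the per-frame drawing logic described above.
BRUSH = 8  # side length of the small rectangle stamped at the cursor

def handle_frame(strokes, mouse_pos, left_down, right_down, in_canvas):
    """Append a brush rectangle while the left button is held inside the
    canvas; a right click signals that the drawing should be submitted."""
    if left_down and in_canvas:
        x, y = mouse_pos
        strokes.append((x - BRUSH // 2, y - BRUSH // 2, BRUSH, BRUSH))
    return right_down  # True -> run the drawing through the model

strokes = []
submit = handle_frame(strokes, (120, 80), left_down=True,
                      right_down=False, in_canvas=True)
print(len(strokes), submit)  # 1 False
```

In Pygame terms, `mouse_pos` would come from `pygame.mouse.get_pos()` and the button states from `pygame.mouse.get_pressed()`, with each stored rectangle drawn to the screen every frame.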

Gameplay gif

The login and register functions

The player can register an account and log in. Accounts live in an SQLite database, in a table storing each player's username, a hash of their password, a player ID, and a number of strings holding all of their scores. When a player tries to create an account, it is added to the database if the username is not taken and the password meets all the requirements. The player can then save scores after finishing a game, and can log in later by clicking the log-in button and entering the correct username and password.
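The register flow above can be sketched with Python's standard library alone. Table and column names here are illustrative; the repo's actual schema and hashing scheme may differ:

```python
import hashlib
import os
import sqlite3

db = sqlite3.connect(":memory:")  # the real game would use a file on disk
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, "
           "username TEXT UNIQUE, pw_hash TEXT, salt TEXT)")

def register(username, password):
    """Store a salted password hash; refuse duplicate usernames."""
    salt = os.urandom(16).hex()
    pw_hash = hashlib.sha256((salt + password).encode()).hexdigest()
    try:
        db.execute("INSERT INTO users (username, pw_hash, salt) VALUES (?, ?, ?)",
                   (username, pw_hash, salt))
        db.commit()
        return True
    except sqlite3.IntegrityError:  # username already taken
        return False

print(register("alice", "s3cret!"))  # True
print(register("alice", "other"))    # False - name is taken
```

Logging in would redo the hash with the stored salt and compare it against `pw_hash`, so the plaintext password is never stored.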

Register gif

The GUI

The GUI is created using Pygame. There is one main screen where the user chooses which game to play by pressing the corresponding button, which takes them to the screen for that game.
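The menu's button dispatch reduces to a point-in-rectangle test on each click. A minimal sketch (button layout and names are illustrative, not taken from the repo's source):

```python
# Each button: (x, y, width, height) in screen coordinates
BUTTONS = {
    "Hangman": (100, 100, 200, 60),
    "Number Memory": (100, 180, 200, 60),
}

def clicked_game(pos):
    """Return the game whose button contains the click, if any."""
    px, py = pos
    for name, (x, y, w, h) in BUTTONS.items():
        if x <= px < x + w and y <= py < y + h:
            return name
    return None

print(clicked_game((150, 130)))  # Hangman
print(clicked_game((10, 10)))    # None
```

In Pygame this is what `pygame.Rect.collidepoint` does; on a hit, the main loop would switch to the chosen game's screen.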

Conclusion

This project was incredibly educational and fun, and I definitely learned a lot. I started with next to no knowledge about neural networks or machine learning, but after watching a freeCodeCamp tutorial and the CS50 AI seminar, and reading a lot of TensorFlow documentation, I felt I had learnt enough to start creating my own models. This took quite a lot of time: I read numerous articles comparing loss functions, activation functions, and more, and experimented a lot to improve my models as much as possible. I also devoted a lot of time to creating the Pygame games and GUI. I had a little previous experience with Pygame, but I still had to read a lot of documentation while building this project and, just like the rest of CS50, it has definitely taught me a lot.

This was Neural Benchmark, my CS50 Final Project

Acknowledgements

Audio: https://pixabay.com/

Images: https://www.freepik.com/

Helpful websites and sources:

Datasets:

Thanks for a brilliant course: https://cs50.harvard.edu/x/2022/
