Evolving a Neural Network to play Super Mario Bros using Neuroevolution of Augmenting Topologies

Vivek Verma

Summary

I have always loved the Mario series - from Super Mario Bros to Super Mario Sunshine to Super Mario Galaxy. When I dived deep into the field of machine learning, I wondered whether it was possible to get a neural network to finish a level of Super Mario Bros. I expected the neural network to be able to finish a single level.

I first tried using Reinforcement Learning, but quickly discovered that Neuroevolution was the way to go. By using an input matrix instead of raw pixel values as inputs, I was able to begin the evolution process. Training was greatly accelerated by parallel computing, which cut my run time to about a quarter.

This has been done in various forms and on other games. It has also been done on this same game, although using a different approach. Further research showed that even relatively simple Neuroevolution can achieve far better results than highly complex versions of algorithms like Gradient Descent.

This is a small step towards General Artificial Intelligence - a model that can perform multiple tasks. This project also shows the effectiveness of the NEAT algorithm and how it can outperform Reinforcement Learning algorithms.

The next step would be to get the neural network to beat the whole game, which would be an even larger leap towards GAI.

I learnt a lot from this project, and would love to see where the field of AI goes in the upcoming years.

Source Code: https://github.com/vivek3141/super-mario-neat

Question / Proposal

My question is "Is it possible to get an A.I. to play Super Mario Bros?". 

I expected the program to be able to finish a level some of the time, not every time.

Machine Learning and Artificial Intelligence are definitely the biggest buzzwords today. I have been diving deep into this field for a while, starting with Andrew Ng's Coursera course. Applying what I learnt to a game I like seemed like a good transition into the third category of Machine Learning - Reinforcement Learning.


Research

There have been many cases of people researching very similar things. For example, I was intrigued by this video, which introduced me to the NEAT algorithm. At the time I watched that video, I barely knew how a neural network works or the maths behind it. Researching further, I found this paper, which became my primary source for implementing the algorithm. This type of project is one step towards General Artificial Intelligence - an A.I. able to perform multiple tasks.

Further research showed that Neuroevolution can give far better results than Gradient Descent or other Reinforcement Learning algorithms. This is well documented in a research paper by Uber AI Labs.

Method / Testing and Redesign

I tested the model by running 3 tests and calculating the percentage of runs in which the level was completed. Running multiple tests helps cover different scenarios, as enemy spawning is random to a certain degree. The testing is fair, since I used the same level each time, and all tests were run on my own computer.

The first approach I tried was Reinforcement Learning, specifically Deep Q-Learning. Q-Learning uses something called a Q-Table, which stores the value of the best action for every possible state. DQN extends this by passing the state through a neural network instead of looking it up in a table. Although I got mixed results, I believed Neuroevolution would give me much better results, so I decided to use that.
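To make the Q-Table idea concrete, here is a toy sketch of a single tabular Q-Learning update (not the DQN I actually ran); the state and action counts, reward, and learning rate are made up for the example:

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    # Q-Learning update rule:
    # Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

# Toy Q-Table: 3 states x 2 actions, initialized to zero.
q = [[0.0, 0.0] for _ in range(3)]
q_update(q, state=0, action=1, reward=1.0, next_state=2)  # one observed transition
```

DQN replaces the table lookup with a neural network that predicts the Q-values of each action from the current state, which is what makes it feasible for games with far too many states to tabulate.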

I first wanted to try Neuroevolution on a simpler task. OpenAI's gym library makes it easy to set up an environment. I used a prebuilt environment for the popular game Flappy Bird and ran the NEAT algorithm for about 5 minutes. The results were amazing, achieving a high score of 123.
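To give a feel for the underlying idea, here is a minimal Neuroevolution loop - fixed topology with weight mutation only, whereas NEAT additionally evolves the network structure. The fitness function is a hypothetical stand-in for playing one game episode:

```python
import random

random.seed(0)  # make the sketch reproducible

def fitness(genome):
    # Hypothetical stand-in for playing one episode:
    # fitness is highest when the weights are close to a fixed target.
    target = [0.5, -0.3, 0.8]
    return -sum((w - t) ** 2 for w, t in zip(genome, target))

def evolve(pop_size=20, generations=50, sigma=0.1):
    # Start from a random population of fixed-length weight vectors.
    population = [[random.uniform(-1, 1) for _ in range(3)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 4]  # keep the fittest 25%
        children = []
        while len(parents) + len(children) < pop_size:
            parent = random.choice(parents)
            # Mutate each weight with small Gaussian noise.
            children.append([w + random.gauss(0, sigma) for w in parent])
        population = parents + children
    return max(population, key=fitness)

best = evolve()
```

The real NEAT algorithm adds speciation, crossover, and structural mutations on top of this select-and-mutate loop, but the core idea of evolving fitter genomes generation by generation is the same.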

The Super Mario Bros game runs on a console called the NES, which has many emulators available. Most of these emulators support plugins written in a language called Lua. My plan was to write a Lua script to act as an interface between my Python program and the game, since my preferred language is Python. Doing some more digging, I found a prewritten Lua script that I could use. This aided the project dramatically, since it included an interface that returns a 16x13 tile matrix instead of an image.
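As a sketch of how the matrix interface simplifies the network's inputs - the tile codes below are my own assumption for illustration; the real values come from the Lua script:

```python
# Hypothetical tile codes; the actual values come from the prewritten Lua script.
EMPTY, SOLID, ENEMY = 0, 1, -1

def matrix_to_inputs(tiles):
    # Flatten the 16x13 tile grid (13 rows of 16 tiles here) into a
    # 208-element vector, versus 61440 inputs for a raw NES frame.
    assert len(tiles) == 13 and all(len(row) == 16 for row in tiles)
    return [float(t) for row in tiles for t in row]

# A mostly empty screen with one enemy and one piece of ground.
screen = [[EMPTY] * 16 for _ in range(13)]
screen[6][3] = ENEMY
screen[12][0] = SOLID
inputs = matrix_to_inputs(screen)
```

Each input neuron then corresponds to one on-screen tile rather than one pixel, which keeps the initial networks small enough to evolve.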

The biggest gain from using this was run time. The poor performance of my computer meant that everything had to be optimized. Using the raw image meant 61440 inputs (256x240 pixels), which prevented the Neuroevolution process from even beginning: when I tried randomly initializing the neural network, the large number of weights and biases caused the program to crash. Using the matrix, I was able to start the evolution process.

Although the matrix cut run time considerably, training was still far too slow, taking upwards of 1000 seconds per generation. By using parallel computing, I was able to evaluate multiple networks at once. This dramatically reduced the time per generation, from about 1000 seconds down to 250.
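The parallel evaluation can be sketched with a worker pool; here a thread pool and a dummy fitness function stand in for running full game episodes in separate emulator instances (the neat-python library also provides a ParallelEvaluator helper built on multiprocessing):

```python
import time
from multiprocessing.pool import ThreadPool  # a process pool works the same way

def evaluate(genome):
    # Stand-in for running one genome through a full level attempt;
    # in the real project this step takes seconds per genome.
    time.sleep(0.01)
    return sum(genome)  # hypothetical fitness

population = [[0.1 * i, 0.2 * i] for i in range(8)]

# Evaluate the whole population concurrently instead of one genome at a time.
with ThreadPool(processes=4) as pool:
    fitnesses = pool.map(evaluate, population)
```

Because each genome's evaluation is independent of the others, the population splits cleanly across workers, which is why the speedup roughly matched the number of parallel evaluations.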

Results

I ran the program for about 3 weeks, taking breaks in between, so the effective time spent training was about 1 week. Although the program made great progress, it didn't seem to complete the level. Unfortunately, I had set the stopping barrier past the finish line, so the program never registered a completion and kept running indefinitely. While watching the neural network play, I noticed that one of the genomes in generation 2284 had actually finished the level, but the program kept going. Since I had saved backups for each generation, I stopped the program, fixed the code, and reran from generation 2284 until I had a successful model.

You can find the final result here.

Conclusion

The result was a trained neural network with over 210 connections that can reliably complete the 1st level of Super Mario Bros. This partially answers my original question - yes, it can beat a level, but it can't beat all of them. This result matches my predicted outcome because I had chosen a reasonable prediction.

The results are mostly reliable. There might be some lag between capturing each frame and running it through the network, which might hinder performance, but for the most part the model is consistent. This could be improved by running it on Google Cloud Platform or on a far more powerful machine.

I consider my experiment extremely successful. I learnt the nitty-gritty of how Neuroevolution works, how NES emulators handle plugins, and how to use parallel computing.

This project has made me ask many more questions, such as "What are the limitations of current Machine Learning models?" From the research I have done, current models cannot solve very complex tasks where there is no defined objective. Although there has been some progress from companies like OpenAI, we are far from replacing human tasks with machine learning.

About me

Hi! My name is Vivek Verma, and I've been intrigued by Computer Science ever since I learnt Python in 6th grade. I love solving computer science problems, and diving deep into the field of machine learning sparked my interest even more.

Having lived in two of the biggest tech areas in the world - Silicon Valley, CA and Bangalore, India - I have been exposed to the field of CS for a while. A great inspiration came from my dad, who works in computer science.

The biggest inspiration for me would definitely be Sir Isaac Newton and his discoveries. I have studied a lot about his theories and his history, and what amazes me is how far ahead of his time he was.

Bibliography, references, and acknowledgements

Bibliography

Such, Felipe Petroski, et al. Deep Neuroevolution: Genetic Algorithms are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning. https://arxiv.org/pdf/1712.06567.pdf.

Stanley, Kenneth O., and Risto Miikkulainen. Evolving Neural Networks through Augmenting Topologies. http://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf.

Gym-Super-Mario-Bros, https://github.com/ppaquette/gym-super-mario.

NEAT-Python documentation, https://neat-python.readthedocs.io/en/latest/.


Everything was done on my own, with no help from outside sources.