Answering the age-old buzzfeed question of whether you’re a rat or a frog depending on your face seemed like the perfect computer vision problem to tackle.
What is computer vision?
Computer vision is exactly what it sounds like, the field of artificial intelligence which allows computers to identify and understand objects in photos and videos.
What did I use?
I used something known as a convolution neural network or CNN for short. CNNs work by learning patterns in smaller regions of images and using that to match with and predict what the image contains. A neural network has multiple layers in between the input and output. Therefore as you begin looking through the layers of a CNN, first you will see it has captured basic patterns, such as edges, gradients, corners and curves. Then, as you keep moving through it, it starts to form shapes and then larger tiled patterns. Soon you could even have layers of CNNs that can identify shapes and even features of things like faces. They are a fascinating area of artificial intelligence.
How did I do it?
First, I got 200 images of frogs and 200 images of rats. Then, I took an existing computer vision model and fine-tuned it on my images of frogs and rats. What this does is that it removes the last few layers from the model and uses the data I’ve given it and creates those last layers. This works because the pre-trained model already has the underlying patterns and shapes that can now be combined and used to generalise how rats or frogs look. This is called transfer learning and it is one of the major breakthroughs in the field. It means now, I don’t have to have millions of images every time I want to train a dataset, I can use the inherent similarities between all images to make the process much faster and cheaper.
Play with the model
Image Credits: Chapter 1, course.fast.ai