Convolutional Neural Nets (CNNs) for Non-Data Scientists

We are starting this deep learning journey together. This blog will introduce us to the Convolutional Neural Network (CNN).

This particular blog will give us a brief understanding of convolutional neural networks (CNNs). In the next few blogs, we will dive deeper into how they work. We will get our hands dirty with some actual coding and some mathematics, so stay tuned!

If you want to get your hands dirty even before understanding it, check out this blog.

Feel free to skip it.


Recognition in CNN

Let’s say you see a bird. You know it’s a bird because you have a certain understanding of birds: they have feathers, a beak, a certain shape and size. That’s how we teach our children to recognise a bird. Most certainly they won’t confuse a dog with a bird.

But the thing is, this is a little tricky for a computer to understand. We as coders can’t actually write rules for a computer to recognise every kind of bird. So we have to use a deep learning model. The convolutional neural network comes in handy when it’s about recognising something.

Why do we need CNN?

Open your gallery app and you will find a section which segregates the photos based on people: your aunt, your girlfriend, your friends and so on. Essentially, it’s looking into each photo and noting down who is present in there. Most apps even segregate based on the location or the things present (like food, flowers, skyscrapers etc.). This is all happening on your phone. Ever wondered how that happens? Obviously it’s using machine learning for it. But how?

You can see the same in self-driving cars, medical scanners, basically any image-based computer system that uses models based on this architecture.

What is a Convolutional Neural Network?

Well, this blog is just the first part of this series. We will dive deeper in the next few blogs.

But for now we can understand the basic idea of CNN.

The literal meaning of convolution is “twisted together”. In a mathematical sense, it describes how one thing depends on another to form a result. So, in a convolutional neural network we combine the results of each neuron to make an output. It convolves the results of the neurons in each layer and then passes them to the next layer. (A neuron is just a mathematical function which takes multiple inputs and gives one output.)
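The neuron idea above can be sketched in a few lines of Python. This is only an illustration, not a real CNN layer: the inputs, weights and bias are made-up numbers, just to show the shape of the computation.

```python
import math

def neuron(inputs, weights, bias):
    # Multiply each input by its weight, sum them up, add a bias...
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    # ...and squash the result into (0, 1) with a sigmoid activation.
    return 1 / (1 + math.exp(-total))

# Multiple inputs in, one output out (all numbers are made up).
output = neuron(inputs=[0.5, 0.2, 0.1], weights=[0.4, 0.3, 0.9], bias=0.1)
print(round(output, 3))  # 0.611
```

A whole network is just many of these wired together, with the outputs of one layer feeding the inputs of the next.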

CNN working

In simple terms, a CNN takes an image, chops it into small pieces in the first layer and computes an output for each piece. In the next layer, it takes these outputs and combines them into slightly bigger pieces, computes an output for each such piece, and passes those on. This goes on until we get a certain number of outputs at the end. So if we chop the image into 1,000 pieces for the first layer, at the end we may get just 2 values, one for dog and one for bird. That’s how we will know if it’s a dog or a bird.
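The shrinking described above can be sketched with plain NumPy. This is not a real CNN, just random matrices standing in for layers (and the layer sizes 250 and 50 are arbitrary choices), to show how 1,000 values get squeezed down to 2 scores:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend the first layer produced 1,000 values from image pieces.
x = rng.random(1000)

# Each "layer" here is just a random matrix that combines its inputs
# into fewer outputs; a real CNN learns these numbers during training.
for out_size in (250, 50, 2):
    layer = rng.random((out_size, x.size))
    x = np.maximum(layer @ x, 0)  # combine, then keep positives (ReLU)

print(x.shape)  # two final scores, e.g. one for "dog", one for "bird"
```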

So, when we pass an image of a person to a CNN, the first few layers will recognise basic things in the image like vertical and horizontal edges, curves and so on. The next few layers will recognise the nose, eyes and ears, and then in the final layers there will be recognition of the whole face.

Funnily enough, no one actually knows exactly what a CNN is recognising in the image at each layer. It’s a black box; we can only visualise it and guess.

How does CNN work?

1. Reading an image

3 channels of image

The image we give to our model is not the image we actually see, because a computer has no eyes. The image gets converted into an array of numbers before going into the CNN. This array can be 3- or 4-dimensional depending on the model and the image (the 3rd dimension is for the number of channels and the 4th dimension would be the number of images in a batch). So a 300×400 pixel RGB image (a colour image of icon size) will create 300×400×3 = 360,000 values. Most of the time we deal with high-resolution, wide-angle images, so even one photo can easily produce millions of values.
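A quick sanity check of the arithmetic above, using a NumPy array of zeros to stand in for a 300×400 RGB image:

```python
import numpy as np

# A fake 300x400 RGB image: height x width x channels, values 0-255.
image = np.zeros((300, 400, 3), dtype=np.uint8)

print(image.shape)  # (300, 400, 3)
print(image.size)   # 360000 values for this one small image

# With a 4th (batch) dimension, e.g. 32 images fed in at once:
batch = np.zeros((32, 300, 400, 3), dtype=np.uint8)
print(batch.shape)  # (32, 300, 400, 3)
```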

So, for a computer it’s not easy to do the one-shot recognition we can do with our own eyes.

2. Convolutional Kernels

To solve the above problem we apply a simple trick: we chop the image into smaller pieces and then start combining the results in later layers. This chopping is done with convolutional kernels. These kernels typically have a size of 3×3 or 5×5 pixels. To understand it better you can see this video.

Essentially, it’s applying a filter to the image. These filters detect the edges (in this case) in each region. After the detection these results are combined and a new image is generated.
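Here is a minimal sketch of what such a filter does: sliding a 3×3 vertical-edge kernel (a classic Sobel filter) over a tiny made-up image. Real frameworks implement this far more efficiently, but the arithmetic is the same.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image; each output value is the
    weighted sum of the 3x3 patch currently under the kernel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A tiny image: dark on the left (0), bright on the right (1).
image = np.array([[0, 0, 0, 1, 1, 1]] * 4, dtype=float)

# A 3x3 vertical-edge (Sobel) kernel.
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

edges = convolve2d(image, kernel)
print(edges)  # large values where the dark-to-bright edge sits, 0 elsewhere
```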

You can see the following image for the reference.

internal visualisation source

In layman’s terms, by applying these filters we extract information about the image.

We will see the mathematics behind it in the next few blogs.

3. Storing the information

Until now, we have seen how a CNN works in its training phase. But when you want to find your girlfriend in that pic, you want it to work instantly. You want it to have memorised her face.

This is very crucial if you think about self-driving cars. They have to detect other cars and act instantly. So, like human memory, we can store this information in the computer as well. To do this we will need some mathematics.

When we see a person, we recognise him or her by observing the face. Within the face, the nose, eye colour, smile and eyebrows are more prominent features than the shape of the face, hair colour, hair style or facial hair, followed by other details, like piercings.

Eyebrows matter! source

So some features matter more for our face recognition than others.

In mathematical language we call them weights. The features which have more importance get a higher weight than the features which have less importance.

So, whenever we do edge detection on someone’s face, the edge of the nose or smile will carry much more importance than the edge of, let’s say, a hair.

These weights can be stored on our machine so that, whenever our model sees a face, it knows where to look and what to look for. It’s simpler to theorise than to implement, but for now this is enough.
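Storing and reloading weights can be as simple as saving arrays to disk. Here is a hedged sketch with NumPy; the feature names, values and the `face_weights.npz` filename are all made up, and real frameworks use their own checkpoint formats:

```python
import numpy as np

# Made-up "learned" weights: which features matter more.
weights = {
    "nose_edge": np.array([0.9]),
    "smile_edge": np.array([0.8]),
    "hair_edge": np.array([0.1]),
}

# Save all the arrays into one file on disk...
np.savez("face_weights.npz", **weights)

# ...and load them back later, e.g. when the app starts,
# so the model doesn't have to be trained again.
restored = np.load("face_weights.npz")
print(float(restored["nose_edge"][0]))  # 0.9
```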

Conclusion

We now have some basic idea about CNNs. We know why we want to use them, how to use them, and what they do. But we still have many questions.

  1. Why do we need those kernels at all?
  2. Why are there multiple layers in a CNN?
  3. How do they pass values from one layer to another?
  4. How does a CNN get trained?
  5. How does it store the data?

We will tackle all these questions in the coming blogs. So stay tuned and subscribe to the newsletter! Hit me up anytime at [email protected] or on Twitter or LinkedIn!


Discover more from Arshad Kazi

Subscribe to get the latest posts sent to your email.
