Activation Functions & Non-Linearity

Activation functions can seem daunting to the beginner machine learning practitioner. When training neural networks, it is important to know which activation function to use and where to use it, because this choice can significantly affect the results.

In this blog on activation functions, we will learn:

  1. What is an activation function?
  2. What are linearity and non-linearity?
  3. Why do we use a non-linear activation function?

Before getting into this blog, you should have a basic understanding of neural networks and forward propagation.


What is an activation function?

Activation functions are part of each neuron; they decide whether to activate that neuron. For example, when a network classifies animals, different neurons get activated for different animals.

In simple words, the activation function decides whether a given neuron should fire. When you see a dog, one set of neurons activates and produces an output that is eventually interpreted as a dog; when you see a cat, a different set of neurons activates. The activation function supplies this activation value for each neuron.

Activation functions do their work during forward propagation, which is how the network turns an input into a prediction.

We can write the forward propagation as,

z1 = W1x + b1

a1 = g1(z1)

z2 = W2a1 + b2

a2 = g2(z2) = g2(W2a1 + b2) = g2(W2g1(W1x + b1) + b2)

...
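The equations above can be sketched as a small loop. This is a minimal illustration assuming NumPy; the layer shapes and the tanh activation are arbitrary choices for the example, not from the post:

```python
import numpy as np

def g(z):
    # Illustrative activation: tanh squashes each value into (-1, 1).
    return np.tanh(z)

def forward(x, layers):
    """Propagate input x through a list of (W, b) layers."""
    a = x
    for W, b in layers:
        z = W @ a + b  # linear step: weights times input, plus bias
        a = g(z)       # non-linear step: activation applied element-wise
    return a

# A toy 3-layer network with made-up shapes.
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), rng.standard_normal(4)),
          (rng.standard_normal((4, 4)), rng.standard_normal(4)),
          (rng.standard_normal((2, 4)), rng.standard_normal(2))]

x = np.array([1.0, 0.5, -0.2])
print(forward(x, layers).shape)  # (2,)
```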

As you can see, the input given to the first layer passes through multiple layers, and in each layer the activation function is applied to that layer's weighted sum. So, in a 5-layer neural network, the input to the 1st layer goes through the first layer's activation function, that output feeds the second layer and its activation function, and so on until the last layer. In short, the activation functions significantly shape the final output. So, we now know what an activation function is and why it is important. Now let's understand why we need a non-linear activation function.

What is Non-Linearity?

In neural networks, we use non-linear activation functions. This non-linearity is very important to understand: without non-linear functions (that is, with only linear ones) our model won't be able to give proper results; it will act basically as a single-layer neural network.

Before understanding non-linearity, we must know about linear functions. In mathematical terms, linear functions are those whose graph is a straight line. That means that as x changes, y changes at a uniform rate.

Linearity

Consider a car travelling at constant velocity. As time increases, the distance traveled by the car increases proportionally. Because the velocity is constant, the distance traveled in the first hour is the same as the distance covered in any other one-hour period.

This is linearity. This can be shown by the following graph,

Linear function: linear motion

Non-linearity

Non-linear functions are simply the functions which are not linear. That means their graph is a curve; they don't form straight lines.

We can understand this with a simple example.

When we start a car to go somewhere, it starts from speed zero and then accelerates up to some speed. Hence the distance traveled by that car in the first 5 minutes will be different from the distance it travels in 5 minutes once it has reached top speed.

This is non-linearity. The following graph can help us understand this better,

Non-linear function: accelerated motion
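The two motion examples can be checked numerically. The velocity and acceleration values below are made-up numbers for illustration only:

```python
# Distance travelled: constant velocity (linear) vs constant acceleration (non-linear).
v = 60.0   # km/h, constant velocity (illustrative value)
a = 10.0   # km/h^2, constant acceleration from rest (illustrative value)

linear = lambda t: v * t               # d = v*t       -> straight line
nonlinear = lambda t: 0.5 * a * t**2   # d = (a*t^2)/2 -> curve

# Equal time intervals: the constant-velocity car always covers the same extra distance...
print(linear(2) - linear(1), linear(3) - linear(2))              # 60.0 60.0
# ...while the accelerating car covers more and more each hour.
print(nonlinear(2) - nonlinear(1), nonlinear(3) - nonlinear(2))  # 15.0 25.0
```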

Why do we use a non-linear activation function?

Well, there are two explanations for this.

  1. Non-linearity gives neural networks flexibility
  2. There is a mathematical reason: without it, stacking layers adds nothing

Flexibility

Non-linear activation functions actually give flexibility to our model. Remember our car example: while explaining the linear function, we assumed the car moves at constant velocity. That is like assuming the car reaches that speed the instant it starts, which is practically impossible (inertia is a thing!). When we explained the non-linear function, we took a more practical approach and gave the car the freedom to accelerate from zero, which is closer to a real-world scenario. This is just a small example, but in general, non-linear activation functions give our model much more flexibility to fit the data.

Most real-world problems are non-linear in nature, and we want our neural networks to solve these problems, so it is only practical to use non-linear activation functions. However, this is not the precise reason behind using them. To understand the real reason, we must look at the maths behind them.

Mathematical reason

To understand this, let's use a linear function as our activation function and put it into our forward propagation equations.

So, our activation function would be the identity, g(z) = z.

So for the first layer we can write our forward propagation as,

z1 = W1x + b1

a1 = g1(z1) = z1

z2 = W2a1 + b2 .......second layer

a2 = g2(z2) = z2

So, if we put all of that together,

a2 = z2 = W2a1 + b2 = W2(z1) + b2 = W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2)

which can be written as

a2 = W*x + b*, where W* = W2W1 and b* = W2b1 + b2.


So, as we can see, the whole network reduces to just another linear function of the input, of the same form Wx + b as a single layer. However many linear layers we stack, the result is equivalent to one linear layer, so the extra depth buys us nothing.
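We can verify this collapse numerically. This is a small sketch assuming NumPy; the weights are random and the shapes arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((3, 2)), rng.standard_normal(3)
W2, b2 = rng.standard_normal((2, 3)), rng.standard_normal(2)
x = rng.standard_normal(2)

# Two stacked linear layers (identity activation, g(z) = z)...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...equal a single linear layer with W* = W2 W1 and b* = W2 b1 + b2.
W_star, b_star = W2 @ W1, W2 @ b1 + b2
one_layer = W_star @ x + b_star

print(np.allclose(two_layers, one_layer))  # True
```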

The following figure helps us visualise this:

Linear function fitting the data

Hence, by using a linear activation function, we will not be able to fit non-linear data at all.

On the other hand, a non-linear activation function will be able to do this, because it has the flexibility to mould itself to the data.

The following visualisation can help us see this:

Non-linear function fitting the data
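For contrast, here is a sketch showing how a non-linear activation breaks the collapse we derived above. ReLU is used purely for illustration, and the weights are hand-picked so the tiny two-layer net computes |x|, a curve that no single linear layer can produce:

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

# Hand-picked weights: f(x) = relu(x) + relu(-x) = |x| (illustrative, not from the post).
W1 = np.array([[1.0], [-1.0]])
W2 = np.array([[1.0, 1.0]])
f = lambda x: (W2 @ relu(W1 @ np.array([x]))).item()

# Any linear (affine) map must satisfy f(x + y) = f(x) + f(y) - f(0); this one doesn't:
print(f(1.0 + -1.0))              # 0.0
print(f(1.0) + f(-1.0) - f(0.0))  # 2.0
```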

Hence, it is a much better choice to use a non-linear activation function over a linear one in our neural network.

We will learn about different activation functions, like ReLU and sigmoid, in the next blog.

So, stay tuned and subscribe to the newsletter or hit me anytime on [email protected]!



