ReLU, Sigmoid & Tanh Activation Functions
Why do we use the ReLU activation function over sigmoid or tanh? Why do activation functions suffer from the vanishing gradient problem?
What is an activation function?
What is linearity or non-linearity?
Why do we use a non-linear activation function?
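Before answering these questions, here is a minimal sketch (in Python with NumPy; the function names are chosen here just for illustration) of the three activation functions this article discusses, so the definitions are concrete from the start:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes any real input into the range (-1, 1)
    return np.tanh(x)

def relu(x):
    # Passes positive inputs through unchanged, clamps negatives to 0
    return np.maximum(0.0, x)

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print("x      :", x)
print("sigmoid:", np.round(sigmoid(x), 3))
print("tanh   :", np.round(tanh(x), 3))
print("relu   :", relu(x))
```

Notice how sigmoid and tanh flatten out for large positive or negative inputs, while ReLU keeps growing on the positive side; this difference is at the heart of the questions above.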