ReLU, Sigmoid & Tanh Activation Functions
Why do we use the ReLU activation function over sigmoid or tanh? Why do activation functions suffer from the vanishing gradient problem?
Journey of Curiosity
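Before digging into the answer, it helps to look at the raw numbers behind the question. The sketch below is a minimal illustration (not from the article) using NumPy, with illustrative function names: it prints the derivatives of sigmoid, tanh, and ReLU at a few inputs, showing how the first two shrink toward zero away from the origin while ReLU's gradient stays at 1 for positive inputs.

```python
import numpy as np

# Activation derivatives: the quantities that get multiplied layer by layer
# during backpropagation.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25, approaches 0 as |x| grows

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2  # peaks at 1.0, but also saturates

def d_relu(x):
    return (x > 0).astype(float)  # exactly 1 for every positive input

xs = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print("sigmoid':", np.round(d_sigmoid(xs), 4))
print("tanh'   :", np.round(d_tanh(xs), 4))
print("relu'   :", d_relu(xs))
```

Multiplying many numbers below 1 (at best 0.25 for sigmoid) across layers is what drives gradients toward zero; the rest of the article unpacks why ReLU largely avoids this.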