Activation Functions

Activation functions are an essential component of artificial neural networks (ANNs): they introduce non-linearities into the network, enabling it to model complex relationships between inputs and outputs. An activation function is applied to the weighted sum of inputs (the pre-activation) of each neuron in a layer to produce that neuron's output. Here are some commonly used activation functions in ANNs:

1. Sigmoid Function (Logistic Function):
   The sigmoid function is a smooth, S-shaped curve that maps the input to a range between 0 and 1. It has the mathematical form:
   
   f(x) = 1 / (1 + e^(-x))
   
   Sigmoid functions are useful for binary classification problems where the output needs to be interpreted as a probability. However, they suffer from the vanishing gradient problem, which can hinder the training of deep neural networks.
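   
   As a rough illustration, here is a minimal NumPy sketch of the sigmoid; the input values are arbitrary examples, not taken from the text:
   
   import numpy as np
   
   def sigmoid(x):
       # Element-wise logistic function: squashes inputs into the range (0, 1).
       return 1.0 / (1.0 + np.exp(-x))
   
   x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
   print(sigmoid(x))   # approx [0.0067 0.2689 0.5 0.7311 0.9933]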

2. Hyperbolic Tangent (Tanh) Function:
   The hyperbolic tangent function is similar to the sigmoid function but maps the input to a range between -1 and 1. It has the mathematical form:
   
   f(x) = (e^(x) - e^(-x)) / (e^(x) + e^(-x))
   
   Like the sigmoid function, the tanh function suffers from the vanishing gradient problem, but its zero-centered output can make it preferable in scenarios where negative activation values are desired.
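   
   A similarly minimal NumPy sketch of tanh on arbitrary example inputs (NumPy's built-in np.tanh gives the same result):
   
   import numpy as np
   
   def tanh(x):
       # Element-wise hyperbolic tangent: squashes inputs into the range (-1, 1).
       return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
   
   x = np.array([-2.0, 0.0, 2.0])
   print(tanh(x))      # approx [-0.964 0. 0.964]
   print(np.tanh(x))   # NumPy's built-in agrees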

3. Rectified Linear Unit (ReLU):
   The rectified linear unit is a popular activation function that maps any negative input to zero and leaves positive inputs unchanged. It has the mathematical form:
   
   f(x) = max(0, x)
   
   ReLU is computationally efficient and helps mitigate the vanishing gradient problem. It has been widely used in deep learning models and has shown excellent performance in many applications. However, ReLU neurons can be prone to "dying" during training: once a neuron outputs zero for all inputs, its gradient is also zero, so it may never recover. Several variations of ReLU, such as Leaky ReLU, Parametric ReLU (PReLU), and the Exponential Linear Unit (ELU), address this issue.
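   
   A minimal NumPy sketch of ReLU on arbitrary example inputs:
   
   import numpy as np
   
   def relu(x):
       # Element-wise ReLU: negative inputs become zero, positive inputs pass through.
       return np.maximum(0.0, x)
   
   x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
   print(relu(x))   # [0.  0.  0.  0.5 3. ]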

4. Softmax Function:
   The softmax function is commonly used in the output layer of a neural network for multi-class classification problems. It takes a vector of arbitrary real values as input and transforms them into a probability distribution over multiple classes. The softmax function has the mathematical form:
   
   f(x_i) = e^(x_i) / sum(e^(x_j))
   
   Here, x_i refers to the i-th element of the input vector, and the sum is taken over all elements of the vector. The output of the softmax function represents the probability of each class, and the class with the highest probability is chosen as the predicted class.
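   
   A minimal NumPy sketch of softmax; the logits below are arbitrary example scores for three classes, and the maximum is subtracted before exponentiating purely for numerical stability (it does not change the result):
   
   import numpy as np
   
   def softmax(x):
       # Convert a vector of real-valued scores into a probability distribution.
       shifted = x - np.max(x)        # stabilize: exp of large values would overflow
       exps = np.exp(shifted)
       return exps / np.sum(exps)
   
   logits = np.array([2.0, 1.0, 0.1])
   probs = softmax(logits)
   print(probs)              # approx [0.659 0.242 0.099], sums to 1
   print(np.argmax(probs))   # 0 -> the predicted class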

These are just a few examples of activation functions used in ANNs. There are other activation functions available, such as the Gaussian activation function, the exponential linear unit (ELU), and the scaled exponential linear unit (SELU), each with its own characteristics and advantages. The choice of activation function depends on the specific problem, network architecture, and empirical results.
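
For concreteness, here is a rough NumPy sketch of a few of the ReLU variants mentioned above; the SELU constants (alpha ≈ 1.673, scale ≈ 1.051) are the commonly cited values and are included only for illustration:

   import numpy as np
   
   def leaky_relu(x, negative_slope=0.01):
       # Like ReLU, but keeps a small slope for negative inputs so neurons are less likely to "die".
       return np.where(x > 0, x, negative_slope * x)
   
   def elu(x, alpha=1.0):
       # Exponential Linear Unit: smooth, saturates to -alpha for large negative inputs.
       return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
   
   def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
       # Scaled ELU: the scaling constants are chosen to encourage self-normalizing activations.
       return scale * elu(x, alpha)
   
   x = np.array([-2.0, -0.5, 0.0, 1.5])
   print(leaky_relu(x))
   print(elu(x))
   print(selu(x))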
