Deep Learning is a vast field, and covering every topic in a single blog post is impossible. The purpose of this blog is to introduce you to the key concepts of Deep Learning, covering most of the essential ideas you need when starting out. Keep in mind that this blog does not go in depth on these topics; consider it a checklist of key Deep Learning concepts to familiarize yourself with as you begin your journey. I also encourage you to delve deeper into the bold keywords throughout the blog.
This blog assumes you are familiar with general Machine Learning concepts, so topics such as different types of learning (supervised, unsupervised), ground truth, features, targets, classification, and regression will not be discussed. This blog will only focus on Deep Learning concepts.
A neuron in deep learning is the fundamental building block of a neural network. You can consider it a function that takes input and, after performing some operations, outputs a result. By combining multiple neurons, we build neural networks.
A perceptron is a single neuron that takes one or more inputs and, after performing some operations, produces an output.
The input is the data you provide to your neuron, perceptron, or neural network, usually denoted by $x$. The first layer of a neural network is called the input layer, which contains only the inputs of the network.
Weights are associated with the inputs of a neuron and determine how impactful each input is. For example, if the first input is $x_1 = 2$ and the second is $x_2 = 4$, and their respective weights are $w_1 = 3$ and $w_2 = 1$, then:

$$w_1 x_1 = 3 \times 2 = 6 \qquad w_2 x_2 = 1 \times 4 = 4$$

This means the first input is more impactful than the second, even though its raw value is smaller.
A layer consists of multiple neurons working in parallel. A neural network is built by stacking layers one after another; the layers between the input layer and the output layer are called hidden layers.
You can think of biases as adjustable knobs that you can tweak according to your needs. A bias $b$ is added to the product of the input $x$ and weight $w$, giving $wx + b$, allowing a neuron to activate even if its inputs are zero. This helps the network learn better and make accurate predictions.
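Putting inputs, weights, and bias together, here is a minimal NumPy sketch of what a single neuron computes before its activation function (introduced next); all values are illustrative.

```python
import numpy as np

# Illustrative values: two inputs, their weights, and a bias
x = np.array([2.0, 4.0])   # inputs
w = np.array([3.0, 1.0])   # weights (the first input is weighted more heavily)
b = -5.0                   # bias shifts the weighted sum

z = np.dot(w, x) + b       # weighted sum plus bias: 2*3 + 4*1 - 5 = 5
print(z)                   # 5.0; an activation function is applied to this next
```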
An activation function is a mathematical function that determines whether a neuron should be activated or not. It takes information from the previous neuron and applies a mathematical transformation to decide whether to send its signal further. Some of the most commonly used activation functions are Sigmoid, Tanh, ReLU (Rectified Linear Unit), Leaky ReLU, and Softmax.
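As a quick reference, here are minimal NumPy implementations of these activation functions (a sketch, not optimized library code):

```python
import numpy as np

def sigmoid(z):
    # Maps any real value into (0, 1); useful for probabilities
    return 1 / (1 + np.exp(-z))

def tanh(z):
    # Maps any real value into (-1, 1); zero-centered
    return np.tanh(z)

def relu(z):
    # Passes positive values through, zeroes out negatives
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # Like ReLU, but lets a small slope through for negative values
    return np.where(z > 0, z, alpha * z)

def softmax(z):
    # Converts a vector of scores into a probability distribution
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()
```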
The forward pass is the process of sending data from the input layer to the output layer. Here are the steps involved in a forward pass (see the sketch below):

1. Each neuron multiplies its inputs by their corresponding weights and sums the results.
2. The bias is added to this weighted sum.
3. The activation function is applied to produce the neuron's output.
4. The outputs of one layer become the inputs to the next layer, and this repeats until the output layer produces the network's prediction.
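Here is a minimal sketch of a forward pass through a small two-layer network in NumPy; the layer sizes and random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)

# Illustrative network: 3 inputs -> 4 hidden neurons -> 1 output
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def forward(x):
    z1 = W1 @ x + b1   # weighted sums plus biases for the hidden layer
    a1 = relu(z1)      # hidden-layer activations
    z2 = W2 @ a1 + b2  # weighted sum plus bias for the output layer
    return z2          # prediction (no activation, e.g., for regression)

print(forward(np.array([1.0, 2.0, 3.0])))
```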
Backpropagation is the crucial algorithm behind how neural networks learn. It calculates the gradient of the loss function with respect to the network's weights, enabling the adjustment of weights to minimize the error between the predicted output and the actual target output. Here's how backpropagation works, step by step:
**Loss Calculation:** After the forward pass, the neural network's output is compared to the actual target values (ground truth) from the training data. This comparison is done using a loss function, which quantifies the difference between the predicted output and the actual target values. Here are some commonly used loss functions:
For Regression Problems: Mean Squared Error (MSE), Mean Absolute Error (MAE), and Huber loss.
For Classification Problems: Binary Cross-Entropy for two-class problems and Categorical Cross-Entropy for multi-class problems.
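For illustration, here are minimal NumPy versions of two of these losses (a sketch; deep learning frameworks provide their own, more robust implementations):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: penalizes large errors quadratically
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Binary Cross-Entropy: compares predicted probabilities to 0/1 labels
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(mse(np.array([1.0, 2.0]), np.array([1.5, 1.5])))               # 0.25
print(binary_cross_entropy(np.array([1, 0]), np.array([0.9, 0.2])))  # ~0.164
```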
**Backward Pass:** Once the loss has been computed, the next step is to propagate this error backward through the network using the backpropagation algorithm. The goal of backpropagation is to calculate the gradient of the loss function with respect to each weight and bias in the network. This is achieved by applying the chain rule of calculus recursively, starting from the output layer and moving backward through the network.
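For example, for a single neuron with pre-activation $z = wx + b$, prediction $\hat{y} = \sigma(z)$, and loss $L(\hat{y}, y)$, the chain rule decomposes the gradient for the weight into three local derivatives:

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w}$$

Each factor depends only on quantities available at that layer, which is what allows the computation to flow backward one layer at a time.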
**Gradient Computation:** As backpropagation progresses backward through the network, the gradients of the loss function with respect to the weights and biases of each layer are computed. These gradients represent the direction and magnitude of the change needed to minimize the error.
**Weight Update:** Once the gradients have been calculated, an optimization algorithm is applied to update the network's weights. Common optimization algorithms include stochastic gradient descent (SGD) and its variants like mini-batch gradient descent, Adam, RMSProp, Adagrad, etc. Each optimizer has its own update rule and hyperparameters that influence the training process. The optimizer uses the computed gradients to adjust the weights in a way that reduces the loss function. The learning rate, which determines the size of the steps taken during optimization, is an important hyperparameter that affects the convergence and performance of the training process.
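The core of vanilla gradient descent is a one-line update; a minimal sketch, where `weights` and `gradients` are NumPy arrays of the same shape:

```python
def sgd_step(weights, gradients, learning_rate=0.01):
    # Step against the gradient, scaled by the learning rate
    return weights - learning_rate * gradients
```

Optimizers such as Adam and RMSProp build on this rule by adapting the effective step size per parameter using running statistics of past gradients.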
**Iteration:** After the weights have been updated by the optimizer, the preceding steps are repeated for multiple iterations or epochs. Each iteration consists of a forward pass, loss calculation, backpropagation of errors, and weight update. The number of iterations depends on factors such as the complexity of the problem, the size of the dataset, and the convergence criteria defined by the user.
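Putting these steps together, here is a minimal sketch of a full training loop for a tiny linear model with MSE loss; the synthetic data and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3x + 2 plus a little noise (illustrative)
X = rng.uniform(-1, 1, size=100)
y = 3 * X + 2 + rng.normal(0, 0.1, size=100)

w, b = 0.0, 0.0           # initial parameters
lr = 0.1                  # learning rate
for epoch in range(200):
    # Forward pass: predict
    y_pred = w * X + b
    # Loss calculation: MSE
    error = y_pred - y
    loss = np.mean(error ** 2)
    # Backward pass: gradients of MSE with respect to w and b
    grad_w = 2 * np.mean(error * X)
    grad_b = 2 * np.mean(error)
    # Weight update: gradient descent step
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # should approach 3 and 2
```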
**Validation:** Periodically during training, the model's performance is evaluated on a separate validation dataset to monitor its progress and prevent overfitting. This evaluation helps determine whether the model is generalizing well to unseen data or if adjustments to the training process are necessary.
Regularization techniques in deep learning are methods used to prevent overfitting. Here are some common regularization techniques:
**L1 and L2 Regularization:** Add a penalty on the magnitude of the weights to the loss function. L1 drives individual weights to exactly zero, yielding sparse models, while L2 keeps all weights small. Note that plain L1 does not produce group sparsity, which is more characteristic of group Lasso or other specialized regularization methods.
**Dropout:** Randomly deactivates a fraction of neurons during each training step, preventing the network from relying too heavily on any single neuron (see the sketch after this list).
**Batch Normalization:** Normalizes each layer's inputs across a mini-batch, which stabilizes and speeds up training and has a mild regularizing side effect.
**Early Stopping:** Halts training once performance on the validation set stops improving, before the model starts overfitting the training data.
**Data Augmentation:** Artificially enlarges the training set by applying label-preserving transformations to the data, such as flips, rotations, and crops for images.
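As an example of one of these techniques, here is a minimal sketch of inverted dropout applied to a layer's activations during training; the keep probability is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, keep_prob=0.8, training=True):
    if not training:
        return activations  # dropout is disabled at inference time
    # Randomly keep each activation with probability keep_prob
    mask = rng.random(activations.shape) < keep_prob
    # Scale the survivors so the expected activation stays the same
    return activations * mask / keep_prob

a = np.ones(10)
print(dropout(a))  # roughly 80% of entries survive, scaled by 1/0.8
```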
Hyperparameter tuning is the process of finding the optimal combination of settings that control the learning process of your model, ultimately leading to better performance. Examples of hyperparameters in deep learning include the learning rate, batch size, number of epochs, number of layers and neurons per layer, choice of activation function, and dropout rate.
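One simple tuning strategy is a grid search: train the model once per combination of settings and keep the best. A minimal sketch, where `train_and_evaluate` is a hypothetical stand-in for your own training code:

```python
import itertools

def train_and_evaluate(learning_rate, batch_size):
    # Hypothetical stand-in for real training; returns a validation loss.
    # The formula below is made up purely so this sketch runs end to end.
    return abs(learning_rate - 0.01) + abs(batch_size - 32) / 100

learning_rates = [0.1, 0.01, 0.001]
batch_sizes = [16, 32, 64]

best_loss, best_config = float("inf"), None
for lr, bs in itertools.product(learning_rates, batch_sizes):
    val_loss = train_and_evaluate(learning_rate=lr, batch_size=bs)
    if val_loss < best_loss:
        best_loss, best_config = val_loss, (lr, bs)

print(best_config, best_loss)
```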
Vectorization in deep learning refers to the technique of performing operations on entire arrays of data simultaneously instead of iterating through individual elements. This significantly improves computational efficiency and simplifies code compared to traditional loop-based approaches. It is crucial in deep learning due to the massive datasets and complex calculations involved.
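To see the difference, here is the same dot product written as an explicit Python loop and as a single vectorized NumPy call; on large arrays the vectorized version is typically orders of magnitude faster.

```python
import numpy as np

x = np.arange(1_000_000, dtype=np.float64)
w = np.arange(1_000_000, dtype=np.float64)

# Loop-based: one multiplication at a time in the Python interpreter
total = 0.0
for i in range(len(x)):
    total += w[i] * x[i]

# Vectorized: the whole operation runs in optimized compiled code
total_vec = np.dot(w, x)

assert np.isclose(total, total_vec)
```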
Broadcasting is a mechanism that allows arrays of different shapes to be combined in arithmetic operations. It automatically aligns arrays with compatible shapes and dimensions, eliminating the need for explicit copying or reshaping of arrays.
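For example, adding a bias vector to every row of a batch of activations works without any explicit loop or copying, because NumPy broadcasts the smaller array across the larger one:

```python
import numpy as np

activations = np.ones((4, 3))      # a batch of 4 examples, 3 neurons each
bias = np.array([0.1, 0.2, 0.3])   # one bias per neuron, shape (3,)

# bias is broadcast across all 4 rows: the result has shape (4, 3)
out = activations + bias
print(out)
```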
While this blog provides an overview and a starting point, continuous learning, experimentation, and real-world application are key to mastering deep learning and making impactful advancements in the field.