Neural Networks: Exploring the Basics and Building from Scratch

Neural networks have revolutionized the field of artificial intelligence, offering remarkable capabilities in pattern recognition, decision-making, and predictive analytics. This comprehensive guide aims to demystify neural networks by not only discussing their structure and functionality but also by guiding you through the implementation and training process, including the critical aspect of backpropagation.

Understanding Neural Networks

A neural network is a computational model loosely inspired by the structure and function of the human brain. It is composed of layers of nodes, often referred to as artificial neurons; each node computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function. Stacking many of these simple operations, and adjusting their weights during training, is what lets the network identify patterns and solve intricate problems.

Layers of a Neural Network

  1. Input Layer: Receives the input data.
  2. Hidden Layers: Perform complex computations and feature extraction.
  3. Output Layer: Delivers the final output or decision.
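As a rough sketch of how data flows through these three layers, the snippet below pushes a small input vector through one hidden layer and an output layer using plain NumPy matrix operations. The layer sizes and random weights here are placeholders chosen only for illustration; the rest of the article builds the same idea neuron by neuron.

import numpy as np

rng = np.random.default_rng(0)

x = np.array([2.0, 3.0])            # input layer: a vector of 2 features

W_hidden = rng.normal(size=(2, 2))  # hidden layer: 2 neurons, 2 weights each
b_hidden = rng.normal(size=2)
hidden = 1 / (1 + np.exp(-(W_hidden @ x + b_hidden)))   # sigmoid activations

W_out = rng.normal(size=(1, 2))     # output layer: a single neuron
b_out = rng.normal(size=1)
output = 1 / (1 + np.exp(-(W_out @ hidden + b_out)))

print(output)                       # the network's final prediction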

The Neuron: Fundamental Unit of Neural Networks

A neuron in a neural network is a small mathematical unit: it computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function. The most common activation functions include sigmoid, tanh, and ReLU.
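For intuition, here are minimal NumPy sketches of those three activation functions (illustrative only, not production implementations); sigmoid is the one we reuse when building a neuron below.

import numpy as np

def sigmoid(x):
    # squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # squashes into the range (-1, 1); NumPy provides this directly
    return np.tanh(x)

def relu(x):
    # keeps positive values, clips negative values to 0
    return np.maximum(0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x))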

Building a Basic Neuron in Python

Let’s create a basic neuron using Python:

import numpy as np

def sigmoid(x):
    # activation function: squashes x into the range (0, 1)
    return 1 / (1 + np.exp(-x))

class Neuron:
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def feedforward(self, inputs):
        # weighted sum of the inputs plus the bias, passed through the activation
        total = np.dot(self.weights, inputs) + self.bias
        return sigmoid(total)

weights = np.array([0, 1])  # w1 = 0, w2 = 1
bias = 4                    # b = 4
n = Neuron(weights, bias)

inputs = np.array([2, 3])     # x1 = 2, x2 = 3
print(n.feedforward(inputs))  # sigmoid(0*2 + 1*3 + 4) = sigmoid(7) ≈ 0.999

Constructing a Neural Network with Layers

Building upon the basic neuron, we can create a neural network with layers. Here’s a simple network with one hidden layer:

class NeuralNetwork:
    def __init__(self):
        # each neuron takes 2 inputs, so it needs a 2-element weight vector
        self.h1 = Neuron(np.random.normal(size=2), np.random.normal())
        self.h2 = Neuron(np.random.normal(size=2), np.random.normal())
        self.o1 = Neuron(np.random.normal(size=2), np.random.normal())

    def feedforward(self, inputs):
        out_h1 = self.h1.feedforward(inputs)
        out_h2 = self.h2.feedforward(inputs)
        # the output neuron takes the two hidden activations as its inputs
        out_o1 = self.o1.feedforward(np.array([out_h1, out_h2]))
        return out_o1

network = NeuralNetwork()
print(network.feedforward(np.array([2, 3])))

Training Neural Networks: The Backbone of Learning

Training a neural network involves adjusting its weights and biases to minimize errors in its output. This is achieved through backpropagation and gradient descent.

Backpropagation: Understanding the Core Mechanism

Backpropagation is the method used to calculate the gradient of the loss function with respect to every weight and bias in the network. It works backwards from the output, applying the chain rule layer by layer, and it is what tells us how each parameter should be updated.
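In symbols, for a hidden-layer weight such as w1 in the small network above, the chain rule breaks the gradient into a product of simple local derivatives:

\[
\frac{\partial L}{\partial w_1}
  = \frac{\partial L}{\partial \hat{y}}
  \cdot \frac{\partial \hat{y}}{\partial h_1}
  \cdot \frac{\partial h_1}{\partial w_1}
\]

In the implementation below, these three factors appear term by term as d_L_d_ypred, d_ypred_d_h1, and d_h1_d_w1.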

Gradient Descent: Optimizing the Network

Gradient descent is the optimization algorithm that actually minimizes the loss. At each step it adjusts every parameter a small amount in the direction opposite to its gradient, the direction of steepest descent, with the step size controlled by a learning rate.
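In code, a single gradient-descent step is just "move each parameter a small step against its gradient". The sketch below shows that update rule in isolation, using a toy one-dimensional function whose gradient we can write by hand, standing in for whatever backpropagation computes.

import numpy as np

def gradient_descent_step(params, grads, learn_rate=0.1):
    # move every parameter a small step against its gradient
    return params - learn_rate * grads

# toy example: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w = np.array([0.0])
for _ in range(100):
    grad = 2 * (w - 3)
    w = gradient_descent_step(w, grad)
print(w)  # approaches 3, the minimum of f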

Implementing Backpropagation and Gradient Descent

Let’s extend our neural network to include backpropagation and gradient descent. We’ll use mean squared error as our loss function.

def mse_loss(y_true, y_pred):
    # mean squared error, averaged over all samples
    return ((y_true - y_pred) ** 2).mean()

class NeuralNetwork:
    def __init__(self):
        # same initialization as before: 2-element weight vectors and scalar biases
        self.h1 = Neuron(np.random.normal(size=2), np.random.normal())
        self.h2 = Neuron(np.random.normal(size=2), np.random.normal())
        self.o1 = Neuron(np.random.normal(size=2), np.random.normal())

    # the feedforward method is unchanged from the earlier definition

    def train(self, data, all_y_trues):
        learn_rate = 0.1
        epochs = 1000  # number of times to loop through the entire dataset

        for epoch in range(epochs):
            for x, y_true in zip(data, all_y_trues):
                # --- do a feedforward (we'll need these values later)
                sum_h1 = np.dot(self.h1.weights, x) + self.h1.bias
                out_h1 = sigmoid(sum_h1)

                sum_h2 = np.dot(self.h2.weights, x) + self.h2.bias
                out_h2 = sigmoid(sum_h2)

                sum_o1 = np.dot(self.o1.weights, np.array([out_h1, out_h2])) + self.o1.bias
                out_o1 = sigmoid(sum_o1)
                y_pred = out_o1

                # --- calculate partial derivatives
                d_L_d_ypred = -2 * (y_true - y_pred)

                # Neuron o1
                d_ypred_d_w5 = out_h1 * sigmoid(sum_o1) * (1 - sigmoid(sum_o1))
                d_ypred_d_w6 = out_h2 * sigmoid(sum_o1) * (1 - sigmoid(sum_o1))
                d_ypred_d_b3 = sigmoid(sum_o1) * (1 - sigmoid(sum_o1))

                d_ypred_d_h1 = self.o1.weights[0] * sigmoid(sum_o1) * (1 - sigmoid(sum_o1))
                d_ypred_d_h2 = self.o1.weights[1] * sigmoid(sum_o1) * (1 - sigmoid(sum_o1))

                # Neuron h1
                d_h1_d_w1 = x[0] * sigmoid(sum_h1) * (1 - sigmoid(sum_h1))
                d_h1_d_w2 = x[1] * sigmoid(sum_h1) * (1 - sigmoid(sum_h1))
                d_h1_d_b1 = sigmoid(sum_h1) * (1 - sigmoid(sum_h1))

                # Neuron h2
                d_h2_d_w3 = x[0] * sigmoid(sum_h2) * (1 - sigmoid(sum_h2))
                d_h2_d_w4 = x[1] * sigmoid(sum_h2) * (1 - sigmoid(sum_h2))
                d_h2_d_b2 = sigmoid(sum_h2) * (1 - sigmoid(sum_h2))

                # --- Update weights and biases
                # Neuron h1
                self.h1.weights[0] -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w1
                self.h1.weights[1] -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w2
                self.h1.bias -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_b1

                # Neuron h2
                self.h2.weights[0] -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w3
                self.h2.weights[1] -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w4
                self.h2.bias -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_b2

                # Neuron o1
                self.o1.weights[0] -= learn_rate * d_L_d_ypred * d_ypred_d_w5
                self.o1.weights[1] -= learn_rate * d_L_d_ypred * d_ypred_d_w6
                self.o1.bias -= learn_rate * d_L_d_ypred * d_ypred_d_b3

            # End of epoch, potentially log progress here

network = NeuralNetwork()
data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # example inputs (the XOR truth table)
all_y_trues = np.array([0, 1, 1, 0])               # example targets: XOR of each input pair
network.train(data, all_y_trues)

In this expanded example, the train method of the NeuralNetwork class iterates through the training data for a specified number of epochs. During each iteration, it performs forward propagation to calculate the output, and then it computes the gradients for backpropagation using the chain rule. The weights and biases of each neuron are updated in the direction that reduces the loss, as determined by the mean squared error function.
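After training, the network can be queried in the usual way and its overall loss computed. This assumes the feedforward method from the earlier class definition is also present on the class; dataset_loss is just a convenience helper introduced here for the sketches that follow.

def dataset_loss(net, data, all_y_trues):
    # mean squared error of the network's predictions over the whole dataset
    preds = np.array([net.feedforward(x) for x in data])
    return mse_loss(all_y_trues, preds)

for x, y_true in zip(data, all_y_trues):
    print(f"input={x}, target={y_true}, prediction={network.feedforward(x):.3f}")

print("overall loss:", dataset_loss(network, data, all_y_trues))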

The Role of Hyperparameters

In the code above, you’ll notice values like learning rate and the number of epochs. These are called hyperparameters, and they play a crucial role in the training process. Selecting the right hyperparameters can significantly affect the performance and accuracy of the neural network.
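One simple, if brute-force, way to explore hyperparameters is to train several copies of the network with different settings and compare the resulting loss. The sketch below assumes a hypothetical variation of the train method that accepts learn_rate and epochs as arguments (rather than hard-coding them, as the version above does) and reuses the dataset_loss helper sketched earlier.

# hypothetical variation: assumes train() accepts learn_rate and epochs as arguments
for learn_rate in (0.01, 0.1, 0.5):
    for epochs in (500, 1000):
        net = NeuralNetwork()
        net.train(data, all_y_trues, learn_rate=learn_rate, epochs=epochs)
        loss = dataset_loss(net, data, all_y_trues)
        print(f"learn_rate={learn_rate}, epochs={epochs}, loss={loss:.4f}")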

Training a Neural Network: A Delicate Balance

Training neural networks is a delicate balance between underfitting and overfitting. Underfitting occurs when the network does not learn the underlying pattern of the data, while overfitting happens when the network learns the noise in the training data as if it were a pattern, leading to poor performance on new, unseen data.
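A common way to detect this imbalance is to hold out part of the data and watch the loss on examples the network never trains on. The sketch below uses a synthetic toy dataset, invented purely for illustration, together with the dataset_loss helper from earlier; roughly speaking, a training loss that keeps falling while the validation loss stays high or rises suggests overfitting, while both losses staying high suggests underfitting.

import numpy as np

rng = np.random.default_rng(0)

# synthetic toy dataset, purely for illustration: the label is 1 when the
# two input features sum to more than 1
X = rng.random((40, 2))
y = (X.sum(axis=1) > 1).astype(float)

# hold out the last 10 examples as a validation set the network never trains on
train_X, val_X = X[:30], X[30:]
train_y, val_y = y[:30], y[30:]

net = NeuralNetwork()
net.train(train_X, train_y)

print("train loss:", dataset_loss(net, train_X, train_y))
print("val loss:  ", dataset_loss(net, val_X, val_y))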

Debugging and Optimizing Neural Networks

Debugging a neural network can be challenging due to its “black box” nature. Visualization tools like TensorBoard, and techniques like gradient checking, can be invaluable. Additionally, experimenting with different network architectures, activation functions, and optimization algorithms can lead to significant improvements.
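Gradient checking, for example, compares the analytic gradients produced by backpropagation with a numerical estimate obtained from finite differences; if the two disagree noticeably, the backpropagation code likely has a bug. Below is a minimal sketch of the numerical side for a single weight of the h1 neuron, reusing the dataset_loss helper from earlier; the helper name and the choice of weight are just for illustration.

def numerical_gradient_w1(net, data, all_y_trues, eps=1e-5):
    # central finite-difference estimate of d(loss)/d(w1) for neuron h1
    original = net.h1.weights[0]

    net.h1.weights[0] = original + eps
    loss_plus = dataset_loss(net, data, all_y_trues)

    net.h1.weights[0] = original - eps
    loss_minus = dataset_loss(net, data, all_y_trues)

    net.h1.weights[0] = original   # restore the weight
    return (loss_plus - loss_minus) / (2 * eps)

print(numerical_gradient_w1(network, data, all_y_trues))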

Conclusion and Further Exploration

We have covered the basics of neural networks, their structure, and the process of training them, including backpropagation and gradient descent. This journey into neural networks is just the beginning. For further exploration, consider delving into:

  1. Convolutional Neural Networks (CNNs): Ideal for image recognition and processing.
  2. Recurrent Neural Networks (RNNs): Suited for time-series data and natural language processing.
  3. Transfer Learning: Using a pre-trained network on a new problem.

Additional Resources

  1. “Neural Networks and Deep Learning” by Michael Nielsen, available free as an online book.
  2. TensorFlow and PyTorch: Explore these frameworks for practical implementation of more complex networks.
  3. Online courses such as Andrew Ng’s Machine Learning course or the Deep Learning Specialization on Coursera, for more structured learning.
  4. Andrej Karpathy’s YouTube channel: https://www.youtube.com/@AndrejKarpathy

Remember, the field of neural network research and development is vast and continuously evolving. Staying engaged through constant learning and experimentation is essential to keep pace with this dynamic area of technology. Wishing you an enjoyable learning journey!


About PullRequest

HackerOne PullRequest is a platform for code review, built for teams of all sizes. We have a network of expert engineers enhanced by AI to help you ship secure code, faster.

Learn more about PullRequest

by PullRequest

February 22, 2024