Building a Neural Network from Scratch: A Practical Guide using NumPy

Intermediate · 6 min read · For full-stack developers

About the Author

@hiram-clark

Hiram Clark is the founder and managing editor of vybecoding.ai and sets editorial direction for the guides and news published here. Articles are drafted with AI assistance and edited before publication. He works hands-on with the AI development tools, workflows, and infrastructure covered on the site.


Deep learning often feels like a "black box" because modern libraries like PyTorch and TensorFlow handle the complex mathematics of gradients and backpropagation automatically. However, to truly master neural networks, you must understand the underlying mechanics: how data flows forward and how errors are propagated backward to update weights.

In this guide, you will learn how to implement a basic Multi-Layer Perceptron (MLP) using only Python and NumPy, covering the fundamental concepts of weights, biases, activation functions, and the backpropagation algorithm.

1. The Building Blocks: Weights, Biases, and Activations

A neural network is essentially a series of mathematical transformations. To build one, you need three core components:

  • Weights ($W$): These represent the strength of the connection between neurons. During training, the network adjusts these values to learn patterns.
  • Biases ($b$): These are additive values that allow the activation function to be shifted left or right, helping the model fit the data more flexibly.
  • Activation Functions: These introduce non-linearity into the model. Without them, a neural network, no matter how many layers it has, would collapse into a single linear transformation of its input and behave like a simple linear regression model. Common functions include Sigmoid (which squashes values into the range 0 to 1) and ReLU (which outputs the input unchanged if it is positive and 0 otherwise).
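To make these components concrete, here is a minimal NumPy sketch of the two activations named above, plus a quick numerical check of the claim that stacking linear layers without an activation collapses into one linear layer. The helper names (`sigmoid`, `relu`) and the random shapes are illustrative choices for this sketch, not part of the guide's implementation.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real input into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Passes positive inputs through unchanged and maps everything else to 0
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))  # every value lies strictly between 0 and 1; sigmoid(0) is 0.5
print(relu(z))     # [0. 0. 3.]

# Two linear layers with no activation in between (illustrative random shapes)...
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
W1, b1 = rng.standard_normal((3, 4)), rng.standard_normal((1, 4))
W2, b2 = rng.standard_normal((4, 2)), rng.standard_normal((1, 2))
two_layers = (X @ W1 + b1) @ W2 + b2

# ...give exactly the same result as ONE linear layer with W = W1 W2 and b = b1 W2 + b2
one_layer = X @ (W1 @ W2) + (b1 @ W2 + b2)
print(np.allclose(two_layers, one_layer))  # True
```

Inserting `sigmoid` or `relu` between the two layers breaks this equivalence, which is exactly what lets extra layers add expressive power.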

2. The Forward Pass: Moving Data Through the Network

The forward pass is the process of calculating the output of the network based on the input data. For any given layer, the calculation follows this pattern:

  1. Linear Transformation: Multiply the input $X$ (a single vector, or a matrix with one sample per row) by the weight matrix $W$ and add the bias $b$:

$$z = X \cdot W + b$$

  2. Non-linear Activation: Pass the result ($z$) through an activation function ($\sigma$):

$$a = \sigma(z)$$

This process repeats layer by layer until the final output layer produces the prediction.
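Here is a minimal sketch of one forward pass through two layers, mainly to show the matrix shapes: each row of `X` is one sample, so a whole batch flows through in a single matrix product, and the bias row is broadcast across every sample. The layer sizes and the `dense_forward` helper are hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_forward(X, W, b):
    # Linear transformation z = X W + b (b is broadcast over every row),
    # followed by the non-linear activation a = sigma(z)
    z = X @ W + b
    return sigmoid(z)

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 2))                          # batch of 4 samples, 2 features each
W1, b1 = rng.standard_normal((2, 3)), np.zeros((1, 3))   # hidden layer: 2 -> 3 units
W2, b2 = rng.standard_normal((3, 1)), np.zeros((1, 1))   # output layer: 3 -> 1 unit

a1 = dense_forward(X, W1, b1)   # hidden activations, shape (4, 3)
a2 = dense_forward(a1, W2, b2)  # predictions, shape (4, 1): one per sample
print(a1.shape, a2.shape)       # (4, 3) (4, 1)
```

The output of one layer becomes the input of the next, which is the "layer by layer" repetition described above.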

3. Backpropagation: The Engine of Learning

The goal of training is to minimize the Loss Function, a measure of how far the network's predictions are from the actual targets (for example, Mean Squared Error). Backpropagation is the application of the Chain Rule from calculus to determine how much each weight and bias contributed to that loss, that is, the gradient of the loss with respect to every parameter.
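As a concrete loss, here is a sketch of Mean Squared Error and its derivative with respect to the predictions, which is where backpropagation starts. The function names are illustrative, and the analytic gradient is checked against a central finite difference rather than taken on faith.

```python
import numpy as np

def mse(y_pred, y_true):
    # Average of the squared differences between predictions and targets
    return np.mean((y_pred - y_true) ** 2)

def mse_grad(y_pred, y_true):
    # dL/dy_pred for L = mean((y_pred - y_true)**2) is 2 * (y_pred - y_true) / N
    return 2.0 * (y_pred - y_true) / y_pred.size

y_true = np.array([[0.0], [1.0], [1.0], [0.0]])
y_pred = np.array([[0.1], [0.8], [0.6], [0.3]])
print(mse(y_pred, y_true))  # approximately 0.075

# Sanity-check the analytic gradient for one entry with a central difference
h = 1e-6
up, down = y_pred.copy(), y_pred.copy()
up[2, 0] += h
down[2, 0] -= h
numeric = (mse(up, y_true) - mse(down, y_true)) / (2 * h)
print(np.isclose(numeric, mse_grad(y_pred, y_true)[2, 0]))  # True
```

The sign of the gradient tells you which way each prediction should move: the third prediction (0.6) is below its target of 1, so its gradient is negative and gradient descent will push it up.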

The process involves:

  1. Calculating the Error Gradient: Determining the derivative of the loss with respect to the output.
  2. Propagating the Gradient Backward: Moving from the output layer back toward the input layer, calculating the gradient for each weight and bias.
  3. Weight Update: Adjusting each weight (and, in the same way, each bias) in the opposite direction of its gradient, scaled by a Learning Rate ($\eta$):

$$W_{new} = W_{old} - (\eta \cdot \text{gradient})$$
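To see the chain rule and the update rule working together, here is a sketch for a single sigmoid neuron with squared error on one sample. The gradient is assembled from three local derivatives, checked numerically, and then used for one step of $W_{new} = W_{old} - \eta \cdot \text{gradient}$. The input, weights, target, and learning rate are made-up numbers chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    # Squared error for one sample: L = 0.5 * (a - y)^2 with a = sigmoid(x . w + b)
    a = sigmoid(x @ w + b)
    return 0.5 * (a - y) ** 2

x = np.array([0.5, -1.0])   # one input sample (illustrative)
y = 1.0                     # its target
w = np.array([0.2, 0.4])    # initial weights
b = 0.0
eta = 0.5                   # learning rate

# Forward pass, keeping the intermediates needed for the backward pass
z = x @ w + b
a = sigmoid(z)

# Chain rule: dL/dw = dL/da * da/dz * dz/dw
dL_da = a - y
da_dz = a * (1 - a)
grad_w = dL_da * da_dz * x   # dz/dw = x
grad_b = dL_da * da_dz       # dz/db = 1

# Numerical check of the first weight's gradient (central difference)
h = 1e-6
e0 = np.array([h, 0.0])
numeric = (loss(w + e0, b, x, y) - loss(w - e0, b, x, y)) / (2 * h)
print(np.isclose(numeric, grad_w[0]))  # True

# Gradient descent step: W_new = W_old - eta * gradient
w_new = w - eta * grad_w
b_new = b - eta * grad_b
print(loss(w_new, b_new, x, y) < loss(w, b, x, y))  # True: the step reduced the loss
```

In a multi-layer network the same pattern repeats: each layer multiplies the incoming gradient by its own local derivative and passes the result one layer further back.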

4. Implementation in Python

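Below is a simplified implementation of a two-layer network (one hidden layer plus an output layer, i.e. a small MLP) with sigmoid activations, which puts the forward pass, backpropagation, and weight updates described above into code.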

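```python
import numpy as np

class NeuralNetwork:
    """Two-layer MLP (one hidden layer, one output layer) with sigmoid activations."""

    def __init__(self, input_size, hidden_size, output_size, learning_rate=0.1):
        self.lr = learning_rate

        # Initialize weights with standard-normal random values and biases with zeros
        self.W1 = np.random.randn(input_size, hidden_size)
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size)
        self.b2 = np.zeros((1, output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, a):
        # Expects an already-activated value a = sigmoid(z),
        # because sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
        return a * (1 - a)

    def forward(self, X):
        # Layer 1 (hidden): z1 = X W1 + b1, a1 = sigmoid(z1)
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = self.sigmoid(self.z1)

        # Layer 2 (output): z2 = a1 W2 + b2, a2 = sigmoid(z2)
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = self.sigmoid(self.z2)
        return self.a2

    def backward(self, X, y, output):
        # Output layer: (y - output) is the negative gradient of the loss 0.5 * sum((y - output)**2)
        error_out = y - output
        d_output = error_out * self.sigmoid_derivative(output)

        # Hidden layer: apply the chain rule through W2 (computed before W2 is updated)
        error_hidden = d_output.dot(self.W2.T)
        d_hidden = error_hidden * self.sigmoid_derivative(self.a1)

        # The deltas already carry the minus sign from (y - output), so adding them
        # is equivalent to W_new = W_old - lr * gradient
        self.W2 += self.a1.T.dot(d_output) * self.lr
        self.b2 += np.sum(d_output, axis=0, keepdims=True) * self.lr
        self.W1 += X.T.dot(d_hidden) * self.lr
        self.b1 += np.sum(d_hidden, axis=0, keepdims=True) * self.lr

# Example usage: training on XOR, a problem no purely linear model can solve
np.random.seed(42)  # fix the random initialization so runs are reproducible
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)

for epoch in range(10000):
    output = nn.forward(X)
    nn.backward(X, y, output)
    if epoch % 2000 == 0:
        # Monitor mean squared error to check that training is making progress
        print(f"Epoch {epoch}: MSE = {np.mean((y - output) ** 2):.4f}")

print("Final Predictions:")
# Predictions should approach [[0], [1], [1], [0]]; if a run gets stuck,
# try another seed, a larger learning rate, or more epochs
print(nn.forward(X))
```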

Conclusion and Next Steps

By building this from scratch, you have implemented the training loop at the core of modern deep learning: Forward Pass $\rightarrow$ Loss Calculation $\rightarrow$ Backpropagation $\rightarrow$ Optimization.

Next Steps to deepen your knowledge:
  1. Experiment with Activations: Replace sigmoid with ReLU and observe how it affects convergence speed.
  2. Add More Layers: Expand the NeuralNetwork class to support an arbitrary number of hidden layers.
  3. Implement Optimizers: Move beyond basic Stochastic Gradient Descent (SGD) by implementing Momentum or Adam.
  4. Loss Functions: Implement Cross-Entropy Loss, which is usually better suited to classification tasks than Mean Squared Error (MSE); with a sigmoid output, its gradient does not shrink when the output saturates.
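As a starting point for the last suggestion, here is a minimal sketch of binary cross-entropy for sigmoid outputs. Its useful property is that, combined with a sigmoid output, the gradient with respect to the pre-activation $z$ simplifies to $(a - y)/N$, so the $a(1 - a)$ factor that can vanish under MSE drops out. The function names, the clipping constant, and the sample values are assumptions of this sketch; the simplified gradient is verified numerically.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(a, y, eps=1e-12):
    # Clip predictions away from exactly 0 and 1 so log() stays finite
    a = np.clip(a, eps, 1 - eps)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

def bce_grad_wrt_z(a, y):
    # For a = sigmoid(z) and a mean over N entries, dL/dz simplifies to (a - y) / N
    return (a - y) / y.size

y = np.array([[0.0], [1.0], [1.0], [0.0]])
z = np.array([[-2.0], [0.5], [3.0], [1.0]])  # illustrative pre-activations
a = sigmoid(z)

# Central-difference check of the simplified gradient for one entry of z
h = 1e-6
z_up, z_down = z.copy(), z.copy()
z_up[1, 0] += h
z_down[1, 0] -= h
numeric = (binary_cross_entropy(sigmoid(z_up), y)
           - binary_cross_entropy(sigmoid(z_down), y)) / (2 * h)
print(np.isclose(numeric, bce_grad_wrt_z(a, y)[1, 0]))  # True
```

To use it in the `NeuralNetwork` class above, the output delta would become `output - y` (averaged over the batch) in place of `error_out * self.sigmoid_derivative(output)`, with the update signs flipped to subtract the gradient.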