Implementing Logistic Regression Using Backpropagation

Using Neural Network and Backpropagation to implement Logistic Regression algorithm

Logistic Regression is one of the most widely used classification techniques in Data Science. It is probably one of the first few algorithms anyone learns when starting out with Data Science or machine learning (think of “Hello World!” when learning a new language).

This post assumes that you are familiar with at least the basics of implementing logistic regression (I’ll write another post later on the basic implementation). It covers how to implement Logistic Regression using the backpropagation algorithm and a neural network architecture.

The full code can be found at the following Github Repo

We will be going forward as per the following steps:

  1. Define the architecture
  2. Write the Sigmoid Function
  3. Initialize the parameters W and b
  4. Write the cost function, and minimize it while learning the parameters
  5. Use the learnt parameters to predict new data

Apart from the basic NumPy and pandas libraries, we won’t be using anything else and will write each function from scratch.

Write the sigmoid function

The sigmoid function is the activation used in Logistic Regression, though it is just one of many activation functions used in the layers of a deep neural network (where it is losing ground to faster alternatives like ReLU, the Rectified Linear Unit). A sigmoid function takes a number as input and outputs another number between 0 and 1 (great for predicting probabilities).

import numpy as np

def sigmoid(z):
    #Squash z (a scalar or a NumPy array) into the range (0, 1)
    sig = 1/(1+np.exp(-1*z))
    return sig
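
As a quick sanity check, the sketch below evaluates the function at a few points. It redefines sigmoid so that the snippet runs on its own, and the sample inputs are arbitrary values picked for illustration.

```python
import numpy as np

def sigmoid(z):
    # Same formula as above, restated so this snippet is self-contained
    return 1 / (1 + np.exp(-z))

# Arbitrary sample inputs: a large negative number, zero, a large positive number
z = np.array([-10.0, 0.0, 10.0])
probs = sigmoid(z)

# Expect a value near 0, exactly 0.5 at z = 0, and a value near 1
print(probs)
```

Because np.exp works element-wise, the same function handles a single number or a whole array of inputs.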

Initialize the parameters W and b

The cost function for logistic regression is represented as

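Loss Function for one example, where $a = \sigma(w^T x + b)$ is the predicted probability and $y$ is the true label:

$$L(a, y) = -\big(y \log a + (1 - y)\log(1 - a)\big)$$

Cost Function: Summing over the loss function for m examples and dividing by m:

$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} L\big(a^{(i)}, y^{(i)}\big) = -\frac{1}{m}\sum_{i=1}^{m}\Big(y^{(i)} \log a^{(i)} + (1 - y^{(i)})\log(1 - a^{(i)})\Big)$$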

W and b are the weight vector and the bias applied to the input X. Don't worry about the explicit summation in the cost function above: when you work with matrices, the sum is typically computed with a matrix multiplication (or a single np.sum over all examples), which is essentially the same thing.
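
To make the "essentially the same thing" claim concrete, here is a minimal sketch that computes the cost once with an explicit loop over examples and once in vectorized form. The activations A and labels Y are made-up values for illustration, not data from this post.

```python
import numpy as np

# Hypothetical predicted probabilities and labels for m = 4 examples
A = np.array([[0.9, 0.2, 0.7, 0.4]])
Y = np.array([[1, 0, 1, 0]])
m = Y.shape[1]

# Version 1: explicit summation, one example at a time
cost_loop = 0.0
for i in range(m):
    a, y = A[0, i], Y[0, i]
    cost_loop += -(y * np.log(a) + (1 - y) * np.log(1 - a))
cost_loop /= m

# Version 2: the same sum written as one vectorized expression
cost_vec = (-1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

print(cost_loop, cost_vec)
```

The two numbers agree up to floating-point rounding; the vectorized form is simply shorter and much faster on large data sets.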

Input: number of features (dimension of W)
def initialize_with_zeros(dim):
    #w is a column vector of shape (dim, 1); b is a scalar
    w = np.zeros((dim,1))
    b = 0
    return w,b

Cost Function, and Forward and backward Propagation


  1. Write the cost function, perform a forward propagation
  2. Find dw and db for later use in backpropagation for Gradient Descent algorithm
  3. Calculate cost, return cost and gradients for GD

Forward Propagation and Backward Propagation

def propagate(w,b,X,Y):
    m=X.shape[1]#number of training examples
    #Forward Propagation
    A = sigmoid(np.dot(w.T,X)+b)
    cost = (-1/m)*np.sum(Y*np.log(A)+(1-Y)*np.log(1-A))
    #BackPropagation: Find derivatives of the cost w.r.t. w and b for updating weights
    dw = (1/m)*np.dot(X,(A-Y).T)
    db = (1/m)*np.sum(A-Y)
    cost = np.squeeze(cost)
    #Return dw and db in a dictionary for later usage
    grads = {"dw":dw,"db":db}
    return grads, cost
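
One way to gain confidence in the backward pass is a numerical gradient check: nudge each parameter slightly, see how the cost changes, and compare that with dw and db. The sketch below is only an illustration. It restates sigmoid and propagate (assuming the forward pass is sigmoid(np.dot(w.T, X) + b)) so that it runs on its own, and the toy data, the seed, and the step size eps are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    # Restated from above so the snippet is self-contained
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = (-1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dw = (1 / m) * np.dot(X, (A - Y).T)
    db = (1 / m) * np.sum(A - Y)
    return {"dw": dw, "db": db}, float(cost)

# Made-up toy problem: 3 features, 5 examples (columns are examples)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
Y = np.array([[1, 0, 1, 1, 0]])
w = rng.normal(size=(3, 1)) * 0.1
b = 0.2

grads, _ = propagate(w, b, X, Y)

# Central finite differences: (J(p + eps) - J(p - eps)) / (2 * eps)
eps = 1e-6
num_dw = np.zeros_like(w)
for j in range(w.shape[0]):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[j, 0] += eps
    w_minus[j, 0] -= eps
    num_dw[j, 0] = (propagate(w_plus, b, X, Y)[1]
                    - propagate(w_minus, b, X, Y)[1]) / (2 * eps)
num_db = (propagate(w, b + eps, X, Y)[1]
          - propagate(w, b - eps, X, Y)[1]) / (2 * eps)

# Largest disagreement between analytic and numerical gradients
max_err = max(np.max(np.abs(num_dw - grads["dw"])), abs(num_db - grads["db"]))
print(max_err)
```

If the analytic gradients are right, max_err should be tiny compared with the gradients themselves; a large value usually points to a sign or shape mistake in dw or db.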

Optimization function: updating the parameters with Gradient Descent

This is where the weights are updated and the cost is minimized.

Parameters: X is the input data set, Y the target, w the weight vector, b the bias, num_iterations the number of iterations, and learning_rate (alpha) the learning rate of the Gradient Descent algorithm.


  1. Loop through the number of iterations
  2. Forward pass: pass the parameters X,Y, W,b to the Forward propagation algorithm to get grads(dw,db) and cost
  3. Update the parameters w and b using dw and db
  4. Record the cost in the array costs[]
  5. After all the iterations, retrieve params (W,b) and grads (dw,db)

def optimize(w,b, X, Y, num_iterations, learning_rate, print_cost=True):
    costs=[] # array for storing costs
    for i in range(num_iterations):
        #Cost and gradient calculation
        grads,cost = propagate(w,b,X,Y)
        #Retrieve derivatives from backprop algo
        dw = grads["dw"]
        db = grads["db"]
        #Update the parameters
        w = w-learning_rate*dw
        b = b-learning_rate*db
        #Record the cost every 100 iterations
        if i%100==0:
            costs.append(cost)
        #Print Cost
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    #Pass the parameters and grads after the GradDesc Algo is complete
    params = {"w":w,"b":b}
    grads = {"dw":dw,"db":db}
    return params, grads, costs
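
To see the update rule in action, here is a compact, self-contained sketch of the same loop on a tiny made-up data set. The data, the learning rate of 0.1 and the 1000 iterations are illustrative assumptions, not values from the repo.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up toy data: 2 features, 6 examples (columns are examples)
X = np.array([[0.5, 1.0, 1.5, 3.0, 3.5, 4.0],
              [1.0, 0.5, 1.5, 3.5, 3.0, 4.0]])
Y = np.array([[0, 0, 0, 1, 1, 1]])
m = X.shape[1]

w = np.zeros((2, 1))
b = 0.0
learning_rate = 0.1  # illustrative choice
costs = []

for i in range(1000):
    # Forward pass and cost
    A = sigmoid(np.dot(w.T, X) + b)
    cost = (-1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    # Backward pass
    dw = (1 / m) * np.dot(X, (A - Y).T)
    db = (1 / m) * np.sum(A - Y)
    # Gradient descent step
    w = w - learning_rate * dw
    b = b - learning_rate * db
    # Record the cost every 100 iterations
    if i % 100 == 0:
        costs.append(cost)

print(costs[0], costs[-1])
```

With all-zero parameters every prediction is 0.5, so the first recorded cost equals log(2); after that the recorded cost should keep falling as w and b move toward a separating boundary.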

Predict on New Data

Now that the algorithm is trained on the data and the parameters W and b are optimized, we can use it to predict on a new dataset.

def predict(w,b,X):
    m=X.shape[1] #number of examples
    Y_pred = np.zeros((1,m))
    w = w.reshape(X.shape[0],1)
    #Compute the vector A of predicted probabilities with one forward pass
    A = sigmoid(np.dot(w.T,X)+b)
    for i in range(A.shape[1]):
        #Convert probabilities to actual prediction (0.5 is the conventional cutoff)
        Y_pred[0,i] = 1 if A[0,i] > 0.5 else 0
    return Y_pred
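
The probability-to-label conversion can also be written without a loop. The sketch below compares the two approaches on made-up probabilities; the 0.5 cutoff is an assumption here (it is the conventional choice, but any threshold could be used).

```python
import numpy as np

# Hypothetical predicted probabilities for 4 examples
A = np.array([[0.1, 0.5, 0.51, 0.93]])

# Loop version: one comparison per example
Y_loop = np.zeros((1, A.shape[1]))
for i in range(A.shape[1]):
    Y_loop[0, i] = 1 if A[0, i] > 0.5 else 0

# Vectorized version: compare the whole array at once, then cast to float
Y_vec = (A > 0.5).astype(float)

print(Y_loop, Y_vec)
```

Both versions give the same labels; note that a probability of exactly 0.5 is mapped to class 0 with a strict greater-than comparison.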

All functions merged into a single function

def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate = 0.5, print_cost=True):
    #Step by step call to other functions
    w, b = initialize_with_zeros(X_train.shape[0])
    #Gradient Descent
    parameters, grads, costs = optimize(w, b, X_train, Y_train,num_iterations = num_iterations, learning_rate = learning_rate, print_cost = print_cost)
    w = parameters["w"]
    b = parameters["b"]
    # Predict test/train set examples
    Y_prediction_test = predict(w,b,X_test)
    Y_prediction_train = predict(w,b,X_train)
    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    return d
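
The accuracy expression used in the print statements may look odd at first. For 0/1 labels, the mean absolute difference between predictions and targets is just the fraction of wrong predictions, so subtracting it from 100 gives the percentage of correct ones. A small sketch with made-up labels:

```python
import numpy as np

# Made-up targets and predictions for 5 examples (3 of them match)
Y_true = np.array([[1, 0, 1, 1, 0]])
Y_pred = np.array([[1.0, 0.0, 0.0, 1.0, 1.0]])

# Expression used in model(): 100 minus the error rate in percent
acc_formula = 100 - np.mean(np.abs(Y_pred - Y_true)) * 100

# Direct count of matching entries, for comparison
acc_direct = np.mean(Y_pred == Y_true) * 100

print(acc_formula, acc_direct)
```

This shortcut only works when both arrays contain nothing but 0s and 1s, which is the case for the output of predict.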

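#Plot Costs (d is the dictionary returned by model())
import matplotlib.pyplot as plt
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()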

Written on November 28, 2017