# Gradient Descent for Logistic Regression

### Implementation of Gradient Descent for optimizing Logistic Regression Cost Function

#### Assumptions:

1. For simplicity, assume each training example has only two features, $x_1$ and $x_2$; the algorithm is then applied over $m$ training examples. Vectorized notation handles any number of features and training examples.
2. To keep the notation simple, the derivative of the cost function with respect to a variable $x$, $\frac{\partial J(w,b)}{\partial x}$, will be written as $dx$.

#### Logistic Regression: Derivative calculation with two features

Input: $x_1, x_2$

Parameters: $w_1, w_2, b$

Forward pass: $z = w_1x_1 + w_2x_2 + b$ $\rightarrow$ $a = \sigma(z)$ $\rightarrow$ $L(a,y)$
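The forward pass above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `sigmoid` and `forward` are made up for this example, and the loss is assumed to be the cross-entropy loss that appears in the pseudocode further down.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x1, x2, y, w1, w2, b):
    """Forward pass for one example with two features (hypothetical helper).

    Returns z, the activation a, and the cross-entropy loss L(a, y).
    """
    z = w1 * x1 + w2 * x2 + b
    a = sigmoid(z)
    # Assumed loss: L(a, y) = -(y log a + (1 - y) log(1 - a))
    loss = -(y * math.log(a) + (1 - y) * math.log(1 - a))
    return z, a, loss
```

With all parameters set to zero, $z = 0$ and $a = 0.5$ regardless of the input, so the loss is $\log 2$ for either label.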

#### Objective: Calculate the derivatives of the loss function with respect to $w_1$, $w_2$ and $b$

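Backpropagating step by step:

• Calculate $\frac{\partial L(a,y)}{\partial a}$ or $da$: for the cross-entropy loss, $da = -\frac{y}{a} + \frac{1-y}{1-a}$
• Calculate $dz$: since $\sigma'(z) = a(1-a)$, the chain rule gives $dz = da \cdot a(1-a) = a - y$
• Calculate $dw_1, dw_2, db$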

$dw_1 = x_1dz$
$dw_2 = x_2dz$
$db = dz$
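As a rough sketch, the backward pass for a single example translates directly into Python. The function name `single_example_gradients` is a hypothetical choice for this illustration, and it assumes the simplification $dz = a - y$ that the pseudocode below also uses.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def single_example_gradients(x1, x2, y, w1, w2, b):
    """Gradients of the loss for one example (hypothetical helper)."""
    # Forward pass
    z = w1 * x1 + w2 * x2 + b
    a = sigmoid(z)
    # Backward pass: dz = a - y, then the chain rule through z
    dz = a - y
    dw1 = x1 * dz
    dw2 = x2 * dz
    db = dz
    return dw1, dw2, db
```

For example, with $x_1 = 1$, $x_2 = 2$, $y = 1$ and all parameters at zero, $a = 0.5$ and $dz = -0.5$, so the gradients are $dw_1 = -0.5$, $dw_2 = -1$, $db = -0.5$.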

#### Looping over m examples: Pseudocode

Initialize: $J=0 ; dw_1=0 ; dw_2=0 ; db=0$

for $i = 1$ to $m$:
$z^{(i)} = w^Tx^{(i)} + b$
$a^{(i)} = \sigma(z^{(i)})$
$J \mathrel{+}= -\left(y^{(i)}\log a^{(i)} + (1-y^{(i)})\log (1-a^{(i)})\right)$
$dz^{(i)} = a^{(i)}-y^{(i)}$
$dw_1 \mathrel{+}= x_1^{(i)}dz^{(i)}$
$dw_2 \mathrel{+}= x_2^{(i)}dz^{(i)}$
$db \mathrel{+}= dz^{(i)}$

$J \mathrel{/}= m$
$dw_1 \mathrel{/}= m$
$dw_2 \mathrel{/}= m$
$db \mathrel{/}= m$

$w_1 := w_1 - \alpha \, dw_1$
$w_2 := w_2 - \alpha \, dw_2$
$b := b - \alpha \, db$

$\alpha$ is the learning rate.

Written on November 29, 2017