CNN Filters, Pooling, Padding, and Strides

A CNN architecture is made of several building blocks. Each operation (filters, pooling, convolution, etc.) changes the dimensions of the output matrix, so it is extremely important to keep track of matrix dimensions to make sure the calculations are done correctly.

Input image size: [$n_x,n_y,n_c$]

Input: n x n
Filter size: f x f

Output: (n-f+1, n-f+1)
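To make the n-f+1 rule concrete, here is a minimal sketch of a "valid" convolution (no padding, stride 1) in plain Python. The function name `conv2d_valid` and the toy 6x6 image are assumptions made for illustration, and like most deep learning libraries it slides the filter without flipping it.

```python
def conv2d_valid(image, kernel):
    """Naive valid convolution of a square n x n image with an f x f filter."""
    n = len(image)
    f = len(kernel)
    out_size = n - f + 1  # the output dimension from the formula above
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # Multiply the f x f window element-wise with the filter and sum.
            total = 0
            for a in range(f):
                for b in range(f):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# Toy input: a 6 x 6 image holding the values 0..35, and a 3 x 3 filter of ones.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
kernel = [[1] * 3 for _ in range(3)]

result = conv2d_valid(image, kernel)
print(len(result), len(result[0]))  # 4 4, since n - f + 1 = 6 - 3 + 1
```

With n = 6 and f = 3 the output is 4 x 4, so the image shrinks by f - 1 in each direction.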

Filter size f is usually odd: 1x1, 3x3, 5x5, 7x7

Valid: Output size < Input size

Same: Output size = Input size

p: Padding size

Output: (n+2p-f+1,n+2p-f+1)

For same padding we need n+2p-f+1 = n, so p = $\frac{f-1}{2}$, which is a whole number only when f is odd.
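A quick way to check the padding rule is to compute p for the common filter sizes and confirm that the output size matches the input. This is a small sketch; the helper names `same_padding` and `padded_output_size` are made up for illustration.

```python
def same_padding(f):
    """Padding p that keeps the output size equal to the input size (stride 1)."""
    if f % 2 == 0:
        # (f - 1) / 2 is not a whole number for even f.
        raise ValueError("symmetric same padding needs an odd filter size")
    return (f - 1) // 2


def padded_output_size(n, f, p):
    """Output size of a stride-1 convolution with padding p."""
    return n + 2 * p - f + 1


n = 32
for f in (1, 3, 5, 7):
    p = same_padding(f)
    print(f, p, padded_output_size(n, f, p))  # the last value stays at 32
```

For f = 1, 3, 5, 7 the padding comes out as 0, 1, 2, 3, and the output size stays at n in every case.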


Stride Length : s


Convolution over Volumes

Number of channels in input image and the filters must be the same

$[n_x,n_y,n_c] * [f,f,n_c] = [\frac{n_x+2p-f}{s}+1,\frac{n_y+2p-f}{s}+1, n_c']$ where $n_c'$ = number of filters used
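The shape rule above can be wrapped in a small helper for checking layer dimensions by hand. This is a sketch under one assumption that the formula does not state: when the stride does not divide evenly, the result is rounded down, which is what deep learning frameworks typically do. The name `conv_output_shape` and the example shapes are illustrative.

```python
def conv_output_shape(input_shape, f, num_filters, p=0, s=1):
    """Output shape of a convolution over a volume.

    input_shape: (n_x, n_y, n_c); each filter is assumed to be f x f x n_c.
    Integer division rounds down when the stride does not divide evenly
    (an assumption, matching common framework behaviour).
    """
    n_x, n_y, n_c = input_shape
    out_x = (n_x + 2 * p - f) // s + 1
    out_y = (n_y + 2 * p - f) // s + 1
    # The channel count of the output equals the number of filters.
    return (out_x, out_y, num_filters)


print(conv_output_shape((32, 32, 3), f=5, num_filters=6))        # (28, 28, 6)
print(conv_output_shape((6, 6, 3), f=3, num_filters=4, p=1))     # (6, 6, 4)
print(conv_output_shape((7, 7, 3), f=3, num_filters=10, s=2))    # (3, 3, 10)
```

Note that the input channel count $n_c$ does not appear in the output shape: each filter spans all input channels and produces a single output channel.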


There are no parameters to learn in the pooling layer.

$[n_x,n_y,n_c] \xrightarrow{Pooling} [\frac{n_x+2p-f}{s}+1,\frac{n_y+2p-f}{s}+1, n_c]$

Pooling reduces the dimensions of the input matrix.
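As an illustration, here is a minimal sketch of pooling on a single channel. It assumes max pooling with f = 2, s = 2 and no padding, which is a common choice but not something the text above specifies; the name `max_pool2d` and the 4x4 input are made up for the example.

```python
def max_pool2d(channel, f=2, s=2):
    """Max pooling over one channel (no padding). Nothing here is learned."""
    n_h = len(channel)
    n_w = len(channel[0])
    out_h = (n_h - f) // s + 1
    out_w = (n_w - f) // s + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the f x f window that starts at (i * s, j * s).
            window = [
                channel[i * s + a][j * s + b]
                for a in range(f)
                for b in range(f)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


channel = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]

pooled = max_pool2d(channel)
print(pooled)  # [[6, 5], [7, 9]]
```

The 4 x 4 input becomes 2 x 2. Pooling is applied to each channel independently, which is why the channel count is unchanged in the formula above.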

CNN Architecture with all layers : LeNet-5

LeNet-5 was developed by Yann LeCun and shows very good results for recognizing handwritten digits on MNIST. The code for implementing LeNet will follow this blog post, and I’ll post the link to it as soon as it is available.

LeNet-5 Architecture
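The dimension formulas from this post can be used to trace the shapes through LeNet-5. The sketch below is not an implementation of the network; it only tracks dimensions. The layer sizes (a 32x32x1 input, 5x5 convolutions with 6, 16 and 120 filters, and 2x2 pooling with stride 2) are the ones commonly given for the original LeNet-5 and are stated here as an assumption.

```python
def output_size(n, f, p=0, s=1):
    """Spatial output size shared by convolution and pooling layers."""
    return (n + 2 * p - f) // s + 1


# (name, filter size f, stride s, number of filters or None for pooling)
# Layer sizes are assumed from the commonly cited LeNet-5 description.
layers = [
    ("C1 conv", 5, 1, 6),
    ("S2 pool", 2, 2, None),
    ("C3 conv", 5, 1, 16),
    ("S4 pool", 2, 2, None),
    ("C5 conv", 5, 1, 120),
]

shape = (32, 32, 1)
trace = [("input", shape)]
for name, f, s, num_filters in layers:
    n_h, n_w, n_c = shape
    # Pooling keeps the channel count; convolution sets it to the filter count.
    out_c = n_c if num_filters is None else num_filters
    shape = (output_size(n_h, f, s=s), output_size(n_w, f, s=s), out_c)
    trace.append((name, shape))

for name, layer_shape in trace:
    print(name, layer_shape)
# After C5 the 1 x 1 x 120 volume feeds fully connected layers
# (84 units, then 10 outputs, one per digit).
```

The spatial size goes 32, 28, 14, 10, 5, 1 while the channel count grows from 1 to 120.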

Written on December 20, 2017