Convolution Neural Network - Overview
Convolution Neural Networks are a class of neural network which take into account not just the vector of inputs, but also takes into account the spatial arrangement of data. An example would be a transactional data which can be analyzed using a tradition neural network where the spatial arrangement or order of inputs does not mattr, as compared to images which are almost exclusively analyze using CNN where the arrangement of pixels around each other is of paramount importance (infact this is exactly what makes up an image).
The CNN are made up of neurons that have learnable weights and biases. Each neuron receives some inputs, performs a dot product and optionally follows it with a non-linearity. The whole network still expresses a single differentiable score function: from the raw image pixels on one end to class scores at the other. And they still have a loss function (e.g. SVM/Softmax) on the last (fully-connected) layer and all the tips/tricks we developed for learning regular Neural Networks still apply.
So what does change? ConvNet architectures make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture. These then make the forward function more efficient to implement and vastly reduce the amount of parameters in the network.
CNN Architecture: (source : http://cs231n.github.io/convolutional-networks/)
Computers see image as a matrix of numbers across 3 channels (R,G,B). A filter is a smaller matrix which passes over the input with a specified stride legnth, with a given padding - valid or same. The output after passing each filter over the entire input ($n_x,n_y$,num_channels) is a matrix of smaller dimension (details on dimension in the next blog). A number of these filters would give a stack of matrices, on which further operations like pooling can be applied, or they can be passed through further filters. Later, in the final layer, these stack of matrices are flattened into a single dimension vector, and the output could be single or multiple classes, and appropriate cost function can be used.