CNN - Inception Network
Motivation for Inception Network
Instead of choosing between a 1x1, a 3x3, or a 5x5 convolution, or a pooling layer, why not use all of them?
In the example above, all the filters are applied to the same input, and their outputs are stacked along the channel dimension to form a single output volume. The padding is kept at ‘same’ so that the outputs of all the filters have the same height and width and can be stacked.
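As a rough sketch of the stacking described above, the following Python tracks only tensor shapes (no actual convolution is performed). The 28x28x192 input and the 32 filters for the 5x5 branch come from the cost example in these notes; the other branch sizes (64, 128, and a pooling branch reduced to 32 channels) and all function names are illustrative assumptions.

```python
def same_conv_output_shape(input_shape, num_filters):
    """Shape after a stride-1 'same' convolution: height and width are preserved."""
    height, width, _ = input_shape
    return (height, width, num_filters)


def inception_output_shape(input_shape, branch_filters):
    """Stack the outputs of all branches along the channel dimension."""
    branch_shapes = [
        same_conv_output_shape(input_shape, n) for n in branch_filters.values()
    ]
    height, width, _ = input_shape
    # Stacking only works because every branch kept the same height and width.
    assert all(shape[:2] == (height, width) for shape in branch_shapes)
    total_channels = sum(shape[2] for shape in branch_shapes)
    return (height, width, total_channels)


# Hypothetical branch sizes; only the 5x5 branch (32 filters) is from the notes.
branches = {"conv_1x1": 64, "conv_3x3": 128, "conv_5x5": 32, "pool_branch": 32}
stacked_shape = inception_output_shape((28, 28, 192), branches)
print(stacked_shape)  # (28, 28, 256)
```

Under these assumed sizes the stacked output has 64 + 128 + 32 + 32 = 256 channels, while the spatial size stays 28x28.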
Disadvantage: Huge computational cost
Solving the problem of computational cost
For example, the computational cost of the 5x5 filter in the above diagram:
Filter: Conv 5x5x192, same, 32
Total number of calculations = (28 * 28 * 32) * (5 * 5 * 192) ≈ 120 Million !! (number of output values times the multiplications needed for each one)
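The arithmetic above can be checked with a short Python sketch. The function name and parameter names are hypothetical; the count covers multiplications only, for a stride-1 ‘same’ convolution.

```python
def conv_multiplications(out_height, out_width, num_filters, kernel_size, in_channels):
    """Multiplications for a conv layer: (number of outputs) * (work per output)."""
    num_outputs = out_height * out_width * num_filters
    per_output = kernel_size * kernel_size * in_channels
    return num_outputs * per_output


# Conv 5x5, same, 32 filters, applied to a 28x28x192 input.
direct_cost = conv_multiplications(28, 28, 32, kernel_size=5, in_channels=192)
print(f"{direct_cost:,}")  # 120,422,400
```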
Using 1x1 Convolution to reduce computation cost
A 1x1 convolution (also called a bottleneck layer) is added before the 5x5 convolution to shrink the number of channels first.
Total number of calculations = [(28 * 28 * 16) * (1 * 1 * 192)] + [(28 * 28 * 32) * (5 * 5 * 16)] ≈ 12.4 Million !! (earlier the cost was 120 Million, so roughly a tenth)
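A minimal Python sketch comparing the two designs, using the same numbers as above (28x28x192 input, 16 bottleneck filters, 32 5x5 filters). The helper and variable names are hypothetical, and only multiplications are counted.

```python
def conv_multiplications(out_height, out_width, num_filters, kernel_size, in_channels):
    """Multiplications for a stride-1 'same' conv: outputs * work per output."""
    return (out_height * out_width * num_filters) * (kernel_size * kernel_size * in_channels)


# Direct 5x5 convolution: 192 input channels -> 32 output channels.
direct_cost = conv_multiplications(28, 28, 32, kernel_size=5, in_channels=192)

# Bottleneck: 1x1 conv reduces 192 -> 16 channels, then 5x5 conv maps 16 -> 32.
reduce_cost = conv_multiplications(28, 28, 16, kernel_size=1, in_channels=192)
expand_cost = conv_multiplications(28, 28, 32, kernel_size=5, in_channels=16)
bottleneck_cost = reduce_cost + expand_cost

print(f"{reduce_cost:,} + {expand_cost:,} = {bottleneck_cost:,}")
# 2,408,448 + 10,035,200 = 12,443,648
print(f"{direct_cost / bottleneck_cost:.1f}x fewer multiplications")  # 9.7x
```

Most of the remaining cost sits in the 5x5 step; the 1x1 reduction itself accounts for only about 2.4 million multiplications.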
GoogLeNet - Homage to Yann LeCun’s LeNet
The name “Inception” itself comes from the movie Inception