Object Localization & Object Detection

Object localization and Object detection with Sliding Window Algorithm, and its Convolutional Implementation

Object Localization

Generating a bounding box for an object in an image.

Bounding Box : Parameters

Image top left: (0,0)
Image bottom right: (1,1)
Mid Point: $(b_x,b_y)$
Height: $b_h$
Width: $b_w$

All Parameters : $(b_x,b_y, b_h, b_w)$
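As a minimal sketch of how these parameters map back to pixels, the helper below converts a midpoint box into corner coordinates. The function name `box_to_corners` and the `img_w`, `img_h` arguments are hypothetical, and the sketch assumes all four parameters are expressed as fractions of the image size, in line with the (0,0) to (1,1) convention above.

```python
def box_to_corners(b_x, b_y, b_h, b_w, img_w, img_h):
    """Convert a normalized midpoint box to pixel corner coordinates.

    Assumes (b_x, b_y) is the box midpoint and (b_h, b_w) its height and
    width, all given as fractions of the image size.
    """
    x_min = (b_x - b_w / 2) * img_w
    y_min = (b_y - b_h / 2) * img_h
    x_max = (b_x + b_w / 2) * img_w
    y_max = (b_y + b_h / 2) * img_h
    return x_min, y_min, x_max, y_max


# A box centred at (0.5, 0.75), a quarter of the image tall and half as wide,
# in a 200 x 100 pixel image.
corners = box_to_corners(0.5, 0.75, 0.25, 0.5, img_w=200, img_h=100)
print(corners)  # (50.0, 62.5, 150.0, 87.5)
```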

$P_c$ = 1 if an object is present

$C_1, C_2, C_3, \dots$: Class labels

For the example case, say there are three classes: Pedestrian ($C_1$), Car ($C_2$) and Bike ($C_3$). If there is no object, all three of $C_1, C_2, C_3$ will be equal to 0.

$P_c$ = 1 if any of the three objects is present in the image, and $P_c$ = 0 if there is no object and only background is present.

Output unit:

y = $\left[ \eqalign{P_c\cr b_x\cr b_y\cr b_h\cr b_w\cr C_1\cr C_2 \cr C_3} \right]$

For example, for a Car with bounding box parameters $b_x, b_y, b_h, b_w$, y = $\left[ \eqalign{1 \cr b_x\cr b_y\cr b_h\cr b_w\cr 0\cr 1 \cr 0} \right]$

For no object, y = $\left[ \eqalign{0\cr ?\cr ?\cr ?\cr ?\cr ?\cr ? \cr ?} \right]$ $\rightarrow$ ‘?’ means don’t care about the value
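The target vectors above could be assembled as in the following sketch. The helper `make_target` and the `CLASSES` list are hypothetical names introduced for illustration, and `None` is used here as a stand-in for the '?' (don't care) entries.

```python
# Hypothetical class order, matching C_1, C_2, C_3 in the example.
CLASSES = ["pedestrian", "car", "bike"]


def make_target(label=None, box=None):
    """Build the 8-element target y = [P_c, b_x, b_y, b_h, b_w, C_1, C_2, C_3].

    With no label, P_c is 0 and the remaining entries are don't-care,
    represented here by None.
    """
    if label is None:
        return [0] + [None] * 7
    b_x, b_y, b_h, b_w = box
    one_hot = [1 if name == label else 0 for name in CLASSES]
    return [1, b_x, b_y, b_h, b_w] + one_hot


y_car = make_target("car", (0.5, 0.7, 0.3, 0.4))
y_background = make_target()
print(y_car)         # [1, 0.5, 0.7, 0.3, 0.4, 0, 1, 0]
print(y_background)  # [0, None, None, None, None, None, None, None]
```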

Loss Functions

$L(\hat y, y) = (\hat y_1 - y_1)^2 + \dots + (\hat y_8 - y_8)^2, \text{ if } y_1 = 1$

$L(\hat y, y) = (\hat y_1 - y_1)^2, \text{ if } y_1 = 0$ (since $P_c$ = 0, all other values are equal to ‘?’)
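A minimal sketch of this piecewise squared-error loss is shown below. The name `localization_loss` is hypothetical, and the vectors are plain Python lists ordered as in the output unit above, with `None` standing in for the '?' entries.

```python
def localization_loss(y_hat, y):
    """Squared-error loss over the 8-element output described above.

    If an object is present (y[0] == 1) all eight components contribute.
    Otherwise only the P_c component is compared, so the don't-care
    entries of y are never read.
    """
    if y[0] == 1:
        return sum((p - t) ** 2 for p, t in zip(y_hat, y))
    return (y_hat[0] - y[0]) ** 2


# Object present: the only error is in the C_2 component.
y = [1, 0.5, 0.5, 0.25, 0.25, 0, 1, 0]
y_hat = [1, 0.5, 0.5, 0.25, 0.25, 0, 0, 0]
loss_object = localization_loss(y_hat, y)

# No object: only the first component is compared.
y_bg = [0] + [None] * 7
y_hat_bg = [0.5, 0.1, 0.9, 0.3, 0.3, 0.2, 0.2, 0.2]
loss_background = localization_loss(y_hat_bg, y_bg)
```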

Object Detection

Sliding window detection algorithm

Pass a window over the image with a certain stride and classify the crop at each position. Start with a small window and slide it across the whole image, then increase the size of the window and slide it across the whole image again.

Disadvantage: Huge computational cost, slow algorithm
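The window enumeration can be sketched as a small generator. The function `sliding_windows` and its arguments are hypothetical, and the sketch assumes square windows and the same stride in both directions. Each yielded window would be cropped and passed to a classifier separately, which is where the cost comes from.

```python
def sliding_windows(img_w, img_h, window_sizes, stride):
    """Yield (left, top, size) for every square window position.

    Windows are visited from the smallest size to the largest, each one
    moved across the whole image with the given stride.
    """
    for size in sorted(window_sizes):
        for top in range(0, img_h - size + 1, stride):
            for left in range(0, img_w - size + 1, stride):
                yield left, top, size


# On a tiny 8 x 8 image with window sizes 4 and 6 and stride 2:
windows = list(sliding_windows(8, 8, window_sizes=[4, 6], stride=2))
print(len(windows))  # 13 (9 positions for size 4, 4 positions for size 6)
```

The number of crops grows quickly with image size, smaller strides, and more window sizes, and every crop needs its own classifier run.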

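Convolutional implementation of sliding window: combines the sliding window computation into one step. The network is run once over the whole image, and the computation for overlapping windows is shared instead of repeated.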
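The equivalence can be checked on a toy example. The sketch below is an illustration under simplifying assumptions, not the network from the course: it uses a hypothetical two-layer linear network (one 3x3 filter, then a "fully connected" layer written as a 2x2 filter), stride 1, no pooling and no non-linearity. Running the network on each 4x4 crop separately gives the same numbers as running it once over the whole image.

```python
def correlate2d(image, kernel):
    """Valid cross-correlation (stride 1) of a 2-D list with a 2-D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def crop(image, top, left, size):
    """Cut a size x size window out of the image."""
    return [row[left:left + size] for row in image[top:top + size]]


# Hypothetical toy weights. A 4x4 window becomes a 2x2 feature map after
# the 3x3 filter, so the fully connected layer is itself a 2x2 filter.
conv_filter = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
fc_weights = [[1, 2], [3, 4]]
WINDOW = 4

# Deterministic 6x6 integer test image.
image = [[(r * 7 + c * 3) % 5 for c in range(6)] for r in range(6)]

# Sliding window: run both layers separately on every 4x4 crop.
n = len(image) - WINDOW + 1
sliding = [
    [
        correlate2d(
            correlate2d(crop(image, r, c, WINDOW), conv_filter), fc_weights
        )[0][0]
        for c in range(n)
    ]
    for r in range(n)
]

# Convolutional: run both layers once over the whole image.
convolutional = correlate2d(correlate2d(image, conv_filter), fc_weights)

print(sliding == convolutional)  # True
```

Each entry of the 3x3 output grid corresponds to one window position, so the single pass yields all nine window scores while computing the first-layer feature map only once.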

Source material is from Andrew Ng’s awesome course on Coursera. The material in the videos has been written up in text form so that anyone who wishes to revise a certain topic can do so without going through the entire set of video lectures.

Written on December 22, 2017