Hyperparameter Tuning

Hyperparameters to tune in deep learning: learning rate $\alpha$, momentum term $\beta$, Adam's $\beta_1, \beta_2, \epsilon$, number of layers, number of hidden units, learning rate decay, mini-batch size

Some hyperparameters are more important than others. In rough order of tuning priority:

  1. Learning Rate $\alpha$
  2. Momentum term $\beta$, # hidden units, mini-batch size
  3. # layers, learning rate decay
  4. Adam's $\beta_1$, $\beta_2$, $\epsilon$: rarely tuned, the defaults $\beta_1$ = 0.9, $\beta_2$ = 0.999, $\epsilon = 10^{-8}$ are usually kept

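Grid search: don’t sample on a regular grid, sample random values instead. With random sampling every trial tries a new value of each hyperparameter, so the important ones get explored much more richly for the same number of trials.

Coarse-to-fine sampling scheme: zoom in on the smaller region of hyperparameter space that gave the best results, then sample at random, more densely, within that smaller region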
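The random, coarse-to-fine search described above can be sketched roughly as follows. This is only an illustration: `toy_score` is a made-up stand-in for a real validation score, and the helper names, the 25 trials per round and the 25% zoom factor are all assumptions, not values from the course.

```python
import numpy as np

def toy_score(x, y):
    # Hypothetical stand-in for a validation score; best at (0.3, 0.7).
    return -((x - 0.3) ** 2 + (y - 0.7) ** 2)

def random_search(score_fn, x_range, y_range, n_trials, rng):
    # Sample both hyperparameters uniformly at random (no grid).
    xs = rng.uniform(x_range[0], x_range[1], size=n_trials)
    ys = rng.uniform(y_range[0], y_range[1], size=n_trials)
    scores = np.array([score_fn(x, y) for x, y in zip(xs, ys)])
    best = int(np.argmax(scores))
    return xs[best], ys[best], scores[best]

def shrink(center, full_range, fraction):
    # Smaller interval around `center`, clipped to the original range.
    half = 0.5 * fraction * (full_range[1] - full_range[0])
    return (max(full_range[0], center - half), min(full_range[1], center + half))

rng = np.random.default_rng(0)
x_range, y_range = (0.0, 1.0), (0.0, 1.0)

# Coarse pass over the whole region.
coarse_x, coarse_y, coarse_score = random_search(toy_score, x_range, y_range, 25, rng)

# Fine pass: zoom in around the best coarse point and sample again.
fine_x_range = shrink(coarse_x, x_range, 0.25)
fine_y_range = shrink(coarse_y, y_range, 0.25)
fine_x, fine_y, fine_score = random_search(toy_score, fine_x_range, fine_y_range, 25, rng)

best_score = max(coarse_score, fine_score)
```

In practice `toy_score` would be replaced by training a model and evaluating it on a dev set, which is why the number of trials per round is usually limited by compute.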

Picking an appropriate scale for sampling hyperparameters

Let’s say we are tuning the number of hidden units $n^{[l]}$ in a layer, somewhere between 50 and 100, or the number of layers, between 2 and 4. In such cases sampling uniformly at random makes sense. This is not true for all hyperparameters.

For example, say the learning rate $\alpha$ lies between 0.0001 and 1.
In that case, search on a log scale rather than a uniform scale. Uniform sampling would put about 90% of the samples between 0.1 and 1, whereas a log scale spends equal effort on each decade (0.0001 to 0.001, 0.001 to 0.01, and so on).
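For hyperparameters like these, uniform sampling is a one-liner. A minimal sketch using the ranges from the text; the variable names are illustrative, and `endpoint=True` is used so that the upper bounds (100 and 4) can actually be drawn:

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniform integer sampling; endpoint=True makes the upper bound inclusive.
n_hidden = rng.integers(50, 100, endpoint=True)  # hidden units in [50, 100]
n_layers = rng.integers(2, 4, endpoint=True)     # number of layers in [2, 4]
```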

Python implementation

import numpy as np  
r = -4 * np.random.rand()  # r will be in (-4, 0]  
alpha = 10 ** r  

alpha will be between $10^{-4}$ and $10^0$

Generalization for a log scale

If you have to search between $10^a$ and $10^b$, where a and b are the ends of the scale, sample r uniformly from [a,b] and use $10^r$
In the above case, a = $\log_{10}0.0001$ = -4 and b = $\log_{10}1$ = 0
so r will be between [a,b] = [-4,0]
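The general recipe can be wrapped in a small helper. This is a sketch, not code from the course: `sample_log_uniform` is a hypothetical name, and the 10,000 samples are only there to make the per-decade shares visible.

```python
import numpy as np

def sample_log_uniform(low, high, size, rng):
    # Draw r uniformly in [log10(low), log10(high)) and return 10**r.
    a, b = np.log10(low), np.log10(high)
    r = rng.uniform(a, b, size=size)
    return 10.0 ** r

rng = np.random.default_rng(0)
alphas = sample_log_uniform(1e-4, 1.0, 10_000, rng)

# Share of log-scale samples in each decade: roughly a quarter each.
counts, _ = np.histogram(np.log10(alphas), bins=[-4, -3, -2, -1, 0])
fractions = counts / alphas.size

# For comparison: uniform sampling puts about 90% of samples in [0.1, 1].
uniform_alphas = rng.uniform(1e-4, 1.0, size=10_000)
share_top_decade = np.mean(uniform_alphas >= 0.1)
```

Comparing `fractions` with `share_top_decade` shows why the log scale is preferred here: the uniform draw almost never tries small learning rates.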

Hyperparameter for exponentially weighted averages $\beta$

$\beta$ ranges from 0.9 to 0.999
0.9: averaging over roughly the last 10 values, since the window is about $\frac{1}{1-\beta}$
0.999: averaging over roughly the last 1000 values

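As with the learning rate, search on a log scale, but over 1-$\beta$, which ranges from 0.1 down to 0.001. A linear scale works poorly because results are far more sensitive near 1: going from 0.999 to 0.9995 doubles the averaging window from about 1000 to 2000 values, while going from 0.9 to 0.9005 barely changes it.
Sample r uniformly from [-3,-1]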
1-$\beta$ = $10^r$
$\beta$ = 1-$10^r$
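A short sketch of this sampling scheme, with the approximate averaging window $\frac{1}{1-\beta}$ computed alongside. The variable names and the choice of 5 samples are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

r = rng.uniform(-3, -1, size=5)   # r in [-3, -1)
beta = 1 - 10.0 ** r              # beta between 0.9 and 0.999
window = 1 / (1 - beta)           # approx. number of values averaged over

for b, w in zip(beta, window):
    print(f"beta = {b:.4f}  ->  averages over about {w:.0f} values")
```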

Pandas vs Caviar

  • Re-test hyperparameters occasionally, intuitions do get stale


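  1. Babysit one model: keep watching it and adjusting its hyperparameters as it trains (when computational capacity is limited) [Pandas]
  2. Train many models in parallel with different hyperparameters and pick the best one (when compute is plentiful) [Caviar]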

Source material is from Andrew Ng’s awesome course on Coursera. The material from the videos has been written up as text so that anyone who wishes to revise a certain topic can do so without going through the entire video lectures.

Written on December 16, 2017