# Hyperparameter Tuning

Hyperparameters to tune in deep learning: the learning rate $\alpha$, the momentum term $\beta$, Adam's $\beta_1, \beta_2, \epsilon$, the number of layers, the number of hidden units, learning rate decay, and the mini-batch size.

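#### Some hyperparameters are more important than others

Roughly in order of importance:

- Learning rate $\alpha$
- Momentum term $\beta$, number of hidden units, mini-batch size
- Number of layers, learning rate decay
- Adam's $\beta_1$, $\beta_2$, $\epsilon$, which are rarely tuned and are usually left at $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$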

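#### Grid search: don't use a grid, use random values

With a grid, if one hyperparameter (say $\alpha$) matters far more than another (say $\epsilon$), most trials just repeat the same few values of the important one. Random sampling tries a different value of every hyperparameter in each trial, so you explore many more values of the hyperparameters that matter, without needing to know in advance which ones those are.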
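A minimal sketch of this effect, assuming a budget of 25 trials over two hyperparameters that have both been rescaled to $[0, 1]$ (the budget, the ranges, and the seed are illustrative choices). It counts how many distinct values of the first hyperparameter each strategy actually tries.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_trials = 25

# Grid search: a 5 x 5 grid over two hyperparameters scaled to [0, 1]
grid_values = np.linspace(0.0, 1.0, 5)
grid_points = [(h1, h2) for h1 in grid_values for h2 in grid_values]

# Random search: the same budget of 25 points, drawn uniformly from the square
random_points = rng.uniform(0.0, 1.0, size=(n_trials, 2))

# How many distinct values of the first (important) hyperparameter does each method try?
distinct_grid = len({h1 for h1, _ in grid_points})
distinct_random = len(np.unique(random_points[:, 0]))

print(distinct_grid)    # 5
print(distinct_random)  # 25 (continuous draws essentially never repeat)
```

With the same number of training runs, the grid explores only 5 values of the hyperparameter that matters, while random search explores 25.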

#### Coarse-to-fine sampling scheme

After a first round of random sampling, zoom in to the smaller region of hyperparameter space that gave the best results, and sample random points more densely within that smaller region.
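The two stages can be sketched as below. This assumes a hypothetical `validation_score` function standing in for "train a model with these hyperparameters and return its dev-set score"; here it is just a toy surface that peaks at (0.3, 0.7). The box half-width of 0.1 and the 25 trials per stage are arbitrary choices, not prescribed values.

```python
import numpy as np

def validation_score(h1, h2):
    # Hypothetical stand-in for training a model and measuring dev-set performance.
    # This toy surface peaks at (0.3, 0.7); a real score would come from training.
    return -((h1 - 0.3) ** 2 + (h2 - 0.7) ** 2)

def random_search(low, high, n_trials, rng):
    # Sample n_trials points uniformly inside the box [low, high] and return the best one.
    points = rng.uniform(low, high, size=(n_trials, len(low)))
    scores = np.array([validation_score(h1, h2) for h1, h2 in points])
    best = int(np.argmax(scores))
    return points[best], scores[best]

rng = np.random.default_rng(seed=0)

# Coarse stage: sample the whole unit square of (normalized) hyperparameters.
coarse_low, coarse_high = np.array([0.0, 0.0]), np.array([1.0, 1.0])
coarse_best, coarse_score = random_search(coarse_low, coarse_high, 25, rng)

# Fine stage: zoom in on a smaller box around the best coarse point
# (clipped to the original square) and sample densely inside it.
half_width = 0.1
fine_low = np.clip(coarse_best - half_width, 0.0, 1.0)
fine_high = np.clip(coarse_best + half_width, 0.0, 1.0)
fine_best, fine_score = random_search(fine_low, fine_high, 25, rng)

# Keep whichever point scored best across both stages.
if fine_score > coarse_score:
    best_point, best_score = fine_best, fine_score
else:
    best_point, best_score = coarse_best, coarse_score

print("coarse best:", coarse_best, coarse_score)
print("overall best:", best_point, best_score)
```

Keeping the best point across both stages guards against the fine stage happening to draw only worse points than the coarse winner.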

#### Picking an appropriate scale for hyperparameters

Say we are tuning the number of hidden units $n^{[l]}$ in a layer somewhere between 50 and 100, or the number of layers $L$ between 2 and 4. In such cases, sampling uniformly at random makes sense. This is not true for all hyperparameters.
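For ranges like these, plain uniform sampling is enough. A minimal sketch using NumPy's `Generator.integers`, whose upper bound is exclusive (the seed and the sample count of 10 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Number of hidden units n^[l]: uniform over the integers 50..100 (upper bound 101 is exclusive)
n_hidden = rng.integers(50, 101, size=10)

# Number of layers L: uniform over {2, 3, 4}
n_layers = rng.integers(2, 5, size=10)

print(n_hidden)
print(n_layers)
```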

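For example, take the learning rate $\alpha$, and say it lies between 0.0001 and 1. Sampling uniformly at random on this range would put about 90% of the samples between 0.1 and 1 and only about 10% between 0.0001 and 0.1.

In such cases, search on a log scale rather than a uniform scale. Each decade ($10^{-4}$ to $10^{-3}$, $10^{-3}$ to $10^{-2}$, and so on) then receives an equal share of the samples, so every scale gets explored.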

Python implementation:

```python
import numpy as np

r = -4 * np.random.rand()  # r is sampled uniformly between -4 and 0
alpha = 10 ** r            # use ** for powers; ^ is bitwise XOR in Python
```

$\alpha$ will then lie between $10^{-4}$ and $10^0 = 1$.
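To check the claim that uniform sampling wastes most trials on the top decade, the sketch below draws 100,000 values of $\alpha$ both ways and measures the fraction landing in $[0.1, 1]$. The sample size and seed are arbitrary; the expected fractions are about $0.9/0.9999 \approx 0.90$ for uniform sampling and $1/4 = 0.25$ for log-scale sampling.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100_000

# Uniform on the raw scale: alpha ~ U(0.0001, 1)
alpha_uniform = rng.uniform(0.0001, 1.0, size=n)

# Log scale: r ~ U(-4, 0), alpha = 10**r
r = rng.uniform(-4.0, 0.0, size=n)
alpha_log = 10.0 ** r

# Fraction of samples landing in the top decade [0.1, 1]
frac_uniform = np.mean(alpha_uniform >= 0.1)
frac_log = np.mean(alpha_log >= 0.1)

print(f"uniform: {frac_uniform:.3f}")  # roughly 0.90
print(f"log:     {frac_log:.3f}")      # roughly 0.25
```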

##### Generalization for a log scale

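If you need to search between $10^a$ and $10^b$, where $a$ and $b$ are the exponents at the two ends of the range, sample $r$ uniformly at random from $[a, b]$ and set the hyperparameter to $10^r$.

In the example above, $a = \log_{10} 0.0001 = -4$ and $b = \log_{10} 1 = 0$, so $r$ is sampled from $[-4, 0]$.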
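The general recipe can be wrapped in a small helper. `sample_log_uniform` is a hypothetical name used only for illustration; it takes the positive bounds of the range, computes $a = \log_{10}(\text{low})$ and $b = \log_{10}(\text{high})$, samples $r \sim U[a, b]$, and returns $10^r$.

```python
import numpy as np

def sample_log_uniform(low, high, size=None, rng=None):
    # Hypothetical helper: sample between low and high (both > 0) uniformly on a log10 scale.
    if low <= 0 or high <= 0:
        raise ValueError("log-scale bounds must be positive")
    rng = np.random.default_rng() if rng is None else rng
    a, b = np.log10(low), np.log10(high)  # exponents at the ends of the range
    r = rng.uniform(a, b, size=size)      # r ~ U[a, b]
    return 10.0 ** r

rng = np.random.default_rng(seed=0)
alphas = sample_log_uniform(0.0001, 1.0, size=5, rng=rng)  # learning rates in [1e-4, 1]
print(alphas)
```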

#### Sampling $\beta$ for exponentially weighted averages

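Suppose $\beta$ should lie between 0.9 and 0.999. Since the averaging window is roughly $1/(1-\beta)$:

- $\beta = 0.9$: averaging over roughly the last 10 values
- $\beta = 0.999$: averaging over roughly the last 1000 values

Sampling $\beta$ uniformly on this range would be wasteful, because results are far more sensitive to changes in $\beta$ when it is close to 1. Going from 0.9 to 0.9005 barely changes the window (still about 10 values), while going from 0.999 to 0.9995 doubles it from about 1000 to about 2000 values.

So, as with the learning rate, use a log scale, but on $1-\beta$ rather than on $\beta$ itself. $1-\beta$ ranges from 0.1 down to 0.001, i.e. from $10^{-1}$ to $10^{-3}$, so sample $r$ uniformly from $[-3, -1]$ and set

$1-\beta = 10^r$

$\beta = 1-10^r$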
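A minimal sketch of this sampling scheme, also printing the approximate averaging window $1/(1-\beta)$ for each sampled value (the seed and the sample count of 5 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Sample 1 - beta on a log scale: r ~ U(-3, -1), so 1 - beta lies in [10**-3, 10**-1]
r = rng.uniform(-3.0, -1.0, size=5)
beta = 1.0 - 10.0 ** r  # beta therefore lies between 0.9 and 0.999

# Approximate number of past values each exponentially weighted average covers
window = 1.0 / (1.0 - beta)

for b, w in zip(beta, window):
    print(f"beta = {b:.5f}  ->  averages over about {w:.0f} values")
```

Sampling this way spreads the trials evenly over windows of roughly 10 to 100 values and 100 to 1000 values, instead of concentrating them on short windows.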

#### Pandas vs Caviar

- Re-test hyperparameters occasionally; intuitions do get stale.

Approaches:

- Babysit one model and keep adjusting its hyperparameters as it trains, when computational capacity is limited [Pandas]
- Train many models with different hyperparameter settings in parallel and pick the best [Caviar]