TensorFlow Glossary Part 2 - Optimizer Functions
A short list, with details, of the most commonly used optimizer functions in TensorFlow. Not all functions are listed here; the full list can be found in the official TensorFlow API guide here.
Base class for optimizers. This class defines the API to add Ops to train a model. You never use this class directly, but instead instantiate one of its subclasses such as GradientDescentOptimizer, AdagradOptimizer, or MomentumOptimizer.
Constructs a new gradient descent optimizer. It has several associated methods, such as minimize():
minimize(
    loss,
    global_step=None,
    var_list=None,
    gate_gradients=GATE_OP,
    aggregation_method=None,
    colocate_gradients_with_ops=False,
    name=None,
    grad_loss=None
)
Other methods include apply_gradients() and compute_gradients().
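Conceptually, minimize() is compute_gradients() followed by apply_gradients(). A minimal pure-Python sketch of that split for a single scalar parameter (an illustration only, not TensorFlow code; the loss and learning rate here are assumptions chosen for the example):

```python
# Sketch of one optimizer step for the loss (w*x - y)^2 on a scalar w.
# minimize() = compute_gradients() followed by apply_gradients().

def compute_gradients(w, x, y):
    # d/dw of (w*x - y)^2
    return 2.0 * (w * x - y) * x

def apply_gradients(w, grad, learning_rate):
    # Plain gradient descent: w <- w - learning_rate * grad
    return w - learning_rate * grad

w = 0.0
for _ in range(100):
    grad = compute_gradients(w, x=2.0, y=4.0)
    w = apply_gradients(w, grad, learning_rate=0.05)
# w approaches 2.0, the minimizer of the loss
```

Splitting the step this way is what lets you modify gradients (for example, clipping them) between the two calls, which is why TensorFlow exposes both halves.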
Optimizer that implements the Adagrad algorithm.
Optimizer that implements the Momentum algorithm.
Optimizer that implements the Adam algorithm.
Optimizer that implements the RMSProp algorithm.
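The update rules behind these four optimizers can be sketched for a single scalar parameter in pure Python. This is an illustration only: the hyperparameter names and defaults below are assumptions for the sketch, and TensorFlow's real implementations operate on tensors with many more options.

```python
import math

# Illustrative scalar update rules: parameter w, gradient g.

def momentum_step(w, g, v, lr=0.01, momentum=0.9):
    v = momentum * v + g                     # accumulate velocity
    return w - lr * v, v

def adagrad_step(w, g, accum, lr=0.01, eps=1e-8):
    accum += g * g                           # accumulate squared gradients
    return w - lr * g / (math.sqrt(accum) + eps), accum

def rmsprop_step(w, g, ms, lr=0.01, rho=0.9, eps=1e-8):
    ms = rho * ms + (1 - rho) * g * g        # moving average of squared gradients
    return w - lr * g / (math.sqrt(ms) + eps), ms

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v
```

The common theme: Momentum smooths the gradient itself, while Adagrad/RMSProp scale the step by a running measure of squared gradients, and Adam combines both ideas with bias correction.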
Clips tensor values to a specified min and max.
tf.clip_by_value( t, clip_value_min, clip_value_max, name=None )
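The semantics are easy to mirror in plain Python (a sketch of the element-wise behavior, not the TensorFlow implementation):

```python
def clip_by_value(t, clip_value_min, clip_value_max):
    # Element-wise: values below the min become the min,
    # values above the max become the max, others pass through.
    return [min(max(x, clip_value_min), clip_value_max) for x in t]

clip_by_value([-5.0, 0.5, 3.0], clip_value_min=-1.0, clip_value_max=1.0)
# → [-1.0, 0.5, 1.0]
```

A common use is gradient clipping: clip each gradient returned by compute_gradients() before handing it to apply_gradients(), keeping individual updates bounded.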
tf.train.exponential_decay( learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None )
Applies exponential decay to the learning rate.
When training a model, it is often recommended to lower the learning rate as the training progresses. This function applies an exponential decay function to a provided initial learning rate. It requires a global_step value to compute the decayed learning rate. You can just pass a TensorFlow variable that you increment at each training step.
The function returns the decayed learning rate, computed as:
decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
global_step = tf.Variable(0, trainable=False)
starter_learning_rate = 0.1
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           100000, 0.96, staircase=True)
# Passing global_step to minimize() will increment it at each step.
learning_step = (
    tf.train.GradientDescentOptimizer(learning_rate)
    .minimize(...my loss..., global_step=global_step)
)
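As a quick numeric check of the decay formula, a pure-Python sketch (the staircase variant floors the exponent to an integer, so the rate decays in discrete jumps rather than continuously):

```python
def decayed_learning_rate(learning_rate, global_step, decay_steps,
                          decay_rate, staircase=False):
    p = global_step / decay_steps
    if staircase:
        p = global_step // decay_steps  # integer division: step-wise decay
    return learning_rate * decay_rate ** p

# With the values from the example above: after 100000 steps the
# rate has decayed by exactly one factor of 0.96.
decayed_learning_rate(0.1, 100000, 100000, 0.96, staircase=True)  # ≈ 0.096
```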
Maintains moving averages of variables by employing an exponential decay.
When training a model, it is often beneficial to maintain moving averages of the trained parameters. Evaluations that use averaged parameters sometimes produce significantly better results than the final trained values.
The moving averages are computed using exponential decay. You specify the decay value when creating the ExponentialMovingAverage object. The shadow variables are initialized with the same initial values as the trained variables. When you run the ops to maintain the moving averages, each shadow variable is updated with the formula:
shadow_variable -= (1 - decay) * (shadow_variable - variable)
This is mathematically equivalent to the classic formula below, but the use of an assign_sub op (the “-=” in the formula) allows concurrent lockless updates to the variables:
shadow_variable = decay * shadow_variable + (1 - decay) * variable
Reasonable values for decay are close to 1.0, typically in the multiple-nines range: 0.999, 0.9999, etc.
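The update can be sketched in pure Python. Here decay = 0.9 (far lower than the recommended multiple-nines values, just to make the effect visible): the shadow variable moves a fraction (1 - decay) of the way toward the current value on each update.

```python
def ema_update(shadow, variable, decay):
    # assign_sub form: shadow -= (1 - decay) * (shadow - variable)
    return shadow - (1 - decay) * (shadow - variable)

shadow = 0.0
for variable in [1.0, 1.0, 1.0, 1.0]:
    shadow = ema_update(shadow, variable, decay=0.9)
# shadow = 1 - 0.9**4 = 0.3439: after four updates it has moved only
# about a third of the way, which is why high decay values are
# paired with long training runs
```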