<p>Ashutosh Kumar (http://ashukumar27.io/): Data Science - Machine Learning - Deep Learning. Jekyll feed, updated 2018-08-06T10:44:47+00:00.</p>
<h1>Saving and Loading Models in TensorFlow &amp; Keras</h1>
<p>Published 2018-05-19, http://ashukumar27.io/save-load-models</p>
<p>Saving and loading models is a key part of building deep learning solutions. Saved models are used not only in deployment but also in transfer learning: a model already trained on millions or billions of records can be saved and reused by others who just want to deploy it, or who do not have access to a huge amount of training data.</p>
<p>Keras and TensorFlow offer different ways to save models, which are outlined below.</p>
<h2 id="saving-and-loading-models-in-tensorflow">Saving and loading Models in Tensorflow</h2>
<h2 id="saving-and-loading-model-in-keras">Saving and Loading Model in Keras</h2>
<p>A Keras model's architecture can be saved in JSON or YAML format, with the weights saved separately in .h5 (HDF5) format.</p>
<p>The code is below:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="kn">from</span> <span class="nn">keras.models</span> <span class="kn">import</span> <span class="n">model_from_json</span>
<span class="kn">from</span> <span class="nn">keras.models</span> <span class="kn">import</span> <span class="n">model_from_yaml</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model_json</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">to_json</span><span class="p">()</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">"model.json"</span><span class="p">,</span><span class="s">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">json_file</span><span class="p">:</span>
    <span class="n">json_file</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">model_json</span><span class="p">)</span>
<span class="c"># The weights are stored separately in HDF5 format</span>
<span class="n">model</span><span class="o">.</span><span class="n">save_weights</span><span class="p">(</span><span class="s">"model.h5"</span><span class="p">)</span>
</code></pre></div></div>
<p>To load the model back, rebuild the architecture from the JSON file and then load the weights:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Read the architecture back; the with-block closes the file automatically</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">"model.json"</span><span class="p">,</span><span class="s">"r"</span><span class="p">)</span> <span class="k">as</span> <span class="n">json_file</span><span class="p">:</span>
    <span class="n">loaded_model_json</span> <span class="o">=</span> <span class="n">json_file</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="n">loaded_model</span> <span class="o">=</span> <span class="n">model_from_json</span><span class="p">(</span><span class="n">loaded_model_json</span><span class="p">)</span>
<span class="n">loaded_model</span><span class="o">.</span><span class="n">load_weights</span><span class="p">(</span><span class="s">"model.h5"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Loaded model from disk"</span><span class="p">)</span>
</code></pre></div></div>
<p>The code is posted <a href="https://github.com/ashukumar27/Tensorflow/blob/master/100Days/R1D11_Keras_SaveLoad_Model.py">here</a>.</p>
<h1>Visualizing Tensorflow Graph and Saving/Loading Models</h1>
<p>Published 2018-05-07, http://ashukumar27.io/tf-tfboard</p>
<p>The graph built in a TensorFlow session can be visualized with TensorBoard, which renders the graph defined in the code in a web UI. The standard way is to write the graph to a log directory on disk and then point the tensorboard command at that directory. TensorBoard serves on port <strong>6006</strong> by default (which looks like <strong>g00g</strong>le).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">tensorflow</span> <span class="k">as</span> <span class="n">tf</span>

<span class="c"># x is assumed to be a tensor defined earlier in the graph</span>
<span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">Session</span><span class="p">()</span> <span class="k">as</span> <span class="n">sess</span><span class="p">:</span>
    <span class="n">writer</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">summary</span><span class="o">.</span><span class="n">FileWriter</span><span class="p">(</span><span class="s">"/Users/ashutosh/datasets/tensorboard/"</span><span class="p">,</span> <span class="n">sess</span><span class="o">.</span><span class="n">graph</span><span class="p">)</span>
    <span class="k">print</span> <span class="p">(</span><span class="n">sess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
    <span class="n">writer</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</code></pre></div></div>
<h4 id="terminal-">Terminal:</h4>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>tensorboard --logdir="/Users/ashutosh/datasets/tensorboard/"
</code></pre></div></div>
<h4 id="server-running-on-browser-httplocalhost6006">Open the running server in a browser: http://localhost:6006/</h4>
<p>The code for a sample TensorBoard application is posted <a href="https://github.com/ashukumar27/Tensorflow/blob/master/100Days/R1D10_TF_Tensorboard.py">here</a>.</p>
<h1>Tensorflow Glossary Part 3 - Loss functions</h1>
<p>Published 2018-04-26, http://ashukumar27.io/tg-glossary-loss-functions</p>
<p>Loss functions are the key to optimizing any machine learning algorithm in TensorFlow. It is important to select the right loss function for the problem at hand; the loss is then fed into one of the <a href="http://www.ashukumar27.io/tf-glossary-optimizers/">optimizer functions</a>.</p>
<h3 id="tflossescosine_distance">tf.losses.cosine_distance</h3>
<p>Adds a cosine-distance loss to the training procedure. Some of its arguments are deprecated.</p>
<p>Returns a weighted loss float Tensor. If reduction is NONE, this has the same shape as labels; otherwise, it is scalar.</p>
<h3 id="tflossesget_regularization_loss">tf.losses.get_regularization_loss</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">losses</span><span class="o">.</span><span class="n">get_regularization_loss</span><span class="p">(</span>
<span class="n">scope</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">name</span><span class="o">=</span><span class="s">'total_regularization_loss'</span>
<span class="p">)</span>
</code></pre></div></div>
<p>Gets the total regularization loss.</p>
<h3 id="tflosseshinge_loss">tf.losses.hinge_loss</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">losses</span><span class="o">.</span><span class="n">hinge_loss</span><span class="p">(</span>
<span class="n">labels</span><span class="p">,</span>
<span class="n">logits</span><span class="p">,</span>
<span class="n">weights</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span>
<span class="n">scope</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">loss_collection</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">GraphKeys</span><span class="o">.</span><span class="n">LOSSES</span><span class="p">,</span>
<span class="n">reduction</span><span class="o">=</span><span class="n">Reduction</span><span class="o">.</span><span class="n">SUM_BY_NONZERO_WEIGHTS</span>
<span class="p">)</span>
</code></pre></div></div>
<p>Adds a hinge loss to the training procedure.</p>
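<p>To make the hinge loss concrete, here is a minimal pure-Python sketch of the underlying formula. It is not the TensorFlow implementation: it assumes binary labels given as 0/1 that are mapped to -1/+1, and it simply averages the per-sample losses. The function name and sample values are hypothetical.</p>

```python
def hinge_loss(labels, logits):
    """Mean hinge loss for binary labels given as 0/1 values (illustrative only)."""
    losses = []
    for y, score in zip(labels, logits):
        signed = 2.0 * y - 1.0  # map {0, 1} to {-1, +1}
        losses.append(max(0.0, 1.0 - signed * score))
    return sum(losses) / len(losses)


# Hypothetical labels and raw model scores (logits)
labels = [1, 0, 1]
logits = [2.0, -0.5, 0.25]
loss = hinge_loss(labels, logits)
print(loss)
```

<p>Samples that are on the correct side of the margin (the first one here) contribute zero loss; the others are penalized linearly.</p>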
<h3 id="tflosseshuber_loss">tf.losses.huber_loss</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">losses</span><span class="o">.</span><span class="n">huber_loss</span><span class="p">(</span>
<span class="n">labels</span><span class="p">,</span>
<span class="n">predictions</span><span class="p">,</span>
<span class="n">weights</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span>
<span class="n">delta</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span>
<span class="n">scope</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">loss_collection</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">GraphKeys</span><span class="o">.</span><span class="n">LOSSES</span><span class="p">,</span>
<span class="n">reduction</span><span class="o">=</span><span class="n">Reduction</span><span class="o">.</span><span class="n">SUM_BY_NONZERO_WEIGHTS</span>
<span class="p">)</span>
</code></pre></div></div>
<p>Adds a Huber loss term to the training procedure.</p>
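<p>As a rough illustration of what the delta argument controls, the sketch below computes the standard Huber loss for a single value in pure Python: quadratic for errors up to delta, linear beyond it. The function name and inputs are hypothetical, and the weighting and reduction from the signature above are left out.</p>

```python
def huber(label, prediction, delta=1.0):
    """Standard Huber loss for one value (illustrative, not the TF code)."""
    error = abs(label - prediction)
    if error <= delta:
        return 0.5 * error ** 2            # quadratic region
    return delta * (error - 0.5 * delta)   # linear region


small_error = huber(1.0, 1.5)  # |error| = 0.5, inside delta
large_error = huber(1.0, 4.0)  # |error| = 3.0, outside delta
print(small_error, large_error)
```

<p>Because large errors grow only linearly, outliers pull on the fit less than they would under a squared error.</p>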
<h3 id="tflosseslog_loss">tf.losses.log_loss</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">losses</span><span class="o">.</span><span class="n">log_loss</span><span class="p">(</span>
<span class="n">labels</span><span class="p">,</span>
<span class="n">predictions</span><span class="p">,</span>
<span class="n">weights</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span>
<span class="n">epsilon</span><span class="o">=</span><span class="mf">1e-07</span><span class="p">,</span>
<span class="n">scope</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">loss_collection</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">GraphKeys</span><span class="o">.</span><span class="n">LOSSES</span><span class="p">,</span>
<span class="n">reduction</span><span class="o">=</span><span class="n">Reduction</span><span class="o">.</span><span class="n">SUM_BY_NONZERO_WEIGHTS</span>
<span class="p">)</span>
</code></pre></div></div>
<p>Adds a Log Loss term to the training procedure.</p>
<p>weights acts as a coefficient for the loss. If a scalar is provided, then the loss is simply scaled by the given value. If weights is a tensor of size [batch_size], then the total loss for each sample of the batch is rescaled by the corresponding element in the weights vector. If the shape of weights matches the shape of predictions, then the loss of each measurable element of predictions is scaled by the corresponding value of weights.</p>
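<p>The effect of weights can be sketched in pure Python. This is an illustration under assumptions, not the TensorFlow code: it uses the usual log loss formula with the epsilon guard from the signature, returns unreduced per-sample losses, and handles only a scalar weight or one weight per sample. All names and values are hypothetical.</p>

```python
import math


def weighted_log_losses(labels, predictions, weights=1.0, epsilon=1e-7):
    """Per-sample log loss scaled by a scalar or per-sample weight."""
    if isinstance(weights, (int, float)):
        weights = [float(weights)] * len(labels)  # broadcast a scalar weight
    losses = []
    for y, p, w in zip(labels, predictions, weights):
        loss = -y * math.log(p + epsilon) - (1 - y) * math.log(1 - p + epsilon)
        losses.append(w * loss)
    return losses


labels = [1, 0, 1]
predictions = [0.9, 0.2, 0.6]
unweighted = weighted_log_losses(labels, predictions)
doubled = weighted_log_losses(labels, predictions, weights=2.0)
masked = weighted_log_losses(labels, predictions, weights=[1.0, 0.0, 1.0])
print(unweighted, doubled, masked)
```

<p>A scalar weight rescales every sample, while a per-sample weight of 0 removes that sample from the loss entirely.</p>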
<h3 id="tflossesmean_squared_error">tf.losses.mean_squared_error</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">losses</span><span class="o">.</span><span class="n">mean_squared_error</span><span class="p">(</span>
<span class="n">labels</span><span class="p">,</span>
<span class="n">predictions</span><span class="p">,</span>
<span class="n">weights</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span>
<span class="n">scope</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">loss_collection</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">GraphKeys</span><span class="o">.</span><span class="n">LOSSES</span><span class="p">,</span>
<span class="n">reduction</span><span class="o">=</span><span class="n">Reduction</span><span class="o">.</span><span class="n">SUM_BY_NONZERO_WEIGHTS</span>
<span class="p">)</span>
</code></pre></div></div>
<p>Adds a Sum-of-Squares loss to the training procedure.</p>
<p>weights acts as a coefficient for the loss. If a scalar is provided, then the loss is simply scaled by the given value. If weights is a tensor of size [batch_size], then the total loss for each sample of the batch is rescaled by the corresponding element in the weights vector. If the shape of weights matches the shape of predictions, then the loss of each measurable element of predictions is scaled by the corresponding value of weights.</p>
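<p>The default reduction in the signature, SUM_BY_NONZERO_WEIGHTS, can be sketched as follows, assuming it means "sum of the weighted losses divided by the number of nonzero weights". This is a pure-Python illustration with hypothetical names and data, not the library implementation.</p>

```python
def mean_squared_error(labels, predictions, weights=None):
    """Weighted squared error, reduced by the count of nonzero weights."""
    if weights is None:
        weights = [1.0] * len(labels)  # default: every element counts equally
    weighted = [w * (y - p) ** 2 for y, p, w in zip(labels, predictions, weights)]
    nonzero = sum(1 for w in weights if w != 0)
    return sum(weighted) / nonzero if nonzero else 0.0


labels = [1.0, 2.0, 3.0, 4.0]
predictions = [1.5, 2.0, 2.0, 6.0]
plain = mean_squared_error(labels, predictions)
masked = mean_squared_error(labels, predictions, [1.0, 1.0, 1.0, 0.0])
print(plain, masked)
```

<p>With all weights equal to 1 this is just the mean squared error; zeroing a weight drops that element from both the sum and the divisor.</p>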
<h3 id="tflossessigmoid_cross_entropy">tf.losses.sigmoid_cross_entropy</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">losses</span><span class="o">.</span><span class="n">sigmoid_cross_entropy</span><span class="p">(</span>
<span class="n">multi_class_labels</span><span class="p">,</span>
<span class="n">logits</span><span class="p">,</span>
<span class="n">weights</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span>
<span class="n">label_smoothing</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span>
<span class="n">scope</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">loss_collection</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">GraphKeys</span><span class="o">.</span><span class="n">LOSSES</span><span class="p">,</span>
<span class="n">reduction</span><span class="o">=</span><span class="n">Reduction</span><span class="o">.</span><span class="n">SUM_BY_NONZERO_WEIGHTS</span>
<span class="p">)</span>
</code></pre></div></div>
<p>Creates a cross-entropy loss using tf.nn.sigmoid_cross_entropy_with_logits.</p>
<p>weights acts as a coefficient for the loss. If a scalar is provided, then the loss is simply scaled by the given value. If weights is a tensor of shape [batch_size], then the loss weights apply to each corresponding sample.</p>
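<p>For intuition, the per-element sigmoid cross-entropy can be written in a numerically stable form, max(x, 0) - x * z + log(1 + exp(-|x|)), where x is the logit and z the label. The pure-Python sketch below compares it with the textbook formula; it is an illustration with hypothetical names, not the TensorFlow code, and it ignores weights and label smoothing.</p>

```python
import math


def sigmoid_cross_entropy(label, logit):
    """Stable form: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def naive_cross_entropy(label, logit):
    """Textbook form; fails for large |logit| because log(0) is undefined."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -label * math.log(p) - (1.0 - label) * math.log(1.0 - p)


stable = sigmoid_cross_entropy(1.0, 0.5)
naive = naive_cross_entropy(1.0, 0.5)
extreme = sigmoid_cross_entropy(0.0, 1000.0)  # the naive form would raise here
print(stable, naive, extreme)
```

<p>Working directly on logits is what lets the stable form avoid overflow and log(0) for very confident predictions.</p>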
<h3 id="tflossessoftmax_cross_entropy">tf.losses.softmax_cross_entropy</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">losses</span><span class="o">.</span><span class="n">softmax_cross_entropy</span><span class="p">(</span>
<span class="n">onehot_labels</span><span class="p">,</span>
<span class="n">logits</span><span class="p">,</span>
<span class="n">weights</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span>
<span class="n">label_smoothing</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span>
<span class="n">scope</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">loss_collection</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">GraphKeys</span><span class="o">.</span><span class="n">LOSSES</span><span class="p">,</span>
<span class="n">reduction</span><span class="o">=</span><span class="n">Reduction</span><span class="o">.</span><span class="n">SUM_BY_NONZERO_WEIGHTS</span>
<span class="p">)</span>
</code></pre></div></div>
<p>Creates a cross-entropy loss using tf.nn.softmax_cross_entropy_with_logits.</p>
<p>weights acts as a coefficient for the loss. If a scalar is provided, then the loss is simply scaled by the given value. If weights is a tensor of shape [batch_size], then the loss weights apply to each corresponding sample.</p>
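<p>A minimal pure-Python sketch of softmax cross-entropy with the label_smoothing argument follows. It assumes the smoothing rule onehot_labels * (1 - label_smoothing) + label_smoothing / num_classes and uses a log-sum-exp for stability; function names and inputs are hypothetical, and weights and reduction are omitted.</p>

```python
import math


def smooth_labels(onehot, label_smoothing):
    """Move one-hot labels towards the uniform value 1 / num_classes."""
    num_classes = len(onehot)
    return [v * (1 - label_smoothing) + label_smoothing / num_classes for v in onehot]


def softmax_cross_entropy(onehot, logits, label_smoothing=0.0):
    """Cross-entropy between (smoothed) labels and softmax(logits)."""
    targets = smooth_labels(onehot, label_smoothing)
    top = max(logits)  # subtract the max before exponentiating for stability
    log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
    log_probs = [x - log_norm for x in logits]
    return -sum(t * lp for t, lp in zip(targets, log_probs))


onehot = [0.0, 1.0, 0.0, 0.0]
smoothed = smooth_labels(onehot, 0.1)
uniform_loss = softmax_cross_entropy(onehot, [0.0, 0.0, 0.0, 0.0])
print(smoothed, uniform_loss)
```

<p>The smoothed labels still sum to 1, and with uniform logits the loss equals log(num_classes), which makes a handy sanity check.</p>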
<p>If label_smoothing is nonzero, the labels are smoothed towards 1/num_classes:<br />
new_onehot_labels = onehot_labels * (1 - label_smoothing) + label_smoothing / num_classes</p>
<h1>Tensorflow Glossary Part 2 - Optimizer Functions</h1>
<p>Published 2018-04-24, http://ashukumar27.io/tf-glossary-optimizers</p>
<p>A short list of the most commonly used optimizer functions in TensorFlow, with brief details. Not all functions are listed here.</p>
<p>The full list can be found in the official TensorFlow API guide <a href="https://www.tensorflow.org/api_guides/python/train">here</a>.</p>
<h3 id="tftrainoptimizer">tf.train.Optimizer</h3>
<p>Base class for optimizers. This class defines the API to add Ops to train a model. You never use this class directly, but instead instantiate one of its subclasses such as GradientDescentOptimizer, AdagradOptimizer, or MomentumOptimizer.</p>
<h3 id="tftraingradientdescentoptimizer">tf.train.GradientDescentOptimizer</h3>
<p>Constructs a new gradient descent optimizer. It has several associated methods, such as minimize():</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>minimize(
loss,
global_step=None,
var_list=None,
gate_gradients=GATE_OP,
aggregation_method=None,
colocate_gradients_with_ops=False,
name=None,
grad_loss=None
)
</code></pre></div></div>
<p>Other methods include compute_gradients and apply_gradients; minimize() simply combines the two.</p>
<h3 id="tftrainadagradoptimizer">tf.train.AdagradOptimizer</h3>
<p>Optimizer that implements the Adagrad algorithm.</p>
<h3 id="tftrainmomentumoptimizer">tf.train.MomentumOptimizer</h3>
<p>Optimizer that implements the Momentum algorithm.</p>
<h3 id="tftrainadamoptimizer">tf.train.AdamOptimizer</h3>
<p>Optimizer that implements the Adam algorithm.</p>
<h3 id="tftrainrmspropoptimizer">tf.train.RMSPropOptimizer</h3>
<p>Optimizer that implements the RMSProp algorithm.</p>
<h3 id="tfclip_by_value">tf.clip_by_value</h3>
<p>Clips tensor values to a specified min and max.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tf.clip_by_value(
t,
clip_value_min,
clip_value_max,
name=None
)
</code></pre></div></div>
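<p>Clipping is easy to picture on a plain list. The sketch below mirrors the argument names of the signature above in pure Python; it is an illustration on hypothetical values, not the TensorFlow op.</p>

```python
def clip_by_value(values, clip_value_min, clip_value_max):
    """Clamp every element into [clip_value_min, clip_value_max]."""
    return [min(max(v, clip_value_min), clip_value_max) for v in values]


clipped = clip_by_value([-3.0, 0.5, 7.0], -1.0, 1.0)
print(clipped)
```

<p>Values already inside the range pass through unchanged; anything outside is replaced by the nearest bound.</p>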
<h3 id="tftrainexponential_decay">tf.train.exponential_decay</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tf.train.exponential_decay(
learning_rate,
global_step,
decay_steps,
decay_rate,
staircase=False,
name=None
)
</code></pre></div></div>
<p>Applies exponential decay to the learning rate.</p>
<p>When training a model, it is often recommended to lower the learning rate as the training progresses. This function applies an exponential decay function to a provided initial learning rate. It requires a global_step value to compute the decayed learning rate. You can just pass a TensorFlow variable that you increment at each training step.</p>
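<p>The function returns the decayed learning rate, computed as:<br />
decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)<br />
If staircase is True, global_step / decay_steps is an integer division, so the learning rate decays in discrete steps.</p>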
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>global_step = tf.Variable(0, trainable=False)
starter_learning_rate = 0.1
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
100000, 0.96, staircase=True)
# Passing global_step to minimize() will increment it at each step.
learning_step = (
tf.train.GradientDescentOptimizer(learning_rate)
.minimize(...my loss..., global_step=global_step)
)
</code></pre></div></div>
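<p>To see what the schedule produces, the decay formula can be evaluated in pure Python. This sketch reuses the argument names and the values from the example above (0.1, 100000, 0.96); it only reproduces the arithmetic and is not the TensorFlow function. The step values are hypothetical.</p>

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """learning_rate * decay_rate ^ (global_step / decay_steps)."""
    if staircase:
        exponent = global_step // decay_steps  # integer division: stepwise decay
    else:
        exponent = global_step / decay_steps   # smooth decay
    return learning_rate * decay_rate ** exponent


smooth_mid = exponential_decay(0.1, 50000, 100000, 0.96)
stair_mid = exponential_decay(0.1, 50000, 100000, 0.96, staircase=True)
stair_late = exponential_decay(0.1, 250000, 100000, 0.96, staircase=True)
print(smooth_mid, stair_mid, stair_late)
```

<p>With staircase=True the rate stays at 0.1 for the first 100000 steps and only then drops, whereas the smooth variant has already decayed slightly halfway through.</p>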
<h3 id="tftrainexponentialmovingaverage">tf.train.ExponentialMovingAverage</h3>
<p>Maintains moving averages of variables by employing an exponential decay.</p>
<p>When training a model, it is often beneficial to maintain moving averages of the trained parameters. Evaluations that use averaged parameters sometimes produce significantly better results than the final trained values.</p>
<p>The moving averages are computed using exponential decay. You specify the decay value when creating the ExponentialMovingAverage object. The shadow variables are initialized with the same initial values as the trained variables. When you run the ops to maintain the moving averages, each shadow variable is updated with the formula:</p>
<p>shadow_variable -= (1 - decay) * (shadow_variable - variable)</p>
<p>This is mathematically equivalent to the classic formula below, but the use of an assign_sub op (the “-=” in the formula) allows concurrent lockless updates to the variables:</p>
<p>shadow_variable = decay * shadow_variable + (1 - decay) * variable</p>
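<p>The equivalence of the two update formulas can be checked numerically. The sketch below applies both to the same hypothetical sequence of variable values in pure Python; it illustrates the arithmetic only and says nothing about the lockless behaviour of the real op.</p>

```python
def ema_assign_sub(shadow, variable, decay):
    """shadow -= (1 - decay) * (shadow - variable)"""
    shadow -= (1 - decay) * (shadow - variable)
    return shadow


def ema_classic(shadow, variable, decay):
    """shadow = decay * shadow + (1 - decay) * variable"""
    return decay * shadow + (1 - decay) * variable


decay = 0.999
values = [0.5, 0.7, 0.4, 0.9]        # hypothetical values of a trained variable
shadow_a = shadow_b = values[0]      # shadow starts at the variable's initial value
for value in values[1:]:
    shadow_a = ema_assign_sub(shadow_a, value, decay)
    shadow_b = ema_classic(shadow_b, value, decay)
print(shadow_a, shadow_b)
```

<p>Both shadows agree up to floating-point rounding, and with a decay this close to 1 they move only slightly away from the initial value.</p>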
<p>Reasonable values for decay are close to 1.0, typically in the multiple-nines range: 0.999, 0.9999, etc.</p>
<h1>Tensorflow Glossary Part 1 - Mathematical Functions</h1>
<p>Published 2018-04-23, http://ashukumar27.io/tensorflow_maths_part1</p>
<p>This is a brief summary of common (and not so common) mathematical functions in TensorFlow: just a few lines on the syntax of each and how it is written in TensorFlow.</p>
<p>All of this is obviously available at the documentation link given below, but writing it up in this post forces me to go through the functions one by one, which I am too lazy to do when reading documentation. Also, this list excludes a lot of not-so-common functions.</p>
<p><a href="https://www.tensorflow.org/api_guides/python/math_ops">Tensorflow Documentation for Math</a></p>
<h3 id="basic-mathematics">Basic Mathematics</h3>
<ul>
<li>tf.add</li>
<li>tf.subtract</li>
<li>tf.multiply</li>
<li>tf.scalar_mul
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">scalar_mul</span><span class="p">(</span>
<span class="n">scalar</span><span class="p">,</span>
<span class="n">x</span>
<span class="p">)</span>
</code></pre></div> </div>
<p>Multiplies a scalar times a Tensor or IndexedSlices object.</p>
</li>
</ul>
<p>Intended for use in gradient code which might deal with IndexedSlices objects, which are easy to multiply by a scalar but more expensive to multiply with arbitrary tensors.</p>
<ul>
<li>tf.div</li>
<li>tf.divide</li>
<li>tf.floor</li>
<li>tf.mod - Returns the element-wise remainder of division. When x &lt; 0 xor y &lt; 0 is true, this follows Python semantics in that the result is consistent with a flooring divide, e.g. floor(x / y) * y + mod(x, y) = x.</li>
<li>tf.cross - Computes the pairwise cross product. a and b must be the same shape; they can either be simple 3-element vectors, or any shape where the innermost dimension is 3. In the latter case, each pair of corresponding 3-element vectors is cross-multiplied independently.</li>
</ul>
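<p>Two items in the list above are easier to follow with numbers: the flooring remainder identity floor(x / y) * y + mod(x, y) = x, and the cross product of 3-element vectors. The pure-Python sketch below illustrates both; the helper names are hypothetical and no TensorFlow calls are involved.</p>

```python
import math


def floor_mod(x, y):
    """Remainder consistent with a flooring divide (same as Python's % operator)."""
    return x - math.floor(x / y) * y


def cross(a, b):
    """Cross product of two 3-element vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


x, y = -7, 3
remainder = floor_mod(x, y)                    # sign follows the divisor y
identity_holds = math.floor(x / y) * y + remainder == x
z_axis = cross([1, 0, 0], [0, 1, 0])           # x-axis cross y-axis
print(remainder, identity_holds, z_axis)
```

<p>For a batch of vectors, the same cross function would simply be applied to each pair of corresponding 3-element vectors.</p>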
<h3 id="basic-mathematical-operations">Basic Mathematical Operations</h3>
<ul>
<li>tf.add_n</li>
<li>tf.abs</li>
<li>tf.negative</li>
<li>tf.sign</li>
<li>tf.reciprocal</li>
<li>tf.square</li>
<li>tf.round</li>
<li>tf.sqrt</li>
<li>tf.rsqrt - Computes reciprocal of square root of x element-wise. (1/sqrt(x))</li>
<li>tf.pow</li>
<li>tf.exp</li>
<li>tf.expm1</li>
<li>tf.log</li>
<li>tf.log1p</li>
<li>tf.ceil</li>
<li>tf.floor</li>
<li>tf.maximum - Returns the max of x and y (i.e. x > y ? x : y) element-wise.</li>
<li>tf.minimum</li>
<li>tf.squared_difference - Returns (x - y)(x - y) element-wise.</li>
</ul>
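<p>The three operations that carry a description in the list above (rsqrt, maximum and squared_difference) all work element by element. A pure-Python sketch on hypothetical lists shows what each returns; it mirrors the descriptions and is not the TensorFlow implementation.</p>

```python
import math

x = [1.0, 4.0, 16.0]
y = [3.0, 2.0, 20.0]

# rsqrt: reciprocal of the square root, 1 / sqrt(x)
rsqrt = [1.0 / math.sqrt(v) for v in x]

# maximum: x > y ? x : y for each pair of elements
maximum = [a if a > b else b for a, b in zip(x, y)]

# squared_difference: (x - y)(x - y) for each pair of elements
squared_difference = [(a - b) * (a - b) for a, b in zip(x, y)]

print(rsqrt, maximum, squared_difference)
```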
<h3 id="matrix-operations">Matrix Operations</h3>
<ul>
<li>tf.diag - Returns a diagonal tensor with the given diagonal values.</li>
<li>tf.diag_part - Returns the diagonal part of the tensor.</li>
<li>tf.trace - Computes the trace of a tensor x.
trace(x) returns the sum along the main diagonal of each inner-most matrix in x. If x is of rank k with shape [I, J, K, …, L, M, N], then output is a tensor of rank k-2 with dimensions [I, J, K, …, L] where
output[i, j, k, …, l] = trace(x[i, j, k, …, l, :, :])</li>
<li>tf.transpose</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">transpose</span><span class="p">(</span>
<span class="n">a</span><span class="p">,</span>
<span class="n">perm</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">name</span><span class="o">=</span><span class="s">'transpose'</span><span class="p">,</span>
<span class="n">conjugate</span><span class="o">=</span><span class="bp">False</span>
<span class="p">)</span>
</code></pre></div></div>
<p>Transposes a. Permutes the dimensions according to perm.</p>
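<p>The trace and transpose operations above can be sketched on nested lists. This pure-Python illustration handles only the 2-D transpose (no perm argument) and a rank-3 batch for the trace; the helper names and data are hypothetical.</p>

```python
def transpose(matrix):
    """Swap rows and columns of a 2-D nested list."""
    return [list(column) for column in zip(*matrix)]


def trace(matrix):
    """Sum along the main diagonal of one matrix."""
    size = min(len(matrix), len(matrix[0]))
    return sum(matrix[i][i] for i in range(size))


transposed = transpose([[1, 2, 3], [4, 5, 6]])

# A batch of two 2x2 matrices: shape [2, 2, 2] gives an output of shape [2]
batch = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
traces = [trace(matrix) for matrix in batch]
print(transposed, traces)
```

<p>The batch case matches the rank rule for trace: the two inner-most dimensions disappear and one value per inner-most matrix remains.</p>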
<ul>
<li>tf.eye</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">eye</span><span class="p">(</span>
<span class="n">num_rows</span><span class="p">,</span>
<span class="n">num_columns</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">batch_shape</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">dtype</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span>
<span class="n">name</span><span class="o">=</span><span class="bp">None</span>
<span class="p">)</span>
</code></pre></div></div>
<p>Construct an identity matrix, or a batch of matrices.</p>
<ul>
<li>tf.matrix_transpose
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">matrix_transpose</span><span class="p">(</span>
<span class="n">a</span><span class="p">,</span>
<span class="n">name</span><span class="o">=</span><span class="s">'matrix_transpose'</span><span class="p">,</span>
<span class="n">conjugate</span><span class="o">=</span><span class="bp">False</span>
<span class="p">)</span>
</code></pre></div> </div>
<p>Transposes last two dimensions of tensor a.</p>
</li>
<li>tf.matmul
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span>
<span class="n">a</span><span class="p">,</span>
<span class="n">b</span><span class="p">,</span>
<span class="n">transpose_a</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
<span class="n">transpose_b</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
<span class="n">adjoint_a</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
<span class="n">adjoint_b</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
<span class="n">a_is_sparse</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
<span class="n">b_is_sparse</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
<span class="n">name</span><span class="o">=</span><span class="bp">None</span>
<span class="p">)</span>
</code></pre></div> </div>
<p>Multiplies matrix a by matrix b, producing a * b.</p>
</li>
<li>tf.norm
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">norm</span><span class="p">(</span>
<span class="n">tensor</span><span class="p">,</span>
<span class="nb">ord</span><span class="o">=</span><span class="s">'euclidean'</span><span class="p">,</span>
<span class="n">axis</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">keepdims</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">name</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">keep_dims</span><span class="o">=</span><span class="bp">None</span>
<span class="p">)</span>
</code></pre></div> </div>
<p>Computes the norm of vectors, matrices, and tensors. The keep_dims argument is deprecated in favor of keepdims.</p>
</li>
<li>tf.matrix_determinant
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">matrix_determinant</span><span class="p">(</span>
<span class="nb">input</span><span class="p">,</span>
<span class="n">name</span><span class="o">=</span><span class="bp">None</span>
<span class="p">)</span>
</code></pre></div> </div>
<p>Computes the determinant of one or more square matrices.</p>
</li>
<li>tf.matrix_inverse
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">matrix_inverse</span><span class="p">(</span>
<span class="nb">input</span><span class="p">,</span>
<span class="n">adjoint</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
<span class="n">name</span><span class="o">=</span><span class="bp">None</span>
<span class="p">)</span>
</code></pre></div> </div>
</li>
</ul>
<p>Computes the inverse of one or more square invertible matrices or their adjoints (conjugate transposes).</p>
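<p>How matmul, the determinant and the inverse relate can be checked on a small case. The pure-Python sketch below restricts itself to a hypothetical 2x2 matrix and uses the closed-form 2x2 inverse; it is only an illustration and not how TensorFlow computes these.</p>

```python
def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def determinant_2x2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inverse_2x2(m):
    """Closed-form inverse; only defined when the determinant is nonzero."""
    det = determinant_2x2(m)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


a = [[4.0, 7.0], [2.0, 6.0]]
det = determinant_2x2(a)
inverse = inverse_2x2(a)
product = matmul(a, inverse)  # should be close to the identity matrix
print(det, inverse, product)
```

<p>Multiplying a matrix by its inverse gives the identity up to floating-point rounding, which is a quick way to validate an inverse.</p>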
<h3 id="reduction">Reduction</h3>
<ul>
<li>tf.reduce_sum<br />
Computes the sum of elements across dimensions of a tensor. (The arguments keep_dims and reduction_indices are deprecated in favor of keepdims and axis.)
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">constant</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">]])</span>
<span class="n">tf</span><span class="o">.</span><span class="n">reduce_sum</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="c"># 6</span>
<span class="n">tf</span><span class="o">.</span><span class="n">reduce_sum</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="c"># [2, 2, 2]</span>
<span class="n">tf</span><span class="o">.</span><span class="n">reduce_sum</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="c"># [3, 3]</span>
<span class="n">tf</span><span class="o">.</span><span class="n">reduce_sum</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">keepdims</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="c"># [[3], [3]]</span>
<span class="n">tf</span><span class="o">.</span><span class="n">reduce_sum</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">])</span> <span class="c"># 6</span>
</code></pre></div> </div>
</li>
<li>tf.reduce_prod<br />
Computes the product of elements across dimensions of a tensor. (The arguments keep_dims and reduction_indices are deprecated in favor of keepdims and axis.)</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">reduce_prod</span><span class="p">(</span>
<span class="n">input_tensor</span><span class="p">,</span>
<span class="n">axis</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">keepdims</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">name</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">reduction_indices</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">keep_dims</span><span class="o">=</span><span class="bp">None</span>
<span class="p">)</span>
</code></pre></div></div>
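<p>To make the axis semantics of the product reduction concrete, here is a minimal sketch using NumPy's <code>np.prod</code> as a stand-in. This is an assumption for illustration: it is not the TensorFlow call itself, but it reduces along <code>axis</code> and honours <code>keepdims</code> in the same way.</p>

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

prod_all = np.prod(x)                                # product of every element
prod_cols = np.prod(x, axis=0)                       # collapse rows: one product per column
prod_rows = np.prod(x, axis=1)                       # collapse columns: one product per row
prod_rows_kept = np.prod(x, axis=1, keepdims=True)   # reduced axis kept with length 1

print(int(prod_all))            # 720
print(prod_cols.tolist())       # [4, 10, 18]
print(prod_rows.tolist())       # [6, 120]
print(prod_rows_kept.shape)     # (2, 1)
```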
<ul>
<li>tf.reduce_min - Computes the minimum of elements across dimensions of a tensor. (deprecated arguments)</li>
<li>tf.reduce_max - Computes the maximum of elements across dimensions of a tensor. (deprecated arguments)</li>
<li>tf.reduce_mean<br />
Computes the mean of elements across dimensions of a tensor. (deprecated arguments)</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">constant</span><span class="p">([[</span><span class="mf">1.</span><span class="p">,</span> <span class="mf">1.</span><span class="p">],</span> <span class="p">[</span><span class="mf">2.</span><span class="p">,</span> <span class="mf">2.</span><span class="p">]])</span>
<span class="n">tf</span><span class="o">.</span><span class="n">reduce_mean</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="c"># 1.5</span>
<span class="n">tf</span><span class="o">.</span><span class="n">reduce_mean</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="c"># [1.5, 1.5]</span>
<span class="n">tf</span><span class="o">.</span><span class="n">reduce_mean</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="c"># [1., 2.]</span>
</code></pre></div></div>
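<p>tf.reduce_min and tf.reduce_max take the same <code>axis</code> argument as tf.reduce_mean. The sketch below uses NumPy equivalents (an assumption for illustration, not the TensorFlow API) on a small made-up matrix to show the three reductions side by side.</p>

```python
import numpy as np

x = np.array([[1.0, 5.0], [3.0, 2.0]])

min_all = x.min()             # smallest element overall
min_cols = x.min(axis=0)      # minimum of each column
max_rows = x.max(axis=1)      # maximum of each row
mean_cols = x.mean(axis=0)    # mean of each column

print(float(min_all))         # 1.0
print(min_cols.tolist())      # [1.0, 2.0]
print(max_rows.tolist())      # [5.0, 3.0]
print(mean_cols.tolist())     # [2.0, 3.5]
```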
<ul>
<li>tf.reduce_all<br />
Computes the “logical and” of elements across dimensions of a tensor. (deprecated arguments)</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">constant</span><span class="p">([[</span><span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">],</span> <span class="p">[</span><span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">]])</span>
<span class="n">tf</span><span class="o">.</span><span class="n">reduce_all</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="c"># False</span>
<span class="n">tf</span><span class="o">.</span><span class="n">reduce_all</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="c"># [False, False]</span>
<span class="n">tf</span><span class="o">.</span><span class="n">reduce_all</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="c"># [True, False]</span>
</code></pre></div></div>
<ul>
<li>
<p>tf.reduce_any<br />
Computes the “logical or” of elements across dimensions of a tensor. (deprecated arguments)</p>
</li>
<li>
<p>tf.count_nonzero<br />
Computes number of nonzero elements across dimensions of a tensor. (deprecated arguments)</p>
</li>
</ul>
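<p>Neither of the two functions above comes with an example, so here is a rough NumPy sketch of the same reductions. <code>np.any</code> and <code>np.count_nonzero</code> are used as analogues and the input values are made up.</p>

```python
import numpy as np

flags = np.array([[True, True], [False, False]])
any_all = np.any(flags)            # logical OR over every element
any_cols = np.any(flags, axis=0)   # OR down each column
any_rows = np.any(flags, axis=1)   # OR across each row

values = np.array([[0, 1, 0], [1, 1, 0]])
nonzero_all = np.count_nonzero(values)            # number of non-zero entries
nonzero_cols = np.count_nonzero(values, axis=0)   # per column
nonzero_rows = np.count_nonzero(values, axis=1)   # per row

print(bool(any_all), any_cols.tolist(), any_rows.tolist())
# True [True, True] [True, False]
print(int(nonzero_all), nonzero_cols.tolist(), nonzero_rows.tolist())
# 3 [1, 2, 0] [1, 2]
```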
<ul>
<li>tf.accumulate_n<br />
Returns the element-wise sum of a list of tensors.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">a</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">constant</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">]])</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">constant</span><span class="p">([[</span><span class="mi">5</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">6</span><span class="p">]])</span>
<span class="n">tf</span><span class="o">.</span><span class="n">accumulate_n</span><span class="p">([</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">a</span><span class="p">])</span> <span class="c"># [[7, 4], [6, 14]]</span>
<span class="c"># Explicitly pass shape and type</span>
<span class="n">tf</span><span class="o">.</span><span class="n">accumulate_n</span><span class="p">([</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">a</span><span class="p">],</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="n">tensor_dtype</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">int32</span><span class="p">)</span> <span class="c"># [[7, 4], [6, 14]]</span>
</code></pre></div></div>
<h3 id="scan">Scan</h3>
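<ul>
<li>tf.cumsum<br />
Computes the cumulative sum of the tensor x along axis.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">cumsum</span><span class="p">([</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">])</span> <span class="c"># [a, a + b, a + b + c]</span>
<span class="n">tf</span><span class="o">.</span><span class="n">cumsum</span><span class="p">([</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">],</span> <span class="n">reverse</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="c"># [a + b + c, b + c, c]</span>
</code></pre></div></div>
<ul>
<li>tf.cumprod<br />
Computes the cumulative product of the tensor x along axis.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">cumprod</span><span class="p">([</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">])</span> <span class="c"># [a, a * b, a * b * c]</span>
<span class="n">tf</span><span class="o">.</span><span class="n">cumprod</span><span class="p">([</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">],</span> <span class="n">reverse</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="c"># [a * b * c, b * c, c]</span>
</code></pre></div></div>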
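<p>The symbolic results above can be checked with concrete numbers. This is a minimal NumPy sketch, not TensorFlow code: NumPy has no <code>reverse</code> flag, so the reversed scan is emulated here by flipping the input, scanning, and flipping back.</p>

```python
import numpy as np

v = np.array([1, 2, 3, 4])

forward_sum = np.cumsum(v)                  # running total from the left
reverse_sum = np.cumsum(v[::-1])[::-1]      # running total from the right
forward_prod = np.cumprod(v)                # running product from the left
reverse_prod = np.cumprod(v[::-1])[::-1]    # running product from the right

print(forward_sum.tolist())    # [1, 3, 6, 10]
print(reverse_sum.tolist())    # [10, 9, 7, 4]
print(forward_prod.tolist())   # [1, 2, 6, 24]
print(reverse_prod.tolist())   # [24, 24, 12, 4]
```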
<h3 id="sequence-comparison-and-indexing">Sequence Comparison and Indexing</h3>
<ul>
<li>tf.argmin<br />
Returns the index with the smallest value across axes of a tensor. (deprecated arguments)</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">argmin</span><span class="p">(</span>
<span class="nb">input</span><span class="p">,</span>
<span class="n">axis</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">name</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">dimension</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">output_type</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">int64</span>
<span class="p">)</span>
</code></pre></div></div>
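<p>The signature alone does not show what "index across axes" means, so here is a small sketch with NumPy's <code>np.argmin</code> and <code>np.argmax</code> as analogues (an assumption for illustration). The axis is always passed explicitly, because the default behaviour when it is omitted is not the same in the two libraries.</p>

```python
import numpy as np

x = np.array([[3, 1, 2],
              [0, 5, 4]])

argmin_cols = np.argmin(x, axis=0)   # row index of the smallest value in each column
argmin_rows = np.argmin(x, axis=1)   # column index of the smallest value in each row
argmax_rows = np.argmax(x, axis=1)   # column index of the largest value in each row

print(argmin_cols.tolist())   # [1, 0, 0]
print(argmin_rows.tolist())   # [1, 0]
print(argmax_rows.tolist())   # [0, 1]
```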
<ul>
<li>tf.argmax<br />
Returns the index with the largest value across axes of a tensor. (deprecated arguments)
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span>
<span class="nb">input</span><span class="p">,</span>
<span class="n">axis</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">name</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">dimension</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">output_type</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">int64</span>
<span class="p">)</span>
</code></pre></div> </div>
</li>
<li>tf.where<br />
Returns the elements, either from x or y, depending on the condition.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="o">.</span><span class="n">where</span><span class="p">(</span>
<span class="n">condition</span><span class="p">,</span>
<span class="n">x</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">y</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
<span class="n">name</span><span class="o">=</span><span class="bp">None</span>
<span class="p">)</span>
</code></pre></div></div>
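<p>A short sketch of the two ways the function is typically used, written with NumPy analogues rather than TensorFlow itself (an assumption for illustration). With <code>x</code> and <code>y</code> supplied, each output element is taken from <code>x</code> where the condition holds and from <code>y</code> otherwise. With only the condition, the result is the coordinates of the true elements, which <code>np.argwhere</code> mimics here.</p>

```python
import numpy as np

condition = np.array([True, False, True])
x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

selected = np.where(condition, x, y)   # pick from x where True, from y where False
true_coords = np.argwhere(condition)   # one row of coordinates per True element

print(selected.tolist())      # [1, 20, 3]
print(true_coords.tolist())   # [[0], [2]]
```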
<ul>
<li>tf.unique</li>
</ul>
<p>Finds unique elements in a 1-D tensor.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#tensor 'x' is [1, 1, 2, 4, 4, 4, 7, 8, 8]</span>
<span class="n">y</span><span class="p">,</span> <span class="n">idx</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="c"># y   ==> [1, 2, 4, 7, 8]</span>
<span class="c"># idx ==> [0, 0, 1, 2, 2, 2, 3, 4, 4]</span>
</code></pre></div></div>This is a summary and brief write-up of common (and not so common) mathematical functions in TensorFlow - just a few lines specifying the syntax and how they should be written in TensorFlow.Building a Classifier in Tensorflow2018-04-23T00:00:00+00:002018-04-23T00:00:00+00:00http://ashukumar27.io/tensorflow_classifier<p>A classification algorithm using TensorFlow: a neural network with four hidden layers applied to the Sonar Mines & Rocks dataset. The program is at <a href="https://github.com/ashukumar27/Tensorflow/blob/master/01_SonarMinesRocks_classification_TF_R1D1.py">this location on GitHub</a>.</p>
<p>Chunks of code are provided below, each with an explanation of what the block does, its inputs and its outputs.</p>
<p>Steps in building the model using Tensorflow:</p>
<ol>
<li>Import libraries, write a function to read data and for One Hot Encoding of y</li>
<li>Read data, split into train and test</li>
<li>Define the parameters of the Neural Network</li>
<li>Define the input and weights variables, placeholders</li>
<li>Define the Neural Network Architecture</li>
<li>Define the Weights and Bias variables for each of the hidden layers</li>
<li>Initialize global variables, define a save space for saving the model</li>
<li>Call the Neural Network architecture</li>
<li>Define the cost function and optimizer</li>
<li>Run the session, Calculate the cost and history of each epoch</li>
</ol>
<h3 id="step-1--import-libraries-write-a-function-to-read-data-and-for-one-hot-encoding-of-y">Step 1 - Import libraries, write a function to read data and for One Hot Encoding of y</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="kn">import</span> <span class="nn">tensorflow</span> <span class="k">as</span> <span class="n">tf</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">from</span> <span class="nn">sklearn.preprocessing</span> <span class="kn">import</span> <span class="n">LabelEncoder</span>
<span class="kn">from</span> <span class="nn">sklearn.utils</span> <span class="kn">import</span> <span class="n">shuffle</span>
<span class="kn">from</span> <span class="nn">sklearn.model_selection</span> <span class="kn">import</span> <span class="n">train_test_split</span>
<span class="k">def</span> <span class="nf">read_data</span><span class="p">():</span>
    <span class="c"># datapath: path to the Sonar CSV file (not defined in this snippet)</span>
    <span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">datapath</span><span class="p">)</span>
    <span class="c"># Define features (first 60 columns) and labels (column 60)</span>
    <span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="n">df</span><span class="o">.</span><span class="n">columns</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="mi">60</span><span class="p">]]</span><span class="o">.</span><span class="n">values</span>
    <span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="n">df</span><span class="o">.</span><span class="n">columns</span><span class="p">[</span><span class="mi">60</span><span class="p">]]</span>
    <span class="c"># Encode the dependent variable as integers, then one-hot encode it</span>
    <span class="n">encoder</span> <span class="o">=</span> <span class="n">LabelEncoder</span><span class="p">()</span>
    <span class="n">y_enc</span> <span class="o">=</span> <span class="n">encoder</span><span class="o">.</span><span class="n">fit_transform</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
    <span class="n">Y</span> <span class="o">=</span> <span class="n">one_hot_encode</span><span class="p">(</span><span class="n">y_enc</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
    <span class="k">return</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">Y</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">one_hot_encode</span><span class="p">(</span><span class="n">labels</span><span class="p">):</span>
    <span class="n">n_labels</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">labels</span><span class="p">)</span>
    <span class="n">n_unique_labels</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">labels</span><span class="p">))</span>
    <span class="n">one_hot_encode</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">n_labels</span><span class="p">,</span><span class="n">n_unique_labels</span><span class="p">))</span>
    <span class="n">one_hot_encode</span><span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">n_labels</span><span class="p">),</span><span class="n">labels</span><span class="p">]</span><span class="o">=</span><span class="mi">1</span>
    <span class="k">return</span> <span class="n">one_hot_encode</span>
</code></pre></div></div>
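<p>To see what the one-hot step produces, here is a small self-contained sketch on four made-up labels. The helper name <code>one_hot</code> and the sample labels are assumptions for illustration. The indexing trick is the same as above: row <code>i</code> gets a 1 in the column given by its integer label.</p>

```python
import numpy as np

def one_hot(labels):
    """Turn integer class labels into a (n_labels, n_classes) 0/1 matrix."""
    labels = np.asarray(labels)
    n_labels = len(labels)
    n_classes = len(np.unique(labels))
    encoded = np.zeros((n_labels, n_classes))
    # Fancy indexing: for each row i, set column labels[i] to 1
    encoded[np.arange(n_labels), labels] = 1
    return encoded

# Hypothetical integer labels, e.g. what LabelEncoder might return for ['M', 'R', 'R', 'M']
labels = np.array([0, 1, 1, 0])
encoded = one_hot(labels)

print(encoded.tolist())
# [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
```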
<h3 id="step-2---read-data-split-into-train-and-test">Step 2 - Read data, split into train and test</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="n">X</span><span class="p">,</span><span class="n">Y</span> <span class="o">=</span> <span class="n">read_data</span><span class="p">()</span>
<span class="n">X</span><span class="p">,</span><span class="n">Y</span> <span class="o">=</span> <span class="n">shuffle</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">Y</span><span class="p">,</span> <span class="n">random_state</span> <span class="o">=</span> <span class="mi">7</span><span class="p">)</span>
<span class="n">X_train</span><span class="p">,</span> <span class="n">X_test</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">y_test</span> <span class="o">=</span> <span class="n">train_test_split</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">Y</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">7</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="mf">0.2</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Train X: "</span><span class="p">,</span><span class="n">X_train</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Test X: "</span><span class="p">,</span><span class="n">X_test</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Train Y: "</span><span class="p">,</span><span class="n">y_train</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Test Y: "</span><span class="p">,</span><span class="n">y_test</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
</code></pre></div></div>
<h3 id="step-3---define-the-parameters-of-the-neural-network">Step 3 - Define the parameters of the Neural Network</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">learning_rate</span> <span class="o">=</span> <span class="mf">0.3</span>
<span class="n">training_epochs</span> <span class="o">=</span> <span class="mi">1000</span>
<span class="n">cost_history</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">empty</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">float</span><span class="p">)</span>
<span class="n">n_dim</span><span class="o">=</span><span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="k">print</span><span class="p">(</span><span class="s">"n_dim"</span><span class="p">,</span><span class="n">n_dim</span><span class="p">)</span>
<span class="n">n_class</span><span class="o">=</span><span class="mi">2</span>
<span class="n">model_path</span> <span class="o">=</span> <span class="s">"D:/DeepLearning/Tensorflow"</span>
<span class="n">n_hidden_1</span> <span class="o">=</span> <span class="mi">60</span>
<span class="n">n_hidden_2</span> <span class="o">=</span> <span class="mi">60</span>
<span class="n">n_hidden_3</span> <span class="o">=</span> <span class="mi">60</span>
<span class="n">n_hidden_4</span> <span class="o">=</span> <span class="mi">60</span>
</code></pre></div></div>
<h3 id="step-4---define-the-input-and-weights-variables-placeholders">Step 4 - Define the input and weights variables, placeholders</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">placeholder</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">float32</span><span class="p">,[</span><span class="bp">None</span><span class="p">,</span><span class="n">n_dim</span><span class="p">])</span>
<span class="n">W</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">Variable</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">zeros</span><span class="p">([</span><span class="n">n_dim</span><span class="p">,</span><span class="n">n_class</span><span class="p">]))</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">Variable</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">zeros</span><span class="p">([</span><span class="n">n_class</span><span class="p">]))</span>
<span class="n">y_</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">placeholder</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">float32</span><span class="p">,[</span><span class="bp">None</span><span class="p">,</span><span class="n">n_class</span><span class="p">])</span>
</code></pre></div></div>
<h3 id="step-5---define-the-neural-netword-architecture">Step 5 - Define the Neural Network Architecture</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">multilayer_perceptron</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">weights</span><span class="p">,</span><span class="n">biases</span><span class="p">):</span>
    <span class="c"># Hidden layer 1 with sigmoid activation</span>
    <span class="n">layer_1</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">weights</span><span class="p">[</span><span class="s">'h1'</span><span class="p">]),</span><span class="n">biases</span><span class="p">[</span><span class="s">'b1'</span><span class="p">])</span>
    <span class="n">layer_1</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">sigmoid</span><span class="p">(</span><span class="n">layer_1</span><span class="p">)</span>
    <span class="c"># Hidden layer 2 with sigmoid activation</span>
    <span class="n">layer_2</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">layer_1</span><span class="p">,</span> <span class="n">weights</span><span class="p">[</span><span class="s">'h2'</span><span class="p">]),</span><span class="n">biases</span><span class="p">[</span><span class="s">'b2'</span><span class="p">])</span>
    <span class="n">layer_2</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">sigmoid</span><span class="p">(</span><span class="n">layer_2</span><span class="p">)</span>
    <span class="c"># Hidden layer 3 with sigmoid activation</span>
    <span class="n">layer_3</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">layer_2</span><span class="p">,</span> <span class="n">weights</span><span class="p">[</span><span class="s">'h3'</span><span class="p">]),</span><span class="n">biases</span><span class="p">[</span><span class="s">'b3'</span><span class="p">])</span>
    <span class="n">layer_3</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">sigmoid</span><span class="p">(</span><span class="n">layer_3</span><span class="p">)</span>
    <span class="c"># Hidden layer 4 with ReLU activation (uses its own weights, 'h4')</span>
    <span class="n">layer_4</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">layer_3</span><span class="p">,</span> <span class="n">weights</span><span class="p">[</span><span class="s">'h4'</span><span class="p">]),</span><span class="n">biases</span><span class="p">[</span><span class="s">'b4'</span><span class="p">])</span>
    <span class="n">layer_4</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">relu</span><span class="p">(</span><span class="n">layer_4</span><span class="p">)</span>
    <span class="c"># Output layer: raw logits, no activation</span>
    <span class="n">out_layer</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">layer_4</span><span class="p">,</span> <span class="n">weights</span><span class="p">[</span><span class="s">'out'</span><span class="p">])</span><span class="o">+</span><span class="n">biases</span><span class="p">[</span><span class="s">'out'</span><span class="p">]</span>
    <span class="k">return</span> <span class="n">out_layer</span>
</code></pre></div></div>
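<p>The shapes flowing through this architecture can be checked without building a TensorFlow graph. The sketch below is a NumPy re-implementation of the forward pass under stated assumptions: the weights are plain normal draws rather than truncated normal, the input batch is random, and the names <code>forward</code>, <code>sizes</code> and <code>batch</code> are hypothetical.</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)

# n_dim, four hidden layers of 60 units, n_class
sizes = [60, 60, 60, 60, 60, 2]
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n) for n in sizes[1:]]

def forward(x):
    """Three sigmoid layers, one ReLU layer, then a linear output layer."""
    activations = [sigmoid, sigmoid, sigmoid, relu]
    h = x
    for W, b, act in zip(weights[:-1], biases[:-1], activations):
        h = act(h @ W + b)
    return h @ weights[-1] + biases[-1]   # raw logits, one column per class

batch = rng.normal(size=(5, 60))          # 5 made-up samples with 60 features each
logits = forward(batch)
print(logits.shape)                       # (5, 2)
```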
<h3 id="step-6---define-the-weights-and-bias-variables-for-each-of-the-hidden-layers">Step 6 - Define the Weights and Bias variables for each of the hidden layers</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">weights</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">'h1'</span><span class="p">:</span><span class="n">tf</span><span class="o">.</span><span class="n">Variable</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">truncated_normal</span><span class="p">([</span><span class="n">n_dim</span><span class="p">,</span><span class="n">n_hidden_1</span><span class="p">])),</span>
<span class="s">'h2'</span><span class="p">:</span><span class="n">tf</span><span class="o">.</span><span class="n">Variable</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">truncated_normal</span><span class="p">([</span><span class="n">n_hidden_1</span><span class="p">,</span><span class="n">n_hidden_2</span><span class="p">])),</span>
<span class="s">'h3'</span><span class="p">:</span><span class="n">tf</span><span class="o">.</span><span class="n">Variable</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">truncated_normal</span><span class="p">([</span><span class="n">n_hidden_2</span><span class="p">,</span><span class="n">n_hidden_3</span><span class="p">])),</span>
<span class="s">'h4'</span><span class="p">:</span><span class="n">tf</span><span class="o">.</span><span class="n">Variable</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">truncated_normal</span><span class="p">([</span><span class="n">n_hidden_3</span><span class="p">,</span><span class="n">n_hidden_4</span><span class="p">])),</span>
<span class="s">'out'</span><span class="p">:</span><span class="n">tf</span><span class="o">.</span><span class="n">Variable</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">truncated_normal</span><span class="p">([</span><span class="n">n_hidden_4</span><span class="p">,</span><span class="n">n_class</span><span class="p">])),</span>
<span class="p">}</span>
<span class="n">biases</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">'b1'</span><span class="p">:</span> <span class="n">tf</span><span class="o">.</span><span class="n">Variable</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">truncated_normal</span><span class="p">([</span><span class="n">n_hidden_1</span><span class="p">])),</span>
<span class="s">'b2'</span><span class="p">:</span> <span class="n">tf</span><span class="o">.</span><span class="n">Variable</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">truncated_normal</span><span class="p">([</span><span class="n">n_hidden_2</span><span class="p">])),</span>
<span class="s">'b3'</span><span class="p">:</span> <span class="n">tf</span><span class="o">.</span><span class="n">Variable</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">truncated_normal</span><span class="p">([</span><span class="n">n_hidden_3</span><span class="p">])),</span>
<span class="s">'b4'</span><span class="p">:</span> <span class="n">tf</span><span class="o">.</span><span class="n">Variable</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">truncated_normal</span><span class="p">([</span><span class="n">n_hidden_4</span><span class="p">])),</span>
<span class="s">'out'</span><span class="p">:</span> <span class="n">tf</span><span class="o">.</span><span class="n">Variable</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">truncated_normal</span><span class="p">([</span><span class="n">n_class</span><span class="p">]))</span>
<span class="p">}</span>
</code></pre></div></div>
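<p>As a quick sanity check on the dictionaries above, the number of trainable parameters follows directly from the layer sizes. This sketch assumes <code>n_dim</code> is 60 (the number of feature columns read in Step 1) and uses the hidden sizes and <code>n_class</code> from Step 3. The variable names are hypothetical.</p>

```python
# Layer widths: input, four hidden layers, output
layer_sizes = [60, 60, 60, 60, 60, 2]

# One weight matrix per pair of consecutive layers, one bias vector per non-input layer
n_weights = sum(m * n for m, n in zip(layer_sizes[:-1], layer_sizes[1:]))
n_biases = sum(layer_sizes[1:])
n_params = n_weights + n_biases

print(n_weights, n_biases, n_params)   # 14520 242 14762
```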
<h3 id="step-7---initialize-global-variables-define-a-save-space-for-saving-the-model">Step 7 - Initialize global variables, define a save space for saving the model</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">init</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">global_variables_initializer</span><span class="p">()</span>
<span class="n">saver</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">train</span><span class="o">.</span><span class="n">Saver</span><span class="p">()</span>
</code></pre></div></div>
<h3 id="step-8---call-the-neural-network-architecture">Step 8 - Call the Neural Network architecture</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">y</span> <span class="o">=</span> <span class="n">multilayer_perceptron</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">weights</span><span class="p">,</span> <span class="n">biases</span><span class="p">)</span>
</code></pre></div></div>
<h3 id="step-9---define-the-cost-function-and-optimizer">Step 9 - Define the cost function and optimizer</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">cost_function</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reduce_mean</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">softmax_cross_entropy_with_logits</span><span class="p">(</span><span class="n">logits</span><span class="o">=</span><span class="n">y</span><span class="p">,</span> <span class="n">labels</span><span class="o">=</span><span class="n">y_</span><span class="p">))</span>
<span class="n">training_step</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">train</span><span class="o">.</span><span class="n">GradientDescentOptimizer</span><span class="p">(</span><span class="n">learning_rate</span><span class="p">)</span><span class="o">.</span><span class="n">minimize</span><span class="p">(</span><span class="n">cost_function</span><span class="p">)</span>
</code></pre></div></div>
<h3 id="step-10---run-the-session-calculate-the-cost-and-history-of-each-epoch">Step 10 - Run the session, Calculate the cost and history of each epoch</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="n">mse_history</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">accuracy_history</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">Session</span><span class="p">()</span> <span class="k">as</span> <span class="n">sess</span><span class="p">:</span>
    <span class="n">sess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">init</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">epoch</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">training_epochs</span><span class="p">):</span>
        <span class="n">sess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">training_step</span><span class="p">,</span> <span class="n">feed_dict</span> <span class="o">=</span> <span class="p">{</span><span class="n">x</span><span class="p">:</span> <span class="n">X_train</span><span class="p">,</span> <span class="n">y_</span><span class="p">:</span><span class="n">y_train</span><span class="p">})</span>
        <span class="n">cost</span> <span class="o">=</span> <span class="n">sess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">cost_function</span><span class="p">,</span> <span class="n">feed_dict</span><span class="o">=</span><span class="p">{</span><span class="n">x</span><span class="p">:</span> <span class="n">X_train</span><span class="p">,</span> <span class="n">y_</span><span class="p">:</span><span class="n">y_train</span><span class="p">})</span>
        <span class="n">cost_history</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">cost_history</span><span class="p">,</span><span class="n">cost</span><span class="p">)</span>
        <span class="n">correct_prediction</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">equal</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">y</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span> <span class="n">tf</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">y_</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span>
        <span class="n">accuracy</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reduce_mean</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">cast</span><span class="p">(</span><span class="n">correct_prediction</span><span class="p">,</span> <span class="n">tf</span><span class="o">.</span><span class="n">float32</span><span class="p">))</span>
        <span class="n">pred_y</span> <span class="o">=</span> <span class="n">sess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">feed_dict</span><span class="o">=</span><span class="p">{</span><span class="n">x</span><span class="p">:</span> <span class="n">X_test</span><span class="p">})</span>
        <span class="n">mse</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reduce_mean</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">square</span><span class="p">(</span><span class="n">pred_y</span> <span class="o">-</span> <span class="n">y_test</span><span class="p">))</span>
        <span class="n">mse_</span> <span class="o">=</span> <span class="n">sess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">mse</span><span class="p">)</span>
        <span class="n">mse_history</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">mse_</span><span class="p">)</span>
        <span class="n">accuracy</span> <span class="o">=</span> <span class="p">(</span><span class="n">sess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">accuracy</span><span class="p">,</span> <span class="n">feed_dict</span> <span class="o">=</span> <span class="p">{</span><span class="n">x</span><span class="p">:</span> <span class="n">X_train</span><span class="p">,</span> <span class="n">y_</span><span class="p">:</span><span class="n">y_train</span><span class="p">}</span> <span class="p">))</span>
        <span class="n">accuracy_history</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">accuracy</span><span class="p">)</span>
        <span class="k">print</span><span class="p">(</span><span class="s">'epoch: '</span><span class="p">,</span><span class="n">epoch</span><span class="p">,</span><span class="s">'-'</span><span class="p">,</span><span class="s">'cost: '</span><span class="p">,</span><span class="n">cost</span><span class="p">,</span><span class="s">" - MSE: "</span><span class="p">,</span> <span class="n">mse_</span><span class="p">,</span><span class="s">" - Train Accuracy: "</span><span class="p">,</span><span class="n">accuracy</span><span class="p">)</span>
    <span class="n">save_path</span> <span class="o">=</span> <span class="n">saver</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sess</span><span class="p">,</span><span class="n">model_path</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Model saved in file </span><span class="si">%</span><span class="s">s"</span> <span class="o">%</span><span class="n">save_path</span><span class="p">)</span>
    <span class="c">#Plot MSE and Accuracy Graph</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">mse_history</span><span class="p">,</span><span class="s">'r'</span><span class="p">)</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">accuracy_history</span><span class="p">)</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
    <span class="c">#Print the final accuracy</span>
    <span class="n">correct_prediction</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">equal</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">y</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span> <span class="n">tf</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">y_</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span>
    <span class="n">accuracy</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reduce_mean</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">cast</span><span class="p">(</span><span class="n">correct_prediction</span><span class="p">,</span> <span class="n">tf</span><span class="o">.</span><span class="n">float32</span><span class="p">))</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Test Accuracy:"</span><span class="p">,</span> <span class="p">(</span><span class="n">sess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">accuracy</span><span class="p">,</span> <span class="n">feed_dict</span><span class="o">=</span><span class="p">{</span><span class="n">x</span><span class="p">:</span> <span class="n">X_test</span><span class="p">,</span> <span class="n">y_</span><span class="p">:</span><span class="n">y_test</span><span class="p">})))</span>
    <span class="c">## Print the final mean square error</span>
    <span class="n">pred_y</span> <span class="o">=</span> <span class="n">sess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">feed_dict</span> <span class="o">=</span> <span class="p">{</span><span class="n">x</span><span class="p">:</span> <span class="n">X_test</span><span class="p">})</span>
    <span class="n">mse</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reduce_mean</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">square</span><span class="p">(</span><span class="n">pred_y</span> <span class="o">-</span> <span class="n">y_test</span><span class="p">))</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"MSE: %.4f"</span> <span class="o">%</span><span class="n">sess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">mse</span><span class="p">))</span>
</code></pre></div></div>Classification algorithm using TensorFlow - Application of a 4 layered Neural Network Architecture to solve the Sonar Mines & Rocks dataset classification. The program is at this location on GithubUnderstanding Logistic Regression Output from SAS2018-04-06T00:00:00+00:002018-04-06T00:00:00+00:00http://ashukumar27.io/logistic-output-sas<p>This post details the terms that appear in the SAS output for logistic regression. The definitions are generic and referenced from other great posts on this topic. The aim is to provide a summary of definitions and a statistical explanation of the output obtained from the logistic regression code in SAS.</p>
<p>This post covers binary classification, not multi-class classification.</p>
<h3 id="logistic-regression-code">Logistic regression code:</h3>
<pre><code class="language-sas">ods graphics on;
proc logistic data = lib_cbl.data_pd_&circle. namelen=100 plots=ALL
descending outest=estimates_pd_&circle. outmodel=model_pd_&circle.;
class &classvars.;
model dv = &variables.
/ selection=stepwise ctable scale=none clparm=wald clodds=pl rsquare lackfit ;
output out= pred_pd_&circle. p=PROB ;
run;
</code></pre>
<h3 id="decile-preparation-code">Decile Preparation Code:</h3>
<pre><code class="language-sas">/* Calculate Deciles for Lift Chart and KS */
data decile_pd;
set lib_cbl.pred_pd_&circle.;
predicted_trn=PROB+(ranuni(12345)/10000000);
run;
proc rank data=decile_pd out=rank_pd groups=10;
var predicted_trn;
ranks predicted_rank_trn;
run;
proc sql;
create table lib_cbl.final_decile_pd_&circle. as select predicted_rank_trn,
count(*)as count,
min(predicted_trn)*1000 as min,
max(predicted_trn)*1000 as max,
mean(dv) as actual,mean(predicted_trn) as pred,
sum(dv) as bad
from rank_pd group by predicted_rank_trn
order by actual desc;
quit;
</code></pre>
<h3 id="output-in-html--pdf-format">Output in HTML & PDF Format</h3>
<p>HTML Format: <a href="https://github.com/ashukumar27/MachineLearning/blob/master/LogisticRegressionSAS/SASLogisticRegression.html">Click Here</a><br />
PDF Format: <a href="https://github.com/ashukumar27/MachineLearning/blob/master/LogisticRegressionSAS/SASLogisticRegression.pdf">Click Here</a></p>
<h2 id="interpretation-of-terms-in-the-output">Interpretation of Terms in the Output</h2>
<p><img src="http://ashukumar27.io/assets/sas/logistic1.png" alt="Test Image" /></p>
<h3 id="1-model-information">1. Model Information</h3>
<p><strong>Data Set</strong> : Data set used by the model</p>
<p><strong>Response Variable</strong> : Name of the dependent variable (the one with 0/1 target values)</p>
<p><strong>Number of Response Levels</strong> : 2 if binary classification, >2 if multinomial logistic regression</p>
<p><strong>Model</strong> : binary logit (for two class binary classification)</p>
<p><strong>Optimization Technique (Fisher’s scoring)</strong>:</p>
<p>The scoring algorithm, also known as Fisher’s scoring and named after Ronald Fisher, is a form of Newton’s method used in statistics to solve maximum likelihood equations numerically. It is similar in spirit to the Gradient Descent algorithm, except that Fisher’s scoring climbs to the peak of the likelihood whereas Gradient Descent descends to the bottom of a cost function. The two views are equivalent: maximizing the log-likelihood of a logistic regression is the same as minimizing its cost function, the negative log-likelihood.</p>
<p>Fisher scoring is a hill-climbing algorithm for getting results - it maximizes the likelihood by getting successively closer and closer to the maximum, taking another step (an iteration) each time. It knows it has reached the top of the hill when taking another step does not increase the likelihood. It is known to be an efficient procedure - not many steps are usually needed - and it generally converges to an answer. When this is the case you do not need to be concerned about it - accept what you have got. Changing metaphors, you can drive the car without knowing how the internal combustion engine works.</p>
<p>Occasionally, however, the likelihood surface you are trying to climb is not a single sharp peak: there are multiple peaks and not much depth in the landscape. There is a series of bumps, each giving potentially different answers in terms of the estimates but with pretty similar likelihoods. When this is the case good software will warn you, and you will see a large number of iterations and a failure to converge. This is usually due to lack of information (small sample size) or failure to meet the assumptions of the model (e.g. multicollinearity between predictors). When this happens the solution is not usually a different algorithm but improved data and/or an improved model. Some software allows you to profile the likelihood to see a map of the surface on which you are trying to find the peak.</p>
<p>In lots of software for the logistic model the Fisher scoring method (which is equivalent to iteratively reweighted least squares) is the default; an alternative is the Newton-Raphson algorithm. Both should give the same parameter estimates, but in general the standard errors may differ slightly, because Fisher scoring is based on the expected information matrix and Newton-Raphson on the observed information matrix. In a binary logit model, however, the observed and expected information matrices are identical, resulting in identical estimated standard errors.</p>
<p>If you have rare events (e.g. many people but few of them have died - lots of zeroes, not many ones) then you may wish to use other procedures to get less biased results, such as the Firth method, which is a penalized likelihood approach to reducing small-sample bias in maximum likelihood estimation; see
<a href="https://www3.nd.edu/~rwilliam/stats3/RareEvents.pdf">https://www3.nd.edu/~rwilliam/stats3/RareEvents.pdf</a></p>
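<p>To make the hill-climbing idea concrete, here is a minimal Python sketch of Fisher scoring for a logistic model with an intercept and a single predictor. It is not what PROC LOGISTIC runs internally: the function name <code>fisher_scoring</code>, the toy data and the convergence tolerance are all assumptions made for illustration. Each iteration computes the score (the slope of the log-likelihood) and the expected information matrix, then takes the step information<sup>-1</sup> x score.</p>

```python
import math


def fisher_scoring(x, y, max_iter=25, tol=1e-8):
    """Fit logit P(y=1) = b0 + b1*x by Fisher scoring (illustrative sketch)."""
    b0, b1 = 0.0, 0.0
    for iteration in range(1, max_iter + 1):
        # Predicted probabilities at the current estimates.
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector: gradient of the log-likelihood.
        u0 = sum(yi - pi for yi, pi in zip(y, p))
        u1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Expected information matrix (2x2) with weights p * (1 - p).
        w = [pi * (1.0 - pi) for pi in p]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Step = inverse(information) * score, written out for the 2x2 case.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        # Stop when another step no longer moves the estimates.
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1, iteration


# Hypothetical toy data: the outcome becomes more likely as x grows.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1, n_iter = fisher_scoring(x, y)
print("intercept:", b0, "slope:", b1, "iterations:", n_iter)
```

<p>On well-behaved data like this the loop stops after a handful of iterations. If the two classes were perfectly separated by x, the estimates would keep growing and the loop would run out of iterations, which is the non-convergence situation described above.</p>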
<p><strong>Number of Observations Read & Number of Observations Used</strong> : These are the same when no values are missing; rows with missing values are dropped from the analysis.</p>
<p><strong>Response Profile</strong> : Frequency split of the Dependent Variable (target Variable) - how many rows (observations) have value 1 and how many have a value 0</p>
<h2 id="probability-modeled-is-dv1-this-statement-confirms-the-target-variable-name-here-the-target-variable-is-named-dv-and-the-logistic-model-is-built-to-predict-the-probability-of-this-variable"><strong>Probability modeled is dv=1</strong>: This statement confirms the name of the target variable (here ‘dv’) and that the logistic model predicts the probability that it takes the value 1.</h2>
<p><strong>Stepwise Selection Procedure</strong></p>
<p>Stepwise regression is a method of fitting regression models in which the choice of predictive variables is carried out by an automatic procedure. In each step, a variable is considered for addition to or subtraction from the set of explanatory variables based on some prespecified criterion. Usually, this takes the form of a sequence of F-tests or t-tests, but other techniques are possible, such as adjusted <script type="math/tex">R^2</script>, Akaike information criterion (AIC), Bayesian information criterion, Mallows’s Cp, PRESS, or false discovery rate.</p>
<p>SAS Code and Explanation:</p>
<p>The following invocation of PROC LOGISTIC illustrates the use of stepwise selection to identify the prognostic factors for cancer remission. A significance level of 0.3 is required to allow a variable into the model (SLENTRY= 0.3), and a significance level of 0.35 is required for a variable to stay in the model (SLSTAY= 0.35). A detailed account of the variable selection process is requested by specifying the DETAILS option. The Hosmer and Lemeshow goodness-of-fit test for the final selected model is requested by specifying the LACKFIT option. The OUTEST= and COVOUT options in the PROC LOGISTIC statement create a data set that contains parameter estimates and their covariances for the final selected model. The response variable option EVENT= chooses remiss=1 (remission) as the event so that the probability of remission is modeled. The OUTPUT statement creates a data set that contains the cumulative predicted probabilities and the corresponding confidence limits, and the individual and cross validated predicted probabilities for each observation. The ODS OUTPUT statement writes the “Association” table from each selection step to a SAS data set.</p>
<pre><code class="language-SAS">title 'Stepwise Regression on Cancer Remission Data';
proc logistic data=Remission outest=betas covout;
model remiss(event='1')=cell smear infil li blast temp
/ selection=stepwise
slentry=0.3
slstay=0.35
details
lackfit;
output out=pred p=phat lower=lcl upper=ucl
predprob=(individual crossvalidate);
ods output Association=Association;
run;
proc print data=betas;
title2 'Parameter Estimates and Covariance Matrix';
run;
proc print data=pred;
title2 'Predicted Probabilities and 95% Confidence Limits';
run;
</code></pre>
<p>In stepwise selection, an attempt is made to remove any insignificant variables from the model before adding a significant variable to the model. Each addition or deletion of a variable to or from a model is listed as a separate step in the displayed output, and at each step a new model is fitted.</p>
<p>Finally, when none of the remaining variables outside the model meet the entry criterion, the stepwise selection is terminated.</p>
<p><a href="https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_logistic_sect052.htm">See stepwise details here</a></p>
<p>Backward Elimination:</p>
<p>A different variable selection method is used to select prognostic factors for cancer remission, and an efficient algorithm is employed to eliminate insignificant variables from a model. The following statements invoke PROC LOGISTIC to perform the backward elimination analysis:</p>
<pre><code class="language-SAS"> title 'Backward Elimination on Cancer Remission Data';
proc logistic data=Remission;
model remiss(event='1')=temp cell li smear blast
/ selection=backward fast slstay=0.2 ctable;
run;
</code></pre>
<p>The backward elimination analysis (SELECTION=BACKWARD) starts with a model that contains all explanatory variables given in the MODEL statement. By specifying the FAST option, PROC LOGISTIC eliminates insignificant variables without refitting the model repeatedly. This analysis uses a significance level of 0.2 to retain variables in the model (SLSTAY=0.2), which is different from the previous stepwise analysis where SLSTAY=.35. The CTABLE option is specified to produce classifications of input observations based on the final selected model.</p>
<p>Results of the fast elimination analysis are shown in Output 51.1.9 and Output 51.1.10. Initially, a full model containing all six risk factors is fit to the data (Output 51.1.9). In the next step (Output 51.1.10), PROC LOGISTIC removes blast, smear, cell, and temp from the model all at once. This leaves li and the intercept as the only variables in the final model. Note that in this analysis, only parameter estimates for the final model are displayed because the DETAILS option has not been specified.</p>
<p>See the outputs on the link provided above</p>
<hr />
<p><strong>The Akaike information criterion (AIC)</strong> is an estimator of the relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model, relative to each of the other models. Thus, AIC provides a means for model selection.</p>
<p>AIC is founded on information theory: it offers an estimate of the relative information lost when a given model is used to represent the process that generated the data. (In doing so, it deals with the trade-off between the goodness of fit of the model and the simplicity of the model.)</p>
<p>AIC does not provide a test of a model in the sense of testing a null hypothesis. It tells nothing about the absolute quality of a model, only the quality relative to other models. Thus, if all the candidate models fit poorly, AIC will not give any warning of that.</p>
<p>Suppose that we have a statistical model of some data. Let k be the number of estimated parameters in the model. Let <script type="math/tex">\hat{L}</script> be the maximum value of the likelihood function for the model. Then the AIC value of the model is the following.</p>
<p><script type="math/tex">\mathrm{AIC} = 2k - 2\ln(\hat{L})</script></p>
<p>Given a set of candidate models for the data, the preferred model is the one with the minimum AIC value. Thus, AIC rewards goodness of fit (as assessed by the likelihood function), but it also includes a penalty that is an increasing function of the number of estimated parameters. The penalty discourages overfitting, because increasing the number of parameters in the model almost always improves the goodness of the fit.</p>
<p><strong>BIC (Bayesian Information Criterion)</strong>: Your logistic regression output reports -2 Log Likelihood, so it is very easy to calculate both AIC and BIC from it.</p>
<p>BIC = LN(number of observations) x number of estimated parameters in your model (including the intercept) - 2 x Log Likelihood<br />
AIC = 2 x number of estimated parameters in your model (including the intercept) - 2 x Log Likelihood</p>
<p>AIC is a bit more liberal and often favours a more complex, wrong model over a simpler, true model. BIC, on the contrary, tries to find the true model among the set of candidates. It also penalizes every additional parameter/variable in the model, and its penalty per parameter, LN(number of observations), is larger than AIC’s penalty of 2 as soon as there are 8 or more observations. Most of the time they will agree on the preferred model, but when they don’t, you have to exercise your judgement.</p>
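<p>As a quick illustration of the two formulas, the sketch below computes AIC and BIC from a log-likelihood. The function name <code>aic_bic</code> and the two candidate models (their log-likelihoods, parameter counts and the sample size of 1000) are hypothetical numbers chosen so that the two criteria disagree.</p>

```python
import math


def aic_bic(log_likelihood, n_params, n_obs):
    """Return (AIC, BIC) given the maximized log-likelihood of a model."""
    neg2ll = -2.0 * log_likelihood          # the "-2 Log L" value SAS reports
    aic = 2.0 * n_params + neg2ll
    bic = math.log(n_obs) * n_params + neg2ll
    return aic, bic


# Hypothetical candidates fitted to the same 1000 observations.
simple_aic, simple_bic = aic_bic(log_likelihood=-520.0, n_params=4, n_obs=1000)
complex_aic, complex_bic = aic_bic(log_likelihood=-516.0, n_params=7, n_obs=1000)

print("AIC prefers:", "complex" if complex_aic < simple_aic else "simple")
print("BIC prefers:", "complex" if complex_bic < simple_bic else "simple")
```

<p>With these made-up numbers the three extra parameters improve the fit enough to win on AIC but not enough to overcome the heavier BIC penalty, which is the kind of disagreement described above.</p>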
<hr />
<h3 id="the-hosmerlemeshow-test"><strong>The Hosmer–Lemeshow test</strong></h3>
<p>The Hosmer–Lemeshow test is a statistical test for goodness of fit for logistic regression models. It is used frequently in risk prediction models. The test assesses whether or not the observed event rates match expected event rates in subgroups of the model population. The Hosmer–Lemeshow test specifically identifies subgroups as the deciles of fitted risk values. Models for which expected and observed event rates in subgroups are similar are called well calibrated.</p>
<p>The Hosmer-Lemeshow goodness of fit test is based on dividing the sample up according to their predicted probabilities, or risks. Specifically, based on the estimated parameter values <script type="math/tex">\hat \beta_0, \hat \beta_1, \ldots, \hat \beta_p</script>, for each observation in the sample the probability that Y=1 is calculated, based on each observation’s covariate values:</p>
<p><script type="math/tex">\hat \pi = \frac{\exp(\hat \beta_0 + \hat \beta_1 X_1 + \ldots + \hat \beta_p X_p)}{1+\exp(\hat \beta_0 + \hat \beta_1 X_1 + \ldots + \hat \beta_p X_p)}</script></p>
<p>The observations in the sample are then split into g groups (we come back to the choice of g later) according to their predicted probabilities. Suppose (as is commonly done) that g=10. Then the first group consists of the observations with the lowest 10% of predicted probabilities. The second group consists of the 10% of the sample whose predicted probabilities are next smallest, and so on.</p>
<p>Suppose for the moment, artificially, that all of the observations in the first group had a predicted probability of 0.1. Then, if our model is correctly specified, we would expect the proportion of these observations that have Y=1 to be 10%. Of course, even if the model is correctly specified, the observed proportion will deviate to some extent from 10%, but not by too much. If the proportion of observations with Y=1 in the group were instead 90%, this is suggestive that our model is not accurately predicting probability (risk), i.e. an indication that our model is not fitting the data well.</p>
<p>In practice, as soon as some of our model covariates are continuous, each observation will have a different predicted probability, and so the predicted probabilities will vary in each of the groups we have formed. To calculate how many Y=1 observations we would expect, the Hosmer-Lemeshow test takes the average of the predicted probabilities in the group, and multiplies this by the number of observations in the group. The test also performs the same calculation for Y=0, and then calculates a Pearson goodness of fit statistic</p>
<script type="math/tex; mode=display">\sum_{k=0}^{1} \sum_{l=1}^{g} \frac{(o_{kl}-e_{kl})^2}{e_{kl}}</script>
<p>where <script type="math/tex">o_{0l}</script> denotes the number of observed Y=0 observations in the lth group, <script type="math/tex">o_{1l}</script> denotes the number of observed Y=1 observations in the lth group, and <script type="math/tex">e_{0l}</script> and <script type="math/tex">e_{1l}</script> similarly denote the expected numbers of zeros and ones.</p>
<p>In a 1980 paper, Hosmer and Lemeshow showed by simulation that (provided g > p+1) their test statistic approximately follows a chi-squared distribution on g−2 degrees of freedom when the model is correctly specified. This means that, given our fitted model, the p-value can be calculated as the right hand tail probability of the corresponding chi-squared distribution using the calculated test statistic. If the p-value is small, this is indicative of poor fit.</p>
<p>It should be emphasized that a large p-value does not mean the model fits well, since lack of evidence against a null hypothesis is not equivalent to evidence in favour of the null hypothesis. In particular, if our sample size is small, a high p-value from the test may simply be a consequence of the test having low power to detect mis-specification, rather than being indicative of good fit.</p>
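<p>The calculation described above can be sketched in a few lines of Python. This is an illustration of the textbook statistic, not a reproduction of the LACKFIT output: SAS may form its groups slightly differently, and the function names, the equal-size grouping rule and the toy data are assumptions. The p-value helper uses a closed form for the chi-squared tail that is valid only for an even number of degrees of freedom, which covers the usual g=10 case (8 degrees of freedom).</p>

```python
import math


def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, degrees of freedom) for 0/1 outcomes y and predictions p."""
    order = sorted(range(len(p)), key=lambda i: p[i])   # lowest risk first
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        size = len(idx)
        obs1 = sum(y[i] for i in idx)
        exp1 = sum(p[i] for i in idx)    # mean predicted probability * group size
        obs0, exp0 = size - obs1, size - exp1
        stat += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    return stat, groups - 2


def chi2_sf_even_df(x, df):
    """Right-tail chi-squared probability; this closed form needs an even df."""
    if df % 2 != 0:
        raise ValueError("this sketch only handles an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical data: 10 risk bands of 20 observations, perfectly calibrated.
y, p = [], []
for k in range(10):
    prob = (2 * k + 1) / 20.0            # 0.05, 0.15, ..., 0.95
    ones = 2 * k + 1                     # exactly the expected count out of 20
    for j in range(20):
        p.append(prob)
        y.append(1 if j < ones else 0)

stat, df = hosmer_lemeshow(y, p)
print("calibrated model: statistic", stat, "p-value", chi2_sf_even_df(stat, df))

# The same outcomes scored by a model that predicts half the true risk.
stat_bad, _ = hosmer_lemeshow(y, [0.5 * pi for pi in p])
print("miscalibrated model: statistic", stat_bad, "p-value", chi2_sf_even_df(stat_bad, df))
```

<p>The calibrated predictions give a statistic of essentially zero and a p-value close to 1, while halving every prediction produces a large statistic and a small p-value, flagging the poor fit.</p>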
<p><strong>Choosing the number of groups</strong></p>
<p>As far as I have seen, there is little guidance as to how to choose the number of groups g. Hosmer and Lemeshow’s conclusions from simulations were based on using g>p+1, suggesting that if we have 10 covariates in the model, we should choose g>11, although this doesn’t appear to be mentioned in text books or software packages.</p>
<p>Intuitively, using a small value of g ought to give less opportunity to detect mis-specification. However, if we choose g too large, the numbers in each group may be so small that it will be difficult to determine whether differences between observed and expected are due to chance or indicative of model mis-specification.</p>
<p>A further problem, highlighted by many others (e.g. Paul Allison) is that, for a given dataset, if one changes g, sometimes one obtains a quite different p-value, such that with one choice of g we might conclude our model does not fit well, yet with another we conclude there is no evidence of poor fit. This is indeed a troubling aspect of the test.</p>
<p>References and Also-read -</p>
<p><a href="https://en.wikipedia.org/wiki/Hosmer%E2%80%93Lemeshow_test">Wikipedia</a> <br />
<a href="http://thestatsgeek.com/2014/02/16/the-hosmer-lemeshow-goodness-of-fit-test-for-logistic-regression/">StatsGeek</a></p>
<hr />
<h2 id="concordance--discordance">Concordance & Discordance</h2>
<p>Concordant pairs and discordant pairs refer to comparing two pairs of data points to see if they “match.” The meaning is slightly different depending on if you are finding these pairs from various coefficients (like Kendall’s Tau) or if you are performing experimental studies and clinical trials.</p>
<p>Let’s say you had two interviewers rate a group of twelve job applicants:</p>
<p><img src="http://ashukumar27.io/assets/sas/concor.png" alt="ConcordantDiscordant" /></p>
<p>Note that in the first column, interviewer 1’s choices have been ordered from smallest to greatest. That way, a comparison can be made between the choices for interviewer 1 and 2. With concordant or discordant pairs, you’re basically answering the question: did the judges/raters rank the pairs in the same order? You aren’t necessarily looking for the exact same rank, but rather if one job seeker was consistently ranked higher by both interviewers.</p>
<p>Three possible scenarios are possible for these ordered pairs:</p>
<ul>
<li>Tied pairs: both interviewers agree. For example, candidate A was marked as a 1st choice for both interviewers, so they are tied.</li>
<li>Concordant pairs: both interviewers rank both applicants in the same order — that is, they both move in the same direction. While they aren’t the same rank (i.e. both 1st or both 2nd), each pair is ordered equally higher or equally lower. Interviewer 1 ranked F as 6th and G as 7th, while interviewer 2 ranked F as 5th and G as 8th. F and G are concordant because F was consistently ranked higher than G.</li>
<li>Discordant pairs: Candidates E and F are discordant because the interviewers ranked them in opposite directions (one said E had a higher rank than F, while the other said F ranked higher than E).</li>
</ul>
<h4 id="steps-to-calculate-concordance--discordance-and-auc">Steps to calculate concordance / discordance and AUC</h4>
<p><strong>Calculate the predicted probability in logistic regression model.</strong></p>
<ul>
<li>
<p>Divide the data into two datasets. One dataset contains observations having actual value of dependent variable with value 1 (i.e. event) and corresponding predicted probability values. And the other dataset contains observations having actual value of dependent variable 0 (non-event) against their predicted probability scores.</p>
</li>
<li>
<p>Compare each predicted value in first dataset with each predicted value in second dataset.<br />
Total Number of pairs to compare = x * y<br />
x: Number of observations in first dataset (actual values of 1 in dependent variable)<br />
y: Number of observations in second dataset (actual values of 0 in dependent variable).</p>
</li>
<li>
<p>In this step, we are performing cartesian product (cross join) of events and non-events. For example, you have 100 events and 1000 non-events. It would create 100k (100*1000) pairs for comparison.</p>
</li>
<li>A pair is concordant if 1 (observation with the desired outcome i.e. event) has a higher predicted probability than 0 (observation without the outcome i.e. non-event).</li>
<li>A pair is discordant if 0 (observation without the desired outcome i.e. non-event) has a higher predicted probability than 1 (observation with the outcome i.e. event).</li>
<li>
<p>A pair is tied if 1 (observation with the desired outcome i.e. event) has the same predicted probability as 0 (observation without the outcome i.e. non-event).</p>
</li>
<li>The final percent values are calculated using the formula below -</li>
</ul>
<p>Percent Concordant = (Number of concordant pairs)/Total number of pairs</p>
<p>Percent Discordant = (Number of discordant pairs)/Total number of pairs</p>
<p>Percent Tied = (Number of tied pairs)/Total number of pairs</p>
<p>Area under curve (c statistic) = Percent Concordant + 0.5 * Percent Tied</p>
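<p>The steps above can be sketched in plain Python. This is a minimal illustration rather than what SAS does internally: the function name <code>concordance_stats</code> and the toy labels and probabilities are assumptions made up for the example, and the shares are returned as fractions rather than percentages.</p>

```python
from itertools import product


def concordance_stats(y_true, y_prob):
    """Classify every (event, non-event) pair and summarise the shares."""
    # Split the predicted probabilities by actual outcome.
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]

    concordant = discordant = tied = 0
    # Cross join: every event is compared with every non-event.
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            concordant += 1
        elif p_event < p_non_event:
            discordant += 1
        else:
            tied += 1

    total = len(events) * len(non_events)
    return {
        "pairs": total,
        "percent_concordant": concordant / total,
        "percent_discordant": discordant / total,
        "percent_tied": tied / total,
        "auc": (concordant + 0.5 * tied) / total,
    }


# Hypothetical toy data: 3 events and 4 non-events give 3 * 4 = 12 pairs.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.7, 0.4, 0.3, 0.1]
stats = concordance_stats(y_true, y_prob)
print(stats)
```

<p>The nested comparison costs x * y operations, which is fine for a sketch but slow for large datasets; rank-based formulas give the same numbers more cheaply.</p>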
<p><strong>Somers’ D</strong> (Somers’ Delta), sometimes incorrectly referred to as Somer’s D, is a measure of ordinal association between two possibly dependent random variables X and Y. Somers’ D takes values between -1 when all pairs of the variables disagree and 1 when all pairs of the variables agree.</p>
<p>Delta is an ordinal alternative to Pearson’s Correlation Coefficient. Like Pearson’s R, the range for Somers’ D is -1 to 1:</p>
<p>-1 = all pairs disagree, <br />
1 = all pairs agree.</p>
<p>Large values for Somers’ D (tending towards -1 or 1) suggest the model has good predictive ability. Smaller values (tending towards zero in either direction) indicate the model is a poor predictor. Let’s say you had a Delta of .549 in the friendly sales staff/customer satisfaction scenario. Customer satisfaction is the dependent variable, so you can say that friendly sales staff improves customer satisfaction by 54.9%.</p>
<hr />
<h3 id="decile-table-lift-chart-and-ks">Decile Table, Lift Chart, and KS</h3>
<p><strong>Decile Table</strong>: The predicted probabilities are sorted in descending order and then divided into 10 equal-sized groups (deciles). This gives a uniform distribution of customers (equal numbers in each decile), with the first decile containing the 10% of the population with the highest predicted probabilities.</p>
<p>The idea is to see how the predicted probabilities separate the classes. Say the good/positive class is represented by 1 and the bad/negative class by 0. If the model is random (no model), the distribution of 1s and 0s in each decile will be the same as in the population, i.e. the first decile, containing 10% of the entire population, would contain 10% of the good customers. With a very good model, the first decile (10% of the population) would contain, say, 40% of the good customers; the 2nd decile would contain 15%, the 3rd would contain 8%, and so on. This way the classifier segregates the population, and the top 3 deciles in this example would contain 63% of the good population. If you target this 30% of the population, you capture 63% of the good population.</p>
<p>A decile table looks like:
<img src="http://ashukumar27.io/assets/sas/decile.png" alt="Test Image" /></p>
<p><strong>Lift Chart</strong>: This plots the cumulative good and cumulative bad percentages decile by decile; the gap between the two lines shows the discriminating power of the model.</p>
<p><strong>Kolmogorov–Smirnov test (KS)</strong>: KS is the maximum difference between the cumulative good and cumulative bad percentages. It is a metric by which the performance of the model can be measured. For a good model, the KS should be at least 35-40 and should be reached in the 3rd or 4th decile.</p>
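<p>A decile table and the KS value can be built along the following lines. This is a sketch under stated assumptions: the helper <code>decile_table</code> and the 20 toy observations are hypothetical, "good" is coded as 1 and "bad" as 0, and the row count is assumed to be at least the number of bins.</p>

```python
def decile_table(y_true, y_prob, n_bins=10):
    """Sort by predicted probability (descending), cut into equal bins and
    track the cumulative share of good (1) and bad (0) in each bin."""
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: pair[0], reverse=True)
    total_good = sum(y for _, y in ranked)
    total_bad = len(ranked) - total_good

    rows = []
    cum_good = cum_bad = 0
    for b in range(n_bins):
        start = b * len(ranked) // n_bins
        end = (b + 1) * len(ranked) // n_bins
        chunk = ranked[start:end]
        good = sum(y for _, y in chunk)
        cum_good += good
        cum_bad += len(chunk) - good
        cum_good_pct = 100.0 * cum_good / total_good
        cum_bad_pct = 100.0 * cum_bad / total_bad
        rows.append({
            "decile": b + 1,
            "count": len(chunk),
            "good": good,
            "cum_good_pct": cum_good_pct,
            "cum_bad_pct": cum_bad_pct,
            "ks": cum_good_pct - cum_bad_pct,  # gap between the two lift lines
        })
    return rows


# Hypothetical toy data: 20 scored customers, 8 good and 12 bad.
y_prob = [i / 20 for i in range(20, 0, -1)]
y_true = [1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

rows = decile_table(y_true, y_prob)
best = max(rows, key=lambda row: row["ks"])  # KS is the largest gap
for row in rows:
    print(row)
print("KS =", best["ks"], "at decile", best["decile"])
```

<p>With real data the bins are usually formed on a few thousand rows, so ties in the predicted probability at a bin boundary matter less than they would in this tiny example.</p>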
<p><img src="http://ashukumar27.io/assets/sas/ks.png" alt="Test Image" /></p>This post details the terms obtained in SAS output for logistic regression. The definitions are generic and referenced from other great posts on this topic. The aim is to provide a summary of definitions and statistical explanation of the output obtained from Logistic Regression Code in SAS.Anomaly Detection Algorithms2018-03-24T00:00:00+00:002018-03-24T00:00:00+00:00http://ashukumar27.io/anamoly-detection<p>Anomaly detection is a class of semi-supervised (close to unsupervised) learning algorithms widely used in manufacturing, data centres, fraud detection and, as the name implies, spotting anomalies. Normally it is used when we have an imbalanced classification problem with, say, about 20 examples of y=1 (anomaly) and 10,000 examples of y=0. An example would be identifying faulty aircraft engines based on a wide number of parameters, where anomalous data might not be available or, if it is available, makes up less than 0.1% of the data.</p>
<h3 id="algorithm">Algorithm:</h3>
<p>Suppose there are m training examples <script type="math/tex">x^{(1)},x^{(2)},x^{(3)},\dots,x^{(m)}</script></p>
<p>Problem Statement : Is <script type="math/tex">x_{test}</script> anomalous?</p>
<p>Approach: <br />
Suppose <script type="math/tex">x_j</script> (j = 1, ..., n) are the features of the training examples</p>
<p>Model p(x) from the data; p(x) = <script type="math/tex">p(x_1;\mu_1 , \sigma_1^2) \cdot p(x_2;\mu_2, \sigma_2^2) \cdot p(x_3;\mu_3 , \sigma_3^2) \cdots</script>,<br />
or <br />
p(x) = <script type="math/tex">\prod_{j=1}^n p(x_j;\mu_j, \sigma_j^2)</script></p>
<p>Identify unusual/anomalous examples by checking if <script type="math/tex">p(x) < \epsilon</script></p>
<h3 id="guassian-distribution">Gaussian Distribution</h3>
<p>The model above assumes that each feature <script type="math/tex">x_j</script> follows a Gaussian distribution with mean <script type="math/tex">\mu_j</script> and variance <script type="math/tex">\sigma_j^2</script>. If a feature is distributed differently, apply a transformation to bring it closer to a normal distribution.</p>
<p><strong>Normal Distribution</strong></p>
<p><script type="math/tex">X</script> ~ <script type="math/tex">N(\mu, \sigma^2)</script></p>
<p><script type="math/tex">p(x;\mu, \sigma^2)</script> = <script type="math/tex">\frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}</script></p>
<p>Probability Density Function</p>
<p><img src="http://ashukumar27.io/assets/normaldist.png" alt="Probability Density function" /></p>
<p>Cumulative Distribution Function</p>
<p><img src="http://ashukumar27.io/assets/cumdist.png" alt="Cumulative Distribution function" /></p>
<p>Parameter Estimation:</p>
<script type="math/tex; mode=display">\mu=\frac1m\sum_{i=1}^m x^{(i)}</script>
<p><script type="math/tex">\sigma^2</script> = <script type="math/tex">\frac1m\sum_{i=1}^m (x^{(i)}-\mu)^2</script></p>
<h3 id="anamoly-detection-algorithm">Anomaly Detection Algorithm</h3>
<ol>
<li>Choose features <script type="math/tex">x_j</script> that you think might be indicative of anomalous examples</li>
<li>Fit parameters <script type="math/tex">\mu_1, \mu_2, \dots, \mu_n, \sigma_1^2, \sigma_2^2, \dots, \sigma_n^2</script> using the parameter estimation formulae above</li>
<li>Given new example <script type="math/tex">x</script>, compute <script type="math/tex">p(x)</script> as:</li>
</ol>
<p><script type="math/tex">p(x)</script>= <script type="math/tex">\prod_{j=1}^n p(x_j;\mu_j, \sigma_j^2)</script> = <script type="math/tex">\prod_{j=1}^n \frac1{\sqrt{2\pi \sigma_j^2}}e^{-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}}</script></p>
<p>Flag an anomaly if <script type="math/tex">p(x) < \epsilon</script></p>
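<p>The three steps can be sketched as follows. The function names, the two-feature training set and the threshold <code>epsilon = 1e-3</code> are all hypothetical choices for illustration; in practice the threshold would be tuned on a labelled cross-validation set.</p>

```python
import math


def fit_gaussian(X):
    """Per-feature mean and variance, dividing by m as in the formulae above."""
    m, n = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    var = [sum((row[j] - mu[j]) ** 2 for row in X) / m for j in range(n)]
    return mu, var


def density(x, mu, var):
    """p(x): product of independent univariate Gaussian densities."""
    p = 1.0
    for x_j, mu_j, var_j in zip(x, mu, var):
        p *= math.exp(-((x_j - mu_j) ** 2) / (2 * var_j)) / math.sqrt(2 * math.pi * var_j)
    return p


# Hypothetical training set of normal examples with two features each.
X_train = [[10.0, 5.0], [10.2, 5.1], [9.8, 4.9], [10.1, 5.2], [9.9, 4.8]]
mu, var = fit_gaussian(X_train)

epsilon = 1e-3  # assumed threshold, not taken from the post


def is_anomaly(x):
    return density(x, mu, var) < epsilon


print(is_anomaly([10.0, 5.0]))  # a point at the centre of the training data
print(is_anomaly([12.0, 5.0]))  # first feature far from its mean
```

<p>Multiplying many small densities can underflow when there are many features; summing log-densities and comparing against log(epsilon) is a common way around that.</p>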
<p><strong>Example - Dividing data into Train, CV and Test Set</strong></p>
<p><img src="http://ashukumar27.io/assets/anamoly1.png" alt="AD Example" /></p>
<p><strong>Anomaly Detection vs Supervised Learning</strong></p>
<p><img src="http://ashukumar27.io/assets/anamoly2.png" alt="Anomaly vs Supervised Learning" /></p>ROC-AUC explained2018-03-22T00:00:00+00:002018-03-22T00:00:00+00:00http://ashukumar27.io/roc-auc<p>The ROC (receiver operating characteristic) curve is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. Essentially, it illustrates the ability of the classifier to separate the classes. A higher area under the ROC curve (AUC) denotes a better classifier.</p>
<p>In a ROC curve the true positive rate (Sensitivity) is plotted as a function of the false positive rate (100-Specificity, with both expressed as percentages) for different cut-off points of a parameter. Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. The area under the ROC curve (AUC) is a measure of how well a parameter can distinguish between two diagnostic groups (diseased/normal).</p>
<p><strong>Example</strong>: Consider a test which outputs the probability of having a disease (disease vs no-disease classifier)</p>
<p>The diagnostic performance of a test, or the accuracy of a test in discriminating diseased cases from normal cases, is evaluated using Receiver Operating Characteristic (ROC) curve analysis (Metz, 1978; Zweig & Campbell, 1993). ROC curves can also be used to compare the diagnostic performance of two or more laboratory or diagnostic tests (Griner et al., 1981).</p>
<p>When you consider the results of a particular test in two populations, one population with a disease, the other population without the disease, you will rarely observe a perfect separation between the two groups. Indeed, the distribution of the test results will overlap, as shown in the following figure.</p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc1.png" alt="Test Image" /></p>
<p>For every possible cut-off point or criterion value you select to discriminate between the two populations, there will be some cases with the disease correctly classified as positive (TP = True Positive fraction), but some cases with the disease will be classified negative (FN = False Negative fraction). On the other hand, some cases without the disease will be correctly classified as negative (TN = True Negative fraction), but some cases without the disease will be classified as positive (FP = False Positive fraction).</p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc2.png" alt="Test Image" /></p>
<p><strong>Schematic outcomes of a test</strong></p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc3.png" alt="Test Image" /></p>
<h3 id="the-following-statistics-can-be-defined">The following statistics can be defined:</h3>
<ul>
<li><strong>Sensitivity</strong>: probability that a test result will be positive when the disease is present (true positive rate, expressed as a percentage) = a / (a+b)</li>
<li><strong>Specificity</strong>: probability that a test result will be negative when the disease is not present (true negative rate, expressed as a percentage) = d / (c+d)</li>
<li><strong>Positive likelihood ratio</strong>: ratio between the probability of a positive test result given the presence of the disease and the probability of a positive test result given the absence of the disease, i.e. = True positive rate / False positive rate = Sensitivity / (1-Specificity)</li>
<li><strong>Negative likelihood ratio</strong>: ratio between the probability of a negative test result given the presence of the disease and the probability of a negative test result given the absence of the disease, i.e. = False negative rate / True negative rate = (1-Sensitivity) / Specificity</li>
<li><strong>Positive predictive value</strong>: probability that the disease is present when the test is positive (expressed as a percentage) = a / (a+c)</li>
<li><strong>Negative predictive value</strong>: probability that the disease is not present when the test is negative (expressed as a percentage) = d / (b+d)</li>
</ul>
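<p>The statistics above can be computed directly from the four cell counts. The sketch below assumes, as the formulas imply, that a = true positives, b = false negatives, c = false positives and d = true negatives; the function name and the counts are hypothetical.</p>

```python
def diagnostic_stats(a, b, c, d):
    """a: disease present, test positive   b: disease present, test negative
    c: disease absent, test positive    d: disease absent, test negative"""
    sensitivity = a / (a + b)
    specificity = d / (c + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positive_likelihood_ratio": sensitivity / (1 - specificity),
        "negative_likelihood_ratio": (1 - sensitivity) / specificity,
        "positive_predictive_value": a / (a + c),
        "negative_predictive_value": d / (b + d),
    }


# Hypothetical counts: 100 diseased and 200 healthy patients.
stats = diagnostic_stats(a=90, b=10, c=30, d=170)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

<p>The values are returned as fractions; multiply by 100 for the percentages used in the definitions. A test with perfect specificity would make the positive likelihood ratio undefined (division by zero), which this sketch does not guard against.</p>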
<p><strong>Precision & Recall</strong></p>
<p>Precision = <script type="math/tex">\frac {TP}{TP+FP}</script></p>
<p>Recall = <script type="math/tex">\frac {TP}{TP+FN}</script></p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc6.png" alt="Test Image" /></p>
<p><strong>Sensitivity and specificity versus criterion value</strong></p>
<p>When you select a higher criterion value, the false positive fraction will decrease with increased specificity but on the other hand the true positive fraction and sensitivity will decrease:</p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc4.png" alt="Test Image" /></p>
<p>When you select a lower threshold value, then the true positive fraction and sensitivity will increase. On the other hand the false positive fraction will also increase, and therefore the true negative fraction and specificity will decrease.</p>
<p><strong>The ROC curve</strong><br />
In a Receiver Operating Characteristic (ROC) curve the true positive rate (Sensitivity) is plotted as a function of the false positive rate (100-Specificity) for different cut-off points. Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. A test with perfect discrimination (no overlap in the two distributions) has a ROC curve that passes through the upper left corner (100% sensitivity, 100% specificity). Therefore the closer the ROC curve is to the upper left corner, the higher the overall accuracy of the test.</p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc5.png" alt="Test Image" /></p>
<h2 id="interpreting-roc-curves">Interpreting ROC Curves</h2>
<p>Skip the blabber below, directly watch <a href="https://youtu.be/OAl6eAyP-yo">this awesome video</a></p>
<p>For example, let’s pretend you built a classifier to predict whether a research paper will be admitted to a journal, based on a variety of factors. The features might be the length of the paper, the number of authors, the number of papers those authors have previously submitted to the journal, et cetera. The response (or “output variable”) would be whether or not the paper was admitted.</p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc7.PNG" alt="Test Image" /></p>
<p>Let’s first take a look at the bottom portion of this diagram, and ignore everything except the blue and red distributions. We’ll pretend that every blue and red pixel represents a paper for which you want to predict the admission status. This is your validation (or “hold-out”) set, so you know the true admission status of each paper. The 250 red pixels are the papers that were actually admitted, and the 250 blue pixels are the papers that were not admitted.</p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc8.PNG" alt="Test Image" /></p>
<p>Since this is your validation set, you want to judge how well your model is doing by comparing your model’s predictions to the true admission statuses of those 500 papers. We’ll assume that you used a classification method such as logistic regression that can not only make a prediction for each paper, but can also output a predicted probability of admission for each paper. These blue and red distributions are one way to visualize how those predicted probabilities compare to the true statuses.</p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc9.PNG" alt="Test Image" /></p>
<p>Let’s examine this plot in detail. The x-axis represents your predicted probabilities, and the y-axis represents a count of observations, kind of like a histogram. Let’s estimate that the height at 0.1 is 10 pixels. This plot tells you that there were 10 papers for which you predicted an admission probability of 0.1, and the true status for all 10 papers was negative (meaning not admitted). There were about 50 papers for which you predicted an admittance probability of 0.3, and none of those 50 were admitted. There were about 20 papers for which you predicted a probability of 0.5, and half of those were admitted and the other half were not. There were 50 papers for which you predicted a probability of 0.7, and all of those were admitted. And so on.</p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc10.PNG" alt="Test Image" /></p>
<p>Based on this plot, you might say that your classifier is doing quite well, since it did a good job of separating the classes. To actually make your class predictions, you might set your “threshold” at 0.5, and classify everything above 0.5 as admitted and everything below 0.5 as not admitted, which is what most classification methods will do by default. With that threshold, your accuracy rate would be above 90%, which is probably very good.</p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc11.PNG" alt="Test Image" /></p>
<p>Now let’s pretend that your classifier didn’t do nearly as well and move the blue distribution. You can see that there is a lot more overlap here, and regardless of where you set your threshold, your classification accuracy will be much lower than before.</p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc12.PNG" alt="Test Image" /></p>
<p>Now let’s talk about the ROC curve that you see here in the upper left. So, what is an ROC curve? It is a plot of the True Positive Rate (on the y-axis) versus the False Positive Rate (on the x-axis) for every possible classification threshold. As a reminder, the True Positive Rate answers the question, “When the actual classification is positive (meaning admitted), how often does the classifier predict positive?” The False Positive Rate answers the question, “When the actual classification is negative (meaning not admitted), how often does the classifier incorrectly predict positive?” Both the True Positive Rate and the False Positive Rate range from 0 to 1.</p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc13.PNG" alt="Test Image" /></p>
<p>To see how the ROC curve is actually generated, let’s set some example thresholds for classifying a paper as admitted.</p>
<p>A threshold of 0.8 would classify 50 papers as admitted, and 450 papers as not admitted. The True Positive Rate would be the red pixels to the right of the line divided by all red pixels, or 50 divided by 250, which is 0.2. The False Positive Rate would be the blue pixels to the right of the line divided by all blue pixels, or 0 divided by 250, which is 0. Thus, we would plot a point at 0 on the x-axis, and 0.2 on the y-axis, which is right here.</p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc14.PNG" alt="Test Image" /></p>
<p>Let’s set a different threshold of 0.5. That would classify 360 papers as admitted, and 140 papers as not admitted. The True Positive Rate would be 235 divided by 250, or 0.94. The False Positive Rate would be 125 divided by 250, or 0.5. Thus, we would plot a point at 0.5 on the x-axis, and 0.94 on the y-axis, which is right here.</p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc15.PNG" alt="Test Image" /></p>
<p>We’ve plotted two points, but to generate the entire ROC curve, all we have to do is to plot the True Positive Rate versus the False Positive Rate for all possible classification thresholds which range from 0 to 1. That is a huge benefit of using an ROC curve to evaluate a classifier instead of a simpler metric such as misclassification rate, in that an ROC curve visualizes all possible classification thresholds, whereas misclassification rate only represents your error rate for a single threshold. Note that you can’t actually see the thresholds used to generate the ROC curve anywhere on the curve itself.</p>
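<p>Sweeping the threshold can be sketched in a few lines. This is a minimal illustration on hypothetical data: the function names are made up, an observation is predicted positive when its probability is at or above the threshold (an assumed convention), and the area is obtained with the trapezoid rule.</p>

```python
def roc_points(y_true, y_prob):
    """(FPR, TPR) for every distinct threshold, from strictest to loosest."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # A threshold above every score predicts nothing as positive.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(y_prob), reverse=True):
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical validation set: 3 admitted (1) and 3 not admitted (0) papers.
y_true = [1, 1, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, y_prob)
auc = auc_trapezoid(points)
print(points)
print(auc)
```

<p>Each threshold contributes one point, and the thresholds themselves do not appear in the output, which mirrors the remark that they cannot be read off the curve.</p>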
<p>Now, let’s move the blue distribution back to where it was before. Because the classifier is doing a very good job of separating the blues and the reds, I can set a threshold of 0.6, have a True Positive Rate of 0.8, and still have a False Positive Rate of 0.</p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc16.PNG" alt="Test Image" /></p>
<p>Therefore, a classifier that does a very good job separating the classes will have an ROC curve that hugs the upper left corner of the plot. Conversely, a classifier that does a very poor job separating the classes will have an ROC curve that is close to this black diagonal line. That line essentially represents a classifier that does no better than random guessing.</p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc17.PNG" alt="Test Image" /></p>
<p>Naturally, you might want to use the ROC curve to quantify the performance of a classifier, and give a higher score for this classifier than this classifier. That is the purpose of AUC, which stands for Area Under the Curve. AUC is literally just the percentage of this box that is under this curve. This classifier has an AUC of around 0.8, a very poor classifier has an AUC of around 0.5, and this classifier has an AUC of close to 1.</p>
<p><img src="http://ashukumar27.io/assets/machinelearning/roc18.PNG" alt="Test Image" /></p>
<p>Two things to note about this diagram: First, this diagram shows a case where your classes are perfectly balanced, which is why the sizes of the blue and the red distributions are identical. In most real-world problems, this is not the case. For example, if only 10% of papers were admitted, the blue distribution would be nine times larger than the red distribution. However, that doesn’t change how the ROC curve is generated.</p>
<p>A second note about this diagram is that it shows a case where your predicted probabilities have a very smooth shape, similar to a normal distribution. That was just for demonstration purposes. The probabilities output by your classifier will not necessarily follow any particular shape.</p>
<p><strong>One other important note:</strong></p>
<p>The ROC curve and AUC are insensitive to whether your predicted probabilities are properly calibrated to actually represent probabilities of class membership. In other words, the ROC curve and the AUC would be identical even if your predicted probabilities ranged from 0.9 to 1 instead of 0 to 1, as long as the ordering of observations by predicted probability remained the same. All the AUC metric cares about is how well your classifier separated the two classes, and thus it is said to only be sensitive to rank ordering. You can think of AUC as representing the probability that a classifier will rank a randomly chosen positive observation higher than a randomly chosen negative observation, and thus it is a useful metric even for datasets with highly unbalanced classes.</p>
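<p>The rank-ordering point is easy to check numerically. In this sketch the helper <code>pairwise_auc</code> and the scores are hypothetical; it computes AUC as the share of positive/negative pairs in which the positive is ranked higher (ties count half), then squeezes the same scores into the range 0.9 to 1.</p>

```python
def pairwise_auc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.92, 0.15, 0.40, 0.75, 0.55, 0.05, 0.60, 0.30]

# Same ordering, but every score now lies between 0.9 and 1.
squeezed = [0.9 + 0.1 * p for p in probs]

auc_original = pairwise_auc(y_true, probs)
auc_squeezed = pairwise_auc(y_true, squeezed)
print(auc_original, auc_squeezed)
```

<p>Both calls give the same value because only the ordering of the scores enters the calculation. A metric such as accuracy at a fixed 0.5 threshold would change drastically under the same rescaling.</p>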
<hr />
<p>References:<br />
[1] https://en.wikipedia.org/wiki/Receiver_operating_characteristic<br />
[2] https://www.medcalc.org/manual/roc-curves.php<br />
[3] http://www.dataschool.io/roc-curves-and-auc-explained/</p>Similarity functions in Python2018-03-21T00:00:00+00:002018-03-21T00:00:00+00:00http://ashukumar27.io/similarity_functions<p>Similarity functions are used to measure the ‘distance’ between two vectors, numbers, or pairs. The distance is a measure of how similar the two objects being measured are: the objects are deemed similar if the distance between them is small, and dissimilar if it is large.</p>
<h3 id="measures-of-similarity">Measures of Similarity</h3>
<p><strong>Euclidean Distance</strong></p>
<p>The simplest measure: the straight-line distance between two points.</p>
<p><script type="math/tex">d(x,y)</script>= <script type="math/tex">\sqrt {\sum_{i=1}^k (x_i-y_i)^2}</script></p>
<p>When data is dense or continuous, this is often the preferred proximity measure. The Euclidean distance between two points is the length of the straight line connecting them, and is given by the Pythagorean theorem.</p>
<p>Implementation in python</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">math</span> <span class="kn">import</span> <span class="n">sqrt</span>

<span class="k">def</span> <span class="nf">euclidean_distance</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">sqrt</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span><span class="nb">pow</span><span class="p">(</span><span class="n">a</span><span class="o">-</span><span class="n">b</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span> <span class="k">for</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)))</span>
</code></pre></div></div>
<p><strong>Manhattan Distance</strong></p>
<p>Manhattan distance is a metric in which the distance between two points is the sum of the absolute differences of their Cartesian coordinates. Put simply, for a Point A and a Point B it is the absolute difference between their x-coordinates plus the absolute difference between their y-coordinates, i.e. the distance measured along axes at right angles.</p>
<p>In a plane with p1 at (x1, y1) and p2 at (x2, y2).</p>
<p>Manhattan distance = <script type="math/tex">\lvert x_1 - x_2 \rvert + \lvert y_1 - y_2 \rvert</script></p>
<p>The Manhattan distance metric is also known as Manhattan length, rectilinear distance, L1 distance, L1 norm, city block distance, Minkowski’s L1 distance, or the taxicab metric.</p>
<p>Implementation in Python</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">manhattan_distance</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">):</span>
    <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="nb">abs</span><span class="p">(</span><span class="n">a</span><span class="o">-</span><span class="n">b</span><span class="p">)</span> <span class="k">for</span> <span class="n">a</span><span class="p">,</span><span class="n">b</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">))</span>
</code></pre></div></div>
<p><strong>Minkowski Distance</strong></p>
<p>The Minkowski distance is a generalized metric form of Euclidean distance and Manhattan distance. It looks like this:</p>
<p><script type="math/tex">d^{MKD}(i,j)</script> = <script type="math/tex">\left(\sum_{k=1}^{n} \lvert y_{i,k}-y_{j,k} \rvert^{\lambda}\right)^{1/\lambda}</script></p>
<p>In the equation, <script type="math/tex">d^{MKD}</script> is the Minkowski distance between the data records i and j, k is the index of a variable, n is the total number of variables y, and λ is the order of the Minkowski metric. Although it is defined for any λ > 0, it is rarely used for values other than 1, 2 and ∞.</p>
<p>Different names for the Minkowski distance arise from the synonyms of other measures:</p>
<ul>
<li>
<p>λ = 1 is the Manhattan distance. Synonyms are L1-Norm, Taxicab or City-Block distance. For two vectors of ranked ordinal variables the Manhattan distance is sometimes called Foot-ruler distance.</p>
</li>
<li>
<p>λ = 2 is the Euclidean distance. Synonyms are L2-Norm or Ruler distance. For two vectors of ranked ordinal variables the Euclidean distance is sometimes called Spearman distance.</p>
</li>
<li>
<p>λ = ∞ is the Chebyshev distance. Synonyms are Lmax-Norm or Chessboard distance.</p>
</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">math</span> <span class="kn">import</span><span class="o">*</span>
<span class="kn">from</span> <span class="nn">decimal</span> <span class="kn">import</span> <span class="n">Decimal</span>
<span class="c"># n-th root of value, rounded to 3 decimal places</span>
<span class="k">def</span> <span class="nf">nth_root</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="n">n_root</span><span class="p">):</span>
    <span class="n">root_value</span> <span class="o">=</span> <span class="mi">1</span><span class="o">/</span><span class="nb">float</span><span class="p">(</span><span class="n">n_root</span><span class="p">)</span>
    <span class="k">return</span> <span class="nb">round</span><span class="p">(</span><span class="n">Decimal</span><span class="p">(</span><span class="n">value</span><span class="p">)</span> <span class="o">**</span> <span class="n">Decimal</span><span class="p">(</span><span class="n">root_value</span><span class="p">),</span><span class="mi">3</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">minkowski_distance</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">p_value</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">nth_root</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span><span class="nb">pow</span><span class="p">(</span><span class="nb">abs</span><span class="p">(</span><span class="n">a</span><span class="o">-</span><span class="n">b</span><span class="p">),</span><span class="n">p_value</span><span class="p">)</span> <span class="k">for</span> <span class="n">a</span><span class="p">,</span><span class="n">b</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)),</span><span class="n">p_value</span><span class="p">)</span>

<span class="nb">print</span><span class="p">(</span><span class="n">minkowski_distance</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">],[</span><span class="mi">7</span><span class="p">,</span><span class="mi">6</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span><span class="mi">3</span><span class="p">))</span>
</code></pre></div></div>
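<p>The special cases listed above can be checked with a short float-based sketch. The functions <code>minkowski</code> and <code>chebyshev</code> here are illustrative helpers rather than the implementation above, and λ = 50 is used as an assumed stand-in for "very large".</p>

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x, y):
    """Limit of the Minkowski distance as the order goes to infinity."""
    return max(abs(a - b) for a, b in zip(x, y))


x, y = [0, 3, 4, 5], [7, 6, 3, -1]

manhattan = minkowski(x, y, 1)    # order 1: sum of absolute differences
euclidean = minkowski(x, y, 2)    # order 2: straight-line distance
large_order = minkowski(x, y, 50)  # already very close to the Chebyshev value

print(manhattan, euclidean, large_order, chebyshev(x, y))
```

<p>As the order grows, the largest coordinate difference dominates the sum, which is why the result approaches the Chebyshev distance.</p>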
<p><strong>Cosine Similarity</strong><br />
Cosine similarity metric finds the normalized dot product of the two attributes. By determining the cosine similarity, we are effectively finding the cosine of the angle between the two objects. The cosine of 0° is 1, and it is less than 1 for any other angle. <strong>It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.</strong></p>
<p>Cosine similarity is particularly used in positive space, where the outcome is neatly bounded in [0,1]. One of the reasons for the popularity of cosine similarity is that it is very efficient to evaluate, especially for sparse vectors.</p>
<p>Cosine Similarity (A,B) = <script type="math/tex">cos(\theta)</script> = <script type="math/tex">\frac {A \cdot B}{\|A\| \|B\|}</script> = <script type="math/tex">\frac{\sum_{i=1}^n A_i B_i}{\sqrt {\sum_{i=1}^n A_i^2}\sqrt {\sum_{i=1}^n B_i^2}}</script></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">math</span> <span class="kn">import</span> <span class="n">sqrt</span>

<span class="k">def</span> <span class="nf">square_rooted</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="k">return</span> <span class="nb">round</span><span class="p">(</span><span class="n">sqrt</span><span class="p">(</span><span class="nb">sum</span><span class="p">([</span><span class="n">a</span><span class="o">*</span><span class="n">a</span> <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="n">x</span><span class="p">])),</span><span class="mi">3</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">cosine_similarity</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">):</span>
    <span class="n">numerator</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">a</span><span class="o">*</span><span class="n">b</span> <span class="k">for</span> <span class="n">a</span><span class="p">,</span><span class="n">b</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">))</span>
    <span class="n">denominator</span> <span class="o">=</span> <span class="n">square_rooted</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="n">square_rooted</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
    <span class="k">return</span> <span class="nb">round</span><span class="p">(</span><span class="n">numerator</span><span class="o">/</span><span class="nb">float</span><span class="p">(</span><span class="n">denominator</span><span class="p">),</span><span class="mi">3</span><span class="p">)</span>

<span class="nb">print</span><span class="p">(</span><span class="n">cosine_similarity</span><span class="p">([</span><span class="mi">3</span><span class="p">,</span> <span class="mi">45</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">54</span><span class="p">,</span> <span class="mi">13</span><span class="p">,</span> <span class="mi">15</span><span class="p">]))</span>
</code></pre></div></div>
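<p>The efficiency claim for sparse vectors can be illustrated with a small sketch. It assumes each vector is stored as a dictionary mapping a dimension index to its non-zero value; the function name <code>sparse_cosine</code> and the sample data are hypothetical and not from the original post.</p>

```python
from math import sqrt

def sparse_cosine(x, y):
    """Cosine similarity for sparse vectors stored as {index: value} dicts.

    Hypothetical helper: only non-zero entries are stored, so the work is
    proportional to the number of stored entries, not the full dimension.
    """
    # Only indices present in both vectors contribute to the dot product
    dot = sum(value * y[index] for index, value in x.items() if index in y)
    norm_x = sqrt(sum(v * v for v in x.values()))
    norm_y = sqrt(sum(v * v for v in y.values()))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

# Illustrative data: two vectors in a 100-dimensional space, 3 non-zeros each
doc_a = {0: 3, 7: 1, 42: 2}
doc_b = {7: 4, 42: 1, 99: 5}
sparse_result = sparse_cosine(doc_a, doc_b)
print(round(sparse_result, 3))
```

<p>The result matches what the dense formula would give, because dimensions where either vector is zero add nothing to the numerator.</p>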
<p><strong>Jaccard Similarity</strong></p>
<p>Jaccard similarity measures the similarity between finite sample sets. It is defined as the cardinality of the intersection of the sets divided by the cardinality of their union.</p>
<p>That is, for two sets A and B, it is the ratio of the cardinality of A ∩ B to the cardinality of A ∪ B.</p>
<p>Cardinality: Number of elements in a set</p>
<p>Say A and B are sets, with cardinality denoted by |A| and |B|.</p>
<p>Jaccard Similarity J(A,B) = <script type="math/tex">\frac {\lvert A \cap B \rvert}{\lvert A \cup B \rvert}</script></p>
<p>Implementation in Python</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">jaccard_similarity</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">):</span>
    <span class="c1"># Converting to sets drops duplicate elements</span>
    <span class="n">intersection_cardinality</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="o">.</span><span class="n">intersection</span><span class="p">(</span><span class="n">y</span><span class="p">))</span>
    <span class="n">union_cardinality</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="o">.</span><span class="n">union</span><span class="p">(</span><span class="n">y</span><span class="p">))</span>
    <span class="k">return</span> <span class="n">intersection_cardinality</span><span class="o">/</span><span class="nb">float</span><span class="p">(</span><span class="n">union_cardinality</span><span class="p">)</span>

<span class="k">print</span><span class="p">(</span><span class="n">jaccard_similarity</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">6</span><span class="p">],[</span><span class="mi">0</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">7</span><span class="p">,</span><span class="mi">9</span><span class="p">]))</span>
</code></pre></div></div>
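<p>To make the ratio concrete, the sketch below spells out the intersection and union for the same two lists, then applies the idea to the words of two short sentences. The variable names and the sentences are made up for illustration.</p>

```python
# Same inputs as the example above, with the intermediate sets made visible
x = [0, 1, 2, 5, 6]
y = [0, 2, 3, 5, 7, 9]

common = sorted(set(x).intersection(y))  # elements present in both lists
combined = sorted(set(x).union(y))       # elements present in either list
score = len(common) / float(len(combined))
print(common, combined, score)

# Illustrative use on text: compare the sets of words in two sentences.
# Repeated words count once because sets ignore duplicates.
words_a = set("the cat sat on the mat".split())
words_b = set("the cat lay on the rug".split())
shared_words = words_a.intersection(words_b)
all_words = words_a.union(words_b)
word_score = len(shared_words) / float(len(all_words))
print(sorted(shared_words), round(word_score, 3))
```

<p>For the two lists, three shared elements out of eight distinct elements give 3/8 = 0.375.</p>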
<hr />
<p>References:<br />
[1] http://dataconomy.com/2015/04/implementing-the-five-most-popular-similarity-measures-in-python/<br />
[2] https://en.wikipedia.org/wiki/Similarity_measure<br />
[3] http://bigdata-madesimple.com/implementing-the-five-most-popular-similarity-measures-in-python/<br />
[4] http://techinpink.com/2017/08/04/implementing-similarity-measures-cosine-similarity-versus-jaccard-similarity/</p>