https://datascience.stackexchange.com/questions/23159/in-softmax-classifier-why-use-exp-function-to-do-normalization

The second answer, by MachineLearner, is great. Softmax normalization with exp has a similar form to the Boltzmann distribution.
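
To make the Boltzmann comparison concrete, here is a minimal sketch (not taken from the linked answer). Softmax gives p_i = exp(z_i) / sum_j exp(z_j), while the Boltzmann distribution gives p_i = exp(-E_i / kT) / Z. The function names, the sample logits, and the choice to treat each logit as a negative energy with kT = 1 are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Softmax, subtracting the max first for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann(energies, kT=1.0):
    """Boltzmann probabilities, p_i proportional to exp(-E_i / kT)."""
    weights = [math.exp(-e / kT) for e in energies]
    z = sum(weights)  # partition function Z
    return [w / z for w in weights]


# Illustrative values only (assumption): treat each logit as a negative energy.
logits = [2.0, 1.0, 0.1]
energies = [-z for z in logits]

p_softmax = softmax(logits)
p_boltzmann = boltzmann(energies, kT=1.0)

# The two lists should agree up to floating-point rounding.
print(p_softmax)
print(p_boltzmann)
```

Under this mapping, kT plays the role that the temperature parameter plays in a temperature-scaled softmax: dividing the logits by a larger value flattens the distribution.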