Normalizing inputs in machine learning means rescaling the input features to a common scale, using either statistical normalization or standardization, in order to speed up the learning process.

#### Normalization and Standardization

Statistical normalization means scaling the input data so that it ranges from 0 to 1.
In machine learning, the input columns are called features. In this technique, the minimum value of the feature is subtracted from every feature value X, and the result is divided by the difference between the maximum and minimum values of that feature. The equation for normalization is,

X^{\prime}=\frac{X-X_{\min }}{X_{\max }-X_{\min }}

It is also called Min-Max Scaling.
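The Min-Max formula can be sketched in a few lines of plain Python. The function name `min_max_scale` and the sample values are illustrative assumptions, as is the choice to map a constant feature to 0.0 (the formula itself is undefined in that case because the denominator is zero).

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1] using (X - X_min) / (X_max - X_min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption: a constant feature carries no information, so map it to 0.0
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical feature column with a wide range.
feature = [1000.0, 8250.0, 15500.0, 30000.0]
scaled = min_max_scale(feature)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

The smallest value always maps to 0 and the largest to 1, so a single extreme outlier compresses all the other values into a narrow band.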

Another input normalization technique is called Standardization. Standardization centers each feature on its mean and divides by its standard deviation, so the rescaled feature has a mean of 0 and a standard deviation of 1. The equation for Standardization is,

X^{\prime}=\frac{X-\mu}{\sigma}

Where,

μ = The mean of the feature
σ = The standard deviation of the feature
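As a minimal sketch, standardization can be written with Python's standard `statistics` module. The function name `standardize` and the sample data are illustrative, and the use of the population standard deviation (`pstdev`) is an assumption, since the text does not say whether the population or sample form is meant.

```python
import statistics


def standardize(values):
    """Rescale a list of numbers using (X - mean) / standard deviation."""
    mu = statistics.fmean(values)
    # Assumption: population standard deviation (divide by m, not m - 1).
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # Constant feature: avoid dividing by zero and return all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column with mean 5 and population standard deviation 2.
feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standardize(feature)
print(z_scores)
```

Unlike Min-Max Scaling, the result is not confined to a fixed interval: values simply express how many standard deviations each point lies from the mean.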

For a deeper dive into the statistics behind this topic, see Normalization (statistics).

#### Why is the normalization technique used?

If you are familiar with gradient descent, an optimization technique, then you have already seen the cost function,

J(w, b)=\frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right)

And,

\hat{y}^{(i)} = \sigma(w^T.x^{(i)}+b)

Where,

w = The weight
b = The bias
m = The number of training examples
L = Loss function
ŷ(i) = Predicted value of y for i-th training example
y(i) = Actual value of y for i-th training example
x(i) = i-th training example
σ = Sigmoid activation function
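The two equations above can be sketched in plain Python as follows. The text leaves the loss function L unspecified, so this sketch assumes binary cross-entropy, a common choice with a sigmoid output. The function names and the tiny dataset are illustrative only.

```python
import math


def sigmoid(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """y_hat = sigmoid(w^T x + b) for a single training example x."""
    return sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)


def cost(w, b, X, y):
    """J(w, b): the average loss over all m training examples.

    Assumption: L is binary cross-entropy. No clipping is applied, so a
    prediction of exactly 0 or 1 would make math.log fail.
    """
    m = len(X)
    total = 0.0
    for x_i, y_i in zip(X, y):
        y_hat = predict(w, b, x_i)
        total += -(y_i * math.log(y_hat) + (1 - y_i) * math.log(1 - y_hat))
    return total / m


# Hypothetical dataset: two examples with two features each.
X = [[0.0, 0.25], [1.0, 0.5]]
y = [0, 1]
initial_cost = cost([0.0, 0.0], 0.0, X, y)
print(initial_cost)  # about 0.693, i.e. ln(2), since every prediction is 0.5
```

Because x(i) is multiplied by w inside the sigmoid, rescaling a feature directly changes how strongly its weight affects the cost.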

This shows that the cost function depends on the scale of the input features x(i). So if the input features have very different ranges, gradient descent has trouble optimizing the cost and takes much longer to converge.

For example, if one feature has values ranging from 1 to 4 and another has values ranging from 1000 to 30000, the weights w for the two features end up on very different scales: a small change in the weight of the large feature changes the loss far more than the same change in the weight of the small feature. As a result, unnormalized inputs produce an elongated cost function, on which gradient descent takes a long time to reach the global minimum.

On the other hand, normalization scales the values to between 0 and 1, and standardization keeps them close to 0. Normalized inputs result in a more regular, evenly shaped cost function, on which finding the global minimum is fast and easy.
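The effect can be illustrated with a small experiment. This is only a sketch under stated assumptions: to keep it short, it swaps in a linear model with a mean squared error cost instead of the sigmoid model above, the dataset is made up to match the ranges in the example (1 to 4 and 1000 to 30000), and the step size is set to 1 divided by the trace of the cost's Hessian, a conservative value that cannot diverge for this quadratic cost.

```python
def min_max_scale(column):
    """Rescale one feature column to [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def cost(w, b, X, y):
    """Mean squared error cost, J = 1/(2m) * sum((w.x + b - y)^2)."""
    m = len(X)
    return sum(
        (sum(w_j * x_j for w_j, x_j in zip(w, x)) + b - t) ** 2
        for x, t in zip(X, y)
    ) / (2 * m)


def gradient_descent(X, y, steps):
    """Run batch gradient descent from w = 0, b = 0 and return the final cost."""
    m, n = len(X), len(X[0])
    # Trace of the Hessian: 1 (bias) plus the mean square of every feature.
    trace = 1.0 + sum(x[j] ** 2 for x in X for j in range(n)) / m
    lr = 1.0 / trace  # conservative step size that cannot diverge here
    w, b = [0.0] * n, 0.0
    for _ in range(steps):
        errors = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b - t
                  for x, t in zip(X, y)]
        grad_w = [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n)]
        grad_b = sum(errors) / m
        w = [w_j - lr * g_j for w_j, g_j in zip(w, grad_w)]
        b -= lr * grad_b
    return cost(w, b, X, y)


# Hypothetical data: one small-range feature and one large-range feature.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [30000.0, 1000.0, 20000.0, 10000.0]
y = [2 * a + 0.0001 * c + 1 for a, c in zip(x1, x2)]

X_raw = list(zip(x1, x2))
X_scaled = list(zip(min_max_scale(x1), min_max_scale(x2)))

cost_raw = gradient_descent(X_raw, y, steps=2000)
cost_scaled = gradient_descent(X_scaled, y, steps=2000)
print("final cost, raw features:   ", cost_raw)
print("final cost, scaled features:", cost_scaled)
```

With the same number of steps, the run on raw features should stall at a much higher cost than the run on scaled features: the large feature forces a tiny step size, so the weight of the small feature and the bias barely move.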

In conclusion, normalization results in faster training for your AI model by bringing all the features onto a common scale.