a) Discuss quadratic cost / mean squared error. (2 Marks)

Quadratic cost, or Mean Squared Error (MSE), is one of the simplest cost functions, used primarily for regression tasks. It measures a model's performance by averaging the squared difference between the true label $y_i$ and the predicted output $\hat{y}_i$:

$$C = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$

Squaring the difference serves two purposes (a small numeric sketch follows the list below):

  1. It ensures the error term is always non-negative, so positive and negative errors do not cancel out.

  2. It penalizes large errors far more heavily than small ones.
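As a check on the formula above, here is a minimal NumPy sketch; the labels and predictions are made-up values for illustration only:

```python
import numpy as np

def mse(y_true, y_pred):
    """Quadratic cost: mean of squared differences between labels and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical labels and predictions.
y = [3.0, -0.5, 2.0, 7.0]
y_hat = [2.5, 0.0, 2.0, 8.0]
print(mse(y, y_hat))  # (0.25 + 0.25 + 0 + 1) / 4 = 0.375
```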

b) Point out the benefits or importance of using the cross-entropy cost function. (3 Marks)

Benefits of the cross-entropy cost function include (a short sketch follows the list below):

  1. It helps to minimize the impact of saturated neurons on learning speed, allowing the network to learn faster.

  2. It is relatively easy to compute and optimize.

  3. It is robust to outliers in the data.

  4. It is well-suited and highly effective for multi-class classification tasks.
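On the first point: with a sigmoid output, the cross-entropy gradient with respect to the weights is proportional to the error itself, without the $\sigma'(z)$ factor that slows learning for saturated neurons under the quadratic cost. The following minimal NumPy sketch of binary cross-entropy illustrates how confident mistakes are penalized; the probabilities are made up, and the clipping constant is a common numerical-stability choice rather than part of the definition:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred):
    """Cross-entropy cost: -mean(y*log(p) + (1-y)*log(1-p))."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip to avoid log(0) when a prediction saturates at exactly 0 or 1.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), 1e-12, 1 - 1e-12)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# A confidently wrong prediction (p = 0.99 for a true label of 0) is
# penalized far more than a mildly wrong one (p = 0.6).
print(binary_cross_entropy([0], [0.99]))  # ~4.61
print(binary_cross_entropy([0], [0.6]))   # ~0.92
```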

c) Using the one-dimensional quadratic function $f(x) = 2x^2 + 4x + 2$, find the second step of gradient descent. $[\eta = 0.1,\ x_0 = 5]$, $[P_{n+1} = P_n - \eta f'(P_n)]$ (5 Marks)

Solution: Given:

  - Function: $f(x) = 2x^2 + 4x + 2$
  - Learning rate: $\eta = 0.1$
  - Starting point: $x_0 = 5$

First, we find the derivative (gradient) of the function:

$$f'(x) = \frac{d}{dx}\left(2x^2 + 4x + 2\right) = 4x + 4$$
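As a quick symbolic check of this derivative, here is a short SymPy sketch (not part of the original solution):

```python
import sympy as sp

x = sp.symbols("x")
f = 2 * x**2 + 4 * x + 2
print(sp.diff(f, x))  # 4*x + 4
```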

The gradient descent update rule is:

$$x_{n+1} = x_n - \eta f'(x_n)$$

Step 1 - Finding $x_1$: The gradient at $x_0 = 5$ is $f'(x_0) = 4(5) + 4 = 20 + 4 = 24$. So,

$$x_1 = x_0 - \eta f'(x_0) = 5 - (0.1 \times 24) = 5 - 2.4 = 2.6$$

Step 2 - Finding $x_2$: The gradient at $x_1 = 2.6$ is $f'(x_1) = 4(2.6) + 4 = 10.4 + 4 = 14.4$. So,

$$x_2 = x_1 - \eta f'(x_1) = 2.6 - (0.1 \times 14.4) = 2.6 - 1.44 = 1.16$$

Answer: After the second gradient descent step, $x_2 = 1.16$.
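As a numeric cross-check of the two steps above, here is a minimal Python sketch of the same update rule:

```python
def grad(x):
    """Derivative of f(x) = 2x^2 + 4x + 2."""
    return 4 * x + 4

eta = 0.1  # learning rate
x = 5.0    # starting point x_0

for step in range(1, 3):
    x = x - eta * grad(x)  # x_{n+1} = x_n - eta * f'(x_n)
    print(f"x_{step} = {x:.4f}")

# Prints (up to floating-point rounding):
# x_1 = 2.6000
# x_2 = 1.1600
```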
