I found Logistic Regression really interesting, and it was quick for me to understand -- I think that's largely because Linear Regression had already built a mental map for me. It's kind of like how, if you understand a monkey, you won't have much trouble understanding a chimp (I'm really not sure about that analogy, so don't take my word for it, but you get my point). Logistic Regression is basically just applying the sigmoid function to the linear regression model. That turns it into a classifier, for instance, choosing between cat and dog.
$$f_{\mathbf{w},b}(\mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b)$$
where:
- $\mathbf{x}$: The input feature vector.
- $\mathbf{w}$: The weight vector (the parameters of the model).
- $b$: The bias term.
- $\sigma(\cdot)$: The sigmoid function, defined as:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
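To make this concrete, here's a minimal sketch of the model in NumPy (the function name `predict_proba` and the toy numbers are just my own choices for illustration, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    # f_{w,b}(x) = sigmoid(w.T x + b), applied row-wise to X (m samples x n features)
    return sigmoid(X @ w + b)

# Toy example: 3 samples, 2 features
X = np.array([[1.0, 2.0], [2.0, 0.5], [0.1, 0.3]])
w = np.array([0.5, -0.25])
b = 0.1
print(predict_proba(X, w, b))  # probabilities between 0 and 1
```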
I'm a bit skeptical about writing out all the formulas for my blog; I want to keep things simple for now. Anyway, for the cost function, we can't reuse the same squared-error cost we used for Linear Regression. The reason: for Linear Regression, the squared-error cost has a global minimum -- yup, just one -- but if we plug the sigmoid into that same cost, it becomes non-convex with many local minima, making it difficult to reach convergence. Isn't there a solution for everything? Yes, obviously there is. We use the log: taking the log of $f_{\mathbf{w},b}(\mathbf{x})$ in the loss gives us a convex cost with a single converging point.
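If you want to see this for yourself, here's a small sketch that evaluates both costs on a made-up 1-D dataset for a few values of $w$; you can sweep a finer grid and plot the two curves to compare their shapes (in general the squared-error cost with a sigmoid inside is non-convex, while the log-loss cost is convex; the data and numbers here are purely illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([-3.0, -1.0, 0.5, 2.0, 4.0])   # toy inputs
y = np.array([0.0, 0.0, 1.0, 1.0, 1.0])     # toy labels

for w in (-10.0, -2.0, 0.0, 2.0, 10.0):      # a few sample weights (bias fixed at 0)
    f = np.clip(sigmoid(w * x), 1e-12, 1 - 1e-12)                  # clip to avoid log(0)
    mse_cost = np.mean((f - y) ** 2)                               # squared-error cost
    log_cost = -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))   # log-loss cost
    print(f"w={w:+.1f}  squared-error={mse_cost:.3f}  log-loss={log_cost:.3f}")
```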
The loss function (for a single example) and the cost function (the average over all $m$ training examples):
$$\begin{align*} \text{Loss}(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) &= - \left[ y^{(i)} \log(f_{\mathbf{w},b}(\mathbf{x}^{(i)})) + (1 - y^{(i)}) \log(1 - f_{\mathbf{w},b}(\mathbf{x}^{(i)})) \right] \\ J(\mathbf{w}, b) &= -\frac{1}{m} \sum_{i=1}^m \left[ y^{(i)} \log(f_{\mathbf{w},b}(\mathbf{x}^{(i)})) + (1 - y^{(i)}) \log(1 - f_{\mathbf{w},b}(\mathbf{x}^{(i)})) \right] \end{align*}$$
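Here's a minimal sketch of that cost in NumPy (`compute_cost` is just my own name for it, assumed for illustration):

```python
import numpy as np

def compute_cost(X, y, w, b):
    # J(w, b) = -(1/m) * sum[ y*log(f) + (1-y)*log(1-f) ]
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # f_{w,b}(x) for every row of X
    f = np.clip(f, 1e-12, 1 - 1e-12)         # guard against log(0)
    return -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))
```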
Lastly, gradient descent is used to find the parameters $\mathbf{w}$ and $b$, just like it was for Linear Regression.
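And a hedged sketch of batch gradient descent for those parameters (the learning rate, iteration count, and toy data are arbitrary choices for illustration):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(num_iters):
        f = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predictions f_{w,b}(x)
        error = f - y                            # (f - y) appears in both gradients
        dw = (X.T @ error) / m                   # dJ/dw
        db = np.mean(error)                      # dJ/db
        w -= alpha * dw
        b -= alpha * db
    return w, b

# Toy usage with the tiny dataset from the earlier sketch
X = np.array([[1.0, 2.0], [2.0, 0.5], [0.1, 0.3]])
y = np.array([1.0, 1.0, 0.0])
w, b = gradient_descent(X, y)
print(w, b)
```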