Photo by Colton Sturgeon on Unsplash
The Balancing Act: Choosing the Right Lambda in Regularization (Linear Regression)
Since we can't be sure which of the parameters to penalize, we penalize them all by adding a regularization term to the cost function. The regularized cost function is defined by:
$$J(\mathbf{w}, b) = \frac{1}{2m} \sum_{i=1}^m \left( f_{\mathbf{w}, b}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^n w_j^2 $$
$$\frac{\lambda}{2m} \sum_{j=1}^n w_j^2$$
This is the regularization term, where λ is the regularization parameter and $w_j$ are the weights. Note that only the weights are penalized; the bias b is left out of the sum.
Note: if λ = 0, the regularization term vanishes and the model is free to overfit. If λ is extremely large (say 10^10), the penalty forces every weight toward values so tiny they are almost 0, reducing the model to f(x) ≈ b, which underfits.
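To make this concrete, here is a minimal NumPy sketch of the regularized cost above. The function name `regularized_cost` and the default `lambda_=1.0` are illustrative choices, not something fixed by the formulas.

```python
import numpy as np

def regularized_cost(X, y, w, b, lambda_=1.0):
    """Squared-error cost with an L2 penalty on the weights.

    X: (m, n) feature matrix, y: (m,) targets,
    w: (n,) weights, b: scalar bias, lambda_: regularization parameter.
    """
    m = X.shape[0]
    predictions = X @ w + b                              # f_{w,b}(x) for every example
    squared_error = np.sum((predictions - y) ** 2) / (2 * m)
    l2_penalty = (lambda_ / (2 * m)) * np.sum(w ** 2)    # b is not penalized
    return squared_error + l2_penalty
```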
Gradient descent is affected as well, since the regularization term contributes to the derivatives. Gradient for w:
$$\frac{\partial J(\mathbf{w}, b)}{\partial w_j} = \frac{1}{m} \sum_{i=1}^m \left( f_{\mathbf{w}, b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} w_j$$
Gradient for b:
$$\frac{\partial J(\mathbf{w}, b)}{\partial b} = \frac{1}{m} \sum_{i=1}^m \left( f_{\mathbf{w}, b}(\mathbf{x}^{(i)}) - y^{(i)} \right)$$
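The same two derivatives in a short NumPy sketch; `compute_gradients` and the default `lambda_` are again illustrative names and values:

```python
import numpy as np

def compute_gradients(X, y, w, b, lambda_=1.0):
    """Gradients of the regularized cost with respect to w and b."""
    m = X.shape[0]
    error = X @ w + b - y                              # residuals f_{w,b}(x) - y, shape (m,)
    dj_dw = (X.T @ error) / m + (lambda_ / m) * w      # regularization added for w only
    dj_db = np.sum(error) / m                          # b gets no regularization term
    return dj_dw, dj_db
```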
Repeat until convergence:
$$w_j = w_j - \alpha \left( \frac{1}{m} \sum_{i=1}^m \left( f_{\mathbf{w}, b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} w_j \right)$$
$$ b = b - \alpha \left( \frac{1}{m} \sum_{i=1}^m \left( f_{\mathbf{w}, b}(\mathbf{x}^{(i)}) - y^{(i)} \right) \right)$$
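Putting the updates together, here is a minimal gradient-descent sketch. It assumes a fixed number of iterations in place of an explicit convergence test, and the values of `alpha`, `lambda_`, and `num_iters` are placeholders you would tune for your data.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, lambda_=1.0, num_iters=1000):
    """Run the regularized update rules for w and b for num_iters steps."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(num_iters):
        error = X @ w + b - y
        dj_dw = (X.T @ error) / m + (lambda_ / m) * w
        dj_db = np.sum(error) / m
        w -= alpha * dj_dw        # w_j := w_j - alpha * dJ/dw_j
        b -= alpha * dj_db        # b   := b   - alpha * dJ/db
    return w, b
```

Trying the same data with several values of λ (e.g., 0, 1, 10^10) and comparing the resulting fits is a simple way to see the overfitting/underfitting trade-off described above.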
Summary: Picking the right λ matters! Too small a λ leaves the model prone to overfitting; too large a λ shrinks the weights toward zero and the model underfits.