What Andrew Ng's Machine Learning Course Taught Me, and the Classifier I Built

I finished Supervised Machine Learning: Regression and Classification, the first course in Andrew Ng’s machine learning specialization. I wanted the real fundamentals before going deeper into applied AI, and this is exactly that: the math, the intuition, and the algorithms that everything else builds on.

These are my notes on what stuck, plus the small classifier I built afterward to prove I actually understood it.

Linear regression and the cost function

The course starts where machine learning starts: fitting a line to data. A linear regression model with one feature is just

$$f_{w,b}(x) = wx + b$$

where $w$ is the weight and $b$ is the bias. Training means finding the values of $w$ and $b$ that make the predictions closest to the real labels. “Closest” needs a definition, and that is the cost function. For regression we use mean squared error:

$$J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}\!\left(x^{(i)}\right) - y^{(i)} \right)^2$$

Here $m$ is the number of training examples, and $x^{(i)}$ and $y^{(i)}$ are the $i$-th feature and label. The factor of $\tfrac{1}{2}$ is just there to make the derivative clean later. The whole game is to make $J(w,b)$ as small as possible.
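A minimal sketch of that cost function in plain Python. The function names and the toy data are made up for illustration and are not taken from the course labs.

```python
def predict(x, w, b):
    """Single-feature linear model f_{w,b}(x) = w*x + b."""
    return w * x + b


def compute_cost(xs, ys, w, b):
    """Squared-error cost J(w, b) with the 1/(2m) convention."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        error = predict(x, w, b) - y
        total += error ** 2
    return total / (2 * m)


# Toy data (hypothetical) lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

cost_at_true_line = compute_cost(xs, ys, w=2.0, b=1.0)  # every error is zero
cost_at_origin = compute_cost(xs, ys, w=0.0, b=0.0)     # (9 + 25 + 49) / 6
```

The cost is zero only when the line passes through every point, and it grows with the square of each miss, which is why a single far-off prediction dominates the total.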

Gradient descent

Gradient descent is the algorithm that actually does the minimizing, and it was my favorite part of the course because it finally made optimization feel concrete. You start with some values for $w$ and $b$, then repeatedly step downhill along the slope of the cost:

$$\begin{aligned} w &:= w - \alpha \, \frac{\partial}{\partial w} J(w,b) \\[4pt] b &:= b - \alpha \, \frac{\partial}{\partial b} J(w,b) \end{aligned}$$

The learning rate $\alpha$ controls how big each step is. Too small and training crawls; too large and it can overshoot the minimum and diverge. Watching the cost go down on every iteration, and seeing what happens when $\alpha$ is wrong, taught me more than any diagram could.
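The update rule can be sketched in a few lines of plain Python. This is an illustration under assumptions rather than the course's lab code: the toy data, the iteration counts, and the two values of $\alpha$ are all hypothetical. The gradients are the partial derivatives of the squared-error cost, $\frac{1}{m}\sum (f_{w,b}(x^{(i)}) - y^{(i)})\,x^{(i)}$ for $w$ and $\frac{1}{m}\sum (f_{w,b}(x^{(i)}) - y^{(i)})$ for $b$.

```python
def compute_gradients(xs, ys, w, b):
    """Partial derivatives of J(w, b) for single-feature linear regression."""
    m = len(xs)
    dj_dw = 0.0
    dj_db = 0.0
    for x, y in zip(xs, ys):
        error = (w * x + b) - y
        dj_dw += error * x
        dj_db += error
    return dj_dw / m, dj_db / m


def gradient_descent(xs, ys, alpha, iterations, w=0.0, b=0.0):
    """Run a fixed number of batch gradient descent steps."""
    for _ in range(iterations):
        dj_dw, dj_db = compute_gradients(xs, ys, w, b)
        # Both gradients come from the old (w, b), so the update is simultaneous.
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b


# Toy data (hypothetical) lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

# A moderate learning rate settles on w close to 2 and b close to 1.
w_good, b_good = gradient_descent(xs, ys, alpha=0.1, iterations=5000)

# On this data a learning rate of 0.5 is too large: each step overshoots
# further than the last and the parameters run away from the minimum.
w_bad, b_bad = gradient_descent(xs, ys, alpha=0.5, iterations=20)
```

Printing `w_bad` after 20 steps shows a number nowhere near 2, which is the overshoot-and-diverge behavior described above.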

From regression to classification

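The second half of the course moves from predicting numbers to predicting classes. Linear regression is the wrong tool for that, because its output is unbounded, so logistic regression wraps the linear model in the sigmoid function, which squashes any real number into the range (0, 1) so the output can be read as a probability: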

$$g(z) = \frac{1}{1 + e^{-z}}, \qquad f_{\mathbf{w},b}(\mathbf{x}) = g\!\left( \mathbf{w} \cdot \mathbf{x} + b \right)$$
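A small sketch of the sigmoid and the resulting prediction, in plain Python. The 0.5 decision threshold and the example weights are assumptions for illustration; nothing above fixes a particular threshold.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # safe: z is negative here, so exp(z) stays below 1
    return ez / (1.0 + ez)


def predict_proba(x, w, b):
    """Model output g(w . x + b), read as the probability that the label is 1."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return sigmoid(z)


def predict_label(x, w, b, threshold=0.5):
    """Turn the probability into a 0/1 label (threshold is an assumption)."""
    return 1 if predict_proba(x, w, b) >= threshold else 0


# Hypothetical weights: z = x1 + x2 - 1
w = [1.0, 1.0]
b = -1.0

p_high = predict_proba([2.0, 1.0], w, b)  # z = 2, so the output is above 0.5
p_low = predict_proba([0.0, 0.0], w, b)   # z = -1, so the output is below 0.5
```

With a 0.5 threshold, the decision boundary is exactly the set of points where $\mathbf{w} \cdot \mathbf{x} + b = 0$, so the classifier is still linear in the features even though its output is not.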

Building a technical debt classifier

Theory only sticks once you ship something with it, so I built a small project: a technical debt classifier. The idea is to train a model on real code from GitHub and have it flag the kind of debt that quietly piles up in a codebase.

It takes a snippet of code as input. For example:

// TODO fix later
global $wpdb;
$query = "SELECT * FROM $table";

And classifies it:

Technical Debt (Security Risk)

That tiny snippet trips several alarms at once: a lingering TODO, direct use of the database handle, and a raw SQL string with a variable interpolated straight into it, with no $wpdb->prepare() in sight.

The model sorts each snippet into one of five classes:

Clean
Refactor Needed
Legacy
Security Risk
Dead Code

Five classes meant going beyond the binary case from the course. The trick is the one-vs-rest idea Andrew Ng covers near the end: train one logistic regression classifier per class, each answering “is it this class or not,” then pick the class with the highest probability. Same sigmoid, same logistic loss, run five times.
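The one-vs-rest scheme can be sketched as below. This is an illustration, not the project's actual code: it uses three of the five classes to stay short, two made-up features (a raw SQL count and a commented-out line count), a tiny hand-written training set, and hypothetical function names and hyperparameters.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(x, w, b):
    """Sigmoid of the linear model w . x + b."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_binary(X, y, alpha=0.3, iterations=3000):
    """Batch gradient descent on the logistic loss for one binary problem."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            error = score(xi, w, b) - yi
            for j in range(n):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - alpha * gj / m for wj, gj in zip(w, grad_w)]
        b = b - alpha * grad_b / m
    return w, b


def train_one_vs_rest(X, labels, classes):
    """Train one 'this class or not' classifier per class."""
    models = {}
    for c in classes:
        y = [1 if label == c else 0 for label in labels]
        models[c] = train_binary(X, y)
    return models


def predict_class(models, x):
    """Pick the class whose classifier gives the highest score."""
    scores = {c: score(x, w, b) for c, (w, b) in models.items()}
    return max(scores, key=scores.get)


# Hypothetical features per snippet: [raw_sql_count, commented_out_lines]
X = [[0, 0], [1, 0], [0, 1],
     [3, 0], [4, 1], [3, 1],
     [0, 3], [1, 4], [1, 3]]
labels = ["Clean"] * 3 + ["Security Risk"] * 3 + ["Dead Code"] * 3
classes = ["Clean", "Security Risk", "Dead Code"]

models = train_one_vs_rest(X, labels, classes)
sql_heavy = predict_class(models, [4, 0])
comment_heavy = predict_class(models, [0, 4])
tidy = predict_class(models, [0, 0])
```

One caveat of this design: the per-class scores come from independent classifiers, so they do not sum to 1 and should be treated as relative rankings rather than a calibrated distribution over the classes.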

I kept the features deliberately simple and interpretable rather than reaching for a large model: counts of TODO and FIXME markers, raw SQL concatenation, commented-out blocks, calls to deprecated functions, and a few size and complexity signals. Everything runs locally, no API calls, which keeps it fast and private.
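A rough sketch of what such a feature extractor could look like, using only the standard library. The regular expressions, the feature names, and the short list of deprecated PHP functions are illustrative assumptions, not the project's real rules, and they are heuristics that will miss cases and raise false alarms.

```python
import re

# Illustrative, deliberately short list (assumption, not the project's list).
DEPRECATED_CALLS = ("mysql_query", "create_function")

TODO_MARKER = re.compile(r"\b(?:TODO|FIXME)\b")
# A quoted SQL statement with a PHP variable interpolated inside the quotes.
INTERPOLATED_SQL = re.compile(
    r"""["'](?:SELECT|INSERT|UPDATE|DELETE)\b[^"']*\$\w+""", re.IGNORECASE
)
# A quoted SQL statement followed by `. $variable` concatenation.
CONCATENATED_SQL = re.compile(
    r"""["'](?:SELECT|INSERT|UPDATE|DELETE)\b[^"']*["']\s*\.\s*\$\w+""",
    re.IGNORECASE,
)
# A comment line that ends like code (semicolon or brace).
COMMENTED_CODE = re.compile(r"^\s*(?://|#).*[;{}]\s*$")
DEPRECATED_CALL = re.compile(r"\b(?:" + "|".join(DEPRECATED_CALLS) + r")\s*\(")


def extract_features(snippet):
    """Turn a code snippet into a small dict of interpretable counts."""
    lines = snippet.splitlines()
    return {
        "todo_markers": len(TODO_MARKER.findall(snippet)),
        "raw_sql": len(INTERPOLATED_SQL.findall(snippet))
        + len(CONCATENATED_SQL.findall(snippet)),
        "commented_out_lines": sum(1 for ln in lines if COMMENTED_CODE.match(ln)),
        "deprecated_calls": len(DEPRECATED_CALL.findall(snippet)),
        "line_count": len(lines),
    }


snippet = '// TODO fix later\nglobal $wpdb;\n$query = "SELECT * FROM $table";\n'
features = extract_features(snippet)
```

Run on the example snippet from earlier, this yields one TODO marker and one raw SQL hit across three lines. Counts like these are easy to inspect by hand, which is the point of keeping the features interpretable.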

Putting it to work for WordPress 7.0

The real test was running it against my own Product Roles Manager for WooCommerce plugin while preparing it for the WordPress 7.0 release. I pointed the classifier at the whole codebase and used the output as a prioritized checklist: clean up the Security Risk hits first, schedule the Refactor Needed sections, and delete what it flagged as Dead Code.

It was not magic, and I reviewed every call by hand, but it turned a vague “audit the plugin before the new WordPress version” into a concrete, ranked list of places to look. That is exactly the kind of leverage I was hoping to get out of learning this properly.

What I took away

Two things. First, the fundamentals are worth the time: gradient descent, cost functions, and logistic regression are the bedrock under almost everything in applied AI, and understanding them changes how I read every model that follows. Second, building something real, even a small classifier on my own code, is what turned the equations into intuition. Next up is the rest of the specialization, and more projects to keep the theory honest.