I’m taking the Machine Learning with Python / TensorFlow course, and I’ve reached the end of the part where we train a model for linear regression. Although the video does a very good job of explaining how to prepare the data and set things up to define a LinearClassifier estimator, I am left with a couple of questions which are, in a sense, related:

Why do we need (or why/when should we prefer) supervised learning when it comes to finding a linear model? Aren’t there other algorithms (e.g. solving systems of linear equations to find the weights that minimize the error) that don’t involve training a model? I mean, Excel does not train a model on the spot to find a line of best fit. Is an ML approach faster or more efficient? What makes the difference? Is it a matter of the volume of data, or of the number of dimensions?

While I understand that knowing all the technical implementation details is often not necessary, the lesson does present the estimator a bit like a black box. Understanding an algorithm is generally very useful, because when we read the output we might be able to detect whether we messed something up. So… what happens when we tell the machine to train the model? We were told that TF works by building “graphs” of defined computations, to be executed later in a “session”: what does the graph of computations look like for the linear regression estimator?
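As a rough picture only (as far as I know, TensorFlow’s actual graph also holds input-pipeline, checkpointing, and optimizer bookkeeping ops), the core of training a one-feature linear regression is a few operations evaluated forward, then differentiated backward, then used to nudge the parameters. The plain-Python sketch below writes one such step by hand; the name train_step, the learning rate, and the use of plain gradient descent on mean squared error are my own assumptions for illustration, not what the estimator necessarily does.

```python
def train_step(w, b, xs, ys, lr=0.1):
    """One gradient-descent step for y_hat = w*x + b with mean squared error loss."""
    n = len(xs)
    # Forward pass: roughly the nodes a graph would hold (multiply, add, subtract, square, mean).
    preds = [w * x + b for x in xs]
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / n
    # Backward pass: derivatives of the loss with respect to each parameter.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2 * sum(errors) / n
    # Update: move the parameters a small step against the gradient.
    # The returned loss is the one measured before this update.
    return w - lr * grad_w, b - lr * grad_b, loss


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # made-up data generated from y = 2x + 1
w, b = 0.0, 0.0
for step in range(500):
    w, b, loss = train_step(w, b, xs, ys)
print(round(w, 3), round(b, 3))  # prints 2.0 1.0
```

In TF the same operations would be nodes in the graph, the gradients would be derived automatically instead of written out, and each “training step” in a session would run this forward/backward/update cycle once on a batch.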

There’s the regular regression analysis that can fit data to lines or other polynomial curves, which is what mathematical packages like Excel or various Python libraries use. But if you squint at it, it’s just another mathematical model, like a linear classifier, that makes predictions from data. Regression uses linear algebra to solve for the parameters directly, while a linear classifier uses a trainable neural network whose parameters are adjusted iteratively. In general, regression should be significantly faster. I suppose a linear classifier could be used when the regression can’t be done due to constraints on the linear algebra, but it would take more digging to find a concrete example.
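To make the contrast concrete: for a single feature, ordinary least squares has a closed-form answer (slope = covariance / variance), which is essentially what a spreadsheet’s linear trendline computes, while an iteratively trained fit only approaches the same line after many passes over the data. The plain-Python sketch below computes both on made-up noisy data; the function names are hypothetical, and plain gradient descent stands in for whatever optimizer a given library actually uses.

```python
def fit_closed_form(xs, ys):
    """Ordinary least squares for y = w*x + b, solved directly (no iterations)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    return w, mean_y - w * mean_x


def fit_gradient_descent(xs, ys, lr=0.05, steps=5000):
    """Same model, fitted iteratively by gradient descent on mean squared error."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        w -= lr * 2 * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * 2 * sum(errors) / n
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]  # made-up data, roughly y = 2x + 1 with noise

print(fit_closed_form(xs, ys))       # exact least-squares answer
print(fit_gradient_descent(xs, ys))  # approximately the same, after 5000 steps
```

Both land on the same line (slope about 1.99, intercept about 1.04), but one needs a single pass and the other thousands. With many features the closed form means solving a linear system, which can become expensive or numerically awkward for very large or sparse data; that is one situation where iterative training tends to be preferred.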

A linear classifier is a black box, like all neural networks. If we knew the algorithm of the process simulated by the neural network, we would just program it (and for most regressions, we do). The network does its processing, and the training step compares the network’s output with known values to determine whether the network is progressing toward a good solution. The network essentially learns what is good and what is not and tends toward the good. Once the network is successful enough at its task, it can then be used.

Wait, wait, wait: here we’re talking about different kinds of black box, I think.

A NN is a black box because of its complexity: we can look at all its parts (its nodes, layers, activation functions, loss, etc.) and see it in action, yet despite knowing what it looks like we cannot extract statements such as “this machine recognizes this handwritten number as a 3 because it has two bumps”. As far as I understand, this is just not how NNs work, so it isn’t a meaningful exercise to try to formulate statements of that kind. However, nothing magical is happening: if one wanted to, one could still “open the box” and follow every step of the computations that take place during training.
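To illustrate the “open the box” point: even a small network with a hidden layer is just arithmetic whose every intermediate value can be printed. The sketch below is a forward pass in plain Python; the 2-3-1 layer sizes, the ReLU activation, and all the weights are arbitrary made-up values for illustration (in a real network the weights would come from training). Every number is inspectable, even though none of them says in human terms why the output is what it is.

```python
def relu(v):
    """Rectified linear unit: negative inputs become 0."""
    return max(0.0, v)


def forward(x, W1, b1, W2, b2, trace=False):
    """Forward pass of a 2-input, 3-hidden-unit, 1-output network with ReLU."""
    hidden = []
    for j, (weights, bias) in enumerate(zip(W1, b1)):
        pre = sum(w * xi for w, xi in zip(weights, x)) + bias
        act = relu(pre)
        if trace:
            print(f"hidden[{j}]: pre-activation={pre:.2f}, after ReLU={act:.2f}")
        hidden.append(act)
    # Output unit: a plain weighted sum of the hidden activations, no activation.
    out = sum(w * h for w, h in zip(W2, hidden)) + b2
    if trace:
        print(f"output: {out:.2f}")
    return out


# Arbitrary example parameters, not learned from any data.
W1 = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
b1 = [0.1, -0.5, 0.0]
W2 = [1.0, -0.5, 2.0]
b2 = 0.3

forward([1.0, 2.0], W1, b1, W2, b2, trace=True)
```

Training only changes the numbers in W1, b1, W2, b2; the computation itself stays this transparent, which is why the difficulty is interpretation rather than access.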

The estimator in the course is (also) treated as a black box because we just type mylinearclassifier = tf.estimator.LinearClassifier(...) without asking what it might look like: we press a button, it does the thing. By this standard even a car can be a black box, in that I can drive without having the slightest idea how a transmission works, or even what an engine looks like. So, our linear classifier is a NN: what should I expect it to look like? How many layers, which activation functions are applied, …? I guess it’s not important to know the exact answer (also because of the point above!), but an answer at least in the ballpark would help, just to get a feeling for what a sensible implementation of such a NN looks like.
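In the ballpark, then: as far as I understand it, a linear classifier has no hidden layers at all. For binary classification it computes one weighted sum of the input features plus a bias (the logit), squashes it with a sigmoid into a probability, and is trained to minimize cross-entropy (log loss); with more classes, the sigmoid becomes a softmax over one weighted sum per class. The plain-Python sketch below trains such a model on made-up data. Plain gradient descent is my simplifying assumption (I believe tf.estimator.LinearClassifier defaults to the FTRL optimizer), and the feature-column handling the real estimator does is left out entirely.

```python
import math


def sigmoid(z):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_linear_classifier(X, y, lr=0.5, epochs=2000):
    """Logistic regression: p = sigmoid(w . x + b), trained on mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # For sigmoid + log loss, d(loss)/d(logit) simplifies to (p - y).
            err = p - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, x):
    """Probability of class 1 for a single example x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up, linearly separable toy data: label 1 when x0 + x1 is large.
X = [[0.0, 0.0], [0.5, 0.5], [1.0, 0.0], [2.0, 2.0], [3.0, 1.5], [2.5, 3.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_linear_classifier(X, y)
print([round(predict_proba(w, b, x), 2) for x in X])
```

So the “architecture” is a single layer: one weight per feature, one bias, one sigmoid. Swapping the sigmoid and log loss for the identity and squared error gives linear regression instead.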

A “blackbox” generally refers to putting certain inputs in, getting some outputs out, and having no idea what’s happening in between.

Even a basic linear regression can be just that, if you don’t know HOW it’s actually done. Of course, if you understand the concept and just don’t know the exact implementation in a library, then “blackbox” doesn’t really apply.

With a NN, or even more so a DNN, it’s different: unlike with a linear regression, there isn’t that much you can actually learn by looking inside. How a linear regression works is understandable, but how is a NN actually able to turn inputs into meaningful outputs? That’s another thing… Like, with image recognition and some basic models you can still get a decent understanding of how weights and biases in general work and influence the result. But the deeper the model, the less likely you are to get any understanding out of it.

No, I meant that a linear classifier is a black box because it’s a neural network. Everything you said about understanding a neural net also applies to the linear classifier. Any neural network has implementation details such as the number and types of layers, dropout, etc. Whether a neural net sorts dogs and cats or computes a linear regression is irrelevant: it’s just a mathematical model taking in numbers and outputting numbers to which we assign meaning.

Since it sounds like you’re more interested in the implementation details, the bad news is that they are determined experimentally (with some experience and intuition, which are themselves obtained experimentally…). A good cat/dog classifier is developed by trial and error, not from a set recipe. You can research specific problem domains and find out which neural nets work well for a particular problem, since many people have published solutions that work very well.
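As a small-scale illustration of “determined experimentally”: even a single setting such as the learning rate is often chosen by trying a few candidates and keeping whichever does best on held-out data. The plain-Python sketch below does that for a one-feature linear model; the candidate rates, the train/validation split, the step budget, and the helper names are all made up for illustration. Real projects tune many more settings (number and size of layers, dropout, etc.) the same way, just at a much larger scale.

```python
def fit(xs, ys, lr, steps=200):
    """Gradient descent on mean squared error for y = w*x + b."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        w -= lr * 2 * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * 2 * sum(errors) / n
    return w, b


def mse(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data, roughly y = 2x + 1, split into training and validation parts.
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8]
val_x, val_y = [1.5, 2.5], [4.1, 5.9]

results = {}
for lr in [0.001, 0.01, 0.1, 0.25]:
    w, b = fit(train_x, train_y, lr)
    results[lr] = mse(w, b, val_x, val_y)  # judge each candidate on held-out data

best_lr = min(results, key=results.get)
print(results)
print("best learning rate:", best_lr)
```

Here the largest rate diverges, the smallest barely moves within 200 steps, and the winner is only “best” for this data and this step budget. That is the point: it was found by trying, not derived.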

Back to the regression: I can teach a computer to compute a regression algebraically, since I know how to do it. A neural net regression is the computer analog of plotting the points and estimating the line of best fit with a straightedge. We always have to remember that determining how neural nets learn is close to equivalent to determining how we learn, and we’re not far beyond trial and error on either front.

Not really? Like, “trial and error” is a broad concept, and the way a NN is optimized is in no way comparable to how the brain works. Not just because the brain has more “neurons”, but because those neurons build sub-structures with specific tasks, while at the same time the brain has some plasticity which allows it to adjust to different aspects, and sometimes one area can take over tasks from another.

Somewhere in the billion-year history of life it might come down to “trial and error”, but the modern brain is a structure with a level of functionality and “learning” that far outperforms the mere “trial and error” concept of ML.