I am currently working on the Cat-Dog-Image classifier project for TensorFlow, and while plugging in some numbers and layers I feel really stupid ^^°
I have no idea how to determine an actually decent design. I got some well-working models from the forum which are quite simple and work great - which is good to see. But I’d love to have at least some understanding of why those work and mine don’t.
I also spent hours creating different models which would train perfectly for 15 minutes, only to flatline at 50% accuracy - just because I chose “sigmoid” as the activation on the first conv2d layer. Choosing “relu” instead suddenly produced some actual results.
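A quick numerical sketch (plain Python, not the project code) of why a sigmoid in the early layers can stall training like that: sigmoid saturates, and its derivative never exceeds 0.25, so backpropagated gradients shrink layer by layer, while ReLU passes gradients through unchanged for positive inputs.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return 1.0 if x > 0 else 0.0

# Sigmoid's gradient peaks at 0.25 (at x = 0) and collapses once the
# unit saturates:
print(sigmoid_grad(0.0))   # 0.25
print(sigmoid_grad(6.0))   # ~0.0025 -- a saturated unit barely learns

# Backprop multiplies these factors across layers, so even in the best
# case a 10-layer sigmoid stack scales the gradient by 0.25**10:
print(0.25 ** 10)          # ~1e-06

# ReLU's gradient is exactly 1 for any positive input, so deep stacks
# keep a usable gradient:
print(relu_grad(3.0))      # 1.0
```

That is the usual explanation for the 50% flatline: with tiny gradients the early layers never move away from their random initialization, so the model can do no better than guessing.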
Can someone explain what to look out for or a good resource?
I know that actually creating good models takes a lot of people with advanced degrees and powerful machines - so I am not expecting to become an expert. It would just be nice to have some general understanding of why some choices are better than others.
In reverse order: I think this project is loosely based on this article, which is a good start. After that, it’s the TensorFlow documentation and tutorials.
As far as design goes, it really is all hit and miss. You just get better with experience. The best way to improve is to take a model that at least partially works and make incremental improvements by experimenting until it works well enough. That means adding or changing layers, activations, dropouts, gradient functions, etc., then training, evaluating, and adjusting again. There are guidelines on some of this, but most of them are just that - guidelines.
The only hard design advice to offer here is to make sure your model outputs in binary (0 or 1 for cat or dog), or something coerced to binary, and that you are using a classifier and not a regressor or some other model type.
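The “coerced to binary” part might look like this (plain Python rather than the real model, with `predict_label` as a hypothetical helper name): the network emits a raw score, a sigmoid squashes it into a probability, and a threshold turns the probability into a hard 0/1 label.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_label(logit, threshold=0.5):
    """Coerce a raw model output (logit) into a hard 0/1 class label."""
    prob = sigmoid(logit)          # probability of class 1 ("dog")
    return 1 if prob >= threshold else 0

print(predict_label(2.3))   # 1 -> "dog"
print(predict_label(-1.7))  # 0 -> "cat"
```

In Keras this typically means ending the model with `Dense(1, activation='sigmoid')` and training with a binary cross-entropy loss (or, equivalently, `Dense(2)` with a softmax and categorical cross-entropy).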
This really is the billion dollar question. If we knew how we discern cat from dog in our hardware and software, then we could program it and the self-driving cars would be here. Right now, we’re just representing a picture of a cat or dog with a tensor processed through some tensor function that is supposed to find and enhance the essential catness or dogness of the tensor to the degree that dumb blocks of silicon can tell the difference after obscene amounts of computation. We still haven’t had that breakthrough in cognition or AI that gets us to the point of algorithmic generation of models for AI tasks.
That’s somehow both reassuring AND disheartening xD
The project certainly gave me some insights, thanks to running into a bunch of errors along the way.
I’ll try playing around with it some more. Thanks for the article, and I guess I’ll give the documentation a read.
About 20 years ago, I gave a seminar on using neural nets in chemistry, specifically for computing or predicting properties, states, configurations, etc. At the time, there was great interest in being able to calculate protein shapes, wavefunctions, quantum dynamics and the like, but it was computationally expensive. So AI was seen as a shortcut, but it was still too primitive to do anything. The best question I got came from one of our professors, who wondered how the correlations found during training carried over to actual prediction (he was thinking in terms of computation, not AI). Unfortunately, the answer really hasn’t changed in those 20 years.
A Google AI just won the recent protein folding challenge, beating the best computational models. I doubt seriously there were great gains in understanding about how to design the models, or what they mean, in terms of protein folding. But there were 20 years of Moore’s law growth in computing power and the complexity of AI protein folding seems to be less than the complexity of computed protein folding.
So basically, the AI performed better but there is probably no actual use for it?
I did the challenge again using a pretrained model and got 78%, versus 64% with my own model. Looking at its summary in TensorFlow, I really wonder what level of thought went into its construction…
There must be some level of insight, even if the fact that it’s an ML model means the problem isn’t actually solved.
It’s fascinating though. This means the invention of ML basically allows us to use massive computational power to solve problems, without getting any insight into the solution itself.
On a side note, I sure hope “learning” this will end up helping me land a job somewhere in the field ^^°
As in “learning” how to build something that’s literally a black box to the people building it (including myself)…
One of the better texts is available free online. It’s math intensive and gives you an indication of the computational power necessary. There is definitely an art to designing models, but even a suboptimal model can get better with training. So you do a lot of small-scale experimentation (programmatically create many models with TensorFlow, train and evaluate them on reasonably small datasets, keep the best ones, then test with larger datasets, etc.) to get a good model, then train as much as possible. In the end you have a pretrained model in a JS or Python library that you can put into your web app.
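The “create many, keep the best” loop above can be sketched like this (plain Python; `evaluate` is a stand-in for building, briefly training, and scoring a real TensorFlow model on a small dataset, and the search space is a made-up example):

```python
import random

random.seed(0)

# Hypothetical search space -- the knobs mentioned above.
SEARCH_SPACE = {
    "conv_layers": [2, 3, 4],
    "filters":     [16, 32, 64],
    "dropout":     [0.0, 0.25, 0.5],
    "activation":  ["relu", "elu"],
}

def sample_config():
    """Pick one random value for each hyperparameter."""
    return {name: random.choice(options) for name, options in SEARCH_SPACE.items()}

def evaluate(config):
    """Stand-in for: build the model from config, train briefly on a
    small dataset, return validation accuracy. Here it is just random."""
    return random.random()

def search(n_trials=20):
    """Random search: try n_trials configs, keep the best scorer."""
    best_config, best_score = None, -1.0
    for _ in range(n_trials):
        config = sample_config()
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best, score = search()
print(best, score)
```

The survivors of a loop like this then get retrained on the full dataset with a much larger training budget.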
I think there are two possibilities going forward in AI:
1. We have yet to make a breakthrough in understanding cognition that will point the way forward in AI, or vice versa.
2. Cognition works like current AI, except with a few billion extra years of training and the extra incentive of death for failing to recognize danger.
Regardless, a lot of work gets done and a lot of money gets made with a technology we don’t fully understand yet.
Yes, but the good models use dropout layers and huge datasets to prevent overfitting, in addition to things like rotation, translation, and other effects that were added to the training data in the cat/dog classifier. Overfitting is definitely a problem, but it’s far worse on small or insufficiently representative data sets. The layers, model type, dropout, gradient algorithms, and training are all just parameters to adjust in order to optimize the model for the purpose at hand, while watching for overfitting or bias or other problems.
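A toy illustration (plain Python on a tiny 2D “image”, not real preprocessing code) of the label-preserving transforms mentioned above; in TensorFlow these are usually handled by preprocessing layers such as `RandomFlip` and `RandomTranslation`, but the idea is just generating extra variants of each training example:

```python
def hflip(img):
    """Horizontal flip: mirror each row."""
    return [row[::-1] for row in img]

def shift_right(img, n=1, fill=0):
    """Translate n pixels to the right, padding the left edge with `fill`."""
    return [[fill] * n + row[:-n] for row in img]

img = [
    [1, 2, 3],
    [4, 5, 6],
]

print(hflip(img))        # [[3, 2, 1], [6, 5, 4]]
print(shift_right(img))  # [[0, 1, 2], [0, 4, 5]]
# A cat shifted one pixel to the right is still a cat, so each variant
# is a free extra training example that discourages the model from
# memorizing exact pixel positions -- one way augmentation fights
# overfitting on small datasets.
```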