I’m having some trouble getting my model’s MAE below 3500. The closest I’ve come is 5606 using a Random Forest Regressor. I’ve tuned my hyperparameters. Maybe I should drop the less correlated features from my dataset? I’m lost tbh.
Challenge: Linear Regression Health Costs Calculator
Take a look at the TF fuel efficiency example, if you haven’t already, and try to adapt it to this problem. There are some other details of how I did this here.
thank you so much! this helps a ton. tbh, i just finished reading a book on machine learning and ended up using sklearn for model selection and data processing without realizing it was a tf and keras project lol.
There’s no right way to do it, as long as it passes the tests, but given the similarities I think that was the intention. There probably is a way to use sklearn to solve this, but my hammer is tensorflow/keras, for better or worse. Probably the biggest source of confusion for me is the term “regression,” which I always interpret first as the curve fitting math from statistics and not this very similar approach in AI.
So I was able to get much better results with a keras model, but I’m still stuck in the 4500s. Any advice on tweaking my model? I’ve never seriously used keras and only loosely understand it. Thanks again!
I encoded age, bmi, and children as numeric features and sex, smoker, and region as categorical features, similar to the fuel efficiency example. I then started building up dense layers as the tutorial did and alternated them with dropout layers. I started with a dense layer of 64 units, doubled that as I added layers, and tinkered with the dropout values to try to tweak things. And of course you end with an output dense layer of 1 unit. I used the adam optimizer with the MeanAbsoluteError loss function, since we’re trying to predict a number and not a class (unlike the binary cross-entropy used in the other ML projects).
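For anyone following along, the setup described above might look roughly like the sketch below. It is not the poster's actual code. The helper name build_model, the three hidden layers, the 0.1 dropout rate, the relu activations, the extra metrics, and the input width of 11 are all assumptions made for illustration; the post only says the widths start at 64 and double, that dropout layers sit in between, and that adam and MeanAbsoluteError were used.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(num_features, dropout_rate=0.1):
    """Hypothetical helper: Dense layers that double in width, alternated with Dropout."""
    model = keras.Sequential([
        keras.Input(shape=(num_features,)),
        layers.Dense(64, activation="relu"),
        layers.Dropout(dropout_rate),   # dropout rate is a guess; tune it
        layers.Dense(128, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(256, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(1),                # single output: the predicted cost
    ])
    model.compile(
        optimizer="adam",
        loss=keras.losses.MeanAbsoluteError(),
        metrics=["mae", "mse"],         # assumed extra metrics, not from the post
    )
    return model


# 11 is an assumed width (numeric columns plus one-hot columns);
# in practice pass the column count of your encoded training frame.
model = build_model(num_features=11)
```

The number of layers and the dropout rate are the knobs the post says were tinkered with, so treat the values here as a starting point rather than a recipe.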
The biggest surprise to me was how many layers were necessary.
Thank you. I pretty much set up the data the same way. This is great advice. I was thinking I probably shouldn’t use more than 3 or 4 layers, but I guess I’ll keep going!
Just wanted to say thanks again. Just submitted my project with an MAE of 1800. My biggest problem was that I forgot to set index=None when I split my train and test sets, which caused my model to not read the age column. Once I fixed that, it brought me from 4200 to 1800 immediately haha. Appreciate all the help!
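The post doesn't say which call took the index argument, so the sketch below does not reproduce that bug. It only shows one way to split a frame and then check that no feature column went missing before training, which would have caught a dropped age column early. The tiny frame, its values, and the label column name "expenses" are made up for illustration.

```python
import pandas as pd

# Made-up stand-in for the real dataset; values are invented.
dataset = pd.DataFrame({
    "age": [19, 18, 28, 33, 32, 31, 46, 37, 37, 60],
    "bmi": [27.9, 33.8, 33.0, 22.7, 28.9, 25.7, 33.4, 27.7, 29.8, 25.8],
    "children": [0, 1, 3, 0, 0, 0, 1, 3, 2, 0],
    "expenses": [16884.9, 1725.6, 4449.5, 21984.5, 3866.9,
                 3756.6, 8240.6, 7281.5, 6406.4, 28923.1],
})

# 80/20 split: sample the training rows, then drop them to get the test rows.
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)

# Separate the label column (assumed name) from the features.
train_labels = train_dataset.pop("expenses")
test_labels = test_dataset.pop("expenses")

# Sanity check: every expected feature column is still present in both splits.
expected = ["age", "bmi", "children"]
for name, frame in [("train", train_dataset), ("test", test_dataset)]:
    missing = [col for col in expected if col not in frame.columns]
    assert not missing, f"{name} split is missing columns: {missing}"

print(train_dataset.columns.tolist())
```

Printing the column list (or calling head() on the frame) right after the split is a cheap habit that makes this kind of problem visible before any model is trained.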