Health Cost Calculator: trouble getting below 3500 MAE

I’m having some trouble getting my model’s MAE below 3500. The closest I’ve come is 5606 using a Random Forest Regressor. I’ve tuned my hyperparameters. Maybe I should drop less-correlated features from my dataset? I’m lost, tbh.
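For context, my current approach looks roughly like this (a minimal sketch, not my exact code; the hyperparameters I actually tuned aren’t shown, and the column names are assumed from the challenge’s insurance dataset):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Load the data and one-hot encode the categorical columns.
df = pd.read_csv('insurance.csv')
df = pd.get_dummies(df, columns=['sex', 'smoker', 'region'], dtype=float)

# Separate features from the label and hold out a test set.
X = df.drop(columns=['expenses'])
y = df['expenses']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Random Forest Regressor (hyperparameters tuned separately).
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

print('MAE:', mean_absolute_error(y_test, model.predict(X_test)))

# This is what I mean by checking which features correlate with the target
# before deciding what to drop.
print(df.corr()['expenses'].sort_values())
```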

Challenge: Linear Regression Health Costs Calculator

Link to the challenge:

Welcome to the forums, @benlaurent.

Take a look at the TF fuel efficiency example, if you haven’t already, and try to adapt it to this problem. There are some other details of how I did this here.

Thank you so much! This helps a ton. Tbh, I just finished reading a book on machine learning and ended up using sklearn for model selection and data processing without realizing it was a TF and Keras project, lol.

There’s no right way to do it, as long as it passes the tests, but given the similarities I think that was the intention. There probably is a way to use sklearn to solve this, but my hammer is tensorflow/keras, for better or worse. Probably the biggest source of confusion for me is the term “regression,” which I always interpret first as the curve fitting math from statistics and not this very similar approach in AI.

So I was able to get much better results with a Keras model, but I’m still stuck in the 4500s. Any advice on tweaking my model? I’ve never seriously used Keras, and I only loosely understand it. Thanks again!

I encoded age, bmi, and children as numeric features and sex, smoker, and region as categorical features, similar to the fuel efficiency example. I then started building up dense layers as the tutorial did and alternated them with dropout layers. I started with a dense layer of 64 units, doubled that as I added layers, and tinkered with the dropout rates to tweak things, and of course you end with an output dense layer of 1. I used the Adam optimizer with a MeanAbsoluteError loss function, since we’re trying to predict a number rather than a class (unlike the binary cross-entropy in the other ML projects).

The biggest surprise to me was how many layers were necessary.
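If it helps, a rough sketch of that kind of setup is below. The layer sizes, dropout rates, and epoch count are illustrative rather than my exact values, and I’m assuming the challenge’s insurance.csv column names with expenses as the label:

```python
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers

# Load the insurance data and one-hot encode the categorical features;
# age, bmi, and children stay numeric.
dataset = pd.read_csv('insurance.csv')
dataset = pd.get_dummies(dataset, columns=['sex', 'smoker', 'region']).astype('float32')

# 80/20 train/test split, then pop off the label column.
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)
train_labels = train_dataset.pop('expenses')
test_labels = test_dataset.pop('expenses')

# Dense layers that double in width, alternated with dropout,
# ending in a single-unit output for the regression target.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(len(train_dataset.columns),)),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(1),
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=tf.keras.losses.MeanAbsoluteError(),
              metrics=['mae', 'mse'])

model.fit(train_dataset, train_labels, epochs=200, validation_split=0.2, verbose=0)
print(model.evaluate(test_dataset, test_labels, verbose=0))
```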

Thank you. I pretty much set up the data the same way. This is great advice; I was thinking I probably shouldn’t use more than 3 or 4 layers, but I guess I’ll keep going!

Just wanted to say thanks again. Just submitted my project with an MAE of 1800. My biggest problem was that I forgot to set index=None when I split my train and test sets, which caused my model to not read the age column. Once I fixed that, it brought me from 4200 to 1800 immediately, haha. Appreciate all the help!
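In case anyone else runs into it, here’s roughly the kind of thing that bit me (a sketch assuming the splits get written out and read back with pandas; my exact calls were a bit different):

```python
import pandas as pd

train_df = pd.DataFrame({'age': [19, 33, 46], 'bmi': [27.9, 22.7, 30.5]})

# Default to_csv writes the DataFrame index as an extra, unnamed first column...
train_df.to_csv('train.csv')
print(pd.read_csv('train.csv').columns.tolist())   # ['Unnamed: 0', 'age', 'bmi']
# ...so downstream code that expects the first column to be a feature
# ends up looking at the old index instead of (here) age.

# Turning the index off keeps the columns as-is.
train_df.to_csv('train.csv', index=False)
print(pd.read_csv('train.csv').columns.tolist())   # ['age', 'bmi']
```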
