Health Cost Calculator: trouble getting below 3500 MAE

I’m having some trouble getting my model’s MAE below 3500. The closest I’ve come is 5606 using a Random Forest Regressor. I’ve tuned my hyperparameters. Maybe I should drop less-correlated features from my dataset? I’m lost, tbh.
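For context, my current approach looks roughly like this (a minimal sketch, not my exact code; the hyperparameters I actually tuned aren’t shown, and the column names are assumed from the challenge’s insurance dataset):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Load the data and one-hot encode the categorical columns.
df = pd.read_csv('insurance.csv')
df = pd.get_dummies(df, columns=['sex', 'smoker', 'region'], dtype=float)

# Separate features from the label and hold out a test set.
X = df.drop(columns=['expenses'])
y = df['expenses']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Random Forest Regressor (hyperparameters tuned separately).
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

print('MAE:', mean_absolute_error(y_test, model.predict(X_test)))

# This is what I mean by checking which features correlate with the target
# before deciding what to drop.
print(df.corr()['expenses'].sort_values())
```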

Challenge: Linear Regression Health Costs Calculator

Link to the challenge:

Welcome to the forums, @benlaurent.

Take a look at the TF fuel efficiency example, if you haven’t already, and try to adapt it to this problem. There are some other details of how I did this here.

Thank you so much! This helps a ton. Tbh, I just finished reading a book on machine learning and ended up using sklearn for model selection and data processing without realizing it was a TF and Keras project, lol.

There’s no right way to do it, as long as it passes the tests, but given the similarities I think that was the intention. There probably is a way to use sklearn to solve this, but my hammer is tensorflow/keras, for better or worse. Probably the biggest source of confusion for me is the term “regression,” which I always interpret first as the curve fitting math from statistics and not this very similar approach in AI.

So I was able to get much better results with a Keras model, but I’m still stuck in the 4500s. Any advice on tweaking my model? I’ve never seriously used Keras, and I only loosely understand it. Thanks again!

I encoded age, bmi, and children as numeric features and sex, smoker, and region as categorical features, similar to the fuel efficiency example. I then started building up dense layers as the tutorial did and alternated them with dropout layers. I started with a dense layer of 64 units, doubled that as I added layers, and tinkered with the dropout rates to tweak things, and of course you end with an output dense layer of 1. I used the Adam optimizer with a MeanAbsoluteError loss function, since we’re trying to predict a number rather than a class (unlike the binary cross-entropy in the other ML projects).

The biggest surprise to me was how many layers were necessary.
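If it helps, a rough sketch of that kind of setup is below. The layer sizes, dropout rates, and epoch count are illustrative rather than my exact values, and I’m assuming the challenge’s insurance.csv column names with expenses as the label:

```python
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers

# Load the insurance data and one-hot encode the categorical features;
# age, bmi, and children stay numeric.
dataset = pd.read_csv('insurance.csv')
dataset = pd.get_dummies(dataset, columns=['sex', 'smoker', 'region']).astype('float32')

# 80/20 train/test split, then pop off the label column.
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)
train_labels = train_dataset.pop('expenses')
test_labels = test_dataset.pop('expenses')

# Dense layers that double in width, alternated with dropout,
# ending in a single-unit output for the regression target.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(len(train_dataset.columns),)),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(1),
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=tf.keras.losses.MeanAbsoluteError(),
              metrics=['mae', 'mse'])

model.fit(train_dataset, train_labels, epochs=200, validation_split=0.2, verbose=0)
print(model.evaluate(test_dataset, test_labels, verbose=0))
```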

Thank you. I pretty much set up the data the same way. This is great advice; I was thinking I probably shouldn’t use more than 3 or 4 layers, but I guess I’ll keep going!

Just wanted to say thanks again. Just submitted my project with an MAE of 1800. My biggest problem was that I forgot to set index=None when I split my train and test sets, which caused my model to not read the age column. Once I fixed that, it brought me from 4200 to 1800 immediately, haha. Appreciate all the help!
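In case anyone else runs into it, here’s roughly the kind of thing that bit me (a sketch assuming the splits get written out and read back with pandas; my exact calls were a bit different):

```python
import pandas as pd

train_df = pd.DataFrame({'age': [19, 33, 46], 'bmi': [27.9, 22.7, 30.5]})

# Default to_csv writes the DataFrame index as an extra, unnamed first column...
train_df.to_csv('train.csv')
print(pd.read_csv('train.csv').columns.tolist())   # ['Unnamed: 0', 'age', 'bmi']
# ...so downstream code that expects the first column to be a feature
# ends up looking at the old index instead of (here) age.

# Turning the index off keeps the columns as-is.
train_df.to_csv('train.csv', index=False)
print(pd.read_csv('train.csv').columns.tolist())   # ['age', 'bmi']
```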
