Program working but 1 test failing in medical data visualizer

Tell us what’s happening:

I feel like my program works perfectly, but there is a rounding error somewhere and I can’t figure out where. Other posts on the forum said that it wasn’t a rounding error but a bug that occurs with newer matplotlib versions, but those bug fixes didn’t help me either.

This is what my code outputs:

python main.py
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-ht9_r_6g because the default path (/config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
…[‘0.0’, ‘0.0’, ‘-0.0’, ‘0.0’, ‘-0.1’, ‘0.5’, ‘0.0’, ‘0.1’, ‘0.1’, ‘0.2’, ‘0.0’, ‘0.0’, ‘0.0’, ‘0.0’, ‘0.0’, ‘0.0’, ‘0.2’, ‘0.1’, ‘0.0’, ‘0.2’, ‘0.1’, ‘0.0’, ‘0.1’, ‘-0.0’, ‘-0.1’, ‘0.1’, ‘0.0’, ‘0.2’, ‘0.0’, ‘0.1’, ‘-0.0’, ‘-0.0’, ‘0.1’, ‘0.0’, ‘0.1’, ‘0.4’, ‘-0.0’, ‘-0.0’, ‘0.3’, ‘0.2’, ‘0.1’, ‘-0.0’, ‘0.0’, ‘0.0’, ‘-0.0’, ‘-0.0’, ‘-0.0’, ‘0.2’, ‘0.1’, ‘0.1’, ‘0.0’, ‘0.0’, ‘0.0’, ‘0.0’, ‘0.3’, ‘0.0’, ‘-0.0’, ‘0.0’, ‘-0.0’, ‘-0.0’, ‘-0.0’, ‘0.0’, ‘0.0’, ‘-0.0’, ‘0.0’, ‘0.0’, ‘0.0’, ‘0.2’, ‘0.0’, ‘-0.0’, ‘0.2’, ‘0.1’, ‘0.3’, ‘0.2’, ‘0.1’, ‘-0.0’, ‘-0.0’, ‘-0.0’, ‘-0.0’, ‘0.1’, ‘-0.1’, ‘-0.1’, ‘0.6’, ‘0.0’, ‘0.2’, ‘0.1’, ‘0.1’, ‘-0.0’, ‘0.0’, ‘-0.0’, ‘0.1’]
F

FAIL: test_heat_map_values (test_module.HeatMapTestCase)

Traceback (most recent call last):
File “/home/runner/boilerplate-medical-data-visualizer/test_module.py”, line 48, in test_heat_map_values
self.assertEqual(actual, expected, “Expected different values in heat map.”)
AssertionError: Lists differ: ['0.0[59 chars], ‘0.2’, ‘0.0’, ‘0.0’, ‘0.0’, ‘0.0’, ‘0.0’, ‘0[548 chars]0.1’] != ['0.0[59 chars], ‘0.3’, ‘0.0’, ‘0.0’, ‘0.0’, ‘0.0’, ‘0.0’, ‘0[548 chars]0.1’]

First differing element 9:
‘0.2’
‘0.3’

[‘0.0’,
‘0.0’,
‘-0.0’,
‘0.0’,
‘-0.1’,
‘0.5’,
‘0.0’,
‘0.1’,
‘0.1’,

  - ‘0.2’,
  ?    ^
  + ‘0.3’,
  ?    ^

    ‘0.0’,
    ‘0.0’,
    ‘0.0’,
    ‘0.0’,
    ‘0.0’,
    ‘0.0’,
    ‘0.2’,
    ‘0.1’,
    ‘0.0’,
    ‘0.2’,
    ‘0.1’,
    ‘0.0’,
    ‘0.1’,
    ‘-0.0’,
    ‘-0.1’,
    ‘0.1’,
    ‘0.0’,
    ‘0.2’,
    ‘0.0’,
    ‘0.1’,
    ‘-0.0’,
    ‘-0.0’,
    ‘0.1’,
    ‘0.0’,
    ‘0.1’,
    ‘0.4’,
    ‘-0.0’,
    ‘-0.0’,
    ‘0.3’,
    ‘0.2’,
    ‘0.1’,
    ‘-0.0’,
    ‘0.0’,
    ‘0.0’,
    ‘-0.0’,
    ‘-0.0’,
    ‘-0.0’,
    ‘0.2’,
    ‘0.1’,
    ‘0.1’,
    ‘0.0’,
    ‘0.0’,
    ‘0.0’,
    ‘0.0’,
    ‘0.3’,
    ‘0.0’,
    ‘-0.0’,
    ‘0.0’,
    ‘-0.0’,
    ‘-0.0’,
    ‘-0.0’,
    ‘0.0’,
    ‘0.0’,
    ‘-0.0’,
    ‘0.0’,
    ‘0.0’,
    ‘0.0’,
    ‘0.2’,
    ‘0.0’,
    ‘-0.0’,
    ‘0.2’,
    ‘0.1’,
    ‘0.3’,
    ‘0.2’,
    ‘0.1’,
    ‘-0.0’,
    ‘-0.0’,
    ‘-0.0’,
    ‘-0.0’,
    ‘0.1’,
    ‘-0.1’,
    ‘-0.1’,

  - ‘0.6’,
  ?    ^
  + ‘0.7’,
  ?    ^

    ‘0.0’,
    ‘0.2’,
    ‘0.1’,
    ‘0.1’,
    ‘-0.0’,
    ‘0.0’,
    ‘-0.0’,
    ‘0.1’] : Expected different values in heat map.


Ran 4 tests in 6.438s

FAILED (failures=1)

Your code so far

    # Draws Heat Map
    def draw_heat_map():
        # Cleans the data for systolic pressure and height and weight
        df_heat = df[(df['ap_lo'] <= df['ap_hi']) &
                     (df['height'] >= df['height'].quantile(0.025)) &
                     (df['height'] <= df['height'].quantile(0.975)) &
                     (df['weight'] >= df['weight'].quantile(0.025)) &
                     (df['height'] <= df['height'].quantile(0.975))]

        # Calculates the correlation matrix
        corr = df_heat.corr()

        # Generates a mask for the upper triangle
        mask = np.triu(corr)

        # Sets up the matplotlib figure
        fig, ax = plt.subplots(figsize=(12, 12))

        # Draws the heatmap with 'sns.heatmap()'
        graph = sns.heatmap(corr, mask=mask, annot=True, center=0, linewidths=1,
                            square=True, fmt='.1f', cbar_kws={'shrink': 0.5})

        fig = graph.figure

        # Saves the figure
        fig.savefig('heatmap.png')
        return fig

Challenge: Medical Data Visualizer

Hi,

I have/had the same problem.
Looking at the data, it seems to be a slight rounding error: if the format were shorter for my faulty values, it would not be an issue.

It is not a pretty solution, but I simply located the wrong values and replaced them with the expected ones using:

    df.at['row', 'column'] = expected_value

(I actually used .at because .replace() was misbehaving, but that is another matter.)
As I said, it isn’t pretty, but I feel like it’s a problem that many have and that seems to be a version problem or something, and I don’t feel like this is what freeCodeCamp is trying to teach us. So if you want to pass the exercise, it is an option.

I’ll follow this thread to see if a more general solution is found.

Good luck

Thanks for the help!

This helped me pass the tests, which is nice, but it still feels a little wrong to adjust it this way. I still hope there is a better solution available…

There’s definitely a better way, since a couple of common errors cause this behavior. I don’t think you posted the code containing the error: some mistakes mess up the heat map but actually happen earlier, in the data processing or in the categorical plot.

You can either post your entire python file in a code block (the </> button in the post editor) or post a link to your project on repl.it (preferable). Output is better in a code block too.

Hi Jeremy,

Thank you for getting involved. I’d be happy to find out exactly what is causing the error.

Here is a link to my code:
Medical Data Visualizer - Fail

And an image of the failure.
If I set “self.maxDiff = None” (which I’ve done in the code now), it simply shows the 3 faulty elements. You can also see them in my code, as they are written as comments in the heat map function.
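For readers unfamiliar with the `maxDiff` setting mentioned above, here is a minimal sketch of what it does. The test class, its method name, and the list contents are hypothetical and are not taken from the project's test_module.py; the only real API assumed is `unittest.TestCase.maxDiff`, which limits how much of a failure diff is printed.

```python
import unittest


class HeatMapValuesDemo(unittest.TestCase):
    # Hypothetical demo class, not the project's test_module.py.
    # maxDiff = None asks unittest to print the complete diff
    # instead of truncating long ones.
    maxDiff = None

    def compare_values(self):
        expected = [str(i) for i in range(200)]
        actual = list(expected)
        actual[9] = "changed"  # one deliberately wrong element
        self.assertEqual(actual, expected)


def run_demo() -> str:
    """Run the failing comparison and return the failure text."""
    result = unittest.TestResult()
    HeatMapValuesDemo("compare_values").run(result)
    # result.failures holds (test, traceback_text) pairs
    return result.failures[0][1]


if __name__ == "__main__":
    print(run_demo())
```

With the default `maxDiff`, a diff this long would be cut off with a hint to set `self.maxDiff` to `None`; with `None`, every differing element is visible.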

Your code is not correctly filtering by quantile while cleaning the data for the heat map here:

    df_heat = df
    print(df_heat.shape)
    df_heat = df_heat[df['ap_lo'] <= df_heat['ap_hi']]
    # typo here        ^
    print(df_heat.shape)
    df_heat = df_heat[(df_heat['height'] >= df_heat['height'].quantile(0.025)) & (df_heat['height'] <= df_heat['height'].quantile(0.975))]
    print(df_heat.shape)
    df_heat = df_heat[(df_heat['weight'] >= df_heat['weight'].quantile(0.025)) & (df_heat['weight'] <= df_heat['weight'].quantile(0.975))]
    print(df_heat.shape)
    # df_heat1 is not defined in this snippet; it stands for the dataframe
    # cleaned correctly, so printing its shape shows the difference
    print(df_heat1.shape)

which produces output like:

(70000, 14)
(68766, 14)
(67024, 14)
(66897, 14)
# this is the correct way
(63259, 14)

so you can see that some ‘bad’ data are being left in the dataset, which affects the heat map correlation values.

At first glance, filtering out the bad values sequentially looks like it should work, so the real question is why it does not. The reason is in the quantile() method. A quantile is calculated from the samples currently in the dataframe, so if you change the samples (for instance, by repeatedly filtering and reassigning to df_heat), you change the quantiles, which changes which samples are excluded in each later step. In the output above, the result is that fewer ‘bad’ data are eliminated than should be.
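The effect can be reproduced on a small scale. The sketch below uses hypothetical toy data (not the project's dataset) and 10%/90% cutoffs chosen only to keep the example short. It shows that a weight cutoff recomputed on an already filtered frame differs from the one computed on the original frame, so the two procedures keep different numbers of rows. Which procedure keeps more rows depends on the data; the point is only that they need not agree.

```python
import pandas as pd

# Hypothetical toy data, not the project's dataset.
toy = pd.DataFrame({
    "height": [150, 155, 160, 165, 170, 175, 180, 185, 190, 195],
    "weight": [95, 50, 55, 60, 65, 70, 75, 80, 85, 45],
})

# Both cutoffs taken from the original, unfiltered frame.
height_low = toy["height"].quantile(0.1)    # about 154.5
weight_high = toy["weight"].quantile(0.9)   # about 86.0

# One step: every condition is evaluated against the original frame.
one_step = toy[(toy["height"] >= height_low) & (toy["weight"] <= weight_high)]

# Sequential: the weight cutoff is recomputed on the already trimmed frame.
trimmed = toy[toy["height"] >= height_low]
weight_high_trimmed = trimmed["weight"].quantile(0.9)  # about 81.0
sequential = trimmed[trimmed["weight"] <= weight_high_trimmed]

print(one_step.shape, sequential.shape)  # (9, 2) (8, 2)
```

Printing `.shape` after each step, as in the snippet above, is a quick way to spot where the two procedures diverge.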

So there are (at least) two approaches that will work: one, filter the data in one step by ANDing all the conditions together, or two, filter out the ‘bad’ data of each type (bp, height, weight) from an original dataframe (not one repeatedly updated) and then either join and filter the bad ones from the original or join the unique good ones as a new dataframe.
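A minimal sketch of the first approach follows, assuming the column names used in this thread (`ap_lo`, `ap_hi`, `height`, `weight`). The function name `clean_for_heat_map` and the toy dataframe are hypothetical. Note that the sketch has a `weight` upper bound as its last condition; the filter in the original post repeats the `height` upper bound in that position, which may be worth checking there.

```python
import pandas as pd


def clean_for_heat_map(df: pd.DataFrame) -> pd.DataFrame:
    """Apply all cleaning conditions at once.

    Every quantile is computed on the unfiltered df, so no condition
    is affected by rows another condition removes.
    """
    keep = (
        (df["ap_lo"] <= df["ap_hi"])
        & (df["height"] >= df["height"].quantile(0.025))
        & (df["height"] <= df["height"].quantile(0.975))
        & (df["weight"] >= df["weight"].quantile(0.025))
        & (df["weight"] <= df["weight"].quantile(0.975))
    )
    return df[keep]


# Hypothetical toy frame using the thread's column names.
toy = pd.DataFrame({
    "ap_hi": [120] * 21,
    "ap_lo": [80] * 21,
    "height": list(range(150, 171)),                         # 150..170
    "weight": list(range(65, 81)) + list(range(60, 65)),     # 65..80, then 60..64
})
toy.loc[3, "ap_lo"] = 130  # diastolic above systolic, should be dropped

cleaned = clean_for_heat_map(toy)
print(toy.shape, cleaned.shape)  # (21, 4) (16, 4)
```

In the toy frame, five rows are dropped: the shortest and tallest, the lightest and heaviest, and the one with `ap_lo` above `ap_hi`.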

That makes a lot of sense!

I got another error trying your solution, but the “.shape” checks you provided helped me debug on my own. So thank you for providing a comprehensive and educational response.

I pass the test now :)
Not sure about the original poster though; looking at the code he posted, it does not seem he made the same mistake as me.

Anyway, thanks a lot, much appreciated!