Medical Data Visualizer Heatmap Expected Different Values

Tell us what’s happening:
I seem to be failing the heatmap values test with this error:

Traceback (most recent call last):
  File "/home/runner/boilerplate-medical-data-visualizer-1/", line 47, in test_heat_map_values
    self.assertEqual(actual, expected, "Expected different values in heat map.")
AssertionError: Lists differ: ['0.0[59 chars], '0.2', '0.0', '0.2', '0.1', '0.0', '0.3', '0[547 chars]0.1'] != ['0.0[59 chars], '0.3', '0.0', '0.0', '0.0', '0.0', '0.0', '0[548 chars]0.1']

First differing element 9:

Diff is 1200 characters long. Set self.maxDiff to None to see it. : Expected different values in heat map.

I’ve looked at other posts with a similar problem. I went to poetry.lock and changed the matplotlib version to 3.8.0 (the latest version), and the diff in the error message went from some 2200 characters long to 1200 characters long, so that definitely affected something. On the off chance that I messed up somewhere and it’s not actually a dependency issue, where do I set self.maxDiff to None?
Your code so far

def draw_heat_map():
  # Clean the data
  # 0. Copy original df to new dataframe
  df_heat = df.copy()

  # Create boolean mask to select and filter out rows based on following conditions:
  # 1. Segments where diastolic pressure (ap_lo) is higher than systolic (ap_hi)
  # 2. Segments where height is less than the 2.5th percentile or greater than the 97.5th percentile
  # 3. Segments where weight is less than the 2.5th percentile or greater than the 97.5th percentile
  cont = df_heat.loc[
      (df['ap_lo'] >= df['ap_hi']) |
      (df['height'] <= df['height'].quantile(0.025)) | (df['height'] >= df['height'].quantile(0.975)) |
      (df['weight'] <= df['weight'].quantile(0.025)) | (df['weight'] >= df['weight'].quantile(0.975))
  ].index

  # Remove the rows selected by cont
  df_heat = df_heat.drop(cont)

  # Calculate the correlation matrix (spearman method)
  corr = df_heat.corr(method='spearman')

  # Generate a mask for the upper triangle
  mask = np.triu(np.ones_like(corr))

  # Set up the matplotlib figure
  fig, ax = plt.subplots()

  # Draw the heatmap with 'sns.heatmap()'
  sns.heatmap(corr, mask=mask, annot=True, fmt=".1f", ax=ax)

  #Export the figure as a png
  return fig

Challenge: Data Analysis with Python Projects - Medical Data Visualizer

The main problem is that your matrix is different from what it’s supposed to be. Don’t worry too much about “max diff”. The error already shows you at least one element where your array is incorrect. Seeing every incorrect number isn’t that important, since you are applying functions to the entire matrix in any case.
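For reference, on the question of where maxDiff goes: it is an attribute of unittest.TestCase, so it can be set on the test class itself (or as self.maxDiff inside a test method). A minimal sketch, where HeatMapValuesExample and run_example are hypothetical stand-ins and not the project's actual test code:

```python
import unittest


class HeatMapValuesExample(unittest.TestCase):
    # Hypothetical stand-in for the project's test class.
    # maxDiff = None removes the length cap on the diff that
    # assertEqual prints when two lists differ.
    maxDiff = None

    def test_lists_match(self):
        # Made-up values, only here so the test has something to compare.
        actual = ['0.0', '0.3', '0.1']
        expected = ['0.0', '0.3', '0.1']
        self.assertEqual(actual, expected, "Expected different values in heat map.")


def run_example():
    # Load and run just this one test case, returning the TestResult.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(HeatMapValuesExample)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With the attribute set, a failing assertEqual prints the whole diff instead of the "Diff is ... characters long" notice.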

I’m a little suspicious of the section where you drop data. I would try to simplify it and do it in separate lines instead of combining everything into one big expression like this.
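One way to split the filtering into separate lines is to build one "keep" mask per condition and check how many rows each one removes. This is only a sketch on a small made-up frame (the real project loads its own CSV), and the comparison operators are an assumption based on the "keep the correct data" wording of the instructions:

```python
import pandas as pd

# Made-up sample data, not the project's dataset.
df = pd.DataFrame({
    'ap_hi':  [120, 110, 130, 90, 140],
    'ap_lo':  [80, 115, 85, 60, 90],
    'height': [150, 165, 170, 175, 190],
    'weight': [50.0, 62.0, 70.0, 80.0, 110.0],
})

# One mask per rule; True means "keep this row".
keep_pressure = df['ap_lo'] <= df['ap_hi']
keep_height = (
    (df['height'] >= df['height'].quantile(0.025))
    & (df['height'] <= df['height'].quantile(0.975))
)
keep_weight = (
    (df['weight'] >= df['weight'].quantile(0.025))
    & (df['weight'] <= df['weight'].quantile(0.975))
)

# Report how many rows each rule would remove on its own.
for name, mask in [('pressure', keep_pressure),
                   ('height', keep_height),
                   ('weight', keep_weight)]:
    print(name, 'removes', int((~mask).sum()), 'rows')

# Combine the masks only at the end.
df_heat = df[keep_pressure & keep_height & keep_weight]
```

Keeping the masks separate makes it easier to spot which condition removes more rows than expected.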

Is there a reason you chose to specify Spearman method for your correlation? Does it result in a different matrix than the default?

I was looking up what a correlation matrix is cause I’m generally not at all familiar with statistics, and spearman was the last method I saw before I went back to write the function so it just kinda stuck with me. I knew pandas has 3 methods for these matrices when I looked at the documentation, but I only now realized pearson was the default.
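To see that the method argument can change the matrix, here is a small sketch on made-up data where y grows monotonically but not linearly with x. Spearman works on ranks, so it reports a perfect correlation, while the default Pearson does not:

```python
import pandas as pd

# Made-up data: y = x squared, so the relationship is monotonic but not linear.
df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [1, 4, 9, 16, 25]})

pearson = df.corr()                    # default method is 'pearson'
spearman = df.corr(method='spearman')  # rank-based correlation

print(pearson.loc['x', 'y'])
print(spearman.loc['x', 'y'])
```

Since the heat map annotations are rounded to one decimal, even small differences between the two methods can flip individual cells.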

I went back to take a look, it seems that the matrix IS actually different now, as the traceback says that the Diff is even shorter at 1000 characters long. Though, I think it is just a dependency issue after all, cause comparing the provided figure to my own heatmap side by side shows that they’re basically the same, bar the actual colors that are used.

Check these lines carefully:

And the instruction:

height is less than the 2.5th percentile (Keep the correct data with (df['height'] >= df['height'].quantile(0.025)))

You are dropping height less than or equal to the 2.5th percentile.
Instruction is to keep height less than or equal to the 2.5th percentile.

The instruction is the exact opposite. In fact, the equation they provide to “keep the correct data” very explicitly suggests that anything below the 2.5th percentile is incorrect data. Admittedly, I may have used the wrong operator (<= instead of just <) so I’ll go back later to see if this gives me a different result.
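The effect of the extra = is easiest to see when many rows sit exactly on the percentile, which is likely for whole-number columns such as height. A minimal sketch with made-up heights:

```python
import pandas as pd

# Made-up heights: 100 values with many ties at both ends.
heights = pd.Series([150] * 5 + [160] * 90 + [170] * 5)

low = heights.quantile(0.025)  # falls on a tied value, so low is 150.0

# Dropping with <= also removes rows that are exactly at the percentile...
dropped_inclusive = heights[heights <= low]
# ...while dropping with < leaves them in place.
dropped_strict = heights[heights < low]
# The instruction's "keep" condition agrees with the strict drop.
kept = heights[heights >= low]

print(len(dropped_inclusive), len(dropped_strict), len(kept))
```

Here the two operators disagree on five rows, which is enough to shift rounded correlation values in the heat map.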

Yes, sorry it was the = that I was pointing out :+1:

Yeah the test passes fine now, thanks :+1: