Medical Data Visualizer: test fails at heat_map values

I am almost done with this assignment, but here comes the problem: when I run the tests, the heat map test fails because the expected and actual values differ.
I have no clue what is happening here. This problem has been discussed a few times earlier in the forum, but none of those recommendations worked in my case. I think I am making some silly mistake somewhere. Please help me out, as I have been working on this assignment for almost a week now :slightly_smiling_face:

Here is my repl:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Import data
df = pd.read_csv('medical_examination.csv')
df = df.dropna()
# TEST CODE START
# Where to save the figures

# TEST CODE END
# Add 'overweight' column
bmi = (df['weight'])/((df['height']/100)**2)
df['bmi'] = bmi;
df.loc[df['bmi']>25, 'overweight'] = '1'
df.loc[df['bmi']<25, 'overweight'] = '0'
df=df.drop('bmi', axis=1)
df['overweight'] = df['overweight'];
df = df.dropna()  # drop all rows that have any NaN value
df['overweight'] = pd.to_numeric(df['overweight'])  # convert the strings '1' and '0' in the overweight column to numbers
#df['overweight'] = None

# Normalize data by making 0 always good and 1 always bad. If the value of 'cholesterol' or 'gluc' is 1, make the value 0. If the value is more than 1, make the value 1.


# Draw Categorical Plot
def draw_cat_plot():
    df.loc[df['cholesterol']> 1, 'cholesterol'] = '1'
    # Normalise the cholesterol column by assigning 1 to bad values and 0 to good values
    df.loc[df['cholesterol']==1, 'cholesterol'] = '0'

    df['cholesterol']= pd.to_numeric(df['cholesterol']) 
    # Here I am converting the type of this column from string to integer.
    df.loc[df['gluc']> 1, 'gluc'] = '1'
    df.loc[df['gluc']== 1, 'gluc'] = '0'
    df['gluc']= pd.to_numeric(df['gluc']) # Here I am converting the type of this column from string to integer.
    # Create DataFrame for cat plot using `pd.melt` using just the values from 'cholesterol', 'gluc', 'smoke', 'alco', 'active', and 'overweight'.
    df_cat = df.drop(['id','age','gender','height','weight','ap_hi','ap_lo'] , axis=1)
    # drop all columns that we are not going to plot in the chart

    # Group and reformat the data to split it by 'cardio'. Show the counts of each feature. You will have to rename one of the columns for the catplot to work correctly.
    df_cat = pd.melt(df_cat,id_vars=['cardio'],var_name='variable', value_name='value')
    # convert the data into long format for plotting

    # Draw the catplot with 'sns.catplot()
    fig = sns.catplot(x="variable", hue="value", data=df_cat, col="cardio", kind="count", sharex=False)


    # Do not modify the next two lines
    fig.savefig('catplot.png')
    return fig


# Draw Heat Map
def draw_heat_map():
    # Clean the data   

    df_heat = df.loc[(df['ap_lo'] <= df['ap_hi']) & (df['height'] >= df['height'].quantile(0.025)) & (df['height'] <= df['height'].quantile(0.975)) & (df['weight'] >= df['weight'].quantile(0.025)) & (df['weight'] <= df['weight'].quantile(0.975))]
    

    # Calculate the correlation matrix
    corr = df_heat.corr()# Compute the correlation matrix

    # Generate a mask for the upper triangle
    mask = np.triu(np.ones_like(corr, dtype=bool))
    


    # Set up the matplotlib figure
    fig, ax = plt.subplots(figsize=(11, 9))

    # Draw the heatmap with 'sns.heatmap()'
    grap = sns.heatmap(corr,ax=ax, mask=mask, vmax=.3, center=0,square=True, linewidths=0.3,annot= True,fmt = '.1f',cbar_kws={"shrink": .5}) # plot the heatmap

    fig = grap.figure
    # Do not modify the next two lines
    fig.savefig('heatmap.png')
    return fig

What about the bmi being exactly 25?

Oh, and your data cleaning is all over the place… it’s literally scattered throughout the code. Put all of it at the front. The task was to “clean the data”, not to “do this cleaning in this function, that cleaning somewhere else, and then maybe add some random cleaning at the front”.

On top of that, don’t use strings as placeholders; instead, find a method that can change several values at once. Did you even check whether you can write a “1” into the cholesterol column? The problem is that NumPy arrays are different from Python lists: NumPy applies one data type to an entire column. So what happens when you try to write the string “1” into an integer column? Will NumPy change the column, or will it change the string? Either way, it’s ruining your code.
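
For illustration, here is a minimal sketch of one way to normalize several values at once without string placeholders. The tiny DataFrame is made up for the example; it only assumes that the `cholesterol` and `gluc` columns hold integers starting at 1, as the task comment in the original code describes.

```python
import pandas as pd

# Made-up sample data; only the column names come from the thread.
df = pd.DataFrame({"cholesterol": [1, 2, 3, 1], "gluc": [1, 1, 2, 3]})

# A comparison yields a boolean Series; casting it to int gives 0/1 directly,
# so the column never holds a mix of strings and integers.
for col in ["cholesterol", "gluc"]:
    df[col] = (df[col] > 1).astype(int)

print(df)
```

Because each column is rewritten in a single step, there is no intermediate state where some rows have already been changed and later comparisons see the new values.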

For some reason, I had to explicitly cast the 0s and 1s to int type when normalizing.
See below:

bmi = pd.Series(((df['weight'])/ ((df['height']/100)**2)>25))
bmi = bmi.mask(bmi == True,1).astype('int32')
bmi = bmi.where(bmi == True,0).astype('int32')
df = df.assign(overweight = bmi )

I’m a beginner, so if anyone has a pointer to make it cleaner, let me know. However, this worked for me.
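
One possible way to make this shorter, sketched on made-up heights and weights: compute the BMI once, compare it to 25, and cast the boolean result to int. The sample rows are assumptions chosen so that one of them has a BMI of exactly 25.

```python
import pandas as pd

# Made-up sample data: height in cm, weight in kg.
df = pd.DataFrame({"height": [160, 180, 200], "weight": [80.0, 70.0, 100.0]})

# BMI = weight (kg) / height (m) squared.
bmi = df["weight"] / (df["height"] / 100) ** 2

# One comparison covers every row, including a BMI of exactly 25,
# so no row is left without a value.
df["overweight"] = (bmi > 25).astype(int)

print(df)
```

With a single `> 25` test, a row with a BMI of exactly 25 gets 0 instead of staying empty, which is the case the first reply asked about.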

Hi Jagaya!
Thanks for your response and for debugging my code. I fixed the problems you mentioned, and I think my code is much cleaner now :grinning:
But the test_heat_map_values failure is still here and not going anywhere.
Here is the updated code:
https://replit.com/@WaqasRashid/boilerplate-medical-data-visualizer#medical_data_visualizer.py

Can you please have a look at it and tell me what I am messing up this time? :slightly_smiling_face:

Right now I am just getting an indentation error. Keep in mind that Python uses indentation to determine what belongs together.

I modified your heat map code as follows:

# Draw Heat Map
def draw_heat_map():
    # Clean the data   

    df_heat = df.copy()
    print(df_heat.shape)
    df_heat = df_heat[df_heat['ap_lo'] <= df_heat['ap_hi'] & (df_heat['height'] >= df_heat['height'].quantile(0.025)) &(df_heat['height'] <= df_heat['height'].quantile(0.975)) &(df_heat['weight'] >= df_heat['weight'].quantile(0.025)) &(df_heat['weight'] <= df_heat['weight'].quantile(0.975))]
    print(df_heat.shape)

This prints

(70000, 14)
(22, 14)

The first tuple makes sense, since there are 70,000 records with 14 columns. Your selection code left only 22 rows, which seems low (it should be approximately 63,258). Since everything else in the heat map code is straightforward, this must be the problem.

Long lines of code are hard to read; I would break that one over several lines. One thing in that line is not like the others. Pandas is very particular about its operators and precedence.
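
To illustrate the precedence point, here is a small sketch on made-up data. In Python, `&` binds more tightly than comparison operators such as `<=`, so a comparison that is not wrapped in parentheses gets combined with its neighbours in an unintended order. The column names match the thread; the values and the `low`/`high` bounds are placeholders standing in for the quantile limits.

```python
import pandas as pd

# Made-up sample data; only the column names come from the thread.
df = pd.DataFrame({
    "ap_lo": [80, 90, 120, 70],
    "ap_hi": [120, 130, 100, 110],
    "height": [150, 165, 170, 190],
})

low, high = 160, 180  # hypothetical stand-ins for the quantile bounds

# Without parentheses around the first comparison, Python evaluates
# `ap_hi & (...) & (...)` first and only then compares ap_lo to that result.
unparenthesized = df["ap_lo"] <= df["ap_hi"] & (df["height"] >= low) & (df["height"] <= high)

# Every comparison wrapped in parentheses, one condition per line.
parenthesized = (
    (df["ap_lo"] <= df["ap_hi"])
    & (df["height"] >= low)
    & (df["height"] <= high)
)

print("rows kept without parentheses:", int(unparenthesized.sum()))
print("rows kept with parentheses:   ", int(parenthesized.sum()))
```

Splitting the filter over several lines like this also makes a missing pair of parentheses much easier to spot.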

Hi,

I am looking for the videos that explain the correlation matrix and the triangle mask, but I haven’t found them in the course.

Can I ask where you learned about that?

Part of the task is to do your own research. So while the task talks about correlation matrices and masks, it might imply that you should figure them out yourself, either by googling or by looking into the Seaborn documentation. I watched the video some time ago, so I cannot say whether it was actually covered there…

Hi there,
Thanks to you all for your response and guidance.
Now I am getting three empty elements, like '', '', '', in the heat map’s correlation values.

When I change def test_heat_map_values(self): in the test_module code and add three similar empty elements to the end of the expected list, all the tests pass :grinning:
Now I am struggling to find out what is actually causing this. I don’t know whether I am making some coding mistake again or whether it’s a version problem.
Can I submit this assignment with the edited test_module?
Here is the link to my updated code: https://replit.com/@WaqasRashid/boilerplate-medical-data-visualizer#test_module.py

Many thanks for reading and your time :+1: :+1:

Search the forums for this; there’s a difference in matplotlib in some versions that causes this problem and those threads have solutions.

In my old version of the project, the README mentions it:

  • Create a correlation matrix using the dataset. Plot the correlation matrix using seaborn’s heatmap(). Mask the upper triangle. The chart should look like “examples/Figure_2.png”.

I don’t know that you’ll find too many videos on those concepts, other than examples to make a heatmap. I’ve seen them plenty in university level statistics and matrix theory/linear algebra classes.
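
For anyone who wants to see the two concepts in isolation, here is a minimal sketch on a made-up three-column DataFrame. It uses the same `np.triu(np.ones_like(...))` pattern as the code earlier in the thread; the data and column names are invented for the example.

```python
import numpy as np
import pandas as pd

# Made-up data: "c" moves exactly opposite to "a", "b" roughly follows "a".
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 9], "c": [4, 3, 2, 1]})

# Correlation matrix: one row and one column per variable, 1.0 on the diagonal.
corr = df.corr()

# Boolean mask that is True on and above the diagonal. Passing it as `mask=`
# to a heatmap hides those cells, leaving only the lower triangle.
mask = np.triu(np.ones_like(corr, dtype=bool))

print(corr.round(2))
print(mask)
```

The matrix is symmetric, so the upper triangle repeats the lower one; masking it removes the duplicated cells and the diagonal, which only shows each variable's correlation with itself.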
