Medical Data Visualizer: 3 errors

Harry · May 6, 2021, 8:52pm

Hey,

I’ve been working on the Medical Data Visualizer project but am having some problems finishing it off.

First is the catplot stage. When I just run the sns.catplot(…) command in a cell, the output looks fine, but the function itself fails the tests.

My code:

# Import data
df = pd.read_csv("medical_examination.csv")

# Add 'overweight' column
df['overweight'] = 0
df.loc[(df["weight"] / ((df["height"]/100)**2)) > 25, "overweight"] = 1

# Normalize data by making 0 always good and 1 always bad. If the value of 'cholesterol' or 'gluc' is 1, make the value 0. If the value is more than 1, make the value 1.
df.loc[df["cholesterol"] == 1, "cholesterol"] = 0
df.loc[df["cholesterol"] == 2, "cholesterol"] = 1
df.loc[df["cholesterol"] == 3, "cholesterol"] = 1

df.loc[df["gluc"] == 1, "gluc"] = 0
df.loc[df["gluc"] == 2, "gluc"] = 1
df.loc[df["gluc"] == 3, "gluc"] = 1

# Draw Categorical Plot
def draw_cat_plot():
    # Create DataFrame for cat plot using pd.melt using just the values from 'cholesterol', 'gluc', 'smoke', 'alco', 'active', and 'overweight'.
    # Group and reformat the data to split it by 'cardio'. Show the counts of each feature. You will have to rename one of the columns for the catplot to work correctly.

    df_cat = df.drop(["id", "age", "gender", "ap_lo", "weight", "height", "ap_hi"], axis =1)
    df_cat = df_cat.melt(id_vars = "cardio")

    # Draw the catplot with 'sns.catplot()'
    fig, ax = plt.subplots(figsize = (11, 9))

    sns.catplot(data = df_cat, kind = "count",
                x = "variable", hue = "value",
                col = "cardio")

    # Do not modify the next two lines
    fig.savefig('catplot.png')
    return fig

It throws errors saying:

======================================================================
FAIL: test_bar_plot_number_of_bars (test_module.CatPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):

  File "/home/runner/boilerplate-medical-data-visualizer-1/test_module.py", line 28, in test_bar_plot_number_of_bars
    self.assertEqual(actual, expected, "Expected a different number of bars chart.")
AssertionError: 1 != 13 : Expected a different number of bars chart.

======================================================================
FAIL: test_line_plot_labels (test_module.CatPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):

  File "/home/runner/boilerplate-medical-data-visualizer-1/test_module.py", line 15, in test_line_plot_labels
    self.assertEqual(actual, expected, "Expected line plot xlabel to be 'variable'")

But when I actually just run the sns.catplot(…) command, it looks fine. Including the “variable” labels.

The other issues I’m having are with my heat map. As far as I can tell my data cleaning was fine, but a few of the correlation values are wrong after rounding:

cholesterol-ap_lo is 0.15 unrounded, rounds to 0.1, and should be 0.2
overweight-height is -0.16 unrounded, rounds to -0.2, and should be -0.1
weight-height is 0.25 unrounded, rounds to 0.2, and should be 0.3

Two of these seem to just round the wrong way. My code is:

# Normalize data by making 0 always good and 1 always bad. If the value of 'cholesterol' or 'gluc' is 1, make the value 0. If the value is more than 1, make the value 1.
df.loc[df["cholesterol"] == 1, "cholesterol"] = 0
df.loc[df["cholesterol"] == 2, "cholesterol"] = 1
df.loc[df["cholesterol"] == 3, "cholesterol"] = 1

df.loc[df["gluc"] == 1, "gluc"] = 0
df.loc[df["gluc"] == 2, "gluc"] = 1
df.loc[df["gluc"] == 3, "gluc"] = 1

My draw_heat_map code is the following:

# Draw Heat Map
def draw_heat_map():
    # Clean the data
    df_heat = df

    df_heat = df_heat[df_heat["ap_lo"] <= df_heat["ap_hi"]]
    df_heat = df_heat[df_heat["height"] >= df_heat["height"].quantile(0.025)]
    df_heat = df_heat[df_heat["height"] <= df_heat["height"].quantile(0.975)]
    df_heat = df_heat[df_heat["weight"] >= df_heat["weight"].quantile(0.025)]
    df_heat = df_heat[df_heat["weight"] <= df_heat["weight"].quantile(0.975)]

    # Calculate the correlation matrix
    corr = df_heat.corr()

    # Generate a mask for the upper triangle
    mask = np.triu(np.ones_like(corr, dtype=bool))

    # Set up the matplotlib figure
    fig, ax = plt.subplots(figsize = (11, 9))

    # Draw the heatmap with 'sns.heatmap()'
    sns.heatmap(corr, mask = mask, vmax = 0.3, center = 0, annot = True,
                square = True, linewidths=0.5, cbar_kws={"shrink": 0.5})

    # Do not modify the next two lines
    fig.savefig('heatmap.png')
    return fig

Can anyone tell where I’m going wrong? It seems like I’m 99% there but am having some errors that ruin the whole thing.

LockStockAnd2SB · May 26, 2021, 5:34pm

I’ve been having exactly the same issues and haven’t been able to find a solution.

jeremy.a.gray · May 26, 2021, 10:34pm

Welcome to the forums @LockStockAnd2SB.

I don’t remember seeing this original post, but since there weren’t any code blocks or a link to the project on repl.it (or similar) I probably passed on it.

If you’ll post your code and specific errors (in code blocks) or a link to your project (preferably on repl.it) you’re far more likely to get some assistance.

LockStockAnd2SB · May 27, 2021, 5:41pm

Ok. Thanks for your advice. So this is my code:

So basically I’m having issues with the rounding of the correlation values. And for some reason, seaborn automatically converts the “0.0” strings from the correlation matrix to “0”, which results in another error.
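
For context on the “0.0” turning into “0”: as far as I know, sns.heatmap() formats its annotations with fmt='.2g' unless told otherwise, and that is an ordinary Python format spec. A minimal sketch using only the built-in format() shows how that spec differs from a fixed one-decimal spec. The render helper is hypothetical, and the sample values are the ones quoted earlier in the thread.

```python
# Compare the general format ".2g" (assumed here to be seaborn's default
# heatmap annotation format) with the fixed one-decimal format ".1f".
values = [0.0, 0.15, -0.16, 0.25]


def render(value, spec):
    """Format a number with a Python format spec, the way annotation text is built."""
    return format(value, spec)


for v in values:
    # ".2g" keeps two significant digits and drops trailing zeros;
    # ".1f" always shows exactly one digit after the decimal point.
    print(f"{v!r:>6}  .2g -> {render(v, '.2g'):>6}  .1f -> {render(v, '.1f'):>5}")
```

With ".2g" a zero is rendered without a decimal point, which is why the annotation text can differ from what a test comparing one-decimal strings expects.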

jeremy.a.gray · May 28, 2021, 12:58am

The first problem is right here:

def draw_heat_map():
    # Clean the data
    df_heat = df
    pressure = df_heat['ap_lo'] <= df_heat['ap_hi']
    df_heat = df_heat[pressure]

    height_1 = df_heat['height'] >= df_heat['height'].quantile(0.025)
    df_heat = df_heat[height_1]

    height_2 = df_heat['height'] <= df_heat['height'].quantile(0.975)
    df_heat = df_heat[height_2]

    weight_1 = df_heat['weight'] >= df_heat['weight'].quantile(0.025)
    df_heat = df_heat[weight_1]

    weight_2 = df_heat['weight'] <= df_heat['weight'].quantile(0.975)
    df_heat = df_heat[weight_2]

You can’t do these steps sequentially; the conditions need to be logically ANDed together into a single mask so that any record that fails one or more of the criteria is removed. Filtering sequentially computes each quantile on data that has already been filtered, which shifts the thresholds. If you print the shape of the dataframe after the sequential cleaning and after the AND cleaning, the former is smaller. There’s other discussion about this around the forums. This will fix your test failure. You’ll also need to pass sns.heatmap() a fmt parameter to force using one decimal place.
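
A minimal sketch of the difference, assuming a hypothetical toy dataframe rather than the real medical_examination.csv (the function names clean_combined and clean_sequential are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: heights 0..99, weights in reverse order, and
# blood-pressure columns that always pass the ap_lo <= ap_hi check.
height = np.arange(100)
df = pd.DataFrame({
    "ap_lo": 80,
    "ap_hi": 120,
    "height": height,
    "weight": 99 - height,
})


def clean_combined(data):
    """Apply every criterion at once; all quantiles come from the full data."""
    mask = (
        (data["ap_lo"] <= data["ap_hi"])
        & (data["height"] >= data["height"].quantile(0.025))
        & (data["height"] <= data["height"].quantile(0.975))
        & (data["weight"] >= data["weight"].quantile(0.025))
        & (data["weight"] <= data["weight"].quantile(0.975))
    )
    return data[mask]


def clean_sequential(data):
    """Filter step by step; each quantile comes from already-filtered data."""
    out = data[data["ap_lo"] <= data["ap_hi"]]
    out = out[out["height"] >= out["height"].quantile(0.025)]
    out = out[out["height"] <= out["height"].quantile(0.975)]
    out = out[out["weight"] >= out["weight"].quantile(0.025)]
    out = out[out["weight"] <= out["weight"].quantile(0.975)]
    return out


combined = clean_combined(df)
sequential = clean_sequential(df)
print("combined:", combined.shape, "sequential:", sequential.shape)
```

In the project, the combined result is what would go into .corr(), and the sns.heatmap() call would additionally take something like fmt='.1f'.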

The test errors are both here:

    fig = sns.catplot(x='variable', y=None, hue='value',  kind='count', col='cardio', data=df_cat)
    fig.set_axis_labels('variable', 'total')

sns.catplot() returns a FacetGrid object, not a matplotlib figure object. The figure is available as the FacetGrid’s .fig attribute, i.e. sns.catplot(…).fig. Once you fix that, you’ll get errors about setting the axis labels, which you don’t need to do since seaborn does it for you. Then you’ll get errors about your plot not having a y-axis label of total. Seaborn takes that label from the column of your dataframe that holds the counts, so you’ll need to rework the dataframe so that the column is named total and not count.
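
One possible way to put those pieces together, as a sketch only: it uses a hypothetical toy dataframe, and it swaps kind='count' for kind='bar' on pre-aggregated counts so that the y column can be named total. That is an assumption about how to get the label, not necessarily the approach used in the thread.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import pandas as pd
import seaborn as sns
from matplotlib.figure import Figure

# Hypothetical toy data with the columns the project melts.
df = pd.DataFrame({
    "cardio":      [0, 0, 1, 1, 0, 1],
    "cholesterol": [0, 1, 1, 0, 0, 1],
    "gluc":        [0, 0, 1, 1, 0, 0],
    "smoke":       [0, 1, 0, 1, 0, 0],
    "alco":        [0, 0, 0, 1, 1, 0],
    "active":      [1, 1, 0, 1, 0, 1],
    "overweight":  [1, 0, 1, 1, 0, 1],
})


def draw_cat_plot(data) -> Figure:
    # Long format: one row per (cardio, variable, value) observation.
    df_cat = pd.melt(
        data,
        id_vars=["cardio"],
        value_vars=["cholesterol", "gluc", "smoke", "alco", "active", "overweight"],
    )
    # Count the observations per group and name the count column 'total'.
    df_cat = (
        df_cat.groupby(["cardio", "variable", "value"])
        .size()
        .reset_index(name="total")
    )
    # catplot returns a FacetGrid; the matplotlib figure is its .fig attribute.
    grid = sns.catplot(
        data=df_cat, kind="bar",
        x="variable", y="total", hue="value", col="cardio",
    )
    return grid.fig


fig = draw_cat_plot(df)
print(type(fig).__name__, fig.axes[0].get_xlabel(), fig.axes[0].get_ylabel())
```

Because the returned object is the figure itself, the project’s fig.savefig('catplot.png') line works without a separate plt.subplots() call.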

LockStockAnd2SB · May 28, 2021, 2:08am

Thank you very much, it worked

system · November 26, 2021, 2:09pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.
