Medical Data Visualizer Confusion

ValueError: Could not interpret input 'total'

'total' isn't in your data. If you check df_cat.columns, there is no 'total'. Remove 'total' from your catplot().

Use kind='count' and hue='value' too. Your code still doesn't match what ArbyC mentioned.
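A minimal sketch of what the two replies above describe, using a tiny made-up DataFrame (a hypothetical stand-in, not the real medical_examination.csv). After pd.melt the only columns are cardio, variable and value, which is why catplot cannot find 'total'. With kind='count', seaborn counts the rows itself, so no 'total' column is needed.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; no display needed
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for medical_examination.csv (only three columns)
df = pd.DataFrame({
    "cardio": [0, 1, 1],
    "active": [1, 0, 1],
    "smoke": [0, 0, 1],
})

# Long format: one row per (patient, feature) pair
df_cat = pd.melt(df, id_vars="cardio", value_vars=["active", "smoke"])
print(df_cat.columns.tolist())  # ['cardio', 'variable', 'value']

# kind="count" counts rows per x/hue group, so no 'total' column is needed
g = sns.catplot(data=df_cat, kind="count", x="variable", hue="value", col="cardio")
```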

Hi @nmiquan, I have made yet another attempt following ArbyC's directions and yours, and even started a new assignment altogether, yet the error message is unchanged. Below is the code:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Import data
df = pd.read_csv('medical_examination.csv')

# Add 'overweight' column
df['overweight'] = (df['weight']/ (df['height']/ 100 **2))
df['overweight'] = df['overweight'].apply(lambda x: 1 if x > 25 else 0)

# Normalize data by making 0 always good and 1 always bad. If the value of 'cholesterol' or 'gluc' is 1, make the value 0. If the value is more than 1, make the value 1.

df['gluc'] = df['gluc'].apply(lambda x : 0 if x == 1 else 1)

df.loc[df['cholesterol'] == 1, 'cholesterol'] = 0
df.loc[df['cholesterol'] > 1, 'cholesterol'] = 1

# Draw Categorical Plot
def draw_cat_plot():
    # Create DataFrame for cat plot using `pd.melt` using just the values from 'cholesterol', 'gluc', 'smoke', 'alco', 'active', and 'overweight'.
    df_cat = pd.melt(df, value_vars=['active', 'alco', 'cholesterol', 'gluc', 'overweight', 'smoke'], id_vars ='cardio')


    # Group and reformat the data to split it by 'cardio'. Show the counts of each feature. You will have to rename one of the columns for the catplot to work correctly.
    #df_cat = None

    # Draw the catplot with 'sns.catplot()'

    fig = sns.catplot(data=df_cat, kind="count",  x="variable", hue="value", col="cardio")

    # Do not modify the next two lines
    fig.savefig('catplot.png')
    return fig


# Draw Heat Map
def draw_heat_map():
    # Clean the data
    df_heat = df[(df['ap_lo'] <= df['ap_hi']) & 
    df['height'] >= (df['height'].quantile(0.025)) &
    df['height'] <= (df['height'].quantile(0.975)) &
    df['weight'] >= (df['weight'].quantile(0.025)) &
    df['weight'] <= (df['weight'].quantile(0.975))
    ]

    # Calculate the correlation matrix
    corr = df_heat.corr()

    # Generate a mask for the upper triangle
    mask = np.triu(corr)



    # Set up the matplotlib figure
    fig, ax = plt.subplots(figsize=(9,9))

    # Draw the heatmap with 'sns.heatmap()'
    sns.heatmap(corr, linewidths=1, mask=mask, vmax=.3, center=0.09,square=True, cbar_kws = {'orientation' : 'horizontal'})

    # Do not modify the next two lines
    fig.savefig('heatmap.png')
    return fig
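A side note on the mask line, not something raised in the thread: seaborn hides the cells where the mask is True. np.triu(corr) returns the correlation values themselves with the lower triangle zeroed, so it only works as a mask because seaborn treats non-zero values as True; an upper-triangle correlation that is exactly 0.0 would stay visible. The common idiom builds the mask from ones instead. The 3x3 matrix below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 correlation matrix with an exact 0.0 above the diagonal
corr = pd.DataFrame(
    [[1.0, 0.0, 0.3],
     [0.0, 1.0, -0.2],
     [0.3, -0.2, 1.0]],
    index=list("abc"), columns=list("abc"),
)

# np.triu(corr) keeps the upper-triangle *values*; as booleans, the 0.0 at (a, b) is False
value_mask = np.triu(corr).astype(bool)

# Boolean mask: True on and above the diagonal, regardless of the values
bool_mask = np.triu(np.ones_like(corr, dtype=bool))

print(value_mask)
print(bool_mask)
```

Passing bool_mask as mask= to sns.heatmap hides the diagonal and everything above it, whatever the correlations are.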


Your draw_cat_plot() works fine.

Check parentheses here

    df_heat = df[(df['ap_lo'] <= df['ap_hi']) & 
    df['height'] >= (df['height'].quantile(0.025)) &
    df['height'] <= (df['height'].quantile(0.975)) &
    df['weight'] >= (df['weight'].quantile(0.025)) &
    df['weight'] <= (df['weight'].quantile(0.975))
    ]

in your draw_heat_map().
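The reason the parentheses matter is operator precedence: in Python, & binds more tightly than comparisons such as <= and >=, so each condition has to be wrapped in its own parentheses. A small sketch with made-up rows (only the columns the filter needs):

```python
import pandas as pd

# Hypothetical rows; only the columns needed for the filter
df = pd.DataFrame({
    "ap_lo": [80, 80, 130, 85, 90],
    "ap_hi": [120, 125, 110, 130, 140],
    "height": [160, 165, 170, 175, 180],
})
lo = df["height"].quantile(0.025)
hi = df["height"].quantile(0.975)

# & binds more tightly than <= and >=, so each comparison needs its own parentheses
keep = (
    (df["ap_lo"] <= df["ap_hi"])
    & (df["height"] >= lo)
    & (df["height"] <= hi)
)
df_heat = df[keep]
print(df_heat.index.tolist())  # [1, 3]

# Without them, Python reads this as the chained comparison
# df["ap_lo"] <= (df["ap_hi"] & df["height"]) >= lo, which needs the truth value
# of a whole Series, so pandas raises a ValueError
try:
    df[df["ap_lo"] <= df["ap_hi"] & df["height"] >= lo]
    unparenthesized_error = None
except ValueError as err:
    unparenthesized_error = err
print(unparenthesized_error)
```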

Thanks @nmiquan; I fixed the parentheses and it looks better, except that the heatmap values test now fails with "Diff is 962 characters long". Any idea why?

def draw_heat_map():
    # Clean the data
    df_heat = df[
    (df['ap_lo'] <= df['ap_hi']) & 
    (df['height'] >= (df['height'].quantile(0.025))) &
    (df['height'] <= (df['height'].quantile(0.975))) &
    (df['weight'] >= (df['weight'].quantile(0.025))) &
    (df['weight'] <= (df['weight'].quantile(0.975)))
    ]

    # Calculate the correlation matrix
    corr = df_heat.corr()

    # Generate a mask for the upper triangle
    mask = np.triu(corr)

    # Set up the matplotlib figure
    fig, ax = plt.subplots(figsize=(9,9))

    # Draw the heatmap with 'sns.heatmap()'
    sns.heatmap(corr, linewidths=1, mask=mask, vmax=.3, center=0.09,square=True, cbar_kws = {'orientation' : 'horizontal'})

    # Do not modify the next two lines
    fig.savefig('heatmap.png')
    return fig

======================================================================
FAIL: test_heat_map_values (test_module.HeatMapTestCase)

Traceback (most recent call last):
  File "/home/runner/ColossalRegalProject/test_module.py", line 47, in test_heat_map_values
    self.assertEqual(actual, expected, "Expected differnt values in heat map.")
AssertionError: Lists differ: [] != ['0.0', '0.0', '-0.0', '0.0', '-0.1', '0.5[628 chars], '']

Second list contains 94 additional elements.
First extra element 0:
'0.0'

Diff is 962 characters long. Set self.maxDiff to None to see it. : Expected differnt values in heat map.


Ran 4 tests in 8.669s

FAILED (failures=1, errors=2)

Manually check whether the values in your heatmap are correct. If they are, I suspect there is an error in the unit test.

If you look at the saved plot, are there values inside the squares? The annot argument to sns.heatmap handles that, along with the fmt argument to format the annotations.
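A minimal sketch of the annot/fmt suggestion, on a made-up random DataFrame rather than the medical data. With annot=True each value is drawn as text inside its square, and fmt='.1f' rounds it to one decimal, the same style as the values the test compares ('0.0', '-0.1' and so on).

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical random data, only to get a correlation matrix to plot
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
corr = demo.corr()

fig, ax = plt.subplots(figsize=(4, 4))
# annot=True writes each cell's value inside its square;
# fmt=".1f" formats it with one decimal place
sns.heatmap(corr, annot=True, fmt=".1f", square=True, ax=ax)

# The annotations are ordinary matplotlib Text objects on the Axes
labels = [t.get_text() for t in ax.texts]
print(labels)
```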

If that doesn't help, can you paste a link to your code on repl.it (or somewhere else)? That will make it easier to help.

Hi @sanity - below is my code pasted in along with the repl.it URL and the heatmap plot image.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Import data
df = pd.read_csv('medical_examination.csv')

# Add 'overweight' column
df['overweight'] = (df['weight']/ (df['height']/ 100 **2))
df['overweight'] = df['overweight'].apply(lambda x: 1 if x > 25 else 0)

# Normalize data by making 0 always good and 1 always bad. If the value of 'cholesterol' or 'gluc' is 1, make the value 0. If the value is more than 1, make the value 1.

df['gluc'] = df['gluc'].apply(lambda x : 0 if x == 1 else 1)

df.loc[df['cholesterol'] == 1, 'cholesterol'] = 0
df.loc[df['cholesterol'] > 1, 'cholesterol'] = 1

# Draw Categorical Plot
def draw_cat_plot():
    # Create DataFrame for cat plot using `pd.melt` using just the values from 'cholesterol', 'gluc', 'smoke', 'alco', 'active', and 'overweight'.
    df_cat = pd.melt(df, value_vars=['active', 'alco', 'cholesterol', 'gluc', 'overweight', 'smoke'], id_vars ='cardio')


    # Group and reformat the data to split it by 'cardio'. Show the counts of each feature. You will have to rename one of the columns for the catplot to work correctly.
    #df_cat = None

    # Draw the catplot with 'sns.catplot()'

    fig = sns.catplot(data=df_cat, kind="count",  x="variable", hue="value", col="cardio")

    # Do not modify the next two lines
    fig.savefig('catplot.png')
    return fig


# Draw Heat Map
def draw_heat_map():
    # Clean the data
    df_heat = df[
    (df['ap_lo'] <= df['ap_hi']) & 
    (df['height'] >= (df['height'].quantile(0.025))) &
    (df['height'] <= (df['height'].quantile(0.975))) &
    (df['weight'] >= (df['weight'].quantile(0.025))) &
    (df['weight'] <= (df['weight'].quantile(0.975)))
    ]

    # Calculate the correlation matrix
    corr = df_heat.corr()

    # Generate a mask for the upper triangle
    mask = np.triu(corr)

    # Set up the matplotlib figure
    fig, ax = plt.subplots(figsize=(9,9))

    # Draw the heatmap with 'sns.heatmap()'
    sns.heatmap(corr,annot=True, fmt='.1f', linewidths=1, mask=mask, vmax=.8, center=0.09,square=True, cbar_kws = {'shrink':0.5})

    # Do not modify the next two lines
    fig.savefig('heatmap.png')
    return fig

Alright, regarding the errors in draw_cat_plot: sns.catplot doesn't return a figure, it returns a FacetGrid object. Its fig attribute gives access to the underlying figure.
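A quick way to see the difference, sketched on made-up melted data: catplot hands back a FacetGrid, and its fig attribute is the matplotlib Figure that savefig expects.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen
import matplotlib.figure
import pandas as pd
import seaborn as sns

# Hypothetical long-format data, shaped like the output of pd.melt
df_cat = pd.DataFrame({
    "cardio": [0, 0, 1, 1],
    "variable": ["active", "smoke", "active", "smoke"],
    "value": [1, 0, 0, 1],
})

g = sns.catplot(data=df_cat, kind="count", x="variable", hue="value", col="cardio")
print(type(g).__name__)  # FacetGrid

# The FacetGrid wraps a matplotlib Figure; .fig returns it
# (newer seaborn versions also expose it as g.figure)
fig = g.fig
print(type(fig).__name__)  # Figure
```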

Regarding draw_heat_map: as the plot shows, some values are missing. Take a closer look at how the df['overweight'] column is added and make sure it's correct.
Once that's sorted, you will probably run into another issue: some differences in the ax data returned by the function, caused by a different matplotlib version. To work around it without changing the test, you can force repl.it to use an older version: in the poetry.lock file, change the matplotlib version to 3.2.2, then re-run so repl.it updates the dependencies.
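On the overweight hint: the earlier versions of the code compute df['weight'] / (df['height'] / 100 **2). Because ** binds more tightly than /, 100 ** 2 is evaluated first, so the height is divided by 10000 and virtually every row ends up flagged as overweight. A sketch with two made-up rows:

```python
import pandas as pd

# Hypothetical rows: height in cm, weight in kg
df = pd.DataFrame({"height": [170, 160], "weight": [70, 80]})

# ** binds more tightly than /, so this divides by height / 10000, not (height / 100) ** 2
bmi_wrong = df["weight"] / (df["height"] / 100 ** 2)

# Intended BMI: convert cm to m first, then square
bmi = df["weight"] / (df["height"] / 100) ** 2
df["overweight"] = (bmi > 25).astype(int)  # same result as .apply(lambda x: 1 if x > 25 else 0)

print(bmi_wrong.round().tolist())  # huge values, so every row looks overweight
print(df["overweight"].tolist())   # [0, 1]
```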

@sanity Thank you very much for your patience and assistance. Based on your last post, I have fixed df['overweight'] and used the .fig attribute to get the figure from sns.catplot. Lastly, I changed the matplotlib version to 3.2.2 in the poetry.lock file, and everything seems okay except for one last thing: the test fails because the y-label isn't 'total'. I have pasted all my code below, along with the error message and the test file code. Thank you in advance.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Import data
df = pd.read_csv('medical_examination.csv')

# Add 'overweight' column
df['overweight'] = (df['weight']/ (df['height']/ 100) **2).apply(lambda x : 1 if x > 25 else 0)


# Normalize data by making 0 always good and 1 always bad. If the value of 'cholesterol' or 'gluc' is 1, make the value 0. If the value is more than 1, make the value 1.

df['gluc'] = df['gluc'].apply(lambda x : 0 if x == 1 else 1)

df.loc[df['cholesterol'] == 1, 'cholesterol'] = 0
df.loc[df['cholesterol'] > 1, 'cholesterol'] = 1

# Draw Categorical Plot
def draw_cat_plot():
    # Create DataFrame for cat plot using `pd.melt` using just the values from 'cholesterol', 'gluc', 'smoke', 'alco', 'active', and 'overweight'.
    df_cat = pd.melt(df, value_vars=['active', 'alco', 'cholesterol', 'gluc', 'overweight', 'smoke'], id_vars ='cardio')


    # Group and reformat the data to split it by 'cardio'. Show the counts of each feature. You will have to rename one of the columns for the catplot to work correctly.
    #df_cat = None

    # Draw the catplot with 'sns.catplot()'

    fig = sns.catplot(data=df_cat, kind='count',  x='variable', hue='value', col='cardio').fig

    # Do not modify the next two lines
    fig.savefig('catplot.png')
    return fig


# Draw Heat Map
def draw_heat_map():
    # Clean the data
    df_heat = df[
    (df['ap_lo'] <= df['ap_hi']) & 
    (df['height'] >= (df['height'].quantile(0.025))) &
    (df['height'] <= (df['height'].quantile(0.975))) &
    (df['weight'] >= (df['weight'].quantile(0.025))) &
    (df['weight'] <= (df['weight'].quantile(0.975)))
    ]

    # Calculate the correlation matrix
    corr = df_heat.corr()

    # Generate a mask for the upper triangle
    mask = np.triu(corr)

    # Set up the matplotlib figure
    fig, ax = plt.subplots(figsize=(9,9))

    # Draw the heatmap with 'sns.heatmap()'
    sns.heatmap(corr,annot=True, fmt='.1f', linewidths=1, mask=mask, vmax=.8, center=0.09,square=True, cbar_kws = {'shrink':0.5})

    # Do not modify the next two lines
    fig.savefig('heatmap.png')
    return fig


FAILURE message:

.F.['0.0', '0.0', '-0.0', '0.0', '-0.1', '0.5', '0.0', '0.1', '0.1', '0.3', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.2', '0.1', '0.0', '0.2', '0.1', '0.0', '0.1', '-0.0', '-0.1', '0.1', '0.0', '0.2', '0.0', '0.1', '-0.0', '-0.0', '0.1', '0.0', '0.1', '0.4', '-0.0', '-0.0', '0.3', '0.2', '0.1', '-0.0', '0.0', '0.0', '-0.0', '-0.0', '-0.0', '0.2', '0.1', '0.1', '0.0', '0.0', '0.0', '0.0', '0.3', '0.0', '-0.0', '0.0', '-0.0', '-0.0', '-0.0', '0.0', '0.0', '-0.0', '0.0', '0.0', '0.0', '0.2', '0.0', '-0.0', '0.2', '0.1', '0.3', '0.2', '0.1', '-0.0', '-0.0', '-0.0', '-0.0', '0.1', '-0.1', '-0.1', '0.7', '0.0', '0.2', '0.1', '0.1', '-0.0', '0.0', '-0.0', '0.1', '', '', '']
.
======================================================================
FAIL: test_line_plot_labels (test_module.CatPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/ColossalRegalProject/test_module.py", line 18, in test_line_plot_labels
    self.assertEqual(actual, expected, "Expected line plot ylabel to be 'total'")
AssertionError: 'count' != 'total'
- count
+ total
 : Expected line plot ylabel to be 'total'

----------------------------------------------------------------------
Ran 4 tests in 9.872s

FAILED (failures=1)
 

def test_line_plot_labels(self):
        actual = self.ax.get_xlabel()
        expected = "variable"
        self.assertEqual(actual, expected, "Expected line plot xlabel to be 'variable'")
        actual = self.ax.get_ylabel()
        expected = "total"
        self.assertEqual(actual, expected, "Expected line plot ylabel to be 'total'")
        actual = []
        for label in self.ax.get_xaxis().get_majorticklabels():
            actual.append(label.get_text())
        expected = ['active', 'alco', 'cholesterol', 'gluc', 'overweight', 'smoke']

That just means the y-axis label on the plot needs to be changed from 'count' to 'total'. There are a few ways to do that; for example, the FacetGrid object has methods for it.
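One of those FacetGrid methods is set_axis_labels(x_var, y_var). A sketch on made-up melted data, calling it on the grid before pulling out the figure:

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen
import pandas as pd
import seaborn as sns

# Hypothetical long-format data, shaped like the output of pd.melt
df_cat = pd.DataFrame({
    "cardio": [0, 0, 1, 1],
    "variable": ["active", "smoke", "active", "smoke"],
    "value": [1, 0, 0, 1],
})

g = sns.catplot(data=df_cat, kind="count", x="variable", hue="value", col="cardio")
# FacetGrid.set_axis_labels(x_var, y_var) relabels the outer axes of the grid
g.set_axis_labels("variable", "total")

fig = g.fig
print(fig.axes[0].get_ylabel())  # total
```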

@sanity, apologies for repeatedly asking for help.
Okay, so the catplot (bar graph) has its y-axis labelled "count", and that label needs to be changed to "total". I looked up the seaborn documentation, and it lists "methods to tweak presentation".
But when I use the following code to set the axis labels, it doesn't work:

fig.set_axis_labels("", "total")

Traceback (most recent call last):
  File "main.py", line 6, in <module>
    medical_data_visualizer.draw_cat_plot()
  File "/home/runner/ColossalRegalProject/medical_data_visualizer.py", line 32, in draw_cat_plot
    fig.set_axis_labels("", "total")
AttributeError: 'Figure' object has no attribute 'set_axis_labels'

Keep in mind that once you write fig = sns.catplot(...).fig, fig is no longer a FacetGrid but just a matplotlib Figure. So the label needs to be set on the FacetGrid before you take its fig attribute.
For example:

g = sns.catplot(...)
g.(...)  # setting y label in here
fig = g.fig
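If you would rather keep the one-liner that grabs .fig right away, the label can still be changed afterwards through the Figure's Axes with plain matplotlib. This is an alternative to the FacetGrid route, sketched on made-up data; fig.axes[0] is the left-hand facet.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen
import pandas as pd
import seaborn as sns

# Hypothetical long-format data, shaped like the output of pd.melt
df_cat = pd.DataFrame({
    "cardio": [0, 0, 1, 1],
    "variable": ["active", "smoke", "active", "smoke"],
    "value": [1, 0, 0, 1],
})

fig = sns.catplot(data=df_cat, kind="count", x="variable", hue="value", col="cardio").fig
# A plain Figure has no set_axis_labels, but each of its Axes has set_ylabel
fig.axes[0].set_ylabel("total")
print(fig.axes[0].get_ylabel())  # total
```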

Thanks heaps for all your help. I feel as if I asked for more help than ever on this assignment; it wasn't an easy one. I shall try a bit harder on the next assignment, though I am sure it won't be easy either. Thank you once again.

Hi, is anything wrong with this display? I've been stuck and am clueless.


I remember being advised that it should work without using groupby.
So the first step was to use the melt() function (df, with value_vars against id_vars), then pass the result to sns.catplot(x, kind, hue, col='cardio') and draw the fig. Sorry, like I said, this is what I recall. Let me know if that helps.

Try this:
df_cat = df_cat.groupby(['cardio', 'variable', 'value'], as_index=False).size().rename(columns={'size': 'total'})
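What that line does, as a sketch on made-up data: groupby(...).size() with as_index=False yields a DataFrame with a 'size' column (in reasonably recent pandas), and renaming it to 'total' lets catplot draw the ready-made counts with kind='bar' and y='total', which also makes 'total' the y-axis label. This is one option; kind='count' with a relabelled axis works too.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for medical_examination.csv
df = pd.DataFrame({
    "cardio": [0, 1, 1],
    "active": [1, 0, 1],
    "smoke": [0, 0, 1],
})
df_cat = pd.melt(df, id_vars="cardio", value_vars=["active", "smoke"])

# One row per (cardio, variable, value) with its count in a 'total' column
df_cat = (
    df_cat.groupby(["cardio", "variable", "value"], as_index=False)
    .size()
    .rename(columns={"size": "total"})
)
print(df_cat)

# The counts are already computed, so draw bars with y="total" instead of kind="count"
g = sns.catplot(data=df_cat, kind="bar", x="variable", y="total", hue="value", col="cardio")
```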


Yes, it works perfectly fine. I wonder what the purpose is of creating a DataFrame grouped by cardio…

Doesn't work for me: I get a weird plot, and my legend shows two 1s and two 0s.

I got this solution for the catplot, and it worked!

df_cat = pd.melt(df, id_vars = "cardio", value_vars = ["active", "alco", "cholesterol", "gluc", "overweight", "smoke"])

fig = sns.catplot(x = "variable", hue = "value", col = "cardio", data = df_cat, kind = "count").set_axis_labels("variable", "total")