Three numerical errors in heatmap

Tell us what’s happening:
My solution for the medical data visualizer project has three numbers wrong in the heatmap and fails the test. The numbers are associated with weight/height, cholesterol/gluc and overweight/height and are each off by .1. I suspect I have made a mistake cleaning or normalizing the data, but I cannot find any mistakes. I have tried cleaning several different ways and continue to get the same heatmap errors. Any ideas or suggestions about why my solution fails the heatmap test would be appreciated.

I am also getting two errors that I do not understand:

ERROR: test_bar_plot_number_of_bars (test_module.CatPlotTestCase)

Traceback (most recent call last):
  File "/home/gary/Documents/a_Learning/DataScience/DataAnalysisProjects/3Medical_Data_Visualizer/test_module.py", line 26, in test_bar_plot_number_of_bars
    actual = len([rect for rect in self.ax.get_children() if isinstance(rect, mpl.patches.Rectangle)])
AttributeError: 'numpy.ndarray' object has no attribute 'get_children'

======================================================================
ERROR: test_line_plot_labels (test_module.CatPlotTestCase)

Traceback (most recent call last):
  File "/home/gary/Documents/a_Learning/DataScience/DataAnalysisProjects/3Medical_Data_Visualizer/test_module.py", line 13, in test_line_plot_labels
    actual = self.ax.get_xlabel()
AttributeError: 'numpy.ndarray' object has no attribute 'get_xlabel'

The catplot that my solution creates is identical to the example catplot provided by fCC, so I don’t understand why the errors occur.

I have very little experience with python testing, so any helpful suggestions would be appreciated.

Your code so far
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Import data
df = pd.read_csv("medical_examination.csv")

# Add 'overweight' column
df['overweight'] = df['weight']/(df['height']/100)**2
df.loc[df['overweight'] > 25, 'overweight'] = 1
df.loc[df['overweight'] > 1, 'overweight'] = 0

# Normalize data by making 0 always good and 1 always bad. If the value of 'cholesterol' or 'gluc' is 1, make the value 0. If the value is more than 1, make the value 1.
df.loc[df['cholesterol'] == 1, 'cholesterol'] = 0
df.loc[df['cholesterol'] > 1, 'cholesterol'] = 1

df.loc[df['gluc'] == 1, 'gluc'] = 0
df.loc[df['gluc'] > 1, 'gluc'] = 1

df_clean = df.loc[(df['height'] >= df['height'].quantile(0.025)) & (df['height'] <= df['height'].quantile(0.975))]

df_clean1 = df_clean.loc[(df_clean['weight'] >= df_clean['weight'].quantile(0.025)) & (df_clean['weight'] <= df_clean['weight'].quantile(0.975))]

# Draw Categorical Plot
def draw_cat_plot():
    # Create DataFrame for cat plot using pd.melt using just the values from 'cholesterol', 'gluc', 'smoke', 'alco', 'active', and 'overweight'.
    df_cat = pd.melt(df, id_vars=['cardio'], value_vars=['cholesterol', 'gluc', 'alco', 'active', 'smoke', 'overweight'])

    # Group and reformat the data to split it by 'cardio'. Show the counts of each feature. You will have to rename one of the columns for the catplot to work correctly.
    df_cat = df_cat.groupby(['cardio','variable','value']).size().reset_index(name='total')

    # Draw the catplot with 'sns.catplot()'
    fig = sns.catplot(x='variable',
                      y='total',
                      col='cardio',
                      hue='value',
                      legend=True,
                      data=df_cat,
                      kind='bar',
                      ci=None
                      )

    # Do not modify the next two lines
    fig.savefig('catplot.png')
    return fig

# Draw Heat Map
def draw_heat_map():
    # Clean the data
    df_heat = df_clean1.copy()

    # Calculate the correlation matrix
    corr = df_heat.corr()

    # Generate a mask for the upper triangle
    mask = np.triu(corr)

    # Set up the matplotlib figure
    sns.set_style('white')
    #sns.color_palette('rocket', as_cmap=True)
    sns.set(rc={'figure.figsize':(12, 12)})
    sns.set(rc={'figure.facecolor':'white'})

    # Draw the heatmap with 'sns.heatmap()'
    fig, ax = plt.subplots(figsize=(10,10))
    ax = sns.heatmap(corr,
                     annot=True,
                     mask=mask,
                     fmt='.1f',
                     linewidths=.5,
                     )

    # Do not modify the next two lines
    fig.savefig('heatmap.png')
    return fig


Challenge: Medical Data Visualizer


I have found the solution in the forum for the two errors listed in my post. I thought it might be a problem with the catplot object, but I was confused because the instructions said to use seaborn (which I did), so I thought I had the correct object.

The error messages said AttributeError: ‘numpy.ndarray’ object has no attribute ‘get_children’ and AttributeError: ‘numpy.ndarray’ object has no attribute ‘get_xlabel’. I did not understand why the messages referred to a “‘numpy.ndarray’ object”.

Could someone explain how I could have figured out that the object needed was a matplotlib object not a seaborn object?

Also, I still need help with the heatmap problem.

Thanks. Gary

The ‘numpy.ndarray’ is the actual object that got returned, but the test calls its “get_children” method, which that object doesn’t have. Hence the error saying “object X has no attribute Y”.
So at that point it’s “decently” clear that the object is wrong, provided you understand how these errors work, which in itself takes some time.

I only solved this error with the help of the forum myself, because I didn’t know what to do with it back then, so I can only reconstruct how one might solve it without that help. First you’d have to figure out what kind of object Seaborn returns AND what kind of object actually has a “get_children” attribute, and then how to get from one to the other.

Thanks for your suggestions, Jagaya. I will add researching object types and attributes to my todo list and work on this after I figure out why my heatmap plot yields wrong numbers. I appreciate your feedback. Gary

That’s actually quite simple: you have to do all the cleaning in one go, not in two steps as you are doing now.

The completely non-facetious answer is that you have to read the documentation for seaborn (and possibly matplotlib). The docs tell you what each function returns. Granted, it’s a bit dense and it takes some digging to determine how the seaborn objects are composed of matplotlib objects, but it’s all there. It’s also not completely consistent across the seaborn methods…

The other technique you can use is to run your code in an interactive interpreter. Generate your seaborn object and investigate it with dir(object), object.__dict__, type(object), help(object), etc. This is often faster than checking the docs and can yield immediately useful results. In this case, since the tests can’t find get_children and get_xlabel, running dir(object) on it will yield some likely candidates, on which you can run dir(object.candidate) to find the part of the seaborn object with the proper attributes.
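As a rough sketch of that kind of investigation: the snippet below assumes a reasonably recent seaborn (0.11.2 or later, where the grid exposes a figure attribute; older releases call it fig) and uses a made-up toy table in place of the real df_cat. The names toy and grid are only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import numpy as np
import pandas as pd
import seaborn as sns

# Toy stand-in for df_cat (made-up counts, not the project data).
toy = pd.DataFrame({
    "cardio":   [0, 0, 1, 1],
    "variable": ["smoke", "smoke", "smoke", "smoke"],
    "value":    [0, 1, 0, 1],
    "total":    [5, 2, 4, 3],
})

grid = sns.catplot(data=toy, x="variable", y="total",
                   hue="value", col="cardio", kind="bar")

# What did seaborn hand back, and what does it contain?
print(type(grid).__name__)       # FacetGrid
print(type(grid.axes).__name__)  # ndarray
print([name for name in dir(grid) if "fig" in name or "ax" in name])

# Indexing the grid's axes gives a row of the array, not a single Axes,
# so it has neither get_xlabel nor get_children.
print(hasattr(grid.axes[0], "get_xlabel"))  # False

# The underlying matplotlib Figure keeps its Axes in a plain list.
fig = grid.figure
print(type(fig).__name__)                   # Figure
print(hasattr(fig.axes[0], "get_xlabel"))   # True
```

If the tests index axes[0] on whatever draw_cat_plot() returns, this would explain the ‘numpy.ndarray’ in the error messages, and it suggests returning the grid’s figure rather than the grid itself.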

The reason the tests are this way is to avoid coupling the tests and implementation. As long as you have the correct information in the appropriate matplotlib objects, the tests will pass. It doesn’t matter if you build them with seaborn, pandas, matplotlib, or any other utility.

Thanks, but I am afraid I do not understand. I tried ‘cleaning in one go’ as follows:

df['ap_lo'] <= df['ap_hi']
df['height'] >= df['height'].quantile(0.025)
df['height'] <= df['height'].quantile(0.975)
df['weight'] >= df['weight'].quantile(0.025)
df['weight'] <= df['weight'].quantile(0.975)
and
df = df.loc[df['ap_lo'] <= df['ap_hi']]
df = df.loc[(df['weight'] >= df['weight'].quantile(0.025)) &
            (df['weight'] <= df['weight'].quantile(0.975))]
df = df.loc[(df['height'] >= df['height'].quantile(0.025)) &
            (df['height'] <= df['height'].quantile(0.975))]

and neither yields the correct heatmap.

I thought you meant that I should not clean the data as I did to df_clean and then the df_clean1 dataframes, but to clean the original dataframe.

I apologize for my ignorance and would appreciate any clarification about ‘cleaning in one go’ you might provide. Gary

The second version does the cleaning in 3 separate calculations, so that’s wrong.
The first one… What even is that? There is no assignment happening. Just 5 comparisons with no connection to anything?

I mean, the first one looks good. Just properly combine it and plug it into a selection and it should work.
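One way to read “properly combine it and plug it into a selection” is sketched below. The rows are made up for illustration (only the column names come from the project data), and the names toy, keep and cleaned are hypothetical.

```python
import pandas as pd

# Made-up rows for illustration; the column names follow the project data.
toy = pd.DataFrame({
    "height": [150, 160, 165, 170, 175, 180, 200],
    "weight": [60, 40, 65, 70, 75, 120, 80],
    "ap_hi":  [120, 120, 110, 120, 130, 120, 120],
    "ap_lo":  [80, 80, 120, 80, 80, 80, 80],
})

# Every comparison is evaluated against the same, unfiltered frame...
keep = (
    (toy["ap_lo"] <= toy["ap_hi"])
    & (toy["height"] >= toy["height"].quantile(0.025))
    & (toy["height"] <= toy["height"].quantile(0.975))
    & (toy["weight"] >= toy["weight"].quantile(0.025))
    & (toy["weight"] <= toy["weight"].quantile(0.975))
)

# ...and the combined boolean mask is applied in a single selection.
cleaned = toy.loc[keep]
print(cleaned.index.tolist())  # [3, 4]
```

Each comparison needs its own parentheses because & binds more tightly than <= and >= in Python.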

Thanks so much, Jagaya. My code finally passed the tests with the cleaning operations grouped together.

I guess my clue was that all the cleaning operations were listed as bullet points under the cleaning instruction (meaning - do them as one operation?).

However, I still do not understand why the results are wrong if I do the data cleaning one step at a time vs. doing them all together. Could you explain why there is a difference?

I very much appreciate that you spend your time to help me.

Thanks again, Gary.

Thanks for the information and suggestions, Jeremy. This will help me immensely when I begin researching object types and attributes in more detail. I appreciated the time you took to provide this feedback.

Gary

So in blunt numbers: Imagine you got the numbers 1-10 and want to filter out all numbers above 50% twice.
If you combine the commands into one selection, you throw out the numbers above 5 twice → which is redundant but let’s ignore that, you are left with 1-5.
If you do it one after the other, you first filter out above 5 and then above 2 → so you are left with 1-2.

Because a “quantile” is just the value below which a certain “percentage” of the current data falls, it is recomputed from whatever rows are left. So if you don’t do it in one go, the first filter will influence the examples left for the second one.
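The numbers 1-10 example can be tried directly in pandas. This is only a small sketch with hypothetical variable names. With pandas’ default interpolation the second cutoff lands at 3 rather than 2, so the step-by-step version keeps 1-3, but the effect is the same: fewer numbers survive than when everything is done in one go.

```python
import pandas as pd

numbers = pd.Series(range(1, 11))  # 1, 2, ..., 10

# One go: both conditions use the cutoff of the full data (5.5).
one_go = numbers[(numbers <= numbers.quantile(0.5))
                 & (numbers <= numbers.quantile(0.5))]

# Step by step: the second cutoff is computed on what the first step left.
step1 = numbers[numbers <= numbers.quantile(0.5)]   # cutoff 5.5
step2 = step1[step1 <= step1.quantile(0.5)]         # cutoff 3.0

print(one_go.tolist())  # [1, 2, 3, 4, 5]
print(step2.tolist())   # [1, 2, 3]
```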

Thanks, Jagaya. Once you explained it, it becomes so obvious. How could I miss that?

Thanks for everything. I got my certification.

Gary
