Problems with "Data Analysis with Python Projects - Medical Data Visualizer"

Hi.

I'm facing many problems and odd contradictions in this project.

1- The README does not specify which columns the catplot should use, although it says the plot should be similar to the example plot, and test_module checks for those columns. But the code comments say I should plot a different set of columns.

2- The code comments ask me to do the following:

# Group and reformat the data to split it by 'cardio'. Show the counts of each feature. You will have to rename one of the columns for the catplot to work correctly.

Which I find completely stupid, unnecessary and wrong. I just did this and it worked perfectly:

df_cat = pd.melt(df[["cardio", "cholesterol", "gluc", "smoke", "alco", "active", "overweight"]], id_vars="cardio")

# Draw the catplot with 'sns.catplot()'
fig = sns.catplot(x="variable", col="cardio", hue="value", data=df_cat, kind="count")

fig.set_axis_labels("variable", "total")

Is there any other way?
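For what it's worth, one possible reading of that code comment is that the counts should be computed with pandas before plotting, instead of letting `kind="count"` do it. Here is a minimal sketch of that idea. The tiny DataFrame is made up for illustration, and the column name `total` is an assumption taken from the axis label used above, not something the comment confirms.

```python
import pandas as pd

# Made-up sample standing in for the project's DataFrame (illustrative values only).
df = pd.DataFrame({
    "cardio":      [0, 0, 1, 1, 1],
    "cholesterol": [0, 1, 1, 1, 0],
    "smoke":       [0, 0, 1, 0, 0],
})

# Long format: one row per (cardio, variable, value) observation.
df_cat = pd.melt(df, id_vars="cardio", value_vars=["cholesterol", "smoke"])

# Count each (cardio, variable, value) combination and name the count column 'total'.
df_counts = (
    df_cat.groupby(["cardio", "variable", "value"])
          .size()
          .reset_index(name="total")
)
print(df_counts)
```

A frame like `df_counts` could then be passed to `sns.catplot(..., y="total", kind="bar")`. The shortcut with `kind="count"` draws the same bars, so either route should give an equivalent plot.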

3- Now, even when plotting an exact copy of the example plot and following all the conditions, I get the following errors:

ERROR: test_bar_plot_number_of_bars (test_module.CatPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/fcc-medical-data-visualizer/test_module.py", line 27, in test_bar_plot_number_of_bars
    actual = len([rect for rect in self.ax.get_children()
AttributeError: 'numpy.ndarray' object has no attribute 'get_children'

======================================================================
ERROR: test_line_plot_labels (test_module.CatPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/fcc-medical-data-visualizer/test_module.py", line 13, in test_line_plot_labels
    actual = self.ax.get_xlabel()
AttributeError: 'numpy.ndarray' object has no attribute 'get_xlabel'

4- The code comments say that I should clean the data before plotting the heatmap, but give no clue about what needs to be cleaned.

5- Now, trying to plot the heatmap with the following code, based on the entire dataframe, I get this problem:

    # Clean the data
    df_heat = df

    # Calculate the correlation matrix
    corr = np.corrcoef(df_heat)

MemoryError: Unable to allocate 29.8 GiB for an array with shape (63259, 63259) and data type float64

Based on this error, I believe I should use a smaller portion of the dataframe, but then I probably won't get the same expected values.
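A note on the likely cause: by default `np.corrcoef` treats each row as a variable (`rowvar=True`), so a frame with 63259 rows produces a 63259 x 63259 matrix, which matches the shape in the MemoryError. Correlating the columns instead avoids that without dropping any rows. The sketch below uses random data with hypothetical column names to show the difference in shapes.

```python
import numpy as np
import pandas as pd

# Random stand-in data: 200 rows, 4 hypothetical columns.
rng = np.random.default_rng(0)
df_heat = pd.DataFrame(rng.normal(size=(200, 4)), columns=["a", "b", "c", "d"])

# Default rowvar=True: every ROW is a variable, so the result is rows x rows.
by_rows = np.corrcoef(df_heat)
print(by_rows.shape)

# rowvar=False: every COLUMN is a variable, so the result is columns x columns.
by_cols = np.corrcoef(df_heat, rowvar=False)
print(by_cols.shape)

# pandas computes the same column-wise matrix and keeps the column labels.
corr = df_heat.corr()
print(corr.shape)
```

`df_heat.corr()` is probably the more convenient choice for a heatmap, since the row and column labels carry over to the plot.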

Did you have any success with resolving issue 3? I have the same problem: if I run the code without the test module, it generates a heatmap.png that looks the same as the example (apart from the colourmap), but I get this same error with the test module.

Hi,
Try this code:

fig = sns.catplot(...).fig
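A likely explanation of why this helps (an assumption, since the test file is only visible here through the tracebacks): `sns.catplot()` returns a seaborn `FacetGrid`, not a matplotlib `Figure`, and a `FacetGrid`'s `.axes` is a NumPy array of Axes. If the test indexes `.axes[0]` on the grid, it gets an array instead of a single Axes, which would produce the AttributeError shown above. A small sketch with a made-up toy frame:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import numpy as np
import pandas as pd
import seaborn as sns

# Made-up long-format data, just enough to draw two facets.
df_cat = pd.DataFrame({
    "cardio":   [0, 0, 1, 1],
    "variable": ["smoke", "alco", "smoke", "alco"],
    "value":    [0, 1, 1, 0],
})

grid = sns.catplot(x="variable", col="cardio", hue="value",
                   data=df_cat, kind="count")

# The FacetGrid keeps its Axes in a 2-D NumPy array, so axes[0] is still an array.
print(type(grid).__name__)
print(isinstance(grid.axes[0], np.ndarray))

# The underlying matplotlib Figure keeps its Axes in a plain list.
fig = grid.fig
print(hasattr(fig.axes[0], "get_xlabel"))
```

Returning `fig` (the matplotlib Figure) from the drawing function gives the caller a plain list of Axes to index into.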

This worked for me. Thanks, much appreciated.