Hi.
I'm facing many problems and weird contradictions in this project.
1- The README does not specify which columns the catplot should use, although it says the plot should look similar to the example plot, and test_module checks for these columns. But the code comments say I should plot a different set of columns.
2- The code comments ask me to do the following:
# Group and reformat the data to split it by 'cardio'. Show the counts of each feature. You will have to rename one of the columns for the catplot to work correctly.
Which I find completely stupid, unnecessary and wrong. I just did this and it worked perfectly:
df_cat = pd.melt(df[["cardio", "cholesterol", "gluc", "smoke", "alco", "active", "overweight"]], id_vars="cardio")
# Draw the catplot with 'sns.catplot()'
fig = sns.catplot(x="variable", col="cardio", hue="value", data=df_cat, kind="count")
fig.set_axis_labels("variable", "total")
Is there any other way?
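For comparison, my best guess at what that comment is describing is a long-format table with the counts precomputed in a column named "total". This is only a sketch on a made-up frame: the toy values and the reduced column set are assumptions, not the project's data.

```python
import pandas as pd

# Toy stand-in for the project's dataframe (values are made up).
df = pd.DataFrame({
    "cardio": [0, 0, 1, 1, 1],
    "smoke":  [0, 1, 0, 0, 1],
    "alco":   [0, 0, 1, 0, 1],
})

# Long format: one row per (cardio, variable, value) observation.
df_cat = pd.melt(df, id_vars="cardio", value_vars=["smoke", "alco"])

# Count each (cardio, variable, value) combination and name the count "total".
df_cat = (
    df_cat.groupby(["cardio", "variable", "value"])
    .size()
    .reset_index(name="total")
)
print(df_cat)
```

With a table like this, the catplot would presumably be drawn with kind="bar" and y="total" instead of kind="count", which also gives the "total" axis label without calling set_axis_labels.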
3- Now, even when plotting an exact copy of the example plot and following all the conditions, I get the following errors:
ERROR: test_bar_plot_number_of_bars (test_module.CatPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/runner/fcc-medical-data-visualizer/test_module.py", line 27, in test_bar_plot_number_of_bars
actual = len([rect for rect in self.ax.get_children()
AttributeError: 'numpy.ndarray' object has no attribute 'get_children'
======================================================================
ERROR: test_line_plot_labels (test_module.CatPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/runner/fcc-medical-data-visualizer/test_module.py", line 13, in test_line_plot_labels
actual = self.ax.get_xlabel()
AttributeError: 'numpy.ndarray' object has no attribute 'get_xlabel'
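A possible explanation, as far as I can tell: sns.catplot() returns a seaborn FacetGrid rather than a matplotlib Figure, and the grid's .axes is a 2-D numpy array, so axes[0] is an array row and not an Axes. The sketch below shows this on made-up data. I am assuming the test does something like fig.axes[0], which the traceback does not actually show, and the toy frame is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no window needed
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up long-format data, just enough to get two facets.
df_cat = pd.DataFrame({
    "cardio":   [0, 0, 1, 1],
    "variable": ["smoke", "alco", "smoke", "alco"],
    "value":    [0, 1, 1, 0],
})

grid = sns.catplot(x="variable", col="cardio", hue="value",
                   data=df_cat, kind="count")

# The FacetGrid stores its Axes in a 2-D array (rows x columns),
# so indexing once returns an array, not an Axes.
row = grid.axes[0]
print(type(row))

# The underlying matplotlib Figure keeps a flat list of Axes instead.
# (Newer seaborn versions also expose this as grid.figure.)
fig = grid.fig
first_ax = fig.axes[0]
print(first_ax.get_xlabel())
```

If that assumption about the test is right, assigning the underlying figure (the .fig of the catplot result) to fig might be enough to get past these two AttributeErrors.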
4- The code comments say that I should clean the data before plotting the heatmap, but give no clue about what needs to be cleaned.
5- Now, when I try to plot the heatmap from the entire dataframe using the following code, I get this error:
# Clean the data
df_heat = df
# Calculate the correlation matrix
corr = np.corrcoef(df_heat)
MemoryError: Unable to allocate 29.8 GiB for an array with shape (63259, 63259) and data type float64
Based on this error, I believe I should use a smaller portion of the dataframe, but then I probably won't get the same expected values.
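Looking at the shape in the error again, though, 63259 x 63259 is rows by rows. np.corrcoef treats each row as a variable by default (rowvar=True), so it may be the orientation rather than the size of the data that is the problem. A small sketch on random data (the frame and the column names a-d are made up for illustration):

```python
import numpy as np
import pandas as pd

# Random stand-in data: 1000 rows, 4 columns.
rng = np.random.default_rng(0)
df_heat = pd.DataFrame(rng.normal(size=(1000, 4)), columns=["a", "b", "c", "d"])

# Default rowvar=True: every ROW is a variable, so the result is
# n_rows x n_rows (only 50 rows are used here to keep it small).
by_rows = np.corrcoef(df_heat.head(50).to_numpy())
print(by_rows.shape)

# Column-wise correlation: n_columns x n_columns.
corr = df_heat.corr()
print(corr.shape)

# The same result from numpy, telling it that columns are the variables.
by_cols = np.corrcoef(df_heat.to_numpy(), rowvar=False)
print(np.allclose(corr.to_numpy(), by_cols))
```

So df_heat.corr(), or np.corrcoef with rowvar=False, should give a matrix with one row and one column per dataframe column, without needing to drop any rows for memory reasons.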