Hey,
I’ve been working on the Medical Data Visualizer project but am having trouble finishing it off.
The first problem is the catplot stage. When I run the `sns.catplot(…)` command on its own in a cell, the plot comes out fine, but the function itself fails the tests.
My code:
```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Import data
df = pd.read_csv("medical_examination.csv")

# Add 'overweight' column: BMI = weight (kg) / height (m)^2, overweight if BMI > 25
df["overweight"] = 0
df.loc[(df["weight"] / ((df["height"] / 100) ** 2)) > 25, "overweight"] = 1

# Normalize data by making 0 always good and 1 always bad. If the value of 'cholesterol' or 'gluc' is 1, make the value 0. If the value is more than 1, make the value 1.
df.loc[df["cholesterol"] == 1, "cholesterol"] = 0
df.loc[df["cholesterol"] == 2, "cholesterol"] = 1
df.loc[df["cholesterol"] == 3, "cholesterol"] = 1
df.loc[df["gluc"] == 1, "gluc"] = 0
df.loc[df["gluc"] == 2, "gluc"] = 1
df.loc[df["gluc"] == 3, "gluc"] = 1
```
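The overweight and normalization steps can also be written without one `.loc` assignment per value, by comparing against the threshold and casting the booleans to integers. This is a minimal sketch on a tiny invented frame (the numbers are made up, not from `medical_examination.csv`, and it assumes `height` is in centimetres and `weight` in kilograms, as the `/ 100` implies):

```python
import pandas as pd

# Tiny invented sample; the real data comes from medical_examination.csv
df = pd.DataFrame({
    "height": [170, 160, 180],      # cm (assumed unit)
    "weight": [60.0, 80.0, 90.0],   # kg (assumed unit)
    "cholesterol": [1, 2, 3],
    "gluc": [3, 1, 2],
})

# BMI = weight (kg) / height (m) squared; the exponent must be written ** 2
bmi = df["weight"] / (df["height"] / 100) ** 2
df["overweight"] = (bmi > 25).astype(int)

# 1 -> 0 (good), anything above 1 -> 1 (bad)
for col in ["cholesterol", "gluc"]:
    df[col] = (df[col] > 1).astype(int)

print(df[["overweight", "cholesterol", "gluc"]])
```

Both forms should give the same 0/1 columns; the comparison form also covers any value above 1, not only 2 and 3.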
```python
# Draw Categorical Plot
def draw_cat_plot():
    # Create DataFrame for cat plot using pd.melt using just the values from
    # 'cholesterol', 'gluc', 'smoke', 'alco', 'active', and 'overweight'.
    # Group and reformat the data to split it by 'cardio'. Show the counts of each feature.
    # You will have to rename one of the columns for the catplot to work correctly.
    df_cat = df.drop(["id", "age", "gender", "ap_lo", "weight", "height", "ap_hi"], axis=1)
    df_cat = df_cat.melt(id_vars="cardio")

    # Draw the catplot with 'sns.catplot()'
    fig, ax = plt.subplots(figsize=(11, 9))
    sns.catplot(data=df_cat, kind="count",
                x="variable", hue="value",
                col="cardio")

    # Do not modify the next two lines
    fig.savefig('catplot.png')
    return fig
```
It throws errors saying:
```
======================================================================
FAIL: test_bar_plot_number_of_bars (test_module.CatPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/boilerplate-medical-data-visualizer-1/test_module.py", line 28, in test_bar_plot_number_of_bars
    self.assertEqual(actual, expected, "Expected a different number of bars chart.")
AssertionError: 1 != 13 : Expected a different number of bars chart.
======================================================================
FAIL: test_line_plot_labels (test_module.CatPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/boilerplate-medical-data-visualizer-1/test_module.py", line 15, in test_line_plot_labels
    self.assertEqual(actual, expected, "Expected line plot xlabel to be 'variable'")
```
But when I run the `sns.catplot(…)` command on its own, the plot looks fine, including the "variable" labels.
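A likely explanation, offered as a guess rather than a confirmed diagnosis: `sns.catplot` is a figure-level function, so it creates its own figure instead of drawing into the `fig, ax` made by `plt.subplots`. The `fig` that gets saved and returned is then the empty one, which would fit "1 != 13" bars. A minimal sketch that keeps the figure `catplot` actually draws on, taken from the `FacetGrid` it returns (the small `df_cat` here is invented; in the project it comes from the melt step):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import pandas as pd
import seaborn as sns

# Invented long-format data shaped like the melted df_cat (cardio, variable, value)
df_cat = pd.DataFrame({
    "cardio":   [0, 0, 0, 0, 1, 1, 1, 1],
    "variable": ["smoke", "smoke", "alco", "alco"] * 2,
    "value":    [0, 1, 0, 1, 0, 1, 1, 1],
})

def draw_cat_plot():
    # catplot builds its own figure and returns a FacetGrid, so no plt.subplots() call
    grid = sns.catplot(data=df_cat, kind="count",
                       x="variable", hue="value", col="cardio")
    fig = grid.fig  # the Figure that actually contains the bars
    fig.savefig("catplot.png")
    return fig

fig = draw_cat_plot()
print(len(fig.axes), fig.axes[0].get_xlabel())
```

With this shape, the saved `catplot.png` and the returned figure are the same object the bars were drawn on, which is presumably what the tests inspect.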
The other problem is with my heat map. As far as I can tell my data cleaning is fine, but a few of the correlation values are wrong (shown here as the unrounded value, what it rounds to, and what the test expects):
- `cholesterol` vs `ap_lo` is 0.15 unrounded, but rounds to 0.1 and should be 0.2
- `overweight` vs `height` is -0.16 unrounded, rounds to -0.2, and should be -0.1
- `weight` vs `height` is 0.25 unrounded, rounds to 0.2, and should be 0.3
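On the "rounds the wrong way" point: `sns.heatmap` annotates with `fmt=".2g"` by default, so the "unrounded" numbers above are probably already two-significant-digit displays. Python's one-decimal formatting then rounds the underlying binary float: `0.15` is stored as slightly less than 0.15, and `0.25` is an exact tie that goes to the even digit, so both come out lower. A standard-library-only check:

```python
from decimal import Decimal

# The exact binary values behind the decimal literals
print(Decimal(0.15))  # slightly below 0.15
print(Decimal(0.25))  # exactly 0.25

# One-decimal formatting, as a ".1f" annotation would use
print(format(0.15, ".1f"))
print(format(0.25, ".1f"))

# Two values that both display as "0.15" at two significant digits
# but round to different one-decimal results
for r in (0.1451, 0.1549):
    print(format(r, ".2g"), "->", format(r, ".1f"))
```

Rounding alone can't turn -0.16 into the expected -0.1, though: any value that displays as -0.16 is below -0.15. So the mismatch is more likely in the correlations themselves, that is, in which rows survive the cleaning.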
Two of these seem to just round the wrong way. My code is:
```python
# Normalize data by making 0 always good and 1 always bad. If the value of 'cholesterol' or 'gluc' is 1, make the value 0. If the value is more than 1, make the value 1.
df.loc[df["cholesterol"] == 1, "cholesterol"] = 0
df.loc[df["cholesterol"] == 2, "cholesterol"] = 1
df.loc[df["cholesterol"] == 3, "cholesterol"] = 1
df.loc[df["gluc"] == 1, "gluc"] = 0
df.loc[df["gluc"] == 2, "gluc"] = 1
df.loc[df["gluc"] == 3, "gluc"] = 1
```
My draw_heat_map code is the following:
```python
# Draw Heat Map
def draw_heat_map():
    # Clean the data
    df_heat = df
    df_heat = df_heat[df_heat["ap_lo"] <= df_heat["ap_hi"]]
    df_heat = df_heat[df_heat["height"] >= df_heat["height"].quantile(0.025)]
    df_heat = df_heat[df_heat["height"] <= df_heat["height"].quantile(0.975)]
    df_heat = df_heat[df_heat["weight"] >= df_heat["weight"].quantile(0.025)]
    df_heat = df_heat[df_heat["weight"] <= df_heat["weight"].quantile(0.975)]

    # Calculate the correlation matrix
    corr = df_heat.corr()

    # Generate a mask for the upper triangle
    mask = np.triu(np.ones_like(corr, dtype=bool))

    # Set up the matplotlib figure
    fig, ax = plt.subplots(figsize=(11, 9))

    # Draw the heatmap with 'sns.heatmap()'
    sns.heatmap(corr, mask=mask, vmax=0.3, center=0, annot=True,
                square=True, linewidths=0.5, cbar_kws={"shrink": 0.5})

    # Do not modify the next two lines
    fig.savefig('heatmap.png')
    return fig
```
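One possible cause of the slightly-off correlations, worth checking against the project instructions: each filter line in `draw_heat_map` recomputes `quantile()` on the already-filtered `df_heat`, so the later cut-offs come from a smaller frame than the original `df`. A common alternative is to compute every bound from the unfiltered data and apply all conditions in one boolean mask. This sketch compares the two on an invented 10-row frame where the rows outside the height range also carry the extreme weights:

```python
import pandas as pd

# Invented toy data, not from medical_examination.csv
df = pd.DataFrame({
    "height": list(range(1, 11)),
    "weight": [100, 50, 51, 52, 53, 54, 55, 56, 57, 0],
    "ap_lo": [80] * 10,
    "ap_hi": [120] * 10,
})

def filter_sequential(data):
    # Each quantile is computed on whatever the previous step left over
    out = data[data["ap_lo"] <= data["ap_hi"]]
    out = out[out["height"] >= out["height"].quantile(0.025)]
    out = out[out["height"] <= out["height"].quantile(0.975)]
    out = out[out["weight"] >= out["weight"].quantile(0.025)]
    out = out[out["weight"] <= out["weight"].quantile(0.975)]
    return out

def filter_single_pass(data):
    # Every bound is computed once, from the unfiltered frame
    mask = (
        (data["ap_lo"] <= data["ap_hi"])
        & (data["height"] >= data["height"].quantile(0.025))
        & (data["height"] <= data["height"].quantile(0.975))
        & (data["weight"] >= data["weight"].quantile(0.025))
        & (data["weight"] <= data["weight"].quantile(0.975))
    )
    return data[mask]

seq = filter_sequential(df)
single = filter_single_pass(df)
print(len(seq), len(single))
```

Here the sequential version drops two extra rows because the weight cut-offs are recomputed after the height filter has already removed the outliers. Separately, passing `fmt=".1f"` to `sns.heatmap` makes the annotations one-decimal instead of the default two significant digits.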
Can anyone tell where I’m going wrong? It seems like I’m 99% there, but a few errors are breaking the whole thing.