Medical data visualizer problem with heatmap

Hello,

I have a problem with the project, specifically with the heatmap correlations involving the overweight column. After the division to calculate the BMI (with both sqrt() and **2), I get a column full of NaN. I think if I fixed that, the correlations would be better. Why do I get these empty values? I also have a problem with the catplot, which doesn’t show all the bars (only 8 out of 13).

This is my code on Replit.

df_copy=df.copy()
radice=df_copy['height']**2
df_copy['weight']=df_copy['weight'].astype(int)
df_copy['overweight']=df_copy['weight']/radice

print(df_copy['overweight'])

df_copy.loc[df_copy['overweight'] < 25, 'overweight'] = 0
df_copy.loc[df_copy['overweight'] > 25, 'overweight'] = 1

If you print the overweight column here, it looks like this:

0        0.002197
1        0.003493
2        0.002351
3        0.002871
4        0.002301
           ...   
69995    0.002693
69996    0.005047
69997    0.003135
69998    0.002710
69999    0.002491

These values are all well under 25, so they will all be converted to 0. It looks like you got the formula wrong.

Add an overweight column to the data. To determine if a person is overweight, first calculate their BMI by dividing their weight in kilograms by the square of their height in meters.

Have a close look at the instructions, and at the data you are given.
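A minimal sketch of that step, assuming height is stored in centimetres and weight in kilograms (dividing kg by cm² gives values around 0.002, like the ones printed above). It also uses a single comparison instead of the two `.loc` assignments, so a BMI of exactly 25, which neither `< 25` nor `> 25` catches, cannot survive as a third value. The rows are made up.

```python
import pandas as pd

# Made-up rows; height is assumed to be in cm and weight in kg.
df = pd.DataFrame({
    "height": [168, 156, 165, 100],
    "weight": [62.0, 85.0, 64.0, 25.0],
})

# BMI = weight (kg) / height (m) squared, so convert cm to m before squaring.
bmi = df["weight"] / (df["height"] / 100) ** 2

# One comparison can only produce 0 or 1; a BMI of exactly 25 becomes 0 here.
df["overweight"] = (bmi > 25).astype(int)
print(df.assign(bmi=bmi.round(1)))
```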

I have partially resolved it by doing this:

df_copy=df.copy()
df_copy['height']=df_copy['height'].apply(lambda c:c/100)
print(df_copy['height'])
radice=df_copy['height']**2
df_copy['overweight']=df_copy['weight']/radice 

But I still have the same problems with the heatmap and catplot. The catplot now shows 10 bars instead of 13.

Can you provide a bit more information please?

What’s the problem with the heatmap? Let’s focus on that.

I get this error running your code:

---> 31 df[['gluc','cholesterol']]=df[['gluc','cholesterol']].map(lambda x:1 if x>1 else 0)

AttributeError: 'DataFrame' object has no attribute 'map'

map is a method that you can call on a pandas.Series object. In the pandas version I’m running, this method doesn’t exist on pandas.DataFrame objects.
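Whether `DataFrame.map` exists depends on the pandas version: it was added in pandas 2.1 as the new name for `DataFrame.applymap`, so the same line can work on one machine and fail on another. A sketch of a version-independent way to turn values above 1 into 1 and everything else into 0, using an elementwise comparison instead of a lambda (made-up data):

```python
import pandas as pd

# Made-up values in the 1-3 range, like gluc and cholesterol.
df = pd.DataFrame({"gluc": [1, 2, 3, 1], "cholesterol": [3, 1, 1, 2]})
cols = ["gluc", "cholesterol"]

# Works on any pandas version: compare the whole frame, then cast True/False to 1/0.
df[cols] = (df[cols] > 1).astype(int)
print(df)

# DataFrame.map only exists from pandas 2.1 (earlier versions call it applymap).
print(pd.__version__, "DataFrame.map available:", hasattr(pd.DataFrame, "map"))
```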

Are you getting any other errors when you run it?

These are the two errors. I notice that almost all the values in the heatmap match those in the example figure in your repository on GitHub; the only discrepancies are in the height/overweight column, plus some 0.0 values that appear as -0.0 in my graph. I still don’t understand where the error is; I think if I solve the heatmap, the catplot will be fixed too. What am I still doing wrong in the height/overweight column? I’ve been at this for hours and I don’t understand. Anyway, I don’t get that error on map; it works for me. Thank you for your help.

======================================================================
FAIL: test_bar_plot_number_of_bars (test_module.CatPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/freecodecamp2/test_module.py", line 28, in test_bar_plot_number_of_bars
    self.assertEqual(actual, expected, "Expected a different number of bars chart.")
AssertionError: 10 != 13 : Expected a different number of bars chart.

======================================================================
FAIL: test_heat_map_values (test_module.HeatMapTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/freecodecamp2/test_module.py", line 47, in test_heat_map_values
    self.assertEqual(actual, expected, "Expected different values in heat map.")
AssertionError: Lists differ: ['0.0[14 chars]0', '-0.0', '-0.1', '0.5', '-0.0', '0.1', '0.2[593 chars]0.1'] != ['0.0[14 chars]0', '0.0', '-0.1', '0.5', '0.0', '0.1', '0.1',[593 chars]0.1']

First differing element 3:
'-0.0'
'0.0'

Diff is 1174 characters long. Set self.maxDiff to None to see it. : Expected different values in heat map.

----------------------------------------------------------------------
Ran 4 tests in 9.907s

FAILED (failures=2)

[Image: heatmap]

[Image: my graphic]

The problems with your catplot and heatmap are not related.

Let’s focus on the catplot first, since that came up first while I was reviewing.

If you print df_cat_2 it looks like this:

    value     variable  total  cardio
5       0       active  13739       0
6       1       active  56261       0
0       0         alco  66236       0
11      1         alco   3764       0
3       0  cholesterol  52385       1
8       1  cholesterol  17615       0
2       0         gluc  59479       1
9       1         gluc  10521       0
4       0   overweight  26454       0
7       1   overweight  43546       1
1       0        smoke  63831       1
10      1        smoke   6169       0

but it should look like this:

    cardio     variable  value  total
0        0       active      0   6378
1        0       active      1  28643
2        0         alco      0  33080
3        0         alco      1   1941
4        0  cholesterol      0  29330
5        0  cholesterol      1   5691
6        0         gluc      0  30894
7        0         gluc      1   4127
8        0   overweight      0  15915
9        0   overweight      1  19106
10       0        smoke      0  31781
11       0        smoke      1   3240
12       1       active      0   7361
13       1       active      1  27618
14       1         alco      0  33156
15       1         alco      1   1823
16       1  cholesterol      0  23055
17       1  cholesterol      1  11924
18       1         gluc      0  28585
19       1         gluc      1   6394
20       1   overweight      0  10539
21       1   overweight      1  24440
22       1        smoke      0  32050
23       1        smoke      1   2929

Notice it should have index 0-23 but you only have 0-11. You are missing half the data. Each variable needs 4 entries:

0        0       active      0   6378
1        0       active      1  28643
12       1       active      0   7361
13       1       active      1  27618

You only have 2 entries:

5       0       active  13739       0
6       1       active  56261       0

You’re missing the entries for active when cardio is 1.

One of the main problems is here:

    # Group and reformat the data to split it by 'cardio'. Show the counts of each feature. You will have to rename one of the columns for the catplot to work correctly.
    df_cat_2=pd.DataFrame(df_cat.groupby(['value'])['variable'].value_counts())

I think you used groupby here because the comment says to “group and reformat the data to split it by cardio”, but you’ve already done that in the previous step, so you don’t need groupby there:

    df_cat=pd.melt(df,id_vars=['cardio'],value_vars=['active', 'alco', 'cholesterol', 'gluc', 'overweight', 'smoke'])
  

If you print df_cat at this point, it’s correctly formatted and split by cardio already.
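To see why each variable needs four rows, here is a sketch on made-up data: counting the melted frame by cardio, variable and value gives one row per combination (2 cardio values × 2 values per variable). It uses `DataFrame.value_counts` with a list of columns, which is one way to count; `groupby([...]).size()` would give the same numbers. Note that value_counts only returns combinations that actually occur in the data.

```python
import pandas as pd

# Made-up mini dataset: 8 patients, two binary features and the cardio outcome.
df = pd.DataFrame({
    "cardio": [0, 0, 0, 0, 1, 1, 1, 1],
    "active": [0, 1, 1, 1, 0, 1, 1, 0],
    "smoke":  [0, 0, 1, 0, 1, 0, 0, 0],
})

# Long format: one row per (patient, feature), keeping cardio as an id column.
df_cat = pd.melt(df, id_vars=["cardio"], value_vars=["active", "smoke"])

# Count every (cardio, variable, value) combination, not just (value, variable).
df_cat_2 = (
    df_cat.value_counts(["cardio", "variable", "value"])
    .reset_index(name="total")
    .sort_values(["cardio", "variable", "value"], ignore_index=True)
)
print(df_cat_2)
```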

Some fundamental parts of this are covered in this video.

I would review the whole pandas section, especially grouping.

OK, thanks, I will look at this. Thanks for your help, and sorry.

No problem, this is a complicated one! The difficulty is troubleshooting many problems at once: if you try to solve the heatmap and the catplot together, things will get mixed up and become really complicated.

Treat it as two different problems. The cleanup code and the overweight calculation at the start seem fine.

Just getting rid of that .groupby() in that line will fix a big part of it.

I would go through and clean up your code a bit; try to make it really simple and clear. You have a lot of commented-out code that you’ve since changed, and lines like this hanging around:

    df_heat=df

but you never use df_heat, for example. Cleaning up your code and making sure each line clearly achieves something in the instructions might bring to light other things that need to be changed or removed.

Looking through the forum, I saw that some other solutions involved updating numpy or matplotlib, so that might also be an issue, but there are other things that need to be fixed first.

Thank you for your help.

I reviewed the part of the code with groupby and realized that it wasn’t necessary to group the variables; value_counts() and reset_index() were enough.

Now I understand, although I end up with two extra rows (the index goes up to 25 instead of 23), both of them overweight entries, which I remove:

    df_cat_2=pd.DataFrame(df_cat.value_counts().reset_index())
    df_cat_2.rename(columns={"count": "total"}, inplace=True)
    df_cat_2.drop(index=[24,25],inplace=True)
    df_cat_2.sort_values(by='variable', inplace=True)
    df_cat_2.sort_values(by='cardio', inplace=True)
    df_cat_2.sort_values('variable',ascending=True,inplace=True)
    print(df_cat_2.reset_index(drop=True))

I solved the catplot, and now I only have the error on the heatmap.
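About the code above: each sort_values call reorders the whole frame, so effectively only the last one decides the final order, and drop(index=[24, 25]) relies on the unexpected rows having the smallest counts (value_counts sorts by frequency, so rare rows end up with the highest index). A sketch of an alternative on a made-up counts table: first look at which rows have a value other than 0 or 1 (one possible source is an overweight value that the 0/1 conversion missed), then do a single multi-key sort.

```python
import pandas as pd

# Made-up counts table containing one unexpected overweight value (2).
df_cat_2 = pd.DataFrame({
    "cardio":   [1, 0, 0, 1, 0, 1],
    "variable": ["smoke", "overweight", "overweight", "smoke", "smoke", "overweight"],
    "value":    [0, 1, 0, 1, 0, 2],
    "total":    [5, 7, 3, 2, 4, 1],
})

# Rows whose value is neither 0 nor 1: these are the "extra" rows to investigate.
extra = df_cat_2[~df_cat_2["value"].isin([0, 1])]
print(extra)

# Drop them by label and sort once on all three keys.
clean = df_cat_2.drop(extra.index).sort_values(
    ["cardio", "variable", "value"], ignore_index=True
)
print(clean)
```

Ideally the extra rows are fixed at the source (the overweight column) rather than dropped afterwards.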

You’re crushing it!

Can you share your latest code, heatmap image and error please?

My code:

['0.0', '0.0', '-0.0', '-0.0', '-0.1', '0.5', '-0.0', '0.1', '0.2', '0.3', '0.0', '0.0', '0.0', '0.0', '0.0', '-0.0', '0.2', '0.1', '0.0', '0.2', '0.1', '0.0', '0.1', '-0.0', '-0.0', '0.1', '0.0', '0.2', '0.0', '0.1', '-0.0', '-0.0', '0.1', '0.0', '0.1', '0.4', '-0.0', '-0.0', '0.3', '0.2', '0.1', '-0.0', '0.0', '0.0', '0.0', '0.0', '-0.0', '0.2', '0.1', '0.1', '0.0', '0.0', '0.0', '0.0', '0.3', '0.0', '-0.0', '0.0', '-0.0', '-0.0', '-0.0', '-0.0', '0.0', '-0.0', '0.0', '0.0', '0.0', '0.2', '0.0', '-0.0', '0.2', '0.1', '0.3', '0.2', '0.1', '-0.0', '-0.0', '-0.0', '0.0', '0.1', '-0.0', '-0.1', '0.5', '0.0', '0.2', '0.1', '0.1', '0.0', '0.0', '-0.0', '0.1']
F
======================================================================
FAIL: test_heat_map_values (test_module.HeatMapTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/freecodecamp2/test_module.py", line 48, in test_heat_map_values
    self.assertEqual(actual, expected, "Expected different values in heat map.")
AssertionError: Lists differ: ['0.0[14 chars]0', '-0.0', '-0.1', '0.5', '-0.0', '0.1', '0.2[593 chars]0.1'] != ['0.0[14 chars]0', '0.0', '-0.1', '0.5', '0.0', '0.1', '0.1',[593 chars]0.1']

First differing element 3:
'-0.0'
'0.0'

  ['0.0',
   '0.0',
   '-0.0',
-  '-0.0',
?   -

+  '0.0',
   '-0.1',
   '0.5',
-  '-0.0',
?   -

+  '0.0',
   '0.1',
-  '0.2',
?     ^

+  '0.1',
?     ^

   '0.3',
   '0.0',
   '0.0',
   '0.0',
   '0.0',
   '0.0',
-  '-0.0',
?   -

+  '0.0',
   '0.2',
   '0.1',
   '0.0',
   '0.2',
   '0.1',
   '0.0',
   '0.1',
   '-0.0',
-  '-0.0',
?      ^

+  '-0.1',
?      ^

   '0.1',
   '0.0',
   '0.2',
   '0.0',
   '0.1',
   '-0.0',
   '-0.0',
   '0.1',
   '0.0',
   '0.1',
   '0.4',
   '-0.0',
   '-0.0',
   '0.3',
   '0.2',
   '0.1',
   '-0.0',
   '0.0',
   '0.0',
-  '0.0',
+  '-0.0',
?   +

-  '0.0',
+  '-0.0',
?   +

   '-0.0',
   '0.2',
   '0.1',
   '0.1',
   '0.0',
   '0.0',
   '0.0',
   '0.0',
   '0.3',
   '0.0',
   '-0.0',
   '0.0',
   '-0.0',
   '-0.0',
   '-0.0',
-  '-0.0',
?   -

+  '0.0',
   '0.0',
   '-0.0',
   '0.0',
   '0.0',
   '0.0',
   '0.2',
   '0.0',
   '-0.0',
   '0.2',
   '0.1',
   '0.3',
   '0.2',
   '0.1',
   '-0.0',
   '-0.0',
   '-0.0',
-  '0.0',
+  '-0.0',
?   +

   '0.1',
-  '-0.0',
   '-0.1',
+  '-0.1',
-  '0.5',
?     ^

+  '0.7',
?     ^

   '0.0',
   '0.2',
   '0.1',
   '0.1',
-  '0.0',
+  '-0.0',
?   +

   '0.0',
   '-0.0',
   '0.1'] : Expected different values in heat map.

----------------------------------------------------------------------
Ran 4 tests in 16.495s

FAILED (failures=1)

I still have this error, but I don’t understand where I’m going wrong. Comparing my heatmap with the one in the repository on GitHub, I noticed that some values are different, but only those related to the ‘overweight’ and ‘height’ columns, plus some zeros that should not be negative. Yet I followed all the steps in my code.
Thanks for your help!

[Image: heatmap]
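On the negative zeros: the test compares the annotation strings, and a coefficient that is only slightly negative still prints as “-0.0” once it is rounded to one decimal. So “-0.0” where “0.0” is expected most likely means that correlation has the opposite sign in your data, which points at the data cleaning rather than the plotting. A quick illustration, assuming one-decimal formatting as in the lists above (made-up coefficients):

```python
# Made-up correlation coefficients, all close to zero.
coefficients = [0.003, -0.003, -0.04, 0.06]

# One-decimal formatting keeps the minus sign even when the result rounds to zero.
labels = [f"{r:.1f}" for r in coefficients]
print(labels)  # ['0.0', '-0.0', '-0.0', '0.1']
```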

Is this really your most recent code? When I run it, I get many errors, including with the catplot:

df[['gluc','cholesterol']]=df[['gluc','cholesterol']].map(lambda x:1 if x>1 else 0)
AttributeError: 'DataFrame' object has no attribute 'map'
df_cat_2.drop(index=[24,25],inplace=True)
KeyError: '[24, 25] not found in axis'
======================================================================
FAIL: test_bar_plot_number_of_bars (test_module.CatPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/workspace/boilerplate-medical-data-visualizer/test_module.py", line 28, in test_bar_plot_number_of_bars
    self.assertEqual(actual, expected, "Expected a different number of bars chart.")
AssertionError: 12 != 13 : Expected a different number of bars chart.

Also, from the instructions:

Clean the data in the df_heat variable by filtering out the following patient segments that represent incorrect data:

but there is no df_heat. You should start the heatmap by copying the original dataframe to df_heat.

You also have a few lines where you make a copy but then never refer to the copy. It makes it really hard for someone else to come in and read your code.

aplo=df[['ap_lo','ap_hi']].copy() #assign a copy
aplo=df[['ap_lo','ap_hi']][df['ap_lo']<=df['ap_hi']] #immediately overwrite it

You do the same thing for height and weight:

height=df['height'].copy()
height=df['height'][(df['height']>=df['height'].quantile(0.025))|(height<height.quantile(q=0.975))]
weight=df['weight'].copy()
weight=df['weight'][(df['weight']>df['weight'].quantile(0.025))|(weight<weight.quantile(q=0.975))]
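Note also that the height and weight lines above combine their two conditions with `|` (or). Any value below the 2.5th percentile is also below the 97.5th, so that condition is true for every row and nothing gets filtered, which could plausibly explain differences limited to the height/weight/overweight correlations. A sketch of building df_heat with one boolean mask, assuming the criteria are the ones your code targets (ap_lo not above ap_hi, height and weight between the 2.5th and 97.5th percentiles). I’ve assumed inclusive bounds; check the exact wording in the instructions. The data is made up.

```python
import pandas as pd

# Made-up data; the real df has many more columns and rows.
df = pd.DataFrame({
    "ap_lo":  [80, 90, 120, 70, 85, 80],
    "ap_hi":  [120, 140, 110, 130, 125, 120],
    "height": [168, 150, 170, 55, 250, 165],
    "weight": [62.0, 70.0, 80.0, 40.0, 90.0, 200.0],
})

# Compute every cutoff on the original data and combine all conditions with &.
mask = (
    (df["ap_lo"] <= df["ap_hi"])
    & (df["height"] >= df["height"].quantile(0.025))
    & (df["height"] <= df["height"].quantile(0.975))
    & (df["weight"] >= df["weight"].quantile(0.025))
    & (df["weight"] <= df["weight"].quantile(0.975))
)
df_heat = df[mask]
print(df_heat)

# With | instead of &, the height condition is true for every row.
loose = (df["height"] >= df["height"].quantile(0.025)) | (
    df["height"] <= df["height"].quantile(0.975)
)
print(loose.all())
```

A single mask like this also removes the need for the intermediate copies.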

Any ideas why I’m getting a catplot error? (this is on gitpod, I’ll try on replit as well)

For starters, I would clean up all these unnecessary copies and begin with the df_heat copy.

I’ll try to look more deeply into it.

It looks like these errors were all related to the older versions of Python and pandas installed on Gitpod. Updating them cleared those errors (and the Replit fork was fine as well).

Did you update all of these while working on this? That could cause the tests not to work as expected, so I’m a bit wary of it, but I’m looking into it further.

Just to document it, your Replit pyproject.toml lists:

[tool.poetry.dependencies]
python = ">=3.10.0,<3.12"
matplotlib = "^3.8.3"
numpy = "^1.26.4"
pandas = "^2.2.1"
seaborn = "^0.13.2"

Gitpod requirements.txt:

seaborn==0.13.2
pandas==1.5.3

And Python 3.8.19 was installed.
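Since the same code behaves differently on Replit and Gitpod, it can help to print the versions each environment actually uses rather than relying on the config files. A small sketch using the standard library’s importlib.metadata (available from Python 3.8); report_versions is a hypothetical helper name:

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages):
    """Return {package: installed version, or None if not installed}."""
    found = {}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            found[pkg] = None
    return found


print("Python", sys.version.split()[0])
for pkg, ver in report_versions(["pandas", "numpy", "matplotlib", "seaborn"]).items():
    print(pkg, ver or "not installed")
```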

I would still clear up the issues with copies and df_heat that I’ve pointed out.

I imagine you updated pandas when you changed this:

#df.loc[df['gluc']>1,'gluc']=1
#df.loc[df['gluc']==1,'gluc']=0
#df.loc[df['cholesterol']>1,'cholesterol']=1
#df.loc[df['cholesterol']==1,'cholesterol']=0
df[['gluc','cholesterol']]=df[['gluc','cholesterol']].map(lambda x:1 if x>1 else 0)

Why did you decide to use .map() instead of .loc ?
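For what it’s worth, the commented-out .loc lines run in an order that zeroes everything: the first line turns every value above 1 into 1, and the second then turns every 1, including those just set, into 0. Swapping the two lines avoids that. A sketch on made-up values:

```python
import pandas as pd

# Made-up values in the 1-3 range.
df = pd.DataFrame({"gluc": [1, 2, 3], "cholesterol": [3, 1, 2]})

# Order used in the commented-out lines: > 1 -> 1 first, then == 1 -> 0.
wrong = df.copy()
for col in ["gluc", "cholesterol"]:
    wrong.loc[wrong[col] > 1, col] = 1
    wrong.loc[wrong[col] == 1, col] = 0  # also catches the rows just set to 1

# Reversed order: == 1 -> 0 first, then > 1 -> 1.
right = df.copy()
for col in ["gluc", "cholesterol"]:
    right.loc[right[col] == 1, col] = 0
    right.loc[right[col] > 1, col] = 1

print(wrong)
print(right)
```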