Medical data visualizer problem with heatmap

vertebraofficial01 · March 27, 2024, 7:17pm

Hello,

I have a problem with the project, particularly with the correlations in the heatmap involving the overweight column: after the division to calculate the BMI (using both sort() and **2), it gives me a column full of NaN. I think if I fixed that, the correlations would be better. Why do I get these empty values? I also have a problem with the catplot, which doesn't show all the bars (only 8 out of 13).

This is my code on Replit.

pkdvalis · March 27, 2024, 11:56pm

df_copy=df.copy()
radice=df_copy['height']**2
df_copy['weight']=df_copy['weight'].astype(int)
df_copy['overweight']=df_copy['weight']/radice

print(df_copy['overweight'])

df_copy.loc[df_copy['overweight'] < 25, 'overweight'] = 0
df_copy.loc[df_copy['overweight'] > 25, 'overweight'] = 1

Print the overweight column here. It looks like this:

0        0.002197
1        0.003493
2        0.002351
3        0.002871
4        0.002301
           ...   
69995    0.002693
69996    0.005047
69997    0.003135
69998    0.002710
69999    0.002491

They will all be well under 25, and so will all be converted to 0. It looks like you got the formula wrong.

Add an overweight column to the data. To determine if a person is overweight, first calculate their BMI by dividing their weight in kilograms by the square of their height in meters.

Have a close look at the instructions, and a close look at the data you are given.
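A minimal sketch of the overweight step, assuming (as the tiny values printed above suggest) that height is stored in centimetres and weight in kilograms. The three rows are made-up sample data, not rows from the project dataset.

```python
import pandas as pd

# Made-up sample rows; the real project loads its own CSV.
df = pd.DataFrame({"height": [168, 156, 165], "weight": [62.0, 85.0, 64.0]})

# Convert height from centimetres to metres before squaring.
height_m = df["height"] / 100

# BMI = weight (kg) / height (m) squared.
bmi = df["weight"] / height_m ** 2

# 1 if BMI is above 25, else 0, in a single step so no value is left unconverted.
df["overweight"] = (bmi > 25).astype(int)
print(df)
```

With height in metres, the BMI values land in a realistic range (roughly 22, 35 and 23.5 here), so the comparison with 25 becomes meaningful.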

vertebraofficial01 · April 8, 2024, 7:04pm

I have partly resolved it by doing this:

df_copy=df.copy()
df_copy['height']=df_copy['height'].apply(lambda c:c/100)
print(df_copy['height'])
radice=df_copy['height']**2
df_copy['overweight']=df_copy['weight']/radice

vertebraofficial01 · April 8, 2024, 7:37pm

But I still have the same problems with the heatmap and catplot. The catplot now shows 10 bars instead of 13.

pkdvalis · April 8, 2024, 9:12pm

Can you provide a bit more information please?

What’s the problem with the heatmap? Let’s focus on that.

I get this error running your code:

---> 31 df[['gluc','cholesterol']]=df[['gluc','cholesterol']].map(lambda x:1 if x>1 else 0)

AttributeError: 'DataFrame' object has no attribute 'map'

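map is a method that you can call on a pandas.Series object. In the pandas version I'm running, it doesn't exist on pandas.DataFrame objects (DataFrame.map was only added in pandas 2.1.0, as the new name for applymap).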
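As an aside, here is a sketch of one way to normalize gluc and cholesterol that does not depend on DataFrame.map at all, assuming the encoding implied by the lambda in the traceback above (values above 1 become 1, everything else 0). A vectorized comparison returns a boolean DataFrame, and astype(int) turns it into 0/1 on both pandas 1.x and 2.x. The values are made up.

```python
import pandas as pd

# Made-up sample values using the project's column names.
df = pd.DataFrame({"gluc": [1, 2, 3, 1], "cholesterol": [3, 1, 1, 2]})

cols = ["gluc", "cholesterol"]
# Element-wise comparison gives True/False; astype(int) maps that to 1/0.
df[cols] = (df[cols] > 1).astype(int)
print(df)
```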

Are you getting any other errors when you run it?

vertebraofficial01 · April 9, 2024, 11:21am

I get these two errors. I notice that almost all the values in the heatmap match those in the example figure in your repository on GitHub; the only discrepancies are in the height/overweight column, plus some 0.0 values that appear as -0.0 in my graph. I still don't understand where the error is, and I think if I solve the heatmap, I will also solve the catplot. What am I still doing wrong in the height/overweight column? I've been at it for hours and I don't understand. Anyway, I don't get that error on map; it works for me. Thank you for your help.

======================================================================
FAIL: test_bar_plot_number_of_bars (test_module.CatPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/freecodecamp2/test_module.py", line 28, in test_bar_plot_number_of_bars
    self.assertEqual(actual, expected, "Expected a different number of bars chart.")
AssertionError: 10 != 13 : Expected a different number of bars chart.

======================================================================
FAIL: test_heat_map_values (test_module.HeatMapTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/freecodecamp2/test_module.py", line 47, in test_heat_map_values
    self.assertEqual(actual, expected, "Expected different values in heat map.")
AssertionError: Lists differ: ['0.0[14 chars]0', '-0.0', '-0.1', '0.5', '-0.0', '0.1', '0.2[593 chars]0.1'] != ['0.0[14 chars]0', '0.0', '-0.1', '0.5', '0.0', '0.1', '0.1',[593 chars]0.1']

First differing element 3:
'-0.0'
'0.0'

Diff is 1174 characters long. Set self.maxDiff to None to see it. : Expected different values in heat map.

----------------------------------------------------------------------
Ran 4 tests in 9.907s

FAILED (failures=2)

vertebraofficial01 · April 9, 2024, 11:23am

[Image: heatmap]

My graph: [image]

pkdvalis · April 9, 2024, 1:51pm

The problems with your catplot and heatmap are not related.

Let’s focus on the catplot, since that came up first while I was reviewing.

If you print df_cat_2 it looks like this:

    value     variable  total  cardio
5       0       active  13739       0
6       1       active  56261       0
0       0         alco  66236       0
11      1         alco   3764       0
3       0  cholesterol  52385       1
8       1  cholesterol  17615       0
2       0         gluc  59479       1
9       1         gluc  10521       0
4       0   overweight  26454       0
7       1   overweight  43546       1
1       0        smoke  63831       1
10      1        smoke   6169       0

but it should look like this:

    cardio     variable  value  total
0        0       active      0   6378
1        0       active      1  28643
2        0         alco      0  33080
3        0         alco      1   1941
4        0  cholesterol      0  29330
5        0  cholesterol      1   5691
6        0         gluc      0  30894
7        0         gluc      1   4127
8        0   overweight      0  15915
9        0   overweight      1  19106
10       0        smoke      0  31781
11       0        smoke      1   3240
12       1       active      0   7361
13       1       active      1  27618
14       1         alco      0  33156
15       1         alco      1   1823
16       1  cholesterol      0  23055
17       1  cholesterol      1  11924
18       1         gluc      0  28585
19       1         gluc      1   6394
20       1   overweight      0  10539
21       1   overweight      1  24440
22       1        smoke      0  32050
23       1        smoke      1   2929

Notice it should have index 0-23 but you only have 0-11. You are missing half the data. Each variable needs 4 entries:

0        0       active      0   6378
1        0       active      1  28643
12       1       active      0   7361
13       1       active      1  27618

You only have 2 entries:

5       0       active  13739       0
6       1       active  56261       0

You’re missing the entries for active when cardio is 1.

One of the main problems is here:

    # Group and reformat the data to split it by 'cardio'. Show the counts of each feature. You will have to rename one of the columns for the catplot to work correctly.
    df_cat_2=pd.DataFrame(df_cat.groupby(['value'])['variable'].value_counts())

I think you used groupby here because it says to “group and reformat to split by cardio”, but you’ve already done that in the previous step, so you don’t need groupby there:

    df_cat=pd.melt(df,id_vars=['cardio'],value_vars=['active', 'alco', 'cholesterol', 'gluc', 'overweight', 'smoke'])

If you print df_cat at this point, it’s correctly formatted and split by cardio already.
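A minimal sketch of the step after the melt, assuming pandas 1.1 or newer (where DataFrame.value_counts exists). Counting each (cardio, variable, value) combination directly keeps the cardio split, so every variable gets its four rows. The sample frame is made up and only uses two of the six features.

```python
import pandas as pd

# Made-up sample data with the project's column names.
df = pd.DataFrame({
    "cardio":     [0, 0, 1, 1, 1],
    "active":     [1, 0, 1, 1, 0],
    "overweight": [0, 1, 1, 0, 1],
})

# Long format: one row per (patient, feature) pair, still tagged with cardio.
df_cat = pd.melt(df, id_vars=["cardio"], value_vars=["active", "overweight"])

# Count each (cardio, variable, value) combination and call the count 'total'.
df_cat_2 = (
    df_cat.value_counts()
    .reset_index(name="total")
    .sort_values(["cardio", "variable", "value"])
    .reset_index(drop=True)
)
print(df_cat_2)
```

Passing name="total" to reset_index sidesteps the fact that the count column is called "count" in pandas 2.x but 0 in older versions.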

pkdvalis · April 9, 2024, 1:57pm

Some of the fundamentals behind this are covered in this video.

I would review the whole Pandas section, especially grouping.

vertebraofficial01 · April 9, 2024, 5:45pm

OK, thanks, I will look at this. Thanks for your help, and sorry.

pkdvalis · April 9, 2024, 6:03pm

No problem, this is a complicated one! The difficulty is troubleshooting many problems at once: if you try to solve the heatmap and the catplot together, things will get mixed up and become really complicated.

Treat them as 2 separate problems. The cleanup code and the overweight calculation at the start seem fine.

Just getting rid of that .groupby() in that line is a big step.

I would go through and clean up your code a bit, and try to make it really simple and clear. You have a lot of commented-out code from things you’ve changed, plus lines like this hanging around:

    df_heat=df

but you never use df_heat, for example. Cleaning up your code and making sure each line clearly achieves something in the instructions might bring to light other things that need to be changed or removed.

Looking through the forum, I saw that other solutions involved updating numpy or matplotlib, so versions might also be an issue, but there are other things that need to be fixed first.

vertebraofficial01 · April 13, 2024, 3:38pm

Thank you for your help.

I reviewed the part of the code with groupby and realized that it was not necessary to group the variables; value_counts() and reset_index() were enough.

Now I understand, although I end up with two extra rows (the index goes up to 25 instead of 23), both for overweight, which I will remove.

vertebraofficial01 · April 13, 2024, 4:12pm

    df_cat_2 = df_cat.value_counts().reset_index()
    # pandas 2.x names the count column "count" (older versions use 0)
    df_cat_2.rename(columns={"count": "total"}, inplace=True)
    # hard-coded labels: raises KeyError if rows 24 and 25 don't exist
    df_cat_2.drop(index=[24, 25], inplace=True)
    df_cat_2.sort_values(by='variable', inplace=True)
    df_cat_2.sort_values(by='cardio', inplace=True)
    # only this last sort sets the final primary order
    df_cat_2.sort_values('variable', ascending=True, inplace=True)
    print(df_cat_2.reset_index(drop=True))
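On the two extra overweight rows: one possible cause, assuming the overweight step still uses the strict `< 25` / `> 25` pair of .loc assignments quoted at the start of the thread, is that a BMI of exactly 25 matches neither condition and keeps its raw value, which then appears as extra rows in the counts. A quick way to check, and a single-step conversion that leaves no gaps, sketched on made-up numbers:

```python
import pandas as pd

# Made-up BMI values; 25.0 sits exactly on the boundary.
bmi = pd.Series([22.0, 25.0, 31.0])

# Two strict comparisons, as in the original .loc code: 25.0 matches neither.
two_step = bmi.copy()
two_step.loc[two_step < 25] = 0
two_step.loc[two_step > 25] = 1
print(two_step.value_counts())  # a leftover 25.0 reveals the gap

# One comparison covers every value, so the result is always 0 or 1.
one_step = (bmi > 25).astype(int)
print(one_step.value_counts())
```

Fixing the values at the source would also remove the need for drop(index=[24, 25]), whose hard-coded labels depend on the row order.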

vertebraofficial01 · April 14, 2024, 5:35pm

I solved the catplot, and now I only have the error on the heatmap.

pkdvalis · April 14, 2024, 5:51pm

You’re crushing it!

Can you share your latest code, heatmap image and error please?

vertebraofficial01 · April 15, 2024, 9:55am

My code:

['0.0', '0.0', '-0.0', '-0.0', '-0.1', '0.5', '-0.0', '0.1', '0.2', '0.3', '0.0', '0.0', '0.0', '0.0', '0.0', '-0.0', '0.2', '0.1', '0.0', '0.2', '0.1', '0.0', '0.1', '-0.0', '-0.0', '0.1', '0.0', '0.2', '0.0', '0.1', '-0.0', '-0.0', '0.1', '0.0', '0.1', '0.4', '-0.0', '-0.0', '0.3', '0.2', '0.1', '-0.0', '0.0', '0.0', '0.0', '0.0', '-0.0', '0.2', '0.1', '0.1', '0.0', '0.0', '0.0', '0.0', '0.3', '0.0', '-0.0', '0.0', '-0.0', '-0.0', '-0.0', '-0.0', '0.0', '-0.0', '0.0', '0.0', '0.0', '0.2', '0.0', '-0.0', '0.2', '0.1', '0.3', '0.2', '0.1', '-0.0', '-0.0', '-0.0', '0.0', '0.1', '-0.0', '-0.1', '0.5', '0.0', '0.2', '0.1', '0.1', '0.0', '0.0', '-0.0', '0.1']
F
======================================================================
FAIL: test_heat_map_values (test_module.HeatMapTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/freecodecamp2/test_module.py", line 48, in test_heat_map_values
    self.assertEqual(actual, expected, "Expected different values in heat map.")
AssertionError: Lists differ: ['0.0[14 chars]0', '-0.0', '-0.1', '0.5', '-0.0', '0.1', '0.2[593 chars]0.1'] != ['0.0[14 chars]0', '0.0', '-0.1', '0.5', '0.0', '0.1', '0.1',[593 chars]0.1']

First differing element 3:
'-0.0'
'0.0'

  ['0.0',
   '0.0',
   '-0.0',
-  '-0.0',
?   -

+  '0.0',
   '-0.1',
   '0.5',
-  '-0.0',
?   -

+  '0.0',
   '0.1',
-  '0.2',
?     ^

+  '0.1',
?     ^

   '0.3',
   '0.0',
   '0.0',
   '0.0',
   '0.0',
   '0.0',
-  '-0.0',
?   -

+  '0.0',
   '0.2',
   '0.1',
   '0.0',
   '0.2',
   '0.1',
   '0.0',
   '0.1',
   '-0.0',
-  '-0.0',
?      ^

+  '-0.1',
?      ^

   '0.1',
   '0.0',
   '0.2',
   '0.0',
   '0.1',
   '-0.0',
   '-0.0',
   '0.1',
   '0.0',
   '0.1',
   '0.4',
   '-0.0',
   '-0.0',
   '0.3',
   '0.2',
   '0.1',
   '-0.0',
   '0.0',
   '0.0',
-  '0.0',
+  '-0.0',
?   +

-  '0.0',
+  '-0.0',
?   +

   '-0.0',
   '0.2',
   '0.1',
   '0.1',
   '0.0',
   '0.0',
   '0.0',
   '0.0',
   '0.3',
   '0.0',
   '-0.0',
   '0.0',
   '-0.0',
   '-0.0',
   '-0.0',
-  '-0.0',
?   -

+  '0.0',
   '0.0',
   '-0.0',
   '0.0',
   '0.0',
   '0.0',
   '0.2',
   '0.0',
   '-0.0',
   '0.2',
   '0.1',
   '0.3',
   '0.2',
   '0.1',
   '-0.0',
   '-0.0',
   '-0.0',
-  '0.0',
+  '-0.0',
?   +

   '0.1',
-  '-0.0',
   '-0.1',
+  '-0.1',
-  '0.5',
?     ^

+  '0.7',
?     ^

   '0.0',
   '0.2',
   '0.1',
   '0.1',
-  '0.0',
+  '-0.0',
?   +

   '0.0',
   '-0.0',
   '0.1'] : Expected different values in heat map.

----------------------------------------------------------------------
Ran 4 tests in 16.495s

FAILED (failures=1)

I still have this failure, but I don’t understand where I’m going wrong. Comparing my heatmap with the one in the repository on GitHub, I noticed that some values are different, but only those related to the ‘overweight’ and ‘height’ columns, plus some zeros that should not be negative. As far as I can tell, I followed all the steps in my code.
Thanks for your help!

vertebraofficial01 · April 15, 2024, 9:56am

[Image: heatmap]

pkdvalis · April 15, 2024, 1:00pm

Is this really your most recent code? When I run it, I get many errors, including with the catplot:

df[['gluc','cholesterol']]=df[['gluc','cholesterol']].map(lambda x:1 if x>1 else 0)
AttributeError: 'DataFrame' object has no attribute 'map'

df_cat_2.drop(index=[24,25],inplace=True)
KeyError: '[24, 25] not found in axis'

======================================================================
FAIL: test_bar_plot_number_of_bars (test_module.CatPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/workspace/boilerplate-medical-data-visualizer/test_module.py", line 28, in test_bar_plot_number_of_bars
    self.assertEqual(actual, expected, "Expected a different number of bars chart.")
AssertionError: 12 != 13 : Expected a different number of bars chart.

Also, from the instructions:

Clean the data in the df_heat variable by filtering out the following patient segments that represent incorrect data:

but there is no df_heat. You should start the heatmap by copying the original dataframe to df_heat.

You also have a few lines where you make a copy but then never refer to the copy. It makes it really hard for someone else to come in and read your code.

aplo=df[['ap_lo','ap_hi']].copy() #assign a copy
aplo=df[['ap_lo','ap_hi']][df['ap_lo']<=df['ap_hi']] #immediately overwrite it

You do the same thing for height and weight:

height=df['height'].copy()
height=df['height'][(df['height']>=df['height'].quantile(0.025))|(height<height.quantile(q=0.975))]
weight=df['weight'].copy()
weight=df['weight'][(df['weight']>df['weight'].quantile(0.025))|(weight<weight.quantile(q=0.975))]

Any ideas why I’m getting a catplot error? (This is on Gitpod; I’ll try on Replit as well.)

For starters, I would clean up all these unnecessary copies and begin the heatmap with the df_heat copy.
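A minimal sketch of starting the heatmap from a single df_heat copy and filtering it with one combined mask. The criteria are the ones visible in the quoted code (ap_lo <= ap_hi and the 2.5th/97.5th percentiles of height and weight), and the sketch assumes both percentile bounds are inclusive; check that against the instructions. Note that the quoted height and weight lines join their two conditions with `|`, which keeps almost every row, whereas a percentile band needs `&`. clean_for_heatmap and upper_triangle_mask are hypothetical helper names, the np.triu mask is one common way to hide the upper triangle, and the random sample data is made up.

```python
import numpy as np
import pandas as pd


def clean_for_heatmap(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a filtered copy of df for the heatmap."""
    # Work on one explicit copy of the original data.
    df_heat = df.copy()

    # Percentiles are computed on the unfiltered data, before any rows are removed.
    h_lo, h_hi = df_heat["height"].quantile(0.025), df_heat["height"].quantile(0.975)
    w_lo, w_hi = df_heat["weight"].quantile(0.025), df_heat["weight"].quantile(0.975)

    # Every condition must hold, so they are combined with & (not |).
    keep = (
        (df_heat["ap_lo"] <= df_heat["ap_hi"])
        & df_heat["height"].between(h_lo, h_hi)  # inclusive on both ends
        & df_heat["weight"].between(w_lo, w_hi)
    )
    return df_heat[keep]


def upper_triangle_mask(corr: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: True on and above the diagonal (cells to hide)."""
    return np.triu(np.ones_like(corr, dtype=bool))


# Made-up sample data with the relevant column names.
rng = np.random.default_rng(0)
sample = pd.DataFrame({
    "height": rng.integers(140, 200, 200),
    "weight": rng.integers(40, 120, 200).astype(float),
    "ap_hi": rng.integers(100, 160, 200),
    "ap_lo": rng.integers(60, 170, 200),
})

df_heat = clean_for_heatmap(sample)
corr = df_heat.corr()
mask = upper_triangle_mask(corr)
print(len(sample), "->", len(df_heat), "rows")
```

The mask can then be passed to seaborn's heatmap through its mask parameter.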

I’ll try to look more deeply into it.

pkdvalis · April 15, 2024, 1:19pm

It looks like these errors were all caused by the older versions of Python and pandas installed on Gitpod. Updating them cleared those errors (and the Replit fork was fine as well).

Did you update all of these while working on this? That could cause the tests not to work as expected, so I’m a bit wary of it, but I’m looking into it further.

Just to document it, your Replit pyproject.toml lists:

[tool.poetry.dependencies]
python = ">=3.10.0,<3.12"
matplotlib = "^3.8.3"
numpy = "^1.26.4"
pandas = "^2.2.1"
seaborn = "^0.13.2"

Gitpod requirements.txt:

seaborn==0.13.2
pandas==1.5.3

And Python 3.8.19 was installed.
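Since the results depend on which versions are installed, here is a small sketch that prints them from inside the project environment. installed_versions is a hypothetical helper; it uses importlib.metadata (Python 3.8+), so it still runs if one of the packages is missing. The last line checks for DataFrame.map, which only exists in pandas 2.1 and later.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

import pandas as pd


def installed_versions(packages=("pandas", "numpy", "matplotlib", "seaborn")):
    """Hypothetical helper: map each package to its version, or None if absent."""
    found = {"python": sys.version.split()[0]}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            found[pkg] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name:<11}{ver}")

# False on older pandas (e.g. 1.5.3), where calling df.map raises AttributeError.
print("DataFrame.map available:", hasattr(pd.DataFrame, "map"))
```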

I would still clear up the issues with copies and df_heat that I’ve indicated.

pkdvalis · April 15, 2024, 1:20pm

I imagine you updated pandas when you changed this:

#df.loc[df['gluc']>1,'gluc']=1
#df.loc[df['gluc']==1,'gluc']=0
#df.loc[df['cholesterol']>1,'cholesterol']=1
#df.loc[df['cholesterol']==1,'cholesterol']=0
df[['gluc','cholesterol']]=df[['gluc','cholesterol']].map(lambda x:1 if x>1 else 0)

Why did you decide to use .map() instead of .loc?
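Regarding the commented-out .loc lines: as written, they turn every value into 0, because the first line sets values above 1 to 1 and the second then sets every 1 (including those) to 0. A sketch that avoids this by computing both masks from the original values before either assignment, on made-up values and assuming the same encoding (1 becomes 0, anything above 1 becomes 1):

```python
import pandas as pd

# Made-up sample values using the project's column names.
df = pd.DataFrame({"gluc": [1, 2, 3], "cholesterol": [2, 1, 3]})

for col in ["gluc", "cholesterol"]:
    # Build both masks from the original values, so assignment order no longer matters.
    is_normal = df[col] == 1
    is_high = df[col] > 1
    df.loc[is_normal, col] = 0
    df.loc[is_high, col] = 1

print(df)
```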
