Medical data visualizer problem with heatmap

vertebraofficial01 · April 16, 2024, 11:19am

Yes you’re right I realized it didn’t work

vertebraofficial01 · April 16, 2024, 11:26am

count    66145.000000
mean         1.641974
std          0.067499
min          1.500000
25%          1.590000
50%          1.650000
75%          1.690000
max          1.790000
Name: height, dtype: float64
count    66465.000000
mean        73.730712
std         11.904887
min         51.500000
25%         65.000000
50%         72.000000
75%         81.000000
max        107.000000
Name: weight, dtype: float64

vertebraofficial01 · April 16, 2024, 11:31am

I don’t understand why it still doesn’t work if I’ve now fixed it as you said

pkdvalis · April 16, 2024, 12:46pm

Can you show the .describe() before and after?

Minimum height looks ok

pkdvalis · April 16, 2024, 1:10pm

OOffff Ok I see the problem.

vertebraofficial01:

mean         1.641974
std          0.067499
min          1.500000
25%          1.590000
50%          1.650000
75%          1.690000
max          1.790000
Name: height, dtype: float64

Can you see what the problem is here?

pkdvalis · April 16, 2024, 1:26pm

Another problem is this line:

df_heat = df.copy()
df_heat['ap_lo']=df_heat['ap_lo'][df_heat['ap_lo']<=df_heat['ap_hi']]

print(df_heat['ap_lo'])
print(df['ap_lo'])

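df_heat['ap_lo'] still has all 70,000 rows after the line is run; values that fail the condition become NaN (hence float64) instead of being dropped. The first output is df_heat['ap_lo'], the second is df['ap_lo']: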

0         80.0
1         90.0
2         70.0
3        100.0
4         60.0
         ...  
69995     80.0
69996     90.0
69997     90.0
69998     80.0
69999     80.0
Name: ap_lo, Length: 70000, dtype: float64
0         80
1         90
2         70
3        100
4         60
        ... 
69995     80
69996     90
69997     90
69998     80
69999     80
Name: ap_lo, Length: 70000, dtype: int64

It seems like you will need to filter into a new dataframe

https://www.geeksforgeeks.org/ways-to-filter-pandas-dataframe-by-column-values/

You could combine these two lines:

df_heat = df.copy()
df_heat['ap_lo']=df_heat['ap_lo'][df_heat['ap_lo']<=df_heat['ap_hi']]

Instead of making a copy into df_heat first, you can filter df into the new dataframe df_heat
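
A minimal sketch of the difference, using a small made-up frame rather than the project's CSV (the values are illustrative assumptions; only the column names come from the thread). Assigning a filtered Series back to a column re-aligns it on the index, so rejected rows stay in place with NaN and the column becomes float64, while filtering the dataframe itself drops the rows.

```python
import pandas as pd

# Made-up sample data; only the column names match the thread.
df = pd.DataFrame({
    "ap_hi": [120, 140, 110, 130],
    "ap_lo": [80, 1100, 70, 90],
})

# Column-level assignment: the filtered Series is aligned back on the index,
# so the rejected row is kept and its ap_lo becomes NaN (dtype turns float64).
df_col = df.copy()
df_col["ap_lo"] = df_col["ap_lo"][df_col["ap_lo"] <= df_col["ap_hi"]]

# Row-level filtering: the boolean mask selects whole rows of the dataframe.
df_heat = df[df["ap_lo"] <= df["ap_hi"]]

print(len(df_col), df_col["ap_lo"].dtype)    # row count is unchanged
print(len(df_heat), df_heat["ap_lo"].dtype)  # offending row is removed
```

With row-level filtering no copy is needed first, which matches the suggestion to combine the two lines.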

vertebraofficial01 · April 16, 2024, 3:35pm

ok i will try it and update you. thanks for your help really and sorry!

vertebraofficial01 · April 16, 2024, 3:57pm

As I told you, I get this error when I try to convert float to int, even though I used copies for it first:
raise IntCastingNaNError(
pandas.errors.IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer
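
This error is consistent with the column containing NaN: an integer column cannot hold NaN, so astype(int) refuses. A hedged sketch on a made-up Series (not the project data), showing the failure and that removing the missing values first lets the cast succeed:

```python
import numpy as np
import pandas as pd

# Made-up column with one missing value, as a column-level filter can leave behind.
ap_lo = pd.Series([80.0, np.nan, 70.0, 90.0], name="ap_lo")

try:
    ap_lo.astype(int)
    cast_failed = False
except ValueError as err:  # IntCastingNaNError is a subclass of ValueError
    cast_failed = True
    print(type(err).__name__, err)

# Once the NaN entries are gone, the cast works.
ap_lo_int = ap_lo.dropna().astype(int)
print(ap_lo_int.dtype)
```

In the project, filtering whole rows out of the dataframe would avoid creating the NaN values in the first place.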

vertebraofficial01 · April 16, 2024, 4:04pm

Anyway, I’m now looking for a solution to that error, but I’ve lost hope that it will solve the heatmap error.

  df_heat=df.copy()
  
  df_heat['ap_lo']=df['ap_lo'][df['ap_lo']<=df['ap_hi']]

vertebraofficial01 · April 16, 2024, 4:11pm

i will try later and update you

pkdvalis · April 16, 2024, 4:17pm

It’s not the float/int that’s the problem here, but that it remains 70,000 lines and nothing is filtered out. I fixed it using an intermediate step; for some reason filtering didn’t work “in place”.

pkdvalis · April 16, 2024, 6:53pm

The note about the heights isn’t a float/int problem (although that might be something else to look at)

vertebraofficial01:

mean         1.641974
std          0.067499
min          1.500000
25%          1.590000
50%          1.650000
75%          1.690000
max          1.790000
Name: height, dtype: float64

Height is supposed to be in centimetres. Numbers seem quite small, right?

vertebraofficial01 · April 18, 2024, 9:13am

Yes, you’re right. However, it’s true that the filtering doesn’t work as it should, and I noticed it. I have to see if I can use some intermediate steps too. Thanks, really, for the help, and sorry for the inconvenience.

pkdvalis · April 18, 2024, 10:42am

It’s no problem, I’m happy to help. We’re almost there!

vertebraofficial01 · April 18, 2024, 10:52am

Yes, later I’ll try to see how to filter more effectively. Yes, we’re close!

vertebraofficial01 · April 19, 2024, 10:29am

I checked the filters and they all work even without an intermediate step, so the problem has to be elsewhere. However, in my opinion it could be that the columns are float after the filter when they should be int, so they are not recognized.

Then I managed to convert ‘ap_lo’ and ‘height’ to int after a few steps, but I noticed that when I convert height back to cm it becomes float again.

pkdvalis · April 19, 2024, 11:21am

ap_lo looks good:

count    70000.000000
min        -70.000000
max      11000.000000
Name: ap_lo, dtype: float64

count    68766.000000
min        -70.000000
max        182.000000
Name: ap_lo, dtype: float64

Not sure why that int conversion isn’t working, but if you print it right after loading the csv it starts as a float, so I wouldn’t worry about that for now.

pkdvalis · April 19, 2024, 1:01pm

df_heat['height']=df_heat['height'].apply(lambda c:c*100)
df_heat['height']=df_heat['height'].astype(int)
df_heat['height']=df_heat['height'].apply(lambda h:h/100)

You convert height back to cm here, but then undo it by dividing by 100 again. I’m not sure what you’re trying to accomplish here.

0        1.68
1        1.56
2        1.65
3        1.69
4        1.56
         ... 
69994    1.65
69995    1.68
69996    1.58
69998    1.63
69999    1.70
Name: height, Length: 65000, dtype: float64

Height of 1.65 cm doesn’t make sense.
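
If the goal is integer centimetres, one conversion is enough. A small sketch on made-up values, assuming the column currently holds metres; rounding before the cast is an addition of this sketch, since a float product can land just below the whole number and astype(int) truncates rather than rounds.

```python
import pandas as pd

# Made-up heights, assumed to be in metres.
height_m = pd.Series([1.68, 1.56, 1.15], name="height")

# Scale once, round to the nearest whole centimetre, then cast to int.
height_cm = (height_m * 100).round().astype(int)
print(height_cm.tolist())
```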

You also try to convert it back to cm after your quantile calculations, not before, but it doesn’t seem to affect it. I would just convert it back to cm immediately after calculating overweight, but it doesn’t make a difference for the quantile calculations.

The main problem is this does not give the correct result:

df_heat['height']=df_heat['height'][df_heat['height']>=quantile_1]

The quantile is correct. Something is wrong with this format of filtering. It’s also inconsistent:

df_heat['height']=df_heat['height'][df_heat['height']>=quantile_1]
df_heat['height']=df_heat['height'][df_heat['height']<quantile_2]
df_heat['weight']=df_heat['weight'][df_heat['weight']>quantile_weight]
df_heat['weight']=df_heat['weight'][df_heat['weight']<quantile_weight_1]

Sometimes you do >= sometimes it’s >, why?

They give you the filter in the instructions:

height is less than the 2.5th percentile 
Keep the correct data with 
(df['height'] >= df['height'].quantile(0.025))

To use the filter is like this:

filtered = df[filter]

or

df = df[df['height'] >= df['height'].quantile(0.025)]

This filters the dataframe according to the boolean of the filter. This part of the video might help:
https://youtu.be/GPVsHOlRBBI?t=16625

To illustrate a bit further, here is an example dataframe:

[Screenshot: the example dataframe]

The way you were filtering:

df = df['col1'][df['col1']>1]

[Screenshot: result of filtering this way]

Result doesn’t quite make sense… if you wanted col1 where col1 is > 1

Correct way:

df = df[df['col1']>1]

[Screenshot: result of the correct filter]

The resulting dataframe only has rows where col1 was > 1
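
The same comparison on a small made-up dataframe (the values and the col2 column are assumptions, not those in the screenshots). Selecting the column first returns a Series, so the other columns are gone; indexing the dataframe with the boolean mask keeps whole rows.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})

# Indexing the column first returns a Series: col2 is lost entirely.
as_series = df["col1"][df["col1"] > 1]

# Indexing the dataframe with the mask keeps every column of the matching rows.
as_frame = df[df["col1"] > 1]

print(type(as_series).__name__, as_series.tolist())
print(type(as_frame).__name__, as_frame.columns.tolist(), len(as_frame))
```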

So, ap_lo is good, all you need to do is apply this filter format for each quantile here:

height is less than the 2.5th percentile (Keep the correct data with (df['height'] >= df['height'].quantile(0.025)))

height is more than the 97.5th percentile

weight is less than the 2.5th percentile

weight is more than the 97.5th percentile

It should be 4 lines of code, total.
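
A hedged sketch of how those filters might be applied together, on randomly generated data rather than the project's CSV. Only the first height condition is quoted from the instructions above; the other three and the ap_lo condition are written by analogy (using `<=` for the upper bounds is an assumption), and combining them into one mask is just one possible arrangement.

```python
import numpy as np
import pandas as pd

# Randomly generated stand-in data; column names follow the thread.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "height": rng.normal(165, 8, n).round(),
    "weight": rng.normal(74, 12, n).round(),
    "ap_hi": rng.integers(100, 160, n),
    "ap_lo": rng.integers(60, 120, n),
})

# Each condition is a boolean Series computed on the original df,
# so one filter does not shift the quantiles used by the next.
keep = (
    (df["ap_lo"] <= df["ap_hi"])
    & (df["height"] >= df["height"].quantile(0.025))
    & (df["height"] <= df["height"].quantile(0.975))
    & (df["weight"] >= df["weight"].quantile(0.025))
    & (df["weight"] <= df["weight"].quantile(0.975))
)

# Apply the mask to the whole dataframe so unwanted rows are dropped.
df_heat = df[keep]
print(len(df), len(df_heat))
```

Because whole rows are removed, no NaN values are introduced and the remaining columns keep their original dtypes.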

vertebraofficial01 · April 19, 2024, 4:30pm

Wait until I’ve read everything. Anyway, I realized that ‘gluc’ and ‘cholesterol’ may also have a problem: with map I change the value to 1 if it is greater than 1 and to 0 otherwise, but there could also be cases where it is less than 1, and I’m replacing that now.
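
One way to express that mapping without handling each case by hand is a boolean comparison cast to int: anything above 1 becomes 1, and everything else (including a value unexpectedly below 1) becomes 0. This is a sketch on made-up values, and whether values below 1 should map to 0 is an assumption the thread does not settle.

```python
import pandas as pd

# Made-up values; 0 is included to show the "less than 1" case.
df = pd.DataFrame({"cholesterol": [1, 2, 3, 0], "gluc": [1, 1, 3, 2]})

for col in ["cholesterol", "gluc"]:
    # True where the value is above 1; the cast turns True/False into 1/0.
    df[col] = (df[col] > 1).astype(int)

print(df)
```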

vertebraofficial01 · April 19, 2024, 4:46pm

Then I converted height to int immediately after filtering, but before converting it to int I converted it into meters and then back into cm. Anyway, I realized that it does not filter as it should for the first quartile, and now I’m trying your solution.
