Medical Data Visualizer, heatmap

Hi everyone,

Two main issues that I haven't been able to sort out so far:

1. At first I tried to get rid of the overweight column (I just want to keep the overweight1 column). It doesn't show up as a column in the data frame named corr1 when I print it, but it is still shown in the heatmap. Why is that?

2. The second is more of an aesthetic matter: my range of colors doesn't match the solution's. You can find links to the dataset and the solution's graph below.

Database link:
https://github.com/a-mt/fcc-medical-data-visualizer/blob/master/medical_examination.csv

The solution heatmap plot:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt #sets up plotting under plt 
import seaborn as sns #sets up styles and gives us more plotting options 

df = pd.read_csv(r'C:\Users\user\Desktop\it\www.freecodecamp.org\Data Analysis with Python\3.Medical Data Visualizer\medical_examination.csv')
#print(df)

#Add an overweight column to the data Kg/m^2 = BMI > 25
df['overweight'] = df.weight / (df.height/100)**2

#Use the value 0 for NOT overweight and the value 1 for overweight
df.loc[df['overweight'] > 25, 'overweight1'] = 1
df.loc[df['overweight'] <= 25, 'overweight1'] = 0

#If the value of cholesterol or gluc is 1, make the value 0. If the value is more than 1, make the value 1
df.loc[df['gluc'] <= 1, 'gluc'] = 0
df.loc[df['gluc'] > 1, 'gluc'] = 1
df.loc[df['cholesterol'] <= 1, 'cholesterol'] = 0
df.loc[df['cholesterol'] > 1, 'cholesterol'] = 1

df1 = df[['cardio', 'cholesterol', 'gluc', 'smoke', 'alco', 'active', 'overweight1']]
df2 = pd.melt(df1, id_vars=['cardio'], value_vars=['cholesterol', 'gluc', 'smoke', 'alco', 'active', 'overweight1'])

# Group and reformat the data to split it by 'cardio'. Show the counts of each feature. You will have to rename
# one of the columns for the catplot to work correctly.
df2 = df2.groupby(['cardio', 'variable', 'value']).size().reset_index()
df2 = df2.rename(columns={0 : 'total'}) 

# Draw the catplot with 'sns.catplot()'
graph = sns.catplot(data=df2, kind="bar", x="variable", y="total", hue="value", col="cardio")
fig = graph.fig

#Clean the data. Filter out the following patient segments that represent incorrect data:
heatmap = df[(df['ap_lo'] <= df['ap_hi']) & 
(df['height'] >= df['height'].quantile(0.025)) &
(df['height'] <= df['height'].quantile(0.975)) &
(df['weight'] >= df['weight'].quantile(0.025)) &
(df['weight'] <= df['weight'].quantile(0.975))   
]

# Create a correlation matrix using the dataset as requested
corr = heatmap.corr()
corr1 = corr.drop(['overweight'], axis=1)

# Fixing readable graph adjusting fonts and squares proportions
plt.subplots(figsize=(16, 9))

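# Defining the heatmap and plotting
# NOTE: 'mask' was not defined in the posted snippet; an upper-triangle mask is assumed here
mask = np.triu(np.ones_like(corr1, dtype=bool))
sns.heatmap(corr1, mask=mask, square=True, linewidths=0.5, annot=True, fmt="0.1f")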
plt.show()

You might not pass the tests if you have a column with a different name. Do you want to drop overweight and rename overweight1? Could you just overwrite the overweight column?
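
On the first question, a likely explanation is that `drop(..., axis=1)` removes only the column labelled `overweight`. A correlation matrix has a row with the same label, and the heatmap draws every row, so the name still appears on the y-axis. Here is a minimal sketch with made-up numbers (the `toy` frame is a stand-in, not the medical dataset):

```python
import pandas as pd

# Toy data standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "overweight": [22.0, 27.5, 31.0, 24.0],
    "overweight1": [0, 1, 1, 0],
    "cardio": [0, 1, 1, 1],
})

corr = toy.corr()

# axis=1 drops the column only; the 'overweight' row is still in the index.
cols_only = corr.drop(["overweight"], axis=1)
print(cols_only.index.tolist())

# Dropping the label from both axes removes it from the matrix completely.
both = corr.drop(index="overweight", columns="overweight")
print(both.index.tolist(), both.columns.tolist())
```

Dropping the raw column from the data frame before calling `.corr()` would have the same effect as dropping it from both axes afterwards.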

Try the “center” parameter: https://seaborn.pydata.org/generated/seaborn.heatmap.html
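
To show what that looks like in practice, here is a small sketch of `sns.heatmap` with `center=0` on random toy data. The data, figure size, and column names are assumptions for illustration only. As far as I know, when `center` is given and no `cmap` is passed, seaborn uses a diverging colormap anchored at that value, which is probably why the colors then look closer to the solution's plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random toy data (not the medical dataset), just to get a correlation matrix.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))
corr = toy.corr()

# Hide the upper triangle (including the diagonal).
mask = np.triu(np.ones_like(corr, dtype=bool))

fig, ax = plt.subplots(figsize=(6, 5))
# center=0 anchors the colormap at zero correlation.
sns.heatmap(corr, mask=mask, center=0, square=True,
            linewidths=0.5, annot=True, fmt="0.1f", ax=ax)
# Call fig.savefig(...) or switch to an interactive backend and plt.show() to view it.
```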

And you should post this in the Python subforum :+1:


I just replaced those lines with a lambda function, so I don't need to create any new column:

df['overweight'] = df['overweight'].apply(lambda x: 1 if x > 25 else 0)
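
For comparison, the same 0/1 flag can also be built without `apply`, by comparing the whole column at once and casting the boolean result to integers. A minimal sketch on made-up heights and weights (the `toy` frame and its values are assumptions, not rows from the dataset):

```python
import pandas as pd

# Toy heights (cm) and weights (kg); values are made up.
toy = pd.DataFrame({"height": [170, 160, 180], "weight": [60.0, 80.0, 90.0]})

# BMI = weight in kg divided by height in metres squared.
bmi = toy["weight"] / (toy["height"] / 100) ** 2

# Element-wise lambda, as in the post above.
via_lambda = bmi.apply(lambda x: 1 if x > 25 else 0)

# Vectorized comparison: True/False cast to 1/0.
via_compare = (bmi > 25).astype(int)

print(via_lambda.tolist(), via_compare.tolist())
```

Both give the same result; the vectorized form is usually faster on large frames.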

Yeah, center=0 is working, thanks. I will post in the Python subforum from now on.

