Hi everyone,
There are two main issues I haven't been able to sort out so far:
1. I want to get rid of the overweight column and keep only the overweight1 column. After dropping it, the column no longer shows up when I print the data frame named corr1, but it is still shown in the heatmap. Why is that?
2. The second is more of an aesthetic matter: my range of colors doesn't match the solution's. You can find links to the database and the solution's graph below.
Database link:
https://github.com/a-mt/fcc-medical-data-visualizer/blob/master/medical_examination.csv
The solution heatmap plot:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt #sets up plotting under plt
import seaborn as sns #sets up styles and gives us more plotting options
df = pd.read_csv(r'C:\Users\user\Desktop\it\www.freecodecamp.org\Data Analysis with Python\3.Medical Data Visualizer\medical_examination.csv')
#print(df)
#Add an overweight column: BMI = weight (kg) / height (m)^2, and a BMI > 25 counts as overweight
df['overweight'] = df.weight / (df.height/100)**2
#Use the value 0 for NOT overweight and the value 1 for overweight
df.loc[df['overweight'] > 25, 'overweight1'] = 1
df.loc[df['overweight'] <= 25, 'overweight1'] = 0
#If the value of cholesterol or gluc is 1, make the value 0. If the value is more than 1, make the value 1
df.loc[df['gluc'] <= 1, 'gluc'] = 0
df.loc[df['gluc'] > 1, 'gluc'] = 1
df.loc[df['cholesterol'] <= 1, 'cholesterol'] = 0
df.loc[df['cholesterol'] > 1, 'cholesterol'] = 1
df1 = df[['cardio', 'cholesterol', 'gluc', 'smoke', 'alco', 'active', 'overweight1']]
df2 = pd.melt(df1, id_vars=['cardio'], value_vars=['cholesterol', 'gluc', 'smoke', 'alco', 'active', 'overweight1'])
# Group and reformat the data to split it by 'cardio'. Show the counts of each feature. You will have to rename
# one of the columns for the catplot to work correctly.
df2 = df2.groupby(['cardio', 'variable', 'value']).size().reset_index()
df2 = df2.rename(columns={0 : 'total'})
# Draw the catplot with 'sns.catplot()'
graph = sns.catplot(data=df2, kind="bar", x="variable", y="total", hue="value", col="cardio")
fig = graph.fig
#Clean the data. Filter out the following patient segments that represent incorrect data:
heatmap = df[(df['ap_lo'] <= df['ap_hi']) &
(df['height'] >= df['height'].quantile(0.025)) &
(df['height'] <= df['height'].quantile(0.975)) &
(df['weight'] >= df['weight'].quantile(0.025)) &
(df['weight'] <= df['weight'].quantile(0.975))
]
# Create a correlation matrix using the dataset as requested
corr = heatmap.corr()
corr1 = corr.drop(['overweight'], axis=1)
# Set the figure size so the annotations and squares stay readable
plt.subplots(figsize=(16, 9))
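# Defining the heatmap and plotting
# 'mask' was not defined in the posted code; assuming the usual upper-triangle mask here
mask = np.triu(np.ones_like(corr1, dtype=bool))
sns.heatmap(corr1, mask=mask, square=True, linewidths=0.5, annot=True, fmt="0.1f")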
plt.show()
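
On the first question, a likely cause is that `drop(['overweight'], axis=1)` removes only the column of the correlation matrix, so a row labelled overweight is still in the index and the heatmap draws it. On the second, the color range of `sns.heatmap` can be pinned with its `vmin`, `vmax` and `center` parameters. The sketch below shows both ideas on a small made-up data frame (random numbers, not the real dataset). The names `toy`, `cols_only` and `draw_heatmap`, and the `vmin`/`vmax` values, are placeholders, not the values used in the solution plot.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the cleaned data: random numbers, not the real dataset.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.normal(size=(200, 4)),
    columns=["cardio", "weight", "overweight", "overweight1"],
)

corr = toy.corr()

# axis=1 removes only the column; the 'overweight' row is still in the index.
cols_only = corr.drop(["overweight"], axis=1)

# Dropping the label from both axes keeps the matrix square.
corr1 = corr.drop(index="overweight", columns="overweight")

# Upper-triangle mask built from the final matrix so the shapes match.
mask = np.triu(np.ones_like(corr1, dtype=bool))


def draw_heatmap(matrix, mask, vmin=-0.1, vmax=0.3):
    """Plot with a pinned color scale; the vmin/vmax defaults are placeholders."""
    # Imported here so the data steps above run without a plotting backend.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(16, 9))
    sns.heatmap(matrix, mask=mask, vmin=vmin, vmax=vmax, center=0,
                square=True, linewidths=0.5, annot=True, fmt="0.1f", ax=ax)
    return fig
```

Calling `draw_heatmap(corr1, mask)` followed by `plt.show()` would display the figure. An alternative with the same effect is to drop the column from the data before computing the correlation, for example `heatmap.drop(columns='overweight').corr()` in the script above.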