Data Analysis with Python Projects: Heatmap section

Is there a guide, or can I request one, that shows examples of the expected dataframe outputs for the steps “clean the data”, “calculate the correlation” and “generate a mask”? I cannot reproduce the numbers on the heatmap shown in the Figure 2 image file included in the boilerplate.

Hi, and welcome to the forum :wave:

Can you share a link to the step, your code and outputs, and any other relevant info?

There aren’t really any guides like this, apart from the example heatmap image. If we are troubleshooting something, I could provide examples of what the dataframe needs to look like where relevant.

Hi @pkdvalis, thanks for confirming the lack of guides.

This is the link to the activity: heatmap.

a. I am now working on steps 11 through 16.

11. Clean the data in the df_heat variable by filtering out the following patient segments that represent incorrect data:
height is less than the 2.5th percentile (Keep the correct data with (df['height'] >= df['height'].quantile(0.025)))
height is more than the 97.5th percentile
weight is less than the 2.5th percentile
weight is more than the 97.5th percentile
12. Calculate the correlation matrix and store it in the corr variable
13. Generate a mask for the upper triangle and store it in the mask variable
14. Set up the matplotlib figure
15. Plot the correlation matrix using the method provided by the seaborn library import: sns.heatmap()
16. Do not modify the next two lines
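
Steps 12 and 13 can be sketched on a small synthetic dataframe. This is only an illustration under assumptions: the column values below are randomly generated and are not the project's dataset, and hiding the diagonal together with the upper triangle is one common choice rather than a confirmed project requirement.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical values, not the project's dataset).
rng = np.random.default_rng(42)
df_heat = pd.DataFrame({
    "height": rng.normal(165, 8, 500),
    "weight": rng.normal(74, 14, 500),
    "age": rng.integers(30, 65, 500),
})

# Step 12: pairwise Pearson correlation between all numeric columns.
corr = df_heat.corr()

# Step 13: boolean array that is True on and above the diagonal.
# Cells where the mask is True are hidden when passed to sns.heatmap(mask=...).
mask = np.triu(np.ones(corr.shape, dtype=bool))

print(corr.round(1))
print(mask)
```

The correlation matrix is square with one row and one column per numeric field, so the mask must have the same shape as corr.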

b. Questions, observations and points to clarify
b.1. Please refer to their example heatmap. It has a gender field, but the dataframe has no gender field, and there was no instruction to rename the ‘sex’ column if need be.

b.2. In step 11, it is not clear whether the filters need to be applied all at once (i.e. all filters against the dataframe df) or one at a time, creating a new dataframe after each filter and therefore applying each subsequent filter to a different dataframe.

- I initially tried applying the filters all at once. I also tried applying them one at a time, transforming the dataframe each time a filter was applied. However, because there is no example dataframe output prior to the correlation step, I have no way to check whether my dataframe is correct.

df_heat = df[(df['height']>=df['height'].quantile(0.025)) & 
                (df['height']>=df['height'].quantile(0.975)) &
                (df['weight']>=df['weight'].quantile(0.025)) &

b.3. Step 16 says “Do not modify the next two lines”. However, I could only produce the file with a different method, so I had to modify the code.

original line: fig.savefig('heatmap.png')
adjusted line: fig.figure.savefig('heatmap.png')

c. If possible, could they (fCC) be asked to provide the expected dataframes after each of steps 11, 12 and 13?

Thanks in advance.

Double check all your >= and <=.

I did this in 4 separate lines to keep it simple, but most people seem to do it in one line as you have.
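
As a rough sketch of why both styles can give the same result, here is a comparison on synthetic data (made-up values, not the project's dataset). It assumes the percentile thresholds are computed once from the unfiltered df, as the hint in step 11 suggests. If the thresholds were recomputed from the already-filtered dataframe at each step, the result could differ.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical values, not the project's dataset).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "height": rng.normal(165, 8, 1000),
    "weight": rng.normal(74, 14, 1000),
})

# Thresholds taken once from the unfiltered dataframe.
h_lo, h_hi = df["height"].quantile([0.025, 0.975])
w_lo, w_hi = df["weight"].quantile([0.025, 0.975])

# Style 1: one combined boolean condition.
combined = df[
    (df["height"] >= h_lo) & (df["height"] <= h_hi)
    & (df["weight"] >= w_lo) & (df["weight"] <= w_hi)
]

# Style 2: four separate filtering steps with the same fixed thresholds.
stepwise = df[df["height"] >= h_lo]
stepwise = stepwise[stepwise["height"] <= h_hi]
stepwise = stepwise[stepwise["weight"] >= w_lo]
stepwise = stepwise[stepwise["weight"] <= w_hi]

# Both styles keep exactly the same rows.
same_rows = combined.equals(stepwise)
print(same_rows, len(df), len(combined))
```

Note that each "keep" condition pairs >= with the lower percentile and <= with the upper percentile.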

The savefig solution is interesting, but I think the tests won’t work with that. You can achieve the same by adding the line

fig = fig.figure

just before the savefig. You can read a little more about this here:

I hope this helps!
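
A minimal sketch of why fig = fig.figure works, assuming fig ended up holding an Axes object (for example because it was assigned the return value of sns.heatmap(), which returns an Axes). The variable name axes_like is hypothetical, and the image is written to an in-memory buffer rather than to heatmap.png.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

figure, axes_like = plt.subplots()

# An Axes has no savefig method; the Figure that contains it does.
axes_has_savefig = hasattr(axes_like, "savefig")
figure_has_savefig = hasattr(figure, "savefig")

# The .figure attribute of an Axes points back to its parent Figure.
fig = axes_like           # what the variable holds after the plotting call
fig = fig.figure          # now it is the Figure again
buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # stands in for fig.savefig('heatmap.png')

print(axes_has_savefig, figure_has_savefig, fig is figure)
```

With this reassignment the original fig.savefig('heatmap.png') line can stay unmodified.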

Thank you @pkdvalis for the reply.

I think I was able to create the heatmap successfully. What is not clear yet is how the heatmap’s color bar range is determined or customized. What is the range based on, and what is it pegged to?

Below is a combined image showing the activity’s Figure 2 example and the heatmaps generated by my code. Sorry if the image is not clear; the “new user” image limit was not helpful. I had intended to show several clear, individual images.

There are some parameters to control that here:

vmin, vmax: floats, optional
Values to anchor the colormap, otherwise they are inferred from the data and other keyword arguments.

cmap: matplotlib colormap name or object, or list of colors, optional
The mapping from data values to color space. If not provided, the default will depend on whether center is set.

center: float, optional
The value at which to center the colormap when plotting divergent data. Using this parameter will change the default cmap if none is specified.
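
A small sketch of how these parameters might be combined, using random data rather than the project's dataset. The specific numbers (center=0, vmin=-0.1, vmax=0.3) are arbitrary values chosen for illustration, not the settings behind the example figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (hypothetical values, not the project's dataset).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 4)), columns=list("abcd"))
corr = data.corr()
mask = np.triu(np.ones(corr.shape, dtype=bool))

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(
    corr,
    mask=mask,        # hide the diagonal and upper triangle
    annot=True,       # write the value in each visible cell
    fmt=".1f",        # one decimal place
    center=0,         # value mapped to the middle of the colormap
    vmin=-0.1,        # lower end of the color bar (illustrative)
    vmax=0.3,         # upper end of the color bar (illustrative)
    square=True,
    linewidths=0.5,
    cbar_kws={"shrink": 0.5},
    ax=ax,
)

# The color limits actually applied to the drawn mesh.
color_limits = ax.collections[0].get_clim()
print(color_limits)
```

When vmin and vmax are left out, the limits are inferred from the data, which is why two plots of slightly different dataframes can show different color bar ranges.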

Thank you for the pointers @pkdvalis.

I was able to use the center parameter. As for cmap, I see that it controls which colormap is used. I looked through the collection of colormaps on the matplotlib website. There are a lot, but I could not find the one that corresponds to the default colormap (or palette) used when the center parameter is set.
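
One way to check this, as a hedged sketch: as far as I can tell, recent seaborn versions fall back to their own "icefire" colormap when center is set and cmap is not. That colormap is registered by seaborn rather than shipped with matplotlib, which would explain why it is not in the matplotlib gallery. Treat the name as an assumption to verify against the installed version. The comparison below uses random data, not the project's dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (hypothetical values, not the project's dataset).
rng = np.random.default_rng(1)
corr = pd.DataFrame(rng.normal(size=(100, 3)), columns=list("xyz")).corr()

fig, (ax_default, ax_named) = plt.subplots(1, 2)

# Left: center set, cmap left to seaborn's default.
sns.heatmap(corr, center=0, cbar=False, ax=ax_default)
# Right: center set, cmap named explicitly (assumed default name).
sns.heatmap(corr, center=0, cmap="icefire", cbar=False, ax=ax_named)

# Sample both colormaps at the same points and compare the RGBA values.
samples = np.linspace(0, 1, 256)
default_colors = ax_default.collections[0].get_cmap()(samples)
named_colors = ax_named.collections[0].get_cmap()(samples)
same_colors = np.allclose(default_colors, named_colors)
print(same_colors)
```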

The final output now looks the same as the activity’s Figure 2 example.
