Data Analysis with Python Projects - Medical Data Visualizer

panno.luca · August 27, 2024, 9:46am

Tell us what’s happening:

Hi,
I’m trying to create a correlation matrix, but the test fails because it expects negative zeros in some places (-0.0) but I get only zeros (0.0). What’s wrong?
At first, I thought that I needed to transform my matrix with abs() for each zero. But then I realised that the -0.0 values are hard-coded in the test module. How can I pass the test? I should not modify the test module, because it is part of a certification project.
Thanks!

Your code so far

corr = df_heat.corr().round(decimals=1)
# this is what I tried: .transform(lambda x: [0.0 if i == 0.0 else i for i in x])

Your browser information:

User Agent is: Mozilla/5.0 (X11; Linux x86_64; rv:129.0) Gecko/20100101 Firefox/129.0

Challenge Information:

Data Analysis with Python Projects - Medical Data Visualizer

panno.luca · August 27, 2024, 12:57pm

Here is the output of the test:

AssertionError: Lists differ: ['0.0', '0.0', '0.0', '0.0', '-0.1', '0.5', '0.0', '0.1',[579 chars]0.1'] != ['0.0', '0.0', '-0.0', '0.0', '-0.1', '0.5', '0.0', '0.1'[601 chars]0.1']

First differing element 2:
'0.0'
'-0.0'
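For context, here is a minimal sketch of where a '-0.0' label can come from. It uses a made-up value, not one from the project's data: a small negative number rounds to negative zero, which compares equal to 0.0 but keeps its sign when formatted with one decimal.

```python
import math

# Illustrative value only (an assumption, not taken from the dataset):
# a weak negative correlation that rounds to zero at one decimal place.
tiny_negative = -0.04
rounded = round(tiny_negative, 1)

# Negative zero compares equal to positive zero...
print(rounded == 0.0)                # True
# ...but the sign bit is still set...
print(math.copysign(1.0, rounded))   # -1.0
# ...so the formatted label differs from a plain zero.
print(f"{rounded:.1f}")              # -0.0
print(f"{round(0.04, 1):.1f}")       # 0.0
```

Because -0.0 == 0.0 is true, a comparison such as `i == 0.0` cannot distinguish the two. A '0.0' where the test expects '-0.0' therefore suggests that the underlying correlation value has a different sign, not that the formatting is wrong.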

I’m trying to find out whether this is a library version problem. I tried to use the library versions from the original GitHub project, but replit.com doesn’t let me do that, so I had to remove the versions from the requirements.txt file. Here is what the file contains now:

seaborn
pandas
matplotlib
numpy

I couldn’t find a way to test other versions. I tried to install them manually (with “pip install …”), but there are always some errors.

pkdvalis · August 27, 2024, 1:04pm

pip install should work although there may be a chain of dependencies. What errors do you get?

Please share your full code for testing.

panno.luca · August 27, 2024, 3:00pm

I’m not sure if I can share my code, as this project is part of the certification.
I was able to install Seaborn 0.13.2 and Pandas 1.5.3 manually.
But then NumPy was no longer compatible. I then tried to install NumPy 3.1.3 and 3.2.2 manually, but the installation did not succeed (on replit.com).
The only way to run the project without library problems is to remove the versions in the requirements.txt file. But the test module complains that my zeros 0.0 are not negative zeros -0.0.

I get this error with Seaborn 0.13.2 and Pandas 1.5.3 (and the default version of NumPy):

Traceback (most recent call last):
  File "/home/runner/boilerplate-medical-data-visualizer/main.py", line 2, in <module>
    import medical_data_visualizer
  File "/home/runner/boilerplate-medical-data-visualizer/medical_data_visualizer.py", line 1, in <module>
    import pandas as pd
  File "/home/runner/boilerplate-medical-data-visualizer/.pythonlibs/lib/python3.12/site-packages/pandas/__init__.py", line 22, in <module>
    from pandas.compat import is_numpy_dev as _is_numpy_dev  # pyright: ignore # noqa:F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/boilerplate-medical-data-visualizer/.pythonlibs/lib/python3.12/site-packages/pandas/compat/__init__.py", line 18, in <module>
    from pandas.compat.numpy import (
  File "/home/runner/boilerplate-medical-data-visualizer/.pythonlibs/lib/python3.12/site-packages/pandas/compat/numpy/__init__.py", line 4, in <module>
    from pandas.util.version import Version
  File "/home/runner/boilerplate-medical-data-visualizer/.pythonlibs/lib/python3.12/site-packages/pandas/util/__init__.py", line 2, in <module>
    from pandas.util._decorators import (  # noqa:F401
  File "/home/runner/boilerplate-medical-data-visualizer/.pythonlibs/lib/python3.12/site-packages/pandas/util/_decorators.py", line 14, in <module>
    from pandas._libs.properties import cache_readonly
  File "/home/runner/boilerplate-medical-data-visualizer/.pythonlibs/lib/python3.12/site-packages/pandas/_libs/__init__.py", line 13, in <module>
    from pandas._libs.interval import Interval
  File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
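
A note on the traceback above: this ValueError is commonly reported when a compiled package (here pandas 1.5.3) was built against a different NumPy ABI than the NumPy that is actually installed. That would be consistent with an older pinned pandas sitting next to an unpinned, default NumPy. As a minimal sketch, the installed versions can be listed with the standard library before deciding what to pin. The helper name installed_version is hypothetical, and the package list simply mirrors the requirements.txt shown earlier.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Package names taken from the requirements.txt in this thread.
for name in ("numpy", "pandas", "seaborn", "matplotlib"):
    print(f"{name}: {installed_version(name)}")
```

This reads package metadata only and does not import pandas, so it still runs when the import itself fails as in the traceback.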

panno.luca · August 27, 2024, 3:25pm

I’m seeing other people sharing their code. So, I hope this is OK. Here is my code:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# 1
df = pd.read_csv('medical_examination.csv')

# 2 BMI=m/h^2
df['overweight'] = ((df['weight'] / ((df['height'] / 100)**2))
                    > 25).astype(int)

# 3
df['cholesterol'] = df['cholesterol'].apply(lambda x: 0 if x == 1 else 1)
df['gluc'] = df['gluc'].apply(lambda x: 0 if x == 1 else 1)


# 4
def draw_cat_plot():
    # 5
    df_cat = pd.melt(df,
                     id_vars='cardio',
                     value_vars=[
                         'cholesterol', 'gluc', 'smoke', 'alco', 'active',
                         'overweight'
                     ])

    # 6
    df_cat = df_cat.groupby(['cardio', 'variable',
                             'value']).size().reset_index(name='total')

    # 7
    cardio0 = df_cat[df_cat['cardio'] == 0]
    cardio1 = df_cat[df_cat['cardio'] == 1]

    # 8
    fig, axs = plt.subplots(ncols=2, figsize=(15, 5))
    sns.countplot(data=cardio0, x='variable', hue='value',
                  ax=axs[0]).set(title='cadio = 0', ylabel="total")
    axs[0].legend([], [], frameon=False)
    sns.countplot(data=cardio1, x='variable', hue='value',
                  ax=axs[1]).set(title='cadio = 1', ylabel="total")
    sns.move_legend(axs[1], "right", bbox_to_anchor=(1.15, 0.5))

    # 9
    fig.savefig('catplot.png')
    return fig


# 10
def draw_heat_map():
    # 11
    df_heat = df.copy()
    df_heat = df_heat[df_heat['ap_lo'] <= df_heat['ap_hi']]
    df_heat = df_heat[(df_heat['height'] >= df_heat['height'].quantile(0.025))
                      &
                      (df_heat['height'] <= df_heat['height'].quantile(0.975))]
    df_heat = df_heat[(df_heat['weight'] >= df_heat['weight'].quantile(0.025))
                      &
                      (df_heat['weight'] <= df_heat['weight'].quantile(0.975))]

    # 12
    corr = df_heat.corr().round(
        decimals=1)#.transform(lambda x: [0.0 if i == 0.0 else i for i in x])

    # 13
    mask = np.triu(np.ones_like(corr, dtype=bool))

    # 14
    fig, ax = plt.subplots(ncols=1, figsize=(10, 10))

    # 15
    sns.heatmap(corr, mask=mask, annot=True, fmt=".1f")

    # 16
    fig.savefig('heatmap.png')
    return fig

panno.luca · August 27, 2024, 3:41pm

I tested on my computer, and the problem with negative zeros is gone (I get them now). But there are still some differences between the heat map values from my code and the test values.

...['0.0', '0.0', '-0.0', '0.0', '-0.1', '0.5', '0.0', '0.1', '0.1', '0.2', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.2', '0.1', '0.0', '0.2', '0.1', '0.0', '0.1', '-0.0', '-0.1', '0.1', '0.0', '0.1', '0.0', '0.1', '-0.0', '-0.0', '0.1', '0.0', '0.1', '0.4', '-0.0', '-0.0', '0.3', '0.2', '0.1', '-0.0', '0.0', '0.0', '-0.0', '-0.0', '-0.0', '0.2', '0.1', '0.1', '0.0', '0.0', '0.0', '0.0', '0.3', '0.0', '-0.0', '0.0', '-0.0', '-0.0', '-0.0', '0.0', '0.0', '-0.0', '0.0', '0.0', '0.0', '0.2', '0.0', '-0.0', '0.2', '0.1', '0.3', '0.2', '0.1', '-0.0', '-0.0', '-0.0', '-0.0', '0.1', '-0.1', '-0.2', '0.7', '0.0', '0.2', '0.1', '0.1', '-0.0', '0.0', '-0.0', '0.1']
F
======================================================================
FAIL: test_heat_map_values (test_module.HeatMapTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/media/luca/maxone/DataScience/freeCodeCamp/Data Analysis with Python/Projects/3-Medical Data Visualizer/boilerplate-medical-data-visualizer/test_module.py", line 47, in test_heat_map_values
    self.assertEqual(actual, expected, "Expected different values in heat map.")
AssertionError: Lists differ: ['0.0[59 chars], '0.2', '0.0', '0.0', '0.0', '0.0', '0.0', '0[548 chars]0.1'] != ['0.0[59 chars], '0.3', '0.0', '0.0', '0.0', '0.0', '0.0', '0[548 chars]0.1']

First differing element 9:
'0.2'
'0.3'

Diff is 1023 characters long. Set self.maxDiff to None to see it. : Expected different values in heat map.

panno.luca · August 27, 2024, 4:29pm

To make it easier to compare my results with the test values, I am posting the image I get (first image) and the example result (second image):

It seems to me that the problem is the way my data is rounded (because the differences are only 0.1, and there are very few of them).

For the other part of the project, I’m aware that my catplot.png is not correct (but the test doesn’t complain about it).

SzeYeung1 · August 27, 2024, 6:51pm

The problem may arise from the way you filter the data for the correlation. Consider different ways of filtering with multiple conditions. You may notice that the rows of the dataframe differ depending on how you do the filtering.

panno.luca · August 27, 2024, 7:26pm

Thank you!
If I filter in one go (i.e. with &s), it works. But now, I have to solve the problem of the first chart.

SzeYeung1 · August 27, 2024, 7:56pm

Filtering on multiple conditions in separate lines may look equivalent to using &. In many cases it is, but in this case we are using percentiles for filtering. If we filter out the highest and lowest 2.5% in height first and then filter on weight in a separate line, the weight percentiles are computed from the roughly 95% of cases that remain, not from the original population, and so the result differs.
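
The difference described above can be reproduced with a minimal sketch on toy data. The data is hypothetical (100 rows where weight rises in lockstep with height), not the project's CSV, and the helper name within is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (an assumption for illustration): 100 rows, weight = height - 70.
toy = pd.DataFrame({"height": np.arange(100, 200), "weight": np.arange(30, 130)})


def within(series, reference):
    """Mask of values in `series` inside the 2.5th-97.5th percentile band of `reference`."""
    return (series >= reference.quantile(0.025)) & (series <= reference.quantile(0.975))


# Sequential filtering: the weight percentiles are computed on the rows
# that survived the height filter, so the band is narrower.
step1 = toy[within(toy["height"], toy["height"])]
sequential = step1[within(step1["weight"], step1["weight"])]

# Combined filtering: every percentile is computed on the original data,
# and the masks are joined with & before any row is dropped.
combined = toy[within(toy["height"], toy["height"])
               & within(toy["weight"], toy["weight"])]

print(len(sequential), len(combined))  # 88 94
```

On this toy data the sequential version keeps 88 rows and the combined version keeps 94. A different set of rows leads to slightly different correlation values, which is enough to move a rounded heat map cell by 0.1.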
