Data Analysis with Python Projects - Medical Data Visualizer

Tell us what’s happening:
Hello, I wrote part of my code for this assignment. The problem is that no graphs are generated when I run the code. Can you help me find out why I see neither graphs nor errors when I run it?
Your code so far
main.py

# This entrypoint file to be used in development. Start by reading README.md
import medical_data_visualizer
from unittest import main

# Test your function by calling it here
medical_data_visualizer.draw_cat_plot()
medical_data_visualizer.draw_heat_map()

# Run unit tests automatically
main(module='test_module', exit=False)

medical_data_visualizer.py

import pandas as pd #pandas
import seaborn as sns #seaborn
import matplotlib.pyplot as plt
import numpy as np
import math as math

# Import data
df = pd.read_csv('medical_examination.csv')
# Add 'overweight' column
value_num = df['weight'] / (df['height'] * df['height']) # I first tried math.sqrt which gave an error
# I ended up being too lazy to solve that error sooo, there u go
df['overweight'] = np.where(value_num > 25, 1, 0)

# Normalize data by making 0 always good and 1 always bad. If the value of 'cholesterol' or 'gluc' is 1, make the value 0. If the value is more than 1, make the value 1.

df['overweight'] = np.where((df['cholesterol'] == 1) | (df['gluc'] == 1), 0, 1)

# Draw Categorical Plot
def draw_cat_plot():
    # Create DataFrame for cat plot using `pd.melt` using just the values from 'cholesterol', 'gluc', 'smoke', 'alco', 'active', and 'overweight'.
    columns_to_melt = ['cholesterol', 'gluc', 'smoke', 'alco', 'active', 'overweight']
    df_cat = pd.melt(df, value_vars=columns_to_melt)


    # Group and reformat the data to split it by 'cardio'. Show the counts of each feature. You will have to rename one of the columns for the catplot to work correctly.
    df_cat = df_cat.groupby(['cardio', 'variable', 'value']).size().reset_index(name='total')
    df_cat.rename(columns={'value': 'feature_value'}, inplace=True)

    # Draw the catplot with 'sns.catplot()'
    plt.figure(figsize=(10, 6))
    catplot = sns.catplot(x='variable', y='total', hue='feature_value', col='cardio', data=df_cat, kind='bar')

    # Get the figure for the output
    fig = catplot.fig


    # Do not modify the next two lines
    fig.savefig('catplot.png')
    return fig


# Draw Heat Map
def draw_heat_map():
    # Clean the data
    df_heat = None

    # Calculate the correlation matrix
    corr = None

    # Generate a mask for the upper triangle
    mask = None



    # Set up the matplotlib figure
    fig, ax = None

    # Draw the heatmap with 'sns.heatmap()'



    # Do not modify the next two lines
    fig.savefig('heatmap.png')
    return fig

Challenge: Data Analysis with Python Projects - Medical Data Visualizer

Can you link to your replit?

Here you go man, I appreciate your help:

I'm getting this error: KeyError: 'cardio'

It is raised on this line:
df_cat = df_cat.groupby(['cardio', 'variable', 'value']).size().reset_index(name='total')

The df looks like this:

           variable  value
0       cholesterol      1
1       cholesterol      3
2       cholesterol      3
3       cholesterol      1
4       cholesterol      1
...             ...    ...
419995   overweight      0
419996   overweight      1
419997   overweight      0
419998   overweight      0
419999   overweight      0

[420000 rows x 2 columns]

There is no 'cardio' column. It needs to look like this:

        cardio     variable  value
0            0  cholesterol      0
1            1  cholesterol      1
2            1  cholesterol      1
3            1  cholesterol      0
4            0  cholesterol      0
...        ...          ...    ...
419995       0   overweight      1
419996       1   overweight      1
419997       1   overweight      1
419998       1   overweight      1
419999       0   overweight      0

[420000 rows x 3 columns]

The problem starts here: this command is missing id_vars:
df_cat = pd.melt(df, value_vars=columns_to_melt)

Example:
pd.melt(df, id_vars=['A'], value_vars=['B'])

Further reading:

https://www.geeksforgeeks.org/python-pandas-melt/

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.melt.html
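
Here is a minimal sketch of the difference, assuming a tiny made-up DataFrame rather than the real medical_examination.csv (the values below are invented for illustration). It shows that 'cardio' only survives the melt when it is passed as id_vars, and that the groupby from the original code then has a 'cardio' column to work with:

```python
import pandas as pd

# Toy stand-in for the medical data; these values are made up for illustration.
df = pd.DataFrame({
    "cardio": [0, 1, 1, 0],
    "smoke": [0, 1, 0, 0],
    "alco": [1, 0, 0, 0],
})

# Without id_vars, every column not listed in value_vars is dropped.
without_id = pd.melt(df, value_vars=["smoke", "alco"])

# With id_vars, 'cardio' is repeated alongside each melted row.
with_id = pd.melt(df, id_vars=["cardio"], value_vars=["smoke", "alco"])

print(list(without_id.columns))
print(list(with_id.columns))

# The groupby from the original code now finds a 'cardio' column.
counts = (
    with_id.groupby(["cardio", "variable", "value"])
    .size()
    .reset_index(name="total")
)
print(counts)
```

The same idea should carry over to the real data by keeping columns_to_melt as value_vars and adding 'cardio' as id_vars.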

Thanks for the help. I’ll look into it :smiley:

Hello, I searched for it, but I was unable to find a solution.

You made some progress though, you generated a catplot.png. You’re almost there!

You have too many values. Your normalization didn’t work.

    cardio     variable  feature_value  total
0        0       active              0   6378
1        0       active              1  28643
2        0         alco              0  33080
3        0         alco              1   1941
4        0  cholesterol              1  29330
5        0  cholesterol              2   3799
6        0  cholesterol              3   1892
7        0         gluc              1  30894
8        0         gluc              2   2112
9        0         gluc              3   2015
10       0   overweight              0  32720
11       0   overweight              1   2301
12       0        smoke              0  31781
13       0        smoke              1   3240
14       1       active              0   7361
15       1       active              1  27618
16       1         alco              0  33156
17       1         alco              1   1823
18       1  cholesterol              1  23055
19       1  cholesterol              2   5750
20       1  cholesterol              3   6174
21       1         gluc              1  28585

You have 1, 2, 3 for cholesterol and gluc. They need to be 0 or 1.

# Normalize data by making 0 always good and 1 always bad. If the value of 'cholesterol' or 'gluc' is 1, make the value 0. If the value is more than 1, make the value 1.

I guess I did it. (After a long plane trip) Here is the result of the code.

    cardio     variable  feature_value  total
0        0       active              0   6378
1        0       active              1  28643
2        0         alco              0  33080
3        0         alco              1   1941
4        0  cholesterol              0  29330
5        0  cholesterol              1   5691
6        0         gluc              0  30894
7        0         gluc              1   4127
8        0   overweight              0  35021
9        0        smoke              0  31781
10       0        smoke              1   3240
11       1       active              0   7361
12       1       active              1  27618
13       1         alco              0  33156
14       1         alco              1   1823
15       1  cholesterol              0  23055
16       1  cholesterol              1  11924
17       1         gluc              0  28585
18       1         gluc              1   6394
19       1   overweight              0  34979
20       1        smoke              0  32050
21       1        smoke              1   2929

I added

id_vars=['cardio']

and replaced

df['overweight'] = np.where((df['cholesterol'] == 1) | (df['gluc'] == 1), 0, 1)

I noticed that the assignment wanted me to normalize the values of cholesterol and glucose. I had misunderstood it, so I ended up normalizing overweight according to cholesterol and glucose.

I solved it with the code below:

df['cholesterol'] = np.where((df['cholesterol'] == 1), 0, 1)
df['gluc'] = np.where((df['gluc'] == 1), 0, 1)
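
For anyone reading later, an equivalent way to write the same normalization is a boolean comparison cast to int. This is only a sketch on a made-up sample that uses the 1/2/3 coding shown in the earlier table, not the real dataset:

```python
import pandas as pd

# Made-up sample using the 1/2/3 coding from the table earlier in the thread.
df = pd.DataFrame({"cholesterol": [1, 3, 2, 1], "gluc": [1, 1, 2, 3]})

for col in ["cholesterol", "gluc"]:
    # 1 becomes 0 (good); anything above 1 becomes 1 (bad).
    df[col] = (df[col] > 1).astype(int)

print(df["cholesterol"].tolist())  # [0, 1, 1, 0]
print(df["gluc"].tolist())         # [0, 0, 1, 1]
```

Either form should leave only 0 and 1 in both columns, which is what the catplot counts expect.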

Thanks for the help mate :smile: :smile:

(PS: I only have one issue now. My ‘overweight’ data doesn’t match the expected output.)

Double check the units of the height.

calculate their BMI by dividing their weight in kilograms by the square of their height in meters

Read about BMI, look at what a normal BMI is, then look at the BMI values you’ve calculated: they are way off.

But I already do it in this code:

value_num = df['weight'] / (df['height'] * df['height'])
df['overweight'] = np.where(value_num > 25, 1, 0)

I compute weight / (height^2) and check whether it is greater than 25. If it is, I assign 1; if it is not, I assign 0.

height in meters

Read about BMI:

" BMI Categories:
Underweight = <18.5
Normal weight = 18.5–24.9
Overweight = 25–29.9
Obesity = BMI of 30 or greater"
https://www.nhlbi.nih.gov/health/educational/lose_wt/BMI/bmicalc.htm

Your BMI values:

0        0.002197
1        0.003493
2        0.002351
3        0.002871
4        0.002301

I just realized my mistake, I am such a dumbo. With the help of your reply, I checked the given data and noticed the height was not in meters. You were telling me to convert the height to meters before using it.

I changed my code accordingly. For any dumbos who get stuck on the easiest part like me:

The height in the default data is given in cm, not m.
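
A rough sketch of the corrected calculation, assuming height is in centimeters and weight in kilograms. The three sample rows are hypothetical, chosen so that the uncorrected formula gives about the same tiny values quoted earlier (0.002197, 0.003493, 0.002351):

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows: height in cm, weight in kg.
df = pd.DataFrame({"height": [168, 156, 165], "weight": [62.0, 85.0, 64.0]})

# Convert centimeters to meters before squaring.
height_m = df["height"] / 100
bmi = df["weight"] / height_m ** 2

# 1 if BMI is above 25, otherwise 0.
df["overweight"] = np.where(bmi > 25, 1, 0)

print(bmi.round(2).tolist())       # [21.97, 34.93, 23.51]
print(df["overweight"].tolist())   # [0, 1, 0]
```

With the conversion in place the BMI values land in the usual 18 to 40 range, so the comparison against 25 becomes meaningful.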

Thanks for helping me all this time, pkdvalis :smiley:

I appreciate it, I really do.

When you reach the top, remember to send the ladder back down for the next person :metal:

Tell us what’s happening:
I finished my code. The issue is that when I copy the contents of main.py into medical_data_visualizer.py, it works and the tests pass. But when I delete main's code from that file and run again, nothing happens: no tests, no output, nothing. How can I solve this?

(I delete the main.py code from medical_data_visualizer.py; main.py itself keeps its default code throughout.)

(And these assignments give you places to fill in, so I just filled in those places. I haven’t added my own code anywhere.)

Challenge: Data Analysis with Python Projects - Medical Data Visualizer

Why do you want to delete main.py? Take the pass and move on.

Not sure I totally understand the question here. The replit is likely configured to run main.py which calls the tests.

Let me give you my replit; can you check it? I mean, the graphs are accurate, but I only get these results when I run everything inside the medical_data_yadayada.py file.

If you can’t find a solution either, I will submit my solution with the tests, the function calls and the methods themselves all in the same file (which means I will delete main and then submit).

It wasn’t running main.py or the tests.
You need to edit the .replit file.

Change:

entrypoint = "medical_data_visualizer.py"

to:

entrypoint = "main.py"

I did this in a fork, ran it, tests ran OK

How many tests did it run?
Is the output below correct?

/home/runner/boilerplate-medical-data-visualizer/venv/lib/python3.10/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
/home/runner/boilerplate-medical-data-visualizer/venv/lib/python3.10/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
./home/runner/boilerplate-medical-data-visualizer/venv/lib/python3.10/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
..['0.0', '0.0', '-0.0', '0.0', '-0.1', '0.5', '0.0', '0.1', '0.1', '0.3', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.2', '0.1', '0.0', '0.2', '0.1', '0.0', '0.1', '-0.0', '-0.1', '0.1', '0.0', '0.2', '0.0', '0.1', '-0.0', '-0.0', '0.1', '0.0', '0.1', '0.4', '-0.0', '-0.0', '0.3', '0.2', '0.1', '-0.0', '0.0', '0.0', '-0.0', '-0.0', '-0.0', '0.2', '0.1', '0.1', '0.0', '0.0', '0.0', '0.0', '0.3', '0.0', '-0.0', '0.0', '-0.0', '-0.0', '-0.0', '0.0', '0.0', '-0.0', '0.0', '0.0', '0.0', '0.2', '0.0', '-0.0', '0.2', '0.1', '0.3', '0.2', '0.1', '-0.0', '-0.0', '-0.0', '-0.0', '0.1', '-0.1', '-0.1', '0.7', '0.0', '0.2', '0.1', '0.1', '-0.0', '0.0', '-0.0', '0.1']
.
----------------------------------------------------------------------
Ran 4 tests in 9.004s

OK

It says it ran 4 tests and they were OK, no fails. I couldn’t tell you much more than that.

The other messages are just warnings, not fails. Maybe related to deprecated features. You can investigate it further if you’re curious, but if you pass the tests I would move on and not worry too much about it.

If you dig into the test suite and figure out the error message I’m sure you will learn something, but I don’t think it’s necessary. There’s still a lot to work on.

On my own replit solution I get a different warning, and it also outputs a matrix, like yours:

UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  df_heat = df_heat[df['weight'] <= df['weight'].quantile(0.975)]
['0.0', '0.0', '-0.0', '0.0', '-0.1', '0.5', '0.0', '0.1', '0.1', '0.3', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.2', '0.1', '0.0', '0.2', '0.1', '0.0', '0.1', '-0.0', '-0.1', '0.1', '0.0', '0.2', '0.0', '0.1', '-0.0', '-0.0', '0.1', '0.0', '0.1', '0.4', '-0.0', '-0.0', '0.3', '0.2', '0.1', '-0.0', '0.0', '0.0', '-0.0', '-0.0', '-0.0', '0.2', '0.1', '0.1', '0.0', '0.0', '0.0', '0.0', '0.3', '0.0', '-0.0', '0.0', '-0.0', '-0.0', '-0.0', '0.0', '0.0', '-0.0', '0.0', '0.0', '0.0', '0.2', '0.0', '-0.0', '0.2', '0.1', '0.3', '0.2', '0.1', '-0.0', '-0.0', '-0.0', '-0.0', '0.1', '-0.1', '-0.1', '0.7', '0.0', '0.2', '0.1', '0.1', '-0.0', '0.0', '-0.0', '0.1']
.
----------------------------------------------------------------------
Ran 4 tests in 8.610s

OK
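
For what it's worth, that pandas warning generally appears when the boolean mask is built from one DataFrame (here df) and then used to index another one with a different index (here an already-filtered df_heat). A minimal sketch of one way to avoid it, assuming made-up data and a lower 0.025 quantile bound that is only there for illustration:

```python
import pandas as pd

# Made-up data; only the 'weight' column name comes from the warning above.
df = pd.DataFrame({"weight": [40.0, 60.0, 70.0, 80.0, 200.0]})

# Build every condition from the same DataFrame and filter once,
# so the mask's index always matches the frame being indexed.
# The lower 0.025 bound is an assumption added for illustration.
keep = (
    (df["weight"] >= df["weight"].quantile(0.025))
    & (df["weight"] <= df["weight"].quantile(0.975))
)
df_heat = df[keep]

print(df_heat["weight"].tolist())
```

Filtering in a single step like this keeps the mask and the frame aligned, so pandas has nothing to reindex. The tests pass either way, so this is optional cleanup.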