Well, I am over 10 hours into this project, and today I had a massive outburst from not being able to proceed. Is it only me who feels this is quite advanced compared with what we have been taught? I have done Python for Everybody, the Data Analysis with Python videos and the NumPy videos, but here we are asked to use things we have not been taught: pd.melt(), masks, normalization, quantiles and (I think) heatmaps. With other resources I have managed to learn and apply pd.melt() and normalization, and I have studied heatmaps and quantiles a bit, but the heatmap does not come out correctly no matter what I do, and I don't understand why. It's hard to state exactly what the problem is here; I will have to redo it from the beginning because something is messed up. In addition, I was able to make the first count plot, but the cholesterol and glucose values are wrong, even though I can't see where I could have made a mistake up to that point. The plot shows cholesterol and glucose as lower for people with cardiovascular problems, which does not make sense. The rest of the values look normal. I have normalized those values and checked them multiple times. I don't understand what could be wrong.
TL;DR: This project asks for more advanced techniques than we have been taught in the videos, and I am not sure whether that's intended, so that we study other resources as well in order to solve it. If everyone else can do it just fine and I need 20-30 hours, I don't know, maybe there is something wrong with me. I would appreciate some feedback on that. Thank you all, and happy coding.
Hi @Liakos and welcome to the freeCodeCamp forum!
We understand your frustration (you are not alone; we have probably all been there too). There is absolutely nothing wrong with you!
First of all, I would personally suggest that you take care of your mental health while learning to program, especially if you are a self-learner. Take some rest. I have seen people burn out from trying too hard.
Bear in mind that your situation is not strange: coding is hard and not necessarily linear, and there are "jumps" that you have to overcome. And I am afraid this is not the last time you are going to face these kinds of issues. In this profession, finding yourself in these situations is commonplace.
Keep practicing: that will help you improve! By doing so you will learn to handle the "dark sides" of the libraries. And whenever possible, look for support when you hit a dead end. We can help here.
Coming back to your problem: when you feel ready, could you share the part of the project that you are finding difficult?
You’re not alone in having difficulty with this project:
https://forum.freecodecamp.org/search?q=medical%20data%20%23python%20order%3Alatest_topic
I second everything @evaristoc mentioned: learning to code alone is difficult, and I agree with you that this project has some things that might not be exactly clear at first. There could be some small error you've missed that a fresh set of eyes will help you spot. Instructions can easily be misinterpreted as well.
You can search the forum to see whether other people found the solution to your problem, or you can ask here yourself. The forum is a great resource for help and research.
You should definitely be using other resources to learn as much as possible.
@evaristoc @pkdvalis Thank you both for taking the time. I am uploading a PDF with my work so far (it is my test notebook, so it is not very clean; I have tested and erased a bunch of stuff trying to figure out what works). The main issue, as you can see, is that I can't make a proper heatmap. At first I tried to use my previous dataframe (long_df) as the data for the heatmap, but it showed an error about cholesterol being a string that can't be converted to float. Cholesterol is not a string, though: I even used astype(int) and printed the dtype, and it says int. So after searching a bit, I tried .pivot, but it didn't work because it said I have duplicate values. Well, yes, I do, because I need to display the different values split by cardio. .pivot_table works, but it does not give me what I want at all. I really don't understand what I should do. You can also see, in the second cat_plot, my concern about the cholesterol and glucose values not following logic and being shown at lower levels for people with cardiovascular issues, and I can't spot anything wrong up to that point. Maybe you, being more experienced, can see something I don't. Thank you again.
(https://github.com/LiakosData/medical/blob/main/Medical_Examination%20-%20JupyterLab.pdf).
I am not yet familiar with GitHub, so I am not great at creating the exact workspaces I want and sharing my work. I hope this works.
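The `.pivot` vs `.pivot_table` behaviour described above can be reproduced on a small made-up table shaped like `df7` (one row per cardio, feature and 0/1 value; the counts are invented for illustration). This is a sketch of what pandas does, not a fix for the project: `.pivot` refuses duplicate (index, column) pairs, while `.pivot_table` silently aggregates them with its default `aggfunc="mean"`.

```python
import pandas as pd

# Made-up counts in the same shape as df7
df7 = pd.DataFrame({
    "cardio":            [0, 0, 0, 0, 1, 1, 1, 1],
    "feature":           ["gluc", "gluc", "smoke", "smoke"] * 2,
    "categorical_value": [0, 1, 0, 1] * 2,
    "count":             [90, 10, 80, 20, 40, 20, 45, 15],
})

# Each (cardio, feature) pair appears twice (value 0 and value 1),
# so .pivot cannot decide which count goes in the cell and raises.
pivot_failed = False
try:
    df7.pivot(index="cardio", columns="feature", values="count")
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# .pivot_table averages the duplicates (default aggfunc="mean"):
# each cell becomes (count of 0s + count of 1s) / 2.
averaged = df7.pivot_table(index="cardio", columns="feature", values="count")
print(averaged)

# Keeping categorical_value in the columns makes every pair unique again.
by_value = df7.pivot(index="cardio", columns=["feature", "categorical_value"], values="count")
print(by_value)
```

Because the 0-count and the 1-count for a feature always add up to the number of people in that cardio group, the averaged table has the same number in every column of a row, which may explain why the `pivot_table` heatmap looked meaningless.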
404 on that link.
Please just post your code here or share a link to a Google Colab notebook.
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Show floats with 2 decimals
pd.options.display.float_format = '{:.2f}'.format

df = pd.read_csv(r"medical_examination.csv")
df.head(10)

# BMI = weight / height(m)^2 (height divided by 100, i.e. cm -> m)
in_meters = df['height'] / 100
overweight = df['weight'] / (in_meters)**2
df['overweight'] = overweight
df.head()

# Overweight flag: 1 if BMI >= 25, else 0
df['overweight'] = np.where(df['overweight'] >= 25, 1, 0)
df.head()

# Check the raw cholesterol/gluc codes and their dtype
df['cholesterol'] = df['cholesterol'].astype(int)
df['gluc'] = df['gluc'].astype(int)
print(df["cholesterol"].value_counts())
print(df["gluc"].value_counts())
print(df['gluc'].dtype)

# Normalize: 1 -> 0, anything above 1 -> 1
df['cholesterol'] = np.where(df['cholesterol'] > 1, 1, 0)
df['gluc'] = np.where(df['gluc'] > 1, 1, 0)
df.head()

# Long format: columns cardio, variable, value
long_df = pd.melt(df,
                  id_vars = ['cardio'],
                  value_vars = ['cholesterol', 'gluc', 'alco', 'smoke', 'active','overweight'])
long_df

# Count of each 0/1 value per feature
cat_plot = sns.catplot( data = long_df,
                        x = 'variable',
                        kind = 'count',
                        hue = 'value',
                        palette = "viridis",
                        height = 5,
                        aspect = 1.2
                        )
cat_plot.set(ylabel = 'cardio')
plt.title("Epic Count Plot", fontsize = 20, color = 'darkmagenta', fontweight = 'bold')
plt.xlabel("VARIABLE", fontsize = 10, color = 'red', fontweight = 'bold')
plt.ylabel("CARDIO", fontweight = 'bold', fontsize = 10, color = 'orange')
plt.xticks(rotation=15)
plt.show()

# Number of rows per (cardio, variable, value)
df7 = long_df.groupby(['cardio', 'variable', 'value']).size().reset_index(name='count')
df7.rename(columns={'variable': 'feature', 'value': 'categorical_value'}, inplace=True)
df7

# Box plot of those counts, coloured by cardio
sns.set_style("darkgrid")
plot = sns.catplot(
    data=df7,
    x="feature",
    y="count",
    hue="cardio",
    kind="box",
    height=6,
    aspect=2
)
plot.set_axis_labels("Feature", "Count")
plot.fig.suptitle("Categorical Feature Counts by Cardio", fontsize=16)
plt.show()
fig = plot.fig

# pivot_table averages "count" over categorical_value (default aggfunc is mean)
df_heatmap = df7.pivot_table(index = "cardio", columns = "feature", values = "count")
sns.heatmap(
    df_heatmap,
    annot=True,
    cmap = 'coolwarm'
)
```
That's the best I can do for now, until I learn how to handle this kind of thing better.
Here is the cat_plot with the gluc/cholesterol problem; as a new user, I'm told I can't upload more images.
I’ve edited your code for readability. When you enter a code block into a forum post, please precede it with a separate line of three backticks and follow it with a separate line of three backticks to make it easier to read.
You can also use the “preformatted text” tool in the editor (`</>`) to add backticks around text.
See this post to find the backtick on your keyboard.
Note: Backticks (`) are not single quotes (').
Looking through it.
You should use the boilerplate code and functions. The tests can't be run like this.
- If that value is > 25 then the person is overweight. Use the value `0` for NOT overweight and the value `1` for overweight.
```python
df['overweight'] = np.where(df['overweight'] >= 25, 1, 0)
```
Not sure if this is causing an issue, but watch out for `>` vs `>=`.
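The difference only shows up for a BMI of exactly 25. A minimal sketch with three made-up people (heights in cm, weights in kg, as in the code above), where the second one has a BMI of exactly 25:

```python
import numpy as np
import pandas as pd

# Made-up rows; 100 kg at 200 cm gives BMI = 100 / 2.0**2 = 25.0 exactly
df = pd.DataFrame({"height": [170, 200, 160], "weight": [60.0, 100.0, 80.0]})

# BMI = weight (kg) / height (m)^2
bmi = df["weight"] / (df["height"] / 100) ** 2

# ">= 25" counts a BMI of exactly 25 as overweight; "> 25" (the wording
# of the instructions) does not
df["overweight_ge"] = np.where(bmi >= 25, 1, 0)
df["overweight_gt"] = np.where(bmi > 25, 1, 0)

print(pd.concat([bmi.rename("bmi"), df[["overweight_ge", "overweight_gt"]]], axis=1))
```

Only the middle row differs between the two columns, so on real data the effect is limited to rows whose BMI lands exactly on the threshold.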
- Draw the Categorical Plot in the `draw_cat_plot` function.
You don’t have a `draw_cat_plot` function, as noted.
- Create a DataFrame for the cat plot using `pd.melt` with values from `cholesterol`, `gluc`, `smoke`, `alco`, `active`, and `overweight` in the `df_cat` variable.
You’re making a `long_df` variable instead of the `df_cat` variable. It's best to follow the instructions exactly, because you can't know what the tests are checking for.
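A minimal sketch of the melt step stored under the name the instructions ask for, `df_cat`. The tiny DataFrame is made up and stands in for the already-normalized `df`; only the columns the melt needs are included.

```python
import pandas as pd

# Made-up stand-in for the normalized df (0/1 values)
df = pd.DataFrame({
    "cardio":      [0, 1, 1],
    "cholesterol": [0, 1, 0],
    "gluc":        [0, 0, 1],
    "smoke":       [1, 0, 0],
    "alco":        [0, 0, 0],
    "active":      [1, 1, 0],
    "overweight":  [0, 1, 1],
})

# Long format: one row per (person, feature); the old column names
# end up as values in the "variable" column
df_cat = pd.melt(
    df,
    id_vars=["cardio"],
    value_vars=["cholesterol", "gluc", "smoke", "alco", "active", "overweight"],
)
print(df_cat.head(8))
```

With 3 people and 6 features, `df_cat` has 3 × 6 = 18 rows and the columns `cardio`, `variable` and `value`.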
```python
cat_plot = sns.catplot( data = long_df,
                        x = 'variable',
```
Is this whole section just for your own data check? It isn't asked for in the instructions. Again, if the tests are checking for `cat_plot`, it will not work.
- Convert the data into `long` format and create a chart that shows the value counts of the categorical features using the following method provided by the seaborn library import: `sns.catplot()`.
`kind="box"`
The kind of plot to draw, corresponds to the name of a categorical axes-level plotting function. Options are: “strip”, “swarm”, “box”, “violin”, “boxen”, “point”, “bar”, or “count”.
https://seaborn.pydata.org/generated/seaborn.catplot.html
Take a look at the different kinds of catplot there; the example most resembles a bar chart.
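A minimal sketch of a bar-style catplot, assuming the long-format data is first reduced to one count per (cardio, variable, value) row. `draw_cat_plot` here is a hypothetical stand-in, not the project's actual boilerplate, and the `total` column name and the sample data are made up. Note that the first catplot in the code above never uses the `cardio` column, so it pools everyone together and cannot show any difference between the two groups; `col="cardio"` draws one panel per group instead.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so this runs without a display
import pandas as pd
import seaborn as sns

FEATURES = ["cholesterol", "gluc", "smoke", "alco", "active", "overweight"]

def draw_cat_plot(df):
    """Hypothetical helper: bar chart of 0/1 counts per feature, one panel per cardio value."""
    df_cat = pd.melt(df, id_vars=["cardio"], value_vars=FEATURES)
    # One row per (cardio, variable, value) with the number of people in it
    df_cat = (df_cat.groupby(["cardio", "variable", "value"])
                    .size()
                    .reset_index(name="total"))
    # col="cardio" splits the chart into a cardio=0 panel and a cardio=1 panel
    grid = sns.catplot(data=df_cat, x="variable", y="total", hue="value",
                       col="cardio", kind="bar")
    return grid.figure

# Made-up example data, already normalized to 0/1
sample = pd.DataFrame({
    "cardio":      [0, 0, 0, 1, 1, 1],
    "cholesterol": [0, 0, 1, 1, 1, 0],
    "gluc":        [0, 0, 0, 1, 0, 0],
    "smoke":       [1, 0, 0, 0, 0, 1],
    "alco":        [0, 0, 0, 0, 1, 0],
    "active":      [1, 1, 0, 1, 0, 1],
    "overweight":  [0, 1, 1, 1, 1, 0],
})
fig = draw_cat_plot(sample)
```

Returning the figure rather than calling `plt.show()` keeps the plot available to whatever code (for example, a test) calls the function.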
Well, I could not find anything about the boilerplate. The link provided in the project just opens a terminal that I also can't use: it gives an error no matter the command, and it is empty.
Regarding your point, haven't I done exactly that? Where the value is >= 25, put 1, and for the rest put 0.
In my head() outputs it is like that as well: overweight people are assigned 1 and not-overweight people 0. And overweight is also displayed correctly in the graphs. The troublesome values are glucose and cholesterol, plus the fact that the heatmap does not work with anything I have tried. I don't know how it's possible to get the dtype as int and then, at the next input, be told it is a string, so it gives an error. Thank you, by the way, for trying to help me and for your feedback about the format of my post. I hope you have a great new year.
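One plausible reading of that error, not confirmed in the thread: after `pd.melt`, the word `'cholesterol'` is stored as a *value* in the text column `variable`, so the "string" in the message is likely that label rather than the cholesterol numbers. A heatmap has to turn every cell into a float, and a text column makes that impossible. A sketch on a made-up `long_df`:

```python
import pandas as pd

# Made-up long_df in the same shape pd.melt produces
long_df = pd.DataFrame({
    "cardio":   [0, 1, 0, 1],
    "variable": ["cholesterol", "cholesterol", "gluc", "gluc"],
    "value":    [0, 1, 1, 0],
})

# The numeric columns really are ints...
print(long_df.dtypes)

# ...but melt moved the old column NAMES into a text column
text_cols = list(long_df.select_dtypes(exclude="number").columns)
print(text_cols)

# Converting the whole frame to floats (as a heatmap needs) fails on that column
conversion_failed = False
try:
    long_df.to_numpy(dtype=float)
except (ValueError, TypeError) as err:
    conversion_failed = True
    print("conversion failed:", err)
```

So `astype(int)` on the original `cholesterol` column can succeed while the long-format frame still cannot be plotted as a heatmap.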
I’ve updated my notes above, I’ll continue a bit later.
If that value is > 25 then the person is overweight
If the value is over 25, then overweight is 1.
You have greater than or equal to is 1.
Is 25 over 25?
Might not be a factor here, but it’s a detail to watch for. The instructions ask for over 25, which does not include 25. You have included 25 by using >=
You might need to log in to Gitpod with an account, or you can log in with your GitHub credentials (you will need to authorize that on GitHub), but you should see something like this:
You will not be able to complete this project and run the tests without this.
- Do not modify the next two lines.
This is the boilerplate code referred to in step 9.
```python
df_heatmap = df7.pivot_table(index = "cardio", columns = "feature", values = "count")
```
Do not use the dataframe that you’ve modified for the cat_plot. You should load `df` as it was left in Step 3.
You can see the instructions refer directly to `df`:
(Keep the correct data with `(df['ap_lo'] <= df['ap_hi'])`)
All of the modifications that were done for the catplot should be stored in `df_cat` (which you’ve named `long_df` or `df7`).
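A rough sketch of building the heatmap from `df` rather than `df_cat`, assuming the heatmap is meant to show a correlation matrix with its upper triangle masked (the thread does not say this explicitly). Besides the `ap_lo <= ap_hi` filter quoted above, the 2.5th/97.5th percentile bounds on height and weight are an assumption, as are the helper name `draw_heat_map` and the random sample data.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

def draw_heat_map(df):
    """Hypothetical helper: masked correlation heatmap of the cleaned df."""
    df_heat = df[
        (df["ap_lo"] <= df["ap_hi"])
        # Assumed bounds: keep heights/weights between the 2.5th and 97.5th percentiles
        & df["height"].between(df["height"].quantile(0.025), df["height"].quantile(0.975))
        & df["weight"].between(df["weight"].quantile(0.025), df["weight"].quantile(0.975))
    ]
    corr = df_heat.corr()
    # True on and above the diagonal: those cells mirror the lower triangle
    mask = np.triu(np.ones_like(corr, dtype=bool))
    fig, ax = plt.subplots(figsize=(10, 8))
    sns.heatmap(corr, mask=mask, annot=True, fmt=".1f", center=0, ax=ax)
    return fig

# Made-up numeric data standing in for the real CSV
rng = np.random.default_rng(0)
n = 200
sample = pd.DataFrame({
    "ap_hi": rng.integers(100, 160, n),
    "ap_lo": rng.integers(60, 110, n),
    "height": rng.normal(165, 8, n),
    "weight": rng.normal(72, 12, n),
    "cholesterol": rng.integers(0, 2, n),
    "cardio": rng.integers(0, 2, n),
})
fig = draw_heat_map(sample)
```

Here the heatmap input is a square numeric matrix (every column against every column), which is what `sns.heatmap` expects; neither the long-format data nor the averaged counts have that shape.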
I hope this helps, happy new year!
@pkdvalis Thanks a lot for your help. I will look more carefully tomorrow. Some variable names are different because the initial draft is where I test my code and try a bunch of stuff; when I am done, I write it cleanly in a new notebook. Sadly, I completed the first 2 projects without Gitpod as well, and they were fine, but from what you are saying maybe I will need to do them again? The thing is, for some reason I can't get the Gitpod page to look like yours. It shows up differently for me, and I can't find how to change that. I may need to contact FCC support if there is no way around it. Thanks again for everything!
EDIT: Well, somehow I managed to do it and opened it like your example, with VS Code in Gitpod, though I am not sure how I did it or how I will do it again, haha, but I guess I will figure it out. I really appreciate you. I have a lot to work on now; it will not be easy.
Yes, you should definitely put them into the boilerplate and run the tests on them to make sure they are passing.
Good, @Liakos!
@pkdvalis is one of the best moderators when it comes to Python, so you are in good hands!
A lot of work… for now. Keep practicing. Coding is a bit like handcrafting: you have to learn how and when things fail so you have an answer for it.
To be fair, it is also true that freeCodeCamp’s tests can sometimes be very strict about what you have to input. Unfortunately, the community in charge of making the exercises cannot take into consideration all possible cases, so there are situations where your code gets a “false negative”.
But whether it's a “false negative” or a “true negative”, whenever the hints aren't enough for debugging, we are here to give you a second pair of eyes.
Wish you the best with your progress! Happy coding!