Medical Data Visualizer bug & drop function

lujanluka · August 21, 2021, 8:28pm

Hello everyone. This task isn't that hard, but I have one possible bug and one question.
I tried to solve both myself and searched Google, with no success.

First, the "bug". When I type this code at the end of the draw_heat_map() function, I get a red underline in my Replit editor, while the same code works fine in a Jupyter notebook:

fig, ax = plt.subplots(figsize=(12,12))

I don't know why Replit underlines this line in red and throws an error while it works fine in Jupyter. I tried various ways, and every time I type "fig" it is an error. Is there something I am missing?

Second, when I was about to clean the data in the draw_heat_map() function, I started with "drop", and it didn't work. Later I checked some solutions and found that this simple code works:

def draw_heat_map():
    # Clean the data
    # I used the "drop" function at first, but apparently the results were incorrect.
    # Is there any way to use drop as an alternative?
    # I still want to go with my first idea and "drop" everything that I don't need.
    # For example, why is this not working? Is there any way to drop all the rows
    # that don't satisfy our needs?
    # df4.drop(df4[(df4['height'] <= df4['height'].quantile(0.025)) & (df4['height'] >= df4['height'].quantile(0.975))].index, inplace=True)

    # This below works:
    df4 = df2.copy()
    df4 = df4[(df4['ap_lo'] <= df4['ap_hi']) &
              (df4['height'] >= df4['height'].quantile(0.025)) &
              (df4['height'] <= df4['height'].quantile(0.975)) &
              (df4['weight'] <= df4['weight'].quantile(0.975)) &
              (df4['weight'] >= df4['weight'].quantile(0.025))]

    df_heat = df4

    # Calculate the correlation matrix
    corr = df_heat.corr()

    # Generate a mask for the upper triangle
    mask=np.triu(np.ones_like(corr, dtype=bool)

    # Set up the matplotlib figure
    fig, ax = plt.subplots(figsize=(12,12))
    # Why do I get an error here?

    # Draw the heatmap with 'sns.heatmap()'
    sns.heatmap(corr, mask=mask, center=0, annot=True,
                fmt='.1f', square=True)

    # Do not modify the next two lines
    fig.savefig('heatmap.png')
    return fig

LINK
https://replit.com/@LukaKujundiLuja/boilerplate-medical-data-visualizer-1#medical_data_visualizer.py

jeremy.a.gray · August 21, 2021, 9:31pm

This bit is easy. You are missing a close parenthesis here:

mask=np.triu(np.ones_like(corr, dtype=bool)

So, definitely a bug in your code. As for the drop versus select problem, it looks just like the simultaneous selections versus sequential selection problem that has been discussed here before. When I modify your code with the drop version, it works but generates a very slightly different heatmap. If I compare two separate dataframes and drop one and select the other and display the shapes, I get

dropped:  (70000, 14)
selected:  (63259, 14)

Since the data has 70,000 entries to begin, nothing is dropped. So the drop is not working the way you think (apparently no line meets all 5 conditions). If I do 5 sequential drops with the correct conditions, the cleaned data frames are identical and the tests pass. So, you can drop or select and either will work as long as they are done correctly, as one would expect.
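To make the shape comparison above concrete, here is a minimal sketch on synthetic data. The DataFrame, its values, and the variable names are stand-ins I made up for illustration, not the course CSV. It shows why the original drop removes nothing: a height cannot be at or below the 2.5th percentile and at or above the 97.5th percentile at the same time, so the `&` condition matches zero rows.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the course data (hypothetical values, not the real CSV).
rng = np.random.default_rng(0)
df = pd.DataFrame({"height": rng.normal(165, 8, size=1000).round()})

low = df["height"].quantile(0.025)
high = df["height"].quantile(0.975)

# The original drop condition: height <= low AND height >= high.
# As long as low < high, no single value can satisfy both.
impossible = (df["height"] <= low) & (df["height"] >= high)

# Dropping the rows that match an empty condition leaves the frame unchanged.
dropped = df.drop(df[impossible].index)

# Selecting the rows to keep does remove the tails.
selected = df[(df["height"] >= low) & (df["height"] <= high)]

print("rows matching the & condition:", int(impossible.sum()))
print("dropped: ", dropped.shape)
print("selected:", selected.shape)
```

The exact row count for `selected` depends on the random data, but `dropped` always keeps every row, which mirrors the 70,000 versus 63,259 difference reported above.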

Lastly, Jupyter notebooks have all manner of problems when you leave the “notebook” realm (read, like Matlab or Mathematica) and enter the programming realm. So it’s never surprising some things work in Jupyter that won’t with a Python interpreter, and vice versa.
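On the red underline appearing under the `fig, ax = ...` line rather than under the line with the actual mistake: this is a rough sketch of what is probably going on, not a statement about how the Replit editor works internally. Inside an unclosed parenthesis Python ignores line breaks, so the parser keeps reading into the next statement before it notices something is wrong. The snippet only parses the two lines from the thread with `compile()`, so `np` and `plt` never need to exist.

```python
# The two lines from the thread, with the unbalanced parenthesis left in.
source = (
    "mask=np.triu(np.ones_like(corr, dtype=bool)\n"
    "\n"
    "fig, ax = plt.subplots(figsize=(12,12))\n"
)

reported_line = None
try:
    # compile() only parses the text; nothing is executed.
    compile(source, "<example>", "exec")
except SyntaxError as err:
    reported_line = err.lineno
    print(f"SyntaxError reported on line {err.lineno}: {err.msg}")
```

Which line gets reported depends on the interpreter version. Older interpreters typically point at the following statement (line 3 here), while Python 3.10 and later tend to point at the unclosed bracket itself. Either way, when an error shows up on a line that looks correct, the line above it is worth checking for unbalanced brackets.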

lujanluka · August 23, 2021, 7:22pm

Thanks. First, sorry for the late reply; I was busy with another job. In my free time I fixed this mistake, and then I hit a problem where sns.catplot didn't pick up the changed name on the y axis. After a lot of googling and trying, I fixed that as well, and it passed all the tests.

So thank you. However:

If I do 5 sequential drops with the correct conditions, the cleaned data frames are identical and the tests pass. So, you can drop or select and either will work as long as they are done correctly, as one would expect.

Can you please give me one example? I started with that idea, and I googled every way to drop all those rows at once but couldn't do it properly. I want to learn where my mistake was.

jeremy.a.gray · August 23, 2021, 10:32pm

This second example of yours:

lujanluka:

df4 = df4[(df4['ap_lo'] <= df4['ap_hi']) &
          (df4['height'] >= df4['height'].quantile(0.025)) &
          (df4['height'] <= df4['height'].quantile(0.975)) &
          (df4['weight'] <= df4['weight'].quantile(0.975)) &
          (df4['weight'] >= df4['weight'].quantile(0.025))]

is the way I usually see people attack the problem. This is selecting the rows you want to keep, so you need to logically AND them all together. The drop method like this

df4.drop(df4[(df4['height'] <= df4['height'].quantile(0.025)) & (df4['height'] >= df4['height'].quantile(0.975))].index, inplace=True)

is a negative method; you are trying to remove everything you don't want, so you need to logically OR the choices with

df4.drop(df4[(df4['height'] <= df4['height'].quantile(0.025))].index, inplace=True)
df4.drop(df4[(df4['height'] >= df4['height'].quantile(0.975))].index, inplace=True)

and so on with the rest of the conditions. See de Morgan’s laws for more explanation. Anyway, the two methods are logically equivalent, so they both should work or there’s a problem somewhere.
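As a rough illustration of the De Morgan equivalence, the sketch below builds the keep mask with AND and the drop mask with OR on synthetic data and checks that both give the same frame. The column values, the random seed, and the names `keep` and `unwanted` are my own assumptions for the example, not the course data or the original poster's code.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical values, not the real CSV).
rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "ap_hi": rng.normal(128, 17, size=n).round(),
    "ap_lo": rng.normal(82, 15, size=n).round(),
    "height": rng.normal(165, 8, size=n).round(),
    "weight": rng.normal(74, 14, size=n).round(),
})

# Compute the cut-offs once, on the untouched data.
h_lo, h_hi = df["height"].quantile(0.025), df["height"].quantile(0.975)
w_lo, w_hi = df["weight"].quantile(0.025), df["weight"].quantile(0.975)

# Selection: AND together the conditions for the rows to keep.
keep = (
    (df["ap_lo"] <= df["ap_hi"])
    & (df["height"] >= h_lo) & (df["height"] <= h_hi)
    & (df["weight"] >= w_lo) & (df["weight"] <= w_hi)
)
selected = df[keep]

# Drop: OR together the negated conditions (De Morgan's laws).
# Each comparison flips strictly: the negation of >= is <, not <=.
unwanted = (
    (df["ap_lo"] > df["ap_hi"])
    | (df["height"] < h_lo) | (df["height"] > h_hi)
    | (df["weight"] < w_lo) | (df["weight"] > w_hi)
)
dropped = df.drop(df[unwanted].index)

print(selected.equals(dropped))  # True
```

Two details matter for an exact match. The inequalities in the drop mask are strict, so rows sitting exactly on a quantile are kept in both versions. And the quantiles are saved before anything is removed: with sequential in-place drops, a later `df4['height'].quantile(...)` call would be evaluated on the already shrunken frame and could give slightly different cut-offs.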
