Data Analysis with Python Projects - Medical Data Visualizer: How to pass dataframes to functions that aren't supposed to receive any parameters

parodiadrian3 · March 2, 2023, 12:45pm

Tell us what’s happening:
Hi! I was able to fix the problems about installing dependencies for this project. Thanks for that.
Now I’m having an issue with running the code and passing the tests. I’ve been looking at test_module.py and I noticed that the functions
draw_cat_plot and draw_heat_map are invoked without any arguments.
Here comes my doubt:
If these functions are imported into main.py from medical_data_visualizer.py and the data frame is created in that same file, how can I pass the data frame to the functions if they aren’t supposed to receive any parameters? I’m having a hard time figuring this out. It seems to me that the functions must accept the data frame as a parameter, or else I need to create the data frame inside each function, which doesn’t make sense to me. Could you please take a look at my code and give me some direction? I’m sharing the link to my code on Replit.

Thanks and have a good day!
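For the question above, one common pattern is that a function defined in medical_data_visualizer.py can simply read a variable defined at the top level of that same module, so no parameter is needed. A minimal sketch, assuming a tiny made-up DataFrame in place of the project's CSV (the column names and values are placeholders for illustration only):

```python
import pandas as pd

# Module-level DataFrame. In the project this would be loaded once at the top
# of medical_data_visualizer.py; these rows are made-up stand-ins.
df = pd.DataFrame({"cardio": [0, 1, 1], "cholesterol": [1, 3, 2]})


def draw_cat_plot():
    # No parameters: the name `df` is looked up in the module's global scope.
    df_cat = df.copy()  # work on a copy so the shared frame stays untouched
    return df_cat


def draw_heat_map():
    # The same module-level `df` is visible here too.
    return df[df["cardio"] == 1]


cat_data = draw_cat_plot()
heat_data = draw_heat_map()
```

When main.py or the test module imports the functions, the whole module body runs first, so `df` already exists by the time either function is called.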

Your code so far

medical-data-visualizer - SAUL GOODMAN - Replit


Challenge: Data Analysis with Python Projects - Medical Data Visualizer

parodiadrian3 · March 2, 2023, 8:09pm

Hi, thanks for your answer. Now the program finally runs but, as expected, I have some errors and failures in the tests. The first one is in CatPlotTestCase. I got the following error:

===========================================================
ERROR: test_bar_plot_number_of_bars (test_module.CatPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/boilerplate-medical-data-visualizer-Dependencies-OKAY/test_module.py", line 26, in test_bar_plot_number_of_bars
    actual = len([rect for rect in self.ax.get_children() if isinstance(rect, mpl.patches.Rectangle)])
AttributeError: 'numpy.ndarray' object has no attribute 'get_children'
===========================================================

I’ve found information on the internet saying that sns.catplot is a figure-level function, so you cannot pass it an axes object. I tried writing something like this:


fig = plt.figure()
ax1 = fig.add_subplot(111)
g = sns.catplot(x="variable", col="cardio", data= dfB_long, hue="value", kind= 'count', orient= "v", ax=ax1)
#plt.close(1)

Then I always get two plots. The catplot I get is exactly the same as the one in Figure1.png, but it seems that the test module cannot retrieve what it needs. Furthermore, I don’t know why the error says the object is of type numpy.ndarray.
On the other hand, I think there’s a mistake in line 27 of test_module.py and it should be changed to
expected = 12 # because there are 12 bars per plot

Let’s move to the second error:

ERROR: test_line_plot_labels (test_module.CatPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/boilerplate-medical-data-visualizer-Dependencies-OKAY/test_module.py", line 13, in test_line_plot_labels
    actual = self.ax.get_xlabel()
AttributeError: 'numpy.ndarray' object has no attribute 'get_xlabel'

I think this one is pretty much related to the previous one.

My question about these first two errors: is it okay to use the seaborn function catplot, or will it always lead to errors? What would you suggest?

The final error is the draw_heat_map function:

=============================================================
FAIL: test_heat_map_values (test_module.HeatMapTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/boilerplate-medical-data-visualizer-Dependencies-OKAY/test_module.py", line 47, in test_heat_map_values
    self.assertEqual(actual, expected, "Expected different values in heat map.")
AssertionError: Lists differ: [] != ['0.0', '0.0', '-0.0', '0.0', '-0.1', '0.5[616 chars]0.1']

Second list contains 91 additional elements.
First extra element 0:
'0.0'

Diff is 941 characters long. Set self.maxDiff to None to see it. : Expected different values in heat map.

I found a forum post that said to close all the figures before running the tests again. I tried that, but it doesn’t fix the problem in my case.
Moreover, it seems my solution is generating an empty list. But again, the heat map I made looks exactly like the one in Figure2.png

I would appreciate any help. Thank you so much!

parodiadrian3 · March 3, 2023, 10:56am

Hi Randell, how are you?
I didn’t change the code in test_module.py
Nevertheless, there are 12 bars in each subplot in Figure_1.png, not 13. That’s why I suspect it is incorrect, as I said in my previous message. But again, I didn’t modify the code.

expected = 13
self.assertEqual(actual, expected, "Expected a different number of bars chart.")

The whole problem in the previous message is not about changing that number in line 27 in test_module.py

parodiadrian3 · March 10, 2023, 3:15pm

Hi again. I posted a few questions last week, but most of them haven’t been addressed yet. In particular, I’m struggling with catplot and heatmap: I can reproduce those plots exactly, but the tests don’t pass, and I can’t figure out why. I shared my thoughts on why it may not be working, but I’m still stuck.

Any advice to solve those issues? thanks

kinome79 · March 10, 2023, 3:34pm

What is kind='count' on your catplot? And for some reason your heatmap output has a bar chart attached to it as well (at least in the png). Also, I had some heatmap issues because the test says 0 isn’t equal to 0.0. Had to do some formatting.

Oh, also, check your heatmap numbers… the numbers on your heatmap aren’t the same as the expected output. There is something wrong with the way you are processing your data. I believe I made the same mistake too.

HINT:

It has to do with processing your data… say I wanted to take 10% of the data off the top and bottom of a dataset that has 100 entries. You’d expect to remove the top 10 and bottom 10 entries… well, if I first take the 10% off the top… and then take that data, and take 10% off the bottom… why would I end up with 81 entries instead of only 80?
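The arithmetic in the hint can be checked with plain Python. This is only a toy illustration with a made-up list of 100 numbers, not the project's data:

```python
values = list(range(1, 101))  # 100 sorted entries: 1..100

# Sequential trimming: the second cut is 10% of what is LEFT, not of the original.
without_top = values[: len(values) - len(values) // 10]  # drops 10, 90 left
sequential = without_top[len(without_top) // 10 :]       # drops 9, 81 left

# Trimming both ends against the ORIGINAL size.
cut = len(values) // 10
simultaneous = values[cut : len(values) - cut]           # drops 10 + 10, 80 left

print(len(sequential), len(simultaneous))  # prints "81 80"
```

The second cut in the sequential version is measured against 90 entries rather than 100, which is where the extra entry comes from.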

parodiadrian3 · March 10, 2023, 4:59pm

Hi Kinome79! thanks for your reply.

Yes, I’m looking at heatmap.png and it’s a complete mess. That’s probably why it fails; I hadn’t noticed before. I’ll take a look at the points you made. Thanks!

kinome79 · March 10, 2023, 5:02pm

I didn’t look deep, but I think your heatmap might be a mess because you use plt.gcf() (get current figure), which I think is getting a figure you’ve already drawn on instead of creating a new blank figure. I’m just guessing, though, as I didn’t try to verify that.
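The difference between reusing the current figure and creating a new one can be seen in a small sketch. It assumes nothing about the original code beyond the plt.gcf() call mentioned above; the variable names are made up:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

first_fig = plt.figure()      # stands in for a figure that was drawn on earlier
reused = plt.gcf()            # returns the current figure, i.e. first_fig

new_fig, ax = plt.subplots()  # creates a fresh figure with its own axes

reused_is_first = reused is first_fig  # True: gcf() handed back the old figure
new_is_first = new_fig is first_fig    # False: subplots() made a new one
plt.close("all")
```

If an earlier plot is still the current figure, anything drawn via plt.gcf() lands on top of it, which would be consistent with a bar chart showing up in the heatmap image.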

parodiadrian3 · March 13, 2023, 6:48pm

Hi again, Kinome. I’ve been working on these figures (still struggling, but a bit closer to the end). I took your advice into account and, for the heat map, I cleaned up the data in two steps.
First, I eliminated the rows where diastolic pressure is higher than systolic,
and then I wrote one big conditional to eliminate rows above and below the requested quantiles. I could finally generate the heatmap with the right format. Nevertheless, some values are still wrong. I wonder if you (or someone else) could check. I think the code for the heatmap is pretty neat and clean right now.

On the other hand, with catplot I’m really stuck. You asked me before what kind='count' was for. Well, I was trying to get the correct graphic, and if I remove that, I get the following graph:

(Not what we want!)

But the main problem I have is with the tests, which try to retrieve information about the axes but they fail miserably. I really don’t know what to do. I tried to create subplots and then place the catplots there but I also failed. Can you tell me if I’m using the correct function? How can I convert the figure to an object from which the values of the axis can be recovered? That seems not to function properly with catplot.

Please tell me how I can use catplot with subplots so that the tests pass. I just couldn’t find that. (The code in the link provided should be up to date.)

I appreciate your help. Have a great week.

kinome79 · March 13, 2023, 9:22pm

So, if you do kind="bar" it gives you that weird graph above for your catplot?

Also, if your heatmap data is still slightly off, I believe it’s probably the same cause as before. You seem to have only moved halfway toward fixing the problem.

bsandma1 · March 16, 2023, 5:33am

Hi @parodiadrian3 , I’ve been through the data analysis course and yes a lot of it is kind of confusing at first and hard to find reliable answers on the internet. Most of the answers I found were way more complex than the actual solutions I ended up with. It’s been a month or so since I did this project but I’ll try and help.

When you use matplotlib by itself, you will make a figure and add axes and plots to it and what not but when you’re using seaborn I don’t think you need to do all that.

In your code example below, I think there are unnecessary lines of code.

fig = plt.figure()
ax1 = fig.add_subplot(111)
g = sns.catplot(x="variable", col="cardio", data= dfB_long, hue="value", kind= 'count', orient= "v", ax=ax1)
#plt.close(1)

Instead of that, try this and see if the cat plot passes or at least displays the same. Let me know the result.

g = sns.catplot(x="variable", col="cardio", data= dfB_long, hue="value", kind= 'count', orient= "v").figure

fig = g

It’s late so I didn’t actually run through your code yet, but if what you’re saying is true and your graph is identical to the test, then this should work. All I did was delete the other lines of code from your previous code block, since seaborn doesn’t need them, and add .figure at the end to get a figure. Then I set fig equal to g. In the Replit boilerplate, fig is left for us to define; this is how the figure is saved and returned to us as an image. It may not work, and you may need to change some of the keyword arguments in the catplot call (for example, changing kind= to bar instead of count), but for now let’s see how this line of code works out.

EDIT:

I went into your code and saw that the first graphs code was this:

  # Draw the catplot with 'sns.catplot()'
  g = sns.catplot(x="variable",
                  col="cardio",
                  data=df_cat,
                  hue="value",
                  kind='count',
                  orient="v")
  g.set_axis_labels("variable", "total")

  # Get the figure for the output
  fig = g

The ONLY issue is that fig = g needs to be in figure format. So literally just change that to fig = g.figure, and everything but the final test passes. You cleaned your data and created the graph a bit different than I did so my first suggestion didn’t work because you can’t set axis labels after you use .figure.
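To illustrate why returning g.figure matters, here is a rough sketch with made-up data (the column names match the thread, the values do not come from the project). It assumes, based on the tracebacks earlier in the thread, that the tests index into fig.axes and then call methods such as get_xlabel() on the result:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up long-format data using the column names from the thread.
df_cat = pd.DataFrame({
    "cardio":   [0, 0, 0, 0, 1, 1, 1, 1],
    "variable": ["smoke", "smoke", "alco", "alco"] * 2,
    "value":    [0, 1, 0, 0, 1, 1, 0, 1],
})

g = sns.catplot(data=df_cat, x="variable", hue="value", col="cardio", kind="count")
g.set_axis_labels("variable", "total")  # label while still holding the FacetGrid

# The underlying matplotlib Figure (seaborn 0.11.2+; older versions expose g.fig).
fig = g.figure

grid_axes = g.axes       # FacetGrid.axes is a numpy array of Axes, one per facet
first_ax = fig.axes[0]   # Figure.axes is a list, so this is a single Axes
xlabel = first_ax.get_xlabel()
ylabel = first_ax.get_ylabel()
```

Indexing a FacetGrid's axes gives back a row of the array, which has no get_xlabel(), while indexing a Figure's axes gives an Axes object that does.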

FINAL EDIT:

As for the heatmap, you’re having the same issue I had. I got your final test to pass by changing one part of the dataframe cleaning process. Your final values are slightly off because, when you removed the blood pressure, weight and height values, you first removed the blood pressure rows and THEN removed height and weight. When the blood pressure rows were removed, some of the height and weight values went with them, which shifts the height and weight cutoffs computed afterwards. Revisit that section and let me know the results.
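A small sketch of the difference between filtering in two steps and filtering with one combined condition. The data is made up, and the column names (height, ap_hi, ap_lo) and the 2.5% / 97.5% cutoffs are assumptions used only for illustration:

```python
import pandas as pd

# Made-up data: 40 rows with heights 150..189. The 10 shortest rows have
# invalid blood pressure (diastolic above systolic).
df = pd.DataFrame({
    "height": range(150, 190),
    "ap_hi": [120] * 40,
    "ap_lo": [130] * 10 + [80] * 30,
})

# Sequential: the height quantiles are recomputed on the already-reduced frame.
step1 = df[df["ap_lo"] <= df["ap_hi"]]
sequential = step1[
    (step1["height"] >= step1["height"].quantile(0.025))
    & (step1["height"] <= step1["height"].quantile(0.975))
]

# Combined: every condition is evaluated against the original frame.
combined = df[
    (df["ap_lo"] <= df["ap_hi"])
    & (df["height"] >= df["height"].quantile(0.025))
    & (df["height"] <= df["height"].quantile(0.975))
]

print(len(sequential), len(combined))  # prints "28 29"
```

In the sequential version the lower height cutoff moves up after the invalid rows are gone, so one extra row is dropped. On real data the same effect changes the correlations by a small amount.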

P.S. The colors don’t have to be exact to pass the test but if you want them to be then try adding this to your keyword arguments in the heatmap function:

vmin=-0.16,
vmax=0.32,
center=0.0,
cbar_kws={'shrink': 0.5, 'ticks': [0.24, 0.16, 0.08, 0.00, -0.08]}

# vmin sets the bottom range of the colorbar (not the tick, the range)
# vmax sets the top range of the colorbar (not the tick, the range)
# center sets 0.0 as the center color of the colorbar (making 0.0 values black)
# cbar_kws first shrinks the colorbar a bit and then sets the ticks to match the example

parodiadrian3 · March 16, 2023, 5:08pm

Hi there! Thanks for your answer, bro. I haven’t gone through it yet, but I will. I just felt the need to thank you in advance. I’ll let you know if this works. Thanks again!

P.S.: yes, I also have the feeling that the answers on the Internet are quite complex, but sometimes it seems that the “easy” solutions don’t fit the problem.

bsandma1 · March 16, 2023, 5:40pm

You’re welcome! To save you some time, just read the edits! I left the original text for reference!
Also, sometimes I’ll come back later and realize the complex solution was actually super simple, I just hadn’t learned it well enough yet

parodiadrian3 · March 16, 2023, 5:42pm

Update: great, I hadn’t seen that notation before ( sns.catplot(…).figure). It looks pretty simple now, but I couldn’t find such a simple answer anywhere. WOW!! Thanks man, I truly appreciate your help!! I have been coming back to this for weeks and it finally worked! catplot passes all the tests with that simple adjustment.

Perhaps you can recommend a source to look for this kind of stuff? I guess you found this information somewhere, I mean, how to convert from objects to figures and all that!

Working on heatmap now!

Update: ALL DONE!!!
yes that was the problem with heat map. I actually thought about that, but it seemed more reasonable for me to eliminate the outliers first. Nevertheless, my logic was wrong in this case.

I’ve passed all the tests now!! thank you so much! and thanks to the other guys for their insights too! You were very helpful!!

bsandma1 · March 16, 2023, 6:04pm

Awesome, glad I could help! Yeah, these data analysis projects were my favorite but they were a lot more challenging than the scientific computing projects in my opinion. When I was doing these I had like 10-15 tabs up at all times because I was searching so many different places trying to figure stuff out. Honestly I can’t recall a specific source but I do know that if I search something and an example from stack overflow pops up, I will usually look through all of the answers until someone puts it simply. If it looks overly complicated I just try a different search lol. I just try to avoid it if I can’t understand it at all because then I can’t even explain my work when I finish.

The only 2 things I bookmarked were:

Indexing on ndarrays — NumPy v1.24 Manual

and

Pandas Get Row Number of DataFrame - Spark By {Examples}

At some point I needed those

system · September 15, 2023, 6:05am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.
