ValueError: Could not interpret input ‘total’
'total' isn't in your data. If you print df_cat.columns, there is no 'total'. Remove 'total' from your catplot() call.
Use kind='count' and hue='value' too. What you have so far doesn't match what ArbyC mentioned.
Hi @nmiquan - I have made yet another attempt following ArbyC's and your directions, and started a new assignment altogether, yet the error message is unchanged. Below is the code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# Import data
df = pd.read_csv('medical_examination.csv')
# Add 'overweight' column
df['overweight'] = (df['weight']/ (df['height']/ 100 **2))
df['overweight'] = df['overweight'].apply(lambda x: 1 if x > 25 else 0)
# Normalize data by making 0 always good and 1 always bad. If the value of 'cholestorol' or 'gluc' is 1, make the value 0. If the value is more than 1, make the value 1.
df['gluc'] = df['gluc'].apply(lambda x : 0 if x == 1 else 1)
df.loc[df['cholesterol'] == 1, 'cholesterol'] = 0
df.loc[df['cholesterol'] > 1, 'cholesterol'] = 1
# Draw Categorical Plot
def draw_cat_plot():
    # Create DataFrame for cat plot using `pd.melt` using just the values from 'cholesterol', 'gluc', 'smoke', 'alco', 'active', and 'overweight'.
    df_cat = pd.melt(df, value_vars=['active', 'alco', 'cholesterol', 'gluc', 'overweight', 'smoke'], id_vars ='cardio')
    # Group and reformat the data to split it by 'cardio'. Show the counts of each feature. You will have to rename one of the collumns for the catplot to work correctly.
    #df_cat = None
    # Draw the catplot with 'sns.catplot()'
    fig = sns.catplot(data=df_cat, kind="count", x="variable", hue="value", col="cardio")
    # Do not modify the next two lines
    fig.savefig('catplot.png')
    return fig
# Draw Heat Map
def draw_heat_map():
    # Clean the data
    df_heat = df[(df['ap_lo'] <= df['ap_hi']) &
        df['height'] >= (df['height'].quantile(0.025)) &
        df['height'] <= (df['height'].quantile(0.975)) &
        df['weight'] >= (df['weight'].quantile(0.025)) &
        df['weight'] <= (df['weight'].quantile(0.975))
    ]
    # Calculate the correlation matrix
    corr = df_heat.corr()
    # Generate a mask for the upper triangle
    mask = np.triu(corr)
    # Set up the matplotlib figure
    fig, ax = plt.subplots(figsize=(9,9))
    # Draw the heatmap with 'sns.heatmap()'
    sns.heatmap(corr, linewidths=1, mask=mask, vmax=.3, center=0.09,square=True, cbar_kws = {'orientation' : 'horizontal'})
    # Do not modify the next two lines
    fig.savefig('heatmap.png')
    return fig
Your draw_cat_plot() works fine.
Check the parentheses here, in your draw_heat_map():
df_heat = df[(df['ap_lo'] <= df['ap_hi']) &
    df['height'] >= (df['height'].quantile(0.025)) &
    df['height'] <= (df['height'].quantile(0.975)) &
    df['weight'] >= (df['weight'].quantile(0.025)) &
    df['weight'] <= (df['weight'].quantile(0.975))
]
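For anyone wondering why the parentheses matter: in Python, & binds more tightly than comparison operators such as >= and <=, so each comparison needs its own parentheses before the masks are combined. Here is a minimal sketch on made-up toy data. The values and the 25%/75% cut-offs are assumptions chosen so the result is easy to check by hand; only the column names come from the thread.

```python
import pandas as pd

# Hypothetical toy data; only the column names match the thread's dataset.
df = pd.DataFrame({
    "ap_hi": [120, 110, 130, 90],
    "ap_lo": [80, 120, 85, 60],
    "height": [150, 165, 170, 190],
})

# Illustrative cut-offs (the assignment in the thread uses 0.025 and 0.975).
lo = df["height"].quantile(0.25)
hi = df["height"].quantile(0.75)

# Each comparison is wrapped in parentheses so it is evaluated first
# and produces a boolean Series; only then are the Series combined with &.
mask = (
    (df["ap_lo"] <= df["ap_hi"])
    & (df["height"] >= lo)
    & (df["height"] <= hi)
)
df_clean = df[mask]
print(df_clean)
```

Only the row that passes all three conditions survives. Without the inner parentheses Python would try to evaluate the & between a threshold and a column first, which is not what the filter is meant to do.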
Thanks @nmiquan; I addressed the parentheses and it looks better, except for a failure in the heatmap values? Diff of 962 characters?
def draw_heat_map():
    # Clean the data
    df_heat = df[
        (df['ap_lo'] <= df['ap_hi']) &
        (df['height'] >= (df['height'].quantile(0.025))) &
        (df['height'] <= (df['height'].quantile(0.975))) &
        (df['weight'] >= (df['weight'].quantile(0.025))) &
        (df['weight'] <= (df['weight'].quantile(0.975)))
    ]
    # Calculate the correlation matrix
    corr = df_heat.corr()
    # Generate a mask for the upper triangle
    mask = np.triu(corr)
    # Set up the matplotlib figure
    fig, ax = plt.subplots(figsize=(9,9))
    # Draw the heatmap with 'sns.heatmap()'
    sns.heatmap(corr, linewidths=1, mask=mask, vmax=.3, center=0.09,square=True, cbar_kws = {'orientation' : 'horizontal'})
    # Do not modify the next two lines
    fig.savefig('heatmap.png')
    return fig
Traceback (most recent call last):
  File "/home/runner/ColossalRegalProject/test_module.py", line 47, in test_heat_map_values
    self.assertEqual(actual, expected, "Expected differnt values in heat map.")
AssertionError: Lists differ: [] != ['0.0', '0.0', '-0.0', '0.0', '-0.1', '0.5[628 chars], '']
Second list contains 94 additional elements.
First extra element 0:
'0.0'
Diff is 962 characters long. Set self.maxDiff to None to see it. : Expected differnt values in heat map.
Ran 4 tests in 8.669s
FAILED (failures=1, errors=2)
Manually check whether the values in your heatmap are correct. If they are, I suspect there is an error in the unit test.
If you look at the saved plot, are there values inside the squares? The annot argument passed to sns.heatmap should handle that, along with the fmt argument to format the annotations.
If that doesn't help, can you paste a link to repl.it with your code, or somewhere else? That'll make helping easier.
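To illustrate the annot and fmt arguments mentioned above, here is a minimal sketch on random toy data. The data, the column names a to d, and the explicit boolean mask are assumptions for illustration; the thread's own code builds its mask with np.triu(corr) instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data: 50 rows of random numbers in four columns.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(50, 4)), columns=["a", "b", "c", "d"])
corr = toy.corr()

# Boolean mask that hides the upper triangle, including the diagonal.
mask = np.triu(np.ones_like(corr, dtype=bool))

fig, ax = plt.subplots(figsize=(5, 5))
# annot=True writes each visible cell's value into its square;
# fmt=".1f" formats those values with one decimal place.
sns.heatmap(corr, mask=mask, annot=True, fmt=".1f", square=True, ax=ax)

# The annotations end up as text objects on the axes.
labels = [t.get_text() for t in ax.texts]
print(labels)
```

Without annot=True the squares are drawn but carry no text, so a check that reads the text from the axes has nothing to find.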
Hi @sanity - below is my code pasted in along with the repl.it URL and the heatmap plot image.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# Import data
df = pd.read_csv('medical_examination.csv')
# Add 'overweight' column
df['overweight'] = (df['weight']/ (df['height']/ 100 **2))
df['overweight'] = df['overweight'].apply(lambda x: 1 if x > 25 else 0)
# Normalize data by making 0 always good and 1 always bad. If the value of 'cholestorol' or 'gluc' is 1, make the value 0. If the value is more than 1, make the value 1.
df['gluc'] = df['gluc'].apply(lambda x : 0 if x == 1 else 1)
df.loc[df['cholesterol'] == 1, 'cholesterol'] = 0
df.loc[df['cholesterol'] > 1, 'cholesterol'] = 1
# Draw Categorical Plot
def draw_cat_plot():
    # Create DataFrame for cat plot using `pd.melt` using just the values from 'cholesterol', 'gluc', 'smoke', 'alco', 'active', and 'overweight'.
    df_cat = pd.melt(df, value_vars=['active', 'alco', 'cholesterol', 'gluc', 'overweight', 'smoke'], id_vars ='cardio')
    # Group and reformat the data to split it by 'cardio'. Show the counts of each feature. You will have to rename one of the collumns for the catplot to work correctly.
    #df_cat = None
    # Draw the catplot with 'sns.catplot()'
    fig = sns.catplot(data=df_cat, kind="count", x="variable", hue="value", col="cardio")
    # Do not modify the next two lines
    fig.savefig('catplot.png')
    return fig
# Draw Heat Map
def draw_heat_map():
    # Clean the data
    df_heat = df[
        (df['ap_lo'] <= df['ap_hi']) &
        (df['height'] >= (df['height'].quantile(0.025))) &
        (df['height'] <= (df['height'].quantile(0.975))) &
        (df['weight'] >= (df['weight'].quantile(0.025))) &
        (df['weight'] <= (df['weight'].quantile(0.975)))
    ]
    # Calculate the correlation matrix
    corr = df_heat.corr()
    # Generate a mask for the upper triangle
    mask = np.triu(corr)
    # Set up the matplotlib figure
    fig, ax = plt.subplots(figsize=(9,9))
    # Draw the heatmap with 'sns.heatmap()'
    sns.heatmap(corr,annot=True, fmt='.1f', linewidths=1, mask=mask, vmax=.8, center=0.09,square=True, cbar_kws = {'shrink':0.5})
    # Do not modify the next two lines
    fig.savefig('heatmap.png')
    return fig
Alright, regarding the errors with draw_cat_plot: sns.catplot doesn't return a figure, it returns a FacetGrid object. Its fig attribute can be used to access the figure.
Regarding draw_heat_map: as can be seen on the plot, some values are missing. Take a closer look at the way the df['overweight'] column is added and make sure everything is correct there.
After getting this in order you will probably also encounter another issue, with some differences in the ax data returned by the function, due to a different version of matplotlib. To bypass that issue without changing the test you can force repl.it to use an older version. To do that, change the matplotlib version in the poetry.lock file to 3.2.2 and then re-run to make repl.it update its dependencies.
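A note on the df['overweight'] hint above: ** binds more tightly than /, so df['height'] / 100 ** 2 divides the height by 10000 instead of squaring the height in metres. A quick plain-Python check with made-up numbers (62 kg and 168 cm are assumptions for illustration only):

```python
# Hypothetical example values, not taken from the dataset.
weight_kg = 62.0
height_cm = 168.0

# As written in the posted code: 100 ** 2 is evaluated first,
# so the height is divided by 10000 and nothing is squared.
buggy = weight_kg / (height_cm / 100 ** 2)

# Intended BMI: convert cm to metres first, then square.
fixed = weight_kg / (height_cm / 100) ** 2

print(buggy > 25, fixed > 25)
```

With the misplaced parenthesis any realistic row lands far above 25, so the overweight column would be 1 everywhere. A constant column has no defined correlation, which would presumably explain the empty cells in the heatmap.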
@sanity Thank you very much for your patience and assistance. Based on your last post I have corrected df['overweight'] and used the .fig attribute on the sns.catplot result. Lastly, I changed the matplotlib version to 3.2.2 in the poetry.lock file, and all seems to be okay except for one last thing: a test fails because it expects the y label to be 'total'?? I have pasted all my code along with the error message and the test file code. Thank you in advance.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# Import data
df = pd.read_csv('medical_examination.csv')
# Add 'overweight' column
df['overweight'] = (df['weight']/ (df['height']/ 100) **2).apply(lambda x : 1 if x > 25 else 0)
# Normalize data by making 0 always good and 1 always bad. If the value of 'cholestorol' or 'gluc' is 1, make the value 0. If the value is more than 1, make the value 1.
df['gluc'] = df['gluc'].apply(lambda x : 0 if x == 1 else 1)
df.loc[df['cholesterol'] == 1, 'cholesterol'] = 0
df.loc[df['cholesterol'] > 1, 'cholesterol'] = 1
# Draw Categorical Plot
def draw_cat_plot():
    # Create DataFrame for cat plot using `pd.melt` using just the values from 'cholesterol', 'gluc', 'smoke', 'alco', 'active', and 'overweight'.
    df_cat = pd.melt(df, value_vars=['active', 'alco', 'cholesterol', 'gluc', 'overweight', 'smoke'], id_vars ='cardio')
    # Group and reformat the data to split it by 'cardio'. Show the counts of each feature. You will have to rename one of the collumns for the catplot to work correctly.
    #df_cat = None
    # Draw the catplot with 'sns.catplot()'
    fig = sns.catplot(data=df_cat, kind='count', x='variable', hue='value', col='cardio').fig
    # Do not modify the next two lines
    fig.savefig('catplot.png')
    return fig
# Draw Heat Map
def draw_heat_map():
    # Clean the data
    df_heat = df[
        (df['ap_lo'] <= df['ap_hi']) &
        (df['height'] >= (df['height'].quantile(0.025))) &
        (df['height'] <= (df['height'].quantile(0.975))) &
        (df['weight'] >= (df['weight'].quantile(0.025))) &
        (df['weight'] <= (df['weight'].quantile(0.975)))
    ]
    # Calculate the correlation matrix
    corr = df_heat.corr()
    # Generate a mask for the upper triangle
    mask = np.triu(corr)
    # Set up the matplotlib figure
    fig, ax = plt.subplots(figsize=(9,9))
    # Draw the heatmap with 'sns.heatmap()'
    sns.heatmap(corr,annot=True, fmt='.1f', linewidths=1, mask=mask, vmax=.8, center=0.09,square=True, cbar_kws = {'shrink':0.5})
    # Do not modify the next two lines
    fig.savefig('heatmap.png')
    return fig
FAILURE message:
.F.['0.0', '0.0', '-0.0', '0.0', '-0.1', '0.5', '0.0', '0.1', '0.1', '0.3', '0.0',
'0.0', '0.0', '0.0', '0.0', '0.0', '0.2', '0.1', '0.0', '0.2', '0.1', '0.0', '0.1',
'-0.0', '-0.1', '0.1', '0.0', '0.2', '0.0', '0.1', '-0.0', '-0.0', '0.1', '0.0',
'0.1', '0.4', '-0.0', '-0.0', '0.3', '0.2', '0.1', '-0.0', '0.0', '0.0', '-0.0',
'-0.0', '-0.0', '0.2', '0.1', '0.1', '0.0', '0.0', '0.0', '0.0', '0.3', '0.0', '-0.0',
'0.0', '-0.0', '-0.0', '-0.0', '0.0', '0.0', '-0.0', '0.0', '0.0', '0.0', '0.2',
'0.0', '-0.0', '0.2', '0.1', '0.3', '0.2', '0.1', '-0.0', '-0.0', '-0.0', '-0.0',
'0.1', '-0.1', '-0.1', '0.7', '0.0', '0.2', '0.1', '0.1', '-0.0', '0.0', '-0.0',
'0.1', '', '', '']
.
======================================================================
FAIL: test_line_plot_labels (test_module.CatPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/ColossalRegalProject/test_module.py", line 18, in test_line_plot_labels
    self.assertEqual(actual, expected, "Expected line plot ylabel to be 'total'")
AssertionError: 'count' != 'total'
- count
+ total
 : Expected line plot ylabel to be 'total'
----------------------------------------------------------------------
Ran 4 tests in 9.872s
FAILED (failures=1)
def test_line_plot_labels(self):
    actual = self.ax.get_xlabel()
    expected = "variable"
    self.assertEqual(actual, expected, "Expected line plot xlabel to be 'variable'")
    actual = self.ax.get_ylabel()
    expected = "total"
    self.assertEqual(actual, expected, "Expected line plot ylabel to be 'total'")
    actual = []
    for label in self.ax.get_xaxis().get_majorticklabels():
        actual.append(label.get_text())
    expected = ['active', 'alco', 'cholesterol', 'gluc', 'overweight', 'smoke']
That just means the label on the plot for the y axis needs to be changed from count to total. There are a few ways to do that; for example, the FacetGrid object has methods for it.
Apologies for repeatedly asking for help, @sanity.
Okay, so the catplot (bar graph) has its y-axis labelled "count", and that label needs to be changed to "total". So I looked up the seaborn documentation and it mentions "methods to tweak presentation".
But when I use the following code to set the axis labels, it doesn't work?
fig.set_axis_labels("", "total")
Traceback (most recent call last):
  File "main.py", line 6, in
    medical_data_visualizer.draw_cat_plot()
  File "/home/runner/ColossalRegalProject/medical_data_visualizer.py", line 32, in draw_cat_plot
    fig.set_axis_labels("", "total")
AttributeError: 'Figure' object has no attribute 'set_axis_labels'
Keep in mind that once you write fig = sns.catplot(...).fig, that's no longer a FacetGrid, but just a figure. So the label setting needs to happen before you take the fig attribute from the FacetGrid.
For example:
g = sns.catplot(...)
g.(...) # setting y label in here
fig = g.fig
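Spelled out on toy data, the pattern above could look like the sketch below. The small df_cat frame is made up for illustration, and set_axis_labels is just one of the FacetGrid methods that can set the label; newer seaborn versions also expose the figure as g.figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical long-format data shaped like the output of pd.melt.
df_cat = pd.DataFrame({
    "cardio":   [0, 0, 0, 1, 1, 1],
    "variable": ["smoke", "smoke", "alco", "smoke", "alco", "alco"],
    "value":    [0, 1, 0, 1, 1, 0],
})

# catplot returns a FacetGrid, not a matplotlib Figure.
g = sns.catplot(data=df_cat, kind="count", x="variable", hue="value", col="cardio")

# Relabel while we still hold the FacetGrid...
g.set_axis_labels("variable", "total")

# ...and only then pull out the underlying Figure.
fig = g.fig
print(fig.axes[0].get_ylabel())
```

Calling set_axis_labels on fig instead of g is what produces the AttributeError shown earlier, because a plain Figure has no such method.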
Thanks heaps for all your help. I feel as if I asked for more help than ever on this assignment; it wasn't an easy one. I shall try a bit harder on the next assignment, and I am sure it won't be easy either. Thank you once again.
Hi, is anything wrong with this display? I've been stuck and clueless.
I remember being advised that it should work without using groupby.
So the first step was to use the melt() function (df, value_vars, against id_vars), and then the result would go into sns.catplot(x, kind, hue, col='cardio') to draw the fig. Sorry, like I said, this is what I recall. Let me know if that helps.
try this
df_cat = df_cat.groupby(['cardio', 'variable', 'value'], as_index=False).size().rename(columns={'size': 'total'})
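To see what that line produces, here is a small sketch on made-up data. The four-row frame with only smoke and alco is an assumption for illustration, and it relies on pandas 1.1 or newer, where size() with as_index=False returns a DataFrame with a 'size' column.

```python
import pandas as pd

# Hypothetical miniature of the dataset with two feature columns.
df = pd.DataFrame({
    "cardio": [0, 0, 1, 1],
    "smoke":  [0, 1, 1, 1],
    "alco":   [0, 0, 0, 1],
})

# Long format: one row per (cardio, variable, value) observation.
df_cat = pd.melt(df, id_vars="cardio", value_vars=["smoke", "alco"])

# Count the rows in each group and name the count column 'total'.
df_cat = (
    df_cat.groupby(["cardio", "variable", "value"], as_index=False)
    .size()
    .rename(columns={"size": "total"})
)
print(df_cat)
```

After this step the frame really does have a 'total' column, so passing y='total' to sns.catplot with kind='bar' should no longer raise "Could not interpret input 'total'", and the y axis should pick up that column name as its label.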
Yes, it works perfectly fine. I wonder what’s the purpose of creating a data-frame which is grouped by cardio…
Doesn't work for me. I get a weird plot: in my legend there are two 1s and two 0s.
df_cat = pd.melt(df, id_vars = "cardio", value_vars = ["active", "alco", "cholesterol", "gluc", "overweight", "smoke"])
fig = sns.catplot(x = "variable", hue = "value", col = "cardio", data = df_cat, kind = "count").set_axis_labels("variable", "total")