Data Analysis with Python Projects - Page View Time Series Visualizer - Bar Plot has Wrong Number of Bars

Tell us what’s happening:
Failed the test test_bar_plot_number_of_bars(self) because it expected the bar plot to have 49 bars and got 51.

Your code so far

def draw_bar_plot():
    # Copy and modify data for monthly bar plot
    df_bar = df.copy()
    df_bar.reset_index(inplace=True)

    df_bar['year'] = [d.year for d in df_bar.date]
    df_bar['month'] = [d.month for d in df_bar.date]

    # Create monthly average plot
    df_avg_bar = pd.DataFrame(df_bar.groupby(['year', 'month'])['value'].mean().round(decimals = 2))
    df_avg_bar.reset_index(inplace = True)

    # Month list for Legend
    month_list = ['January', 'February', 'March', 'April', 'May', 'June', 
                  'July', 'August', 'September', 'October', 'November', 'December']

    # Draw bar plot
    fig = plt.figure()
    sns.barplot(data = df_avg_bar, x = 'year', y = 'value', hue = 'month')
    plt.xlabel('Years')
    plt.ylabel('Average Page Views')
    plt.legend(month_list, loc='upper left')

    # Save image and return fig (don't change this part)
    fig.savefig('bar_plot.png')
    return fig

Your browser information:

User Agent is: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/118.0

Challenge: Data Analysis with Python Projects - Page View Time Series Visualizer

Can you provide your full code please? Error might occur somewhere else like cleaning up the data or elsewhere.

This function is ok.

This is kind of a wild guess but I would check your code that implements this instruction:

Clean the data by filtering out days when the page views were in the top 2.5% of the dataset or bottom 2.5% of the dataset.

Make sure you are using > or >= where appropriate. This could create an “off by 2” error.
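To see how the choice of comparison operator changes which rows survive, here is a minimal sketch using made-up numbers (not the forum page view data). The series and the variable names are assumptions for illustration only; the data is built so that several values tie exactly with the 2.5% and 97.5% quantiles.

```python
import pandas as pd

# Made-up data: five low ties, ninety distinct values, five high ties.
values = pd.Series([10] * 5 + list(range(20, 110)) + [200] * 5, name="value")

low = values.quantile(0.025)   # lands on the tied low value
high = values.quantile(0.975)  # lands on the tied high value

# Keep rows equal to either threshold.
inclusive = values[(values >= low) & (values <= high)]
# Drop rows equal to either threshold.
exclusive = values[(values > low) & (values < high)]
# Mixed comparison: drop ties at the bottom, keep ties at the top.
mixed = values[(values > low) & (values <= high)]

print(len(inclusive), len(exclusive), len(mixed))
```

When no row ties with a threshold, all three filters keep the same rows, so whether the operator matters depends on the data.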

Here’s the full code.

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from pandas.plotting import register_matplotlib_converters

register_matplotlib_converters()

# Import data (Make sure to parse dates. Consider setting index column to 'date'.)
df = pd.read_csv('fcc-forum-pageviews.csv', parse_dates=True, index_col='date')

# Clean data
df = df[(df['value'] > df['value'].quantile(0.025))
      & (df['value'] <= df['value'].quantile(0.975))]

# Draw Line Plot
def draw_line_plot():
    # Draw line plot
    fig, axes = plt.subplots(figsize=(18, 6))
    axes.plot(df, color='r')
    axes.set_xlabel('Date')
    axes.set_ylabel('Page Views')
    axes.set_title("Daily freeCodeCamp Forum Page Views 5/2016-12/2019")

    # Save image and return fig (don't change this part)
    fig.savefig('line_plot.png')
    return fig

# Draw Bar Plot
def draw_bar_plot():
    # Copy and modify data for monthly bar plot
    df_bar = df.copy()
    df_bar.reset_index(inplace=True)

    df_bar['year'] = [d.year for d in df_bar.date]
    df_bar['month'] = [d.month for d in df_bar.date]

    # Create monthly average plot
    df_avg_bar = pd.DataFrame(df_bar.groupby(['year', 'month'])['value'].mean().round(decimals = 2))
    df_avg_bar.reset_index(inplace = True)

    # Month list for Legend
    month_list = ['January', 'February', 'March', 'April', 'May', 'June', 
                  'July', 'August', 'September', 'October', 'November', 'December']

    # Draw bar plot
    # fig, ax = plt.subplots(figsize=(10, 15))
    fig = plt.figure()
    sns.barplot(data = df_avg_bar, x = 'year', y = 'value', hue = 'month')
    plt.xlabel('Years')
    plt.ylabel('Average Page Views')
    plt.legend(month_list, loc='upper left')

    # Save image and return fig (don't change this part)
    fig.savefig('bar_plot.png')
    return fig

# Draw Box Plot
def draw_box_plot():
    # Prepare data for box plots (this part is done!)
    df_box = df.copy()
    df_box.reset_index(inplace=True)
    df_box['year'] = [d.year for d in df_box.date]
    df_box['month'] = [d.strftime('%b') for d in df_box.date]

    # Month List for ordering Seasonality Plot
    month_list = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
                  'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

    # Draw box plots (using Seaborn)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 6))

    # Yearly Box Plot
    sns.boxplot(data=df_box, x = 'year', y = 'value', hue = 'year', ax = ax1)
    ax1.set_title('Year-wise Box Plot (Trend)')
    ax1.set_xlabel('Year')
    ax1.set_ylabel('Page Views')

    # Monthly Box Plot
    sns.boxplot(data=df_box, x='month', y='value',
                order = month_list, hue = 'month', ax = ax2)
    ax2.set_title('Month-wise Box Plot (Seasonality)')
    ax2.set_xlabel('Month')
    ax2.set_ylabel('Page Views')

    # Save image and return fig (don't change this part)
    fig.savefig('box_plot.png')
    return fig

Can someone explain what’s going on? This error isn’t very helpful.

I’ll try that, but I was able to use only <= and >= to clean the data in the Medical Data Visualizer and didn’t get any error like this.

Here’s what I used in that project:

df_heat = df[
				df['ap_lo'] <= df['ap_hi']
			][
				df['height'] >= df['height'].quantile(0.025)
			][
				df['height'] <= df['height'].quantile(0.975)
			][
				df['weight'] >= df['weight'].quantile(0.025)
			][
				df['weight'] <= df['weight'].quantile(0.975)
			]

Can you explain why you used > and <= here? You want to keep data less than or equal to quantile 0.975?

I was assuming that “removing the top 2.5% and bottom 2.5% of data” meant keeping data in the quantiles of 0.26 to 0.95, out of a max range of 0.01 to 1.00.

I also don’t see how the data cleaning is the problem because:

  1. The test I’m failing is checking the number of bars in the bar chart, which I presume means the number of monthly averages in the modified dataset (also it’s asking for 49, telling me I have 51, but the dataframe itself only shows 43).
  2. I’m getting the exact same error with any combination of = in the data cleaning.

I would love to copy the exact error for you, but the Replit console won’t let me.

And you’re still getting the same error with that code? I’m actually not able to replicate your error.

Here’s that test from my test_module.py:

    def test_bar_plot_number_of_bars(self):
        actual = len([rect for rect in self.ax.get_children() if isinstance(rect, mpl.patches.Rectangle)])
        expected = 49
        self.assertEqual(actual, expected, "Expected a different number of bars in bar chart.")

and the replit: https://replit.com/@pkdvalis/Sh0es-page-view-time-series-visualizer

I pasted your code in and it passes 11 tests. Can you link to your replit so I can have a closer look?

Certainly

Did you test my Replit? Mine works but yours has this error. I think mine was originally a fork of yours when we were troubleshooting the RAM overload?

I compared the code of both including the test script in an online Diff and they are identical. I suspect some spooky corruption.

Can you try this to test? Here is my Replit with a blank time_series_visualizer.py. Fork it and paste in your code: https://replit.com/@pkdvalis/Sh0es-page-view-time-series-visualizer I think it should work.

Alternately, you can try forking this one: https://replit.com/@freeCodeCamp/fcc-time-series-visualizer EDIT: I forked this one, pasted in your code, and it passes all the tests.

Hi,
I know what is going on but can’t provide a solution, as I am stuck on the same step. I am getting 57 instead of 49. If you have found the solution, please reply. (I want to achieve it with sns.barplot() and not with the pandas plot() method.)

The error-causing factors are:

  1. Set legend=False in sns.barplot() and you will get 44 as the number of bars in the plot, because there are 44 rows after taking the mean. If you leave the legend as it is in the barplot() function, the shapes inside the legend are also counted as rectangles if they are thick enough, so the number of legend labels with thick rectangular markers is added to the total number of bars. In your case that total is 51. (P.S.: when you use the plt.legend() function, the values inside it are not counted, for some weird reason.)
    For me it counts 57 = 44 df values + 12 legend values + 1 legend frame.
  2. The missing values for the first 4 months of 2016 cause the number of bars to be 44 and not 48, because we are only plotting 44 values from our dataframe.
    Keep in mind that the frame in which the legend values are written is also a rectangle in the plot. That is why we have 49 bars instead of 48, even though there are only 48 months in 4 years: the additional 1 is for the legend frame.
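
The counting can be probed in isolation. This is a minimal sketch, assuming plain Matplotlib with three made-up bars (not the project's chart, and not sns.barplot). It counts Rectangle children the same way the test does, before and after adding a legend. The helper name count_rectangles is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle


def count_rectangles(axes):
    """Count Rectangle instances among the direct children of an Axes."""
    return sum(isinstance(child, Rectangle) for child in axes.get_children())


fig, ax = plt.subplots()
ax.bar(["a", "b", "c"], [1, 2, 3], label="demo")  # three made-up bars

before_legend = count_rectangles(ax)
ax.legend()
after_legend = count_rectangles(ax)

# The Axes background (ax.patch) is itself a Rectangle child of the Axes.
background_is_child = any(child is ax.patch for child in ax.get_children())

print(before_legend, after_legend, background_is_child)
```

In this setup the Axes background patch is counted along with the bars, and adding a legend through Matplotlib does not change the count. Which extra rectangles sns.barplot adds may depend on the seaborn version, which this sketch does not test.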

This is a theory I came up with after racking my brain, and I am still not able to replicate the figure2.png image for the plot. If you have cracked it, please post it here, as I am clearly not making any progress with sns.barplot().

I am sorry for my crude explanation, but if any of you solve it, please let me know how to get it done, or give me a clue.
I am adding my Replit link so that you can check the code.

Hi, can you start a new thread please? Thanks!
