Page View Time Series Visualizer: box plot different from example

Hi, my yearly box plot looks like the example, but the monthly plot is way off. Did I miss something? Code and result below:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# Going to import datetime as well
import datetime as dt
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

# Import data (Make sure to parse dates. Consider setting index column to 'date'.)
df = pd.read_csv('fcc-forum-pageviews.csv')

# Clean data
df = df[(df['value'] >= df['value'].quantile(0.025)) &
        (df['value'] <= df['value'].quantile(0.975))]

# Going to convert the dates to datetime to assist with plotting
df['date'] = pd.to_datetime(df['date'])


def draw_box_plot():
    # Prepare data for box plots (this part is done!)
    df_box = df.copy()
    df_box.reset_index(inplace=True)
    df_box['year'] = [d.year for d in df_box.date]
    df_box['month'] = [d.strftime('%b') for d in df_box.date]

    # Draw box plots (using Seaborn)
    fig, axes = plt.subplots(1, 2, figsize=[36, 12])

    sns.boxplot(x='year', y='value', data=df_box, ax=axes[0])
    axes[0].set_xlabel('Year')
    axes[0].set_ylabel('Page Views')
    axes[0].set_title('Year-wise Box Plot (Trend)')

    sns.boxplot(x='month', y='value', data=df_box, ax=axes[1])
    axes[1].set_xlabel('Month')
    axes[1].set_ylabel('Page Views')
    axes[1].set_title('Month-wise Box Plot (Seasonality)')
    axes[1].set_xticklabels(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                             'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])

    plt.ylim([10000, 190000])

    # Save image and return fig (don't change this part)
    fig.savefig('box_plot.png')
    return fig
```

https://repl.it/@sapienza789/boilerplate-page-view-time-series-visualizer-1#box_plot.png

I’ve edited your post for readability. When you enter a code block into a forum post, please precede it with a separate line of three backticks and follow it with a separate line of three backticks to make it easier to read.

You can also use the “preformatted text” tool in the editor (</>) to add backticks around text.

See this post to find the backtick on your keyboard.
Note: Backticks (`) are not single quotes (’).

Comparing the box plots, yours looks four months off from the example: your largest month shows up as June instead of October. The data starts in May, and when no `order` is given, seaborn draws string categories in the order they first appear, so the first box holds May's data. Your `set_xticklabels()` call then relabels that box as January without moving any data, which shifts every label by four months (May's data is labeled January, so October's data is labeled June). You probably should look at passing the `order` parameter to your call to `sns.boxplot()`.
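To make the four-month shift concrete, here is a standard-library-only sketch of the ordering logic. It assumes daily dates beginning in May (the start date 2016-05-09 is an assumption) and an English/C locale for month abbreviations, the same assumption `strftime('%b')` makes. `dict.fromkeys` stands in for seaborn's first-appearance ordering of string categories.

```python
import calendar
from datetime import date, timedelta

# Month abbreviations in calendar order (month_abbr[0] is an empty string).
# Assumes an English/C locale, as strftime('%b') does in the boilerplate.
CALENDAR_ORDER = [calendar.month_abbr[i] for i in range(1, 13)]

# Hypothetical daily dates that begin in May, like the forum data.
start = date(2016, 5, 9)
dates = [start + timedelta(days=n) for n in range(2 * 365)]
months = [d.strftime('%b') for d in dates]

# Without an explicit `order`, seaborn draws string categories in order of
# first appearance; dict.fromkeys keeps that same first-seen order.
drawn_order = list(dict.fromkeys(months))

# set_xticklabels() only renames the tick positions; the boxes do not move.
# Map each drawn box (by its real month) to the label it ends up showing.
label_shown = dict(zip(drawn_order, CALENDAR_ORDER))

print(drawn_order[:3])     # ['May', 'Jun', 'Jul']
print(label_shown['May'])  # Jan
print(label_shown['Oct'])  # Jun
```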

@jeremy.a.gray: Thanks! That was definitely part of the issue, compounded by my forgetting that the provided data frame already had the month as a three-letter abbreviation. My numbers are still off, though. Time to do some more digging.
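A minimal sketch combining both fixes from the exchange above: pass `order` to `sns.boxplot()` and drop the `set_xticklabels()` call, since the `month` column already holds three-letter abbreviations. The synthetic data (daily values starting 2016-05-09), the `MONTH_ORDER` constant and the `draw_month_box_plot` helper are hypothetical; only the pandas, matplotlib and seaborn calls are their real APIs.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

MONTH_ORDER = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def draw_month_box_plot(df_box):
    """Month-wise box plot with the categories pinned to calendar order."""
    fig, ax = plt.subplots(figsize=(12, 6))
    # `order` decides which box lands at which x position, so the tick
    # labels come out right without any set_xticklabels() call.
    sns.boxplot(x='month', y='value', data=df_box, order=MONTH_ORDER, ax=ax)
    ax.set_xlabel('Month')
    ax.set_ylabel('Page Views')
    ax.set_title('Month-wise Box Plot (Seasonality)')
    return fig


# Synthetic data (an assumption, not the forum CSV): daily values from May 2016.
dates = pd.date_range('2016-05-09', periods=730, freq='D')
df_box = pd.DataFrame({'date': dates, 'value': np.arange(730)})
df_box['month'] = [d.strftime('%b') for d in df_box.date]

fig = draw_month_box_plot(df_box)
```

In the original `draw_box_plot()`, the same change amounts to adding `order=[...]` to the second `sns.boxplot()` call and deleting the `axes[1].set_xticklabels(...)` line.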

I noticed some differences between my graphs and the example graphs, particularly the box plots, and others have commented on that here as well. In my graphs and others I’ve seen, the shape is similar and the values are close or identical, but I haven’t been able to determine conclusively whether they actually differ, or why. My suspicion is that the libraries have changed enough to produce a different appearance, or that the examples were generated from an older data set or one cleaned in a slightly different way.

However, if you pass the data cleaning test for this project, your graphs look similar or identical to the examples, and all the other tests pass, then I would assume that you have analyzed the data correctly.

Yes, I looked at the raw data and found a maximum of about 120,000 views in January, which matches the maximum in my box plot. Maybe the examples were run on slightly different data.
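One way to do the check described above is to compare per-month maxima before and after cleaning. This is a sketch under assumptions: `clean` and `monthly_max` are hypothetical helpers, `clean` mirrors the question's 2.5%/97.5% quantile filter, and the toy data stands in for the forum CSV.

```python
import pandas as pd


def clean(df):
    """Drop the top and bottom 2.5% of 'value', mirroring the question's filter."""
    low, high = df['value'].quantile(0.025), df['value'].quantile(0.975)
    return df[(df['value'] >= low) & (df['value'] <= high)]


def monthly_max(df):
    """Largest 'value' per calendar month (1 = January), in calendar order."""
    return df.groupby(df['date'].dt.month)['value'].max()


# Hypothetical toy data; with the real file you would load
# pd.read_csv('fcc-forum-pageviews.csv', parse_dates=['date']) instead.
toy = pd.DataFrame({
    'date': pd.date_range('2017-01-01', periods=80, freq='D'),
    'value': range(1000, 81000, 1000),
})

raw_max = monthly_max(toy)
cleaned_max = monthly_max(clean(toy))
print(raw_max)
print(cleaned_max)
```

If a month's raw maximum survives the cleaning step, it should appear as the top whisker or outlier of that month's box.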

My updated plot is available through the link I posted previously.
