FCC - Time Series Analyser

Hi,
I am trying a couple of different approaches for the line plot and bar plot in this assignment. Could someone clarify whether what I am trying to do is correct and achievable this way?

Q.1 When parsing the date column and setting it as the index, can this be done in a different way from what is in my code? For example:
pd.read_csv('fcc…csv')
pd.to_datetime(df['date'])

Q.2 Can I use the plt (Matplotlib pyplot) object to set my xlabel, ylabel, title, etc.? Or does it have to be done through the ax object, like ax.set_xlabel and ax.set_title?

Q.3 I imported the datetime module as dt. Can I not use it to fill my newly created columns, as in df.dt.year or df.dt.month? If not, why?

Q.4 For the bar plot, if I want the average views grouped by month and year, can I use the value_counts() function?
My bad; value_counts() only counts how often each unique value occurs. Please disregard this question.

Thanks very much in advance; please let me know if any of my questions make sense.

import matplotlib.pyplot as plt
import pandas as pd
import datetime as dt
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

# Import data (Make sure to parse dates. Consider setting index column to 'date'.)
df = pd.read_csv('fcc-forum-pageviews.csv', parse_dates = ['date'], index_col ='date')
#df.set_index('Date')


# Clean data
df = df[(df['value'] >= (df['value'].quantile(0.025))) &
(df['value'] <= (df['value'].quantile(0.975)))
]


def draw_line_plot():
    # Draw line plot
    fig, ax = plt.subplots()
    plt.figure(figsize= (9,14))

    plt.title("Daily freeCodeCamp Forum Page Views 5/2016-12/2019")
    plt.xlabel("Date")
    plt.ylabel("Page Views")
    plt.plot(df.index, df['value'], linewidth = 2)


    # Save image and return fig (don't change this part)
    fig.savefig('line_plot.png')
    return fig

def draw_bar_plot():
    df['year'] = df.dt.year
    df['month'] = df.dt.month
    
    # Copy and modify data for monthly bar plot
    df_bar = df.groupby(['year'],['month'])['value'].mean().value_counts()

@sanity @yjwang868 Could anyone please help? All the plots appear fine, but one test fails with an error.

import matplotlib.pyplot as plt
import pandas as pd
import datetime as dt
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

# Import data (Make sure to parse dates. Consider setting index column to 'date'.)
df = pd.read_csv('fcc-forum-pageviews.csv', parse_dates = ['date'], index_col ='date')
#df.set_index('Date')

# Clean data
df = df[(df['value'] >= (df['value'].quantile(0.025))) &
(df['value'] <= (df['value'].quantile(0.975)))
]


def draw_line_plot():
    # Draw line plot
    fig, ax = plt.subplots(figsize=(12, 18))

    # plt.title("Daily freeCodeCamp Forum Page Views 5/2016-12/2019")
    # plt.xlabel("Date")
    # plt.ylabel("Page Views")
    # plt.plot(df.index, df['value'], linewidth = 2)

    ax.set_title("Daily freeCodeCamp Forum Page Views 5/2016-12/2019")
    ax.set_xlabel("Date", fontsize = 9)
    ax.set_ylabel("Page Views", fontsize =9)
    ax.plot(df.index, df['value'], linewidth = 1)

    plt.xticks(fontsize = 5)
    plt.yticks(fontsize = 5)

    # Save image and return fig (don't change this part)
    fig.savefig('line_plot.png')
    return fig

def draw_bar_plot():
    df['year'] = df.index.year
    df['month'] = df.index.month

    # df_bar = df.copy()
    # df_bar['date'] = df_bar.set_index
    # df['year'] = df_bar['year'].dt.year
    # df['month'] = df_bar['month'].dt.month
    
    # Copy and modify data for monthly bar plot
    df_bar = df.groupby(['year','month'])['value'].mean()
    df_bar = df_bar.unstack()
    # df_bar = df_bar.pivot(index='Date', columns=['year', 'month'], values='value')

    # Draw bar plot
    fig = df_bar.plot(figsize=(12,8), legend=True, kind='bar').figure
    plt.xlabel("Years")
    plt.ylabel("Average Page Views")
    #ax = plt.subplots()
    plt.xticks(fontsize = 9)
    plt.yticks(fontsize = 9)
    legend_labels = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

    plt.legend(legend_labels, loc='best')

    # Save image and return fig (don't change this part)
    fig.savefig('bar_plot.png')
    return fig

def draw_box_plot():
    # Prepare data for box plots (this part is done!)
    df_box = df.copy()
    df_box.reset_index(inplace=True)
    df_box['year'] = [d.year for d in df_box.date]
    df_box['month'] = [d.strftime('%b') for d in df_box.date]

    # Draw box plots (using Seaborn)
    df_box['monthorder'] = df_box['date'].dt.month
    df_box = df_box.sort_values('monthorder')

    fig, ax = plt.subplots(ncols = 2, figsize=(9, 4))

    ax[0] = sns.boxplot(x= df_box['year'], y = df_box['value'], ax = ax[0])
    ax[1] = sns.boxplot(x= df_box['month'], y = df_box['value'], ax = ax[1])

    ax[0].set_title("Year-wise Box Plot (Trend)")
    ax[0].set_xlabel("Year")
    ax[0].set_ylabel("Page Views")

    ax[1].set_title("Month-wise Box Plot (Seasonality)")
    ax[1].set_xlabel("Month")
    ax[1].set_ylabel("Page Views")

    # Save image and return fig (don't change this part)
    fig.savefig('box_plot.png')
    return fig

ot a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
…E…

====
ERROR: test_data_cleaning (test_module.DataCleaningTestCase)


Traceback (most recent call last):
  File "/home/runner/DecimalShadyBootstrapping/test_module.py", line 7, in test_data_cleaning
    actual = int(time_series_visualizer.df.count())
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/core/series.py", line 129, in wrapper
    raise TypeError(f"cannot convert the series to {converter}")
TypeError: cannot convert the series to <class 'int'>



Ran 11 tests in 10.488s

FAILED (errors=1)

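This is the result of a side effect in one of the functions: the function modifies the original data instead of working on its own copy, and that change makes the test fail. In the code above, draw_bar_plot adds 'year' and 'month' columns to the module-level df, so df.count() returns a Series with three entries instead of one, and int() cannot convert it. That is the TypeError in the traceback.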
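
A minimal sketch of the difference, assuming a small made-up DataFrame in place of fcc-forum-pageviews.csv. The helper names bar_data_with_side_effect and bar_data_without_side_effect are hypothetical; they only isolate the data preparation step of draw_bar_plot so the effect on df is easy to see:

```python
import pandas as pd

# Made-up stand-in for the page-view data (assumption: the real file has
# a 'date' index and a single 'value' column, as in the posted code).
df = pd.DataFrame(
    {"value": [10, 20, 30, 40]},
    index=pd.to_datetime(["2016-05-09", "2016-06-10", "2017-05-11", "2017-06-12"]),
)
df.index.name = "date"


def bar_data_with_side_effect():
    # Adds columns to the module-level df, like the posted draw_bar_plot.
    df["year"] = df.index.year
    df["month"] = df.index.month
    return df.groupby(["year", "month"])["value"].mean().unstack()


def bar_data_without_side_effect():
    # Works on a copy, so the module-level df keeps only its 'value' column.
    df_bar = df.copy()
    df_bar["year"] = df_bar.index.year
    df_bar["month"] = df_bar.index.month
    return df_bar.groupby(["year", "month"])["value"].mean().unstack()


# Call the copy-based version first: df is left untouched.
safe_table = bar_data_without_side_effect()
columns_after_safe = list(df.columns)
count_after_safe = df.count()          # Series with one entry

# Now call the mutating version: df gains two extra columns.
mutating_table = bar_data_with_side_effect()
columns_after_mutation = list(df.columns)
count_after_mutation = df.count()      # Series with three entries
```

Both helpers build the same year-by-month table of means, but only the second one leaves df with a single column, which is what int(df.count()) in the test relies on. The provided draw_box_plot already follows this pattern with df_box = df.copy().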