Page View Time Series Visualizer - error for cleaning data

Hi,

I am seeing an issue here:

I am not sure why this is happening. I am only using the “value” column in this part, so I don’t understand why it says the result cannot be converted to an int when it is an int.

Here is my code:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

# Import data (Make sure to parse dates. Consider setting index column to 'date'.)
df = pd.read_csv("fcc-forum-pageviews.csv",parse_dates = ["date"], index_col = "date")

# Clean data
df = df[(df["value"] >= df["value"].quantile(0.025)) &
   (df["value"] <= df["value"].quantile(0.975))]


def draw_line_plot():
    #draw plot 
    fig = plt.figure()
    plt.plot(df.index, df['value']) 
    plt.ylabel('Page Views')
    plt.xlabel('Date')
    plt.title('Daily freeCodeCamp Forum Page Views 5/2016-12/2019') 





    # Save image and return fig (don't change this part)
    fig.savefig('line_plot.png')
    return fig

def draw_bar_plot():
    # Copy and modify data for monthly bar plot
    df["month"] = df.index.month
    df["year"] = df.index.year
    df_bar = df.groupby(["year", "month"])["value"].mean()
    df_bar = df_bar.unstack()
    

    # Draw bar plot
    fig = df_bar.plot(kind ="bar", legend = True, figsize = (8, 8)).figure
    plt.xlabel("Years", fontsize= 9)
    plt.ylabel("Average Page Views", fontsize= 9)
    plt.legend(fontsize = 10, labels = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'])




    # Save image and return fig (don't change this part)
    fig.savefig('bar_plot.png')
    return fig

def draw_box_plot():
    # Prepare data for box plots (this part is done!)
    df_box = df.copy()
    df_box.reset_index(inplace=True)
    df_box['year'] = [d.year for d in df_box.date]
    df_box['month'] = [d.strftime('%b') for d in df_box.date]

    # Draw box plots (using Seaborn)    
    df_box["month_1"] = df_box["date"].dt.month
    df_box = df_box.sort_values("month_1")
    
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize = (8,8))


    axes[0] = sns.boxplot(x=df_box["year"], y=df_box["value"], ax = axes[0])
    axes[1] = sns.boxplot(x=df_box["month"], y=df_box["value"], ax = axes[1])

    axes[0].set_title("Year-wise Box Plot (Trend)")
    axes[0].set_xlabel('Year')
    axes[0].set_ylabel('Page Views')

    axes[1].set_title("Month-wise Box Plot (Seasonality)")
    axes[1].set_xlabel('Month')
    axes[1].set_ylabel('Page Views')







    # Save image and return fig (don't change this part)
    fig.savefig('box_plot.png')
    return fig

I don’t have the original boilerplate for this project handy, but in my version I left a comment and changed the test like this:

class DataCleaningTestCase(unittest.TestCase):
    def test_data_cleaning(self):
        # This does not return the row count by itself.
        # actual = int(time_series_visualizer.df.count())
        # This returns the row count.
        actual = int(time_series_visualizer.df.shape[0])
        expected = 1238
        self.assertEqual(actual,
                         expected,
                         "Expected DataFrame count after cleaning to be 1238.")

I ran your code with my tests and everything passed.

Hi! I am having this issue too… Is it OK to modify the test if I want to apply for the certification? Is there any other approach?

I can’t remember what caused this or if I even investigated the cause. My guess is the project was developed with an old pandas version and when I did it I was using whatever was current (the above test works with pandas 1.2.0).

Your only choices are to fix the test, run a virtual environment (locally or on repl.it) with the correct old version of pandas, or check to see if the project boilerplate has been updated and there is a new version with a corrected test.

The problem here isn’t the test or the pandas version.

Make sure you aren’t modifying df after cleaning the data. If one of the functions needs to change something, always work on a copy of df.
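To illustrate, here is a minimal sketch of the difference. It assumes a small made-up frame in place of the CSV, and the helper names (make_frame, bar_data_mutating, bar_data_with_copy) are hypothetical, not part of the project boilerplate. Adding "month" and "year" directly to the frame leaves it with three columns, so df.count() returns one entry per column rather than a single row count. Working on a copy leaves the original untouched.

```python
import pandas as pd


def make_frame():
    # Hypothetical stand-in for the cleaned data (the real project reads a CSV).
    dates = pd.date_range("2020-01-01", periods=6, freq="D", name="date")
    return pd.DataFrame({"value": [10, 20, 30, 40, 50, 60]}, index=dates)


def bar_data_mutating(frame):
    # Adds columns to the frame that was passed in (the pattern in the question).
    frame["month"] = frame.index.month
    frame["year"] = frame.index.year
    return frame.groupby(["year", "month"])["value"].mean().unstack()


def bar_data_with_copy(frame):
    # Works on an independent copy, so the caller's frame keeps its one column.
    df_bar = frame.copy()
    df_bar["month"] = df_bar.index.month
    df_bar["year"] = df_bar.index.year
    return df_bar.groupby(["year", "month"])["value"].mean().unstack()


safe = make_frame()
bar_data_with_copy(safe)
columns_after_copy = list(safe.columns)  # still just "value"

mutated = make_frame()
bar_data_mutating(mutated)
columns_after_mutation = list(mutated.columns)  # "month" and "year" were added
counts_after_mutation = mutated.count()  # one count per column, not one number
```

With the extra columns in place, a test that converts df.count() to a single int no longer has a single value to convert, which would explain the error in the original post.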


You’re right! Thank you! I used “pd.DataFrame(df)” to copy the df for the bar plot; I have now changed it to df.copy() and it works.

Thank you! That fixed it.