Page View Time Series Visualizer - error for cleaning data

rrf54 · November 10, 2020, 6:34pm

Hi,

I am seeing an issue here:

I am not sure why this is happening. I am only using the “value” column in this part, so I don’t see why it says it cannot be converted to an int when it is an int.

Here is my code

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

# Import data (Make sure to parse dates. Consider setting index column to 'date'.)
df = pd.read_csv("fcc-forum-pageviews.csv",parse_dates = ["date"], index_col = "date")

# Clean data
df = df[(df["value"] >= df["value"].quantile(0.025)) &
   (df["value"] <= df["value"].quantile(0.975))]


def draw_line_plot():
    #draw plot 
    fig = plt.figure()
    plt.plot(df.index, df['value']) 
    plt.ylabel('Page Views')
    plt.xlabel('Date')
    plt.title('Daily freeCodeCamp Forum Page Views 5/2016-12/2019')

    # Save image and return fig (don't change this part)
    fig.savefig('line_plot.png')
    return fig

def draw_bar_plot():
    # Copy and modify data for monthly bar plot
    df["month"] = df.index.month
    df["year"] = df.index.year
    df_bar = df.groupby(["year", "month"])["value"].mean()
    df_bar = df_bar.unstack()
    

    # Draw bar plot
    fig = df_bar.plot(kind="bar", legend=True, figsize=(8, 8)).figure
    plt.xlabel("Years", fontsize=9)
    plt.ylabel("Average Page Views", fontsize=9)
    plt.legend(fontsize=10, labels=['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'])

    # Save image and return fig (don't change this part)
    fig.savefig('bar_plot.png')
    return fig

def draw_box_plot():
    # Prepare data for box plots (this part is done!)
    df_box = df.copy()
    df_box.reset_index(inplace=True)
    df_box['year'] = [d.year for d in df_box.date]
    df_box['month'] = [d.strftime('%b') for d in df_box.date]

    # Draw box plots (using Seaborn)    
    df_box["month_1"] = df_box["date"].dt.month
    df_box = df_box.sort_values("month_1")
    
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize = (8,8))


    axes[0] = sns.boxplot(x=df_box["year"], y=df_box["value"], ax = axes[0])
    axes[1] = sns.boxplot(x=df_box["month"], y=df_box["value"], ax = axes[1])

    axes[0].set_title("Year-wise Box Plot (Trend)")
    axes[0].set_xlabel('Year')
    axes[0].set_ylabel('Page Views')

    axes[1].set_title("Month-wise Box Plot (Seasonality)")
    axes[1].set_xlabel('Month')
    axes[1].set_ylabel('Page Views')

    # Save image and return fig (don't change this part)
    fig.savefig('box_plot.png')
    return fig

jeremy.a.gray · November 10, 2020, 9:06pm

I don’t have the original boilerplate for this project handy, but in my version I left a comment and changed the test like this:

class DataCleaningTestCase(unittest.TestCase):
    def test_data_cleaning(self):
        # This does not return the row count by itself.
        # actual = int(time_series_visualizer.df.count())
        # This returns the row count.
        actual = int(time_series_visualizer.df.shape[0])
        expected = 1238
        self.assertEqual(actual,
                         expected,
                         "Expected DataFrame count after cleaning to be 1238.")

I ran your code with my tests and everything passed.
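For anyone comparing the two lines in that test, here is a minimal sketch of what each call returns. The three-row frame is made up for illustration and only stands in for the real cleaned data.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned page-view data (not the real CSV).
df = pd.DataFrame(
    {"value": [10, 20, 30]},
    index=pd.to_datetime(["2016-05-09", "2016-05-10", "2016-05-11"]),
)
df.index.name = "date"

per_column = df.count()   # Series with one non-null count per column
row_count = df.shape[0]   # plain int: the number of rows

print(type(per_column).__name__, len(per_column))
print(row_count, len(df))
```

With a single column, `df.count()` is a one-element Series; with more columns it has one entry per column, which is why the number of columns in `df` matters to the original test.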

ungureanu_daniel86 · January 8, 2021, 10:24pm

Hi! I am having this issue too… Is it OK to modify the test if I want to apply for the certification? Isn’t there another approach?

jeremy.a.gray · January 8, 2021, 10:38pm

I can’t remember what caused this or if I even investigated the cause. My guess is that the project was developed with an old pandas version, and when I did it I was using whatever was current (the above test works with pandas 1.2.0).

Your only choices are to fix the test, run a virtual environment (locally or on repl.it) with the correct old version of pandas, or check to see if the project boilerplate has been updated and there is a new version with a corrected test.

sanity · January 9, 2021, 6:12am

The problem here isn’t the test or the pandas version.

Make sure you aren’t modifying df after cleaning the data. If one of the functions needs to change something, always work on a copy of the df.
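One likely reading of this advice, given the code in the first post: draw_bar_plot adds "month" and "year" columns to the module-level df, so by the time the test calls int(df.count()) the count has three entries instead of one. The sketch below reproduces that on a made-up four-row frame; the function names monthly_means_with_copy and monthly_means_mutating are hypothetical and only stand in for the data-preparation part of draw_bar_plot.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned module-level df.
df = pd.DataFrame(
    {"value": [10, 20, 30, 40]},
    index=pd.to_datetime(
        ["2016-05-09", "2016-06-10", "2017-05-11", "2017-06-12"]
    ),
)
df.index.name = "date"


def monthly_means_with_copy(frame):
    # Works on a private copy, so the caller's frame keeps its columns.
    df_bar = frame.copy()
    df_bar["month"] = df_bar.index.month
    df_bar["year"] = df_bar.index.year
    return df_bar.groupby(["year", "month"])["value"].mean().unstack()


def monthly_means_mutating():
    # Adds columns to the shared df, as draw_bar_plot in the first post does.
    df["month"] = df.index.month
    df["year"] = df.index.year
    return df.groupby(["year", "month"])["value"].mean().unstack()


safe = monthly_means_with_copy(df)
print(list(df.columns))   # df still has only its original column here

monthly_means_mutating()
print(list(df.columns))   # the helper columns are now part of df

conversion_failed = False
try:
    int(df.count())       # count() now has one entry per column
except TypeError as exc:
    conversion_failed = True
    print("TypeError:", exc)
```

Both functions return the same table of monthly means; the only difference is whether the shared df is left untouched for the data-cleaning test.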

ungureanu_daniel86 · January 9, 2021, 8:49am

You’re right! Thank you! I had used “pd.DataFrame(df)” to copy the df for the bar plot; I have now changed it to df.copy() and it works.

niccoleme · February 15, 2021, 2:02am

Thank you! That fixed it.
