Page View Time Series Visualizer issue

Hi everyone,

I’m running the code below without issues in Google Colab, but I get the following error when testing it on Replit.

Thanks in advance for your assistance.

Error:

  File "/home/runner/boilerplate-page-view-time-series-visualizer-2/time_series_visualizer.py", line 32, in draw_bar_plot
    df_bar.groupby(pd.Grouper(freq='M')).mean()
AttributeError: 'function' object has no attribute 'groupby'

My code:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

df = pd.read_csv('fcc-forum-pageviews.csv')
df = df.set_index('date')
df.index = pd.to_datetime(df.index)
df = df[(df.iloc[:, 0] >= df.iloc[:, 0].quantile(0.025)) &
        (df.iloc[:, 0] <= df.iloc[:, 0].quantile(0.975))]

def draw_bar_plot():
    df_bar = df.copy()
    df_bar.groupby(pd.Grouper(freq='M')).mean()
    df_bar.index = df_bar.index.strftime('%Y-%m')
    df_bar = df_bar.assign(Months=df_bar.index.get_level_values('date'))
    df_bar['Months'] = pd.to_datetime(df_bar['Months'])
    df_bar['Months'] = df_bar['Months'].dt.strftime('%B')
    df_bar = df_bar.assign(years=df_bar.index.get_level_values('date'))
    df_bar['years'] = pd.to_datetime(df_bar['years'])
    df_bar['years'] = df_bar['years'].dt.strftime('%Y')
    indextest = pd.date_range(start='1/31/2016', periods=4, freq='M')
    indextest = indextest.strftime('%Y-%m')
    test = pd.DataFrame(np.zeros(4), index=indextest)
    test.index.name = 'date'
    test.columns = ['value']
    test = test.assign(Months=test.index.get_level_values('date'))
    test['Months'] = pd.to_datetime(test['Months'])
    test['Months'] = test['Months'].dt.strftime('%B')
    test = test.assign(years=test.index.get_level_values('date'))
    test['years'] = pd.to_datetime(test['years'])
    test['years'] = test['years'].dt.strftime('%Y')
    df_bar = pd.concat([test, df_bar])
    fig, ax = plt.subplots(figsize=(6.65, 7.57))
    sns.barplot(data=result, x='years', y='value', hue='Months', palette='tab10')
    ax.set(xlabel='Years', ylabel='Average Page Views')
    return fig

You’ll need to post a link to the notebook or to the repl on repl.it, or use a code block with proper indentation. This doesn’t seem to be all of the code, and it’s hard to debug as-is.

I copied it into a working project and fixed it up to run, and it did not produce the error you mention. It did, however, fail later on the undefined name result.

Thanks for your reply. I was able to resolve the original issue, but now I’m getting a new error that I don’t understand:

ERROR: test_box_plot_number_of_boxes (test_module.BoxPlotTestCase)

Traceback (most recent call last):
  File "/home/runner/boilerplate-page-view-time-series-visualizer-1/test_module.py", line 68, in setUp
    self.fig = time_series_visualizer.draw_box_plot()
  File "/home/runner/boilerplate-page-view-time-series-visualizer-1/time_series_visualizer.py", line 70, in draw_box_plot
    df_box['year'] = [d.year for d in df_box.date]
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/core/generic.py", line 5575, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'date'

Here’s a link to my repl:
repl link

It’s your data cleaning, for now anyway. This is how you load the data:

# Import data (Make sure to parse dates. Consider setting index column to 'date'.)
df =pd.read_csv('fcc-forum-pageviews.csv')
df = df.set_index('date')
df.index = pd.to_datetime(df.index).date

I think you are trying to parse the dates to datetimes, but I don’t think that’s what the third line is doing (I didn’t investigate past “doesn’t work”). pandas.read_csv() will parse the dates to datetimes for you and I know that works; check the documentation for details.
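As a minimal sketch of letting pandas.read_csv() do the parsing, the snippet below uses the parse_dates and index_col parameters. The in-memory CSV text is made-up sample data standing in for fcc-forum-pageviews.csv, so that the example runs on its own; with the real file you would pass the filename instead.

```python
import io

import pandas as pd

# Made-up sample rows standing in for fcc-forum-pageviews.csv (assumption).
csv_text = "date,value\n2016-05-09,1201\n2016-05-10,2329\n2016-05-11,1716\n"

# parse_dates converts the 'date' column to datetimes while reading,
# and index_col makes that column the index in the same step.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

print(type(df.index).__name__)  # DatetimeIndex
print(df.index.name)            # date
```

Loading this way keeps the index name, so a later reset_index() brings back a column called date.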

If you print(df) after every step, you’ll see that the date column name disappears during the transformations. That is the cause of your missing attribute error.
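Here is a rough reproduction of that step-by-step check, using made-up sample rows instead of the real CSV. It assumes that draw_box_plot copies df and calls reset_index() before reading df_box.date, which is not shown in this thread.

```python
import io

import pandas as pd

# Made-up sample rows standing in for fcc-forum-pageviews.csv (assumption).
csv_text = "date,value\n2016-05-09,1201\n2016-05-10,2329\n"

df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))       # 'date' is still an ordinary column here

df = df.set_index("date")
print(df.index.name)          # the index is now named 'date'

# .date returns a plain array of datetime.date objects, which carries no name,
# so assigning it replaces the named index with an unnamed one.
df.index = pd.to_datetime(df.index).date
print(df.index.name)          # None

# Assumed box-plot preparation: an unnamed index comes back as 'index'.
df_box = df.copy().reset_index()
print(list(df_box.columns))   # ['index', 'value']
print(hasattr(df_box, "date"))  # False
```

With no date column left after reset_index(), the attribute lookup df_box.date has nothing to find.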

Thanks.
It looks like the error is related to setting the first column as the index and then converting the index to datetime, rather than first converting that column to datetime and then setting it as the index. In other words, flipping:
From:
df = df.set_index('date')
df.index = pd.to_datetime(df.index)
To:
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

fixes the issue. I am not sure why, though, as both versions produce the same result when I test them in my notebook.

Thanks for your assistance
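For what it's worth, a quick check on made-up sample rows agrees with the notebook observation: the two orderings give the same frame. One possible explanation, offered only as a guess, is that the loading code quoted in the earlier reply ends in .date, while neither version here does, so dropping .date may be what actually changed.

```python
import io

import pandas as pd

# Made-up sample rows standing in for fcc-forum-pageviews.csv (assumption).
csv_text = "date,value\n2016-05-09,1201\n2016-05-10,2329\n2016-05-11,1716\n"

# "From" ordering: set the index first, then convert it to datetimes.
a = pd.read_csv(io.StringIO(csv_text))
a = a.set_index("date")
a.index = pd.to_datetime(a.index)

# "To" ordering: convert the column first, then set it as the index.
b = pd.read_csv(io.StringIO(csv_text))
b["date"] = pd.to_datetime(b["date"])
b = b.set_index("date")

# Raises AssertionError if values, index, dtypes, or index names differ.
pd.testing.assert_frame_equal(a, b)
print(a.index.name, b.index.name)
```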