I’m almost 2/3 of the way through the “Page View Time Series Visualizer” project, but I’m not sure why two of my bar plot tests are failing. I have been developing this on my local machine (a laptop).
Here is my code thus far (omitting the box plot part because I haven’t done that yet):
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
import numpy as np
register_matplotlib_converters()
# Import data (Make sure to parse dates. Consider setting index column to 'date'.)
df = pd.read_csv(
    filepath_or_buffer='fcc-forum-pageviews.csv',
    parse_dates=['date'],
    index_col='date'
)
# Clean data
# thanks to https://towardsdatascience.com/10-examples-that-will-make-you-use-pandas-query-function-more-often-a8fb3e9361cb
low_end = df.value.quantile(0.025)
high_end = df.value.quantile(0.975)
df = df.query(f"value > {low_end} and value < {high_end}")
def draw_line_plot():
    df_line = df.copy()
    fig = plt.figure(figsize=(15, 5))
    x_values = df_line.index.tolist()
    y_values = df_line['value'].tolist()
    plt.plot(x_values, y_values, 'r')  # 'r' for red line
    plt.title('Daily freeCodeCamp Forum Page Views 5/2016-12/2019')
    plt.xlabel('Date')
    plt.ylabel('Page Views')
    # plt.show()
    # # Save image and return fig (don't change this part)
    fig.savefig('line_plot.png')
    return fig
def draw_bar_plot():
    # Copy and modify data for monthly bar plot
    # This includes the cleaned data, which explains why some dates are missing
    df_bar = df.copy()
    df_bar = df_bar.reset_index()
    # adds the missing months, seen here: https://stackoverflow.com/questions/43408621/add-a-row-at-top-in-pandas-dataframe
    new_rows = []
    new_rows.insert(0, {'date': pd.to_datetime(
        '2016-04-01 00:00:00'), 'value': 0})
    new_rows.insert(0, {'date': pd.to_datetime(
        '2016-03-01 00:00:00'), 'value': 0})
    new_rows.insert(0, {'date': pd.to_datetime(
        '2016-02-01 00:00:00'), 'value': 0})
    new_rows.insert(0, {'date': pd.to_datetime(
        '2016-01-01 00:00:00'), 'value': 0})
    df_bar = pd.concat([pd.DataFrame(new_rows), df_bar], ignore_index=True)
    # adds in year and month columns
    df_bar['year'] = df_bar['date'].dt.strftime('%Y')
    df_bar['month'] = df_bar['date'].dt.strftime('%m')
    df_bar = df_bar.groupby(['year', 'month'])['value'].mean()
    df_bar = df_bar.reset_index(drop=False)
    # https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
    df_bar['month_name'] = pd.to_datetime(
        df_bar['month'], format='%m').dt.month_name()
    print(df_bar)
    # https://stackoverflow.com/questions/51879686/pandas-only-recognizes-one-column-in-my-data-frame
    # this should be good for the most part
    bar_plot = sns.barplot(
        data=df_bar,
        x='year',
        y='value',
        hue='month_name',
        # https://www.codecademy.com/article/seaborn-design-ii
        palette=sns.color_palette("Paired", 12)
    )
    bar_plot.set(
        title='Monthly freeCodeCamp Forum Page Views 5/2016-12/2019',
        xlabel='Years',
        ylabel='Average Page Views',
    )
    plt.legend(
        title='Months'
    )
    fig = bar_plot.figure
    # plt.show()
    # # Draw bar plot
    # # Save image and return fig (don't change this part)
    fig.savefig('bar_plot.png')
    return fig
And here’s the output from the two failing tests (again, omitting the box plot parts because I haven’t done that yet):
======================================================================
FAIL: test_bar_plot_legend_labels (test_module.BarPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/mgermaine93/Desktop/CODE/fcc-code-challenges/data-analysis-with-python/page-view-time-series-visualizer/test_module.py", line 53, in test_bar_plot_legend_labels
self.assertEqual(
AssertionError: Lists differ: ['Jan[111 chars]mber', 'January', 'February', 'March', 'April'[199 chars]ber'] != ['Jan[111 chars]mber']
First list contains 24 additional elements.
First extra element 12:
'January'
['January',
'February',
'March',
'April',
'May',
'June',
'July',
'August',
'September',
'October',
'November',
- 'December',
- 'January',
- 'February',
- 'March',
- 'April',
- 'May',
- 'June',
- 'July',
- 'August',
- 'September',
- 'October',
- 'November',
- 'December',
- 'January',
- 'February',
- 'March',
- 'April',
- 'May',
- 'June',
- 'July',
- 'August',
- 'September',
- 'October',
- 'November',
'December'] : Expected bar plot legend labels to be months of the year.
======================================================================
FAIL: test_bar_plot_number_of_bars (test_module.BarPlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/mgermaine93/Desktop/CODE/fcc-code-challenges/data-analysis-with-python/page-view-time-series-visualizer/test_module.py", line 76, in test_bar_plot_number_of_bars
self.assertEqual(actual, expected,
AssertionError: 193 != 49 : Expected a different number of bars in bar chart.
----------------------------------------------------------------------
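The test source isn’t included above, so this rests on an assumption: that test_bar_plot_number_of_bars counts the matplotlib.patches.Rectangle children of the Axes, and that test_bar_plot_legend_labels reads the legend’s text entries. Under that assumption the numbers can be reproduced by hand. The Axes background (ax.patch) is itself a Rectangle, so the expected 49 would mean 48 bars plus 1. The helper names count_rectangles and legend_labels below are hypothetical and exist only for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle


def count_rectangles(ax):
    """Count Rectangle artists among the Axes' direct children.

    Bars drawn with ax.bar are Rectangles, and so is the Axes background
    patch (ax.patch). That is why an empty Axes already counts as 1.
    """
    return sum(isinstance(child, Rectangle) for child in ax.get_children())


def legend_labels(ax):
    """Return the text of each legend entry, or [] if the Axes has no legend."""
    legend = ax.get_legend()
    if legend is None:
        return []
    return [text.get_text() for text in legend.get_texts()]


fig, ax = plt.subplots()
ax.bar(["a", "b", "c"], [1, 2, 3], label="values")
ax.legend()
print(count_rectangles(ax))  # 4 (3 bars + the background patch)
print(legend_labels(ax))     # ['values']
```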
The output of the final print(df_bar) line in the code is as follows:
year month value month_name
0 2016 01 0.000000 January
1 2016 02 0.000000 February
2 2016 03 0.000000 March
3 2016 04 0.000000 April
4 2016 05 19432.400000 May
5 2016 06 21875.105263 June
6 2016 07 24109.678571 July
7 2016 08 31049.193548 August
8 2016 09 41476.866667 September
9 2016 10 27398.322581 October
10 2016 11 40448.633333 November
11 2016 12 27832.419355 December
12 2017 01 32785.161290 January
13 2017 02 31113.071429 February
14 2017 03 29369.096774 March
15 2017 04 30878.733333 April
16 2017 05 34244.290323 May
17 2017 06 43577.500000 June
18 2017 07 65806.838710 July
19 2017 08 47712.451613 August
20 2017 09 47376.800000 September
21 2017 10 47438.709677 October
22 2017 11 57701.566667 November
23 2017 12 48420.580645 December
24 2018 01 58580.096774 January
25 2018 02 65679.000000 February
26 2018 03 62693.774194 March
27 2018 04 62350.833333 April
28 2018 05 56562.870968 May
29 2018 06 70117.000000 June
30 2018 07 63591.064516 July
31 2018 08 62831.612903 August
32 2018 09 65941.733333 September
33 2018 10 111378.142857 October
34 2018 11 78688.333333 November
35 2018 12 80047.483871 December
36 2019 01 102056.516129 January
37 2019 02 105968.357143 February
38 2019 03 91214.483871 March
39 2019 04 89368.433333 April
40 2019 05 91439.903226 May
41 2019 06 90435.642857 June
42 2019 07 97236.566667 July
43 2019 08 102717.310345 August
44 2019 09 97268.833333 September
45 2019 10 122802.272727 October
46 2019 11 143166.428571 November
47 2019 12 150733.500000 December
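The printed table has 48 rows, one for each (year, month) pair from 2016 through 2019. Assuming seaborn draws one bar per x/hue combination, a single call should produce 48 bars. That makes 193 equal to 4 × 48 + 1, or four calls’ worth of bars plus one extra rectangle. The sketch below uses placeholder values, not the forum data. It only confirms the 4 × 12 shape with DataFrame.pivot.

```python
import pandas as pd

# Placeholder stand-in for the printed table: one row per month,
# January 2016 through December 2019 (values are dummies, not the forum data).
months = pd.date_range("2016-01-01", "2019-12-01", freq="MS")
df_bar = pd.DataFrame({
    "year": months.year,
    "month_name": months.month_name(),
    "value": range(len(months)),
})

# Reshape to a years x months grid. Each cell corresponds to one bar
# when the plot uses x='year' and hue='month_name'.
grid = df_bar.pivot(index="year", columns="month_name", values="value")
print(len(df_bar))  # 48
print(grid.shape)   # (4, 12)
```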
And when I call plt.show(), my bar plot looks like this:
[screenshot of the bar plot not preserved]
It’s clear to me that the legend labels and the number of bars somehow don’t match up, but I’m not sure how that’s happening. Perhaps a second set of eyes will help?
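One way to check the mismatch: when sns.barplot is called without ax=, it draws onto the current Axes (plt.gca()). If a drawing function runs several times in the same process, for example once per test setUp (an assumption about how the tests are structured), each call stacks another set of bars, and possibly another set of labeled entries, onto the same Axes. The 36 legend labels (3 × 12) would fit that pattern, although how the legend is built may vary between seaborn versions. The minimal sketch below uses placeholder data and hypothetical function names. It compares drawing onto the current Axes with creating a fresh Figure/Axes through plt.subplots() on every call and passing it as ax=.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Placeholder data (not the forum data): 2 years x 3 months = 6 bars per plot.
data = pd.DataFrame({
    "year": ["2018"] * 3 + ["2019"] * 3,
    "month": ["Jan", "Feb", "Mar"] * 2,
    "value": [1, 2, 3, 4, 5, 6],
})


def draw_on_current_axes():
    # Hypothetical helper: no ax= argument, so seaborn draws onto plt.gca(),
    # i.e. whatever Axes is still current from earlier calls.
    ax = sns.barplot(data=data, x="year", y="value", hue="month")
    return ax.figure


def draw_on_new_axes():
    # Hypothetical helper: a brand-new Figure/Axes on every call,
    # passed explicitly to seaborn via ax=.
    fig, ax = plt.subplots(figsize=(8, 6))
    sns.barplot(data=data, x="year", y="value", hue="month", ax=ax)
    return fig


plt.close("all")
for _ in range(3):
    fig_shared = draw_on_current_axes()
for _ in range(3):
    fig_fresh = draw_on_new_axes()

# ax.patches holds the bar Rectangles (it excludes the Axes background patch).
shared = len(fig_shared.axes[0].patches)
fresh = len(fig_fresh.axes[0].patches)
print(shared, fresh)  # shared holds three plots' worth of bars; fresh holds one
```

If the real function behaves the same way, the shared-Axes count grows by one plot’s worth of bars on every call, while the fresh-Axes count stays constant. Creating the figure inside draw_bar_plot with plt.subplots() and passing ax= to sns.barplot may be worth trying, but this is a guess, not a confirmed fix.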
Thank you in advance for any assistance!