Page View Time Series Visualizer - how to group bar graphs by year?

Tell us what’s happening:
I’m currently on the second function, draw_bar_plot. The main thing I don’t understand is how to group the data by year in seaborn (matplotlib would also be fine). I think that if I understood how to do that, the other details would fall into place. I also suspect I’m not doing things the most efficient way, as my comments probably show.

Here is my current output:

This is close, but has two problems: a) it is not grouped by year and b) the colour pattern does not restart each year.

I can’t figure out how to get them grouped by year like that with either matplotlib or seaborn. It might be that I simply need to reorganize the data in pandas somehow, but I’m drawing a blank on what would get it right.

Any pointers on what to try or search?

Your code so far

def draw_bar_plot():
    # Copy and modify data for monthly bar plot
    df_bar = df.copy()
    
    # adding year and month column
    df_bar.reset_index(inplace=True)
    df_bar['year'] = [d.year for d in df_bar.date]
    df_bar['month'] = [d.month for d in df_bar.date]
    df_bar["month_name"] = [d.strftime('%B') for d in df_bar.date]
    
    # mean by month per year
    mean_bars = df_bar.groupby(["year", "month"]).mean().reset_index()
    # TODO: There's gotta be a better way to do this...
    mean_bars["date"] = pd.Series(str(int(row["year"])) + "-" + str(int(row["month"])).zfill(2) for idx, row in mean_bars.iterrows())

    # Draw bar plot redo
    # set the color palette
    # TODO: how to change order of colours to sync with months?
    palette = sns.color_palette("tab10")

    # TODO: This still isn't right because we want to start at palette[0] = January
    # but it wraps around such that November = January
    # Right now, it gets to Nov, Dec, Jan... but the wrap happens at November incorrectly
    # this would probably be fixed by correctly grouping by year

    # Potentially this is still needed for first data alignment, so keeping it
    # color alignment will be such that the palette starts at month 1 (January)
    # so, determine the first month present in the dataset so we can reorder
    # the palette for the bar plot
    # -1 because 0 indexed
    first_month = int(mean_bars["date"].iloc[0][-2:]) - 1
    reordered_palette = palette[first_month:] + palette[:first_month]

    # TODO: how to group by the year, but keep on same figure?
    ax = sns.barplot(x="date", y="value", dodge=False, data=mean_bars, palette=reordered_palette)
    
    # set ticks halfway through the year
    xticks = []
    for idx, full_date in enumerate(mean_bars["date"]):
        if full_date[-2:] == "06":
            xticks.append(idx)
    ax.set_xticks(xticks)

    # use year as the label and make them vertical
    years = df_bar["year"].unique()
    ax.set_xticklabels(years, rotation=90)

    # setting plot labels
    ax.set_xlabel("Years")
    ax.set_ylabel("Average Page Views")

    # sync colors to month order
    months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
    handles = []
    for idx, month in enumerate(months):
        # mod 10 because we only want 10 color patches for some reason to match the answer
        handles.append(mpatches.Patch(color=palette[idx % 10], label=month))

    # set the legend accordingly
    ax.legend(handles=handles, title="Months")

    # not sure why I seem to need this to wipe an extraneous graph off the figure
    plt.figure()
    ax.figure.set_size_inches(8, 7)
    fig = ax.get_figure()

    # Save image and return fig (don't change this part)
    # DEBUG
    fig.savefig(os.getcwd() + '/debug/bar_plot.png')
    fig.savefig('bar_plot.png')
    return fig

Challenge: Page View Time Series Visualizer

Hey miles, I used this Stack Overflow question to understand the main idea using ax.bar and matplotlib:
https://stackoverflow.com/questions/14270391/python-matplotlib-multiple-bars
Look at the second answer, by John Lyon, and see how you could use a helper function to gather the values into one list per month, then call ax.bar with each list to plot that month across all the years.

If you still don’t get it, try copying the code. That’s what I did, because I was so confused by this problem.
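A minimal sketch of the helper-plus-ax.bar idea from this reply, not the code from the linked answer. The numbers in mean_bars are made up (only the year/month/value layout follows the question), and values_by_month is a hypothetical helper name. Months missing from the first year are drawn as zero-height bars, which is one possible choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Made-up monthly means in the same long form as mean_bars in the question;
# the first year starts in May, like the real data.
mean_bars = pd.DataFrame({
    "year": [2016] * 8 + [2017] * 12,
    "month": list(range(5, 13)) + list(range(1, 13)),
    "value": np.linspace(1000.0, 2900.0, 20),
})


def values_by_month(frame, years):
    """Hypothetical helper: {month: [mean for each year]}, 0.0 where missing."""
    lookup = {(int(r.year), int(r.month)): float(r.value)
              for r in frame.itertuples(index=False)}
    return {m: [lookup.get((y, m), 0.0) for y in years] for m in range(1, 13)}


years = sorted(int(y) for y in mean_bars["year"].unique())
by_month = values_by_month(mean_bars, years)

x = np.arange(len(years))  # one slot per year on the x axis
width = 0.8 / 12           # twelve bars share 80% of each slot

fig, ax = plt.subplots(figsize=(8, 7))
for i, month in enumerate(range(1, 13)):
    # Offset each month's bars so the twelve sit side by side,
    # centred on the year's tick.
    ax.bar(x + (i - 5.5) * width, by_month[month], width, label=MONTHS[i])

ax.set_xticks(x)
ax.set_xticklabels(years, rotation=90)
ax.set_xlabel("Years")
ax.set_ylabel("Average Page Views")
ax.legend(title="Months")
```

Because each month gets its own ax.bar call, January takes the first colour of the colour cycle in every year, regardless of which month the data starts in.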

I missed this earlier.

The main problem with this project is that the data starts in May of the first year, and if you don’t compensate for that, all the dates are off. So for the bar plot, first make sure your data cleaning is correct (print your dataframe and check that it is the tabular version of the bar graph: one mean per year and month). Then use seaborn to create the graph (sns.catplot() to let it do the work, or sns.barplot() if you want to do the work yourself), and finally tweak things by directly manipulating the resulting matplotlib object to pass the tests. You can pass ordering parameters to the relevant seaborn functions to control the order of labels and colors; you don’t have to do that manually.
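One way to check the data cleaning is to print a year-by-month table of means and confirm that the first year is empty before May. This is a sketch on synthetic data: the date range and values are invented, and only the "value" column name and the date index follow the question.

```python
import numpy as np
import pandas as pd

# Synthetic daily data that starts partway through May, like the real dataset.
dates = pd.date_range("2016-05-09", "2017-12-31", freq="D")
df = pd.DataFrame({"value": np.arange(len(dates), dtype=float)}, index=dates)
df.index.name = "date"

df_bar = df.copy()
df_bar["year"] = df_bar.index.year
df_bar["month"] = df_bar.index.month

# One row per year, one column per month; months with no data stay NaN.
table = (
    df_bar.groupby(["year", "month"])["value"]
    .mean()
    .unstack("month")
    .reindex(columns=range(1, 13))
)
print(table.round(1))
```

If January to April of the first year show as NaN and every later cell holds a single mean, the frame matches the layout of the target bar graph.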

Finally, you’re doing a lot of manual work here, and seaborn doesn’t play especially well with that. Get your dataframe correct, then let seaborn create the graph; the result is usually correct (or very nearly so). You can do it manually, but then you have to get all the details right.
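As a rough illustration of letting seaborn do the grouping, the sketch below puts the year on the x axis and uses the month name as hue, with hue_order as the ordering parameter. The data is synthetic and the variable names are assumptions; the tests for the project may still need further tweaks to the returned axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Synthetic daily data that starts in May of the first year.
dates = pd.date_range("2016-05-09", "2017-12-31", freq="D")
df = pd.DataFrame({"value": np.arange(len(dates), dtype=float)}, index=dates)

df_bar = df.copy()
df_bar["year"] = df_bar.index.year
df_bar["month"] = df_bar.index.month_name()

# Aggregate first so there is exactly one mean per (year, month) bar.
mean_bars = df_bar.groupby(["year", "month"], as_index=False)["value"].mean()

fig, ax = plt.subplots(figsize=(8, 7))
sns.barplot(
    data=mean_bars,
    x="year",          # one group of bars per year
    y="value",
    hue="month",       # one bar per month inside each group
    hue_order=MONTHS,  # January first, even though the data starts in May
    ax=ax,
)
ax.set_xlabel("Years")
ax.set_ylabel("Average Page Views")
ax.legend(title="Months")
```

With hue_order fixed, the colour assigned to each month is the same in every year, and the months missing from the first year simply leave gaps at the start of that group.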
