Catplot issues for the PageView project in Python data analysis

I have a question related to the following project:

The question I have is how I would create the catplot. Here is what I am doing now:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df= pd.read_csv('PageView.csv')
print(df.shape)
df.set_index('date')
df = df[
    (
    (df['value'] >= (df['value'].quantile(0.025))) &
    (df['value'] <= (df['value'].quantile(0.975))))

    ]
print(df.shape)
""" The part where I try creating the dataframe for the cat plot."""
df['year']=df['date'].str[0:4]
df['month']=df['date'].str[5:7]
values=[]
yearslis=['2016','2017','2018','2019']
monthlis=['01','02','03','04','05','06','07','08','09','10','11','12']
monthlisw=['January','February','March','April','May','June','July','August','September','October','November','December']
monthlisw=monthlisw+monthlisw+monthlisw+monthlisw
yearslis1=[]
monthlis1=[]
monthavgs=[]
print(df)
for year in yearslis:
    for mon in monthlis:
        #print(df[(df['year']==year) & (df['month']==mon)])
        monthlis1.append(mon)
        yearslis1.append(year)
        monthavgs.append(df[(df['year']==year) & (df['month']==mon)].mean()['value'])
print(monthavgs)
monthavgs=monthavgs[4:len(monthavgs)]
yearslis1=yearslis1[4:48]
monthlisw=monthlisw[4:48]
newframe =pd.DataFrame()
newframe['years']=yearslis1
newframe['months']=monthlisw
newframe['average']=monthavgs

#sns.lineplot(data=df,x='date',y='value')
#plt.xlabel("Date", size=16)
#plt.ylabel("Page Views", size=16)
#plt.title("Daily freeCodeCamp Forum Page Views 5/2016-12/2019", size=24)
""" Where I create the catplot"""
sns.catplot(data=newframe,x='months',col='years',kind='count')
plt.xlabel("Years", size=16)
plt.ylabel("Average Page Views", size=16)

plt.show()

I know that I should be using pd.melt() to create a dataframe for the catplot, but I find melt confusing. I was able to make the correct catplot for the medical visualization project without using melt. I have two questions: how would I make every year show January to December, except the first year (2016), and how would I make the bar heights show the average page views per day for each month? My code above does not seem to be correct.

For a start, the plotting function itself does most of the work. You don’t need to create lists for the years or months - you just pass those columns into sns.catplot() and it will do it all for you :wink:
If you pass in an argument like “order=monthlisw” (a list of the twelve month names), it might also retain their order even in years with missing data.
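
To make that concrete, here is a minimal sketch of handing the daily data straight to sns.catplot(). It rests on a few assumptions: the data frame is a synthetic stand-in for PageView.csv starting in May 2016, the date column is parsed as datetimes rather than sliced as strings, and month_names is a hypothetical name for the list of twelve month names. With kind="bar", seaborn's default estimator is the mean, so the bar heights should come out as average daily page views.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for PageView.csv (assumption): one row per day,
# starting in May 2016 so that the first year is incomplete.
dates = pd.date_range("2016-05-01", "2019-12-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({"date": dates,
                   "value": rng.integers(1_000, 200_000, len(dates))})

# Derive the grouping columns from the parsed dates.
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month_name()

# Hypothetical helper list: fixes the x-axis order in every panel.
month_names = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November",
               "December"]

# kind="bar" aggregates y per category (mean by default);
# col="year" draws one panel per year.
g = sns.catplot(data=df, x="month", y="value", col="year",
                kind="bar", order=month_names)
g.set_axis_labels("Month", "Average Page Views")

# Rotate the month labels so they do not overlap.
for ax in g.axes.flat:
    ax.tick_params(axis="x", rotation=90)
```

Call plt.show() (after import matplotlib.pyplot as plt) to display the figure. Because order lists all twelve months, the 2016 panel should simply leave January to April empty.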

Personally, I also find .melt() very confusing. Instead I used a .groupby().mean().reset_index() (I think that’s the correct syntax) to get a nice DF for plotting.

So you're saying we would take the original dataframe, add the year and month columns, and then pass dataframe.groupby().mean().reset_index() to the catplot function? Which columns would we have to pass into the groupby function?

Just think about which data you want to plot and how, and that should give you the answer :wink:
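
One way to read that hint: the plot needs one average per year and month, so those two columns are the natural grouping keys. The sketch below is only an illustration under assumptions: the data is a synthetic stand-in for PageView.csv starting in May 2016, the dates are parsed as datetimes, and monthly is a hypothetical name for the resulting table.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for PageView.csv (assumption): one row per day,
# starting in May 2016 so that the first year is incomplete.
dates = pd.date_range("2016-05-01", "2019-12-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({"date": dates,
                   "value": rng.integers(1_000, 200_000, len(dates))})

df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month_name()

# One row per (year, month) pair holding the mean daily page views.
# Selecting "value" before .mean() keeps non-numeric columns out of it.
monthly = df.groupby(["year", "month"])["value"].mean().reset_index()

print(monthly.head())
```

The resulting table can then be plotted with something like sns.catplot(data=monthly, x="month", y="value", col="year", kind="bar", order=...), where order is a list of the twelve month names. That matters here because groupby sorts the month names alphabetically.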
