I’m trying to figure out how I can read a CSV file and make computations on every 5 lines that are from the same time interval.
My csv looks like:
2020-11-25 1:06:26.919832+00 24
2020-11-25 1:07:40.992583+00 27
2020-11-25 1:08:55.057877+00 29
2020-11-25 1:10:13.118558+00 23
2020-11-25 1:11:39.194837+00 23
2020-11-25 1:13:06.291681+00 22
2020-11-25 1:14:29.391644+00 25
2020-11-25 3:16:11.513485+00 29
2020-11-25 3:19:31.718056+00 33
2020-11-26 5:40:14.750345+00 34
If there aren’t 5 in a row it should just skip until there are 5 in a row again. It’s ok if there is a few minutes between each data point, that would still be part of the 5 but once there is a 30 min gap that would be a new set already.
So far I have:
import pandas
df = pandas.read_csv('timevalue.csv', names=['date', 'value'])
print(df)
value_column = df.loc[:,'value']
print(value_column)
every_five_mean = value_column.rolling(5).mean()
every_five_max = value_column.rolling(5).max()
print(every_five_mean)
Which computes the mean and max of every 5.
But I’m not sure how I can get every 5 based on the time.
Ideally I’d like to save mean, max etc into a new row.
Any help would be great, thanks!