When I run the code, I get an error saying the following:
Traceback (most recent call last):
  File "/home/runner/page-view-time-series/test_module.py", line 7, in test_data_cleaning
    actual = int(time_series_visualizer.df.count(numeric_only=True))
  File "/nix/store/011fpws3ix9hym56a8q54h08df5ymhab-python3.8-pandas-1.2.4/lib/python3.8/site-packages/pandas/core/series.py", line 141, in wrapper
    raise TypeError(f"cannot convert the series to {converter}")
TypeError: cannot convert the series to <class 'int'>
Could someone please help me figure out what exactly the error message means and what the problem is? Thank you so much!
The error is saying it can’t convert the df.count() result to an integer. If you modify the test to print that result and run it on a working project, you’ll see:
value 1238
dtype: int64
which is convertible to integer. Your code is producing
value 1238
month 1238
year 1238
dtype: int64
which is the right value in the wrong format. That means your data cleaning works well enough for the calculations but is not passing the tests yet. It looks like your cleaned data still has extra columns, so df.count() returns a count for the month and year columns as well as for value. Python can’t convert that series (three numbers) to an integer (one number) because it can’t know which one you want.
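To see the difference for yourself, here is a minimal sketch. The five-row DataFrame and the names clean, extra, and conversion_error are made up for illustration and are not the project's data. It only shows that count() returns one entry per numeric column, and that int() refuses a series with more than one entry.

```python
import pandas as pd

# Hypothetical stand-in data; the real project loads its page views from a file.
dates = pd.date_range("2020-01-01", periods=5, freq="D")
clean = pd.DataFrame({"value": [10, 20, 30, 40, 50]}, index=dates)

# The same data with helper columns added to the DataFrame itself.
extra = clean.copy()
extra["month"] = extra.index.month
extra["year"] = extra.index.year

clean_counts = clean.count(numeric_only=True)  # one entry: value
extra_counts = extra.count(numeric_only=True)  # three entries: value, month, year

print(len(clean_counts), len(extra_counts))

# int() cannot pick one number out of three, so it raises TypeError.
try:
    int(extra_counts)
    conversion_error = None
except TypeError as exc:
    conversion_error = str(exc)

print(conversion_error)
```

The counts themselves are identical in every column, which is why the value looks right even though the conversion fails.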
This problem can be solved if you modify test_module line #6 to actual = int(time_series_visualizer.df.count(numeric_only=True)[0]) so that it takes only one number, as jeremy suggested.
Sorry, that’s not what I was suggesting. The test is correct; the original code is wrong and should be modified to pass the test. As I indicated in my example, it’s possible to clean the data in such a way as to not add extra columns which cause the problem with df.count().
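One way to do that, as a rough sketch only: keep the shared df limited to its original column and build month and year on a copy wherever a plot needs them. The sample data, the percentile filter, and the helper name monthly_view below are all assumptions for illustration, not the project's required code.

```python
import pandas as pd

# Hypothetical stand-in data; the real project loads its page views from a file.
dates = pd.date_range("2019-01-01", periods=6, freq="D")
df = pd.DataFrame({"value": [5, 100, 110, 120, 130, 900]}, index=dates)

# Assumed cleaning step: it only filters rows and never adds columns to df.
low = df["value"].quantile(0.025)
high = df["value"].quantile(0.975)
df = df[(df["value"] >= low) & (df["value"] <= high)]


def monthly_view(data):
    """Return a copy with year and month columns, leaving data untouched."""
    view = data.copy()
    view["year"] = view.index.year
    view["month"] = view.index.month
    return view


view = monthly_view(df)

print(list(df.columns))    # the shared DataFrame keeps only its original column
print(list(view.columns))  # the copy carries the helper columns
```

Because the helper columns live only on the copy, df.count(numeric_only=True) still has a single entry and the unmodified test can convert it.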