Sea level Predictor errors

Hello

I am currently trying to get my sea level predictor code to work, but there seem to be some issues with the tests and I’m getting a couple of failures.

Here is my code:

import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress
import numpy as np

def draw_plot():
    # Read data from file
    df = pd.read_csv("epa-sea-level.csv")

    # Create scatter plot
    plt.scatter(df["Year"], df["CSIRO Adjusted Sea Level"])

    # Create first line of best fit
    slope, intercept, r_value, p_value, std_err = linregress(df["Year"], df["CSIRO Adjusted Sea Level"])
    x1 = np.arange(df["Year"].min(),2050,1)
    plt.plot(x1, intercept + slope*x1, "r")

    # Create second line of best fit
    x2=df["Year"][df["Year"] >= 2000]
    y2=df.loc[(df["Year"] >= 2000), "CSIRO Adjusted Sea Level"]
    xi=np.arange(x2.min(),2050,1)
    slope, intercept, r_value, p_value, std_err = linregress(x2,y2)
    plt.plot(xi, intercept + slope*xi, "g")

    # Add labels and title
    ax = plt.gca()
    ax.set(xlabel = "Year", ylabel = "Sea Level (inches)", title = "Rise in Sea Level", xticks=[1850.0, 1875.0, 1900.0, 1925.0, 1950.0, 1975.0, 2000.0, 2025.0, 2050.0, 2075.0])

    # Save plot and return data for testing (DO NOT MODIFY)
    plt.savefig('sea_level_plot.png')
    return plt.gca()

Here is the result from repl:

F.F.
======================================================================
FAIL: test_plot_data_points (test_module.LinePlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/boilerplate-sea-level-predictor/test_module.py", line 30, in test_plot_data_points
    self.assertEqual(actual, expected, "Expected different data points in scatter plot.")
AssertionError: Lists differ: [[188[26 chars]72441], [1882.0, -0.440944881], [1883.0, -0.23[2982 chars]951]] != [[188[26 chars]7244100000002], [1882.0, -0.440944881], [1883.[3226 chars]951]]

First differing element 1:
[1881.0, 0.220472441]
[1881.0, 0.22047244100000002]

Diff is 6114 characters long. Set self.maxDiff to None to see it. : Expected different data points in scatter plot.

======================================================================
FAIL: test_plot_lines (test_module.LinePlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/boilerplate-sea-level-predictor/test_module.py", line 37, in test_plot_lines
    self.assertEqual(actual, expected, "Expected different line for second line of best fit.")
AssertionError: Lists differ: [7.06[42 chars]04435186, 7.560361677767105, 7.726788951098968[873 chars]3011] != [7.06[42 chars]04435242, 7.560361677767105, 7.726788951098968[873 chars]3011]

First differing element 2:
7.393934404435186
7.393934404435242

Diff is 1253 characters long. Set self.maxDiff to None to see it. : Expected different line for second line of best fit.

--------------------------------------------

Looking at the "CSIRO Adjusted Sea Level" data in the CSV file, it seems to have far fewer decimal places, which is leading to my results not matching in the tests.

Any suggestions on how I can solve this?

Thanks,

When floats are this close (the pairs above first differ around the 13th and 17th decimal places), they should usually be treated as equal. Try replacing assertEqual() in the tests with assertAlmostEqual(), with maybe 8 decimal places, and see what happens.
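As a minimal sketch of the difference between the two assertions, here are the two values from the first failure above compared both ways. The variable names are only for illustration, and the bare TestCase instance is just a convenient way to call the assertion outside a test run.

```python
import unittest

# Values copied from the failing test_plot_data_points output above.
actual = 0.220472441
expected = 0.22047244100000002

# Exact comparison: the two literals are stored as different binary floats.
exactly_equal = actual == expected
print(exactly_equal)

# The size of the gap that makes assertEqual() fail.
print(abs(actual - expected))

# assertAlmostEqual() rounds the difference to `places` decimals and
# checks that the result is zero, so this call raises nothing.
case = unittest.TestCase()
case.assertAlmostEqual(actual, expected, places=8)
```

Note that this works on a single pair of floats, not on whole lists.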

See the docs for more info.

Good luck.

When I change assertEqual() to assertAlmostEqual() in the test module, I get the error below.

======================================================================
ERROR: test_plot_data_points (test_module.LinePlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/boilerplate-sea-level-predictor/test_module.py", line 30, in test_plot_data_points
    self.assertAlmostEqual(actual, expected, places=8)
  File "/usr/lib/python3.8/unittest/case.py", line 943, in assertAlmostEqual
    diff = abs(first - second)
TypeError: unsupported operand type(s) for -: 'list' and 'list'

I did some more looking, and this appears to be due to a recent change in either Python or the dependencies of this project, since my project broke after a recent Python upgrade.

Since assertAlmostEqual() only works on pairs of floats, and not on lists the way assertEqual() does, you need a loop. But it’s almost always better to test float equality with assertAlmostEqual() and an appropriate number of decimal places to avoid errors like these. It’s not a completely trivial change, but here is the original code:

    def test_plot_data_points(self):
        actual = self.ax.get_children()[0].get_offsets().data.tolist()
        expected = ...
        self.assertEqual(
            actual,
            expected,
            "Expected different data points in scatter plot.",
        )

The same test, but with a loop:

    def test_plot_data_points(self):
        actual = self.ax.get_children()[0].get_offsets().data.tolist()
        expected = ...
        for (act, exp) in zip(actual, expected):
            # Check the years.
            self.assertEqual(
                act[0],
                exp[0],
                "Both data sets should have the same years.",
            )
            # Check the heights.
            self.assertAlmostEqual(
                act[1],
                exp[1],
                places=8,
                msg="Both data sets should have approximately the same heights.",
            )

Do the same thing for the test_plot_lines test and you should be good.
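For test_plot_lines the lists are flat lists of floats rather than [year, height] pairs, so the loop is a little simpler. Here is a rough sketch, not the official test: assert_floats_almost_equal is a hypothetical helper name, and the three values are the ones visible in the failure output above, not the full expected list.

```python
import unittest


def assert_floats_almost_equal(case, actual, expected, places=8):
    """Compare two flat lists of floats element by element."""
    # zip() silently stops at the shorter list, so check the lengths first.
    case.assertEqual(len(actual), len(expected), "Lists differ in length.")
    for index, (act, exp) in enumerate(zip(actual, expected)):
        case.assertAlmostEqual(
            act,
            exp,
            places=places,
            msg=f"Values differ at index {index}.",
        )


# Three consecutive values taken from the failing test_plot_lines output.
actual = [7.393934404435186, 7.560361677767105, 7.726788951098968]
expected = [7.393934404435242, 7.560361677767105, 7.726788951098968]

# Inside a real test method you would pass `self` instead of `case`.
case = unittest.TestCase()
assert_floats_almost_equal(case, actual, expected)
```

The explicit length check matters because a plain zip() loop would still pass if one of the lists were cut short.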

You shouldn’t change the freeCodeCamp test suite. If you change the test suite, then you aren’t passing the tests as written for the certificate.

How do you suggest I deal with these errors? A number of the CSIRO Adjusted Sea Level values in the test_plot_data_points function do not match the data in the CSV exactly, which results in the minor differences in the float values.

I’ll have to politely disagree on several points.

One, this was done after I finished this project a while back and passed the tests as written. I track Debian bullseye on my machine, and Python just got upgraded; I started seeing differences in floating point numbers right afterward. When I retested my project after this update, I found the floating point errors, realized the cause, and implemented the fix, which is the same thing I would do on any software when the dependencies or environment change.

Two, this only changes how equal the values in these arrays must be. Since the values represent inches, ensuring equality to a hundred-millionth of an inch seems acceptable to me. This is even more reasonable since the data is provided to nine places (my tests pass to nine places as well).

Three, this clearly falls under the “academic honesty” part of the curriculum. There is nothing dishonest about these changes; they are a legitimate way of addressing a problem with a change in floating point numbers. The task set is to analyze the change in sea level, not to adhere to the unit tests without question. Since the boilerplate for the project does not pin versions of pandas or scipy and sets Python at ^3.7, there can be no expectation that the versions of these requirements will be the same when someone completes the project later.

Four, unit tests have bugs all the time. The only code I write that has more bugs than my unit tests is my program code. The list of embarrassingly terrible mistakes I have made in my unit tests is surpassed only by the list of embarrassingly terrible mistakes I have made in my program code. So I fix unit test bugs all the time and I imagine that everyone else does too.

Five, there are other fixes, but they seem rather odious to me. The provided data has nine places of precision, but the expected value arrays do not include the data; they include the floating point representations of the data. Seeing data like

            [1980.0, 5.5984251910000005],
            [1981.0, 6.0866141670000005],
            [1982.0, 5.858267711],

in the expected value array for test_plot_data_points should have been a dead giveaway to use assertAlmostEqual() from the beginning. To fix this problem without using assertAlmostEqual() would require creating the scatter plots with matplotlib and the lines with scipy.stats.linregress() and then massaging the returned values to match the expected floating point values. I don’t know how to do that without knowing either how they were originally represented by the software versions used by the project author and how they are currently represented, or what the values are from the beginning. Issues could also be opened and PRs suggested, but that is a much lengthier process that would not address the immediate need of the user or help them see the root cause of the problem.
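To make the "floating point representation" point concrete, here is a small sketch using the pair of values from the first failure earlier in the thread. I am assuming the shorter value is what the CSV contains; the variable names are made up for the example.

```python
from decimal import Decimal

# Values copied from the failing test_plot_data_points output.
from_csv = 0.220472441             # presumably as written in the CSV
from_test = 0.22047244100000002    # as stored in the expected list

# The two literals are stored as different binary floats.
print(from_csv == from_test)

# Decimal() shows the exact value each float actually holds.
print(Decimal(from_csv))
print(Decimal(from_test))

# Rounding to the nine places the data provides yields the same float.
matches_at_nine_places = round(from_test, 9) == from_csv
print(matches_at_nine_places)
```

This only shows that the two agree at nine places. It does not reproduce the expected values exactly, which is the part I don’t know how to do.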

In summary, I’m sure that this is just a retread of all the issues discussed in the past about testing and verifying the Python projects. Since there is no server to host the projects in a controlled environment with pinned versions of the environment and no black box beyond the user’s control to test the project (like with the Express projects hitting an exposed API on the web), these types of issues will persist. And the next time a dependency breaks testing on one of my projects, I will fix whichever part needs the fix, be it my code, the dependency, or the test.

You are welcome to disagree. That does not change the fact that changing the test suite to pass invalidates your certificate and is dishonest.

If you believe that the test suite requires a change, you can raise a GitHub Issue, but you cannot submit a project with a modified test suite.

I’ve stumbled on the same problem today and have just raised a GitHub issue for the problem:

This might be caused by the new pandas version, 1.2.0.

With the previous version, 1.1.5 (or other earlier versions, I suppose), forced in the pyproject.toml file and the dependencies updated, these discrepancies with the test data don’t occur.
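To confirm which versions are actually installed before and after pinning, a quick check like this can help. It is only a sketch: installed_version is a hypothetical helper name, and it relies on importlib.metadata, which assumes Python 3.8 or newer.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The packages this project depends on.
for name in ("pandas", "scipy", "matplotlib"):
    print(name, installed_version(name))
```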

Thanks for everyone’s help with this issue. I went ahead and updated the official test in the curriculum to use assertAlmostEqual(). Hopefully, this should solve the problems people have been having.

I just had the same issue. Changing pandas back to 1.1.5 in pyproject.toml and poetry.lock and updating allowed me to pass all tests.

Same problem here.
I even changed the pandas version in pyproject.toml and poetry.lock to 1.1.5, but the tests still fail.

Also I don’t understand why the scatterplot fails. Like, why is the data for evaluation different from the data given in the csv? Or am I supposed to calculate something there?

Can someone tell me if I am doing something wrong?
Code is: https://repl.it/@shuizid/sea-level-predictor

I had the same issues, and I was able to pass the tests after forcing pandas to 1.1.5 and Python to 3.6.1 in pyproject.toml and poetry.lock.

I still have the same error. I looked into the tests, but they aren’t changed to assertAlmostEqual(), still at assertEqual() (on repl.it).

I think it’s still (or again) necessary for someone to change this or otherwise solve the problem with the test.

But if you want to pass the test now, I suggest using the solution described in the GitHub issue posted above. Adding float_precision="legacy" to the read_csv call does the trick.

df = pd.read_csv("epa-sea-level.csv", float_precision="legacy")