Help with Sea Level Predictor

I don't know what to do with this error.
This is my code:

import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress

def draw_plot():
    # Read data from file
    df = pd.read_csv("epa-sea-level.csv")

    # Create scatter plot
    plt.scatter(df["Year"], df["CSIRO Adjusted Sea Level"])

    # Create first line of best fit
    slope, intercept, r_value, p_value, std_err = linregress(x=df["Year"], y=df["CSIRO Adjusted Sea Level"])
    year_extended = list(range(1880, 2050, 1))
    line = [intercept + slope * j for j in year_extended]
    plt.plot(year_extended, line, linewidth=2, color="r")

    # Create second line of best fit
    mod_df = df.loc[df["Year"] >= 2000]
    slope2, intercept2, r_value2, p_value2, std_err2 = linregress(x=mod_df["Year"], y=mod_df["CSIRO Adjusted Sea Level"])
    year2 = list(range(2000, 2050, 1))
    line2 = [intercept2 + slope2 * j for j in year2]
    plt.plot(year2, line2, linewidth=3, color="k")

    # Add labels and title
    plt.xlabel("Year")
    plt.ylabel("Sea Level (inches)")
    plt.title("Rise in Sea Level")
    
    # Save plot and return data for testing (DO NOT MODIFY)
    plt.savefig('sea_level_plot.png')
    return plt.gca()

This is the error

 python main.py
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-suvt1xh9 because the default path (/config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
F.F.

FAIL: test_plot_data_points (test_module.LinePlotTestCase)

Traceback (most recent call last):
  File "/home/runner/boilerplate-sea-level-predictor/test_module.py", line 30, in test_plot_data_points
    self.assertEqual(actual, expected, "Expected different data points in scatter plot.")
AssertionError: Lists differ: [[188[26 chars]72441], [1882.0, -0.440944881], [1883.0, -0.23[2982 chars]951]] != [[188[26 chars]7244100000002], [1882.0, -0.440944881], [1883.[3226 chars]951]]

First differing element 1:
[1881.0, 0.220472441]
[1881.0, 0.22047244100000002]

Diff is 6114 characters long. Set self.maxDiff to None to see it. : Expected different data points in scatter plot.

======================================================================
FAIL: test_plot_lines (test_module.LinePlotTestCase)

Traceback (most recent call last):
  File "/home/runner/boilerplate-sea-level-predictor/test_module.py", line 37, in test_plot_lines
    self.assertEqual(actual, expected, "Expected different line for second line of best fit.")
AssertionError: Lists differ: [7.06[42 chars]04435186, 7.560361677767105, 7.726788951098968[873 chars]3011] != [7.06[42 chars]04435242, 7.560361677767105, 7.726788951098968[873 chars]3011]

First differing element 2:
7.393934404435186
7.393934404435242

Diff is 1253 characters long. Set self.maxDiff to None to see it. : Expected different line for second line of best fit.

This is caused by newer pandas versions changing the precision with which float numbers are parsed. There are two ways to work around it in your own code. One is to add the float_precision='legacy' keyword argument to the pd.read_csv call. The other is to pin pandas to version 1.1.5 in the pyproject.toml file and update the dependencies.

@sanity Thank you, it worked. I used float_precision='legacy', although I still don't get why it worked.

The expected results in the tests were written some time ago. At that time, pandas used a different (less precise) default representation of float numbers when reading with the read_csv method. pandas 1.2.0 changed that default, but the optional float_precision='legacy' argument lets you use the same precision as in older versions.
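
If you want to see the effect yourself, here is a minimal sketch. It assumes pandas 1.2.0 or newer, and the two-row CSV text is a made-up sample in the same layout as epa-sea-level.csv, not the real file. It parses the same text with the default parser and with float_precision='legacy' and prints both results. The two values may differ only in the last few digits, which is enough to fail an exact assertEqual comparison.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the same layout as epa-sea-level.csv
csv_text = "Year,CSIRO Adjusted Sea Level\n1880,0.0\n1881,0.220472441\n"

# Default parser (the default changed in pandas 1.2.0)
default_df = pd.read_csv(io.StringIO(csv_text))

# Parser that mimics the behaviour of older pandas versions
legacy_df = pd.read_csv(io.StringIO(csv_text), float_precision="legacy")

default_value = default_df["CSIRO Adjusted Sea Level"].iloc[1]
legacy_value = legacy_df["CSIRO Adjusted Sea Level"].iloc[1]

# repr() shows every digit, so a last-digit difference becomes visible
print(repr(default_value), repr(legacy_value))
print("absolute difference:", abs(default_value - legacy_value))
```

Any difference is far below anything that matters for the plot; it only matters because the test compares the lists for exact equality.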

Hello. I'm having the same problem. Only one assertion test fails. I already added float_precision='legacy' to read_csv, but it still fails. What could be the problem?

I'm still getting the same error after adding float_precision='legacy' to read_csv. Is this still an ongoing problem?

Traceback (most recent call last):
  File "/home/runner/boilerplate-sea-level-predictor/test_module.py", line 35, in test_plot_lines
    np.testing.assert_almost_equal(actual, expected, 7, "Expected different line for first line of best fit.")
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 581, in assert_almost_equal
    return assert_array_almost_equal(actual, desired, decimal, err_msg)
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 1044, in assert_array_almost_equal
    assert_array_compare(compare, x, y, err_msg=err_msg, verbose=verbose,
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 761, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 7 decimals
Expected different line for first line of best fit.
(shapes (170,), (171,) mismatch)

Just change the stop parameter of the range function from 2050 to 2051, and it will work.

Mine is like this:

# First line of best fit
x_fit1 = pd.Series(range(df['Year'].min(), 2051))
y_fit1 = intercept1 + slope1 * x_fit1

# Second line of best fit
x_fit2 = pd.Series(range(2000, 2051))
y_fit2 = intercept2 + slope2 * x_fit2

The basic problem is that test_module.py includes the x and y data for the year 2050, whereas range() runs up to but excludes its stop value, so range(1880, 2050) ends at 2049.
As a result, the expected result is always one element longer than the actual one when the stop of range is 2050.
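
A quick way to check the off-by-one is to count the years directly. This is only a small sketch; the variable names are made up for illustration, and the start year 1880 is taken from the code in the first post.

```python
first_year, last_year = 1880, 2050

# range() excludes its stop value, so this list ends at 2049
years_short = list(range(first_year, last_year))

# Adding 1 to the stop value includes 2050 itself
years_full = list(range(first_year, last_year + 1))

print(len(years_short), years_short[-1])  # 170 2049
print(len(years_full), years_full[-1])    # 171 2050
```

The two lengths match the "(shapes (170,), (171,) mismatch)" message in the traceback above.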