Help with Sea level-predictor

YH_Lee · February 17, 2021, 10:48am

I don’t know what to do with this error.
This is my code:

import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress

def draw_plot():
    # Read data from file
    df = pd.read_csv("epa-sea-level.csv")

    # Create scatter plot
    plt.scatter(df["Year"], df["CSIRO Adjusted Sea Level"])

    # Create first line of best fit
    slope, intercept, r_value, p_value, std_err = linregress(x=df["Year"], y=df["CSIRO Adjusted Sea Level"])
    year_extended = list(range(1880, 2050, 1))
    line = [intercept + slope * j for j in year_extended]
    plt.plot(year_extended, line, linewidth=2, color="r")

    # Create second line of best fit
    mod_df = df.loc[df["Year"] >= 2000]
    slope2, intercept2, r_value2, p_value2, std_err2 = linregress(x=mod_df["Year"], y=mod_df["CSIRO Adjusted Sea Level"])
    year2 = list(range(2000, 2050, 1))
    line2 = [intercept2 + slope2 * j for j in year2]
    plt.plot(year2, line2, linewidth=3, color="k")

    # Add labels and title
    plt.xlabel("Year")
    plt.ylabel("Sea Level (inches)")
    plt.title("Rise in Sea Level")
    
    # Save plot and return data for testing (DO NOT MODIFY)
    plt.savefig('sea_level_plot.png')
    return plt.gca()

This is the error

 python main.py
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-suvt1xh9 because the default path (/config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
F.F.

FAIL: test_plot_data_points (test_module.LinePlotTestCase)

Traceback (most recent call last):
File "/home/runner/boilerplate-sea-level-predictor/test_module.py", line 30, in test_plot_data_points
self.assertEqual(actual, expected, "Expected different data points in scatter plot.")
AssertionError: Lists differ: [[188[26 chars]72441], [1882.0, -0.440944881], [1883.0, -0.23[2982 chars]951]] != [[188[26 chars]7244100000002], [1882.0, -0.440944881], [1883.[3226 chars]951]]

First differing element 1:
[1881.0, 0.220472441]
[1881.0, 0.22047244100000002]

Diff is 6114 characters long. Set self.maxDiff to None to see it. : Expected different data points in scatter plot.

======================================================================
FAIL: test_plot_lines (test_module.LinePlotTestCase)

Traceback (most recent call last):
File "/home/runner/boilerplate-sea-level-predictor/test_module.py", line 37, in test_plot_lines
self.assertEqual(actual, expected, "Expected different line for second line of best fit.")
AssertionError: Lists differ: [7.06[42 chars]04435186, 7.560361677767105, 7.726788951098968[873 chars]3011] != [7.06[42 chars]04435242, 7.560361677767105, 7.726788951098968[873 chars]3011]

First differing element 2:
7.393934404435186
7.393934404435242

Diff is 1253 characters long. Set self.maxDiff to None to see it. : Expected different line for second line of best fit.

sanity · February 17, 2021, 2:45pm

This is caused by newer pandas versions changing the precision with which read_csv parses floating-point numbers. There are two ways to work around it in your own code. One is to add the float_precision='legacy' keyword argument to the pd.read_csv call. The other is to pin pandas to version 1.1.5 in the pyproject.toml file and update the dependencies.
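A minimal sketch of the first option, for anyone who wants to try it in isolation. The two-row CSV string and the read_levels helper are made up for illustration; in the project you would pass "epa-sea-level.csv" straight to pd.read_csv. Whether the two printed values differ depends on the installed pandas version.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for epa-sea-level.csv.
SAMPLE_CSV = "Year,CSIRO Adjusted Sea Level\n1880,0.0\n1881,0.220472441\n"


def read_levels(source, legacy=True):
    """Read the CSV, optionally with the pre-1.2.0 float parser."""
    # float_precision='legacy' selects the older float parser of the C engine.
    kwargs = {"float_precision": "legacy"} if legacy else {}
    return pd.read_csv(source, **kwargs)


df_legacy = read_levels(io.StringIO(SAMPLE_CSV))
df_default = read_levels(io.StringIO(SAMPLE_CSV), legacy=False)

# Print both parsed values with full precision so any difference is visible.
print(repr(df_legacy["CSIRO Adjusted Sea Level"][1]))
print(repr(df_default["CSIRO Adjusted Sea Level"][1]))
```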

YH_Lee · February 17, 2021, 4:16pm

@sanity Thank you, it worked. I used float_precision='legacy', although I still don’t get why it worked.

sanity · February 17, 2021, 5:12pm

The expected results in the tests were written some time ago, when pandas used a different (less precise) default representation for floats read with read_csv. pandas 1.2.0 changed that default, but the optional float_precision='legacy' argument lets you use the same precision as in older versions.
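To see why such a tiny difference fails the test, here is a small sketch using the two values from the assertion output above. assertEqual compares floats exactly, so two values that differ only in the last digits count as different, while a tolerance-based comparison accepts them. The variable names and the tolerance are chosen for illustration only.

```python
import math

# Values copied from the "First differing element" lines in the test output.
parsed_default = 0.220472441           # what newer pandas produced
parsed_legacy = 0.22047244100000002    # what the test expected

# Exact comparison, which is what assertEqual does for each list element.
exactly_equal = parsed_default == parsed_legacy

# Tolerance-based comparison; rel_tol here is an arbitrary illustrative choice.
nearly_equal = math.isclose(parsed_default, parsed_legacy, rel_tol=1e-9)

print(exactly_equal, nearly_equal)
```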

synerjay · May 6, 2021, 1:26pm

Hello. I’m also having the same problem. Only one assertion fails for me. I already added float_precision='legacy' to read_csv, but it still fails. What could be the problem?

synerjay · May 6, 2021, 1:27pm

I’m still getting an error after adding float_precision='legacy' to read_csv. Is this still an ongoing problem?

Traceback (most recent call last):
  File "/home/runner/boilerplate-sea-level-predictor/test_module.py", line 35, in test_plot_lines
    np.testing.assert_almost_equal(actual, expected, 7, "Expected different line for first line of best fit.")
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 581, in assert_almost_equal
    return assert_array_almost_equal(actual, desired, decimal, err_msg)
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 1044, in assert_array_almost_equal
    assert_array_compare(compare, x, y, err_msg=err_msg, verbose=verbose,
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 761, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 7 decimals
Expected different line for first line of best fit.
(shapes (170,), (171,) mismatch)

nvabl · May 26, 2021, 7:46am

Just change the stop parameter of the range function from 2050 to 2051, and it will work.

Mine is like this:

# First line of best fit
x_fit1 = pd.Series(range(df['Year'].min(), 2051))
y_fit1 = intercept1 + slope1 * x_fit1

# Second line of best fit
x_fit2 = pd.Series(range(2000, 2051))
y_fit2 = intercept2 + slope2 * x_fit2

The underlying problem is that test_module.py includes the x and y data for the year 2050 in its expected result, whereas range treats its stop value as "up to but excluding".
So with a stop of 2050, the expected result always has one more element than the actual one.
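A quick check of the off-by-one, using the same start year as the original code. The lengths match the shapes (170,) and (171,) reported in the traceback; the variable names are just for illustration.

```python
start, last_year = 1880, 2050

# range excludes its stop value, so this ends at 2049.
years_exclusive = list(range(start, last_year))

# Adding 1 to the stop value includes 2050 itself.
years_inclusive = list(range(start, last_year + 1))

print(len(years_exclusive), years_exclusive[-1])
print(len(years_inclusive), years_inclusive[-1])
```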
