Sea Level Predictor Failed 2 Test Cases

I have used the following code for the problem.

import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress

def draw_plot():
    # Read data from file
    df = pd.read_csv('epa-sea-level.csv')
    # df = df.dropna()
    # Create scatter plot
    fig, ax = plt.subplots(figsize = (10,10))
    x = df["Year"]
    y = df["CSIRO Adjusted Sea Level"]
    plt.scatter(x, y)
    plt.savefig('scatter_plot.png')

    # Create first line of best fit
    slope, int1, r, p, std = linregress(x, y)

    pred_X = list(range(1880, 2050))
    pred_Y = []

    for i in pred_X:
      pred_Y.append(slope * i + int1)
    
    plt.plot(pred_X, pred_Y, "red")

    # Create second line of best fit
    x = df.loc[df["Year"] >= 2000]['Year']
    y = df.loc[df["Year"] >= 2000]["CSIRO Adjusted Sea Level"]

    slope2, int2, r2, p2, std2 = linregress(x, y)
    pred_X2 = list(range(2000, 2050))
    pred_Y2 = []
    for i in pred_X2:
      pred_Y2.append(slope2 * i + int2)
    
    plt.plot(pred_X2, pred_Y2, "green")

    # Add labels and title
    ax.set_xlabel("Year")
    ax.set_ylabel("Sea Level (inches)")
    ax.set_title("Rise in Sea Level")
    
    # Save plot and return data for testing (DO NOT MODIFY)
    plt.savefig('sea_level_plot.png')
    return plt.gca()

I am getting the following errors, and I do not understand how to solve them.

F.F.
======================================================================
FAIL: test_plot_data_points (test_module.LinePlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/boilerplate-sea-level-predictor/test_module.py", line 30, in test_plot_data_points
    self.assertEqual(actual, expected, "Expected different data points in scatter plot.")
AssertionError: Lists differ: [[188[26 chars]72441], [1882.0, -0.440944881], [1883.0, -0.23[2982 chars]951]] != [[188[26 chars]7244100000002], [1882.0, -0.440944881], [1883.[3226 chars]951]]

First differing element 1:
[1881.0, 0.220472441]
[1881.0, 0.22047244100000002]

Diff is 6114 characters long. Set self.maxDiff to None to see it. : Expected different data points in scatter plot.

======================================================================
FAIL: test_plot_lines (test_module.LinePlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/boilerplate-sea-level-predictor/test_module.py", line 37, in test_plot_lines
    self.assertEqual(actual, expected, "Expected different line for second line of best fit.")
AssertionError: Lists differ: [7.06[42 chars]04435186, 7.560361677767105, 7.726788951098968[873 chars]3011] != [7.06[42 chars]04435242, 7.560361677767105, 7.726788951098968[873 chars]3011]

First differing element 2:
7.393934404435186
7.393934404435242

Diff is 1253 characters long. Set self.maxDiff to None to see it. : Expected different line for second line of best fit.

----------------------------------------------------------------------
Ran 4 tests in 4.294s

FAILED (failures=2)

Please help.

The issue you are seeing is caused by changes to the read_csv function in newer versions of pandas.

There are currently two ways to deal with this:

  • add the float_precision='legacy' parameter to the pd.read_csv call,
  • pin pandas to version 1.1.5 for the project; on repl.it that requires editing the pyproject.toml file.
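
As a rough sketch of the first option, the snippet below parses a tiny in-memory CSV twice, once with float_precision='legacy' and once with the default parser. The two-row sample and the read_levels helper are made up for illustration (the values are taken from the test output above); in the project you would simply pass the parameter to the existing pd.read_csv('epa-sea-level.csv') call. Whether the two results differ in the last digits depends on the installed pandas version, so no output is shown.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for epa-sea-level.csv
# (values copied from the test output quoted in this thread).
CSV_TEXT = (
    "Year,CSIRO Adjusted Sea Level\n"
    "1881,0.220472441\n"
    "1882,-0.440944881\n"
)


def read_levels(precision=None):
    """Parse the sample CSV with the given float_precision setting."""
    return pd.read_csv(io.StringIO(CSV_TEXT), float_precision=precision)


legacy = read_levels("legacy")  # the older parser that the tests were written against
default = read_levels()         # whatever the installed pandas uses by default

# Print the first value at full precision under both settings.
print(repr(float(legacy["CSIRO Adjusted Sea Level"].iloc[0])))
print(repr(float(default["CSIRO Adjusted Sea Level"].iloc[0])))
```

If the two printed values differ in the last digits on your installation, that is the same discrepancy the failing test reports.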

Yes!!!
Thank you very much. I was stuck because of this for a while.

Hi, I have the following code:

def draw_plot():
    # Read data from file
    df = pd.read_csv('epa-sea-level.csv', float_precision='legacy', dtype='float64')

    # Create scatter plot
    plt.figure(figsize=(14,6))
    plt.scatter(x=df['Year'], y=df['CSIRO Adjusted Sea Level'])
    # Create first line of best fit
    x=df['Year']
    y=df['CSIRO Adjusted Sea Level']
    res = linregress(x, y)
    x_det = list(range(1880,2050))
    y_det = list()
    for year in x_det:
      y_det.append(year*res.slope + res.intercept)
    plt.plot(x_det, y_det, 'r')

    # Create second line of best fit
    y_from_2000 = df[df['Year'] >= 2000]['CSIRO Adjusted Sea Level']
    x_from_2000 = df[df['Year'] >= 2000]['Year']

    res_2000 = linregress(x_from_2000, y_from_2000)

    x_2000 = list(range(2000, 2050))
    y_2000 = list()
    for each in x_2000:
      y_2000.append((each*res_2000.slope + res_2000.intercept))
    plt.plot(x_2000, y_2000, 'g')
    
    # Add labels and title
    plt.xlabel('Year')
    plt.ylabel('Sea Level (inches)')
    plt.title('Rise in Sea Level')
    
    # Save plot and return data for testing (DO NOT MODIFY)
    plt.savefig('sea_level_plot.png')
    return plt.gca()

And I keep getting this error:

AssertionError: Lists differ: [[188[26 chars]7244100000002], [1882.0, -0.440944881], [1883.[3226 chars]951]] != [[188[26 chars]72441], [1882.0, -0.440944881], [1883.0, -0.23[3218 chars]951]]

First differing element 1:
[1881.0, 0.22047244100000002]
[1881.0, 0.220472441]

Diff is 3769 characters long. Set self.maxDiff to None to see it. : Expected different data points in scatter plot.

Any help will be appreciated. I have already changed the pandas version to 1.1.5 and added float_precision='legacy' to the read_csv call.

I’ve copied your code and it runs for me without any errors.

Error output:

First differing element 1:
[1881.0, 0.22047244100000002]
[1881.0, 0.220472441]

This suggests that the value expected by the test is 0.220472441, which is not how it should be. If you have modified the test code, revert those changes or fork a fresh copy of the boilerplate code.
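
To see why such a tiny difference is enough to fail the test, here is a small sketch using only the two values from the error output. assertEqual compares the floats exactly, so numbers that differ only in the last bits still count as unequal. The variable names are my own, and the comments about which value belongs to which version of the test are inferred from the tracebacks in this thread.

```python
import math

# Both values are copied from the error output above.
legacy_parsed = 0.22047244100000002  # apparently what the unmodified test expects
exactly_rounded = 0.220472441        # what the test in the failing run expected

# Exact comparison, which is what assertEqual does with floats.
print(legacy_parsed == exactly_rounded)  # False

# A tolerance-based comparison treats them as the same number.
print(math.isclose(legacy_parsed, exactly_rounded, rel_tol=1e-9))  # True
```

So with float_precision='legacy' in place, the data only matches a test file that still holds the original expected values.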

Thank you for your help thus far. How do I go about forking the fresh copy from the boilerplate code?

Just go to the challenge page and use the link to the code on repl.it again.

Oh wow! Worked like magic! Thank you so much! That was really giving me a headache.

Hi. I've added the legacy float argument as you suggested, but it still fails, so there is obviously something more going on than float precision. This is my code. Could you take a look at what I’m doing wrong? The error message is below my code.

import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress
import numpy as np

def draw_plot():
    # Read data from file
    df = pd.read_csv("epa-sea-level.csv", float_precision='legacy')

    # Create scatter plot
    plt.scatter(df['Year'], df['CSIRO Adjusted Sea Level'])

    # Create first line of best fit
    lineA = linregress(df['Year'], df['CSIRO Adjusted Sea Level'])
    xA = np.arange(df['Year'].min(),2050,1)
    yA = xA*lineA.slope + lineA.intercept

    plt.plot(xA,yA)

    # Create second line of best fit
    df_2000 = df[df['Year'] >= 2000]

    lineB = linregress(df_2000['Year'], df_2000['CSIRO Adjusted Sea Level'])
    xB = np.arange(2000,2050,1)
    yB = xB*lineB.slope + lineB.intercept

    plt.plot(xB,yB)

    # Add labels and title
    plt.xlabel('Year')
    plt.ylabel('Sea Level (inches)')
    plt.title('Rise in Sea Level')
    
    # Save plot and return data for testing (DO NOT MODIFY)
    plt.savefig('sea_level_plot.png')
    return plt.gca()

It’s the same result: one failed test, as before. I tried copying code from people who passed successfully, but I’m getting the same error message…

Traceback (most recent call last):
  File "/home/runner/boilerplate-sea-level-predictor/test_module.py", line 35, in test_plot_lines
    np.testing.assert_almost_equal(actual, expected, 7, "Expected different line for first line of best fit.")
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 581, in assert_almost_equal
    return assert_array_almost_equal(actual, desired, decimal, err_msg)
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 1044, in assert_array_almost_equal
    assert_array_compare(compare, x, y, err_msg=err_msg, verbose=verbose,
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 761, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 7 decimals
Expected different line for first line of best fit.
(shapes (170,), (171,) mismatch)
 x: array([-0.542124 , -0.4790794, -0.4160349, -0.3529903, -0.2899457,
       -0.2269011, -0.1638565, -0.1008119, -0.0377674,  0.0252772,
        0.0883218,  0.1513664,  0.214411 ,  0.2774556,  0.3405002,...
 y: array([-0.542124 , -0.4790794, -0.4160349, -0.3529903, -0.2899457,
       -0.2269011, -0.1638565, -0.1008119, -0.0377674,  0.0252772,
        0.0883218,  0.1513664,  0.214411 ,  0.2774556,  0.3405002,...

----------------------------------------------------------------------
Ran 4 tests in 1.272s

FAILED (failures=1)
 

What am I doing wrong here?
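
The line "(shapes (170,), (171,) mismatch)" in the traceback looks like the clue: the array taken from the plot has 170 points, while the array the test builds has 171. np.arange, like Python's range, excludes its stop value, so a stop of 2050 ends at 2049. Assuming the data starts in 1880 (as in the first post) and the test expects the line to run through 2050, the counts work out as in this sketch. Plain range is used so the snippet has no dependencies, and the variable names are only for illustration.

```python
first_year = 1880  # assumed minimum of the "Year" column

# Half-open range as in the posted code: the last year is 2049.
years_posted = list(range(first_year, 2050))

# Assumed fix: move the stop to 2051 so that 2050 itself is included.
years_through_2050 = list(range(first_year, 2051))

print(len(years_posted), years_posted[-1])              # 170 2049
print(len(years_through_2050), years_through_2050[-1])  # 171 2050
```

In the posted code that would mean np.arange(df['Year'].min(), 2051, 1) for the first line, and presumably np.arange(2000, 2051, 1) for the second. I have not run this against the tests.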

Thanks for this fix, this issue was driving me crazy!