Sea Level Predictor Failed 2 Test Cases

I have used the following code for the problem.

import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress

def draw_plot():
    # Read data from file
    df = pd.read_csv('epa-sea-level.csv')
    # df = df.dropna()
    # Create scatter plot
    fig, ax = plt.subplots(figsize = (10,10))
    x = df["Year"]
    y = df["CSIRO Adjusted Sea Level"]
    plt.scatter(x, y)
    plt.savefig('scatter_plot.png')

    # Create first line of best fit
    slope, int1, r, p, std = linregress(x, y)

    pred_X = list(range(1880, 2050))
    pred_Y = []

    for i in pred_X:
      pred_Y.append(slope * i + int1)
    
    plt.plot(pred_X, pred_Y, "red")

    # Create second line of best fit
    x = df.loc[df["Year"] >= 2000]['Year']
    y = df.loc[df["Year"] >= 2000]["CSIRO Adjusted Sea Level"]

    slope2, int2, r2, p2, std2 = linregress(x, y)
    pred_X2 = list(range(2000, 2050))
    pred_Y2 = []
    for i in pred_X2:
      pred_Y2.append(slope2 * i + int2)
    
    plt.plot(pred_X2, pred_Y2, "green")

    # Add labels and title
    ax.set_xlabel("Year")
    ax.set_ylabel("Sea Level (inches)")
    ax.set_title("Rise in Sea Level")
    
    # Save plot and return data for testing (DO NOT MODIFY)
    plt.savefig('sea_level_plot.png')
    return plt.gca()

I am getting the following error, and I do not understand how to fix it.

F.F.
======================================================================
FAIL: test_plot_data_points (test_module.LinePlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/boilerplate-sea-level-predictor/test_module.py", line 30, in test_plot_data_points
    self.assertEqual(actual, expected, "Expected different data points in scatter plot.")
AssertionError: Lists differ: [[188[26 chars]72441], [1882.0, -0.440944881], [1883.0, -0.23[2982 chars]951]] != [[188[26 chars]7244100000002], [1882.0, -0.440944881], [1883.[3226 chars]951]]

First differing element 1:
[1881.0, 0.220472441]
[1881.0, 0.22047244100000002]

Diff is 6114 characters long. Set self.maxDiff to None to see it. : Expected different data points in scatter plot.

======================================================================
FAIL: test_plot_lines (test_module.LinePlotTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/boilerplate-sea-level-predictor/test_module.py", line 37, in test_plot_lines
    self.assertEqual(actual, expected, "Expected different line for second line of best fit.")
AssertionError: Lists differ: [7.06[42 chars]04435186, 7.560361677767105, 7.726788951098968[873 chars]3011] != [7.06[42 chars]04435242, 7.560361677767105, 7.726788951098968[873 chars]3011]

First differing element 2:
7.393934404435186
7.393934404435242

Diff is 1253 characters long. Set self.maxDiff to None to see it. : Expected different line for second line of best fit.

----------------------------------------------------------------------
Ran 4 tests in 4.294s

FAILED (failures=2)

Please help.

The issue you are seeing is caused by a change in how the read_csv method parses floating-point values in newer versions of pandas.

Currently there are two ways to deal with this:

  • add the float_precision='legacy' parameter to the pd.read_csv call, or
  • pin pandas to version 1.1.5 for the project; on repl.it that requires editing the pyproject.toml file.
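
As a rough illustration of the first option, here is a minimal sketch. It assumes a pandas version that accepts float_precision='legacy'; the read_sea_level helper and the two-row inline sample are made up for the example (the rows are copied from the test output above, the real file has many more).

```python
import io

import pandas as pd

# Two rows copied from the test output above; the real file has many more.
SAMPLE_CSV = (
    "Year,CSIRO Adjusted Sea Level\n"
    "1881,0.220472441\n"
    "1882,-0.440944881\n"
)


def read_sea_level(source, legacy=True):
    """Read the CSV, optionally with the legacy float parser (hypothetical helper)."""
    kwargs = {"float_precision": "legacy"} if legacy else {}
    return pd.read_csv(source, **kwargs)


df_default = read_sea_level(io.StringIO(SAMPLE_CSV), legacy=False)
df_legacy = read_sea_level(io.StringIO(SAMPLE_CSV), legacy=True)

# Print the exact parsed values; whether they differ depends on the pandas version.
col = "CSIRO Adjusted Sea Level"
print(repr(float(df_default[col][0])))
print(repr(float(df_legacy[col][0])))
```

In the project itself, the only change would be passing float_precision='legacy' to the existing pd.read_csv('epa-sea-level.csv') call.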

Yes!!!
Thank you very much. I was stuck because of this for a while.

Hi, I have the following code:

import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress

def draw_plot():
    # Read data from file
    df = pd.read_csv('epa-sea-level.csv', float_precision='legacy', dtype='float64')

    # Create scatter plot
    plt.figure(figsize=(14,6))
    plt.scatter(x=df['Year'], y=df['CSIRO Adjusted Sea Level'])
    # Create first line of best fit
    x=df['Year']
    y=df['CSIRO Adjusted Sea Level']
    res = linregress(x, y)
    x_det = list(range(1880,2050))
    y_det = list()
    for year in x_det:
      y_det.append(year*res.slope + res.intercept)
    plt.plot(x_det, y_det, 'r')

    # Create second line of best fit
    y_from_2000 = df[df['Year'] >= 2000]['CSIRO Adjusted Sea Level']
    x_from_2000 = df[df['Year'] >= 2000]['Year']

    res_2000 = linregress(x_from_2000, y_from_2000)

    x_2000 = list(range(2000, 2050))
    y_2000 = list()
    for each in x_2000:
      y_2000.append((each*res_2000.slope + res_2000.intercept))
    plt.plot(x_2000, y_2000, 'g')
    
    # Add labels and title
    plt.xlabel('Year')
    plt.ylabel('Sea Level (inches)')
    plt.title('Rise in Sea Level')
    
    # Save plot and return data for testing (DO NOT MODIFY)
    plt.savefig('sea_level_plot.png')
    return plt.gca()

And I keep getting this error

AssertionError: Lists differ: [[188[26 chars]7244100000002], [1882.0, -0.440944881], [1883.[3226 chars]951]] != [[188[26 chars]72441], [1882.0, -0.440944881], [1883.0, -0.23[3218 chars]951]]

First differing element 1:
[1881.0, 0.22047244100000002]
[1881.0, 0.220472441]

Diff is 3769 characters long. Set self.maxDiff to None to see it. : Expected different data points in scatter plot.

Any help will be appreciated. I have already changed the pandas version to 1.1.5 and added float_precision='legacy' to the read_csv call.

I’ve copied your code and it runs for me without any errors.

Error output:

First differing element 1:
[1881.0, 0.22047244100000002]
[1881.0, 0.220472441]

This suggests that the value expected by the test is 0.220472441, which is not how it should be. If you have modified the test code, revert those changes or fork a fresh copy of the boilerplate code.
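
To show how I read that output: the test calls assertEqual(actual, expected), so the first value printed comes from your plot and the second is what the test expects, and the lists are compared for exact equality. Below is a small sketch of that comparison using the values from your error. The first_difference helper is only for illustration and is not part of the test module.

```python
import math


def first_difference(actual, expected):
    """Return (index, actual_value, expected_value) of the first mismatch, or None."""
    for index, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            return index, a, e
    return None


# Values copied from the error output above (the real lists are much longer,
# so the index reported here differs from the one in the test output).
actual = [[1881.0, 0.22047244100000002], [1882.0, -0.440944881]]
expected = [[1881.0, 0.220472441], [1882.0, -0.440944881]]

print(first_difference(actual, expected))

# The two floats are close but not equal, so an exact comparison fails.
a, e = actual[0][1], expected[0][1]
print(a == e, math.isclose(a, e))
```

The two numbers differ only in the last digits, which is why the expected data in the test has to match the way the CSV is parsed.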

Thank you for your help thus far. How do I go about forking the fresh copy from the boilerplate code?

Just go to the challenge page and use the link to the code on repl.it again.

Oh wow! Worked like magic! Thank you so much! That was really giving me some headache.