Sea Level Predictor - Scaling through 2050 w/o Append

Tell us what’s happening:

What is the simplest way to add values to a DataFrame so I can predict the sea level rise in 2050?
I started by filling a list and appending it to the DataFrame, but the `append` method has been removed from DataFrames. I looked at some past projects, and it seems this is the route most people took.
My next approach was to concatenate the list I had created to the DataFrame.
Unfortunately, concat only takes one positional argument. I converted my list to a Series, and finally I was getting somewhere: I could add all the rows up through 2050. Unfortunately, this creates a 6th column for the years instead of filling them into the original ‘Year’ column.
I am stuck and feel there should be a simpler answer to all this.
I know there is newer information about using loc to add new columns, but I read that it is horribly inefficient since it essentially just creates a copy of the df.
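For reference, a minimal sketch of what `pd.concat` expects, assuming pandas 2.x (where `DataFrame.append` no longer exists): its single positional argument is a list of objects to join, and new values only land in the existing ‘Year’ column if the added object is a DataFrame with a column of that exact name. A bare Series is treated as a column of its own, which is likely where the extra 6th column came from. The toy values are made up and are not the EPA data; as the first reply explains, you probably don't need to extend the DataFrame at all for this project.

```python
import pandas as pd

# Made-up toy frame standing in for the EPA data (not real values).
df = pd.DataFrame({
    "Year": [1880, 1881, 1882],
    "CSIRO Adjusted Sea Level": [0.0, 0.2, 0.4],
})

new_years = list(range(1883, 2051))  # the extra years, 2051 excluded

# pd.concat takes ONE positional argument: a list of objects to join.
# Wrapping the years in a DataFrame with a "Year" column lines them up
# with the existing column instead of creating a new one.
extra = pd.DataFrame({"Year": new_years})
extended = pd.concat([df, extra], ignore_index=True)

# The sea-level column is NaN for every added row.
print(extended.shape)          # (171, 2)
print(list(extended.columns))  # ['Year', 'CSIRO Adjusted Sea Level']
```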

Your code so far

[https://www.freecodecamp.org/learn/data-analysis-with-python/data-analysis-with-python-projects/sea-level-predictor](https://www.freecodecamp.org/learn/data-analysis-with-python/data-analysis-with-python-projects/sea-level-predictor)

You don’t need to add the new list of years to the DataFrame. Just use that new list or Series as the x-axis argument of the regression line plot (and in the intercept + slope * x expression where you reference the years).

You aren’t really adding new data, so leave the DataFrame alone; you are just extending the fitted line further along the plot.

Ah, it makes sense not to majorly alter the DataFrame.
How can you make a list compatible with the DataFrame in the equation?
Currently I am getting a ValueError:

```
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 171 and the array at index 1 has size 134
```

It makes sense to me to make a list of all the years in the DataFrame, extended through 2050. This gives me ‘x’.
However, if I try to use this in either of my lines, the sizes do not match.
For example:

```python
# In my linregress call, lyears and df['CSIRO Adjusted Sea Level'] have different sizes
res = linregress(lyears, df['CSIRO Adjusted Sea Level'])

# In my fitted-line plot, I can replace df['Year'], but this still results in a ValueError
plt.plot(df['Year'], (res.intercept + res.slope*df['Year']), color='r', label='fitted line')
```
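To see where the ValueError comes from: `linregress` pairs each x with exactly one y, so both inputs must have the same length. A small sketch with made-up toy values (not the EPA data); the exact error message may differ between SciPy versions, so the sketch only checks that a ValueError is raised.

```python
import numpy as np
from scipy.stats import linregress

# Toy stand-ins: 134 observed years and one made-up y value per year.
years = np.arange(1880, 2014)           # 134 values
levels = 0.06 * (years - 1880)          # toy sea levels, same length as years
extended_years = np.arange(1880, 2051)  # 171 values, through 2050

print(len(years), len(levels), len(extended_years))  # 134 134 171

# Mismatched lengths: there is no y value for the future years.
try:
    linregress(extended_years, levels)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Matched lengths work fine.
res = linregress(years, levels)
```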

I’ve edited your code for readability. When you enter a code block into a forum post, please precede it with a separate line of three backticks and follow it with a separate line of three backticks to make it easier to read.

You can also use the “preformatted text” tool in the editor (</>) to add backticks around text.

See this post to find the backtick on your keyboard.
Note: Backticks (`) are not single quotes (').

The `linregress` line is calculating the regression. Only use the original DataFrame years there, not the extended ones: you only want to calculate based on data you have. You don’t have data points for the future years, so including them would mess up the calculation.

In the `plt.plot` line you are just plotting a line, so you can swap out the DataFrame ‘Year’ for your extended years. I used np.arange() to generate the new years and stored them in a variable; it’s a NumPy array.
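A minimal sketch of that split, assuming the CSV has already been read into `df` with the ‘Year’ and ‘CSIRO Adjusted Sea Level’ columns; the toy values here are made up so the block runs on its own. The fit uses only the observed rows, and `np.arange(..., 2051)` supplies the x values for the line (the stop value is excluded, so 2051 is needed to reach 2050). The name `years_extended` is just an illustrative choice.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

# Made-up toy data lying exactly on a line (not the EPA measurements).
df = pd.DataFrame({"Year": np.arange(1880, 2014)})
df["CSIRO Adjusted Sea Level"] = 0.06 * (df["Year"] - 1880)

# 1) Fit on the data you actually have: same-length x and y.
res = linregress(df["Year"], df["CSIRO Adjusted Sea Level"])

# 2) Extended x values, kept outside the DataFrame.
years_extended = np.arange(df["Year"].min(), 2051)

# 3) y = m*x + b, using the SAME x array for every point on the line.
line_y = res.intercept + res.slope * years_extended

print(len(years_extended), len(line_y))  # 171 171
print(round(line_y[-1], 2))              # value at 2050: 10.2
```

The DataFrame still has its original 134 rows; only the separate array reaches 2050.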

I tested it, and the regular Python range() works as well; it doesn’t need to be an array.
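A quick check using only the year bounds from this thread: `range()` and `np.arange()` both stop before their second argument, and both produce the same 171 years. One caveat, hedged: a range on its own doesn’t support element-wise math with a plain Python float, so whether `res.slope * range(...)` works may depend on `res.slope` being a NumPy scalar. Converting with `np.asarray()` first is the safer habit.

```python
import numpy as np

years_range = range(1880, 2051)      # plain Python range, 2051 excluded
years_array = np.arange(1880, 2051)  # NumPy array, 2051 excluded

print(len(years_range), years_range[-1])                     # 171 2050
print(np.array_equal(np.asarray(years_range), years_array))  # True

# A plain Python float can't multiply a range element-wise...
try:
    0.5 * years_range
    raised_type_error = False
except TypeError as err:
    raised_type_error = True
    print("TypeError:", err)

# ...but after converting to an array it can.
half = 0.5 * np.asarray(years_range)
```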

If you’re still getting an error in that line, can you please update your code and share it?

Sure, I have attached the code that produces the ValueError.

As you can see, using ‘Year’ from the DataFrame gives a size of 134 (which matches the size of y), but when I extend the range to predict through 2050 I get an x with a size of 171.
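The same length rule applies to `plt.plot(x, y)`: each x needs exactly one y. A sketch with toy arrays sized like the ones in this post (171 extended years, 134 observed values); the non-interactive Agg backend is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # no window needed for this check
import matplotlib.pyplot as plt
import numpy as np

x_extended = np.arange(1880, 2051)      # 171 x values (through 2050)
y_observed = np.zeros(2013 - 1880 + 1)  # 134 toy y values

print(len(x_extended), len(y_observed))  # 171 134

# One y per x is required, so this call fails.
try:
    plt.plot(x_extended, y_observed)
    mismatch_error = False
except ValueError as err:
    mismatch_error = True
    print("ValueError:", err)
plt.close("all")
```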

You are plotting a line with a slope. You know the x values will go from 1880 to 2050. Now you need to calculate the y value for each of those x values.

https://www.khanacademy.org/math/algebra/x2f8bb11595b61c86:forms-of-linear-equations/x2f8bb11595b61c86:intro-to-slope-intercept-form/v/less-obvious-slope-intercept-form

y = mx + b, where the slope is m and the y-intercept is b.

This is the formula you’re using: the y value is a function of x.

```python
plt.plot(lyears, (res.intercept + res.slope*df['Year']), color='r', label='fitted line')
```

You are plotting plt.plot(x, y), which is plot(x, mx + b), where m is the slope and b is the intercept. The x inside mx + b must be the same x you pass as the first argument.


x = lyears ✅

y = (slope * x) + intercept
y = (m * x) + b

m = res.slope
b = res.intercept
x = lyears

```python
res.intercept + res.slope*df['Year']
```

This is why the sizes don’t match. You’ve given lyears as the x-axis, but then to calculate y with y = mx + b (x = x, m = slope, b = y-intercept), you’ve used df['Year'] as x.

There’s a good example at the bottom of these docs as well:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html

Plot the data along with the fitted line:

```python
plt.plot(x, res.intercept + res.slope*x, 'r', label='fitted line')
```

You can see x used both for the x-axis and in the calculation of the y values.
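Putting it together, a sketch of the corrected plot following that docs pattern: the regression is fit on the observed rows only, and the extended years are used both as the x-axis and inside the y = mx + b calculation. The toy DataFrame and the Agg backend are assumptions for illustration; in the project the data comes from the provided CSV.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import linregress

# Made-up toy data standing in for the EPA CSV.
df = pd.DataFrame({"Year": np.arange(1880, 2014)})
df["CSIRO Adjusted Sea Level"] = 0.06 * (df["Year"] - 1880)

fig, ax = plt.subplots()
ax.scatter(df["Year"], df["CSIRO Adjusted Sea Level"])

# Fit only on observed data (same-length x and y).
res = linregress(df["Year"], df["CSIRO Adjusted Sea Level"])

# Same x for the axis and for y = intercept + slope * x.
lyears = np.arange(1880, 2051)
ax.plot(lyears, res.intercept + res.slope * lyears, "r", label="fitted line")
ax.legend()

# The scatter is not a Line2D, so the only line is the fitted one.
line = ax.get_lines()[0]
print(len(line.get_xdata()), len(line.get_ydata()))  # 171 171
```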
