What is the simplest way to add values to a data frame to predict the sea level rise in 2050?
I started by filling a list and appending this to the dataframe but… the attribute append has been removed from DataFrames. I did some looking at past projects and it seems this is the route most people took.
My next approach was to take the list I had created and concatenate it to the DataFrame.
Unfortunately, concat only takes 1 positional argument. I converted my list to a Series. Finally… I am getting somewhere. Now I add all the rows up through 2050. Unfortunately, this creates a 6th column for the years instead of filling them into the original ‘Year’ column.
I am stuck and feel there should be a simpler answer to all this…
I know there is newer information about using loc to add additional columns, but I was reading that it is horribly inefficient since it essentially just creates a copy of the df.
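For reference, if you did want to append rows, here is a minimal sketch of how `pd.concat` expects its input. The data is made up, and `future` and `extended` are hypothetical names used only for illustration. `pd.concat` takes one list of objects rather than separate positional arguments, and wrapping the new years in a DataFrame whose column is named 'Year' lets them line up with the existing column.

```python
import pandas as pd

# Hypothetical stand-in for the real data: two columns only, made-up values.
df = pd.DataFrame({
    "Year": [2011, 2012, 2013],
    "CSIRO Adjusted Sea Level": [8.9, 9.3, 9.0],
})

# Build the new rows as a DataFrame whose column name matches the existing one.
future = pd.DataFrame({"Year": range(2014, 2051)})

# pd.concat takes a single list of objects, not separate positional arguments.
extended = pd.concat([df, future], ignore_index=True)

print(extended.tail())
```

The future rows have NaN in the sea level column because there are no measurements for those years.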
You don’t need to add the new list of years to the dataframe. Just use that new list or Series as the x-axis argument for the regression line plot (and in the formula for the fitted line where you reference the years).
You aren’t really adding new data, so leave the dataframe alone. You are just extending the fitted line further along the plot.
Ah, that would make sense to not majorly alter the dataframe.
How can you make a list compatible with the dataframe in the equation?
Currently I am getting a ValueError:
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 171 and the array at index 1 has size 134
It makes sense to me to make a list of all of the years contained in the dataframe, extended through 2050. This gives me ‘x’.
However, if I try to use this in either of my lines, then the sizes do not match.
For example:
```
# In my linregress call, lyears and df['CSIRO Adjusted Sea Level'] will have different sizes
res = linregress(lyears, df['CSIRO Adjusted Sea Level'])
# In my plot of the fitted line, I can replace df['Year'], but this will still result in a ValueError
plt.plot(df['Year'], (res.intercept + res.slope*df['Year']), color='r', label='fitted line')
```
I’ve edited your code for readability. When you enter a code block into a forum post, please precede it with a separate line of three backticks and follow it with a separate line of three backticks to make it easier to read.
You can also use the “preformatted text” tool in the editor (</>) to add backticks around text.
The linregress line is calculating the regression. Only use the original dataframe years here, not the extended ones. You only want to calculate based on data you have. You don’t have data points for the future years, so including them would break the calculation.
In the plt.plot line you are just plotting a line, so you can swap out the dataframe Year for your extended years in both places it appears. I used np.arange() to generate the new years and stored that in a variable. It’s a numpy array.
I tested it, and the regular Python range() works as the x argument as well, though the formula for y needs an array (or Series) so that the multiplication by the slope works.
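A minimal sketch of that split, assuming made-up, perfectly linear data in place of the real CSIRO column (`years`, `levels`, `years_extended`, and `fitted` are hypothetical names): the regression is fitted on the observed years only, and the extended years are used only to evaluate the line.

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical stand-in data: a perfectly linear series, so the fit is easy to check.
years = np.arange(1880, 2014)            # 134 observed years (assumed range)
levels = 0.06 * (years - 1880)           # made-up values, not the CSIRO data

# Fit on the observed data only: x and y have the same length here.
res = linregress(years, levels)

# Evaluate the fitted line over the extended range, through 2050.
years_extended = np.arange(1880, 2051)   # 171 values
fitted = res.intercept + res.slope * years_extended

print(len(years_extended), len(fitted))
```

The last element of `fitted` is the value of the line at 2050.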
If you’re still getting an error in that line can you please update your code and share?
Sure: I have attached the code producing the Value Error:
As you can see, using ‘Year’ from the dataframe gives an x of size 134 (which matches the size of y), but when I extend the range to predict for 2050, I get an x of size 171.
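The two sizes can be reproduced with a quick check. This assumes the data runs from 1880 through 2013, which is an inference from the sizes 134 and 171 rather than something stated in the thread; the variable names are hypothetical.

```python
import numpy as np

# Assumed observed range: 1880 through 2013 inclusive.
observed_years = np.arange(1880, 2014)
# Extended range: 1880 through 2050 inclusive.
extended_years = np.arange(1880, 2051)

print(observed_years.size)   # 134
print(extended_years.size)   # 171
```

Any function that pairs x with y element by element needs both to have the same length, so the 171 extended years cannot be paired with 134 measurements.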
You are plotting plt.plot(x, y),
which is plot(x, m*x + b), where m is the slope and b is the intercept. The x inside the formula is the same x as the first argument.
x = lyears
m = res.slope
b = res.intercept
y = (m * x) + b, that is, y = (slope * x) + intercept
But your plot line calculates y as:
res.intercept + res.slope*df['Year']
This is why there is a problem with the sizes. You’ve given lyears as the x axis, but then to calculate y using (y=mx+b) (x=x, m=slope and b=y-intercept), you’ve given x as df['Year'].
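Here is a rough sketch of the plot with the same x in both places. The data is made up and only `lyears` and `res` follow the names used in this thread; `years`, `levels`, and `line` are hypothetical. The Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import linregress

# Hypothetical stand-in data, not the CSIRO measurements.
years = np.arange(1880, 2014)
levels = 0.06 * (years - 1880)

# Regression on the observed years only.
res = linregress(years, levels)

# Extended years used for BOTH the x argument and the y formula.
lyears = np.arange(1880, 2051)
plt.scatter(years, levels, s=5, label="observed")
line, = plt.plot(lyears, res.intercept + res.slope * lyears,
                 color="r", label="fitted line")
plt.legend()
```

Because x and y are both built from `lyears`, they have the same length and the line extends through 2050.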