How to subtract two sets of data with unequal spacings?

Hi everybody,

I have a problem regarding data processing in Python. Assume we have two sets of data (x1, y1 and x2, y2) that have different points and spacing. For example:

x1=[4, 3, 2.5, 2, 1, -1, -1.5, -2, -4]
y1=[1, 0.8, 0.7, 0.5, 0.3, 0, -0.3, -0.6, -0.9]
x2=[5, 3, 2, 1, -1, -1.5, -5]
y2=[2, 1.8, 1, 0, 0.2, -0.5, -1.5, 2.5]

How do I subtract y1 from y2?

y3=y2-y1

So far I have tried to use interpolation to first get y1_new, which represents y1 at the x2 points:

y1_new = np.interp(x2, x1, y1)

Thanks in advance.

What do you want y2 - y1 to mean in this context? Subtraction between two vectors of different lengths is not, in general, well defined.

Hi Jeremy,

Actually, y1 is the baseline, which I want to subtract from y2.

Sure, but what does it mean when you have different numbers of values in each array? Those arrays don’t appear to be linearly spaced, so it’s hard to see a clean way to do this.

Are you trying to find the difference between two lines/curves in some sense?

The data could be presented in a better way:
y1 = f(x1)
y2 = f(x2)

Exactly, I want to find the difference between two lines.

The problem here is that you have an extrapolation problem, which is inherently messy because you’re basically making wild guesses about the sampled function outside of the domain.

The only way I can think of to do this cleanly would be to use np.interp() as you have above, mapping the values from the larger domain (x2) onto the smaller domain (x1).

y2_smaller_domain = np.interp(x1, x2, y2)
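As a rough sketch of that approach with the arrays from the question: np.interp expects its sample x-coordinates to be increasing, and the posted arrays run from high to low, so the x2/y2 pair is sorted first. The posted y2 has one more value than x2, so this sketch assumes only the first seven y2 values belong to x2; that pairing is a guess, not something the question confirms.

```python
import numpy as np

x1 = np.array([4, 3, 2.5, 2, 1, -1, -1.5, -2, -4])
y1 = np.array([1, 0.8, 0.7, 0.5, 0.3, 0, -0.3, -0.6, -0.9])
x2 = np.array([5, 3, 2, 1, -1, -1.5, -5])
# Assumption: only the first seven posted y2 values are used, to match len(x2).
y2 = np.array([2, 1.8, 1, 0, 0.2, -0.5, -1.5])

# np.interp needs increasing sample points, so sort the (x2, y2) pairs by x2.
order = np.argsort(x2)
x2_sorted, y2_sorted = x2[order], y2[order]

# Evaluate y2 at the x1 points. x1 lies inside [-5, 5], so nothing is extrapolated.
y2_smaller_domain = np.interp(x1, x2_sorted, y2_sorted)

# Baseline-subtracted result, defined on x1.
y3 = y2_smaller_domain - y1
print(y3)
```

Because every x1 value falls inside the range of x2, this direction only ever interpolates.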

There are also some more advanced interpolation functions (which interpolant is best depends on the context of the data).

You could do the reverse, mapping the values from the smaller domain (x1) onto the larger domain (x2), but you would have to decide how you want to ‘guess’ at your function’s behavior outside of its domain, which is very difficult. Extrapolation always has more error than interpolation.

Edit: It looks like scipy.interpolate.interp1d has an extrapolation capability: https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html

The same caveats from above about how error-prone extrapolation is still apply. There is a reason ‘extrapolate wildly’ is used as a criticism.
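A minimal sketch of what that could look like, using x1, y1 and x2 from the question. The choice of kind="linear" and the `outside` mask are my own additions for illustration; the mask just flags which x2 points fall outside the range of x1, so you can see which values are extrapolated rather than interpolated.

```python
import numpy as np
from scipy.interpolate import interp1d

x1 = np.array([4, 3, 2.5, 2, 1, -1, -1.5, -2, -4])
y1 = np.array([1, 0.8, 0.7, 0.5, 0.3, 0, -0.3, -0.6, -0.9])
x2 = np.array([5, 3, 2, 1, -1, -1.5, -5])

# interp1d sorts the sample points itself (assume_sorted defaults to False).
# fill_value="extrapolate" extends the fit beyond the range of x1.
f = interp1d(x1, y1, kind="linear", fill_value="extrapolate")
y1_on_x2 = f(x2)

# Flag the x2 points that lie outside [x1.min(), x1.max()]:
# these values are extrapolated and deserve less trust.
outside = (x2 < x1.min()) | (x2 > x1.max())
print(y1_on_x2)
print(outside)
```

Here only the two end points of x2 (5 and -5) fall outside x1, so those are the entries to treat with suspicion.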

Thanks for your explanations, Jeremy! My data set is large (around 5000 points) and the two sets of x values differ only slightly, so I think interpolating/extrapolating is reasonable.
I have now figured it out using the following code:

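import matplotlib.pyplot as plt
from scipy import interpolate

# Nearest-neighbour lookup of the baseline y1 at the x2 points;
# x2 values outside the range of x1 are extrapolated.
f = interpolate.interp1d(x1, y1, kind='nearest', fill_value="extrapolate")
y_ = f(x2)    # baseline evaluated at x2
y = y2 - y_   # baseline-subtracted data
plt.plot(x2, y)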

It’s the answer, thanks so much!

As long as the bit of error is acceptable in your application, that’s what matters in the end. I’m glad I could help!