Intuition of the convolution as a method to calculate the response of a system

https://ocw.mit.edu/resources/res-6-007-signals-and-systems-spring-2011/video-lectures/lecture-4-convolution/

We want to know the response of a system to any input.
Let's say we are at n = 3 (discrete time). The input so far has been a step function, active for 4 units of time: n = 0, 1, 2 and 3. We consider the input x[k] as a bunch of impulses, or deltas, and h[n] as the response of the system to a single unit impulse at n = 0. If the only input were one impulse at n = 0 with intensity x[0], the output would be y[n] = x[0]*h[n]; for a unit impulse it is simply h[n].
Well, that's easy to measure. Let's suppose that we know what h[n], the impulse response of our system, is.

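If we decompose x[n] into several impulses, we can say that the response to any input x[n] equals the sum of the responses to all those impulses. This works because the system is linear and time-invariant: the response to a scaled, delayed impulse is the same h, scaled and delayed. So we can separate any input signal into a set of impulses and get the response to each of them. But how do we add them up?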
For that we use the convolution.
 
The convolution says that $y[n] = \sum_k x[k] \cdot h[n-k]$
We add all the products for all the different k's, getting a function of n in the process.
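
As a minimal sketch, the sum can be evaluated directly for one value of n at a time. The helper names `sample` and `convolve_at`, and the values chosen for `h`, are assumptions made up for this illustration. Both signals are assumed to start at index 0 and to be zero everywhere else.

```python
def sample(signal, i):
    """Value of a finite signal at index i, taken as 0 outside the stored range."""
    return signal[i] if 0 <= i < len(signal) else 0.0


def convolve_at(x, h, n):
    """y[n] = sum over k of x[k] * h[n - k], for signals that start at index 0."""
    # Only k = 0..n can contribute: x[k] is 0 for k < 0 and h[n - k] is 0 for k > n.
    return sum(sample(x, k) * sample(h, n - k) for k in range(n + 1))


x = [1, 1, 1, 1]        # step input, active at n = 0, 1, 2, 3
h = [1.0, 0.5, 0.25]    # hypothetical impulse response, made up for this example

y = [convolve_at(x, h, n) for n in range(len(x) + len(h) - 1)]
print(y)  # [1.0, 1.5, 1.75, 1.75, 0.75, 0.25]
```

Each entry of `y` is one evaluation of the sum, which is why the result is a function of n.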

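If n = 0, the only term that matters is the one with k = 0. For any positive k, h[n-k] = h[<0] = 0, because the system cannot react to an impulse before it happens (it is causal), and we assume there is no input before k = 0.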
So we end up with x[0]*h[0]. Pretty easy. We get the current impulse x[0] times the value of the impulse response right after the beginning, h[0].

For n = 1, here we have the interesting stuff. Consider the last impulse (of all the impulses that the input signal is composed of) that we have witnessed so far. That's x[1], the input at n = 1. The effect caused by a unit impulse that happens right now, at n = 1, is h[0]: the impulse response evaluated at its beginning, without any delay.
So here we have an impulse that just happened (at n = 1), with amplitude x[1] (the intensity of the impulse at n = 1). The response to that impulse is x[1]*h[0]: the intensity of the impulse at time n = 1, times the effect of any impulse on our system (h) right at its appearance (at time 0). BUT, to get the full response of our system, we also have to take into account the previous impulse, at n = 0. The effect on the system of any unit impulse that happened 1 unit of time ago is h[1]: h because of our system, and 1 because it's 1 unit of time later. We know that this impulse had an intensity of x[0], that is, the value of the input (decomposed into impulses) at the beginning, when the impulse was created. So the current contribution of that impulse is x[0]*h[1]: the intensity of the impulse at time 0, times the effect of any impulse after 1 unit of time. Adding both contributions gives y[1] = x[0]*h[1] + x[1]*h[0].

Try to understand now why the convolution has x[k]*h[n-k]. n is the current time, and k is the time when each input impulse happened. n-k = "how old the impulse is", h[n-k] = "what effect does a unit delta have after n-k units of time?", and x[k] = "what was the intensity of the input signal at time k?". We add over all k because we need the effect of all the impulses that make up our input signal.
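
To make the "age" reading concrete, here is a small sketch that lists every term of the sum at n = 3 for the step input described earlier. The function name `terms_at` and the values of `h` are hypothetical, chosen only for illustration. Signals are assumed to start at index 0 and be zero elsewhere.

```python
def terms_at(x, h, n):
    """List (k, age, x[k], h[age], product) for each impulse that still contributes at time n."""
    rows = []
    for k in range(len(x)):
        age = n - k                      # how long ago the impulse at time k happened
        if 0 <= age < len(h):            # h is 0 before the impulse and after it dies out
            rows.append((k, age, x[k], h[age], x[k] * h[age]))
    return rows


x = [1, 1, 1, 1]               # step input, active at n = 0, 1, 2, 3
h = [1.0, 0.5, 0.25, 0.125]    # hypothetical impulse response, made up for this example

rows = terms_at(x, h, 3)
for k, age, xk, h_age, product in rows:
    print(f"k={k}  age={age}  x[k]={xk}  h[age]={h_age}  product={product}")

y3 = sum(row[4] for row in rows)
print("y[3] =", y3)  # y[3] = 1.875
```

The oldest impulse (k = 0) is weighted by h[3] and the newest one (k = 3) by h[0], which is exactly the n-k index in the formula.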

Therefore the convolution means: "add all the (intensity of the input signal at time k) * (effect of an impulse after n-k units of time) for all k".
We separate the input signal into a bunch of impulses, calculate the response to each of them, and line them up in time, starting with the first one (furthest in the past), x[0]*h[n], then the second one, x[1]*h[n-1], ..., until the last one, x[n]*h[0]. Then we add the effects of all of them.
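
The same procedure can be sketched the other way around: instead of computing one y[n] at a time, drop one scaled, delayed copy of h into the output for every input impulse and let the copies add up. The name `superpose` and the values of `h` are assumptions for this illustration. Both lists are assumed to start at index 0.

```python
def superpose(x, h):
    """Add one copy of h per input sample, scaled by x[k] and delayed by k."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, amplitude in enumerate(x):        # impulse of intensity x[k] at time k
        for age, effect in enumerate(h):     # its effect `age` units of time later
            y[k + age] += amplitude * effect
    return y


x = [1, 1, 1, 1]        # step input, active at n = 0, 1, 2, 3
h = [1.0, 0.5, 0.25]    # hypothetical impulse response, made up for this example

print(superpose(x, h))  # [1.0, 1.5, 1.75, 1.75, 0.75, 0.25]
```

The output position k + age is the current time n, so age = n - k: this is the same x[k]*h[n-k] product, just accumulated impulse by impulse. For real work, NumPy's `numpy.convolve(x, h)` computes the same full-length result.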

Visual example: Convolution - MIT
