8 years later, this video still helps people out!
I don't know how to thank you, you are one of the best YouTubers at explaining these concepts clearly. Thanks a lot
This is by far the best explanation of correlation
I find your videos, explanations, and your channel of so much use. It mixes theory and application. Thank you for the explanations!
Finally, I found a useful video to understand cross correlation, thank you David
This video is the one I found most useful for understanding what Matlab is doing in this function!
Finally found a simple explanation that made me understand it! Thanks.
Thank you so much, i was looking for a video like that, 2hrs minimum... now i finally get xcorr !
Great understanding equals great teacher. Truly illuminating. Now I can go back to my agricultural cycle use case.
Best explanation for cross correlation! Thanks David
Thanks a lot. I did not expect it to be so simple. I'm going straight to look for your other tutorials. Maybe I really have a chance to understand the math I did not have an opportunity to learn until now, and how to use it in Matlab. Once more, many thanks for your efforts in providing such a nice (and dummies-friendly) explanation.
You're welcome. Good luck with your studies.
Exactly what I'm searching for, great respect!
Great explanation, but I'm curious why Matlab's xcorr function uses the FFT to calculate the correlation, instead of shifted and truncated Hadamard products of the signals?
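A likely reason is speed: summing shifted products costs on the order of N^2 multiplications, while the FFT route costs on the order of N log N, and both give the same values. The sketch below compares the two on made-up data; the variable names and test signals are assumptions for illustration only, not taken from xcorr's source.

```matlab
% Sketch: direct (shift-and-multiply) vs FFT-based cross-correlation.
x = [1 2 3 4 5];
y = [2 1 0 -1 3];
N = length(x);
lags = -(N-1):(N-1);

% Direct method: for each lag m, sum x(n+m)*y(n) over the overlapping samples.
c_direct = zeros(1, 2*N-1);
for k = 1:2*N-1
    m = lags(k);
    for n = 1:N
        if n+m >= 1 && n+m <= N
            c_direct(k) = c_direct(k) + x(n+m)*y(n);
        end
    end
end

% FFT method: zero-pad to avoid wrap-around, multiply spectra, invert.
L = 2^nextpow2(2*N-1);
c_circ = real(ifft(fft(x, L) .* conj(fft(y, L))));

% Negative lags sit at the end of the circular result; move them to the front.
c_fft = [c_circ(L-N+2:L), c_circ(1:N)];

% Largest difference between the two methods (rounding error only).
max_diff = max(abs(c_direct - c_fft));
```

For five samples the difference in cost is invisible, but for long recordings the FFT version is far faster.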
Best explanation on cross correlation!! Thanks for sharing!
I have a doubt with some homework I have to do. The thing is that I have a .wav signal and I have to compute its autocorrelation. I wrote this code in a script:
[xt,fs]=wavread('signal8.wav');
Nt=length(xt);
t1=0;
t2=Nt/fs;
t0=(t2-t1)/Nt;
t=t1:t0:t2-t0;
%Compute the autocorrelation, phitau and the shift tau using the xcorr function
[phitau,tau]=xcorr('signal8.wav');
close all;
plot(t,xt);
xlabel('t sek');
ylabel('x(t)');
figure;
plot(t0*tau,t0*phitau);
xlabel('tau sek');
ylabel('phi(tau)');
and at the end in the command window I try to execute my script but I have an error like this:
Undefined function 'fft' for input arguments of type 'char'.
Error in xcorr>vectorXcorr (line 105)
X = fft(x,2^nextpow2(2*M-1));
Error in xcorr (line 53)
[c,M,N] = vectorXcorr(x,autoFlag,varargin{:});
Error in lab4b (line 8)
[phitau,tau]=xcorr('signal8.wav');
Could you help me with this problem?
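The error message points at the cause: xcorr was given the file name (a char array) instead of the samples, so the fft call inside it fails. Passing the vector returned by wavread should fix it. Below is a minimal sketch using a synthetic tone as a stand-in for the contents of signal8.wav (the tone and sample rate are assumptions; in the script above the argument would be xt). Note that xcorr needs the Signal Processing Toolbox (or the signal package in Octave), and that newer Matlab releases replace wavread with audioread.

```matlab
% Stand-in for the samples that wavread would return (assumed data).
fs = 8000;                               % sample rate in Hz (assumed)
xt = sin(2*pi*440*(0:fs/10-1)'/fs);      % 0.1 s of a 440 Hz tone
t0 = 1/fs;                               % sample period

% Pass the sample vector, not the file name, to xcorr.
[phitau, tau] = xcorr(xt);

plot(t0*tau, t0*phitau);
xlabel('tau (s)');
ylabel('phi(tau)');
```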
Thanks for the great video, David. It was very helpful!
Hi! Why does the correlation look different when we use digital samples like [.... 1 0 1 1 1 0 ..... ]? In this case, the final plot shows where the correlation occurs (which is OK), but the rest of the values seem to form a triangular shape along the "lag" values.
Actually, when I plot lags against xcorr I get values up to 150 on the Y axis. How do I make sense of this? More generally, how do I know if two signals are cross-correlated? Is there an objective measure?
The video up to 6:05 is unfortunately wrong. You are calculating the convolution of two signals, not cross correlation.
I think you're right
Great, simple explanations. Thank you
A really nice and helpful video! I'm just reading through the books about digital signal processing but mostly there are formulas with integrals. But I want to cross-correlate two signals in an FPGA (looking for a barker code in a stochastic signal) and it doesn't understand integrals. This video gives me an idea of how to do it - thank you!
Thank you for your great explanation. I use this to find similarities between countries' economic cycles.
nice to see it being applied in different disciplines!
Millions of thank you
Thanks for making it so clear
Hi, nice tutorial. Thanks. I have a small query.
I am supposed to calculate the "average of cross correlation" of over 20 series at zero lag. If I do it pairwise, I assume there would be 20C2 (20 choose 2) coefficients, which is a very high number, and then I will have to calculate the average. Is there an easier way to do it? Perhaps something that can be implemented in Excel? Many thanks.
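One way to avoid looping over the 190 pairs by hand is to compute all zero-lag correlations in a single matrix product and then average the distinct pairs. The sketch below assumes the series are stored as equal-length columns of a matrix X and that "cross correlation at zero lag" means the energy-normalised value; the data here is made up.

```matlab
% Hypothetical data: 20 series stored as the columns of X (100 samples each).
X = sin((1:100)' * (1:20) / 50);
nSeries = size(X, 2);

% Zero-lag correlation of every pair at once: R(i,j) = sum(X(:,i) .* X(:,j)).
R = X' * X;

% Normalise by the signal energies so every entry lies in [-1, 1].
e = sqrt(diag(R));
Rn = R ./ (e * e');

% Average over the distinct pairs (upper triangle, diagonal excluded).
mask = triu(ones(nSeries), 1) == 1;
nPairs = nnz(mask);              % 20 choose 2 = 190
avgCorr = mean(Rn(mask));
```

If the mean of each series should be removed first (Pearson correlation), corrcoef(X) could be used in place of the normalised matrix.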
Amazing explanation, thanks a lot
Is it possible to correlate two datasets (financial assets) even though they are not normally distributed?
For the cross-correlation to be valid, do you absolutely need to have the same number of elements in the two signals?
Yes. In the event that you have two sequences of numbers that do not have the same number of elements you can either zero-pad the shorter one or truncate the longer one.
David Dorran magma169 No they do not. Correct your intuition in Matlab if you can. The resultant of xcorr(a,b) will be of length a.length + b.length - 1 and the 'zero index' will be the (b.length -1)th entry.
Andrew Gallasch Perhaps we have different versions of Matlab - I have 7.11 (R2010b) and it always returns 2N-1 correlation values, where N is the length of the longest input sequence.
There is also a note in the help on xcorr that the shorter sequence is zero-padded by the function.
So it does. That will only result in extra zeros appearing at the end of the xcorr result. They can be ignored. There is no mathematical limitation that inputs have to be equal length, however.
Thank you so much, why do people overcomplicate the explanation of this topic when you can explain it as simply as this
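The two views in this thread can be reconciled with a small experiment. The sketch below uses conv with the second input reversed as a stand-in for cross-correlation (so it runs without the Signal Processing Toolbox); the example vectors are made up.

```matlab
a = [1 2 3 4 5];
b = [1 1 1];

% Without padding: length(a) + length(b) - 1 = 7 values.
c_short = conv(a, fliplr(b));

% With the shorter input zero-padded to the longer length,
% as the xcorr help describes: 2*N - 1 = 9 values.
b_pad = [b, zeros(1, length(a) - length(b))];
c_pad = conv(a, fliplr(b_pad));

% The padded result holds the same values plus extra zeros.
extra = length(c_pad) - length(c_short);
same_values = isequal(c_pad(extra+1:end), c_short);
```

Here the extra zeros land at the start because the second input is the shorter one; padding the first input instead would place them at the end. Either way they carry no information.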
Thank you so much. This is of great help. All the best!
So the largest number in the correlation sequence (in this video, it is 23.18 at lag 2) means the highest similarity, but at the end of the video you said that between 0 and 2 the signals are most similar, which means the highest similarity is at lag 1. The story is inconsistent. What is the criterion for selecting the lag with the highest similarity?
I'm not sure where I said that "between 0 to 2 the signals are most similar which means the highest similarity at lag 1". If I did then I was incorrect. The lag at which the signals are most similar is a lag of 2 samples.
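The selection rule described here (pick the lag with the largest correlation value) can be sketched as follows. The signals are made up so that the answer is known in advance, and conv with a reversed second input is used as a toolbox-free stand-in for xcorr(x, y) on equal-length real inputs.

```matlab
y = [1 3 -2 4 0 0];
x = [0 0 1 3 -2 4];          % y delayed by two samples
N = length(x);
lags = -(N-1):(N-1);

% c(k) holds sum over n of x(n+m)*y(n) for m = lags(k).
c = conv(x, fliplr(y));

% The lag with the largest correlation value is the best alignment.
[peak, idx] = max(c);
bestLag = lags(idx);
```

With this data the peak is 30 (the sum of squares of the shared samples) and it occurs at a lag of 2 samples.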
Thanks for your explanation and demo, it really helped!
Does the data have to be stationary?
The data can be any sequence of numerical values
made this so clear ,thank you
Glad it helped!
I have a query that I was hoping you would be able to help me with. For my final year project I have been researching calculating distance using sound on an iPhone 6. I have been playing short frequency sweeps on one iPhone and recording the data on another phone sitting on top of it. What I'm planning to do is calculate the delay between the initial sound and the reflected sound and combine that with the speed of sound to give me the distance between a wall and the iPhone. However, I'm struggling to do so. I know you are a wizard with Matlab and was wondering if there are any techniques or methods for calculating that time delay within Matlab or Audacity.
+Khash Ghalam This should be doable, though I'm not sure what kind of resolution you'll get. The key is to send a very short but high-amplitude, high-frequency impulse (a click) and record it on the second phone. Ideally, the second phone should record two clicks, one directly from the first phone, and one (much weaker) from the reflecting surface. Auto-correlate the signal to determine the delay between the two pulses. That delay, multiplied by the speed of sound, gives the extra distance travelled by the reflection. Complicating factors will be the limited bandwidth of the audio circuits at high frequency distorting your pulse (that's why this is usually done with ultrasonic transducers) and the multipath and smearing of the return signals, since it isn't going to bounce off just one point on the wall, but off multiple points with slightly different times.
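A minimal simulation of the click-and-echo idea is sketched below. Everything numeric is an assumption made for illustration (sample rate, click shape, echo strength, delay, guard width), and a real recording will be much noisier. The autocorrelation is computed with conv so that no toolbox is needed.

```matlab
fs = 44100;                 % sample rate in Hz (assumed)
v  = 343;                   % approximate speed of sound in air, m/s
D  = 257;                   % true delay in samples (made up for the simulation)

click = [1 -1 1 -1 1];      % short, wideband stand-in for a click
rec = zeros(1, 2000);
rec(101:105) = click;                                 % direct sound
rec(101+D:105+D) = rec(101+D:105+D) + 0.3*click;      % weaker reflection

N = length(rec);
ac = conv(rec, fliplr(rec));    % autocorrelation, lags -(N-1)..(N-1)
acPos = ac(N:end);              % keep lags 0..N-1

guard = 10;                     % skip the zero-lag peak and its neighbours
[~, k] = max(acPos(guard+1:end));
delaySamples = k + guard - 1;   % lag of the strongest remaining peak
delaySeconds = delaySamples / fs;
extraPath = delaySeconds * v;   % extra distance travelled by the reflection, m
```

If the two phones are stacked and the sound reflects straight back, the distance to the wall would be roughly half of extraPath.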
Thanks a lot for the simple explanation. Can you give a link to the next video?
It is very clear. Thank you so much David.
Can you please suggest how to implement the same in the C language?
Simple and efficient Thanks !
Glad it was helpful!
+Axel Thieffry This is not a normalized correlation
Thank you for such a beautiful explanation.
Priceless, thank you so much sir !
Hello David. Great tutorials. I have one question: I want to derive an approach for signals whose samples are not aligned vertically. In normalized correlation the data points are also vertically aligned. How can we derive correlation and normalized correlation for signals that are not vertically aligned?
raghunath n My initial approach would be to interpolate the data so that it is vertically aligned and see how that works.
Thank you David. Your message confirmed my solution. Thank you.
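The interpolation suggestion above might look like the following sketch. It assumes "not vertically aligned" means the two signals were sampled at different time instants; the signals, time grids and the use of linear interpolation are all illustrative choices.

```matlab
% Two hypothetical signals sampled at different, non-matching time instants.
t1 = 0:0.10:1;         x1 = sin(2*pi*t1);
t2 = 0.03:0.13:1;      x2 = sin(2*pi*t2);

% Common time grid covering only the range where both signals have data.
tc = max(t1(1), t2(1)) : 0.05 : min(t1(end), t2(end));

% Resample both signals onto the common grid (linear interpolation).
x1c = interp1(t1, x1, tc);
x2c = interp1(t2, x2, tc);

% The samples now line up, so the zero-lag normalised correlation is direct.
r = sum(x1c .* x2c) / sqrt(sum(x1c.^2) * sum(x2c.^2));
```

Because both made-up signals sample the same sine wave, r comes out close to 1; the small shortfall is interpolation error.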
Great explanation. I have one problem with this approach, maybe you can clear things up. I have two signals whose peaks look like they correlate, where one peak is towards the beginning of signal 'A' and the other peak is towards the end of signal 'B'. Using cross correlation, the lag (which is large) that aligns these samples doesn't provide the largest correlation value, purely because fewer points are involved in calculating the correlation at this lag, i.e. many of the data points in one signal don't have an associated point in the other signal to be multiplied by, because the two signals now have only a small region of overlap. I hope I have explained that comprehensibly. Do you know of a solution for this? Is this a known problem of cross-correlation, or am I missing something major in my understanding?
Thanks in advance.
+volcEmpire In matlab there is an unbiased version of the xcorr (cross correlation) function. I think this just divides each correlation measure by the 'overlapping' vertically aligned samples which gives more weight to the correlations associated with larger lags. You should be careful when using this technique as sometimes the correlation measures at large lags can be excessively scaled.
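The scaling described here can be sketched without xcorr. The sketch assumes the 'unbiased' option divides each value by the number of overlapping sample pairs, N - |lag|, for equal-length inputs; the data is made up.

```matlab
x = [1 2 3 4 5 6];
y = [6 5 4 3 2 1];
N = length(x);
lags = -(N-1):(N-1);

% Plain correlation: each value is a sum over the overlapping samples only.
c_raw = conv(x, fliplr(y));

% Number of overlapping sample pairs at each lag.
overlap = N - abs(lags);

% Dividing by the overlap gives the average product per overlapping pair.
c_unbiased = c_raw ./ overlap;
```

At the extreme lags only one pair of samples overlaps, so the scaled value is just the product of two end samples. That is the over-scaling risk mentioned above: a value based on very little data can end up larger than the zero-lag value.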
Thanks for your effort...It really helped me understand.
So what does the resulting number actually mean? 7.52 what? %?
The meaning of the number depends on the signals involved. A value of 7.52 might mean the signals are identical for one pair of signals but extremely dissimilar for another pair. This is because the numbers returned by a standard correlation function depend on the energy in the signals.
Normalised correlation attempts to resolve this by normalising to the energy of both signals so that the result lies within +- 1. A value of 1 in this case means that the signals are identical, -1 means the signals are an inverted form of each other and 0 means that they are orthogonal to each other. So you are probably wondering what 0.5 means versus say 0.9 - the simple answer is to say that a result of 0.9 means that the signals are more similar than signals that have a normalised correlation of 0.5. A more complete answer could be obtained by looking at the equations - I was trying to come up with a verbal description but couldn't come up with something that was easy to interpret - this question did get me thinking about it though so I'll get back to this at some stage.
So using your number system, what number result would indicate 100% identical, in phase, etc?
Using normalised correlation, a value of 1. For standard correlation, sum(x.^2), where x is one of the signals.
I guess I meant standard, if that is what you're using in this video. You have 7.52 and -12.48 and so on so I am curious as to what 100% identical signals would yield.
17joren As an example say you had a signal [ -2 3 4 -10] then a standard correlation measure if you correlated this signal with itself would be 4+9+16+100 = 129.
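The worked example above, together with its normalised counterpart, can be checked in a few lines. The scaled and inverted copies are extra illustrations added here, not taken from the video.

```matlab
x = [-2 3 4 -10];

% Standard correlation of x with itself: 4 + 9 + 16 + 100.
standard = sum(x .* x);

% Normalised correlation divides by the energies of both signals.
normalised = sum(x .* x) / sqrt(sum(x.^2) * sum(x.^2));

% A louder copy changes the standard value but not the normalised one.
y = 3 * x;
standard_scaled = sum(x .* y);
normalised_scaled = sum(x .* y) / sqrt(sum(x.^2) * sum(y.^2));

% An inverted copy gives a normalised value of -1.
normalised_inverted = sum(x .* (-x)) / sqrt(sum(x.^2) * sum((-x).^2));
```

This is why a raw value such as 7.52 cannot be read on its own: the same shape at a different amplitude gives a different raw number but the same normalised one.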
Even me as a non mathematics non engineer understood this 😊
Thanks David, this helped me a lot. Easy to understand.
But hmm, I have an issue with the lag being 2,
because I thought data2 should be the source and data1 should be the time-lagged series.
That makes sense in reality.
So a lag of 2 should be a negative time (-2) in reality, but it's the opposite.
I wonder whether this is right or not. Please explain it for me. Thanks a lot.
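The sign of the lag depends on which signal is passed first, which may be the source of the confusion. The sketch below uses made-up signals (not the video's data1 and data2) and conv with a reversed second input as a stand-in for xcorr on equal-length real inputs.

```matlab
src     = [0 1 3 -2 4 0 0 0];
delayed = [0 0 0 1 3 -2 4 0];      % src delayed by 2 samples
N = length(src);
lags = -(N-1):(N-1);

% Same argument order as xcorr(delayed, src).
c1 = conv(delayed, fliplr(src));
[~, i1] = max(c1);
lag1 = lags(i1);

% Same argument order as xcorr(src, delayed).
c2 = conv(src, fliplr(delayed));
[~, i2] = max(c2);
lag2 = lags(i2);
```

Here lag1 is +2 and lag2 is -2: swapping the inputs mirrors the correlation sequence about lag zero, so both answers describe the same 2-sample delay.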
Hi David, excellent video. I'm using excel 2003 but with a vast set of data (over 40,000 rows) is there a formula to calculate the correlation sequence value without having to individually multiply each numerical value associated with each sample? This is killing me!
You could downsample your data before correlating. Or you could cross correlate over a smaller range of lags. Both of these approaches would require a good understanding of the data you are working with to avoid missing useful info.
Alternatively you could use Octave to process your data (an online version is available at octave-online.net/)
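Restricting the range of lags could be sketched like this in Octave or Matlab. The test signal, the 7-sample delay and the choice of maxLag are made up; with real data maxLag has to come from knowledge of the plausible delays. Where the Signal Processing Toolbox is available, xcorr(x, y, maxLag) does the same job.

```matlab
x = sin(2*pi*(0:999)/50);
y = [zeros(1, 7), x(1:end-7)];     % x delayed by 7 samples (made-up test data)
N = length(x);

maxLag = 20;                       % only lags -20..20 are evaluated
lags = -maxLag:maxLag;
c = zeros(size(lags));

for k = 1:length(lags)
    m = lags(k);
    if m >= 0
        % c(m) = sum over n of x(n+m)*y(n), n = 1..N-m
        c(k) = sum(x(1+m:N) .* y(1:N-m));
    else
        % same sum, n = 1-m..N
        c(k) = sum(x(1:N+m) .* y(1-m:N));
    end
end

[~, idx] = max(c);
bestLag = lags(idx);
```

This evaluates 41 lags instead of 1999, which matters for tens of thousands of rows. The sign of bestLag follows the argument order, as with xcorr(x, y).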
Great tutorial and great code. Many thanks..
Thanks a million. That’s very kind of you.
Thank you for that useful demo !
Excellent explanation--thank you so much!
I like your explanation. Thanks
Thank you for the video. Simple explanation
Great!! simple and understandable
Ah yes, how handy! - Got some earthquakes to check. ;-)
Thanks. This was very helpful!
Really thanks a lot!
Great stuff !
Thanks for uploading...😄
"At Lag zero we have a correlation value of 7.52"..... really??? A correlation of 7+ ??? What the..
very helpful, thank you
Thanks, very helpful
thanks a lot
thank you so much!!!
Thank you very very very very much
Thank you!
Supercalifragilisticexpialidocious. thanks.
Thanks a lot!
thank you
Thank you so much omg
thanks dude
10 years later, this video still helps people out!
is lag the time delay?
Yes
Thanks for this!
thank you so much!!!!
Thank you