Excellent video, thank you. I also hope this catches on more over time; we are throwing away a lot of extreme yet valid data.
Hi, thanks for the great video! One question though, do you have any references to support winsorizing the outlier value to the outer percentile +/-1 rather than just changing it to the value itself? In other words, are there any papers I could reference that would support changing the value of 19 to 9 rather than 8? Thanks for any help!
Still wondering about this. I have never heard of this before.
It is really useful, and I'm looking forward to more from you!
Thanks for the video. I also wondered why the +1 is used; it would be nice if you explained why. However, reading the scientific literature, I find that other studies do this too, and in my understanding it makes sense. If the extreme values are replaced by the next highest value, the modified extreme values and the original highest value look the same, so you can no longer tell from your data how many extreme values were changed. With the +1 option, you can still see this in the analysis. I recommend the following article: Wainer H. Robust statistics: A survey and some prescriptions. Journal of Educational Statistics 1976;1:285-312
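To make the +1 rule concrete, here is a minimal Python sketch. It assumes the outliers have already been flagged by some rule, so the index set passed in is hypothetical input. It uses the 19 → 9 versus 19 → 8 case from the question above, and the other values in the list are made up. `winsorize_plus_one` is a hypothetical helper name. `offset=1` gives the +1 variant described in this comment, and `offset=0` gives plain replacement with the next highest value.

```python
def winsorize_plus_one(values, outlier_idx, offset=1):
    """Replace flagged outliers with the nearest retained value +/- offset.

    outlier_idx: indices already flagged as outliers (hypothetical input;
    the flagging rule, e.g. 3 IQR or z > 3, is up to the analyst).
    High outliers become max(retained) + offset; low outliers become
    min(retained) - offset. offset=0 is plain replacement.
    """
    retained = [v for i, v in enumerate(values) if i not in outlier_idx]
    hi, lo = max(retained), min(retained)
    result = []
    for i, v in enumerate(values):
        if i in outlier_idx and v > hi:
            result.append(hi + offset)
        elif i in outlier_idx and v < lo:
            result.append(lo - offset)
        else:
            # retained values (and any flagged value inside the retained range) stay as-is
            result.append(v)
    return result


data = [3, 5, 4, 8, 19, 6]  # hypothetical sample; 19 is the flagged outlier
print(winsorize_plus_one(data, {4}))            # [3, 5, 4, 8, 9, 6]
print(winsorize_plus_one(data, {4}, offset=0))  # [3, 5, 4, 8, 8, 6]
```

With `offset=1`, the winsorized value (9) stays distinguishable from the original maximum (8), which is the point made in this comment. With `offset=0`, the two values collapse to 8.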
Actually, in Winsorizing you are altering a "percentile" of your data, not necessarily the outliers identified by the 3 × IQR rule. For instance, you may have 10 data points and want to apply first-order Winsorizing. This means that only one extreme value will be changed no matter what, and that value may or may not be an outlier. Or you may have more than one outlier, but first-order Winsorizing replaces only the largest one. It is a bit tricky :)
In this video, the outlier and the winsorized value were the same, but as I mentioned, you may have more outliers.
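The distinction drawn here between Winsorizing and outlier detection can be sketched in Python. This reads "1st Winsorize" as first-order Winsorization, which pulls the single lowest and single highest values in to their neighbours. `iqr_outliers` flags values beyond 3 × IQR from the quartiles. Both helper names are hypothetical and the two datasets are made up. Quartiles use the default ("exclusive") method of Python's `statistics.quantiles`, which may differ slightly from SPSS or other software.

```python
from statistics import quantiles


def first_order_winsorize(values):
    """Replace the lowest value with the 2nd lowest and the highest with the 2nd highest."""
    s = sorted(values)
    low, high = s[1], s[-2]
    return [min(max(v, low), high) for v in values]


def iqr_outliers(values, factor=3.0):
    """Return values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # default method="exclusive"
    iqr = q3 - q1
    return [v for v in values if v < q1 - factor * iqr or v > q3 + factor * iqr]


# Case 1: no outliers, but Winsorizing still changes both extremes.
d1 = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7]
print(iqr_outliers(d1))           # []
print(first_order_winsorize(d1))  # [3, 3, 3, 4, 4, 5, 5, 6, 6, 6]

# Case 2: two high outliers, but first-order Winsorizing replaces only the largest.
d2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 40, 50]
print(iqr_outliers(d2))           # [40, 50]
print(first_order_winsorize(d2))  # [2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 40, 40]
```

In the second case, 40 is still an outlier after first-order Winsorizing. This is the "tricky" situation described above.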
@pooyax61 Hi! Do you have any papers that I can cite for this? I would like to use this method :)
Do you have a reference for the Winsorize +1 method described in this video?
Hi Sarah, although I cannot give you a reference or publication for this method, I can confirm that I studied exactly the same method for winsorizing (the next highest/lowest data point value +/-1) in my statistics course in Hungary.
@dianamintal8480 That's reassuring :D I came up with this method on my own, and it's good to see that it is actually used.
Is it possible to highlight extreme outliers (when there are too many of them) in the SPSS Data View so that I can remove them?
Really helpful video. I'm just wondering how you would report any winsorizing that you have done to the dataset?
How do you winsorize at the 1st and 99th percentiles?
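One way to winsorize at the 1st and 99th percentiles is sketched below in Python. It assumes percentiles are interpolated linearly (`statistics.quantiles` with `method="inclusive"`). Other packages use different percentile definitions, so the cut-offs can differ slightly. The function name and the data are hypothetical. SciPy's `scipy.stats.mstats.winsorize(a, limits=(0.01, 0.01))` is a rank-based alternative that replaces the lowest and highest 1% of observations rather than clipping at interpolated cut-offs.

```python
from statistics import quantiles


def winsorize_percentiles(values, lower=1, upper=99):
    """Clip values to the lower/upper percentiles (linear interpolation).

    quantiles(..., n=100) returns the 99 cut points P1..P99, so P_k is cuts[k - 1].
    Requires at least two values.
    """
    cuts = quantiles(values, n=100, method="inclusive")
    lo, hi = cuts[lower - 1], cuts[upper - 1]
    return [min(max(v, lo), hi) for v in values]


# Hypothetical data: 100 values with one extreme value at each end.
data = [-500] + list(range(2, 100)) + [1000]
w = winsorize_percentiles(data)
print(w[0], w[-1])
```

Here the cut-offs come out at about -3.02 and 108.01, so -500 and 1000 are pulled in and every other value is unchanged. With linear interpolation, the extreme values themselves still nudge the cut-offs.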
Really clear and informative - thanks
Thanks a lot. However, the link in the book doesn't open this video. Perhaps you need to edit the link in the PDF.
The links in my how2statsbook are hosted by my how2statsbook channel: th-cam.com/channels/low2dWW7y1uO4kK4SxwsjA.html
Very interesting video! It was very useful!! Can you give me the page number of the text shown in the video, and the references for Dixon (1960) and Dixon (1957)? I would like to cite them. Thanks a lot!!
Thank you so much! Now I just need to find a reference for Winsorize +1.
But why is it better to add +1 instead of just replacing it with the next highest value?
I measured the time each participant took to respond to 24 items; however, only the times of correct responses were considered, which leaves a different number of response times for each participant. Some participants took longer than one would expect to answer, which appeared to be related to how difficult the task was for them. Thus, I do not want to eliminate the outliers (I computed z-scores and flagged results more than 3 SD above the mean as outliers). I was thinking about winsorizing the outliers, but I have several doubts about how to do it.
1. Does one always have to winsorize both extreme ends? In my case, only results above the mean were considered outliers.
2. How does one winsorize in a uniform way for all participants when they have different numbers of response times to be analysed? Whilst one participant may have given 20 correct responses, another may have answered only 13 or 7 questions correctly. How can I specify that I will winsorize only 10% of the data, for example, when the number of responses differs so much? In the end, all I want from this task is each participant's mean response time.
Thank you.
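Here is a minimal Python sketch of one possible approach to both questions, under stated assumptions. Only the upper tail is winsorized, the 10% is applied within each participant, and the number of values to replace is rounded down (k = floor(0.10 × n)). With 20, 13 and 7 responses that gives k = 2, 1 and 0, so the participant with 7 responses keeps their slow response untouched. Rounding to the nearest integer or forcing k ≥ 1 are equally defensible choices that would need to be reported. The function name and the response times are hypothetical.

```python
import math
from statistics import mean


def upper_winsorized_mean(times, proportion=0.10):
    """Winsorize only the upper tail of one participant's response times, then average.

    k = floor(proportion * n) of the largest values are replaced by the
    (k+1)-th largest value; the lower tail is left untouched.
    """
    k = math.floor(proportion * len(times))
    if k == 0:
        return mean(times)  # too few responses: nothing is winsorized
    cap = sorted(times)[-k - 1]
    return mean(min(t, cap) for t in times)


# Hypothetical response times (ms) for participants with 20, 13 and 7 correct answers.
rts = {
    "p01": [500] * 18 + [900, 3000],
    "p02": [400] * 12 + [2000],
    "p03": [600] * 6 + [2500],
}
for pid, times in rts.items():
    k = math.floor(0.10 * len(times))
    print(pid, len(times), k, upper_winsorized_mean(times))
```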
Really helpful, thank you.
I have a question: after some digging, it appears that Winsorization should be applied symmetrically to both ends. So what happens if I only have one outlier at one end but not the other? Would it still be a valid approach to modify a score at only one end?
I have never seen it suggested in a peer-reviewed publication to Winsorize on both sides of the distribution. If you have one or more outliers on only one side, then deal only with those outliers on that one side. That's my recommendation.
There are other techniques you can use in that case besides winsorization.
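To see what is at stake in the symmetric-versus-one-sided question, here is a small Python sketch. It uses first-order Winsorization (replace the single most extreme value with its neighbour) on hypothetical data with one high outlier. The `tails` parameter is a made-up name for illustration.

```python
from statistics import mean


def first_order_winsorize(values, tails="both"):
    """Replace the most extreme value(s) with the next value inward.

    tails: "both" (symmetric), "upper" or "lower" (one-sided).
    """
    s = sorted(values)
    low = s[1] if tails in ("both", "lower") else s[0]
    high = s[-2] if tails in ("both", "upper") else s[-1]
    return [min(max(v, low), high) for v in values]


scores = [10, 12, 13, 14, 15, 16, 60]  # hypothetical; only 60 is extreme
print(mean(scores))                                         # 20
print(mean(first_order_winsorize(scores, tails="both")))   # 14
print(mean(first_order_winsorize(scores, tails="upper")))  # 96/7, about 13.71
```

The symmetric version also raises the lowest score (10 → 12) even though nothing is unusual at that end. The one-sided version changes only the value that was actually extreme.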