How do you create a new data.frame from the original data set with outliers removed from all the columns, where an outlier is defined as a value more than 2 SD away from the column mean?
I would first get the mean and SD of the variable in question and save them to R objects. Then use the filter function as demoed at the 9-minute mark, but modify your logical evaluation to identify cases that are more than 2 SD from the mean, like below:
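library(dplyr)  # filter() is assumed here to be dplyr's filter
Mv1 = mean(data$var1)
SDv1 = sd(data$var1)
# TRUE for cases more than 2 SD above or below the mean
data$outlier = data$var1 > (Mv1+2*SDv1) | data$var1 < (Mv1-2*SDv1)
Cleandata = filter(data, outlier != TRUE)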
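Since the question asks about all the columns rather than a single variable, here is a rough sketch of how the same 2 SD rule could be applied to every numeric column in base R. The data frame `dat`, its columns, and the helper `is_outlier` are made-up names for illustration only, and dropping a row when any one column is flagged is an assumption about what "remove outliers from all the columns" means.

```r
# Hypothetical example data; `dat`, `var1`, and `var2` are made-up names.
dat <- data.frame(
  var1 = c(10, 11, 9, 10, 12, 10, 11, 9, 10, 50),
  var2 = c(5, 6, 5, 4, 5, 6, 5, 4, 5, 5)
)

# Hypothetical helper: TRUE where a value lies more than 2 SD
# from the mean of its own column.
is_outlier <- function(x) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  x > (m + 2 * s) | x < (m - 2 * s)
}

# Apply the check to every numeric column (one logical column per variable).
num_cols <- sapply(dat, is.numeric)
flags <- sapply(dat[num_cols], is_outlier)

# Assumption: a row is dropped if ANY of its values is flagged.
clean_dat <- dat[rowSums(flags, na.rm = TRUE) == 0, , drop = FALSE]
```

Note that every extra column is another chance for a row to be flagged, so this removes more cases than checking one variable at a time.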
This way we will lose important data as well. What would be the optimal way?
The optimal way might be to use the pipe operator. I’m sure everyone has their personal favorite. However, this method and other best research practices (which address this concern) are not covered in this video. It is tailored to undergraduate psychology students who are just using R for the first time and have no coding experience. I wanted to keep it as simple as possible.
Nevertheless, it might be prudent to make and link to a second video for upper-level students covering this technique and other research practices. Thank you for the idea and the comment.
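For anyone curious what the pipe-operator version might look like, here is a minimal sketch. It assumes the dplyr package is installed (it provides both `filter()` and the `%>%` pipe), and the data frame `dat` and its column `var1` are made-up names for illustration. It applies the same 2 SD rule as the code earlier in this thread, just without saving the mean, the SD, or an outlier column as separate steps.

```r
library(dplyr)

# Hypothetical example data; `dat` and `var1` are made-up names.
dat <- data.frame(var1 = c(10, 11, 9, 10, 12, 10, 11, 9, 10, 50))

# Keep only the cases within 2 SD of the mean of var1.
clean_dat <- dat %>%
  filter(abs(var1 - mean(var1)) <= 2 * sd(var1))
```

This keeps the same cases as the step-by-step version; the pipe only changes how the code reads, not which rows are removed.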