These videos are tremendously helpful, and well-produced. Thank you very much Drs. Husson and Houee-Bigot. Please keep up the good work!
You can read this paper:
Husson, F. & Josse, J. (2014). Multiple Correspondence Analysis. In The Visualization and Verbalization of Data, Greenacre & Blasius (eds.), Chapman & Hall.
Hello, how can we change the method from the indicator matrix to the Burt table or to joint correspondence analysis? Thanks in advance.
Thank you very much for the video. It was really illuminating and helped me carry out some analyses I need for a university assignment.
Thank you very much for the video, it's quite useful! One question: in FactoMineR, does the variance correspond to the inertia? If not, is there a formula to obtain it?
Yes, the inertia is the variance. We say inertia because it is multidimensional.
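To see this on an example, here is a minimal sketch that assumes the `tea` data set shipped with FactoMineR (it may not be the data used in the video). The `eig` element of the result holds one row per dimension, and the sum of the eigenvalues is the total inertia.

```r
library(FactoMineR)

# Tea survey shipped with FactoMineR: columns 1-18 are the active categorical
# variables, 19 (age) is quantitative, 20-36 are supplementary categorical.
data(tea)
res <- MCA(tea, quanti.sup = 19, quali.sup = 20:36, graph = FALSE)

# One row per dimension: eigenvalue (inertia carried by the dimension),
# percentage of variance, cumulative percentage of variance.
head(res$eig)

# Total inertia is the sum of the eigenvalues.
total_inertia <- sum(res$eig[, 1])
total_inertia
```

The second column of `res$eig` is each eigenvalue divided by this total, expressed as a percentage.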
Excellent video! Thank you very much!
Hello! Thank you for the video. If we wish to do rotation, can we use var$coord?
If I use MCA for feature selection (variable selection), how can I tell from the results above (i.e. the MCA summary) which variables are good enough to use as clustering input? Please answer. Thanks.
Thank you so much for this very helpful video. I have been trying to include survey weights through the row.w option. Can you please give me some advice or an example of how to do that? I have tried to refer to my "weight" column in different ways, but I always get the error that length of 'dimnames' is not equal to array extent. My weights are between ~0.9 and ~1.4. Converting them to integers turns them into 0 and 1, which doesn't help either. Any hint is much appreciated.
Hi,
The argument row.w must be a vector of weights, one per individual. For instance, if you have 150 individuals in your data set and you give a weight of 1 to the first 100 and a weight of 0.5 to the last 50, use:
MCA(MyData, row.w=c(rep(1,100), rep(0.5, 50)))
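When the weights sit in a column of the data frame, one way to follow the advice above is to extract that column as a numeric vector and drop it from the data passed to MCA. The sketch below is an assumption-laden illustration: the column name `weight` is hypothetical, the weights are simulated, and the `tea` data from FactoMineR stands in for the real survey. Leaving the weight column inside the data is only one possible cause of the error reported above.

```r
library(FactoMineR)
data(tea)

# Hypothetical setup: pretend the survey weights are stored in a column
# called "weight" (simulated here, values between 0.9 and 1.4).
set.seed(1)
MyData <- tea[, 1:18]
MyData$weight <- runif(nrow(MyData), min = 0.9, max = 1.4)

# Pull the weights out as a plain numeric vector, one value per row ...
w <- MyData$weight
# ... and drop the weight column so it is not treated as a variable.
active <- MyData[, names(MyData) != "weight"]

res_w <- MCA(active, row.w = w, graph = FALSE)
head(res_w$eig)
```

The weights do not need to be integers; they only need to have the same length as the number of rows.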
@HussonFrancois Thank you so much for your quick reply and help!
Hi, what can we use for factor/dimension scores for individuals?
You should use the object
$ind$coord
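As a small illustration (again assuming the `tea` data from FactoMineR rather than the data in the video), the scores can be pulled out of the result like this:

```r
library(FactoMineR)
data(tea)
res <- MCA(tea, quanti.sup = 19, quali.sup = 20:36, graph = FALSE)

# Matrix with one row per individual and one column per kept dimension
# (the number of columns is set by the ncp argument, 5 by default).
scores <- res$ind$coord
dim(scores)
head(scores)
```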
Thank you very much. However, I still have another question:
When I take the mean of each of the five dimensions, each one equals zero. How can I use the coordinates in a regression if all the means are zero? Is there a paper you are aware of that I can read to understand how to interpret the coordinates and whether it is appropriate to use them in further analysis?
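A note on the zero means, offered as a sketch rather than as the author's answer: with uniform row weights the individual coordinates are centered, so each column averages to zero by construction. Centered predictors are not a problem for regression; the intercept simply becomes the mean of the response. The example assumes the `tea` data from FactoMineR, and the choice of `age` as response is purely illustrative.

```r
library(FactoMineR)
data(tea)
res <- MCA(tea, quanti.sup = 19, quali.sup = 20:36, graph = FALSE)

scores <- as.data.frame(res$ind$coord[, 1:2])
names(scores) <- c("dim1", "dim2")

# With uniform row weights the coordinates are centered: means are ~0.
colMeans(scores)

# Zero-mean predictors are fine for regression; the intercept then equals
# the mean of the response. Here age (a supplementary variable) is the response.
fit <- lm(tea$age ~ dim1 + dim2, data = scores)
coef(fit)
mean(tea$age)
```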
Great video! Can you please tell me what you mean by "supplementary category" in your video?
How can I color and label points individually in the MCA plot?
Supplementary variables are variables that are not used to construct the dimensions of MCA, but that are used to interpret these dimensions.
@HussonFrancois So they do not affect the plot at all? Thanks.
Dear Francois, very useful video! Thanks! I have a question: my sample consists of 232 farmers described by 45 variables. Three of those variables are not important for the clustering I will run after the MCA, but they matter in terms of location and type of producer. For example, all 232 individuals are categorised by municipality, by the agroecological zone where the municipality is located, and by the nature of the farmer: whether he or she is a single farmer or an agricultural enterprise. How can I exclude those three variables from the MCA? Specifically, how do I state in the MCA call you are showing that I do not want these variables to be used to build the principal dimensions?
You can use these variables as supplementary variables, using the quali.sup argument.
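A minimal sketch of that call, assuming the `tea` data from FactoMineR; the three variable names below are picked only for illustration and stand in for the municipality, zone, and farmer-type columns. Looking the columns up by name avoids counting positions by hand.

```r
library(FactoMineR)
data(tea)

# Hypothetical choice: treat these three descriptive variables as supplementary.
sup_names <- c("sex", "SPC", "age_Q")
sup_idx <- which(names(tea) %in% sup_names)

# Keep the 18 active variables plus the three supplementary ones.
dat <- tea[, c(1:18, sup_idx)]
res <- MCA(dat, quali.sup = which(names(dat) %in% sup_names), graph = FALSE)

# Supplementary categories get coordinates on the dimensions,
# but they are not used to build them.
head(res$quali.sup$coord)
```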
Thanks a lot! :)
Hi! Thank you for the video and the package! Is it possible to analyze both continuous and categorical data with MCA?
Thank you in advance!
In MCA, all the variables that construct the dimensions are categorical, but continuous variables can be used as supplementary variables (they do not participate in the construction, but can be used for interpretation; see the variable age in the video). If you want both continuous and categorical variables to construct the dimensions, you need the FAMD method (Factor Analysis of Mixed Data; function FAMD in FactoMineR).
You can have continuous data as supplementary variables, so they do not participate in the construction of the dimensions. If you want the continuous variables to contribute to the construction of the dimensions, you should use FAMD: factor analysis of mixed data.
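For reference, a short FAMD sketch. It assumes the `wine` data set shipped with FactoMineR and an arbitrary subset of its columns (two categorical, six quantitative); none of this comes from the video.

```r
library(FactoMineR)
data(wine)

# Columns 1-2 are categorical (Label, Soil); the other selected columns
# are quantitative sensory scores.
mixed <- wine[, c(1, 2, 16, 22, 29, 28, 30, 31)]
str(mixed)

# Both kinds of variables are active and contribute to the dimensions.
res_famd <- FAMD(mixed, graph = FALSE)
head(res_famd$eig)
head(res_famd$ind$coord)
```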
Hello, where can I find the data set? Thanks
All the data sets and course materials are here: husson.github.io/MOOC.html#AnaDo
Great, but where can I get the script?
All the material is here: husson.github.io/MOOC.html#MCAcourse
Hi,
I am getting this error:
Error in gene[gene >= 6] = 6] - 1 :
NAs are not allowed in subscripted assignments
Please help me with this, as I could not find anything related to it on the internet.
Can you tell me more about your error? What are your lines of code? Can you check that your data are read the way you want (that the quantitative and qualitative variables are the ones you expect; check with summary)?
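The checks suggested above could look like the following base R sketch. The data frame is a hypothetical toy example, and the column names are invented; it only shows how to see how each column was read, where the missing values are, and how to turn categorical columns into factors.

```r
# Hypothetical toy data standing in for the real file.
df <- data.frame(
  region = c("north", "south", NA, "south"),
  fert   = c("yes", "no", "yes", "yes"),
  age    = c(34, 51, 47, NA),
  stringsAsFactors = FALSE
)

summary(df)          # how each column was read
sapply(df, class)    # character columns are not yet factors
colSums(is.na(df))   # number of missing values per column

# Convert the categorical columns to factors before calling MCA.
cat_cols <- c("region", "fert")
df[cat_cols] <- lapply(df[cat_cols], factor)
sapply(df, class)
```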
Where can I find the script file?
this video is very useful!
All the material is here: husson.github.io/MOOC.html#AnaDOGB
Hi Francois! When I run the MCA, I obtain a lot of dimensions, more than 30. I have already eliminated some variables that may have caused noise (I checked their correlation with the dimensions) and ran it again, and even though the number of dimensions has decreased, I still get about 60 dimensions. What should I do in this case? Maybe I should not be using MCA? And how do you compute the correlation between categorical variables in R?
The number of dimensions is equal to the total number of categories minus the number of variables. So if you have many categorical variables with many categories, there will be a lot of dimensions. But MCA allows you to summarize the information on the first dimensions, and this is why it is useful.
When you explore a dataset, you are interested in the whole dataset, so you have to keep all the variables as active. Then you interpret the first dimensions. So don't remove variables just to increase the percentage of variance explained by the first dimensions.
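The counting rule above can be checked on an example. This sketch assumes the `tea` data from FactoMineR with its first 18 columns as active variables; `K` and `J` are names introduced here for the number of categories and the number of variables.

```r
library(FactoMineR)
data(tea)
active <- tea[, 1:18]

K <- sum(sapply(active, nlevels))  # total number of categories
J <- ncol(active)                  # number of active variables
K - J                              # expected number of dimensions

res <- MCA(active, graph = FALSE)
nrow(res$eig)                      # number of dimensions reported by MCA

# Only the first few dimensions are usually interpreted.
round(res$eig[1:5, ], 2)
```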
Francois, thanks for your answer :).
Still, I have a question. Normally, we decide to run a principal component method after obtaining the correlations between the variables we're examining. If the variables are significantly correlated, PCA or MCA is suggested. So my first question is: how do you check the correlation between categorical variables in R before doing the MCA? I know a priori that some of my variables will not explain much of the variation in the data. Take for example my variable FERT, which records whether or not farmers apply fertilizer to the crop I'm studying: 70% of them said they do apply something. Based on that 70%, would you still include that variable in the analysis? (I have about 6 variables with similar percentages.)
No, the objective of MCA or PCA is to describe a dataset, i.e. the proximities between individuals (taking all the variables into account) as well as the links between the variables. The principal component methods tell you which variables are linked, so you do not have to remove any variables beforehand.
Thank you, Francois!
Hi Francois! I am writing my paper on the analysis I did some months ago. I would like to ask how to use dimdesc to display all the dimensions, with the categories and variables that explain each of them, and not only the three that the program displays by default.
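One option, given as a hedged sketch rather than the author's answer: dimdesc has an `axes` argument that selects which dimensions are described (three by default). The requested axes must have been kept by MCA, so `ncp` has to be at least as large as the highest axis. The example assumes the `tea` data from FactoMineR.

```r
library(FactoMineR)
data(tea)

# Keep 5 dimensions in the MCA output so that all 5 can be described.
res <- MCA(tea, quanti.sup = 19, quali.sup = 20:36, ncp = 5, graph = FALSE)

# Describe dimensions 1 to 5 instead of the default 1 to 3.
dd <- dimdesc(res, axes = 1:5)
names(dd)
dd[["Dim 4"]]
```

To describe more dimensions, raise both `ncp` in the MCA call and the upper bound of `axes`.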
Helpful video, but I couldn't understand what she says. Is she speaking in English?
Yes, with a French accent :)