Here's a fun pet project I've been working on: udreamed.com/. It is a dream analytics app. Here is the YouTube channel where we post a new video almost three times per week: th-cam.com/channels/iujxblFduQz8V4xHjMzyzQ.html Also available on iOS: apps.apple.com/us/app/udreamed/id1054428074 And Android: play.google.com/store/apps/details?id=com.unconsciouscognitioninc.unconsciouscognition&hl=en Check it out! Thanks!
Dear Prof. Gaskin, I have three IVs predicting one DV. One of the IVs did not predict the DV. In my initial manuscript, I mentioned that a possible reason for the null finding is that this particular IV varies from one ethnic group to another (ethnic difference). I did not hypothesize any ethnic influence in my study; I only raised it in the discussion section as a possibility for future research. In my initial write-up, I explained that our ethnic groups are disproportionate in size (some large, some small) and that my research does not focus on ethnic differences, hence I did not include ethnicity in my analysis. The reviewer came back with this comment: - the authors identified ethnic as a potential confounding variable-has the authors taken this into consideration in their analyses (e.g., covariate), especially since data on ethnic composition are available? 1. I am not sure what the reviewer wants, but I am thinking I may need to rerun my analysis? 2. Do I need to add a control variable (ethnicity) to my SmartPLS model based on the reviewer's comment? 3. If so, I am not sure how to do it. I have already created the dummy variables via SmartPLS for the four ethnic groups, and I am not sure how to proceed now. I would appreciate any insights. Thank you.
Yes, just include those dummy variables as control variables. The way to control for the effects of any variable is to draw a regression line from that variable to the other variables that it might affect.
@@Gaskination Dear Prof, I am not sure whether I just need to control for ethnicity by adding the variable to my existing PLS model, or whether I need to test moderation. Another professor gave me this comment: "I think since you mentioned the ethnicity difference and the reviewer has suggested you consider the influence of ethnicity on the relationships, you need to analyse the moderating effect of ethnicity on the three relationships in your model (present the moderating effect of ethnicity on each relationship in the structural model). If there is no effect, then ethnicity is not the reason for the insignificant relationship. If yes, then just add these results to your manuscript to answer the reviewer's suggestion; it would add some value to your manuscript." Shall I just add the control variable (ethnicity), point the regression line to self-esteem, and run the analysis to check whether it is significant, or should I conduct a moderation analysis as he suggested?
Hello Prof. I found an article: "Hair, J. F., Ringle, C. M., Gudergan, S. P., Fischer, A., Nitzl, C., & Menictas, C. (2019). Partial least squares structural equation modeling-based discrete choice modeling: an illustration in modeling retailer choice. Business Research, 12(1), 115-142." In this article, they describe four steps that PLS path model estimation with categorical variables requires: 1. [Model] When creating the PLS path model, use Boolean blocks for each categorical variable, whereby a Boolean variable represents each category. 2. [Data] Use orthonormal data, that have no correlations between the Boolean indicators and that are standardized to unit variance. 3. [Estimation] When data is orthonormal, Mode A and Mode B model estimations include the same results; thus, the standard PLS-SEM algorithm allows the estimation of PLS path models using categorical indicator data and multiple latent variables. 4. [Rescaling] A last step involves the transformation of the estimated inner and outer weights and outer loadings into the metric of the Boolean variables (i.e., the metric of interpretation). Can you show practically how to follow these steps in PLS? Eagerly waiting for your reply.
I have a question regarding the "design" of my formative model in SmartPLS. I have one dependent variable, "employee motivation" (scale from 1=very motivated to 6=not motivated at all), and 12 latent independent variables, which are the "determinants of employee motivation" (every latent variable has 2-4 indicators that measure the LV formatively). -> If I want to include the "moderating effect" of generation (4 dummies = baby boomers, genX, genY, genZ) in the model to evaluate whether generation has a significant effect on employee motivation, do I have to include 3 latent variables, one per dummy (not 4, because one is the reference), just like you did in the video, and analyze whether there is a significant influence via bootstrapping?
For this situation, if your hypothesis is that generation impacts motivation, I would instead recommend doing an ANOVA where the outcome variable is motivation and the grouping (factoring) variable is generation. This will give you more easily interpretable results.
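For anyone who wants to see what the ANOVA suggested above actually computes, here is a minimal sketch in plain Python. The motivation scores and the `one_way_anova_f` helper are made up for illustration (they are not from the video or from any dataset in this thread); in practice you would run this in SPSS or similar and also get a p-value.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)        # number of groups
    n = len(all_scores)    # total number of observations
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical motivation scores (1-6 scale), grouped by generation.
motivation_by_generation = {
    "baby_boomers": [2, 3, 2, 3],
    "gen_x": [3, 3, 4, 2],
    "gen_y": [4, 5, 4, 5],
    "gen_z": [5, 4, 5, 6],
}
f_stat = one_way_anova_f(list(motivation_by_generation.values()))
print(f"F = {f_stat:.2f}")
```

A large F means the group means differ by more than the within-group noise would suggest; the statistic alone does not say which generations differ from each other.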
@@Gaskination Thanks for the answer! The ANOVA works well to see whether there is a general impact of generation on motivation. But what if I want to analyze whether generation affects which determinant of motivation is more important, in the sense of which path coefficient to motivation is bigger? Can I do that in SmartPLS?
@@schluggasishot Ah, in this case then MGA is definitely more appropriate. I would recommend using generation as the multigroup variable, as shown in this video: th-cam.com/video/b3-dyfhGE4s/w-d-xo.html
@@Gaskination Thanks again! But the MGA only works with 2 groups, and I have 4. I think it makes no sense to combine some of them as you did in the video for Low/High-Frequency. Is there a way to analyze it with 4 groups?
Hi @James Gaskin! Thanks for your helpful videos! I'm using AMOS and I have two categorical independent variables (IVs). Each IV has 3 categories. IV1: Platform rating IV1 categories: 1, 3, 5 IV2: Independent rating IV2 categories :1, 3, 5 The data was collected from a survey (experimental vignette), where respondents were exposed to one of 9 possible combinations based on values from IV1 and IV2, for example, IV1 = 3 and IV2 = 5, etc. In AMOS, I have dummy variables as indicators towards the dependent variable (DV). I have k-1 dummy variables in AMOS, so for IV1, I have 2 dummies and IV2 I have 2 dummies, which is a total of 4 dummies. Question: How do I model the interaction of IV1 and IV2 on the DV in AMOS? Can I simply multiply IV1 and IV2 dummies and create indicators for the multiplied terms, or is this more suitable for moderation?
Would it be meaningful to directly use Only Child and Middle Child as observed variables to determine the latent variables Playful and Decision Quality?
That is exactly what I've done in this video. In these cases, we are essentially controlling for the potentially confounding effects of these two dummy variables.
@@lurenz404 oh, in SmartPLS, it would be better to attach them to latent factors because otherwise it would interpret them as indicators of Playful and DQ.
Thank you for the immediate responses. For an interval categorical predictor (IV), e.g., Age (18-28; 29-39; 40-50 ---> coded 0; 1; 3), should we follow these steps?
Hi james, thanks for your videos. I was wondering about what to do if one of my DVs is measured on a single item Likert scale and whether it is possible to include latent variables as control variables.
It is common enough practice to use single-item dependent variables. No special tests or procedures (or citations) needed. Same with latent control variables. That is perfectly normal. Just include them as you would include any independent variable.
Dear James, thanks for the videos! I was wondering, can we combine CO_1, CO_2, CO_3, and CO_4 dummy variables into one single variable to check whether they have an influence on DV? Thanks in advance for your responses!
No. That's the reason we had to break them up. They cannot be used directly in the model as a single factor because the numbers associated with the group values are not meaningful. A 3 is not more than a 2 in this case; it is just different. So, the internal variance is not indicative of construct variance. Thus any "effect" they have on another variable is meaningless, spurious, and uninterpretable.
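To make the point above concrete, here is a small sketch with made-up data: the same three groups are numbered in two different (equally arbitrary) ways, and the correlation with the outcome changes just because the labels changed. The group names, outcome values, and the `pearson` helper are all hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical nominal groups and an outcome score per respondent.
groups = ["A", "A", "B", "B", "C", "C"]
outcome = [1, 2, 5, 6, 3, 4]

# Two arbitrary numberings of the same nominal categories.
coding_1 = {"A": 1, "B": 2, "C": 3}
coding_2 = {"A": 1, "B": 3, "C": 2}

r_coding_1 = pearson([coding_1[g] for g in groups], outcome)
r_coding_2 = pearson([coding_2[g] for g in groups], outcome)
print(f"coding 1: r = {r_coding_1:.3f}")
print(f"coding 2: r = {r_coding_2:.3f}")
```

Nothing about the respondents changed between the two runs, only the numbers assigned to the groups, which is why a single multi-valued nominal column cannot carry an interpretable effect.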
Hi James, I would appreciate it if you could shed light on another question of mine. I have four control variables: gender, income, age, education. Gender is a dummy variable. The others are ordinal (5-7 answer ranges ordered from low to high). Can I simply use the ordinal ones in SmartPLS? I read that ordinal scales cannot be used in SmartPLS, but isn't a Likert scale itself also an ordinal scale? As for the controls, I was told I should just point them to the DV. Hopefully they do not have significant paths and can be reported as controlled. But if one of them has a significant path, what should I do? Or is there a better way, in your view, to handle control variables in SmartPLS in my case?
Ordinal variables should be fine in PLS. I have never been discouraged from using them. Yes, Likert scales are ordinal. As for how to include controls, you are correct that you can just point them to your DV. Keep them as separate factors though (i.e., gender gets a factor, age gets a factor, etc.)
Hi Prof Gaskin, if I have a control variable for age group coded as 1 = 21 below, 2 = 21 to 30, 3 = 31 to 40, 4 = 41 to 50, and 5 = 51 and above, should I dummy code it when I use it as a control for my dependent variable in PLS? I saw that the majority of papers didn't dummy code such variables, but technically they should? Thanks in advance.
@@TheKoaydarren You are correct that technically this is not a proper ordinal variable because the intervals are not equal. However, if you take a loose interpretation of the resulting estimates, then it is okay. If you just say that the estimate is positive or negative and small or large, rather than reading too much into the exact estimate number, then it is fine.
You could, although it would be hard to interpret. Also, usually binary variables are not mediators because a mediator must be a causal consequence of the IV. It is theoretically uncommon (though not impossible) for a binary outcome to both be predicted by something and predict something else.
Hi James, thanks for your videos. I wondered whether the latent variable named Only Child can have only one dummy variable. Can it have all three dummy variables to affect the latent variable Playful?
@@Gaskination Yes, for this example, it makes more sense to keep them separate. In other examples, can a latent variable have two or all of the dummy variables to affect another latent variable?
@@zhiqiangli1082 I recall investigating this a year or two ago and I remember not being able to find conclusive support for that approach. However, I also remember not being able to find conclusive support against it. So, I'm not sure. To play it safe, I have always kept them separate. Sorry to be not much help on this one...
Hi James, please can you answer this question. Is there a way, through the algorithm or bootstrapping, to find the number of observations in the project for each category of the categorical variables included? I am asking because a study I have done includes a variable with three categories; however, in the model I only included two of the dummy variables, and I forgot how I labelled them. I suspect that I combined two variables into one. I need a way to figure this out, not through the dataset but through either the algorithm or bootstrapping. Sorry if my question sounds confusing; I just need to know the n in each category in the project itself, not in the dataset. Thanks a million
If you mean you no longer have access to the dataset, then it won't run in SmartPLS anyway. But, if you have access to the dataset, just open it and do a "sumif" function for the column of interest.
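If a spreadsheet "sumif" is not handy, the same per-category count can be sketched in a few lines of Python. The column values below are made up; in practice you would read the column of interest from your own data file.

```python
from collections import Counter

# Hypothetical values of the categorical column, one entry per respondent.
category_column = ["retail", "service", "retail", "other", "service", "retail"]

# n per category, the same information a per-category sumif/countif gives.
n_per_category = Counter(category_column)

for category, n in sorted(n_per_category.items()):
    print(category, n)
```

Comparing these counts with the number of 1s in each dummy column is one way to work out which categories a dummy was built from.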
Many thanks for the valuable videos. I wonder why I get n/a in sample, standard, statistic and P-value in bootstrapping table? Also, I can see 0 for all the factors. I need your advice, please. Thanks
If you are using PLSconsistent, then switch to PLS. The consistent algorithm is ironically inconsistent. Sometimes it gives n/a for results. If you are not using PLSc, then the issue is with the variables. Make sure to have done proper data preparation to ensure valid data/variables are being used.
Hi James, useful video as always! I was wondering, is there a way to use multinomial categorical scales as indicators for a formative construct which is being predicted by several antecedent factors and in turn predicts a reflective construct? All other constructs are reflective using Likert scale measures.
Multinomial categorical variables should not be used as indicators in a set of indicators. Instead, they should be included separately, as dummy variables with only one indicator each.
Hi James, Thank you so much for the video. I have a question here. My model has 6 latent variables, and I tried to include 1 control variable (company size) with 4 dummy variables. As you mentioned, I left one as a reference out of the model. However, the SmartPLS is unable to produce the result (Sample Mean is shown as N/A, and P-value etc are all blank) after I hit consistent bootstrapping (I have a reflective model). Please can you tell what happened here? And any solution? My original sample size is 96, is it too small for such a model? Many thanks!
This is probably a problem with the PLSc algorithm. SmartPLS still doesn't seem to have it stable. In such a case, just use the original PLS algorithm.
Dear Mr. Gaskin, I have a question regarding categorical predictor variables. In my model I am trying to compare one music format with 2 others. I have 96 observations in total and an even distribution across groups, with i=32 each. How is it possible to compare the treatment group with each control group? My starting IV would be the category, dummy coded: 1=vinyl record (with i=32), 0=CD (i=32). I would apply the same method to compare the vinyl record (1) with MP3 (0). Is that a suitable approach to compare them? I cannot find any paper on this matter. Kind regards, Benjamin Beiersdorf
Many thanks for the great video and clear explanations! Really helped me with my thesis. One quick question: Would it make sense to use PLS to test a model which basically consists of 1 categorical predictor for 3-5 metric endogenous variables with these 3-5 variables being predictors for 1 metric outcome variable? Or do you recommend a different approach? Thank you very much. Stay safe and healthy!
Hi, Dr. James! Thank you for the explanation. But I am still confused about how to find out whether sociodemographic variables (such as 4 age groups, 3 income groups, and 4 household-size groups) have a significant effect in the model. Can I include all of the variables in the model simultaneously? Or do I have to add them one by one (first add the 4 age groups and then take them out, then add the 3 income groups)? Thank you!
@@Gaskination Thank you for your response! I have another question. If none of the sociodemographic variables are significant, should I keep them in my model? Keeping them affects the significance of the other paths, so the significance of the other paths will differ between the model with the sociodemographic variables and the model without them. Thank you in advance.
@@sharfinazatadini3398 It is optional. You can definitely argue that it is okay to exclude them because they do not have any significant effect on the outcome variable. You can say that you exclude them for the sake of parsimony.
Thanks a lot for the explanation! Just one more question: if sociodemographic variables do not have a significant effect on the outcome variable, is there any chance that they would have an effect when analyzed using MGA? Thank you!
@@sharfinazatadini3398 It is possible. The regression as a control variable implies a direct impact on a specific variable, whereas the MGA implies a direct effect on the relationship between two variables.
Dear Gaskin, greetings. I have a query regarding the interpretation of categorical variables. In my study, I found similar beta values (male, female) predicting the endogenous construct. How should I interpret them (separately for male and female), given that the results are significant? Thanks
Hello Scholars, I am working on a research project using SmartPLS for the first time. I have several control variables, such as gender (dummy variables for Male and Female) and working experience (dummy variables based on length of service). I would like to find out how to report these control variables, as I have not found any reference so far in which control variables were used with the PLS approach in SmartPLS. I would welcome guidance on how to report control variables, or any reference.
Thank you so much for this helpful video! Is there any chance to add control variables to the model in this way? I have 2 control variables, each with 4-5 options for respondents to choose from. When I add all of them as dummy variables to the model, bootstrapping seems to give a singular matrix error. 🤔 And when I add each option separately and run the model, the results (t-values and betas) are different; I don't know which ones to put in my tables using this method.
Dear James, thank you for the video. Can we include an independent latent variable that has two binary items (1/0)? That is, is it OK to have a latent variable with binary items? I am using WarpPLS. Many thanks
If those two items should theoretically move together (be highly correlated), then it is okay. However, if they are independent of each other, I would strongly recommend keeping them as separate predictors.
Hi Dr., hope you are doing fine. Awesome video. But I have a question, if you don't mind. When we run the structural model for hypothesis testing in a thesis, does a categorical moderator need to be included in the model together with the direct and mediator variables? Or should it be excluded at first, meaning we run and report the direct and mediation model first, and then include the categorical moderator when testing the moderation hypothesis? Thanks in advance. I really appreciate your expertise and response.
You can exclude the categorical moderator (used for multigroup analysis I assume) when testing mediation. The only reason to include it at the same time is if you are hypothesizing moderated mediation.
Hi Dr. James, thanks for the videos. I would like to ask about categorical variables in the PLS model. Why are the outer loading, AVE, and composite reliability all equal to 1 for every categorical variable? Are they not calculated by the PLS method? Thanks in advance for your help
A factor with a single indicator has perfect reliability (the item with itself) and is fully predicted by its only item (hence the loading/weight = 1).
@@Gaskination Oh, I see, thanks for the insight. I would like to ask more about the interpretation of categorical variables. Suppose all of the "child order" dummies, with "only child" as the reference category, are connected to just one latent variable, for example "decision quality", and after bootstrapping "first child" and "last child" are not significant (only "middle child" is significant). Should we remove the non-significant dummy variables from the model, or keep them? If we keep them, how do we interpret the non-significant ones? Thanks in advance :)
Dear Dr. Gaskin, is it OK if I just create separate sets of categorized data in Excel and create separate models in SmartPLS (for example, one model for the 'only child' set and another for the 'middle child' set) just to see the results for each set?
Thank you so much, Dr. Gaskin, for all your tutorials; they were a great help. Just to make sure: I have one DV with a single categorical indicator, which is the type of new venture creation the respondent plans to launch in the future (independent entity, joint venture, acquiring expertise, expansion, or none, in case he/she already has one and doesn't plan to start another). Can I use dummy variables based on the type? Thank you
@@Gaskination Thank you so much, Dr. Gaskin. If you don't mind my asking, is it normal to have a low composite reliability (0.127) and AVE (.324) after converting the DV to dummy variables?
@@Gaskination I am really sorry for asking again, Dr. Gaskin, but I have a new issue related to the dummy variable. I am getting a negative path coefficient between the LV and the DV, where the LV is likelihood to start a business and the DV is a categorical variable, the type of business. The path coefficient should be positive, but I got a coefficient of -.359 and a p-value of .085 (>.05). How is that possible?
Dear Dr. Gaskin, can I first test all my categorical variables for significance on my dependent variable? Then, only if a categorical variable is significant, would I proceed with the steps shown in the video to find out which of its response categories has an impact on the dependent variable?
Thank you, Dr. Gaskin. I was advised that I can proceed with the steps shown in this video, but test the dummies against the DV one by one to see directly which category of the categorical variable is significant. If none of a categorical variable's categories are significant, then that categorical variable is considered not significant for the DV. Can I do that?
Hi Mr. Gaskin, thank you so much for the video. I am really new to SmartPLS and your video really helped me. I have a question: I have 1 independent variable which is categorical (type of employment, 3 types), and I turned it into dummy variables. Should I make 2 dummy variables or 3? Also, do I need to test the factor loading/Cronbach's alpha for this IV? There is only one measure (which type of employment you have), so my factor loading is currently 1. Therefore I am wondering whether I only need to test factor loadings and Cronbach's alpha for the mediator and dependent variable. Thank you, I'm hoping for your reply.
Two dummy variables (one value is the reference category). Cronbach alpha is only relevant to reflective factors (which don't include nominal/categorical variables).
@@Gaskination Thank you for the quick reply, Sir. So when drawing the SmartPLS model, I only need to put the two dummy variables as the IV? Is it possible to connect this IV to a mediator and then to the DV? If I use this model, is it the same as multi-group analysis? Thank you very much, I’m waiting for your reply.
@@raneedevina2369 You can connect through a mediator. This is not the same as multigroup analysis. Multigroup analysis compares paths across groups. In this case, you could use your categorical IV as a grouping variable and then just examine the path between the mediator and the DV. If you do this, then make sure to exclude the categorical variable as an IV (i.e., don't use it as IV and grouping variable).
Dear Dr. Gaskin, thank you for the useful video. I have a sample of 307. My model has 8 latent variables. In addition, I have 11 categorical control variables, each with 2-7 questions. When I transformed each control variable into dummy variables, a Singular Matrix Problem occurred in SmartPLS 3. Unfortunately, I don't know which variable is causing the trouble. Could you give me any advice on solving this problem?
It is because you created dummies for all the values. Instead you have to leave one value out as the "reference" value. For example, if you had a variable for gender where 0=male and 1=female, you would create one dummy, just for female. For industry, if you have retail, service, manufacturing, and other, you would create three dummy variables, and then leave "other" out as the reference value.
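The "leave one value out" rule above can be sketched as a tiny helper. This is only an illustration in plain Python: `dummy_code` and the `is_...` column names are hypothetical, and the industry values are the ones from the example above. You would normally do this step in SPSS, Excel, or whatever tool prepares your dataset.

```python
def dummy_code(values, reference):
    """Build k-1 dummy columns (0/1), leaving `reference` out as the baseline."""
    categories = set(values)
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not found in data")
    return {
        f"is_{category}": [1 if v == category else 0 for v in values]
        for category in sorted(categories - {reference})
    }

# Hypothetical industry column, one entry per respondent.
industry = ["retail", "service", "manufacturing", "other", "retail", "other"]

# Four categories -> three dummies; "other" is the reference value.
industry_dummies = dummy_code(industry, reference="other")

for name, column in industry_dummies.items():
    print(name, column)
```

Respondents in the reference category are the rows where every dummy is 0, so no separate column is needed for them.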
Hi Sir! I want to ask whether it is possible to evaluate the effect of latent variables on a categorical variable. To be more specific, I mean how latent variables affect the respondent's yes or no answer. Is this something that is appropriate to do with PLS SEM? And if it is, is this something that can be done through CB SEM? Thank you in advance!
It can be done. The resulting estimate is interpreted a bit differently though. If positive, then more likely Yes (assuming Yes=1), and if negative, then more likely No (assuming No=0).
Hi Mr. Gaskin, thank you so much for the great video. I am new to SmartPLS and your videos really helped me. I have a question regarding categorical variables. In part of my model I have two independent variables (IV1 & IV2), both influencing a dependent latent variable (DV). IV1 has 3 levels (categories) and IV2 has 4 levels (categories). Would you please explain how I should draw them in SmartPLS? [For example, for IV1, should I leave out category 1 as the reference category, draw two latent variables (factors) for category 2 and category 3, put the values for category 2 on its latent variable and the values for category 3 on its latent variable, and finally draw paths from the factors for categories 2 and 3 to the DV?] And one more question: if the above process is right, can I do this simultaneously for all categories, or should I run the model for each category separately? And if the process I described is not right, I would much appreciate it if you could explain and guide me. Thanks.
If you mean that the IVs have lower order latent dimensions (dimensions with their own measures), then the IV sounds like a higher order factor. In this case, follow the video here: th-cam.com/video/LRND-H-hQQw/w-d-xo.html If instead you mean that these are individual variables (single measures) with multiple values (e.g., Country as the IV and USA, Japan, India as the values), then follow the video above. If something else, please clarify.
Dear @@Gaskination thank you for your response. The IVs are similar to your second example, something like the example described in the video above (child order). My question is, in the case of this video for example, what should I do if I want to see the effect of the different categories of "child order" on "decision quality"? Is the following approach right? First I leave out one category (e.g., only child) as the reference category, then make three variables (in the way you explained in the video above) for the other three categories and draw a path from each one to "decision quality". If this is right, can I do it simultaneously for the different categories of "child order", or should I run the model three times, each time for just one category?
@@Gaskination Thanks again for your response, Dr. Gaskin. It really helped me. Just one more question: would you please give me an example of how to interpret the results? Are path coefficients relative to the reference category? And how can we understand the effect of the reference category on the dependent variable? (Sorry if my questions are trivial, I'm very new to this field)
@@aminnaeeni4297 correct, they are relative to the reference category. So, if you see a positive effect, then it is a stronger effect than the reference category. If it is a negative effect, then it is a weaker effect than the reference category. This is also how you understand the effect of the reference category (relative to all other categories' effects).
Hi James, I created a path model in SmartPLS 3 (all reflective measures). With traditional bootstrapping, I find that my data support 6 hypotheses (P values significant), but with consistent bootstrapping, none of them are supported (all P values > 0.8). How is that possible? Thank you very much for helping me.
The PLSc algorithm still has some bugs. If it is producing erroneous results, I recommend sticking with the PLS algorithm, or (better yet) using a covariance-based software, such as AMOS when you have no formative factors.
Dear Dr. Gaskin, Thank you very much for this video. In my model, there is one independent variable, two mediators and one dependent variable. Also, there are two moderators between IV and mediators. Can I use dummy variable as independent variable? I am using PLS 2. Thanks.
Thanks for your reply. In that case how do I interpret the result? My independent variable has two options (yes/no). So, what will the individual value of path coefficient represent between IV and mediators?
You would interpret it more like a t-test. If the binary variable is coded as 0=No and 1=Yes, then a positive beta would be associated with Yes, whereas a negative beta would be associated with No.
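To see why this reads like a t-test, here is a rough sketch with made-up numbers: for a 0/1 predictor, the unstandardized regression slope is just the difference between the Yes-group mean and the No-group mean. The data and the `ols_slope` helper are hypothetical, and SmartPLS reports standardized coefficients, so the magnitude there will differ even though the sign reads the same way.

```python
from statistics import mean

def ols_slope(x, y):
    """Unstandardized slope from a simple regression of y on x."""
    mx, my = mean(x), mean(y)
    numerator = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denominator = sum((a - mx) ** 2 for a in x)
    return numerator / denominator

# Hypothetical data: binary IV (0 = No, 1 = Yes) and a mediator score.
answered_yes = [0, 0, 0, 1, 1, 1]
mediator = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

slope = ols_slope(answered_yes, mediator)

yes_scores = [m for a, m in zip(answered_yes, mediator) if a == 1]
no_scores = [m for a, m in zip(answered_yes, mediator) if a == 0]
mean_difference = mean(yes_scores) - mean(no_scores)

print(slope, mean_difference)
```

A positive slope therefore means the Yes group scores higher on the outcome, and a negative slope means the No group scores higher.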
Dr. Gaskin, which version of SPSS are you using in this video? In my SPSS I don't get this option, i.e., create dummy variables. Can you tell me where I can get this option? Thanks in advance
Awesome video, thank you. I learn easily from your videos for my analysis. But I have 2 questions, if you don't mind. 1. Can a (categorical) moderator only be tested if the X to Y path is significant? When I ran the moderator, the X to Y path was not significant in the direct path analysis, but once the moderator was introduced, it affected the relationship and made it significant. If that is acceptable, do you have a good reference I can cite? I am unable to find one. 2. I can run 2 different categorical moderators (Gender & Diet) at one time, am I right? I don't have to build 2 different models and run them one by one. Have I understood it right? Looking forward to your response. Anyway, thanks a lot for making easy tutorials and free education. May God bless you
1. This is a great situation as it indicates that the X has no effect on Y without considering the moderation effect. No reference needed. This is perfectly normal. 2. In SmartPLS, it will require you to test only two groups at a time (if doing MGA). You can use the same model and have all moderators loaded, but it will require you to select which two groups you're comparing.
Prof., please answer: Q1. I don't understand how to find significant differences between the groups on the individual items CSR1 to CSR6 (in case they are formative items). Please suggest how to do it. Q2. Also, do I need to do MICOM (measurement invariance) before doing this? During the second MICOM step, some of the items in my constructs may be insignificant, and I would have to remove them.
Dear Mr. Gaskin, thank you very much for your videos; I find them incredibly helpful. Maybe you are able to help me with some trouble I had calculating my model. I created three dummy variables, but when I run the analysis, the program tells me I have a singular matrix problem. This doesn't happen when I only include two of the dummies. Do you have an idea why this happens? Thanks in advance for your help!
This is expected. You are only supposed to include n-1 categories as dummy variables. This leaves the excluded category as the "reference" category. Usually the "other" category is left as the reference category. If you don't have an "other" category, then just pick the one you are least theoretically interested in. When it is binary, this is a lot easier. E.g., if you have a gender dummy variable, you might say 1=Female and 0=Not Female (i.e., Male), but we leave Male out of the model as the reference category.
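Here is a rough sketch of why the singular matrix shows up, using made-up data. With all k dummies included, every row's dummies add up to exactly 1, so the columns are perfectly redundant with a constant term; dropping one dummy removes the redundancy. The `matrix_rank` helper is a hand-rolled illustration in plain Python, not what SmartPLS runs internally.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a small matrix via exact Gauss-Jordan elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Hypothetical three-category variable, two respondents per category.
categories = ["a", "b", "c", "a", "b", "c"]

# Columns: constant, dummy_a, dummy_b, dummy_c (all three dummies included).
all_dummies = [[1, int(c == "a"), int(c == "b"), int(c == "c")] for c in categories]
# Columns: constant, dummy_b, dummy_c ("a" left out as the reference).
k_minus_1 = [[1, int(c == "b"), int(c == "c")] for c in categories]

print(matrix_rank(all_dummies), "independent columns out of", len(all_dummies[0]))
print(matrix_rank(k_minus_1), "independent columns out of", len(k_minus_1[0]))
```

When the rank is lower than the number of columns, the matrix cannot be inverted, which is the situation the singular matrix warning is reporting.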
SmartPLS does not have an anova equivalent. ANOVA is intended to compare a single variable across multiple groups. SEM (like in SmartPLS) can be used for assessing a structural and measurement model across multiple groups.
Dear Dr. Gaskin, thank you for your informative videos. They have definitely helped me a lot with my analysis. I have stumbled upon one problem though, and the broader Internet has not been able to help me much, so I was hoping you might be able to. I gave my survey participants 8 literacy questions, with the answer options Yes (True), No (False), and I don't know. I have created 8 dummy variables, one for each question, with 0 = incorrect and 1 = correct answer. How do I include this variable in my SmartPLS model? Do I add each dummy variable, or can I combine all 8 and treat them as one single continuous variable (since the people with the highest numbers are most literate)? Thank you for your help.
I would recommend just creating an overall score (sum) outside of SmartPLS, and then you can bring it in as its own factor (single indicator) into SmartPLS.
@@Frasta-qc7qz You could just use SPSS or Excel or Google sheets to sum up (or average) the items for that factor. That sum or average will create a new column in your dataset. That new column is a variable that you can use to replace the latent factor.
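If you prefer a script over SPSS or a spreadsheet, the same sum (or average) can be sketched in a few lines of Python. The response matrix below is invented for illustration; each row is one respondent and each column one item coded 1 = correct, 0 = incorrect.

```python
# Hypothetical responses: 3 respondents x 8 items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 0, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

# One new column per respondent: the total number of correct answers.
literacy_sum = [sum(row) for row in responses]

# Alternative: the proportion correct (the "average" option).
literacy_mean = [sum(row) / len(row) for row in responses]

print(literacy_sum)
print(literacy_mean)
```

Either column can then be saved alongside the rest of the dataset and used as the single indicator of its own factor.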
@@Gaskination Dear Dr. Gaskin, thanks a lot for your responses! Can I do that for a control variable such as age (with 5 answer options)? I want to check whether the control variable, as a single factor like a latent IV or DV, has a significant relationship with the DV, regardless of which age category influences the DV. Thank you!
@@Frasta-qc7qz Yes, if the value of the category increases with the value of the construct, then you could include it as a single variable, rather than breaking it up into dummies.
I have not. As long as the dependent variable is binary, then nothing changes. However, if the dependent variable has multiple categories, or has multiple indicators, then special preparation must be made by reducing the DV down to binary categories (dummy variables) with a reference category left out.
Hi James, I use a categorical variable, Gender, as a moderator in my thesis. Should I first create a dummy variable for gender (e.g., male = 0, female = 1) in SPSS, or is that not necessary because gender is dichotomous? I am using SmartPLS 4 for the rest of the analysis.
Dear Sir, I am new to the field of structural modelling and your videos have been extremely helpful. Thank you. I have a few questions regarding my model. The model has 7 categorical variables: 'age', 'income', 'gender', 'lifecycle stage', 'level of education', etc. Some of the variables have up to 5 categories. Hence, the number of dummy variables per categorical variable ranges between 1 and 4. These variables have a direct effect on my two dependent variables. How do I test the effect of all the categorical variables? When I connect these dummy variables directly to the dependent variables, leaving the reference dummy variable out of the model, the system shows a singular matrix problem. Thank you.
The singularity problem happens when one of the variables does not vary (e.g., if you have all male respondents). Try removing one variable at a time to see which one is causing the problem. Then you can examine that variable's values more closely to see what the issue is.
Hello, dear Prof. I have two dependent variables, and one of them is a dummy variable. Can I use a dummy dependent variable in SEM using SmartPLS?
Yes. Just make sure to interpret it in a 'relative' way. So, for example, if the DV is gender and is coded as 0=male and 1=female, then this variable is actually a "female" variable, and not a gender variable. In this case, you would interpret a positive effect on this variable as: greater values of the IV are associated more with females. A negative effect would be interpreted inversely: greater values of the IV are associated more with males (i.e., not females).
Thank you, Prof. Your advice is really helpful. My dependent variable is "Active stock market participation". If respondents participate in the market, it equals 1; if they don't participate, it equals 0. In this case, what will my interpretation be? The same as you explained above? Thank you again, Prof.
You can do it if you convert to binary (dummy) variables. In this case, if you have four categories, you would have three binary variables to represent three of those categories. The fourth category would be omitted as a reference variable.
Hi Sir, Most of my research variables are dichotomous, how to use SEM with them to bring out more meaning. Thanks for this video. Waiting for your reply.
Usually when most of the variables are dichotomous, SEM is not a good methodological approach. Usually simpler methods, such as t-tests and ANOVAs, or maybe logistic regression or multiple discriminant analysis are more appropriate.
@@Gaskination Dear James, thank you as always for your great videos.....I have six independent variables (all single item) 3 are continuous and 3 are dummy (yes/no). My dependent variable is continuous. I cannot use spss as I think my residuals violate normality. Can I test this model with warppls please ? Ned Kock says in his blog that warppls handles dichotomous variables, and it seems to hold for single items variables too ....Please I am really desperate for help here. Thanks
@@cinovsky I've never used WarpPLS, so I'm not sure. You could use SPSS if you use a non-parametric approach. If all else fails, you can use any appropriate approach and then just list this issue as a limitation.
@@Gaskination Dear James, I very much appreciate your prompt reply. Thank you! I run the analysis with SPSS (despite violating the normality assumption) and with WarpPLS and I am getting almost identical results. Would this consistency be a sign that WarpPLS is appropriate? Many thanks for all your help and sorry for taking time off your busy schedule!
@@cinovsky I am not surprised the results are similar. It would have to be pretty abnormally distributed data to change the estimates. As I understood WarpPLS (a very long time ago...), it was primarily meant for estimating curvilinear effects.
Dear Dr. Gaskin, thank you for the information in the categorical variable video. It is very useful. In my study, I have 5 categorical variables, each with 4-5 questions, and I have 5 latent variables. In my journal revision, the reviewer asked me to show the effect of all control variables. How can I show that many links in my model? It will be a mess. Please advise.
Dear sir, can you please help me? I have a model with one dependent variable (purchase) and five independent variables. There is also a demographic variable that measures income status with 5 categories (1 = $1000, 2 = $1001-2000, 3 = $2001-3000, 4 = $3001-4000, and 5 = $4001-5000). I want to measure the effect of income on the dependent variable (purchase) of the model. What should I do? I am using SmartPLS version 3.3.9.
@@Gaskination first of all thank you for the reply. this is what i understand. I need to think of it as a 5 scale question/scale and add it to the model as a latent variable.
Dear Prof.@@Gaskination, thanks for your response. In my research model, I have considered 3 control variables (Age of the company, type, and location). Do I need to assume them as dummy variables and include them in my PLS analysis exactly same as this video?
@@ArashAsiaei Age can be included as it is, since it will be numerically meaningful. However, Company Type and Location will be categorical, so they will need to be split into dummy variables like this video.
Dear sir, in my model I have one categorical independent variable, referral hiring. I have made it a dummy variable with the values 1 = referral, 0 = non-referral. Can I run such a model in SmartPLS?
James Gaskin, thank you so much, sir, for your prompt response. I would appreciate it if you could resolve one more query. Can I run this model with the group data technique in PLS?
Hi James, the moderator in my study is "age". It has 4 categories, and I have to create dummy variables for age in SPSS. Below 35: old value 1, new value 1. 36-45: old value 2, new value 1. 46-55: old value 3, new value ? Above 55: old value 4, new value ? For the 46-55 and above 55 groups, I am confused about what new value to put there. Please help.
If you are creating dummy variables, then you must leave one category out as the "reference" category. So, you would omit the oldest (or youngest) group. For each other group, you would give a 1 if they are within that range, and give a zero if they are not. However, if your age variable is sufficiently ordinal (roughly equal intervals), then you might consider simply including it as it is without creating dummy variables. Then just make sure not to interpret the estimates from age as exact units (of years), but instead as just a relative effect (positive or negative) with amplitude (strong or weak) on a standardized scale. So, if the standardized regression weight for the effect of age on job satisfaction was -0.237, you would say that age has a moderate negative effect on job satisfaction.
@@moayadrwajfah2055 Correct, because it is already a single binary variable to represent two categories. The variable is essentially a binary for female (rather than gender), where 1= yes and 0= no.
Simple and Amazing Prof.
Dear Prof. Gaskin,
I have three IVs predicting one DV. One of the IVs did not predict DV. In my initial manuscript, I mentioned that possible reason accounting for the null finding could be the reason that the particular IV varies from one ethnic group to another ethnic group (ethnic difference). I did not hypothesize any ethnic influence in my study at all but only in the discussion section suggesting for future research possibility. And in my initial write-up, I gave the reason that our ethnic group has disproportionate size(some ethnic size big some small) and my research focus is not on any ethnic difference hence I did not study ethnic in my analysis.
The reviewer came back and provided this comment:
- the authors identified ethnic as a potential confounding variable-has the authors taken this into consideration in their analyses(e. g., covariate), especially since data on ethnic composition are available?
1. I am not sure what the reviewer wants but I am thinking they may need to rerun my analysis?
2. Do I need to add a control variable (ethnicity) in my smartPLS model based on the reviewer's comment.
3. And if so, I am not sure how to do it. I have already created the dummy variables via SmartPLS and four ethnic groups have been created. And I am not sure how to proceed now.
I would appreciate it if you could provide me some insights.
Thank you.
Yes, just include those dummy variables in as control variables. The way to control for the effects of any variable is to draw a regression line from that variable to the other variables that it might affect.
@@Gaskination Dear Prof,
I am not sure whether I just need to control for ethnicity by adding the variable to my existing PLS model, or whether I need to conduct a moderation analysis, as I have also been given this comment by a professor:
I think since you mentioned the ethnicity difference and the reviewer has suggested you consider the influence of ethnicity on the relationships, you need to analyse the moderating effect of ethnicity on the three relationships in your model (present the moderating effect of ethnicity on each relationship in the structural model). If there is no effect, then ethnicity is not the reason for the insignificant relationship. If yes, then just add these results to your manuscript to answer the reviewer's suggestion; it would add some value to your manuscript.
Shall I just add the control variable (ethnicity), point the regression line to self-esteem, and run the analysis to check whether it is significant, or should I conduct a moderation analysis as he suggested?
@@joelyap2799 It sounds like you should do a moderation analysis. I have videos for this as well (for multigroup and for interaction).
Hello Prof. I found an article "Hair, J. F., Ringle, C. M., Gudergan, S. P., Fischer, A., Nitzl, C., & Menictas, C. (2019). Partial least squares structural equation modeling-based discrete choice modeling: an illustration in modeling retailer choice. Business Research, 12(1), 115-142." in this article they mentioned about four steps to estimate categorical variables
the PLS path model estimation of categorical variables requires that
the following steps be followed:
1. [Model] When creating the PLS path model, use Boolean blocks for each
categorical variable, whereby a Boolean variable represents each category.
2. [Data] Use orthonormal data, that have no correlations between the Boolean
indicators and that are standardized to unit variance.
3. [Estimation] When data is orthonormal, Mode A and Mode B model
estimations include the same results; thus, the standard PLS-SEM algorithm
allows the estimation of PLS path models using categorical indicator data and
multiple latent variables.
4. [Rescaling] A last step involves the transformation of the estimated inner and
outer weights and outer loadings into the metric of the Boolean variables (i.e.,
the metric of interpretation).
Can you help to show practically how to follow these steps on PLS ?
Eagerly waiting for your reply.
Can i do this in place of one way ANOVA with three predictor groups?
Thank you so much. All of your videos are very helpful!
I have a question regarding the "design" of my formative model in SmartPLS.
I have one dependent variable "employee motivation" (scale from 1=very motivated, to 6=not motivated at all) and 12 latent independent variables which are the "determinants of employee motivation" (every latent variable has 2-4 indicators which measure the LV formatively).
-> When i want to include the "moderating effect" of the generation (4 dummies = baby boomers, genX, genY, genZ) into the model to evaluate if the generation has a significant effect on employee motivation - do i have to include 3 latent variables for each dummy (not 4 because one is the reference) just like you did it in the video and analyze if there is a significant influence via bootstrapping?
For this situation, if your hypothesis is that generation impacts motivation, I would instead recommend doing an ANOVA where the outcome variable is motivation and the grouping (factoring) variable is generation. This will give you more easily interpretable results.
@@Gaskination Thanks for the answer! The ANOVA works well to see if there is a general impact of generation on motivation.
But what if I want to analyze whether generation affects which determinant of motivation is more important, in the sense of which path coefficient to motivation is larger? Can I do that in SmartPLS?
@@schluggasishot Ah, in this case then MGA is definitely more appropriate. I would recommend using generation as the multigroup variable, as shown in this video: th-cam.com/video/b3-dyfhGE4s/w-d-xo.html
@@Gaskination Thanks again! But the MGA is only working with 2 different groups but I have 4, and I think it makes no sense to put some of them together like you did in the video for Low/High-Frequency. Is there a possibility to analyze it with 4 groups?
@@schluggasishot You can do two at a time. Then just switch to another two, etc. Another way is to do one group vs all other groups.
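To keep track of which comparisons that implies, here is a small Python sketch that lists both options for four groups. The generation labels come from the question above; the code only enumerates the comparisons and does not run any MGA itself.

```python
from itertools import combinations

groups = ["Baby Boomers", "Gen X", "Gen Y", "Gen Z"]

# Option 1: two groups at a time, covering every distinct pair.
pairwise = list(combinations(groups, 2))

# Option 2: each group versus all other groups combined.
one_vs_rest = [(g, [other for other in groups if other != g]) for g in groups]

for a, b in pairwise:
    print(f"MGA run: {a} vs {b}")
for g, rest in one_vs_rest:
    print(f"MGA run: {g} vs {' + '.join(rest)}")
```

Four groups give six pairwise runs or four one-versus-rest runs, so it is worth deciding in advance which comparisons your hypotheses actually need.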
Hi @James Gaskin! Thanks for your helpful videos! I'm using AMOS and I have two categorical independent variables (IVs). Each IV has 3 categories.
IV1: Platform rating
IV1 categories: 1, 3, 5
IV2: Independent rating
IV2 categories :1, 3, 5
The data was collected from a survey (experimental vignette), where respondents were exposed to one of 9 possible combinations based on values from IV1 and IV2, for example, IV1 = 3 and IV2 = 5, etc.
In AMOS, I have dummy variables as indicators towards the dependent variable (DV). I have k-1 dummy variables in AMOS, so for IV1, I have 2 dummies and IV2 I have 2 dummies, which is a total of 4 dummies.
Question: How do I model the interaction of IV1 and IV2 on the DV in AMOS? Can I simply multiply IV1 and IV2 dummies and create indicators for the multiplied terms, or is this more suitable for moderation?
Yes, you can multiply. That should work fine. It may be easier to interpret by doing an ANOVA instead though...
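A minimal sketch of the multiplication in Python, assuming level 1 is the reference category for both IVs (that choice, the column names, and the `design_row` helper are assumptions for illustration). Each of the 2 x 2 dummy pairs gets its own product term, giving four interaction columns in addition to the four main-effect dummies.

```python
LEVELS = (1, 3, 5)
REFERENCE = 1  # assumption: the lowest rating is the reference category


def dummies(value, levels, reference):
    """k-1 dummy codes for one observed value."""
    return {lvl: int(value == lvl) for lvl in levels if lvl != reference}


def design_row(iv1, iv2):
    """Main-effect dummies plus all pairwise dummy products for one respondent."""
    d1 = dummies(iv1, LEVELS, REFERENCE)
    d2 = dummies(iv2, LEVELS, REFERENCE)
    row = {f"IV1_{a}": v for a, v in d1.items()}
    row.update({f"IV2_{b}": v for b, v in d2.items()})
    for a, va in d1.items():
        for b, vb in d2.items():
            # Product is 1 only when both dummies are 1.
            row[f"IV1_{a}xIV2_{b}"] = va * vb
    return row


# Respondent who saw platform rating 3 and independent rating 5.
row = design_row(3, 5)
print(row)
```

A respondent in the reference cell (1, 1) gets zeros on all eight columns, so every estimate is read relative to that cell.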
Would it be meaningful to directly use Only Child and Middle Child as observed variables to determine the latent variables Playful and Decision Quality?
That is exactly what I've done in this video. In these cases, we are essentially controlling for the potentially confounding effects of these two dummy variables.
@@Gaskination I mean without creating any additional latent variable.
@@lurenz404 oh, in SmartPLS, it would be better to attach them to latent factors because otherwise it would interpret them as indicators of Playful and DQ.
Thank you for the immediate responses. For an interval-type categorical predictor (IV), e.g., Age (18-28; 29-39; 40-50, coded 0; 1; 3), should we follow these steps?
oh. if it is not exact year, then yes, it is categorical. So yes, you can do these steps.
@@Gaskination thank you so much man. what a kind person.
Hi james, thanks for your videos. I was wondering about what to do if one of my DVs is measured on a single item Likert scale and whether it is possible to include latent variables as control variables.
It is common enough practice to use single-item dependent variables. No special tests or procedures (or citations) needed. Same with latent control variables. That is perfectly normal. Just include them as you would include any independent variable.
Dear James, thanks for the videos! I was wondering, can we combine CO_1, CO_2, CO_3, and CO_4 dummy variables into one single variable to check whether they have an influence on DV? Thanks in advance for your responses!
No. That's the reason we had to break them up. They cannot be used directly in the model as a single factor because the numbers associated with the group values are not meaningful. A 3 is not more than a 2 in this case; it is just different. So, the internal variance is not indicative of construct variance. Thus any "effect" they have on another variable is meaningless, spurious, and uninterpretable.
Hi James, I would appreciate it if you could shed light on another question. I have four control variables: gender, income, age, education. Gender is a dummy variable. The others are ordinal (answers in 5-7 ranges, ordered from low to high). For the ordinal ones, can I simply use them in SmartPLS? I read that ordinal scales cannot be used in SmartPLS, but isn't a Likert scale itself also an ordinal scale? As for the controls, I was told I should just point them to the DV. Hopefully they do not have significant paths and can be reported as controlled. But if one of them has a significant path, what should I do? Or is there a better way, in your view, to handle control variables in SmartPLS in my case?
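One way to see this for yourself is to assign the category numbers in two different (equally arbitrary) ways and check the correlation with an outcome. The sketch below uses invented data and a hand-written Pearson correlation; the point is only that the "effect" changes with the labeling, not that these particular numbers mean anything.

```python
from math import sqrt


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical nominal variable and outcome.
category = ["a", "b", "c", "a", "b", "c"]
outcome = [1, 5, 2, 1, 5, 2]

# Two arbitrary numberings of the same three categories.
coding_1 = {"a": 1, "b": 2, "c": 3}
coding_2 = {"a": 1, "b": 3, "c": 2}

r1 = pearson_r([coding_1[c] for c in category], outcome)
r2 = pearson_r([coding_2[c] for c in category], outcome)

print(round(r1, 3), round(r2, 3))
```

The data are identical in both runs; only the labels moved, yet the estimated relationship is very different. Dummy variables avoid this because each one has a fixed meaning (in the category or not).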
Ordinal variables should be fine in PLS. I have never been discouraged from using them. Yes, Likert scales are ordinal. As for how to include controls, you are correct that you can just point them to your DV. Keep them as separate factors though (i.e., gender gets a factor, age gets a factor, etc.)
Hi Prof Gaskin, if i have a control variable for age group coded as 1 = 21 below, 2 = 21 to 30, 3 = 31 to 40, 4 = 41 to 50 and 5 = 51 and above. Should i dummy code them when i use it as a control in my PLS for my dependent? I saw that the majority of the papers didnt dummy code them but technically it should be?
thanks in advance
@@TheKoaydarren You are correct that technically this is not a proper ordinal variable because the intervals are not equal. However, if you take a loose interpretation of the resulting estimates, then it is okay. If you just say that the estimate is positive or negative and small or large, rather than reading too much into the exact estimate number, then it is fine.
Dear Dr. Gaskin,
Can I use a dummy variable as mediator in the relationship between 2 latent variables in SMARTPLS?
You could, although it would be hard to interpret. Also, usually binary variables are not mediators because a mediator must be a causal consequence of the IV. It is theoretically uncommon (though not impossible) for a binary outcome to both be predicted by something and predict something else.
Thank you so much for your answer.
Hi James, thanks for your videos. I wondered whether the latent variable named Only Child can have only one dummy variable. Could it contain all three dummy variables to affect the latent variable Playful?
To me it makes more sense to keep them separate, or else it will be difficult to interpret the results.
@@Gaskination Yes, for this example, it makes more sense to keep them separate. In other examples, can a latent variable contain two or all of the dummy variables to affect another latent variable?
@@zhiqiangli1082 I recall investigating this a year or two ago and I remember not being able to find conclusive support for that approach. However, I also remember not being able to find conclusive support against it. So, I'm not sure. To play it safe, I have always kept them separate. Sorry to be not much help on this one...
Hi JAMES
Can you please answer this question?
Is there a way, through the algorithm or bootstrapping, to find the number of cases included in the project for each of the categorical variables?
I am asking because a study I did includes a variable with three categories; however, in the model I only included two of these dummy variables. I forgot how I labelled them, and I suspect that I combined two categories into one. I need a way to figure this out, not through the dataset but through the algorithm or bootstrapping.
Sorry if my question sounds confusing. I just need to know the n in each category in the project itself, not in the dataset.
Thanks a million
If you mean you no longer have access to the dataset, then it won't run in SmartPLS anyway. But, if you have access to the dataset, just open it and do a "sumif" function for the column of interest.
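If the dataset is easier to read with a script than with a spreadsheet formula, the same per-category count can be sketched in Python with `collections.Counter`. The birth-order column below is invented; in practice you would read the column of interest from your own file.

```python
from collections import Counter

# Hypothetical categorical column from the dataset.
birth_order = ["only", "first", "middle", "last", "first", "only", "first"]

# n per category, the scripted equivalent of counting matches per label.
counts = Counter(birth_order)

for label, n in counts.most_common():
    print(f"{label}: n = {n}")
```

Comparing these counts with the number of 1s in each dummy column should reveal whether two categories were merged into one dummy.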
Many thanks for the valuable videos. I wonder why I get n/a in sample, standard, statistic and P-value in bootstrapping table? Also, I can see 0 for all the factors. I need your advice, please. Thanks
If you are using PLSconsistent, then switch to PLS. The consistent algorithm is ironically inconsistent. Sometimes it gives n/a for results. If you are not using PLSc, then the issue is with the variables. Make sure to have done proper data preparation to ensure valid data/variables are being used.
Hi James, useful video as always! I was wondering, is there a way to use multinominal categorical scales as indicators for a formative construct which is being predicted by several antecedent factors and in turn predicts a reflective construct? All other constructs are reflective using Likert scale measures.
Multinomial categorical variables should not be used as indicators in a set of indicators. Instead, they should be included separately, as dummy variables with only one indicator each.
Hi James,
Thank you so much for the video.
I have a question here. My model has 6 latent variables, and I tried to include 1 control variable (company size) with 4 dummy variables. As you mentioned, I left one as a reference out of the model. However, the SmartPLS is unable to produce the result (Sample Mean is shown as N/A, and P-value etc are all blank) after I hit consistent bootstrapping (I have a reflective model). Please can you tell what happened here? And any solution? My original sample size is 96, is it too small for such a model? Many thanks!
This is probably a problem with the PLSc algorithm. SmartPLS still doesn't seem to have it stable. In such a case, just use the original PLS algorithm.
Dear Mr. Gaskin,
I have a question regarding categorical predictor variables. In my model I try to compare one music format with 2 others. I have 96 observations in total and a perfect distribution across the groups, with i=32 each. How is it possible to compare the treatment group with every single control group? My starting IV would be the dummy-coded category, meaning 1=vinyl record (with i=32), 0=CD (i=32). I would apply the same method for comparing the vinyl record (1) with MP3 (0). Is that a suitable approach to compare them? I have not found any paper on this matter.
Kind regards,
Benjamin Beiersdorf
If you just want to know how some variable is affected by (or differs based upon) the categorical variable, then instead do an ANOVA.
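For intuition about what the ANOVA computes, here is a minimal one-way ANOVA F statistic in plain Python. The three groups and their ratings are invented, and the sketch stops at the F value and degrees of freedom; a statistics package (SPSS, as mentioned elsewhere in this thread, or any other) would also give the p-value and post hoc comparisons.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical outcome scores for three music formats.
vinyl = [3, 4, 5]
cd = [2, 3, 4]
mp3 = [1, 2, 3]

f_stat, df_b, df_w = one_way_anova_f([vinyl, cd, mp3])
print(f_stat, df_b, df_w)
```

A large F relative to its degrees of freedom suggests the group means differ; which specific groups differ is then a question for post hoc tests.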
Many thanks for the great video and clear explanations! Really helped me with my thesis. One quick question: Would it make sense to use PLS to test a model which basically consists of 1 categorical predictor for 3-5 metric endogenous variables with these 3-5 variables being predictors for 1 metric outcome variable? Or do you recommend a different approach? Thank you very much. Stay safe and healthy!
Hi, Dr. James! Thank you for the explanation. But I am still confused about how to find out whether sociodemographic variables (such as 4 age groups, 3 income groups, and 4 household size groups) have a significant effect in the model. Can I include all of the variables in the model simultaneously? Or do I have to add them to the model one by one (first add the 4 age groups, then take them out, then add the 3 income groups)? Thank you!
You can include all predictors simultaneously.
@@Gaskination Thank you for your response! I have another question. If none of the sociodemographic variables is significant, should I keep these variables in my model? Keeping them will affect the significance of the other paths, so the significance of the other paths will differ between the model with the sociodemographic variables and the model without them. Thank you in advance.
@@sharfinazatadini3398 It is optional. You can definitely argue that it is okay to exclude them because they do not have any significant effect on the outcome variable. You can say that you exclude them for the sake of parsimony.
Thanks a lot for the explanation! Just one more question: if the sociodemographic variables have no significant effect on the outcome variable, is there any chance that they would show an effect when analyzed using MGA? Thank you!
@@sharfinazatadini3398 It is possible. The regression as a control variable implies a direct impact on a specific variable, whereas the MGA implies a direct effect on the relationship between two variables.
Dear Gaskin, Greetings
I have one query regarding interpretation of Categorical variables.
In my study, I found similar beta value (male, female) to predict endogenous construct. How would I interpret (separately for male and female) as results are significant.
Thanks
Use gender as a multigroup variable instead: th-cam.com/video/b3-dyfhGE4s/w-d-xo.html
Hello Scholars
I am working on a research project using SmartPLS for the first time. I have several control variables, such as gender (dummy variables for Male and Female) and working experience (dummy variables based on different lengths of service). I would like to find out how to report these control variables, as I have not found any reference so far in which control variables have been used with the PLS approach in SmartPLS. I would appreciate guidance on how to report control variables, or any reference.
Hi Kim,
Did you find an answer to your question? I have the same problem right now.
Thank you so much for this helpful video! Is there any chance to add control variables to the model in this way? I have 2 control variables, each with 4-5 options for respondents to choose from. When I add all of them as dummy variables to the model, bootstrapping seems to give a singular matrix error. 🤔
And when I add each option separately and run the model, the results (t-values and betas) are different; I don't know which ones to put in my tables using this method.
Make sure to leave one option out so that there is a "reference category". Otherwise you get a singular matrix.
Dear James, thank you for the video. Can we include an independent latent variable that has two binary items (1/0)? i.e. Is it OK to have a latent variable with binary items?I am using Warppls. Many thanks
If those two items should theoretically move together (be highly correlated), then it is okay. However, if they are independent of each other, I would strongly recommend to keep them as separate predictors.
Hi Dr., hope you are doing fine. Awesome video. I have a question, if you don't mind. When we run the structural model for the hypothesis testing of a thesis, does the categorical moderator need to be included in the model together with the direct and mediator variables? Or should it be excluded first, meaning we run and report the direct and mediation model first, and then include the categorical moderator when testing the moderation hypothesis?
Thanks in advance. I really appreciate your expertise and response.
You can exclude the categorical moderator (used for multigroup analysis I assume) when testing mediation. The only reason to include it at the same time is if you are hypothesizing moderated mediation.
Hi Dr. James, thanks for the videos. I would like to ask about categorical variables in the PLS model. Why are the outer loading, AVE, and composite reliability all equal to 1 for every categorical variable? Are they not calculated in the PLS method?
Thanks in advance for your help
A factor with a single indicator has perfect reliability (the item with itself) and is fully predicted by its only item (hence the loading/weight = 1).
@@Gaskination Oh, I see, thanks for the insight. I would like to ask more about the interpretation of categorical variables. Suppose all of the "child order" dummy variables, with "only child" as the reference category, are connected to a single latent variable, for example "decision quality". If the bootstrapping results show that "first child" and "last child" are not significant (only "middle child" is significant), should we remove the non-significant categorical variables from our model, or should we keep them?
If we keep them in our model, how can we interpret the non-significant ones?
Thanks in advance :)
@@grisdymahardikana8372 You can keep all variables.
@@Gaskination Okay then, so how can we interpret the non-significant variables in the model?
@@grisdymahardikana8372 You would just say that there is no effect.
Dear Dr. Gaskin,
Is it ok if I just create separate set of categorized data in excel and create separate models in SmartPLS (for example one model for the 'only child set' and other one for 'middle child set') just to see the results for each set?
Yes. You can also use the MGA approach: th-cam.com/video/b3-dyfhGE4s/w-d-xo.html
Thank you so much, Dr. Gaskin, for all your tutorials; they were a great help. Just to make sure: I have one DV with a single categorical indicator, which is the type of new venture the respondent plans to launch in the future (independent entity, joint venture, acquiring expertise, expansion, or none, in case he/she already has one and doesn't plan to start another). Can I use dummy variables based on the type? Thank you.
Yes, this is a perfect example of when to use a set of dummy variables. Just make 'none' the reference category.
@@Gaskination Thank you so much, Dr. Gaskin. If you don't mind my asking, is it normal to have a low composite reliability (0.127) and AVE (.324) after converting the DV to dummy variables?
@@shaimaabdullah4835 Reliability doesn't apply to dummy variables since they are not usually part of reflective factors.
@@Gaskination Thank You so much Dr. Gaskin , you are a savior 🙏
@@Gaskination I am really sorry, Dr. Gaskin, for asking again, but I have a new issue related to the dummy variable. I am getting a negative path coefficient between the LV and the DV, where the LV is likelihood to start a business and the DV is a categorical variable (the type of business). I expected the path coefficient to be positive, but I got a coefficient of -.359 and a p-value of .085 (>.05). How is that possible?
Dear Dr. Gaskin,
Can I first test each of my categorical variables against my dependent variable for significance? Then, only if a categorical variable is significant, would I proceed with the steps shown in the video to find out which of the categories under that variable has the impact on the dependent variable?
Yes, this is an option. To do this, I recommend using ANOVA with the categorical variable as the factoring variable and the DV as the DV.
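The ANOVA suggested above can be sketched in a few lines. This is only a minimal illustration in plain Python, not SPSS output: the group names and decision-quality scores are made up, and `one_way_f` is a hypothetical helper that computes the F statistic and its degrees of freedom.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of DV scores per category.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of categories
    n = len(all_values)    # total sample size
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical DV scores for three categories of one categorical variable
scores_by_category = {
    "only child": [1.0, 2.0, 3.0],
    "first child": [2.0, 3.0, 4.0],
    "last child": [5.0, 6.0, 7.0],
}
f_stat, df_between, df_within = one_way_f(list(scores_by_category.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

If SciPy is available, `scipy.stats.f_oneway` runs the same test and also returns the p-value.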
Thank you Dr. Gaskin.
Also, it was recommended to me that I could follow the steps shown in this video, but test the categories against the DV one by one to see directly which category of the categorical variable is significant. If none of the categories of a categorical variable is significant, then that categorical variable is considered not significant for the DV. Can I do that?
@@derickteoh1128 yes
Thank you so much Dr. Gaskin and have a nice day! 😊
Hi Mr. Gaskin, thank you so much for the video. I am really new to SmartPLS and your videos really help me.
I have a question regarding this. I have one independent variable which is categorical (type of employment, 3 types), and I am converting it into dummy variables. Should I make 2 dummy variables or 3? Also, do I need to test the factor loading and Cronbach's alpha for this IV? It has only one measure (which type of employment you have), so my factor loading is currently 1. Therefore, I am wondering whether I only need to test factor loadings and Cronbach's alpha for the mediator and the dependent variable.
Thank you, I'm hoping for your reply.
Two dummy variables (one value is the reference category). Cronbach alpha is only relevant to reflective factors (which don't include nominal/categorical variables).
@@Gaskination Thank you for the quick reply, Sir.
So when drawing the SmartPLS model, I only need to put the two dummy variables in as the IV? Is it possible to use this IV and connect it to a mediator and then the DV? If I use this model, is it the same as multigroup analysis?
Thank you very much, I’m waiting for your reply.
@@raneedevina2369 You can connect through a mediator. This is not the same as multigroup analysis. Multigroup analysis compares paths across groups. In this case, you could use your categorical IV as a grouping variable and then just examine the path between the mediator and the DV. If you do this, then make sure to exclude the categorical variable as an IV (i.e., don't use it as IV and grouping variable).
@@Gaskination Thank you for the answer, Sir. It really helps me!
Dear Dr.Gaskin,
I am very thankful for this useful video.
I have 307 samples.
In my research, my model has 8 latent variables. In addition, I have 11 categorical variables as control variables, and each has 2-7 questions.
When I transformed each control variable into dummy variables, a Singular Matrix Problem occurred in SmartPLS 3.
Unfortunately, I don't know which variable is causing the trouble. Could you give me any advice on solving this problem?
It is because you created dummies for all the values. Instead you have to leave one value out as the "reference" value. For example, if you had a variable for gender where 0=male and 1=female, you would create one dummy, just for female. For industry, if you have retail, service, manufacturing, and other, you would create three dummy variables, and then leave "other" out as the reference value.
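The rule above (one dummy per value, minus the reference value) can be sketched as follows. This is a minimal illustration in plain Python rather than SPSS or SmartPLS; `make_dummies` is a hypothetical helper and the industry responses are made up.

```python
def make_dummies(values, reference):
    """Return {category: [0/1, ...]} for every category except `reference`."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not found in data")
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference  # the reference value gets no column of its own
    }


# Hypothetical responses for a four-value "industry" variable
industry = ["retail", "service", "manufacturing", "other", "retail", "other"]
dummies = make_dummies(industry, reference="other")

for name, column in dummies.items():
    print(name, column)
```

Four values produce three columns here; a respondent in the reference group ("other") is simply 0 on all three.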
Hi Sir! I want to ask you if it is possible to evaluate the effect of latent variables on a categorical variable. To be more specific, I mean how latent variables affect the yes or no answer of the respondent. Is this something that is appropriate to do with PLS-SEM? And if it is, is this something that can be done through CB-SEM?
Thank you in advance!
It can be done. The resulting estimate is interpreted a bit differently though. If positive, then more likely Yes (assuming Yes=1), and if negative, then more likely No (assuming No=0).
Hi Mr.Gaskin, thank you so much for the great video. I am new to SmartPLS and your videos really helped me.
I have a question regarding the categorical variables. In a part of my model I have two independent variables (IV1 & IV2) both influencing a dependent latent variable (DV).
IV1 has 3 levels (categories) and IV2 has 4 levels (categories), would you please explain how should I draw them in SmartPLS?
[For example for IV1, Should I do like this: leave out category 1 as reference category and draw two latent variables (factors) for category 2 and category 3 and then put values for category 2 on its latent variable and similarly put values for category 3 on its latent variable, and at the end draw paths from factors for category 2 and 3 to the DV?]
And one more question: if the above process is correct, can I do this simultaneously for all categories, or should I run the model for each category separately?
And if the process I described is not correct, I would much appreciate it if you could explain and guide me. Thanks.
If you mean that the IVs have lower order latent dimensions (dimensions with their own measures), then the IV sounds like a higher order factor. In this case, follow the video here: th-cam.com/video/LRND-H-hQQw/w-d-xo.html If instead you mean that these are individual variables (single measures) with multiple values (e.g., Country as the IV and USA, Japan, India as the values), then follow the video above. If something else, please clarify.
Dear @@Gaskination, thank you for your response. The IVs are similar to your second example and to the example described in the video above (child order).
My question is, for example in the case of this video, how should I proceed if I want to see the effect of the different categories of "child order" on "decision quality"?
Is the following approach correct?
First I leave out one category (e.g., only child) as the reference category, then make three variables (in the way you explained in the video above) for the other three categories and draw a path from each one to "decision quality".
If this is correct, can I do it simultaneously for the different categories of "child order", or should I run the model three times, each time for just one category?
@@aminnaeeni4297 Yes, you are correct. Create the dummy variables and reference category, and you can test them all at once (just one model).
@@Gaskination Thanks again for your response Dr.Gaskin
It really helped me
Just one more question, would you please tell me an example of how to interpret the results? Are Path coefficients relative to the reference category? and how can we understand the effect of reference category on the dependent variable?
(Sorry if my questions are trivial, I'm very new to this field)
@@aminnaeeni4297 correct, they are relative to the reference category. So, if you see a positive effect, then it is a stronger effect than the reference category. If it is a negative effect, then it is a weaker effect than the reference category. This is also how you understand the effect of the reference category (relative to all other categories' effects).
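To make "relative to the reference category" concrete, here is a minimal sketch in plain Python with made-up decision-quality scores. It assumes an ordinary least squares regression in which the dummies are the only predictors; in that case each unstandardized dummy coefficient equals that category's mean minus the reference category's mean. SmartPLS reports standardized paths, and other predictors in the model make the estimate a partial one, so the magnitudes will differ, but the sign is read the same way.

```python
from statistics import mean

# Hypothetical decision-quality scores by child order
scores = {
    "only child": [4.0, 5.0, 6.0],    # reference category
    "first child": [5.0, 6.0, 7.0],
    "middle child": [2.0, 3.0, 4.0],
    "last child": [4.0, 5.0, 6.0],
}
reference = "only child"
reference_mean = mean(scores[reference])

# With only the dummies as predictors, the OLS coefficient of each dummy
# is that category's mean minus the reference category's mean.
contrasts = {
    category: mean(values) - reference_mean
    for category, values in scores.items()
    if category != reference
}

for category, difference in contrasts.items():
    print(f"{category}: {difference:+.1f} relative to {reference}")
```

In this made-up data, "first child" is above the reference, "middle child" is below it, and "last child" does not differ from it, which is the case where you would report no effect.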
Hi James, I created a path model in SmartPLS 3 (all reflective measures). With traditional bootstrapping, I find that my data support 6 hypotheses (p-values significant), but with consistent bootstrapping, none of them are supported (all p-values > 0.8). How is this possible? Thank you very much for helping me.
The PLSc algorithm still has some bugs. If it is producing erroneous results, I recommend sticking with the PLS algorithm, or (better yet) using a covariance-based software, such as AMOS when you have no formative factors.
Dear Dr. Gaskin,
Thank you very much for this video.
In my model, there is one independent variable, two mediators and one dependent variable. Also, there are two moderators between IV and mediators. Can I use dummy variable as independent variable? I am using PLS 2. Thanks.
Yes.
Thanks for your reply. In that case how do I interpret the result? My independent variable has two options (yes/no). So, what will the individual value of path coefficient represent between IV and mediators?
You would interpret it more like a t-test. If the binary variable is coded as 0=No and 1=Yes, then a positive beta would be associated with Yes, whereas a negative beta would be associated with No.
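The t-test analogy can be checked with a few made-up numbers. In this minimal sketch (plain Python, hypothetical data, simple one-predictor regression rather than a full PLS model), the slope for a 0/1 predictor equals the mean of the Yes group minus the mean of the No group, so a positive beta means the Yes group scores higher.

```python
from statistics import mean

# Hypothetical data: binary IV (0 = No, 1 = Yes) and a continuous outcome
x = [0, 0, 0, 1, 1, 1]
y = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

x_bar, y_bar = mean(x), mean(y)

# Ordinary least squares slope: covariance(x, y) / variance(x)
slope = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)

# Difference in group means, as a t-test would compare them
mean_yes = mean(yi for xi, yi in zip(x, y) if xi == 1)
mean_no = mean(yi for xi, yi in zip(x, y) if xi == 0)
mean_difference = mean_yes - mean_no

print(slope, mean_difference)
```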
Thanks for your suggestions.
Can you please suggest a paper I can follow for interpreting the results?
Dr. Gaskin,
Which version of SPSS are you using in this video? In my SPSS I don't get this option (i.e., create dummy variables). Can you tell me where I can get this option?
Thanks in advance.
I'm probably using version 23 or 24. If your version does not have this, you'll just need to do it manually.
Awesome video, thank you. Your videos make it easy to learn what I need for my analysis. I have 2 questions, if you don't mind. 1. Can a (categorical) moderator only be tested if the X to Y path is significant? In my direct path analysis, X to Y was not significant, but once the moderator was introduced, it affected the moderated relationship and made it significant. If this is acceptable, do you have a good reference I can cite? I have been unable to find one. 2. I can run 2 different categorical moderators (Gender and Diet) at one time, am I right? I don't have to run 2 separate models one by one. Have I understood it right? Looking forward to your response. Anyway, thanks a lot for making easy tutorials and free education. May God bless you.
1. This is a great situation as it indicates that the X has no effect on Y without considering the moderation effect. No reference needed. This is perfectly normal.
2. In SmartPLS, it will require you to test only two groups at a time (if doing MGA). You can use the same model and have all moderators loaded, but it will require you to select which two groups you're comparing.
Prof., please answer: Q1. I don't understand how to test for significant differences between the groups on individual items CSR1 to CSR6 (in case they are formative items). Please suggest how to do it. Q2. Also, do I need to do MICOM (measurement invariance) before doing this? During the second step of MICOM, some of the items in my constructs may be insignificant, and I would have to remove them.
Univariate differences across groups can be tested with a t-test or ANOVA; no invariance testing is required.
Dear Mr. Gaskin,
thank you very much for your videos, I find them incredibliy helpful. Maybe you are able to help me with some troubles I had with calculating my model. I created three dummy variables, but when I run the analysis, the program tells me I have a singular matrix problem. This doesn't happen, when I only include two of the dummies. Do you have an idea, why this happens? Thanks in advance for your help!
This is expected. You are only supposed to include n-1 categories as dummy variables. This leaves the excluded category as the "reference" category. Usually the "other" category is left as the reference category. If you don't have an "other" category, then just pick the one you are least theoretically interested in. When it is binary, this is a lot easier. E.g., if you have a gender dummy variable, you might say 1=Female and 0=Not Female (i.e., Male), but we leave Male out of the model as the reference category.
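A small sketch of why the full set of dummies breaks the estimation (plain Python, made-up responses). When every category gets its own dummy, the columns add up to 1 for every respondent, so any one column is an exact function of the others. That perfect collinearity is what produces a singular matrix.

```python
# Hypothetical three-category variable, with a dummy for EVERY category
categories = ["A", "B", "C"]
responses = ["A", "C", "B", "B", "A", "C", "A"]

all_dummies = {c: [int(r == c) for r in responses] for c in categories}

# Each respondent falls in exactly one category, so every row sums to 1.
row_sums = [
    sum(all_dummies[c][i] for c in categories) for i in range(len(responses))
]

# Consequently the third dummy carries no new information:
# it can be rebuilt exactly from the other two.
rebuilt_c = [1 - a - b for a, b in zip(all_dummies["A"], all_dummies["B"])]

print(row_sums)
print(rebuilt_c == all_dummies["C"])
```

Dropping any one of the three columns (the reference category) removes the redundancy.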
perfect, thank you very much for answering so quickly!
Prof., is this method an alternative way to do ANOVA in SEM? Please tell me how we can do an analysis similar to ANOVA in SmartPLS.
SmartPLS does not have an anova equivalent. ANOVA is intended to compare a single variable across multiple groups. SEM (like in SmartPLS) can be used for assessing a structural and measurement model across multiple groups.
Dear Dr. Gaskin
Thank you for your informative videos. They have definitely helped me a lot with my analysis.
I have stumbled upon one problem though and the broader Internet has not been able to help me much, so I was hoping you might be able to.
I gave my survey participants 8 literacy questions, with the answer options Yes (True), No (False), and I don't know. I have created 8 dummy variables, one for each question, with 0 = incorrect and 1 = correct answer. How do I include this variable in my SmartPLS model? Do I add each dummy variable, or can I combine all 8 and treat them as one single continuous variable (since the people with the highest numbers are most literate)? Thank you for your help.
I would recommend just creating an overall score (sum) outside of SmartPLS, and then you can bring it into SmartPLS as its own factor (single indicator).
@@Gaskination Dear Dr. Gaskin, can you please give me the details of "creating an overall score (sum) outside of SmartPLS"? Thank you!
@@Frasta-qc7qz You could just use SPSS or Excel or Google sheets to sum up (or average) the items for that factor. That sum or average will create a new column in your dataset. That new column is a variable that you can use to replace the latent factor.
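The same sum (or average) column can also be built with a short script instead of a spreadsheet. A minimal sketch in plain Python, using made-up respondents and four hypothetical 0/1 items named `q1` to `q4` in place of the eight literacy questions:

```python
# Hypothetical scored items: 1 = correct, 0 = incorrect
respondents = [
    {"id": 1, "q1": 1, "q2": 0, "q3": 1, "q4": 1},
    {"id": 2, "q1": 0, "q2": 0, "q3": 1, "q4": 0},
    {"id": 3, "q1": 1, "q2": 1, "q3": 1, "q4": 1},
]
item_names = ["q1", "q2", "q3", "q4"]

for row in respondents:
    # New column: total number of correct answers for this respondent
    row["literacy_sum"] = sum(row[q] for q in item_names)
    # Alternative new column: proportion of correct answers
    row["literacy_mean"] = row["literacy_sum"] / len(item_names)

for row in respondents:
    print(row["id"], row["literacy_sum"], row["literacy_mean"])
```

Either new column can then be exported with the rest of the dataset and used as a single-indicator factor.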
@@Gaskination Dear Dr. Gaskin, thanks a lot for your responses! Can I do that for my control variables, such as age (with 5 answer options)? I want to check whether each control variable has a significant relationship with the DV as a single factor, like a latent variable in the IV or DV, regardless of which age category influences the DV. Thank you!
@@Frasta-qc7qz Yes, if the value of the category increases with the value of the construct, then you could include it as a single variable, rather than breaking it up into dummies.
Prof., have you uploaded any video where the dependent variables are categorical and the independent variables are continuous in SmartPLS?
I have not. As long as the dependent variable is binary, then nothing changes. However, if the dependent variable has multiple categories, or has multiple indicators, then special preparation must be made by reducing the DV down to binary categories (dummy variables) with a reference category left out.
Hi James, I took a categorical variable, i.e., Gender, as the moderator in my thesis. Should I first create dummy variables for gender in SPSS (like male = 0, female = 1), or is that not necessary since gender is dichotomous? I am using SmartPLS 4 for the rest of the analysis.
If it is binary, then it is already a dummy variable.
Kindly tell me whether we can use 3,4 demographic factors as dummy variables in the model (the demographic factors are used as independent variables).
I'm not sure what is meant by 3,4. If you mean you want to use many demographic variables, then that is fine.
@@Gaskination Thanks a lot.
Dear Sir,
I am new to the field of structural modelling and your videos have been extremely helpful. Thank you.
I have a few questions regarding my model. The model has 7 categorical variables: 'age', 'income', 'gender', 'lifecycle stage', 'level of education', etc. Some of the variables have up to 5 categories. Hence, the number of dummy variables per categorical variable ranges between 1 and 4.
These variables have a direct effect on my two dependent variables. How do I test the effect of all the categorical variables? When I connect these dummy variables directly to the dependent variables, leaving the reference dummy variable out of the model, the system shows a singular matrix problem.
Thanking You
The singularity problem happens when one of the variables does not vary (e.g., if you have all male respondents). Try removing one variable at a time to see which one is causing the problem. Then you can examine that variable's values more closely to see what the issue is.
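Instead of removing variables one at a time, the non-varying columns can also be flagged directly in the dataset. A minimal sketch in plain Python; the column names and values are made up, and `constant_columns` is a hypothetical helper that only catches columns with a single unique value (it does not detect other sources of collinearity).

```python
def constant_columns(data):
    """Return the names of columns whose values never vary."""
    return [name for name, column in data.items() if len(set(column)) <= 1]


# Hypothetical dummy-coded dataset: every respondent is male,
# so the "female" dummy is 0 for everyone.
dataset = {
    "female": [0, 0, 0, 0, 0],
    "income_high": [1, 0, 0, 1, 0],
    "edu_graduate": [0, 1, 1, 0, 0],
}

problem_columns = constant_columns(dataset)
print(problem_columns)
```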
@@Gaskination Thank you so much Sir! I am able to run the model now.
Grateful! :)
Hello Dear Prof.
I have two dependent variables. One of the dependent variables is a dummy variable. Can I use a dummy dependent variable in SEM using SmartPLS?
Yes. Just make sure to interpret it in a 'relative' way. So, for example, if the DV is gender and is coded as 0=male and 1=female, then this variable is actually a "female" variable, and not a gender variable. In this case, you would interpret a positive effect on this variable as: greater values of the IV are associated more with females. A negative effect would be interpreted inversely: greater values of the IV are associated more with males (i.e., not females).
Thank you, Prof. Your advice is really helpful. My dependent variable is "Active stock market participation". If respondents participate in the market, it equals 1; if they don't participate, it equals 0. In this case, what will my interpretation be? The same as you explained above?
Thank you again, Prof.
same as above, but replace "female" with "active participation" and "male" with "not participating"
Thank you so much, Prof.
How can I use an independent variable that is categorical in AMOS?
You can do it if you convert to binary (dummy) variables. In this case, if you have four categories, you would have three binary variables to represent three of those categories. The fourth category would be omitted as a reference variable.
James Gaskin, thank you so much for your reply. You gave me hope. I'll try it and let you know.
Hi sir, can I use ratio, interval, and dummy variables in the same model using PLS?
Yes. Just make sure to interpret the estimates appropriately based on what type of variable it is.
@@Gaskination OK, thank you, sir.
Hi Sir,
Most of my research variables are dichotomous. How can I use SEM with them to bring out more meaning?
Thanks for this video. Waiting for your reply.
Usually when most of the variables are dichotomous, SEM is not a good methodological approach. Usually simpler methods, such as t-tests and ANOVAs, or maybe logistic regression or multiple discriminant analysis are more appropriate.
@@Gaskination Dear James, thank you as always for your great videos.....I have six independent variables (all single item) 3 are continuous and 3 are dummy (yes/no). My dependent variable is continuous. I cannot use spss as I think my residuals violate normality. Can I test this model with warppls please ? Ned Kock says in his blog that warppls handles dichotomous variables, and it seems to hold for single items variables too ....Please I am really desperate for help here. Thanks
@@cinovsky I've never used WarpPLS, so I'm not sure. You could use SPSS if you use a non-parametric approach. If all else fails, you can use any appropriate approach and then just list this issue as a limitation.
@@Gaskination Dear James, I very much appreciate your prompt reply. Thank you! I ran the analysis with SPSS (despite violating the normality assumption) and with WarpPLS, and I am getting almost identical results. Would this consistency be a sign that WarpPLS is appropriate? Many thanks for all your help, and sorry for taking time out of your busy schedule!
@@cinovsky I am not surprised the results are similar. It would have to be pretty abnormally distributed data to change the estimates. As I understood WarpPLS (a very long time ago...), it was primarily meant for estimating curvilinear effects.
Dear Dr. Gaskin,
Thank you for the video on categorical variables; it is very useful. In my study, I have 5 categorical variables, each with 4-5 questions, and I have 5 latent variables. In my journal revision, the reviewer asked me to show the effect of all control variables. How can I show that many links in my model? It will be a mess. Please suggest.
Yes, it will be a mess. Just show the table instead of the model.
Dear sir, please can you help me
I have a model with one dependent variable (buy) and five independent variables.
There is also a demographic variable that measures income status in 5 categories (1 = $1000, 2 = $1001-2000, 3 = $2001-3000, 4 = $3001-4000, and 5 = $4001-5000).
I want to measure the effect of income on the dependent variable (purchase) of the model. What should I do? I am using SmartPLS version 3.3.9.
Since the income status is ordinal with even intervals, you can include it as a regular variable, rather than splitting it into dummy variables.
@@Gaskination First of all, thank you for the reply. This is what I understand: I need to treat it as a 5-point scale question and add it to the model as a latent variable.
@@dr.barsarmutcu3073 Yes, it will be a single indicator factor.
@@Gaskination Thanks for your answer
May I know whether "categorical predictors" is another name for "control variables", or a totally different concept?
Categorical predictors are usually split into dummy variables, which are binary. These are often used as control variables, but not always.
Dear Prof. @@Gaskination, thanks for your response. In my research model, I have considered 3 control variables (age of the company, type, and location). Do I need to treat them as dummy variables and include them in my PLS analysis exactly as in this video?
@@ArashAsiaei Age can be included as it is, since it will be numerically meaningful. However, Company Type and Location will be categorical, so they will need to be split into dummy variables like this video.
Dear sir,
In my model I have one categorical independent variable, Referral hiring. I have made it a dummy variable with the values 1 = referral and 0 = non-referral. Can I run such a model in SmartPLS?
Yes. You can do it exactly as shown in this video.
James Gaskin
Thank you so much, sir, for your prompt response. I would appreciate it if you could resolve one more query: can I run this model with the group data technique in PLS?
Yes. Here is a video for it: th-cam.com/video/b3-dyfhGE4s/w-d-xo.html
James Gaskin
Thank you so much sir for your great help!
Hi James, I have the moderator "age" in my study. It has 4 subcategories, and I have to create a dummy variable for age in SPSS.
Category list:
Below 35 = old value 1, convert into new value 1
36-45 = old value 2, convert into new value 1
46-55 = old value 3, convert into new value
Above 55 = old value 4, convert into new value
For the age groups 46-55 and above 55, I am confused about what new value to put there. Please help.
If you are creating dummy variables, then you must leave one category out as the "reference" category. So, you would omit the oldest (or youngest) group. For each other group, you would give a 1 if they are within that range, and give a zero if they are not. However, if your age variable is sufficiently ordinal (roughly equal intervals), then you might consider simply including it as it is without creating dummy variables. Then just make sure not to interpret the estimates from age as exact units (of years), but instead as just a relative effect (positive or negative) with amplitude (strong or weak) on a standardized scale. So, if the standardized regression weight for the effect of age on job satisfaction was -0.237, you would say that age has a moderate negative effect on job satisfaction.
what about gender 0=female 1=male please?
That is fine. It can be anything, e.g., 99=female, 786=male. The numbers don't matter, as long as they're consistent.
@@Gaskination so if we have gender (e.g. Male=0; Female=1) we don't have to do these steps; so we drag this variable into the model as IV?
@@moayadrwajfah2055 Correct, because it is already a single binary variable to represent two categories. The variable is essentially a binary for female (rather than gender), where 1= yes and 0= no.
@@Gaskination thank you so much. very kind of you.
Translate indonesia please😢
This might help: th-cam.com/video/j0GZ4eS-GlM/w-d-xo.html