Creating New Variables Using Stata

  • Published on 12 Nov 2024

Comments • 45

  • @auddjurhuus3704 9 years ago

    Thank you, you have been a big help to me in doing my master's thesis.

    • @smilex3 9 years ago +1

      +Aud Djurhuus I am glad these videos have been useful for you. Please let me know if you have suggestions for new videos on Stata.

  • @smilex3 9 years ago

    Ricardo, there is no reply link on your post, but a quick Internet search shows three different ways to gain access to the General Social Survey (GSS) data. Each link goes to a page that works a little differently, but you should be able to get to the data. The GSS is a great social science dataset that is interesting to explore.
    www3.norc.org/GSS+Website/Download/
    www.icpsr.umich.edu/icpsrweb/landing.jsp
    sda.berkeley.edu/sdaweb/analysis/?dataset=gss14
    Best wishes,
    Alan

  • @Aurelaso 7 years ago

    Alan, thanks for your videos. I have a quick question. Is it possible to randomly assign values in variable "x" to variable "y" (which only has missing values)?

    • @smilex3 7 years ago

      Aurelio, I am not certain why you need to do this, but if I understand your question, the following should point you to a solution:
      /* Put the mpg values in random order and save them in a temporary file */
      sysuse auto, clear
      generate rndnum=runiform()
      sort rndnum
      rename mpg mpg1
      keep mpg1
      tempfile rndorder
      save `rndorder'
      /* Attach the shuffled values to the original data by observation number */
      sysuse auto, clear
      merge 1:1 _n using `rndorder'
      list mpg mpg1
      pwcorr mpg mpg1
      Best,
      Alan

  • @santiagoc85 10 years ago +1

    Hi Alan. I found this video by chance. If you don't mind, perhaps you could help me with my issue. I'm working with a GMM model (dynamic panel data) and I need to collapse my data into 5-year averages and standard deviations. I have created a new variable named period for each range of years (e.g. period=80 if year>=1980 & year

  • @smilex3 10 years ago

    Santiago,
    The reply option is not showing up on your comment on YouTube. Hopefully, you will see this reply.
    It is hard to give you good advice without more information, but the Stata code below may be helpful. It uses one of the example datasets that come with Stata and shows one way to collapse a dataset by year, producing means and standard deviations.
    Best,
    Alan
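
The code from this reply did not survive in the thread. The following is only a minimal sketch of the kind of collapse it describes, not Alan's original. It assumes the Grunfeld panel data loaded with webuse (which needs an Internet connection), and the names period, invest_mean, and invest_sd are made up for illustration. It groups years into 5-year periods, as in Santiago's question, rather than collapsing by single year.

```stata
/* Example panel data: 10 companies observed yearly from 1935 to 1954 */
webuse grunfeld, clear

/* Label each 5-year block by its first year: 1935, 1940, 1945, 1950 */
generate period = 5*floor(year/5)

/* One row per company and period, holding the mean and sd of invest */
collapse (mean) invest_mean=invest (sd) invest_sd=invest, by(company period)

list in 1/8
```

After the collapse the dataset holds one observation per company and period, which is the shape usually needed before estimating on period averages.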

  • @goswaminilanjana07 9 years ago +2

    Hi! How can I recode a string variable into a numeric variable? I want to recode principal occupation into 4 levels. Principal occupation is a string variable.

    • @smilex3 9 years ago

      Nilanjana, without more details about your data I can't be too specific. But, I can point you to a couple of possible solutions. These include the Stata commands "destring" and "encode". The following Stata code builds a sample dataset and shows their simplest use.
      Best wishes,
      Alan
      /* Create a small dataset with two string variables */
      clear *
      input str1 var1str str4 var2str
      1 occ1
      2 occ2
      3 occ3
      4 occ4
      end
      list
      describe
      /* Use destring to convert numbers stored as strings to numbers */
      destring var1str, gen(var1num)
      /* Use encode to convert strings with non-numeric characters to numbers */
      encode var2str, gen(var2num)
      list
      list, nolabel
      describe

  • @ibidunnioloniniyi8806 5 years ago

    Hi,
    I need some help.
    I have a variable which is monthly income, but the data are from three different countries.
    What command can I use to convert these incomes to US dollars, since each country's income is denominated in a different currency?
    Awaiting your response.
    Thanks

    • @smilex3 5 years ago

      Hi Ibidunni, It may be possible, and it depends on the data you have. Do you have a second variable which denotes which currency is recorded for each observation? Do you have a variable indicating the location of the respondent that can be used for this purpose? Is there a pattern to the income variable? In other words, the first 300 cases are in pounds, the next 250 cases are in lira, and the remaining cases are in Swiss francs. Or, do the currencies alternate like this: 1, 2, 3, 1, 2, 3, etc.? With a little more information or a look at an example set of data I might be able to provide a bit more help.
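
If a variable identifying the country does exist, the conversion could be sketched as below. Everything here is an assumption for illustration: the variable names country, income, usd_rate, and income_usd are hypothetical, and the exchange rates are placeholders rather than real rates.

```stata
clear
input str2 country income
"UK" 1000
"TR" 5000
"CH" 2000
end

/* Placeholder rates in US dollars per unit of local currency; replace with real ones */
generate double usd_rate = .
replace usd_rate = 1.25 if country=="UK"
replace usd_rate = 0.05 if country=="TR"
replace usd_rate = 1.10 if country=="CH"

/* Convert every income to US dollars using the rate for its country */
generate double income_usd = income * usd_rate
list
```

With many countries or time-varying rates, storing the rates in a separate dataset and merging them in by country would probably scale better than a series of replace commands.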

  • @johndupont8596 8 years ago

    Hi Alan! Thanks a lot for your videos
    I just have a small question as I am having a small issue:
    I am looking at trade flows between countries and I have the following 5 variables in my dataset:
    COUNTRY PARTNER TRADE_FLOWS TIME GDP_COUNTRY
    Now my problem is that I would like to generate a new variable that indicates the gdp of the PARTNER country as well, thus I will have :
    COUNTRY PARTNER TRADE_FLOWS TIME GDP_COUNTRY GDP_PARTNER
    All countries are included in both COUNTRY and PARTNER, so I am looking for a command that says: "if PARTNER==usa then GDP_PARTNER=GDP_COUNTRY when COUNTRY==usa"
    Any help with this will be greatly appreciated!!
    Many Thanks!!
    Best,
    John

    • @nnnwitharya 5 years ago

      th-cam.com/video/kr2v3LuBw2I/w-d-xo.html
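
One common way to approach John's question is a merge against a lookup table built from the same data. The sketch below is an assumption-laden illustration, not a tested answer for his dataset: the toy values are made up, it assumes GDP_COUNTRY is constant within each COUNTRY and TIME pair, and it assumes every partner also appears as a COUNTRY in the same year.

```stata
clear
input str3 COUNTRY str3 PARTNER TRADE_FLOWS TIME GDP_COUNTRY
"usa" "fra" 10 2000 100
"usa" "deu" 12 2000 100
"fra" "usa"  9 2000  30
"fra" "deu"  7 2000  30
"deu" "usa" 11 2000  40
"deu" "fra"  6 2000  40
end

/* Build a lookup table with one GDP value per country and year */
preserve
keep COUNTRY TIME GDP_COUNTRY
duplicates drop
rename (COUNTRY GDP_COUNTRY) (PARTNER GDP_PARTNER)
tempfile partnergdp
save `partnergdp'
restore

/* Attach the partner's GDP to every trade flow */
merge m:1 PARTNER TIME using `partnergdp', keep(master match) nogenerate
list, sepby(COUNTRY)
```

If the merge stops with an error saying PARTNER and TIME do not uniquely identify observations in the using data, GDP_COUNTRY is not constant within a country and year and the data need cleaning first.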

  • @Stine2207 10 years ago

    Hi Alan. Maybe you can help me. I have a dataset which consists of two questionnaires. I want to create one age variable from the two variables that hold information on age, that is, a new variable that combines the two. How do I do that?
    /Stine
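
A minimal sketch of one way to combine two age variables, assuming each respondent's age is recorded in at most one of them, or in both with the same value. The names age_q1 and age_q2 are hypothetical stand-ins for the two questionnaire variables.

```stata
clear
input age_q1 age_q2
34 .
. 51
27 27
. .
end

/* Start from the first questionnaire, then fill gaps from the second */
generate age = age_q1
replace age = age_q2 if missing(age)
list
```

Where both variables are filled in but disagree, this keeps the first questionnaire's value, so it is worth checking such cases with something like list if age_q1 != age_q2 & !missing(age_q1, age_q2).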

  • @DkAlexus 9 years ago +5

    Like the intro song!! :D

  • @ianyohane8182 4 years ago

    this is awesome guys

  • @earningsmanagementestimati6028 6 years ago

    Hi sir,
    I need your help with generating instrumental variables using ivreg2h in Stata. In other words, how do I generate instruments from my own data, since I don't have external instruments? This is the method developed by Lewbel (2012). Your help would be highly appreciated.

    • @smilex3 6 years ago

      You did not give me much information to work with. Maybe this video from StataCorp will be helpful.

  • @GeorgyPorgy76 8 years ago

    Thank you! Very helpful.

  • @3foss191 7 years ago

    Sorry, I'm working on a dataset where there are missing values (shown as dots, although they do not look quite like the dots I saw in the preview of the data I've been working on). The problem is that when I try to delete the missing values, Stata (14) does not allow me to do it.
    . drop if missing (hc)
    missing not found
    r(111);
    end of do-file
    r(111);
    Could you please give me a hand? Thanks.

    • @smilex3 7 years ago

      It is hard to tell from your description what is going on. My first suspicion is that the variable you are using to determine what to drop is a string variable. Missing values for string variables are null (empty) entries, not periods.
      If you install the user-written program dataex, you could send me a small amount of your data showing me exactly what you have. From the command window type: findit dataex. The help file will tell you how to use the program.
      Finally, here is some Stata code demonstrating how missing values and the drop command work for numeric data:
      clear
      input byte y x
      1 5
      2 4
      . 3
      4 .
      5 1
      end
      list
      drop if missing(y)
      list
      clear
      input byte y x
      1 5
      2 4
      . 3
      4 .
      5 1
      end
      list
      drop if missing(y)
      drop if missing(x)
      list
      clear
      input byte y x
      1 5
      2 4
      . 3
      4 .
      5 1
      end
      list
      drop if missing(y, x)
      list

    • @smilex3 7 years ago

      Here is some more follow-up that is perhaps clearer. In this example y is numeric and x is string. x also contains the value ".". You can see in the first small program that the record where x is "." is not treated as missing. I am still able to drop that record, as shown in the second example.
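
The two small programs mentioned in this reply are not preserved in the thread. A hedged reconstruction of the idea, not Alan's original code, might look like the following. It reuses the values from his numeric example above, with x stored as a string.

```stata
clear
input byte y str1 x
1 "5"
2 "4"
. "3"
4 "."
5 "1"
end
list

/* For a string variable only the empty string "" counts as missing, so nothing is dropped */
drop if missing(x)
list

/* The record can still be dropped by matching the literal "." */
drop if x=="."
list
```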

    • @3foss191 7 years ago

      I will check again and let you know. Thanks a lot.

    • @3foss191 7 years ago

      Sorry, but I don't know how dataex works.

    • @3foss191 7 years ago

      The variable is numeric, not a string.
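
Judging only from the log quoted at the top of this exchange, one possible cause is the space typed between missing and (hc). With the space, Stata appears to read missing as a variable name rather than a function call, which would match the message "missing not found". A small sketch, using a made-up numeric variable hc:

```stata
clear
input hc
1
.
3
end

/* With the space, Stata looks for a variable named missing and stops with r(111) */
capture noisily drop if missing (hc)
display _rc

/* Without the space, missing() is evaluated as a function */
drop if missing(hc)
list
```

The same applies to other functions: the opening parenthesis has to follow the function name directly.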

  • @loanpham9365 10 years ago

    Hi Alan. Would you mind helping me with this situation? I imported my data from Excel into Stata. I want to use these data with tssmooth ma, but I cannot because Stata requires tsset varname. I then created a new variable, but when I use tssmooth ma, the error says that only integer values are accepted (my data are decimal). Please help me! How can I use tssmooth ma in this situation?

    • @smilex3 10 years ago

      It is hard to respond without more information. I assume that you have some kind of time-series, longitudinal, or cross-sectional data. But knowing how to use -tsset- depends on your particular data. For example, using the General Social Survey longitudinal data requires reshaping from -wide- to -long- and in the process creating an idnum and a panel wave identifier so the data can be -tsset-.
      So, either you have a straight time-series dataset, in which case you can do the following:
      tsset timevar [, options]
      Or, you have panel data and you can do the following:
      tsset panelvar timevar [, options]
      Best,
      Alan

    • @loanpham9365 10 years ago

      Alan Neustadtl I appreciate your support! My data are time series. The Stata error is that Stata only accepts integers, while my data are decimal numbers, even though the storage type is float. How can I solve this? I know that it is difficult for you without more information. So, would you mind checking my data? How can I send them to you? Please help me! Thank you so much!
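
The integer requirement applies to the time variable given to tsset, not to the series being smoothed. One way around it, sketched under the assumption that the observations are equally spaced and can be put in time order, is to create an integer index and tsset on that. The variables rawtime, y, t, and y_ma are hypothetical.

```stata
clear
set obs 10
/* Made-up data: a decimal-valued time stamp and a decimal-valued series */
generate double rawtime = 2000 + (_n-1)*0.25
generate double y = 10 + _n/2

/* Build an integer time index that follows the time order */
sort rawtime
generate long t = _n
tsset t

/* Moving average over 2 lags, the current value, and 2 leads */
tssmooth ma y_ma = y, window(2 1 2)
list
```

If the decimals actually encode calendar dates, converting them to a proper Stata date variable would be the more informative choice than a plain counter.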

  • @kevinsegers8083 7 years ago

    I have a few questions for my master's thesis about creating lag variables in Stata. Is there a possibility you could help me with that?
    KS

    • @smilex3 7 years ago

      Kevin, check out the Stata FAQ on this topic and see if that is enough to get you going. The URL is www.stata.com/support/faqs/data-management/creating-lagged-variables/

    • @kevinsegers8083 7 years ago

      So by using the command 'gen lag1' I can create lag Variables of existing ones? Where do I put the Variable names in the command?
      Thx!

    • @smilex3 7 years ago

      Here is an example:
      sysuse auto, clear
      gen lag1 = mpg[_n-1]
      gen lag2 = mpg[_n-2]
      gen lead1 = mpg[_n+1]
      list mpg lag? lead1

    • @kevinsegers8083 7 years ago

      Alright, Thanks! Can I do this for the dependent variable as well? I'm using a time series of the volatility of stock returns.

    • @smilex3 7 years ago

      Sure. Stata neither knows nor cares what variables you consider to be dependent or independent. For Stata (and all statistical applications) they are just variables.
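
The subscripting shown earlier in this exchange depends on the current sort order. For a time series such as Kevin's, an alternative is to declare the data with tsset and use the time-series operators L. and F. instead. A minimal sketch, assuming the sp500 example dataset shipped with Stata and a made-up integer index t, because the trading dates in that file have gaps:

```stata
sysuse sp500, clear

/* Trading days skip weekends, so index the observations consecutively */
sort date
generate long t = _n
tsset t

/* L. is the previous observation, L2. two back, F. the next one */
generate lag1 = L.close
generate lag2 = L2.close
generate lead1 = F.close
list date close lag1 lag2 lead1 in 1/5
```

The operators can also be used directly inside estimation commands, for example regress close L.close, without creating new variables first.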

  • @afraakbar9162 5 years ago

    Please help me with this. There are three questions used to classify the informal or formal sector.
    The 1st one is EPF. It has 3 possible answers:
    1-Yes
    2-No
    3-Don't know
    The 2nd question is
    whether your institution keeps accounts.
    It has the same three answers.
    The 3rd one is how many regular employees there are in your institution.
    More than 10 is considered formal and less than 10 is considered informal.
    In my data analysis I need to classify as formal everyone with EPF yes, accounts yes, and more than 10 employees. How do I calculate the total number of employees in the formal sector, that is, people who have EPF, accounts, and more than 10 employees? Please help me with this.

    • @smilex3 5 years ago

      This is very difficult to answer without knowing a lot more about your data. But, based on what you wrote, the following might get you going on a solution:
      generate byte sector=0
      /* q3>10 is also true when q3 is missing, so exclude missing values explicitly */
      replace sector=1 if epf=="yes" & q2=="yes" & q3>10 & !missing(q3)
      label define sectorlab 1 "formal" 0 "informal"
      label values sector sectorlab
      tab sector
      count if sector==1

  • @afraakbar9162 5 years ago

    Help needed, please. I want to create "1" as ever married and "2" as never married. But the variable has 1 as married, 2 never married, 3 widowed, 4 separated, 5 divorced. What code do I have to use for this? Please help me.

    • @smilex3 5 years ago +1

      Hi Afra, there are several ways to do this but let me give you one of them that creates a new variable that only contains the values 1, 2, and missing. I will call the original variable "marital" and it has the values that you described above and create a new variable called "marital2cat". Here is one solution:
      generate byte marital2cat=.
      replace marital2cat=1 if marital==1
      replace marital2cat=2 if marital==2
      label define marcat 1 "ever married" 2 "never married"
      label values marital2cat marcat
      Some people prefer using "recode":
      recode marital (1=1 "ever married") (2=2 "never married") (else=.), generate(marital2cat)
      Best,
      Alan

    • @afraakbar9162 5 years ago

      @smilex3 Thank you so much. I did it and I got the results. I am doing my dissertation and am stuck with analyzing data. Is there any way that you can help me? And I have another question. I want to create a log wage variable. The questionnaire has a wage question, but my supervisor told me to take wages from 3 other questions too, as my dissertation is about the informal economy. How can I do that? Please help me.

    • @smilex3 5 years ago

      @afraakbar9162 So, you can use either the -log10()- or -log()- functions. It sounds like you want something like the following:
      generate lwage=log10(wage1+wage2+wage3)
      Be careful in case you have a value of 0 in the sum of your wages measure since log10(0) is not defined.
      Best,
      Alan
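
The caution about zeros can be made explicit in code. The sketch below assumes three hypothetical wage variables, wage1 to wage3, with made-up values. It differs from the one-line version above in one respect: egen's rowtotal() treats a missing component as 0, whereas wage1+wage2+wage3 is missing whenever any component is missing. Which behaviour is appropriate depends on what a missing wage means in the survey.

```stata
clear
input wage1 wage2 wage3
100 50 0
0 0 0
200 . 25
end

/* Sum the wage components, counting a missing component as 0 */
egen double totwage = rowtotal(wage1 wage2 wage3)

/* Take the log only where the total is positive; other rows stay missing */
generate double lwage = log10(totwage) if totwage > 0
list
```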

  • @TheKaduzin 9 years ago

    If the main goal is to teach how to create new variables, I just don't understand why the data aren't available... =/

  • @kamieog 9 years ago

    thank you very much :D

    • @smilex3 9 years ago

      +SLKRJD I'm glad you found this video helpful.