Wednesday, 19 August 2015

First steps in Stata 4: Reshape

Now for a useful command that is almost unavoidable: “reshape”.

We were working with the data on Spanish inward foreign direct investment (First steps in Stata 1). You can download it by clicking here. The original state of the database is the following:

desc
 obs:           250
vars:            23
size:        54,000
              storage   display    value
variable name   type    format     label      variable label
Country         str40   %40s                  Country
fdiflw1993      double  %10.0g                fdiflw1993
fdiflw1994      double  %10.0g                fdiflw1994
fdiflw1995      double  %10.0g                fdiflw1995
fdiflw1996      double  %10.0g                fdiflw1996
fdiflw1997      double  %10.0g                fdiflw1997
fdiflw1998      double  %10.0g                fdiflw1998
fdiflw1999      double  %10.0g                fdiflw1999
fdiflw2000      double  %10.0g                fdiflw2000
fdiflw2001      double  %10.0g                fdiflw2001
fdiflw2002      double  %10.0g                fdiflw2002
fdiflw2003      double  %10.0g                fdiflw2003
fdiflw2004      double  %10.0g                fdiflw2004
fdiflw2005      double  %10.0g                fdiflw2005
fdiflw2006      double  %10.0g                fdiflw2006
fdiflw2007      double  %10.0g                fdiflw2007
fdiflw2008      double  %10.0g                fdiflw2008
fdiflw2009      double  %10.0g                fdiflw2009
fdiflw2010      double  %10.0g                fdiflw2010
fdiflw2011      double  %10.0g                fdiflw2011
fdiflw2012      double  %10.0g                fdiflw2012
fdiflw2013      double  %10.0g                fdiflw2013
fdiflw2014      double  %10.0g                fdiflw2014
                                                                               
As you can see, the amount of investment is stored in a separate variable for each year, which is useful for some types of regressions but not for others. The objective is to convert the 22 FDI variables into two: one that specifies the year of investment and another that contains the investment flows. In this case we use “reshape long”. The syntax is the following:

reshape long “variable name without number”, i(unique identifier of each subject) j(period of time)


reshape long fdiflw, i(Country) j(year)


Our database now has only three variables (Country, year and fdiflw). Using the command describe (desc), you will get the following:
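As a minimal sketch of what the command does, the same reshape can be tried on toy data. The numbers below are made up (two countries, three years) and are not the real FDI figures. The isid checks are optional extras: they stop with an error if the listed variables do not uniquely identify the observations, which is what reshape requires of i().

```stata
* Toy data with made-up values, NOT the real FDI figures
clear
input str10 Country fdiflw1993 fdiflw1994 fdiflw1995
"France"   10 12  9
"Germany"  20 18 25
end

* Optional check: Country must uniquely identify each row before reshaping
isid Country

* Wide to long: fdiflw1993-fdiflw1995 become fdiflw plus a new variable year
reshape long fdiflw, i(Country) j(year)

describe                  // three variables: Country, year, fdiflw
count                     // 2 countries x 3 years = 6 observations
isid Country year         // each Country-year pair now appears once
list, sepby(Country)
```

With the real file, the same logic suggests 250 x 22 = 5,500 observations after reshaping, provided the data match the description above.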


Then, if we want to go back to the original format of the data, we use “reshape wide”:

reshape wide fdiflw, i(Country) j(year)  
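The opposite direction can be sketched the same way, this time starting from toy data that is already in long format. The values are made up for illustration only.

```stata
* Toy long-format data with made-up values
clear
input str10 Country year fdiflw
"France"  1993 10
"France"  1994 12
"Germany" 1993 20
"Germany" 1994 18
end

* Long to wide: one row per Country, one fdiflw variable per year
reshape wide fdiflw, i(Country) j(year)

list                      // variables: Country fdiflw1993 fdiflw1994
```

Note that i(Country) and j(year) together must identify each row of the long data exactly once, otherwise reshape wide reports an error.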


First steps in Stata 2: Set names and labels

After importing data into Stata, we should set the variable names and labels so that anyone can understand the information in the database. We give each variable an abbreviated name, for instance gdp, and in the label we put the description, “Gross domestic product”. I recommend avoiding capital letters in variable names: Stata is case-sensitive, so it will make your life easier.


We are going to use a subset of the data from the Penn World Table. The complete database can be downloaded from here, and the Stata database we are using from here. The original description of the database is the following:


Maybe you can guess what each variable is, but it is advisable to change the variable names and labels so that we know precisely what information we have. We can tackle this task in two different ways (as with almost everything in Stata): one is by using commands, and the other is by using the user-friendly windows.

Commands

To change the name of a variable we use “rename”, and to change its label we use “label variable”:

rename “old name” “new name”
label variable “variable name” "new label"

rename countrycode cntrycode
label variable cntrycode "country code"

Then, if we want to add an ending (a suffix) to all the variables, we can use the following:
rename * *_home
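A short sketch of how this wildcard renaming behaves, using the auto dataset that ships with Stata as a stand-in (it is not the Penn World Table file used in this post). The group syntax of rename with wildcards is available from Stata 12 onwards, as far as I know.

```stata
* Example data shipped with Stata, used here only as a stand-in
sysuse auto, clear

* Add the suffix _home to every variable name
rename * *_home
describe, simple          // lists the variable names only

* Remove the suffix again
rename *_home *

* Or add a prefix instead of a suffix
rename * home_*
describe, simple
```

A suffix like this is handy before merging two datasets that share variable names, since it keeps track of which country each variable refers to.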

It is strongly recommended to write the commands in a do file. That way we can always run them again without retyping them. You can download the do file here.
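For illustration, a do file for this kind of task might look like the sketch below. The file name names_labels.do is hypothetical, and the auto dataset shipped with Stata stands in for your own data, so this is not the do file linked above.

```stata
* names_labels.do  (hypothetical file name)
* Renames and labels one variable so the steps can be re-run at any time

clear all
sysuse auto                       // stand-in for your own dataset

rename mpg mileage
label variable mileage "Mileage (miles per gallon)"

describe mileage                  // check the new name and label

* Uncomment to keep the result (the file name is also hypothetical)
* save "auto_labeled.dta", replace
```

Once saved, the whole file can be executed from the command box with do names_labels.do.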

Windows

Variable names and labels can also be changed without commands. I personally prefer this way of managing them. At the top of the Stata window we will find the following icon:


Then, it is quite easy:


Change and apply.

While you change the names and labels, you will notice that the corresponding commands appear in the pane where commands are recorded. You may copy them to the do file.

After the changes our database will be the following:



Value label

Sometimes we need to label the values of a variable, especially when we are dealing with qualitative data. Suppose we create a variable named music quality (mscquality), and we classify periods of time according to the quality of their music. Our variable will have three different values: good (1), regular (2) and bad (3), but originally the information contained in mscquality is just 1, 2 and 3. If we want each number to have a label (useful when we make tables, for example), we have to use the following commands:

label define “name of the group of labels” number "label we give to the number"
label values “variable that has the values” “name of the group of labels”
In our example, this would be:
label define music 1 "good" 2 "regular" 3 "bad"
label values mscquality music
label variable mscquality "music quality"
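To see the effect, here is a small self-contained sketch. The three observations are invented for illustration, just like the variable itself.

```stata
* Invented observations, for illustration only
clear
input year mscquality
1970 1
1985 2
2010 3
end

label define music 1 "good" 2 "regular" 3 "bad"
label values mscquality music
label variable mscquality "music quality"

tabulate mscquality               // the table shows good / regular / bad
tabulate mscquality, nolabel      // the underlying values are still 1, 2, 3
label list music                  // displays the mapping stored in "music"
```

The data themselves do not change: the labels are only attached for display, so conditions such as if mscquality == 1 keep working.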

This variable is completely made up. However, it would be interesting to explore how quality of music affects economic growth.

By clicking here you can download a do file with the labels for the firm-level classification NACE Rev. 2, at 4-digit and 1-digit level. I wrote it myself, and I hope it saves you a few hours.

In the next section we look into the description of the database. 

Stata’s windows





Stata’s interface is quite simple. On the toolbar you can find many of the things you would like to do with Stata. However, in my opinion, the best way to work is with commands. You should open a do file in order to write and save all your commands. Also, the commands you type in the box at the bottom will appear in the commands column; you can copy and paste them into the do file.
On the right, the list of variables in your dataset will appear.

The following window is the do file:


You can select the commands you want to run on your dataset, or you can simply press one of the two buttons with a play symbol. The two differ: one executes the do file and shows the output of each command, and the other executes it quietly, applying the changes to the dataset without displaying the output.

Saturday, 25 July 2015

First steps in Stata 1: Import data

We are going to look at the case in which our data is in an Excel file (which you can download by clicking here).
We open Stata and go to File, Import, Excel Spreadsheet:



And the following window appears:



In this case (as in most cases), our first row contains the variable names, so we choose “Import first row as variable names” and press OK.

And our data will be loaded. On the right, you will see the list of variables.

The data that we have imported are the gross flows of inward foreign direct investment in Spain by ultimate country, in thousands of euros. Data retrieved from: http://datainvex.comercio.es/

We save the information we have imported: File, Save As. It is advisable to save it in a place from which you won’t move it in the near future. Being organized saves a lot of time.
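The same menu steps can also be written as commands. The sketch below is self-contained: it first writes a small Excel file from the auto dataset shipped with Stata and then imports it again. The file names auto_example.xlsx and auto_example.dta are hypothetical and are created in the current working directory; with your own data you would only need the import excel and save lines, pointing at your own file.

```stata
* Create a small Excel file to import (stand-in for your own spreadsheet)
sysuse auto, clear
export excel using "auto_example.xlsx", firstrow(variables) replace

* Equivalent of File, Import, Excel Spreadsheet with
* "Import first row as variable names" ticked
import excel using "auto_example.xlsx", firstrow clear

describe                          // check that names and types look right

* Equivalent of File, Save As
save "auto_example.dta", replace
```

Keeping these lines in a do file means the import can be repeated exactly if the spreadsheet is updated.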

Now we can go to the next section: First steps in Stata 2: Set names and labels
You can download the created file by clicking here.