
Introduction to Stata

UCD M.A. Econometrics

September 30, 2010

Contents

1 Introduction
1.1 Opening Stata
1.2 Preliminaries
1.3 Audit Trails
1.4 Getting Help
1.5 Importing Data
1.6 User Written Commands
1.7 Menus and Command Window
1.8 Data browser and editor
1.9 Syntax
1.10 Types of Variables

2 Data Manipulation
2.1 Describing Data
2.2 Generating Variables
2.3 if Commands
2.4 Summarising with tab and tabstat
2.5 Introduction to Labels
2.6 Joining Datasets
2.7 Tabout
2.7.1 Tabout with Stata 9/10/11
2.7.2 Tabout with Stata 8
2.8 Recoding and Strings
2.9 Missing Values
2.10 Macros, Looping and Programming
2.11 Counting, sorting and ordering
2.12 Reshaping Datasets
2.13 Graphs

3 Regression Analysis
3.1 Dummy Variables
3.2 Outreg2
3.3 Hypothesis Testing
3.4 Post Regression Commands
3.5 Interaction Effects
3.6 Specification and Misspecification Testing

4 Binary Regression
4.1 The Problem With OLS
4.2 Logit and Probit
4.3 Marginal Effects

5 Time Series
5.1 Initial Analysis
5.2 Testing For Unit Roots
5.3 Dealing With Non-Stationarity

6 Ordinal and Multinomial Regression
6.1 Ordinal Data
6.2 Multinomial Regression

7 Panel Data
7.1 Panel Set Up
7.2 Panel Data Is Special
7.3 Random and Fixed Effects
7.4 The Hausman Test

8 Instrumental Variables
8.1 Endogeneity
8.2 Two Stage Least Squares
8.3 Weak Instruments, Endogeneity and Overidentification

9 Recommended Reading

List of Tables

1 Logical Operators in Stata
2 Tabout Example 1 - Crosstabs
3 Tabout Example 2 - Variable Averages
4 OLS Regression Output
5 OLS Regression Output With Dummy Variables
6 Outreg Example
7 Linear Probability Model Output
8 Logit and Probit Output
9 Marginal Effects Output
10 Alternative Binary Estimators for HighWage
11 Dickey Fuller Test Output
12 A Comparison of Time Series Models
13 Ordered Probit Output
14 OLS and MFX for Ordinal Data
15 Multinomial Logit Output
16 MFX for Multinomial Logit
17 xtdescribe Output
18 xtsum Output
19 xttrans Output
20 Random Effects Output
21 Fixed Effects Output
22 Comparison of Panel Data Estimators
23 Correlation Between Income, Openness and Area
24 OLS and IV Comparison
25 Testing for Weak Instruments

List of Figures

1 An example of a graph matrix chart
2 Graph Example 1: Map
3 Graph Example 2: Labelled Scatterplot
4 A Problem With OLS
5 Problem Solved With Probit and Logit
6 Autocorrelation Functions For Infant Mortality and GDP
7 Partial Autocorrelation Functions For Infant Mortality and GDP
8 Using OLS To Detrend Variables
9 Health Distribution
10 Height Distribution
11 Ordered Logit Predicted Probabilities
12 Multinomial Logit Predicted Probabilities
13 BHPS Income
14 BHPS Income by Job Satisfaction
15 Graph Matrix for Openness, Area and Income Per Capita
16 Openness and Area

Objective

The aim of these labs is to introduce students to the use of Stata, and to equip them with the ability to undertake the basics of data management and analysis. This document is designed to complement rather than substitute for a comprehensive set of econometric notes. Topics covered fall under the following areas: data management, graphing, regression analysis, binary regression, ordered and multinomial regression, time series and panel data. Stata commands are shown in red and these are also available in a separate do-file, along with the data, on the course website. During these labs we will go through practical examples of how to use these commands.

1 Introduction

1.1 Opening Stata

Stata 11 is available on UCD computers by clicking on “Networked Applications”. Select the “Mathematics and Statistics” folder and Stata v11. It is also possible to run Stata from your own computer. Log into UCD Connect and click “Software for U” on the main page. You will first need to download and install the client software, then you will be able to access Stata 11, again in the “Mathematics and Statistics” folder. For further details see http://www.ucd.ie/itservices/teachinglearningit/applications/softwareforu/d.en.21241

Stata 11 is recommended, however Stata 8.0 will also be available on the NAL (Novell Application Launcher) until the end of this academic year. Click Start and open the NAL. Open the Specialist Applications folder and click into Economics. Open wsestata.exe, or right-click and add as a shortcut to your desktop. Alternatively, click Start > Run, paste in Y:\nalapps\W95\STATASE\v8.0 and click enter. For further details see: http://www.ucd.ie/itservices/teachinglearningit/applications/nallabs/

1.2 Preliminaries

Before starting, we need to cover a very important principle of data analysis. It is vital that you keep track of any changes you make to data. There is nothing worse than not knowing how you arrived at a particular result, or accidentally making a silly mistake and then saving your data. This can lead to completely incorrect conclusions. For example you might confuse your values for male and female and conclude men are more at risk of certain outcomes, etc. These mistakes are embarrassing at best, and career threatening at worst. There are three simple tips to avoid these problems. Firstly, keep a log of everything. Secondly, to ensure you don’t embed any mistakes you’ve made in future work, never save over your dataset; most econometricians never do. Generally people initially react badly to this suggestion. However you don’t need to save changes to the dataset itself if you implement all manipulations using do-files. The final tip therefore, is to use do-files. We will cover each of these in what follows.

The first thing we need to do is open our data. If we have a file saved somewhere on our hard disk we could use the menus to load it: FILE, OPEN. Or we could write out the full path for the file, e.g. “h:\Desktop\”. The path for your desktop will differ depending on the computer you are using, however, if you are on a UCD machine this should be it. This is awkward, and we will also need somewhere to store results and analysis. So we will create a new folder on our desktop called “Stata”. Right click on your desktop, and select NEW, FOLDER. Rename this to “Stata”. We will also create a new folder within this called “Ado” which we will use to install new commands. Save the files for this class into the “Stata” folder.

Stata starts with a default working directory, but it is well hidden and not very convenient, so we want to change the working directory to our new folder. First we check the current working directory with pwd. Now we can change it: cd "h:\Desktop\Stata". If you are unsure where your new “Stata” folder is, right click on it and go to PROPERTIES. You will see the path under LOCATION. Add “\Stata” to this. Now we can load our data files. One final piece of housekeeping: because we can only write to the personal drive (“h:\”) on UCD computers, we need to be able to install user written commands here. So we set this folder with sysdir set PLUS "h:\Desktop\Stata\Ado". This is only necessary if you are running Stata from a UCD computer.

Now we have this set up, accessing files saved in Stata format (.dta) is simple: use icecream2. If you make changes to the data, you will not be allowed to open another dataset without clearing Stata’s memory first. For example, make a change with gen year=2010. We will encounter the gen command later. Now if we try to load the data again with use icecream2 we get the error message “no; data in memory would be lost”. We need to use the command clear first, then we can reload the dataset with use icecream2. Alternatively, the clear option automatically drops the dataset currently in use: use icecream2, clear.

This raises a very important point: we need to keep track of our analysis and our changes to the data. Never ever save changes to a dataset. If you have no record of what you have done, not only will you get lost and not be able to reproduce your results, neither will anyone else. And you won’t be able to prove that you’re not just making things up. This is where do-files come in. A do-file (not to be confused with an ado-file) [1] is simply a list of commands that you wish to perform on your data. Instead of saving changes to the dataset, you will run the do-file on the original data. You can add new commands to the do-file as you progress in your analysis. This way you will always have a copy of the original data, and you will always be able to reproduce your results exactly, as will anyone else that has the do-file. You will also only need to make the same mistake once. The top journals require copies of both data and do-files so that your analysis is available to all. It is not uncommon for people to find mistakes in the analysis of published papers.

We will look at a simple example. Do-files have the suffix “.do”. You can execute a do-file like this: do intro. [2] Similarly, do tutorial1 would run all of the analysis for this tutorial. There are several ways to open, view and edit do-files. The first is through Stata. Using the menus go to WINDOW, DO-FILE EDITOR, NEW DO-FILE. Or click on the notepad icon below the menus. Or type doedit in the command window. Or press CTRL F8. Each of these will open the do-file editor. Alternatively you can write do-files in Notepad or Word. They must be saved as .do files however. You don’t have to execute a whole do-file, you can also copy and paste commands into the command window. In this tutorial we will create our own do-file using the commands in this document.

[1] An ado-file is a do-file which contains a programme. Stata uses these to run most of its commands. This is also how we are able to install new user written commands. Usually we will be able to install these automatically, however sometimes we need to do this manually. All that is involved here is saving the appropriate ado-file into the appropriate directory, which you can locate with sysdir.

[2] run intro executes the do-file but suppresses any output.
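The “data in memory would be lost” behaviour described above can be reproduced without the course files. The following is a minimal sketch, meant to be run as a single do-file; the five-observation dataset, the variable x and the temporary file demo are made up purely for illustration.

```stata
* build a small dataset and save it to a temporary file
clear
set obs 5
gen x = _n
tempfile demo
save "`demo'"

* change the data in memory, in the same way as gen year=2010 in the text
gen year = 2010

* reloading now fails; capture noisily shows the error but lets the do-file continue
capture noisily use "`demo'"
display _rc

* adding the clear option discards the changed data and reloads the saved copy
use "`demo'", clear
describe
```

After the last line the variable year is gone, because the saved copy was never altered. This is the sense in which a do-file run on the original data always starts from a clean slate.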

As well as using do-files to keep track of your analysis, it is important to keep a log (a record of all commands and output) in case Stata or your computer crashes during a session. You should open a log at the start of every session: log using newlog, replace. To examine the contents of a log using the menus go to FILE, VIEW. Alternatively type view logexample. Also useful is set more off, which allows Stata to give as much output as it wants. This setting is optional but otherwise Stata will give only one page of output at a time. Finally, you must have enough memory to use your data. You can set the amount of memory Stata uses. By default, it starts out with 10 megabytes which is not always enough. If you run out of memory you will get the error message “no room to add more observations”. For most data files 30 megabytes will be enough, so we will start by setting this as the memory allocation: set mem 30m. To check the current memory usage type memory. You could set memory to several hundred megabytes to ensure that Stata will never run out, but this makes your computer slow (especially if you have a slow computer) and so is not recommended. None of the files we will be examining require more than this. Note that if you run out of memory you will have to clear your data, set the memory higher and re-run your analysis before proceeding.

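In general all of these items are things you will want to place at the start of every do-file.

clear
set mem 30m
cd "h:\Desktop\Stata"
sysdir set PLUS "h:\Desktop\Stata\Ado"
set more off
capture log close
local x = c(current_date)    // store today's date in the local macro x
log using "h:\Desktop\Stata/`x'", append    // forward slash before the macro; a backslash there would block its expansion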

Lines 7 and 8 require some explanation. The outcome of this is that Stata will record all analysis you conduct on a particular day in a log file, the name of which will be that day’s date. We will explain how this works when we discuss macros. Note that Stata ignores lines which begin with “*”, so we will use this to write comments. The command “capture” is also important. If you are running a do-file and it encounters an error, the analysis will stop. The “capture” command tells Stata to proceed even if it encounters a mistake. For example, capture log close closes any open log, and does not halt the do-file if no log happens to be open.
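As a small sketch of how “capture” behaves, the lines below run one command that fails and one that succeeds. The file name nosuchfile is hypothetical and is chosen only so that the first command produces an error; _rc is the return code Stata stores after a captured command.

```stata
* this command fails, but capture suppresses the error so the do-file carries on
capture use nosuchfile, clear
* _rc is non-zero because the command did not succeed
display _rc

* a command that succeeds leaves _rc equal to 0 (capture also hides its output)
capture display "hello"
display _rc

* capture noisily still shows the output or error message, and still carries on
capture noisily use nosuchfile, clear
```

Because “capture” hides errors, it is safest to reserve it for commands that are expected to fail harmlessly, such as closing a log that may not be open.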

If you are running Stata on your own computer, there is a way to alter the default settings that Stata starts with. When it launches, Stata looks for a do-file called “profile.do” and runs any commands it contains. You can create this file so that these changes are made automatically every time you launch Stata (i.e. memory is set, directory is set and a log is started). As well as a working directory, Stata also has other directories where programmes are stored. We need to put our “profile.do” into the “personal” folder. To find it, type sysdir. We now paste the commands we want to run at start-up (for example, those listed above) into a text file (either using Notepad or Stata), and save it as “profile.do” in that directory.

1.3 Audit Trails

1. Remember to keep a record of everything

2. Never alter the original dataset

(a) Place the original dataset in a separate folder

(b) Make a backup of the dataset

(c) Use .dta files for Stata use

3. Before completion, do a test run on a backup

1.4 Getting Help

Stata has an inbuilt help function. In fact you can access the help file for any command. Suppose we are interested in the “tabstat” command, we can type help tabstat. However these are often aimed at experienced users and you may have difficulty understanding them. The syntax for this command is given as “tabstat varlist [if] [in] [weight] [, options]”. Items in square brackets are optional. So all we require to run this command is the command itself followed by at least one variable. The various options are explained, followed by some examples. Looking at the examples is often the best way of getting to grips with how a command works.

If you cannot solve the problem using the help file, Stata has an extensive online support system. It is more than likely that someone else has encountered the same problem. A Google search usually throws up several items of interest. There are also several excellent websites which detail how to deal with various aspects of data analysis. The best of these are the UCLA (/stat/stata/sk/default.htm) and Princeton (/online_help/stats_packages/stata/stata.htm) Stata websites.

We are now ready to begin analysing our data.

1.5 Importing Data

We have already seen that importing data in Stata format (.dta files) is simple. The command is “use filename”. We need to load the icecream dataset for this tutorial: use icecream.

Note: We’re opening a dataset. Remember, do not save any changes you make.

More often than not you’ll have to make do with Microsoft Excel (.xls) files. Fortunately Stata can import these files quite easily. If you have an Excel file named myfile.xls, you can import it using Stata’s insheet command. First, in Microsoft Excel, click File > Save As. Now instead of saving as a Microsoft Office Excel file, save the file as a CSV (Comma Delimited) file using the dropdown menu. You can then load the data using the command insheet using myfile.csv. The “infile” command is an alternative for loading other types of data. If a direct import with this command fails, try opening the file in Excel and following the instructions above. We will discuss how to import SPSS files when discussing an example of a user written command.
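The CSV route can be tried out without Excel. The sketch below first writes a small comma-delimited file from Stata with outsheet and then reads it back with insheet; the two variables and their values are invented for illustration, and myfile.csv is written to the current working directory.

```stata
* create a tiny made-up dataset and write it out as a comma-delimited file
clear
input id price
1 .27
2 .28
3 .26
end
outsheet using myfile.csv, comma replace

* read the CSV file back in; clear drops whatever is currently in memory
insheet using myfile.csv, clear

* check that the variable names and values arrived as expected
describe
list
```

The first row of the CSV file holds the variable names, which is also what Excel produces when the spreadsheet has a header row.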

1.6 User Written Commands

One of the major advantages of Stata is the manner in which users can write their own commands. In the unlikely event that you are trying to do something that doesn’t have an official command, you are practically guaranteed that someone else has had the same problem, and can be reasonably confident that someone has written their own code to deal with the issue. Finding the answer to your particular problem is not always straightforward, but it can be as easy as a Google search. Knowing exactly what to look for can be the main problem. If you can find the name of your programme it can be relatively straightforward to install. Later we will encounter two other user written commands (“tabout” and “catplot”); now we will consider the programme “usespss” which is used to import data saved in SPSS format. [3]

If we try to import the SPSS file in our directory with use spssfile.sav, we get the error message “file SPSSfile.sav not Stata format”. You would imagine that it should be relatively easy to transfer files between different statistics packages but this is not the case. Without this command you might need to use the expensive StatTransfer programme. As we know what we’re looking for, the process of installing the programme is easy. We use the “findit” command: findit usespss. In fact if you’re sure of the name you can simply type ssc install usespss. If you are not exactly sure of the name, but have a general idea of what it’s called, you can use the command search usespss, all. A new window will appear, and clicking on the blue link will take you to a new page. Click on CLICK HERE to install. This programme is now ready to run. You can also access the help file: help usespss.

We can now open the SPSS file in our directory: usespss using spssfile.sav. We could now save this in Stata format for future use: save, replace. It is saved as “SPSSfile.dta”. However, as you can see, this is just the same as our icecream dataset. [4] We will re-load the Stata dataset: use icecream2, clear.

[3] SPSS is a statistics package popular in the other social sciences.

1.7 Menus and Command Window

Across the top of the Stata programme are a number of menus, and you may be tempted to use these to carry out your analysis. For example, going to ‘FILE’, ‘OPEN’, and selecting your file will open the dataset. This is not ideal. Not only is it slow, but it makes reproducing your results very tricky. The alternative is to type commands directly into the command window, which requires becoming familiar with the language of Stata, but is ultimately much more efficient and reliable. One benefit of using the menus is that when you run a command this way, it appears in the main window, which is where results of analysis are displayed. So this is a useful way to learn commands if you are unsure of them. Copying the command from the main window into the command window and pressing enter will reproduce it. This is in fact how to keep track of your analysis: by recording every change you make to the data. The Review window provides you with a list of all entered commands. Clicking on a command in the Review window will cause it to appear in the command window. Failed commands are shown in red. The variable window gives a list of all the variables in the dataset, their labels, and their attributes.

1.8 Data browser and editor

Stata holds the data in memory like an Excel file. To see the actual data select ‘DATA’ then ‘DATA BROWSER’ from the menus. You will see the variables horizontally with the observations vertically. Selecting ‘DATA EDITOR’ instead provides you with the same view, except you are now able to edit the data like you would a spreadsheet. In fact this is an alternative way of entering data: you can simply paste it in. As before this is not recommended for reasons of reproducibility. Clicking on a particular case gives you the exact entry for that particular variable and observation. The data editor and browser windows must be closed before you can enter any new commands into Stata.

1.9 Syntax

Getting to grips with how to communicate with Stata is perhaps the most daunting aspect of starting out. Generally programmes and commands take the form of “command name” “variable name(s)” “, options”. We will shortly see examples with tab and later regress. The exact syntax for a particular command is detailed in the help file. For example, help tab. Here the aim is to introduce you to some of the most important commands. As you become more familiar with them you will be able to use the various options available, depending on the particular task you wish to perform.

Stata understands abbreviations, provided the abbreviation can only be interpreted one way. For example, the full command to run a regression is “regress”, however Stata understands what you mean if you only type “reg”. The same principle applies to variable names. Using the icecream dataset, typing tab ti is equivalent to tab time. However, the command tab t will return the error message “t ambiguous abbreviation”, because more than one variable name begins with t. In the help file for a command, the shortest acceptable abbreviated version is underlined.
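The abbreviation rule can be seen with a toy dataset. This is only a sketch: the three observations below are made up, and the variables are simply named time and temp so that both begin with the letter t.

```stata
* a made-up dataset with two variables that both start with t
clear
input time temp
1 41
2 56
3 63
end

* ti can only mean time, so this works
tab ti

* te can only mean temp, and su is an accepted abbreviation of summarize
su te

* t could mean time or temp, so Stata refuses; capture noisily shows the error and carries on
capture noisily tab t
```

Abbreviations save typing at the command line, but writing names out in full in do-files keeps them readable and protects against a later variable making an abbreviation ambiguous.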

1.10 Types of Variables

There are essentially two types of variables in Stata: words (referred to as strings) and numbers. Each is handled slightly differently when you are manipulating your data. Within numerical variables, there are two further types: continuous data, such as income or height, and categorical data (such as level of education or gender). In the second case a value will take on a particular meaning, e.g. 1=male and 2=female. As we will see these variables are often labelled in the data. In the icecream dataset, the first 5 variables are obviously continuous numerical variables (these appear in black in the data browser), however county, var7 and weekend are different. County is a string variable, with the entries appearing as words in the dataset. Strings such as this appear in red. If you click on the county variable for observation 1, you will see “Antrim” appear as the entry for that case. On the other hand, for var7 the entry for observation 1 appears as “Ulster” in the data browser, but “1” when you click on it. This means the data is in numerical form (in this case 1-4), but has been labelled so that each number refers to a different province. We will discuss manipulating each of these types of variables in the next section.

[4] An alternative for transferring SPSS files into Stata is to download SPSS, which is available from UCD Connect, open the SPSS file and save it in Stata format.

Table 1: Logical Operators in Stata

Operation                Symbol
And                      &
Or                       |
Not                      ! or ~
Multiplication           *
Division                 /
Addition                 +
Subtraction              -
Less Than                <
Greater Than             >
Less Than or Equal       <=
More Than or Equal       >=
To The Power Of          ^
Wildcard                 *

2 Data Manipulation

Here we introduce the basic commands for manipulating your data. The most important logical operators in Stata are outlined in Table 1. The most frequently used are & (and), | (or) and ! (not). These are essential for manipulating the data correctly. We can illustrate some of these using the “display” command, which we can shorten to “di”: di 10*20 and di 6/(5-2)+18. Notice that strings require double quotes: di Welcome does not work but di "Welcome" does. You can also access system values and programme results with this command, for example today’s date: di "`c(current_date)'". Note again that di `c(current_date)' returns an error message, because the double quotes are missing. We need the single quotes because “c(current_date)” is being used as a macro. This is how we were able to name our log file; see section 2.10 for more details on macros. We will explain the use of the wildcard in section 2.9. Stata will know whether you mean multiplication or the wildcard depending on the situation.
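The operators can be tried directly with “display”, which needs no dataset. The following lines are a small sketch; the particular numbers are arbitrary, and logical expressions show 1 for true and 0 for false.

```stata
* arithmetic: multiplication, then division and addition
display 10*20
display 6/(5-2) + 18

* to the power of
display 2^3

* strings need double quotes
display "Welcome"

* a system value, written as a macro inside double quotes
display "`c(current_date)'"

* logical operators: and (&), or (|) and not (!)
display (3 > 2) & !(5 < 4)
display (3 <= 2) | (4 >= 4)
```

The first three commands show 200, 20 and 8, and both logical expressions show 1.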

2.1 Describing Data

Stata provides several ways of investigating and describing data. It is generally a good idea to browse the data once you have it loaded. This allows you to view the data in spreadsheet format. For a broader look at the variables in the dataset, use the sum command, which summarizes all the variables. Alternatively, specifying a variable after the command (e.g. sum price) will summarize the price variable on its own. If you would like more detailed information (e.g. percentiles) then you can add the detail option to the end of that command, i.e. summ price, detail. A very useful command is inspect. This summarizes a variable’s missing values (if any) and provides a simple plot of the variable’s distribution.

Sometimes you may wish to break down summaries by a particular variable. For example, you might like to see how some variable changes over time. The bysort command is very useful here. It is best explained with an example. Suppose you want to see how consumption changes as temperature does. The command here would be: bysort temp: summ cons. This means, in English, “summarize consumption for every distinct value of temperature.” Other commands that you may use again include describe and list.

To investigate basic correlations, use the corr command followed by the variables you want the correlation of. For example, corr price temp cons will provide a matrix of Pearson correlation coefficients between price, temperature and consumption. You may want to use the pwcorr command instead. It’s essentially the same as corr but it allows for a little bit more detail. If you type pwcorr temp cons, sig it will provide p-values for the test of the null hypothesis that the correlation is zero.
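The describing commands can be run together on a small example. The sketch below types in six made-up observations using the variable names price, temp and cons; the values are invented for illustration and are not the course data.

```stata
* six made-up observations, entered by hand
clear
input price temp cons
.270 41 .386
.282 56 .374
.277 63 .393
.280 56 .425
.272 41 .406
.262 63 .344
end

* overview of the variables, then summary statistics
describe
summarize
summarize price, detail
inspect price

* consumption summarised separately for each distinct temperature
bysort temp: summarize cons

* correlation matrix, then pairwise correlations with p-values
corr price temp cons
pwcorr temp cons, sig

* look at the first three observations
list in 1/3
```

With real data, bysort is most informative when the grouping variable takes only a handful of distinct values, as temp does in this toy example.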

2.2 Generating Variables

You will regularly have to generate variables to aid econometric analysis. For example, you may want to create dummy variables or run log-log regressions. To create a variable named loggedprice equal to the natural log of price, the command is gen loggedprice=ln(price). Similarly, to generate a variable equal to twice the square root of temperature, use the command gen twice_root_temp=2*sqrt(temp). Note that variable names cannot contain spaces.

The egen (“extended gen”) command works just like gen but with a few extra options. For example, egen avg_price=mean(price). With egen we can also obtain percentiles very easily. For example, to create a variable equal to the 99th percentile of price, enter egen high_price=pctile(price), p(99). Changing the 99 to 50 in that command would produce a variable equal to the median price. “Egen” is often used to obtain a breakdown of a particular statistic by another variable. For example, we could obtain a variable containing the minimum income value within each province: egen minincome=min(income), by(var7).
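To see what these commands produce, they can be run on a few made-up rows. This is a sketch only: the six observations and their values are invented, var7 is coded 1 and 2 as two hypothetical provinces, and median_price is a name introduced here for illustration.

```stata
* six made-up observations in two provinces
clear
input price temp income var7
.270 41 78 1
.282 56 79 1
.277 63 81 1
.280 68 80 2
.272 69 76 2
.262 65 83 2
end

* gen creates a new value for every observation
gen loggedprice = ln(price)
gen twice_root_temp = 2*sqrt(temp)

* egen computes a statistic and stores it in every row
egen avg_price = mean(price)
egen high_price = pctile(price), p(99)
egen median_price = pctile(price), p(50)

* the minimum income within each province
egen minincome = min(income), by(var7)

list
```

In the listing, avg_price, high_price and median_price are constant across all rows, while minincome takes one value per province.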

2.3 if Commands

Oftentimes we want Stata to run a command conditional on some requirement. For example, the correlation between price and consumption if the temperature is greater than 25°C. This is easily achieved: corr price cons if temp>25. To add more conditions to a command, for example to test the correlation if temperature is greater than 25°C and less than 35°C, we use the & operator: corr price cons if temp>25 & temp<35.

We can also use if to investigate data more closely: summ cons if price>.275. We can create dummy variables with if commands. Typically two steps are needed. First we create a variable set equal to zero: gen expensive=0. Now we replace it with one where the condition holds: replace expensive=1 if price>avg_price. See the tutorial on regression analysis for more details on dummy variables. Similarly we can control for outliers using if commands. For example if you want to eliminate the most expensive 5% of observations, the following would work:

egen top_fivepercent_prices = pctile(price), p(95)

drop if price > top_fivepercent_prices

We now remove the variables we generated above from the data with the “drop” command [5], as we do not need them in this analysis: drop loggedprice twice_root_temp avg_price high_price expensive.
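The if qualifier and the two-step dummy can be checked on a toy dataset. In this sketch the six observations are made up, and expensive2 is a hypothetical name used to show a one-line alternative that gives the same result when price has no missing values.

```stata
* six made-up observations
clear
input price temp cons
.270 22 .386
.282 27 .374
.277 30 .393
.280 33 .425
.272 36 .406
.262 28 .344
end

* correlation using only observations with temperature between 25 and 35
corr price cons if temp > 25 & temp < 35

* two-step dummy: 1 if price is above the average price, 0 otherwise
egen avg_price = mean(price)
gen expensive = 0
replace expensive = 1 if price > avg_price

* one-line alternative: a logical expression evaluates to 1 or 0
gen expensive2 = (price > avg_price) if !missing(price)

* confirm that the two versions agree in this dataset
assert expensive == expensive2
tab expensive
```

The if !missing(price) part matters with real data, because Stata treats a missing numeric value as larger than any number, so price>avg_price would otherwise be true for observations with a missing price.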

2.4 Summarising with tab and tabstat

One of the first things you will want to do with your data is to summarise its main features. Crosstabs are a useful pre-regression tool, and are also useful for presenting the main points of your data succinctly. The two most important commands for this are “tab” and “tabstat”. “Tab” tells you how many times each value occurs for a particular variable. This is only suitable for variables with relatively few distinct values, such as categorical data. If you try to use “tab” on a variable which has hundreds of different entries you will get an error message. Typing tab var7 will show how many entries there are for each province. As with all commands, it can also be accessed through the menus via: STATISTICS, SUMMARIES, TABLES. It is also easy to obtain crosstabs which give a breakdown of one variable by another. For example typing tab var7 weekend will show how many weekend and weekday entries there are for each province. It is often useful to know the percentages as well as the actual numbers. We need to add an option to our “tab” command. tab var7 weekend, col will give us the percentage within each column. Typing tab var7 weekend, row will give us the percentage within each row.

The second command which is useful here is “tabstat”. This is used for continuous variables, and its main use is to provide mean values. For example tabstat price will give the average price in the dataset. Using the command options we can also access other useful statistics such as the median, tabstat price, stats(med), or the variance, tabstat price, stats(var). For a full list of the available statistics, type help tabstat. As before we can obtain these statistics according to different levels of a second variable. tabstat price, by(var7) gives the average price for each province.
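A short sketch of both commands on made-up data follows. The eight observations are invented for illustration, with var7 coded 1 and 2 for two hypothetical provinces and weekend coded 1 and 2.

```stata
* eight made-up observations
clear
input price var7 weekend
.270 1 1
.282 1 2
.277 1 2
.280 1 1
.272 2 2
.262 2 2
.275 2 1
.265 2 2
end

* one-way table, then a crosstab with column and row percentages
tab var7
tab var7 weekend
tab var7 weekend, col
tab var7 weekend, row

* mean price, median price, and several statistics broken down by province
tabstat price
tabstat price, stats(med)
tabstat price, stats(mean sd min max) by(var7)
```

Several statistics can be requested at once inside stats(), which is often more convenient than running tabstat repeatedly.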

2.5 Introduction to Labels

Labels are designed to help the user of a dataset understand and present their findings. These are often essential, for example if you have a categorical variable gender with two values 1 and 2, and no label, you are in trouble as you will not know which refers to male and which to female. Generally these will be provided in your data, but not necessarily. Often the variable name itself will be self explanatory, for example in the icecream dataset the meaning of the variable “income” is obvious, although we do not know the unit it is measured in. It is not so obvious for var7, but a look at descriptives or the data browser makes it clear that it refers to Irish provinces. It is easy to rename a variable. Type rename var7 province into the command window.

As well as their actual names, you can also label a variable. These labels can be used to provide additional information that is not apparent from the variable name. So far none of the variables in the dataset have these. Adding a variable label to a variable is also straightforward. Type label variable temp "Temperature Degrees C" and label variable cons "The number of ice-creams purchased.". You can see the result of this in the variable window where this label will appear. The label will also be used in tables and other output generated by Stata.

[5] We can also select the variables we wish to remain in the data, with the “keep” command.

We have already encountered the other type of labels, value labels. These are labels attached to particular values of a variable and are mainly used with categorical data. In the case of the variable province, we have already seen how this works. For a quick way of seeing exactly which labels are attached to each value, type codebook province to obtain the name of the value label. Then labelbook province1 will show all the values and their labels. From this we can see that in the variable province, the value “1” is labelled with the name “Ulster”, “2” with “Leinster” etc. If there are many value labels you may need to use the option all: codebook province, all.

You will often need to either create your own value labels, or else modify existing ones. The procedure is almost the same in both cases. If you are starting from scratch, you will need to pick a name for your label. In this case we will create value labels for the weekend variable, and call the label weekend1. Suppose we know that “1” refers to “weekend” and “2” refers to “weekday”. We use the “label define” command, followed by the values and their labels in double quotation marks: label define weekend1 1 "Weekend" 2 "Weekday". We then need to attach our value label to the existing variable using label values weekend weekend1. We tab weekend to confirm the change. The only difference in the case of modifying an existing set of value labels is that you need to obtain the name of the value label (“codebook” will supply you with this). Then you use the “label define” command with the “modify” option to change one or more of the labels. You do not need to reattach the modified value label to the variable, this is done automatically. You also do not need to write out the full set of labels, only the one you want to change. For example, if we wanted to change the label on the province variable for the value “2” from “Leinster” to “Dublin”, we would use label define prov 2 "Dublin", modify.
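The labelling steps can be followed from start to finish on a toy dataset. In this sketch the four observations are made up, the value label for province is created from scratch under the name prov, and the names given to values 3 and 4 (“Munster” and “Connacht”) are assumptions, since the text only states the labels for values 1 and 2.

```stata
* four made-up observations
clear
input var7 weekend temp
1 1 20
2 2 25
3 1 18
4 2 30
end

* rename the variable and give temp a variable label
rename var7 province
label variable temp "Temperature Degrees C"

* create a value label called weekend1 and attach it to weekend
label define weekend1 1 "Weekend" 2 "Weekday"
label values weekend weekend1
tab weekend

* create and attach a value label for province (labels for 3 and 4 are assumed)
label define prov 1 "Ulster" 2 "Leinster" 3 "Munster" 4 "Connacht"
label values province prov

* change a single label; the variable picks up the change automatically
label define prov 2 "Dublin", modify

* inspect the result
labelbook prov
codebook province
tab province
```

Keeping the value label name distinct from the variable name, as with weekend and weekend1, makes it easier to see which of the two a command is referring to.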

2.6 Joining Datasets

There are two different situations in which you will need to join datasets, and each requires a different command. Broadly these involve adding more observations, or adding more variables. We will demonstrate each using our icecream dataset. For the first situation we will add two extra observations. In the second case we will add an extra variable (rainfall).

The first is easier. Suppose you want to join two rounds of a survey into a single dataset. This often happens with the likes of the European Social Survey, the QNHS or other surveys which involve repeated cross sections. If you think about what you would do in Excel, all you need to do in this case is stack the two datasets on top of each other. But don’t do it in Excel, because there will be no record of what you’ve done or how you’ve done it. We will use the “append” command to add our new observations, which are in the dataset icecream3. It is as straightforward as append using icecream3. Now we have a dataset with the extra observations. We confirm this with tab county and the data browser.
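A minimal sketch of the append step, assuming icecream3.dta sits in the current working directory and the main icecream data are already in memory. The count commands are an addition for checking and are not part of the original walkthrough; the trailing // comments only work when the lines are run from a do file.

```stata
* Stack the observations in icecream3.dta underneath the data in memory
count                      // number of observations before appending
append using icecream3
count                      // larger by the number of rows in icecream3
tab county
```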

Adding new variables is slightly more tricky. We can’t just add the information at random; we need to make sure that each observation is matched in each dataset. This often arises in the context of surveys such as SHARE which have several modules (and data files) dealing with different domains such as health, income, demographics etc. If we want to join modules we have to make sure that the extra variables are really the responses which that particular individual gave. So you need a variable in each dataset that uniquely identifies each individual (or firm, country etc., whatever your level of observation is).[6] In this case we will assume we are interested in time periods, and add an extra variable (rainfall) for each time period. Our unique identifier is therefore time, and we will need this variable to be the same in the new dataset. The first thing we need to do is make sure the data is ordered correctly, so we need to sort by our identifier: sort time. The new dataset will also need to be sorted by this variable. Now we can use the merge command with the dataset containing the new variable, icecream4: merge time using icecream4. We can now check that we have an additional rainfall variable for each time period with tabstat rainfall, by(time), and in the data browser. A new variable, “_merge”, is created, which tells us whether each observation was present in the master dataset, the dataset we merged with, or both. See help merge for details. For now we will drop this variable with drop _merge.

[6] In certain situations you may need to merge across more than one variable, for example merge mergeid year if you have a panel dataset.
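The merge steps can be sketched as follows. This assumes the master data are already in memory and that icecream4.dta may be re-saved in sorted order; the preserve/restore pair is an addition used only to sort the second file without losing the data in memory.

```stata
* Sort the dataset we are merging with by the identifier and re-save it
preserve
use icecream4, clear
sort time
save icecream4, replace
restore

* Sort the master data and merge (syntax of Stata 10 and earlier, as in the text)
sort time
merge time using icecream4

* From Stata 11 onwards the same match is usually written as
*   merge 1:1 time using icecream4
* which requires time to identify observations uniquely in both files.

tab _merge              // 1 = master only, 2 = using only, 3 = matched in both
drop _merge
```

Tabulating _merge before dropping it is a cheap check that every time period found a match.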

2.7 Tabout

We have already discussed tabulating variables and looking at crosstabs. Here we will examine how to extract these results in a way that can be easily used in presentations or papers. To do this we need the user written command “tabout”. To find and install it, we first search for it: search tabout, all. The option all allows us to search the internet. We find an entry in blue for tabout, with the description: ‘TABOUT’: module to export publication quality cross-tabulations / tabout is a table building program for oneway and twoway / tables of frequencies and percentages, and for summary tables. It / produces publication quality tables for export to a text file.

2.7.1 Tabout with Stata 9/10/11

You can install tabout from this link, or else by typing ssc install tabout.

As before, there are essentially two cases involved, one analogous to “tab”, and the other analogous to “tabstat”. tabout county using filename.xls, replace gives us an Excel table with the numbers in each county. The replace option is important (and is available in most programmes which involve exporting results): it tells Stata to overwrite the file if it already exists. The alternative is append, which tells Stata to add new results to the same file. If we want percentages as well as numbers in each category we can use the following option: tabout county using filename.xls, append cells(freq col). We may also want to display crosstabs, which we can do by adding a second variable, for example tabout county province using filename.xls, append. If we want percentages as well as numbers we can use the following: tabout county province using filename.xls, append cells(freq col). “Tabout” only works with variables which have labels, as these kinds of tables do not really make sense for continuous variables such as income. However, we can make tables of means (or other statistics) as with “tabstat”: tabout province using filename.xls, append cells(mean income) sum gives us the average income for each province.
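The tabout calls above could be collected in a do file along these lines. This is a sketch that follows the option names used in this section (cells(), replace, append, sum); later releases of tabout may use different syntax, so check help tabout for the version you have installed.

```stata
ssc install tabout

* One-way table of counts, overwriting any existing file
tabout county using filename.xls, replace

* Crosstab with frequencies and column percentages, added to the same file
tabout county province using filename.xls, append cells(freq col)

* Summary table: mean income within each province
tabout province using filename.xls, append cells(mean income) sum
```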

2.7.2 Tabout with Stata 8

Clicking on the link after running search tabout, all will open a page which allows you to install the programme. You will notice that Stata version 9 is required. We need a version which is compatible with the version of Stata we are running. Here we need “tabout8”, but a search for it (search tabout8, all) is unsuccessful, so we will need to install it ourselves. If we google “tabout8” we will find the website “….com.au/stata.html” as the first entry. Here we find a link to the .ado file, which we download and save in our personal directory, which we can locate with the command sysdir. We also save the corresponding help file in this directory. This may seem like a lot of effort; however, it only needs to be done once, and will save you a lot of time if you are handling datasets with large numbers of variables. It is possible to copy and paste from the Stata window into Excel after using the “tab” command, but this is a painstaking process, especially if you are constantly updating your analysis. “Tabout” automates this process.

As before, there are essentially two cases involved, one analogous to “tab”, and the other analogous to “tabstat”. tabout8 county using filename.xls, replace gives us an Excel table with the numbers in each county. The replace option is important (and is available in most programmes which involve exporting results): it tells Stata to overwrite the file if it already exists. The alternative is append, which tells Stata to add new results to the same file. If we want percentages as well as numbers in each category we can use the following option: tabout8 county using filename.xls, append cells(fcount fper). We may also want to display crosstabs, which we can do by adding a second variable, for example tabout8 county province using filename.xls, append. If we want percentages as well as numbers we can use the following: tabout8 county province using filename.xls, append cells(double).

Table 2: Tabout Example 1 - Crosstabs

                                     Province
                 Ulster       Dublin      Munster     Connacht        Total
Weekend      Num      %    Num      %    Num      %    Num      %    Num      %
Weekend        5   55.6      7   58.3      3   60.0      3   50.0     18   56.3
Weekday        4   44.4      5   41.7      2   40.0      3   50.0     14   43.8
Total          9  100.0     12  100.0      5  100.0      6  100.0     32  100.0

Table 3: Tabout Example 2 - Variable Averages

                Mean Income
By Weekend
  Weekend            84
  Weekday            85
  Total              85
By Province
  Ulster             83
  Dublin             85
  Munster            85
  Connacht           86
  Total              85

“Tabout8” only works with variables which have labels, as these kinds of tables do not really make sense for continuous variables such as income. However, we can make tables of means (or other statistics) as with “tabstat”: tabout8 province using filename.xls, append cells(mean income) gives us the average income for each province.

“Tabout” is at its most powerful when used in conjunction with LaTeX, which is the software that was used to create this document. Note that the more recent version has a slightly different syntax, but the general idea is the same. You can also get confidence intervals using the latest version.

2.8 Recoding and Strings

Sometimes we may need to recode particular values in order to carry out our analysis. For example, we have information on all four provinces, but we may only be interested in comparing Ulster to the other provinces. We will generate a new variable province2, which at first is exactly the same as our province variable: gen province2=province. Now we will recode the Ulster value to be equal to one, and the other provinces to be equal to zero. In case we’ve forgotten, codebook province and then labelbook province1 will tell us the value labels. Ulster is already coded as 1, so we leave that as it is. There’s more than one way to change the value for the other provinces. We could use the replace command. Any of the following would do the job: replace province2=0 if province>1, or replace province2=0 if province2!=1, or replace province2=0 if province2==2|province2==3|province2==4. The recode command often accomplishes the same task with less effort: recode province2 (2/4=0). The “/” tells Stata to recode values 2 through 4 to be equal to 0.[7]

“Recode” is also important if you want to change several values within a variable simultaneously. For example, suppose we want to reverse the coding on the variable “province”; we could do so with recode province (1=4) (2=3) (3=2) (4=1). This would not be possible with a simple sequence of “replace” commands, because later replacements would overwrite earlier ones. If you do this, don’t forget to change the value labels.

Now if we tab province2 we see only two values as required. To be sure that we’re not confused when we come back to the data later, or when someone else is using it, we should label this new variable properly. We define a new label, lab def prov2 0 "Leinster Munster or Connacht" 1 "Ulster". Then we attach this to the variable: label values province2 prov2. Now tab province2 is self explanatory.

[7] An even quicker command would be gen province2=(province==1).
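Put together, the recoding and labelling steps might look like the sketch below. It assumes province is coded 1 to 4 with Ulster as 1, as reported by labelbook.

```stata
gen province2 = province
recode province2 (2/4 = 0)            // Ulster stays 1, all others become 0

* The one-line alternative, gen province2 = (province == 1), differs in one
* respect: it gives 0 rather than missing when province itself is missing.

label define prov2 0 "Leinster Munster or Connacht" 1 "Ulster"
label values province2 prov2
tab province2
```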

Several issues arise when trying to manipulate string variables. For example, if we try replace province2=15 if county==Armagh, we get an error message. This is because in general Stata requires strings to be surrounded with double quotation marks: replace province2=15 if county=="Armagh" works. But we should undo that change: replace province2=1 if county=="Armagh". As we will see later on, single quotation marks have other uses. In general, dealing with strings can be tricky; for example, we cannot replace a string value with a numerical value. We also want to avoid having to type out a string every time we want to manipulate the data. Expressions involving > and

2.9 Missing Values

Missing values are practically unavoidable, particularly in micro surveys. Individuals may not know how to respond to a question, or may simply refuse. In well established surveys such as SHARE or Living in Ireland, these will be coded as some value such as “99”, and be appropriately labelled. However this is not always the case. Often there will just be a blank entry. This appears in Stata as “.”. Stata actually treats this as larger than any number (effectively infinity), so if you try something like replace var9=100 if var9>100 then your missing values will be (unintentionally) included in your recode. So let’s reverse this: replace var9=. if var9==100.

For most commands, observations which have missing values (for any of the variables involved in running that command) are excluded. This is particularly important to remember when we come to look at regressions. For example we know there are now 32 observations in our dataset. If we tab income we find a total of 32 as expected. If we look at var9 in the data browser we notice it contains some missing values. So if we tab var9, we will only see a total of 29.

We notice that there are some entries labelled as “Refusal”. Depending on the circumstances we may or may not want to exclude these from our analysis. To do this we need to find the value for “Refusal” with codebook var9. We find it is “99”. To recode this as missing we could do as above, replace var9=. if var9==99, or recode var9 (99=.). An alternative is the “mvdecode” command: mvdecode var9, mv(99). An advantage of using this command is that we can recode missing values for several variables at the same time, e.g. mvdecode time income weekend, mv(99). But of course in this case the variables time and income do not contain any observation with the value “99”. It is also possible to reverse this process with the command “mvencode”.

You may also want to recode existing non-missing values as missing, for example to deal with outliers. One way to do this is replace income=. if income>100. Again, an alternative is “mvdecode”. But here there are no observations with income greater than 100. Note that dealing with missing values is a very important topic in applied econometrics, and can have a major impact on your results.

Earlier we mentioned the wildcard “*”. Suppose we wanted to drop all variables beginning with “var”; we could type drop var*. Or if we wanted to recode the value “99” as missing for all variables, we could type mvdecode *, mv(99).
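A short sketch of the points above. The !missing() guard and the final count are additions for illustration; the variable names are the ones used in this section.

```stata
* Missing values sort above every number, so exclude them explicitly
replace var9 = 100 if var9 > 100 & !missing(var9)

* Turn the code 99 ("Refusal") into missing, for one or several variables
mvdecode var9, mv(99)
mvdecode time income weekend, mv(99)

* Count how many observations are missing on var9
count if missing(var9)
```

Writing the condition with !missing() avoids having to undo an unintended recode afterwards.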

2.10 Macros, Looping and Programming

An important thing to remember about Stata is that there is nearly always an easier and quicker way to do things, especially if you find yourself having to repeat the same task over and over. This could be recoding, generating variables etc. Stata has features designed to automate these processes, and as you become more familiar with the programme you will literally be able to save yourself hours (by using loops, for example).

We will start with macros. These are simply shortcuts which stand in for something else, and can be used to store everything from strings (words) to values, variable lists and results from programmes. We define a macro with the “local” command, and access it by enclosing its name in single quotation marks, ` and '.[8][9] First we define a macro x which takes the numerical value 10: local x 10. Now every time we call the macro `x' we have the value 10. To check this we type di `x'. We can now use this macro in expressions, for example di 100-`x'. We can also use it to manipulate variables: gen income2=income*`x'.

We can also store words in macros. Suppose you wanted to add the word “icecream” to each variable name. You could type rename price icecreamprice and rename cons icecreamcons etc. To save time you could store the word “icecream” in a macro: local y icecream. Then rename price `y'price would give the same result. You may wonder how useful this is, and in these cases it is probably not particularly helpful. A better example is when we want to store a list of variables. Rather than typing out the whole list every time, we can save the variables in a macro: local z price income temp. Then suppose we wanted to recode the value 100 as missing for all of these; instead of typing mvdecode price income temp, mv(100) we can type mvdecode `z', mv(100). This is a small dataset so it’s not a particularly big deal here, but it’s a different matter when you have hundreds of variables.
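The macro examples can be tried as a block. One caveat, stated here as general Stata behaviour rather than something from the text: a local macro defined in a do file disappears when that do file (or the selected lines) finishes running, so these lines should be run together.

```stata
local x 10
display `x'                 // shows 10
display 100 - `x'           // shows 90
gen income2 = income * `x'

* A macro holding a variable list
local z price income temp
summarize `z'
```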

Macros are also important for accessing results stored by programmes.[10] The macros saved by a programme are listed in the help file. For example, if we look at help summarize, we see the list for this command.[11] Suppose we are interested in constructing new versions of our variables which are in the form of a z score (standardised deviation from a variable’s mean): $z_i = (\mu - x_i)/\mathrm{sd}(x)$. We can see from the help file that the command “summarize” stores the two results we need, the mean and standard deviation, in the macros “r(mean)” and “r(sd)” respectively.[12] We can use these to form our new variables. To access the stored results we need an “=” that we didn’t need when we were defining our own macros. First we run the command sum time. Then we define our macros local a=r(mean) and local b=r(sd). To check we have the correct results, di `a' and di `b'. Now we can generate our new variable: gen ztime=(`a'-time)/`b'. If we now sum ztime, we see that the mean is effectively zero (it’s actually a very small number due to rounding), as it should be seeing as the average deviation from a mean is zero by definition. The standard deviation is also as expected (one). We will later write our own programme which will allow us to transform all our variables in this way in a single line of code. Think how long it would take to do this in Excel.
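The z score steps, written as a block. The quietly prefix is an addition that suppresses the first table of output; everything else follows the text, including its sign convention.

```stata
quietly summarize time
local a = r(mean)
local b = r(sd)

* Sign convention follows the text: mean minus value
gen ztime = (`a' - time) / `b'

summarize ztime             // mean close to 0, standard deviation 1
```

Note that Stata's built-in egen function std() computes value minus mean, so its result has the opposite sign to ztime.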

Loops are another time saving device which make use of macros. For example in our icecream dataset, we notice from the data browser that the variable hour has been badly inputted. The entries should all be in the format of the 24 hour clock; however, you can see that the final two zeros are missing from some of the entries. This makes analysis difficult; for example, sum hour will give misleading results. In order to correct this we make use of a loop. This involves the forvalues command. Essentially we want to add two zeros to every entry less than 100.

* Multiply entries recorded as 1 to 24 by 100 so that, e.g., 9 becomes 900
forvalues i=1(1)24 {
    replace hour=hour*100 if hour==`i'
}

Notice the syntax: we are creating a macro “i” which will start at the value 1, execute the command for that value, move on to the next value (“2”), execute the command for that value, and so on until the loop ends. The first number refers to the starting value, the number in brackets is the increment, and the final number is the end value. We need an opening curly brace at the end of the forvalues line, the command on a separate line, and a closing curly brace on a separate line again. Executing this command will give us a well behaved variable with every entry in the same format: sum hour.
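To see how the start(increment)end notation behaves, a toy loop with a different increment may help. This sketch is not part of the icecream exercise; it only displays the values the macro takes.

```stata
* forvalues start(step)end: here i takes the values 0, 6, 12, 18 and 24
forvalues i = 0(6)24 {
    display "i is now `i'"
}

* The hour fix could also be written without a loop, assuming (as in the
* text) that the badly entered hours are whole numbers from 1 to 24:
* replace hour = hour*100 if inrange(hour, 1, 24)
```

The one-line alternative is left commented out so that running the block after the loop above does not multiply the same entries twice.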

[8] Note that ` is the left single quotation mark (backtick), usually found on the key to the left of the number 1 on your keyboard. You may need to press it twice. The closing mark ' is the ordinary apostrophe.

[9] There are also “global” macros, which are rarely used.

[10] This is what we are doing at the beginning of our do file. Stata is told to access the date, and open a log file under that name.

[11] Type mac list to see which macros are currently in use.

[12] For more on accessing macros see help extended_fcn.

The other type of loop (the “foreach” command) is generally used when you want to perform the same task on a number of different variables. We will write our own programme to transform every variable into a z score. We will call it “zscore”. We then use the foreach loop, with “i” being the macro that corresponds to each individual variable we wish to transform. The macro “0” refers to the list of variables we are interested in, essentially everything we type after our new command.[13] As with forvalues, we need an opening curly brace at the end of the foreach line, and each of our commands on a separate line. For every variable in “0” (our variable list) we are running the sum command, obtaining the macros for the mean and standard deviation, and then generating a new variable which will have the prefix “zscore”. We also summarise this new variable. After the final curly brace we need “end” to tell Stata the programme is finished.

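* Drop any earlier definition; capture suppresses the error if none exists
capture program drop zscore
program define zscore
    * `0' holds everything typed after the command name
    foreach i in `0' {
        sum `i'
        local m=r(mean)
        local n=r(sd)
        gen zscore`i'=(`m'-`i')/`n'
        sum zscore`i'
    }
end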

We can now run the programme on whichever variables we are interested in, for example zscore time cons price. The summary statistics confirm we have what we wanted. Note that if you try to define a new programme with the same name as an existing programme (program define zscore) you get an error message. So if you want to modify an existing programme you will first need to drop it: program drop zscore. But if there is no programme called zscore this will itself produce an error message. Hence the use of “capture”, just like when we were opening our log file. Also, you will need to define your programme again each time you start a new Stata session, unless you save it as an ado file.[14]
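As a variation on the programme above, the sketch below uses Stata's syntax command to check the variable list before looping. The names zscore2 and the z2_ prefix are hypothetical, chosen so that nothing clashes with the programme and variables already created.

```stata
capture program drop zscore2
program define zscore2
    * Stop with an error unless the arguments are existing numeric variables
    syntax varlist(numeric)
    foreach v of local varlist {
        quietly summarize `v'
        * Same sign convention as the text: mean minus value
        gen z2_`v' = (r(mean) - `v') / r(sd)
    }
end

zscore2 time cons price
summarize z2_time z2_cons z2_price
```

The design choice here is only about error handling: a misspelt variable name is rejected before any new variable is generated.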

2.11 Counting, sorting and ordering

It is possible to access individual observations within your data using subscripts. These take the form of square brackets containing the observation number after the variable of interest. For example, di time[2] displays the value of the time variable for the second observation. This can be very important if you want to access information in the responses of other observations. For example you may have data on households and may want to use information provided by parents to analyse the outcomes of children. We will illustrate this with our province variable. Two special cases of subscripts are _n and _N. The latter holds the total number of observations. For example, gen totalno=_N gives us a new variable which is the total number of observations in the dataset. It is obviously the same for each observation: tab totalno. On the other hand, gen totalno2=_n gives the position of each observation in the dataset, and runs from 1 to 32: tab totalno2. These are most useful when used with the by command.

We will generate two variables which give us the total number in each province, and also the position of each observation within its province. First we need to sort our data: sort province. Then by province: gen provinceno=_N and bysort province: gen provinceno2=_n.[15]
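A sketch of _n and _N used with by. The variable first is a hypothetical addition showing one common use, flagging the first observation in each group.

```stata
sort province
by province: gen provinceno  = _N     // number of observations in the province
by province: gen provinceno2 = _n     // running position within the province

* Example use: flag the first observation in each province
by province: gen first = (_n == 1)
list province provinceno provinceno2 first in 1/10
```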

[13] Within the macro “0”, the macro “1” refers to the first variable, “2” the second etc.

[14] It so happens that Stata already has a way of creating standardised variables with egen newvar=std(var).

[15] These two steps could be combined with “bysort”: bysort province: gen provinceno=_N

We can use the data browser to confirm that this generated the variables we expected. In this case we have some province level data in var10; however, it is only present for one observation in each province, and we need it in all observations.

This is a case where you may be tempted to use Excel, but apart from the replication issue, this will simply not be possible if you have a dataset with thousands of observations and hundreds of variables. Instead we can use subscripts and a loop to make the change:

forvalues i=1(1)10 {
    by province: replace var10=var10[`i'] if var10==. & var10[`i']!=.
}

Within each province we are simply replacing var10 with the value from another observation in the same province, provided that var10 is missing for the current observation and non-missing for the other one. We loop over ten values as there are at most 10 observations. Another useful command in this context is gsort, which orders the observations according to the values of one or more variables, in either ascending or descending order.
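The loop relies on knowing the largest number of observations in any group. A common alternative that avoids this is sketched below. It assumes each province has exactly one distinct non-missing value of var10.

```stata
* Sort within province so that non-missing values of var10 come first
* (missing values sort last), then copy the first value to the whole group
bysort province (var10): replace var10 = var10[1] if missing(var10)
```

Because var10 is named in parentheses, it is used for sorting within each province but not for defining the groups.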

2.12 Reshaping Datasets

Sometimes data will be in the wrong “shape”. This is difficult to explain without an example. If we open the cyear file (use cyear2) and look in our data browser, we will see that we have several variables that refer to repeated measures over a number of years. In fact there are three outcomes: u5m (under 5 mortality), gdppc (GDP per capita) and hivp (proportion of the population infected with HIV/AIDS), and 9 years. Depending on the analysis you wish to conduct, this may be awkward with the data in this “shape”. For example, tracking an indicator across time is difficult with this format. We can use the reshape command to transform the data into something more useable. The syntax is a little tricky, but the most important part is to identify our outcome variables, our time variable, and our country (or firm or individual) identifier. The command then takes the form reshape long “outcome variables”, i(“identifier”) j(“time variable”). So in this case we have reshape long u5m gdppc hivp, i(country) j(year). Now we can track an indicator across time, e.g. tabstat gdppc, by(year). Of course we now essentially have a panel dataset, which is a whole other topic. See section 7 for more details.
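A sketch of the reshape round trip. It assumes the wide variables are named with the stubs u5m, gdppc and hivp followed directly by the year, which is what reshape long expects.

```stata
use cyear2, clear
describe                    // wide form: one column per outcome per year

* Wide to long: one row per country-year
reshape long u5m gdppc hivp, i(country) j(year)
tabstat gdppc, by(year)

* reshape wide reverses the operation
reshape wide u5m gdppc hivp, i(country) j(year)
```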

2.13 Graphs

Like the tables we discussed above, graphs are a powerful tool for exploring, summarising and presenting your data. The basic graph commands for Stata are straightforward; however, getting to grips with all the available options is tricky. It would be impossible to discuss all the different types of graph, so we will discuss the most common types. As with tables, we will divide graphs into two types, those that deal with continuous variables and those that deal with categorical data. We will load our icecream data: use icecream2, clear.

For continuous data, the easiest way to visualise the relationship between two variables is to produce a scatterplot of them, e.g. scatter cons temp. If, instead, we want a graph of the line of best fit between the two variables, the relevant command is graph twoway lfit cons temp. We can also combine multiple plots in one graph using the twoway command. For example, try twoway (scatter cons temp) (lfit cons temp). We can add some complexities very easily. For example you may wish to add confidence interval “bands” around your line of best fit. To achieve this, use lfitci instead of lfit.[16] Stata can also produce several graphs in the one chart. For example we can create a 3x3 matrix of scatterplots by typing graph matrix cons temp price, scheme(s1mono). To investigate the distribution of a single variable, we can create a histogram of it using the histogram command.[17] For example, histogram temp.

We can produce a bar chart showing the mean of our variables using graph bar time cons price income temp. We can display this breakdown for values of a particular variable within a single plot, graph bar time cons price income temp, over(province), or in separate panels, graph bar time cons price income temp, by(province). We may also want to graph categorical variables like province, with the aim of showing the percent in each category. The best way to do this is with the user written command “catplot”. As before we can use search catplot, all and click on the blue link to install, or type ssc install catplot. Now we can use this to graph our province variable by itself, catplot province, or by another variable, catplot province weekend.
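A few of the graph commands above, collected as a sketch. Using a single variable (cons) in the bar charts is a simplification for readability and is not taken from the text.

```stata
use icecream2, clear

* Scatterplot with fitted line and confidence band overlaid
twoway (lfitci cons temp) (scatter cons temp), scheme(s1mono)

* Mean consumption by province, all bars in one plot
graph bar (mean) cons, over(province)

* The same bars drawn as one panel per province
graph bar (mean) cons, by(province)
```

Drawing lfitci first keeps the shaded band behind the scatter points.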

[16] Note that the easiest way to change the overall look of a graph is with schemes. See help schemes. We will use “scheme(s1mono)” to generate the graphs in this document as we want them in black and white.

[17] A similar chart is produced with graph7.

We will discuss this further in the time series tutorial, but if you have time series data then line graphs can be important. First we need to sort our data by the time variable, in this case time: sort time. Then we use the “line” command to graph the variables. The last variable needs to be our time variable. So in this case we could have line cons income time.

We can save any graph we produce in Stata using the graph export command. After drawing our graph, Stata will open it in a new window. It is possible to save it using the menus FILE, SAVE AS. We can also type graph export filename.png, replace. Stata can save graphs in various different formats, but .png is the most straightforward.[18] We can then use the graphs in other documents and presentations.[19] It would take too long to go through all of the available options, but some of the most important ones refer to the title and axis labels. For example:

line income temp time, title("Time Series Graph of Income and Temperature Over Time") ///
    xtitle("Time Period") ytitle("Euro (Income) and Degrees (Temp)") caption("Source: icecream.dta")

[18] .wmf is best for Word documents, .eps is best for LaTeX.

[19] There is a useful graph editor available in Stata version 10 onwards.

The “///” tells Stata to read the next line as part of the same command. Two examples of the kind of graphs that are possible are provided below. The first graph was made using the user written command “spmap”. The second graph was generated using the cyear dataset with the following code:

twoway (scatter gdppc u5m if gdppc<1000 & year==2005, mlabel(country2) mlabsize(tiny)) ///
    (lfit gdppc u5m if gdppc<1000 & year==2005), ///
    caption("Source: WHO and Penn World Tables", size(tiny) span) legend(order(2 "Linear Fit")) ///
    title("National Income Per Head and Under 5 Mortality", span) ///
    ytitle("GDP Per Capita in US Dollars", size(small)) note("Note: Correlation=-0.1737") ///
    plotregion(style(none)) legend(pos(4) col(1)) ///
    xtitle("Under 5 Mortality, Deaths Per 1,000 Births") ///
    graphr(lwidth(thin) ilwidth(vvthin) ilcolor(black) ilpattern(solid)) ///
    subtitle("Developing Countries in 2005", span)

For more examples see:

….com.au/Usergraphs.html

And:

…/stat/stata/library/GraphExamples/default.htm

19

20

STATA面板数据模型操作命令要点

STATA 面板数据模型估计命令一览表 一、静态面板数据的STATA 处理命令 εαβit ++=x y it i it 固定效应模型 μβit +=x y it it ε αμit +=it it 随机效应模型 (一)数据处理 输入数据 ●tsset code year 该命令是将数据定义为“面板”形式 ●xtdes 该命令是了解面板数据结构 ●summarize sq cpi unem g se5 ln 各变量的描述性统计(统计分析) ●gen lag_y=L.y /////// 产生一个滞后一期的新变量

gen F_y=F.y /////// 产生一个超前项的新变量 gen D_y=D.y /////// 产生一个一阶差分的新变量 gen D2_y=D2.y /////// 产生一个二阶差分的新变量 (二)模型的筛选和检验 ●1、检验个体效应(混合效应还是固定效应)(原假设:使用OLS混合模型)●xtreg sq cpi unem g se5 ln,fe 对于固定效应模型而言,回归结果中最后一行汇报的F统计量便在于检验所有的个体效应整体上显著。在我们这个例子中发现F统计量的概率为0.0000,检验结果表明固定效应模型优于混合OLS模型。 ●2、检验时间效应(混合效应还是随机效应)(检验方法:LM统计量) (原假设:使用OLS混合模型) ●qui xtreg sq cpi unem g se5 ln,re (加上“qui”之后第一幅图将不会呈现) xttest0

可以看出,LM检验得到的P值为0.0000,表明随机效应非常显著。可见,随机效应模型也优于混合OLS模型。 ●3、检验固定效应模型or随机效应模型(检验方法:Hausman检验) 原假设:使用随机效应模型(个体效应与解释变量无关) 通过上面分析,可以发现当模型加入了个体效应的时候,将显著优于截距项为常数假设条件下的混合OLS模型。但是无法明确区分FE or RE的优劣,这需要进行接下来的检验,如下: Step1:估计固定效应模型,存储估计结果 Step2:估计随机效应模型,存储估计结果 Step3:进行Hausman检验 ●qui xtreg sq cpi unem g se5 ln,fe est store fe qui xtreg sq cpi unem g se5 ln,re est store re hausman fe (或者更优的是hausman fe,sigmamore/ sigmaless) 可以看出,hausman检验的P值为0.0000,拒绝了原假设,认为随机效应模型的基本假设得不到满足。此时,需要采用工具变量法和是使用固定效应模型。

最新Stata软件基本操作和数据分析入门

Stata软件基本操作和数据分析入门 第一讲Stata操作入门 张文彤赵耐青 第一节概况 Stata最初由美国计算机资源中心(Computer Resource Center)研制,现在为Stata公司的产品,其最新版本为7.0版。它操作灵活、简单、易学易用,是一个非常有特色的统计分析软件,现在已越来越受到人们的重视和欢迎,并且和SAS、SPSS一起,被称为新的三大权威统计软件。 Stata最为突出的特点是短小精悍、功能强大,其最新的7.0版整个系统只有10M左右,但已经包含了全部的统计分析、数据管理和绘图等功能,尤其是他的统计分析功能极为全面,比起1G以上大小的SAS系统也毫不逊色。另外,由于Stata在分析时是将数据全部读入内存,在计算全部完成后才和磁盘交换数据,因此运算速度极快。 由于Stata的用户群始终定位于专业统计分析人员,因此他的操作方式也别具一格,在Windows席卷天下的时代,他一直坚持使用命令行/程序操作方式,拒不推出菜单操作系统。但是,Stata的命令语句极为简洁明快,而且在统计分析命令的设置上又非常有条理,它将相同类型的统计模型均归在同一个命令族下,而不同命令族又可以使用相同功能的选项,这使得用户学习时极易上手。更为令人叹服的是,Stata语句在简洁的同时又拥有着极高的灵活性,用户可以充分发挥自己的聪明才智,熟练应用各种技巧,真正做到随心所欲。

除了操作方式简洁外,Stata的用户接口在其他方面也做得非常简洁,数据格式简单,分析结果输出简洁明快,易于阅读,这一切都使得Stata成为非常适合于进行统计教学的统计软件。 Stata的另一个特点是他的许多高级统计模块均是编程人员用其宏语言写成的程序文件(ADO文件),这些文件可以自行修改、添加和下载。用户可随时到Stata网站寻找并下载最新的升级文件。事实上,Stata的这一特点使得他始终处于统计分析方法发展的最前沿,用户几乎总是能很快找到最新统计算法的Stata程序版本,而这也使得Stata自身成了几大统计软件中升级最多、最频繁的一个。 由于以上特点,Stata已经在科研、教育领域得到了广泛应用,WHO的研究人员现在也把Stata作为主要的统计分析工作软件。 第二节Stata操作入门 一、Stata的界面 图1即为Stata 7.0启动后的界面,除了Windows版本的软件都有的菜单栏、工具栏,状态栏等外,Stata的界面主要是由四个窗口构成,分述如下: 1.结果窗口:位于界面右上部,软件运行中的所有信息,如所执行的命令、执行结果和出错信息等均在这里列出。窗口中会使用不同的颜色区分不同的文本,如白色表示命令,红色表示错误信息。 2.命令窗口:位于结果窗口下方,相当于DOS软件中的命令行,此处用于键入需要执行的命令,回车后即开始执行,相应的结果则会在结果窗口中显示出来。

[推荐] stata基本操作汇总常用命令

[推荐] Stata基本操作汇总——常用命令 help和search都是查找帮助文件的命令,它们之间的 区别在于help用于查找精确的命令名,而search是模糊查找。 如果你知道某个命令的名字,并且想知道它的具体使用方法,只须在stata的命令行窗口中输入help空格加上这个名字。回车后结果屏幕上就会显示出这个命令的帮助文件的全部 内容。如果你想知道在stata下做某个估计或某种计算,而 不知道具体该如何实现,就需要用search命令了。使用的 方法和help类似,只须把准确的命令名改成某个关键词。回车后结果窗口会给出所有和这个关键词相关的帮助文件名 和链接列表。在列表中寻找最相关的内容,点击后在弹出的查看窗口中会给出相关的帮助文件。耐心寻找,反复实验,通常可以较快地找到你需要的内容.下面该正式处理数据了。我的处理数据经验是最好能用stata的do文件编辑器记下你做过的工作。因为很少有一项实证研究能够一次完成,所以,当你下次继续工作时。能够重复前面的工作是非常重要的。有时因为一些细小的不同,你会发现无法复制原先的结果了。这时如果有记录下以往工作的do文件将把你从地狱带到天堂。因为你不必一遍又一遍地试图重现做过的工作。在stata 窗口上部的工具栏中有个孤立的小按钮,把鼠标放上去会出

现“bring do-file editor to front”,点击它就会出现do文件编 辑器。 为了使do文件能够顺利工作,一般需要编辑do文件的“头”和“尾”。这里给出我使用的“头”和“尾”。capture clear (清空内存中的数据)capture log close (关闭所有 打开的日志文件)set more off (关闭more选项。如果打开该选项,那么结果分屏输出,即一次只输出一屏结果。你按空格键后再输出下一屏,直到全部输完。如果关闭则中间不停,一次全部输出。)set matsize 4000 (设置矩阵的最大阶数。我用的是不是太大了?)cd D: (进入数据所在的盘符和文件夹。和dos的命令行很相似。)log using (文件名).log,replace (打开日志文件,并更新。日志文件将记录下所有文件运行后给出的结果,如果你修改了文件内容,replace选项可以将其更新为最近运行的结果。)use (文件名),clear (打开数据文件。)(文件内容)log close (关闭日志文件。)exit,clear (退出并清空内存中的数据。) 实证工作中往往接触的是原始数据。这些数据没有经过整理,有一些错漏和不统一的地方。比如,对某个变量的缺失观察值,有时会用点,有时会用-9,-99等来表示。回归时如果 使用这些观察,往往得出非常错误的结果。还有,在不同的数据文件中,相同变量有时使用的变量名不同,会给合并数

5分钟速学stata面板数据回归(初学者超实用!)

5分钟速学stata面板数据回归(超实用!) 第一步:编辑数据。 面板数据的回归,比如该回归模型为:Y it=β0+β1X1it+β2X2it+β3X3it+εt,在stata中进行回归,需要先将各个变量的数据逐个编辑好,该模型中共有Y X1 X2 X3三个变量,那么先从Y的数据开始编辑,将变量Y的面板数据编辑到stata软件中,较方便的做法是,将excel的数据直接复制到stata软件的数据编辑框中,而excel中的数据需要如下图编辑: 从数据的第二行开始选中20个样本数据,如图:

直接复制粘贴至stata中的data editor中,如图: 第二步:格式调整。 首先,请将代表样本的var1Y变量数据是选20个省份5年的数据为样本,那么口令为rename var1 province 。例如:本例中的Y变量数据编辑接下来需要输入口令为reshape long var,i(province) 其中,var代表的是所有的年份(var2,var3,var4,var5,var6),转化后格式如图: 转化成功后,继续重命名,其中_j这里代表原始表中的年份,var代表该变量的名称

例如,我们编辑的是Y变量的数据,所以口令3和口令4的输入如下: 口令3:rename _j year 口令4:rename var taxi (注:taxi就是Y变量,我们用taxi表示Y) 命名完,数据编辑框如下图所示。 第三步:排序。 例如,本例中的Y变量(taxi),是20个省份和5年的面板数据, 那么口令4为sort province year (虽意思是将province按升序排列,然后再根据排好的province数列排year这一列升序排列。然很多时候在执行sort之前,数据已经符合排序要求了,但为以防万一,请务必执行此操作) 第三步:保存。

Stata操作入门(中文)

第一讲Stata操作入门 第一节概况 Stata最初由美国计算机资源中心(Computer Resource Center)研制,现在为Stata公司的产品,其最新版本为7.0版。它操作灵活、简单、易学易用,是一个非常有特色的统计分析软件,现在已越来 越受到人们的重视和欢迎,并且和SAS、SPSS一起,被称为新的三大权威统计软件。 Stata最为突出的特点是短小精悍、功能强大,其最新的7.0版整个系统只有10M左右,但已经包含了全部的统计分析、数据管理和绘图等功能,尤其是他的统计分析功能极为全面,比起1G以上大小的SAS系统也毫不逊色。另外,由于Stata在分析时是将数据全部读入内存,在计算全部完成后才 和磁盘交换数据,因此运算速度极快。 由于Stata的用户群始终定位于专业统计分析人员,因此他的操作方式也别具一格,在Windows席卷天下的时代,他一直坚持使用命令行/程序操作方式,拒不推出菜单操作系统。但是,Stata的命令语句极为简洁明快,而且在统计分析命令的设置上又非常有条理,它将相同类型的统计模型均归在同 一个命令族下,而不同命令族又可以使用相同功能的选项,这使得用户学习时极易上手。更为令人叹 服的是,Stata语句在简洁的同时又拥有着极高的灵活性,用户可以充分发挥自己的聪明才智,熟练应用各种技巧,真正做到随心所欲。 除了操作方式简洁外,Stata的用户接口在其他方面也做得非常简洁,数据格式简单,分析结果输出简洁明快,易于阅读,这一切都使得Stata成为非常适合于进行统计教学的统计软件。 Stata的另一个特点是他的许多高级统计模块均是编程人员用其宏语言写成的程序文件(ADO文件),这些文件可以自行修改、添加和下载。用户可随时到Stata网站寻找并下载最新的升级文件。 事实上,Stata的这一特点使得他始终处于统计分析方法发展的最前沿,用户几乎总是能很快找到最新统计算法的Stata程序版本,而这也使得Stata自身成了几大统计软件中升级最多、最频繁的一个。 由于以上特点,Stata已经在科研、教育领域得到了广泛应用,WHO的研究人员现在也把Stata作为主要的统计分析工作软件。 第二节Stata操作入门 一、Stata的界面 图1即为Stata 7.0启动后的界面,除了Windows版本的软件都有的菜单栏、工具栏,状态栏等外,Stata的界面主要是由四个窗口构成,分述如下: 1.结果窗口 位于界面右上部,软件运行中的所有信息,如所执行的命令、执行结果和出错信息等均在这里列出。窗口中会使用不同的颜色区分不同的文本,如白色表示命令,红色表示错误信息。

Stata beginner tutorial

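Stata quick start

1. The Stata windows
● At the top is a row of menus: "File Edit Data Graphics Statistics User Window Help".
● Upper left, "Review" (history window): records the commands executed since Stata was started.
● Upper right, "Variables" (variables window): lists all variables currently in Stata's memory.
● Top centre, "Results" (results window): shows the output of executed Stata commands.
● Bottom centre, "Command" (command window): type the Stata command you want to run here.

2. Importing data into Stata
● After starting Stata, click the Data Editor (Edit) icon (or choose "Window" → "Data Editor") to open a blank, Excel-like spreadsheet.
● Open the file "nerlove.xls" in Excel, copy all the data in it, and paste them into the Data Editor.
● Another way to import data is to choose "File" → "Import" and then import data in one of the supported formats. This is sometimes less convenient and less direct than pasting from Excel.

3. The Variables window
● After closing the Data Editor, five variables appear in the "Variables" window in the upper right:
● tc (total cost), q (total output), pl (price of labor, the hourly wage rate), pf (price of fuel), and pk (user cost of capital, the rental price of capital).

4. Saving as a dta data file
● You can now click the Save icon (or choose "File" → "Save") to save the data in Stata format (extension dta), for example as nerlove.dta.
● From then on the dataset can be opened directly in Stata, without pasting from Excel again.

5. Opening a dta data file
There are three ways to open one:
1. Click the Open icon (or choose "File" → "Open") and browse to the dta file.
2. Double-click the dta file.
3. Type the following command in the Command window (assuming the file is in the root directory of drive E) and press Enter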
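The command itself is cut off in this excerpt. A plausible form, assuming the file saved in step 4 is named nerlove.dta and sits in the root of drive E (both are assumptions taken from the surrounding text, not from the missing line), would be:

```stata
* Hypothetical path: nerlove.dta in the root directory of drive E
use "E:\nerlove.dta", clear       // clear drops any data already in memory
describe                          // confirm that tc, q, pl, pf and pk were loaded
```

The `clear` option matters when another dataset is already open; without it Stata refuses to replace unsaved data in memory.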

1 Introduction to Stata

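1 Introduction to Stata

Stata is one of the best-known statistical packages in the world and, together with SAS and SPSS, is counted among the three leading packages. It is widely used in economics, education, demography, political science, sociology, medicine, pharmacy, industry and mining, agriculture and forestry, and other fields. It combines the features of data management software, statistical analysis software, graphics software, matrix computation software and a programming language, and can handle almost any complex statistical analysis. It is powerful, fast, flexible, and easy to learn and use.

Stata's commands are concise, and its statistical commands are organised very systematically: models of the same type belong to one command family, and different families share options with the same function, which makes the language easy to pick up. The commands are also highly flexible, so users can combine them freely. Although Stata also offers menu-based operation, you are strongly advised to stick to commands and programs; you will soon appreciate the freedom this gives you in processing and analysing data.

Another feature of Stata is that many of its advanced statistical modules are program files (ado-files) written in its own macro language. These files can be modified, added and downloaded by the user, and users can visit the Stata website at any time to find and download the latest updates. This keeps Stata at the forefront of statistical methods: users can almost always find a Stata implementation of a new statistical algorithm quickly, and it also makes Stata the most frequently updated of the major statistical packages.

Stata was developed by the Computer Resource Center in the United States and is now a product of the Stata company. Over the twenty-odd years from 1985 to 2007 it was released in versions 1.1, 1.2, …, 7.0, 8.0, 9.0 and 10.0. We will be learning version 9.0.

1.1 Installation
(1) stata9.rar can be downloaded from https://www.wendangku.net/doc/d64995066.html,/bbs/dispbbs.asp?boardID=67&ID=97705&page=2, but for a formal paper or for work you should use licensed software wherever possible.
(2) Unpack it to D:/stata9.
(3) Click setup to install >> change the installation path to D:/stata8 >> choose the Stata/SE edition.

1.2 Starting and exiting
(1) Programs → Stata starts Stata. After start-up a dialog box appears asking for the registered organisation, password and so on.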

Learning resources every Stata learner should bookmark

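This promotional article is aimed mainly at lecturers and researchers; it is best posted on highly interactive websites that such readers follow.

Learning resources every Stata learner should bookmark: study notes shared by a Stata expert (repost)

Foreword:
I only have a bachelor's degree and did not major in statistics. Only after starting work did I realise that the statistics I had learned was not enough, and at my boss's prompting I began learning statistical software. My econometrics is weak, so to sort out the pile of models I lurked in various economics forums for a long time. My skills did not improve much, but I did collect a lot of study material. Here I repost the notes of an experienced Stata user for fellow learners to consult and admire.

Main text:
I am often asked questions such as "Is Stata easy to learn?" or "How long will it take me to learn Stata?" Admittedly, the entry barrier for Stata is higher than for SPSS or Eviews. The key, however, is not how hard Stata itself is but how much time you have invested in statistics and econometrics, which is strongly negatively correlated with the time needed to learn Stata. So my answer is usually: "Well, that's hard to say; if …, it's actually quite simple …".

Compared with ten years ago, material for learning Stata is now abundant. All roads lead to the same place, but different learning paths differ greatly in efficiency. My advice to beginners is that the first question is "What can Stata do?" and only then "How does Stata do it?"

The first question matters because Stata is, in essence, just a tool for doing statistical analysis. Whether its basic platform is broad enough, whether it can be extended, and whether the tools it provides meet your professional needs are all things to understand thoroughly before choosing Stata. The Stata User's Guide (400 pages, Chinese) answers these questions well. It is an excellent map that helps you grasp Stata's basic architecture, syntax and core functions in a short time. For the second question there is a wealth of material:

(1) Online resources
I have selected a number of links. The following are worth mentioning:
● The official Stata website. The Web resources page provided by the Stata company covers a large number of related online resources; its FAQ answers all kinds of common questions; Statalist is a free discussion list similar to the Renmin University economics forum. Joining Statalist is simple: send an email to majordomo@https://www.wendangku.net/doc/d64995066.html,, with no salutation, containing only the words "subscribe Statalist". Once you receive the confirmation you are a member of Statalist. Even without joining you can still browse, but you cannot post questions.
● The online tutorials provided by UCLA (University of California, Los Angeles). The site offers Data Management, …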

Econometrics: a guide to working in Stata

Econometrics with Stata (lab course)

Chapter 1: Stata basics

1. Introduction to the Stata windows

2. Basic operations

(1) Locking the window layout: Edit > Preferences > General Preferences > Windowing > Lock splitter

(2) Importing data

(3) Opening a file:
use E:\example.dta, clear

(4) Importing date data:
gen newvar = date(varname, "YMD")
format newvar %td (daily data)
gen newvar = monthly(varname, "YM")
format newvar %tm (monthly data)
gen newvar = quarterly(varname, "YQ")
format newvar %tq (quarterly data)

(5) Variable labels:
label variable tc "total output"

(6) Inspecting the data:
describe
list x1 x2
list x1 x2 in 1/5
list x1 x2 if q>=1000
drop if q>=1000
keep if q>=1000

(7) Examining the statistical properties of variables:
summarize x1
su x1 if q>=10000
su q, detail
su
tabulate x1
correlate x1 x2 x3 x4 x5 x6

(8) Graphs:
histogram x1, width(1000) frequency
kdensity x1
scatter x1 x2
twoway (scatter x1 x2) (lfit x1 x2)
twoway (scatter x1 x2) (qfit x1 x2)

(9) Generating new variables:
gen lnx1 = log(x1)
gen q2 = q^2
gen lnx1lnx2 = lnx1*lnx2
gen larg = (x1>=10000)
rename larg large
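The date conversions in item (4) can be tried on a small self-contained example. The string variables `sdate`, `smonth` and `squarter` and their values are made up for illustration; the masks are written in upper case ("YMD", "YM", "YQ"), which is the form current Stata versions expect.

```stata
clear
* Toy string dates in three frequencies (hypothetical values)
input str10 sdate str7 smonth str6 squarter
"2010-09-30" "2010-09" "2010q3"
"2011-01-15" "2011-01" "2011q1"
end

gen ddate = date(sdate, "YMD")          // daily date: days since 01jan1960
format ddate %td

gen mdate = monthly(smonth, "YM")       // monthly date: months since 1960m1
format mdate %tm

gen qdate = quarterly(squarter, "YQ")   // quarterly date: quarters since 1960q1
format qdate %tq

list
```

The `format` lines only change how the numbers are displayed; the stored values remain integers, which is what time-series commands operate on.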

Introduction to the advanced Stata video course (Lian Yujun)

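Introduction to the advanced Stata video course

Training objectives:
The advanced Stata video course aims to make participants proficient in using Stata for empirical work, in particular to:
(1) master a range of commonly used estimation methods (such as ordinary least squares, generalised least squares, nonlinear least squares, maximum likelihood, IV estimation and GMM);
(2) learn to estimate and analyse common time-series and panel-data models (such as unit-root tests, cointegration analysis, VAR, fixed-effects models, random-effects models, dynamic panel models, panel unit-root tests and panel cointegration analysis);
(3) learn to write a complete Stata program;
(4) learn to use Stata for resampling and simulation, including the bootstrap and Monte Carlo simulation.

Course outline (see the course contents for details):
The advanced Stata video course has 9 lectures and 48 video files, totalling more than 50 class hours.

Lectures 1 to 5 cover the five estimation methods most commonly used in econometrics: ordinary least squares (OLS), generalised least squares (GLS), nonlinear least squares (NLS), maximum likelihood (MLE) and the generalised method of moments (GMM).

Lecture 6 covers time-series models: ARIMA models, VAR models, unit-root tests, cointegration analysis, error-correction models and GARCH models. These models cover most of the tools commonly used in macroeconomic and financial time-series analysis.

Lecture 7 covers panel-data models: fixed-effects models, random-effects models, heteroskedasticity and serial correlation, dynamic panel models, panel random-coefficient models, panel stochastic frontier models, panel unit-root tests and panel cointegration analysis. The models progress from simple to advanced and cover most of the panel methods used in the current literature.

Lecture 8 covers Stata programming techniques: specifying inputs and outputs; advanced program features such as subprograms, by-group execution and repeatable execution; and how to write help files. After this lecture participants will be able to write complex Stata programs on their own, and these programs behave just like the official ones provided by Stata.

Lecture 9 covers resampling and simulation: the bootstrap, permutation tests, the jackknife and Monte Carlo simulation. Unlike traditional hypothesis testing and statistical inference, these methods are all based on computer simulation and resampling, and over the past ten years …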

The basics you must know when starting to learn Stata

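Author: Quantitative Research Methods

About Stata, I am often asked: "Do you know what it can actually do?" So today we take a look at what this currently popular software is really for.

1 The basics you must know about Stata

For a long time I pronounced "Stata" as "Stay-ta". Once, chatting with a friend who had returned from Japan, I heard her pronounce it "Star-ta", which sounded quite wrong to me. On looking it up, I found that "Stata" is not an acronym (so the correct spelling is Stata, not STATA) but a new word formed from "statistics" and "data". This little anecdote shows that when Stata first appeared (in 1985) its main functions were statistical analysis and data processing. After more than thirty years of development Stata has reached version 15. While those functions have been steadily strengthened, its capabilities in matrix computation, graphics and programming have also kept growing. Stata excels at data processing, panel-data analysis, time-series analysis, survival analysis and survey-data analysis, and its other capabilities are by no means weak. (Table 1)

2 Why choose Stata?

This is not an easy question to answer. The Stata website lists several possible reasons. Edwards (2005) compared the strengths and weaknesses of Stata, SPSS and SAS in great detail. Dr. Torres-Reyna of Princeton University summarised the characteristics of four commonly used packages in Table 2. Overall, Stata has a clear advantage.

3 Of all the options, why am I so fond of Stata?

In my own experience, the following reasons have kept me attached to Stata since 2003.

Stata's data-processing capabilities are very strong. Because it loads the data into memory before computing, it is very fast. For merging and appending multiple data files, and for handling text data, time-series data and survey data, Stata can always do the job with very concise commands.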

User guide to the introductory Stata video course (Lian Yujun)

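Introductory Stata video course (2010 edition)
User guide
Lian Yujun (Department of Finance, Lingnan College, Sun Yat-sen University)
arlionn@https://www.wendangku.net/doc/d64995066.html,

Contents
1 Course overview
2 Course features
3 Course materials
4 How to use the course materials
5 Discussion and suggestions
6 About the instructor
7 Registration and enquiries
8 Training discounts
Appendix A: Contents of the introductory Stata videos (with time marks)
Lecture 1: Introduction to Stata
Lecture 2: Data processing
Lecture 3: Stata graphics
Lecture 4: Matrix operations
Lecture 5: First steps in Stata programming
Appendix B: Introduction to the advanced Stata video course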

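To do a good job, one must first sharpen one's tools. In economics, management and the other social sciences alike, quantitative analysis is becoming more and more important. As a relatively young econometrics package, Stata has won a growing following since its release in 1985 thanks to its strong performance in data processing, graphics and regression analysis. Compared with menu-driven packages such as SPSS and Eviews, however, command-driven Stata has a relatively high entry barrier. Because it reached China only recently, reference material is quite limited, and the nearly 10,000 pages of English-only manuals provided by the Stata company intimidate most beginners. This has become the main obstacle keeping many colleagues in China from learning this powerful econometrics package.

In view of this, we released the "introductory Stata video course" in November 2007 and the "advanced Stata video course" in October 2008, covering Stata's basic operation, data processing, graphics, programming, estimation of common econometric models, and bootstrap and Monte Carlo simulation. The immediacy of video teaching, together with the practical orientation of the courses, earned both courses wide praise. Thanks to the active participation and feedback of the participants, I collected more than 100 suggestions for revision over the past two years or so, and after more than half a year of production I can now present the new edition: the "introductory Stata video course (2010 edition)".

The introductory Stata video course (2010 edition) has 5 lectures and 36 video files, totalling more than 40 class hours. It covers getting started with Stata, data processing, graphics, matrices and programming. A summary follows (see Appendix A for the detailed contents).

Lecture 1 introduces Stata's basic architecture as a whole, so that participants can grasp the essentials of Stata in the shortest possible time. It covers importing and exporting data, executing commands, editing and checking data, using log files and do-files, integrating Stata with Word, Excel and LaTeX, and common Stata settings.

Lecture 2 uses a large number of examples to present a variety of data-processing techniques and is the core and most distinctive part of the course. It covers creating complex variables; quantiles; handling duplicate observations, missing values and outliers; merging, appending and reshaping data; handling string and categorical variables; handling time-series and panel data; and checking and comparing data. Studying this material will greatly improve participants' data-processing skills.

Lecture 3 covers Stata graphics. To help participants generalise from examples, I first introduce the basics of Stata graphics as a whole, then break graph commands down into 8 kinds of options and 5 kinds of elements, and finally present how to draw 15 common types of graph through more than 40 examples.

Lecture 4 covers Stata matrix operations, including matrix definition and management, matrix arithmetic and matrix decomposition, among four topics, laying a solid foundation for the later study of Stata programming.

Lecture 5 covers the basics of Stata programming: defining and calling programs, scalars, temporary objects (macros, temporary variables, temporary files, temporary matrices and so on), loops, conditional statements and referring to Stata's returned results. After this lecture participants will be able to write their own Stata programs to make data processing and model estimation more efficient, laying the foundation for the later study of advanced Stata programming.

All lectures are organised by topic. They include not only the commonly used official Stata commands but also a large number of user-written commands (more than 520), giving participants powerful tools for empirical analysis.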

Stata panel-data analysis in 5 minutes

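[Original] Stata panel-data analysis in 5 minutes: a simple tutorial, ver 2.0. Author: Zhang Da

Step 1: Import the data

The original table looks as follows. Arrange the data with time (1998, 1999, 2000, 2001, …) along the horizontal axis and the sample units (Beijing, Tianjin, Hebei, …) along the vertical axis, and replace the Chinese place names with numbers.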

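Note: the table must not contain Chinese characters, otherwise errors will occur. The panel data must not contain empty cells. Remove the row of years, then copy the rest into Stata's Data Editor or save it in csv format.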

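Open Stata and load the data.

Method 1: copy the data straight into the Data Editor.
Method 2: use the command insheet using followed by the file path, for example:
insheet using C:\STUDY\paper\taxi.csv
The csv file can be produced with "Save As" in Excel.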

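Step 2: Adjust the format

First rename var1, the variable that identifies the sample unit.
Command: rename var1 (unit name)
Example: rename var1 province
You can also double-click var1 and change the name in the window that pops up.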

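Next, convert the data to panel (long) format.
Command: reshape long var, i(unit name)
Example: reshape long var, i(province)
Here var stands for all the year columns (var2, var3, var4, …).

Once the conversion succeeds, continue renaming. _j holds what were the year columns of the original table, and var holds the values of the variable itself. For example:
rename _j year
rename var taxi
You can also double-click the name you want to change and edit it in the window that pops up.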
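One detail is easy to miss: because the row of years was removed before pasting, `_j` contains the column suffixes (2, 3, 4, …), not calendar years. The sketch below starts from data that are already in long format and restores the years, assuming that var2 was the 1998 column as in the table described in step 1; the numbers are made up.

```stata
clear
* Long-format toy data after reshape and rename; year still holds the suffixes 2, 3, 4
input province year taxi
1 2 10
1 3 11
1 4 12
2 2 20
2 3 21
2 4 22
end

replace year = year + 1996        // ASSUMPTION: var2 was the 1998 column
xtset province year               // declare the panel structure
xtdescribe                        // check that the panel is balanced, 1998-2000
```

If the first year column is not 1998, the offset must be changed accordingly.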

Getting started with Stata

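An introduction to Stata

Reposted; original author unknown.

(1) Stata is mastered through use, so practise a lot.
(2) Many details of Stata are not covered here. Only the relatively important parts are explained; pay attention and build up your knowledge as you use Stata. As an introduction, this text covers only material related to intermediate econometrics coursework and some basic commands for processing data. For more advanced material, see the Stata manual.

Interface
Once Stata is installed, the first thing to learn is its interface. After opening Stata we see its four main windows: Stata Results, Review, Variables and Stata Command. All results are displayed in the Stata Results window; commands are typed in the Stata Command window; the Review window records the commands we have used; and the Variables window shows the names of all variables in the current dataset. Clicking a command in the Review window re-enters it, and clicking a variable in the Variables window enters its name. Both save typing.

Stata commands
Stata's power lies in its rich set of commands, which provide a great many functions. Every Stata command has its own syntax. Here we introduce the function and syntax of some commonly used commands; you will build up your knowledge of commands as you use Stata.

When you need help with a command, use the help command. For example, to learn about the command "reg", type "help reg" in the Stata Command window, or look up the command under Contents in the Help menu. With help, a window shows a detailed description of the command. A more direct approach is to look at how the command is used in the Examples section, read the related notes, and imitate them.

Important habits
When using Stata for regression analysis, we need to develop some good habits. They matter especially in analyses with large datasets and complicated procedures.

(1) Use a log. It records the results of the Stata session.
Syntax: log using c:\stata8\logfiles\10.21.5_30.log
(Note: the folder c:\stata8\logfiles must be created first.)
The command to close the log is "log close".
Syntax: log close
The file "10.21.5_30.log" then records, from the "log using" command to the "log close" command, what Stata …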
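The logging habit can be sketched as follows. The log file name `session_demo.log` is hypothetical and is written to the current working directory rather than to the folder named above; `auto` is the example dataset shipped with Stata.

```stata
capture log close                           // avoid an error if a log is already open
log using session_demo.log, replace text    // plain-text log; replace overwrites an old one

sysuse auto, clear                          // example dataset shipped with Stata
regress price mpg weight                    // this output is written to the log

log close                                   // everything between log using and here is recorded
```

Using `replace` keeps one up-to-date log per do-file; use `append` instead if you want to keep the output of earlier runs.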

Stata commands for panel-data models

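Overview of Stata commands for estimating panel-data models

I. Stata commands for static panel data

Fixed-effects model: $y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}$
Random-effects model (composite error): $\mu_{it} = \alpha_i + \varepsilon_{it}$

(1) Data handling

Enter the data
● tsset code year declares the data to be a panel
● xtdes describes the structure of the panel
● summarize sq cpi unem g se5 ln gives descriptive statistics for each variable
● gen lag_y = L.y // creates a one-period lag
gen F_y = F.y // creates a one-period lead
gen D_y = D.y // creates the first difference
gen D2_y = D2.y // creates the second difference

(2) Model selection and testing

● 1. Test for individual effects (pooled OLS or fixed effects). Null hypothesis: use the pooled OLS model.
● xtreg sq cpi unem g se5 ln, fe
For the fixed-effects model, the F statistic reported on the last line of the output tests whether the individual effects are jointly significant. In this example the p-value of the F statistic is 0.0000, so the test indicates that the fixed-effects model is preferred to pooled OLS.

● 2. Test for random effects (pooled OLS or random effects). Method: LM statistic. Null hypothesis: use the pooled OLS model.
● qui xtreg sq cpi unem g se5 ln, re (with "qui" the estimation table is not displayed)
xttest0
The LM test gives a p-value of 0.0000, so the random effects are highly significant. The random-effects model is therefore also preferred to pooled OLS.

● 3. Fixed effects or random effects (method: Hausman test). Null hypothesis: use the random-effects model (the individual effects are uncorrelated with the regressors).
The analysis above shows that a model with individual effects is clearly preferred to pooled OLS, which assumes a common constant intercept. It does not tell us whether FE or RE is better, which requires the following test:
Step 1: estimate the fixed-effects model and store the results
Step 2: estimate the random-effects model and store the results
Step 3: run the Hausman test
● qui xtreg sq cpi unem g se5 ln, fe
est store fe
qui xtreg sq cpi unem g se5 ln, re
est store re
hausman fe re (or, better, hausman fe re, sigmamore or sigmaless)
The Hausman test gives a p-value of 0.0000, which rejects the null hypothesis: the basic assumption of the random-effects model is not satisfied. In that case one should use instrumental variables or the fixed-effects model.

(3) Estimating static panel-data models

● 1. Fixed-effects estimation
● xtreg sq cpi unem g se5 ln, fe
The option fe requests the fixed-effects model. The first two lines of the header report the estimation method, the name of the cross-section variable (id), the number of observations used and the number of groups. Lines 3 to 5 report the goodness of fit at three levels: within, between and overall; usually the within value is the one of interest. Lines 6 and 7 report the F statistic and p-value of the joint test on all non-constant regressors; the coefficients are jointly highly significant.
Note that the last line of the table reports the F statistic and p-value of the test for whether the fixed effects are significant. In this example the fixed effects are clearly highly significant.

● 2. Random-effects estimation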
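The testing sequence described above (F test after fe, LM test after re, then the Hausman test) can be run on simulated data when the original dataset is not at hand. Everything in this sketch is an assumption for illustration: the variable names `id`, `year`, `x`, `y`, the sample sizes and the data-generating process, in which the regressor is deliberately correlated with the unit effect so that the Hausman test has something to detect.

```stata
clear
set seed 12345
set obs 50                        // 50 panel units
gen id = _n
gen a = rnormal()                 // unobserved unit effect
expand 6                          // 6 periods per unit
bysort id: gen year = 2000 + _n
gen x = 0.5*a + rnormal()         // regressor correlated with the unit effect
gen y = 1 + 2*x + a + rnormal()
xtset id year

xtreg y x, fe                     // last line of output: F test that all u_i = 0
estimates store fe

xtreg y x, re
estimates store re
xttest0                           // Breusch-Pagan LM test for random effects

hausman fe re, sigmamore          // H0: RE is consistent (effects uncorrelated with x)
```

With real data, replace the simulated block by your own dataset and `xtset` call; the three tests are read exactly as in the text above.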

Complete list of Stata commands

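*********Panel-data econometrics and its software implementation*********
* Note: a good part of the following do-file comes from the Stata course by Lian Yujun of Sun Yat-sen University; thanks for his contribution. I have modified and selected from it.

*----------Panel-data models
* 1. Static panel models: FE and RE
* 2. Model selection: FE vs POLS, RE vs POLS, FE vs RE (POLS = pooled least squares)
* 3. Tests for heteroskedasticity, serial correlation and cross-sectional dependence
* 4. Dynamic panel models (DIF-GMM, SYS-GMM)
* 5. Panel stochastic frontier models
* 6. Panel cointegration analysis (FMOLS, DOLS)
*** Note: 1-5 are implemented in Stata, 6 in GAUSS.

* Productive-efficiency analysis (TFP in particular): data envelopment analysis (DEA) and stochastic frontier analysis (SFA)
*** Note: DEA is implemented with DEAP 2.1 and SFA with Frontier 4.1. The latter in particular focuses on comparing the C-D and Translog production functions and the one-step and two-step approaches. Commonly applied to regional economic disparities, FDI spillover effects, industry efficiency and so on.

* Spatial econometrics: SLM and SEM models
* Note: Stata and Matlab are used together. Commonly applied to spatial spillovers (R&D), fiscal decentralisation, local-government behaviour and so on.

* ---------------------------------
* --------I. Common data handling and graphs-----------
* ---------------------------------

* Declare the panel structure
xtset id year /*id is the cross-section identifier, year is the time variable*/
xtdes /*structure of the data*/
xtsum logy h /*summary statistics*/
sum logy h /*summary statistics*/

* Add a label or change a variable name
label var h "human capital"
rename h hum

* Sorting
sort id year /*Stata panel-data layout*/
sort year id /*DEA layout*/

* Drop particular years or provinces
drop if year<1992
drop if id==2 /*note the use of ==*/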
