
IV. Panel Data Approach

4.1. Theoretical Framework

4.1.1. Research Aim and Approaches

Equality of opportunity for health is defined and advocated as the right conceptualization of equity in the allocation of health-care resources. The most widely accepted egalitarian view in health care is that health-care delivery should be horizontally equitable: two individuals with the same disease characteristics should receive the same treatment (e.g. Wagstaff and Van Doorslaer, 2001). For social medical care systems such as Medicaid and Medicare, horizontal equity means providing the same treatment to people with the same medical needs. The concept of vertical equity (the equal opportunity to acquire good health) is equally important.

In this context, vertical equity refers to the preventive component of the social medical care system: through Medicaid or Medicare, individuals should have an equal opportunity to avoid becoming ill.

In any society, there are some circumstances for which individuals should not be held responsible. These circumstances include, for instance, gender, race, parental socioeconomic status and the parents' level of formal education. Each individual can be characterized by a profile of such circumstances. In a society implementing an equal-opportunity policy, the objective is to equalize the earning power of individuals. In such a society, a child who cannot afford medical care solely because he or she was born into a disadvantaged family (a circumstance for which the child should not be held responsible) should be compensated. The aggregate of this compensation can be seen as a social loss associated with inequality in individuals' earning power. This social loss could be reduced by introducing an efficient social medical care system.

This paper is motivated by the need to evaluate the current Medicaid program: whether Medicaid has encouraged socially disadvantaged children to use preventive and treatment medical services more often, so that these children have better health than children who are not enrolled in Medicaid, and whether Medicaid-enrolled children whose health has improved because of Medicaid miss fewer school days than their counterparts. In addition to Medicaid, I also report results for children enrolled in private insurance. The major aim of my research is to identify the correlation between Medicaid eligibility and the number of school days students miss.

In my paper, I apply a path-model approach. In the first step, I analyze the correlation between Medicaid/private insurance and doctor visits; the question is whether Medicaid or private insurance increases the probability that children go to a doctor or increases the number of doctor visits. In the second step, I analyze the correlation between doctor visits and children's health outcomes; the question is whether doctor visits improve children's health. In the third step, I analyze the correlation between children's health outcomes and the number of school days they miss; the question is whether better health leads to fewer missed school days. The structure of my analysis can be represented by the following graph:

As shown in the graph, attendance is specified to be affected by socio-economic variables (current income, adults' education), students' characteristics (age, gender) and children's health outcomes. Doctor visits are affected by socio-economic variables (current income, adults' education), students' characteristics (age, gender) and the distance from the student's home to the doctor's office. Children's health status is associated with socio-economic variables (current income, adults' education), students' characteristics (age, gender) and the children's body mass index (BMI). BMI is a measure of body fat based on height and weight; MEPS reports BMI for both children and adults.
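The three steps and the covariates listed above can be summarized as a recursive system of equations. This is only a sketch: the linear form, the coefficient names and the symbols $S_{it}$ (socio-economic variables), $C_{it}$ (child characteristics), $DIST_{it}$, $BMI_{it}$, $HEALTH_{it}$ and $ABSENT_{it}$ are notation introduced here for illustration, not the specification estimated later.

```latex
\begin{aligned}
DV_{it}     &= \alpha_0 + \alpha_1 MD_{it} + \alpha_2 PV_{it} + \alpha_3 DIST_{it}
               + S_{it}'\alpha_S + C_{it}'\alpha_C + e_{1,it} \\
HEALTH_{it} &= \gamma_0 + \gamma_1 DV_{it} + \gamma_2 BMI_{it}
               + S_{it}'\gamma_S + C_{it}'\gamma_C + e_{2,it} \\
ABSENT_{it} &= \theta_0 + \theta_1 HEALTH_{it}
               + S_{it}'\theta_S + C_{it}'\theta_C + e_{3,it}
\end{aligned}
```

Under this linear sketch, the indirect effect of Medicaid eligibility on school absences along the path is the product of the path coefficients, $\alpha_1 \gamma_1 \theta_1$.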

4.1.2. The Hypothesis and Possible Results of the Paper

Hypothesis 1 can be expressed mathematically by the following two inequalities:

$$\frac{\partial DV_{it}}{\partial MD_{it}} > 0 \qquad (1) \qquad\qquad \frac{\partial DV_{it}}{\partial PV_{it}} > 0 \qquad (2)$$

$MD_{it}$: Medicaid eligibility

$PV_{it}$: private insurance coverage

$DV_{it}$: continuous variable measuring the number of doctor visits

If a child is eligible for Medicaid, the out-of-pocket cost of medical services is lower, so the child has more access to health care and will tend to use medical services more often. The intuition follows from the income and substitution effects. Assume that medical services are normal goods. Medicaid eligibility lowers the price of medical services, and the income and substitution effects then work in the same direction to increase the purchase of medical services: children are more likely to go to a doctor when they become ill. The derivative in equation (1) is therefore expected to be positive. Estimation equation (7) is used to test the sign of equation (1).
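The income-and-substitution argument can be made explicit with the Slutsky equation. In this sketch, $x$ denotes the quantity of medical services, $p$ its out-of-pocket price, $m$ income and $h$ the compensated (Hicksian) demand; this notation is introduced here and is not part of the original model.

```latex
\frac{\partial x}{\partial p}
  = \underbrace{\frac{\partial h}{\partial p}}_{\text{substitution effect}\,\le\,0}
  \; - \; \underbrace{x \, \frac{\partial x}{\partial m}}_{\text{income effect}\,>\,0\ \text{(normal good)}}
  \; < \; 0
```

Both terms push $\partial x / \partial p$ below zero, so the price reduction brought about by Medicaid eligibility raises the quantity of medical services demanded, which is the positive sign hypothesized in equation (1).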

As noted in the literature review, most recent studies of children's utilization of Medicaid use data from one or more of the following categories: children's hospitalization, treatment intensity and mortality rates. In this part of my dissertation, I use a panel dataset, the Medical Expenditure Panel Survey (MEPS), from 2000 to 2005. I use CHCNOYR (total number of doctor office visits in the past 12 months) to measure the utilization of medical services.

I choose this variable for two reasons. First, it gives the largest possible sample of children. The child data file of MEPS contains several variables that could be used to measure children's utilization of health services, including CHCHNOY2 (total number of home visits in the past 12 months), CHCHMOYR (home care in the past 12 months) and several more specific variables measuring children's visits to a foot doctor, eye doctor, dentist, etc. In each year, however, only a few children use the services measured by these variables, whereas more than half of the sample children have one or more doctor office visits. Choosing CHCNOYR therefore yields the largest number of sample children, around 20,000. Second, doctor visits are the most common form of medical service covered by Medicaid. Because Medicaid covers the expenses of doctor visits, it reduces the out-of-pocket cost of these visits. My first hypothesis is therefore that there will be a positive correlation between Medicaid eligibility and doctor visits.

Hypothesis 2 can be expressed mathematically by the following two inequalities:

$$\frac{\partial DV_{it}}{\partial MD_{it}} > 0 \qquad (1) \qquad\qquad \frac{\partial Health_{it}}{\partial DV_{it}} > 0 \qquad (2)$$

$MD_{it}$: Medicaid eligibility

$DV_{it}$: continuous variable measuring the number of doctor visits

$Health_{it}$: the health condition of the children

As Hypothesis 1 states, a child with Medicaid is expected to use medical services more often, that is, to have more doctor visits. The second hypothesis is that there is a positive correlation between doctor visits and children's health condition: more doctor visits improve children's health outcomes. Several points should be noted, however. First, health in MEPS is measured categorically: children's health condition is classified into five categories (excellent, very good, good, fair and poor). As described in the empirical part, I combine excellent and very good into a new variable, vexcel, and fair and good into a new variable, fgood. I test the correlation between doctor visits and children's health condition using these two categorical variables and several estimation strategies, which are described in detail in the estimation section.
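A minimal sketch of the recoding described above. It assumes the ratings are available as text labels (MEPS public-use files typically store such ratings as numeric codes, which would first have to be mapped), and it assumes that "poor" is coded 0 on both indicators, a choice the text does not state.

```python
"""Recode a five-category health rating into the indicators vexcel and fgood.

Sketch only: the text labels are an assumption, and 'poor' is assumed
to be 0 on both indicators.
"""

VEXCEL_LEVELS = {"excellent", "very good"}  # combined into vexcel
FGOOD_LEVELS = {"good", "fair"}             # combined into fgood
ALL_LEVELS = VEXCEL_LEVELS | FGOOD_LEVELS | {"poor"}


def recode_health(rating):
    """Return a dict of 0/1 indicators for one child's health rating."""
    level = rating.strip().lower()
    if level not in ALL_LEVELS:
        raise ValueError(f"unknown health rating: {rating!r}")
    return {
        "vexcel": int(level in VEXCEL_LEVELS),
        "fgood": int(level in FGOOD_LEVELS),
    }


if __name__ == "__main__":
    for r in ["Excellent", "Very good", "Good", "Fair", "Poor"]:
        print(r, recode_health(r))
```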


4.2. Empirical Approach

4.2.1. Estimation Procedure

4.2.2. Fixed Effect Approach

In a typical panel dataset, each observational unit, or entity, is observed in two or more time periods. In my dataset, there are (?) individuals, and each individual is observed at least twice, in two consecutive years; (?) of the individuals are observed three times. In a balanced panel, the variables are observed for every individual in every time period. In my dataset, however, deleting observations with missing values leaves an unbalanced panel: some individuals are missing information for one time period.
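The counts still marked (?) can be obtained by tabulating how many rows each child contributes. The sketch below assumes one row per child-year after missing values are dropped and a two-year panel for each child; the function name and the toy identifiers are hypothetical.

```python
from collections import Counter


def panel_summary(ids, periods_expected=2):
    """Count individuals and how many times each one is observed.

    ids: one entry per (individual, year) row after missing values are dropped.
    periods_expected: years each individual should appear in a balanced panel
    (2 is an assumption matching two consecutive MEPS years).
    Returns (n_individuals, {times_observed: n_individuals}, is_balanced).
    """
    obs_per_id = Counter(ids)
    distribution = dict(Counter(obs_per_id.values()))
    is_balanced = all(n == periods_expected for n in obs_per_id.values())
    return len(obs_per_id), distribution, is_balanced


# Hypothetical toy rows: child "c" lost one year to missing data,
# child "d" appears three times.
rows = ["a", "a", "b", "b", "c", "d", "d", "d"]
print(panel_summary(rows))
```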

4.2.2.1. Entity Fixed Effect and Before and After Comparison

If I apply cross-sectional OLS and include the children's age and gender, family income, and father's and mother's education (?) as control variables (regression 1), I obtain a positive correlation between Medicaid and school days missed. However, this positive correlation may not reflect a causal effect, because the regression could suffer from substantial omitted variable bias: some variables that affect the number of school days missed may not be included in the estimating equation. Put another way, causal inference fails because the regressor (Medicaid eligibility) is correlated with the error term, so changes in the regressor are systematically associated with changes in the error term. One source of such correlation is omitted variables. Many factors affect school days missed, such as the quality of the school district, how strictly absences are regulated in the area, and how acceptable it is to miss school in a given family. If any of these factors is also correlated with Medicaid eligibility, omitting it leads to omitted variable bias. The ideal approach would be to add all of these variables to the model, but this is impossible in practice: first, we do not know exactly what the omitted variables are; second, our dataset does not contain them. However, if these omitted factors remain constant over time, panel data methods can control for them without actually observing them: the factors are held constant even though they cannot be measured.
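The omitted-variable argument can be illustrated with a small simulation. Everything in it is hypothetical: the four children, the unobserved factor W and the assumed true effect of Medicaid (-1 day) are invented for illustration and are not estimates from MEPS. Because W is higher for the children who tend to be enrolled, pooled OLS attributes part of W's effect to Medicaid.

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical panel: 4 children x 2 years. W is an unobserved, time-invariant
# factor (e.g. how acceptable it is to miss school in the family) that raises
# absences and is more common among Medicaid enrollees.
W = [0, 2, 4, 6]                              # never observed by the analyst
medicaid = [[0, 1], [0, 0], [1, 0], [1, 1]]   # [year1, year2] per child
TRUE_EFFECT = -1.0                            # assumed: Medicaid cuts absences by 1 day

x, y = [], []
for w, (m1, m2) in zip(W, medicaid):
    for m in (m1, m2):
        x.append(m)
        y.append(1.0 + TRUE_EFFECT * m + w)   # no noise, to isolate the bias

print("pooled OLS slope:", ols_slope(x, y))   # positive despite TRUE_EFFECT < 0
```

With these numbers the pooled slope is positive even though the assumed true effect is negative, which is the kind of sign reversal omitted variable bias can produce.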

Entity fixed effects regression is the main tool for regression analysis of panel data. It extends multiple regression to exploit panel data and control for omitted variables that differ across entities but are constant over time. By studying changes in the dependent variable over time, it eliminates the effect of such variables. For example, the quality of a school district and the regulations of a school are unlikely to change over a two-year period, but they can differ considerably across individuals. Including individual fixed effects in the regression therefore removes the omitted variable bias arising from these time-invariant factors. I first apply the entity fixed effects regression to control for this kind of omitted variable.

The entity fixed effects regression can be expressed by the following two equations:

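$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 W_i + \mu_{it} \qquad (1)$$

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \gamma_3 D3_i + \cdots + \gamma_n Dn_i + \mu_{it} \qquad (2)$$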

Equation (1) shows that the slope coefficient of the population regression line, $\beta_1$, is the same for all individuals, while the intercept varies from one individual to the next. The source of the variation in the intercept is the variable $W_i$, which varies across individuals but is constant over time. The fixed effects model can equivalently be written as equation (2), which adds binary variables for the individual entities (one for each individual except the first, whose effect is absorbed by the intercept). In both formulations, the slope coefficient on $X_{it}$ is the same for every individual.

In my dataset, most individuals are observed twice ($T = 2$). In this case, the entity fixed effects regression can be estimated by a "before and after" regression of the change in $Y$ from the first period to the second on the change in $X$. Based on equation (1), the regressions for the two periods are:

$$\mathit{SCHOOLMISSING}_{i,\mathrm{year1}} = \beta_0 + \beta_1 X_{i,\mathrm{year1}} + \beta_2 W_i + \mu_{i,\mathrm{year1}} \qquad (3)$$

$$\mathit{SCHOOLMISSING}_{i,\mathrm{year2}} = \beta_0 + \beta_1 X_{i,\mathrm{year2}} + \beta_2 W_i + \mu_{i,\mathrm{year2}} \qquad (4)$$

Subtracting equation (3) from equation (4) eliminates the effect of $W_i$:

$$\mathit{SCHOOLMISSING}_{i,\mathrm{year2}} - \mathit{SCHOOLMISSING}_{i,\mathrm{year1}} = \beta_1 \left( X_{i,\mathrm{year2}} - X_{i,\mathrm{year1}} \right) + \left( \mu_{i,\mathrm{year2}} - \mu_{i,\mathrm{year1}} \right) \qquad (5)$$

By focusing on changes in the dependent variable, this "before and after" comparison eliminates the unobserved factors that differ across individuals but do not change within an individual over the two-year period. For a two-period dataset, the binary-variable specification in equation (2) and the differenced specification in equation (5) are equivalent: they produce identical OLS estimates of $\beta_1$. Analyzing changes in $Y$ and $X$ removes the omitted variable bias arising from $W_i$ and thus controls for all variables that are constant over time. The insight of entity fixed effects analysis is that if an unobserved variable does not change over time, then any change in the dependent variable must be due to influences other than these fixed characteristics.
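A sketch of the "before and after" estimator in equation (5), alongside the entity-demeaned (within) estimator, on an invented two-period panel with an unobserved time-invariant factor W. The data and function names are hypothetical; the point is that both estimators remove $W_i$ and agree exactly when $T = 2$.

```python
def first_difference_slope(y1, y2, x1, x2):
    """OLS of (y2 - y1) on (x2 - x1) without an intercept, as in equation (5)."""
    dx = [b - a for a, b in zip(x1, x2)]
    dy = [b - a for a, b in zip(y1, y2)]
    return sum(u * v for u, v in zip(dx, dy)) / sum(u * u for u in dx)


def within_slope(y_panel, x_panel):
    """Entity-demeaned (within) OLS slope. By the Frisch-Waugh-Lovell theorem
    it equals the slope on X from the dummy-variable regression (2)."""
    num = den = 0.0
    for ys, xs in zip(y_panel, x_panel):
        my, mx = sum(ys) / len(ys), sum(xs) / len(xs)
        num += sum((xv - mx) * (yv - my) for xv, yv in zip(xs, ys))
        den += sum((xv - mx) ** 2 for xv in xs)
    return num / den


# Hypothetical panel: 4 children x 2 years.
W = [0.0, 2.0, 4.0, 6.0]                      # unobserved, time-invariant
medicaid = [[0, 1], [0, 0], [1, 0], [1, 1]]   # [year1, year2] for each child
TRUE_EFFECT = -1.0                            # assumed, for illustration only

y_panel = [[1.0 + TRUE_EFFECT * m + w for m in ms] for w, ms in zip(W, medicaid)]
y1, y2 = [r[0] for r in y_panel], [r[1] for r in y_panel]
x1, x2 = [r[0] for r in medicaid], [r[1] for r in medicaid]

fd = first_difference_slope(y1, y2, x1, x2)
fe = within_slope(y_panel, medicaid)
print("first difference:", fd, " within:", fe)
```

Both functions divide by the within-individual variation in X, so children whose Medicaid status never changes contribute nothing to the estimate; the fixed effects estimate is identified only from children who switch status.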

4.2.2.2. Time Fixed Effects and Regression Equations for Time Effects

By examining changes in school days missed over time, the entity fixed effects regression controls for fixed factors such as the quality of the school district. However, many other factors influence school days missed, and if they change over time and are correlated with Medicaid eligibility, omitting them produces omitted variable bias. To control for this bias, the second step is to add time fixed effects, which control for variables that are constant across entities but evolve over time. For example, national Medicaid laws evolve over time but apply to every state and every individual; including time fixed effects eliminates the influence of such variables.

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 W_i + \beta_3 Z_t + \mu_{it} \qquad (6)$$

Equation (6) adds the time-varying variable $Z_t$ to the model. Because $Z_t$ helps determine $Y_{it}$ ($\beta_3 \neq 0$), omitting $Z_t$ from the regression leads to omitted variable bias if $Z_t$ is correlated with $X_{it}$. Our objective is to estimate $\beta_1$, controlling for $Z_t$. This empirical model thus accounts for potentially important time-varying influences that are not captured by the individual fixed effects: by adding time effects, I can control for variables that are constant across entities but change over time, such as national Medicaid laws. My data cover five years, and I include a fixed effect for each year.
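To see why time effects matter, the sketch below adds a year effect $Z_t$ that is correlated with the regressor. It assumes a balanced panel, where subtracting individual means and year means and adding back the grand mean is numerically the same as including entity and year dummies; with the unbalanced MEPS panel described earlier, the dummy-variable regression (7) would be needed instead. All numbers and function names are hypothetical.

```python
def demean_entity(panel):
    """Subtract each individual's mean (one-way within transformation)."""
    return [[v - sum(r) / len(r) for v in r] for r in panel]


def demean_two_way(panel):
    """z_it - zbar_i - zbar_t + zbar for a balanced N x T panel (assumption)."""
    n, t = len(panel), len(panel[0])
    row = [sum(r) / t for r in panel]
    col = [sum(panel[i][j] for i in range(n)) / n for j in range(t)]
    grand = sum(row) / n
    return [[panel[i][j] - row[i] - col[j] + grand for j in range(t)]
            for i in range(n)]


def slope_through_origin(xp, yp):
    """OLS slope of transformed y on transformed x (no intercept needed)."""
    num = sum(a * b for xr, yr in zip(xp, yp) for a, b in zip(xr, yr))
    den = sum(a * a for xr in xp for a in xr)
    return num / den


# Hypothetical balanced panel: 3 children x 3 years.
W = [0.0, 3.0, 6.0]                      # entity effects (e.g. school-district quality)
Z = [0.0, 2.0, 5.0]                      # year effects (e.g. a national Medicaid rule change)
X = [[0, 1, 1], [1, 1, 2], [1, 2, 3]]    # regressor correlated with both W and Z
BETA = -1.0                              # assumed true effect
Y = [[1.0 + BETA * X[i][j] + W[i] + Z[j] for j in range(3)] for i in range(3)]

entity_only = slope_through_origin(demean_entity(X), demean_entity(Y))
two_way = slope_through_origin(demean_two_way(X), demean_two_way(Y))
print("entity FE only:", entity_only)    # biased by the omitted year effects
print("entity + time FE:", two_way)      # recovers BETA
```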

Including Both Entity and Time Fixed Effects in the Model

If some omitted variables are constant over time but vary across individuals (such as the quality of school districts), while others are constant across individuals but vary over time (such as national Medicaid laws), then it is appropriate to include both entity and time effects. In practice, this is done by including $n-1$ individual binary variables and $T-1$ time binary variables in the regression, along with the intercept. The combined entity and time fixed effects regression model can be expressed by the following equation:

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \gamma_3 D3_i + \cdots + \gamma_n Dn_i + \delta_2 B2_t + \cdots + \delta_T BT_t + \mu_{it} \qquad (7)$$

where $B2_t, \dots, BT_t$ are binary variables for the time periods. With both individual and time fixed effects, the estimated relationship between school days missed and Medicaid enrollment is free of omitted variable bias from variables that are constant either over time or across individuals.

The strength of this analysis is that including entity and time fixed effects mitigates the threat of omitted variable bias from unobserved variables that either do not change over time (such as attitudes toward missing school) or do not vary across individuals (such as national Medicaid law). Since each individual is observed for only two years, the time effects are expected to be limited. Despite these virtues, however, entity and time fixed effects regression cannot control for omitted variables that vary both across entities and over time.

In conclusion, the panel dataset permits the inclusion of individual fixed effects, which absorb time-invariant school-level factors such as school district quality and, together with the instrumental variables strategy, address the problem of omitted variable bias. The TSLS estimates suggest that the effect of doctor visits on children's health status is small; most of the estimates are not statistically significantly different from zero. (is it OK that my results from the fixed effect model are not significant?)

4.2.3. Instrumental Variable Approach

I first run a simple OLS regression to analyze the relationship between Medicaid and the number of doctor visits. If the explanatory variable $X$ and the error term $\mu$ are correlated, the OLS estimator is inconsistent: it will not be close to the true value of the regression coefficient even in a large sample. Sources of correlation between $X$ and $\mu$ include omitted variables, errors in variables and simultaneous causality. In my analysis, causality runs both from the number of doctor visits ($X$) to children's health status ($Y$) and from children's health status ($Y$) to the number of doctor visits ($X$). On the one hand, doctor visits are expected to improve children's health; on the other hand, better health in turn leads to fewer doctor visits. Because of this simultaneous causality, an OLS regression of children's health status on the number of doctor visits estimates a complicated combination of the two effects; in other words, the regressor $X$ is correlated with the error term $\mu$. This problem cannot be solved by finding better control variables, but the simultaneous causality bias can be eliminated by finding a suitable instrumental variable and using TSLS. The TSLS estimator has two stages: first, the included endogenous variable (the number of doctor visits) is regressed on the included exogenous variables and the instrument; second, the dependent variable is regressed on the included exogenous variables and the predicted values of the endogenous variable from the first-stage regression.
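The two TSLS stages can be sketched with a single instrument and no control variables. The data are hypothetical: z stands in for a travel-time category (longer travel, fewer visits), u is an unobserved health shock that raises health and lowers visits, and the true effect of a visit is assumed to be 0.5.

```python
def simple_ols(x, y):
    """Intercept and slope of an OLS regression of y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def tsls(y, x, z):
    """Two-stage least squares with one endogenous regressor x and one instrument z."""
    a0, a1 = simple_ols(z, x)            # first stage: doctor visits on the instrument
    x_hat = [a0 + a1 * zi for zi in z]   # predicted doctor visits
    return simple_ols(x_hat, y)          # second stage: outcome on predicted visits


# Hypothetical data (not from MEPS).
z = [1.0, 2.0, 3.0, 4.0]                 # travel-time category
u = [1.0, -1.0, -1.0, 1.0]               # health shock, uncorrelated with z here
x = [10 - 2 * zi - ui for zi, ui in zip(z, u)]     # healthier children visit less
y = [1.0 + 0.5 * xi + ui for xi, ui in zip(x, u)]  # assumed true effect 0.5

print("OLS slope: ", simple_ols(x, y)[1])  # biased toward zero by simultaneity
print("TSLS slope:", tsls(y, x, z)[1])     # recovers 0.5
```

Running the second stage by hand recovers the TSLS coefficients, but its reported standard errors would be wrong because they are based on residuals computed with the predicted rather than the actual regressor; dedicated IV routines correct for this.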

The first question is how many instrumental variables to use. Let $m$ denote the number of instruments and $k$ the number of endogenous regressors. The coefficients are overidentified if there are more instruments than endogenous regressors ($m > k$), underidentified if $m < k$, and exactly identified if $m = k$. Estimation of the IV regression model requires exact identification or overidentification. In this case, doctor visits are determined within the model and are correlated with the population error term $\mu$, so at least one instrumental variable is needed. Do we need more than one? More instruments are not necessarily better: if some instruments are weak, it can be preferable to discard the weakest ones and use the most relevant subset.

Where can a good instrument be found? An instrument must satisfy two requirements. First, it must be relevant, that is, strong enough. Instruments that explain little of the variation in $X$ are called weak instruments. If the instruments are weak, the normal distribution provides a poor approximation to the sampling distribution of the TSLS estimator, even when the sample size is large, so there is no theoretical justification for the usual methods of statistical inference. In short, with weak instruments, TSLS is no longer reliable and the TSLS estimator can be badly biased. To measure the strength of the instruments, the standard diagnostic is the first-stage F-statistic testing the hypothesis that the coefficients on the instruments are all zero. This statistic measures the information content of the instruments: the more information they contain, the larger the expected value of the F-statistic. A common rule of thumb is that weak instruments are not a concern if the first-stage F-statistic exceeds 10. Second, the instruments must be exogenous. If an instrument is not exogenous, the TSLS estimator is inconsistent. When the model is overidentified, the test of overidentifying restrictions (the J-statistic) can be used to assess the exogeneity of the instruments.
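For a single instrument and no other exogenous regressors, the first-stage F-statistic is the squared t-statistic on the instrument. The sketch below computes it under homoskedasticity; the function name and data are hypothetical. With a categorical instrument entered as several indicators, the F-statistic would instead test all indicator coefficients jointly, which this simplified version does not do.

```python
def first_stage_f(x, z):
    """Homoskedasticity-only first-stage F for one instrument z and no
    other exogenous regressors: F equals the squared t-statistic on z."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((a - mz) ** 2 for a in z)
    pi1 = sum((a - mz) * (b - mx) for a, b in zip(z, x)) / szz
    pi0 = mx - pi1 * mz
    ssr = sum((b - pi0 - pi1 * a) ** 2 for a, b in zip(z, x))
    s2 = ssr / (n - 2)        # residual variance: intercept + one slope estimated
    var_pi1 = s2 / szz        # homoskedastic variance of the slope estimate
    return pi1 ** 2 / var_pi1


# Hypothetical data: doctor visits fall with travel-time category.
z = [1.0, 2.0, 3.0, 4.0]
x = [8.5, 5.5, 3.5, 2.5]
f = first_stage_f(x, z)
print("first-stage F:", f, "| exceeds rule-of-thumb 10:", f > 10)
```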

In my case, I use an instrument similar to that of McClellan, McNeil, and Newhouse (1994). They use patients' relative distance to a hospital as an instrument in their study of the effect of aggressive treatment for heart attack victims (technically, acute myocardial infarction, or AMI). If relative distance has a strong effect on the probability of receiving this treatment, it is a strong instrument; if it is distributed as if randomly across the sample of AMI victims, it is exogenous. They do not report first-stage F-statistics, but their analysis suggests that the probability of receiving treatment is highly correlated with the distance from the patient's home to the hospital. As for exogeneity, they make two arguments. First, they draw on their medical expertise and knowledge of the health care system to argue that distance to a hospital is plausibly uncorrelated with any of the unobservable variables that determine AMI outcomes. Second, they use data on some of the additional variables that affect AMI outcomes, such as the patient's weight, to show that distance is uncorrelated with these observable determinants of survival; this makes it more credible that distance is also uncorrelated with the unobservable determinants in the error term, and hence exogenous.

In my paper, I use "how long it takes the patient to get to the doctor" as the instrument. To establish its validity, I need to show that the instrument is strongly correlated with doctor visits but uncorrelated with the error term in the school days missed equation (that is, it is exogenous). In other words, the instrumental variable should affect doctor visits but have no direct effect on school days missed. In the first stage, I test the strength of the instrument with the F-statistic: regressing the number of doctor visits on how long it takes to get to the hospital yields an F-statistic of (?), which is much larger than the benchmark of 10. The distance to the doctor's office is therefore highly correlated with the number of doctor visits, indicating that the instrument is strong. Because limited information maximum likelihood (LIML) is less sensitive to weak instruments than TSLS, I also estimate the model by LIML and compare the results with those from TSLS.

Next, because the instrument is a categorical variable, entering its categories as separate indicators yields several instruments, so the model is overidentified and I can test the overidentifying restrictions. The J-test fails to reject them, which supports (though it cannot prove) the validity of the instrument.
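With a categorical instrument entered as a full set of category indicators, the first stage simply predicts each child's doctor visits by the mean for his or her category, and the overidentification test can be computed in Sargan's n·R² form: regress the TSLS residuals on the indicators and multiply the R² by the sample size. This is a close relative of the J = mF statistic; both are compared with a chi-squared distribution with m - k degrees of freedom. The function names and data below are hypothetical, including an invalid variant in which the instrument also affects the outcome directly.

```python
from collections import defaultdict


def group_means(values, groups):
    """Mean of `values` within each category of `groups`."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def tsls_categorical(y, x, z):
    """TSLS with one endogenous x and a categorical instrument z (as dummies).
    Returns (intercept, slope, Sargan n*R^2 statistic, degrees of freedom)."""
    n = len(y)
    gx = group_means(x, z)
    x_hat = [gx[g] for g in z]           # first stage = category means of x
    mx, my = sum(x_hat) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x_hat, y))
             / sum((a - mx) ** 2 for a in x_hat))
    intercept = my - slope * mx
    e = [b - intercept - slope * a for a, b in zip(x, y)]  # residuals use actual x
    ge = group_means(e, z)               # regressing e on dummies fits category means
    ss_between = sum(ge[g] ** 2 for g in z)  # e has mean zero, so this is the explained SS
    ss_total = sum(v * v for v in e)
    sargan = n * ss_between / ss_total
    df = len(gx) - 2                     # (G - 1) instruments minus 1 endogenous regressor
    return intercept, slope, sargan, df


# Hypothetical data: z indexes travel-time bands ordered from longest to shortest.
z = [0, 0, 1, 1, 2, 2]
u = [1.0, -1.0] * 3                      # health shock, mean zero within each band
x = [2 + 3 * g - ui for g, ui in zip(z, u)]
y_valid = [1.0 + 0.5 * xi + ui for xi, ui in zip(x, u)]
y_invalid = [yi + (0.5 if g == 2 else 0.0) for yi, g in zip(y_valid, z)]

print("valid instrument:  ", tsls_categorical(y_valid, x, z))
print("invalid instrument:", tsls_categorical(y_invalid, x, z))
```

In this six-observation toy the invalid instrument produces a positive but very small statistic, well below the 5% critical value of a chi-squared(1) distribution (about 3.84); overidentification tests have little power in small samples, so a failure to reject is supportive evidence rather than proof.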

This instrumental variable method improves upon OLS. The OLS analysis uses the actual number of doctor visits as the regressor, but because the actual number of doctor visits is itself the outcome of the patient's decision, it is correlated with the error term. TSLS instead uses the predicted number of doctor visits, whose variation arises only from variation in the instrumental variable: patients closer to a doctor's office are more likely to receive the treatment. (can type in the results of the regression and point out that after using the instrument, the sign of the equation changes).

The change in the regression results has two implications. First, the IV regression estimates the effect of doctor visits on health not for a "typical" randomly selected patient, but for patients for whom distance to the doctor's office is an important consideration in the treatment decision. The effect for these patients might differ from the effect for a typical patient. Second, it suggests a general strategy for finding instruments in this type of setting: find an instrument that affects the probability of visiting a doctor, but for reasons unrelated to the health outcome except through its effect on the likelihood of visiting the doctor.

4.2.4. Data Description

We draw data from the Medical Expenditure Panel Survey (MEPS), a large national sample that contains data on the social and demographic characteristics of families. MEPS is a panel survey: each panel is interviewed in five rounds over a period of about two and a half years. Data from 2000 to 2005 are used in this paper. MEPS asks the following question: "During the past 12 months, about how many days did [the child] miss school because of illness or injury?"

After restricting the sample to children aged 5 to 19, there are 7688 observations. Deleting observations with missing information leaves 6417 observations. Among these, 28% of the children have single mothers and 7% have single fathers. Of the 6417 children, 1026 are enrolled in Medicaid and 763 have no insurance at all, so Medicaid coverage in this sample is around 16%. 4042 children are covered by private insurance, a private insurance coverage rate of about 63%.
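The coverage shares quoted above follow directly from the reported counts; a quick arithmetic check using only the numbers in the text:

```python
# Counts reported in the text (children aged 5 to 19).
N_RESTRICTED = 7688   # before dropping observations with missing information
N = 6417              # analysis sample
counts = {"Medicaid": 1026, "private insurance": 4042, "uninsured": 763}

dropped = N_RESTRICTED - N
shares = {k: 100 * v / N for k, v in counts.items()}

print("observations dropped:", dropped)
for k, s in shares.items():
    print(f"{k}: {s:.1f}%")
```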

Table 1 shows summary statistics for the children in the sample who report days of school missed due to illness or injury. The average number of school days missed due to illness or injury is 3.37, with a high standard deviation (5.50). A histogram shows that most children miss around 3 days of school. For further analysis, a dummy variable is created for the days of school missed due to illness or injury: "SCHDAY3" equals 1 if the number of school days missed is greater than or equal to 3, and 0 otherwise.
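The SCHDAY3 indicator is a simple threshold rule. The sketch below applies it to a handful of invented responses (not MEPS data) and also reports the mean and the sample standard deviation, the statistic that summarizes dispersion in a summary-statistics table.

```python
from statistics import mean, stdev


def schday3(days_missed):
    """1 if the child missed 3 or more school days due to illness or injury, else 0."""
    return int(days_missed >= 3)


# Hypothetical responses to the school-days question (not the paper's data).
days = [0, 1, 2, 3, 3, 5, 10, 0, 4, 2]
flags = [schday3(d) for d in days]

print("mean days:", mean(days), "sd:", round(stdev(days), 2))
print("share with SCHDAY3 = 1:", sum(flags) / len(flags))
```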

52% of the sample is male; 36% live in the South, 25% in the West, 22% in the Midwest and 17% in the Northeast. 78% of the sample children are white and 17% are black. About 26% of the sample is of Hispanic origin. 22% of the children's mothers have a college education or above and 29% have some technical or vocational training; 20% of the children's fathers have a college education or above and 19% have some technical or vocational training. Only 4% of the families have an
