Stata Toiler: September 2010

Friday, September 10, 2010

Are you the/a “marginal” ________ ? (PART 1)

In microeconomic theory, the concept of the marginal is unavoidable. In fact, it is the cornerstone of “modern” microeconomic theory (well, it depends on how you cast your individuals and players…).

In the simplest terms, for me anyway, the “marginal” is the one that tilts the status quo. It is a dividing line: it makes you pursue a specific task, it tells you if something is a “go” or not, it tells you to STOP!

Are you the “marginal” player? The “marginal” consultant….. the one that makes the difference. The relevant one. The cat in Mrs. Lovett's pie?

In running regressions, we are oftentimes interested in the marginal effects of specific explanatory variables. The explanatory variable can be anything, from a continuous variable (say income) to a switch variable.

What is the effect of a 100-unit increase in income on Y?

What is the effect of introducing a “pill” or a policy on utilization?

In the standard linear regression model, the computed betas are usually the “marginal” effects (I'll show examples where they are not in another blog entry). If you are running a logit or probit model, the betas are no longer the marginal effects themselves, because the probability is a nonlinear function of the explanatory variables. And you are oftentimes interested not only in the direction but in the magnitude as well (say, the effect of introducing a policy on the probability of pursuing a certain action).

In Stata 10, some of the commands are:

mfx compute --> for logit models

mfx, predict(pu0) --> for fixed or random effects logit models (xtlogit)

(there are variations of mfx depending on the model you are running, say an ologit or mlogit).

These two commands give you the effect of the explanatory variables on the probability. HOWEVER, note that Stata computes these effects at the MEAN values of your explanatory variables. Say x1 is a dummy variable for “male” and 20% of your regression sample are males: the mfx will be computed at x1=0.2 (you have to make basic manual computations to get the predicted probability at x1=0 or x1=1). For the dummy itself, mfx reports the discrete change from 0 to 1.

If you are lazy and do not want to bother with computations to get the predicted probability, you can use…

mfx, predict(p) at(male=0) --> say you want to find out the change in the probability if the respondent is male. The at(male=0) option fixes male at 0 and leaves the other variables at their means.
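A minimal sketch of these mfx calls, assuming the nlsw88 example data that ships with Stata. The variables union, age, collgrad and south come from that file and are only stand-ins for your own model; mfx was superseded in Stata 11 but, as far as I know, still runs in later versions.

```stata
* Example data shipped with Stata (an assumption, not the original post's data)
sysuse nlsw88, clear

* Logit of union membership on age and two dummies
logit union age collgrad south

* Marginal effects on Pr(union), evaluated at the sample means
mfx compute

* Same, but with collgrad fixed at 0 and the other variables at their means
mfx compute, predict(p) at(collgrad=0)
```

In the mfx output, dummy variables are flagged with (*), meaning the reported effect is the discrete change from 0 to 1.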

A complication exists if one of the explanatory variables is an interaction of two other variables, or if you are dealing with squared explanatory variables. Obviously, the “mfx compute” command WILL NOT give you the marginal effects (say of age if there is age^2)… for the simple reason that Stata WILL NOT BE ABLE to recognize a variable called age_square as a transformation of another variable. Unless explicitly specified, Stata will simply treat age_square as an additional variable. (MORE ON THIS… LATER)

PS: for those using Stata 11, there is now a faster command, margins.
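A minimal sketch of how margins deals with the age^2 problem, assuming Stata 11 or later and the nlsw88 example data that ships with Stata (union, age and collgrad are stand-ins, not the variables of the original analysis). The trick is to write the square with factor-variable notation, c.age##c.age, instead of creating age_square by hand, so Stata knows both terms move together.

```stata
* Example data shipped with Stata (an assumption for illustration)
sysuse nlsw88, clear

* c.age##c.age enters age and age squared; i.collgrad marks a dummy
logit union c.age##c.age i.collgrad

* Average marginal effects; the effect of age accounts for the squared term
margins, dydx(age collgrad)

* Marginal effect of age at the means of the covariates, closer to what mfx reports
margins, dydx(age) atmeans
```

By default margins averages the effects over the sample; atmeans switches to evaluating them at the means.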

Thursday, September 2, 2010

OUT OF (re)SHAPE: long to wide

I am NOT a fan of the reshape command. If there are other ways, I really avoid it like the plague.

Say you have a data set with members as unit of observation. Each observation has a tag, identifying its household and its “count” in the household.

Say, the problem is transforming the data set into one where the unit of observation is the household.

If I am interested in only a few household characteristics, I WOULD RATHER NOT USE the reshape command.

I would rather use the egen command with the following technique:

sort hhid

by hhid: egen aveincome=mean(income)

by hhid: gen count=sum(1)

drop if count~=1

keep hhid aveincome

DONE! NO MORE FUSS!
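A runnable sketch of this technique on a made-up roster (the data values are hypothetical; hhid, income and aveincome follow the names used above). Note that the counter is built within each household (by hhid:), so that exactly one row per household survives the drop.

```stata
* Made-up member-level data: two households
clear
input hhid income
1 100
1 200
2 300
2 400
2 500
end

sort hhid

* Household mean, repeated on every member's row
by hhid: egen aveincome = mean(income)

* Running count that restarts at 1 in each household
by hhid: gen count = sum(1)

* Keep one row per household
drop if count != 1
keep hhid aveincome
list
```

If a mean is all you need, collapse (mean) aveincome=income, by(hhid) should get you the same household-level file in one line.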

The problem with reshape (and this is just long to wide, mind you): you would have a lot of income variables, depending on the family with the most members. And you will be forced to include in the reshape command variables you are actually not interested in (since these vars may not be constant within the ids).

But oh, all right, here is a sample of the darn RESHAPE command.
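A minimal sketch of a long-to-wide reshape on a made-up roster. The names hhid, member and income and all the values are hypothetical.

```stata
* Made-up member-level data: household 1 has two members, household 2 has three
clear
input hhid member income
1 1 100
1 2 200
2 1 300
2 2 400
2 3 500
end

* One row per hhid; income becomes income1, income2, income3
reshape wide income, i(hhid) j(member)
list
```

Household 1 has no third member, so its income3 comes out missing. That is exactly the clutter complained about above: the number of income columns is set by the largest household.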

Wednesday, September 1, 2010

Let Me Count the Ways, 1 2 3

Somebody asked me this morning if it is possible to generate a variable in an EXISTING stata data set containing counts, from 1 to n.

Yes, it's possible. The command is:

gen varname=sum(1)

You can also have a variable containing the series 2, 4, 6, …, 2n. The command is

gen varname=sum(2)

Suppose you do not want to start with 1 (here the count starts at 2), then

gen varname=sum(1) + 1
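A small sketch of the three variants side by side, on an empty data set padded to five observations (the variable names are made up). The built-in _n, the observation number, is added for comparison since it gives the same 1 to n count.

```stata
* Five empty observations to count over
clear
set obs 5

gen count = sum(1)        // 1, 2, 3, 4, 5
gen evens = sum(2)        // 2, 4, 6, 8, 10
gen from2 = sum(1) + 1    // 2, 3, 4, 5, 6
gen idx   = _n            // observation number, same as count

list
```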

Suppose you have a roster data set with family members as observations and with family ids tagging the members of each family, and say you want to order the members, with count=1 for the youngest… Try running the commands

sort family_id age

by family_id: gen varname=sum(1)
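A sketch of the within-family count on a made-up roster (the ages are hypothetical; family_id and age follow the names above, and order is a made-up variable name).

```stata
* Made-up roster: family 1 has three members, family 2 has two
clear
input family_id age
1 40
1 12
1 8
2 35
2 3
end

* Youngest first within each family, then count within family
sort family_id age
by family_id: gen order = sum(1)
list
```

The same ordering can be written in one line as bysort family_id (age): gen order = _n.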

Try running the gen ____=sum(1) command in a blank data set and see what you get. The answer isn't surprising :)

YOUR FRIENDLY AVON GIRL: AN RDS TECHNIQUE

A few months ago, I handled data sets on intravenous drug users and men who have sex with men (MSM).

The sampling design used for data collection was RDS, or respondent-driven sampling. Basically, for each site, some “seeds” or respondents were recruited to participate in the survey. Then, these primary seeds were asked to recruit additional respondents. And so on… (think of the method used by AVON to sell cosmetics)…

The IDs of the respondents reflect their “position” in the entire recruitment process.

For example, given the following IDs

Id=1 > person is a first seed

Id=2 > person is a first seed

Id=12 > person is a seed recruited by 1

Id=21 > person is a seed recruited by 2

Id=123 > person is a seed recruited by person 12 (who was recruited by id 1)

I needed to perform an xtreg with grouping based on the seeds. I then had to cut the IDs, say at the second, third, or fourth level, and then do the labeling based on the new ids generated. To perform this, I used the substr() function.

Refer to statadaily.wordpress.com for the command. Thanks Mitch!!!
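A minimal sketch of the ID cutting, using the example IDs above. It assumes the IDs are stored as numbers and that each level of the chain is a single digit; the names str_id, seed, level2 and seed_num are made up for illustration.

```stata
* The example IDs from the list above
clear
input long id
1
2
12
21
123
end

* substr() works on strings, so convert the numeric id first
gen str_id = string(id)

* First character = the original seed; first two = seed plus first recruit
gen seed   = substr(str_id, 1, 1)
gen level2 = substr(str_id, 1, 2)

* Numeric version of the seed, usable as a grouping variable
gen seed_num = real(seed)
list
```

The grouped regression would then be something along the lines of xtreg y x, i(seed_num), where y and x stand in for your own variables.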
