Tuesday, August 31, 2010

DROP DOWN menus in Excel

A really swell thing that I have been doing is making my own encoding file in Excel.

To avoid Major Major data cleaning after encoding (say, recoding
"NCR" and "Metro Manila" as NCR), I simply use a drop-down menu in Excel.

Suppose I have two variables, Region and Province. Making a drop-down menu for Region is easy.

However, for Province, can the menu “change” depending on the answer for
Region? Say, if the Region is A, the provinces in the menu will be C, D, and E,
and if the Region is B, the provinces will be X, Y, and Z? Yes!



Click here for a video demo on drop down menus.

Monday, August 30, 2010

Respondent-Driven Sampling (RDS)

How do you do analysis or quantitative studies on hidden populations, that is, populations for which a sampling frame cannot be constructed? Say, Intravenous Drug Users (IDUs) or Men who have Sex with Men (yes Martha, in most countries, MSM are still hidden!).


You can do RESPONDENT-DRIVEN SAMPLING (RDS).


And then... with the data from an RDS study, how do you analyze it? What are the complications of using data from RDS studies?


Click here to learn more!

COLLAPSE to avoid fatigue

A few months ago, an officemate ran the following regressions:

Y1 = a + bx1 + cx2 + dx3 + …….. + nxn
Y2 = a + bx1 + cx2 + dx3 + …….. + nxn
.....
Y5 = a + bx1 + cx2 + dx3 + …….. + nxn

There were around 5 dependent variables and 7 explanatory variables (which include age and income). The data set also covered observations from more than 10 cities. Within a city, there are two types of observations, say registered and freelance (oh, that is a dead giveaway of what the topic is :) ).

Then I was asked to do the following (to be submitted within 15 hours!):

1. Mean values of Y1, Y2, and Y3 by explanatory variable. Say one of the explanatory variables is sex: that means the means of Y1, Y2, and Y3 for males and for females. I remember the regression models had categorical variables as well, so means of Y1, Y2, and Y3 for ALL the categories!
2. Oh, and note that since there are continuous explanatory variables (age and income), the means have to be computed for each age and income quintile group. That is, means of Y1, Y2, and Y3 for quintile 1 of age, quintile 2, quintile 3... and the same style for income and ALL THE OTHER EXPLANATORY VARIABLES.
3. Oh, and the quintiles of the continuous explanatory variables SHOULD REFLECT the location distributions and not the entire data set. Meaning, the quintiles should be generated PER location.
4. And one last note: the table should be PER LOCATION and PER TYPE OF OBSERVATION (registered versus freelance).

So MANY demands! Girl, you could finish the whole Lord of the Rings trilogy, special features included, and I would probably still not be done. And it was needed the very next day!

I can do the basic table commands: tabstat, tab, sum. But come on, with the conditions alone (setting the “if”), it is already prone to mistakes... So...

I performed a COLLAPSE command!

Click here for a sample.  Enjoy.
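
For those who want a feel for it before opening the sample, here is a minimal sketch of the idea. It is not the actual do-file behind the link. The data are simulated, and every variable name (city, type, age, age_q, y1 to y3) is made up for illustration. It builds age quintiles separately per city, then lets collapse produce the means by city, type, and quintile in one go.

```stata
* Simulated data; all variable names here are hypothetical
clear
set seed 2010
set obs 600
gen city = 1 + floor(3*runiform())     // three cities, coded 1 to 3
gen type = runiform() < 0.5            // 1 = registered, 0 = freelance
gen age  = 18 + floor(40*runiform())
gen y1 = rnormal()
gen y2 = rnormal()
gen y3 = rnormal()

* Quintiles of age computed PER city (xtile does not accept the by: prefix)
gen age_q = .
levelsof city, local(cities)
foreach c of local cities {
    xtile tmp_q = age if city == `c', nq(5)
    replace age_q = tmp_q if city == `c'
    drop tmp_q
}

* One row per city x type x age quintile, holding the means of y1-y3
preserve
collapse (mean) y1 y2 y3, by(city type age_q)
list, sepby(city type)
restore
```

The same pattern would be repeated for income and for each categorical explanatory variable by swapping the last variable in by(). Because collapse replaces the data in memory, the preserve/restore pair brings the original observations back afterwards.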






 

Sunday, August 29, 2010

Play that Funky Music!

One of my friends, Aiken, mentioned, “Jay, you were the FIRST person I saw with an iPod at UPSE...”

Really, I still contest this claim... But I will not deny that I have one of the most extensive (as in MAJOR) music collections at UPSE, if not the entire university (hehehehehe, I buy my music from iTunes, thank you!). Before, I had time to listen to music depending on my mood. Now, I listen to music depending on the work :)

Some examples:

1. Reviewing a questionnaire, drawing (yes, I draw first) the structure of the data set.
     Recommendation:  Philip Glass, Songs and Poems for Solo Cello
2. Generating, renaming
     Recommendation:  Newton Faulkner, Hand Built by Robots
3. More generating, more renaming :)
     Recommendation:  Incubus, Light Grenades
4. Egen, reshaping
     Recommendation:  ANY Kronos Quartet CD + Recommendation 1
5. Regressions, logits, mfx compute
     Recommendations:  (this one takes so long) Joni Mitchell, Bob Dylan, Alexi Murdoch, and

6. A certain data set called Female Sex Workers and MSM (just search for what this is)
     Recommendations: Barang!

Starting Here, Starting Now: Some basic stuff + Gen(e)gen

My first encounter with Stata ten years ago (guys, I am carbon dating myself here) was a very colorful one, mostly in RED. So irritating.


I really did not know how to manipulate a data set, what the right syntax was, that Stata is case-sensitive, and most importantly, that I should ALWAYS log (though I still usually do not keep log files). My first impression was that Stata is so tiring to use.


All that I had going for me was my knowledge of Excel. I pictured the variables in my mind using Excel. I visualized the structure of the data set in “Excel terms”. At one point, I constructed a data set in Excel and then ran the regression commands in Stata (a No! No!).

Then, seven years ago, in my first random experiment project, I instinctively picked up a few things. For instance, take the question...

  “What if I am adding two variables, and some of the observations have no value in one of those variables? Should I...
  1.  Replace the missing values with zero, and then simply
      gen variablename=variable1+variable2
  ... or do the following:
  2.  gen variablename=variable1+variable2
      replace variablename=variable1 if variable2==.
      replace variablename=variable2 if variable1==.”

My guiding rule then was to NEVER, ever change the base variable. Always generate a new one. So I opted for option 2 above...

Now I know better (I think): I use egen!
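
A minimal sketch of the difference, assuming the egen function in question is rowtotal() (the post only says “egen”; the toy data and the new variable names sum_gen, sum_egen, and sum_strict are made up for illustration):

```stata
* Toy data: each variable has one missing value
clear
input variable1 variable2
1 2
3 .
. 5
end

* Plain addition gives missing whenever either input is missing
gen sum_gen = variable1 + variable2

* rowtotal() treats missing values as zero; the base variables stay untouched
egen sum_egen = rowtotal(variable1 variable2)

* With the missing option, a row whose inputs are ALL missing stays missing
egen sum_strict = rowtotal(variable1 variable2), missing

list
```

One thing to watch for: without the missing option, rowtotal() returns 0 (not missing) for a row where every input is missing, which may or may not be what you want.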


Don’t want to explain, would rather show.  (Hopefully, I would always be inspired to do sample files and do-files for you guys!  Now I am inspired since I just saw a nice play).

So here is the link. Read the do-file first. Change the location in the do-file (research on your own why this has to be done :) ). And of course, you know the drill.

PS:  Hopefully, fewer REDs this time....





Come to my window...

So how do we start...Well first, an introduction...

Probably, just like some of you, I have experienced toiling till the wee hours of the morning manipulating data files while drinking endless cups of coffee. Maybe you do it while listening to some favorite music? Or, instead of coffee, you drink alcohol while working (this can be a very bad idea). Or your style might be working while downloading "free" music and movies (I do not officially admit this).

Anyhow, at one point, WE TOILED! Getting to know the data, watching out for skips, complaining about too many errors... We toiled, using Excel, Stata, SAS. It gets to be too frustrating sometimes, but admit it or not...

YOU GET A KICK out of watching your Stata do-files run smoothly, like the "perfect" stream of characters in the MATRIX movie.

YOU CURSE whenever a "bump" is experienced along a do-file. But once the bump is fixed, you are A-OK again... ONLY TO REALIZE AFTERWARDS THAT YOU DID NOT SAVE THE DO-FILE OR MAKE A LOG-FILE.

If you have experienced or are currently experiencing either of the two things above, good for you! But I think this blog is NOT for you, since you should be busy looking at your Stata or Excel files right now :) . If not, you are probably not yet working hard enough!

This blog, STATA TOILER, is for people like me...those who like to:

1.  Look at data sets.
2.  Arrange data sets and complain.
3.  Create do-files.
4.  Discover new commands which hopefully will make things faster.
5.  Hone their analytical and data manipulation skills to stay continuously EMPLOYED!

But seriously, this blog is really FOR ME!

1.  To gather thoughts from other Stata toilers on how best to approach Stata problems and Excel programs;
2.  To release tension when I need a break from all these do-files;
3.  Basically, to keep track of my files :)


This BLOG will include things related to data manipulation, either directly or indirectly. I will probably post things DIRECTLY related to Stata, stuff like do-files, Excel tips, commands, etc., AND things not really related to data manipulation, like...

1.  How productivity at work increases after BUYING something impractical :)
2.  What music works best when toiling away till the morning, or the next night.
3.  The simple joys of having coffee while working
4.  The importance of a PRIVATE SPACE

At times, this blog can be motherly :), encouraging, but there will be BITCHY turns ahead.  

So.  THERE!