Friday, February 10, 2012

Creating Variables: Distance From Campus

Hi folks. This is the first entry in a new series I'll call "Creating Variables". This series will explain the creation and use of helpful predictive variables that might not be present in your existing datasets.

Today we’ll talk about how to create a “distance from” variable. This variable is particularly useful for predicting enrollment at admission, which is the example we’ll use, but it can also be applied to things like retail sales, fundraising or donor models, or even hospital admissions.

Because we don’t have a lot of information about each candidate at admission, we have to make the most of every piece of information we’re given. In this case, we use the zip code of each applicant and the zip code of our institution to determine each applicant’s distance from campus. Distance from campus is often very predictive of an applicant’s likelihood to enroll at a particular institution: usually, the closer an applicant lives to the institution, the more likely they are to enroll there. Let’s get started.

To begin, we’ll need to connect the applicant data to a transform node:

Opening the transform node, we’ll need to select “Distance Between” from the formula drop-down menu:

In the “Enter a Formula” window, you’ll want to enter:

Here, “A” is the variable in your dataset that represents each applicant’s zip code, and “03818” should be replaced by your institution’s zip code. Be sure to set the result type to “Integer” and name your new variable “Distance from Campus” before saving. If you preview your data, you’ll see that each student now has a value in the “Distance from Campus” column, which will be the last column on the right as you scroll through your admission variables.
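If you're curious what a calculation like this might look like outside of a transform node, here's a minimal Python sketch. To be clear, this is not how the “Distance Between” formula is implemented; it's just one common approach, which is to look up a latitude/longitude for each zip code and take the great-circle (haversine) distance between the two points. The lookup table `ZIP_CENTROIDS`, its rough coordinates, and the function names are all made up for illustration. In practice you would load a real zip code centroid file.

```python
import math

# Hypothetical zip code -> (latitude, longitude) lookup.
# These coordinates are rough illustrative values, not authoritative data.
ZIP_CENTROIDS = {
    "03818": (43.98, -71.12),  # the institution's zip code from the example
    "03801": (43.07, -70.76),
    "10001": (40.75, -73.99),
}

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def distance_from_campus(applicant_zip, campus_zip="03818"):
    """Whole-mile distance between two zip codes, or None if either is unknown."""
    applicant = ZIP_CENTROIDS.get(applicant_zip)
    campus = ZIP_CENTROIDS.get(campus_zip)
    if applicant is None or campus is None:
        return None
    # Round to a whole number, mirroring the "Integer" result type above.
    return round(haversine_miles(*applicant, *campus))


# Add the new variable to each (made-up) applicant record.
applicants = [
    {"id": 1, "zip": "03801"},
    {"id": 2, "zip": "10001"},
    {"id": 3, "zip": "99999"},  # not in the lookup, so the distance is None
]
for row in applicants:
    row["Distance from Campus"] = distance_from_campus(row["zip"])
```

Note that zip codes are kept as strings so that leading zeros (as in “03818”) aren't lost, and that unknown zip codes come back as missing values rather than as a misleading distance of zero.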

Tada! At this point, you’re ready to output your dataset, augmented with a shiny new variable, and one step closer to predicting enrollment! 

-Caitlin Garrett, Statistical Analyst at Rapid Insight
