Hi folks. This is the first
entry in a new series I'll call "Creating Variables". This series
will explain the creation and use of helpful predictive variables that might
not be present in your existing datasets.
Today we’ll talk about how to create a “distance from” variable. This variable is particularly useful for predicting enrollment at admission, which is the example we'll use, but it can also be applied to things like retail sales, fundraising or donor models, or even hospital admissions. Because we don’t have a lot of information about each candidate at admission, we have to make the most of each piece of information we’re given. In this case, we use the zip code of each applicant and the zip code of our institution to determine each applicant’s distance from campus. Distance from campus is often very predictive of an applicant’s likelihood to enroll at a particular institution: usually, the closer an applicant lives to the institution, the more likely they are to enroll there. Let’s get started.
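If you'd like to see the idea outside of a point-and-click tool, here is a minimal Python sketch of the same calculation. It is not the transform-node method used in this post; it swaps in the haversine (great-circle) formula and assumes you have some table mapping zip codes to latitude and longitude. The `ZIP_COORDS` lookup, its rough coordinates, and the function names below are all hypothetical and only there for illustration.

```python
import math

# Hypothetical lookup: ZIP code -> (latitude, longitude) in degrees.
# The coordinates are rough, illustrative values; in practice you would
# load a full ZIP centroid table from a source you trust.
ZIP_COORDS = {
    "02108": (42.357, -71.064),  # Boston, MA (approximate)
    "10001": (40.750, -73.997),  # New York, NY (approximate)
}

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def distance_from_campus(applicant_zip, campus_zip, coords=ZIP_COORDS):
    """Return miles between two ZIP codes, or None if either ZIP is unknown."""
    # ZIP codes are kept as strings so leading zeros (e.g. "02108") survive.
    if applicant_zip not in coords or campus_zip not in coords:
        return None
    lat1, lon1 = coords[applicant_zip]
    lat2, lon2 = coords[campus_zip]
    return haversine_miles(lat1, lon1, lat2, lon2)


if __name__ == "__main__":
    # Applicant in New York, campus in Boston (illustrative only).
    print(distance_from_campus("10001", "02108"))
```

Returning `None` for an unknown zip code keeps missing values visible rather than silently treating them as a distance of zero, which matters if the new variable feeds a predictive model.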
To begin, we’ll need to hook the applicant data into a transform node:
Tada! At this point, you’re
ready to output your dataset, augmented with a shiny new variable, and one step
closer to predicting enrollment!
-Caitlin Garrett, Statistical Analyst at Rapid Insight