Tuesday, February 21, 2012

On Target: Predicting Pregnancy



Call me biased, but I think creative uses for predictive analytics are pretty cool. Target’s “pregnancy-prediction model”, explained Thursday in a New York Times Magazine article, is a great example. It should inspire all of us to take a fresh look at our data and consider what more we can accomplish with a powerful predictive analysis tool (like RI Analytics) and a little bit of creative thinking.



Target’s journey to predicting pregnancy started with an idea conceived by its marketing department. The department had previously conducted surveys that indicated that once a consumer’s shopping habits are ingrained, they can be hard to change – except during certain brief periods of a person’s life, like after a marriage or the birth of a child, when shopping patterns and brand loyalties often shift. The birth of a child means a new grocery and household goods list for new parents, as well as an opportunity for Target to sell things like cribs, rugs, furniture, car seats, and other items that a person or couple would not usually buy. Because birth records are public information, it was already common practice for companies to send promotional items to new parents; so, to stay one step ahead of competitors, marketers at Target wanted to see if there was a way to predict pregnancy during the second trimester.

Target reviewed the shopping habits of women who had a baby-shower registry as they approached their due dates. Eventually they were able to identify about 25 different products that were indicators of pregnancy, including items like unscented lotion, vitamin supplements, hand sanitizers and washcloths. By treating the purchase of each item as a variable, they were able to create a model that assigned each shopper a pregnancy prediction score based on their purchases. This score was then used to send out relevant coupons and advertisements tailored to each woman at a specific point in her pregnancy – before other retailers even knew she was pregnant. Needless to say, sales in Target’s Mom and Baby department skyrocketed.
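
To make the scoring idea concrete, here is a minimal Python sketch of how purchase indicators could be combined into a single score. It is only an illustration: the article does not publish Target's model, so the weights, the intercept, and the function name pregnancy_score are all made up, and the logistic form is simply one common way to turn yes/no variables into a 0-to-1 score.

```python
import math

# Hypothetical weights, for illustration only. The real model used about
# 25 products, and its coefficients are not public.
INDICATOR_WEIGHTS = {
    "unscented lotion": 1.2,
    "vitamin supplements": 0.9,
    "hand sanitizer": 0.6,
    "washcloths": 0.5,
}
INTERCEPT = -3.0  # assumed baseline log-odds for a shopper with no indicators


def pregnancy_score(purchases):
    """Return a score between 0 and 1 from an iterable of product names."""
    basket = set(purchases)
    # Each indicator product is a 0/1 variable: it adds its weight if present.
    log_odds = INTERCEPT + sum(
        weight for item, weight in INDICATOR_WEIGHTS.items() if item in basket
    )
    # The logistic function maps log-odds onto the 0-1 range.
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    print(pregnancy_score(["washcloths"]))
    print(pregnancy_score(["unscented lotion", "vitamin supplements", "washcloths"]))
```

In a real project the weights would be estimated from historical data (for example, shoppers with a known registry versus those without) rather than typed in by hand.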

This is one way that a creative use of data, combined with some predictive analytics, yields some pretty cool results. Target had the data they needed all along – they just needed the right person to ask the right question.

-Caitlin Garrett, Statistical Analyst at Rapid Insight

Friday, February 10, 2012

Creating Variables: Distance From Campus


Hi folks. This is the first entry in a new series I'll call "Creating Variables". This series will explain the creation and use of helpful predictive variables that might not be present in your existing datasets.

Today we’ll talk about how to create a “distance from” variable. This variable is particularly useful for predicting enrollment at admission, which is the example we'll use, but it can also be applied to things like retail sales, fundraising or donor models, or even hospital admissions. Because we don’t have a lot of information about each candidate at admission, we have to make the most of each piece of information we’re given. In this case, we use the zip code of each applicant and the zip code of our institution to determine each applicant’s distance from campus. Distance from campus is often very predictive of an applicant’s likelihood to enroll at a particular institution – usually, the closer an applicant lives to the institution, the more likely they are to enroll there. Let’s get started.


To begin, we’ll need to hook the applicant data into a transform node:




Opening the transform node, we’ll need to select “Distance Between” from the formula drop-down menu:








In the “Enter a Formula” window, you’ll want to enter:


Here, “A” is the variable in your dataset that represents each applicant’s zip code, and ‘03818’ should be replaced by your institution’s zip code. Be sure to set the result type to “Integer” and name your new variable “Distance from Campus” before saving. If you preview your data, you'll see that each student now has a value in the "Distance from Campus" column, which will be located all the way on the right as you scroll through your admission variables.
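
If you're curious what a zip-to-zip distance calculation can look like outside the tool, here is a rough Python sketch. It is not a description of how the “Distance Between” formula works internally: it assumes you have a lookup table of zip code centroids (the three entries below are approximate, illustrative coordinates) and uses the haversine great-circle formula, rounding to a whole number of miles to mirror the “Integer” result type. The names ZIP_CENTROIDS and distance_between are made up for this example.

```python
import math

# Approximate (latitude, longitude) centroids, for illustration only.
# In practice you would load a complete zip code centroid table.
ZIP_CENTROIDS = {
    "03818": (43.98, -71.12),  # the campus zip from the example above
    "03801": (43.07, -70.76),
    "02108": (42.36, -71.06),
}

EARTH_RADIUS_MILES = 3958.8


def distance_between(zip_a, zip_b, centroids=ZIP_CENTROIDS):
    """Great-circle distance in whole miles, or None if a zip is unknown."""
    if zip_a not in centroids or zip_b not in centroids:
        return None
    lat1, lon1 = (math.radians(v) for v in centroids[zip_a])
    lat2, lon2 = (math.radians(v) for v in centroids[zip_b])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    miles = 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
    return round(miles)


if __name__ == "__main__":
    for applicant_zip in ["03801", "02108", "99999"]:
        print(applicant_zip, distance_between(applicant_zip, "03818"))
```

Keeping zip codes as text rather than numbers matters here, since a numeric type would drop the leading zero in codes like 03818.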

Tada! At this point, you’re ready to output your dataset, augmented with a shiny new variable, and one step closer to predicting enrollment! 

-Caitlin Garrett, Statistical Analyst at Rapid Insight