Our next customer webinar, "Strategic Enrollment Management: St. Michael's College and Predictive Analytics," will be given by Bill Anderson, CIO of Saint Michael's College, today at 2pm EDT and will be re-broadcast on Tuesday, March 26th, and Thursday, May 2nd.
I got the chance to ask him a couple of questions about his session, which will describe the ways in which Veera and Analytics are utilized on campus to produce predictions and other analyses for the scoring team.
What types of models have you been building?
Almost entirely enrollment management - mostly apply-to-enroll. We've been building them on and off for about five years now. I have someone on campus whom I collaborate with, and when we first started she was using SPSS for the statistical analysis, but we've since abandoned that.
How has model building changed your Enrollment and/or Financial Aid practices?
There have been a number of ways that we've used the models: one, as a sort of verification of what our consultant has been doing; two, to be able to do some sensitivity and what-if analysis (and suggest different practices or emphases on where the aid awards should go); and three, to help confirm in-semester and in-process predictions of where the class is going to end up.
On some occasions, this has impacted the size of the waiting list or the way we thought about awarding wait-list spots, including the total number of admits. This last year, our model suggested that we could be more selective than we had been in the past.
What do you hope attendees will learn from your presentation?
One thing is that you can do it on your own - it's not that hard. You have to have a background that supports responsible interpretation of the results, but you can sit down and do it. That's one element: just do it. I think there's another element that says once you start thinking this way, it can become infectious. In our enrollment management meetings, we have the opportunity to appeal to the data or look at a Veera job that identifies the applicants we could avoid accepting. This changes the internal conversation - from a culture of anecdote, you can change the conversation with data. The use of the products has been fabulous in terms of making the data accessible to people.
Tuesday, March 12, 2013
Rapid Insight's 5th Annual User Conference
Let the countdown to the 5th annual Rapid Insight User Conference begin! Here’s what you need to know about this fun and informative event:
We are making one big change this year: we've outgrown our space here in NH and are hosting the conference on the campus of Yale University in New Haven, Connecticut. It will kick off at 9am on Thursday, June 27th and wrap up by 4pm on Friday, June 28th. The cost of the conference is $150 per attendee. In addition to the presentations and hands-on labs, we'll be providing a continental breakfast and an evening reception to all registrants.
For User Conference lodging, we recommend the Omni New Haven Hotel at Yale. We have arranged a special rate of $169/night + tax, available through 5/26. You'll find the dedicated Conference link to guarantee this rate, along with additional travel information, on the official User Conference webpage.
Be sure to check the Conference webpage frequently for updates on specific sessions and activities as the date draws near. We look forward to seeing you there!
Labels: customers, networking, Rapid Insight, User Conference
Tuesday, March 5, 2013
Six Predictive Modeling Mistakes
As we mentioned in our post on Data Preparation Mistakes, we've built many predictive models in the Rapid Insight office. During the predictive modeling process, there are many places where it's easy to make mistakes. Luckily, we've compiled a few here so you can learn from our mistakes and avoid them in your own analyses:
Failing to consider enough variables
When deciding which variables to audition for a model, you want to include anything you have on-hand that you think could possibly be predictive. Weeding out the extra variables is something that your modeling program will do, so don't be afraid to throw the kitchen sink at it for your first pass.
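For readers working outside of Veera or Analytics (which handle variable selection for you), here is a minimal sketch of the same idea in Python: hand everything to an L1-penalized logistic regression and let the penalty zero out uninformative variables. The file and column names are hypothetical.

```python
# Sketch only (not the Veera/Analytics workflow): a "kitchen sink" first pass
# where an L1 penalty weeds out variables that carry no signal.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("applicants.csv")                      # hypothetical file
y = df["enrolled"]                                      # 1 = enrolled, 0 = did not
X = df.drop(columns=["enrolled"]).select_dtypes("number").fillna(0)

model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
model.fit(X, y)

coefs = pd.Series(model.named_steps["logisticregression"].coef_[0], index=X.columns)
print("Variables kept:   ", list(coefs[coefs != 0].index))
print("Variables dropped:", list(coefs[coefs == 0].index))
```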
Not hand-crafting some additional variables
Any guide-list of variables should be used as just that – a guide – enriched by other variables that may be unique to your institution. If there are few unique variables to be had, consider creating some to augment your dataset. Try adding new fields like "distance from institution" or creating riffs and derivations of variables you already have.
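As one possible illustration of hand-crafting fields with pandas, the sketch below derives a "distance from institution" variable plus a couple of riffs on existing columns. The column names (zip_lat, zip_lon, sat_math, sat_verbal, app_date) and the campus coordinates are placeholders, not real data.

```python
# Sketch of deriving new variables; all names and coordinates are hypothetical.
import numpy as np
import pandas as pd

df = pd.read_csv("applicants.csv")

CAMPUS_LAT, CAMPUS_LON = 44.49, -73.16  # example campus coordinates

def haversine_miles(lat, lon, lat0, lon0):
    """Great-circle distance in miles between two latitude/longitude points."""
    lat, lon, lat0, lon0 = map(np.radians, (lat, lon, lat0, lon0))
    a = np.sin((lat - lat0) / 2) ** 2 + np.cos(lat) * np.cos(lat0) * np.sin((lon - lon0) / 2) ** 2
    return 3959 * 2 * np.arcsin(np.sqrt(a))

# "Distance from institution," measured from the applicant's home ZIP centroid
df["distance_from_institution"] = haversine_miles(
    df["zip_lat"], df["zip_lon"], CAMPUS_LAT, CAMPUS_LON
)

# Riffs and derivations of variables you already have
df["sat_total"] = df["sat_math"] + df["sat_verbal"]
df["applied_in_fall"] = pd.to_datetime(df["app_date"]).dt.month.isin([9, 10, 11]).astype(int)
```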
Selecting the wrong Y-variable
When building your dataset for a logistic regression model, you'll want to select the response with the smaller number of data points as your y-variable. A great example of this from the higher ed world would come from building a retention model. In most cases, you'll actually want to model attrition, identifying those students who are likely to leave (hopefully the smaller group!) rather than those who are likely to stay.
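In code, the difference is simply how you flag the response. A small sketch, with hypothetical column names, of coding the rarer outcome (attrition) as the "1" so that predicted probabilities rank students by risk of leaving:

```python
# Sketch: make the smaller group (students who left) the modeled response.
import pandas as pd

df = pd.read_csv("first_year_students.csv")   # hypothetical file

# Model attrition (left = 1), not retention (returned = 1)
df["left"] = (df["returned_fall"] == 0).astype(int)

print(df["left"].value_counts())  # confirm "left" is indeed the smaller group
```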
Not enough Y-variable responses
Along with making sure that your model population is large enough (1,000 records minimum) and spans enough time (3 years is good), you'll want to make sure that there are enough Y-variable responses to model. Generally, you'll want to shoot for at least 100 instances of the response you'd like to model.
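A quick pre-modeling sanity check based on those rules of thumb might look like the following sketch; the "left" column is the hypothetical attrition flag from the previous example.

```python
# Sanity check before modeling: enough records, enough responses.
import pandas as pd

df = pd.read_csv("model_population.csv")   # hypothetical file

n_records = len(df)
n_responses = int(df["left"].sum())

if n_records < 1000:
    print(f"Only {n_records} records; consider pooling additional years of data.")
if n_responses < 100:
    print(f"Only {n_responses} responses; the model may be unstable.")
```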
Building a model on the wrong population
To borrow an example from the world of fundraising, a model built to predict future giving will look a lot different for someone with a giving history than for someone who has never given before. Consider which population you'd eventually like to use the model to score, and build the model tailored to that population; or consider building two models, one for each sub-group.
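One way to sketch the two-model approach for the fundraising example: fit a separate model for prior donors and for never-givers, then score each record with the model built for its own sub-group. The column names and the choice of logistic regression here are illustrative assumptions, and the features are assumed to be numeric with no missing values.

```python
# Sketch: one model per sub-population (prior donors vs. never-givers).
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("constituents.csv")                       # hypothetical file
features = ["age", "years_since_graduation", "events_attended"]  # hypothetical

models = {}
for has_history, group in df.groupby(df["lifetime_giving"] > 0):
    m = LogisticRegression(max_iter=1000)
    m.fit(group[features], group["gave_this_year"])
    models[has_history] = m

# Score new records with the model matched to their sub-group
new = pd.read_csv("to_score.csv")
for has_history, model in models.items():
    mask = (new["lifetime_giving"] > 0) == has_history
    if mask.any():
        new.loc[mask, "score"] = model.predict_proba(new.loc[mask, features])[:, 1]
```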
Judging the quality of a model using one measure
It's difficult to capture the quality of a model in a single number, which is why modeling outputs provide so many model fit measures. Beyond the numbers, graphic outputs like decile analysis and lift analysis can provide visual insight into how well the model is fitting your data and what the gains from using a model are likely to be.
If you're not sure which model measures to focus on, ask around. If you know someone building models similar to yours, see which ones they rely on and what ranges they shoot for. The take-home point is that with all of the information available on a model output, you'll want to consider multiple gauges before deciding whether your model is worth moving forward with.
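For readers outside our tools, a rough sketch of looking at more than one gauge at once: an ROC AUC on held-out data plus a simple decile/lift table built from the predicted probabilities. The data, features, and response column are hypothetical.

```python
# Sketch: judge a model with several measures, not a single number.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("model_population.csv")                            # hypothetical
features = ["distance_from_institution", "sat_total", "visits"]     # hypothetical
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["enrolled"], test_size=0.3, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]

print("ROC AUC:", round(roc_auc_score(y_test, probs), 3))

# Decile analysis: rank by predicted probability, split into 10 groups, and
# compare each decile's actual response rate to the overall rate (lift).
results = pd.DataFrame({"prob": probs, "actual": y_test.to_numpy()})
results["decile"] = pd.qcut(results["prob"].rank(method="first"), 10, labels=False) + 1
table = results.groupby("decile")["actual"].agg(["count", "mean"])
table["lift"] = table["mean"] / results["actual"].mean()
print(table.sort_index(ascending=False))   # top decile first
```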
-Caitlin Garrett, Statistical Analyst at Rapid Insight
Photo Credit: http://www.flickr.com/photos/mattimattila/
Have you made any of the above mistakes? Tell us about it (and how you found it!) in the comments.
Labels: predictive analytics, predictive modeling, variables