Where does predictive
modeling fit into the analytic ecosystem in higher education?
Within the analytic ecosystem in higher ed, there is a range
of ways in which data is analyzed. On one side, you have
historical reporting, which our clients do a lot of and is vital to every
institution. Somewhere in the middle is data exploration and analysis,
where you’re slicing and dicing data to understand it better or make more
informed decisions based on what happened in the past. On the other side
of the spectrum is predictive modeling. Modeling requires taking a look
at all of the variables in a given set of information to make informed
predictions about what will happen in the future. What is each applicant’s
probability of enrolling or what is each student’s attrition likelihood?
What will the incoming class look like based on the current admit pool?
These are the types of questions that are being answered in higher ed with
predictive analytics. The resulting probabilities can also be used in the
aggregate. For example, enrollment models allow you to predict overall
enrollment, enrollment by gender, by program, or by any other factor. The models are also used to project financial
outlay based on the financial aid promised to admitted applicants and their
individual enrollment probabilities.
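As a rough illustration of using probabilities in the aggregate, the sketch below sums per-applicant enrollment probabilities to get expected enrollment, overall and by program, and weights each aid offer by its applicant's probability to get an expected financial outlay. The admit pool, the field names (`program`, `p_enroll`, `aid_offer`), and all numbers are hypothetical; in practice the probabilities would come from whatever enrollment model the institution has fitted.

```python
from collections import defaultdict

# Hypothetical admit pool: field names and values are illustrative only.
# p_enroll stands in for the output of an already-fitted enrollment model.
admits = [
    {"program": "Biology", "p_enroll": 0.50, "aid_offer": 10000},
    {"program": "Biology", "p_enroll": 0.25, "aid_offer": 20000},
    {"program": "History", "p_enroll": 0.75, "aid_offer": 8000},
    {"program": "History", "p_enroll": 0.50, "aid_offer": 12000},
]


def expected_enrollment(pool):
    """Expected class size: the sum of individual enrollment probabilities."""
    return sum(a["p_enroll"] for a in pool)


def expected_enrollment_by(pool, key):
    """Expected enrollment broken out by any categorical field (e.g. program)."""
    totals = defaultdict(float)
    for a in pool:
        totals[a[key]] += a["p_enroll"]
    return dict(totals)


def expected_aid_outlay(pool):
    """Expected aid spend: each offer weighted by the chance it is accepted."""
    return sum(a["p_enroll"] * a["aid_offer"] for a in pool)


print(expected_enrollment(admits))
print(expected_enrollment_by(admits, "program"))
print(expected_aid_outlay(admits))
```

The same grouping function works for gender or any other factor recorded for each admitted applicant.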
Higher education has come a long way in the last five to ten
years in its use of predictive analytics. The entire student life cycle is now
being modeled starting with prospect and inquiry modeling all the way through
to alumni donor modeling. It used to be that any institutions that were
doing this kind of modeling were relying on outside consulting companies.
Today most are doing their modeling in-house. Colleges and universities
view their data as a strategic asset and they are extracting value from their
data with the same tools and methodologies as the Fortune 500 companies.
What kinds of
resources are needed and what is the first step for an institution that wants to
become more data-driven in its decision making?
It’s important to have somebody who knows the data. As long
as a user has an understanding of their data, our software makes it very easy
to analyze data and build predictive models very quickly. And our support team is
available to answer any analytic questions.
Gaining access to their data is the first step. We see a lot
of institutions that have some reporting tools which don’t allow them to ask
new questions of the data. So, they might have a set of 50 reports that they’re
able to run over and over but anytime someone has a new question, without
access to the raw data there’s no way to answer the question.
It really helps if the institution is committed to a culture
of data-driven decision making. Then all
the various stakeholders are more focused on ensuring data access for those
doing the predictive modeling.
What do you say to
those who are on “the quest for perfect data”? Is it okay to implement predictive analytics
before you have that data warehouse or those perfectly cleansed datasets?
No institution is ever going to have perfect data, so you work
with what you have. We suggest seeing what you have, finding any obvious
problems in the data, and then fixing those problems the best you can. We’ve
designed our solutions such that a data warehouse is not required but, even
with a clean data warehouse, the data is never going to be perfect. As long as you have an understanding
of the data, you can move forward.
In your experience,
which models in higher education produce the highest ROI?
We have a customer, Paul Smith’s College, that has quantified
the results of its retention modeling efforts. Using the model results, the college put programs
into place to help those students who were predicted to be at high risk of
attrition. The college credits the modeling with helping it identify which students
to focus on, saving $3 million in net tuition revenue so far.
We have other clients that are using predictive modeling on
the prospect side and they’re realizing significant savings on their recruiting
efforts. So instead of mailing to 200,000 high school seniors, they’re mailing
to 50,000, and no longer mailing or calling
those students who have pretty much zero probability of applying or enrolling.
Although not as easily quantifiable, enrollment modeling has
a pretty big ROI, not only in
determining which applicants are likely to enroll, but also in predicting class
size. If an institution overshoots and
enrolls too many applicants, it will have dorm, classroom, and other resource
issues. If it enrolls too few, it will
have revenue issues. So predicting class
size and determining which applicants and how many to admit is extremely
important.
What are some common
mistakes you see when approaching predictive modeling for your higher ed
customers?
One mistake that I often see is information being thrown
out as not useful to the models. Zip code is a good example. Zip
code looks like a five-digit numeric variable, but you wouldn’t want to use it
as a numeric variable in a model. In some cases it can be used
categorically to help identify applicants’ origins, but its most useful purpose
is for calculating a distance-from-campus variable. This is a variable
that we see showing up as a predictor in many prospect/inquiry models,
enrollment models, alumni models, and even retention models. Another
example of a variable that is often overlooked is application date.
Application date often contains a ton of useful information if looked at
correctly. It can be used to calculate the number of days between when the
application was sent and the application deadline. This piece of
information can tell you a lot about an applicant’s intentions. A student
who gets their application in the day before the deadline probably has very
different intentions than a student who applies nine months before the
deadline. This variable ends up participating in many models.
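A minimal sketch of deriving these two variables is shown below. It assumes you have a lookup table from zip code to latitude and longitude; the `ZIP_CENTROIDS` table, its zip codes, and its coordinates are placeholders made up for illustration. Distance is computed with the standard haversine (great-circle) formula, which is one reasonable choice rather than the only one, and zip codes are kept as strings so they are never treated as numbers.

```python
import math
from datetime import date

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

# Hypothetical lookup table: zip code (as a string) -> (latitude, longitude).
# These entries are placeholders, not real zip code centroids.
ZIP_CENTROIDS = {
    "00001": (40.0, -75.0),
    "00002": (42.0, -71.0),
}


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def distance_from_campus(zip_code, campus_zip, centroids=ZIP_CENTROIDS):
    """Distance-from-campus variable; None when a zip code is not in the table."""
    if zip_code not in centroids or campus_zip not in centroids:
        return None
    return haversine_miles(*centroids[zip_code], *centroids[campus_zip])


def days_before_deadline(application_date, deadline):
    """Days between application and deadline (negative if the application was late)."""
    return (deadline - application_date).days


print(distance_from_campus("00001", "00002"))
print(days_before_deadline(date(2024, 1, 14), date(2024, 1, 15)))
```

Both functions return plain numbers, so the results can be added as new columns alongside the original zip code and application date fields.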
To get our customers up to speed on best practices in
predictive modeling we’ve created resources like lists of recommended variables
for specific models and guides on how to create useful new variables from existing
data.