This next post comes from Jeff Fleischer, our Director of Client Operations, support wiz, and analyst extraordinaire:
Working out the logic of a new variable you want to create with a TRANSFORM node can be challenging. But when missing data ("nulls") get into the mix, it can be especially confusing and frustrating. For example, if you'd written the conditional formula...
IF ([A]='Freshman', 'UG', 'Grad')
...and some of the fields under column [A] were null, you would get nulls as an output for those rows rather than the desired 'Grad'. This is because trying to equate something with "nothing" confuses Veera as to what you would really want as a result. So here are some suggestions on how best to deal with those gaps and still get to the outcome you need...
1.
The most obvious way to deal with gaps in data is to replace them with something. This may not always be desirable, but when it is, using a CLEANSE ahead of your TRANSFORM is your best bet. Select the "Is Missing" operator and use Alt-Left Mouse to select all the columns whose missing fields should be filled in with a new value, like 'unknown'.
Of course, you could instead place a CLEANSE after your TRANSFORM, using it to fill in any missing values appearing in the new column.
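If it helps to see the equivalent logic in code, here's a minimal pandas sketch of the cleanse-then-transform pattern (the column names and values are hypothetical, not from Veera itself):
import pandas as pd
df = pd.DataFrame({'A': ['Freshman', None, 'Senior']})
# CLEANSE step: fill missing values before the transform sees them
df['A'] = df['A'].fillna('unknown')
# TRANSFORM step: the conditional now has no nulls to trip over
df['Level'] = df['A'].apply(lambda a: 'UG' if a == 'Freshman' else 'Grad')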
2.
If filling in those data holes using a cleanse is not preferable, maybe just a temporary patch will do. Look for the "Treat Missings in Formula as Zeros" checkbox just above the "New Variable Name" field in the TRANSFORM. Just as the name suggests, this will temporarily replace any missing data with a zero, allowing most operations to function. Be careful, though, if the column you're evaluating already contains zeros - the output may not be what you intended!
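In code terms, the checkbox behaves roughly like a temporary fill with zero, which is also where the caveat about existing zeros comes from. A sketch with hypothetical column names:
import pandas as pd
df = pd.DataFrame({'Credits': [12, None, 0]})
# Rough analogue of "Treat Missings in Formula as Zeros":
# substitute 0 only for the duration of the calculation
df['FullTime'] = (df['Credits'].fillna(0) >= 12)
# Caveat: a true 0 and a former null now evaluate identically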
3.
If even temporarily replacing nulls with something else isn't an option, then change your TRANSFORM formula to deal with them ahead of everything else. To do this, you'll likely need to use one of two built-in Veera functions - IS NULL or IS NOT NULL. We might change our example to include another condition, such as...
IF ([A] IS NULL, 'Withdrawn',
IF ([A]='Freshman', 'UG', 'Grad'))
The idea here is to catch any nulls before they affect the rest of the logic by putting that condition first.
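For comparison, the same null-first pattern sketched in pandas/numpy (column and output names are hypothetical):
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': ['Freshman', None, 'Senior']})
# Catch nulls first, then apply the original logic
df['Status'] = np.where(df['A'].isna(), 'Withdrawn',
               np.where(df['A'] == 'Freshman', 'UG', 'Grad'))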
4.
Finally, another (if more specialized) option might be to use the "Missings:" TRANSFORM feature. Unlike the "Treat Missings in Formula as Zeros" checkbox, this control changes nulls that appear as the final result of a formula. The replacement options offered by this feature are limited (0 or 1), but it may be an easy way to fix a problem with absent data appearing in a new numeric field.
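In effect, this is a fill applied to the formula's output rather than its inputs. Roughly, with a hypothetical column name:
import pandas as pd
df = pd.DataFrame({'NewVar': [1.5, None, 2.0]})
# "Missings:" analogue: replace nulls in the finished result, not the inputs
df['NewVar'] = df['NewVar'].fillna(0)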
-Jeff Fleischer
Tuesday, January 22, 2013
Wednesday, January 16, 2013
How to Interpret a Decile Analysis
After building a predictive model, there are several ways to determine how well it describes your data. One visual way to gauge model fit is the decile analysis. Here we’ll take a look at what a decile analysis represents, how it’s created, and how to spot a good model.
What a Decile Analysis Represents
After a statistical model is built, a decile analysis is created to test the model’s ability to predict the intended outcome. Each column in the decile analysis chart represents a collection of records that have been scored using the model. The height of each column represents the average of those records’ actual behavior.
How the Decile Analysis is Calculated
1. The hold-out or validation sample is scored according to the
model being tested.
2. The records are sorted by their predicted scores in descending
order and divided into ten equal-sized bins or deciles. The top decile contains
the 10% of the population most likely to respond and the bottom decile contains
the 10% of the population least likely to respond, based on the model scores.
3. The deciles and their actual response rates are graphed on
the x and y axes, respectively.
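For the code-minded, here is a minimal sketch of those three steps in pandas, using synthetic data in place of real model scores:
import numpy as np
import pandas as pd
rng = np.random.default_rng(0)
score = rng.random(1000)                          # stand-in for model scores
actual = (rng.random(1000) < score).astype(int)   # observed 0/1 responses
df = pd.DataFrame({'score': score, 'actual': actual})
# Step 2: rank descending so decile 1 holds the highest-scoring 10%
df['decile'] = pd.qcut(df['score'].rank(ascending=False, method='first'),
                       10, labels=range(1, 11))
# Step 3: actual response rate per decile -- the bar heights on the chart
print(df.groupby('decile')['actual'].mean())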
After the decile analysis is built, you’ll want to take a
look at the height of the bars in relation to one another. Deciding whether a
model is worth moving forward with depends on the pattern you see when viewing
the decile analysis.
Ideal Situation: The Staircase Effect
When you’re looking at a decile analysis, you want to see a
staircase effect; that is, you’ll want the bars to descend in order from left
to right, as shown below.
This is telling you that the model is “binning” your
constituents correctly from most likely to respond to least likely to respond. A
model exhibiting a good staircase decile analysis is one you can consider
moving forward with.
Not-So-Ideal Situations
In contrast, if the
bars seem to be out of order (as shown below), the decile analysis is telling
you that the model is not doing a very good job of predicting actual responses.
If the bars seem to
be the same height, or the decile analysis looks “flat”, the decile analysis is
telling you that the model isn’t performing any better than randomly binning
people into deciles would. In both cases, your model should be improved before
moving forward with it.
-Caitlin Garrett, Statistical Analyst at Rapid Insight
Thursday, January 10, 2013
Valuing Analytics & Predictive Modeling in Higher Ed

Where does predictive modeling fit into the analytic ecosystem in higher education?
Within the analytic ecosystem in higher ed, there is a range of ways in which data is analyzed. At one end of the spectrum you have historical reporting, which our clients do a lot of and which is vital to every institution. Somewhere in the middle is data exploration and analysis, where you’re slicing and dicing data to understand it better or to make more informed decisions based on what happened in the past. At the other end of the spectrum is predictive modeling. Modeling requires looking at all of the variables in a given set of information to make informed predictions about what will happen in the future. What is each applicant’s probability of enrolling, or what is each student’s attrition likelihood?
What will the incoming class look like based on the current admit pool?
These are the types of questions that are being answered in higher ed with
predictive analytics. The resulting probabilities can also be used in the
aggregate. For example, enrollment models allow you to predict overall
enrollment, enrollment by gender, by program, or by any other factor. The models are also used to project financial
outlay based on the financial aid promised to admitted applicants and their
individual enrollment probabilities.
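Used in the aggregate, the math is simply a sum of individual probabilities. A minimal sketch, with hypothetical column names and numbers:
import pandas as pd
df = pd.DataFrame({'enroll_prob': [0.8, 0.3, 0.5],        # per-applicant model scores
                   'aid_offered': [12000, 8000, 10000]})  # promised financial aid
expected_class_size = df['enroll_prob'].sum()
expected_aid_outlay = (df['enroll_prob'] * df['aid_offered']).sum()
print(expected_class_size, expected_aid_outlay)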
Higher education has come a long way in the last five to ten
years in its use of predictive analytics. The entire student life cycle is now
being modeled, starting with prospect and inquiry modeling all the way through to alumni donor modeling. It used to be that institutions doing this kind of modeling relied on outside consulting companies. Today most are doing their modeling in-house. Colleges and universities view their data as a strategic asset, and they are extracting value from it with the same tools and methodologies as Fortune 500 companies.
What kinds of resources are needed, and what is the first step for an institution that wants to become more data-driven in its decision making?
It’s important to have somebody who knows the data. As long as a user has an understanding of their data, our software makes it easy to analyze that data and build predictive models quickly. And our support team is available to answer any analytic questions.
Gaining access to the data is the first step. We see a lot of institutions that have reporting tools which don’t allow them to ask new questions of the data. So, they might have a set of 50 reports that they’re able to run over and over, but anytime someone has a new question, without access to the raw data there’s no way to answer it.
It really helps if the institution is committed to a culture of data-driven decision making. Then all the various stakeholders are more focused on ensuring data access for those doing the predictive modeling.
What do you say to those who are on “the quest for perfect data”? Is it okay to implement predictive analytics before you have that data warehouse or those perfectly cleansed datasets?
No institution is ever going to have perfect data, so you work
with what you have. We suggest seeing what you have, finding any obvious
problems in the data, and then fixing those problems the best you can. We’ve
designed our solutions such that a data warehouse is not required but, even
with a clean data warehouse, the data is never going to be perfect. As long as you have an understanding of the data, you can move forward.
In your experience, which models in higher education produce the highest ROI?
We have a customer, Paul Smith’s College, that has quantified their retention modeling efforts. Using their model results, they put programs into place to help those students predicted to be at high risk of attrition. They credit the modeling with helping them identify which students to focus on, saving them $3m in net tuition revenue so far.
We have other clients that are using predictive modeling on the prospect side, and they’re realizing significant savings on their recruiting efforts. Instead of mailing to 200,000 high school seniors, they’re mailing to 50,000, saving money by not mailing and not calling those students who have pretty much zero probability of applying or enrolling.
Although not as easily quantifiable, enrollment modeling has a pretty big ROI, not only in determining which applicants are likely to enroll, but in predicting class size. If an institution overshoots and enrolls too many applicants, they’ll have dorm, classroom, and other resource issues. If they enroll too few, they’ll have revenue issues. So predicting class
size and determining who and how many applicants to admit is extremely
important.
What are some common mistakes you see when approaching predictive modeling for your higher ed customers?
One mistake that I often see is when information is thrown
out as not useful to the models. Zip code is a good example. Zip
code looks like a five digit numeric variable, but you wouldn’t want to use it
as a numeric variable in a model. In some cases it can be used
categorically to help identify applicants’ origins, but its most useful purpose is for calculating a distance-from-campus variable. This is a variable that we see showing up as a predictor in many prospect/inquiry models,
enrollment models, alumni models, and even retention models. Another
example of a variable that is often overlooked is application date.
Application date often contains a ton of useful information if looked at
correctly. It can be used to calculate the number of days between when the
application was sent and the application deadline. This piece of
information can tell you a lot about an applicant’s intentions. A student
who gets their application in the day before the deadline probably has very
different intentions than a student who applies nine months before the
deadline. This variable ends up participating in many models.
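As an illustration, here is a minimal sketch of deriving both variables, assuming zip codes have already been joined to latitude/longitude centroids (all column names, dates, and campus coordinates here are hypothetical):
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'app_date': pd.to_datetime(['2012-04-30', '2011-09-01']),
    'zip_lat': [42.36, 40.71],
    'zip_lon': [-71.06, -74.01],
})
# Days between application and a hypothetical deadline
deadline = pd.Timestamp('2012-05-01')
df['days_before_deadline'] = (deadline - df['app_date']).dt.days
# Haversine distance from campus, in miles (campus coordinates assumed)
campus_lat, campus_lon = np.radians(44.05), np.radians(-71.13)
lat, lon = np.radians(df['zip_lat']), np.radians(df['zip_lon'])
a = (np.sin((lat - campus_lat) / 2) ** 2
     + np.cos(campus_lat) * np.cos(lat) * np.sin((lon - campus_lon) / 2) ** 2)
df['miles_from_campus'] = 2 * 3959 * np.arcsin(np.sqrt(a))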
To get our customers up to speed on best practices in
predictive modeling we’ve created resources like lists of recommended variables
for specific models and guides on how to create useful new variables from existing
data.
Labels:
analysis,
interview,
Mike Laracy,
predictive analytics,
predictive modeling,
ROI
Tuesday, January 8, 2013
Defining Rapid Insight
I recently had the opportunity to sit down with Mike Laracy,
President and CEO of Rapid Insight, to ask him a few questions about analytics
in higher education, predictive modeling, and Rapid Insight. I’ll be posting
the interview as a two part series here on the blog (with part two located here). The first part is the
story of Rapid Insight – how it started, what we do, and where we’re going –
enjoy!
Rapid Insight has been around since 2002. Can you tell us a bit of the story on how the company came to be?

I had been living in Boulder, Colorado when I developed the
concept of Rapid Insight. I spent a lot
of time thinking through the predictive modeling process and figuring out how
it could be automated and streamlined. I
sat on the concept for a couple of years before actually starting the company.
In 2002 I moved here to North Conway and decided to rent some office space to start developing the concept of Rapid Insight into an
actual software product. For the first
six months it was just me. I spent that
time writing the algorithms and developing a working prototype. I wasn’t a programmer and I knew that to turn
the software into a commercial application, I’d need more help. I hired a software developer who is still
with the company today as our lead engineer.
A year later we hired another developer.
In 2006 we hired our first salesperson, launched Rapid Insight
Analytics, and we’ve been growing ever since.
Do your products focus exclusively on predictive analytics?
Our products also focus on ad hoc analysis and
reporting. In 2008, we launched our
second product called Veera. Whereas
Rapid Insight Analytics automates and streamlines the process of predictive
modeling and analysis, Veera focuses on the data. Data is typically scattered between
databases, text files and spreadsheets, with no easy way to organize it and
piece it together for modeling and analysis.
Veera solves that problem. It’s a data-agnostic technology that allows access to any database and any file format
and makes it easy for people to integrate, cleanse, and organize their data for
modeling, reporting, or simply ad hoc analysis.
We initially developed this technology as a tool to organize
data for predictive modeling. We’re now
seeing enormous demand for the tool as a standalone technology as well. Colleges and universities use it for
reporting and ad hoc analysis. Companies
like Choice Hotels and Amgen use it for processing analytic datasets with data
coming from disparate sources. Healthcare
organizations are using it for reporting and performing ad hoc analyses on
their databases. Defense contractors are
using it for cyber security.
What makes your company different from others working in the higher ed space?
In higher ed there are consulting companies that provide predictive
modeling services. You send them your
data, and they build a model and send you back the model and a report. But the institution still has to do the prep
work to create the analytic file, which is 90% of the effort. This process is both expensive and time-consuming,
and the knowledge gained from the analysis isn’t always transferred back. By
bringing predictive modeling in-house, changes can be made on the fly without
having to send data anywhere and models can be changed and updated very
quickly, which is important because modeling is such an iterative process.
We provide schools with a means of doing this analysis and
building their own models. One
advantage is that the knowledge is always captured internally. But the biggest advantage is that institutions are able to ask questions of their data and answer them on the fly.
As far as other software products that are being used in
higher ed, we’re very different from tools like SAS or SPSS in that the users don’t
need to be programmers or statisticians to build models using our tools. I think if you asked our customers, you’d find that one of our biggest differentiators from these types
of products is our customer support. Our
analysts are available to help our clients with any questions as they build
models, analyze data, or create reports.
Whether the questions pertain to using our technology or about
interpreting the results, we are always available to help. We want to ensure that our customers grow
their own analytic sustainability.
...click here for Part Two, where Mike shares more about predictive modeling in higher education.
Monday, January 7, 2013
Thoughts from a Registrar
Dan Wilson, Registrar at Muskingum University, recently talked with us about some of the reports he's been working on, how he's using Veera, and his upcoming webinar.
CG - What types of reports are usually on your plate?
DW - Some of the reports I'll be talking about in the webinar include:
- historical registrations by date,
- historical majors by semester,
- number of students still needing to take specific general education courses,
- graduation persistence by major,
- retention rates by various factors, and
- DWF (drop/withdraw/fail) rates by course.
I have one report for each of these and I make minor adjustments to it each time a new question is asked.
CG - How has Veera helped with your reporting?
DW - It's helped me develop complex reports that would have taken me 4-6 hours each to get all the data, build, and run. Now I pull up a report and run it in about a minute. It's especially useful for complicated and repetitious reports.
All of the reports I've mentioned have been automated in Veera. Everything that I can, I automate. I anticipate that people might be asking for DWF rate for the first year students, or by division, or by course, or by phase of the moon. Veera is good at pulling that data together and allowing me to tweak it and make adjustments as needed. I usually choose to work with Veera when I think I'll see a lot of revisions, need to do some digging around, or can see similar questions being framed differently.
The year-over-year registrations by date report was one of the first I created using Veera. It's proven to be one of the most valuable to our administration in helping improve our retention rates. It literally takes two minutes to run - it actually takes longer to download the data file than it does to run the report in Veera.
CG - What do you hope attendees will take away from your webinar?
DW - I hope each person will find something that is a spark moment for them. Some people need exposure to Veera and to see what it can do. For users, I'm hoping to spark a brainstorm on how and why to use Veera - as a way to achieve what they need faster and more easily.
CG- Anything else you'd like to add?
DW - Anyone who knows what a small college registrar does will understand that I wear many, many hats. Though I'm not an institutional researcher, some of the work that I do is borderline IR. Typically I'm asked a question and need to get someone the answer quickly. That's what I use Veera for the most.
For more information or to register for Dan's upcoming webinar, Digging Deep into Data: How a Small University's Registrar Develops Complex and Repeatable Mission-Critical reports, click here.
Labels:
customer webinar,
customers,
Dan Wilson,
interview,
Muskingum University