Rapid Insight: Data Analytics: July 2013

Tuesday, July 30, 2013

Why Nonprofits Should Be Building Predictive Models

Last fall, the Whitney Museum of American Art decided to take a different approach when deciding which of their prospective donors to mail. They built their first in-house predictive model from the ground up, and felt ready to use it. They shifted their focus away from some of their prospects who "made sense" but had never given, and used the model to inform a large part of their mailing list. Within the first six months of modeling, they received a $10k donation from a donor they would not have mailed using their previous methodology.

...And they aren't the only ones. More and more nonprofits are turning to predictive modeling to drive their fundraising. For a more in-depth look at the 'hows' and 'whys', I sat down with a man who founded his own company to provide software so that nonprofits and for-profits alike could start building their own predictive models in-house. He also happens to be my boss and one of the smartest people I know - Mike Laracy:

Why would a nonprofit use predictive modeling? How can it drive fundraising?

The quest for any organization, whether for-profit or nonprofit, is to figure out how to achieve its goals and to do so in the most efficient and cost-effective manner possible. Predictive modeling allows an organization to make better decisions and become more efficient with its use of what are often limited resources. By using analytics, an organization can better determine whom to contact, how often to contact them, how much to ask for, and how best to achieve its desired fundraising results.

Although driven by very different motivations, the relationship between a nonprofit and its donors is very similar to the relationship between a for-profit company and its customers. Customers choose whether to buy a product or not buy a product. They can become loyal customers or non-loyal customers. They can buy a lot or they can buy very little. It is much the same story for nonprofits and their donors. Donors can be loyal or not loyal. A prospect can choose to be a donor or not be a donor. They can give large gifts or small gifts. With accurate data and a modeling process that is easy to implement, a nonprofit can begin to model a donor’s behavior using the exact same methodologies that are used to model a customer’s behavior.

What kinds of resources are needed to start building predictive models in-house?

Without quality data, predictive modeling isn’t possible. So let’s start with that. There needs to be a system in place that is capturing an organization’s historical data. Almost every organization is already capturing their data, so that’s usually not a problem. The data doesn’t necessarily need to be organized in a data warehouse. In fact, the data needs to be available in its raw form, so sometimes having data pre-aggregated in a warehouse can be a disadvantage. What’s important is that the data is accessible.

From a staffing perspective, you will need a person or people to collect information on the data, build the models, communicate the results and make sure the models are being used. There needs to be someone who is making sure the right information is being collected and the right information is being communicated. This can be a single person, but that person needs to make sure that others in the organization are on board with an understanding of why the models are being built and how they will be used.

What are good first steps for an institution looking to get into predictive modeling?

Like any new initiative, it’s vital to the success of your predictive modeling efforts that there is universal buy-in across the organization. If there isn’t buy-in, the models won’t be utilized. To get buy-in, start small. Go for the early win by building and implementing a single model. Make sure others in the organization have an understanding of what the model will do, how it will be utilized, and most importantly, how the model will benefit the organization. Once you get that first win, the interest and buy-in will usually spread quickly across the organization. As you share the results of those first few successes, begin to identify who the champions for this initiative will be. Work with them to help them communicate the success of the project organization-wide.

In your experience, how should an institution decide who should build the predictive models?

Ideally, you want someone who has an understanding of the data. If you don’t already have someone with that knowledge, you want a person who is willing to learn the data. Some understanding of statistics is a plus, but with current analytic software technology, there is no longer a need to rely on someone with programming skills or a PhD in statistics to be your data expert. The people you want to dedicate as resources for predictive modeling should be creative problem solvers who are willing to learn.

What modeling challenges might be unique to different types of nonprofits?

There are definitely different needs and different challenges depending on what type of fundraising entity you are. A college advancement office, for example, has an advantage in that it has information on the students who graduated from it. Age comes up as a predictor in many giving models. Whereas an organization like a museum might not have good info on the age of all of its members and donors, a college or university will at the very least have each student’s year of graduation, which is a great proxy for age. A college will also have great information like the major each student graduated with and whether or not the current year is a major reunion year. While a non-higher-ed entity won’t have this type of information, it will have information that a college advancement office won’t have. A museum will have info on its members, how many times someone has visited the museum, and a lot of other great information for modeling that a college won’t have.

Another challenge that people may encounter is how spread out their data is. Some organizations have more sophisticated computer systems with everything centralized and others may have the information spread across multiple spreadsheets, databases and even outside sources. As you determine what your data needs to look like, keep in mind that you will need to pull it together and do cleanup before you can begin to model with it. This was actually one of the reasons we originally created our Veera product. People were looking for an easier way to clean up and merge their data before they created their models.

Are there any common mistakes to avoid when gearing up to build a model?

I think the biggest mistake to avoid is building a model without buy-in from the rest of the organization. Another mistake is building a model without an implementation/utilization plan. Building and scoring a model is great, but by itself the model doesn’t do anything for you. Before building the model you should have a plan for how you are going to use the model. For example, if you are a nonprofit and you build a model to predict each donor’s probability of giving to the annual fund, you need to utilize the model in your annual fund outreach. You will need a plan to mail/call the top X% of your donors with the highest probability of giving, or you should have a plan to not mail donors that are below some probability threshold. Or perhaps you only want to mail to donors who are likely to give at least a $500 gift. There are many ways that these models can be used, but the key is that they have to be used.

Once you begin to use them, you can also begin the process of refining and measuring the effectiveness of your models. Then you can refine them to make them even better.
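As a rough illustration of the two utilization plans mentioned above (mailing the top X% of donors, or skipping donors below a probability threshold), here is a minimal Python sketch. The donor IDs, the scores, and the function names `top_percent` and `above_threshold` are all hypothetical; in practice the scores would come from whatever model you have built and scored.

```python
import math

def top_percent(scores, percent):
    """Return donor IDs in the top `percent` of predicted giving probability."""
    # Rank donors from highest to lowest predicted probability.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Always keep at least one donor; round the cutoff up.
    keep = max(1, math.ceil(len(ranked) * percent / 100))
    return ranked[:keep]

def above_threshold(scores, threshold):
    """Return donor IDs whose predicted probability meets the threshold."""
    return [donor for donor, p in scores.items() if p >= threshold]

# Hypothetical model output: donor ID -> probability of giving to the annual fund.
scores = {"D001": 0.82, "D002": 0.10, "D003": 0.47, "D004": 0.65, "D005": 0.05}

mail_top = top_percent(scores, 40)             # plan A: mail the top 40%
mail_thresh = above_threshold(scores, 0.40)    # plan B: skip anyone below 0.40
```

Either list can then be handed to the mail house or call center. Which rule you choose matters less than deciding on one before the model is built.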

What kinds of resources/learning opportunities are out there for those looking to get started with predictive modeling?

In the fundraising world, APRA and the Data Analytics Symposium have a lot of extremely useful sessions. I’d also recommend Prospect DMM, which is a listserv where a lot of really smart people discuss modeling topics. We (Rapid Insight) put on a predictive modeling class not too long ago with Brown University and Chuck McClenon from the University of Texas – Austin. Classes like those are a great place to get started and we’re thinking about doing one again soon.

What strategies can you recommend so that a customer gets the most mileage possible out of their predictive modeling efforts?

To borrow a phrase, I’d say reduce, reuse, recycle.

Once you’ve set up a process for organizing, cleansing and analyzing your data for one model, you can use that same process for all of your models. In fact, you can even use that same process for scoring and testing all of your models. There’s no reason to reinvent the wheel each time.

Another important strategy is to make sure you set up a system for knowledge capture. Modeling is an iterative process; you don’t just build one and you’re done. You can learn a tremendous amount as you’re building models. A lot of that knowledge is actually knowledge about your data. That knowledge will accumulate very quickly over time and will make you smarter and smarter as an organization. This is one of the biggest advantages to bringing predictive modeling in-house: if you are not doing predictive modeling yourself, you run the risk of that knowledge escaping from your organization. Once it escapes, you miss out on an opportunity to grow your organization’s analytic intelligence.

Remember the old proverb about giving a man a fish and feeding him for a day versus a lifetime? The same thing is true with predictive modeling. If you give an organization a model, you’ve made them smart for a day. When you give them the tools to build their own models, they become smarter and more competitive for a lifetime.

**
Besides being the Founder and CEO of Rapid Insight, Mike Laracy is a devoted Birkenstock fan, recently ran up Mount Washington, has an eclectic taste in music, loves talking about predictive modeling, is a sap for his two kids, and has pretty much always been a nerd. For those of you attending APRA, he'll be giving a presentation - "Preparing Your Data for Modeling" - on Wednesday, August 7th at 1:30 pm.

Friday, July 26, 2013

Predicting Retention for Online Students: Where to Start

With the rise of enrollment in online programs and MOOCs, we’re seeing more and more students forgo traditional classroom experiences in favor of more flexible online programs. With this shift comes a whole new set of guidelines for enrollment management, financial aid, and retention programs. Retention, in particular, has seen a significant downward trend as learning moves from in-person to online classrooms.

My interest lies in figuring out what variables might be worth including in an analysis attempting to predict online student retention. I did a bit of research and was hoping to find a list of variables online that had worked in the past but couldn’t find any comprehensive resource, so I’ve started to build my own. In the sections below, I’ve listed the type of information that I think would be worth analyzing broken out into four separate categories. Some of these are variables in and of themselves, and some can be broken down different ways; for example, “age” can be used by itself, but creating a “non-traditional age” flag is useful as well. Realistically, not all schools will have all of this information, so this list is meant to be a good starting point of what to shoot for when collecting data.

Also, if you have any variables to add (and I’m sure there are some I’ve missed), I’d love to hear about them in the comments.
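To make the idea of breaking a variable down different ways concrete, here is a small Python sketch that keeps the raw value and adds yes/no flags alongside it. The field names, the function `add_flags`, and the age cutoff of 25 are assumptions chosen for illustration; each school would pick its own definition of non-traditional age.

```python
def add_flags(student, nontraditional_age=25):
    """Return a copy of the student record with derived 0/1 flags added."""
    out = dict(student)  # leave the original record untouched
    # Keep raw age, and also flag students at or above the assumed cutoff.
    out["non_traditional_age"] = int(student["age"] >= nontraditional_age)
    # A count of previous online courses also yields a first-timer flag.
    out["first_time_online"] = int(student["previous_online_courses"] == 0)
    return out

# Hypothetical student records.
students = [
    {"id": "S1", "age": 19, "previous_online_courses": 0},
    {"id": "S2", "age": 34, "previous_online_courses": 3},
]
flagged = [add_flags(s) for s in students]
```

Both the raw variable and the flag can then be offered to the model, letting the modeling process show which form is more predictive.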

Student Demographic Information

Socioeconomic status / financial aid information (FAFSA info, Pell eligibility, any scholarship or award info)
Ethnicity (minority status)
Gender
Home state
Distance from physical campus (if applicable)
Age; traditional or non-traditional?
Military background?
Have children?
Currently employed full-time?
First generation college student?
Legacy student? (Did a parent/grandparent/sibling attend?)

Student Online Learning History

Registered for classes online or in person?
How many days did they register before the start of the term?
Ever attended a class on-campus?
Do they plan to attend both online and on-campus classes?
Did they attend any type of orientation?
Number of previous online courses taken (first-time online learner?)

Student Academic History

GPA
SAT/ACT scores
Degree hours completed
Degree hours attempted
Taking developmental courses?
Transfer student?
Degree program / major
Program level (Associate, Bachelors, Masters, etc.)
Number of program or major changes (if applicable)
Any previous degrees?

Course- and Program- Related

Amount of text vs. interactive content
Lessons with immediate feedback?
Any peer-to-peer forum for interaction?
Lessons in real time or recorded?
Amount of teacher interaction with students (chat, email exchange, turn-around time on assignments)

Closing notes:

Getting course-related data might be difficult, but the variables I listed above come from studies on how to improve online courses, which identify them as areas to focus on; my thinking is that the more engaged a student is, both with peers and instructors, the better their chances of online success are. If you have the data available, it would be worth trying to incorporate it into your model dataset to see whether or not it is predictive.

Rather than using retention as a y-variable when building these models, we typically create an attrition variable (exactly the opposite of retention) and use that as our y instead. This way, we're getting more directly at the characteristics of a student who is likely to leave rather than stay.
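For a 0/1 retention indicator, flipping to attrition is a one-line transformation. The sketch below uses made-up values and hypothetical variable names:

```python
# Hypothetical retention flags: 1 = student returned, 0 = student left.
retained = [1, 0, 1, 1, 0]

# Attrition is the exact opposite: 1 = student left, 0 = student returned.
attrited = [1 - r for r in retained]

# Share of students who left, useful as a baseline before modeling.
attrition_rate = sum(attrited) / len(attrited)
```

With `attrited` as the y-variable, the model's scores read directly as the probability of leaving.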

Typically when building attrition models, I create separate models for freshmen and upperclassmen. I’d suggest doing that here as well, since previous online coursework will probably be a good indicator of future online coursework. In that case, you’d want to take out many of the variables listed above when modeling freshman retention.
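One way to set up the two modeling datasets is sketched below in Python. The field names, the `class_level` values, and the choice of which history variables to drop for freshmen are assumptions for illustration; the actual list would depend on what your institution collects.

```python
# Hypothetical history variables that a brand-new student will not have yet.
HISTORY_FIELDS = {"gpa", "degree_hours_completed", "previous_online_courses"}

def split_cohorts(records):
    """Split records into (freshmen, upperclassmen) modeling datasets."""
    freshmen, upperclassmen = [], []
    for rec in records:
        if rec["class_level"] == "freshman":
            # Drop variables that only exist after some coursework.
            freshmen.append({k: v for k, v in rec.items() if k not in HISTORY_FIELDS})
        else:
            upperclassmen.append(dict(rec))
    return freshmen, upperclassmen

# Hypothetical records; `attrited` is the y-variable (1 = left).
records = [
    {"id": "S1", "class_level": "freshman", "age": 19, "gpa": None,
     "degree_hours_completed": 0, "previous_online_courses": 0, "attrited": 1},
    {"id": "S2", "class_level": "junior", "age": 34, "gpa": 3.4,
     "degree_hours_completed": 62, "previous_online_courses": 3, "attrited": 0},
]
freshmen, upperclassmen = split_cohorts(records)
```

Each dataset then gets its own model, so the upperclassman model can lean on coursework history while the freshman model relies on demographic and registration variables.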

Finally, it’s important to keep in mind that student success has different meanings for different institutions. You could be basing success on the number of credits completed, transitions from semester to semester, or a particular GPA cutoff, among other indicators. When building these different types of student success models, you will probably need to tailor some of these variables to fit the model you're building.

-Caitlin Garrett is a Statistical Analyst at Rapid Insight

Tuesday, July 16, 2013

Playlists for Analysis

At our recent User Conference, I had a really interesting conversation with some of our customers about listening to music at work, which got me thinking about the types of music that people listen to in the office. I know that different music works for different people, but I also know from personal experience that different music works for different situations. I listen to different music when I'm doing things like answering emails (or writing blog entries) than I do when I'm in the midst of an analysis.

Depending on your office, protocol for listening to music may be different, but in our office, it’s safe to say that the analysts are generally working with one ear to their music and one to the general office sounds. So my question became: "When you're working hard on an analysis, what’s coming from the headphones?" I asked each of the analysts in our office to come up with a playlist that reflects the type of music they generally listen to when they want to get down to business. Here’s what our office is listening to:

Mike Laracy, Founder, CEO, and Data Geek:

"Within the calmness of these songs, there's a rhythmic intensity that I find helpful for thinking and analyzing (and occasionally for napping). But the songs in my selection also have bits and pieces that are extremely 'rock-out-to-able'. Case in point, Beethoven's 9th (4th movement). Don't be afraid to blast it!!"

Jeff Fleischer, Director of Client Operations:

"I’m a soundtrack guy. I find vocals distract me if I’m trying to concentrate, so I stick with instrumentals. Here are some of the things I listen to." [Note: Some of Jeff's tracks weren't on Spotify, like the soundtracks to the Flower and Journey video games.]

Caitlin Garrett, Statistical Analyst:

"This playlist is a pretty balanced representation of the music I listen to when I'm knee-deep in analysis mode. Most of these songs are pretty upbeat, but there are a few mellow ones thrown in (mostly Poolside tracks). The single thing I need in a playlist is a steady beat, which you'll find throughout this list. Bands like Ratatat and Javelin get a lot of airtime on here because I like their genre of instrumental. I only took a handful of songs from each of them but their full albums make good standalone playlists as well."

Jon MacMillan, Data Analyst:

"This playlist is all over the place, but that's typically how I am when I really get down to work. The only prerequisite for a song to make my playlist is that it maintains an upbeat tempo and catchy beat. This includes most notably Ratatat, Explosions in the Sky, and a little Daft Punk sprinkled in. As the title ['Forget the Words'] suggests, forget the words and just listen to the music. The first track, All My Friends by LCD Soundsystem, is one of my favorite songs. I can't tell you how many times I've listened to this song and yet still don't know the lyrics, yet I can't help but get excited when I hear that piano riff."

If listening to music at work isn’t your thing, there’s been some research which shows that ambient sounds can increase creativity. If working at a coffee shop isn’t an option, Coffitivity has you covered. Their website provides the same ambient noises that you’d hear at your local coffee shop without the distractions.

We'd love to know: what's on your at-work playlist?

-Caitlin Garrett is a Statistical Analyst at Rapid Insight

Thursday, July 11, 2013

#RIUC13

For those of you who weren’t able to attend the 2013 Rapid Insight User Conference, we set a new record for most attendees and largest number of customer presentations. With two full days of dual track programming, the presenters covered a lot of ground. While we wait for some of the video recordings of customer presentations to be formatted, I thought it would be good to do a quick recap here.

Mike Laracy, Data Geek (at right)

The conference opened with a keynote from our Founder and CEO, Mike Laracy, who talked a bit about the future of predictive analytics. With a mass public education on the value of analytics (from people like Nate Silver and Billy Beane, with a little help from Brad Pitt), as well as significant advances in data storage and processing power, a stronger need for predictive analytics is emerging. The market is shifting towards the view that more data access is better than restricted access, and that given the right tools along with access, smart people – data scientists – can turn raw data into actionable information. Given these changes, the data scientist – that’s you – will be in increasingly high demand over the next decade and beyond, as will predictive analytics.

The user presentations covered lots of different topics, and we’ve made all of their slide decks available here; I’d highly recommend checking them out. In addition to what’s there, I’d also recommend checking out some of the interviews we’ve done with customers on building campaign pyramids and using predictive modeling to drive fundraising efforts. The RI staff team also gave a few presentations, including topics like Tips and Tricks in Veera, Techniques for Improving Your Predictive Models, and An Introduction to Reporting and Dashboarding with Veera.

Another thing worth mentioning is that we announced our partnership with Tableau to provide a complete solution for both predictive modeling and visualization. Now users can use Veera to clean up their data, Analytics to build their predictive models, and Tableau’s visualizations to turbocharge their presentations. For more information, check out our partner page.

I am now a scientist according to @RapidInsightInc. I feel super smart! #RIUC13
— Dustin Mayfield (@dlm0078) June 27, 2013

My favorite part of the User Conference has always been talking to customers about the cool data projects that they’ve been tackling, and this year was no different. Kudos to our users for being so creative and smart with the ways they use our software. We also owe a big thanks to the folks at Yale for hosting us, and to all who were able to attend. Here’s to the best User Conference so far and to making next year’s even better!