Tuesday, November 13, 2012

Predictive Modeling Mantras

Whether you're new to predictive modeling or you dream in decile analyses, here are some things to keep in mind as you embark on your next modeling project:


Data preparation makes ALL the difference.
Simply put, if you use junk data to create a model, chances are that your model’s output will be junk too. Thus, it’s very important to clean up your data before building your predictive model. During the clean-up process, you’ll want to think about how to handle missing values and outliers, which new variables might be worth adding to your dataset, and how to make your data as error-free as possible.
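
As a rough illustration of two of those clean-up steps, here is a minimal Python sketch that fills missing values with the median of the observed values and caps outliers at Tukey's fences (1.5 × IQR beyond the quartiles). Median imputation and IQR capping are just two common choices, assumed here for illustration; the right treatment depends on your data, and the function names and example column are hypothetical.

```python
import statistics
from typing import List, Optional


def impute_missing(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def cap_outliers(values: List[float], k: float = 1.5) -> List[float]:
    """Clip values lying outside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Example: one missing value and one extreme value in a hypothetical column.
raw = [1.0, 2.0, None, 4.0, 100.0]
filled = impute_missing(raw)    # None -> median of [1, 2, 4, 100] = 3.0
cleaned = cap_outliers(filled)  # Q1 = 2, Q3 = 4, so values above 7 are capped
print(cleaned)                  # -> [1.0, 2.0, 3.0, 4.0, 7.0]
```

Capping keeps the record in the dataset while limiting how much a single extreme value can pull the model; dropping the record or investigating it as a data-entry error are equally valid choices.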

A complex model isn’t the same as a good model.
More often than not, the best model is a simple one. Although you can almost always find a new variable to add or a new way to slice your data, you want to avoid the trap of overfitting: building a model so closely tailored to the quirks of your development data that it performs poorly when scoring a new dataset. You want your model to be specific, but not so specific that you sacrifice that reliability.
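
To make overfitting concrete, the Python sketch below builds two models on the same simulated data: a one-variable linear regression and a deliberately over-specific "memorizer" (a 1-nearest-neighbour lookup that reproduces every training record). The data-generating rule, sample sizes, and seeds are arbitrary assumptions for illustration. The memorizer scores perfectly on the data it was built on, but it should do worse than the simple model on a fresh holdout sample.

```python
import random


def make_data(n, seed):
    """Simulated data (an assumption): y = 2x + 1 plus Gaussian noise with sd 1."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_linear(xs, ys):
    """Ordinary least squares with a single predictor; returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_memorizer(xs, ys):
    """1-nearest-neighbour: returns the y of the closest training x, so it fits training data exactly."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = make_data(200, seed=1)
holdout_x, holdout_y = make_data(200, seed=2)

results = {}
for name, fit in [("linear", fit_linear), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, holdout_x, holdout_y))
    print(f"{name:10s} train MSE={results[name][0]:.3f}  holdout MSE={results[name][1]:.3f}")
```

The memorizer's training error is exactly zero, yet its holdout error is roughly double the linear model's, because it has learned the noise in the development data along with the signal. Scoring a held-out sample like this is a simple way to catch overfitting before a model goes into use.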

A good model validates what you know while revealing what you don’t.
Don’t be surprised if some of your “common sense” variables outperform the more exotic ones. Although it’s always nice to uncover new variables and insights, building a predictive model can also confirm what you already know and boost your confidence in the rest of your data.

If a model looks perfect, it’s lying.
As exciting as a great model fit statistic can be, there is such a thing as too good to be true in model building. If your model looks unusually strong, double- and triple-check each of its variables to be sure they make sense. One of the most common causes of a suspiciously good model is an anachronistic variable: a variable that would be available only after your y-outcome was decided. Because that information won’t exist yet when you score new records, the model’s apparent accuracy won’t carry over.
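
One way to hunt for anachronistic variables is sketched below in Python, assuming you can record, for each candidate variable, the date it becomes known. It flags variables that only become available after the outcome is decided, and separately flags any variable whose correlation with the outcome is suspiciously high (the 0.9 threshold is an arbitrary assumption). All variable names, dates, and values are hypothetical.

```python
from datetime import date
from typing import Dict, List


def anachronistic_variables(available_on: Dict[str, date], outcome_decided: date) -> List[str]:
    """Variables that only become known after the outcome was decided."""
    return sorted(name for name, known in available_on.items() if known > outcome_decided)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def suspiciously_strong(columns: Dict[str, List[float]], outcome: List[float],
                        threshold: float = 0.9) -> List[str]:
    """Variables whose absolute correlation with the outcome exceeds the threshold."""
    return sorted(name for name, xs in columns.items() if abs(pearson(xs, outcome)) > threshold)


# Hypothetical enrollment example: the outcome (enrolled or not) is decided on May 1.
decided = date(2012, 5, 1)
available_on = {
    "campus_visits": date(2012, 3, 15),
    "aid_offer_amount": date(2012, 4, 1),
    "attended_orientation": date(2012, 6, 15),  # only known after the enrollment decision
}
enrolled = [0, 1, 0, 1, 1, 0]
columns = {
    "campus_visits": [1, 2, 1, 1, 3, 2],
    "attended_orientation": [0, 1, 0, 1, 1, 0],
}
print(anachronistic_variables(available_on, decided))  # -> ['attended_orientation']
print(suspiciously_strong(columns, enrolled))          # -> ['attended_orientation']
```

The date check catches leakage directly when you have the timing information; the correlation check is a cruder backstop that only tells you which variables deserve a closer look.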

Persistence is a virtue (because building models is an iterative process).
After you’ve taken a first pass at a model, maybe you’ll think of a related variable that would be predictive. Maybe you’ll take a second look at some of the relationships between variables and decide to bin or re-map some of your continuous or categorical variables. Maybe the results come out the opposite of what you expected, so you decide to tweak the way your dataset is set up. The point is that your first model will likely not be your final model. Be ready.
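
Binning and re-mapping are easy to script so they can be repeated on each pass. The Python sketch below uses hypothetical cut points, labels, and category groupings; in practice you would choose them by looking at how the outcome varies across values.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def bin_values(values: Sequence[float], cut_points: Sequence[float],
               labels: Sequence[str]) -> List[str]:
    """Assign each value to a bin. cut_points must be sorted ascending and
    len(labels) == len(cut_points) + 1; a value equal to a cut point goes to the higher bin."""
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return [labels[bisect_right(cut_points, v)] for v in values]


def remap(values: Sequence[str], mapping: Dict[str, str], default: str = "Other") -> List[str]:
    """Collapse categories using mapping; anything not in the mapping becomes `default`."""
    return [mapping.get(v, default) for v in values]


# Hypothetical example: distance from campus in miles, binned into three groups.
miles = [10, 50, 120, 500]
print(bin_values(miles, [50, 200], ["local", "regional", "distant"]))
# -> ['local', 'regional', 'regional', 'distant']

# Hypothetical example: collapsing state of residence into a region.
states = ["NH", "VT", "CA", "ME"]
print(remap(states, {"NH": "New England", "VT": "New England", "ME": "New England"}))
# -> ['New England', 'New England', 'Other', 'New England']
```

Keeping the cut points and mappings as plain data makes it simple to try a different grouping on the next iteration and rerun the model.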

Trust and verify.
The modeling process doesn’t end once you’ve built your model. After implementing your predictive model, you want to be sure that it keeps correctly predicting your y-variable over time. To do this, you’ll need to compare your model scores with actual results once they are available. If your model is correctly predicting the desired outcome, you can continue to use it (but keep validating it as time goes by); otherwise, you’ll need to take a few steps back to see where you can make improvements.
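
A decile analysis is one common way to make that comparison. The Python sketch below (the function name and data are hypothetical) ranks scored records from highest to lowest score, splits them into ten equal-size groups, and reports the average score alongside the actual outcome rate in each group. If the model is still working, actual rates should fall from the top decile to the bottom and, assuming the scores are predicted probabilities, stay reasonably close to the average scores.

```python
from typing import List, Sequence, Tuple


def decile_report(scores: Sequence[float], actuals: Sequence[int],
                  groups: int = 10) -> List[Tuple[int, int, float, float]]:
    """Return (group, count, mean score, actual rate) per score group, highest scores first."""
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    report = []
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # happens only when there are fewer records than groups
        mean_score = sum(s for s, _ in chunk) / len(chunk)
        actual_rate = sum(a for _, a in chunk) / len(chunk)
        report.append((g + 1, len(chunk), mean_score, actual_rate))
    return report


# Made-up scores matched to made-up outcomes (1 = outcome occurred).
scores = [i / 100 for i in range(100)]
actuals = [1 if i >= 50 else 0 for i in range(100)]
for group, count, mean_score, actual_rate in decile_report(scores, actuals):
    print(f"decile {group:2d}  n={count}  mean score={mean_score:.3f}  actual rate={actual_rate:.2f}")
```

Rerunning a report like this each time new outcomes arrive shows whether the model still separates high-likelihood records from low-likelihood ones; a flattening pattern across deciles is a signal to revisit the model.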


-Caitlin Garrett, Statistical Analyst at Rapid Insight