Data preparation makes ALL the difference.
Simply put, if you use junk data to create a model, chances are that your model’s output will be junk too. Thus, it’s very important to clean up your data before building your predictive model. During the data clean-up process, you’ll want to think about how to handle any missing values, whether new variables could be added to your dataset, how to treat outliers, and how to make your data as error-free as possible.
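To make that concrete, here’s a minimal sketch of these clean-up steps using pandas; the file name and the columns (income, debt, age) are hypothetical stand-ins for your own data.

import pandas as pd

# Hypothetical applicant file; the column names are illustrative only.
df = pd.read_csv("applicants.csv")

# Missing values: fill numeric gaps with the median, and keep a flag
# recording which rows were filled in.
df["income_missing"] = df["income"].isna()
df["income"] = df["income"].fillna(df["income"].median())

# Outliers: cap extreme values at the 1st and 99th percentiles.
low, high = df["income"].quantile([0.01, 0.99])
df["income"] = df["income"].clip(low, high)

# New variable: a simple derived ratio can add predictive power.
df["debt_to_income"] = df["debt"] / df["income"]

# Errors: drop exact duplicates and impossible values.
df = df.drop_duplicates()
df = df[df["age"].between(0, 120)]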
A complex model isn’t the same as a good model.
More often than not, the best model is a simple one. Although you can almost always find a new variable to add or a new way to slice your data, you want to avoid the trap of overfitting. You want your model to be specific, but not so specific that you sacrifice reliability when scoring a new dataset.
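One quick way to catch overfitting is to compare performance on the data the model was trained on against performance on held-out data. A minimal scikit-learn sketch, with synthetic data standing in for your own:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data stands in for your real X and y.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained model is free to memorize the training set.
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))
# A large gap between the two is the classic overfitting signal;
# simplifying the model (e.g. limiting max_depth) usually narrows it.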
A good model validates what you know while revealing what you don’t.
Don’t be surprised if some of your “common sense” variables
outperform the more exotic ones. Although it’s always nice to pick up on some
new variables and insights, building a predictive model can also boost your
confidence in the rest of your data.
If a model looks perfect, it’s lying.
As exciting as a great model fit statistic can be, there is such a thing as too good to be true when it comes to model building. If you build a particularly great model, you’ll want to double- and triple-check each of the variables in it to be sure they make sense. One of the most common culprits behind a too-good model is an anachronistic variable: a variable you would have available only after your y-outcome was decided.
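A simple sanity check along these lines is to score each variable on its own and take a hard look at any that are nearly perfect predictors by themselves. A sketch, assuming a mostly numeric dataset and a hypothetical “enrolled” outcome column:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("applicants.csv")   # hypothetical file
y = df["enrolled"]                   # hypothetical y-outcome column

# Score each numeric predictor on its own; a variable that is nearly
# perfect by itself deserves a hard look at *when* it gets recorded.
numeric = df.select_dtypes("number").columns.drop("enrolled", errors="ignore")
for col in numeric:
    X = df[[col]].fillna(0)
    auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                          scoring="roc_auc", cv=5).mean()
    if auc > 0.95:
        print(f"{col}: AUC {auc:.3f} -- too good to be true?")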
Persistence is a virtue (because building models is an iterative process).
After you’ve taken a first pass at a model, maybe you’ll think of a related variable that would be predictive. Maybe you’ll take a second look at some of the relationships between variables and decide to bin or re-map some of your continuous or categorical variables (a quick sketch of this follows below). Maybe the variables in your output behave the opposite of how you expected, so you decide to tweak the way your dataset is set up. The point here is that your first model will likely not be your final model. Be ready.
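For the binning and re-mapping step, a minimal pandas sketch; the file and column names are hypothetical:

import pandas as pd

df = pd.read_csv("applicants.csv")   # hypothetical file

# Bin a continuous variable into ordered categories.
df["age_band"] = pd.cut(df["age"], bins=[0, 18, 25, 45, 65, 120],
                        labels=["minor", "18-24", "25-44", "45-64", "65+"])

# Re-map a sparse categorical variable into coarser groups.
region_map = {"ME": "Northeast", "NH": "Northeast", "VT": "Northeast",
              "CA": "West", "OR": "West", "WA": "West"}
df["region"] = df["state"].map(region_map).fillna("Other")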
Trust and verify.
The modeling process doesn’t end once you finish building your model. After implementing your predictive model, you want to be sure that it keeps correctly predicting your y-variable over time. To do this, you’ll need to compare your model scores with actual results once they are available. If your model is correctly predicting the desired outcome, you can continue to use it (though you should keep validating as time goes by); otherwise, you’ll need to take a few steps back and see where you can make improvements.
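One way to keep that comparison honest is to track a performance metric over time as actual outcomes arrive. A minimal sketch, assuming a binary outcome and hypothetical score and outcome files:

import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical files: scores saved at prediction time, outcomes once known.
scores = pd.read_csv("model_scores.csv")      # id, score, scored_month
actuals = pd.read_csv("actual_outcomes.csv")  # id, outcome

merged = scores.merge(actuals, on="id")

# Track discrimination month by month; a drifting AUC means the model
# needs a second look before you keep relying on it.
for month, grp in merged.groupby("scored_month"):
    if grp["outcome"].nunique() == 2:  # AUC needs both outcomes present
        print(month, "AUC:", round(roc_auc_score(grp["outcome"], grp["score"]), 3))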
-Caitlin Garrett, Statistical Analyst at Rapid Insight