Friday, August 30, 2013

Here's to the Skeptics: Addressing Predictive Modeling Misconceptions

Photo credit: Jonny Goldstein
As a full-time analytics professional, I have a hard time conceiving of people who have not fully embraced the power of predictive analytics, but I know they’re out there and I think it’s important to address their concerns. In doing so, I’m not here to argue that predictive analytics is a perfect fit for every organization. Predictive analytics requires investment: in your data, in infrastructure and technology, and of your time. It’s also an investment in your company, your internal knowledge base, and your future. I’m here to argue that the investment is worth it. 

To do so, I’ve presented a few clarifications to address predictive modeling concerns that I’ve heard from skeptics. If you have anything to add, or if there are any big concerns I’ve missed, let me know in the comments.

You don’t need to be a PhD statistician to build predictive models
A working knowledge of statistics will help you better interpret the results of predictive models, but you don’t need ten years’ experience or a doctoral degree to glean insight from a model or use its output. There are software packages out there with diagnostics that can help you understand which variables are important, which are not, and why. Knowing your data is just as important as statistical knowledge, and both will serve you well in the long run.

A predictive model shouldn’t be a black box
There are plenty of companies and consultants whose predictive models could fall into the “black box” category. The model building process, in this case, involves sending your data to an outside party who analyzes it and returns a series of scores. On the surface, this may not seem like a bad thing, but once you’ve built your first model, you’ll understand why this is not nearly as valuable as doing it yourself. While the output scores are important, you also want to know which variables were used and how the model handled any missing or outlying values, and to glean insight beyond a single set of scores so that you can change or monitor specific behaviors going forward.

Even if you know your data, modeling can help
A finished predictive model will do one of two things: confirm what you’ve always believed, or bring new insights to light. In our office, we refer to this idea as “turn or confirm” – a model will either turn or confirm the things you’ve thought to be true. Most of the time, models will do both. This allows you to both validate any anecdotal evidence you might have (or realize that correlations might not be as strong as you thought) and take a look at new variables or connections that you may not have picked up on before. 

Predictive models can be implemented quickly
I've heard some horror stories about a model taking months, or even years, to implement. If this is the case at your institution, you're doing it wrong. At this point, predictive modeling software has become incredibly efficient, usually able to turn out models within seconds or minutes. The bulk of the time spent on a model typically goes to data clean-up, and how much is needed will vary from company to company. In any case, this is time well spent. Clean data is just as good for reporting, dashboarding, and visualizing as it is for predictive modeling.

Predictive models enhance human judgment rather than replace it
If models were meant to replace human judgment, I too would be uncomfortable with and suspicious of the idea. However, in nearly every case, the aim of predictive modeling is to enhance and expand human expertise, allowing us (the end users) to be better-informed and more data-driven in our decision making.

-Caitlin Garrett, Statistical Analyst at Rapid Insight


  1. Amen to most of this, except the part about taking minutes, unless by implementation you are referring just to generating the decisions from the model. Today I think that model governance kills most of the speed. After the market debacles of 2005-2008 (all induced by stupid modeling in my opinion) everyone is afraid to just follow the logic of modeling.

  2. I agree that model governance can take some time - and how much probably varies widely from company to company. I was thinking of that as being a little different than implementation in that it has more to do with company structure than predictive modeling as a whole, but your point is well taken.

    Also, loving the phrase 'stupid modeling'. I've seen some of this and it's good to call it what it is. Thanks for your comments.

  3. "You don’t need to be a PhD statistician to build predictive models...A working knowledge of statistics will help...".

    Granted, you don't need a PhD, but you'll need more than a "working knowledge of statistics." Yes, the software has come a very long way - and that is a blessing and a curse. Many who "do statistics" do it very badly: they misapply good software because they "know just enough to be dangerous" and often get it wrong.

    Examples of this abound in the various pertinent LinkedIn Groups - especially those who claim Six Sigma Black Belts...

    1. Frederick Lord, Ph.D., said that it is not a matter of computation, but of interpretation. No particular degree is necessary, but awareness of what the results of a predictive model mean is essential. Do not rely on the software: it will spew out numbers, but if you do not know what they mean, you will reach erroneous conclusions. The more pertinent issue is the research design, not the particular results of a statistical test.

  4. You make a good point that sometimes you can know just enough to be dangerous. While I wouldn't encourage someone to completely fly by the seat of their pants while building a model, I think it's important that people realize that they don't need to be intimidated by the ideas that come with modeling. A working knowledge of statistics is okay if you have people supporting you who know how to interpret results - which is the way our company works. I guess it just depends on what your support system looks like if you don't actually have a good working knowledge.

    1. Caitlin
      I agree. I believe it's an iterative and learning process for ALL. As long as you know the limits of the model(s) you build (and your own limits as well) and you are keen to learn why and how you are predicting what you are predicting, you will do well. Having all stakeholders on board is very important. No one should go in with the expectation that this model is going to change the course of the company. It's just a step closer to making the right decisions, and several of those steps are needed before you can start reaping the benefits of your investments. The most important takeaway from my experience in the last 5 years has been the thought process and the discussions it generates.
      Good Write Up!!

  5. Outside the professional realm, I like to tell people that the predictive modeling projects I do are sort-of like what Charlie does on the TV show Numbers. When I do this, people seem to perk up and get a clue as to what I do. Of course, I'm not a Harvard faculty member who calculates the whole thing by hand. Thanks to Analytics I don't have to be ;)

    But professionally, predictive modeling is still a new thing here at IWU, and I'm currently in the midst of marketing the value to department heads. Thanks, Caitlin; this will be useful for me.

  6. Great post. As one of the "non PhDs" out there attempting to model, I find myself in a constant state of "adjustment". I do spend a majority of my time in data cleaning mode, but I feel that having an open mind and open communication with others smarter than you is another point to make. I do wholeheartedly agree that the insights gleaned from simply building out some of these processes and models have been, at least to me and my group, of immense value.

    Thank you for the article.

  7. Nice job Caitlin! I particularly like your warnings about sending data out to a third party who analyzes and returns a series of scores. As important (perhaps more important) is knowledge and understanding of your data. Modeling without context of data quality, what is clean vs. dirty, derived vs. raw, captured at different points in time, extracted from various systems, manipulated by other processes, etc. will lead to a less predictive result. The modeling team must include data experts who can provide the data quality perspective, including what data to exclude from the model. This is what differentiates good models from great models.

  8. @Tony, I'm very glad to hear that this article will be useful for you. Also, love the Numbers comparison.

    @Jonathon, thank you for reading! I agree that there's a lot to be learned just from the work that goes into a dataset before predictive modeling starts - all of which is very valuable.

    @Gregory, thank you! Your comments summarize my feelings on third party vendors almost exactly. Thanks for sharing.

  9. @ Caitlin

    Yes, I can vouch that skeptics abound: educated, experienced pros, executives, and management.

    It is up to us to sell them. Some time back we could not get some Discriminant Analysis adopted: scoring for good / bad risks based on experience and factors before and after certain undesirable events, and the magnitude given such scenarios.

    The results were very good and were followed through with a pilot validation study. However, for the management and field staff, beyond certain factors the model's prediction was marginal, and those are the factors they already know (preponderance of evidence). So it could not be field implemented, even after major discussion about quantification: even if it is a gut feel, how much and how often those factors could still be used. We did not win it, so it was a paper study, and the results were still in the computers as models.

    Another perspective

  10. Nice article. Very well written and related.

    But I believe that if you had added a couple of examples related to predictive modeling, it would have made the picture clearer. Examples and instances actually help people learn better, and they generally support the concepts. Strictly my opinion!

    On the other hand, I believe predictive modeling can sometimes be a black box; one example is a neural network.
    We cannot tell what's happening within a neural network, but what we are interested in here is the final output.

  11. Well, if the models are created for getting a better insight - this is great and frequently useful. As for "predictive" part of it, I doubt that many organizations are systematically validating, year after year, the quality of past predictions, and I suspect that those who tried to do this, typically find themselves disappointed. This is why there are skeptics.

    And I am not talking about "stupid modeling", and not blaming software or the modelers. The culprit is that too many systems / processes of interest don't exist in isolation from their environment (i.e. all parts of the world that are not included in the model), and/or may be so complex that commonly used modeling software cannot adequately capture their dynamics. In these cases, you'd have to build custom models "by hand", and for this, indeed, you need more than just "PhD statisticians": you need a whole team of subject matter experts, software developers, etc. This is time-consuming and costly, and does not guarantee success.

    I may sound like a skeptic, but I was in the business of building and using computer models for most of my life. I just think that the examples of successful predictive modeling outside the realm of physics and engineering are too limited to convince skeptics. If you know such examples, I'd like to hear from you.

  12. @ Rational_Observer, always a bummer to see a model not get implemented if it could be improving an existing system. I think this speaks to the "turn or confirm" quality that models can have - sometimes they confirm your existing assumptions rather than provide new ones. C'est la vie.

    @ Ankit, I agree that examples are very helpful for illustrating points. There are a lot of case studies located here about how predictive modeling has helped various institutions:

    @ Mike, I couldn't agree more about the importance of validating your models. It's completely possible for a model to go stale and actually really easy to do a validation using something like a decile analysis. There's really no reason a model shouldn't get validated.

    As for examples outside of the realm of physics and engineering, we have plenty. Colleges, hospitals, non-profits, direct marketing companies, and businesses are all using these techniques to gain a competitive advantage. If you have any more specific questions about how businesses are using predictive modeling, feel free to email and I'm happy to discuss. There are a lot of success stories out there.
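    As a rough illustration of the decile analysis mentioned in the reply to Mike above, here is a minimal Python sketch. It is not taken from any particular software package: the function names `decile_analysis` and `make_synthetic_holdout`, and the synthetic data itself, are assumptions made up for this example. The idea is to rank held-out records by model score, split them into ten equal-sized groups, and compare the observed outcome rate in each group with the overall rate.

```python
import random


def decile_analysis(scores, outcomes):
    """Rank records by model score, split them into ten equal-sized groups,
    and report the observed response rate and lift for each group."""
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must be the same length")
    n = len(scores)
    if n < 10:
        raise ValueError("need at least 10 records to form deciles")

    # Highest scores first, so decile 1 holds the records the model rates highest.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    overall_rate = sum(outcomes) / n

    rows = []
    for d in range(10):
        group = ranked[d * n // 10:(d + 1) * n // 10]
        rate = sum(outcome for _, outcome in group) / len(group)
        # Lift compares this decile's rate with the overall rate.
        lift = rate / overall_rate if overall_rate > 0 else float("nan")
        rows.append({"decile": d + 1, "count": len(group),
                     "response_rate": rate, "lift": lift})
    return rows


def make_synthetic_holdout(n=1000, seed=0):
    """Hypothetical stand-in for real holdout data: each outcome is 1 with
    probability equal to its score, so the scores are informative by design."""
    rng = random.Random(seed)
    scores = [rng.random() for _ in range(n)]
    outcomes = [1 if rng.random() < s else 0 for s in scores]
    return scores, outcomes


if __name__ == "__main__":
    scores, outcomes = make_synthetic_holdout()
    print("decile  count  rate   lift")
    for row in decile_analysis(scores, outcomes):
        print(f"{row['decile']:>6}  {row['count']:>5}  "
              f"{row['response_rate']:.3f}  {row['lift']:.2f}")
```

    When the model is still working, the response rate should fall fairly steadily from decile 1 to decile 10. A flat or jumbled profile on recent data is one sign that the model has gone stale.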

  13. Can't help agreeing with you, Caitlin, as a user and promoter of simulation software.

  14. You do need a PhD to know when a predictive model (e.g. statistical one) will work and when it won't. Anyone who claims any particular package/approach will work is fooling themselves.

    1. Sorry, that last sentence should read "...will always work is fooling themselves"

  15. Interesting and educational article. Even with more tools available to build predictive models (PM), we are still struggling to achieve clean, good data. The quality of the PM depends heavily on the quality of the data, and I see that a good 50+% of the time is spent on data cleansing and data preparation.

  16. Caitlin, it is not only business that utilizes (and struggles with!) predictive models; I teach a class in the construction and usage of attrition models (usually referred to as 'combat models'; e.g., the Lanchester Equations) to support military analysis, training, and wargaming. For them, the issue is often one of context: if I use a wargame to examine an issue, or even a deterministic model, I'm still not certain where in the range of all possible outcomes (generated by either a Stochastic/Monte Carlo model or Design of Experiments (DoE) run) that single outcome might fall. Part of the problem for the military, of course, is the ability to quantify the effectiveness of a system, or procedure, or organization where there is no existing data as whatever is being looked at either does not exist, or has not been used in the past. But we still get asked the questions: "How many casualties should we expect?" "Will this new ____ help us, or have no impact, or could it possibly hurt us?" And then, of course, there are the non-physics based side effects: what will be the impact politically? Socially? etc.
    Anyway, interesting topic.
