When I went to my first APRA Data Analytics Symposium in
2010, the use of analytics in support of philanthropic fundraising was a
novelty. “Analysis”, for most
organizations, consisted of descriptive statistics in Excel. A few pioneers had built regression models,
and the Symposium faculty pretty much consisted of those who could explain the
differences between Ordinary Linear and Logistic Regression.
What a difference three years has made! At this year’s Symposium in Baltimore we
considered keyword analysis, hierarchical linear modeling, visualization, and
the use of financial industry formulae for portfolio optimization. We have progressed beyond regression and now
have a critical mass of practitioners throwing ideas at each other. And at many of our institutions we are also
accumulating the critical mass of data needed to support serious mining and to try these
new approaches.
Alan Schwartz, formerly with ESPN and more recently the New
York Times, gave the keynote address.
Alan had written a series of articles for the Times, over several years,
examining the incidence of concussions among NFL players, and their long-term
effects, including early-onset dementia.
One retired player with dementia at age 50 does not tell a story, and the
pushback was that there wasn’t enough data. But most of the data is buried in
medical records and team records. The
demand for more data was a case of the “better” being the enemy of the
“good”. This one didn’t really require
Big Data; it just needed Enough Data.
Early onset dementia is normally extremely rare. When you have five cases, in a population of
only 2000+ retired NFL players, it’s hardly chance. Schwartz’ exposition is leading to real changes
in how head injuries are being regarded in football, down to college, high
school, and youth leagues. Tenacity with
data, that’s what analytics is about.
Divah Yap of the University of Minnesota offered an
intriguing presentation on scoring the free text in contact reports for words
or phrases which may tend to indicate attitude toward the organization. We have a lot of usable data around us, if we
know how to decompose it and connect dots.
When we have enough data, well-organized, we can understand it in ways
we never could before.
Visualization may be coming of age as part of analysis. One
of our fundraising projects here at UT, at which we have mostly failed so far, is
to find donors for the Texas Advanced Computing Center (TACC) and its
visualization lab. But if we can’t help
them, maybe they can help us. In a few
weeks we’re going to get together with them, and hand them the keys to our data
warehouse, and see if they can paint it in colors we never imagined, and help
us to see it in ways that the numbers alone don’t tell us.
In my college class on linear methods, we were warned
strictly against correlation fishing. In
your typical experiment in human psychology, p < .05 is the standard, and if
you run your experiments on twenty or fifty or even a hundred subjects, getting
past p < .05 can be a challenge. And
of course the threshold “p = .05” means that, even when no real effect exists,
there is a one in twenty chance of getting a result that looks significant. Run ten
such studies where nothing real is going on, and there’s a 40% likelihood that at least one of your
conclusions, if not more, will be wrong.
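The 40% figure can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes the studies are independent of one another and that there is no real effect in any of them, and the function name is made up for illustration.

```python
def prob_at_least_one_false_positive(alpha: float, n_tests: int) -> float:
    """Chance that at least one of n_tests independent tests crosses the
    threshold alpha purely by chance (assumes no real effect anywhere)."""
    # Each test stays "clean" with probability (1 - alpha); all n_tests
    # stay clean with probability (1 - alpha) ** n_tests.
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for n in (1, 10, 20):
        p = prob_at_least_one_false_positive(0.05, n)
        print(f"{n:>2} studies at p < .05: {p:.1%} chance of at least one false alarm")
```

For ten studies the result comes out at roughly 40%, and it keeps climbing as more studies are added.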
Taken a different way, and this is where the dictum against
correlation fishing comes in, if you have a file with ten independent
variables, and you threw it into a correlation matrix, there would be 45 pairs
of variables to correlate, and if you set your standard going in as p < .05,
then from those 45 pairings you could expect to draw about two false
conclusions, even if nothing in the file were truly related. Try it on a file of twenty
variables or more, with hundreds of combinations to test, and there is a real
risk that the apparent correlations are simply the random noise in the sample,
and are as much a reflection of tides and astrology as they are of anything
causative within the population. And
with more variables thrown into the mix, there is also the increasing risk of
multi-collinearity if your variables are in fact numerically related in their
derivations.
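The pair counts above are easy to reproduce. The sketch below assumes that none of the variables is truly related to any other, so every "significant" correlation is a false alarm; the function names are hypothetical and exist only for this illustration.

```python
from math import comb


def pair_count(n_variables: int) -> int:
    """Number of distinct variable pairs in a correlation matrix."""
    return comb(n_variables, 2)


def expected_false_positives(n_variables: int, alpha: float) -> float:
    """Expected number of pairs that pass the threshold alpha by chance
    alone, assuming no pair is truly correlated."""
    # Expectation is additive, so this holds even though the pairwise
    # tests are not independent of one another.
    return pair_count(n_variables) * alpha


if __name__ == "__main__":
    for n in (10, 20):
        pairs = pair_count(n)
        false_hits = expected_false_positives(n, 0.05)
        print(f"{n} variables: {pairs} pairs, about {false_hits:.2f} false correlations expected")
```

Ten variables give 45 pairs, and twenty give 190, which is where the "hundreds of combinations" start to pile up.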
But when we study donor behavior in large organizations, we
move beyond the realm of the psychology lab and limited sample sizes. The University of Texas at Austin has a
constituent database of over 500,000 alumni and friends. I have decades of gift history, and I have
acquired consumer behavior information, derived from point-of-sale and other
sources. People with cats give more to
the arts, people with dogs give more to athletics, but in the end their total
giving is similar. I can say this “with
confidence”, when p < .0001. Big Data tells us stories, and illustrates
them in color. This doesn’t mean that I
can operationalize any strategy dependent on dogs and cats (and one should especially never depend on cats), but it does
give us new insights.
Coming back to the conference, if there are a half-dozen
presenters offering totally novel approaches to analysis, then the probability
is fairly high that any one of them may be a total waste of time, but there’s a
pretty good chance that at least one or two of them contain real nuggets. That’s the nature of data mining, and it’s
also why we go to conferences, to look for new insights, which may or may not
be usable. Coming away from this year’s
Symposium, many of us are feeling almost overwhelmed by new ideas, and just
wishing we had the time needed to explore all of them.
Big Data? How Big is
big enough, and how much is too big?
That’s becoming a difficult question, and the boundaries of privacy will
be a philosophical argument for years to come.
I’ve reached the unscientific conclusion that market segmentations such
as Claritas or PersonicX clusters are dead on the money 85% of the time, a
little bit off 10% of the time, and absolutely wrong 5% of the time. When there’s so much data around, and They
seem to have such a complete picture of the individual, is it comforting to
know that some of it is probably wrong, and so the picture that They have of us
isn’t as accurate as we’re afraid? When
I talk about cat owners and dog owners, should you be shocked that I know so
much about my constituents, or shocked that I draw conclusions from such
imperfect data? Perhaps both, but Big
Data is becoming reality, and so we will learn to use it for what it is, to use
it wisely and respectfully.
Organize, transform, restructure, build a systematic
repository. Mine for connections. And if you don’t have a supercomputer for
your visualization, Tableau may take you a long way.
*
About Chuck: Chuck McClenon arrived at the University of Texas at Austin in 1975 and earned a PhD in linguistics, dabbling in the nascent technology of pattern recognition. After a year teaching English in China, he returned to UT to work in administrative information management, searching for patterns and meaning in data ranging from student course registrations to library book titles to the bit-paths of room keys. He joined the advancement operation as an IT manager in 1996 at the start of UT’s first comprehensive capital campaign. After a brief tour of duty managing the gift processing and donor records operation, he retired to a cave and immersed himself in phonathon results and gift officer contact reports. Now he spends his days acquiring, constructing, managing and analyzing data representing the full spectrum of advancement activity. Since 2006, he has held the official title of Fundraising Scientist.