Tuesday, June 25, 2013

Data Scientists: The Next Generation

As I’m sure you all have noticed, the data business is booming right now. (Are you tired of the term “big data” yet?) The fact that 90% of the data in the world today was created in the last two years is a great example of how quickly data is growing. All of this data creates new opportunities for discovery for anyone willing to analyze it. Enter the data scientist.

“Data Scientist” isn’t even listed as a career by the US Government’s Bureau of Labor Statistics yet, but Harvard Business Review has already named it the sexiest job of the 21st century. With a growth pattern similar to that of data itself, it’s safe to say that data scientists are going to be in high demand. Among other skills, practicing data science requires analytical thinking, mathematical and statistical ability, a knack for communicating results to people outside the data world, and creativity. This combination of business acumen and technical skill isn’t easy to come by, and new graduate programs with an emphasis on data science seem to be cropping up daily to fill the gap. One recent New York Times article asserted that the United States will need to increase the number of graduates with data science skills by as much as 60% to keep up with demand. So, when you’re looking for new data scientists, where do you turn? To a generation that has grown up with data science all around them: in Netflix recommendations, in Google search results, and even at the movie theater à la Moneyball.

I was recently asked to participate in a “Job Hop Day” at a local elementary school. The idea was to expose fourth- through sixth-graders to different jobs available in the Mount Washington Valley in NH. It was a good opportunity to spend a fun day with elementary school students while introducing them to the world of data science (and to the idea that people actually get paid to do it!). In preparing for our session, I realized that as thrilling as an hour-long lecture on data science might be for some, 10-year-olds probably wouldn’t be so interested. After ruling out a product demo and a slideshow, my coworkers and I thought about other ways to engage them. We decided the best way for them to learn about being a data scientist was to do it themselves, in the guise of a game.

When creating the game, we thought about the skills we wanted to reinforce: data mining, basic math, and the ability to make predictions. From there, we got creative. We wanted a subject the kids would be interested in, and since vampires are on the brink of cliché, we settled on werewolves. The game we came up with was a variation on a Family Feud board, with an initial data-mining phase in which the kids gleaned the characteristics of a werewolf.

To start, I gave the kids ten descriptions of people on color-coded index cards: five were designated “werewolves” and five were “non-werewolves”. (Coming up with the descriptions was a good exercise for us as well. We tried to make sure the clues weren’t too obvious, and we had to plan them so that some characteristics were more common than others. For example, three of the werewolves were vacationing in London this summer, but all five of them played some kind of sport.) Each data scientist had a whiteboard to jot down characteristics as they went, and we stopped the “data mining” portion of the game once everyone felt they had found as many characteristics as they could. The Family Feud board I mentioned earlier listed the ten characteristics in order of how many times they came up, and the kids took turns guessing what was on the board.
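Ranking characteristics for the board is a simple frequency count. Here is a minimal Python sketch of that tally, assuming each werewolf card is represented as a set of trait strings. The card contents are hypothetical: only “plays a sport” (all five werewolves) and “vacationing in London” (three of them) come from the post, and the other traits are invented for illustration.

```python
from collections import Counter

# Hypothetical data: one set of traits per werewolf index card.
# "plays a sport" (all five cards) and "vacationing in London" (three cards)
# mirror the post; "has a dog" and "wears glasses" are invented for illustration.
werewolf_cards = [
    {"plays a sport", "vacationing in London", "has a dog"},
    {"plays a sport", "vacationing in London", "wears glasses"},
    {"plays a sport", "vacationing in London"},
    {"plays a sport", "has a dog"},
    {"plays a sport"},
]


def feud_board(cards, top_n=10):
    """Count how many cards mention each trait; return (trait, count) pairs, most common first."""
    counts = Counter(trait for card in cards for trait in card)
    return counts.most_common(top_n)


if __name__ == "__main__":
    for rank, (trait, count) in enumerate(feud_board(werewolf_cards), start=1):
        print(f"{rank}. {trait} ({count} of {len(werewolf_cards)} werewolves)")
```

Storing each card as a set means a trait counts at most once per werewolf, so each count reads as “how many werewolves share this trait,” which is the number a Family Feud answer would show.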

Over the course of the day, three groups of students played the game, and all three seemed to really enjoy it. Afterward, we talked about different uses of data and predictive modeling, with examples ranging from test scores to baseball. The kids were knee-deep in baseball season and pretty excited when I told them about a baseball scout’s presentation I saw at DRIVE, and how the scout’s team used statistics to predict what might happen in each game. It was evident from our conversations that the kids had some sense of the amount of data around them and were interested in examining the world from a data-driven viewpoint. (I should probably mention here that the kids who chose to attend our session knew it would be math-related, so our sample was a bit biased.) Most of them had never heard of a data scientist or a statistical analyst before, but they were interested in the kind of thinking we’d done. A few days later, a student’s mom told me that her son “loved the game” and “was so excited that it was an actual job that he could shoot for”.

Overall, our ad hoc approach to the data scientist experience seemed to go over well, but there’s always room for improvement. I’d love to hear any ideas or experiences you might have with young data scientists in the comments below. In the meantime, if you’ve had a sneaking suspicion about a certain neighbor around the full moon, or just want to have a little fun, I’d recommend trying out your own version of the game.

-Caitlin Garrett is a Statistical Analyst at Rapid Insight


  1. Cool, very accessible. I wish I had learned about data this way. It reminds me of a (slightly controversial) project I had some students do called "Are they siblings?". We got pictures of siblings and non-siblings, made facial measurements, trained a model, and tested it along with our intuition. It got giggles and "hmm"s.

  2. Thanks, Sean. I can see why the siblings project might be controversial, but I like that you attempted to teach modeling. That was something we ultimately steered away from, thinking it would be a bit tough to fit everything in, but it might be worth including in round two.

  3. This is a great way to teach this. It would be great if you posted more details so other teachers could use what you created in their classrooms.

  4. Good advice, DJ. I do have more information available (including what information I included on the flash cards) if you'd like to email me!

  5. I have to admit that I was caught quite unaware of the fact that All Werewolves like Boy Bands. Sure, I can see how some Werewolves like Boy Bands, but ALL? Seems like a sampling error to me. FYI, ALL Mummies do like Funk Music.

  6. Mark, a lot of research went into these habits and I can tell you that boy bands tested much better than dog whistles, which totally proved my hypothesis wrong.

    Good to know about the mummies. I'll have to factor that in when building a Halloween playlist.

  7. It's refreshing to know that Science prevails.