A couple of events over the past few weeks have had me thinking about how my work is interpreted and used in various hockey-related discussions. I figure this blog is the best platform to summarize my thoughts on the matter and to clarify any issues that arise in the future.
This always makes me cringe, for the simple fact that “analytics” is used so interchangeably, especially in hockey, that I often have to re-familiarize myself with the concept. To me, analytics is the process of collecting raw data, refining it, applying different models, finding correlations and, ultimately, looking for some sort of pattern on which to base a decision. I do some of this, but leave the hard work to people who know what they’re doing.
My approach is to start with a question and then find the data that’s already been scraped from NHL.com and aggregated in an easy-to-use format. Thanks to websites like War on Ice, Hockey Analysis, Behind the Net and Natural Stat Trick, all I have to do is find the metric that’s been derived from the analytics (e.g., Corsi, Fenwick) and apply it to whatever question or topic I have. I do look for patterns. I do look for correlations. But the bulk of the work is done by real analytics-type people with backgrounds in computational science and statistics. Once I have the data and have run my analysis, I try to explain in 750 words why my topic matters, what I found and what I think the next steps are.
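For anyone new to the metrics mentioned above, here is a rough sketch in Python of how they are usually calculated. Corsi counts all shot attempts (shots on goal, missed shots and blocked shots), while Fenwick leaves out the blocked ones. The numbers are made up for illustration, and the names ShotCounts and share are my own placeholders, not something taken from any of the sites listed.

```python
from dataclasses import dataclass


@dataclass
class ShotCounts:
    """Hypothetical on-ice shot attempt totals for one side."""
    shots_on_goal: int
    missed: int
    blocked: int  # attempts by this side that the opponent blocked

    @property
    def corsi(self) -> int:
        # Corsi: every shot attempt, whether on goal, missed or blocked.
        return self.shots_on_goal + self.missed + self.blocked

    @property
    def fenwick(self) -> int:
        # Fenwick: unblocked shot attempts only.
        return self.shots_on_goal + self.missed


def share(for_count: int, against_count: int) -> float:
    """Percentage of attempts taken by the 'for' side."""
    total = for_count + against_count
    if total == 0:
        # No attempts either way; 0.0 is only a placeholder here.
        return 0.0
    return 100.0 * for_count / total


# Made-up totals, purely for illustration.
team = ShotCounts(shots_on_goal=30, missed=12, blocked=8)
opponent = ShotCounts(shots_on_goal=25, missed=10, blocked=15)

cf_pct = share(team.corsi, opponent.corsi)
ff_pct = share(team.fenwick, opponent.fenwick)
print(f"CF% {cf_pct:.1f}, FF% {ff_pct:.1f}")  # CF% 50.0, FF% 54.5
```

In this invented example the two sides are even by Corsi, but the team looks better by Fenwick because more of the opponent's attempts were blocked. That is the kind of difference I look at when deciding which metric fits the question.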
Now this one is relatively new.
Last week when Mark Fayne was put on waivers to be sent to the AHL, I openly questioned how Eric Gryba was any better than Fayne. Without a doubt Fayne has struggled mightily this season, even getting benched at times and healthy scratched. But I still consider him ahead of guys like Gryba and others for the simple fact that he’s a proven player and has more experience playing against top competition. Gryba has not looked good to me at all, and does not appear to have the ability to move up and down the lineup like Fayne would. For what it’s worth, my own analysis found that Fayne wasn’t shooting at a frequency that McLellan expects from his defencemen, and this might be why he’s been waived.
Now, I do look to shots and shot-attempt data, mainly because it’s a good indicator of possession and has been reviewed and analyzed by some very bright people (Arctic Ice Hockey, Pension Plan Puppets and SB Nation, to name a few). It’s not perfect and can’t answer every question, but I have my reasons for using it.
First off, shots and shot attempts tend to be the best metric for the question I have or the topic I’m exploring. My thoughts aren’t overly complicated, so I can typically track down the exact data set I need rather quickly, without having to use any modelling to test correlations. If I can’t find the data set, I ask around. That’s how I found things like Ryan Stimson’s passing project or Corey Sznajder’s Zone Entry project.
Quick note: What I stress to anyone who’s looking into any sort of analytics, whether it be hockey or business, is to approach the data with specific questions. And be ready for continuous analysis and discussion. Analytics does not provide any sort of final answer. In my opinion, the best analytics articles are the ones that leave you with more questions.
I also like the shot data because it’s readily available to anyone and everyone. Using a data source that’s used by many other people gives my work some credibility and also makes my work verifiable.
Having said all of that, I’ve always remained open to new metrics that have some thought and explanation behind them. Hockey analytics is still in its infancy, so I expect people to keep collecting and aggregating data, which can only push the discussion along. Examples include dCorsi, Dangerous Fenwick, xGoals and the results from manual data collection projects.
So should using metrics such as Corsi or sharing the work of others who use Corsi make me a “Corsi Guy”? Hardly.
This one sounds all warm and fuzzy, but it’s been used as a way to put down a whole group of people when really the target might be one or two.
Another problem with this phrase is the generalization of the intended participants. There are some in this community who are the actual statisticians who parse through and test the data. There are some who do the aggregation (e.g., War on Ice). There are the visualization people. And then there are those who have an understanding of the data and just like reading articles about it. So when someone says “Analytics Community”, I really have no idea who this is referring to and tend to ignore the rest of their issue.
And finally, there is a lot (a LOT) of disagreement among fans when it comes to the application of analytics to hockey. Player A might look great to one person using one metric, while Player B might look better using another. But when someone says “Analytics Community”, it sounds like everyone is on the same page and has come to the same results.
There are a few local media types, ones who work full time for one of the major outlets, who tend to stir the pot to draw extra attention to their work. We know this is part of the game when it comes to covering sports in Edmonton. A lot of it is what I refer to as scripted ignorance. For instance, taking a shot at the “analytics community” is a good way for them to get under the skin of a lot of people and draw attention to themselves. It’s usually the same three or four local reporters who do this. This doesn’t bother me, because statistical analysis has been done for a long, long time. It’s a way for fans to get into the game and it helps to add to the discussion. Plus, the beauty of modern communication technology is that individual fans create their own little ecosystem and control what information they receive, create and share.
What does bother me is how the rest of the folks who cover the game get lumped in with the few ignorant ones. Outside of our Oilers bubble, “Edmonton Media” does not have a good reputation, which isn’t fair to the individuals who actually do make a conscious effort to expand their scope to include analytics. And the reputation of Edmonton being a tough place to play is warranted, but it has been driven by a lot of the garbage content produced by the few.
As always, feedback is appreciated.