I wanted to expand on this post by Justis about declining participation at the Combine (it’s great, go and read it) because I don’t really think there is a way to fix any of this, and I wanted to discuss why that is.

The amount of public access we have to sports data is a bit of a weird accident in the first place. Newspapers originally printed baseball (and eventually football) box scores to save ink and page space, and to present, as efficiently as possible, a record of what happened in baseball games featuring the non-local teams.

Database software allowed later baseball nerds to create the first batch of advanced statistics like WAR, and eventually its football cousins, EPA and DVOA, because box scores were available to leverage, but things didn’t stop there. We now have advanced player and ball tracking systems, expected outcome statistics, and tons of other fun metrics created by amateur and professional statisticians alike.

The reason we have detailed pitch tracking analysis in baseball is also a happy accident. The technology that originally tracked pitching at a detailed level was a reconstituted version of the NHL’s FoxTrax, which was intended to make it easier to follow the puck in televised hockey games. People hated the glowing puck (and especially hated how it left a streak behind on hard shots), and so to make ends meet, FoxTrax turned into PitchTrax/PITCHf/x, simply because its makers were looking for some use for their now-discarded hardware/software platform. A side effect of that repurposing was that the data was made public because MLB did not control it.

We’ve been living in the golden age of public sports data ever since, but there are alarming signs that this is about to change. The public loves having all of this data, but only a small minority of fans are willing to pay for it, and those incentives are starting to become a problem. Meanwhile, individual teams are willing to pay quite handsomely for any data that provides a competitive advantage. Think about how the recently acquired Pro Football Focus works: they chart and track a ton of data at great cost and sell some of it to average fans and journalists for hundreds of dollars, while selling far, far more to teams for hundreds of thousands of dollars. The investment in performing this service is subsidized by the teams, but that also gives teams far more power and leverage over that data and its availability.

The NFL Scouting Combine is sort of like the old box score in baseball. Back before information could be easily tracked, recorded, and sent over the internet, it was efficient to get everyone together in one space and test for athletic ability. Instead of going to 28 or 30 individual workouts, every team was able to see and interpret every prospect’s workout over a few days. It also made more sense in the days of the powerless student-athlete, when the NFL could essentially make prospects do whatever it wanted.

Everything about this has changed. NFL teams can gather almost all of the data they would get from the Combine from tape study, and in many cases, it’s not just the same, but better. We get inklings of this with the NFL’s Next Gen Stats tracking and with the college bowl games reporting on the top speed achieved by players or the RPMs on passes, but those are just the tip of the iceberg. And because players now have more money and power via NIL deals, and I would argue, better advice from agents, many skip the Combine because they correctly see that it can do much more harm than good to their draft prospects. This is especially true given that conditions and timing on drills are optimized for TV rather than for the players’ performance, and that scores tend to vary greatly from year to year depending on what time of day the drills are run. And of course, there are the bizarre measuring errors that Justis mentioned.

The NFL Draft is a huge business unto itself, and even a diminished Combine will be popular, so there’s simply no incentive to dramatically improve participation rates. It’s a shame, as the public data analytics, aided by data visualization options like Kent Lee Platte’s RAS, have become a fantastic resource for fans. But when only 37% of prospects are even bothering, selection bias and small sample sizes start to creep in, and even the data we do collect becomes suspect, especially when we’re comparing the numbers we do collect to an average of those same numbers.

We’ve arrived at a situation where individual teams don’t care about the Combine because they have their own, better data already. Players don’t care because for many, the Combine is more likely to hurt than help. Legally, no one can be forced to participate (which is a good thing). And above all, because teams are doing more of the measuring internally, the existence of public-facing data is now under enormous threat.