If Borussia Dortmund had trusted the data, Jürgen Klopp might still be BVB coach today. Or at least he would have stayed a little longer than the summer of 2015. That’s a bold thesis, but the story behind it says something about the advancing digitalization of professional football.
When BVB unexpectedly crashed to the bottom of the Bundesliga table in the first half of the 2014/15 season, blogger Colin Trainor wrote an analysis for the website “statsbomb.com”. The auditor came to the conclusion that Klopp’s team should have scored 25 goals instead of 17 and conceded only 17 instead of 26. The upshot: BVB should have been in fourth place after 17 matchdays, not second to last.
An expected goals (xG) model told him all of this. Put simply, such a model takes every shot a team attempts over the course of a game, assigns each one a probability of ending up in the goal, and adds these probabilities up. The probabilities are derived from a database of countless previous games. The result is a total xG value.
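The idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Trainor's actual model: it assumes each shot falls into one of three made-up categories and estimates the scoring probability for a category as the share of historical shots of that kind that went in. Real xG models typically use finer features such as shot distance and angle. All names and numbers below are hypothetical.

```python
from collections import defaultdict


def fit_conversion_rates(historical_shots):
    """Estimate a scoring probability per shot category as goals / shots.

    historical_shots: iterable of (category, scored) pairs, where
    scored is True if the shot resulted in a goal.
    """
    shots = defaultdict(int)
    goals = defaultdict(int)
    for category, scored in historical_shots:
        shots[category] += 1
        goals[category] += int(scored)
    return {category: goals[category] / shots[category] for category in shots}


def expected_goals(match_shots, rates):
    """Sum the scoring probabilities of all shots a team took in a game."""
    return sum(rates[category] for category in match_shots)


# Hypothetical "database of previous games" (invented numbers).
HISTORY = (
    [("penalty", True)] * 3 + [("penalty", False)] * 1
    + [("inside_box", True)] * 1 + [("inside_box", False)] * 9
    + [("long_range", True)] * 1 + [("long_range", False)] * 19
)

rates = fit_conversion_rates(HISTORY)

# Hypothetical shots taken by one team in a single game.
match = ["penalty", "inside_box", "inside_box", "long_range"]
xg = expected_goals(match, rates)
print(f"xG for this game: {xg:.2f}")
```

Comparing such a total with the goals actually scored, game after game, is what allows an analyst to argue that a team has been lucky or unlucky over a stretch of the season.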
Trainor’s conclusion: BVB had been unusually unlucky over a long period and should have had more goals and therefore more points. In the second half of the season the bad luck disappeared, and Borussia even qualified for European competition. By then, the separation from Klopp had already been announced.
It’s not just good and bad luck that can be calculated
It’s just one story among many, but it illustrates the impact of data on football. xG values have long been shown to viewers in television broadcasts of the Bundesliga. Coaches prepare their teams for the next opponent using metrics such as spatial control or a pressing index. Not only can good and bad luck be calculated; the computer can also be used to find the ideal new signing for a team. Thanks to big data, football has long since become a data game. Or has it?
“I experienced this flood of data for the first time when I was a coach at Fulham FC in England in 2014,” says Felix Magath, who views the trend critically. “The fallacy is that data can say anything. The stupid thing is that football is still played by people. And we are different every day.” The 70-year-old says he doesn’t know how much data is currently available: “But in any case too much.” Data, he argues, can be presented however you want. “But there is one piece of information, the table: It doesn’t lie,” says Magath.
So was Colin Trainor wrong? Quite a few people see things completely differently than Magath. Essentially, that includes all the big clubs. Tracking cameras have long been installed in every Bundesliga stadium, recording 3.6 million position data points per game. This means that not a single movement of the players or the ball goes unobserved. The German Football League (DFL) then makes some of this raw data available live to the Bundesliga clubs. Next to each head coach, analysts sit on the bench and can provide him with information via tablet. But which part of this jumble of numbers really helps win a game?
“Passes received, kilometers run by a team, possession of the ball, all the simple metrics have no meaning at all for knowing which team wins or loses a football game,” says sports scientist Daniel Memmert from the Institute for Training Science and Sports Informatics at the German Sport University Cologne. An example: In the German national team’s legendary 2014 World Cup semi-final against Brazil, the Seleção was better in many of these simple metrics. Nevertheless, the DFB team won 7-1 in the end.
Some Bundesliga clubs work with statisticians
However, Memmert and his team have shown scientifically that the chance of victory increases for the team that has better spatial control in the opponent’s 30-meter area. The team that presses better also has a better chance of winning. “More complex metrics create this meaningfulness,” says Memmert.
All of this can be calculated these days. It is not only a top club like Champions League winners Manchester City that has long employed astrophysicists and other scientists to filter out of the available data exactly what will ultimately benefit coach Pep Guardiola. Some Bundesliga teams also work with mathematicians or statisticians. In addition to the DFL data, most clubs also obtain data from external providers such as Stats Perform or Impect. There are further computer programs for scouting potential signings. All of this costs the Bundesliga clubs six-figure sums per season.
And this is just a small part of a huge business area. In the English Premier League, some clubs even bought some of these external providers years ago in order to have exclusive access to their data. “Of course the whole thing also has to do with money. Someone who has a lot of money can buy tons of data and of course the best players,” Magath criticizes. “If you win in the end, you can of course say it was the data.” On the other hand, his colleague Jürgen Klopp might have remained BVB coach longer if the data had been trusted.