Mathematical trick could halve artificial intelligence development time

You can train artificial intelligence to recognize faces, for example, but before that succeeds, the AI must first take millions of practice steps. A new way of training AI saves time, energy and computing power.

Artificial intelligence (AI) is getting better at performing complex tasks, but training AI requires an enormous amount of computing power. A more efficient training technique could halve the time, energy and computing power required.

Artificial intelligence often uses a method called deep learning. The AI consists of a network of artificial ‘nerve cells’. These are linked together by computer code: each cell takes input, assigns it a certain weight, and relays a modified output. In this way, the network resembles a rough version of our brain, in which nerve cells are connected to each other.
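
To make that picture concrete, here is a minimal sketch of one such artificial ‘nerve cell’ in Python with the JAX library. The three inputs, the hand-picked weights and the tanh activation are arbitrary illustrative choices, not taken from any particular model.

import jax.numpy as jnp

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus a bias, squashed by a
    # nonlinearity: the "modified output" the cell relays.
    return jnp.tanh(jnp.dot(weights, inputs) + bias)

# Example: a cell with three inputs and hand-picked weights.
x = jnp.array([0.5, -1.0, 2.0])
w = jnp.array([0.1, 0.4, -0.3])
print(neuron(x, w, bias=0.2))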

By repeatedly adjusting the weights it assigns, the network slowly but surely learns a certain task, such as recognizing faces or digitizing handwritten texts. Before that succeeds, the network has to change the weights thousands to millions of times.

Back and forth

To train a model, researchers send data into the network and then assess the quality of the output. From that they calculate a “gradient”: a value that indicates how each weight must change to improve the output. During this process, data constantly moves back and forth through the network: forward to produce an output, backward to establish the gradient, a procedure known as backpropagation.
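
In code, that two-step cycle looks roughly like this. The sketch below uses JAX, whose grad function performs the backward pass (backpropagation); the one-cell model, squared-error loss and learning rate are placeholder choices, not the researchers’ actual setup.

import jax
import jax.numpy as jnp

def loss(weights, x, target):
    # Forward step: data flows through the (one-cell) network to an output.
    prediction = jnp.tanh(jnp.dot(x, weights))
    return (prediction - target) ** 2

# Backward step: jax.grad replays the computation in reverse
# (backpropagation) to obtain the gradient of the loss.
grad_fn = jax.grad(loss)

weights = jnp.array([0.1, 0.4, -0.3])
x = jnp.array([0.5, -1.0, 2.0])

for step in range(100):
    g = grad_fn(weights, x, 1.0)
    weights = weights - 0.1 * g  # nudge each weight against its gradient

Each pass through the loop traverses the network twice: forward to get the output, backward to get the gradient. It is this second traversal that the new method aims to remove.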

Atılım Güneş Baydin, an AI researcher at the University of Oxford, and his colleagues have now simplified that process. Instead of two steps, with the data moving back and forth, it can be done in one step. The gradient is established so accurately in that single forward step that a second calculation is no longer necessary.
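
The article does not spell out the underlying mathematics, but one way to establish a gradient in a single forward pass, in the spirit of forward-mode automatic differentiation, is to measure the loss’s slope along a random direction and scale that direction accordingly. The sketch below assumes that idea and reuses the toy model from the previous example.

import jax
import jax.numpy as jnp

def loss(weights, x, target):
    prediction = jnp.tanh(jnp.dot(x, weights))
    return (prediction - target) ** 2

def forward_gradient(weights, x, target, key):
    # Probe the loss along one random direction v. jax.jvp returns the
    # loss value AND its slope along v in a single forward pass.
    v = jax.random.normal(key, weights.shape)
    value, slope = jax.jvp(lambda w: loss(w, x, target), (weights,), (v,))
    # Scaling v by that slope gives an unbiased estimate of the true
    # gradient, so no backward pass is needed.
    return slope * v

key = jax.random.PRNGKey(0)
weights = jnp.array([0.1, 0.4, -0.3])
x = jnp.array([0.5, -1.0, 2.0])

for step in range(100):
    key, subkey = jax.random.split(key)
    weights = weights - 0.1 * forward_gradient(weights, x, 1.0, subkey)

Averaged over many random directions, this estimate points the same way as the true gradient, which is why a second, backward calculation can be skipped.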

In theory, this could halve the development time of AI. The team compared their new AI model with an ‘old-fashioned’ model, and saw that their AI performed its tasks just as well.

Math trick

Calculating the gradient in one step is a ‘simple math trick’, says Andrew Corbett, a computer scientist at the University of Exeter. That simple trick could potentially solve one of the biggest problems for AI researchers: the ever-increasing computing power required.

According to Corbett, this could provide a cheaper way to train AI. ‘It is very important to solve the problem of [the lack of computing power], because that is the bottleneck of AI algorithms.’

Supercomputers

Modern AI systems are built from billions of parameters: the weights the network learns. Training them takes giant supercomputers weeks to months.

One of the largest neural networks in use is the Megatron-Turing Natural Language Generation model, which contains 530 billion parameters. It was trained on Nvidia’s Selene supercomputer, which consists of 560 powerful servers and 4,480 graphics cards. Each of those cards costs thousands of euros. Despite the machine’s massive computing power, it takes more than a month to train the model.

Halving

In the best-case scenario, the required computing power is halved, but that is far from certain, says Güneş Baydin. How efficient the method is will only become clear when other researchers apply it to all kinds of different AI tasks. For some applications it may work better than for others.

‘You can perform a single optimization calculation faster with this algorithm, but that doesn’t necessarily mean it works twice as fast on a large scale. There are more factors involved,’ he says. ‘In some cases it may work worse than the old method, needing more calculations to achieve the same training quality. And if that happens, the new method may not be able to compete with the old one.’
