AI and Machine Learning Being Trained at Double Speed

It seems to be no coincidence that artificial intelligence and machine learning have emerged at exactly the time humankind needs them most. The entire world is struggling to keep pace with a global population that continues to explode, and that explosion brings a whole whack of major challenges along with it, especially when it comes to continuing to provide everyone with the essential services they’ve come to expect. The need is pressing, but fortunately there are very talented engineers, and equally dedicated directors above them, who are applying themselves to the best of their ability.

While in the simplest sense this doesn’t have anything directly to do with providing web hosting, our industry is already being touched by this trend too, as machine learning can make better and more efficient data management and data streamlining possible. One of the secondary effects could be better proactivity in managing data overloads caused by unexpected external influences. The major heatwave in the UK 2 weeks ago forced some data centers to shut down.

Machine learning may give systems the means of getting ahead of the curve in dealing with that, so a complete shutdown isn’t needed. What would occur isn’t exactly what is known as load shedding, but the process would be similar: being able to foresee what’s coming and knowing where best to make temporary cuts so that the cumulative effect of it all isn’t so catastrophic. As a Canadian web hosting provider, those of us here at 4GoodHosting can see all sorts of promise in this.
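To make the idea of temporary, targeted cuts a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the workload names, the power figures, the priority numbers and the plan_cuts helper are made up for illustration, and the forecast capacity is simply a fixed number standing in for whatever a machine-learning model might predict. It is not how any particular data center does this.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    power_kw: float  # power drawn while running (made-up figure)
    priority: int    # higher number = more important to keep running


def plan_cuts(workloads, forecast_capacity_kw):
    """Choose workloads to pause so the total draw fits the forecast capacity.

    Greedy approach: pause the least important workloads first and stop
    as soon as the remaining draw fits.
    """
    total = sum(w.power_kw for w in workloads)
    paused = []
    for w in sorted(workloads, key=lambda w: w.priority):
        if total <= forecast_capacity_kw:
            break
        paused.append(w.name)
        total -= w.power_kw
    return paused, total


# Hypothetical workloads; the capacity stands in for a model's forecast.
workloads = [
    Workload("customer-sites", 40.0, priority=3),
    Workload("nightly-backups", 25.0, priority=1),
    Workload("log-analytics", 15.0, priority=2),
]
paused, remaining = plan_cuts(workloads, forecast_capacity_kw=60.0)
print("pause:", paused, "remaining draw (kW):", remaining)
```

The greedy ordering is just one simple choice. A real system would likely weigh how long each workload can be paused and how quickly it can be brought back.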

2x Speed

There is now a set of benchmarks for machine-learning systems, MLPerf, and its latest results show that these systems can be trained nearly 2x as quickly as they could last year. The bulk of these training speed gains are thanks to software and systems innovations, but new processors from Graphcore and Intel subsidiary Habana Labs, plus others, are contributing nicely too.

Previously there was no getting around the fact that it took a REALLY long time to train neural networks. This is what drove companies like Google to develop machine-learning accelerator chips in house. But the new MLPerf data shows that training time for standard neural networks has gotten a lot less taxing in very little time. Neural networks can now be trained much faster than you would expect, and this really is a beautiful thing when you understand the big picture relevance of it.

It is prompting machine-learning experts to dream big, especially as new neural networks continue to outpace computing power. MLPerf is based on 8 benchmark tests:

  • image recognition
  • medical-imaging segmentation
  • two versions of object detection
  • speech recognition
  • natural-language processing
  • recommendation
  • a form of gameplay called reinforcement learning

As of now, systems built using Nvidia A100 GPUs have been dominating the results. The A100 belongs to the Ampere generation; Nvidia’s newer GPU architecture, Hopper, was designed with architectural features aimed at speeding training further.

As for Google’s offering, TPU v4 features impressive improvements in computations per watt over its predecessor, and a system built from these chips is now able to compute 1.1 billion billion operations per second. At that scale, the system needed just over 10 seconds for each of the image-recognition and natural-language-processing training runs.
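For a sense of scale, the quick Python calculation below writes "1.1 billion billion" out in scientific notation and multiplies it by 10 seconds. The result is only a rough upper bound, on the assumption that the system ran at its full quoted rate for the whole run, which real training jobs generally don't.

```python
BILLION = 10**9

# "1.1 billion billion" operations per second, kept as an exact integer.
ops_per_second = 11 * BILLION * BILLION // 10

# Rough upper bound on the work done in a 10-second training run,
# assuming (optimistically) the full quoted rate the whole time.
seconds = 10
upper_bound_ops = ops_per_second * seconds

print(f"{ops_per_second:.1e} operations per second")
print(f"{upper_bound_ops:.1e} operations in {seconds} seconds")
```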

Another trend to watch is an IPU, Graphcore’s name for its processor, where both chips in the 3D stack do computing. The belief here is that we could see machine-learning supercomputers capable of handling neural networks 1,000 times or more as large as today’s biggest language models.

Better Sequencing Lengths

This advance goes beyond the networks themselves, as there is also a need to increase the length of the sequence of data the network can take in to promote reliable accuracy. In simpler terms, this relates to how many words a natural-language processor is able to be aware of at any one time, or how large an image a machine vision system is able to view. As of now those don’t really scale up well: double the sequence length and you quadruple the scale of the attention layer of the network. The focus remains on building an algorithm that gives the training process an awareness of this time penalty and a way to reduce it.
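The scaling problem with longer sequences can be shown with a few lines of Python. This is a simplified sketch that assumes a standard self-attention layer, where every position in the sequence is scored against every other position; the attention_scores function and the sequence lengths are chosen purely for illustration.

```python
def attention_scores(seq_len):
    """Pairwise scores computed by one full self-attention head.

    Every position is compared with every position, so the count
    grows with the square of the sequence length.
    """
    return seq_len * seq_len


# Illustrative sequence lengths, each double the previous one.
for n in (512, 1024, 2048):
    print(f"sequence length {n}: {attention_scores(n)} scores")

# Doubling the length quadruples the work in the attention layer.
ratio = attention_scores(1024) / attention_scores(512)
print("growth when length doubles:", ratio)
```

This quadratic growth is why longer sequences get expensive so quickly, and why an approach that reduces the penalty would matter.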