None of us were alive when Greek mythology was in its heyday, but if you had been you’d certainly know that Cerberus was the three-headed dog that guarded the gates of Hades. Sure, there are a few history buffs around today who know of this mythological creature, but it really is a shame that, generally speaking, people don’t read books like they used to. But enough about that. This blog is about anything and everything related to web hosting and / or computer development, so it’s fair to ask where we’re going with this.
No one at Cerebras Systems has more than one head, and none of them have ever been anywhere near an entrance to the Underworld. But Cerebras does have the distinction of being the maker of the world’s largest processor. And if you’re an engineer who works in AI development then you may well think they’ve outdone themselves with their newest offering. Do a little research into AI and you’ll learn that what a model is capable of depends heavily on how many parameters can be built into it.
20 billion is a big number, and the reason the news around the CS-2 system and its WSE-2 chip is so big is that this newest offering from Cerebras is able to train AI models with up to 20 billion parameters on a single device. This is optimization at the software level taken about as far as it goes, and that’s why it is something that a top Canadian web hosting provider like those of us here at 4GoodHosting is going to take an interest in. Like most, we have a nice front row seat for everything that major advances in computing technology promise to do for us.
So let’s use this week’s entry to have a much deeper look into these CS-2 devices. It will be a long while before AI technology becomes commonplace in darn near everything, of course, but it’s better to be early than late.
No More Mass Partitioning
What this all promises to do is resolve one of the most frustrating problems for AI engineers: the need to partition large-scale models across thousands of GPUs just to make training them feasible. In doing so, the CS-2 promises to drastically reduce the time it takes to develop and train new models.
Natural Language Processing models have undeniable benefits, but how capable they are depends heavily on the number of parameters that can be accommodated. To date, the performance of a model has tended to climb along with its parameter count. Larger models provide better results, but developing large-scale AI products has traditionally required a large number of GPUs or accelerators, with the models spread across them.
The wheels fall off when there are too many parameters to be housed within memory, or when compute performance is incapable of handling the training workloads. Compounding the problem is the way the partitioning process is unique to each pairing of network and compute cluster, so each cluster has to be catered to individually, and that makes the whole thing so much more of a drawn-out process. That’s time many of these engineers will be displeased to be losing.
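To put some rough numbers on the memory problem, here is a minimal back-of-the-envelope sketch in Python. The figures are our own illustrative assumptions and not anything published by Cerebras: 16 bytes per parameter is a commonly cited rule of thumb for mixed-precision training with an Adam-style optimizer (weights, gradients and optimizer state, not counting activations), and the 80 GB device is hypothetical.

```python
import math

# Assumption: ~16 bytes per parameter during training (rule of thumb for
# mixed precision with an Adam-style optimizer; activations not included).
ASSUMED_BYTES_PER_PARAM = 16


def training_memory_gb(num_params: int,
                       bytes_per_param: int = ASSUMED_BYTES_PER_PARAM) -> float:
    """Approximate training memory in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_devices(num_params: int, device_memory_gb: float,
                bytes_per_param: int = ASSUMED_BYTES_PER_PARAM) -> int:
    """Fewest devices needed just to hold the model's training state."""
    needed = training_memory_gb(num_params, bytes_per_param)
    return math.ceil(needed / device_memory_gb)


if __name__ == "__main__":
    params = 20_000_000_000  # the 20 billion parameter figure from the article
    print(training_memory_gb(params))   # 320.0
    print(min_devices(params, 80))      # 4, for a hypothetical 80 GB device
```

Under those assumptions a 20 billion parameter model needs several devices before any real work gets done, and in practice far more are used to get the compute throughput up. Every extra device means more partitioning work for the engineer.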
It is true that the most complex models consist of many more than 20 billion parameters. Even so, the ability to train relatively large-scale AI models on a single CS-2 device may do away with these bottlenecks and majorly accelerate development for existing players, while also increasing access for those previously unable to participate in the space. More cooks in the kitchen is a plus when it comes to high tech development much of the time.
The consensus seems to be that the CS-2 may well have the ability to bring large language models to the masses in a cost-efficient way. This greater ease of access may be the stimulus needed for a new era in AI, one where big new steps are made and society benefits from them in many different ways in the long run.
There’s potentially more to this too. According to Cerebras, the CS-2 system may be able to accommodate even bigger models down the line, perhaps even ones with trillions of parameters. If that’s true, chaining together multiple CS-2 systems could start us down the path to AI networks that are more expansive than the human brain.