Published 16:37 IST, November 11th 2024
OpenAI and others seek new path to smarter AI as current methods hit limitations
The so-called ‘training runs’ for large models can cost tens of millions of dollars by simultaneously running hundreds of chips.
Artificial intelligence companies like OpenAI are seeking to overcome unexpected delays and challenges in the pursuit of ever-bigger large language models by developing training techniques that use more human-like ways for algorithms to "think". A dozen AI scientists, researchers and investors told Reuters they believe that these techniques, which are behind OpenAI's recently released o1 model, could reshape the AI arms race, and have implications for the types of resources that AI companies have an insatiable demand for, from energy to types of chips.
OpenAI declined to comment for this story.
After the release of the viral ChatGPT chatbot two years ago, technology companies, whose valuations have benefited greatly from the AI boom, have publicly maintained that “scaling up” current models through adding more data and computing power will consistently lead to improved AI models. But now, some of the most prominent AI scientists are speaking out on the limitations of this “bigger is better” philosophy.
Ilya Sutskever, co-founder of AI labs Safe Superintelligence (SSI) and OpenAI, told Reuters recently that results from scaling up pre-training - the phase of training an AI model that uses a vast amount of unlabeled data to understand language patterns and structures - have plateaued.
Sutskever is widely credited as an early advocate of achieving massive leaps in generative AI advancement through the use of more data and computing power in pre-training, which eventually created ChatGPT. Sutskever left OpenAI earlier this year to found SSI.
“The 2010s were the age of scaling, now we're back in the age of wonder and discovery once again. Everyone is looking for the next thing,” Sutskever said. “Scaling the right thing matters more now than ever.”
Sutskever declined to share more details on how his team is addressing the issue, other than saying SSI is working on an alternative approach to scaling up pre-training.
Behind the scenes, researchers at major AI labs have been running into delays and disappointing outcomes in the race to release a large language model that outperforms OpenAI’s GPT-4 model, which is nearly two years old, according to three sources familiar with private matters.
The so-called ‘training runs’ for large models can cost tens of millions of dollars by simultaneously running hundreds of chips. They are more likely to have hardware-induced failure given how complicated the system is; researchers may not know the eventual performance of the models until the end of the run, which can take months.
Another problem is that large language models gobble up huge amounts of data, and AI models have exhausted all the easily accessible data in the world. Power shortages have also hindered the training runs, as the process requires vast amounts of energy.
To overcome these challenges, researchers are exploring “test-time compute,” a technique that enhances existing AI models during the so-called “inference” phase, or when the model is being used. For example, instead of immediately choosing a single answer, a model could generate and evaluate multiple possibilities in real time, ultimately choosing the best path forward.
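To make the idea concrete, here is a minimal sketch of one common form of test-time compute, best-of-N sampling: the model is queried several times and a scoring function keeps the strongest candidate. The function names and the toy model below are illustrative assumptions, not OpenAI's actual o1 implementation.

```python
# A conceptual sketch of test-time compute via best-of-N sampling.
# `sample` and `score` are hypothetical stand-ins for a stochastic
# language model and an answer verifier; they are NOT OpenAI APIs.
import random
from typing import Callable, List


def best_of_n(prompt: str, n: int,
              sample: Callable[[str], str],
              score: Callable[[str, str], float]) -> str:
    """Spend extra compute at inference: draw n candidate answers,
    then return the one the scoring function rates highest. Larger n
    means more 'thinking' time traded for answer quality."""
    candidates: List[str] = [sample(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))


if __name__ == "__main__":
    # Toy demo: a 'model' that guesses at 7 * 6 and a verifier that
    # prefers guesses closer to the true answer, 42.
    def toy_sample(prompt: str) -> str:
        return str(random.randint(30, 50))

    def toy_score(prompt: str, answer: str) -> float:
        return -abs(int(answer) - 42)  # higher is better

    print(best_of_n("What is 7 * 6?", n=16,
                    sample=toy_sample, score=toy_score))
```

The design point is that answer quality can improve with n at inference time, without retraining the underlying model.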
This method allows models to dedicate more processing power to challenging tasks like math or coding problems or complex operations that demand human-like reasoning and decision-making.
“It turned out that having a bot think for just 20 seconds in a hand of poker got the same boosting performance as scaling up the model by 100,000x and training it for 100,000 times longer,” said Noam Brown, a researcher at OpenAI who worked on o1, at the TED AI conference in San Francisco last month.
OpenAI has embraced this technique in their newly released model known as “o1,” formerly known as Q* and Strawberry, which Reuters first reported in July. The o1 model can "think" through problems in a multi-step manner, similar to human reasoning. It also involves using data and feedback curated from PhDs and industry experts. The secret sauce of the o1 series is another set of training carried out on top of ‘base’ models like GPT-4, and the company says it plans to apply this technique with more and bigger base models.
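For readers unfamiliar with training "on top of" a base model, the sketch below shows the general shape of such a second training stage: a pretrained model is further trained on a smaller, curated dataset. Everything here (the toy model, the stand-in curated data) is a generic illustration under assumed PyTorch usage, not OpenAI's pipeline.

```python
# Generic illustration of a second training stage applied to a 'base'
# model, using a toy linear model in PyTorch. NOT OpenAI's method;
# the curated dataset and model here are placeholders.
import torch
from torch import nn, optim

# Stand-in 'base' model: pretend its weights come from pre-training.
base_model = nn.Linear(8, 1)

# Stand-in curated data: e.g., expert-labeled (input, target) pairs.
inputs = torch.randn(64, 8)
targets = torch.randn(64, 1)

# Continue training the base weights on the curated set, typically
# with a small learning rate so the pre-training isn't erased.
optimizer = optim.SGD(base_model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(base_model(inputs), targets)
    loss.backward()
    optimizer.step()
```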
At the same time, researchers at other top AI labs, from Anthropic, xAI, and Google DeepMind, have also been working to develop their own versions of the technique, according to five people familiar with the efforts.
“We see a lot of low-hanging fruit that we can go pluck to make these models better very quickly,” said Kevin Weil, chief product officer at OpenAI, at a tech conference in October. “By the time people do catch up, we're going to try and be three more steps ahead.”
Google and xAI did not respond to requests for comment and Anthropic had no immediate comment.
The implications could alter the competitive landscape for AI hardware, thus far dominated by insatiable demand for Nvidia’s AI chips. Prominent venture capital investors, from Sequoia to Andreessen Horowitz, who have poured billions to fund the expensive development of AI models at multiple AI labs including OpenAI and xAI, are taking notice of the transition and weighing the impact on their expensive bets.
“This shift will move us from a world of massive pre-training clusters toward inference clouds, which are distributed, cloud-based servers for inference,” Sonya Huang, a partner at Sequoia Capital, told Reuters.
Demand for Nvidia’s AI chips, which are the most cutting-edge, has fueled its rise to becoming the world’s most valuable company, surpassing Apple in October. Unlike training chips, where Nvidia dominates, the chip giant could face more competition in the inference market.
Asked about the possible impact on demand for its products, Nvidia pointed to recent company presentations on the importance of the technique behind the o1 model. Its CEO Jensen Huang has talked about increasing demand for using its chips for inference.
"We've w discovered a second scaling law, and this is scaling law at a time of inference...All of se factors have led to demand for Blackwell being incredibly high," Huang said last month at a conference in India, referring to company's latest AI chip.