Meta’s AI chief says world models are the key to ‘human-level artificial intelligence’, but it may take another 10 years
Do today’s artificial intelligence models really remember, think, plan and reason like a human brain would? Some AI labs would tell us that is the case, but according to Meta’s Chief AI Scientist Yann LeCun, the answer is no. He does think we could get there in a decade or so, however, by pursuing a new method called the “world model.”
Earlier this year, OpenAI released a new feature it calls “memory” that allows ChatGPT to “remember” your conversations. The startup’s latest generation of models, o1, displays the word “thinking” while generating results, and OpenAI says the same models are capable of “complex reasoning.”
Everything seems to suggest that we are close to AGI. However, during a recent talk at the Hudson Forum, LeCun pushed back against AI optimists like xAI founder Elon Musk and Google DeepMind co-founder Shane Legg, who suggest that human-level artificial intelligence is just around the corner.
“We need machines that understand the world; (machines) that can remember things, that have intuition, that have common sense, things that can reason and plan at the same level as humans,” LeCun said during the talk. “Despite what you have heard from the most enthusiastic people, current AI systems are not capable of this.”
LeCun argues that today’s large language models, such as those powering ChatGPT and Meta AI, are a far cry from “human-level artificial intelligence.” He later said that humanity could be “years or even decades” away from achieving such a goal. (That doesn’t stop his boss, Mark Zuckerberg, from asking him when AGI will arrive.)
The reason is simple: these LLMs work by predicting the next token (often just a few letters or a short word), and today’s image/video models predict the next pixel. In other words, language models are one-dimensional predictors and AI image/video models are two-dimensional predictors. These models have become quite good at predicting in their respective dimensions, but they don’t really understand the three-dimensional world.
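To make “one-dimensional predictor” concrete, here is a toy Python sketch of next-token prediction: a character-level bigram counter that, given the current character, guesses the most likely next one. It only illustrates the prediction objective, not how any production LLM is actually built.

```python
# Toy illustration of "predict the next token": a character-level bigram
# counter. Real LLMs use neural networks over subword tokens, but the
# training objective is the same kind of next-item prediction.
from collections import Counter, defaultdict

text = "the theory of the thing"

# Count which character tends to follow which.
counts = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    counts[prev][nxt] += 1

def predict_next(ch: str) -> str:
    """Return the character most often seen after `ch` in the training text."""
    return counts[ch].most_common(1)[0][0]

print(predict_next("t"))  # prints "h"
```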
Because of this, modern artificial intelligence systems are unable to perform simple tasks that most humans can. LeCun notes that people learn to clear the dinner table by age 10 and to drive a car by 17, and they learn each in a matter of hours. Yet even the most advanced artificial intelligence systems in the world today, built on thousands or millions of hours of data, cannot operate reliably in the physical world.
To take on more complex tasks, LeCun suggests we need to build models that can perceive the three-dimensional world around us, centered on a new type of artificial intelligence architecture: world models.
“A world model is a mental model of how the world behaves,” he explained. “You can imagine a sequence of actions you might take, and your world model will allow you to predict what effect that sequence of actions will have on the world.”
Consider the “world model” in your own head. For example, imagine you look at a dirty bedroom and want to clean it. You can imagine how collecting all your clothes and putting them away would do the trick. You don’t have to try multiple methods or learn how to clean a room first. Your brain observes the three-dimensional space and creates an action plan that will help you achieve your goal on the first try. That kind of planning is the secret sauce that AI world models promise.
Part of the benefit is that world models can take in far more data than LLMs. That also makes them computationally intensive, which is why cloud service providers are racing to partner with artificial intelligence companies.
World models are a big idea that several artificial intelligence labs are now chasing, and the term is quickly becoming another buzzword attracting venture capital. A group of esteemed AI researchers, including Fei-Fei Li and Justin Johnson, just raised $230 million for their startup, World Labs. The “godmother of AI” and her team are also confident that world models will unlock much smarter AI systems. OpenAI also describes its unreleased Sora video generator as a world model, but doesn’t go into detail.
LeCun outlined the idea of using world models to create human-level artificial intelligence in a 2022 paper on “goal-driven artificial intelligence,” though he notes the concept is more than 60 years old. In short, a base representation of the world (for instance, video of a dirty room) and memory are fed into the world model. The world model then predicts what the world will look like based on that input. You then give the world model objectives, including a changed state of the world you’d like to reach (e.g. a clean room), as well as guardrails to ensure the model doesn’t harm people while achieving the goal (don’t kill me in the process of cleaning my room, please). The world model then finds a sequence of actions to achieve those goals.
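A minimal Python sketch of that loop might look like the following, assuming a learned world model that maps (state, action) to a predicted next state. Every name here (WorldModel, cost, plan) is a hypothetical illustration of the planning loop LeCun describes, not code from Meta or FAIR.

```python
# A minimal sketch of goal-driven planning with a world model.
# States and actions are plain numbers to keep the example short.
import random


class WorldModel:
    """Predicts the next world state given the current state and an action."""

    def predict(self, state: float, action: float) -> float:
        # Placeholder dynamics; a real world model would be a learned network.
        return state + action


def cost(state: float, goal: float) -> float:
    """Distance from the goal; guardrail penalties would also be added here."""
    return abs(goal - state)


def plan(model: WorldModel, start: float, goal: float,
         horizon: int = 3, candidates: int = 200) -> list[float]:
    """Search for the action sequence whose predicted outcome best matches the goal."""
    best_seq, best_cost = None, float("inf")
    for _ in range(candidates):
        seq = [random.uniform(-1.0, 1.0) for _ in range(horizon)]
        state = start
        for action in seq:       # roll the sequence through the world model
            state = model.predict(state, action)
        c = cost(state, goal)    # score the imagined end state against the goal
        if c < best_cost:
            best_seq, best_cost = seq, c
    return best_seq


# Usage: start in a "dirty room" state (0.0) and plan toward a "clean" state (2.5).
print(plan(WorldModel(), start=0.0, goal=2.5))
```

The point of the sketch is the shape of the loop: imagine candidate action sequences, use the world model to predict their consequences, score the predictions against the goal (and guardrails), and pick the best sequence.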
According to LeCun, Meta’s long-term research lab, FAIR (Fundamental AI Research), is actively working on building goal-driven AI and world models. FAIR used to work on AI for Meta’s upcoming products, but LeCun says that in recent years the lab has shifted to focus purely on long-term AI research. LeCun says FAIR doesn’t even use LLMs these days.
World models are an intriguing idea, but LeCun says we haven’t made much progress on making these systems a reality. There are a lot of very hard problems to solve to get from where we are today, and he says it’s certainly more complicated than we think.
“It will be years, if not a decade, before we can get everything here to work,” LeCun said. “Mark Zuckerberg keeps asking me how long it will take.”