
Technology

Meta's Movie Gen model produces realistic video with sound, so we can finally have infinite Moo Deng


No one really knows yet what generative video models are good for, but that hasn't stopped companies like Runway, OpenAI, and Meta from pouring millions into their development. Meta's latest is called Movie Gen, and true to its name, it turns text prompts into relatively realistic video with sound… though thankfully no voice just yet. And wisely, the company isn't releasing it publicly.

Movie Gen is actually a collection (or "cast," as Meta calls it) of foundation models, the largest of which is the text-to-video piece. Meta claims it outperforms the likes of Runway Gen-3, the latest LumaLabs release, and Kling 1.5, though as always, this kind of comparison shows more that they are all playing the same game than that Movie Gen is clearly winning. The technical details can be found in Meta's paper describing all the components.

Sound is generated to match the contents of the video, adding, for instance, engine noises that correspond to a car's movements, the rush of a waterfall in the background, or a crack of thunder partway through the video when it's called for. It will even add music if that seems appropriate.

It was trained on "a combination of licensed and publicly available datasets" that Meta called "proprietary/commercially sensitive" and declined to detail further. We can only guess that means lots of Instagram and Facebook videos, plus some partner materials and plenty of other content that isn't adequately shielded from scrapers – i.e., "publicly available."

However, Meta is clearly aiming here not just to claim the "state of the art" crown for a month or two, but for a practical, soup-to-nuts approach in which a fairly simple input – a natural-language prompt – can be turned into a solid end product. Things like "imagine me as a baker baking a shiny hippopotamus-shaped cake during a storm."

For instance, one of the sticking points with these video generators is how difficult they tend to be to edit. If you ask for a video of a person crossing the street, then realize you want them walking right to left instead of left to right, there's a good chance the whole shot will look different when you repeat the prompt with that added instruction. Meta adds a simple, text-based editing method where you can just say "change the background to a busy intersection" or "change her clothes to a red dress," and the model attempts to make that change – and only that change.

Image credits: Meta

Camera movements are also generally understood, with things like "tracking shot" and "pan left" taken into account when generating the video. It's still pretty clumsy compared with real camera controls, but it's a lot better than nothing.

The model's limits are a little odd. It generates video 768 pixels wide, a dimension familiar to most from the famous-but-outdated 1024×768 resolution, but which is also three times 256, so it plays well with other HD formats. The Movie Gen system upscales this to 1080p, which is the source of the claim that it produces video at that resolution. Not strictly true, but we'll give them a pass because upscaling is surprisingly effective.

Oddly enough, it generates up to 16 seconds of video… at 16 frames per second, a frame rate no one in history has ever wanted or asked for. That said, it can also do 10 seconds of video at 24 fps. Lead with that!
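
As a side note, those two modes work out to nearly the same total number of frames, which may be why the unusual 16 fps option exists at all. Here is the quick arithmetic on the figures quoted above (a rough back-of-envelope sketch in Python, not anything taken from Meta's paper):

# Rough arithmetic on the numbers quoted above (assumed figures, not a published spec)
width = 768
print(width / 256)   # 3.0 -> the native width is a clean multiple of 256
print(16 * 16)       # 256 frames in the 16-second, 16 fps clip
print(10 * 24)       # 240 frames in the 10-second, 24 fps clip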

As for why it doesn't do voice… well, there are probably two reasons. First, it's very hard. Generating speech is easy now, but matching it to lip movements, and those lip movements to faces, is a far more complicated proposition. I don't blame them for leaving that until later, since it would be a minute-one failure case. Someone would say "generate a clown delivering the Gettysburg Address while riding around on a tiny bicycle" – nightmare fuel primed to go viral.

The second reason is probably political: releasing what amounts to a deepfake generator a month before a major election is… not the best look. A practical preventive step is to slightly curtail its capabilities so that malicious actors would have to put in real work to misuse it. You could certainly combine this generative model with a speech generator and an open lip-syncing model, but you can't just generate a candidate making wild claims.

"Movie Gen is currently purely an AI research concept, and even at this early stage, safety is a top priority, as it is with all of our generative AI technologies," a Meta representative said in response to TechCrunch's questions.

Unlike, say, the Llama series of large language models, Movie Gen won't be made publicly available. You can replicate its techniques to some extent by following the research paper, but the code won't be published, apart from the "baseline evaluation prompt dataset," a record of the prompts used to generate the test videos.

This article was originally published on techcrunch.com

Technology

Even the “Godmother of AI” has no idea what AGI is


Are you confused about artificial general intelligence, or AGI? It's the thing OpenAI is obsessed with ultimately creating in a way that "benefits all of humanity." That goal may be worth taking seriously, since the company just raised $6.6 billion to get closer to it.

But if you're still wondering what the heck AGI is, you're not alone.

During a wide-ranging discussion at the Credo AI Leadership Summit on Thursday, Fei-Fei Li, a world-renowned researcher often called the "godmother of AI," said she doesn't know what AGI is either. Elsewhere in the conversation, Li discussed her role in the birth of modern AI, how society should protect itself from advanced AI models, and why she thinks her new unicorn startup, World Labs, will change everything.

But when asked what she considered the “AI singularity,” Li was as confused as the rest of us.

"I come from academic AI and was educated in more rigorous and evidence-based methods, so I don't really know what all these words mean," Li told a packed room in San Francisco, next to a large window overlooking the Golden Gate Bridge. "I frankly don't even know what AGI means. Like people say you know it when you see it, I guess I haven't seen it. The truth is that I don't spend a lot of time thinking about these words because I think there are so many more important things to do…"

If anyone ought to know what AGI is, it's probably Fei-Fei Li. In 2006, she created ImageNet, the world's first big AI training and benchmarking dataset, which was instrumental in catalyzing the current AI boom. From 2017 to 2018, she served as Chief Scientist of AI/ML at Google Cloud. Today, Li leads Stanford's Human-Centered AI Institute (HAI), and her startup World Labs is building "large world models." (That term is almost as confusing as AGI, if you ask me.)

OpenAI CEO Sam Altman took a shot at defining AGI in a New Yorker profile last year, describing it as "the equivalent of a median human that you could hire as a co-worker."

Meanwhile, OpenAI's charter defines AGI as "highly autonomous systems that outperform humans at most economically valuable work."

Apparently those definitions weren't good enough for a $157 billion company to work with, so OpenAI created five levels it uses internally to gauge its progress toward AGI. The first level is chatbots (like ChatGPT), then reasoners (apparently OpenAI o1 was this level), agents (supposedly coming next), innovators (AI that can help invent things), and the final level, organizations (AI that can do the work of an entire organization).

Still confused? So am I, and so is Li. Also, that sounds like a whole lot more than the average co-worker could do.

Earlier in the conversation, Li said she has been fascinated by the idea of intelligence since a young age. That led her to study AI long before it was profitable. Li says she and a few others were quietly laying the foundations for the field in the early 2000s.

"In 2012, my ImageNet combined with AlexNet and GPUs – many people call that the birth of modern AI. It was driven by three key ingredients: big data, neural networks, and modern GPU computing. And once that moment came, I think life was never the same for the whole field of AI, or for our world."

When asked about SB 1047, California's controversial AI bill, Li spoke carefully, so as not to reignite a controversy that Governor Newsom had just put to rest by vetoing the bill last week. (We recently spoke with the author of SB 1047, and he was more interested in reopening his argument with Li.)

"Some of you may know that I have been very vocal about my concerns around this bill [SB 1047], which was vetoed, but right now I am thinking deeply, and with great excitement, about looking to the future," Li said. "I was very honored, indeed honored, that Governor Newsom invited me to participate in the next steps after SB 1047."

California's governor recently tapped Li and other AI experts to form a task force to help the state develop guardrails for deploying AI. Li said she will take an evidence-based approach in this role and will do her best to advocate for academic research and funding. But she also wants to make sure California doesn't punish technologists.

"We need to really look at the potential impact on people and our communities rather than putting the blame on the technology itself… It wouldn't make sense for us to penalize a car engineer – say, Ford or GM – if a car is misused, intentionally or unintentionally, and harms a person. Just punishing the car engineer won't make cars safer. What we need to do is keep innovating on safety measures, but also improve the regulatory framework – whether it's seatbelts or speed limits – and the same goes for AI."

It's one of the better arguments I've heard against SB 1047, which would have penalized tech companies for dangerous AI models.

While Li advises California on AI regulation, she is also running her startup, World Labs, in San Francisco. It's Li's first time founding a startup, and she is one of the few women leading a cutting-edge AI lab.

“We are far from a very diverse artificial intelligence ecosystem,” Li said. “I believe that diverse human intelligence will lead to diverse artificial intelligence and simply give us better technology.”

She's excited to bring "spatial intelligence" closer to reality in the next few years. Li argues that the development of human language, which today's large language models are built on, probably took a million years, whereas vision and perception likely took 540 million years. That means creating large world models is a far more complicated task.

"It's not just about enabling computers to see, but really understanding the whole 3D world, which I call spatial intelligence," Li said. "We don't just see to name things… We really see to do things, to navigate the world, to interact with each other, and closing that gap between seeing and doing requires spatial knowledge. As a technologist, I'm very excited about that."

This article was originally published on techcrunch.com

Technology

Investors are scrambling to get into ElevenLabs, which could soon be valued at $3 billion


Illustration of a person speaking and a computer process changing the words.

As TechCrunch has learned, ElevenLabs, a startup that creates AI tools for audio applications, is being approached by existing and new investors about a new round that could value the company at up to $3 billion.

The two-year-old company specializes in AI tools that generate synthetic voices for audiobook narration and dub videos into other languages in real time.

One source at an interested VC firm told TechCrunch that investors are clamoring to get into the fast-growing company and that their firm is willing to offer a valuation of up to $3 billion, thinking that might be enough to get into the round. The person said a deal would likely come together in the coming weeks.

Investors from two other firms confirmed that ElevenLabs is raising but said they are sitting this deal out. One of those sources heard secondhand that the company's annual recurring revenue (ARR) has grown from $25 million at the end of last year to about $80 million in recent months, making it one of the fastest-growing startups building real-world AI applications. (These investors requested anonymity for competitive reasons.)

If those figures are accurate, it means investors could value ElevenLabs at roughly 38 times its latest ARR. That multiple is only slightly below those of some enterprise-focused companies such as Hebbia and Glean.
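
For a quick sanity check on that multiple, here is the back-of-envelope arithmetic in Python using the figures reported above (both numbers are secondhand and unconfirmed):

# Rumored valuation ceiling and reported ARR, neither confirmed by ElevenLabs
valuation = 3_000_000_000   # up to $3 billion
arr = 80_000_000            # roughly $80 million in annual recurring revenue
print(valuation / arr)      # 37.5 -> the "roughly 38 times ARR" figure cited above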

The lower multiple may stem from the fact that a good portion of revenue comes from consumers using the tools for narration and personal video dubbing. Consumer revenue is generally considered more volatile than revenue from enterprise customers.

The round, if it results in a $3 billion valuation, would triple ElevenLabs' valuation from its January Series B, which was co-led by Andreessen Horowitz, Nat Friedman, and Daniel Gross.

This would be ElevenLabs' third round in just over a year, but TechCrunch could not learn the size of the potential investment because talks with investors are still ongoing. ElevenLabs has already raised $100 million.

While Google's Gemini and OpenAI have introduced their own humanlike voice models, neither offering lets you clone other people's voices the way ElevenLabs does. Other companies targeting the synthetic voice generation market include Murf, Tavus, Resemble AI, Respeecher, and Lovo.

ElevenLabs didn’t respond to a request for comment.

This article was originally published on techcrunch.com

Technology

Google is introducing ads to AI Overviews, expanding AI's role in search


The Google Inc. logo

Google will start showing ads in AI Overviews, the AI-generated summaries it supplies for certain Google Search queries, and will also add links to relevant websites for some of those summaries. AI-organized search results pages will also launch in the US this week.

The growing role of AI in Google's core search engine is aimed at keeping users from defecting to alternatives such as OpenAI's ChatGPT or Perplexity, which use AI to answer many of the questions traditionally put to Google. Perplexity said in May that its worldwide user base had grown to over 85 million web visits – a drop in the bucket compared to Google, but impressive considering Perplexity launched just two years ago.

Since its launch this spring, AI Overviews has been the subject of plenty of controversy, with its dubious claims and advice (like telling people to add glue to pizza) going viral online. A recent report from the SEO platform SE Ranking found that AI Overviews cites sites that are "not completely trustworthy or evidence-based," including outdated research and paid product listings.

The core problem is that AI Overviews sometimes struggles to distinguish whether a source of information is fact or fiction, satire or serious. Over the past few months, Google has made changes to how AI Overviews works, including limiting responses related to current events and health topics. But the company doesn't claim it's perfect.

"We will keep investing in AI Overviews to make them even more useful," Rhiannon Bell, vice president of user experience at Google Search, said at a press conference. "We're doing everything we can to provide our users with relevant content."

Separately, Google says AI Overviews has driven an increase in Google Search engagement, particularly among 18- to 24-year-olds – a key demographic for the company.

Now Google is taking steps to monetize this feature by adding ads.

Image credits: Google

US mobile users will soon see ads in AI Overviews for "relevant queries," such as how to remove grass stains from jeans. Ads labeled "Sponsored" will appear alongside other, unsponsored content in the AI summaries and will be pulled from advertisers' existing Shopping and Google Search campaigns.

Ads in AI Overviews have been available to select users for a while, and according to internal Google data, they've been well received.

"People have found AI advertising useful because it allows them to quickly connect with the right companies, products and services to take the next step exactly when they need it," Shashi Thakur, vice president of Google Ads, wrote in a blog post shared with TechCrunch.

But the ads also crowd the AI summaries. One of the formats, a carousel of sponsored product results, is embedded directly in the AI summary and positioned so that unsponsored content gets pushed further down the screen.

AI-organized search results
Image credits: Google

The new look for AI Overviews arriving alongside the ads adds prominently displayed links to websites that may be relevant. For example, if you search "Do air filters protect your lungs?", the AI Overview might link to an air filter study from the American Lung Association.

The redesign was tested for several months and is now rolling out in regions where AI Overviews were already available, including India, Brazil, Japan, Mexico, the US, and the UK.

Finally, this week in the US, a separate product will debut on mobile devices – search results pages organized by artificial intelligence. Searches for recipes and meal inspiration – like “What are some good vegetarian snacks or dinner ideas that make an impression?” – can display an AI-aggregated page of content from across the web, including forums, articles and YouTube videos.

However, they will not include the AI Overviews ad formats.

“The customized Gemini (model) generates an entire page of relevant and structured results,” Bell explained, referring to Google’s Gemini family of artificial intelligence models. “With AI-organized results pages, we are serving more diverse content formats from a more diverse set of content.”

Google says it plans to expand these pages to other search categories in the coming months.

Publishers may suffer collateral damage.

One study found that AI Overviews could negatively impact roughly 25% of publisher traffic because of the reduced emphasis on website links. On the revenue side, an expert quoted by The New York Post estimated that AI Overviews could cause publisher losses of more than $2 billion through the resulting decline in ad impressions.

AI-generated search results from Google and its competitors don't yet appear to be cutting into traffic for large publishers. In their latest earnings, Ziff Davis and Dotdash Meredith's parent IAC characterized the effects as negligible.

But that may change as Google, which commands over 81% of the global search market, expands AI Overviews and AI-organized pages to more users and queries. According to one estimate, AI Overviews showed up for only about 7% of searches in July, as Google dialed the feature back to make changes.

Google says it continues to take publishers' concerns into account as it works on its AI-powered search features.

This article was originally published on techcrunch.com
