
Technology

Meta's Movie Gen model produces realistic video with sound, so we can finally have infinite Moo Deng


No one really knows yet what generative video models are good for, but that hasn't stopped companies like Runway, OpenAI, and Meta from pouring millions into their development. Meta's latest is called Movie Gen, and true to its name, it turns text prompts into relatively realistic video with sound… though thankfully no voice just yet. And wisely, the company isn't releasing it publicly.

Movie Gen is actually a collection (or "cast," as Meta calls it) of foundation models, the largest of which is the text-to-video piece. Meta claims it outperforms the likes of Runway Gen-3, the latest LumaLabs release, and Kling 1.5, though as always, this kind of comparison shows more that they are all playing the same game than that Movie Gen is clearly winning. The technical details can be found in Meta's paper describing all the components.

Sound is generated to match the contents of the video, adding, for instance, engine noises that correspond to a car's movements, the rush of a waterfall in the background, or a crack of thunder partway through the video when it's called for. It will even add music if that seems appropriate.

It was trained on "a combination of licensed and publicly available datasets" that Meta called "proprietary/commercially sensitive" and declined to detail further. We can only guess that means lots of Instagram and Facebook videos, plus some partner materials and plenty of other content that isn't adequately shielded from scrapers – i.e., "publicly available."

However, Meta is clearly aiming here not just to claim the "state of the art" crown for a month or two, but for a practical, soup-to-nuts approach in which a fairly simple input – a natural-language prompt – can be turned into a solid end product. Things like "imagine me as a baker baking a shiny hippopotamus-shaped cake during a storm."

For instance, one of the sticking points with these video generators is how difficult they tend to be to edit. If you ask for a video of a person crossing the street, then realize you want them walking right to left instead of left to right, there's a good chance the whole shot will look different when you repeat the prompt with that added instruction. Meta adds a simple, text-based editing method where you can just say "change the background to a busy intersection" or "change her clothes to a red dress," and the model attempts to make that change – and only that change.

Image credits: Meta

Camera movements are also generally understood, with things like "tracking shot" and "pan left" taken into account when generating the video. It's still pretty clumsy compared with real camera controls, but it's a lot better than nothing.

The model's limits are a little odd. It generates video 768 pixels wide, a dimension familiar to most from the famous-but-outdated 1024×768 resolution, but which is also three times 256, so it plays well with other HD formats. The Movie Gen system upscales this to 1080p, which is the source of the claim that it produces video at that resolution. Not strictly true, but we'll give them a pass because upscaling is surprisingly effective.

Oddly enough, it generates up to 16 seconds of video… at 16 frames per second, a frame rate no one in history has ever wanted or asked for. That said, it can also do 10 seconds of video at 24 fps. Lead with that!
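
As a side note, those two modes work out to nearly the same total number of frames, which may be why the unusual 16 fps option exists at all. Here is the quick arithmetic on the figures quoted above (a rough back-of-envelope sketch in Python, not anything taken from Meta's paper):

# Rough arithmetic on the numbers quoted above (assumed figures, not a published spec)
width = 768
print(width / 256)   # 3.0 -> the native width is a clean multiple of 256
print(16 * 16)       # 256 frames in the 16-second, 16 fps clip
print(10 * 24)       # 240 frames in the 10-second, 24 fps clip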

As for why it doesn't do voice… well, there are probably two reasons. First, it's very hard. Generating speech is easy now, but matching it to lip movements, and those lip movements to faces, is a far more complicated proposition. I don't blame them for leaving that until later, since it would be a minute-one failure case. Someone would say "generate a clown delivering the Gettysburg Address while riding around on a tiny bicycle" – nightmare fuel primed to go viral.

The second reason is probably political: releasing what amounts to a deepfake generator a month before a major election is… not the best look. A practical preventive step is to slightly curtail its capabilities so that malicious actors would have to put in real work to misuse it. You could certainly combine this generative model with a speech generator and an open lip-syncing model, but you can't just generate a candidate making wild claims.

"Movie Gen is currently purely an AI research concept, and even at this early stage, safety is a top priority, as it is with all of our generative AI technologies," a Meta representative said in response to TechCrunch's questions.

Unlike, say, the Llama series of large language models, Movie Gen won't be made publicly available. You can replicate its techniques to some extent by following the research paper, but the code won't be published, apart from the "baseline evaluation prompt dataset," a record of the prompts used to generate the test videos.

This article was originally published on techcrunch.com

Technology

Even the “Godmother of AI” has no idea what AGI is


Are you confused about artificial general intelligence, or AGI? It's the thing OpenAI is obsessed with ultimately creating in a way that "benefits all of humanity." That goal may be worth taking seriously, since the company just raised $6.6 billion to get closer to it.

But if you're still wondering what the heck AGI is, you're not alone.

During a wide-ranging discussion at the Credo AI Leadership Summit on Thursday, Fei-Fei Li, a world-renowned researcher often called the "godmother of AI," said she doesn't know what AGI is either. Elsewhere in the conversation, Li discussed her role in the birth of modern AI, how society should protect itself from advanced AI models, and why she thinks her new unicorn startup, World Labs, will change everything.

But when asked what she considered the “AI singularity,” Li was as confused as the rest of us.

"I come from academic AI and was educated in more rigorous and evidence-based methods, so I don't really know what all these words mean," Li told a packed room in San Francisco, next to a large window overlooking the Golden Gate Bridge. "I frankly don't even know what AGI means. Like people say you know it when you see it, I guess I haven't seen it. The truth is that I don't spend a lot of time thinking about these words because I think there are so many more important things to do…"

If anyone ought to know what AGI is, it's probably Fei-Fei Li. In 2006, she created ImageNet, the world's first big AI training and benchmarking dataset, which was instrumental in catalyzing the current AI boom. From 2017 to 2018, she served as Chief Scientist of AI/ML at Google Cloud. Today, Li leads Stanford's Human-Centered AI Institute (HAI), and her startup World Labs is building "large world models." (That term is almost as confusing as AGI, if you ask me.)

OpenAI CEO Sam Altman took a shot at defining AGI in a New Yorker profile last year, describing it as "the equivalent of a median human that you could hire as a co-worker."

Meanwhile, OpenAI's charter defines AGI as "highly autonomous systems that outperform humans at most economically valuable work."

Apparently those definitions weren't good enough for a $157 billion company to work with, so OpenAI created five levels it uses internally to gauge its progress toward AGI. The first level is chatbots (like ChatGPT), then reasoners (apparently OpenAI o1 was this level), agents (supposedly coming next), innovators (AI that can help invent things), and the final level, organizations (AI that can do the work of an entire organization).

Still confused? So am I, and so is Li. Also, that sounds like a whole lot more than the average co-worker could do.

Earlier in the conversation, Li said she has been fascinated by the idea of intelligence since a young age. That led her to study AI long before it was profitable. Li says she and a few others were quietly laying the foundations for the field in the early 2000s.

"In 2012, my ImageNet combined with AlexNet and GPUs – many people call that the birth of modern AI. It was driven by three key ingredients: big data, neural networks, and modern GPU computing. And once that moment came, I think life was never the same for the whole field of AI, or for our world."

When asked about SB 1047, California's controversial AI bill, Li spoke carefully, so as not to reignite a controversy that Governor Newsom had just put to rest by vetoing the bill last week. (We recently spoke with the author of SB 1047, and he was more interested in reopening his argument with Li.)

"Some of you may know that I have been very vocal about my concerns around this bill [SB 1047], which was vetoed, but right now I am thinking deeply, and with great excitement, about looking to the future," Li said. "I was very honored, indeed honored, that Governor Newsom invited me to participate in the next steps after SB 1047."

California's governor recently tapped Li and other AI experts to form a task force to help the state develop guardrails for deploying AI. Li said she will take an evidence-based approach in this role and will do her best to advocate for academic research and funding. But she also wants to make sure California doesn't punish technologists.

"We need to really look at the potential impact on people and our communities rather than putting the blame on the technology itself… It wouldn't make sense for us to penalize a car engineer – say, Ford or GM – if a car is misused, intentionally or unintentionally, and harms a person. Just punishing the car engineer won't make cars safer. What we need to do is keep innovating on safety measures, but also improve the regulatory framework – whether it's seatbelts or speed limits – and the same goes for AI."

It's one of the better arguments I've heard against SB 1047, which would have penalized tech companies for dangerous AI models.

While Li advises California on AI regulation, she is also running her startup, World Labs, in San Francisco. It's Li's first time founding a startup, and she is one of the few women leading a cutting-edge AI lab.

“We are far from a very diverse artificial intelligence ecosystem,” Li said. “I believe that diverse human intelligence will lead to diverse artificial intelligence and simply give us better technology.”

She's excited to bring "spatial intelligence" closer to reality in the next few years. Li argues that the development of human language, which today's large language models are built on, probably took a million years, whereas vision and perception likely took 540 million years. That means creating large world models is a far more complicated task.

"It's not just about enabling computers to see, but really understanding the whole 3D world, which I call spatial intelligence," Li said. "We don't just see to name things… We really see to do things, to navigate the world, to interact with each other, and closing that gap between seeing and doing requires spatial knowledge. As a technologist, I'm very excited about that."

This article was originally published on techcrunch.com

Technology

Investors are scrambling to get into ElevenLabs, which could soon be valued at $3 billion


Illustration of a person speaking and a computer process changing the words.

As TechCrunch has learned, ElevenLabs, a startup that creates AI tools for audio applications, is being approached by existing and new investors about a new round that could value the company at up to $3 billion.

The two-year-old company specializes in AI tools that generate synthetic voices for audiobook narration and dub videos into other languages in real time.

One source at an interested VC firm told TechCrunch that investors are clamoring to get into the fast-growing company and that their firm is willing to offer a valuation of up to $3 billion, thinking that might be enough to get into the round. The person said a deal would likely come together in the coming weeks.

Investors from two other firms confirmed that ElevenLabs is raising but said they are sitting this deal out. One of those sources heard secondhand that the company's annual recurring revenue (ARR) has grown from $25 million at the end of last year to about $80 million in recent months, making it one of the fastest-growing startups building real-world AI applications. (These investors requested anonymity for competitive reasons.)

If those figures are accurate, it means investors could value ElevenLabs at roughly 38 times its latest ARR. That multiple is only slightly below those of some enterprise-focused companies such as Hebbia and Glean.
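
For a quick sanity check on that multiple, here is the back-of-envelope arithmetic in Python using the figures reported above (both numbers are secondhand and unconfirmed):

# Rumored valuation ceiling and reported ARR, neither confirmed by ElevenLabs
valuation = 3_000_000_000   # up to $3 billion
arr = 80_000_000            # roughly $80 million in annual recurring revenue
print(valuation / arr)      # 37.5 -> the "roughly 38 times ARR" figure cited above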

The lower multiple may stem from the fact that a good portion of revenue comes from consumers using the tools for narration and personal video dubbing. Consumer revenue is generally considered more volatile than revenue from enterprise customers.

The round, if it results in a $3 billion valuation, would triple ElevenLabs' valuation from its January Series B, which was co-led by Andreessen Horowitz, Nat Friedman, and Daniel Gross.

This would be ElevenLabs' third round in just over a year, but TechCrunch could not learn the size of the potential investment because talks with investors are still ongoing. ElevenLabs has already raised $100 million.

While Google's Gemini and OpenAI have introduced their own humanlike voice models, neither offering lets you clone other people's voices the way ElevenLabs does. Other companies targeting the synthetic voice generation market include Murf, Tavus, Resemble AI, Respeecher, and Lovo.

ElevenLabs didn’t respond to a request for comment.

This article was originally published on techcrunch.com

Technology

Google is introducing ads to AI Overviews, expanding AI's role in search


The Google Inc. logo

Google will start showing ads in AI Overviews, the AI-generated summaries it supplies for certain Google Search queries, and will also add links to relevant websites for some of those summaries. AI-organized search results pages will also launch in the US this week.

The growing role of AI in Google's core search engine is aimed at keeping users from defecting to alternatives such as OpenAI's ChatGPT or Perplexity, which use AI to answer many of the questions traditionally put to Google. Perplexity said in May that its worldwide user base had grown to over 85 million web visits – a drop in the bucket compared to Google, but impressive considering Perplexity launched just two years ago.

Since its launch this spring, AI Overviews has been the subject of plenty of controversy, with its dubious claims and advice (like telling people to add glue to pizza) going viral online. A recent report from the SEO platform SE Ranking found that AI Overviews cites sites that are "not completely trustworthy or evidence-based," including outdated research and paid product listings.

The core problem is that AI Overviews sometimes struggles to distinguish whether a source of information is fact or fiction, satire or serious. Over the past few months, Google has made changes to how AI Overviews works, including limiting responses related to current events and health topics. But the company doesn't claim it's perfect.

"We will keep investing in AI Overviews to make them even more useful," Rhiannon Bell, vice president of user experience at Google Search, said at a press conference. "We're doing everything we can to provide our users with relevant content."

Separately, Google says AI Overviews has driven an increase in Google Search engagement, particularly among 18- to 24-year-olds – a key demographic for the company.

Now Google is taking steps to monetize this feature by adding ads.

Image credits: Google

US mobile users will soon see ads in AI Overviews for "relevant queries," such as how to remove grass stains from jeans. Ads labeled "Sponsored" will appear alongside other, unsponsored content in the AI summaries and will be pulled from advertisers' existing Shopping and Google Search campaigns.

Ads in AI Overviews have been available to select users for a while, and according to internal Google data, they've been well received.

"People have found AI advertising useful because it allows them to quickly connect with the right companies, products and services to take the next step exactly when they need it," Shashi Thakur, vice president of Google Ads, wrote in a blog post shared with TechCrunch.

But the ads also crowd the AI summaries. One of the formats, a carousel of sponsored product results, is embedded directly in the AI summary and positioned so that unsponsored content gets pushed further down the screen.

AI-organized search results
Image credits: Google

The new look for AI Overviews arriving alongside the ads adds prominently displayed links to websites that may be relevant. For example, if you search "Do air filters protect your lungs?", the AI Overview might link to an air filter study from the American Lung Association.

The redesign was tested for several months and is now rolling out in regions where AI Overviews were already available, including India, Brazil, Japan, Mexico, the US, and the UK.

Finally, this week in the US, a separate product will debut on mobile devices – search results pages organized by artificial intelligence. Searches for recipes and meal inspiration – like “What are some good vegetarian snacks or dinner ideas that make an impression?” – can display an AI-aggregated page of content from across the web, including forums, articles and YouTube videos.

However, they will not include the AI Overviews ad formats.

“The customized Gemini (model) generates an entire page of relevant and structured results,” Bell explained, referring to Google’s Gemini family of artificial intelligence models. “With AI-organized results pages, we are serving more diverse content formats from a more diverse set of content.”

Google says it plans to expand these pages to other search categories in the coming months.

Publishers may suffer collateral damage.

One study found that AI Overviews could negatively impact roughly 25% of publisher traffic because of the reduced emphasis on website links. On the revenue side, an expert quoted by The New York Post estimated that AI Overviews could cause publisher losses of more than $2 billion through the resulting decline in ad impressions.

AI-generated search results from Google and its competitors don't yet appear to be cutting into traffic for large publishers. In their latest earnings, Ziff Davis and Dotdash Meredith's parent IAC characterized the effects as negligible.

But that may change as Google, which commands over 81% of the global search market, expands AI Overviews and AI-organized pages to more users and queries. According to one estimate, AI Overviews showed up for only about 7% of searches in July, as Google dialed the feature back to make changes.

Google says it continues to take publishers' concerns into account as it works on its AI-powered search features.

This article was originally published on techcrunch.com
