Charlotte Times


Google’s new anything-to-anything AI model is wild

May 26, 2026  Twila Rosenbaum

Google has released a new generative AI model called Omni, which it claims will eventually turn any kind of input (photo, video, text, audio) into any other kind of output. For now, the first version, Omni Flash, focuses on video generation and is available in Google's Flow platform. The promise is ambitious: a single model that understands and transforms content across modalities, potentially simplifying content creation. But early tests reveal a mix of impressive advances and persistent flaws.

What Is Omni?

Omni represents Google's latest step in generative AI following models like Gemini and Veo. While Gemini handles multimodal understanding (text, images, audio, video), Omni aims to go further by generating output in any format from any input. The company describes it as an "anything-to-anything" model. The first deployment, Omni Flash, specializes in video creation and editing. Users can upload a video and combine it with text prompts to generate new scenes or modify existing ones. Google claims Omni incorporates more real-world knowledge and maintains better character consistency than previous models like Veo 3.

This approach differs from earlier models that required separate systems for image, video, or text generation. By unifying capabilities, Google hopes to streamline workflows for content creators. However, the technology remains in its early stages, and practical limitations are evident.

Testing Omni: The Buddy Experiment

To evaluate Omni's claims, a reviewer recreated a previous experiment using a plush deer named Buddy. In earlier tests with Veo 3, generating videos of Buddy on vacation produced inconsistent results. With Omni, some clips showed marked improvement: Buddy maintained his appearance and actions more faithfully. For example, a video of Buddy skydiving was more coherent than before, though it still contained jarring moments where his orientation suddenly flipped.

Another test asked Omni to create a montage of Buddy packing for a cruise, with a humorous twist. The model generated a sequence where Buddy packs a jar of honey, then later squirts it on his hoof as if it were sunscreen. While the concept worked, the honey container changed appearance multiple times throughout the video, from jar to clear bottle to squeeze bottle. The final frame seemed to blend elements nonsensically, revealing the model's struggle with long-term consistency.

These results highlight a core challenge: generating videos that maintain logical coherence over time. While Omni handles short, simple prompts better than its predecessor, complex narratives or extended sequences still break down. The model's "real-world knowledge" appears in flashes, such as staging the gag of honey standing in for sunscreen, but lacks the depth to keep details stable.

Deepfake Capabilities: Convincing but Unsettling

Perhaps the most striking test involved deepfaking a human subject. Starting from a selfie video with a neutral expression, Omni generated clips of the person eating spaghetti, sitting on an airplane, and posing in front of the Eiffel Tower with a baguette. The results were surprisingly realistic. The pasta-eating clip fooled the reviewer's spouse into thinking it was real, with only the unfamiliar bowl as a clue. The Eiffel Tower clip was convincing enough that only repeated viewing revealed subtle AI tells, like an extra ponytail appearing or a background character doubling.

These deepfakes underscore how far generative video has come. The reviewer described feeling "exhausted" by the ease of creating convincing fakes, noting that previous encounters with synthetic media had already dulled the shock. The technology has reached a point where it can deceive people who know the subject intimately. This raises serious questions about misinformation, consent, and the erosion of trust in visual media.

Google has implemented guardrails such as watermarks and usage policies, but the underlying capability is now accessible to anyone with a $20 monthly subscription and a few minutes of effort. The line between harmless fun and harmful deception is increasingly thin.

Cost and Accessibility

Using Omni is not free. Video generation consumes credits, with costs ranging from 15 to 40 credits per clip depending on length and complexity. Editing an existing video costs 40 credits. The $20-per-month AI Pro plan includes 1,000 credits, enough for roughly 20 clips with minimal edits if most clips land near the top of that price range. Heavy users will quickly run out or face extra costs. This pricing model limits experimentation and means that achieving a polished result may require multiple iterations, each consuming credits.
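
To make the credit math concrete, here is a rough back-of-the-envelope sketch in Python. The credit figures are the ones quoted above; the function name, the constant names, and the idea of averaging a fixed number of edits per clip are illustrative assumptions, not anything Google publishes.

```python
# Credit figures as quoted in the article; all names here are illustrative.
MONTHLY_CREDITS = 1_000   # included in the $20-per-month AI Pro plan
CLIP_COST_MIN = 15        # cheapest clip
CLIP_COST_MAX = 40        # most expensive clip
EDIT_COST = 40            # editing an existing video


def clips_affordable(budget: int, clip_cost: int, edits_per_clip: float = 0.0) -> int:
    """Return how many whole clips fit in the budget.

    Each clip is assumed to cost clip_cost credits plus an average of
    edits_per_clip edits at EDIT_COST credits each.
    """
    cost_per_clip = clip_cost + edits_per_clip * EDIT_COST
    return int(budget // cost_per_clip)


if __name__ == "__main__":
    print(clips_affordable(MONTHLY_CREDITS, CLIP_COST_MIN))        # 66: cheapest clips, no edits
    print(clips_affordable(MONTHLY_CREDITS, CLIP_COST_MAX))        # 25: priciest clips, no edits
    print(clips_affordable(MONTHLY_CREDITS, CLIP_COST_MAX, 0.25))  # 20: one edit per four clips
    print(clips_affordable(MONTHLY_CREDITS, CLIP_COST_MAX, 1))     # 12: one edit per clip
```

Under these assumptions, the "roughly 20 clips" figure corresponds to top-priced clips with about one edit for every four clips. A single edit on every clip would cut the monthly total to 12.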

The economic barrier may prevent casual users from exploring the technology, but it does little to deter malicious actors. With sufficient resources, generating deceptive content becomes trivial. The democratization of deepfakes continues apace.

Context: Google's AI Evolution

Google's pursuit of generative AI has accelerated since the launch of Gemini in 2023. The company has integrated AI across its products, from search to Photos to Workspace. Omni builds on lessons learned from earlier generative models like Veo for video and Imagen for images. Each iteration improves realism and reduces artifacts, yet fundamental challenges persist: maintaining consistency, avoiding "hallucinations" of objects, and ensuring safe use.

The race among tech giants — including OpenAI, Meta, and Microsoft — has pushed the boundaries of what AI can create. Google's advantage lies in its vast data resources and infrastructure, but it also faces scrutiny over responsible deployment. The Omni release includes safety filters and content credentials, but critics argue that speed of release outpaces safeguards.

Industry observers note that "anything-to-anything" generation could revolutionize fields like advertising, education, and entertainment. Instead of hiring studios to produce videos, a single prompt could generate a commercial. But the same tool could produce non-consensual deepfakes or political propaganda. The dual-use dilemma is acute.

Technical Challenges and User Experience

Despite improvements, Omni remains far from reliable. Text-based editing works better than it did in Veo 3, but results are often unpredictable. For instance, asking Omni to emphasize facial expressions in a video made Buddy look strange or gave him antlers he doesn't have. Removing the antlers from one scene caused them to appear in all the others. Such erratic behavior frustrates users trying to achieve a specific vision.

The model also struggles with physics and causality. Objects change shape, backgrounds warp, and actions lack the smooth logic of real-world physics. These glitches are hallmarks of current generative video and serve as telltale signs for careful viewers. For casual scrolling on social media, however, many clips may pass as authentic.

Google continues to refine Omni, and future versions may overcome these limitations. The company has not announced when Omni Flash will expand beyond video or whether the credits system will change. For now, early adopters can explore its capabilities but should expect imperfect results.

The Bigger Picture: Ethical Implications

The ability to generate realistic video from a selfie has profound implications. Identity theft, blackmail, and disinformation become easier. Legal frameworks, such as America's NO FAKES Act or the EU AI Act, are still catching up. Meanwhile, companies like Google are developing detection tools and provenance systems, but these are reactive measures.

Educational efforts must also ramp up. Public awareness of deepfakes remains low, and many people trust video evidence implicitly. The onus is on both creators and consumers to question what they see. Omni's release is a reminder that synthetic media is no longer science fiction; it is a practical tool available today.

In the hands of artists and storytellers, generative AI can unlock new creative possibilities. But the same technology, wielded irresponsibly, can undermine reality. The wildness of Google's new model lies not just in its technical achievement but in the societal challenges it amplifies.


Source: The Verge News

