Google’s Vertex AI Just Learned to Make Music and Videos, Is Gemini Next?

Leonard Sengere

9 April 2025

Google Cloud’s Vertex AI, a machine learning (ML) platform for enterprise AI development, is receiving significant updates.

Unlike consumer-facing AI like Gemini or ChatGPT, which focus on conversational interaction, Vertex AI is designed for businesses to build, deploy, and manage their own custom ML models and generative AI applications. In other words, it’s one of the tools you could use to build a Gemini competitor.

Gemini, ChatGPT and their competitors are primarily designed for general-purpose conversational AI, providing information and engaging in dialogue. While they can be powerful tools, Vertex AI offers the control and customisation that businesses need for their own applications.

Google just announced several major updates to its Vertex AI platform at Cloud Next.

From tools that generate music and videos to advanced speech and image capabilities, the new features show the kind of things we can expect from the future of AI development.

But beyond serving businesses, these updates may offer a glimpse into the future of Gemini, Google’s conversational AI platform.

As Vertex AI gains more creative and multimodal features, we could see the same tools shaping the next generation of Gemini’s capabilities.

So What’s New?

Google talked about:

Text-to-video generation using the Veo model, capable of creating 1080p videos from simple prompts.
Music creation via the new Lyria model, developed in partnership with YouTube, allowing users to generate soundtracks or songs based on text.
Advanced speech synthesis, image generation (with Imagen 3), and editing tools that push closer to human-quality output.
Multimodal agent-building tools, enabling developers to create AI apps that can “see,” “hear,” and “respond” using multiple types of input.
Chirp 3 now includes Instant Custom Voice, a new way to create custom voices with just 10 seconds of audio input.
Imagen 3 has improved image generation and inpainting capabilities for reconstructing missing or damaged portions of an image, and it produces higher-quality object removal edits.

These tools let developers bring advanced AI to chatbots, internal tools, and customer-facing applications.

Could This Be Gemini’s Future?

While these updates are enterprise tools, they likely show what we’ll soon see in Gemini.

Gemini is supposed to be a multimodal model, able to understand and generate not just text, but also images, audio, and more.

Until now, it has focused on text and light code generation. But with Vertex AI now offering music and video generation, it’s reasonable to assume that Gemini — which runs on the same foundation models — will soon expand in that direction.

I’m already excited about a Gemini that can compose background music for YouTube videos, edit images, or generate short animations — all from a simple conversation.

These are no longer pipe dreams; they’re being rolled out to developers today via Vertex AI.

That said, I still use ChatGPT for over 90% of my generative AI needs, but who knows, maybe this could move the needle. It’s not likely, though, because ChatGPT’s new photo editing features are crazy good.