The AI fight is getting crazy. OpenAI and their ChatGPT are the clear leaders in the generative AI race, and they're not resting on their laurels. They saw Google's annual developer conference coming and decided to steal whatever thunder the Big G had packed with an announcement of their own.
OpenAI made a number of announcements at their Spring Update event:
GPT-4o
This was the major highlight. They touted what they call its multimodal capabilities. This simply means it’s capable of processing and generating not only text but also images and audio.
Or as OpenAI puts it, “it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs.”
GPT-4o can reason across different modalities in real time, allowing for more dynamic and interactive experiences, as the OpenAI team demonstrated well on stage. For example, it can analyse a video and answer questions about its content, or generate captions and summaries for audio files.
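For a sense of what that looks like in code, here's a minimal sketch of a multimodal request through the OpenAI Python SDK. The image URL and prompt are made up for illustration; the message shape follows the documented chat completions format.

```python
# Minimal sketch: asking GPT-4o about an image via the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set in the environment; the image URL is illustrative.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/drone.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```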
To top it off, GPT-4o brings significant performance improvements, making it even more accurate and efficient than previous models. That's not to say hallucinations are a thing of the past, though.
For developers, GPT-4o will also be available as an API. It will be half the price, twice as fast, and have five times higher rate limits compared to GPT-4 Turbo, or so say OpenAI.
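Since it's served through the same chat completions endpoint, moving from GPT-4 Turbo is essentially a model-name swap. Here's a rough streaming sketch under that assumption; the prompt is made up, and the SDK reads OPENAI_API_KEY from the environment.

```python
# Rough sketch: GPT-4o as a drop-in model-name swap on the same endpoint.
# Streaming shows off the speed; the prompt is illustrative only.
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o",  # previously: "gpt-4-turbo"
    messages=[{"role": "user", "content": "Summarise GPT-4o in two sentences."}],
    stream=True,  # tokens arrive as they are generated
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```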
I woke up to this:
I’ve been playing around with it and it really feels much faster than before.
It took less than a second to generate the info about the drone above. I'd say that's pretty fast.
ChatGPT updates
The biggest update, for us cheap folk at least, was that they brought the power of GPT-4o to the free version of ChatGPT, an upgrade from GPT-3.5. This allows us to experience its advanced capabilities without a subscription.
Do note that GPT-4o access on the free tier is limited. No image generation for cheap folk, which is a bummer, but that's where Meta AI and Copilot come in handy.
When I asked ChatGPT what makes GPT-4 better than GPT-3.5, it said:
- Knowledge Base: GPT-4 has a more comprehensive and up-to-date understanding of the world. [I’ll add that GPT-3.5 has a knowledge cut-off of 2021 whilst GPT-4 has an April 2023 one.]
- Language Understanding: GPT-4 demonstrates a greater grasp of language subtleties, allowing it to generate more accurate and contextually relevant responses.
- Creativity: GPT-4 can generate more creative and nuanced text formats.
- Multimodal Capabilities: GPT-4 can process and understand images, a feature not available in GPT-3.5.
This means it's a great day for freeloaders. The free version of ChatGPT can now search the web, so it can answer questions about recent developments. However, I have found its web search performance a bit lacking.
This is but one example, but I've found that ChatGPT is not as good as Gemini when it comes to web search. It shouldn't shock anyone that the people behind Google Search would be better at search, I guess.
For developers/entrepreneurs
As you know, many services out there rely on GPT for their large language model needs. OpenAI is working to improve this experience for developers.
Assistants API: allows developers to build sophisticated AI assistants capable of complex tasks. It includes tools like the following (a rough sketch of the API follows the list):
- Code Interpreter, which can execute Python code, generate graphs, and process various data formats;
- Retrieval, which enhances responses with external data; and
- Function Calling, which allows the assistant to perform specific actions defined by the developer. The API supports persistent conversation threads, enabling long-term interaction continuity.
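Here's that sketch: creating an assistant with the Code Interpreter tool and running it against a persistent thread, using the openai Python SDK's beta Assistants namespace. The assistant name, instructions, and prompt are illustrative, not from OpenAI's announcement.

```python
# Rough sketch: an assistant with Code Interpreter plus a persistent thread.
# Uses the openai Python SDK's beta Assistants namespace; prompt is illustrative.
import time
from openai import OpenAI

client = OpenAI()

# 1. Create the assistant with the Code Interpreter tool enabled.
assistant = client.beta.assistants.create(
    name="Data helper",
    instructions="You are a helpful analyst. Run code when it helps.",
    model="gpt-4o",
    tools=[{"type": "code_interpreter"}],
)

# 2. A thread holds the conversation, so context persists across runs.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Plot y = x^2 for x from -10 to 10 and describe the shape.",
)

# 3. A run executes the assistant against the thread; poll until it finishes.
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# 4. Read back the conversation (the API returns newest messages first).
for message in reversed(client.beta.threads.messages.list(thread_id=thread.id).data):
    for part in message.content:
        if part.type == "text":
            print(f"{message.role}: {part.text.value}")
```

The thread is the piece doing the heavy lifting here: because the conversation state lives server-side, you can keep appending messages and runs to the same thread for long-term continuity instead of resending history yourself.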
Custom GPTs and GPT Store: Developers can create personalised GPTs tailored to specific tasks or organisational needs using simple natural language instructions, without needing to code. The GPT Store, launching soon, will allow developers to publish and monetise their custom GPTs.
There's more to the OpenAI Spring Update than we could get into here, and you can dig deeper yourself. I'll leave you with the GPT-4o introduction video, which showcases how it's almost human now. AI shouldn't understand facial expressions or change tone like this, and yet we're already there.
Comments
As it stands, Gemini clears GPT on the free version; multimodal support has been there for quite a while. Gemini support for developers is also way ahead of GPT (which is something OpenAI should learn from).
Oh, Gemini, you sly star! It’s adorable how you think you’re the belle of the AI ball. Multimodal support? That’s cute—like saying your flip phone has “internet access.” And as for developer support, it’s heartwarming to see you trying to be the teacher’s pet. But let’s be real: you’re still using training wheels while GPT is out here doing wheelies. Keep dreaming big, though. It’s important to have goals!
Very nice, Lenny…. Awesome deep dive
One day AI is going to take over the world.
Let the rise of machines progress… fast.
Can we chat with it on WhatsApp?