What to Make of GPT-4o
The latest OpenAI algorithm launched with great promise. What are its likely impacts?
Is the GPT-4o algorithm a glass half full of promise? Image created with Midjourney.
OpenAI announced a new flagship algorithm, GPT-4o, that can “reason” across audio, vision, and text in real time. Like many OpenAI releases, yesterday’s announcement of multimodal GPT-4o was met with jubilation and escalating hype from some and grounded skepticism from others. Perhaps a sign of a slow business news day, the announcement was top news on all the general business sites, too, from CNBC to the Wall Street Journal.
So what exactly was announced? And does it matter? Here are some quick takeaways.
Multimodal Capability
GPT-4o combines the functionality available in several critical OpenAI algorithms to handle “text, audio, and images” end to end. The multimodal algorithm can process inputs and generate outputs in audio (a capability previously delivered by the speech-recognizing Whisper), images (DALL·E), and text (the mainstay GPT).
Is GPT-4o just a Frankenstein algorithm or something new altogether? Early reports suggest it is a bit of a Frankenstein with some speed improvements over GPT-4 Turbo. Most importantly, it stitches together several input types and better matches how humans have interacted with AI in the past, specifically through Siri, Alexa, and other voicebots.
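For developers, the practical difference is that a single request can now mix modalities rather than chaining separate Whisper, DALL·E, and GPT calls. Here is a minimal sketch using the OpenAI Python SDK, assuming the v1.x `openai` package and an illustrative image URL (and noting, per the next section, that audio is not yet generally available in the API):

```python
# Minimal sketch: one multimodal request to GPT-4o via the OpenAI Python SDK (v1.x).
# Assumes OPENAI_API_KEY is set in the environment; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this chart in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```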
OpenAI CTO Mira Murati pointed out that the ease of interacting with ChatGPT would be dramatically improved once OpenAI irons out the kinks and delivers a more human-like interface. If it meets its promise, GPT-4o may truly become a smooth multimodal algorithm. For all those planning on a prompt engineering career, it’s not too late to rethink that choice.
Enterprises that want to benefit from an easier-to-use multimodal generator should weigh the risks. Questions to consider include whether you need all the capabilities offered by GPT-4o or whether a simpler algorithm, or a combination of other algorithms, will better meet your needs.
Voice Is Not Ready Yet
TechCrunch quickly reported that the audio capabilities promoted in 4o aren’t available: “Voice isn’t a part of the GPT-4o API for all customers at present. OpenAI, citing the risk of misuse, says it plans to launch support for GPT-4o’s new audio capabilities to ‘a small group of trusted partners’ in the coming weeks.”
While we wait for a fully deployed GPT-4o, OpenAI will complete its product testing on an alpha group of friends. Businesses would be wise to wait to license GPT-4o until it has been released in the wild and stress-tested by independent developers in commercial settings.
Promising AI algorithms that can reason across various media, without an actual delivery date or clear results, is not a recipe for trust. Sigh. Unfortunately, promised capabilities with unclear delivery dates are a familiar scenario for OpenAI announcements.
OpenAI continues to make the same mistakes it has over the past 16 months, preferring excitement and hype over pragmatic and clear delivery of products that customers want. If Sam Altman and the company want to improve their reputation and strengthen their product development and marketing, they should look to nearby Cupertino and Apple for talent.
Competitive Pressure
Two critical aspects of the GPT-4o release will shift the LLM marketplace and force competitors like Anthropic, Google, Meta, Mistral, and Perplexity to evolve.
First, once fully deployed, GPT-4o will force competitors to quickly evolve their multimodal capabilities. Of the mainstay LLM competitors, Google and Meta, with Gemini and open-source Llama, respectively, are in the best position to adapt, with both companies actively developing existing voice and visual algorithms.
Anthropic, Mistral, and Perplexity will need to partner with other players like Amazon, Apple, Stability AI, and Midjourney to match. Alternatively, we could see some anticipated consolidation amongst algorithm providers to compete effectively with OpenAI and the big tech players already marketing strong LLMs.
Second, OpenAI deployed GPT-4o for free as the base algorithm for ChatGPT. Everyone can use this new, fast, higher-quality algorithm at no cost. This puts great pricing pressure on Anthropic, Google, Microsoft, and Perplexity, who charge a premium for LLM algorithm access. Now, they will be forced to consider making their best algorithms free in some form.
However, pay-to-play consumer access is not completely gone. Free users will default to GPT-3.5 once they hit a ceiling of free queries. ChatGPT Plus, Teams, and Enterprise users will get more queries, and there will be pricing differentials for developers, who can choose from any of the GPT algorithms. Again, depending on needs, this factor may make 4o a more complicated choice for custom applications.
Still, subscription models are much more likely to crater in value today. This will hurt smaller players that do not benefit from OpenAI’s vast licensing business and could hasten advertising on LLM-based answer-engine sites like Perplexity. It could also fuel consolidation in the AI marketplace.
Incremental Improvements
Like all maturing products, new announcements often deliver incremental improvements and new features. More honey on the cornflakes, if you will. Image created with Midjourney.
The hype train is still strong for OpenAI. But in reality, we are now seeing incremental improvements.
While slightly faster, GPT-4o's text reasoning is reportedly on par with GPT-4 Turbo's, at best a small improvement thanks to lower latency (speed) and lower cost. The new algorithm is notably better at vision and audio understanding than existing models. To be clear, ChatGPT already offered primitive voice and computer-vision input capabilities, along with DALL·E image integration.
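Those latency claims are straightforward to sanity-check. A rough sketch, again assuming the v1.x `openai` Python package, that times the same prompt against both models (a single request is a crude measure; a real benchmark would average many runs):

```python
# Rough sketch: compare wall-clock latency of GPT-4o vs. GPT-4 Turbo on one prompt.
# Network conditions and server load will skew single-shot timings.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
prompt = "Summarize the trade-offs of multimodal language models in two sentences."

for model in ["gpt-4o", "gpt-4-turbo"]:
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.perf_counter() - start
    print(f"{model}: {elapsed:.2f}s, {response.usage.total_tokens} tokens")
```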
So, instead of something new, what we are getting is a better, more capable algorithm. Perhaps a slowdown in revolutions and an increase in evolutions is just what the market needed to strengthen belief in AI. While AI influencers may be disappointed that LLM development may be peaking, businesses should be encouraged that we are seeing a strengthening of accuracy and capabilities.
What do you think of GPT-4o?
Thanks for the detailed insights, Geoff. I appreciate your reasoned thinking about the improvements and new capabilities. I tried it yesterday and found it to be faster (even with heavy loads following the announcement). I'm wondering if I should continue paying for the Plus version, though.