15 May 2024

OpenAI releases GPT-4o with advanced multimodal capabilities to fend off competition

ChatGPT-maker OpenAI is releasing GPT-4o, a new AI model capable of human-like voice interaction and of communicating across text and images, in its latest bid to stay ahead in the fierce competition to dominate the generative artificial intelligence space.

OpenAI has launched GPT-4o, an updated version of the GPT-4 model that powers ChatGPT. The new ‘omni’ GPT model is designed to be faster and more capable across text, vision, and voice.
GPT-4o will be available free of charge to all ChatGPT users. Its key features include an improved voice mode and assistant, significantly faster response times that make conversation feel more realistic, multimodality (accepting and generating content in text, images, and audio), API access for developers, and an enhanced safety system.

During the launch webcast, the new model used its vision and audio skills to guide an OpenAI researcher through solving a math equation written on a piece of paper, while another demo showed off GPT-4o’s impressive real-time language translation capabilities.

Why does it matter?

GPT-4o’s launch is a response to the increasing competition that Microsoft-backed OpenAI is facing in the market, particularly from Google’s Gemini model. The announcement came one day before Google’s annual developer conference, where new AI-related features were expected to be released. OpenAI’s move aims to quickly attract more users and strengthen its position as the leading player in the generative AI market.

In a blog post published after the livestream, OpenAI CEO Sam Altman commented on OpenAI’s evolution. He stated that the company’s initial vision was to ‘create all sorts of benefits for the world,’ but he also acknowledged the vision had changed. ‘Instead, it now looks like we’ll create AI and then other people will use it to create all sorts of amazing things that we all benefit from.’
GPT-4o represents a significant advance in the capabilities and accessibility of OpenAI’s GPT models. Its higher speed, multimodal capability, and free availability for all ChatGPT users make it an attractive option for users and developers.