OpenAI has introduced its latest generative AI model, GPT-4o, where the "o" stands for "omni," highlighting its ability to process text, speech, and video. The rollout of GPT-4o will happen gradually across OpenAI's developer and consumer-facing products over the next few weeks. According to OpenAI CTO Mira Murati, GPT-4o offers intelligence on par with GPT-4 but enhances capabilities across multiple media and modalities.
The new model significantly enhances the experience of OpenAI’s AI-powered chatbot, ChatGPT. While ChatGPT has previously offered a voice mode that reads out its responses using a text-to-speech model, GPT-4o takes this further. It allows users to interact with ChatGPT more naturally, much as they would with a human assistant. Users can ask questions and interrupt while the chatbot is responding, with the model delivering "real-time" responsiveness. It can even detect nuances in a user's voice and generate responses in various emotive styles.
GPT-4o also upgrades ChatGPT’s visual capabilities. Given a photo or an image of a desktop screen, ChatGPT can now answer related questions more quickly. Future updates may enable ChatGPT to "watch" videos and provide commentary and explanations. Additionally, GPT-4o is more multilingual, with improved performance in around 50 languages. Within OpenAI’s API and Microsoft’s Azure OpenAI Service, GPT-4o is claimed to be twice as fast as GPT-4 Turbo, half the price, and available with higher rate limits.
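For developers, adopting the new model is largely a matter of changing the model identifier in an existing chat completions request. As a rough illustration (not taken from OpenAI's announcement), here is a minimal sketch using OpenAI's official Python SDK, assuming GPT-4o is exposed under the identifier "gpt-4o" and accepts image inputs through the standard chat completions endpoint; the image URL is a placeholder:

```python
# Minimal sketch: asking GPT-4o a question about an image via
# OpenAI's chat completions API. Assumes the OPENAI_API_KEY
# environment variable is set; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model identifier for GPT-4o
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown on this screen?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/screenshot.png"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

On Azure OpenAI Service the request pattern is similar, though requests reference a deployment name configured in Azure rather than the model identifier directly.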
GPT-4o is available in the free tier of ChatGPT and to subscribers of OpenAI’s premium ChatGPT Plus and Team plans, who get "5x higher" message limits. OpenAI notes that when users hit their limit, ChatGPT will automatically switch to GPT-3.5, an older and less capable model.