A few weeks ago, OpenAI released its latest large-scale language model for generative artificial intelligence services, GPT-4Turbo, at its first developer event. Subsequently, Microsoft announced that it would add the GPT-4Turbo model to its Azure OpenAI service. Today, Microsoft announced enhancements to its Azure OpenAI service, with the GPT-4 Turbo with Vision model now available to customers as a public preview.

Microsoft said in a blog post:

This advanced multi-modal AI model retains all the powerful features of GPT-4Turbo while introducing the ability to process and analyze image input. This opens the opportunity to leverage GPT-4 for a wider range of tasks, including accessibility improvements, visual data interpretation and analysis, and visual question answering (VQA).

In addition, Microsoft has added more features for Azure OpenAI customers through the preview version of GPT-4 Turbo with Vision. One of these is Optical Character Recognition (OCR), which examines an image and extracts any text in the image so it can be integrated into user prompts.

Another feature of GPT-4 TurbowithVision is object grounding, which allows AI to inspect an image and display key objects in the image based on text prompts from the user. Likewise, AI can also analyze specific frames of a video.

Microsoft added:

By combining GPT-4 Turbo with Vision, Azure AI Search, and Azure AI Vision, it is now possible to add images and text data together to develop solutions that connect to user data using vector search to improve the chatbot experience.

The service is priced at $0.01 per 1,000 words of input and $0.03 per 1,000 words of output, with pricing for enhanced features varying.

Currently, GPT-4Turbo with Vision is available in AzureOpenAI's Australia East, Sweden Central, Switzerland North, and US West regions. Customers accessing the public preview of GPT-4Turbo with vision capabilities will be automatically updated to a "stable, production-ready version in the coming weeks."