Microsoft launches AI Dev Gallery to make it easier to run AI models natively on Windows 11

Copilot+ PCs are built to run small language models (SLMs) directly on the device. The advantage of this approach is that tasks such as image or text generation can return results faster than they do in the cloud-based Copilot application. Now Microsoft has launched AI Dev Gallery, an app meant to make it easier to integrate on-device artificial intelligence capabilities into any app.

AI Dev Gallery is aimed at developers who want to try out multiple models before integrating artificial intelligence capabilities into their apps. It offers over 25 samples that users can download and run on their devices. In addition, a sample's project or source code can be exported directly from the application and run immediately. It works on Windows 10 and 11 and supports both x64 and ARM64 architectures.

Currently, the only way to get the app is to build the project in Visual Studio and then run it. It also requires at least 20GB of free disk space and a multi-core CPU, and a GPU with 8GB of VRAM is recommended.
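For readers who want to check a machine against those minimums before building, here is a minimal sketch in Python. The function name and the reading of "multi-core" as at least two logical cores are our own assumptions, not part of Microsoft's tooling, and the script does not check the recommended GPU.

```python
import os
import shutil

MIN_FREE_GB = 20   # free disk space cited in the article
MIN_CORES = 2      # assumption: "multi-core" read as at least two logical cores


def check_requirements(free_bytes, cpu_count):
    """Return a list of human-readable problems; an empty list means the basics are met."""
    problems = []
    free_gb = free_bytes / 1e9
    if free_gb < MIN_FREE_GB:
        problems.append(f"only {free_gb:.1f} GB free, need {MIN_FREE_GB} GB")
    # os.cpu_count() can return None, so treat an unknown count as a single core
    if (cpu_count or 1) < MIN_CORES:
        problems.append("a multi-core CPU is required")
    return problems


if __name__ == "__main__":
    # Inspect the root of the current drive (C:\ on a typical Windows install)
    usage = shutil.disk_usage(os.path.abspath(os.sep))
    issues = check_requirements(usage.free, os.cpu_count())
    print("OK" if not issues else "; ".join(issues))
```

Keeping the check in a pure function makes it easy to try against made-up numbers before running it on a real machine.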

The application has two sections, Samples and Models, and groups their contents into categories: text, image, code, audio and video, and smart controls.

Testing the models

The models for image generation and video generation are quite large, approaching 5GB. We started with a small upscaling model, which is less than 100MB. We took a screenshot and upscaled it, switching between the CPU and the GPU to handle requests as we worked.
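The article does not say which runtime the app uses to move work between the CPU and the GPU. As an illustration only, assuming an ONNX Runtime style setup, the choice can be expressed as an ordered list of execution providers with the CPU kept as a fallback. The helper name and the model file name below are hypothetical.

```python
CPU = "CPUExecutionProvider"
# GPU-backed provider names used by ONNX Runtime (DirectML on Windows, CUDA on NVIDIA)
GPU_CANDIDATES = ("DmlExecutionProvider", "CUDAExecutionProvider")


def choose_providers(available, prefer_gpu):
    """Order execution providers, always keeping the CPU as the last fallback."""
    chosen = []
    if prefer_gpu:
        chosen = [p for p in GPU_CANDIDATES if p in available]
    chosen.append(CPU)
    return chosen


# With the onnxruntime package installed, the result could be passed on like this:
#   import onnxruntime as ort
#   providers = choose_providers(ort.get_available_providers(), prefer_gpu=True)
#   session = ort.InferenceSession("upscaler.onnx", providers=providers)  # hypothetical file
print(choose_providers(["DmlExecutionProvider", "CPUExecutionProvider"], prefer_gpu=True))
print(choose_providers(["CPUExecutionProvider"], prefer_gpu=True))
```

On a machine with no supported GPU the list collapses to the CPU alone, which matches what one would expect from a virtual machine like the one used here.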

On our low-spec virtual machine, the upscaling took less than 30 seconds, and memory consumption quickly climbed to 1GB. The application displayed an upscaled version of the image with a resolution of 9272×4900. Graphical elements, especially text, were badly degraded and difficult to read.
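Some quick arithmetic puts that memory figure in context. The sketch below computes how much space a single image of the reported size occupies in two common layouts; the layouts are our assumptions for illustration, since we did not inspect how the app stores images internally.

```python
WIDTH, HEIGHT = 9272, 4900  # output resolution reported above


def buffer_mib(width, height, channels, bytes_per_value):
    """Size in MiB of one uncompressed image buffer."""
    return width * height * channels * bytes_per_value / 2**20


pixels = WIDTH * HEIGHT
rgba8 = buffer_mib(WIDTH, HEIGHT, 4, 1)    # 8-bit RGBA, a typical display format
rgb_f32 = buffer_mib(WIDTH, HEIGHT, 3, 4)  # 32-bit float RGB, a typical model tensor format

print(f"{pixels:,} pixels")
print(f"8-bit RGBA buffer: {rgba8:.1f} MiB")
print(f"float32 RGB tensor: {rgb_f32:.1f} MiB")
```

The output image holds about 45 million pixels, so one display buffer is roughly 173 MiB and one float32 tensor roughly 520 MiB. A copy or two of each would be enough to account for usage in the region of 1GB.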

There is no option to preview the generated image in a larger window or in full screen. There is not even a download option to save it to disk.

We tried another model called DetectHumanPose, which identifies the position of a person in an image. It accurately picked out a simple walking figure, but it also placed position markers on screenshots of our desktop with several apps open, where no person was present.

We don't know how developers will integrate these models into their applications, but at least some of these features can run locally. Of course, that calls for PCs with more storage space, a powerful CPU, and 16GB of memory or more.

Is it worth downloading a 5GB model to turn text prompts into images, or is it better to wait 30 seconds for a web app to do the same job? Clearly, most of these features have niche use cases and deployment environments, and are unlikely to appeal to the entire Windows 11 user base.