DALL·E 3 in beta on Bing: the "horse riding an astronaut" problem overcome, 50 objects specified in one painting

DALL·E3 is in limited testing, and Microsoft Bing has opened it up first. Go and see whether you are one of the lucky ones. It doesn't matter if you haven't been given access: together with third-party research previews and internal trials by OpenAI employees, test cases are emerging one after another, and there is plenty to enjoy. The most extreme one specified that "50 different objects appear in a single image", and the test ended up drawing hundreds of them.

△from WindowsLatest

Beyond a simple tiled arrangement, these objects can also be combined more creatively.

A horse riding an astronaut: given this counterfactual concept, earlier models from OpenAI and Google could only draw an astronaut riding a horse.

It was generally treated as a failure case in papers at the time, and was mocked by Gary Marcus, the AI pessimist.

Now, DALL·E3 can easily handle it with the support of ChatGPT.

DALL·E3's great progress this time is not only due to OpenAI's own efforts, but is also the result of working together with Microsoft.

Although it is not stated explicitly, at least three Microsoft engineers and researchers appear in the research section of the contribution list, and most members of the inference optimization section are from the Microsoft DeepSpeed team.

By contrast, GPT-4 was still developed mainly in-house by OpenAI, and access was then opened to Microsoft and other research institutions for testing.

This change in the cooperation model also signals that the relationship between the two companies has deepened further.

Specify 50 objects in one painting

As confirmed by Microsoft Bing CEO Mikhail Parakhin, the luckiest few percent of users have been given access to the test.

Since places are so limited, netizens who can't wait have posted their ideas online and asked people with access to test them.

The netizen who proposed drawing 50 different objects just wanted to test how many objects DALL·E3 can fit into one picture, analogous to the "context window" of a large language model.

Nathan Shipley, a third-party designer with research preview access, took on this challenge.

First, he asked ChatGPT to list 50 random everyday objects, then simply asked it to draw them in one painting. ChatGPT wrote the full prompt itself, and DALL·E3 then drew the image.

Next, Nathan asked: "Can you create an image of a surfer putting these objects into a pile while surfing hard?"

ChatGPT added some details to the prompt:

A middle-aged Polynesian male surfer with sun-kissed skin struggles to surf while holding up 50 objects, including a tea cup, a bicycle, an umbrella... Waves crash around him, and his expression is one of determination and slight panic as he tries to balance the objects with his surfboard.

The result was an image that basically met the requirements on the first try, except that the character's expression looked more like extreme panic than slight panic.

Later, Nathan asked for a lower camera angle. This time, ChatGPT changed the character description in the prompt to "an elderly Spanish woman."

Onlookers were impressed, but also asked why the pencil was out of proportion to the other objects.

Some people speculate that the model only knows what an object looks like, not its relative size, especially since promotional photos of pens are often close-up shots.

Nathan also asked ChatGPT which parameters of the image the user can control.

ChatGPT responded as follows:

Description: You can provide a detailed description of the desired image. The more specific and detailed you are, the closer the resulting image will be to what you want.

Style and medium: You can specify the style or medium of the image, such as "Painting", "Photo", "Illustration", "Cartoon", "Watercolor", "3D Rendering", etc.

Resolution: By default, images are generated at 1792x1024 resolution. But if your description suggests a full-length portrait or something better suited to a tall aspect ratio, the image will be generated at 1024x1792. You can also request square resolution (1024x1024) if needed.

Number of pictures: If you don't specify a number, I usually create four different images based on your description. However, you can request a specific number (up to four).

Revisions: If you're not satisfied with the initial results, you can provide feedback and request revisions. While I can't directly modify existing images, I can use your feedback to generate new images that better fit your vision.

Random seed: If you want to modify a previous image, I can use the seed in that image's metadata to generate a new version. This ensures a certain consistency between the original and modified images.

Please note that there are certain restrictions. For example, I cannot directly create images of public figures, or images in the style of artists from the last 100 years.

The key point comes at the end:

However, I can offer alternatives or use adjectives to capture the essence of a particular style without quoting it directly.
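The size and count rules in ChatGPT's answer can be written down as a small sketch. The helper names below (`pick_size`, `clamp_image_count`) are hypothetical and are not part of any OpenAI or Bing API; they only restate the defaults quoted above, and the wide default is assumed to be 1792x1024, the landscape counterpart of 1024x1792.

```python
from typing import Optional, Tuple

# Hypothetical helpers: they only restate the defaults ChatGPT listed,
# and are not part of any real OpenAI or Bing API.

SIZES = {
    "wide": (1792, 1024),    # default (assumed landscape counterpart of 1024x1792)
    "tall": (1024, 1792),    # full-length portraits and other tall subjects
    "square": (1024, 1024),  # available on request
}

MAX_IMAGES = 4  # "up to four" images per request


def pick_size(orientation: str = "wide") -> Tuple[int, int]:
    """Return (width, height) for the requested orientation."""
    if orientation not in SIZES:
        raise ValueError(f"unknown orientation: {orientation!r}")
    return SIZES[orientation]


def clamp_image_count(requested: Optional[int] = None) -> int:
    """Default to four images; otherwise clamp the request to the range 1..4."""
    if requested is None:
        return MAX_IMAGES
    return max(1, min(requested, MAX_IMAGES))


if __name__ == "__main__":
    print(pick_size())           # (1792, 1024)
    print(pick_size("tall"))     # (1024, 1792)
    print(clamp_image_count(9))  # 4
```

In practice the user states these choices in plain language and ChatGPT turns them into the prompt; the sketch only makes the stated limits explicit.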

"Horse Riding an Astronaut" Puzzle Solved

Among OpenAI employees, Will DePue has been the most active in sharing results from the DALL·E3 trial.

On the horse-riding-an-astronaut test, he said the success rate is not 100%.

You can get it done in two or three attempts, because GPT-4 works with you to improve the prompt until you get it right.

With some effort, you can get almost anything you want.

Some netizens tried to get the same result with Midjourney. Their verdict: not entirely impossible, but it takes a lot of effort.

Almost impossible: it requires a lot of prompt engineering and is hard to reproduce.

If you are an experienced Midjourney user, you might as well try it and see if it works.

In the "8 giraffes drinking water" challenge proposed by netizens, DALL·E3 once again showed difficulty counting accurately.

△Count how many giraffes there are in the picture

Other failed attempts also produced a two-headed giraffe.

Getting the AI to count correctly has not been solved this time, but the problem of understanding spatial relationships at least has been.

In the challenge "Four zebras running on the grassland, a lion chasing behind, and an eagle above, there are no other animals in the picture" proposed by netizens, the spatial relationships are basically correct, but there is one zebra too many.

In comparison, both DALL·E2 and Stable Diffusion have a worse understanding of spatial relationships.

Adam Goldberg, who is responsible for the enterprise version of ChatGPT at OpenAI, also posted many high-quality results, but did not share the prompts.

Jerry Tworek, who is responsible for AI code writing and tool calling, created many abstract concept paintings, such as "Division of Mechanical Cells".

As well as "A computer program tree spanning the galaxy".

Microsoft and OpenAI collaborate

DALL·E3 has made a huge improvement this time. Apart from integrating ChatGPT, how exactly was the image generation part achieved?

Unfortunately, given OpenAI's trend of becoming more and more closed, it will probably not publish a paper as it did for the previous two generations. We can only make a few guesses from the contribution list.

The DALL·E2 paper has five authors.

For DALL·E3, setting aside the product, safety, public communications and legal teams for now, the research section alone lists 18 participants.

Among them is Song Yang (Yang Song), the Tsinghua alumnus who proposed Consistency Models.

Consistency models are faster than the currently most popular diffusion models, and can generate 64 images at 256x256 in 3.5 seconds.

However, Song Yang is listed as a minor research contributor this time, so it is unclear whether DALL·E3 uses a consistency model. More likely, his methods were borrowed to improve the diffusion model.
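To put the quoted speed in perspective, here is a back-of-the-envelope calculation. It uses only the figure cited above (64 images at 256x256 in 3.5 seconds) and assumes those 64 images are the whole workload; if they are produced as one batch, the per-image number is an amortized cost rather than the latency of a single image.

```python
# Back-of-the-envelope arithmetic on the figure quoted in the article:
# 64 images at 256x256 in 3.5 seconds.

NUM_IMAGES = 64
SIDE = 256       # pixels per side
SECONDS = 3.5

images_per_second = NUM_IMAGES / SECONDS                # roughly 18.3
ms_per_image = 1000 * SECONDS / NUM_IMAGES              # roughly 54.7
pixels_per_second = NUM_IMAGES * SIDE * SIDE / SECONDS  # roughly 1.2 million

print(f"{images_per_second:.1f} images/s")
print(f"{ms_per_image:.1f} ms per image")
print(f"{pixels_per_second / 1e6:.2f} megapixels/s")
```

That works out to roughly 18 images per second, or about 55 milliseconds per image.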

Besides the DALL·E2 authors and Ouyang Long from the ChatGPT team, at least three of the researchers are from Microsoft.

Jianfeng Wang holds a Ph.D. from the University of Science and Technology of China and works as a chief researcher at Microsoft.

Lijuan Wang holds a Ph.D. from Tsinghua University and works as a director research manager at Microsoft.

Both took part in the research on NUWA-Infinity, an infinite-canvas image generation model.

Li Linjie (Lindsey Li) is an alumna of Beijing Institute of Technology. She received two master's degrees from Purdue University and UC San Diego. She is a senior researcher at Microsoft and has published many top conference papers in the field of multimodality.

Beyond the research side, the Microsoft DeepSpeed team was deeply involved in DALL·E3's inference optimization.

DeepSpeed is an open-source deep learning optimization library that reduces compute and memory usage, and runs training and inference for large distributed models with better parallelism on existing hardware.

Many of the team members said they were glad to have taken part in this work and were excited about the release of DALL·E3.

Finally, the special contributions section includes Microsoft's Bing CEO Mikhail Parakhin and Misha Bilenko, Principal Vice President of Azure Cloud.

Microsoft's earlier launch event also confirmed that Bing will integrate DALL·E3 directly.

Under the current rules, DALL·E2 on Bing is free, and 99 acceleration tokens are issued. Without tokens, you simply wait longer in the queue.

DALL·E3 will arrive in October on ChatGPT Plus, which costs $20 per month.

But now that GPT-4 is available for free on Bing, free access to DALL·E3 is also something to look forward to.