This riding astronaut is a milestone in AI's ability to understand the world

Diffusion models are trained on images that have been completely distorted with random pixels, and they learn to convert these images back into their original form. In DALL-E 2, there is no existing image to restore. Instead, the diffusion model starts from random pixels and, guided by CLIP, converts them into a brand-new image, created from scratch, that matches the text prompt.
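The noising-and-recovery idea can be illustrated with a toy sketch. It assumes a common DDPM-style formulation (Gaussian noise and a linear noise schedule), which the article does not specify and which may differ from what DALL-E 2 actually uses. The function names `noise_schedule`, `add_noise`, and `estimate_original` are hypothetical, a short list of numbers stands in for an image, and the true noise stands in for the prediction a trained network would supply. CLIP guidance is not modeled here.

```python
import math
import random


def noise_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Return the fraction of original signal left after each step.

    Assumption: a linear schedule of per-step noise amounts (betas).
    """
    remaining, running = [], 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / (steps - 1)
        running *= 1.0 - beta
        remaining.append(running)
    return remaining


def add_noise(image, signal_fraction, rng):
    """Forward process: blend each pixel with Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in image]
    keep = math.sqrt(signal_fraction)
    spread = math.sqrt(1.0 - signal_fraction)
    noisy = [keep * x + spread * n for x, n in zip(image, noise)]
    return noisy, noise


def estimate_original(noisy, predicted_noise, signal_fraction):
    """Undo the blend, given a guess at the noise that was added.

    In a real system the guess comes from a trained neural network.
    """
    keep = math.sqrt(signal_fraction)
    spread = math.sqrt(1.0 - signal_fraction)
    return [(y - spread * n) / keep for y, n in zip(noisy, predicted_noise)]


# Toy "image": four pixel intensities.
rng = random.Random(0)
image = [0.1, 0.5, 0.9, 0.3]
schedule = noise_schedule(1000)

# After the final step almost no signal remains: the image is nearly pure noise.
noisy, true_noise = add_noise(image, schedule[-1], rng)

# With a perfect noise prediction the original comes back (up to rounding).
recovered = estimate_original(noisy, true_noise, schedule[-1])
print("signal left after final step:", schedule[-1])
print("recovered image:", [round(x, 6) for x in recovered])
```

In training, the network's noise guess is compared with the noise that was really added. At generation time there is no original image, so the process starts from pure noise and relies on the network's guesses alone, with the text prompt steering them.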

The diffusion model allows DALL-E 2 to produce higher-resolution images more quickly than DALL-E could. “That makes it much more practical and enjoyable to use,” says OpenAI’s Aditya Ramesh.

In the demo, Ramesh and his colleagues showed me pictures of a hedgehog using a calculator, a corgi and a panda playing chess, and a cat dressed as Napoleon holding a piece of cheese. I remark on the weird cast of subjects. “It’s easy to spend an entire workday thinking about prompts,” he says.

“A sea otter in the style of Girl with a Pearl Earring by Johannes Vermeer” / “An ibis in the wild, painted in the style of John Audubon”

DALL-E 2 still slips up. For example, it can struggle with a prompt that asks it to combine two or more objects with two or more attributes, such as “a red cube on top of a blue cube.” OpenAI thinks this is because CLIP does not always correctly associate attributes with objects.

In addition to working from text prompts, DALL-E 2 can create variations of existing images. Ramesh plugs in a photo he took of some street art outside his apartment. The AI immediately starts generating alternate versions of the scene with different wall art. Each of these new images can be used to trigger its own set of variations. “This feedback loop can be very useful for designers and artists,” says Ramesh.

User beware

DALL-E 2 looks much more like a polished product than the previous version. That was not the intention, says Ramesh. But OpenAI plans to release DALL-E 2 to the public after an initial rollout to a small group of trusted users, much like GPT-3.

GPT-3 can produce toxic text. But OpenAI says it has used feedback from GPT-3 users to train a safer version called InstructGPT. The company hopes to follow a similar path with DALL-E 2, which will also be shaped by user feedback. OpenAI will encourage the first users to try to break the AI by tricking it into generating offensive or harmful images. As it fixes these issues, OpenAI will begin to make DALL-E 2 available to a wider range of people.

OpenAI is also publishing a user policy for DALL-E that forbids asking the AI to generate offensive images (no violence or pornography) or political images. To prevent deepfakes, users are not allowed to ask DALL-E to generate images of real people.

In addition to user policies, OpenAI has removed certain types of images from DALL-E 2’s training data, including images that exhibit graphic violence. OpenAI also says it will eventually pay human moderators to review every image generated on its platform.

“Our main goal here is to just get a lot of feedback for the system before we start sharing it more widely,” said OpenAI’s Prafulla Dhariwal. “I hope it will eventually be available so developers can build apps on top of it.”

Creative intelligence

Multifunctional AIs that can view the world and work with concepts in multiple modalities, such as language and vision, are a step towards more general intelligence. DALL-E 2 is one of the best examples yet.

But while Etzioni is impressed with the visuals DALL-E 2 produces, he’s cautious about what this means for AI’s overall progress. “This kind of improvement doesn’t get us any closer to AGI,” he says. “We already know that AI is remarkably capable of solving narrow tasks using deep learning. But it is still people who formulate these tasks and give the marching orders for deep learning.”

For Mark Riedl, an AI researcher at Georgia Tech in Atlanta, creativity is a good way to measure intelligence. Unlike the Turing test, which requires a machine to fool a human through conversation, Riedl’s Lovelace 2.0 test rates a machine’s intelligence based on how well it responds to requests to create something, such as “a photo of a penguin in a spacesuit on Mars.”

DALL-E scores well on this test. But intelligence is a sliding scale. As we build better machines, our intelligence tests need to adapt. Many chatbots are now very good at mimicking human conversation and pass the Turing test in a narrow sense. They are still mindless, however.
