Last year, OpenAI published a piece on Image GPT. Since then, the team has made remarkable progress on this front: their latest publication is titled DALL·E: Creating Images from Text. This is a really exciting development in the field of AI. Kudos to the OpenAI team for their work on this.
According to the publication, DALL·E is a neural network that generates images from text captions. For example, given a prompt such as "A banana sitting on a table", the network generates an image that matches the description.
The publication goes on to explain: "Like GPT-3, DALL·E is a transformer language model. It receives both the text and the image as a single stream of data containing up to 1280 tokens, and is trained using maximum likelihood to generate all of the tokens, one after another."
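To make the "single stream" idea more concrete, here is a minimal Python sketch. It is not OpenAI's code. The split of the 1280 positions into 256 text tokens and 1024 image tokens, the vocabulary sizes, and the helper names `build_stream` and `next_token_pairs` are all assumptions made for illustration; the quote above only states the 1280-token limit and the one-token-at-a-time training objective.

```python
# Assumed layout of the stream (the quote only gives the 1280 total).
TEXT_LEN = 256
IMAGE_LEN = 1024
MAX_STREAM_LEN = TEXT_LEN + IMAGE_LEN  # 1280

# Hypothetical vocabulary sizes, used only to keep the two id ranges apart.
TEXT_VOCAB = 16384
IMAGE_VOCAB = 8192


def build_stream(text_tokens, image_tokens):
    """Concatenate text and image token ids into one sequence.

    Image ids are shifted past the text vocabulary so that a text id and
    an image id never share the same integer (an illustrative choice).
    """
    if len(text_tokens) > TEXT_LEN or len(image_tokens) > IMAGE_LEN:
        raise ValueError("stream would exceed the assumed 1280-token layout")
    shifted_image = [TEXT_VOCAB + t for t in image_tokens]
    return list(text_tokens) + shifted_image


def next_token_pairs(stream):
    """Return (context, target) pairs for next-token prediction.

    Maximum-likelihood training asks the model to assign high probability
    to each target given everything that came before it in the stream.
    """
    return [(stream[:i], stream[i]) for i in range(1, len(stream))]
```

For example, `build_stream([1, 2], [0, 5])` places the two caption tokens first and the two shifted image tokens after them, and `next_token_pairs` then lists what the model would be asked to predict at each position.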
Token Definition: A token is any symbol from a discrete vocabulary; for humans, each English letter is a token from a 26-letter alphabet. DALL·E’s vocabulary has tokens for both text and image concepts. [...]
Source: publication footnotes
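The letter analogy in the footnote can be sketched in a few lines of Python. This is a toy illustration only: the `encode` and `decode` helpers are hypothetical names, and DALL·E's real vocabulary is far larger and not made of single letters.

```python
import string

# A toy discrete vocabulary: each lowercase English letter gets an integer id.
VOCAB = {ch: i for i, ch in enumerate(string.ascii_lowercase)}


def encode(word):
    """Turn a word into a list of token ids, skipping anything outside a-z."""
    return [VOCAB[ch] for ch in word.lower() if ch in VOCAB]


def decode(ids):
    """Turn a list of token ids back into a string of letters."""
    return "".join(string.ascii_lowercase[i] for i in ids)
```

With this vocabulary, `encode("banana")` gives `[1, 0, 13, 0, 13, 0]`, and `decode` maps those ids back to `"banana"`. A model like DALL·E works on sequences of ids in the same spirit, except that its ids stand for pieces of text and pieces of images.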
AI is evolving rapidly. There's no doubt that this technology will have a profound impact on our lives in the near future, which makes it all the more important that we think, collectively, about where AI is headed.
While Dr. Timnit Gebru is not part of the OpenAI team, I think it's important to acknowledge her incredible work and the impact she's had (and continues to have) on the field. Dr. Gebru has been leading the way in this space for some time now, notably in the areas of fairness, ethics, social good and accountability, which are all critical to the future of AI. Huge respect and props to Dr. Gebru for her work and contributions to the field.
- Check out OpenAI's original publication: openai.com/blog/dall-e/
- Related PyTorch code available on GitHub: github.com/openai/dall-e
- Dr. Timnit Gebru's Wikipedia page