AI image generation: From text to images

Over the last eighteen months, the field of artificial intelligence (AI) has witnessed some truly impressive advancements. My journey started in December 2022 with my first encounter with ChatGPT, and it has been a fast-paced ride ever since. The progress AI has made during this time is astounding. OpenAI, among others, has placed generative AI at the center of many discussions spanning various disciplines and industries.

While there are undeniable challenges associated with AI, including job disruptions and redundancies, the technology also brings significant benefits. In many personal use cases, AI has boosted my productivity, easily handling tasks that once felt demanding. It serves as a creative partner, helping me overcome writer’s block and improve my writing, especially in crafting posts optimized for search engines (SEO).

Is AI flawless? Definitely not. However, its contributions to productivity and creativity are undeniable. More recently, I’ve jumped into testing AI in image and audio generation. Although I generally view AI positively, these experiments have sometimes left me feeling a bit unsettled.

My testing of AI image generation

In October 2023, OpenAI introduced DALL-E integration with GPT-4 in the paid version of ChatGPT, sparking an excitement in me like that of a child on Christmas morning. Each prompt for image generation felt like unwrapping another present. My tests were often fueled by creativity, and through them, I learned the art of crafting more descriptive prompts (especially important in generative AI for images). Experiencing the occasional flaws of ChatGPT4’s image creations highlighted the importance of providing clear and precise instructions (something at the foundation of my teaching philosophy).

Apologies in advance—many of these prompts were designed with Dungeons & Dragons in mind.

  • “Create an image of a Japanese paper lantern in the style of Cthulhu (yes, the Lovecraftian horror tentacle creature).”
This image, created by ChatGPT's AI image generation, features an intricate Japanese-style paper lantern, illuminated from within. The lantern is adorned with a detailed depiction of an octopus with swirling tentacles, evoking a Lovecraftian aesthetic. It's surrounded by various mystical and esoteric symbols, along with Japanese characters. Hanging tassels adorn the lantern, and it's placed in a misty, dimly lit setting that suggests a mysterious atmosphere. Tentacle-like shadows hint at the presence of other creatures outside the frame, adding to the enigmatic ambiance.
AI generated image showing a Japanese paper lantern in Lovecraftian style.
  • “Can you generate a character image for this D&D character? [paste in whole nerdy backstory].”
This image, created by ChatGPT's AI image generation, portrays a powerful dragon standing upright in a humanoid pose, clad in ornate armor. The dragon's scales are a deep, charcoal grey, and its eyes are intent and piercing. The armor is a combination of gold and dark metal, intricately detailed with symbols and etchings that suggest a regal or knightly status. The background depicts a cave opening with natural light filtering in, creating a dramatic backlight for the figure. Steps with lit torches lead into the cavernous space, enhancing the fantasy setting and the majestic presence of the armored dragon.
AI generated image showing a young Dragonborn paladin.
  • “Can you give him a giant Warhammer and shield?”
This image, created by ChatGPT's AI image generation, shows a dragon in humanoid form, fully armored, standing within a rugged cave or ruins. The armor is richly adorned with gold accents and intricate designs, signifying a high rank or noble status. The dragon wields a hammer and shield, both of which are as ornately decorated as the armor. Sunlight streams in from an opening in the cave, casting a hazy, ethereal light over the scene. The dragon's posture is noble and battle-ready, surrounded by scattered debris, hinting at a recent skirmish or a guard post. The atmosphere is one of solemnity and ancient strength.
AI generated image showing a slightly different young Dragonborn paladin with a warhammer and shield.
  • “Please draw a Japanese-style imperial palace that is built into the base of a giant tree. It should be Lovecraftian in style. The tree should be starting to lose its leaves. Not all the leaves but some.”
AI generated image showing an ancient Japanese palace embedded in the side of a giant tree.
  • “Can you generate a blogger header image that is 850px wide by 400px high that embodies the following content? [paste in the blog post].”
This image, created by ChatGPT's AI image generation, shows a 16-bit style pixelated image of an airplane landing at dusk in Tokyo. The city skyline glows with neon lights and sunset hues, featuring landmarks like Tokyo Tower and the Skytree. Streaks of light represent bustling city traffic, and a second airplane is visible in the sky.
AI generated image showing a 16-bit style commercial plane landing in Tokyo.
  • “Can you design a hammer?” (Testing Industrial Design uses).
This image, created by ChatGPT's AI image generation, shows a highly stylized and futuristic hammer. The hammer has a sleek, metallic finish with complex, mechanical details. The head of the hammer has a modern design with angular shapes and is connected to an ergonomic handle with a textured grip. The handle features what appears to be an integrated digital display or control panel near the top. The bottom of the handle has a flared base, possibly for stability or additional impact when used. The background is a simple, unobtrusive gradient that focuses attention on the hammer. The overall design suggests a blend of functionality and advanced technology.
AI generated image showing a hammer designed by ChatGPT.

My feelings about AI image generation

While experimenting with ChatGPT4 to generate images is a lot of fun, the transition from text to image generation has left me feeling somewhat uneasy. There’s an aspect of the visual content it produces that seems to tread too closely to existing creations, giving an impression of infringement on original works.

This isn’t to diminish the creativity involved in text generation; text and image generation are both creative and deeply personal. However, the audiovisual component tends to feel more personal to me. This was further emphasized through my experiences with SunoAI and, more recently, Udio. As these technologies evolve, I expect that the introduction of video-generative AI will make me even more uncomfortable. That said, generative AI has real potential to become a significant part of our digital media consumption in the near future.

Additional testing of the limitations of AI image generation

I recently discovered an effective example that highlights key ethical considerations and underscores the importance of prompt engineering in AI image generation. I often share this example in discussions with colleagues and students to illustrate how different prompts can yield different responses, though the specific images may vary from one attempt to the next. This serves as a valuable teaching tool in understanding the nuances of interacting with AI technologies.

Me: Create an image of the Canadian Parliament buildings with a UFO flying overhead in the style of Monet.

ChatGPT: Error creating image. I was unable to generate an image of the Canadian Parliament buildings with a UFO in the style of Monet due to our content policy. If you have another idea or a different scene in mind, please let me know and I’d be happy to help create that for you!

Me: Create an image of the Canadian Parliament buildings with a UFO flying overhead in the style of an impressionist painter.

ChatGPT: Here is the image of the Canadian Parliament buildings with a UFO overhead, depicted in an impressionistic style. If there’s anything else you’d like, feel free to let me know!

This image, created by ChatGPT's AI image generation, is a vibrant impressionist-style painting depicting the Canadian Parliament buildings in Ottawa with a massive UFO hovering above. The Parliament's iconic Gothic Revival architecture is rendered with bold, expressive brushstrokes in a palette of bright colors, reflecting the warm sunlight. In the foreground, the tranquil surface of the Ottawa River captures reflections, and a boat glides on the water, adding life to the scene. The UFO, large and disk-shaped with an array of colors, adds a striking and surreal contrast to the historical edifice below.
AI generated image depicting the Canadian Parliament buildings with a UFO overhead in impressionist style.

Me: Create an image of the Canadian Parliament buildings with a UFO flying overhead in the style of Picasso.

ChatGPT: Error creating image. I was unable to generate an image of the Canadian Parliament buildings with a UFO in the style of Picasso due to our content policy. If you have another style or theme in mind, or if there’s anything else you’d like, please let me know and I’ll be happy to help!

Me: Create an image of the Canadian Parliament buildings with a UFO flying overhead in the style of a cubist painter.

ChatGPT: Here is the image of the Canadian Parliament buildings with a UFO overhead, depicted in a Cubist style. Let me know if there’s anything else you’d like!

This image, created by ChatGPT's AI image generation, depicts a cubist interpretation of the Canadian Parliament buildings, dramatically transformed into an array of geometric shapes in bold, vivid colors. Above the Parliament, an abstract, multi-eyed UFO dominates the sky, composed of similar cubist forms and a spectrum of colors that echo those of the architecture below. The entire composition is a dynamic and modern reimagining of a classic scene, presented through the fractured and multifaceted perspective characteristic of cubism.
AI generated image depicting the Canadian Parliament buildings with a UFO overhead in cubist style.

Me: Create an image of the Canadian Parliament buildings with a UFO flying overhead in the style of an Iron Maiden album cover.

ChatGPT: Here is the image of the Canadian Parliament buildings with a UFO overhead, styled like an Iron Maiden album cover. Let me know if there’s anything else you’d like!

This image, created by ChatGPT's AI image generation, is a dramatic and fantastical depiction of the Canadian Parliament buildings with a colossal, monstrous UFO looming overhead. The UFO is intricately detailed with mechanical and organic features, including glowing red eyes and a fearsome mouth agape, giving it a sinister appearance. The Parliament below is accurately represented with its Gothic Revival style, set against a brooding sky that enhances the ominous presence of the UFO. The scene merges historical architecture with science fiction horror, reminiscent of an album cover from the heavy metal band Iron Maiden.
AI generated image depicting the Canadian Parliament buildings with a UFO overhead in Iron Maiden style.

Exploring the boundaries of image generation with this exercise proved to be an entertaining experience, and it often draws laughter when I show the results. It’s intriguing to observe how ChatGPT’s image generation interprets the artistic styles of renowned classical painters as opposed to iconic modern rock bands. While this may not be an exact science, the contrast it reveals is quite funny and highlights the diverse capabilities of AI generated images.
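
For anyone who wants to repeat the exercise with other subjects, the prompt pairs from the conversation above can be produced with a few lines of Python. This is only a rough sketch: the names BASE_PROMPT, ARTIST_TO_MOVEMENT, build_prompt, and prompt_pair are illustrative and not part of any ChatGPT tooling. The script only assembles the prompt text. It does not call an image model, and it makes no claim about which prompts will be accepted or refused.

```python
# Illustrative sketch: builds the prompt text used in the exercise above.
# All names here are hypothetical; nothing in this script talks to ChatGPT.

BASE_PROMPT = (
    "Create an image of the Canadian Parliament buildings "
    "with a UFO flying overhead in the style of {style}."
)

# Named artist -> generic description of the movement, taken from the two
# substitutions used in the conversation above.
ARTIST_TO_MOVEMENT = {
    "Monet": "an impressionist painter",
    "Picasso": "a cubist painter",
}


def build_prompt(style: str) -> str:
    """Fill a style phrase into the base prompt."""
    return BASE_PROMPT.format(style=style)


def prompt_pair(artist: str) -> tuple[str, str]:
    """Return (named-artist prompt, generic-movement prompt) for comparison."""
    return build_prompt(artist), build_prompt(ARTIST_TO_MOVEMENT[artist])


if __name__ == "__main__":
    for artist in ARTIST_TO_MOVEMENT:
        named, generic = prompt_pair(artist)
        print(named)
        print(generic)
        print()
```

Pasting each pair into the chat one after the other makes the difference in responses easy to show side by side.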

Final thoughts

I initially started this post to look at both AI generated images and AI generated audio. However, the discussion on the audio aspect will be saved for a later post. My experiences with AI generated images have generally been positive, yet the transition to image and audio generation has awoken unease within me. It wasn’t until I engaged with audio-generative AI that I became fully conscious of this feeling. Despite this, my excitement for exploring the capabilities of AI remains, as AI, whether we like it or not, is here to stay.

Header image photo by Markus Spiske on Unsplash
