Are you looking for a way to test ChatGPT? Try getting it to generate an image of a procedural task, such as “create an image of a step by step guide to make a paper airplane” or “create an image of a step by step guide to make an origami rabbit”.

This week I was asked to speak to a high school class about the challenges of artificial intelligence. As I prepared for the session, I had the opportunity to chat with the class’s teacher. I talked about my testing with ChatGPT and my latest blog post about artificially generated images.

ChatGPT is really bad at some things

I commented that it’s really bad at drawing hands, especially when asked to create someone with just four fingers (don’t ask, it was a Dungeons & Dragons request). I have included some of the hand tests I’ve run, which I have repeated on more than one occasion. The results are always similar.

Me: Create an image of a dwarf with four fingers.

ChatGPT: Here’s the portrait of the dwarf character with four fingers. Let me know what you think!

This image depicts a portrait of a dwarf character set against a mystical forest background. The dwarf features a rugged appearance with a thick, wavy brown beard and long hair, deep blue eyes, and a stern expression. His face shows a scar over his right eye, adding to his battle-hardened look. He's adorned with leather and metal armor, suggesting a warrior or blacksmith role, and detailed with rustic straps and buckles, highlighting his fantasy setting.
Image of a dwarf with four fingers, according to ChatGPT

Me: Show me his hands! He should only have four fingers on his right hand. He lost one of his fingers in an accident.

ChatGPT: Here is a detailed close-up of the dwarf character’s right hand, showing that he has only four fingers.

This image displays a close-up view of a fantasy dwarf's right hand, showcasing that he has only three fingers. The hand is robust and weathered, with visible callouses and a rough texture, highlighting a life of manual labor. The backdrop is softly blurred, suggesting a rustic and aged workshop environment.
Closeup of a dwarven hand generated by ChatGPT when prompted to make it with four fingers.

Me: Make it three fingers…

ChatGPT: Here’s the updated image showing the dwarf character’s right hand with only three fingers.

This image features a close-up of a fantasy dwarf character's right hand, displaying only three fingers. Each finger is detailed with rugged, calloused skin, indicative of heavy labor. The hand is part of a larger, muscular arm covered with a leather bracer adorned with metal details, hinting at the dwarf's warrior or craftsman role. The lighting highlights the textures of the skin and leather, enhancing the tactile quality of the image.
ChatGPT’s final attempt to make a “three-fingered” dwarf.

I guess it worked? I still think this dwarf hand has four fingers; you can see the knuckle of the fourth finger.

ChatGPT is also really bad at procedural tasks/instructions

In our discussion, the teacher pointed out that ChatGPT also struggles with procedurally generated instructions that utilize images. He cited an example involving origami rabbit instructions.

As always, I did some testing.

Me: Can you create an image of instructions for creating a paper airplane?

ChatGPT: Here is the step-by-step instructional illustration for creating a paper airplane. Each panel shows a different step in the process, designed to be easy to follow. You can view and use the image to make your own paper airplane.

This image shows a step-by-step guide for folding a paper airplane. It consists of nine panels, each illustrated in a detailed, black-and-white comic style. The sequence starts with a stack of paper and progresses through various folds, demonstrated by hands, to form a paper airplane. The steps are numbered from 1 to 5, repeated as necessary to indicate the progression through the panels.
An attempt by ChatGPT to create step-by-step instructions for making a paper airplane.

Me: Now origami instructions for making a rabbit.

ChatGPT: Here is the step-by-step instructional illustration for creating an origami rabbit. Each panel illustrates a different folding step, making it easy to follow. You can view and use the image to create your own origami rabbit.

This image displays a step-by-step guide for folding an origami rabbit, illustrated in a detailed, black-and-white comic style across nine panels. The panels feature various tools and folds needed to create the origami figure. The instructions start with a square piece of paper, show the necessary folds through hands demonstrating each step, and progress to the final form of a rabbit. Each panel includes arrows and text captions, likely indicating directions for the folds, although the text appears to be in a jumbled or coded language.
ChatGPT’s attempt to create instructions for folding an origami rabbit.

Me: Now origami instructions for making the titanic sinking.

ChatGPT: Here is the step-by-step instructional illustration for creating an origami version of the Titanic sinking. Each panel illustrates a different stage of folding, designed to visually depict the ship’s sinking. You can view and use the image to create your own origami Titanic.

This image is a step-by-step guide for folding a paper model of the Titanic, presented in nine panels. The instructions begin with a simple rectangular piece of paper and proceed through various folds, demonstrated by drawings of hands manipulating the paper. The sequence includes shaping the basic form, adding intricate details like the ship's funnels, and culminates in a completed paper model of the Titanic, depicted with smoke coming from its funnels. The illustrations are simple, black and white line drawings, and each step is clearly numbered to guide the viewer through the folding process.
ChatGPT’s attempt to create an origami sinking Titanic.

Okay, so the last example is a bit cheeky, but I wanted to test the absurdity of this prompt. ChatGPT consistently struggles with images focusing on procedural tasks.

Can you spot the errors?

The errors are numerous. One doesn’t have to look beyond the simple numbering on the instructional images to see where ChatGPT struggles. This is consistent across the three tests I ran above. It isn’t new, either: consider a prompt I gave ChatGPT months ago, when I asked it to create a mind map. It just doesn’t handle numbers and words in images very well.

This image is a colorful and detailed mind map focusing on Japan's use of soft power to influence and impact Western countries. The central theme is highlighted in a large pink circle, from which numerous branches extend, categorizing various cultural exports and influences such as anime, cuisine, martial arts, and technology. Each branch is further subdivided into specific examples, with each element contained in its own uniquely colored oval, making a complex web of interconnected influences. The background features a stylized, intricate illustration of cityscapes and cultural icons, adding a visual richness to the information presented. Unfortunately, most of the text is illegible as the text was generated by ChatGPT.
ChatGPT-generated mind map on the topic of Japan’s use of soft power, showing its inability to generate coherent words in an image.

With the paper airplane, you have to look closely. However, when you do, you can spot tons of errors. Once again it struggles with numbers.

In the image above with instructions for the origami rabbit, I lose the thread between steps 2 and 3. That’s a pretty big jump from a triangle-shaped piece of paper to a fully developed rabbit. The realism is uncanny.

With the Titanic, I think the errors are pretty glaring…

Wrapping up

I am continually amazed by ChatGPT’s capabilities. The technology not only impresses me but also presents challenges and solutions for the information profession and society at large. Procedural tasks remain a serious issue for generative AI, and the above images serve as excellent examples of its limitations. However, as these models evolve, I am confident they will get better. Just consider the remarkable advancements in generative AI video over the past year.

I have not yet tested generative AI video, but I will soon publish a post about my experiences testing generative AI audio.

