Give me eggs 🥚

25 Jun, 2026

As an AI trainer, I have a fun game for testing AI image models and seeing how far we still are from having an “intuitive human” model.

This is not a test of image generation, but of cultural human reasoning.

Ask the model (ChatGPT) to generate 12 eggs. It will likely give you a carton with 12 eggs. Ask for 13 eggs, and it may give you a carton with 14 eggs. Ask again for 13 eggs, and it may give you a carton with only 12 eggs.
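If you want to play the game more than once, the steps above can be written down as a tiny harness. This is only a rough sketch: `run_egg_test`, `failures`, and `make_carton_biased_model` are names I made up for illustration, and the "model" here is a hard-coded stand-in that mimics the answers described above. It does not call ChatGPT or any real image API; in practice you would count the eggs in each image by hand and plug those numbers in.

```python
from typing import Callable, Dict, List


def run_egg_test(
    generate_and_count: Callable[[int], int],
    requested: List[int],
    attempts: int = 2,
) -> Dict[int, List[int]]:
    """For each requested count, record how many eggs each attempt produced."""
    return {n: [generate_and_count(n) for _ in range(attempts)] for n in requested}


def failures(results: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Keep only the requests where at least one attempt missed the count."""
    return {n: counts for n, counts in results.items() if any(c != n for c in counts)}


def make_carton_biased_model() -> Callable[[int], int]:
    """Hypothetical stand-in: perfect on 12, drifts to carton sizes otherwise."""
    replies_for_odd_requests = iter([14, 12])  # the two answers described above

    def model(n: int) -> int:
        if n % 12 == 0:
            return n  # a full carton is easy
        return next(replies_for_odd_requests, 12)  # falls back to a dozen

    return model


results = run_egg_test(make_carton_biased_model(), requested=[12, 13])
print(results)            # {12: [12, 12], 13: [14, 12]}
print(failures(results))  # {13: [14, 12]}
```

The point of keeping the model behind a plain function is that the check itself stays trivial: the request either matches the count or it does not.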

What happens here?

From my perspective as an AI trainer, users should not have to adapt their prompts just so the model can understand the request. Instead, the model should evolve to understand implicit human requests.

If a user wants to generate an image of 13 eggs, does that really sound like a glitchy prompt? Why couldn’t the model generate 13 eggs in another context instead of assuming they should always be in a carton?

The interesting part is that it could not follow the prompt, which, in this case, is a major failure.

We know that egg cartons are usually made in even numbers. The most common one contains 12 eggs, although there are variations.

(For this post, I am not considering the visual artifacts, rendering quality, or aesthetics of the eggs and the carton. I am only interested in the reasoning behind the generation.)


I can spend a day thinking about how to break the model’s expectations.