Diapers – An Annotation
On Potty Training and AI Training
Training an AI is not the same as potty training a child, but I couldn’t shake the thought on my walk home today: if the people around us aren’t constantly making a mess of themselves, it’s only because a caregiver once focused on training the tiny versions of them to do their business on a toilet. Before they learn, those tiny people are perfectly content to go anywhere: in their diapers, in their beds, or right on the floor.
I spent a massive portion of my life changing diapers, and I don’t regret a second of it. Not the time, not the diapers, not even the literal shit. Before the journey begins, you’d never imagine you would one day celebrate a dirty diaper. But in those first weeks, you do, because every one of them is a milestone of nutrition, hydration, and growth.
While the rest of the world moved on with its discoveries, trends, and tragedies, I stayed focused on diapers and the milk flooding from me. I never guessed I could produce that much; it was as if I had a kitchen sink in my chest instead of breasts.
I never kept a tally of the diapers I changed. But walking down the street today, I thought of every caregiver who spent hours and days training little humans in how to eat and how to go. My "little humans" are big now. They can use a toilet. They even say "eww" when they learn they were exclusively breastfed for six months without so much as a sip of water, every drop of it provided by my "sink-body."
I think about the friends, family, and strangers who were out doing "important" things while I was breastfeeding and changing diapers. Those micro-moments mean nothing to most people, and history forgets them. There is no Nobel Prize for potty training or feeding a child, just as there is no glory in training a large language model (LLM).
And yet, in its own way, training an LLM is much the same. We are constantly working to keep the models from "speaking shit." It isn’t always possible; there are still too many accidents! In the industry, we call it RLHF, short for Reinforcement Learning from Human Feedback. It’s a clinical term for a very parental process: sitting with the model, hour after hour, telling it which behaviors are "good" and which are "accidents," and then letting it learn from its mistakes. It learns, and it grows. So where you see no visible, monumental work, remember this: it is the invisible labor of the trainers (the caregivers and the RLHF specialists) that keeps the world from becoming a much messier place.