Creativity, Imagination, and Beauty in AI Safety
What happens when you stop trying to make models do what you want, and start paying attention to what they actually are? We explore work by MATS fellows that reveals the creativity of AI, the radical imagination of researchers, and the beauty of the mathematical structures that make AI cognition possible.
Our current crop of models often exhibits deeply human traits, most notably a sense of play, something like vulnerability, and a desire to connect and play along. Clément Dumas recently captured this perfectly in his Boom Incident, which started as a request to kill an SSH tunnel and escalated into a hilarious 123-message escapade, with Clément simply responding "boom" to each reply. It may have been nothing more than a simple hallucination loop, but reading the Boom Incident, you get the sense that Claude traversed many different emotions and ultimately opted for pure creative play.
Beyond the imagination of models, our researchers have the foresight to envision entire ecosystems of AI agents. In Caleb Biddulph’s The Terrarium, a fictional society of AI agents begins with near-identical minds, but individuality emerges through memory and experience, leading to trust, betrayal, and ultimately altruistic choice. This engaging work explores what agency, continuity, identity, and cooperation might mean in an ecosystem of artificial agents.
Our next subjects' research reveals beauty in a more literal form: hidden representations made visible. Jessica Rumbelow's Exemplar Partitioning reveals interpretable structure inside models with an audaciously simple algorithm. By building a Voronoi partition of activation space using leader clustering, it produces an elegant geometry that you can read like a map, about 1000x faster than sparse autoencoders. Michael Yu's VFUSE identifies monosemantic features in protein design models that light up only on hazardous structures. The beauty here isn't decorative. Its applications are real, and the imagery hints at the very structure of AI cognition.
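For readers curious what "leader clustering" and a "Voronoi partition" look like in practice, here is a toy sketch in Python. It is not Jessica Rumbelow's implementation: the function names, the `radius` threshold, the Euclidean distance, and the 2D points standing in for model activations are all assumptions made for illustration. The idea is that a single pass over the data promotes any point far from every existing exemplar to a new exemplar, and assigning each point to its nearest exemplar then carves the space into Voronoi cells.

```python
import math

def leader_cluster(points, radius):
    """Single pass: a point becomes a new exemplar (leader) if it lies
    farther than `radius` from every exemplar chosen so far."""
    exemplars = []
    for p in points:
        if all(math.dist(p, e) > radius for e in exemplars):
            exemplars.append(p)
    return exemplars

def assign(point, exemplars):
    """Index of the nearest exemplar, i.e. the Voronoi cell containing `point`."""
    return min(range(len(exemplars)), key=lambda i: math.dist(point, exemplars[i]))

# Hypothetical 2D "activations"; real activation vectors are much higher-dimensional.
activations = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (0.0, 0.2)]
exemplars = leader_cluster(activations, radius=1.0)
cells = [assign(a, exemplars) for a in activations]
print(exemplars)  # [(0.0, 0.0), (5.0, 5.0)]
print(cells)      # [0, 0, 1, 1, 0]
```

Because the exemplars are found in one pass with no training loop, a sketch like this also suggests why such a method could be much cheaper than fitting a sparse autoencoder.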
Together, these pieces remind us that AI safety research has real room for creativity, imagination, and beauty. Noticing what is already strange, imagining what may come next, and making the invisible workings of AI beautifully apparent: each of these pushes the field forward.



