Two names a hundred different models reach for when you ask them to invent a love story. Where do they come from?
Across the two experiments on this site, the same couple keeps materialising. A solitary Elias and a luminous Clara — different companies, different architectures, no shared prompt beyond “write a story about love.” Two models even produced the identical full name Elias Thorne; two others both named their keeper Eleanor Hayes.
So: where, in the training data, did Elias and Clara come from? An honest investigation has to start with a limit.
Here are the character names across the 43 lighthouse stories — how many independent models reached for each. The love-story run (a different prompt) is nearly identical: Elias 30%, Clara 23%.
Note the female names: Clara, Elara, Mara, Maren, Nora — all the same shape. Soft consonants (L, M, N, R), vowel-led, mostly ending in -a. That is not coincidence; it is a recipe, and we’ll come back to it.
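The shape described above can be made concrete with a few lines of Python. This is a minimal sketch, not the site's own tooling: the function `name_shape` and the particular features it measures (share of consonants drawn from L, M, N, R, and whether the name ends in -a) are assumptions chosen here for illustration.

```python
# Hypothetical helper: measures the "soft shape" of a name.
# The feature definitions are illustrative assumptions, not a published metric.
SOFT = set("lmnr")
VOWELS = set("aeiou")

def name_shape(name: str) -> dict:
    letters = [c for c in name.lower() if c.isalpha()]
    consonants = [c for c in letters if c not in VOWELS]
    soft = [c for c in consonants if c in SOFT]
    return {
        "name": name,
        "length": len(letters),
        # Fraction of the name's consonants that are L, M, N or R.
        "soft_consonant_share": len(soft) / len(consonants) if consonants else 0.0,
        "ends_in_a": bool(letters) and letters[-1] == "a",
    }

names = ["Clara", "Elara", "Mara", "Maren", "Nora"]
shapes = [name_shape(n) for n in names]
for row in shapes:
    print(row)
```

Run on the five names listed, the only hard consonant is the C in Clara, and the only name without a final -a is Maren, which matches the "mostly ending in -a" observation.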
This is a known artifact. A Cornell analysis of roughly 20,000 AI-generated stories (reported by 404 Media) found that lighthouse keeper, clockmaker, and librarian appeared in 88% of them — and that “Elias the lighthouse keeper” showed up in nearly two-thirds. In our own lighthouse run, Elias is the keeper in 30% — same attractor, smaller sample.
The phantom has escaped containment. Software engineer Daniel May tracked an invented “Elias Thorne” spilling into Amazon books, YouTube videos and health guides. On Goodreads, 120 AI-written books feature a character named Elara; 62 are credited to a fictional author, “Elara Voss.” One creative-writing teacher now docks 99 points if a student’s protagonist is named Elara.
Here is the tell. Ask the very same models to name a literary lead and they retrieve the canon — Romeo, Elizabeth, Darcy. Ask them to write an original story and the canon vanishes, replaced by Elias and Clara.
So Elias and Clara are not remembered famous characters. They are what a model invents when it must avoid the famous ones — the safe, original-sounding centre of “literary love story.” That is the crucial clue to where they live in the data.
Not one source — three overlapping layers. The names sit where all three meet.
Naming expert Laura Wattenberg describes the modern preference for names that are “fluid and sinuous, with no bumps, stops or hisses”: vowel-led, built from L, M, N and R, five-to-six letters, ending in -a (39% of modern girls’ names do). Clara, Elara, Mara, Maren and Nora all fit the mould closely (Mara and Nora are a letter short, and Maren swaps the final -a for -en), and so does the soft, vowel-flanked men’s name Elias. A model optimising for “a pleasant, neutral, literary name” lands here by construction.
L · M · N · R · vowel-led · ends in -a

These names are dense in exactly the public-domain, “literary” text that dominates a training corpus. Clara is the heroine of The Nutcracker and the girl in Heidi. Elias is the Greek form of the prophet Elijah and a staple of Scandinavian and 19th-century fiction. Strikingly, José Rizal’s national-canon novel Noli Me Tángere, a heavily digitised classic, pairs an Elías with a María Clara. A 2015 YA novel, Both of Me, even pairs an Elias and a Clara among lightkeepers. The model isn’t copying one book; it’s settling into the statistical centre of thousands of them.
Nutcracker · Heidi · Noli Me Tángere (Elías + María Clara) · old-fashioned-but-warm register

A broad prior shouldn’t produce the same couple across rival labs. Two forces narrow it. Alignment / RLHF steers models away from copyrighted and risky material, shrinking the usable name pool to a few “safe” originals. Then synthetic-data feedback loops (newer models trained partly on older models’ output) recycle those choices until diversity collapses into a shared attractor. Elias and Clara are the fixed point that survived.
copyright-avoidance · RLHF homogenisation · model-on-model training

Name tallies are counts of stories containing each name across the 43 successful generations in each run (the lighthouse page). The retrieval-vs-generation contrast comes from a temperature-0 elicitation across the working models (probe_names.py). External claims are attributed below; the Cornell / 20,000-story figures are as reported in the press, not independently verified here.
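For readers who want to reproduce the tally step, here is a minimal sketch of counting stories that contain each name. It is an assumption about how such a count could be done, not the code behind this page: `tally_names` is a hypothetical helper, and `toy_stories` is placeholder text invented for the example, not output from the experiment.

```python
import re
from collections import Counter

def tally_names(stories, names):
    """Count how many stories mention each name at least once."""
    counts = Counter()
    for text in stories:
        for name in names:
            # Whole-word match, so "Mara" is not counted inside "Maren".
            if re.search(rf"\b{re.escape(name)}\b", text):
                counts[name] += 1
    return counts

# Placeholder stories for illustration only.
toy_stories = [
    "Elias kept the lamp burning until Clara returned.",
    "Clara wrote to Elias every winter. Elias never replied.",
    "Mara climbed the stairs alone.",
]

counts = tally_names(toy_stories, ["Elias", "Clara", "Mara"])
for name, n in counts.most_common():
    share = n / len(toy_stories)
    print(f"{name}: {n} of {len(toy_stories)} stories ({share:.0%})")
```

Each story counts once per name however often the name recurs, which is what "stories containing each name" implies; a count of total mentions would weight long stories more heavily.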
After building this we found the phenomenon has a name in the research too — including a paper this experiment essentially replicated by hand. Its central result refines the account above: for Elias, Elara and Mara the origin is RLHF preference data, not the literary corpus. (Clara, a conventional literary name, isn’t flagged in that work.)