Here’s a game for you, one that Claude Shannon, the father of information theory, devised in 1948 while he was attempting to model English as a random process. Pick a random book off your shelf, open it, point to a random spot on the page, and write down the first two letters you see. Say they’re I and N.
Take another book off the shelf and flip through it until you find the letters I and N in that order. The next letter of your text is whatever character follows that “IN” (say, a space). Now grab yet another book and search for an N followed by a space, noting which character appears next. Repeat until you’ve written a paragraph.
“IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID
PONDENOME OF DEMONSTURES OF THE REPTAGIN IS
REGOACTIONA OF CRE”
That isn’t English, but it looks like English.
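Shannon’s game is easy to mechanize. Here is a minimal sketch in Python, assuming a single long string stands in for the whole bookshelf; the file name books.txt and the function names are placeholders, not anything from Shannon.

```python
import random
from collections import defaultdict

def build_table(corpus):
    """Map each two-letter window in the corpus to the list of
    characters observed immediately after it."""
    table = defaultdict(list)
    for i in range(len(corpus) - 2):
        table[corpus[i:i + 2]].append(corpus[i + 2])
    return table

def shannon_game(corpus, n_chars=100, seed="IN"):
    """Play Shannon's game: repeatedly look at the last two characters
    written so far and copy whatever follows a random occurrence of
    that pair in the corpus."""
    table = build_table(corpus)
    text = seed
    for _ in range(n_chars):
        followers = table[text[-2:]]
        if not followers:   # the pair never occurs in the corpus; stop
            break
        text += random.choice(followers)
    return text

# Any large chunk of English text can stand in for the bookshelf:
# corpus = open("books.txt").read().upper()
# print(shannon_game(corpus))
```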
Shannon was fascinated by the “entropy” of the English language, a measure, in his new framework, of how much information a string of English text carries. The Shannon game is a Markov chain: a random process in which the next step depends only on the current state of the process. Once you’ve arrived at “LA,” the “IN NO IST” behind you doesn’t matter; the probability that the next letter is, say, a B is simply the probability that a randomly chosen occurrence of “LA” in your library is followed by a B.
And, as the name implies, Shannon didn’t invent the approach; it was about a half century old, and it descended from, of all things, a bitter mathematical-theological feud in late-czarist Russia.
In my opinion, the war of words between sincere religious believers and movement atheists is almost inherently intellectually vapid. And yet, at least this once, it produced a major mathematical breakthrough, whose echoes have reverberated ever since. One key figure, in Moscow, was Pavel Alekseevich Nekrasov, who began his career as an Orthodox theologian before switching to mathematics. His opposite number in St. Petersburg was his contemporary Andrei Andreyevich Markov, an atheist and a vehement opponent of the church, known as Neistovyj Andrei, “Andrei the Furious,” because of the many angry letters he wrote to the press about societal issues.
The specifics are too numerous to go into here, but the gist is as follows: Nekrasov believed he had discovered mathematical proof of free will, confirming the doctrines of the church. To Markov, that was mystical gibberish; worse, it was mystical gibberish dressed up as mathematics. Nekrasov’s reasoning leaned on the law of large numbers, which he believed could hold only for independent events; Markov created his chain as an example of random behavior produced purely mechanically, with each step depending on the last, that still exhibited the statistical regularity Nekrasov believed guaranteed free will.
A simple illustration of a Markov chain is a spider walking on a triangle with corners labeled 1, 2, and 3. At each tick of the clock, the spider moves from its current perch to one of the other two corners, chosen at random. The spider’s journey is then a sequence of numbers:
1, 2, 1, 3, 2, 1, 2, 3, 2, 3, 2, 1 …
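The spider’s walk takes only a few lines to simulate. A minimal sketch, with the uniform choice between the other two corners serving as the entire transition rule:

```python
import random

def spider_walk(steps=12, start=1):
    """Markov chain on a triangle: each tick, hop from the current
    corner to one of the other two, chosen uniformly at random."""
    path = [start]
    for _ in range(steps - 1):
        path.append(random.choice([c for c in (1, 2, 3) if c != path[-1]]))
    return path

print(spider_walk())   # e.g. [1, 2, 1, 3, 2, 1, 2, 3, 2, 3, 2, 1]
```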
Markov began with abstract examples like these, but later applied the idea to strings of text, among them Alexander Pushkin’s poem Eugene Onegin; it was this work, decades on, that Shannon would draw from. For mathematical purposes, Markov treated the poem as a stream of consonants and vowels, which he painstakingly cataloged by hand. Following a consonant, 66.3 percent of letters are vowels and 33.7 percent are consonants; following a vowel, only 12.8 percent of letters are vowels and 87.2 percent are consonants.
So, just as Shannon made fake English, you can make fake Pushkin: if the current letter is a vowel, the next letter is a vowel with probability 12.8 percent; if the current letter is a consonant, the next letter is a vowel with probability 66.3 percent. The results aren’t likely to be very poetic, but, as Markov discovered, they can be distinguished from the Markovized output of other Russian writers. The chain captures something of the original’s style.
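Markov’s two-state chain is just as easy to run. A sketch using his measured Onegin frequencies; since the chain tracks nothing finer than vowel-versus-consonant, the output is only a rhythm of V’s and C’s, not actual Russian:

```python
import random

# Markov's Onegin statistics: probability that the NEXT letter is a
# vowel, given whether the current letter is a vowel or a consonant.
P_VOWEL_NEXT = {"V": 0.128, "C": 0.663}

def fake_pushkin_rhythm(n=40, state="C"):
    """Emit a vowel/consonant sequence with Onegin's transition statistics."""
    out = []
    for _ in range(n):
        state = "V" if random.random() < P_VOWEL_NEXT[state] else "C"
        out.append(state)
    return "".join(out)

print(fake_pushkin_rhythm())   # e.g. "CVCVCCVCVVC..."
```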
Nowadays, the Markov chain is a fundamental tool for exploring spaces of conceptual entities far broader than poems. It’s how election reformers determine which legislative districts are blatantly gerrymandered, and it’s how Google decides which Web sites are the most significant (the key is a Markov chain in which you start at a specific Web site and then follow a random link from that site). A neural net like GPT-3 produces its uncanny imitations of human-written text by means of a massive Markov chain that picks the next word following a sequence of five hundred words, rather than the next letter following a sequence of two. All you need is a rule telling you the probabilities for the next step in the chain, given the current state.
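The Google idea, often called the random-surfer model, can be sketched the same way: wander the link graph at random and rank pages by how often the walk visits them. The toy link graph below is invented for illustration; real PageRank also mixes in a small chance of jumping to a random page, which the sketch includes so the walk can’t get stuck.

```python
import random
from collections import Counter

# An invented toy web: each page maps to the pages it links to.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

def random_surfer(links, steps=100_000, follow_prob=0.85):
    """Estimate each page's importance as the fraction of time a random
    walk spends there, following a random outgoing link with probability
    follow_prob and jumping to a random page otherwise."""
    pages = list(links)
    page = random.choice(pages)
    visits = Counter()
    for _ in range(steps):
        if random.random() < follow_prob and links[page]:
            page = random.choice(links[page])
        else:
            page = random.choice(pages)   # random jump
        visits[page] += 1
    return {p: visits[p] / steps for p in pages}

print(random_surfer(links))   # "c" and "a" should come out on top
```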
You can train your Markov chain on anything: your library, Eugene Onegin, the massive literary corpus GPT-3 has access to. Whatever you train it on, the chain will imitate it. Train it on baby names from 1971 (a sketch of the mechanics follows these lists), and you’ll get:
Kendi, Jeane, Abby, Fleureemaira, Jean, Starlo, Caming, Bettilia.
Or train it on baby names from 2017:
Anaki, Emalee, Chan, Jalee, Elif, Branshi, Naaviel, Corby, Luxton, Naftalene, Rayerson, Alahna.
Or from 1917:
Vensie, Adelle, Allwood, Walter, Wandeliottlie, Kathryn, Fran, Earnet, Carlus, Hazellia, Oberta.
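As promised above, here is a sketch of the name trick: the same letter-pair chain as in the Shannon game, except that each name is wrapped in start and end markers so the chain also learns where names begin and stop. The ten training names below are placeholders; a real run would use the full list of names registered in a given year.

```python
import random
from collections import defaultdict

def train(names):
    """Letter-pair chain over names, with '^' padding the start of each
    name and '$' marking its end."""
    table = defaultdict(list)
    for name in names:
        s = "^^" + name.lower() + "$"
        for i in range(len(s) - 2):
            table[s[i:i + 2]].append(s[i + 2])
    return table

def generate(table, max_len=10):
    """Walk the chain from the start marker until it emits '$'."""
    out = "^^"
    while len(out) < max_len + 2:
        nxt = random.choice(table[out[-2:]])
        if nxt == "$":   # the chain decided the name is over
            break
        out += nxt
    return out[2:].capitalize()

# Placeholder training data; substitute a real year's name list.
names_1971 = ["jennifer", "michelle", "kimberly", "lisa", "angela",
              "melissa", "jean", "abby", "tracy", "dawn"]
table = train(names_1971)
print([generate(table) for _ in range(8)])
```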
The Markov chain, simple as it is, somehow captures the style of each era’s naming conventions. It almost seems creative. Some of these aren’t bad! You can imagine a kid named “Jalee,” or, for a throwback flavor, “Vensie,” in an elementary school classroom.
But maybe not “Naftalene.” Even Markov nods.