Markov Chains

A Markov chain is a probabilistic model first described by Andrej Markov in a 1913 paper, in which Markov tallied, one character at a time, the probabilities of letter sequences in Pushkin's Eugene Onegin. Later, in "A Mathematical Theory of Communication" (1948), Claude Shannon used the method to generate this "approximation to English":

THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.

The central concept of the algorithm is that any given sequence of discrete symbols carries, in the frequencies of its transitions, enough information to make an educated, probability-based guess about the next symbol in a similar sequence.
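To make that concrete, here is a minimal character-level sketch (my own illustration, not code from the original): it records which characters follow each two-character state in a corpus, then walks those recorded transitions to produce new text.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each `order`-character state to the characters observed to follow it."""
    chain = defaultdict(list)
    for i in range(len(text) - order):
        state = text[i:i + order]
        chain[state].append(text[i + order])
    return chain

def generate(chain, length=80):
    """Walk the chain, picking each next character in proportion to how often it followed the current state."""
    state = random.choice(list(chain.keys()))
    output = state
    for _ in range(length):
        followers = chain.get(state)
        if not followers:
            break
        output += random.choice(followers)
        state = output[-len(state):]
    return output

corpus = "the quick brown fox jumps over the lazy dog. the lazy dog naps near the quick brown fox."
model = build_chain(corpus, order=2)
print(generate(model))
```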

Markov Chain Text Generation with Markovify
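A minimal sketch of the markovify workflow (the corpus filename below is a placeholder): markovify.Text builds a word-level model from a plain-text corpus, state_size controls how many words of context each state holds, and make_sentence assembles new sentences from the learned transitions.

```python
import markovify

# Read a plain-text corpus (substitute your own file).
with open("corpus.txt", encoding="utf-8") as f:
    corpus = f.read()

# Build a word-level Markov model with two words of context per state.
model = markovify.Text(corpus, state_size=2)

# make_sentence() returns None when it can't build a sentence that differs enough from the corpus.
for _ in range(5):
    sentence = model.make_sentence()
    if sentence:
        print(sentence)
```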

Neural Networks

textgenrnn

A recurrent neural network is a kind of machine learning model in which the computer makes a series of predictions through layers, and an LSTM (long short-term memory) network uses those layers to keep track of how well it is predicting the next item in a sequence based on the vector contexts. Andrej Karpathy (another Andrej!) described this in an influential blog post, "The Unreasonable Effectiveness of Recurrent Neural Networks." The textgenrnn Python module by Max Woolf simplifies many of the processes involved in training an LSTM recurrent neural network.

Textgenrnn demo
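A sketch of the kind of session the module supports (the corpus filename is a placeholder, and the epoch count and temperature are arbitrary choices): textgenrnn ships with pre-trained weights, train_from_file fine-tunes them on your own text, and generate samples new lines from the result.

```python
from textgenrnn import textgenrnn

# Start from the module's included pre-trained weights so training on a small
# corpus only has to fine-tune, not learn English from scratch.
textgen = textgenrnn()

# Train on a plain-text file, one example per line.
textgen.train_from_file("corpus.txt", num_epochs=5)

# Generate a few samples; lower temperature means more conservative predictions.
textgen.generate(n=3, temperature=0.5)
```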

GPT

The well-known GPT-2 and GPT-3 are "Generative Pre-trained Transformer" neural networks. They are "generative" transformers because, like an LSTM RNN, the basic thing they do is predict the next word or token in a sequence. They do this by seeking to understand the context (a vector) for every symbol in the text in order to weight its usefulness in predicting other, less significant symbols. I think of it like an image with different colored regions: I imagine the transformer understanding that red-hued pixels in one region likely mean that other pixels in the same region will also be red. An image is a 2-dimensional matrix of information; by analogy, imagine a text sample as an n-dimensional space, where n is the number of unique tokens in the sample.

GPT-2 and GPT-3 work only with a pre-trained model: GPT-2 was trained on a scraped web corpus called WebText, and GPT-3's training data is drawn largely from Common Crawl.

GPT Demo
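GPT-3 is reachable only through OpenAI's API, but GPT-2's weights are publicly available. One way to try next-token generation locally, offered here as a sketch rather than as the original demo, is the Hugging Face transformers text-generation pipeline:

```python
from transformers import pipeline

# Load the publicly released GPT-2 weights (downloaded on first run).
generator = pipeline("text-generation", model="gpt2")

# Continue a prompt by repeatedly sampling the model's prediction for the next token.
results = generator(
    "THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER",
    max_length=40,
    num_return_sequences=3,
    do_sample=True,
)

for r in results:
    print(r["generated_text"])
    print("---")
```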