N-gram Visualization

Here, we visualize inference for character-level n-gram language models, using a trigram model (which predicts the next character from the previous two characters) as an example. A trigram model was chosen because its probabilities fit in a 2D heatmap that you can hover over with your mouse to read individual values, but the same approach would work for a bigram or higher-order model. The model is built from the counts of each trigram seen during training (in this case on 30,000 names). Smoothing is applied to the counts so that no trigram ends up with zero probability.
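
As a rough illustration of how counts become a model, here is a minimal Python sketch. It assumes add-k smoothing (a constant added to every count), a vocabulary of the newline character plus lowercase letters, and two newlines as padding at the start of each name. The function name `train_trigram` and the three-name training list are made up for the example; this is not the code behind the visualization.

```python
from collections import Counter
from itertools import product

# Assumed vocabulary: newline (start/end marker) plus lowercase letters.
VOCAB = ["\n"] + [chr(c) for c in range(ord("a"), ord("z") + 1)]

def train_trigram(names, smoothing=1.0):
    """Return {(c1, c2): {c3: P(c3 | c1, c2)}} using add-k smoothing."""
    counts = Counter()
    for name in names:
        # Pad with two newlines at the start and one at the end.
        chars = ["\n", "\n"] + list(name) + ["\n"]
        for c1, c2, c3 in zip(chars, chars[1:], chars[2:]):
            counts[(c1, c2, c3)] += 1

    probs = {}
    for c1, c2 in product(VOCAB, repeat=2):
        # Adding `smoothing` to every count keeps all probabilities above zero.
        row = [counts[(c1, c2, c3)] + smoothing for c3 in VOCAB]
        total = sum(row)
        probs[(c1, c2)] = {c3: v / total for c3, v in zip(VOCAB, row)}
    return probs

# Tiny hypothetical training set, just to exercise the function.
probs = train_trigram(["emma", "ava", "mia"])
first_letter = probs[("\n", "\n")]
print(first_letter["e"], first_letter["z"])
```

With this toy data, a first letter that was seen once gets a higher probability than one that was never seen, but the unseen letter still gets a small nonzero share because of the smoothing.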

These visualizations were inspired by what I learned from Andrej Karpathy's ngram GitHub repo, his excellent makemore video, and the very nice book chapter that he points to from the repo page. The model itself, along with the heatmap, is directly based on what Andrej has in that repo. The random number generator used for sampling is my direct port to JavaScript of the one he uses in the repo, so sampling should be consistent given the same seed.

The first page (Explore) lets you familiarize yourself with a 2D heatmap representation of the conditional probability distributions for the next character given the previous two characters. With blanks selected for the initial characters, you can see and hover over the whole heatmap. You can choose characters to condition the heatmap on and see the probabilities of the next character given those two characters. The conditional probabilities are also shown in three graphical representations: a horizontal cumulative probability line, which will come into play when we start sampling on the following page; a probability mass function bar chart; and a heatmap row that shows only the portion of the overall heatmap relevant to the selected characters.

The second page (Sample) lets you sample from the conditional probability distributions and see how the distributions change as you continue to sample. By default you start with two newline characters (\n\n), so the first sample is not biased by any preceding letters, but you can set these to whatever you want. Try the first couple of characters of your own name, for example.
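
One way to picture a single sampling step: lay the probabilities end to end on a line from 0 to 1, draw a uniform random number, and pick the character whose segment contains it. The Python sketch below illustrates that idea under stated assumptions. The distribution `dist` is a toy three-character example, `sample_next` is a hypothetical helper, and Python's built-in `random` module stands in for the ported generator, so the draws will not match the page.

```python
import random
from itertools import accumulate

def sample_next(dist, u):
    """Return the character whose cumulative segment contains u, for u in [0, 1)."""
    chars = list(dist)
    # Right edge of each character's segment on the cumulative line.
    edges = list(accumulate(dist[c] for c in chars))
    for ch, edge in zip(chars, edges):
        if u < edge:
            return ch
    return chars[-1]  # guard against floating-point round-off

# Toy conditional distribution for one hypothetical two-character context.
dist = {"\n": 0.1, "a": 0.6, "b": 0.3}

context = ("\n", "\n")
rng = random.Random(0)          # stand-in RNG, not the one used on the page
ch = sample_next(dist, rng.random())
context = (context[1], ch)      # slide the two-character window forward
print(repr(ch), context)
```

After each draw the context shifts by one character, which is why the displayed distribution changes from step to step.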

The third page (Generate Names) lets you generate baby names by sampling from the n-gram model. Setting the birthday month and day seeds the random number generator, and pressing 'Generate names' will create 10 baby names.
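
To show why a fixed seed gives repeatable names, here is a small Python sketch. Everything in it is an assumption made for illustration: the mapping from month and day to a seed (`seed_from_birthday`), the stand-in distribution `toy_weights` (uniform letters instead of the trained trigram table), and the use of Python's `random.Random` in place of the ported generator. Its output will be gibberish rather than plausible names; the point is only the seeding behavior.

```python
import random
import string

VOCAB = "\n" + string.ascii_lowercase  # newline ends a name

def toy_weights(context):
    """Stand-in for the trained model: uniform letters, 20% chance of ending."""
    # Do not allow a name to end before it has at least one letter.
    end = 0.0 if context[1] == "\n" else 0.2
    letter = (1.0 - end) / 26
    return [end] + [letter] * 26

def seed_from_birthday(month, day):
    # Hypothetical mapping; the page may combine month and day differently.
    return month * 100 + day

def generate_names(month, day, n=10, max_len=12):
    rng = random.Random(seed_from_birthday(month, day))
    names = []
    for _ in range(n):
        context, name = ("\n", "\n"), ""
        while len(name) < max_len:
            ch = rng.choices(VOCAB, weights=toy_weights(context))[0]
            if ch == "\n":
                break
            name += ch
            context = (context[1], ch)
        names.append(name)
    return names

# The same birthday always yields the same list of 10 names.
print(generate_names(3, 14) == generate_names(3, 14))
```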