Nondeterministic Otomads: Markov Chains

Last updated 2025-11-20, 2:17:08 PM UTC+00:00

Day 17, Otomad Advent Calendar 2025

About this article

This article has some overlap with this video:

Familiarity with basic probability and linear algebra will help in a later section.

I came across this last year:

Since then I've always felt that there's probably a little bit more that can be done with this idea. Something about this just feels Otomad-ish to me. (I'll let you be the judge whether this counts as an Otomad proper!)

What is a Markov chain?

To be honest, I think we can just look at Markov chains as a more intentional way to introduce randomness and sequencing into something, but let's introduce them formally. That's right, I tricked you into reading about math.

Probability measures

Recall that a discrete probability measure \(\Pr\) must satisfy \(\sum_{s \in S}\Pr (s) = 1 \). Sometimes we don't specify the normalized probabilities explicitly: if we instead have weights \(T: S \to [0, \infty)\), the standard normalization is \[\Pr(s) := \frac{T(s)}{\sum_{s' \in S} T(s')}\] which just computes the \(T\)-percentage of \(s\in S\), and makes sense so long as \(T(s) \neq 0\) for at least one \(s\).
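As a quick sanity check, here is a minimal Python sketch of this normalization (the `normalize` helper is made up for illustration, not from any library):

```python
# Hypothetical helper illustrating the normalization above: turn
# nonnegative weights T(s) into a probability measure Pr(s).
def normalize(weights):
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be nonzero")
    return {s: w / total for s, w in weights.items()}

print(normalize({"の": 2, "こ": 1, "し": 1}))
# {'の': 0.5, 'こ': 0.25, 'し': 0.25}
```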

Suppose we have a set of \(n\) states \(S = \{s_1, ..., s_n\}\). A discrete-time finite-state Markov chain is the corresponding set of \(n\) probability measures \(\{\Pr_{s_i}: S \to [0,1]\}_{s_i \in S}\). Each \(\Pr_{s_i}\) specifies the probability of transitioning to each possible next state, given that we are currently at \(s_i\).

In the video earlier, \(S = \{し, か, の, こ, た, ん, \text{Space}\}\), and the \(\Pr\)'s are specified on the graph. For example, \(\Pr_し (か) = \Pr_し (た) = 1/2\), which means that し transitions to か and た with equal probability (50%) -- the value of \(\Pr_し\) on all other states is zero, so we don't draw those arrows. Similarly, の transitions to こ 100% of the time (\(\Pr_の (こ) = 1\)), and こ transitions to の 50% of the time, to itself 25% of the time, and to し 25% of the time. We reproduce the graph for the Markov chain here:

#import "@preview/finite:0.5.0"

#figure(finite.automaton((
  か: (の:1),
  の: (こ:1),
  し: (か:1/2, た: 1/2),
  た: (ん:1),
  こ: (こ:1/4, し:1/4, の: 1/2),
  ん: (た:1/2, Space:1/2),
  Space: (Space:1/2,し:1/2)
), layout:finite.layout.custom.with(positions: (
  か: (9,4),
  の: (12,2),
  し: (6,2),
  た: (3,0),
  こ: (9,0),
  ん: (0,2),
  Space: (3,4)
)), initial: (), final: ()))

The weights given here aren't arbitrary either. In "しかのこのこのここしたんたん", こ transitions to の twice, to itself once, and to し once. And this corresponds to 50%, 25%, 25% as observed before. This empirical method of assigning weights is a form of Maximum Likelihood Estimation (MLE).
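The counting procedure above can be sketched in a few lines of Python (the `estimate_chain` helper is hypothetical): count every observed transition, then normalize each row of counts into probabilities.

```python
from collections import Counter, defaultdict

# Hypothetical helper: estimate transition probabilities from an observed
# sequence by counting transitions and normalizing each row (MLE).
def estimate_chain(sequence):
    counts = defaultdict(Counter)
    for cur, nxt in zip(sequence, sequence[1:]):
        counts[cur][nxt] += 1
    return {s: {t: c / sum(row.values()) for t, c in row.items()}
            for s, row in counts.items()}

chain = estimate_chain("しかのこのこのここしたんたん_")
print(chain["こ"])  # {'の': 0.5, 'こ': 0.25, 'し': 0.25}
```

Note that the final "_" has no outgoing transitions in this estimate; the chain in the video additionally lets Space loop to itself or restart at し.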

We can connect this model to Otomads by treating each state as a sample. We play the sample, and when it finishes, we randomly transition to the next state according to the Markov chain, then play that next sample. This is what the video we saw earlier does.
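A minimal sketch of such a player loop, with the "play the sample" part stubbed out as string concatenation (the transition table is transcribed from the graph above, with Space written as `_`):

```python
import random

# Transition table transcribed from the graph above (Space written as "_").
CHAIN = {
    "し": {"か": 1/2, "た": 1/2},
    "か": {"の": 1},
    "の": {"こ": 1},
    "こ": {"こ": 1/4, "し": 1/4, "の": 1/2},
    "た": {"ん": 1},
    "ん": {"た": 1/2, "_": 1/2},
    "_": {"_": 1/2, "し": 1/2},
}

def run(start, steps):
    """Stand-in for the player loop: 'playing a sample' is just
    appending its name; then we sample the next state."""
    state, out = start, [start]
    for _ in range(steps):
        nexts = CHAIN[state]
        state = random.choices(list(nexts), weights=list(nexts.values()))[0]
        out.append(state)
    return "".join(out)

print(run("し", 20))  # a random 21-character string starting with し
```

In a real player you would swap the string concatenation for audio playback, triggering the transition when the current sample ends.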

Live Demo

Click on the "Start" button to start.

This simulation will terminate itself when it reproduces "しかのこのこのここしたんたん_".

Other fun things

Nondeterministic loops - サビが始まらない炉心融解.

Why don't you render all this out ahead-of-time ("bake the simulation")?

For the same reason why you might not want to render out a DJ set ahead of time.

Derandomization

A Markov chain whose randomized transition distributions are replaced by deterministic transitions is called a state machine (or finite automaton).

If you think about it, this is one way to introduce nonlinear storytelling (in the sense of branching paths) into videos (so long as you don't worry about how you would post them to video sites). I know this was done a long time ago on YouTube with video annotations, and with interactive videos on Bilibili, which present choices like in adventure (ADV) games.

Visual novel choice screen

Perhaps this would be useful for more story-driven works like those made by PakaShiro. And if you are interested you can look at how this has been explored in visual novels.

Ergodicity

Ergodicity roughly describes qualitative behavioural ("global") properties over long periods of time in a dynamical system defined by "local" interactions inside the system. We can extract such properties from Markov chains.

A Markov chain is called irreducible if, for any two states, we can go from one to the other (in some number of steps) with positive probability.

The period of a state \(s_i\) is the number \(d_i = \gcd \{ n \geq 1 : \Pr (s_i \to s_i \text{ in } n \text{ steps}) > 0 \}\) -- that is, the GCD of path lengths for a state to return to itself. An irreducible Markov chain is called aperiodic if \(d_i = 1\) for every state (in an irreducible chain, all states in fact share the same period, so checking one state suffices).
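The definition can also be checked numerically. Here is a sketch (the `periods` helper and the cutoff are my own choices) that estimates each state's period as the GCD of return times \(n \leq\) `max_steps` with \((P^n)_{ii} > 0\); the truncation is harmless once the cutoff is large relative to the chain's cycle lengths.

```python
from math import gcd
import numpy as np

# Hypothetical sketch: estimate each state's period as the gcd of return
# times n <= max_steps with (P^n)_{ii} > 0.
def periods(P, max_steps=50):
    n = len(P)
    d = [0] * n
    Q = np.eye(n)
    for step in range(1, max_steps + 1):
        Q = Q @ P
        for i in range(n):
            if Q[i, i] > 0:
                d[i] = gcd(d[i], step)
    return d

# A deterministic 2-cycle has period 2; adding a self-loop makes it aperiodic.
print(periods(np.array([[0., 1.], [1., 0.]])))    # [2, 2]
print(periods(np.array([[0.5, 0.5], [1., 0.]])))  # [1, 1]
```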

Warning about infinite chains

Infinite-state irreducible aperiodic Markov chains also need to be positive recurrent to be called ergodic; that is, every state must be expected to return to itself in finite time. This is not satisfied, for example, by the simple 1D random walk.

Notice that the finite-state Markov chain we studied at the beginning is irreducible and aperiodic! In such cases, the Markov chain is called ergodic. Irreducibility is rather clear; we see aperiodicity from the simple observation that the state \(の\) can return to itself in any number of steps \(k \geq 2\), since のこの, のここの, のこここの, ... all occur with positive probability, and \(\gcd \{2,3,\dots\} = 1\).

Ergodic Markov chains have a fun property: there exists some threshold \(N > 0\) such that \(\Pr(s_i \to s_j \text{ in } k \text{ steps}) > 0\) for every \(k > N\) and all states \(s_i, s_j\). In other words, beyond the threshold length, the Markov chain can produce a sequence of any length, starting and ending on arbitrary states, with positive probability.
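This threshold can be found numerically for our chain: raise \(P\) to successive powers until every entry is positive (`positivity_threshold` is a hypothetical helper; the matrix uses the ordering し, か, の, こ, た, ん, Space, the same as in the next section).

```python
import numpy as np

# Stochastic matrix for the chain, ordering し, か, の, こ, た, ん, Space.
P = np.array([
    [0,   1/2, 0,   0,   1/2, 0, 0  ],  # し
    [0,   0,   1,   0,   0,   0, 0  ],  # か
    [0,   0,   0,   1,   0,   0, 0  ],  # の
    [1/4, 0,   1/2, 1/4, 0,   0, 0  ],  # こ
    [0,   0,   0,   0,   0,   1, 0  ],  # た
    [0,   0,   0,   0,   1/2, 0, 1/2],  # ん
    [1/2, 0,   0,   0,   0,   0, 1/2],  # Space
])

# Hypothetical helper: smallest N with every entry of P^N positive.
def positivity_threshold(P, max_steps=100):
    Q = np.eye(len(P))
    for k in range(1, max_steps + 1):
        Q = Q @ P
        if (Q > 0).all():
            return k
    return None

print(positivity_threshold(P))
```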

Stationary distribution

We can represent the Markov chain as a stochastic matrix \(P\), where \(P_{ij} := \Pr_{s_i} (s_j)\); that is, the \(i\)-th row of the matrix is the weight vector \((\Pr_{s_i} (s_1),...,\Pr_{s_i} (s_n))\). For the example earlier, setting \(し = s_1, か = s_2, の = s_3, こ = s_4, た = s_5, ん = s_6, \text{Space} = s_7\), we have: \[ P = \begin{bmatrix} 0 & 1/2 & 0 & 0 & 1/2 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1/4 & 0 & 1/2 & 1/4 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1/2 & 0 & 1/2 \\ 1/2 & 0 & 0 & 0 & 0 & 0 & 1/2 \end{bmatrix} \]

Something interesting happens if we square \(P\), consider the \((i,j)\)-entry of \(P^2\):

\[\begin{aligned} (P^2)_{ij} &= \sum_{k=1}^n P_{i k} P_{k j}\\ &= \sum_{k=1}^n \Pr_{s_i}(s_k) \Pr_{s_k}(s_j)\\ &= \sum_{k=1}^n \Pr(s_i \to s_k \to s_j)\\ &= \Pr(s_i \to s_j \text{ in 2 steps}). \end{aligned}\]

That's right, I know what you are thinking: in general \(P^\ell\) gives us the probabilities that we transition between states in \(\ell\) steps.
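To see this concretely, here is a NumPy sketch using the matrix \(P\) above. The \((1,3)\)-entry of \(P^2\) is \(\Pr(し \to の \text{ in 2 steps})\), which comes entirely from the path し→か→の: \(1/2 \cdot 1 = 1/2\).

```python
import numpy as np

# P as written in the text, ordering し, か, の, こ, た, ん, Space.
P = np.array([
    [0,   1/2, 0,   0,   1/2, 0, 0  ],  # し
    [0,   0,   1,   0,   0,   0, 0  ],  # か
    [0,   0,   0,   1,   0,   0, 0  ],  # の
    [1/4, 0,   1/2, 1/4, 0,   0, 0  ],  # こ
    [0,   0,   0,   0,   0,   1, 0  ],  # た
    [0,   0,   0,   0,   1/2, 0, 1/2],  # ん
    [1/2, 0,   0,   0,   0,   0, 1/2],  # Space
])

P2 = P @ P
# Two-step probability し → の, via the single path し→か→の.
print(P2[0, 2])  # 0.5
# P^2 is again stochastic: every row still sums to 1.
print(P2.sum(axis=1))
```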


Given a stochastic matrix for an ergodic Markov chain, we have a convergence theorem: \[ \lim_{n \to \infty} P^n = \lim_{n \to \infty} \frac 1 n \sum_{i=0}^{n-1} P^i = \begin{bmatrix} p_1 & \dots & p_n \\ \vdots & \ddots & \vdots \\ p_1 & \dots & p_n \end{bmatrix} \] That is, every row of the matrix becomes the same vector \( \pi = (p_1,\dots,p_n) \). Further, this \(\pi\) vector is also magically a left eigenvector of \(P\), with eigenvalue \(1\), that is (notice \(P(\lim_{n \to \infty} P^n) = \lim_{n \to \infty} P^{n+1} = \lim_{n \to \infty} P^n\)): \[ \pi P = \pi. \]

For this reason, \(\pi\) is called the stationary distribution, since it is fixed by \(P\).

We can actually extract a lot of information from this. For example, \(\pi\) gives us a ranking of states by the amount of time we are most likely to spend in them, regardless of our starting position. We can think of taking successive powers of \(P\) as gradually diluting the initial conditions, and the limit converges to something uniform that does not depend on the starting position. After a large number of steps, I think it intuitively makes sense that where we start is not very important anymore.

For the example above with \(し = s_1, か = s_2, の = s_3, こ = s_4, た = s_5, ん = s_6, \text{Space} = s_7\), we have \(\pi = (1/8, 1/16, 3/16, 1/4, 1/8, 1/8, 1/8)\). So, こ will appear more often than any of the others regardless of where we start, despite the fact that both こ and た have the same graph-theoretic indegree. This statistic can be useful if you plan on using Markov chains to model persistent ambiance.
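We can verify this \(\pi\) numerically, for instance by extracting the left eigenvector of \(P\) for eigenvalue 1 (a sketch with NumPy; `np.linalg.eig` returns right eigenvectors, so we apply it to \(P^\top\) and normalize).

```python
import numpy as np

# P as in the text, ordering し, か, の, こ, た, ん, Space.
P = np.array([
    [0,   1/2, 0,   0,   1/2, 0, 0  ],  # し
    [0,   0,   1,   0,   0,   0, 0  ],  # か
    [0,   0,   0,   1,   0,   0, 0  ],  # の
    [1/4, 0,   1/2, 1/4, 0,   0, 0  ],  # こ
    [0,   0,   0,   0,   0,   1, 0  ],  # た
    [0,   0,   0,   0,   1/2, 0, 1/2],  # ん
    [1/2, 0,   0,   0,   0,   0, 1/2],  # Space
])

# Left eigenvector of P for eigenvalue 1, i.e. a right eigenvector of P^T.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()  # normalize so the entries sum to 1
print(pi)  # ≈ [0.125, 0.0625, 0.1875, 0.25, 0.125, 0.125, 0.125]
```

Taking a row of a large power like `np.linalg.matrix_power(P, 1000)` gives the same answer, by the convergence theorem above.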

Conclusion

Well, I hope this was all interesting to you. I'm sure I'm not the first one to consider this kind of idea, but I wanted to write about it. It would be interesting to see more exploration of extending Otomads beyond an entirely deterministic video format. I personally feel that there is a lot that can be done in this area (so long as you jump out of the constraint of having to put them up on regular video platforms -- which is not really all that crazy).