When observing the natural world, many of us
notice a somewhat beautiful dichotomy.
No two things are ever exactly alike,
but they all seem to follow some underlying form.
And Plato believed that the true forms of the universe
were hidden from us.
Through observation of the natural world,
we could merely acquire approximate knowledge of them.
They were hidden blueprints.
The pure forms were only accessible
through abstract reasoning of philosophy and mathematics.
For example, he describes the circle as that which has the distance
from its circumference to its center everywhere equal.
Yet we will never find a material manifestation
of a perfect circle, or a perfectly straight line.
Though interestingly, Plato speculated
that after an uncountable number of years,
the universe will reach an ideal state,
returning to its perfect form.
This Platonic focus on abstract, pure forms
remained popular for centuries.
And it wasn't until the 16th century
that people tried to embrace the messy variation
in the real world, and apply mathematics to tease out
underlying patterns.
Bernoulli refined the idea of expectation.
He was focused on a method of accurately estimating
the unknown probability of some event,
based on the number of times the event occurs
in independent trials.
And he uses a simple example.
Suppose that without your knowledge
3,000 light pebbles and 2,000 dark pebbles
are hidden in an urn.
And that, to determine the ratio of light versus dark pebbles
by experiment, you draw one pebble
after another, with replacement, and note how many times
a light pebble is drawn versus a dark one.
He went on to prove that the observed ratio of light
versus dark draws will converge on the actual ratio
as the number of trials increases,
a result known as the Weak Law of Large Numbers.
And he concluded by saying, if observations of all events
be continued for the entire infinity,
it will be noticed that everything in the world
is governed by precise ratios and a constant law of change.
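To make this concrete, here is a minimal Python sketch of Bernoulli's urn experiment, assuming the counts from the example above (3,000 light and 2,000 dark pebbles, so a true light fraction of 0.6) and sampling with replacement. The observed fraction of light draws wanders toward the true ratio as the number of trials grows.

```python
import random

# Assumed parameters from the example above: 3,000 light and 2,000 dark pebbles.
LIGHT, DARK = 3000, 2000
true_ratio = LIGHT / (LIGHT + DARK)  # 0.6, unknown to the experimenter

light_drawn = 0
for trial in range(1, 100_001):
    # Draw one pebble with replacement: light with probability 0.6.
    if random.random() < true_ratio:
        light_drawn += 1
    if trial in (10, 100, 1_000, 10_000, 100_000):
        print(f"after {trial:>6} draws, observed light fraction = {light_drawn / trial:.4f}")
```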
This idea was quickly extended, as it
was noticed that not only did things
converge on an expected average, but the probability
of variation away from the average also
followed a familiar underlying shape, or distribution.
A great example of this is Francis Galton's bean machine.
Imagine each collision as a single independent event,
such as a coin flip.
And after 10 collisions, or events,
the bean falls into a bucket representing
the ratio of left versus right deflections,
or heads versus tails.
And this overall curve, known
as the binomial distribution, appears
to be an ideal form, as it kept appearing everywhere
any time you looked at the variation
of a large number of random trials.
It seems the average fate of these events
is somehow predetermined, a result known today
as the Central Limit Theorem.
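As a rough illustration, the bean machine can be simulated by treating each of the 10 collisions as an independent, fair left-or-right deflection (the 50/50 split is an assumption suggested by the coin-flip analogy above). The bucket counts trace out the binomial shape.

```python
import random
from collections import Counter

# Each bean hits 10 pins; each collision independently deflects it
# right (counted as 1) or left (counted as 0), like a fair coin flip.
N_PINS, N_BEANS = 10, 100_000

buckets = Counter(
    sum(random.random() < 0.5 for _ in range(N_PINS))  # number of rightward deflections
    for _ in range(N_BEANS)
)

# Print a crude histogram: the bucket counts follow the binomial distribution.
for k in range(N_PINS + 1):
    bar = "#" * (buckets[k] * 200 // N_BEANS)
    print(f"bucket {k:2}: {bar}")
```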
But this was a dangerous philosophical idea to some.
And Pavel Nekrasov, originally a theologian by training,
later took up mathematics, and was a strong proponent
of the religious doctrine of free will.
He didn't like the idea of us having this predetermined
statistical fate.
And he made a famous claim, that independence
is a necessary condition for the law of large numbers.
Independence describes these toy examples
using beans or dice, where the outcome of previous events
doesn't change the probability of the current or future
events.
However, as we can all appreciate, most things
in the physical world are clearly
dependent on prior outcomes,
such as the chance of fire, or sun, or even
our life expectancy.
And when the probability of some event
depends on, or is conditional on, previous events,
we say they are dependent events, or dependent variables.
So this claim angered another Russian mathematician,
Andrey Markov, who maintained a very public animosity
towards Nekrasov.
He goes on to say in a letter that this circumstance prompts
me to explain, in a series of articles,
that the law of large numbers can
apply to dependent variables,
using a construction which, he brags,
Nekrasov could not even dream about.
So Markov extends Bernoulli's results to dependent variables,
using an ingenious construction.
Imagine a coin flip which isn't independent, but dependent
on the previous outcome,
so it has a short-term memory of one event.
And this can be visualized using a hypothetical machine, which
contains two cups, which we call states.
In one state, we have a 50/50 mix of light versus dark beads,
while in the other state, we have more dark beads than light.
One cup we can call state zero;
it represents a dark bead having previously occurred.
The other cup we can call state one;
it represents a light bead having previously occurred.
Now to run our machine, we simply start in a random state
and make a selection.
Based on the outcome of that selection,
we output either a zero, if it's dark, or a one, if it's light,
and we then move to either state zero or state one,
depending on that event.
And with this two-state machine, we
can identify four possible transitions.
If we are in state zero, and a dark bead occurs,
we loop back to the same state and select again.
If a light bead is selected, we jump over
to state one, which can also loop back on itself,
or jump back to state zero, if a dark bead is chosen.
The probability of a light versus dark selection
is clearly not independent here, since it
depends on the previous outcome.
But Markov proved that as long as every state in the machine
is reachable, when you run these machines in a sequence,
they reach equilibrium.
That is, no matter where you start,
once you begin the sequence, the proportion
of times you visit each state converges
to some specific ratio, or probability.
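Here is a small sketch of that two-cup machine, assuming the 50/50 cup is state zero and giving the other cup an illustrative 70/30 split of dark versus light beads (the description above only says it holds more dark than light). Whichever state we start in, the visit frequencies settle on the same equilibrium ratio.

```python
import random
from collections import Counter

# Probability of drawing a light bead in each cup (state).
# The 70/30 split for state 1 is an illustrative assumption.
P_LIGHT = {0: 0.5,   # state 0 (dark drawn last): 50/50 light vs. dark
           1: 0.3}   # state 1 (light drawn last): more dark than light

def run(start_state, n_steps=100_000):
    state, visits = start_state, Counter()
    for _ in range(n_steps):
        visits[state] += 1
        # Draw a bead from the current cup, then move to the state named
        # after the outcome: 1 if light was drawn, 0 if dark.
        state = 1 if random.random() < P_LIGHT[state] else 0
    return {s: visits[s] / n_steps for s in (0, 1)}

# No matter where we start, the visit frequencies converge to the same ratio.
print("starting in state 0:", run(0))
print("starting in state 1:", run(1))
```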
Now this simple example disproved Nekrasov's claim
that only independent events could
converge on predictable distributions.
And the concept of modeling sequences
of random events using states, and transitions between states,
became known as a Markov chain.
And one of the first, and most famous applications
of Markov chains, was published by Claude Shannon.