Jekyll2021-05-08T23:49:03+00:00http://ebtech.github.io/feed.xmlAram’s Lair of Mad ScienceHi there! Welcome to Aram's website.Why is the Universe Quantum?2021-04-08T00:00:00+00:002021-04-08T00:00:00+00:00http://ebtech.github.io/blog/2021/04/08/why-quantum<p>Quantum mechanics is such a bizarre theory. In the iconic thought experiment, <a href="https://en.wikipedia.org/wiki/Schr%C3%B6dinger%27s_cat">Schrödinger’s cat</a> couldn’t decide whether to be dead or alive until it was looked at. It’s natural to ask ourselves: <em>why</em> is the Universe like this? Must it be so, or could a classical universe equally well support life as we know it?</p>
<p><img src="https://cdn11.bigcommerce.com/s-jyvxk5hzsq/images/stencil/1280x1280/products/7324/45285/8046L__61068.1546964866.jpg" alt="Schrödinger's cat" /></p>
<p>A <a href="https://arxiv.org/abs/quant-ph/0101012">beautiful paper by Lucien Hardy</a> demonstrates that quantum mechanics is the logical consequence of five simple axioms. The first four are fairly straightforward and are satisfied by classical theories. The fifth axiom, that of continuous transitions, is the most interesting.</p>
<p>To understand it in more detail, let’s imagine that we’re designing the universe. We could make it discrete, like a <a href="https://en.wikipedia.org/wiki/Cellular_automaton">cellular automaton</a>. However, we would then miss out on the nicer <a href="https://en.wikipedia.org/wiki/Symmetry_(physics)">symmetries</a> of continuous space-time. To give an example, continuous space can grant equal status to all its directions. The line segment connecting any pair of points determines such a direction; this segment represents the unique shortest path between its endpoints; furthermore, it has a unique midpoint which is equidistant from its endpoints. Any two points in <em>time</em> likewise have a midpoint between them. A cellular grid lacks these symmetries.</p>
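<p>A small sketch makes the grid’s missing symmetries concrete. Purely for illustration, it assumes a square grid on which an object moves one cell at a time, horizontally or vertically. The function names are hypothetical. Between two cells there are generally many shortest paths, and the halfway point between two cells is often not a cell at all.</p>

```python
from math import comb

def shortest_grid_paths(dx, dy):
    """Number of shortest paths between two cells of a square grid, when each
    step moves one cell horizontally or vertically (an illustrative rule)."""
    # Every shortest path uses |dx| horizontal and |dy| vertical steps in some
    # order, so we only choose which of the steps are horizontal.
    dx, dy = abs(dx), abs(dy)
    return comb(dx + dy, dx)

def grid_midpoint(p, q):
    """Midpoint of two grid cells, or None if it is not itself a cell."""
    sx, sy = p[0] + q[0], p[1] + q[1]
    if sx % 2 or sy % 2:
        return None
    return (sx // 2, sy // 2)

print(shortest_grid_paths(2, 2))        # 6
print(grid_midpoint((0, 0), (1, 0)))    # None
print(grid_midpoint((0, 0), (2, 4)))    # (1, 2)
```

<p>In continuous space, by contrast, the straight segment is the single shortest path and every pair of points has a midpoint.</p>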
<p>Let’s, at the very least, commit to a continuous time. It seems natural for space to be continuous too, with all its points considered equally good positions for objects to occupy. Even if restricted to a finite-sized region, such as the interior of a box, an object can occupy any of infinitely many possible states. If our laws of physics allow infinitely precise measurements (say, perhaps, by shining incredibly weak and narrow beams of light to “see” the object’s position), then our box can store an unlimited amount of information. In such a world, it would seem that computers of bounded size can be constructed to have unbounded power: for instance, by making their parts arbitrarily small. Natural selection and life, as we know them, thrive on the challenge of searching for solutions intelligently, using bounded computation. To avoid trivializing life, it’s essential to limit computational resources somehow. One solution is to make it so that finite-sized systems have a finite number of distinguishable states. Let’s define exactly what we mean.</p>
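<p>To see why unlimited precision would let a box store unlimited information, here is a minimal Python sketch. It assumes, as in the hypothetical universe above, that a position in the box can be set and read back with unlimited precision. Positions are modelled as exact <code>fractions.Fraction</code> values in [0, 1), and the helper names are made up for this example. Any finite bit string, however long, fits into a single position.</p>

```python
from fractions import Fraction

def encode_position(bits):
    """Place an object at the point of [0, 1) whose binary expansion is 0.<bits>1.
    The trailing 1 is an end marker, so leading and trailing zeros survive."""
    marked = bits + "1"
    return Fraction(int(marked, 2), 2 ** len(marked))

def decode_position(x):
    """Read the binary digits back off, assuming unlimited measurement precision."""
    digits = []
    while x != 0:
        x *= 2
        digit = int(x)          # next binary digit: 0 or 1
        digits.append(str(digit))
        x -= digit
    return "".join(digits)[:-1]  # drop the end marker

message = "1011" * 250          # 1,000 bits stored in one position
assert decode_position(encode_position(message)) == message
```

<p>With measurements of finite precision, only the first few digits could be read back. That is why bounding the number of distinguishable states removes this trick.</p>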
<h1 id="states-and-measurement">States and Measurement</h1>
<p>Consider a collection of states \(S_1, S_2, \ldots, S_n\). Suppose we are given an object, known to be in one of these \(n\) states, but we don’t know which one. We’ll say this collection of states is <strong>reliably distinguishable</strong> if there exists a measurement technique that can tell us, with certainty, which state it’s in. This is a strong, clear-cut test. On the other hand, suppose we have a machine which can produce, at the press of a button, a freshly prepared object with a fixed state from the collection, but once again we don’t know which one. We’ll say this collection of states is <strong>statistically distinguishable</strong> if, by performing experiments on enough objects from the machine, we can infer which state is prepared by the machine. Finally, we’ll say a <em>pair</em> of states is <strong>statistically indistinguishable</strong> if it’s not statistically distinguishable. We make a few observations regarding our definitions:</p>
<p>First, any reliably distinguishable collection is obviously statistically distinguishable as well. Second, statistical indistinguishability is an <a href="https://en.wikipedia.org/wiki/Equivalence_relation">equivalence relation</a>. For all practical purposes, any statistically indistinguishable pair might as well be considered the same state; if we identify states in this way, it follows that any collection of distinct states must be statistically distinguishable. Our final observation concerns states which are statistically, but not reliably, distinguishable. In this case, some measurements must necessarily <em>alter</em> the state. If they did not, then any series of measurements that we would perform on objects from the machine could instead be performed on a lone object of the state in question, and the states would be reliably distinguishable after all.</p>
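<p>The gap between statistical and reliable distinguishability can be simulated. In this sketch the two states and their measurement statistics are hypothetical. A single measurement returns “up” with probability 0.5 for state S1 and 0.6 for state S2. One measurement therefore never settles the question, while many measurements of fresh objects from the machine almost always do.</p>

```python
import random

# Hypothetical statistics: chance that one measurement reads "up" in each state.
P_UP = {"S1": 0.5, "S2": 0.6}

def measure(state, rng):
    """One measurement of a freshly prepared object (assumed to disturb it)."""
    return rng.random() < P_UP[state]

def infer_state(machine_state, n, rng):
    """Measure n fresh objects from the machine and guess which state it prepares."""
    ups = sum(measure(machine_state, rng) for _ in range(n))
    threshold = (P_UP["S1"] + P_UP["S2"]) / 2   # guess S2 above the midpoint
    return "S2" if ups / n > threshold else "S1"

def error_rate(n, trials=2000, seed=0):
    """Fraction of wrong guesses when the machine's state is picked at random."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        state = rng.choice(["S1", "S2"])
        wrong += infer_state(state, n, rng) != state
    return wrong / trials

print(error_rate(1))                 # one object per guess
print(error_rate(1000, trials=500))  # a thousand objects per guess
```

<p>With one object per guess the expected error rate is 45%. With a thousand objects per guess it drops close to zero, but never reaches it.</p>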
<h1 id="from-bit-to-qubit">From Bit to Qubit</h1>
<p>Let’s return to a closed system in our made-up universe. It evolves in continuous time. We want a finite maximum on the number of states which can be reliably distinguished from one another: for simplicity, let’s assume this maximum is two, though our arguments can be generalized. Suppose A and B are two such mutually distinguishable states, or <strong>eigenstates</strong>. Is it feasible for A and B to be the <em>only</em> states taken by the system? In order to do anything interesting in continuous time, we should allow A to transition to B after a non-zero length of time. However, if this length of time were deterministic, say fixed to \(\Delta t\), then that would imply the existence of additional states. Indeed, let C be the system’s state after evolving A for a time \(\Delta t / 2\). Since C is prior to the transition, our A-B measurement would detect it as A; thus, C is distinct from B. On the other hand, C would be detected as B if we perform a “delayed measurement”, in which we wait an additional \(\Delta t / 2\) before applying our A-B measurement, whereas A itself would still be detected as A; thus, C is also distinct from A. We must conclude that C is a third state, different from A and B. In fact, by waiting for different amounts of time, we find an infinite continuum of intermediate states.</p>
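<p>The argument about deterministic transitions can be replayed in a toy model. The sketch below makes two hypothetical assumptions. The system’s hidden state is just the time elapsed since it was in A, and the A-B measurement reports B once that time reaches \(\Delta t\). What happens after B is ignored. The immediate and delayed measurements then tell C apart from both A and B.</p>

```python
DT = 1.0  # hypothetical deterministic transition time (Δt in the text)

# Toy model: the hidden state is the time elapsed since the system was A.
def evolve(elapsed, t):
    return elapsed + t

def measure_ab(elapsed):
    """The A-B measurement: reads A before the transition, B from then on."""
    return "A" if elapsed < DT else "B"

def delayed_measure_ab(elapsed, delay):
    """Wait `delay`, then apply the A-B measurement."""
    return measure_ab(evolve(elapsed, delay))

A, B = 0.0, DT
C = evolve(A, DT / 2)

# Measured right away, C reads as A, so it differs from B ...
print(measure_ab(C), measure_ab(B))                                  # A B
# ... but after a further wait of DT/2 it reads as B, while A still reads as A.
print(delayed_measure_ab(C, DT / 2), delayed_measure_ab(A, DT / 2))  # B A
```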
<p>In a last desperate attempt to avoid adding extra states, we might allow the transition times to be random, like radioactive decay. However, if we’re allowing genuinely random processes into our theory, we might as well consider the “maybe-A maybe-B” situation to itself be a state: after all, distinct <em>probabilistic mixtures</em> of distinguishable states remain <em>statistically</em> distinguishable from one another. This formalism has the advantage that an outsider, not involved in the experiment, can model the evolution of our system deterministically: from their view, the state A simply evolves into “maybe”-states that contain increasing shares of B. When an experimenter measures the state, from their perspective it can be said that the state “collapses” to either A or B. The outsider, who doesn’t communicate with the experimenter, would then say that the experimenter became “entangled” with the system: together, they are jointly in a state of “maybe A, with the experimenter seeing A; or maybe B, with the experimenter seeing B”. If we substitute the dead and alive states of Schrödinger’s cat for A and B, the resemblance becomes clear! The entangled experimenter, having observed the system, is resigned to either the A branch or the B branch. For the unentangled outsider, on the other hand, neither branch has “materialized”<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
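<p>The two viewpoints in this paragraph can be compared numerically. This sketch assumes exponentially distributed transition times, as in radioactive decay, with a hypothetical rate \(\lambda = 1\). The outsider’s deterministic weight on B is \(1 - e^{-\lambda t}\). Each simulated experimenter sees a definite A or B, and the fraction of runs that collapse to B matches the outsider’s weight.</p>

```python
import math
import random

RATE = 1.0  # hypothetical decay rate λ (transitions per unit time)

def outsider_p_b(t):
    """Outsider's deterministic description: weight on B in the mixture at time t."""
    return 1.0 - math.exp(-RATE * t)

def experimenter_sees(t, rng):
    """One experimenter's measurement at time t: a definite A or B."""
    decay_time = rng.expovariate(RATE)   # random, memoryless transition time
    return "B" if decay_time <= t else "A"

rng = random.Random(42)
t = 0.7
runs = 100_000
seen_b = sum(experimenter_sees(t, rng) == "B" for _ in range(runs))
print(f"outsider's weight on B: {outsider_p_b(t):.3f}")
print(f"fraction of runs that collapse to B: {seen_b / runs:.3f}")
```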
<p>Recall that A and B are a maximal set of <em>reliably</em> distinguishable states for our system. Having accepted that our theory must support additional states that are only <em>statistically</em> distinguishable, we can consider alternative formulations, aside from the probabilistic mixtures that we’ve just discussed. Indeed, the probabilistic theory has some shortcomings. For instance, it’s not clear how to make nice reversible laws that transition from A to B. Upon reaching B, such a law should begin to transition back to A; however, how would a “maybe-A maybe-B” state know whether it’s currently on the “forward swing” from A to B, or on the “return swing” back to A? For reasons such as this, it becomes convenient to use a <a href="https://en.wikipedia.org/wiki/Complex_number">complex number</a>-valued variant of probability theory, in which, rather than swinging linearly from A to B, the states are arranged on a <a href="https://en.wikipedia.org/wiki/Bloch_sphere">sphere</a>, transitioning along its geodesics. I’ll defer to Hardy for the rigorous argument. The upshot is that complex numbers have phases and amplitudes, which allow the “random” outcomes to <a href="https://en.wikipedia.org/wiki/Wave_interference">interfere</a> with one another, constructively and destructively, much like vibrating strings. Quantum weirdness ensues.</p>
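<p>The forward-swing/return-swing problem can be seen in a short sketch using the standard qubit formalism. A state is a pair of complex amplitudes \((a, b)\) for A and B, and the chance of measuring B is \(|b|^2\). Here we assume, as one convenient choice, that A evolves by a rotation about the \(x\)-axis of the Bloch sphere. The function names are illustrative. Halfway out and halfway back, the A-B probabilities are identical, just as for a classical mixture. The phases differ, however, and place the two states at opposite points of the sphere.</p>

```python
import math

def rotate(theta):
    """Amplitudes (a, b) reached from A after rotating by theta about the Bloch x-axis."""
    return (complex(math.cos(theta / 2)), -1j * math.sin(theta / 2))

def probabilities(state):
    """Chances of the A-B measurement reading A and B."""
    a, b = state
    return abs(a) ** 2, abs(b) ** 2

def bloch_vector(state):
    """Point (x, y, z) on the Bloch sphere for the state a·A + b·B."""
    a, b = state
    ab = a.conjugate() * b
    return (2 * ab.real, 2 * ab.imag, abs(a) ** 2 - abs(b) ** 2)

forward = rotate(math.pi / 2)        # halfway from A to B
backward = rotate(3 * math.pi / 2)   # halfway from B back to A

print(probabilities(forward), probabilities(backward))  # both roughly (0.5, 0.5)
print(bloch_vector(forward), bloch_vector(backward))    # y is -1 versus +1
```

<p>Opposite points on the Bloch sphere correspond to reliably distinguishable states. A measurement along the \(y\)-axis, rather than the A-B axis, would therefore tell the two swings apart.</p>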
<h1 id="waves">Waves</h1>
<p>We’ve sketched out quantum mechanics for a two-eigenstate system, or <strong>qubit</strong>. While a classical computer bit has clear-cut 0 and 1 states, we saw that a qubit can take on a variety of “in-between” states, which can be conceptualized on a sphere<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. A real-life example of a qubit is the <a href="https://en.wikipedia.org/wiki/Spin_(physics)">spin</a> of an electron. But what about the <em>position</em> of an electron, or of anything for that matter? We still want our continuous spatial geometry, while somehow bounding the number of distinguishable states!</p>
<p>Here, nature has a trick up its sleeve: an object’s position and momentum wave functions are <a href="https://en.wikipedia.org/wiki/Fourier_transform">Fourier transforms</a> of one another. Since Fourier transforms are also relevant to how we produce and perceive sound, we can illustrate by analogy: if you think of position as being spread out in a wave like a plucked guitar string, then the momentum would be spread out like the frequency spectrum that characterizes the pitch and timbre of its sound. Notice that a string cannot simultaneously have both a precise frequency and a precise position of displacement: a <a href="https://en.wikipedia.org/wiki/Pure_tone">pure tone</a> displaces the entire string, whereas a pure point displacement has no definite frequency. In the same way, physical objects cannot simultaneously have a pure position and a pure momentum<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>; even attaining a pure position requires infinite energy. This is the <a href="https://en.wikipedia.org/wiki/Uncertainty_principle">Heisenberg uncertainty principle</a>.</p>
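<p>The trade-off between position and frequency can be checked numerically with NumPy’s FFT. This sketch assumes Gaussian wave packets, and the grid size and length are arbitrary choices. Narrowing the packet in position widens its spectrum. For Gaussians, the product of the two standard deviations stays at \(1/(4\pi) \approx 0.0796\), the smallest value the Fourier uncertainty principle allows when frequency is measured in cycles per unit length. Momentum equals Planck’s constant \(h\) times this spatial frequency, so the same bound reads \(\sigma_x \sigma_p \ge \hbar/2\).</p>

```python
import numpy as np

def std(values, weights):
    """Standard deviation of `values` under the given non-negative weights."""
    w = weights / weights.sum()
    mean = np.sum(w * values)
    return np.sqrt(np.sum(w * (values - mean) ** 2))

def spreads(sigma, length=200.0, n=8192):
    """Spread of a Gaussian wave packet in position, and of its frequency spectrum."""
    dx = length / n
    x = (np.arange(n) - n // 2) * dx          # sample points centred on 0
    psi = np.exp(-x**2 / (4 * sigma**2))      # |psi|^2 has standard deviation sigma
    spectrum = np.fft.fft(psi)                # amplitudes of the component waves
    freqs = np.fft.fftfreq(n, d=dx)           # spatial frequency of each component
    return std(x, np.abs(psi) ** 2), std(freqs, np.abs(spectrum) ** 2)

for sigma in (0.5, 1.0, 2.0):
    sx, sf = spreads(sigma)
    print(f"sigma={sigma}: position {sx:.3f}, frequency {sf:.4f}, product {sx * sf:.4f}")
```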
<p>Unfortunately, I can’t sketch this out more convincingly without diving into the math<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. Nonetheless, at its core, the theory is the same as in the simpler qubit system. Instead of A and B, the eigenstates now are the component <a href="https://en.wikipedia.org/wiki/Sine_wave">sine waves</a> that combine together to make a <a href="https://en.wikipedia.org/wiki/Wave_packet">wave packet</a>.</p>
<h1 id="conclusions">Conclusions</h1>
<p>There is a great irony in the argument we’ve laid out. Because quantum mechanics appears so mysterious, it has become fashionable to speculate that it may hold the key to more powerful forms of computation, intelligence, and even consciousness. There do exist computational problems for which the fastest known algorithms are quantum, but that’s only if we presuppose a discrete model of computation, corresponding to the regime of our universe in which quantum states <a href="https://en.wikipedia.org/wiki/Quantum_decoherence">decohere</a>. Our discussion suggests, in fact, that the true “purpose” of quantum mechanics may be directly antithetical to these popular interpretations: it serves not to <em>increase</em> our power, but to <em>constrain</em> it! From the perspective of our universe-building exercise, quantum mechanics offers the best of both worlds: the symmetries of a continuous universe, with the informational constraints of a discrete one.</p>
<p>Please note that, unlike the paper on which they’re based, my arguments here are not at all rigorous. Nonetheless, I hope they may provide some intuition into the mysterious nature of quantum mechanics, without demanding as much technical depth. Thanks to <a href="https://tmfs10.github.io/">Sid Jain</a> for pointing me to the paper!</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>This distinction is unimportant in classical probability theory, where the branches add up independently. However, in quantum theory, an outsider may yet have the branches interfere. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>I wonder… could this help explain why space has three dimensions? A competing justification is that it takes exactly three dimensions to embed all <a href="https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)">graphs</a>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>In everyday life, when you don’t need too much precision, objects may appear to have definite positions and momenta. Similarly, musical composers may notate definite pitches occurring at definite times. However, if you were to analyze just a microsecond from a live recording, you wouldn’t easily decipher the pitches playing at that instant. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Would anyone like to demonstrate using animations, perhaps? <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Is probability real? (Part 1)2020-11-22T00:00:00+00:002020-11-22T00:00:00+00:00http://ebtech.github.io/blog/2020/11/22/probability<p>Today, I want to address an issue with statements involving chance. To demonstrate, let’s first consider a statement that doesn’t involve chance:</p>
<p>“<em>A cubic die tossed onto a flat surface will come to rest on one of its six sides.</em>”</p>
<p>This claim can be empirically tested, with various dice and surfaces. If any one of our experiments results in the die spinning endlessly on a corner, we will have disproven the claim. We may have to refine the claim’s conditions; for instance, by requiring the presence of gravity. Nonetheless, it’s fairly clear what it means for the statement to be true or false. Now let’s try to make a claim involving probability:</p>
<p>“<em>If a pair of standard dice are thrown, the probability of their face-up sides summing to nine will be one in nine (about 0.11 or 11%).</em>”</p>
<p>What does it mean for this statement to be true? Unlike the first statement, this one doesn’t specify which result we’ll actually see. How can we possibly hope to test it, or to make use of its information?</p>
<h1 id="the-mathematicians-multiverse">The mathematician’s multiverse</h1>
<p>Within the realm of abstract mathematics, we’re free to model probability in a way that fits our intuitions. Imagine a multiverse containing an infinity of possible worlds, whose total <em>measure</em> is 100%. Define the probability of an <em>event</em>, such as that of rolling a nine, to be the measure assigned to the subset of worlds in which the event actually occurs.</p>
<p>In the abstract formalism, we’re allowed to assign the measure however we like, subject to Kolmogorov’s axioms: the measure must be non-negative, countably additive, and total to 100%. By respecting the symmetry of an idealized die,<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> we might argue that only one such assignment makes sense; from it, we can calculate the probability of any event involving dice rolls.</p>
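<p>For two idealized dice, the multiverse can be written out explicitly. This is a minimal illustration, not a general measure-theory tool. Each of the 36 worlds records both faces, and the symmetric assignment gives every world the same weight. The probability of an event is then the total weight of the worlds in which it occurs.</p>

```python
from fractions import Fraction
from itertools import product

# Each "world" records the face-up values of the two dice.
worlds = list(product(range(1, 7), repeat=2))
# Symmetric (uniform) measure: every world gets the same weight.
measure = {w: Fraction(1, len(worlds)) for w in worlds}

def probability(event):
    """Measure of the set of worlds in which the event occurs."""
    return sum((m for w, m in measure.items() if event(w)), Fraction(0))

# Kolmogorov's axioms on this finite space: non-negative weights totalling 1.
# Additivity holds automatically, since an event's probability sums over its worlds.
assert all(m >= 0 for m in measure.values())
assert sum(measure.values()) == 1

print(probability(lambda w: sum(w) == 9))   # 1/9
```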
<p>There are two shortcomings to this approach. Firstly, we won’t always deal with nicely symmetrical objects for which direct a priori arguments are possible. Thus, we still need a means of testing probabilistic claims using real-life observations. Secondly, such arguments can never be airtight: after all, how can we hope to infer the measure on a hypothetical multiverse, when we only ever experience <em>one</em> world? Indeed, a realist might question if it makes any sense to discuss the chances of an event happening: either it happens or it doesn’t!</p>
<h1 id="the-economists-wager">The economist’s wager</h1>
<p>You might be more trusting of someone who puts their money where their mouth is. To back up a definite claim, not involving chance, I can simply agree to pay a penalty if it turns out I’m wrong.</p>
<p>This idea can be extended to probabilistic claims in the following manner: consider a lottery that pays a $90 jackpot if the next roll of a pair of dice yields a nine. If the maximum that I’m willing to pay to play is $10, this indicates that I believe a not-nine roll is eight times more likely than a nine roll. This approach is appealing because, after all, the <em>raison d’être</em> of probability theory is to explain the decision-making of individuals facing uncertainty.</p>
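<p>The arithmetic behind the lottery example can be sketched as follows. Like the example itself, it assumes that the maximum price a bettor will pay equals the ticket’s expected payout, as for a risk-neutral bettor. The function names are illustrative.</p>

```python
from fractions import Fraction

def implied_probability(max_price, jackpot):
    """Probability implied by the most one would pay for a ticket paying `jackpot`."""
    return Fraction(max_price) / Fraction(jackpot)

def odds_against(p):
    """How many times more likely the event is NOT to happen than to happen."""
    return (1 - p) / p

p = implied_probability(10, 90)
print(p, odds_against(p))   # 1/9 8
```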
<p>If another gambler’s view conflicts with mine, you may aggregate our beliefs by creating a market on which we buy and sell predictions. Consider a contract that pays $100 (plus interest) when a specified event occurs. Its price on the market can be interpreted as the percentage probability of that event. Thus, to say an event is twice as “likely” as another, simply means its market price is double.</p>
<p>Unlike your typical gambler, a frictionless market offers transparent near-identical buy and sell prices. As a result, any violations of Kolmogorov’s axioms become money-making arbitrage opportunities. Arbitrage activity acts as an enforcer of the axioms, creating what economists call the <em>risk-neutral probability measure</em>.</p>
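<p>One kind of arbitrage takes only a few lines to sketch. The prices below are hypothetical, trading is assumed frictionless, and interest is ignored. Contracts on the mutually exclusive, exhaustive outcomes “nine” and “not nine” each pay $100. Holding both is therefore worth exactly $100, and any other total price is free money.</p>

```python
def arbitrage(prices, payout=100.0):
    """Check contracts on mutually exclusive, exhaustive outcomes, each paying
    `payout` if its outcome occurs. Exactly one pays, so the bundle is worth
    `payout` for certain; any other total price can be exploited risk-free."""
    total = sum(prices.values())
    if total < payout:
        return "buy every contract", payout - total
    if total > payout:
        return "sell every contract", total - payout
    return "no arbitrage", 0.0

print(arbitrage({"nine": 10.0, "not nine": 85.0}))   # ('buy every contract', 5.0)
print(arbitrage({"nine": 12.0, "not nine": 93.0}))   # ('sell every contract', 5.0)
print(arbitrage({"nine": 11.0, "not nine": 89.0}))   # ('no arbitrage', 0.0)
```

<p>This check covers only the requirement that prices total 100%. Violations of non-negativity or additivity can be exploited in the same spirit.</p>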
<p>In real markets, however, this probability measure exhibits several inconsistencies. Firstly, it depends on which currency is used: as an extreme example, we wouldn’t buy a dollar-denominated wager that only pays out if the dollar collapses, no matter how likely we imagine the collapse to be. Secondly, this measure is sensitive to (non-diversifiable) risk: if a widely believed prophecy held that rolling a nine would induce a catastrophic famine, the market would value this outcome a lot more, because everyone wants to buy insurance against such a catastrophe. Thirdly, markets can be misinformed: indeed, one motivation for participating in a market is to try to beat it! And finally, liquid markets are hard to set up.</p>
<p>For these reasons, we abandon this approach. We’ll seek to define probability in terms of actual outcomes instead of human bets. Nonetheless, human bets are what inspired the creation of probability theory: it’s hard to think of any other practical application! Therefore, we should remember to revisit the matter once we’ve found an appealing probability concept. Ultimately, we must be able to explain <em>how</em> individuals and markets behave with respect to our concept, and answer <em>why</em> they should care about it at all.</p>
<p>These questions are incredibly subtle: the theory of evolution by natural selection tells us that individuals are wired to use strategies that enabled their ancestors’ survival; however, the nature of probabilistic beliefs is that a wide range of outcomes are plausible. Indeed, while a coin will always land heads or tails, it’s considered unwise to bet your life savings on either heads or tails. Intuitively speaking, the rationale is that you’re almost certain to lose <em>eventually</em>, if you keep playing this way. This idea of repeated trials inspires our next interpretation, which happens to be the most popular among scientists.</p>
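<p>The intuition about betting everything, over and over, can be quantified. This sketch assumes a fair coin and double-or-nothing stakes, both illustrative. The expected wealth never changes, yet the chance of still having any money after \(n\) all-in bets is only \(2^{-n}\).</p>

```python
from fractions import Fraction

def all_in_coin_bets(rounds, wealth=Fraction(1)):
    """Stake everything on a fair coin, double-or-nothing, `rounds` times in a row.
    Returns (probability of still having money, expected final wealth)."""
    p_survive = Fraction(1, 2) ** rounds          # must win every single toss
    expected = p_survive * wealth * 2 ** rounds   # survivors hold wealth * 2^rounds
    return p_survive, expected

for n in (1, 10, 30):
    p, ev = all_in_coin_bets(n)
    print(n, float(p), ev)
```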
<h1 id="the-statisticians-frequentism">The statistician’s frequentism</h1>
<p>According to the frequentist school of thought, a probabilistic statement is not to be taken literally. Although it refers to a single event, the statement should be taken as shorthand for a claim involving a very large collection of similar events. Imagine rolling the dice over and over. The probabilistic claim that we started with is converted into the following:</p>
<p>“<em>If a pair of standard dice are thrown repeatedly, then in the limit as the number of throws goes to infinity, the proportion of nines converges to one in nine (about 0.11 or 11%).</em>”</p>
<p>The short-run probability is replaced by a long-run proportion. Given an infinite sequence of rolls, this statement unambiguously reveals itself to be true or false. In light of the frequentist interpretation, we can even make more sense of our earlier interpretations. While we only experience one world, repeating an experiment under similar conditions is like observing the experiment in a parallel universe: whether we count trials or worlds, the math is virtually identical. In the limit of infinitely many bets, we can draw some unambiguous conclusions about the quality of a gambler’s strategy, too: this is how casinos ensure that the house always wins!</p>
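<p>A simulation can illustrate the long-run claim, though of course it cannot prove it. This sketch uses Python’s pseudo-random generator as a stand-in for fair dice, which is itself an assumption. It prints the proportion of nines for increasingly many throws. The proportion drifts toward \(1/9 \approx 0.111\), but any finite run only approximates the limit.</p>

```python
import random

def proportion_of_nines(throws, seed=0):
    """Throw two simulated fair dice `throws` times; return the fraction summing to nine."""
    rng = random.Random(seed)
    nines = sum(rng.randint(1, 6) + rng.randint(1, 6) == 9 for _ in range(throws))
    return nines / throws

for throws in (90, 9_000, 90_000):
    print(throws, round(proportion_of_nines(throws), 4))
```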
<p>Testing our claim is a simple matter: we roll the dice, over and over, and over and over… infinity times. Oops. Of course, there is no such thing as an experiment with infinity trials. Our arms will get tired, the dice will wear out, the Sun will explode, and all the free energy in the universe will be consumed. At best, we can do a very large number of trials. Let’s say we roll dice 9,000 times; one in nine of these would be 1,000. Perhaps we won’t roll exactly 1,000 nines, so let’s interpret our claim with a suitable margin of error, called a <em>confidence interval</em>:</p>
<p>“<em>If a pair of standard dice are thrown 9,000 times, then the face-up sides will sum to nine for between 920 and 1,080 of the throws.</em>”</p>
<p>The probability of obtaining between 920 and 1,080 nines can be calculated to be 99.3%.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> Thus, we’ve turned a probabilistic statement into a much more certain but still probabilistic statement. If we observe 1,100 nines, we should be able to dismiss the probabilistic claim as false. And yet, if every household on Earth were to independently perform this 9,000-throw experiment, we should expect that a great many of their results would fall outside the confidence interval. They would disagree on the truth of our statement!</p>
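<p>The 99.3% figure can be reproduced with an exact binomial sum, because the number of nines in 9,000 independent throws follows a binomial distribution with \(p = 1/9\). The sketch keeps everything in exact integers and converts to a float only at the end. The function name is illustrative.</p>

```python
from math import comb

def prob_nines_between(lo, hi, throws=9000):
    """Exact binomial probability that the number of nines lies in [lo, hi],
    when each throw of two fair dice sums to nine with probability 1/9."""
    # P(k nines) = C(throws, k) * 8^(throws - k) / 9^throws, using exact integers.
    favourable = sum(comb(throws, k) * 8 ** (throws - k) for k in range(lo, hi + 1))
    return favourable / 9 ** throws

print(f"{prob_nines_between(920, 1080):.4f}")
```

<p>It prints a value of about 0.993.</p>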
<p>There’s no getting around it: despite its intuitive appeal, the frequentist definition of probability is circular, reducing probability claims to probability claims.<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> To end the cycle, the frequentist chooses a threshold (say, 99%) beyond which to treat events as objective truths. This grants the claim an empirically observable meaning. And yet, the frequentist must take care not to consider too many such events, for otherwise the probability of <em>at least one</em> of the events failing may <em>also</em> exceed the threshold of certainty: a logical contradiction.</p>
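<p>How many such events does it take? Assume the events are independent and each holds with probability 99%. The chance that at least one fails is then \(1 - 0.99^n\), which reaches 99% once \(n \ge \ln 0.01 / \ln 0.99 \approx 458.2\). A direct search confirms the threshold:</p>

```python
def events_until_contradiction(threshold=0.99):
    """Smallest number n of independent events, each holding with probability
    `threshold`, for which 'at least one of them fails' also reaches `threshold`."""
    n = 1
    while 1 - threshold ** n < threshold:
        n += 1
    return n

print(events_until_contradiction())   # 459
```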
<p>Things only work out nicely in the limit of infinite sample size. Statisticians mainly deal with experiments which can be repeated so many times that, for most practical purposes, their conclusions can be treated as definite. Non-philosophers are usually happy to ignore a sub-1% chance of error; and if that’s not good enough, make it 0.0001%! Confidence can be increased by gathering more data, i.e., increasing the sample size.</p>
<p>This approach turns out to be very powerful. By designing more complex hypotheses in which probabilities vary as a function of context variables, even some phenomena that aren’t easily repeatable can be statistically analyzed. For example, weather forecasts are based on well-tested models that use measurements of variables such as temperature, pressure, humidity, and wind.</p>
<p>On the other hand, statistical models of sports games, democratic elections, or company stocks tend to be less testable: the interactions are very complex and there are too few outcomes from which to extrapolate. Similarly, when you try to predict which colleges will admit you or which of your friends will start a business, you don’t make your case using repeatable tests. Clearly, the frequentist interpretation cannot apply. One may argue that no conclusions in these cases would hold up to a scientific standard; nonetheless, if we seek a theory of decision-making under uncertainty, there’s no denying the importance and prevalence of such cases.</p>
<h1 id="must-we-throw-in-the-towel">Must we throw in the towel?</h1>
<p>Fundamentally, questions involving probability are ill-posed: it’s impossible to deduce anything about worlds unconnected to our experience, let alone assign measures to them. Even if we imagine the multiverse of probability theory to be real (whatever that means!), then each world would be just as real as any other, potentially with its own inhabitants asking the same questions. Unlike physical quantities such as volume, a probability measure has no observable consequences.</p>
<p>So, why does probability hold such a salient intuitive significance to us? What is it about so-called “unlikely” events that makes us feel surprised? Recall that evolution by natural selection is a numbers game: you can afford some mistakes, but it’s important to be right more often than wrong, and to plan accordingly. On the occasions where we’re wrong, we sense a bit of shock as we adjust our plans and our expectations for the future. This lines up with the frequentist interpretation: “likely” events are those which occur more frequently, across a large sample of similar scenarios.</p>
<p>What about claims that can’t be interpreted as frequencies among a large sample of scenarios? What does it mean to say it’s likely (or not) that extraterrestrial life exists? What would it mean to live in an unlikely universe, say, one in which dice rolls land on double-sixes much more often than they should, consistently for as long as dice have existed? Technically, nothing that we know about our own world’s physics forbids this: we could just get “lucky” to an enormous degree. But what would the inhabitants of this universe think? I imagine they wouldn’t consider their world unlikely at all: they would just add a new law to their description of physics: all dice, as if by divine intervention, are deemed to exhibit this strange behaviour. While it’s a rather awkward addition that complicates physics, it definitely yields a better theory in terms of predictive accuracy. We might also expect their religions to grant a spiritual significance to dice.</p>
<p>The scientific method of inquiry can withstand a single odd pattern involving dice. However, if the universe were arbitrarily messy, complicated, and irregular, its randomness devoid of any patterns, then it would feel as if all events were decided by divine intervention. In such a world, there would be no role for science. The ancients believed in a mystical world where everything, from weather to animal morphology, was subject to the daily whims of the Gods. Nevertheless, even the ancients believed in some basic patterns, which they could use to cook, hunt, navigate, build shelter, and otherwise live their lives. Without patterns to exploit, there would be no reason for intelligent life to emerge. That there’s a simple order underlying our universe is one of its most remarkable characteristics.</p>
<h1 id="the-philosophers-razor">The philosopher’s razor</h1>
<p>Let’s see if we can develop this vague connection between probability and simplicity into a concrete idea. Our biggest clue is a question raised by the frequentist school of thought: where does the hypothesis come from in the first place?</p>
<p>If we don’t know which hypothesis to test, we might begin by considering every hypothesis that comes to mind: potentially thousands, millions, or infinitely many. In the dice experiment, we might consider some strange hypotheses, such as ones where the chances of rolling doubles depend on which celebrity’s credit card number was spelled out by the last few rolls. Suppose, for each hypothesis, we design a test that will fail (according to the idealized dice model) with 99% probability. Then, on average, one out of every hundred hypotheses will pass the test.</p>
<p>What makes the “true” hypothesis stand out from the many fakes? Well, the fakes would be unlikely to stand up to additional testing. The more data we collect, the more contrived the hypotheses that we’ll have to resort to; nonetheless, it will always remain possible to fit an incorrect hypothesis to all of the data seen so far. This is such a serious problem in science that it has a name: <em>overfitting</em>.</p>
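<p>The thinning of fake hypotheses can be simulated. This sketch assumes a fixed pool of 100,000 false hypotheses. Each passes any given round of testing independently with probability 1%. The expected number of survivors is about 1,000 after one round, 10 after two, and 0.1 after three. In practice, as the text notes, new and more contrived fakes can always be added to fit the data seen so far.</p>

```python
import random

def surviving_fakes(hypotheses, rounds, pass_rate=0.01, seed=0):
    """Count how many false hypotheses survive each round of independent testing.
    Each surviving fake passes a round with probability `pass_rate`."""
    rng = random.Random(seed)
    alive = hypotheses
    counts = []
    for _ in range(rounds):
        alive = sum(rng.random() < pass_rate for _ in range(alive))
        counts.append(alive)
    return counts

print(surviving_fakes(100_000, 3))
```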
<p>Somehow, we must narrow down our hypotheses. Maybe you think that’s easy: only a few hypotheses describe plausible dice behavior; the rest are patently absurd! But now you’re relying on intuitive judgment, not a rigorous methodology. If you try to sort out the source of your intuitive knowledge about how dice ought to behave, you’ll find it to be rooted in your prior knowledge about how the world works, which itself must be tested against various hypotheses. If you have good prior knowledge and take it on faith, then this works fine in practice. However, it seriously begs the question: how do we manage to obtain <em>any</em> knowledge about the world in which we live?</p>
<p>There is a solution to this dilemma. For most of history, nobody knew how to state it in precise mathematical terms; hence, it was confined to the realm of philosophy. The solution to overfitting is the law of parsimony, <strong>Occam’s razor</strong>:</p>
<p>“<em>Given competing theories that can explain our observations, always prefer the simplest.</em>”</p>
<p>If we take “simple” to mean that it must be described by a short English sentence, then there are only a limited number of such sentences. Among the simple hypotheses, we can eliminate all the bad ones within a finite number of trials.</p>
<p>Thus, we see that frequentist methodology requires the use of prior knowledge, and ultimately a principle such as Occam’s razor. Could we take this idea to its extreme, in the hopes of mitigating the other issues with frequentism? The advent of computer science gave us a theoretical framework in which to do so. With it comes a general theory of inference, with Occam’s razor front and center.</p>
<h1 id="the-computer-scientists-electric-razor">The computer scientist’s electric razor</h1>
<p>English sentences can be a bit ambiguous so, for precision’s sake, we’ll express our hypotheses as computer programs, and encode our observations as computer data. Nonetheless, if you’re not a programmer, rest assured that it’s mostly kosher to replace the programs in our discussion with instructions written in your mother tongue.<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup> In the computer science framework, we restate Occam’s razor as follows:</p>
<p>“<em>Given competing programs whose outputs match our observations, always prefer the shortest.</em>”</p>
<p>Let’s see just how powerful this definition is. Right away, we see there’s no longer a need to carefully select hypotheses or tests, as both are built-in: all computer programs are hypotheses, with preference given to shorter ones. Testing a program amounts to verifying that its output exactly matches our observation record.</p>
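<p>The full razor ranges over all programs and cannot be carried out exactly, since Kolmogorov complexity is uncomputable. This toy sketch restricts the hypothesis language, purely for illustration, to programs of the form “repeat PATTERN”. Hypotheses are tried from shortest to longest, and one passes only if its output exactly matches the observations.</p>

```python
def shortest_repeating_program(observations):
    """Toy Occam's razor over hypotheses of the form 'repeat PATTERN', whose
    output is PATTERN repeated out to the observed length. Returns the shortest
    pattern whose output exactly matches the observations."""
    n = len(observations)
    for length in range(1, n + 1):            # shorter programs are tried first
        pattern = observations[:length]
        output = (pattern * (n // length + 1))[:n]
        if output == observations:            # the test: exact match required
            return pattern
    return observations                        # reached only for empty input

print(shortest_repeating_program("011011011011"))  # 011
print(shortest_repeating_program("0010111"))       # 0010111
```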
<p>At first blush, the requirement to use deterministic programs appears to be a limitation. Luckily, a randomized program can be made deterministic by supplying the results of its “coin flips” as an extra string of ones and zeros. This string makes the program longer, so Occam’s razor will prefer explanations that don’t depend on much randomness, whenever such explanations exist.</p>
<p>Given a string \(x\), perhaps representing a very long sequence of observations, the length of the shortest program that outputs \(x\) is called its <strong>Kolmogorov complexity</strong> \(K(x)\). By prioritizing programs by their length, we ensure that each incumbent theory has only finitely many competing hypotheses. While it’s possible to prioritize programs by other criteria, it turns out that the Kolmogorov complexity is at most a constant larger than the description length under any partial computable alternative. See the footnotes for an excellent technical reference<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup> as well as a more accessible overview.<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup> We won’t go deeply into the theory here, but merely highlight how it helps us interpret and infer probabilistic statements.</p>
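<p>To make the idea concrete, here is a hypothetical sketch (not taken from any library) of a “randomized” program made deterministic. The coin flips arrive as an explicit string of bits, and the program converts them into die rolls numbered 0 to 5 by rejection sampling. The same bits always yield the same rolls. Every flip consumed is part of the description, and that is why Occam’s razor penalizes explanations that lean on a lot of randomness.</p>

```python
def rolls_from_coins(coins: str, n: int) -> list:
    """Deterministically turn a string of coin flips ('0'/'1') into n die rolls in 0..5.

    Hypothetical helper for illustration. Each attempt reads 3 flips (a number 0..7);
    values 6 and 7 are rejected so that, for fair flips, every roll 0..5 is equally likely.
    """
    rolls, i = [], 0
    while len(rolls) < n:
        if i + 3 > len(coins):
            raise ValueError("ran out of coin flips")
        value = int(coins[i:i + 3], 2)
        i += 3
        if value < 6:  # reject 6 and 7 to keep the rolls uniform
            rolls.append(value)
    return rolls


print(rolls_from_coins("000101110111011", 3))  # [0, 5, 3]
```

<p>Rejected attempts still consume flips. A longer coin string is therefore the price of each extra bit of randomness the explanation relies on.</p>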
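<p>Since \(K(x)\) is not computable, a common practical stand-in is the output length of an off-the-shelf compressor. A lossless compressor, together with its fixed-size decompressor, gives an upper bound on \(K(x)\) up to an additive constant. The sketch below uses Python’s standard <code>zlib</code> module for this. It is a crude proxy chosen for illustration, not the Kolmogorov complexity itself.</p>

```python
import random
import zlib


def compressed_bits(data: bytes) -> int:
    """Length in bits of zlib's encoding of `data`: a crude, computable
    upper-bound proxy for Kolmogorov complexity (up to the fixed size of
    the decompressor), not K itself."""
    return 8 * len(zlib.compress(data, 9))


# A highly patterned string: a short program ("repeat '01' 5000 times") outputs it.
patterned = b"01" * 5000

# Pseudorandom bytes: no pattern that zlib can exploit.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(10000))

print(len(patterned) * 8, compressed_bits(patterned), compressed_bits(noisy))
```

<p>The patterned string compresses to a tiny fraction of its length, while the pseudorandom bytes do not shrink at all. Yet those pseudorandom bytes actually have low Kolmogorov complexity, because a few lines with a fixed seed generate them. <code>zlib</code> simply cannot find that program, which shows why compressors only ever give upper bounds.</p>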
<p>Classical information theory studies the optimal rate at which random objects can be compressed. If the objects are drawn from a known probability distribution \(\mathcal P\), then <em>on average</em>, the number of bits needed to compress one object is equal to a quantity known as the <strong>entropy</strong> \(H(\mathcal P)\). No compression algorithm can beat this on average. In general, it’s questionable whether we should care about the average, as opposed to the median, mode, maximum, or some other statistic.</p>
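<p>For concreteness, here is a small sketch of the entropy \(H(\mathcal P) = -\sum_x \mathcal P(x) \log_2 \mathcal P(x)\). It is applied to a distribution chosen purely for illustration: the sum of two fair dice with faces numbered 0 to 5, like the dice that appear below. The helper names are our own.</p>

```python
import math
from collections import Counter
from fractions import Fraction


def entropy_bits(dist):
    """Shannon entropy H(P) = -sum_x P(x) log2 P(x), in bits, of a finite distribution."""
    return -sum(float(p) * math.log2(float(p)) for p in dist.values() if p > 0)


def two_dice_sum():
    """Exact distribution of the sum of two fair dice with faces numbered 0..5."""
    counts = Counter(a + b for a in range(6) for b in range(6))
    return {s: Fraction(c, 36) for s, c in counts.items()}


P = two_dice_sum()
print(round(entropy_bits(P), 4))   # 3.2744: average bits per roll that no lossless code can beat
print(round(math.log2(10), 4))     # 3.3219: bits per uniformly random decimal digit
```

<p>The result, about 3.27 bits per roll, is slightly less than the \(\log_2 10 \approx 3.32\) bits needed per uniformly random decimal digit.</p>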
<p>But now, suppose we independently sample a very large number \(N\) of objects from \(\mathcal P\). By the Law of Large Numbers, the total is almost certainly close to \(N\) times the average: the total encoding length of the entire sequence of objects will be very close to \(N\cdot H(\mathcal P)\). The sequence’s Kolmogorov complexity will not be much greater: a suitable program consists of a description of \(\mathcal P\), along with the classical encoding (optimized for \(\mathcal P\)) of each object in the sequence, for a total complexity of approximately \(K(\mathcal P) + N\cdot H(\mathcal P)\).</p>
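<p>A quick simulation can illustrate this concentration. Purely for illustration, we assume the distribution of the sum of two fair dice with faces 0 to 5. We draw \(N\) samples and add up their ideal code lengths \(-\log_2 \mathcal P(x)\), which is what a code optimized for \(\mathcal P\) spends on each object, ignoring rounding to whole bits. Divided by \(N\), the total should approach \(H(\mathcal P)\) as \(N\) grows.</p>

```python
import math
import random
from collections import Counter


def two_dice_sum():
    """Probabilities for the sum of two fair dice with faces numbered 0..5."""
    counts = Counter(a + b for a in range(6) for b in range(6))
    return {s: c / 36 for s, c in counts.items()}


def entropy_bits(dist):
    """Shannon entropy in bits: the average of -log2 P(x) under P."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def total_code_length(samples, dist):
    """Sum of ideal code lengths -log2 P(x), in bits, over a sequence of samples."""
    return sum(-math.log2(dist[x]) for x in samples)


P = two_dice_sum()
outcomes, weights = zip(*sorted(P.items()))
rng = random.Random(42)
for n in (100, 10_000, 100_000):
    samples = rng.choices(outcomes, weights=weights, k=n)
    per_object = total_code_length(samples, P) / n
    print(n, round(per_object, 3), round(entropy_bits(P), 3))
```

<p>For small \(N\) the per-object cost wanders noticeably. By \(N = 100{,}000\) it typically agrees with \(H(\mathcal P)\) to about two decimal places.</p>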
<p>If there’s no program that’s significantly shorter and generates the same sequence, then the above program is a good explanation for the sequence: it is approximately the simplest. That is, we can now look at an individual string, with no prior concept of it being random, and conclude whether it looks like a sequence of random draws from \(\mathcal P\). For example, consider the following sequence:</p>
<p>\(3,1,4,1,5,9,2,6,5,3,5,8,9,7,9,3,2,3,8,4,6,2,6,4,3,3,8,3,2,7,9,5,0,2,8,8,4,1,9,7\)</p>
<p>This will <em>not</em> pass as a random sequence of rolls from a pair of fair dice whose sides are numbered 0 to 5 (recording the sum of each roll). Why? To interpret it as such, we must include an encoding for each element. While this is a bit shorter than writing the sequence literally, it’s much longer than the phrase: “first forty digits of pi”.</p>
<p>If it were possible to algorithmically compute the shortest program that outputs any given \(x\), we would have a ridiculously powerful inference engine. For instance, we could feed it a bunch of data from physics experiments, and out would come a fully formed scientific theory, better than any we know today. Naturally, such a thing is too good to be true. For fundamental reasons related to the theory of proofs and computation, the Kolmogorov complexity is not computable.<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup> Thus, we can only do our best to discover parsimonious explanations, not knowing how close we are to the best possible. Occam’s razor can distinguish between good and bad hypotheses and, unlike pure frequentism, resists abuse by a barrage of bad hypotheses. Nonetheless, it takes some ingenuity to discover a good hypothesis.</p>
<p>In a sense, that’s exactly what the pursuit of scientific theories is about. The ancients would be astounded to learn that so much of the world (perhaps all of it!), with its vast richness, can be described by a few simple laws. Over the centuries, we’ve found more and more patterns, making our theories ever more parsimonious. The scientific method only works because the rules of the universe happen to be simple, while the set of observations it offers is vast. Kolmogorov complexity captures this defining characteristic of our reality.</p>
<h1 id="next-time">Next time…</h1>
<p>In the next blog post, we’ll see that in pretty much any world where inference is possible, the Kolmogorov complexity approach applies. Thus, we’ll come to understand the limits of knowledge. Issues analogous to those of frequentism will crop up in the Kolmogorov complexity methodology, via an ambiguity in the definition that we have overlooked until now: namely, the choice of computer programming language. Nonetheless, we’ll find that it’s possible to mostly mitigate the issues we found with frequentism. Finally, we’ll see what this means for probabilistic claims in practice.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>The argument feels more convincing if we use chaos to ensure ergodicity, but this still requires an initial source of randomness… <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Hypothesis testing requires that, before running the experiment, we specify not only a hypothesis but also a test. You might wonder why we choose an interval \([920,1080]\) that’s centered at the expected result. Typically, it’s because we expect that if our hypothesis were wrong, the most realistic alternative would be that our dice produce nines at a much different frequency than we expected. If we’re certain that the frequency isn’t lower, but it may be higher, then a one-sided interval \([0,1074]\) of equal confidence is more likely to rule that out. Since the chances of obtaining exactly 1,000 nines are about 1%, another perfectly valid 99%-confidence test would be to check that the number of nines is <em>anything except</em> 1,000! Would this test ever make practical sense? It would if we’re suspicious that the dice, rather than being random, are rigged to produce exactly 1,000 nines. Since every specific count of nines can be tested against in this way, <em>every</em> conceivable result will fail <em>some</em> test. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>The competing Bayesian school of thought, while more intuitively appealing when sample sizes are small, is circular in an even more obvious way. The circularity can be resolved using something called a <em>universal prior</em>; this approach turns out to be equivalent to using the Kolmogorov complexity, which we define later. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>That said, we can start to appreciate why a basic education in computational thinking is fundamental to understanding nature, just as math and science courses are. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>Li, M., & Vitanyi, P. (2019). An Introduction to Kolmogorov Complexity and Its Applications (4th ed.). Springer. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>Rathmanner, S., & Hutter, M. (2011). A philosophical treatise of universal induction. Entropy, 13(6), 1076-1136. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>The uncomputability of \(K(x)\) has to do with the fact that it’s hard to distinguish a program that takes absurdly long to run from one that never finishes. Since an absurdly slow program is fairly useless, we might decide to include resource bounds in our complexity measure (see Chapter 7 of Li & Vitanyi). For instance, while a quantum field theory might in principle describe all of life’s processes, a supercomputer would struggle to simulate even one atom this way. To make useful inferences, we also need the theories of chemistry, biology, and the social sciences. Unlike \(K(x)\), resource-bounded measures can be computed by trying every possible program until the resource bounds are exhausted. This isn’t practical, however, as the number of programs to try would be astronomical. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Today, I want to address an issue with statements involving chance. To demonstrate, let’s first consider a statement that doesn’t involve chance:How fickle should a forecaster be?2020-11-01T00:00:00+00:002020-11-01T00:00:00+00:00http://ebtech.github.io/blog/2020/11/01/fickle-forecaster<p>Suppose one day, I say to you that there’s a 20% chance of rain next weekend. Or that there’s a 20% chance of Donald Trump being elected for another term as President of the United States. Upon reading about some scandal the next day, I revise my prediction to 85%. By nighttime the matter settles, so I announce my updated prediction of 5%. In the end, it doesn’t rain (or Joe Biden defeats the incumbent).</p>
<p>After following a bunch of my forecasts, you might criticize me for being too fickle, my predictions swinging wildly as if I can’t make up my mind. Or you might think the opposite: that I play it too safe, only shifting my opinion to one side when the evidence becomes overwhelming.</p>
<p>What’s the right amount of variation? Is it a thing that can be measured?</p>
<p>Yes it is! In this blog post, I propose the sum of squared changes as such a measure.</p>
<p>First, a formal derivation. If you’re not into the technicalities of probability theory, the next section can be skipped. We’ll understand its implications afterward.</p>
<h1 id="a-formal-interlude">A formal interlude</h1>
<p>Let \(X_t\) be a bounded martingale adapted to the filtration \(\mathcal F_t\),<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> and let \(0 = T_0 \le T_1 \le \ldots \le T_N\) be stopping times. Then,</p>
\[\begin{aligned}
\mathbb E\Big( \sum_{n=1}^N (X_{T_n} - X_{T_{n-1}})^2 \mid \mathcal F_0\Big)
&= \mathbb E\Big(X_{T_N}^2 - X_{0}^2 + 2 \sum_{n=1}^N X_{T_{n-1}} (X_{T_{n-1}} - X_{T_n}) \mid \mathcal F_0\Big)
\\&= \mathbb E(X_{T_N}^2 \mid \mathcal F_0) - X_{0}^2 + 2 \sum_{n=1}^N \mathbb E\Big(X_{T_{n-1}} (X_{T_{n-1}} - X_{T_n}) \mid \mathcal F_0\Big)
\\&= \mathbb E(X_{T_N}^2 \mid \mathcal F_0) - X_{0}^2 + 2 \sum_{n=1}^N \mathbb E\Big(X_{T_{n-1}} \underbrace{\mathbb E(X_{T_{n-1}} - X_{T_n} \mid \mathcal F_{T_{n-1}})}_{=0} \mid \mathcal F_0\Big)
\\&= \mathbb E(X_{T_N}^2 \mid \mathcal F_0) - X_{0}^2
\end{aligned}\]
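<p>Let \(X_t\) be the \(\mathcal F_t\)-conditioned probability of some event whose outcome is determined by time \(T_N\). This is a martingale for which, conditioned on \(\mathcal F_0\), \(X_{T_N}\) is \(1\) with probability \(X_0\), and \(0\) otherwise. Since \(X_{T_N} \in \{0, 1\}\), we have \(X_{T_N}^2 = X_{T_N}\), hence \(\mathbb E(X_{T_N}^2 \mid \mathcal F_0) = X_0\), and the expected sum of squared changes is \(X_0 - X_0^2\).</p>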
<h1 id="and-were-back">And we’re back!</h1>
<p>So what have we shown? In essence, the martingale \(X_t\) represents the prediction at time \(t\). The role of the <em>stopping times</em> is to grant the forecaster some flexibility: they don’t have to announce predictions at <em>every</em> time \(t\), nor even at pre-specified intervals. All that matters is that the decision on whether or not to announce a prediction at time \(t\) be made by time \(t\); deciding to withhold a prediction based on information from the future would be cheating! As long as this rule is satisfied, we can just add up the squared changes in the forecaster’s predictions to get a measure of their fickleness. If the forecaster reports genuine conditional probabilities, the first of which was \(X_0\), then on average this sum should equal \(X_0 - X_0^2\).</p>
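<p>The claim can be checked numerically with a toy forecaster. In this sketch, whose setup is our own choice, the event is “at least 7 heads in 10 fair coin flips”. After every flip, the forecaster announces the exact conditional probability of the event given the flips so far; announcing at fixed times is a special case of stopping times. Averaged over many simulated runs, the sum of squared changes should come out close to \(X_0 - X_0^2\).</p>

```python
import random
from functools import lru_cache
from math import comb

FLIPS, NEEDED = 10, 7  # the event: at least NEEDED heads among FLIPS fair coin flips


@lru_cache(maxsize=None)
def prob_event(t, heads):
    """Conditional probability of the event, given `heads` heads in the first `t` flips."""
    remaining = FLIPS - t
    still_needed = max(NEEDED - heads, 0)
    favorable = sum(comb(remaining, k) for k in range(still_needed, remaining + 1))
    return favorable / 2 ** remaining


def sum_squared_changes(rng):
    """Simulate one run, announcing X_t after every flip; return the sum of (X_t - X_{t-1})^2."""
    heads, previous, total = 0, prob_event(0, 0), 0.0
    for t in range(1, FLIPS + 1):
        if rng.random() < 0.5:
            heads += 1
        current = prob_event(t, heads)
        total += (current - previous) ** 2
        previous = current
    return total


rng = random.Random(0)
x0 = prob_event(0, 0)  # 176/1024 = 0.171875
runs = 20_000
average = sum(sum_squared_changes(rng) for _ in range(runs)) / runs
print(round(x0 - x0 ** 2, 4), round(average, 4))
```

<p>Here \(X_0 = 176/1024 \approx 0.172\), so the target \(X_0 - X_0^2\) is about 0.142. The simulated average should land close to it.</p>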
<p>Using the example we started with, \(X_0 = 0.2\), so \(X_0 - X_0^2 = 0.16\). Our prediction series was \((0.2,\, 0.85,\, 0.05,\, 0)\), so the sum of squared changes is</p>
\[(0.2 - 0.85)^2 + (0.85 - 0.05)^2 + (0.05 - 0)^2 = 1.065\]
<p>This is a lot more than the expected \(0.16\). Of course, since this is a random quantity, a lot more data would be needed to support the fickleness critique. The issue of statistical significance is beyond the scope of this blog post, but hopefully this discussion serves as a helpful start. As an exercise, you can try measuring this quantity for the prediction series of a well-known forecaster, such as FiveThirtyEight!</p>
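<p>The arithmetic above is easy to script. This small helper, whose name is our own, takes a series of announced probabilities ending with the outcome (0 or 1). It returns both the observed sum of squared changes and the value \(X_0 - X_0^2\) that a forecaster reporting genuine conditional probabilities would produce on average.</p>

```python
def fickleness(series):
    """Return (sum of squared changes, expected value X0 - X0^2) for a probability series.

    `series` lists the announced probabilities in order, ending with the outcome (0 or 1).
    """
    observed = sum((b - a) ** 2 for a, b in zip(series, series[1:]))
    x0 = series[0]
    return observed, x0 - x0 ** 2


observed, expected = fickleness([0.2, 0.85, 0.05, 0])
print(round(observed, 3), round(expected, 2))  # 1.065 0.16
```

<p>Applied to a long track record, the observed sums can be added up across many forecasts and compared with the sum of the expected values.</p>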
<h1 id="trading-on-volatility">Trading on volatility</h1>
<p>Aside from professional forecasters such as FiveThirtyEight or your local weather channel, another source of prediction series is public prediction markets, such as PredictIt. There, you can purchase any number of contracts, each of which pays out $1 if the corresponding event comes to pass. If “Donald Trump wins the presidency” contracts sell for 40 cents apiece, one may say that the market of buyers and sellers has collectively estimated his chances at 40%.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> Assuming no interest, transaction fees, or other frictions in the market, “Donald Trump doesn’t win the presidency” contracts must then sell for 60 cents apiece, so that a pair of opposing contracts costs exactly the $1 it is guaranteed to pay out.</p>
<p>Now, let’s say you don’t know which event will come to pass, but you’re confident that the prediction market is more volatile than a true martingale, meaning that its sum of squared changes will exceed the expected \(X_0 - X_0^2\). Is it possible to bet on this outcome, making a profit if it comes to pass?</p>
<p>Again, the answer is yes. The general strategy is to ensure that, at every point in time, we hold contracts on the side that’s deemed <em>less</em> likely to win. The number of contracts we hold should be in proportion to the difference between the prices of the two sides. For example, if we buy one contract for 40 cents, then we should sell it when its price rises to 50 cents. If it then shoots up to 80 cents, we should buy three of the <em>opposite</em> contract, for 20 cents apiece, and so on. Can you prove that this method works?</p>
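<p>Here is a sketch that checks the strategy on one price path. As an assumption, it reuses the forecast series from earlier as if it were a sequence of market prices. Positions are measured in “yes” contracts: holding \(n\) “no” contracts counts as holding \(-n\) “yes” contracts, which is equivalent up to cash because the two prices sum to $1 in a frictionless market. Sizing at \(c = 5\) contracts per dollar of price difference matches the example of 1 contract at 40 cents and 3 opposite contracts at 80 cents.</p>

```python
def strategy_profit(prices, contracts_per_dollar=5.0):
    """Profit from always holding the cheaper side, sized by the price gap.

    Position is measured in 'yes' contracts: c * (1 - 2p) is positive (long 'yes')
    when 'yes' is the less likely side, negative (long 'no') otherwise.
    Assumes no fees, no interest, and trading only at the listed prices.
    """
    profit = 0.0
    for p_old, p_new in zip(prices, prices[1:]):
        position = contracts_per_dollar * (1 - 2 * p_old)
        profit += position * (p_new - p_old)
    return profit


prices = [0.2, 0.85, 0.05, 0]  # the final price is the settled outcome
excess = sum((b - a) ** 2 for a, b in zip(prices, prices[1:])) - (prices[0] - prices[0] ** 2)
print(round(strategy_profit(prices), 3), round(5.0 * excess, 3))
```

<p>On this path, both numbers come out to 4.525: the profit equals \(c\) times the excess of the sum of squared changes over \(X_0 - X_0^2\). Showing that this holds on every path that settles at 0 or 1 is the exercise.</p>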
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p><em>Filtration</em> is a very technical term, but you can think of \(\mathcal F_t\) as all of the information known at time \(t\). <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>More precisely, 40% would be what economists call the <em>risk-neutral measure</em> of such an event. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>How to cut your cake and eat it too2017-09-20T00:00:00+00:002017-09-20T00:00:00+00:00http://ebtech.github.io/blog/2017/09/20/cake-cutting<p>A classic challenge goes as follows: in the absence of precise measurements, how would you divide a cake between two diners so that both agree they got a fair share? The cake, naturally, is a metaphor: it can stand for any bundle of goods that should be divided between parties.</p>
<p>In the case of two diners, the classic solution is <a href="https://en.wikipedia.org/wiki/Divide_and_choose">I cut, you choose</a>. If I let you inspect and choose between the two pieces, I can’t easily cheat you: if the pieces are uneven, you’ll go for the best one. Knowing this, I have an incentive to make the most equal cut that I can manage. We’ll each end up with about half the cake: I’ll lose no more than my own measurement error when cutting; likewise, you’ll lose no more than your own measurement error when inspecting. Seems fair.</p>
<p>Now, in real life, who would you rather be: the cutter or the chooser? I think the chooser is better off. The cutter has to do all the work of being precise, and will probably end up with a slightly smaller piece. The chooser, on the other hand, can always choose the best piece. If that’s too difficult, they can just choose randomly, and still be guaranteed half the cake on average.</p>
<p>We’ve been deliberately vague about the nature of the problem: rather than stating it in mathematical terms, we used our intuition regarding cakes and knives. This got me thinking: is there any setting where it’s better to be the cutter instead of the chooser?</p>
<h1 id="first-attempt">First attempt</h1>
<p>The first idea that comes to mind is deception: maybe I can cut the cake so that the smaller piece <em>looks</em> bigger. However, if the cut looks fishy, a distrustful chooser has the easy defense of choosing randomly, say by tossing a coin. If the cake is homogeneous, this always works.</p>
<h1 id="second-attempt">Second attempt</h1>
<p>In our quest to make the cutter win, let’s look at a fancier cake: this one has 10 berries and 10 cherries on top. I love berries but don’t care much about cherries. Meanwhile, you adore cherries but couldn’t care less about berries. Knowing your tastes, how about I cut the cake so that one piece has all 10 berries but only 4 cherries? Then you’ll prefer the other piece, with 6 cherries. While that’s an ok deal for you, I’m the clear winner since I keep everything that’s valuable to me.</p>
<p>If you’re rational according to the classical definition used by economists and game theorists, it appears you have no recourse. However, an “irrational” chooser can punish me by taking the 10-berries piece. This spiteful strategy carries a cost for you, but it’s less than the damage done to me. You still get 4 cherries (instead of 6), while I’m effectively empty-handed.</p>
<p>Spite seems like an expensive strategy. However, the implied threat of punishment might deter me from cutting greedily, so that you’ll rarely pay the price in practice. If I believe that you would take revenge if wronged, then I’ll take care to cut in a fair manner.</p>
<h1 id="wait-really">Wait, really?</h1>
<p>Classical game theory doesn’t predict spite; one may argue this to be a flaw in the theory itself. Sure, we can invent workarounds: for instance, by employing fancy social contracts or threats that are costly to lie about. However, I feel that the framework of Updateless Decision Theory offers a cleaner abstraction to explain such behavior. Perhaps I’ll write about UDT, or about the power of good abstractions more generally, in a future post.</p>
<p>For readers who are unacquainted with applied math research, the takeaway is that one’s assumptions, whether stated or implied, can affect the outcome. We haven’t formally defined what fairness means, nor how the diners behave. Unless we clarify our assumptions, it’s impossible to narrow down to one right answer: the chooser may be classically rational, or they may be spiteful. If you happen to have the math background, I leave it as an exercise to formalize, and then generalize, the arguments in this post!</p>
<h1 id="last-attempt">Last attempt</h1>
<p>Ok, so I still can’t win against a spiteful chooser. It appears I’m doomed whenever our utilities are additive. Additive utility, here, is just a fancy way of saying we enjoy a piece of cake exactly as much as the sum of its parts, not caring about special combinations.</p>
<p>But now suppose the “cake” is actually a basket of different items, some combinations of which form recipes that I enjoy, while other combinations form recipes that you enjoy. To construct a very simple example, let’s say the cake contains not only berries and cherries, but also chocolate chips and vanilla chips. I enjoy berries only when they’re mixed with chocolate, and I enjoy cherries only when mixed with vanilla. For you it’s the reverse: berries require vanilla and cherries require chocolate.</p>
<p>Now as the cutter, I can cut the cake so that one piece contains mostly berries and chocolate, while the other contains mostly cherries and vanilla. No matter which you choose, you’ll be unhappy and I’ll be happy. If I were more considerate, I could make us both happy by splitting everything equally. However, if the cake had just one (indivisible) topping of each type, then only the cutter could win. Sweet victory!</p>
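<p>The last argument can be checked by brute force. In this sketch, the four toppings are indivisible. Each diner’s utility is the number of their recipes whose ingredients all land in their piece. The cutter assumes the chooser breaks ties in whichever way hurts the cutter most, which also covers a spiteful chooser. All names are illustrative.</p>

```python
from itertools import combinations

TOPPINGS = frozenset({"berries", "cherries", "chocolate", "vanilla"})
CUTTER_RECIPES = [{"berries", "chocolate"}, {"cherries", "vanilla"}]
CHOOSER_RECIPES = [{"berries", "vanilla"}, {"cherries", "chocolate"}]


def recipes_in(piece, recipes):
    """Number of recipes whose ingredients are all in `piece` (a non-additive utility)."""
    return sum(recipe <= piece for recipe in recipes)


def all_cuts():
    """Every way to split the indivisible toppings into two pieces."""
    items = sorted(TOPPINGS)
    for size in range(len(items) + 1):
        for chosen in combinations(items, size):
            piece = frozenset(chosen)
            yield piece, TOPPINGS - piece


def cutter_guarantee(cut):
    """Cutter's payoff when the chooser takes a favorite piece, breaking ties against the cutter."""
    a, b = cut
    best = max(recipes_in(a, CHOOSER_RECIPES), recipes_in(b, CHOOSER_RECIPES))
    # The cutter keeps whichever piece the chooser leaves behind.
    leftovers = [other for taken, other in ((a, b), (b, a))
                 if recipes_in(taken, CHOOSER_RECIPES) == best]
    return min(recipes_in(piece, CUTTER_RECIPES) for piece in leftovers)


best_cut = max(all_cuts(), key=cutter_guarantee)
print(sorted(best_cut[0]), sorted(best_cut[1]), cutter_guarantee(best_cut))
```

<p>The best cut separates berries and chocolate from cherries and vanilla. The chooser completes no recipe with either piece, while the cutter is guaranteed one.</p>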
<p><img src="http://oboi-dlja-stola.ru/file/1725/760x0/16:9/%D0%9A%D0%B5%D0%BA%D1%81-%D1%81-%D1%8F%D0%B3%D0%BE%D0%B4%D0%B0%D0%BC%D0%B8.jpg" alt="Cake" /></p>What’s a Color Made of?2017-08-31T00:00:00+00:002017-08-31T00:00:00+00:00http://ebtech.github.io/blog/2017/08/31/colors<p>As highly visual creatures, we understand the world largely in terms of what we see. The colors in a scene, their presence and interactions, have the power to delight, disgust, and inform.</p>
<p>As citizens of a scientific age, we may wonder about the nature of color. What could a red apple possibly have in common with a red fire? Is color an objective physical property of these objects, or a subjective experience we made up? If it’s objective, what sort of physical laws are involved? If it’s subjective, how did we make it up? Is there a way to characterize all possible colors?</p>
<h1 id="part-i-primary-colors">Part I: Primary Colors</h1>
<p>You may try to answer these questions by summoning memories of lessons from elementary school. Perhaps you had a painting session, in which you saw that a wide range of colors can be obtained from combinations of three <strong>primary</strong> paints: red, blue, and yellow. For example, blue and yellow paints can be mixed to obtain green, a <strong>secondary</strong> color. Adding red to the green mixture yields black. To summarize in a diagram:</p>
<p><img src="https://cdn-geo.dayre.me/tfss-37fded77-5bd8-4206-8c63-7de868f915b5-IkPqXqDbMFm9B528Jiph.jpg" alt="RYB Diagram" /></p>
<p>Maybe a few years pass, and you find yourself in an IT class. You learn that TVs and other electronic displays produce realistic images by emitting combinations of three primary lights: red, blue and… green! This theory seems to agree with your biology class, where the retina is said to contain three types of cone cells, each sensitive to red, blue or green light. Apparently, light comes in three varieties, and our brains process combinations of them as follows:</p>
<p><img src="https://www.laetusinpraesens.org/musings/images/wholth_files/rgb-venn.gif" alt="RGB Diagram" /></p>
<p>But hold on, how did red and green go from making black to making yellow? This diagram seems to completely defy our common sense experience with paint and crayons!</p>
<p>Before we can finish our thoughts, a physics teacher enters the room. As if to delight in our confusion, as physics teachers often do, she explains that our eyes in fact detect electromagnetic (EM) radiation, which comes in a continuous range of wavelengths (or, equivalently, frequencies). We have names for different bands of this range: the longest EM waves are radio waves, whereas the shortest are X-rays and gamma rays, but they are all fundamentally the same physical phenomenon. The human eye is only sensitive to a thin band of EM waves, which we call <strong>visible light</strong>:</p>
<p><img src="https://cdn.kastatic.org/ka-perseus-images/7370593cc71daa2ccaca091cec088fa5fec6ca16.png" alt="EM Spectrum" /></p>
<p>In this theory, no mixing is needed because light comes in <em>infinite</em> varieties, providing the entire spectrum of color that you see in a rainbow! Indeed, rainbows occur when mixed (white) light is refracted in a wavelength-dependent manner. Now if you happen to be especially attentive, you might notice that magenta is missing from the rainbow… hmm!</p>
<p>It looks like our teachers have made a fun sport out of contradicting each other at the expense of impressionable minds! Can we reconcile their theories?</p>
<p>What might surprise you is that the plain act of uncovering these apparent contradictions is a major breakthrough in our journey! By digging to the source of disagreements, we’ve arrived very close to the truth. Taking this little-known technique to heart will turbocharge your learning.</p>
<p>In this case, having learned an objective (physical) theory that includes an infinite continuum of varieties of light, it’s a good time to revise our subjective (perceptual) theory of color. Let’s revisit those cone cells…</p>
<h1 id="part-ii-cone-cells">Part II: Cone Cells</h1>
<p>Alright, so the physics part of our investigation was pretty easy. You may be tempted (or scared!) to read more about how EM works, but there’s really no need. The last diagram gave us the right idea: light is just EM radiation whose wavelength happens to fall within some range. The complications come not from the physics of EM, but from the biology.</p>
<p>Before turning to our favorite search engines for a full answer, let’s imagine how color perception might work, hypothetically. This exercise will deepen our understanding of the underlying logic behind the mess in which we find ourselves.</p>
<p>We know that we’re capable of seeing an infinite variety of light (up to a reasonable precision) because we see it in the rainbow: this is the continuum of wavelengths known as the <strong>visible spectrum</strong>. We perceive this spectrum as varying continuously from deep red, slightly orangish red, orange-red, reddish orange, orange, golden orange, golden yellow, and so on through all the colors of the rainbow, down to indigo and violet. These are the <strong>spectral colors</strong>, produced by <strong>monochromatic light</strong>, meaning light composed purely of a single wavelength.</p>
<p>By mixing the spectral colors, it’s possible to get <strong>non-spectral colors</strong> that don’t appear on the rainbow, such as white and magenta. With an infinite variety of pure spectral colors available, the possible combinations we can make with them boggle the mind! However, our eyes cannot distinguish between all possible combinations. For example, recall that our displays mix red and green lights to appear yellow: this is a mixture whose appearance closely resembles that of the rainbow’s pure spectral yellow.</p>
<p>Our limitation stems from the fact that our eyes typically only have three types of light-sensitive <strong>cone cells</strong>. Our visual cortex never learns the full <strong>spectral power distribution</strong>, which is a fancy term for the full “recipe” we’d write down if we were to describe how much of each wavelength is included in an EM mixture. Instead, the visual cortex effectively works with three numbers per location in its visual field, corresponding to the amount of activation on each cone type.</p>
<p>How do three types of cones accommodate a continuous spectrum? Since we’re able to perceive the entire visible EM band, it must be the case that the three types together cover the band. Indeed, here’s a graph plotting the sensitivity of each cone type against wavelength:</p>
<p><img src="https://www.physics.utoronto.ca/~jharlow/conesens1.gif" alt="Cone Sensitivity" /></p>
<p>There are three curves, corresponding to three types of cone cells. The horizontal axis corresponds to wavelength, and the height of a curve at each position tells us how sensitive the cone is to that wavelength. We see here that one type of cone (let’s call it R) peaks in the red-to-yellow range, one (G) in the green and one (B) in the blue-to-indigo. When multiple wavelengths shine on the same cone cell, its activation is the sum of its responses to the individual wavelengths, each response being the cone’s sensitivity at that wavelength times the light’s intensity there: this is an empirical result known as <a href="https://en.wikipedia.org/wiki/Grassmann's_law_(optics)"><strong>Grassmann’s law</strong></a>.</p>
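<p>The sketch below models a spectral power distribution as a dictionary from wavelength (in nm) to intensity. Each cone’s activation is computed as the intensity-weighted sum of its sensitivities. The Gaussian curves, peaking at 565, 540 and 445 nm, are crude stand-ins we chose for illustration, not measured cone data. Because the model is linear, the activations of a mixture equal the sums of the separate activations.</p>

```python
import math

# Crude Gaussian stand-ins for the three cone sensitivity curves (NOT measured data).
# cone name -> (peak wavelength in nm, width in nm)
CONES = {"R": (565.0, 40.0), "G": (540.0, 40.0), "B": (445.0, 25.0)}


def sensitivity(cone, wavelength):
    """Relative sensitivity (peak = 1) of a cone type at a wavelength in nm."""
    peak, width = CONES[cone]
    return math.exp(-0.5 * ((wavelength - peak) / width) ** 2)


def activations(spd):
    """Map a spectral power distribution {wavelength_nm: intensity} to (R, G, B) activations."""
    return tuple(
        sum(power * sensitivity(cone, wl) for wl, power in spd.items())
        for cone in ("R", "G", "B")
    )


def mix(*spds):
    """Superimpose lights by adding their intensities wavelength by wavelength."""
    total = {}
    for spd in spds:
        for wl, power in spd.items():
            total[wl] = total.get(wl, 0.0) + power
    return total


red, green = {620.0: 1.0}, {540.0: 0.5}
combined = activations(mix(red, green))
separate = [a + b for a, b in zip(activations(red), activations(green))]
print(all(math.isclose(c, s, rel_tol=1e-9, abs_tol=1e-12) for c, s in zip(combined, separate)))  # True
```

<p>However the full spectrum is described, only these three sums reach the brain. This is the sense in which the visual cortex works with three numbers per location.</p>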
<p>Now, we see that monochromatic yellow light activates both R and G cones, with almost no activation of B. This is very similar to the activations induced by a mixture of red and green light; as a result, these two cases yield very similar subjective appearances. Different spectral power distributions that appear the same, because they induce the same RGB activations, are called <strong>metamers</strong>.</p>
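<p>Using crude Gaussian stand-ins for the cone sensitivity curves (an illustrative assumption, not measured data), we can construct an approximate metamer of monochromatic 580 nm yellow. We solve a 2×2 linear system for the intensities of a 620 nm red light and a 540 nm green light that reproduce the yellow’s R and G activations exactly, then compare the B activations.</p>

```python
import math

# Crude Gaussian stand-ins for cone sensitivities (NOT measured data): (peak nm, width nm).
CONES = {"R": (565.0, 40.0), "G": (540.0, 40.0), "B": (445.0, 25.0)}


def sensitivity(cone, wavelength):
    peak, width = CONES[cone]
    return math.exp(-0.5 * ((wavelength - peak) / width) ** 2)


def activations(spd):
    """(R, G, B) activations of a spectral power distribution {wavelength_nm: intensity}."""
    return tuple(sum(p * sensitivity(c, wl) for wl, p in spd.items()) for c in "RGB")


def match_with_two_lights(target_spd, wl_1, wl_2):
    """Intensities (a, b) of lights at wl_1 and wl_2 matching the target's R and G activations.

    Solves r1*a + r2*b = R_target and g1*a + g2*b = G_target by Cramer's rule.
    """
    r_t, g_t, _ = activations(target_spd)
    r1, r2 = sensitivity("R", wl_1), sensitivity("R", wl_2)
    g1, g2 = sensitivity("G", wl_1), sensitivity("G", wl_2)
    det = r1 * g2 - r2 * g1
    a = (r_t * g2 - r2 * g_t) / det
    b = (r1 * g_t - r_t * g1) / det
    return a, b


yellow = {580.0: 1.0}
a, b = match_with_two_lights(yellow, 620.0, 540.0)
mixture = {620.0: a, 540.0: b}
print(round(a, 3), round(b, 3))
print([round(v, 4) for v in activations(yellow)])
print([round(v, 4) for v in activations(mixture)])
```

<p>Both intensities come out positive, so in this toy model the red-green mixture is physically realizable. Its B activation differs from the yellow’s by only a few ten-thousandths, so the two lights are practically indistinguishable to these cones.</p>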
<p>Some RGB activations cannot be induced by monochromatic light. For example, we can see from the graph that it takes at least two distinct wavelengths to simultaneously activate both R and B without G; the brain’s response to this combination is what we call magenta, which explains why we don’t see it in the rainbow. White is another example: no single wavelength activates all three cone types equally.</p>
<p>Finally, we find certain RGB activations that cannot be induced by <em>any</em> sort of light, mixtures included. For example, it’s impossible to get a pure G activation without also getting a bit of R or B mixed in there. If you could magically activate only your G cones, you would see a color that doesn’t exist in the real world: a subjective experience with no physical counterpart! This is a <a href="https://en.wikipedia.org/wiki/Impossible_color"><strong>forbidden color</strong></a>.</p>
<p>Without going into every possible detail of physics and biology, we now have a coherent understanding of color, connecting the physical phenomenon with how it’s perceived. Summarized in one sentence, color is derived from the cone cell activations induced by mixtures of EM waves. Armed with these insights, can we draw a new diagram that characterizes all possible colors, including the non-spectral ones, in a natural way?</p>
<h1 id="part-iii-color-space">Part III: Color Space</h1>
<p>We already came close to our goal when we saw the rainbow embedded within the EM spectrum. The spectral colors can be arranged along one axis, because they are specified by a single parameter: the wavelength. If we wanted to, we could add a second axis for brightness: shining a more intense light at the same wavelength makes it brighter.</p>
<p>To describe an arbitrary EM mixture, you’d need infinitely many axes: one for each wavelength, specifying how much of that wavelength is present. However, our subjective experience is determined only by the level of R, G and B activation. If we let the three spatial coordinate axes correspond to these activations, we get the Cube of All Colors:</p>
<p><img src="https://upload.wikimedia.org/wikipedia/commons/a/af/RGB_color_solid_cube.png" alt="RGB Cube" /></p>
<p>Or do we? Alas, this cube is a fake! It contains those forbidden colors we mentioned; in particular, its corners correspond to pure activations, so we should at least carve out some of the perimeter. If we don’t care about brightness, we can remove even more. In this image of a cube, every point that’s hidden from view (e.g., the back and interior) is just a dimmer version of a color on the surface.</p>
<p>In summary, after removing the forbidden envelope of the cube, we can take a slice of the remaining solid to represent all of its colors at a fixed level of brightness. The result is a two-dimensional color space, allowing us to present all possible colors<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> on a flat display or sheet of paper.</p>
<p>Okay, while it’s nice that we can theoretically present everything in 2D, this is starting to sound complicated. Let’s cheat for a moment and look up the answer. We can resume trying to understand it afterward. A quick web search reveals the CIE 1931 color space:</p>
<p><img src="https://upload.wikimedia.org/wikipedia/commons/6/60/Cie_Chart_with_sRGB_gamut_by_spigget.png" alt="Color Gamut" /></p>
<p>Remember that all we’ve done is project the color cube onto a flat plane. Why the triangular and horseshoe shapes? Let’s explore whether they make sense.</p>
<p>Given any spectral distribution, our handy cone sensitivity graph lets us calculate the corresponding cone activations, which become coordinates in the color cube. The CIE standard then projects these to coordinates in the plane corresponding to a slice from the cube.</p>
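<p>A minimal sketch of the projection step, in Python. One caveat: the CIE 1931 standard doesn’t work with raw cone activations but with tristimulus values X, Y and Z, which are a linear transformation of them, so the sketch takes X, Y, Z as given. Dividing each value by their sum gives the chromaticity coordinates (x, y). Scaling all three by the same factor, i.e., a brighter version of the same light, leaves the result unchanged, which is exactly how brightness gets projected away.</p>

```python
def chromaticity(X: float, Y: float, Z: float) -> tuple[float, float]:
    """Project CIE XYZ tristimulus values onto the (x, y) chromaticity plane."""
    total = X + Y + Z
    if total <= 0:
        raise ValueError("black (no light at all) has no chromaticity")
    return (X / total, Y / total)


# An equal-energy mix lands at the centre of the diagram...
print(chromaticity(1.0, 1.0, 1.0))
# ...and doubling its intensity does not move it.
print(chromaticity(2.0, 2.0, 2.0))
```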
<p>Which positions in the CIE space do the spectral colors occupy? Try to convince yourself that they must form an unbroken continuous curve. For a hint, look up the formal mathematical definition of a continuous function. I won’t give a formal proof, but here’s an intuitive sketch:</p>
<p>Let’s start from the longest visible wavelength (red) and find its place in CIE space. Then, let’s gradually lower the wavelength towards the violet end. Our cone sensitivity graph shows a smooth dependence on wavelength. Therefore, a sufficiently tiny change in wavelength causes a tiny change in cone activations, which remains a tiny change after projecting to CIE. As we visit all the wavelengths from red to violet, moving a little bit at a time, we’ll trace out a continuous curve in CIE space.</p>
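<p>To watch the continuity argument play out, here is a toy sketch. The Gaussian cone curves are a made-up stand-in for the real sensitivity graph: their peaks are only loosely modelled on real cones and their widths are invented. The sketch drops brightness by dividing each activation by the sum of all three, then steps the wavelength 1 nm at a time from red to violet and measures how far the projected point moves at each step.</p>

```python
import math

# Hypothetical cone sensitivities: (peak wavelength in nm, width in nm).
# Peaks are only roughly modelled on real cones; widths are invented.
CONES = {"R": (565.0, 50.0), "G": (540.0, 45.0), "B": (445.0, 30.0)}


def activations(wavelength_nm: float) -> tuple[float, float, float]:
    """Toy cone activations produced by monochromatic light of unit power."""
    return tuple(
        math.exp(-(((wavelength_nm - peak) / width) ** 2) / 2)
        for peak, width in CONES.values()
    )


def project(r: float, g: float, b: float) -> tuple[float, float]:
    """Discard brightness by normalising the activations to sum to 1."""
    total = r + g + b
    return (r / total, g / total)


def spectral_curve(start_nm: int = 700, stop_nm: int = 400, step_nm: int = 1):
    """Sample the projected curve from red down to violet."""
    return [project(*activations(w)) for w in range(start_nm, stop_nm - 1, -step_nm)]


curve = spectral_curve()
largest_jump = max(math.dist(p, q) for p, q in zip(curve, curve[1:]))
print(f"{len(curve)} points, largest step between neighbours: {largest_jump:.4f}")
```

<p>Shrinking the wavelength step shrinks the largest jump along with it, which is the intuitive content of continuity.</p>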
<p>What about the non-spectral colors? By Grassmann’s law, the mixture of two colors always lies on the line segment between them. Now imagine repeating this process, mixing additional colors one at a time. Given any set of colors to start with, we would find that their possible combinations take up the whole interior region bounded by the starting colors; in formal terms, their <a href="https://en.wikipedia.org/wiki/Convex_hull"><strong>convex hull</strong></a>. A hands-on way to see the convex hull is to place thumbtacks all over the starting set. If we surround them all tightly with a thin rubber band, the region it would contain is the convex hull.</p>
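<p>The rubber-band picture has a classic algorithmic counterpart. This sketch uses Andrew’s monotone chain, one of several standard convex hull algorithms, to find which thumbtacks the band actually touches; the sample points are arbitrary. Given only collinear points, it returns just the two endpoints of their segment, matching the two-color case of Grassmann’s law.</p>

```python
Point = tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Positive if the path o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: list[Point]) -> list[Point]:
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half_hull(seq: list[Point]) -> list[Point]:
        chain: list[Point] = []
        for p in seq:
            # Drop points where the band would bend inward or run straight.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half_hull(pts)
    upper = half_hull(list(reversed(pts)))
    # Each chain ends where the other begins, so drop the repeated endpoints.
    return lower[:-1] + upper[:-1]


# Four corner "thumbtacks" plus one in the middle that the band never touches.
tacks = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
print(convex_hull(tacks))
```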
<p>The CIE “horseshoe” is simply the convex hull of our spectral curve: it connects the red and violet ends with a straight line segment, and fills in the interior. The straight segment closes our curve with purples and magentas, and explains why artists talk about a <strong>color wheel</strong> instead of a two-ended spectrum. Meanwhile, the interior contains colors of lower saturation. Altogether, the horseshoe contains every color that your eyes can see. The forbidden colors, which cannot be produced by any light, lie outside the horseshoe. Beautiful!</p>
<p>Finally, in case you were wondering, the black interior curve traces the colors of light emitted by a <strong>blackbody</strong>, an idealized object that absorbs all incoming light, as its temperature rises. At room temperature, such an object would appear black; approaching 1,000 kelvins, it glows red-hot; past 10,000 kelvins, it radiates blue-white. Ironically, the hottest stars in the night sky glow in “cool” colors that we typically associate with icy climates. The point labeled D65 is the standard model of average daylight, which sits close to the blackbody curve at about 6,500 kelvins, not far from the Sun’s surface temperature of roughly 5,800 kelvins. It should come as no surprise that our visual sensitivity has evolved to peak near the wavelengths where sunlight is most intense!</p>
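<p>The blackbody curve comes from Planck’s law for the spectral radiance of a blackbody at temperature T. This sketch evaluates the law directly, using the SI values of the physical constants, and compares a brute-force search for the most intense wavelength with Wien’s displacement law, λ<sub>max</sub> = b / T with b ≈ 2.898 × 10<sup>−3</sup> m·K. The three temperatures are illustrative choices.</p>

```python
import math

H = 6.62607015e-34       # Planck constant, J*s
C = 2.99792458e8         # speed of light, m/s
K_B = 1.380649e-23       # Boltzmann constant, J/K
WIEN_B = 2.897771955e-3  # Wien displacement constant, m*K


def planck(wavelength_m: float, temperature_k: float) -> float:
    """Blackbody spectral radiance B(lambda, T) in W / (sr * m^3)."""
    x = H * C / (wavelength_m * K_B * temperature_k)
    return 2 * H * C**2 / wavelength_m**5 / math.expm1(x)


def peak_wavelength_nm(temperature_k: float) -> int:
    """Brute-force search for the most intense wavelength at 1 nm resolution."""
    grid_nm = range(100, 30001)
    return max(grid_nm, key=lambda nm: planck(nm * 1e-9, temperature_k))


for t in (1000, 5772, 10000):
    wien_nm = WIEN_B / t * 1e9
    print(f"T = {t:>5} K: peak near {peak_wavelength_nm(t)} nm (Wien: {wien_nm:.0f} nm)")
```

<p>At 1,000 K the peak sits near 2,900 nm, deep in the infrared; the red glow comes from the short-wavelength tail of the curve that reaches into the visible range. At 5,772 K, roughly the Sun’s surface temperature, the peak lands near 500 nm.</p>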
<h1 id="part-iv-color-technology">Part IV: Color Technology</h1>
<p>Scientists propose and test theories to explain how natural things work. Engineers apply theories to design new, useful things. In a way, these activities are inverses of each other. Together, they enable technology: the art of manipulating nature to one’s will.</p>
<p>Having learned the scientific basis for color perception, let’s put on our engineer hats and see what we can create! Video is a technology that convincingly (to human eyes) reproduces the visual stimulus of a detailed scene. This requires the ability to produce a spectacular variety of colors at a moment’s notice. While it would be an impressive feat to faithfully replicate the spectral power distribution from a real scene, our goal in practice is less ambitious: to fool the human eye. To produce convincing scenery, we need only produce the right RGB activations.</p>
<p>TVs and computer screens can be built from tiny subpixels that each emit a dot of one specific color. We can turn each light off, or have it on at any brightness. Since the CIE space projects away the brightness dimension, each type of subpixel is anchored to one point in the space.</p>
<p>White pixels alone suffice to give us black-and-white television: just dim the pixels to produce darker shades of grey. Now suppose we had two types of subpixels: red and green. Adjusting their brightness in different proportions yields a range of hues such as orange and yellow. In color space, we’re capable of representing not just one point, but the entire line segment connecting the two source colors.</p>
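<p>Here is a sketch of the two-subpixel case in chromaticity coordinates. It borrows the red and green primaries of the sRGB standard, (0.64, 0.33) and (0.30, 0.60), and, as a simplifying assumption, treats each subpixel’s brightness as its luminance Y. Because light adds linearly in XYZ, the sketch converts each subpixel to XYZ, sums, and projects back; whatever the brightness ratio, the mixture lands on the segment joining the two primaries.</p>

```python
def xyY_to_XYZ(x: float, y: float, Y: float) -> tuple[float, float, float]:
    """Recover tristimulus values from chromaticity (x, y) and luminance Y."""
    return (x * Y / y, Y, (1 - x - y) * Y / y)


def mix(*lights: tuple[float, float, float]) -> tuple[float, float]:
    """Add lights given as (x, y, Y) and return the mixture's chromaticity."""
    X = Y = Z = 0.0
    for x, y, lum in lights:
        dX, dY, dZ = xyY_to_XYZ(x, y, lum)
        X, Y, Z = X + dX, Y + dY, Z + dZ
    total = X + Y + Z
    return (X / total, Y / total)


RED = (0.64, 0.33)    # sRGB red primary chromaticity
GREEN = (0.30, 0.60)  # sRGB green primary chromaticity

for red_level, green_level in [(1.0, 0.0), (0.7, 0.3), (0.5, 0.5), (0.2, 0.8)]:
    print(mix((*RED, red_level), (*GREEN, green_level)))
```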
<p>In general, the range of colors we can represent will be the convex hull of the source colors. For example, the sRGB standard uses red, green and blue subpixels to produce colors in a triangular region, as shown inside the horseshoe above. If your monitor is sRGB, it’s incapable of displaying colors outside the triangle. The region outside the triangle but inside the horseshoe corresponds to colors that you might find in the wild, but that your screen cannot display. Due to this limitation, the color space image on this webpage is actually drawn in the closest colors that your screen <em>can</em> display.</p>
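<p>Deciding whether a given chromaticity lies inside the sRGB triangle is a standard point-in-triangle test: the point is inside when it lies on the same side of all three edges (or on an edge). The sketch uses the sRGB primaries and, as test points, the D65 white point (0.3127, 0.3290) and the green primary of Rec. 2020 (0.170, 0.797), a purer green than an sRGB screen can show.</p>

```python
Point = tuple[float, float]

SRGB = [(0.64, 0.33), (0.30, 0.60), (0.15, 0.06)]  # red, green, blue primaries


def side(a: Point, b: Point, p: Point) -> float:
    """Signed area test: which side of the line a -> b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def in_gamut(p: Point, triangle: list[Point] = SRGB) -> bool:
    """True if p lies inside (or on an edge of) the triangle of primaries."""
    a, b, c = triangle
    d1, d2, d3 = side(a, b, p), side(b, c, p), side(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


print(in_gamut((0.3127, 0.3290)))  # D65 white point
print(in_gamut((0.170, 0.797)))    # Rec. 2020 green primary
```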
<p>It’s often said that red, green and blue are the <strong>additive</strong> primary colors. Additive refers to the act of mixing colors by emitting, hence adding, different source lights. Colors made from one source light are <em>primary</em>; colors made from an equal mix of exactly two source lights are <em>secondary</em>. Red, green and blue occupy distant corners of the color space, suggesting their suitability as primary colors.</p>
<p>Displaying richer colors than sRGB means growing its region in color space. This can be done in two ways. The first is to make our subpixels purer, i.e., place them closer to the horseshoe boundary, which corresponds to monochromatic light. This is the approach taken by <a href="https://en.wikipedia.org/wiki/Rec._2020">UHDTVs</a>. The alternative is to add more primary colors.</p>
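<p>As a rough way to compare triangles, this sketch computes the area of each standard’s gamut in the CIE 1931 (x, y) plane with the shoelace formula, using the published primaries of sRGB and Rec. 2020. Treat the ratio as a crude proxy only: equal areas in this diagram don’t correspond to equal perceived color differences.</p>

```python
Point = tuple[float, float]

SRGB = [(0.64, 0.33), (0.30, 0.60), (0.15, 0.06)]
REC2020 = [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)]


def polygon_area(vertices: list[Point]) -> float:
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


srgb, rec2020 = polygon_area(SRGB), polygon_area(REC2020)
print(f"sRGB: {srgb:.4f}, Rec. 2020: {rec2020:.4f}, ratio: {rec2020 / srgb:.2f}")
```

<p>By this measure, the Rec. 2020 triangle is a little under twice the size of the sRGB one.</p>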
<p>Using these two approaches, we might hope to capture all the colors. However, since the convex hull of any finite set is a polygon, it cannot possibly fill the rounded edges of the horseshoe. We conclude that this type of technology must necessarily miss some colors.</p>
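<p>A toy calculation makes the point concrete. It stands a unit circle in for the horseshoe’s curved edge (purely an illustrative assumption) and spaces n primaries evenly around it; the inscribed regular polygon has area (n/2)·sin(2π/n), which creeps toward the circle’s area π as n grows but never reaches it.</p>

```python
import math


def inscribed_polygon_area(n: int) -> float:
    """Area of a regular n-gon whose vertices lie on the unit circle."""
    return n / 2 * math.sin(2 * math.pi / n)


for n in (3, 4, 6, 12, 100):
    missing = math.pi - inscribed_polygon_area(n)
    print(f"{n:>3} primaries: {missing / math.pi:.2%} of the circle left uncovered")
```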
<p>If we cheat a bit, we can view the forbidden colors corresponding to pure R, G and B cone activation as the ultimate primary colors. Together, they form a big triangle that fully contains the horseshoe; we know this because every color corresponds to some RGB cone activations. Unfortunately, these “primary colors” are not real colors realizable by any physical process!</p>
<p>In retrospect, we can view our IT class’s RGB Venn diagram as a simplification of additive color mixing. What about the RYB diagram from painting class? That’s <strong>subtractive</strong> color mixing. Unlike TVs, paint and printer inks don’t produce any light of their own. Instead, you shine an ambient light, such as from a bulb or the Sun, on a sheet of paper. The paper is a good reflector at all visible wavelengths, so it appears white. The ink absorbs, hence subtracts, selected wavelengths, preventing their reflection.</p>
<p>At this point, maybe we’re feeling tired and don’t want to dig deeply into the physics of subtractive mixing. That’s ok! We can reason through the simplified, though technically incorrect, model that discretizes light into three varieties: long-wave (red), medium-wave (green) and short-wave (blue). In this simplified model, it follows that the primary subtractive colors correspond to secondary additive colors, and secondary subtractive colors correspond to primary additive colors. This is best explained with pictures:</p>
<p><img src="http://pa1.narvii.com/6166/9d9aaeca1c4d89ab7eebf7ae23dc907cec188d0d_hq.gif" alt="CMY Diagram" /></p>
<p><img src="http://inventorartist.com/wp-content/uploads/2013/05/ReflectionDetail.png" alt="Reflection and Absorption Detail" /></p>
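<p>The simplified three-band model fits in a few lines if we treat light as a set of bands. In this sketch, which inherits all of the model’s simplifications, emitting lights pools their bands together, while each ink removes the bands it absorbs from the white light reflected by the paper, so stacking inks keeps only the bands that no layer absorbs.</p>

```python
# Toy three-band model: light is a set of bands, not a real spectrum.
RED, GREEN, BLUE = "R", "G", "B"
WHITE = frozenset({RED, GREEN, BLUE})

NAMES = {
    frozenset(): "black",
    frozenset({RED}): "red",
    frozenset({GREEN}): "green",
    frozenset({BLUE}): "blue",
    frozenset({GREEN, BLUE}): "cyan",
    frozenset({RED, BLUE}): "magenta",
    frozenset({RED, GREEN}): "yellow",
    WHITE: "white",
}

# Each ink is described by the bands it absorbs from white light.
INKS = {"cyan": {RED}, "magenta": {GREEN}, "yellow": {BLUE}}


def add_lights(*lights: frozenset) -> str:
    """Additive mixing: the eye receives every band that any source emits."""
    return NAMES[frozenset().union(*lights)]


def stack_inks(*inks: str) -> str:
    """Subtractive mixing: white paper reflects whatever no ink absorbs."""
    reflected = set(WHITE)
    for ink in inks:
        reflected -= INKS[ink]
    return NAMES[frozenset(reflected)]


print(add_lights(frozenset({RED}), frozenset({GREEN})))  # yellow
print(stack_inks("cyan", "yellow"))                       # green
print(stack_inks("cyan", "magenta", "yellow"))            # black
```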
<p>Indeed, most color inkjet printers use three primary inks (cyan, magenta and yellow) plus a black ink, hence CMYK. The K stands for “key”, the traditional printing name for the black plate. Black ink is preferable to mixing CMY inks because it’s cheaper and typically yields a deeper, sharper black. Cyan and magenta are sometimes called “process blue” and “process red”, respectively. Thus, you can think of CMY as a modern improvement over the traditional RYB primaries.</p>
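<p>To see where the K channel fits, here is the naive textbook conversion from RGB to CMYK under the same idealized model: pull out as much black as possible, then express what remains with cyan, magenta and yellow. Real print workflows rely on color profiles tuned to specific inks and paper instead, so treat this as an illustration rather than what a printer driver does.</p>

```python
def rgb_to_cmyk(r: float, g: float, b: float) -> tuple[float, float, float, float]:
    """Naive conversion from RGB in [0, 1] to CMYK fractions in [0, 1]."""
    k = 1 - max(r, g, b)          # black ink covers the darkness shared by all bands
    if k == 1:                    # pure black: print with K alone
        return (0.0, 0.0, 0.0, 1.0)
    c = (1 - r - k) / (1 - k)     # cyan absorbs red
    m = (1 - g - k) / (1 - k)     # magenta absorbs green
    y = (1 - b - k) / (1 - k)     # yellow absorbs blue
    return (c, m, y, k)


print(rgb_to_cmyk(1.0, 0.0, 0.0))  # red: magenta + yellow
print(rgb_to_cmyk(0.5, 0.5, 0.5))  # grey: black ink only
```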
<h1 id="conclusion">Conclusion</h1>
<p>Phew! We covered a lot, but all of our conclusions were logical consequences of the fact that we have three types of light-sensitive cells following Grassmann’s law. We might try to memorize every detail of additive and subtractive color mixing, the horseshoes, the triangles and so on. We might see them as random, interesting, messy facts about the world.</p>
<p>Instead, we took a different approach. We explored how a variety of ideas from physics, biology, geometry, and painting could come together and tell a story. We thought critically, asking questions and testing potential answers, some of which stood in apparent contradiction to one another. Through a theoretical exercise, we discovered the hidden beauty of colors.</p>
<p>I hope you enjoyed this post! There’s a lot more to human color perception than presented here. If you want to learn more, a good place to start is the <a href="http://hyperphysics.phy-astr.gsu.edu/hbase/vision/colpuz.html">HyperPhysics Color Puzzles</a>.</p>
<p>If you’d like to read someone else’s explanation of the same topic, check out:</p>
<p><a href="http://inventorartist.com/primary-colors">Hey Kids! Red’s Not a Primary!</a></p>
<p><a href="https://medium.com/hipster-color-science/a-beginners-guide-to-colorimetry-401f1830b65a">A Beginner’s Guide to Colorimetry</a></p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Note, however, that grey is effectively a darker white, and brown is a darker orange. Our perception makes relative comparisons that take context, light sources and shadows into account. Optical illusions take advantage of this. By necessity, this exposition contains simplifications. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Announcing the Algorithm Cookbook2017-06-19T00:00:00+00:002017-06-19T00:00:00+00:00http://ebtech.github.io/blog/2017/06/19/announcing-the-algorithm-cookbook<p>I built a reference cookbook of algorithms and data structures for contest problem solvers. It’s written in the Rust programming language, as I believe it’s ideally suited to the task. For more info, please check out the repository at <a href="https://github.com/EbTech/rust-algorithms">github.com/EbTech/rust-algorithms</a>.</p>
<p><a href="http://codeforces.com"><img src="http://stat.codeforces.ru/images/codeforces-logo-with-upper-beta.png" alt="Codeforces" /></a> <a href="https://www.rust-lang.org"><img src="https://www.rust-lang.org/logos/rust-logo-128x128.png" alt="Rust" /></a></p>
<p>While I believe Rust is not too difficult in absolute terms, it does present a significant departure from most developers’ mental models. If you’d like to practice the language on small toy problems, contests can serve as a useful playground. Unfortunately, it’s hard to get started when there are still so few examples of Rust contest code out in the wild, and no established guidelines for reconciling Rust’s compile-time discipline with the constraints of contest programming. This project seeks to remedy the situation.</p>
<p>Note that it’s not meant to act as a full-fledged general-purpose library. Contest problems often require understanding an algorithm so well that you can dig in and make subtle modifications that adapt it to a brand-new problem. Therefore, in this setting, one ought not to rely on blackbox implementations. Instead, I try to distill each algorithm into its simplest possible form, so that you can quickly read over the code, understand it, and augment it to suit your needs.</p>
<p><a href="https://www.rust-lang.org">Rust</a> and <a href="http://codeforces.com">Codeforces</a> represent two of my favorite technology communities, so I’m interested to see how they can support each other. If you’re a Rust programmer interested in honing your technical interview skills or solving cool algorithmic puzzles, you might enjoy Codeforces. If you’re a Codeforces member and find that debugging is a huge time drain, Rust’s emphasis on safety may give you a competitive edge. In either case, I hope this reference will help ease the learning curve. I’m still learning too; suggestions are welcome!</p>Out of Hiding2017-06-18T00:00:00+00:002017-06-18T00:00:00+00:00http://ebtech.github.io/blog/2017/06/18/out-of-hiding<p>Woo finally, site launch! <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> Pleased to meet y’all fancy people :)</p>
<p>After more than a year spent learning in Silicon Valley, masquerading as some non-crazy dude, I’m ready to take my lessons out into the world. This site, opening today, is my portal. Welcome! Your encounter with this page was preordained <a href="https://www.youtube.com/watch?v=Drp6PqvKOwM">by the choice of Steins;Gate. HAHAHAHAHAHA!</a></p>
<p><img src="http://pa1.narvii.com/5938/4eb25b6a39e9d5fe2e111b7fc3c4dcbed48177b7_hq.gif" alt="Hououin Kyouma, the insane mad scientist" /></p>
<p>Er, why was that necessary? What are you, an agent of CERN? Fine, here’s my story:</p>
<p>5 years ago, I entered grad school with no clue what I was trying to do or why. 3.5 years, several great friendships and life lessons later, I dropped out, still having hardly a clue. So I did the modern equivalent of picking up some books, and dedicated myself to studies like never before. Free of any structured requirements, inspired in equal parts by sheer curiosity and by the very real challenges faced at work, I threw my heart into it. Turns out the web is chock-full of quality free resources! It didn’t cost a dime to learn about:</p>
<ul>
<li><a href="http://www.inference.org.uk/mackay/itila/book.html">Information theory</a></li>
<li><a href="http://stp.clarku.edu/notes">Statistical mechanics</a></li>
<li><a href="https://ocw.mit.edu/courses/sloan-school-of-management/15-401-finance-theory-i-fall-2008/video-lectures-and-slides">Finance and economics</a></li>
<li><a href="https://ocw.mit.edu/courses/aeronautics-and-astronautics/16-323-principles-of-optimal-control-spring-2008/lecture-notes/lec5.pdf">Calculus of variations</a></li>
<li><a href="http://www.math.nyu.edu/faculty/goodman/teaching/StochCalc2013/resources.html">Stochastic calculus</a></li>
<li><a href="https://doc.rust-lang.org/book/second-edition">Rust programming</a></li>
<li><a href="https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-851-advanced-data-structures-spring-2012/lecture-videos">Advanced data structures</a></li>
<li><a href="http://www.deeplearningbook.org">Deep learning</a></li>
<li><a href="http://incompleteideas.net/sutton/book/the-book-2nd.html">Reinforcement learning</a></li>
<li><a href="http://rll.berkeley.edu/deeprlcourse">Deep reinforcement learning</a></li>
</ul>
<p>But of course, it’s not enough to just read. As a former theoretician, it’s been an interesting struggle to learn my way around a living, whirring computer at work and beyond. Even the toolchain needed to generate this site was a level up. And, though I skip over it here, I’ve also been busy picking up a bunch of non-technical skills.</p>
<p>As a result… uh I still don’t know what the goal is :/ OK there is something. Indeed, the <a href="https://medium.com/waymo/apply-to-be-part-of-waymos-early-rider-program-5fd996c7a86f">first stages of disruption</a> are well under way! I’m all-in on this movement: human driving on public roads is both wasteful and dangerous. For a minority, it’s not even an option. So I’d like to play a small role here. Besides, I kinda left Canada while my driver’s license was still in the mail…</p>
<p>What’s next? Well, I’d like to share what I’ve learned in collaboration with you awesome people. You’ll have to look to official channels for the Waymo stuff, but my roommate <a href="http://shriphani.com">Shriphani</a> and I have some fun things planned of our own <sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, so please stay tuned!</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Helped in part by <a href="http://sanfrancisco.cbslocal.com/2017/06/19/bay-area-heat-wave-records-june-18-san-francisco-oakland-san-jose">today’s 40℃ heat wave</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Collectively, we are DeepFear :D You can already check out <a href="http://blog.shriphani.com/2016/08/03/a-frame-that-listens">our first collaboration</a>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>