2 Belief as Probability
2.2 Probability theory
2.3 Some rules of probability
2.4 Conditional probability
2.5 Some more rules of probability
2.1 Subjective and objective probability
Beliefs vary in strength. I believe that the 37 bus goes to Waverley station, and that there are busses from Waverley to the airport, but the second belief is stronger than the first. With some idealization, we can imagine that for any propositions \(A\) and \(B\), a rational agent is either more confident in \(A\) than in \(B\), more confident in \(B\) than in \(A\), or equally confident in both. The agent’s belief state then effectively sorts the propositions from ‘least confident’ to ‘most confident’, and we can represent a proposition’s place in the ordering by a number between 0 (‘least confident’) and 1 (‘most confident’). This number is the agent’s degree of belief, or credence, in the proposition. My credence that the 37 bus goes to Waverley, for example, might be around 0.8, while my credence that there are busses from Waverley to the airport is around 0.95.
The core assumption that unifies “Bayesian” approaches to epistemology, statistics, decision theory, and other areas is that rational degrees of belief obey the formal rules of the probability calculus. For that reason, degrees of belief are also called subjective probabilities or even just probabilities. But this terminology can give rise to confusion because the word ‘probability’ has other, and more prominent, uses.
Textbooks in science and statistics often define probability as relative frequency. On this usage, the probability of an outcome is the proportion of that type of outcome in some base class of events. For example, on the textbook definition, to say that the probability of getting a six when throwing a regular die is \(\nicefrac {1}{6}\) is to say that the proportion of sixes in a large class of throws is (or converges to) \(\nicefrac {1}{6}\).
Another use of ‘probability’ is related to determinism. Consider a particular die in mid-roll. Could one, in principle, figure out how the die will land, given full information about its present physical state, the surrounding air, the surface on which it rolls, and so on? If yes, there’s a sense in which the outcome is not a matter of probability. Quantum physics seems to suggest that the answer is no: that the laws of nature together with the present state of the world only fix a certain probability for future events. This kind of probability is sometimes called ‘chance’.
Chance and relative frequency are examples of objective probability. Unlike degrees of belief, they are not relative to an agent; they don’t vary between you and me. You and I may have different opinions about chances or relative frequencies; but that would be an ordinary disagreement. At least one of us would be wrong. By contrast, if you are more confident that the die will land six than me, then your subjective probability for that outcome really is greater than mine.
In this course, when I talk about credence or subjective probability, I do not mean belief about objective probability. I simply mean degree of belief. Our Bayesian model here diverges from frequentist or objectivist models that define expected utility in terms of objective probability. On those models, the MEU Principle is restricted to cases in which the agent knows the relevant objective probabilities. (I mentioned this under “Sources and Further Reading” in the previous chapter.) On the Bayesian conception of probability, the MEU Principle does not presuppose knowledge of probabilities; it only presupposes that the agent has a definite degree of belief in the relevant states.
2.2 Probability theory
What all forms of probability, objective and subjective, have in common is a certain abstract structure, a structure that is studied by the mathematical discipline of probability theory.
Mathematically, a probability measure is a certain kind of function – in the mathematical sense: a mapping – from some objects to real numbers. The objects that are mapped to numbers are usually called ‘events’, but in philosophy we call them ‘propositions’.
The main assumption probability theory makes about propositions (the objects that are assigned probabilities) is the following.
Booleanism
Whenever some proposition \(A\) has a probability (possibly 0), then so does its negation \(\neg A\) (‘not \(A\)’); whenever two propositions \(A\) and \(B\) both have a probability, then so does their conjunction \(A \land B\) (‘\(A\) and \(B\)’) and their disjunction \(A \lor B\) (‘\(A\) or \(B\)’).
(Here and henceforth, I use upper-case letters \(A,B,C\), etc. as schematic variables for arbitrary propositions.)
In our application, Booleanism implies that if an agent has a definite degree of belief in some propositions, then she also has a definite degree of belief in any proposition that can be constructed from these in terms of negation, conjunction, and disjunction.
What sorts of things are propositions? Probability theory doesn’t say. In line with our discussion in the previous chapter, we will informally understand propositions as possible states of the world. This is not a formal definition, since I haven’t defined ‘possible state of the world’. But I’ll make a few remarks that should help clarify what I have in mind.
Different sentences can represent the very same state of the world. Consider the current temperature in Edinburgh. I don’t know what it is. One possibility (one possible state of the world) is that it is 10°C. There is also a possibility that it is 50°F. How are these related? Since 10°C = 50°F, the second possibility is not an alternative to the first. It is the very same possibility, expressed with a different unit. The sentences ‘It is 10°C in Edinburgh’ and ‘It is 50°F in Edinburgh’ are different ways of picking out the same (possible) state of the world.
Like sentences, possible states of the world can be negated, conjoined, and disjoined. The negation of the possibility that it is 10°C is the possibility that it is not 10°C. If we negate that negated state, we get back the original state: the possibility that it is not not 10°C is nothing but the possibility that it is 10°C. In general, if we understand propositions as possible states of the world, then logically equivalent propositions are not just equivalent, but identical.
Possible states of the world can be more or less specific. That the temperature is 10°C is more specific than that it is between 7°C and 12°C. It is often useful to think of unspecific states as sets of more specific states. We can think of the possibility that it is between 7°C and 12°C as a collection of several possibilities, perhaps as the set { 7°C, 8°C, 9°C, 10°C, 11°C, 12°C }. The unspecific possibility obtains just in case one of the more specific possibilities obtains. A maximally specific state is called a ‘possible world’ in philosophy, and an ‘outcome’ in many other disciplines. We will sometimes model propositions as sets of possible worlds.
I should warn that the word ‘proposition’ has many uses in philosophy. In this course, all we mean by ‘proposition’ is ‘object of credence’. And ‘credence’, recall, is a semi-technical term for a certain quantity in the model we are building. It is pointless to argue over the nature of propositions before we have spelled out the model in more detail. Also, by ‘possible world’ I just mean ‘maximally specific proposition’. The identification of propositions with sets of possible worlds is not supposed to be an informative reduction.
Exercise 2.1 \(\dagger \)
First a reminder of some terminology from set theory. The intersection of two sets \(A\) and \(B\) is the set of objects that are in both \(A\) and \(B\). The union of \(A\) and \(B\) is the set of objects that are in one or both of \(A\) and \(B\). The complement of a set \(A\) is the set of objects that are not in \(A\). \(A\) is a subset of \(B\) if all objects in \(A\) are also in \(B\). \(A\) is a superset of \(B\) if all objects in \(B\) are also in \(A\).
Now, assume propositions are modelled as sets of possible worlds. Then the negation \(\neg A\) of a proposition \(A\) is the complement of \(A\).
- (a)
- What is the conjunction \(A \land B\) of two propositions, in set theory terms?
- (b)
- What is the disjunction \(A \lor B\)?
- (c)
- What, in set theory terms, does it mean that a proposition \(A\) entails a proposition \(B\)?
Exercise 2.2 \(\dagger \)\(\dagger \)
Not all objects of probability are possible states of the world. Booleanism entails that at least one object of probability is impossible. Can you explain why?
Let’s continue with the mathematics of probability. A probability measure, I said, is a function from propositions to numbers that satisfies certain conditions. These conditions are called probability axioms or Kolmogorov axioms, because their canonical statement was given by the Russian mathematician Andrej Kolmogorov in 1933.
The Kolmogorov Axioms
- (i)
- For any proposition \(A\), \(0 \leq \Cr (A) \leq 1\).
- (ii)
- If \(A\) is logically necessary, then \(\Cr (A) = 1\).
- (iii)
- If \(A\) and \(B\) are logically incompatible, then \(\Cr (A \lor B) = \Cr (A) + \Cr (B)\).
I have used ‘\(\Cr \)’ here for the probability measure, as we will be mostly interested in subjective probability or credence. ‘\(\Cr (A)\)’ is read as ‘the (subjective) probability of \(A\)’ or ‘the credence in \(A\)’. Strictly speaking, we should add subscripts, ‘\(\Cr _{i,t}(A)\)’, to make clear that subjective probability is relative to an agent \(i\) and a time \(t\); but we’re mostly dealing with statements that hold for all agents at all times, so we can omit the subscripts.
Understood as a condition on rational credence, axiom (i) says that credences range from 0 to 1: you can’t have a degree of belief greater than 1 or less than 0. Axiom (ii) says that if a proposition is logically necessary – like it is raining or it is not raining – then it must have subjective probability 1. Axiom (iii) says that the subjective probability of a disjunction equals the sum of the probability of the two disjuncts, provided these are logically incompatible, meaning they can’t be true at the same time. For example, since it can’t be both 8°C and 12°C, your credence in the disjunctive proposition \(8\celsius \lor 12\celsius \) must be \(\Cr (8\celsius ) + \Cr (12\celsius )\).
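To make the three axioms concrete, here is a minimal sketch in Python. It assumes a toy model that is not in the text: there are only three possible worlds (the labels `8C`, `10C`, `12C` and their weights are hypothetical), propositions are sets of worlds, and the credence in a proposition is the sum of the weights of its worlds. In such a model the only logically necessary proposition is the set of all worlds, and two propositions are logically incompatible just in case they share no world.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical toy model: three maximally specific temperature states.
worlds = ("8C", "10C", "12C")
weight = {"8C": Fraction(3, 10), "10C": Fraction(1, 2), "12C": Fraction(1, 5)}

def cr(proposition):
    """Credence in a proposition, modelled as a set of worlds."""
    return sum((weight[w] for w in proposition), Fraction(0))

# Every proposition of the toy model: every subset of the set of worlds.
propositions = [
    frozenset(c)
    for c in chain.from_iterable(
        combinations(worlds, n) for n in range(len(worlds) + 1)
    )
]

# Axiom (i): all credences lie between 0 and 1.
axiom_i = all(0 <= cr(a) <= 1 for a in propositions)
# Axiom (ii): the necessary proposition (true at every world) gets 1.
axiom_ii = cr(frozenset(worlds)) == 1
# Axiom (iii): credences add up over incompatible (disjoint) propositions.
axiom_iii = all(
    cr(a | b) == cr(a) + cr(b)
    for a in propositions
    for b in propositions
    if not (a & b)
)

print(axiom_i, axiom_ii, axiom_iii)  # True True True
```

The checks succeed by construction, because the weights are non-negative and sum to 1. The sketch only illustrates what each axiom demands; it is not an argument that rational credences must satisfy them.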
We’ll ask about the justification for these assumptions later. First, let’s derive a few consequences.
2.3 Some rules of probability
Suppose your credence in the hypothesis that it is 8°C is 0.3. Then what should be your credence in the hypothesis that it is not 8°C? Answer: 0.7. In general, the probability of \(\neg A\) is always 1 minus the probability of \(A\):
The Negation Rule
\(\Cr (\neg A) = 1 - \Cr (A)\).
This follows from the Kolmogorov axioms. Here is how. Let \(A\) be any proposition. Then \(A \lor \neg A\) is logically necessary. By axiom (ii), \[ \Cr (A \lor \neg A) = 1. \] Since \(A\) and \(\neg A\) are logically incompatible, axiom (iii) tells us that \[ \Cr (A \lor \neg A) = \Cr (A) + \Cr (\neg A). \] Combining these two equations yields \[ 1 = \Cr (A) + \Cr (\neg A). \] From that, simple algebraic rearrangement gives us the Negation Rule.
Next, we can prove that logically equivalent propositions always have the same probability.
The Equivalence Rule
If \(A\) and \(B\) are logically equivalent, then \(\Cr (A) = \Cr (B)\).
Proof: Assume \(A\) and \(B\) are logically equivalent. Then \(A \lor \neg B\) is logically necessary; so by axiom (ii), \[ \Cr (A \lor \neg B) = 1. \] Moreover, \(A\) and \(\neg B\) are logically incompatible, so by axiom (iii), \[ \Cr (A \lor \neg B) = \Cr (A) + \Cr (\neg B). \] By the Negation Rule, \[ \Cr (\neg B) = 1-\Cr (B). \] Putting all this together, we have \[ 1 = \Cr (A) + 1 - \Cr (B). \] Subtracting \(1-\Cr (B)\) from both sides yields \(\Cr (A) = \Cr (B)\).
Above I mentioned that if we understand propositions as possible states of the world, then logically equivalent propositions are identical: \(\neg \neg A\), for example, is the same proposition as \(A\). The Equivalence Rule shows that even if we had used a different conception of propositions that allows distinguishing between logically equivalent propositions, these differences would never matter to an agent’s subjective probabilities. If an agent’s credences satisfy the Kolmogorov axioms, then she must give the same credence to logically equivalent propositions.
Exercise 2.3 \(\dagger \)\(\dagger \)\(\dagger \)
Prove from Kolmogorov’s axioms that \(\Cr (A) = \Cr (A\land B) + \Cr (A \land \neg B)\). (Like the proofs above, each step of your proof should either be an instance of an axiom, or an application of the rules we have already established, or it should follow from earlier steps by simple logic and algebra.)
Next, let’s show that axiom (iii) generalizes to three disjuncts:
Additivity for three propositions
If \(A\), \(B\), and \(C\) are all incompatible with one another, then \(\Cr (A \lor B \lor C) = \Cr (A) + \Cr (B) + \Cr (C)\).
Proof sketch: \(A \lor B \lor C\) is equivalent (or identical) to \((A \lor B) \lor C\). If \(A\), \(B\), and \(C\) are mutually incompatible, then \(A \lor B\) is incompatible with \(C\). So by axiom (iii), \(\Cr ((A \lor B) \lor C) = \Cr (A \lor B) + \Cr (C)\). Again by axiom (iii), \(\Cr (A \lor B) = \Cr (A) + \Cr (B)\). Putting these together, we have \(\Cr ((A \lor B) \lor C) = \Cr (A) + \Cr (B) + \Cr (C)\).
The argument generalizes to any finite number of propositions \(A,B,C,D,\ldots \): the probability of a disjunction of \(n\) mutually incompatible propositions is the sum of the probability of the \(n\) propositions. This has the following consequence, which is worth remembering:
Probabilities from worlds
If the number of possible worlds is finite, then the probability of any proposition is the sum of the probability of the worlds at which the proposition is true.
Suppose two dice are tossed. There are 36 possible outcomes (“possible worlds”), which we might tabulate as follows.
| (1,1) | (1,2) | (1,3) | (1,4) | (1,5) | (1,6) |
| (2,1) | (2,2) | (2,3) | (2,4) | (2,5) | (2,6) |
| (3,1) | (3,2) | (3,3) | (3,4) | (3,5) | (3,6) |
| (4,1) | (4,2) | (4,3) | (4,4) | (4,5) | (4,6) |
| (5,1) | (5,2) | (5,3) | (5,4) | (5,5) | (5,6) |
| (6,1) | (6,2) | (6,3) | (6,4) | (6,5) | (6,6) |
Suppose you give equal credence \(\nicefrac {1}{36}\) to each of these outcomes or worlds. What credence should you then give to the hypothesis that both dice land on a number less than 4? Looking at the table, we can see that there are nine possible worlds at which the hypothesis is true: the top left quarter of the table. The hypothesis is equivalent to the disjunction of these possible worlds. Both dice land on a number less than 4 iff the outcome is (1,1) or (1,2) or (1,3) or (2,1) or (2,2) or (2,3) or (3,1) or (3,2) or (3,3). All of these outcomes are incompatible with one another. (The dice can’t land (1,1) and (1,2) at the same time.) The rules of probability therefore tell us that the probability of our target hypothesis is the sum of the probability of the individual worlds. Since each world has probability \(\nicefrac {1}{36}\), and there are nine relevant worlds, your credence that both dice land on a number less than 4 should be \(9 \cdot \nicefrac {1}{36} = \nicefrac {1}{4}\).
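The same calculation can be sketched in a few lines of Python. The names `worlds`, `credence_of_world`, `cr`, and `both_below_four` are illustrative choices, not part of the text; propositions are modelled as predicates on worlds, and `cr` simply sums the credence of the worlds at which the predicate is true.

```python
from fractions import Fraction
from itertools import product

# Possible worlds: ordered pairs (first die, second die).
worlds = list(product(range(1, 7), repeat=2))

# Assumption from the text: equal credence 1/36 in every world.
credence_of_world = {w: Fraction(1, len(worlds)) for w in worlds}

def cr(proposition):
    """Sum the credence of the worlds at which the proposition is true."""
    return sum(
        (p for w, p in credence_of_world.items() if proposition(w)),
        Fraction(0),
    )

def both_below_four(world):
    first, second = world
    return first < 4 and second < 4

print(cr(both_below_four))  # 1/4
```

Using exact fractions rather than floating-point numbers keeps the result identical to the hand calculation.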
Exercise 2.4 \(\dagger \)
What credence should you give to the following propositions, in the scenario with the two dice?
- (a)
- At least one die lands 6.
- (b)
- Exactly one die lands 6.
- (c)
- The sum of the numbers that will come up is equal to 5.
Some thorny technical problems arise if there are infinitely many worlds. It would be nice if we could say that the probability of a proposition is always the sum of the probability of the worlds that make up the proposition. If there are too many worlds, however, this turns out to be incompatible with the mathematical structure of the real numbers. The most one can safely assume is that the principle holds if the number of worlds is countable, meaning that there are no more worlds than there are natural numbers 1,2,3,…. To secure this, axiom (iii) – which is known as the axiom of Finite Additivity – has to be replaced by an axiom of Countable Additivity. In this course, we will try to stay away from troubles arising from infinities, so for our purposes the weaker axiom (iii) will be enough.
Exercise 2.5 \(\dagger \)\(\dagger \)\(\dagger \)
Prove from Kolmogorov’s axioms that if \(A\) entails \(B\), then \(\Cr (A)\) cannot be greater than \(\Cr (B)\). (You may use the rules we have already derived.)
2.4 Conditional probability
To continue, we need two more concepts. The first is the idea of conditional probability or, more specifically, conditional credence. Intuitively, an agent’s conditional credence reflects her degree of belief in a given proposition on the supposition that some other proposition is true. For example, I am fairly confident that it won’t snow tomorrow, and that the temperature will be above 4°C. Yet, on the supposition that it will snow, I am not at all confident that the temperature will be above 4°C. My unconditional credence in temperatures above 4°C is high, but my conditional credence in the same proposition, on the supposition that it will snow, is low.
Conditional credence relates two propositions: the proposition that is supposed, and the proposition that gets evaluated on the basis of that supposition.
To complicate things, there are actually two kinds of supposition, and two kinds of conditional credence. The two kinds of supposition correspond to a grammatical distinction between “indicative” and “subjunctive” conditionals. Compare the following statements.
- (1) If Shakespeare didn’t write Hamlet, then someone else did.
- (2) If Shakespeare hadn’t written Hamlet, then someone else would have.
The first of these (an indicative conditional) is highly plausible: we know that someone wrote Hamlet; if it wasn’t Shakespeare then it must have been someone else. The second statement (a subjunctive conditional) is plausibly false: if Shakespeare hadn’t written Hamlet, it is unlikely that somebody else would have stepped in to write the very same play.
The two conditionals (1) and (2) relate the same two propositions – the same possible states of the world. To evaluate either statement, we suppose that our world is a world in which Shakespeare didn’t write Hamlet. The difference lies in what we hold fixed when we make that supposition. To evaluate (1), we hold fixed our knowledge that Hamlet (the play) exists. Not so in (2). To evaluate (2), we bracket everything we know that we take to be a causal consequence of Shakespeare’s writing of Hamlet.
We will return to the second, subjunctive kind of supposition in section 9. For now, let’s focus on the first, indicative kind of supposition. I will write \(\Cr (A/B)\) for the (indicative) conditional credence in \(A\) on the supposition that \(B\). Again, intuitively this is the agent’s credence that \(A\) is true if (or given that or supposing that) \(B\) is true.
The slash ‘/’ (some authors use ‘|’) is not a connective. \(\Cr (A/B)\) is not the agent’s credence in a special proposition designated by ‘\(A/B\)’. (Never write things like ‘\(\Cr (A/B/C)\)’ or ‘\(\Cr (A \land (B/C))\)’. These have no meaning.)
How are conditional credences related to unconditional credences? The answer is surprisingly simple, and captured by the following formula.
The Ratio Formula
\(\Cr (A/B) = \dfrac {\Cr (A \land B)}{\Cr (B)}\), provided \(\Cr (B)>0\).
That is, your credence in some proposition \(A\) on the (indicative) supposition \(B\) equals your unconditional credence in \(A \land B\) divided by your unconditional credence in \(B\).
To see why this makes sense, it may help to imagine your credence as distributing a certain quantity of “plausibility mass” over the space of possible worlds. When we ask about your credence in \(A\) conditional on \(B\), we set aside worlds where \(B\) is false. What we want to know is how much of the mass given to \(B\) worlds falls on \(A\) worlds. In other words, we want to know what fraction of the mass given to \(B\) worlds is given to \(A \land B\) worlds.
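The “plausibility mass” picture translates directly into code. The following sketch assumes the two-dice scenario from section 2.3 with equal mass on all 36 worlds; the function names are hypothetical. `cr_given` sets aside the worlds where \(B\) is false by dividing the mass on \(A \land B\) worlds by the mass on \(B\) worlds.

```python
from fractions import Fraction
from itertools import product

# Two dice, equal plausibility mass on each of the 36 worlds (assumption).
worlds = list(product(range(1, 7), repeat=2))
mass = {w: Fraction(1, 36) for w in worlds}

def cr(prop):
    """Unconditional credence: total mass on worlds where prop is true."""
    return sum((m for w, m in mass.items() if prop(w)), Fraction(0))

def cr_given(a, b):
    """Cr(A/B) by the Ratio Formula; undefined when Cr(B) = 0."""
    cr_b = cr(b)
    if cr_b == 0:
        raise ValueError("the Ratio Formula requires Cr(B) > 0")
    return cr(lambda w: a(w) and b(w)) / cr_b

def both_below_four(w):
    return w[0] < 4 and w[1] < 4

def first_below_four(w):
    return w[0] < 4

print(cr(both_below_four))                          # 1/4
print(cr_given(both_below_four, first_below_four))  # 1/2
```

On the supposition that the first die lands below 4, half of the remaining mass falls on worlds where the second die does too, so the conditional credence is higher than the unconditional one.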
People disagree on the status of the Ratio Formula. Some treat it as a definition. On that approach, you can ignore everything I said about what it means to suppose a proposition and simply read ‘\(\Cr (B/A)\)’ as shorthand for ‘\(\Cr (A \land B)/\Cr (A)\)’. Others regard conditional beliefs as distinct and genuine mental states and see the Ratio Formula as a fourth axiom of probability. We don’t have to adjudicate between these views. What matters is that the Ratio Formula is true, and on this point both sides agree.
The second concept I want to introduce is that of probabilistic independence. We say that propositions \(A\) and \(B\) are (probabilistically) independent (for the relevant agent at the relevant time) iff \(\Cr (A/B) = \Cr (A)\). Intuitively, if \(A\) and \(B\) are independent, then it makes no difference to your credence in \(A\) whether or not you suppose \(B\), so your unconditional credence in \(A\) is equal to your credence in \(A\) conditional on \(B\).
Unlike causal independence, probabilistic independence is a feature of beliefs. Two propositions can be independent for one agent and not for another. That said, there are interesting connections between probabilistic (in)dependence and causal (in)dependence. For example, if an agent knows that two events are causally independent, then the events are often also independent in the agent’s degrees of belief. You may want to ponder why that is the case.
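The definition of independence can be checked mechanically for a given credence function. This sketch again assumes the two-dice scenario with equal credence in all 36 worlds; the helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Two dice, equal credence in each of the 36 worlds (assumption).
worlds = list(product(range(1, 7), repeat=2))
mass = {w: Fraction(1, 36) for w in worlds}

def cr(prop):
    return sum((m for w, m in mass.items() if prop(w)), Fraction(0))

def cr_given(a, b):
    """Cr(A/B) by the Ratio Formula; assumes Cr(B) > 0."""
    return cr(lambda w: a(w) and b(w)) / cr(b)

def independent(a, b):
    """A is independent of B iff Cr(A/B) = Cr(A)."""
    return cr_given(a, b) == cr(a)

def first_is_six(w):
    return w[0] == 6

def second_is_six(w):
    return w[1] == 6

def sum_at_least_ten(w):
    return w[0] + w[1] >= 10

print(independent(first_is_six, second_is_six))     # True
print(independent(first_is_six, sum_at_least_ten))  # False
```

Supposing that the second die lands six leaves the credence in a six on the first die at \(\nicefrac {1}{6}\), whereas supposing that the sum is at least 10 raises it to \(\nicefrac {1}{2}\). The verdicts hold for this particular credence function; an agent with different credences could disagree.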
Exercise 2.6 \(\dagger \)
Assume \(\Cr (\emph {Snow}) = 0.3\), \(\Cr (\emph {Wind}) = 0.6\), and \(\Cr (\emph {Snow} \land \emph {Wind}) = 0.2\). What is \(\Cr (\emph {Snow}/\emph {Wind})\)? What is \(\Cr (\emph {Wind}/\emph {Snow})\)?
Exercise 2.7 \(\dagger \)\(\dagger \)
Using the Ratio Formula and the Equivalence Rule, show that if \(A\) is (probabilistically) independent of \(B\), then \(B\) is independent of \(A\) (assuming that \(\Cr (A)\) and \(\Cr (B)\) are greater than 0).
Exercise 2.8 \(\dagger \)\(\dagger \)
A fair die will be tossed, and you give equal credence to all six outcomes. Let \(\emph {Ex}\) be the proposition that the die lands 1 or 6. Let \(\emph {Odd}\) be the proposition that the die lands an odd number (1, 3, or 5), and let \(\emph {Low}\) be the proposition that the die lands 1, 2 or 3. Which of the following are true, in your belief state?
- (a)
- \(\emph {Ex}\) is independent of \(\emph {Odd}\).
- (b)
- \(\emph {Odd}\) is independent of \(\emph {Ex}\).
- (c)
- \(\emph {Ex}\) is independent of \(\emph {Low}\).
- (d)
- \(\emph {Odd}\) is independent of \(\emph {Low}\).
- (e)
- \(\emph {Ex}\) is independent of \(\emph {Odd} \land \emph {Low}\).
2.5 Some more rules of probability
If you’ve studied propositional logic, you’ll know how to compute the truth-value of arbitrarily complex sentences from the truth-value of their atomic parts. For example, you can figure out that if \(A\) and \(B\) are true and \(C\) is false, then \(A \land \neg (B \lor \neg (C \lor A))\) is false. Now suppose instead of the truth-value of \(A\), \(B\), and \(C\), I give you their probability. Could you compute the probability of \(A \land \neg (B \lor \neg (C \lor A))\)? The answer is no. In general, while the probability of \(\neg A\) is determined by the probability of \(A\) (as we know from the Negation Rule), neither the probability of \(A\lor B\) nor the probability of \(A \land B\) is determined by the individual probabilities of \(A\) and \(B\).
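A small counterexample makes the point vivid. The sketch below defines two hypothetical credence functions, `uniform` and `correlated`, over the four combinations of truth-values for \(A\) and \(B\). Both give \(A\) and \(B\) credence \(\nicefrac {1}{2}\), yet they disagree about the conjunction and the disjunction.

```python
from fractions import Fraction

# Worlds are pairs (truth-value of A, truth-value of B).
# Both credence functions are made-up illustrations.
uniform = {
    (True, True): Fraction(1, 4), (True, False): Fraction(1, 4),
    (False, True): Fraction(1, 4), (False, False): Fraction(1, 4),
}
correlated = {
    (True, True): Fraction(1, 2), (True, False): Fraction(0),
    (False, True): Fraction(0), (False, False): Fraction(1, 2),
}

def cr(mass, prop):
    """Credence in prop under the given assignment of mass to worlds."""
    return sum((m for w, m in mass.items() if prop(*w)), Fraction(0))

for name, mass in [("uniform", uniform), ("correlated", correlated)]:
    print(
        name,
        cr(mass, lambda a, b: a),        # Cr(A)
        cr(mass, lambda a, b: b),        # Cr(B)
        cr(mass, lambda a, b: a and b),  # Cr(A and B)
        cr(mass, lambda a, b: a or b),   # Cr(A or B)
    )
# uniform 1/2 1/2 1/4 3/4
# correlated 1/2 1/2 1/2 1/2
```

Since the same values for \(\Cr (A)\) and \(\Cr (B)\) are compatible with different values for \(\Cr (A \land B)\) and \(\Cr (A \lor B)\), no function of the individual probabilities alone can yield the probability of the compound.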
Let’s have a look at conjunctive propositions, \(A \land B\). By rearranging the Ratio Formula, we get the following:
The Conjunction Rule
\(\Cr (A \land B) = \Cr (A) \cdot \Cr (B / A)\).
So the probability of a conjunction is the probability of the first conjunct times the probability of the second conditional on the first. If you only know the unconditional probabilities of the conjuncts, you can’t figure out the probability of the conjunction.
But there’s a special case. If \(A\) and \(B\) are independent, then \(\Cr (B/A) = \Cr (B)\). In that case, the probability of the conjunction is the product of the probability of the conjuncts:
The Conjunction Rule for independent propositions
If \(A\) and \(B\) are independent, then \(\Cr (A \land B) = \Cr (A) \cdot \Cr (B)\).
Why do we multiply (rather than, say, add) the probabilities in the Conjunction Rules? Suppose we flip two coins. What is the probability that they both land heads? You’d expect the first coin to land heads about half the time; and in half of those cases you’d expect the second to also land heads. The result is a half of a half. And half of a half is \(\nicefrac {1}{2}\) times \(\nicefrac {1}{2}\).
What about disjunctions, \(A \lor B\)? We know that if \(A\) and \(B\) are logically incompatible, then \(\Cr (A \lor B) = \Cr (A) + \Cr (B)\). What if \(A\) and \(B\) are not incompatible? In that case, we have to subtract the probability of the conjunction:
The Disjunction Rule
\(\Cr (A \lor B) = \Cr (A) + \Cr (B) - \Cr (A\land B)\).
Again, you can’t compute the probability of the disjunction just from the probability of the disjuncts.
Why do we subtract \(\Cr (A \land B)\) in the Disjunction Rule? The proposition \(A\lor B\) comprises three kinds of worlds: (1) worlds where \(A\) is true and \(B\) is false, (2) worlds where \(B\) is true and \(A\) is false, and (3) worlds where \(A\) and \(B\) are both true. These three sets are disjoint (mutually exclusive). By Additivity, the probability of the disjunction \(A \lor B\) equals the probability of \(A \land \neg B\) plus the probability of \(B \land \neg A\) plus the probability of \(A \land B\). Taken together, the worlds in (1) and (3) comprise precisely the \(A\)-worlds, and the worlds in (2) and (3) comprise the \(B\)-worlds. So if we add together \(\Cr (A)\) and \(\Cr (B)\), we have effectively double-counted the \(A \land B\) worlds. That’s why we need to subtract \(\Cr (A \land B)\).
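The double-counting can be seen in a quick numerical check. The sketch assumes the two-dice scenario with equal credence in all 36 worlds, and picks two overlapping propositions purely for illustration: that the first die lands 1, and that both dice show the same number.

```python
from fractions import Fraction
from itertools import product

# Two dice, equal credence in each of the 36 worlds (assumption).
worlds = list(product(range(1, 7), repeat=2))
mass = {w: Fraction(1, 36) for w in worlds}

def cr(prop):
    return sum((m for w, m in mass.items() if prop(w)), Fraction(0))

def first_lands_one(w):
    return w[0] == 1

def doubles(w):
    return w[0] == w[1]

def both(w):
    return first_lands_one(w) and doubles(w)

def either(w):
    return first_lands_one(w) or doubles(w)

# Adding the two credences counts the world (1,1) twice.
naive_sum = cr(first_lands_one) + cr(doubles)

print(naive_sum)             # 1/3
print(cr(both))              # 1/36
print(naive_sum - cr(both))  # 11/36
print(cr(either))            # 11/36
```

The naive sum overshoots by exactly the credence in the single world (1,1) where both propositions are true; subtracting it gives the credence obtained by summing over the worlds of the disjunction directly.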
Exercise 2.9 \(\dagger \)
Show that two propositions \(A\) and \(B\) with positive probability are independent if and only if \(\Cr (A \land B) = \Cr (A) \cdot \Cr (B)\). (Some authors use this as the definition of independence.)
Exercise 2.10 \(\dagger \)\(\dagger \)
Exercise 2.11 \(\dagger \)
In 1999, a British woman was convicted of the murder of her two sons, who she claimed died from Sudden Infant Death Syndrome (SIDS). The eminent paediatrician Sir Roy Meadow explained to the jury that 1 in 8500 infants die from SIDS and hence that the chance of SIDS affecting both sons was 1/8500 \(\cdot \) 1/8500 = 1 in 73 million. What is wrong with Sir Meadow’s reasoning?
I want to mention two more rules that play a special role in Bayesian accounts. The first goes back to a suggestion by Thomas Bayes published in 1763.
Bayes’ Theorem
\(\Cr (A/B) = \dfrac {\Cr (B/A) \cdot \Cr (A)}{\Cr (B)}\)
Proof: By the Ratio Formula, \(\Cr (A/B) = \Cr (A \land B) / \Cr (B)\). By the Conjunction Rule, \(\Cr (A \land B) = \Cr (B/A) \cdot \Cr (A)\). So we can replace \(\Cr (A \land B)\) in the Ratio Formula with \(\Cr (B/A)\cdot \Cr (A)\), which yields Bayes’ Theorem.
Bayes’ Theorem relates the conditional probability of \(A\) given \(B\) to the inverse conditional probability of \(B\) given \(A\). Why that might be useful is best illustrated by an example.
Suppose you are unsure whether the die I am about to roll is a regular die or a trick die that has a six printed on all sides. You currently give equal credence to both possibilities. How confident should you be that the die is a trick die given that it will land six on the next roll? That is, what is \(\Cr (\emph {Trick}/ \emph {Six})\)? The answer isn’t obvious. Bayes’ Theorem helps. By Bayes’ Theorem,
\[ \Cr (\emph {Trick}/\emph {Six}) = \frac {\Cr (\emph {Six}/\emph {Trick}) \cdot \Cr (\emph {Trick})}{\Cr (\emph {Six})}. \]
The numerator on the right is easy. \(\Cr (\emph {Six}/ \emph {Trick})\) is 1: if the die has a six on all its sides then it is certain that it will land six. We also know that \(\Cr (\emph {Trick})\) is \(\nicefrac {1}{2}\). But what is \(\Cr (\emph {Six})\), your unconditional credence that the die will land six? Here we need one last rule:
The Law of Total Probability
\(\Cr (A) = \Cr (A/B)\cdot \Cr (B) + \Cr (A/\neg B)\cdot \Cr (\neg B)\).
This follows immediately from exercise 2.3 and the Conjunction Rule.
If we apply the Law of Total Probability to \(\Cr (\emph {Six})\) in the above application of Bayes’ Theorem, we get \[ \Cr (\emph {Trick}\;/\;\emph {Six}) = \frac {\Cr (\emph {Six}\;/\;\emph {Trick}) \cdot \Cr (\emph {Trick})}{ \Cr (\emph {Six}\;/\;\emph {Trick}) \cdot \Cr (\emph {Trick}) + \Cr (\emph {Six}\;/\;\neg \emph {Trick}) \cdot \Cr (\neg \emph {Trick}) }. \] It looks scary, but all the terms on the right are easy to figure out. We already know that \(\Cr (\emph {Six}\;/\; \emph {Trick}) = 1\) and that \(\Cr (\emph {Trick}) = \nicefrac {1}{2}\). Moreover, \(\Cr (\emph {Six}\;/\; \neg \emph {Trick})\) is plausibly \(\nicefrac {1}{6}\) and \(\Cr (\neg \emph {Trick})\) is \(\nicefrac {1}{2}\). Plugging all these values into the formula, we get \(\Cr (\emph {Trick}\;/\;\emph {Six}) = \nicefrac {6}{7}\). Your credence in the trick die hypothesis conditional on seeing a six should be \(\nicefrac {6}{7}\).
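The arithmetic can be reproduced with exact fractions. In this sketch, `total_probability` and `bayes` are hypothetical helper names that mirror the Law of Total Probability and Bayes’ Theorem as stated above; the input values are the ones assumed in the example.

```python
from fractions import Fraction

def total_probability(cr_a_given_b, cr_b, cr_a_given_not_b):
    """Cr(A) = Cr(A/B) * Cr(B) + Cr(A/not-B) * Cr(not-B)."""
    return cr_a_given_b * cr_b + cr_a_given_not_b * (1 - cr_b)

def bayes(cr_b_given_a, cr_a, cr_b):
    """Cr(A/B) = Cr(B/A) * Cr(A) / Cr(B), assuming Cr(B) > 0."""
    return cr_b_given_a * cr_a / cr_b

# Values from the trick-die example in the text.
cr_trick = Fraction(1, 2)
cr_six_given_trick = Fraction(1)
cr_six_given_not_trick = Fraction(1, 6)

cr_six = total_probability(cr_six_given_trick, cr_trick, cr_six_given_not_trick)
cr_trick_given_six = bayes(cr_six_given_trick, cr_trick, cr_six)

print(cr_six)              # 7/12
print(cr_trick_given_six)  # 6/7
```

The intermediate value \(\Cr (\emph {Six}) = \nicefrac {7}{12}\) is the denominator in the expanded formula: \(1 \cdot \nicefrac {1}{2} + \nicefrac {1}{6} \cdot \nicefrac {1}{2}\).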
Exercise 2.12 \(\dagger \)\(\dagger \)\(\dagger \)
A stranger tells you that she has two children. You ask if at least one of them is a boy. The stranger says yes. How confident should you be that the other child is also a boy? (Assume there are only two sexes, which are equally common and independent among siblings.)
Essay Question 2.1
If an agent’s degrees of belief satisfy the probability axioms, it seems to follow from Kolmogorov’s axiom (ii) that the agent must be certain of every logical truth. Does this mean that our Bayesian model is inapplicable to ordinary agents, who are not logically omniscient? If so, is this a problem? Do you have an idea of how the model could be adjusted to allow for logical non-omniscience?
Sources and Further Reading
There are many good introductions to elementary probability theory. For a slightly more in-depth discussion of the topics we have covered, you may want to consult chapters 3–7 of Ian Hacking, An Introduction to Probability and Inductive Logic (2001). (You may find the rest of the book helpful as well.)
The problems infinitely many worlds raise for the Additivity axiom are nicely explained in Brian Skyrms, “Zeno’s paradox of measure” (1983).
The topic of the essay question is commonly discussed as the “problem of logical omniscience”. See, for example, Zeynep Soysal, “A metalinguistic and computational approach to the problem of mathematical omniscience” (2022) for an interesting recent proposal with pointers to the earlier discussion.