Friday, July 15, 2011

A Primer on Bayesian Philosophy: 2 - Bayesianism outlined

Bayesianism is broader than probabilism, the idea that credence can be represented by a function satisfying the basic probability axioms. Even if we accept probabilism, many important questions are left open. Most importantly, how are scientific theories confirmed? How should new information affect our commitments? How do we contrast competing theories?

New evidence should lead to a new probability function; the problem is how to relate old probabilities to new probabilities. That is, what is the posterior probability of a hypothesis given the prior probability of the hypothesis and an update in the probability of some evidence?

Suppose that you observe the occurrence of a previously uncertain event E. Denote your prior probability by p and your posterior by q, so that $0<p(E)<1\text{ and }q(E)=1$.

Simple Conditionalization. $q(H)=p(H|E)$.

Updating on E, your posterior credence in H is your prior conditional credence in H given E. Further, those conditionals do not change. This is equivalent to assuming that the odds conditional on E are invariant, i.e.

$\frac{q(H)}{q(H^c)}=\frac{p(H|E)}{p(H^c|E)}$

(Since $q(E)=1$, the left side is also the posterior odds given E.) In English, to deny conditionalization is to claim that learning E changes the odds you had already assigned on the supposition of E.

Rigidity. For every event A, $q(A|E)=p(A|E)$: learning E leaves probabilities conditional on E unchanged.

Before discussing the shortcomings and limitations of these principles, it is worth seeing what they can do for us. Some of you may be aware of the hypothetico-deductive model of science; the hypothetico-deductive principle follows from probabilism, conditionalization, and rigidity.

(I pause to say that this is really freakin' cool, and what follows was sufficient to provoke further study when I was introduced to it by a professor.)

Hypothetico-deductive principle. Let p be a prior probability, H denote an uncertain hypothesis, and E be uncertain evidence. If H is p-positively relevant to E - that is, $p(H|E)>p(H)$, or equivalently by Bayes' theorem, $p(E|H)>p(E)$ - and $q(E)=1$, then $q(H)>p(H)$.

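Proof. By conditionalization, $q(H)=p(H|E)$. By Bayes' theorem, $p(H|E)=p(E|H)p(H)/p(E)$. Since H is p-positively relevant to E, $p(E|H)>p(E)$, so $p(E|H)/p(E)>1$ and hence $q(H)=p(H|E)>p(H)$. In the classical hypothetico-deductive case, where H entails E, we have $p(E|H)=1$ and the result reduces to $q(H)=p(H)/p(E)>p(H)$, which holds because $p(E)<1$.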
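To make the principle concrete, here is a minimal sketch of simple conditionalization on a finite sample space. The four "worlds" and their prior probabilities are made-up numbers chosen purely for illustration, and the helper names `conditionalize` and `probability` are hypothetical, not taken from any library.

```python
from fractions import Fraction


def probability(dist, event):
    """Probability of an event (a set of worlds) under a finite distribution."""
    return sum(dist[world] for world in event)


def conditionalize(prior, evidence):
    """Return the posterior q(.) = p(. | E) for a finite prior.

    prior: dict mapping each world to its prior probability
    evidence: set of worlds making up the event E (needs p(E) > 0)
    """
    p_e = probability(prior, evidence)
    if p_e == 0:
        raise ValueError("cannot conditionalize on an event of probability zero")
    return {
        world: (prob / p_e if world in evidence else Fraction(0))
        for world, prob in prior.items()
    }


# Assumed prior over four worlds, labelled (is H true?, is E true?).
prior = {
    ("H", "E"): Fraction(3, 10),
    ("H", "not E"): Fraction(1, 10),
    ("not H", "E"): Fraction(2, 10),
    ("not H", "not E"): Fraction(4, 10),
}
H = {world for world in prior if world[0] == "H"}
E = {world for world in prior if world[1] == "E"}

posterior = conditionalize(prior, E)

p_H = probability(prior, H)                      # prior credence in H
p_E = probability(prior, E)                      # prior credence in E
p_E_given_H = probability(prior, H & E) / p_H    # likelihood of E under H
q_H = probability(posterior, H)                  # posterior credence in H

print("p(H) =", p_H, " p(E) =", p_E, " p(E|H) =", p_E_given_H, " q(H) =", q_H)
```

With these assumed numbers H is positively relevant to E, since $p(E|H)=3/4>p(E)=1/2$, and the posterior credence in H rises from 2/5 to 3/5, as the principle requires.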

Short, sweet, and really really cool. If some theory makes a large number of uncertain predictions which are subsequently verified, the probability of the theory will approach one with reasonable assumptions. More properly, such observations impose very strong restrictions on posterior probabilities which fail to give high probability to a theory.

This principle can be stated more strongly: we may also determine the strength of confirmation yielded by evidence using Bayes' theorem. Recall the odds form of Bayes' theorem:

$\frac{p(H|E)}{p(H^c|E)}=\frac{p(E|H)}{p(E|H^c)}\times\frac{p(H)}{p(H^c)}$

As promised, rigidity and conditionalization justify the terms 'posterior odds' and 'prior odds'. Note that the Bayes factor

$\beta=\frac{p(E|H)}{p(E|H^c)}$

is the number by which the prior odds are multiplied to yield the posterior odds. If E is p-positively relevant to H, this number is greater than 1; if E is negatively relevant to H, it is less than 1. This number may be thought of as quantifying the strength of confirmation of H yielded by E.
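The odds-form update can be sketched in a few lines of Python. The likelihoods and the prior below are assumed values for illustration only, and the function names are hypothetical.

```python
from fractions import Fraction


def odds(prob):
    """Convert a probability strictly between 0 and 1 into odds."""
    return prob / (1 - prob)


def prob_from_odds(o):
    """Convert odds back into a probability."""
    return o / (1 + o)


def bayes_factor(p_e_given_h, p_e_given_not_h):
    """beta = p(E|H) / p(E|H^c)."""
    return p_e_given_h / p_e_given_not_h


def update(prior_h, p_e_given_h, p_e_given_not_h):
    """Posterior probability of H after learning E with certainty."""
    posterior_odds = bayes_factor(p_e_given_h, p_e_given_not_h) * odds(prior_h)
    return prob_from_odds(posterior_odds)


# Assumed inputs: p(H) = 2/5, p(E|H) = 3/4, p(E|H^c) = 1/3.
prior_h = Fraction(2, 5)
beta = bayes_factor(Fraction(3, 4), Fraction(1, 3))
posterior_h = update(prior_h, Fraction(3, 4), Fraction(1, 3))

print("Bayes factor:", beta)          # 9/4
print("prior odds:", odds(prior_h))   # 2/3
print("posterior p(H|E):", posterior_h)  # 3/5
```

Here the Bayes factor of 9/4 multiplies prior odds of 2/3 to give posterior odds of 3/2, i.e. a posterior probability of 3/5. Working in odds keeps the evidence's contribution (the Bayes factor) cleanly separated from the prior.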

That's obviously a very useful concept. It can be used to resolve paradoxes of confirmation which result from fairly simple assumptions like Nicod's Criterion, which states that observations of a previously uncertain particular instance confirm the corresponding regularity, if uncertain. This and logical equivalence entail that observing a non-black non-raven increases the probability of 'all ravens are black'. The usual Bayesian response is to accept this counter-intuitive implication and explain why it is unproblematic by justifying the employment of differing Bayes factors. Observing a green apple confirms that all ravens are black, but not nearly so much as observing a black raven. And the uses are not limited to theoretical concerns; the practical utility of Bayes' theorem is overwhelming. Here, I'll let e-jedis provide the examples.1

So far, I have only presented the odds form of Bayes' theorem to contrast a hypothesis and its complement, but the equation is the same for any two uncertain hypotheses. You could fairly ask, though, whether conditionalization as I have presented it is too simple. Can we update on uncertain evidence, i.e. when $q(E)<1$?

Here the works of the late Richard Jeffrey are indispensable.2 Jeffrey Conditioning, or probability kinematics, is the standard way of updating probabilities in light of uncertain evidence. I opine that such a method is essential; as Jeffrey states: "Certainty is quite demanding. It rules out not only the far-fetched uncertainties associated with philosophical skepticism, but also the familiar uncertainties that affect real empirical inquiry in science and everyday life" (Subjective Probability, p.57).

Probability Kinematics. Let your prior probability be p and your posterior probability be q. Then if $\{E_i\}_{i\in[n]}$ is a partition of the sample space where $q(E_i)>0$ for all i,

$q(H)=\sum_{i\in[n]}p(H|E_i)q(E_i)$

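Note that this follows from rigidity applied to each block, $q(H|E_i)=p(H|E_i)$, together with the law of total probability for q; nothing requires $q(E_i)=1$ for any block. Simple/classic conditioning is the special case in which one block of the partition, E, is treated as certain, $q(E)=1$.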
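Probability kinematics is easy to sketch on a finite sample space: within each block of the partition, the prior is rescaled so that the block's total mass becomes its new weight. The prior, the partition, and the new weights below are assumed numbers for illustration, and `jeffrey_update` is a hypothetical helper name.

```python
from fractions import Fraction


def probability(dist, event):
    """Probability of an event (a set of worlds) under a finite distribution."""
    return sum(dist[world] for world in event)


def jeffrey_update(prior, partition, new_weights):
    """Return q with q(H) = sum_i p(H | E_i) * q(E_i).

    prior: dict mapping each world to its prior probability
    partition: list of disjoint sets of worlds covering the sample space
    new_weights: list of the new probabilities q(E_i), summing to 1
    """
    if sum(new_weights) != 1:
        raise ValueError("the new weights q(E_i) must sum to 1")
    posterior = {}
    for block, q_block in zip(partition, new_weights):
        p_block = probability(prior, block)
        if p_block == 0:
            raise ValueError("p(. | E_i) is undefined when p(E_i) = 0")
        for world in block:
            # Rigidity: inside a block, every world is rescaled by q(E_i)/p(E_i),
            # so probabilities conditional on E_i are left unchanged.
            posterior[world] = prior[world] * q_block / p_block
    return posterior


# Assumed prior over four worlds, labelled (is H true?, is E true?).
prior = {
    ("H", "E"): Fraction(3, 10),
    ("H", "not E"): Fraction(1, 10),
    ("not H", "E"): Fraction(2, 10),
    ("not H", "not E"): Fraction(4, 10),
}
H = {world for world in prior if world[0] == "H"}
E = {world for world in prior if world[1] == "E"}
not_E = set(prior) - E

# Uncertain evidence: credence in E rises from 1/2 to 4/5, not to 1.
posterior = jeffrey_update(prior, [E, not_E], [Fraction(4, 5), Fraction(1, 5)])
q_H = probability(posterior, H)

# Limiting case q(E) = 1 recovers simple conditionalization.
certain = jeffrey_update(prior, [E, not_E], [Fraction(1), Fraction(0)])
q_H_certain = probability(certain, H)

print("q(H) with q(E) = 4/5:", q_H)          # 13/25
print("q(H) with q(E) = 1:", q_H_certain)    # 3/5
```

With these numbers, $q(H)=\frac{3}{5}\cdot\frac{4}{5}+\frac{1}{5}\cdot\frac{1}{5}=\frac{13}{25}$, which sits between the prior 2/5 and the fully conditionalized value 3/5.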

Ok, so we know that Bayesian confirmation is theoretically and practically useful for resolving extremely broad classes of problems. But what are some of the problems and limitations with the theory?

Many of the issues concern assigning prior odds, interpretations of probability, and defending Bayesianism against other theories of credence; these will have to wait for the next two sections. A key limitation is that Bayesian learning cannot account for all learning. It requires logical omniscience: if you discover a logical implication you had missed, you must rework your prior accordingly. It requires rigidity: if you find out that your prior conditional probability was poorly calibrated - say due to inadequate statistical data - you have to redo the assessment. The domain of your prior may also be inadequate; previously unimagined events and expectations must be included in a reworking. But these are merely the shortcomings of us mortal practitioners; what about theoretical shortcomings?

Normalizability is an issue: one cannot assign non-zero probabilities to uncountably many disjoint propositions, because uncountable sums of positive numbers always diverge. Most would agree that there is such a class of propositions, so probability can never be complete.3 Everything discussed so far has also assumed that probabilities are particular real numbers. The theory must be expanded to account for vague probabilities.
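For readers who want the divergence claim spelled out, here is the standard counting argument, sketched with notation that is not in the post ($S$ for the collection of propositions, $S_n$ for its subcollections). Suppose every proposition in an uncountable, pairwise disjoint collection $S$ receives positive probability, and let

$S_n=\{A\in S : p(A)>1/n\}\text{, so that }S=\bigcup_{n\geq1}S_n$

A countable union of finite sets is countable, so at least one $S_n$ must be infinite. Picking n distinct members $A_1,\dots,A_n$ of that $S_n$, finite additivity gives

$p(A_1\cup\dots\cup A_n)=\sum_{k=1}^{n}p(A_k)>n\cdot\frac{1}{n}=1$

which contradicts the requirement that no probability exceeds 1. So at most countably many disjoint propositions can receive positive probability, and the argument needs only finite additivity.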

These warnings mentioned, we can move on to the standard defense of Bayesianism.

1. This link is highly highly recommended, as is a related link by the same author.
2. I recommend reading his Subjective Probability: The Real Thing, a delightfully concise account which also includes a probability primer, many illustrative examples, and lots of exercises.
3. This issue is explored in detail in Hájek's What Conditional Probability Could Not Be, amongst other places. For example, one cannot assign a positive probability to each proposition about an infinitely fine dart landing on a particular point in a continuous region. Here many have sought to introduce infinitesimals to generalize probability - Hájek also discusses this - but such accounts run into fatal problems, and their prospects are dim.