## Friday, July 15, 2011

### Probabilistic Modus Ponens and Modus Tollens

In my Bayesian primer, I refer in passing to the fallacy of probabilistic modus tollens. Since I have caught myself practicing this poor yet intuitive mental habit on several occasions, a post is in order.

There are two types of probabilistic modus tollens to consider: the static and the dynamic. The static version is modus tollens as it holds within a single fixed probability function; the dynamic version is modus tollens as it fails to hold when conditioning from a prior to a posterior. The former is discussed by Carl Wagner here, and the latter is discussed here by Elliott Sober, in the context (intelligent design) where the fallacy most prominently occurs. Both papers also discuss probabilistic modus ponens, both the static and dynamic versions of which are valid.

Probabilistic logic is often thought of as an extension of deductive logic. Usually, this does no harm. Contradictions have probability 0 and tautologies have probability 1; probabilities of conditionals p(B-->A) may be recast as conditional probabilities p(A|B); and modus ponens probabilizes in a natural way, trivially in the static version and with conditionalization in the dynamic.

Let's look at the static analogues first.

Modus ponens (classic):

1. A,
2. A-->B,
3. ergo B.

Modus ponens (static): for a probability p,

1a. p(A) is high,
2a. p(B|A) is high,
3a. ergo p(B) is high.

To see that this holds, note that (2a) is equivalent to "p(A and B)/p(A) is high", and that p(B)>=p(A and B). So if p(B) were small, p(A and B) would be small too. But p(A and B)=p(B|A)p(A), where p(A) and p(B|A) are high by (1a) and (2a), so p(A and B) is not small, and this places a lower bound on p(B). More generally (Wagner 2004): given p(B|A) arbitrary and p(A)>0,

$p(B|A)p(A)\leq p(B)\leq p(B|A)p(A)+1-p(A)$

where these bounds are the best possible. As we see, static modus ponens imposes a discipline on the conclusion which depends on the values of the probabilities in the premises: as p(B|A) and p(A) increase, so does the lower bound on p(B).
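As a quick numerical illustration of these bounds, here is a minimal Python sketch. The function name `modus_ponens_bounds` and the sample values are my own choices for illustration, not anything taken from Wagner's paper.

```python
def modus_ponens_bounds(p_b_given_a: float, p_a: float) -> tuple[float, float]:
    """Return (lower, upper) bounds on p(B) given p(B|A) and p(A) > 0."""
    if not (0.0 < p_a <= 1.0 and 0.0 <= p_b_given_a <= 1.0):
        raise ValueError("need 0 < p(A) <= 1 and 0 <= p(B|A) <= 1")
    # Lower bound: p(A and B), attained when p(B and ~A) = 0.
    lower = p_b_given_a * p_a
    # Upper bound: attained when all of ~A also lies inside B.
    upper = lower + (1.0 - p_a)
    return lower, upper


# Illustrative values only: both premises share the same probability.
for p in (0.5, 0.9, 0.99):
    lo, hi = modus_ponens_bounds(p, p)
    print(f"p(A) = p(B|A) = {p}: {lo:.4f} <= p(B) <= {hi:.4f}")
```

The interval shrinks toward 1 as both premises approach certainty, which is the sense in which static modus ponens "probabilizes" well.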

Modus tollens (classic):

1b. B --> A,
2b. ~A,
3b. ergo ~B.

The procedure for deriving the static form of modus tollens is more difficult, which is partly why it did not appear - so far as I know - until (Wagner 2004).

Modus tollens (static): Let p(A|B)=a and p(~A)=b. There are three cases:

(i) if 0<a<1 and 0<b<1, then

$\max\{(1-a-b)/(1-a),(a+b-1)/a\}\leq p(\sim B)<1$

(ii) if a=0 and 1>=b>0, then

$1-b\leq p(\sim B)<1$

(iii) if a=1 and 1>b>=0, then

$b\leq p(\sim B)<1$

and these bounds are the best possible.

So we know that static modus tollens is similarly constraining when a and b are both large or both small: in either case the lower bound on p(~B) is close to 1.
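The three cases can be sketched in Python as follows. The function name `modus_tollens_lower_bound` is hypothetical, and the comments give my own short derivation of the lower bound rather than Wagner's proof; the boundary checks reflect my assumption that p(B) must be positive for p(A|B) to be defined.

```python
def modus_tollens_lower_bound(a: float, b: float) -> float:
    """Lower bound on p(~B), given a = p(A|B) and b = p(~A)."""
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("a and b must lie in [0, 1]")
    if a == 0.0:
        # p(B) = p(~A and B) <= p(~A) = b, so p(~B) >= 1 - b.
        if b == 0.0:
            raise ValueError("a = 0 requires b > 0")
        return 1.0 - b
    if a == 1.0:
        # p(B) = p(A and B) <= p(A) = 1 - b, so p(~B) >= b.
        if b == 1.0:
            raise ValueError("a = 1 requires b < 1")
        return b
    if b in (0.0, 1.0):
        raise ValueError("0 < a < 1 requires 0 < b < 1")
    # a * p(B) <= 1 - b and (1 - a) * p(B) <= b; the tighter one wins.
    return max((1.0 - a - b) / (1.0 - a), (a + b - 1.0) / a)


# Illustrative values: both large, both small, and mixed.
for a, b in ((0.9, 0.9), (0.1, 0.1), (0.9, 0.1)):
    print(f"a = {a}, b = {b}: p(~B) >= {modus_tollens_lower_bound(a, b):.4f}")
```

With one premise large and the other small, the bound collapses toward 0 and the premises tell us almost nothing about p(~B).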

Modus ponens (dynamic): let p denote your prior probability and q your posterior. By conditioning, q(B)=p(B|A). So the following form of modus ponens is valid:

1c. p(B|A) is high,
2c. q(A) is high,
3c. ergo q(B) is high.

In case q(A)=1, this is simply conditionalization. Else, we apply Jeffrey conditioning:

$q(B)=p(B|A)q(A)+p(B|\sim A)q(\sim A).$

Where q(A) is sufficiently close to 1, q(B) approximates p(B|A). But note that the approximation fails if q(A) is small. We will now see that a similar formulation of modus tollens fails miserably.
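A minimal sketch of the Jeffrey update shows how the approximation degrades as q(A) falls. The function name `jeffrey_update` and the values p(B|A)=0.9 and p(B|~A)=0.1 are assumptions chosen purely for illustration.

```python
def jeffrey_update(p_b_given_a: float, p_b_given_not_a: float, q_a: float) -> float:
    """Posterior q(B) by Jeffrey conditioning on the partition {A, ~A}."""
    return p_b_given_a * q_a + p_b_given_not_a * (1.0 - q_a)


# Assumed prior conditionals, for illustration only.
P_B_GIVEN_A = 0.9
P_B_GIVEN_NOT_A = 0.1

for q_a in (1.0, 0.95, 0.5, 0.05):
    q_b = jeffrey_update(P_B_GIVEN_A, P_B_GIVEN_NOT_A, q_a)
    print(f"q(A) = {q_a}: q(B) = {q_b:.3f}")
```

Setting q(A)=1 recovers ordinary conditionalization, q(B)=p(B|A).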

Modus tollens (dynamic) [fallacious]: let p denote your prior probability and q your posterior.

1d. p(A|B) is high,
2d. q(A)=0,
3d. ergo, q(B) is low.

To see why this fails, we look to counterexamples. Suppose that B is the hypothesis that a lottery with a large number of tickets is fair, A is the event that some ticket Jim did not buy wins, and Jim buys only a single ticket. So if there are N tickets, p(A|B)=(N-1)/N. Then (2d) corresponds to Jim winning the lottery, and (3d) states that you should conclude that the lottery was unfair. Dynamic probabilistic modus tollens would lead us to doubt that any particular fair lottery is fair, or that a card has ever been randomly drawn from a normal deck that is indeed normal. This version of modus tollens holds only if p(A|B)=1, which gives us nothing more than the classic version.

In order to see where exactly this goes wrong, let's look at Bayes' theorem:

$\frac{q(B)}{q(\sim B)}=\frac{p(\sim A|B)}{p(\sim A|\sim B)}\times\frac{p(B)}{p(\sim B)}$

Now the big question is whether or not q(~A)=1 diminishes the probability of B at all, i.e. whether

$\beta=\frac{p(\sim A|B)}{p(\sim A|\sim B)}<1$.

We know by (1d) that p(~A|B) is small, and this is the only information we have that is relevant to this Bayes factor. But by Bayes' rule,

$p(\sim A|B)=p(B|\sim A)\frac{p(\sim A)}{p(B)},\text{ and }p(\sim A|\sim B)=p(\sim B|\sim A)\frac{p(\sim A)}{p(\sim B)}$,

so a much stronger set of premises is required to capture modus tollens: namely, statements about the terms in these equations which yield p(~A|~B)>p(~A|B). Using the fair lottery example, this would require that one expect in advance - say, from some evidence of fraud committed by a relative of Jim's who works for the state company - that the lottery is prejudiced in favor of Jim.
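To make the role of p(~A|~B) concrete, here is a small sketch of the odds-form update applied to the lottery. The function name `posterior_after_not_a`, the ticket count, the prior of 0.99 on fairness, and the two alternative hypotheses are all assumptions I have made up for illustration.

```python
def posterior_after_not_a(prior_b: float,
                          p_not_a_given_b: float,
                          p_not_a_given_not_b: float) -> float:
    """q(B) after learning ~A, using the odds form of Bayes' theorem."""
    if not 0.0 < prior_b < 1.0 or p_not_a_given_not_b <= 0.0:
        raise ValueError("need 0 < p(B) < 1 and p(~A|~B) > 0")
    bayes_factor = p_not_a_given_b / p_not_a_given_not_b
    posterior_odds = bayes_factor * prior_b / (1.0 - prior_b)
    return posterior_odds / (1.0 + posterior_odds)


N = 1_000_000                  # assumed number of tickets
PRIOR_FAIR = 0.99              # assumed prior p(B) that the lottery is fair
P_JIM_WINS_IF_FAIR = 1.0 / N   # p(~A|B)

# Assumption 1: an unfair lottery is no kinder to Jim than a fair one.
unfair_but_neutral = posterior_after_not_a(PRIOR_FAIR, P_JIM_WINS_IF_FAIR, 1.0 / N)
# Assumption 2: an unfair lottery is rigged so that Jim wins half the time.
rigged_for_jim = posterior_after_not_a(PRIOR_FAIR, P_JIM_WINS_IF_FAIR, 0.5)

print(f"q(fair), neutral alternative: {unfair_but_neutral:.6f}")
print(f"q(fair), rigged-for-Jim alternative: {rigged_for_jim:.6f}")
```

Under the first assumption the Bayes factor is 1 and Jim's win leaves the probability of fairness untouched, however tiny p(~A|B) is. Only under the second does the win count against fairness.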

As Sober explores, the fallacy as it occurs in the ID movement is to assume that probabilistic modus tollens with respect to evolution and irreducibly complex structures (IC) implies that evolution (E) is unlikely. But this requires, among other things, assumptions about p(IC|~E). If we generously suppose that ~E is dominated by supernatural designers, the question remains open whether p(IC|~E) is large at all. If we suppose for example that supernatural possibilities contain all naturalistic possibilities (including 'chance') and more, equivocating would lead us to assert p(IC|E)>p(IC|~E), even if the former term is very, very tiny indeed.
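Spelled out in the same odds form as the Bayes' theorem equation earlier in this post, with E in place of B and the observation IC in place of ~A (this is my own direct substitution, not a formula quoted from Sober's paper):

$\frac{q(E)}{q(\sim E)}=\frac{p(IC|E)}{p(IC|\sim E)}\times\frac{p(E)}{p(\sim E)}$

So a tiny p(IC|E) lowers the odds on E only if p(IC|~E) is larger, and by how much depends on the ratio of the two, not on p(IC|E) alone.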

I remember attempting to explain this a few years ago, well before my sortie into Bayesianism. I was seeing if the local Secular Student Alliance meetings were worth attending, and a bright fellow who defended ID - though he didn't accept it - also showed up. It was difficult to illustrate without Bayesian tools why it is that ID not only fails to deductively follow from some explanatory failure of evolution but that it also fails to probabilistically follow without further assumptions. In particular, ID theorists must do positive work on their theory to compete with evolution. Mere anti-evolution will not suffice. This I could not adequately explain without the tools I now have.

I haven't found any decent responses to this observation as applied to ID, but I would of course be happy to see them. (In particular, I'm not interested in responses which assume without defense that p(IC|~E)>p(IC|E). Key idea: non-evolutionary explanations of IC conferring a significant probability need to dominate the conditional p(.|~E)...) A comprehensive account should strengthen the conclusion to p(IC|~E)>>p(IC|E), where '>>' denotes "small enough that p(IC|~E)/p(E|IC)>1/2"; this will be a function of the prior odds on evolution.

Edit (8/14/2011): I found something!