This post is going to be formal, perhaps too formal. It is not intended to be a one-stop shop for the beginner to comprehend probability. Instead, the beginner should scan this post and use it as a Bayesianism-focused reference while studying the first few chapters of a traditional undergraduate probability textbook or some other professional work.[4] Do not feel the need to understand everything here before proceeding further.
As I explain in the next post, the central idea of Bayesianism is the representation of credence, the strength of commitment to a proposition, by a non-negative real number [see (1)]. When encountering this idea and the axioms of probability, think about what Bayesianism assumes in doing so. Ask yourself, for example, why negative reals do not represent credence. Think of what one could gain, if anything, by allowing them. Ask yourself whether credence should lie on a real interval at all. Why not the rational numbers? Why not a vector space (to allow multi-dimensional values)? Why employ the continuum? Why should credence be normalizable [see (2)]?
I hope to discuss most of these questions in detail later, but I hope those encountering the axioms now will prefer honest toil to theft and think about why these postulates are postulated.
First, some notes on notation. The subset symbol `
As I am more comfortable with set notation, I introduce the topic as such, but it is possible to rewrite what follows in propositional form. It is quite common to encounter the propositional version, but the difference is merely symbolic.[5] Conjunction, disjunction, `and', `or', negation, etc. may be substituted as needed.
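Let $\Omega$ be a sample space and let $p$ be a real-valued function defined on subsets of $\Omega$. For all $A, B \subseteq \Omega$ in the domain of $p$:
1. Positivity: $p(A) \geq 0$.
2. Normalizability: $p(\Omega) = 1$.
3. Finite additivity: if $A \cap B = \emptyset$, then $p(A \cup B) = p(A) + p(B)$.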
Those familiar with probability theory will notice certain differences from a standard presentation. Note that p is defined on subsets of
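3*. Countable additivity: if $A_1, A_2, \ldots$ are pairwise disjoint, then $p\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} p(A_i)$.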
(Note that I have ceased to make the domain assumptions explicit.) It is an elementary theorem that (1), (2), and (3*) entail (3). But accepting countable additivity, along with assuming that Domain(p) is a sigma-field, may assume too much. For purposes of simplification, I will implicitly assume that p is defined on all subsets of the sample space. But as we will see in later sections, this is not necessarily the case.[1] And of course, one may generalize probability further, but that would require more advanced mathematics, namely measure theory.
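For a finite sample space, positivity, normalizability, and finite additivity can be checked by brute force. The following is only a minimal sketch: the loaded-die weights and the helper names (`powerset`, `make_p`, `satisfies_axioms`) are hypothetical, chosen for illustration, and p is assumed to be defined on every subset of the sample space.

```python
from fractions import Fraction
from itertools import combinations


def powerset(omega):
    """Return every subset of omega as a frozenset."""
    items = list(omega)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def make_p(weights):
    """Build p(A) as the sum of the point weights of the outcomes in A."""
    def p(event):
        return sum((weights[w] for w in event), Fraction(0))
    return p


def satisfies_axioms(p, omega):
    """Check positivity, normalizability, and finite additivity on all subsets."""
    events = powerset(omega)
    positivity = all(p(a) >= 0 for a in events)
    normalizability = p(frozenset(omega)) == 1
    finite_additivity = all(p(a | b) == p(a) + p(b)
                            for a in events for b in events
                            if not (a & b))  # only disjoint pairs
    return positivity and normalizability and finite_additivity


# Hypothetical loaded die: face 6 carries half of the total weight.
weights = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
           4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}
omega = frozenset(weights)
p = make_p(weights)

print(satisfies_axioms(p, omega))
```

Exact fractions are used so that the additivity check is not disturbed by floating-point rounding. A weight assignment that does not sum to one fails the normalizability check.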
To understand the appeal of these axioms, it is important to remember that they are relatively young: Kolmogorov first published them in 1933. Before then, probabilities were defined as relative frequencies, i.e. $p(A) = \frac{\text{number}(A)}{N}$,
where number(A) denotes the number of occurrences of A in N trials, or limiting long run frequencies, i.e. $p(A) = \lim_{N \to \infty} \frac{\text{number}(A)}{N}$.
One notices several problems with these notions. For the latter, the assumption that a limit exists is required. For both, N counts a reference class of events which requires specification, and reference classes are not always clear.[2] There are also counterfactual commitments implicit in the definition, e.g. `if you were to toss this coin ad infinitum...' Worse still, relative frequency presupposes the uniform distribution and a finite sample space, and not all possibilities are equiprobable. But the Kolmogorov axioms capture such notions, where applicable.
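The relative-frequency notion can be illustrated with a short simulation. This is a sketch under loud assumptions: trials are drawn uniformly from a hypothetical six-sided die, the function name `relative_frequency` is made up for illustration, and a finite run can only suggest, never establish, that a limit exists.

```python
import random


def relative_frequency(event, sample_space, n_trials, rng):
    """Return number(A) / N for N simulated draws from sample_space."""
    hits = sum(1 for _ in range(n_trials)
               if rng.choice(sample_space) in event)
    return hits / n_trials


rng = random.Random(0)        # fixed seed so the run is repeatable
die = [1, 2, 3, 4, 5, 6]      # assumption: outcomes are equiprobable
even = {2, 4, 6}

for n in (10, 1_000, 100_000):
    print(n, relative_frequency(even, die, n, rng))
```

The choice of `die` as the pool of trials is itself a choice of reference class, which is one of the difficulties noted above.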
Before going further, conditional probabilities need to be introduced. Very often, conditional probability is presented as a definition, and not an analysis - which it almost always is in practice. Usually, it is given in ratio analysis form: $p(A \mid B) = \frac{p(A \cap B)}{p(B)}$, for $p(B) > 0$.
The assumption that
As the ratio `definition' is most common, I accept it as the default, with caution as to its shortcomings.[3] When
Theorem List
In what follows, probabilities are assumed to be defined wherever they appear. Items marked with an asterisk require countable additivity. [I currently see no need to include such theorems, some of which (e.g. continuity with respect to series of subsets and supersets) require calculus.] Unless stated otherwise, theorems follow from (1)-(3) and the ratio analysis of conditional probability. I will state these theorems in logical order; their derivation can be found in any elementary probability textbook, although many needlessly invoke countable additivity in the process.
Equivalence condition. If $A = B$, then $p(A) = p(B)$.
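Probability of impossibility. $p(\emptyset) = 0$.
Complement rule. $p(A^c) = 1 - p(A)$, where $A^c = \Omega \setminus A$ is the complement of $A$ in the sample space $\Omega$.
This follows by noticing that $A \cup A^c = \Omega$ and $A \cap A^c = \emptyset$.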
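Subset rule. $A \subseteq B$ implies $p(A) \leq p(B)$.
Finite additivity. If $A_1, \ldots, A_n$ are pairwise disjoint, then $p\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} p(A_i)$.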
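Inclusion/Exclusion Principle. $p\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i} p(A_i) - \sum_{i<j} p(A_i \cap A_j) + \sum_{i<j<k} p(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} p(A_1 \cap \cdots \cap A_n)$.
This is often encountered in its simpler form: $p(A \cup B) = p(A) + p(B) - p(A \cap B)$.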
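Law of Total Probability. $p(A) = p(A \cap B) + p(A \cap B^c)$, where $B^c$ is the complement of $B$ in the sample space.
This has a ready generalization: if $B_1, \ldots, B_n$ are pairwise disjoint and their union is the whole sample space, then $p(A) = \sum_{i=1}^{n} p(A \cap B_i)$.
Which has an equivalent in terms of conditional probability: if in addition each $p(B_i) > 0$, then $p(A) = \sum_{i=1}^{n} p(A \mid B_i)\, p(B_i)$.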
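The Law of Total Probability in its two-set, partition, and conditional forms, together with the simple inclusion/exclusion rule, can be checked numerically on a small finite space. This is a minimal sketch: the loaded-die weights, the events `A` and `B`, the partition, and the helper `p_given` (the ratio analysis of conditional probability) are all hypothetical choices made for illustration.

```python
from fractions import Fraction

# Hypothetical loaded die: face 6 carries half of the total weight.
WEIGHTS = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
           4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}
OMEGA = frozenset(WEIGHTS)


def p(event):
    """Probability of an event as the sum of its point weights."""
    return sum((WEIGHTS[w] for w in event), Fraction(0))


def p_given(a, b):
    """Ratio analysis: p(a | b) = p(a & b) / p(b), defined only if p(b) > 0."""
    if p(b) == 0:
        raise ValueError("p(b) must be positive for the ratio analysis")
    return p(a & b) / p(b)


A = frozenset({2, 4, 6})                       # the roll is even
B = frozenset({1, 2, 3})                       # the roll is at most three
partition = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]

two_set_total = p(A & B) + p(A & (OMEGA - B))
partition_total = sum(p(A & b_i) for b_i in partition)
conditional_total = sum(p_given(A, b_i) * p(b_i) for b_i in partition)
union_by_inclusion_exclusion = p(A) + p(B) - p(A & B)

print(p(A), two_set_total, partition_total, conditional_total)
print(p(A | B), union_by_inclusion_exclusion)
```

All three totals agree with `p(A)`, and the inclusion/exclusion value agrees with `p(A | B)`, where `|` here is Python's set union, not conditioning.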
Multiplication law. For $p(B) > 0$, $p(A \cap B) = p(A \mid B)\, p(B)$.
Sample space reduction. p(B)>0 implies that
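Bayes' Theorem. $p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}$, for $p(A) > 0$ and $p(B) > 0$.
This may be generalized using the Law of Total Probability: if $A_1, \ldots, A_n$ are pairwise disjoint, their union is the whole sample space, and each $p(A_j) > 0$, then $p(A_i \mid B) = \frac{p(B \mid A_i)\, p(A_i)}{\sum_{j=1}^{n} p(B \mid A_j)\, p(A_j)}$.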
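A small worked example may make the generalized form concrete. This is a sketch only: the function `posterior`, the two coin hypotheses, and all of the numbers are hypothetical. The denominator is computed with the Law of Total Probability over a set of mutually exclusive and exhaustive hypotheses.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return p(A_i | B) for each hypothesis A_i.

    priors[h] is p(A_h); likelihoods[h] is p(B | A_h). The hypotheses are
    assumed to be pairwise disjoint and jointly exhaustive.
    """
    # Law of Total Probability: p(B) = sum_j p(B | A_j) p(A_j)
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    if evidence == 0:
        raise ValueError("p(B) must be positive")
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}


# Hypothetical setup: a coin is either fair or two-headed, and one toss
# lands heads.
priors = {"fair": Fraction(9, 10), "two_headed": Fraction(1, 10)}
likelihood_of_heads = {"fair": Fraction(1, 2), "two_headed": Fraction(1)}

post = posterior(priors, likelihood_of_heads)
print(post["fair"], post["two_headed"])
```

Here p(heads) = 9/20 + 2/20 = 11/20, so the posterior for the fair coin is 9/11 and for the two-headed coin 2/11.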
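Of great importance is the odds form of Bayes' Theorem: for H a hypothesis and E some evidence, $\frac{p(H \mid E)}{p(\neg H \mid E)} = \frac{p(E \mid H)}{p(E \mid \neg H)} \cdot \frac{p(H)}{p(\neg H)}$.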
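The odds form can be sketched numerically as follows. The function names and the numbers are hypothetical: H is the hypothesis that a coin is two-headed, with prior 1/10, and E is a single toss landing heads.

```python
from fractions import Fraction


def posterior_odds(prior_h, p_e_given_h, p_e_given_not_h):
    """Posterior odds of H: (p(E|H) / p(E|not H)) * (p(H) / p(not H))."""
    if prior_h == 1 or p_e_given_not_h == 0:
        raise ValueError("odds are undefined for these inputs")
    prior_odds = prior_h / (1 - prior_h)
    likelihood_ratio = p_e_given_h / p_e_given_not_h
    return likelihood_ratio * prior_odds


def odds_to_probability(odds):
    """Convert odds o = p / (1 - p) back to the probability p."""
    return odds / (1 + odds)


odds = posterior_odds(prior_h=Fraction(1, 10),
                      p_e_given_h=Fraction(1),
                      p_e_given_not_h=Fraction(1, 2))
print(odds, odds_to_probability(odds))
```

The prior odds of 1/9 are multiplied by a ratio of 2, giving posterior odds of 2/9, which correspond to a posterior probability of 2/11. The evidence enters only through the ratio of p(E | H) to p(E | not H).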
In this form, the term
1. The subjective interpretation of probability allows for this. It is possible to be separately committed to events
2. This is discussed further in the subjective/objective post.
3. See e.g. Alan Hájek's What Conditional Probability Could Not Be.
4. I largely follow Colin Howson and Peter Urbach. Scientific Reasoning: The Bayesian Approach. Open Court: La Salle, 1989.
5. Ibid., pp. 18-19.