Swell. But the important fact is not quantitative; it is qualitative: the evidence is varied. But why should variety-of-evidence be important? I look first at a trivial case to motivate the idea, and then move on to a more general statement.
Evidence E may confirm H with respect to ~H, but not with respect to a particular G in ~H. For prior p and posterior q, employing classic conditioning on E, i.e. setting q(E)=1, the previous sentence translates as follows:
p(E|H)/p(E|~H) > 1, but p(E|H)/p(E|G) = 1.
We might say that E is indifferent with respect to H and G in p. But the assumption that G lies in ~H relates these quantities. For then G and its complement in ~H, ~H\G, partition ~H:
p(E|H)/p(E|~H) = p(E|H)/p(E|G ∪ (~H\G)),
which by the definition of conditional probability, de Morgan's law, and finite additivity becomes
p(E|H)p(~H) / [p(E|G)p(G) + p(E|~H\G)p(~H\G)].
So if p(G) is large with respect to p(~H), the extent to which E confirms H against ~H is greatly limited, as the above quantity will approximate p(E|H)/p(E|G)=1 in that case.
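A small numerical sketch may make this limit concrete. It computes the Bayes factor p(E|H)/p(E|~H) when ~H is split into G and the rest of ~H, and shows it shrinking toward 1 as G takes up more of ~H. The function name and all of the probabilities below are made up for illustration; only the formula comes from the argument above.

```python
def bayes_factor_vs_not_h(like_h, like_g, like_rest, p_g, p_not_h):
    """Return p(E|H) / p(E|~H), with ~H partitioned into G and the rest of ~H.

    like_h, like_g, like_rest are p(E|H), p(E|G) and p(E|rest of ~H);
    p_g and p_not_h are the priors p(G) and p(~H), with p_g <= p_not_h.
    """
    p_rest = p_not_h - p_g
    # Finite additivity over the partition of ~H gives p(E|~H).
    like_not_h = (like_g * p_g + like_rest * p_rest) / p_not_h
    return like_h / like_not_h


# Illustrative numbers only (assumptions, not taken from the post).
P_NOT_H = 0.5
LIKE_H = LIKE_G = 0.9   # E is indifferent between H and G
LIKE_REST = 0.1         # E tells strongly against the rest of ~H

for share in (0.1, 0.5, 0.9, 0.99):
    bf = bayes_factor_vs_not_h(LIKE_H, LIKE_G, LIKE_REST, share * P_NOT_H, P_NOT_H)
    print(f"p(G)/p(~H) = {share:.2f}  ->  Bayes factor = {bf:.3f}")
```

With these numbers the Bayes factor falls steadily as the share of ~H occupied by G grows, approaching p(E|H)/p(E|G)=1.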
The limitations imposed on confirmation by the `indifferent' competing hypotheses can be overcome with variety of evidence. That the Sun appears to go around the Earth helps to rule out exotic hypotheses, but it doesn't help to contrast heliocentrism and geocentrism, as both explain this apparent motion. Evolutionary theory is not confirmed over creationism by observations which creationists find unproblematic, such as `microevolutionary' change-over-time. If you are convinced that the canonical gospels and Acts are late forgeries, then the witness testimony they relate will do little to convince you that the Resurrection occurred. If you doubt the soundness of an experimental setup or you judge that it does not contrast the most probable theories, replications of a result should have little to no effect on your probabilities.
The effect of large amounts of a particular type of evidence can be thought of as `filtering' hypotheses, screening out those which fare poorly under that set of observations. But that `type' limits the severity of that filter. Variety-of-evidence can be thought of as a stronger filter, screening out hypotheses which might not have been contrasted by a category of evidence. If confidence in an experimental result is limited by confidence in those who performed the study, the replication of that result by a different group should have a stronger effect than replication by the original group. One may control for inaccuracies in a measuring device by measuring using multiple devices. All of this is common sense.
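The filtering picture can be sketched with a toy calculation. Assume three hypotheses: H, a competitor G that explains evidence of type A just as well as H does, and an exotic X that type A screens out. The hypothesis labels, the likelihoods, and the treatment of observations as independent given each hypothesis are all assumptions chosen for illustration.

```python
def update(prior, likelihoods):
    """One step of Bayesian conditioning over a finite set of hypotheses."""
    unnorm = {h: prior[h] * likelihoods[h] for h in prior}
    total = sum(unnorm.values())
    return {h: w / total for h, w in unnorm.items()}


def posterior_after(prior, observations):
    """Condition on each observation in turn (assumed independent given each hypothesis)."""
    post = dict(prior)
    for likelihoods in observations:
        post = update(post, likelihoods)
    return post


# Hypothetical numbers: type A evidence does not contrast H and G,
# while type B evidence does (but does not contrast H and X).
PRIOR = {"H": 1 / 3, "G": 1 / 3, "X": 1 / 3}
TYPE_A = {"H": 0.9, "G": 0.9, "X": 0.1}
TYPE_B = {"H": 0.9, "G": 0.1, "X": 0.9}

same_type = posterior_after(PRIOR, [TYPE_A] * 6)
varied = posterior_after(PRIOR, [TYPE_A] * 3 + [TYPE_B] * 3)

print("six observations of type A:  ", same_type)
print("three of type A, three of B: ", varied)
```

Six observations of type A leave H and G tied, so the posterior of H cannot climb past 1/2; the same number of observations split across both types pushes H well above either competitor.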
We need to be careful in thinking about a `type' of evidence and how this translates probabilistically. The strongest evidence for an old Earth is radiometric dating, but hypotheses designed to negate the consensus conclusions, e.g. errors in sampling and hypothetical `accelerating decay' rates, are contrasted by the variety of dating methods and the number of experiments. The strength of multiple datings on multiple samples is in the independence of the results; one radio-clock reading does not affect the other. To overcome the overwhelming power of the agreement in dates is to explain not only how a dating may go wrong, but how this agreement should occur given that the Earth did not form ~4.5bya. This problem, especially when examined in light of other agreeing methods, is so insuperable that rejecting the old age hypothesis in favor of the young Earth lies outside of `reasonable' bounds, normally conceived.
The old Earth example is handy for another reason, as it suggests a general approach to the problem often taken by Bayesians. Consider a set of diverse evidence E_1, ..., E_n. We now attempt to capture the notion of diversity in probabilistic terms.2 Intuitively, we may say that observations a and b are similar if p(a|b)>p(a). Readers familiar with probability will recognize that this is simply positive correlation, or dependence. The notion of dependence can be extended to the conditional case: p(a|b&H)>p(a|H). This has an obvious effect on Bayes factors: if we expect a positive correlation of the diverse evidence given H but expect independence or negative correlation of that evidence given ~H, the effect is very powerful. In the case of independence of the E_i given ~H, the Bayes factor is
p(E_1&...&E_n|H) / [p(E_1|~H) ... p(E_n|~H)].
So if the numerator of this term is quite large, e.g. with H the hypothesis that the Earth is approximately 4.5 billion years old and ~H otherwise (assumed to be dominated by young Earth hypotheses), the effect is overwhelming, especially as n grows large. Suppose each E_i denotes a particular dating. If one thinks that each dating has probability 1/2 of getting the `wrong' result on the assumption of a young Earth, then the net Bayes factor is
2^n p(E_1&...&E_n|H).
And I think that this is a gross underestimate. This translates in general terms roughly as follows: variety-of-evidence is important because it screens for hypotheses which conditionally correlate that variety.
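To get a feel for how quickly this grows, here is a minimal sketch. It takes the Bayes factor to be p(E_1&...&E_n|H) divided by the product of the p(E_i|~H), which holds only under the assumption that the E_i are independent given ~H. The value 0.8 for the joint probability given H is a placeholder I have picked for illustration; the 1/2 per dating is the figure used above.

```python
from math import prod


def bayes_factor(joint_given_h, marginals_given_not_h):
    """p(E_1 & ... & E_n | H) / prod_i p(E_i | ~H).

    Assumes the E_i are independent conditional on ~H, so that the
    joint probability given ~H factors into the product of marginals.
    """
    return joint_given_h / prod(marginals_given_not_h)


JOINT_GIVEN_H = 0.8   # hypothetical: chance that all datings agree, given an old Earth
P_WRONG = 0.5         # chance a single dating reads `old' given a young Earth

for n in (1, 5, 10, 20):
    bf = bayes_factor(JOINT_GIVEN_H, [P_WRONG] * n)
    print(f"n = {n:2d}  ->  Bayes factor = {bf:,.1f}")
```

Each additional independent dating doubles the factor, so the size of the numerator matters far less than the number of agreeing results.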
Up to our intuitions, this correlative account does a lot.3 But its practical, explicit application calls for caution; the assumption of conditional independence across a large body of evidence is not one to be made casually. That topic itself deserves one or several posts.
1. See 29+ Evidences for Macroevolution.
2. Here I follow Howson and Urbach, Scientific Reasoning, 1989, pp.112-115.
3. As far as I am aware, this approach is incomplete, even with extensions. See Andrew Wayne, Bayesianism and Diverse Evidence, 1995. Available on JSTOR for those with access.