Showing posts with label Logic. Show all posts
Showing posts with label Logic. Show all posts

Tuesday, October 27, 2015

Occam+Bayes=Induction

  A classic problem in philosophy and the philosophy of science is how to justify induction. That is, how to rationally go from the fact that X is true in N previously observed cases to the belief that it is true in all cases, or at least in an additional, unobserved case. We will here propose a quick and simple method to justify induction, based on the combination of Occam's razor (to choose hypotheses) and Bayesian inference to update epistemic probabilities.


Notation

Let us introduce the following notation. Let \(H\) be some hypothesis which we want to judge for plausibility. Let \(X_k\) be the fact that \(X\) is true in the kth instance. Let \(X^n\) be the fact that \(X\) is true in the first n cases, that is \[X^n=X_1 \cap X_2 \cap \cdots \cap X_n=\bigcap_{k=1}^{n}X_k\] so that \[X^{n-1}\cap X_n=X^n\] Thus \(P\left ( X^n|H \right ) \) is the (epistemic) probability that we observe X in n cases, supposing H is true, and \(P\left ( H|X^n \right ) \) is the (epistemic) probability that H is true, supposing we observe X to be the case in n cases.


Occam's Razor

There are three basic, simplest hypotheses we can form, all the rest being more complex. These three are the
  • Proinductive (P) hypothesis: the chance of X happening again increases as we see more instances of it.
  • Contrainductive (C) hypothesis: the chance of X happening again decreases as we see more instances of it.
  • Uninductive (U) hypothesis: the chance of X happening again stays the same as we see more instances of it.
For concreteness, let \(F_H(n)=P\left ( X_{n}|H \cap X^{n-1} \right )\). Thus we say that, for \(m > 0\), \(F_P(n+m) > F_P(n)\), and \(\lim_{n \rightarrow \infty} F_P(n)=1\), and \(F_C(n+m) < F_C(n)\), and \(\lim_{n \rightarrow \infty} F_C(n)=0\), and \(F_U(n)=F_U(0)\).


Bayesian Inference

We want to find \(P\left ( H|X^n \right ) \) for the hypotheses listed in the previous section. We have \[ P\left ( X^n|H \right )=P\left ( X_n \cap X^{n-1}|H \right )=P\left ( X_n |X^{n-1} \cap H \right ) \cdot P\left ( X^{n-1} |H \right )=F_H(n) \cdot P\left ( X^{n-1} |H \right ) \] Therefore \[ P\left ( X^n|H \right )=\prod_{k=1}^{n} F_H(k) \] Suppose that there are \(N\) mutually exclusive and collectively exhaustive hypotheses. Then, Bayes' formula states: \[ P(H_m|A)=\frac{P(A|H_m)P(H_m)}{P(A|H_1)P(H_1)+P(A|H_2)P(H_2)+\cdots+P(A|H_N)P(H_N)} \] Thus, we have \[ P(H_m|X^n)=\frac{P(X^n|H_m)P(H_m)}{P(X^n|H_1)P(H_1)+P(X^n|H_2)P(H_2)+\cdots+P(X^n|H_N)P(H_N)} \] Therefore \[ P(H_m|X^n)=\frac{P(H_m)\prod_{k=1}^{n} F_{H_m}(k)}{P(H_1)\prod_{k=1}^{n} F_{H_1}(k)+P(H_2)\prod_{k=1}^{n} F_{H_2}(k) + \cdots + P(H_N)\prod_{k=1}^{n} F_{H_N}(k)} \] Let us suppose that the three hypotheses mentioned above are collectively exhaustive. Suppose, for concreteness that \(F_P(n)=\frac{n}{n+1}\), \(F_C(n)=\frac{1}{n+1}\), and \(F_U(n)=\frac{1}{2}\). Thus \(\prod_{k=1}^{n} F_{P}(k)=\frac{1}{n+1}\), and \(\prod_{k=1}^{n} F_{C}(k)=\frac{1}{(n+1)!}\), and \(\prod_{k=1}^{n} F_{U}(k)=\frac{1}{2^n}\). Let \(P(P)=p\) and \(P(C)=q\) and \(P(U)=r\) where \(p+q+r=1\). Then: \[ P(P|X^n)=\frac{p\frac{1}{n+1}}{p\frac{1}{n+1}+q\frac{1}{(n+1)!}+r\frac{1}{2^n}} \] \[ P(C|X^n)=\frac{q\frac{1}{(n+1)!}}{p\frac{1}{n+1}+q\frac{1}{(n+1)!}+r\frac{1}{2^n}} \] \[ P(U|X^n)=\frac{r\frac{1}{2^n}}{p\frac{1}{n+1}+q\frac{1}{(n+1)!}+r\frac{1}{2^n}} \] A simple assessment of limits shows that the former goes to 1 quite rapidly, for increasing n, for any nonzero p, and the latter two go to zero. In fact, for \(p=q=r=1/3\), for \(n>10\), \(P(P|X^n)>0.99\), and for \(n>17\), \(P(P|X^n)>0.9999\).

This example is meant to be only illustrative, to show the general way in which Occam's razor, combined with Bayesian inference, leads to a support of induction. The same things happening repeatedly lends credence to the hypothesis that the same things happen repeatedly, and detracts from the hypothesis that the same things are unlikely to happen repeatedly, or always happen with the same probability. In a very similar way, a coin repeatedly coming up heads supports the hypothesis that it is biased to come up heads, and detracts from the hypotheses that it is biased to come up tails or is fair. This may seem obvious, but it is beneficial to see exactly how the mathematical machinery supports this intuition.

We may also wish to include other hypotheses, but we must first assess the prior probabilities that they are true, and Occam's razor advises taking the inverse probability as inverse to the complexity of the hypothesis. Thus, even if on the hypothesis, observing n X's is more likely than on the three discussed, it would needs be more complex or ad hoc, and so would have a significantly lower prior probability.

Tuesday, September 29, 2015

Liars, Logic, and Information Theory

One of the most common types of logic puzzles involves two tribes, one that always tells the truth and another that always tells lies. There are many versions and variations of puzzles with this setup, but we can develop a method of approach that will work generally. The two main versions fall into 2 categories:
  1. Identification: we have a group of N people with some known possible set of identifications, and we ask questions to determine what tribe each is from.
  2. Information: We have a group of N people with some known possible set of identifications, and we ask them questions to determine M bits of information (independent yes/no questions). We do not need to identify the tribe of each person. For concreteness, we will take the bits to be 1s or 0s (i.e. we want to find whether the bit is 1 or 0)
The questions must be asked individually, and must be yes/no questions. We assume that the persons asked know all information relevant to the puzzle and understand the questions, supposing they are comprehensible.


A Brief Primer in some Information concepts


The fundamental unit of information is the bit. A single bit answers one yes/no question. If both answers are equally likely, the answer gives the most information, as otherwise you could guess the answer more easily. (In fact, the formula for the effective number of bits, if the chance of a "yes" answer is p, is given by: \(-p\log_2(p)-(1-p)\log_2(1-p) \approx 4p(1-p)\)). If there are \(2^N\) equally possible options, it takes N bits of information to narrow it down to one: in general, an additional bit halves the possibility space. If there are M possibilities, and \(2^{N-1}< M \leq 2^N\), then N bits of information are required. From a deterministic source--that is, a source with known, predictable behavior--one answer to one yes/no question yields at most 1 bit of information, and exactly one if both answers are equally probable. In general, if we discover M bits of information with N questions, if we only want a smaller number of bits, we will need fewer questions.


We will discuss some specific cases, describing some general methods of approaching the problem. We will forgo trivial cases, like asking a 1-bit question to someone of a known tribe, or identifying a person from an unknown tribe.

Information: One Person of Unknown Tribe, One Bit


Clearly we must ask at least one question, but can we determine it in exactly one question? Indeed we can. Our goal is to formulate a question such that, regardless of whether the person is a liar or a truther, the answer will correspond to the truth. We thus construct the following table, and look for a question such which would produce the listed "real answers" (answers taking into account whether the teller is a liar or truther).

Bit Value Identity Given Answer Honest Answer
1 Truther Yes Yes
1 Liar Yes No
0 Truther No No
0 Liar No Yes

The simplest way to construct such a question is just to ask one that corresponds to affirmative answers. In this case, the most easily constructed question is
Is one of the following true: the bit is 1 and you are a truther, the bit is 0 and you are a liar?
Regardless of whether the person asked is a truther or liar, the answer will always be "yes" if the bit is 1 and "no" if it is 0. The question may be found to simplify to something more natural sounding, but the question as given is sufficient. Moreover, if we require N bits of information, we can achieve such in exactly N questions. This will be our general approach. We will make a table in which the given answer corresponds to the information we seek. We will then formulate a question such as to produce the desired answer. This can be done most easily by forming a disjunction of the answers producing an affirmative.

Identification: One Person of Unknown Tribe, Unknown Language


In this case, the tribespeople have a language different than yours. They can understand your questions but reply in a way you can't understand. We will assume that you know the words for "yes" and "no" are "da" and "ja", but you don't know which corresponds to which. If you do not even know what the possible words for "yes" and "no" are, you can find this out with one additional question, merely by asking anything and then knowing that the response either means "yes" or "no". The question is then whether you can identify the tribe of the person, and in as few questions as possible. Given that we only seek one bit of information (the person is either a truther or a liar), we will attempt to do so with a single question. We will look for a question such that the response corresponds to the identity of the person. For concreteness, we will take "Da" to be indicative of a truther, "Ja" of a liar.

Identity Translation of "Da" Given Answer Translated Answer Honest Answer
Truther Yes Da Yes Yes
Truther No Da No No
Liar Yes Ja No Yes
Liar No Ja Yes No

Again, the simplest way to construct such a question is just to ask one that corresponds to affirmative answers, by a simple disjunction. In this case, the honest answer is "yes" exactly when "Da" means "yes". So we simply ask:
Does "Da" mean "yes"?
A truther will always answer "Da", and a liar will always answer "Ja". Note that we cannot determine what "Da" actually means from this question, and this accords with information theory concepts. We can only get one bit of information from one question. If we wanted to identify what "Da" meant without knowing the identiy, by a similar method we would find that the following question achieves that:
Is one of the following true: "Da" means "yes" and you are a truther, "Da" means "no" and you are a liar?
If the answer is "Da", "Da" means "yes".

Information: One Person of Unknown Tribe, Unknown Language, One Bit


This case is much like the preceding one, except we require neither then meaning of "Da" nor the identity of the person. As we need only one bit of information, we require at least one question. We will show how to do it in exactly one question. As before, we construct a table, but this time with three independent variables: the value of the bit, the identity of the person, and the meaning of "Da".

Bit Value Identity Translation of "Da" Given Answer Translated Answer Honest Answer
1 Truther Yes Da Yes Yes
1 Truther No Da No No
1 Liar Yes Da Yes No
1 Liar No Da No Yes
0 Truther Yes Ja No No
0 Truther No Ja Yes Yes
0 Liar Yes Ja No Yes
0 Liar No Ja Yes No

By the same method, the easiest (though not simplest) question to ask is:
Is one of the following true: you are a truther and "Da" means "yes" and the bit is 1, you are a liar and "Da" means "no" and the bit is 1, you are a truther and "Da" means "no" and the bit is 0, you are a liar and "Da" means "yes" and the bit is 0 ?
A simpler way would be to ask:
Is an odd number of the following true: the bit is 1, you are a truther, "Da" means "yes"?
In general, we can see that we can always get exactly one bit of information from one question, given certain other constraints. Not knowing the language or the identity of the person asked are no hindrances to getting information. Also, if we have \(2^M\) people from potentially different tribes who speak the same unknown language, or even if we only know the potential words for "yes" and "no" for one of their languages, we can still identify all of them in exactly M questions just by asking the one person M questions.

Identification: Truther, Liar, and Unhelpful in Unknown Order.


In this case, we have three people known to be some permutation of truther, liar and a third kind we call unhelpful. The unhelpful is a third type of tribesperson who answers so as to be maximally unhelpful. That is, he will answer so as to prevent you from getting information. The goal is to identify him regardless, as well as the other two. The first question is whether we can identify the three, and then, if it is possible, to do so in as few questions as we can. As there are 6 possible orderings, we will need 3 bits of information, corresponding to at least 3 questions. We must ask each person at least one question, as only asking 2 or fewer risks only asking the unhelpful, who provides no information. However, if we ask each of them one question, we only get two bits of information, as the unhelpful provides none. Thus we must ask at least 4 questions, with the 4th question being asked of one of the non-unhelpfuls.

In fact there is a way to do this. We ask a question which the truther and the liar will answer differently. We then take the odd one out among the three, who is guaranteed to be either a truther or a liar (in fact, the way he answers will decide which) and then ask him for one more bit of information to identify one of the others, which we have already described how to do. So, for instance, we can ask all three "Do you exist?" (or, if the language is unknown "Does 'Da' mean 'yes'?"). And then concoct a question to ask the odd one out to get the final requisite bit (left as an exercise for the reader). Thus we can achieve it in exactly 4 questions. In fact, for the first three questions, we only get 2/3 of a bit of information per answer, as, for each answer, we get 1 bit with 2/3 probability.

Monday, June 16, 2014

Some Set Theory

      Set theory is a fundamental branch of mathematics that studies sets, that is, collections of objects that can be treated as wholes.
"A set is a Many that allows itself to be thought of as a One."
- Georg Cantor.

A set has various elements or members, which may either be individual objects or themselves sets. An extension to this is the inclusion of the empty set (the set containing no objects, symbolized \(\emptyset\) or \(\left \{\right\}\)) as an allowable set. If set A has a, b and c as elements, we symbolize that as A={a,b,c}. Note that, in set theory, order and multiplicity don't matter, so A is also equal to {b,c,a,a,b}.


Definitions and Notation

\( x \in A \)         x is an element/member of A.
\( x \notin A \)                 x is not an element/member of A.
\( A \subseteq B \)                 Set A is a subset of B. That is, for all x, if x is an element of A, then x is an element of B. Note that "A is a subset of B" is equivalent to "B is a superset of A".
\( A = B \)                 A is equivalent to B. That is, for all x, x is an element of A if and only if x is an element of B.
\( A \neq B \)                 A is not equivalent to B.
\( A \subset B \)                 \( A \subseteq B \) and \( A \neq B \). If \( A \subset B \), we say A is a proper subset of B.
\(A \cap B \)                 The intersection of sets A and B. That is, the set that contains all of the elements that are in both A and B. Sets A and B are said to be disjoint if \(A \cap B = \emptyset\).
\(A \cup B \)                 The union of sets A and B. That is, the set that contains all of the elements that are in either A or B.
\(A \setminus B \)                 The set difference of sets A and B. That is, the set that contains all of the elements in A that are not in B.
\(\mathbb{Z} =\left \{ ...,-2,-1,0,1,2,... \right \} \)                 The set of all integers. That is, the set that includes 0 and every other number that can be achieved by successively adding or subtracting 1 from 0.
\(\mathbb{N} =\left \{1,2,3,... \right \} \)                 The set of all natural numbers. That is, the set that includes 1 and every other number that can be achieved by successively adding 1 to 1.
\(\mathbb{W} =\left \{0,1,2,3,... \right \} \)                 The set of all whole numbers. That is, the set that includes 0 and every other number that can be achieved by successively adding 1 to 0.
\( \left\{\begin{matrix} x \end{matrix}\right|\left.\begin{matrix} Px \end{matrix}\right\} \)                 The set of all elements of the form \(x\) such that \(Px\) is true of x. For instance \(\mathbb{N}=\left\{\begin{matrix} x \end{matrix}\right|\left.\begin{matrix} x \in \mathbb{Z} \wedge x>0 \end{matrix}\right\} \).
\( \mathbb{Q}= \left\{\begin{matrix} \frac{x}{y} \end{matrix}\right|\left.\begin{matrix} x \in \mathbb{Z} \wedge y \in \mathbb{N} \end{matrix}\right\} \)                 The set of all rational numbers. That is, the set of all real numbers expressible as a ratio of integers with non-zero denominator.
\( \mathbb{N}_{n}=\left\{1,2,3,...,n\right\} \)                 The truncated set of the natural numbers up to \(n\). Another way of expressing it would be: \( \mathbb{N}_{n}=\left\{\begin{matrix} x \end{matrix}\right|\left.\begin{matrix} x \in \mathbb{N} \wedge x \leq n \end{matrix}\right\} \).
\( \mathbb{R}=\left (-\infty, \infty \right) \)                 The set of all real numbers. As a note, we often see the shorthand \((a,b)\) (not to be confused with an ordered pair!) to mean \( \left\{\begin{matrix} x \end{matrix}\right|\left.\begin{matrix} x \in \mathbb{R} \wedge a < x \wedge x < b \end{matrix}\right\} \). We may also see the shorthand \([a,b]\) to mean \( \left\{\begin{matrix} x \end{matrix}\right|\left.\begin{matrix} x \in \mathbb{R} \wedge a \leq x \wedge x \leq b \end{matrix}\right\} \). The combined notation \((a,b]\) should be clear enough.
\(\mathbb{W} =\left \{0,1,2,3,... \right \} \)                 The set of all whole numbers. That is, the set that includes 0 and every other number that can be achieved by successively adding 1 to 0.
\(\mathbb{P} =\left \{2,3,5,7,11,... \right \} \)                 The set of all prime numbers. That is, the subset of the natural numbers that have exactly two factors: 1 and themselves. By a well-known theorem, we know that there are infinitely many prime numbers. They are often represented as a sequence of integers in ascending order: \(\mathbb{P}=\{p_{0},p_{1},p_{2},p_{3},...\}\).
\(\mathbb{A} \)                 The set of all algebraic numbers. That is, the set of numbers that solve polynomials with integer coefficients. We can also introduce the notation \(\mathbb{A}_{n}\) which is the set of all algebraic numbers of order \(n\), that is the set of numbers that solve polynomials of order \(n\) with integer coefficients. In other words: \( \mathbb{A}_n=\left\{\begin{matrix} x \\ \; \end{matrix}\right|\left.\begin{matrix} x \in \mathbb{R} \wedge \exists A = \{a_{0},a_{1},a_{2},...a_{n}\} \sim \mathbb{N}_{n+1} \\ \wedge c \in A\Rightarrow c \in \mathbb{Z} \wedge 0=\sum_{k=0}^{n} a_{k} x^{k} \end{matrix}\right\} \).
We then take the union: \(\mathbb{A}=\mathbb{A}_{1} \cup \mathbb{A}_{2} \cup \mathbb{A}_{3} \cup ... \). A number in the set \(\mathbb{R} \setminus \mathbb{A}\) is called a transcendental number.
\( (a,b) \)                 An ordered pair, that is a pairing of mathematical objects in which the order matters. We can define them in the following way, for technical purposes: \( (a,b) = \left \{ \left \{ a \right \}, \left \{ a,b \right \} \right \} \).
\( A \times B \)                 The cartesian product of sets A and B. That is, the set of all ordered pairs \( (a,b) \) such that \( a \in A \) and \( b \in B \).
\( \mathcal{P}(A) \)                 The power set of set A. That is, the set of all subsets of /(A/). We can write this as \( \mathcal{P}(A)=\left\{\begin{matrix} X \end{matrix}\right|\left.\begin{matrix} X \subseteq A \end{matrix}\right\} \).
\( f: A \rightarrow B \)                 f is a function from A to B. That is, f takes an element of A and returns an element of B. Strictly speaking, a function from A to B is a subset of \( A \times B \) with the properties
  • \( \left\{\begin{matrix} a \end{matrix}\right|\left.\begin{matrix}(a,b)\in f \end{matrix}\right\} = A \)
  • If \((a,b)\in f \) and \((a,c)\in f \), then \(b=c \).
A is called the domain of f and B is called the range or codomain of f. Often we will see the notation \(y=f(x)\), which means that f associates y with x, or \((x,y)\in f\). In this case, y is called the image of x under f. More generally, if \(X \subseteq A \), then \(f(X)=Y= \left\{\begin{matrix} y \end{matrix}\right|\left.\begin{matrix} y=f(x) \wedge x \in X \end{matrix}\right\}\). Y is likewise called the image of X under f. Clearly \(Y \subseteq B \).
\( f: A \rightarrowtail B \)                 f is an injective function from A to B. That is, the function is one-to-one, or, if \(f(a)=f(b)\) then \(a=b\). Every injective function f has an inverse function, \(f^{-1}: f(A) \rightarrow A \). We can say \(f^{-1}=\left\{\begin{matrix} (b,a) \end{matrix}\right|\left.\begin{matrix} (a,b) \in f \end{matrix}\right\} \). Clearly the inverse of an injective function is injective. We can also see that \(f(f^{-1}(a))=a\), and \(f^{-1}(f(a))=a\).
\( f: A \twoheadrightarrow B \)                 f is a surjective function from A to B. That is, the function is onto. In other words, \(f(A)=B\), i.e. \( \left\{\begin{matrix} b \end{matrix}\right|\left.\begin{matrix}(a,b)\in f \end{matrix}\right\} = B \).
\( f: A \leftrightarrow B \)                 f is a bijective function from A to B. That is, f is both injective and surjective. The inverse of a bijective function is also bijective.
\( A \preceq B \)                 There exists an injective function from \(A\) to \(B\).
\( A \sim B \)                 There exists a bijective function from \(A\) to \(B\).
\( A \nsim B \)                 There does not exist a bijective function from \(A\) to \(B\).
\( A \prec B \)                 There exists an injective function from \(A\) to \(B\), but no bijective function from \(A\) to \(B\). That is, \( A \preceq B \) and \( A \nsim B \).
\( \left | A \right | \)                 The cardinality of \(A\). That is, a value associated with a set that can be used to compare sets with respect to some sense of magnitude.
\( \left | A \right | \leq \left | B \right |\)                 \( A \preceq B \). We say this as "the cardinality of A is less than or equal to the cardinality of B", though this is only something of a manner of speaking.
\( \left | A \right | = \left | B \right |\)                 The cardinality of \(A\) is the same as the cardinality of \(B\), meaning there exists a bijective function from \(A\) to \(B\). In other words, \( A \sim B \). When this is the case, it is often convenient to say "A and B are equinumerous", but this is a technical term, and should not be naively interpreted.
\(\aleph_{0}\)                 \(\aleph_{0}=\left | \mathbb{N} \right |\). \(\aleph_{0}\) is the "smallest" infinite cardinal. In other words, if \( \left | A \right |< \aleph_{0}\), then A is a finite set. Also, a set A is said to be countable iff \( \left | A \right | \leq \aleph_{0}\). If \(A \sim \mathbb{N} \), then A is said to be countably infinite.
\(\mathbb{S}_{n}\)                 \(\mathbb{S}_{n}=\left | \mathbb{N}_{n}\right |\). A is a finite set iff for some \(n\), \(\left | A \right |= \mathbb{S}_{n} \), where \(n \in \mathbb{N} \). Typically, if \(\left | A \right |= \mathbb{S}_{n} \), where \(n \in \mathbb{N} \), we say "A has n elements", though we will here avoid all notions of the "number of elements".
\( \left | A \right |+\left |B \right |\)                 \( \left | A \right |+\left |B \right |=\left |A \cup \tilde{B} \right |\), where \(\tilde{B} \sim B \) and \(A \cap \tilde{B}=\emptyset \).
\( \left | A \right |\times\left |B \right |\)                 \( \left | A \right |\times\left |B \right |=\left | A \times B \right |\).
\( \max \left (\left | A \right |,\left |B \right | \right)\)                 \( \max \left (\left | A \right |,\left |B \right | \right) = \left\{\begin{matrix} \left | A \right |, \\ \left | B \right |, \end{matrix}\right. \begin{matrix} B \preceq A \\ A \preceq B \end{matrix}\) .
\( \min \left (\left | A \right |,\left |B \right | \right)\)                 \( \min \left (\left | A \right |,\left |B \right | \right) = \left\{\begin{matrix} \left | A \right |, \\ \left | B \right |, \end{matrix}\right. \begin{matrix} A \preceq B \\ B \preceq A \end{matrix}\) .

Axioms of Set Theory


The most typical axiomatization of set theory is that of Zermelo-Fraenkel with the axiom of choice (ZFC). We will here list these axioms:
Extensionality                 Sets A and B are equivalent (symbolized \(A=B\)), iff, for all x, \(x \in A\) iff \(x \in B\).
Regularity                 If A is a non-empty set, then, for some element x of A, \(A \cap x=\emptyset\).
Schema of Specification                 If, for any x, we can determine whether or not Px is true of x (P being some predicate), then we can always construct the set \( \left\{\begin{matrix} x \end{matrix}\right|\left.\begin{matrix} Px \end{matrix}\right\} \).
Pairing                 If A and B are sets, then there exists a set C such that \(A \in C\) and \(B \in C\).
Union                 If A is a set, then there exists a set B such that, if \(Y \in A\) and \(x \in Y\) then \(x \in B\). More generally, the union of any collection of sets exists.
Schema of Replacement                 If \(f\) is a definable function, and A is a set in the domain of \(f\), then, \(f(A)\) is a set.
Infinity                 There exists a set X such that \(\emptyset \in X\), and, if \(y \in X\) then \(\left \{y \right \} \in X\). That is, \(X=\left \{\emptyset, \left \{\emptyset\right \}, \left \{\left \{\emptyset\right \}\right \},...\right \} \). In particular, X contains infinitely many elements.
Power Set                 If A is a set, then \(\mathcal{P}(A) \) is also a set.
Choice                 If \(A_{1}\), \(A_{2}\), \(A_{3}\), ... is a sequence of non-empty sets, then there exists a set \(X=\left \{x_{1},x_{2},x_{3},...\right \}\), such that \(x_{n} \in A_{n}\).


Some Elementary Theorems


We will now discuss a few theorems. Theorems stated without proof are beyond the level of our discussion or are too involved. The reader is directed to a more technical discussion if more detail is desired.
Theorem: If \(A \subseteq B \) then \(A \preceq B \).
Proof: Let \(f(x)=x\). Then \(f\) is injective from A to B.

Theorem: If \(f: A \rightarrowtail B\), \(C \subseteq A\), \(D \subseteq A\) and \(C \cap D=\emptyset\), then \(f(C) \cap f(D)=\emptyset\).
Proof: Assume \(C \cap D=\emptyset\) and \(f(C) \cap f(D)=F\neq\emptyset\). Let \(\phi \in F\) then, for some \(c \in C\) and some \(d \in D\), \(f(c)=f(d)=\phi\). However, \(f\) is injective, and therefore, if \(f(c)=f(d)\), then \(c=d\). Therefore, \(c \in C \cap D\), and therefore \(C \cap D \neq \emptyset\), which contradicts our assumption. Therefore \(f(C) \cap f(D)=\emptyset\).
Corollary: If \(f: A \rightarrowtail B\), then \(f(C \cap D)=f(C) \cap f(D)\).
Corollary: If \(f: A \rightarrowtail B\), then \(f(C \cup D)=f(C) \cup f(D)\).

Theorem: \(A \sim A \).
Proof: The identity function \(f(x)=x\) is a bijection from A to A.

Theorem: If \(A \sim B \) then \(B \sim A \).
Proof: Let \(f\) be a bijective function from A to B. As all bijective functions have inverses, \(f^{-1}\) is a bijection from B to A.

Theorem: If \(A \sim B \) and \(B \sim C \) then \(A \sim C \).
Proof: Let \(f\) be a bijective function from A to B, and \(g\) be a bijective function from B to C. Then the composition \(g \circ f\) is a bijective function from A to C.

Theorem: If \(a \notin A \) and \(A \sim \mathbb{N}_{n} \) then \(A \cup \left\{a \right\} \sim \mathbb{N}_{n+1}\).
Proof: Let \(f\) be a bijective function from A to \(\mathbb{N}_{n}\). Let \(g(x)=\left\{\begin{matrix} f(x), \;\; \;\; \\ n+1, \;\; \;\; \end{matrix}\right. \begin{matrix} x \in A \\ x=a \end{matrix}\).
Then g is bijective from \(A \cup \left\{a \right\}\) to \(\mathbb{N}_{n+1}\).
Corollary: If \(B \cap A=\emptyset \) and \(B \sim \mathbb{N}_{m} \) and \(A \sim \mathbb{N}_{n} \) then \(A \cup B \sim \mathbb{N}_{n+m}\).

Theorem: If \(a \in A \) and \(A \sim \mathbb{N}_{n} \) then \(A \setminus \left\{a \right\} \sim \mathbb{N}_{n-1}\).
Proof: Let \(f\) be a bijective function from \(\mathbb{N}_{n}\) to A. Suppose \(a=f(m)\). Let \(g(x)=\left\{\begin{matrix} f(x), \;\; \;\; \\ f(x+1), \;\; \;\; \end{matrix}\right. \begin{matrix} x < m \\ x \geq m \end{matrix}\).
Then g is bijective from \(\mathbb{N}_{n-1}\) to \(A \setminus \left\{a \right\}\).
Corollary: If \(B \subseteq A \) and \(B \sim \mathbb{N}_{m} \) and \(A \sim \mathbb{N}_{n} \) then \(A \setminus B \sim \mathbb{N}_{n-m}\).

Theorem: If \(A \sim B \) and \(a \in A \) and \(a \in B \) then \(A \setminus \left\{a \right\} \sim B \setminus \left\{a \right\} \).
Proof: Let \(f\) be a bijective function from A to B. Suppose \(f(c)=a\). Define \(g(x)=\left\{\begin{matrix} a, \\ c, \\ x, \;\; \end{matrix}\right. \begin{matrix} x=c \;\;\;\; \;\; \\ x=a \;\;\;\; \;\; \\ x \neq a \wedge x \neq c \end{matrix}\).
Clearly g is bijective (in fact, it is its own inverse) and \(g(A)=A\). Also, \((f \circ g)(a)=a\), and \(f \circ g\) is bijective. Therefore, \(f \circ g\) is a bijection from \(A \setminus \left\{a \right\}\) to \(B \setminus \left\{a \right\}\).
Corollary: If \(A \sim B \), and \(C \sim \mathbb{N}_{n}\) for some \(n \in \mathbb{N}\), where \(C \subseteq A \) and \(C \subseteq B \), then \(A \setminus C \sim B \setminus C \).
Proof: Apply the theorem above \(n\) times. Note that the set subtracted must be finite. The theorem is inapplicable for infinite sets. In fact, as a counterexample for infinite sets: \(\mathbb{Z} \setminus \mathbb{N} \nsim \mathbb{W} \setminus \mathbb{N}\) even though \(\mathbb{Z} \sim \mathbb{W}\).

Theorem: \(\mathbb{N}_{n} \sim \mathbb{N}_{m} \) if and only if \( {n = m} \).
Proof: Suppose that \({n \neq m}\) and \(\mathbb{N}_{n} \sim \mathbb{N}_{m} \). Without loss of generality, \(n < m \). By a theorem proved above, \(\mathbb{N}_{n} \setminus \mathbb{N}_{n} \sim \mathbb{N}_{m} \setminus \mathbb{N}_{n} \). But \(\mathbb{N}_{n} \setminus \mathbb{N}_{n} = \emptyset \). However, \(m \in \mathbb{N}_{m} \setminus \mathbb{N}_{n} \), so \(\mathbb{N}_{m} \setminus \mathbb{N}_{n} \neq \emptyset\). Clearly, then, it is impossible that \(\mathbb{N}_{n} \setminus \mathbb{N}_{n} \sim \mathbb{N}_{m} \setminus \mathbb{N}_{n} \), and so our assumption that \({n \neq m}\) must be false. Conversely, if \({n=m} \) then \(\mathbb{N}_{n}= \mathbb{N}_{m} \), and thus \(\mathbb{N}_{n} \sim \mathbb{N}_{m} \).

Theorem: For any sets \(A\) and \(B\), there is an injection from \(A\) to \(B\) if and only if there is a surjection from \(B\) to \(A\).
Proof: Suppose \(f: A \rightarrowtail B\). Then, let \(a \in A\) be any element in \(A\). Now, define \(g(x)=\left\{\begin{matrix} f^{-1}(x), \;\; \;\; \\ a, \;\; \;\; \end{matrix}\right. \begin{matrix} x \in f(B) \\ x \notin f(B) \end{matrix}\)
Clearly \(g\) is surjective from \(B\) to \(A\).
Now suppose \(f: A \twoheadrightarrow B\). That means that, for every \(b\in B\) there is at least one \(a \in A\) such that \(f(a)=b\). Now define \(g(x)=a\), where \(a\) is arbitrarily chosen for any \(x\), from among those \(a \in A\) such that \(f(a)=x\). Clearly \(g\) is injective from \(B\) to \(A\).
Corollary: "There is a surjection from \(A\) to \(B\) " \( \Leftrightarrow B \preceq A\).

Theorem: For any sets \(A\) and \(B\), there exists either an injection from \(A\) to \(B\) or a surjection from \(A\) to \(B\).
Proof (Informal): If there is no function from \(A\) to \(B\) that is either an injection or an injection, then, for any function \(f\), there are \(a_1,a_2 \in A\) such that \(a_1 \neq a_2\) and \(f(a_1)=f(a_2)\) and a \(b \in B\) such that \(b \notin f(A)\). But clearly we can construct a new function such that \(f(a_2)=b\). Moreover, we can continue this for an arbitrary set of \(\left\{\begin{matrix} b \end{matrix}\right|\left.\begin{matrix} b \in B \wedge b \notin f(A) \end{matrix}\right\}\) and/or \(\left\{\begin{matrix} a \end{matrix}\right|\left.\begin{matrix} a \in A \wedge \exists a' \in A, a' \neq a, f(a)=f(a') \end{matrix}\right\}\). Thus, we can always exhaust at least one set, as each set has a set of larger cardinality, and we can continue this up to any cardinality. Thus, we can exhaust either the domain (which would make the function injective) or the co-domain (which would make the function surjective).
Corollary: For any sets A and B, either \( A \preceq B \) or \( B \preceq A \).
Corollary: For any sets A and B, exactly one of the following is true:
  • \(A \prec B\)
  • \(B \prec A\)
  • \(A \sim B\)

Theorem: If \( A \preceq B \) and \( B \preceq A \) then \(A \sim B \).
Proof: Let \( f: A \rightarrowtail B \) and \( g: B \rightarrowtail A \) and \(h=g \circ f\). Let us introduce the notation \(h^{k}(x)\) defined by \(h^{0}(x)=x\) and \(h^{n+1}(x)=h(h^{n}(x))\). We now let \( C=A \setminus g(B)\) and \( X=h^{0}(C) \cup h^{1}(C) \cup h^{2}(C) \cup ...\).
We then define \(k(x)=\left\{\begin{matrix} f(x), \\ g^{-1}(x), \end{matrix}\right. \begin{matrix} x \in X \;\;\;\; \\ x \notin X \;\;\;\; \end{matrix}\).
We can then verify that \(k\) is bijective from \(A\) to \(B\):
  • Let \(b \in B\). If \(b \in f(X)\), then, for some \(a \in X \subseteq A\), \(k(a)=b\). Otherwise, we can find an \(a\) such that \(a=g(b)\). By construction, \(a \notin C\), and \(b \notin f(X)\), and since \(f(h^{n}(X)) \subseteq f(X)\), we find \(a \notin X\) and so \(k(a)=b\). Thus, for any \(b \in B\), \(b=k(a)\) for some \(a\in A\). Therefore \(k\) is surjective.
  • Assume, for some \(c \in X\) and some \(d \notin X\), \(f(c)=g^{-1}(d)\), or \(d=g(f(c))\). Therefore, for some \(n \in \mathbb{W}\), \(c \in h^{n}(C)\). But then \(d \in g(f(h^{n}(C)))=h^{n+1}(C) \subseteq X\). But that contradicts our assumption that \(d\notin X\), and thus, if \(k(c)=k(d)\), then \(c=d\). Therefore, \(k\) is injective.

Theorem: For any set A, \(A \prec \mathcal{P}(A) \).
Proof: Clearly \(f(x)=\left\{x \right\}\) is an injection from A to \(\mathcal{P}(A)\). Let \(f: A \rightarrow \mathcal{P}(A)\) be any function. Define \(S= \left\{\begin{matrix} x \end{matrix}\right|\left.\begin{matrix} x \notin f(x) \end{matrix}\right\} \). Clearly, then, \(x \in S \Leftrightarrow x \notin f(x)\). Assume that, for some \(k \in A\), \(S=f(k)\). In that case, \(k \in f(k) \Leftrightarrow k \notin f(k)\), which is clearly impossible. Thus, our assumption fails, and there is no k such that \(k \in A\), \(S=f(k)\). Therefore, as \(S \in \mathcal{P}(A)\), \(f\) is not surjective, and thus not bijective.
Corollary: For any set A, \(A \prec \mathcal{P}(A) \prec \mathcal{P}(\mathcal{P}(A)) \prec ... \).
Corollary: \(\mathbb{N} \prec \mathcal{P}(\mathbb{N}) \prec \mathcal{P}(\mathcal{P}(\mathbb{N})) \prec ... \).

Theorem: \(\mathbb{N} \sim \mathbb{Z}\).
Proof: Let \(f(n)=\left\{\begin{matrix} \frac{n-1}{2} \;\; n \textrm{ odd} \\ -\frac{n}{2} \;\; n \textrm{ even} \end{matrix}\right.\). Then \(f\) can be seen to be bijective from \(\mathbb{N}\) to \(\mathbb{Z}\).

Theorem: \(\mathbb{N} \sim \mathbb{Q}\).
Proof: As \(\mathbb{N} \subset \mathbb{Q}\), we have \(\mathbb{N} \preceq \mathbb{Q}\). Let us now take \(f \left( \frac{a}{b} \right)= 2^{\textrm{sgn}(\frac{a}{b})}3^{a}5^{b}\). Clearly \(f(\mathbb{Q}) \subset \mathbb{N}\), and so \(f\) is an injection from \(\mathbb{Q}\) to \(\mathbb{N}\), and thus \(\mathbb{Q} \preceq \mathbb{N}\). Therefore \(\mathbb{N} \sim \mathbb{Q}\).

Theorem: \(\mathbb{N} \sim \mathbb{A}\).
Proof: As \(\mathbb{N} \subset \mathbb{A}\), we have \(\mathbb{N} \preceq \mathbb{A}\). Let us now take \(f \left( x \right)\) to be defined as \(f(x)=p_{1}^{k}p_{2}^{a_{0}}p_{3}^{a_{1}}...p_{n+2}^{a_{n}}\) where \(x\) is the kth real solution of \(0=\sum_{k=0}^{n} a_{k} x^{k}\), sorted according to increasing absolute magnitude, and \(p_{k}\) is the kth prime number. Then, \(f(\mathbb{A}) \subset \mathbb{Q} \sim \mathbb{N}\), and \(f\) is injective. Thus \(\mathbb{A} \preceq \mathbb{N}\), and thus \(\mathbb{N} \sim \mathbb{A}\).

Theorem: \((0,1) \sim \mathbb{R}\).
Proof: The function \(f=\frac{2 x-1}{x-x^{2}}\) is bijective from \((0,1)\) to \(\mathbb{R}\).

Theorem: \((0,1) \sim [0,1]\).
Proof: Let \(f(x)=\left\{\begin{matrix} \frac{1}{2}, \\ \frac{1}{2^{n-2}}, \\ x, \;\; \end{matrix}\right. \begin{matrix} x=0 \;\;\;\;\;\;\;\;\;\;\;\;\; \\ x=\frac{1}{2^{n}}, n \in \mathbb{W} \\ x \neq 0, x \neq \frac{1}{2^{n}} \;\; \end{matrix}\). Then f is bijective from \([0,1]\) to \((0,1)\).

Theorem: \([0,1) \sim [0,1]\).
Proof: Let \(f(x)=\left\{\begin{matrix} \frac{1}{2^{n-1}}, \\ x, \;\; \end{matrix}\right. \begin{matrix} x=\frac{1}{2^{n}}, n \in \mathbb{W} \\ x \neq \frac{1}{2^{n}} \;\; \end{matrix}\). Then f is bijective from \([0,1]\) to \([0,1)\).

Theorem: \([0,1) \sim (0,1]\).
Proof: Let \(f(x)= 1-x \). Then f is bijective from \((0,1]\) to \([0,1)\).

Theorem: \((a,b) \sim (0,1)\).
Proof: Let \(f(x)= \frac{x-a}{b-a} \). Then f is bijective from \((a,b)\) to \((0,1)\). Relations of the form \([a,b) \sim [0,1) \) can be identically proven.

Theorem: \((a,\infty) \sim (0,1)\).
Proof: Let \(f(x)=\frac{|x-a|}{|x-a|+1}\). Then f is bijective from \((a,\infty)\) to \((0,1)\). This function can also be used to prove \([a,\infty) \sim [0,1)\), \((-\infty, a) \sim (0,1)\) and \((-\infty, a] \sim [0,1)\).

Theorem: If \(a \in (0,1)\), then \([0,a)\cup (a,1]=[0,1]\setminus \left \{a \right \} \sim [0,1]\).
Proof: Let \(f(x)=\left\{\begin{matrix} a^{n}, \\ x, \;\; \end{matrix}\right. \begin{matrix} x=a^{n+1}, n \in \mathbb{N} \\ x \neq a^{n+1} \;\;\;\; \;\; \end{matrix}\).
Then f is bijective from \([0,1]\setminus \left \{a \right \}\) to \([0,1)\).

Theorem: \(\mathbb{R} \sim \mathbb{R} \setminus \mathbb{A}\)
Proof: Let \( P= \left\{\begin{matrix} \frac{1}{p} \end{matrix}\right|\left.\begin{matrix} p \in \mathbb{P} \end{matrix}\right\}\). Clearly \(P \sim \mathbb{N}\), as \(P \sim \mathbb{P}\) and \(\mathbb{P} \sim \mathbb{N}\). Moreover, as \(\mathbb{N} \sim \mathbb{A}\), we have \(P \sim \mathbb{A}\), and since \((0,1) \sim \mathbb{R}\), we just need to show that \((0,1) \sim (0,1) \setminus P \). Let
\(f(x)=\left\{\begin{matrix} \frac{1}{p_k^{n+1}}, \\ x, \;\; \end{matrix}\right. \begin{matrix} x=\frac{1}{p_k^{n}}, n,k \in \mathbb{N} \\ x \neq \frac{1}{p_k^{n}} \;\;\;\;\;\;\;\;\;\;\; \end{matrix}\).
Then \(f\) is bijective from \((0,1)\) to \((0,1) \setminus P \). Therefore, it follows that \(\mathbb{R} \sim \mathbb{R} \setminus \mathbb{A}\).

Theorem: \(\mathcal{P}(\mathbb{N}_{n})\sim \mathbb{N}_{2^{n}}\).
Proof: We use the fact that every number has a unique binary representation. Let \(A \in \mathcal{P}(\mathbb{N}_{n}) \). We then let \(f(A)=1+\sum_{x \in A} 2^{x-1}\) and \(f(\emptyset)=1\). Then f is bijective from \(\mathcal{P}(\mathbb{N}_{n})\) to \(\mathbb{N}_{2^{n}}\).

Theorem: If \(A \sim B\) then \(\mathcal{P}(A) \sim \mathcal{P}(B)\).
Proof: Let \(f\) be a bijective function from A to B. Let \(X \in \mathcal{P}(A)\). Then \(Y= f(X)\) will be a bijective function from \(\mathcal{P}(A)\) to \(\mathcal{P}(B)\). However, by Easton's Theorem, the converse is only true if we assume the generalized continuum hypothesis, which is the proposition that, if \(X\) is an infinite set, there is no set \(Y\) such that \(X \prec Y \prec \mathcal{P}(X)\).

Theorem: \(\mathbb{N}_{m} \times \mathbb{N}_{n} \sim \mathbb{N}_{mn}\).
Proof: Let \((a,b)\in \mathbb{N}_{m} \times \mathbb{N}_{n}\). Let \(f((a,b))= (a-1)n+b\), then f is bijective from \(\mathbb{N}_{m} \times \mathbb{N}_{n}\) to \(\mathbb{N}_{mn}\).

Theorem: \(\mathcal{P}(\mathbb{N}) \sim (0,1)\).
Proof: We use the fact that every number has a binary representation. However, there are equivalent representations for numbers with finite binary representations. For instance, \(\frac{3}{4}=0.11_{2}=0.101111..._{2}\). Let \(A\in \mathcal{P}(\mathbb{N})\). We the define
\(f(A)=\left\{\begin{matrix} \frac{1}{8}+\frac{1}{4}\sum_{k \in A} 2^{-k}, \\ \frac{1}{2}+\frac{1}{4}\sum_{k \in A} 2^{-k}, \end{matrix}\right. \begin{matrix} A \prec \mathbb{N} \\ A \sim \mathbb{N} \end{matrix}\).
Then we can see that \(f\) is injective from \(\mathcal{P}(\mathbb{N})\) to \((0,1)\). We then define \(g(x)= \left\{\begin{matrix} n \end{matrix}\right|\left.\begin{matrix} n\in \mathbb{N} \wedge \left \lfloor 2^n x \right \rfloor \equiv 1 \pmod 2 \end{matrix}\right\}\). Then we can see that \(g\) is injective from \((0,1)\) to \(\mathcal{P}(\mathbb{N})\). Thus, as \(\mathcal{P}(\mathbb{N}) \preceq (0,1)\) and \((0,1) \preceq \mathcal{P}(\mathbb{N})\), we find that \((0,1) \sim \mathcal{P}(\mathbb{N})\).
Corollary: \(\mathcal{P}(\mathbb{N}) \sim \mathbb{R}\).

Theorem: \(\mathbb{N} \times \mathbb{N} \sim \mathbb{N}\).
Proof: The function \(f(n)=(n,1)\) is injective from \(\mathbb{N}\) to \(\mathbb{N} \times \mathbb{N}\). Therefore \(\mathbb{N} \preceq \mathbb{N} \times \mathbb{N}\). The function \(f((a,b))=2^{a}3^{b}\) is injective from \(\mathbb{N} \times \mathbb{N}\) to \(\mathbb{N}\). Therefore \(\mathbb{N} \times \mathbb{N} \preceq \mathbb{N}\). Therefore \(\mathbb{N} \times \mathbb{N} \sim \mathbb{N}\).

Theorem: \([0,1] \times [0,1] \sim [0,1]\).
Proof: We use the fact that ever number has a unique binary representation. Let \(a,b \in [0,1]\), and \(a=\sum_{k=1}^{\infty} a_{k} 2^{-k}\) and \(b=\sum_{k=1}^{\infty} b_{k} 2^{-k}\), where \( a_k , b_k \in \left \{ 0,1 \right \} \). Then, let \(f((a,b))=\sum_{k=1}^{\infty} \left ( 2 a_k+b_k \right ) 4^{-k}\) and then we can see that \(f\) is bijective from \([0,1] \times [0,1]\) to \([0,1]\).
Corollary: \(\mathbb{R} \times \mathbb{R} \sim \mathbb{R}\).

Theorem: If \(\mathbb{N} \preceq A\) or \(\mathbb{N} \preceq B\), then \(|A|+|B|=|A|\times|B|=\max(|A|,|B|)\).

Theorem: \(\mathbb{S}_{m}+\mathbb{S}_{n}=\mathbb{S}_{m+n} \) and \(\mathbb{S}_{m}\times\mathbb{S}_{n}=\mathbb{S}_{mn} \).

Theorem: The following proposition is undecidable (cannot be proved or disproved) in ZFC: "There does not exist a set \(X \subseteq \mathbb{R}\) such that \(\mathbb{N} \prec X \prec \mathbb{R}\)."

Friday, May 2, 2014

Syllogisms and Probability

A syllogism is a systematic way of laying out a sequence of logical steps. They are very useful in formulating arguments, constructing proofs, or checking the coherence of a certain train of thought. A syllogism is a systematic way of laying out an argument. A syllogism consists of three parts: two premises (which can sometimes be construed as major and minor) and a conclusion. A syllogism is valid if the conclusion follows logically from the premises, whereas is it invalid if it does not. A syllogism is sound if both its premises are true, or at least plausibly true, and the syllogism is valid. Note that a syllogism may be invalid and yet have a true conclusion: it may even have true premises and a true conclusion and still be invalid.

We can also draw the distinction between a propositional and a categorical syllogism. In a propositional syllogism, we deal only with bare propositions connected by logical operators like "or", "and" "implies" etc. In a categorical syllogism, we deal only with statements of inclusion or exclusion in certain categories that are quantified, using such terms as "all" "some" and "no/none".

For example:
1) If Socrates is a man then Socrates is mortal.
2) Socrates is a man.
3) Therefore Socrates is mortal.

This is an example of a propositional syllogism. We can also write this in a categorical form:
1) All men are mortal.
2) Socrates is a man
3) Therefore Socrates is mortal.
We will henceforth focus on the propositional variety.

There are a number of standard propositional syllogisms (also called propositional forms)
  • Modus Ponens: If A, then B. A. Therefore B.
  • Modus Tollens: If A then B. Not B. Therefore, not A.
  • Hypothetical Syllogism: If A then B. If B then C. Therefore, if A then C.
  • Disjunctive Syllogism: Either A or B. Not A. Therefore B.
  • Conjunctive Syllogism: A. B. Therefore A and B.

We can further pare these down by noting that modus ponens and modus tollens are both actually disjunctive syllogisms when we state them using material conditionals. Remember that, in a material conditional, "if A then B" is equivalent to "Either not A, or B".

In a valid argument, if we grant that the premises are true, we must grant that the conclusion is also true. However, we typically are required to evaluate premises of which we are not certain. Namely, the premises have epistemic probabilities that are less than one. In that case, we can use our knowledge of probabilities to give bounds on the probability of the conclusion, and possibly even an estimate of the probability of the conclusion. That is, we can give limits to how plausible we are to take the conclusion to be. We will merely state it as a chart. In each case x is the probability of the first premise (major premise), y is the probability of the second premise (minor premise), and z is the probability of the conclusion. That is, \(P(p1)=x\), \(P(p2)=y\), \(P(c)=z\).

Bounds on Syllogism Probabilities

Name Form Bounds on Conclusion Estimate of Conclusion
Disjunctive Syllogism \(p1)\> A \cup B\)
\(p2)\> \sim A\)
\(c)\>\> \therefore B\)
\(x+y-1 \leq z \leq x\) \(z=x+\frac {y-1}{2}\)
Conjunctive Syllogism \(p1)\> A\)
\(p2)\> B\)
\(c)\>\> \therefore A \cap B\)
\(x+y-1 \leq z \leq \min(x,y)\) \(z=xy\)
Hypothetical Syllogism \(p1)\> A \Rightarrow B\)
\(p2)\> B \Rightarrow C\)
\(c)\>\> \therefore A \Rightarrow C\)
\(x+y-1 \leq z \leq 1\) \(z= \frac {x+y}{2}\)


For example, suppose we say "If it is raining, then Bob will bring an umbrella. It is raining. Therefore Bob will bring an umbrella." We are 80% sure that Bob will bring an umbrella if it is raining and 70% sure that it is raining. Thus, the probability that Bob does bring an umbrella is somewhere between 50% and 80%, with an estimate of 56%. That is not very confident.
Suppose we have two propositions of which we are 70% certain of each. Then the probability that both are true can be anywhere from 40% to 70% with an estimated value of 49%. Thus it is not enough that each of the premises be more plausible than their opposites: we must demand more in order that an argument be a good one.

Sometimes when people say "if A then B", they mean something different than the material conditional ("Either not A, or B"). Frequently, they mean something like "Given A, B will happen/be true". Thus the probability of "if A, then B" wouldn't be \(P(\sim A \cup B)\), but rather \(P(B|A)\). We will here give the table for the conditional probability case. That is, in every case that a conditional statement is given a probability, that probability is of the probability of the consequent given the antecedent.

Bounds on Syllogism Probabilities

Name Form Bounds on Conclusion Estimate of Conclusion
Modus Ponens \(p1)\> A \Rightarrow B\)
\(p2)\> A\)
\(c)\>\> \therefore B\)
\(xy \leq z \leq xy+1-y\) \(z=xy+\frac {1-y}{2}\)
Modus Tollens \(p1)\> A \Rightarrow B\)
\(p2)\> \sim B\)
\(c)\>\> \therefore \sim A\)
\(\frac {x+y-1}{x} \leq z < 1\) \(z=\frac {2x+y-1}{2x}\)

For a hypothetical syllogism with conditional probabilities, we can give no bounds or estimates.