Gaussians and Normal Distributions<br />Hyperphronesis, a Blog for All and None. Posted by Nadav, September 19, 2019.<br /><script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <img border="0" src="https://3.bp.blogspot.com/-v9f-ilLw3R4/XZ5YfXbalYI/AAAAAAAAWLY/uUy5NecDCLIu6F6UPct4BY_OpaE15Am4ACLcBGAsYHQ/s640/imageedit_6_8343379545.png" width="640" height="271" data-original-width="781" data-original-height="331" /> <br /><h1>Definition</h1><br />The standard <b>Gaussian function</b> is defined as \[ \bbox[5px,border:2px solid red] { g_{0,1}(x)=g(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2} } \] This is by definition the probability density function of a standard normal random variable. It has zero mean and unit variance. A general Gaussian function (with mean \(\mu\) and variance \(\sigma^2\)) is defined as \[ \bbox[5px,border:2px solid red] { g_{\mu,\sigma^2}(x)=\frac{1}{\sigma}g\left ( \frac{x-\mu}{\sigma} \right )=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\tfrac{1}{2\sigma^2}\left ( x-\mu \right )^2} } \] This is the probability density function of a general normal distribution. Note that if \(Z\) is a standard normal random variable, then \(X=\mu+\sigma Z\) will be a normal random variable with mean \(\mu\) and variance \(\sigma^2\). 
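These definitions are easy to sanity-check numerically. Below is a minimal Python sketch (the helper names are my own); it verifies that \(g_{\mu,\sigma^2}\) integrates to one and that \(X=\mu+\sigma Z\) has the stated mean and variance:

```python
import math
import random

def gaussian_pdf(x, mu=0.0, sigma2=1.0):
    """General Gaussian density g_{mu, sigma^2}(x)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def integrate(f, a, b, n=100_000):
    """Midpoint-rule integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

# The density should integrate to 1 over the whole line.
total = integrate(lambda x: gaussian_pdf(x, mu=2.0, sigma2=4.0), -30.0, 30.0)
print(round(total, 6))  # ≈ 1.0

# If Z is standard normal, X = mu + sigma*Z has mean mu and variance sigma^2.
random.seed(0)
mu, sigma = 2.0, 2.0
xs = [mu + sigma * random.gauss(0.0, 1.0) for _ in range(200_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(round(mean, 1), round(var, 1))  # ≈ 2.0 and 4.0
```

The integration window and sample count are arbitrary; any window wide enough to capture the tails works.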
<br /><hr><br /><h1>Historical Timeline</h1><br /><table cellpadding="3"><tr style="border: 1px solid black"><td align="center" height="*" width="50" style="border: 1px solid black">1600 </td><td height="*" style="border: 1px solid black">In astronomy, discrepancies in measurements demand some form of resolution to settle on a single number. Different astronomers have different ways of inferring the true value from a set of measurements, some using medians, some means, but varying widely in their techniques for calculation. Notably, Tycho Brahe and Johannes Kepler use unclear methods to obtain representative values from multiple measurements. </td></tr><tr style="border: 1px solid black"><td align="center" height="*" width="50" style="border: 1px solid black">1632 </td><td height="*" style="border: 1px solid black">Galileo Galilei is concerned with the ambiguity in obtaining a single value from multiple measurements, and notes that there must be a true value, and that errors are symmetrically distributed around this true value, with smaller errors being more likely than larger ones. </td></tr><tr style="border: 1px solid black"><td align="center" height="*" width="50" style="border: 1px solid black">1654 </td><td height="*" style="border: 1px solid black">Fermat and Pascal develop the modern theory of probability. Pascal develops his triangle (the binomial coefficients), and the probabilities of the binomial distribution for success probability \(p=1/2\). </td></tr><tr><td align="center" height="*" width="50" style="border: 1px solid black">1710 </td><td height="*" style="border: 1px solid black">James Bernoulli finds the binomial distribution for general success probability. 
</td></tr><tr><td align="center" height="*" width="50" style="border: 1px solid black">1712 </td> <td height="*" style="border: 1px solid black">The Dutch mathematician Willem ’s Gravesande uses Pascal's calculation of probability to investigate birthrate discrepancies of male and female children. </td></tr><tr><td align="center" height="*" width="50" style="border: 1px solid black">1722 </td> <td height="*" style="border: 1px solid black">Roger Cotes suggests the true value of a set of measurements be at the "center of mass" of the observed values (the modern definition of the mean). This is the same as the value with the minimum square deviation from the measurements. </td></tr><tr><td align="center" height="*" width="50" style="border: 1px solid black">1733 </td> <td height="*" style="border: 1px solid black">Abraham De Moivre finds that \[\binom{N}{k}\approx \frac{2^{N+1}}{\sqrt{2\pi N}}e^{-\tfrac{2}{N}\left (k-\tfrac{N}{2} \right )^2}\] and uses it in relation to the binomial distribution. This is the first clear appearance of the Gaussian function in a probability context. </td></tr><tr><td align="center" height="*" width="50" style="border: 1px solid black">1756 </td> <td height="*" style="border: 1px solid black">Thomas Simpson proves that the expected error is bounded when the error is distributed by what would today be called a two-sided geometric distribution, and in the case where it has a rectangular or triangular distribution. </td></tr><tr><td align="center" height="*" width="50" style="border: 1px solid black">1774 </td> <td height="*" style="border: 1px solid black">Pierre-Simon, marquis de Laplace argues that errors should follow a distribution of the form \(p(x)=\frac{m}{2}e^{-m|x|}\). 
</td></tr><tr><td align="center" height="*" width="50" style="border: 1px solid black">1777 </td> <td height="*" style="border: 1px solid black">Daniel Bernoulli becomes concerned about the center-of-mass mean being universally accepted without justification. He favors a semicircular distribution of errors. He also proves that with fixed errors, the accumulated error tends toward a Gaussian-like distribution. </td></tr><tr><td align="center" height="*" width="50" style="border: 1px solid black">1801 </td> <td height="*" style="border: 1px solid black">Giuseppe Piazzi observes a celestial object proposed to be a planet (later identified as Ceres). It goes behind the sun and astronomers try to predict where it will re-emerge. Carl Friedrich Gauss guesses correctly while most others guess incorrectly. </td></tr><tr><td align="center" height="*" width="50" style="border: 1px solid black">1809 </td> <td height="*" style="border: 1px solid black">Gauss proposes the method of least squares, which leads him to conclude that the error curve must be a Gaussian function. </td></tr><tr><td align="center" height="*" width="50" style="border: 1px solid black">1810 </td> <td height="*" style="border: 1px solid black">Laplace proves the Central Limit Theorem, which concludes that the Gaussian is the limiting distribution of averages from almost any distribution. </td></tr><tr><td align="center" height="*" width="50" style="border: 1px solid black">1846 </td> <td height="*" style="border: 1px solid black">Adolphe Quetelet applies the Gaussian distribution to sociology and anthropometry, particularly to the distribution of chest sizes of Scottish soldiers. </td></tr><tr><td align="center" height="*" width="50" style="border: 1px solid black">1860 </td> <td height="*" style="border: 1px solid black">James Clerk Maxwell shows that the Gaussian distribution applies to the velocities of gas particles. 
</td></tr><tr><td align="center" height="*" width="50" style="border: 1px solid black">1869 </td> <td height="*" style="border: 1px solid black">Sir Francis Galton applies Quetelet's work and the central limit theorem more broadly, to try to prove that intelligence is hereditary, and spends considerable time and energy investigating heredity and statistics. However, one of his primary motivations was to apply these to eugenics. </td></tr><tr><td align="center" height="*" width="50" style="border: 1px solid black">1873 </td> <td height="*" style="border: 1px solid black">C. S. Peirce, Galton, and Wilhelm Lexis refer to the Gaussian distribution as the "normal" distribution. </td></tr><tr><td align="center" height="*" width="50" style="border: 1px solid black">1900 </td> <td height="*" style="border: 1px solid black">Karl Pearson popularizes "normal distribution" as a shorter and uncredited name for the "Laplace-Gauss curve". </td></tr><tr><td align="center" height="*" width="75" style="border: 1px solid black">1947-50 </td> <td height="*" style="border: 1px solid black">P.G. Hoel and A.M. Mood publish popular textbooks that refer to the distribution with zero mean and unit variance as the "standard normal" distribution. </td></tr></table><a href="https://www.maa.org/sites/default/files/pdf/upload_library/22/Allendoerfer/stahl96.pdf">More information</a> <br /><hr><br /><h1>Normalization</h1><br />We wish to verify that the functions above are normalized, i.e. that they are proper probability density functions. To do this, let's evaluate the integral \[ I=\int_{-\infty}^{\infty}e^{-x^2}dx \] Recall the limit definition of the exponential: \(e^x=\underset{n \to \infty}{\lim}\left ( 1+\tfrac{x}{n} \right )^n\). 
Using this, the integral becomes the following limit: \[ I=\underset{n \to \infty}{\lim}\int_{-\sqrt{n}}^{\sqrt{n}}\left ( 1-\frac{x^2}{n} \right )^n dx =\underset{n \to \infty}{\lim}\sqrt{n}\int_{-1}^{1}\left ( 1-x^2 \right )^n dx \] Let's define \[ I_p=\int_{-1}^{1}\left ( 1-x^2 \right )^p dx \] Using integration by parts: \[ \begin{matrix} I_p & = & \left.\ x\left ( 1-x^2 \right )^p\right|_{x=-1}^{1}+2p\int_{-1}^{1}x^2\left ( 1-x^2 \right )^{p-1}dx \\ \\ & = &2p\left [\int_{-1}^{1}\left ( 1-x^2 \right )^{p-1}dx-\int_{-1}^{1}\left ( 1-x^2 \right )^{p}dx \right ] \end{matrix} \] It follows that \(I_p = 2p\left [ I_{p-1}-I_p \right ] \), and so \(I_p=\frac{2p}{2p+1}I_{p-1}\). As clearly \(I_0=2\), and \(I_{1/2}=\pi/2\), we have the two formulas, where m is some natural number: \[ I_m=2\prod_{k=1}^{m}\frac{2k}{2k+1} \\ I_{m+\tfrac{1}{2}}=\frac{\pi}{2(m+1)}\prod_{k=1}^{m}\frac{2k+1}{2k} \] Which reveals the relationship \[ I_{m+\tfrac{1}{2}}=\frac{\pi}{(m+1)I_m} \] Substituting this back in the original limit: \[ I=\underset{n \to \infty}{\lim}\sqrt{n+\tfrac{1}{2}}I_{n+\tfrac{1}{2}}= \underset{n \to \infty}{\lim}\frac{\pi\sqrt{n+\tfrac{1}{2}}}{(n+1)I_n}= \pi\frac{\underset{n \to \infty}{\lim}\frac{\sqrt{(n+\tfrac{1}{2})n}}{n+1}}{\underset{n \to \infty}{\lim}\sqrt{n}I_n}=\frac{\pi}{I} \] It follows that \( \bbox[5px,border:2px solid red] {I=\sqrt{\pi} }\). Note that we also obtain the following limit as a byproduct: \[ \bbox[5px,border:2px solid red] { \underset{n \to \infty}{\lim}2\sqrt{n}\prod_{k=1}^{n}\frac{2k}{2k+1}=\sqrt{\pi} } \] which can be seen as equivalent to the <b>Wallis product</b>. This suffices to show that the Gaussian functions defined above are indeed normalized, as expected. <br /><hr><br /><h1>General Integral and Corollaries</h1><br />Using the result above, let us evaluate \[ \int_{-\infty}^{\infty}e^{-ax^2+bx+c}dx \] This is easily done by completing the square: \(-ax^2+bx+c=-a\left ( x-\tfrac{b}{2a} \right )^2+\tfrac{b^2}{4a}+c\). 
This immediately gives \[ \int_{-\infty}^{\infty}e^{-ax^2+bx+c}dx=\int_{-\infty}^{\infty}e^{-a\left ( x-\tfrac{b}{2a} \right )^2+\tfrac{b^2}{4a}+c}dx=e^{\tfrac{b^2}{4a}+c}\int_{-\infty}^{\infty}e^{-au^2}du \] \[ \bbox[5px,border:2px solid red] { \int_{-\infty}^{\infty}e^{-ax^2+bx+c}dx=e^{\tfrac{b^2}{4a}+c}\sqrt{\pi/a} } \] Based on this, we can easily find: \[ \bbox[5px,border:2px solid red] { \int_{-\infty}^{\infty}g_{\mu,\sigma^2}(x)e^{t x}dx=e^{\mu t+\tfrac{1}{2}\sigma^2t^2} } \] This is the same as the <b>moment generating function</b> for a Gaussian distribution. Several results can be deduced from this. For instance, the <b>Fourier transform</b> of a Gaussian function: \[ \bbox[5px,border:2px solid red] { \int_{-\infty}^{\infty}g_{\mu,\sigma^2}(x)e^{-i\omega x}dx=e^{-\tfrac{1}{2}\sigma^2\omega^2-i\mu\omega} } \] The Fourier transform of a Gaussian is another Gaussian, with variance equal to one over the input variance. By taking the real and imaginary parts, we find that: \[ \int_{-\infty}^{\infty}g_{\mu,\sigma^2}(x)\cos(\omega x)dx=e^{-\tfrac{1}{2}\sigma^2\omega^2}\cos(\mu\omega) \\ \int_{-\infty}^{\infty}g_{\mu,\sigma^2}(x)\sin(\omega x)dx=e^{-\tfrac{1}{2}\sigma^2\omega^2}\sin(\mu\omega) \] Substituting a complex number for \(a\) gives us: \[ \int_{0}^{\infty}e^{-\alpha e^{-i\theta}x^2}dx=\frac{1}{2}\sqrt{\frac{\pi}{\alpha e^{-i\theta}}}=\frac{1}{2}\sqrt{\frac{\pi}{\alpha }}e^{i\theta/2} \\ \\ \int_{0}^{\infty}e^{-\alpha \cos(\theta)x^2}\cos\left (\alpha\sin(\theta)x^2\right ) dx= \frac{1}{2}\sqrt{\frac{\pi}{\alpha }}\cos(\theta/2) \\ \\ \int_{0}^{\infty}e^{-\alpha \cos(\theta)x^2}\sin\left (\alpha\sin(\theta)x^2\right ) dx= \frac{1}{2}\sqrt{\frac{\pi}{\alpha }}\sin(\theta/2) \] Particularly, when \(\theta=\pi/2\), we have \[ \bbox[5px,border:2px solid red] { \int_{0}^{\infty}\cos\left (\alpha x^2\right ) dx= \int_{0}^{\infty}\sin\left (\alpha x^2\right ) dx= \frac{1}{2}\sqrt{\frac{\pi}{2\alpha }} } \] Let us evaluate the integrals \[ 
I_n=\int_{0}^{\infty}x^ne^{-\tfrac{1}{2\sigma^2}x^2}dx \] By integrating by parts we find \[ I_n=\frac{1}{\sigma^2(n+1)}\int_{0}^{\infty}x^{n+2}e^{-\tfrac{1}{2\sigma^2}x^2}dx=\frac{I_{n+2}}{\sigma^2(n+1)} \] Which can be written as \(I_{n+2}=\sigma^2(n+1)I_n\). As \(I_0=\sigma\sqrt{\pi/2}\) and \(I_1=\sigma^2\), we find that \[ \bbox[5px,border:2px solid red] { \int_{0}^{\infty}x^ne^{-\tfrac{1}{2\sigma^2}x^2}dx=\sigma^{n+1}\cdot\left\{\begin{matrix} \sqrt{\tfrac{\pi}{2}}1\cdot3\cdot5\cdots (n-1) & \: \: \: n\: \: \mathrm{even}\\ 2\cdot4\cdot6\cdots (n-1) & \: \: \: n\: \: \mathrm{odd}\\ \end{matrix}\right. } \] Finally, let us examine the following parametrized integral \[ I(a,b)=\int_{0}^{\infty}e^{-a^2x^2-\tfrac{b^2}{x^2}}dx \] Let us differentiate with respect to \(b\) \[ \frac{\partial }{\partial b}I(a,b)=\int_{0}^{\infty}\frac{\partial }{\partial b}e^{-a^2x^2-\tfrac{b^2}{x^2}}dx=-2b\int_{0}^{\infty}\frac{1}{x^2}e^{-a^2x^2-\tfrac{b^2}{x^2}}dx \] We can then make the substitution \(y=\tfrac{b}{ax}\) to find \[\frac{\partial }{\partial b}I(a,b)=-2a\int_{0}^{\infty}e^{-a^2y^2-\tfrac{b^2}{y^2}}dy=-2aI(a,b)\] Given that \(I(a,0)=\tfrac{\sqrt{\pi}}{2a}\), it follows that \[ \bbox[5px,border:2px solid red] { \int_{0}^{\infty}e^{-a^2x^2-\tfrac{b^2}{x^2}}dx=\frac{\sqrt{\pi}}{2|a|}e^{-2|ab|} } \] Additionally, replacing \(a^2\) with \(a\) and \(b^2\) with \(bi\) in the result above, we find \[ \int_{0}^{\infty}e^{-ax^2-\tfrac{bi}{x^2}}dx=\frac{1}{2}\sqrt{\frac{\pi}{a}}e^{-2\sqrt{abi}} =\frac{1}{2}\sqrt{\frac{\pi}{a}}e^{-\sqrt{2ab}}e^{-i\sqrt{2ab}} \\ \\ \int_{0}^{\infty}e^{-ax^2}\cos\left ( \frac{b}{x^2} \right )dx=\frac{1}{2}\sqrt{\frac{\pi}{a}}e^{-\sqrt{2ab}}\cos({\sqrt{2ab}}) \\ \\ \int_{0}^{\infty}e^{-ax^2}\sin\left ( \frac{b}{x^2} \right )dx=\frac{1}{2}\sqrt{\frac{\pi}{a}}e^{-\sqrt{2ab}}\sin({\sqrt{2ab}}) \] Using the last expression and taking the limit as \(a\) goes to zero, we find \[ \bbox[5px,border:2px solid red] { \int_{0}^{\infty}\sin\left ( \frac{b}{x^2} \right )dx=\sqrt{\frac{b\pi}{2}} } \] Similar 
results follow using complex substitutions in the above general cases. <br />A remarkable integral that we can evaluate with these results is \[ I(a,b)=\int_{-\infty}^{\infty}\frac{\cos(ax)}{b^2+x^2}dx \] We use the elementary fact that \(t^{-1}=\int_{0}^{\infty}e^{-xt}dx\). Namely, we write: \[ I(a,b)=\int_{-\infty}^{\infty}\int_{0}^{\infty}\cos(ax)e^{-t(b^2+x^2)}dtdx \] Interchanging the order of integration and using the general formula we derived above, we find \[ I(a,b)=\int_{0}^{\infty}e^{-tb^2}\int_{-\infty}^{\infty}\cos(ax)e^{-tx^2}dxdt =\int_{0}^{\infty}e^{-tb^2}\sqrt{\frac{\pi}{t}}e^{-\frac{a^2}{4t}}dt \] Letting \(t=u^2\), we put it into the form above \[ I(a,b)=2\sqrt{\pi}\int_{0}^{\infty}e^{-u^2b^2}e^{-\frac{a^2}{4u^2}}du \] \[ \bbox[5px,border:2px solid red] { \int_{-\infty}^{\infty}\frac{\cos(ax)}{b^2+x^2}dx=\frac{\pi}{|b|}e^{-|ab|} } \] Particularly, if \(a=b=1\) \[ \bbox[5px,border:2px solid red] { \int_{-\infty}^{\infty}\frac{\cos(x)}{1+x^2}dx=\frac{\pi}{e} } \] <br /><hr><br /><h1>Maximum Likelihood</h1><br />Suppose we wish to find a distribution with two parameters (\(\mu\) and \(\sigma^2\)) with the property that the maximum likelihood estimators for \(\mu\) and \(\sigma^2\) are the sample mean and variance, respectively. Additionally, we require that the distribution be symmetric about the mean. We will suppose that \[ f(x;\mu,\sigma^2)=\tfrac{1}{\sigma}g\left ( \tfrac{x-\mu}{\sigma} \right ) \] Where \(g(x)\) is a distribution with zero mean and unit variance, such that \(g(-x)=g(x)\). Let us also define \(\eta(x)=g'(x)/g(x)\), which satisfies \(\eta(-x)=-\eta(x)\). 
Then the log-likelihood for samples \(x_1,x_2,...,x_N\) is given by: \[ \ell(\mu,\sigma^2;\mathbf{x})=\sum_{k=1}^{N}\ln\left (f(x_k;\mu,\sigma^2) \right )=\sum_{k=1}^{N}\ln\left (\tfrac{1}{\sigma}g\left ( \tfrac{x_k-\mu}{\sigma} \right ) \right ) \] The maximization constraints for each parameter are: \[ \frac{\partial }{\partial \mu}\ell(\mu,\sigma^2;\mathbf{x})= \tfrac{-1}{\sigma}\sum_{k=1}^{N}\eta\left (\tfrac{x_k-\mu}{\sigma} \right ) =0 \\ \frac{\partial }{\partial \sigma^2}\ell(\mu,\sigma^2;\mathbf{x})= -\frac{N}{2\sigma^2}-\frac{1}{2\sigma^3}\sum_{k=1}^{N}(x_k-\mu) \eta\left ( \tfrac{x_k-\mu}{\sigma} \right )=0 \] By the supposition, these conditions hold when \[ \mu=\tfrac{1}{N}\sum_{k=1}^{N}x_k=\overline{\mathbf{x}} \\ \sigma^2=\tfrac{1}{N}\sum_{k=1}^{N}(x_k-\overline{\mathbf{x}})^2=\overline{\mathbf{x}^2}-\overline{\mathbf{x}}^2=\mathrm{var}(\mathbf{x}) \] Now, suppose we set \(x_1=a\) and \(x_2=x_3=\dots=x_N=a-Nb\). Then \(\overline{\mathbf{x}}=a-(N-1)b\). We then have from the first condition: \[ 0=\sum_{k=1}^{N}\eta\left (\tfrac{x_k-(a-(N-1)b)}{\sigma} \right )=\eta\left (\tfrac{(N-1)b}{\sigma} \right )+(N-1)\eta\left (\tfrac{-b}{\sigma} \right ) \] So that \(\eta\left (\tfrac{(N-1)b}{\sigma} \right )=(N-1)\eta\left (\tfrac{b}{\sigma} \right )\). As \(\eta\) is continuous and must satisfy this for any choice of variables, it must be the case that \(\eta(x)=\frac{g'(x)}{g(x)}=Bx\). This differential equation can be easily solved to give \(g(x)=g(0) \cdot e^{\tfrac{B}{2}x^2}\). Requiring that this be a normalized function forces \(B<0\) and puts it in the form \(g(x)=\frac{\beta}{\sqrt{\pi}} \cdot e^{-\beta^2x^2}\) for some \(\beta>0\), which makes \(\eta(x)=-2\beta^2x\). Using the fact that \(g(x)\) has unit variance gives \(\beta=\tfrac{1}{\sqrt{2}}\). 
Now we turn to the second constraint \[ 0=-\frac{N}{2\sigma^2}+\frac{1}{2\sigma^4}\sum_{k=1}^{N}(x_k-\mu)^2 \] Which easily gives: \[ \sigma^2=\frac{1}{N}\sum_{k=1}^{N}(x_k-\overline{\mathbf{x}})^2=\mathrm{var}(\mathbf{x}) \] Thus the normal distribution has the required properties. Moreover, the sample mean and variance are the maximum likelihood estimators for the normal distribution. This is the original method Carl Friedrich Gauss used to derive the distribution. <br /><hr><br /><h1>Convolution</h1><br />Recall that the convolution of two functions is defined as \[ a(x)*b(x)=\int_{-\infty}^{\infty}a(t)b(x-t)dt \] For the case of two general Gaussians, we use the general integral above: \[ \\ g_{\mu_1,\sigma_1^2}*g_{\mu_2,\sigma_2^2} =\frac{1}{\sqrt{2\pi\sigma_1^2}}\frac{1}{\sqrt{2\pi\sigma_2^2}} \int_{-\infty}^{\infty}e^{-\tfrac{1}{2\sigma_1^2}\left ( t-\mu_1 \right )^2}e^{-\tfrac{1}{2\sigma_2^2}\left ( x-t-\mu_2 \right )^2}dt \\ \\ a=\tfrac{1}{2\sigma_1^2}+\tfrac{1}{2\sigma_2^2},\: \: b=\mu_1\tfrac{1}{\sigma_1^2}+(x-\mu_2)\tfrac{1}{\sigma_2^2},\: \: c=\tfrac{\mu_1^2}{2\sigma_1^2}+\tfrac{(x-\mu_2)^2}{2\sigma_2^2} \\ \\ g_{\mu_1,\sigma_1^2}*g_{\mu_2,\sigma_2^2}=\frac{1}{\sqrt{2\pi\sigma_1^2}}\frac{1}{\sqrt{2\pi\sigma_2^2}}e^{\tfrac{b^2}{4a}+c}\sqrt{\pi/a} \\ \\ g_{\mu_1,\sigma_1^2}*g_{\mu_2,\sigma_2^2}=\frac{1}{\sqrt{2\pi(\sigma_1^2+\sigma_2^2)}}e^{-\tfrac{1}{2(\sigma_1^2+\sigma_2^2)}\left ( x-(\mu_1+\mu_2) \right )^2} \] Thus: \[ \bbox[5px,border:2px solid red] { g_{\mu_1,\sigma_1^2}*g_{\mu_2,\sigma_2^2}=\frac{1}{\sqrt{2\pi(\sigma_1^2+\sigma_2^2)}}e^{-\tfrac{1}{2(\sigma_1^2+\sigma_2^2)}\left ( x-(\mu_1+\mu_2) \right )^2}=g_{\mu_1+\mu_2,\sigma_1^2+\sigma_2^2} } \] That is, the convolution of two Gaussians is another Gaussian with mean and variance equal to the sum of the convolved means and variances. This could be much more easily seen from the convolution property of the Fourier transform. 
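The convolution identity can be checked by brute force: discretize two Gaussian densities, convolve numerically, and compare against \(g_{\mu_1+\mu_2,\sigma_1^2+\sigma_2^2}\). A rough Python sketch (the grid spacing and truncation bounds are arbitrary choices):

```python
import math

def g(x, mu, var):
    """Gaussian density g_{mu, var}(x)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

mu1, var1 = 1.0, 2.0
mu2, var2 = -0.5, 0.5

# (a*b)(x) = integral a(t) b(x-t) dt, approximated by a Riemann sum on [-20, 20].
h = 0.02
grid = [-20 + k * h for k in range(2001)]

def conv_at(x):
    return sum(g(t, mu1, var1) * g(x - t, mu2, var2) for t in grid) * h

# Compare against the closed form g_{mu1+mu2, var1+var2} at a few points.
max_err = max(abs(conv_at(x) - g(x, mu1 + mu2, var1 + var2))
              for x in (-2.0, 0.0, 0.5, 3.0))
print(max_err < 1e-6)  # True: numerical and closed-form convolution agree
```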
<br /><hr><br /><h1>Entropy</h1><br />The entropy of a probability distribution \(f(x)\) is defined as \[ \bbox[5px,border:2px solid red] { H=-\int_{-\infty}^{\infty}f(x)\ln\left ( f(x) \right )dx } \] We wish to find a function that maximizes this value subject to the constraints that it is normalized and that it has mean \(\mu\) and variance \(\sigma^2\). This can be done using the method of Lagrange multipliers. Namely, we define the function \[ \begin{align*} L(f,a,b,c)=& -\int_{-\infty}^{\infty}f(x)\ln(f(x))dx+a\left [ 1-\int_{-\infty}^{\infty}f(x)dx \right ]\\ & +b \left[\mu-\int_{-\infty}^{\infty}xf(x)dx \right ]+c\left[ \sigma^2-\int_{-\infty}^{\infty}(x-\mu)^2f(x)dx \right ] \end{align*} \\ \\ L(f,a,b,c)=a+b\mu+c\sigma^2-\int_{-\infty}^{\infty}f(x)\left [\ln(f(x))+a+bx+c(x-\mu)^2 \right ]dx \] Using the Euler-Lagrange Equation, we find that \(f(x)\) must satisfy \[ \frac{\partial }{\partial f}f(x)\left [\ln(f(x))+a+bx+c(x-\mu)^2 \right ]=0 \\ \ln(f(x))+a+bx+c(x-\mu)^2+1=0 \\ f(x)=e^{-1-a-bx-c(x-\mu)^2} \] Combining this with the conditions from the other Lagrange factors, we immediately find: \[ f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\tfrac{1}{2\sigma^2}(x-\mu)^2}=g_{\mu,\sigma^2}(x) \] That is, a normal distribution has the maximal entropy of all distributions with a given mean and variance. The entropy is then given by \[ \bbox[5px,border:2px solid red] { H=\frac{1}{2}\ln \left( 2\pi e\sigma^2 \right ) } \] This can be seen as a way of understanding the central limit theorem, described below, as well as the stability under convolution: summing independent random variables increases the entropy, and already maximum-entropy distributions can only add their variances. <br /><hr><br /><h1>Central Limit Theorem</h1><br />Suppose \(X\) is a random variable with a well-defined mean and variance. That is, \(\mathrm{E}(X)=\mu\) and \(\mathrm{E}((X-\mu)^2)=\sigma^2\) both exist. 
We want to find the distribution of \[ \bbox[5px,border:2px solid red] { Z_n=\frac{\overline{X_n}-\mu}{\sigma/\sqrt{n}} } \] Where \[ \overline{X_n}=\frac{1}{n}\sum_{k=1}^{n}X_k \] and the \(X_k\) are all independent variables distributed in the same way as \(X\). It is easy to show that if \(A\) is a random variable with distribution \(a(x)\) and \(B\) is an independent random variable with distribution \(b(x)\), then the distribution of \(A+B\) is \(a(x)*b(x)\). Given this, it is useful to use the Fourier transforms of the distributions, or something similar. In fact, it is most advantageous to use the moment generating function, where the moment generating function of \(X\) is defined as \(M_X(t)=\mathrm{E}(e^{Xt})\). This has the property that \(M_{A+B}(t)=M_{A}(t)M_{B}(t)\) for independent \(A\) and \(B\). Another useful property is \[ M_{aX+b}(t)=\mathrm{E}(e^{t(aX+b)})=e^{tb}\mathrm{E}(e^{atX})=e^{tb}M_X(at) \] Using this, let us find the moment generating function of \(Z_n\). \[ M_{Z_n}(t)=\mathrm{E}\left ( e^{tZ_n} \right ) =\mathrm{E}\left ( e^{t\frac{\overline{X_n}-\mu}{\sigma/\sqrt{n}}} \right ) \\ \\ M_{Z_n}(t)=e^{-t\sqrt{n}\frac{\mu}{\sigma}}\mathrm{E}\left ( e^{\frac{t}{\sigma\sqrt{n}}\sum_{k=1}^{n}X_k} \right )=e^{-t\sqrt{n}\frac{\mu}{\sigma}}M_X^n\left (\frac{t}{\sigma\sqrt{n}} \right ) \] Based on the definition of the moment generating function, we find \[ M_X(t)=\mathrm{E}\left ( e^{tX} \right )=\mathrm{E}\left ( 1+\frac{t}{1!}X+\frac{t^2}{2!}X^2+\frac{t^3}{3!}X^3+... \right ) \\ M_X(t)=1+t\mathrm{E}(X)+\frac{t^2}{2!}\mathrm{E}(X^2)+\frac{t^3}{3!}\mathrm{E}(X^3)+... 
\] Using this in our expression above, we get \[ M_{Z_n}(t)=e^{-t\sqrt{n}\frac{\mu}{\sigma}}\left ( 1+\frac{t}{\sigma\sqrt{n}}\mu+\frac{t^2}{2n\sigma^2}\left ( \mu^2+\sigma^2 \right )+O\left ( n^{-3/2} \right ) \right )^n \] Taking the limit as \(n\) goes to infinity, we get \[ \bbox[5px,border:2px solid red] { \underset{n \to \infty}{\lim} M_{Z_n}(t)=e^{t^2/2} } \] That is, the scaled mean is asymptotically a standard normal. Generally, the mean of \(n\) i.i.d. random variables with mean \(\mu\) and variance \(\sigma^2\) approaches a normal distribution with mean \(\mu\) and variance \(\sigma^2/n\). <br /><hr><br /><h1>Gaussian Limits of Other Distributions</h1><br />For several families of probability distributions, discrete and continuous, with finite or infinite support, the distribution converges to a Gaussian in the limit of certain parameters, after rescaling by the mean and variance. <br />The general approach will be as follows: suppose the random variable is \(x\) and the distribution \(f(x;\mathbf{p})\) is parametrized by some list of parameters \(\mathbf{p}\). We will turn these parameters into functions of \(n\) (which we will let tend to infinity), \(\mathbf{p}(n)\), and find the mean \(\mu(n)\) and variance \(\sigma^2(n)\) as functions of \(n\). We then rescale to obtain the new distribution \[ g(z;n)=\sigma(n)\cdot f(\mu(n)+z\cdot\sigma(n);\mathbf{p}(n)) \] This distribution will have zero mean and unit variance. Clearly if \(f\) is normalized, \(g\) will be as well. All we will be looking for is how \(g\) varies with \(z\). To this end, we will take \[ \underset{n \to \infty}{\lim}\frac{\partial }{\partial z} \ln \left (g(z;n) \right ) \] As we expect \(g\) to converge to a standard normal, we expect this limit to be \(-z\). <br />Another approach may be to use the moment generating function. 
In particular, we expect \[ \underset{n \to \infty}{\lim} e^{-\tfrac{\mu(n)}{\sigma(n)}t}M_{X;\mathbf{p}(n)}\left ( \frac{t}{\sigma(n)} \right )=e^{t^2/2} \] <br /><hr width="45%"><h3>Binomial Distribution</h3><br />\[ f(x;n,p)=\binom{n}{x}p^x(1-p)^{n-x} \\ \mu(n)=np,\: \: \sigma(n)=\sqrt{np(1-p)} \\ M_{X;n}(t)=\left ( 1-p+pe^t \right )^n \] Using the distribution method is rather involved and a discussion of it can be seen in the article on Stirling's approximation. However, the moment generating function method is much simpler: \[ M_{Z;n}(t)=\left ( 1+p\left [e^{t/\sqrt{np(1-p)}}-1 \right ] \right )^ne^{-t\sqrt{\frac{np}{1-p}}} \\ M_{Z;n}(t)=\left ( 1+\frac{pt}{\sqrt{np(1-p)}}+\frac{t^2}{2n(1-p)}+O(n^{-3/2}) \right )^ne^{-t\sqrt{\frac{np}{1-p}}} \] Thus, in the limit of large n \[ \underset{n \to \infty}{\lim}M_{Z;n}(t)=e^{\frac{t^2}{2(1-p)}}e^{-\frac{pt^2}{2(1-p)}}=e^{t^2/2} \] <br /><hr width="45%"><h3>Poisson Distribution</h3><br />\[ f(x;\lambda)=e^{-\lambda}\frac{\lambda^x}{x!} \\ \lambda(n)=n,\: \: \mu(n)=n,\: \: \sigma(n)=\sqrt{n} \\ M_{X;n}(t)=e^{n\left ( e^t-1 \right )} \] We will make use of the fact that \[ \ln(x!)=(x+\tfrac{1}{2})\ln(x)-x+O(1) \] Which follows from Stirling's approximation. 
It follows that \(\ln(g)\) and its derivative take the form \[ \ln(g(z;n))=(n+z\sqrt{n})\ln(n)-\left (n+z\sqrt{n}+\tfrac{1}{2} \right )\ln(n+z\sqrt{n})+z\sqrt{n}+O(1) \\ \frac{\partial }{\partial z}\ln(g(z;n))=\sqrt{n}\ln(n)-\sqrt{n}\ln\left (n+z\sqrt{n} \right )-\frac{\sqrt{n}}{2\left (n+z\sqrt{n} \right )} \] And so it follows that \[ \underset{n \to \infty}{\lim}\frac{\partial }{\partial z}\ln(g(z;n))=-z \] This analysis is much simpler when approached with the moment generating function method \[ M_{Z;n}(t)=e^{n\left ( e^\frac{t}{\sqrt{n}}-1 \right )}e^{-t\frac{n}{\sqrt{n}}} =e^{\frac{t^2}{2}+O(n^{-1/2})} \] Clearly then, in the limit \[ \underset{n \to \infty}{\lim}M_{Z;n}(t)=e^{t^2/2} \] <br /><hr width="45%"><h3>Gamma Distribution</h3><br />\[ f(x;n,\theta)=\frac{x^{n-1}}{(n-1)!\theta^n}e^{-x/\theta} \\ \mu(n)=n\theta,\: \: \sigma(n)=\sqrt{n}\theta \\ M_{X;n}(t)=\left ( 1-\theta t \right )^{-n} \] Proceeding as above \[ \ln(g(z;n))=(n-1)\ln(n\theta+z\sqrt{n}\theta)-(n\theta+z\sqrt{n}\theta)/ \theta \\ \frac{\partial }{\partial z}\ln(g(z;n))=\frac{n-1}{\sqrt{n}+z}-\sqrt{n} \\ \underset{n \to \infty}{\lim}\frac{\partial }{\partial z}\ln(g(z;n))=-z \] However, it is simpler to use \[ M_{Z;n}(t)=\left ( 1-\frac{t}{\sqrt{n}} \right )^{-n}e^{-t\sqrt{n}} \\ \underset{n \to \infty}{\lim}M_{Z;n}(t)=e^{t^2/2} \] <br /><hr width="45%"><h3>Beta Distribution</h3><br />\[ f(x;a,b)=\frac{x^{a-1}(1-x)^{b-1}}{\mathrm{B}(a,b)} \\ a=\alpha n,\: \: b=\beta n,\: \: \mu(n)=\frac{a}{a+b}=\frac{\alpha}{\alpha+\beta}, \nu=\frac{\beta}{\alpha+\beta}, \\ \sigma(n)=\sqrt{\frac{ab}{(a+b)^2(a+b+1)}}=\frac{1}{\sqrt{n}}\sqrt{\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+\tfrac{1}{n})}}\approx\frac{1}{\sqrt{n}}s \] Where \(s=\sqrt{\tfrac{\alpha\beta}{(\alpha+\beta)^3}}\). This approximation is harmless, since we will be taking the limit in \(n\); it is already quite good for moderately sized \(n\). 
\[ \ln(g(z;n))=(\alpha n-1)\ln\left ( \mu+\tfrac{s}{\sqrt{n}}z \right )+(\beta n-1)\ln\left ( \nu-\tfrac{s}{\sqrt{n}}z \right )-\ln(\mathrm{B}(\alpha n,\beta n)) \\ \frac{\partial }{\partial z}\ln(g(z;n))=s\left [\frac{\alpha n-1}{\mu\sqrt{n}+sz}-\frac{\beta n-1}{\nu\sqrt{n}-sz} \right ] \\ \underset{n \to \infty}{\lim}\frac{\partial }{\partial z}\ln(g(z;n))=-s^2z\frac{\alpha+\beta}{\mu\nu}=-z \] <br /><hr><br /><h1>Correlated Normal Variables</h1><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-S-BbpbiIFWs/XbY1eDvEPeI/AAAAAAAAWpc/We9V7r8h4Z0KmC2QQ5yjgN20SwBs08IrQCLcBGAsYHQ/s1600/ezgif.com-crop%2B%25281%2529.gif" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://2.bp.blogspot.com/-S-BbpbiIFWs/XbY1eDvEPeI/AAAAAAAAWpc/We9V7r8h4Z0KmC2QQ5yjgN20SwBs08IrQCLcBGAsYHQ/s400/ezgif.com-crop%2B%25281%2529.gif" width="382" height="400" data-original-width="377" data-original-height="395" /></a></div>Often we encounter variables that are not independent. These can be normally distributed or closely modeled as normally distributed. Suppose \(X,Y,Z\) are independent standard normal random variables. Let us define: \[ V=aX+bY+c \\ W=dX+fZ+g \] As sums of independent Gaussian random variables, \(V\) and \(W\) will each themselves be Gaussian. Since \(X,Y,Z\) are independent with zero mean, it follows that \(\mathrm{E}(XY)=\mathrm{E}(XZ)=\mathrm{E}(YZ)=0\). We then find the means and variances of \(V\) and \(W\): \[ \mu_V=\mathrm{E}(V)=c,\: \:\: \: \: \sigma_V^2=\mathrm{E}((V-\mu_V)^2)=a^2+b^2 \\ \mu_W=\mathrm{E}(W)=g,\: \:\: \: \: \sigma_W^2=\mathrm{E}((W-\mu_W)^2)=d^2+f^2 \] The covariance of \(V\) and \(W\) is given by \[\sigma_{VW}=\mathrm{E}((V-\mu_V)(W-\mu_W))=ad\] Thus the two variables are correlated, and we can achieve any desired correlation. 
We can achieve the same effect using only two variables: \[ \begin{bmatrix} V\\ W \end{bmatrix}=\begin{bmatrix} \sigma_V\sqrt{1-\rho^2} & \rho\sigma_V\\ 0 & \sigma_W \end{bmatrix} \begin{bmatrix} X\\ Y \end{bmatrix}+\begin{bmatrix} \mu_V\\ \mu_W \end{bmatrix} \] Where \(\rho\) is the correlation coefficient between \(V\) and \(W\). When defined this way, the joint probability density function is given by \[ f(v,w)=\frac{1}{2\pi\sigma_V\sigma_W\sqrt{1-\rho^2}}e^{-\tfrac{1}{2(1-\rho^2)}\left [ \tfrac{(v-\mu_V)^2}{\sigma_V^2}+\tfrac{(w-\mu_W)^2}{\sigma_W^2}-2\rho\tfrac{(v-\mu_V)(w-\mu_W)}{\sigma_V\sigma_W} \right ]} \] More generally, suppose that we have \(n\) correlated Gaussian variables represented in a column matrix: \(\mathbf{x}\). Let \(\boldsymbol{\mu}=\mathrm{E}(\mathbf{x})\). The covariance matrix is defined as \(\mathbf{\Sigma}=\mathrm{E}\left (\mathbf{(x-\boldsymbol{\mu})}\mathbf{(x-\boldsymbol{\mu})}^T \right )\), so that \(\mathbf{\Sigma}_{a,b}=\mathrm{E}\left ((x_a-\mu_a)(x_b-\mu_b) \right ) \). Then the probability density function is given by: \[ \bbox[5px,border:2px solid red] { f(\mathbf{x})=\frac{1}{\sqrt{(2\pi)^n\left | \mathbf{\Sigma} \right |}}e^{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\mathbf{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})} } \] Let \(\mathbf{z}\) be a column matrix of \(n\) independent standard normal random variables. Let \(M\) be an \(n\) by \(n\) matrix such that \(MM^T=\mathbf{\Sigma}\) (for example, the Cholesky factor of \(\mathbf{\Sigma}\)). Then if we take \(\mathbf{x}=M\mathbf{z}+\boldsymbol{\mu}\), \(\mathbf{x}\) will be distributed with the density function given just above. 
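A quick Monte Carlo check of the two-variable construction above (the parameter values here are arbitrary): sampling \(V\) and \(W\) through the \(2\times 2\) matrix should reproduce the chosen correlation coefficient \(\rho\).

```python
import math
import random

random.seed(1)
mu_v, mu_w = 1.0, -2.0
sig_v, sig_w = 2.0, 1.5
rho = 0.7

def sample_vw():
    # [V, W] = M [X, Y] + [mu_V, mu_W] with the 2x2 matrix from the text.
    x, y = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    v = sig_v * math.sqrt(1.0 - rho ** 2) * x + rho * sig_v * y + mu_v
    w = sig_w * y + mu_w
    return v, w

n = 200_000
pairs = [sample_vw() for _ in range(n)]
mv = sum(v for v, _ in pairs) / n
mw = sum(w for _, w in pairs) / n
# Sample covariance, normalized by the true standard deviations.
corr = sum((v - mv) * (w - mw) for v, w in pairs) / (n * sig_v * sig_w)
print(round(corr, 2))  # ≈ rho = 0.7
```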
<br /><hr><br /><h1>Wiener Processes and Stochastic Differential Equations</h1><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-VJwBBmGlQ9w/XalANlu2nDI/AAAAAAAAWRo/i3L26OxdcD8wCQy5GPzCLBH-Ix7Lq014ACLcBGAsYHQ/s1600/ezgif.com-crop%2B%25285%2529.gif" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://2.bp.blogspot.com/-VJwBBmGlQ9w/XalANlu2nDI/AAAAAAAAWRo/i3L26OxdcD8wCQy5GPzCLBH-Ix7Lq014ACLcBGAsYHQ/s400/ezgif.com-crop%2B%25285%2529.gif" width="400" height="337" data-original-width="775" data-original-height="652" /></a></div>Let \(z_k\) be independent standard normal random variables for each \(k\). We define a random variable \(W\) as a function of \(t \geq 0\) to be \[ \bbox[5px,border:2px solid red] { W(t)=\underset{n \to \infty}{\lim}\frac{1}{\sqrt{n}}\sum_{1 \leq k \leq nt} z_k } \] It can be seen from the preceding sections that at each \(t\), \(W(t)\) is a normal random variable with zero mean and variance \(t\). Moreover, \(W(s+t)-W(s)\) can be seen to have the same distribution as \(W(t)\) and is independent of \(W(s)\). This is the definition of a Wiener process, also known as Brownian motion. Note that the summed variables don't even need to be normal, but just zero-mean and unit-variance, and the central limit theorem will guarantee that the process still tends to the same distribution. <br />One way to understand this is using the notation \(dW=z_t\sqrt{dt}\) where \(z_t\) is understood as an independent standard normal random variable for each \(t\). That way \(dW\) is a normal random variable with variance \(dt\). Then \[W(t)=\int_{0}^{t}dW\] It is clear from the definition that, for \(c>0\): \(\frac{1}{\sqrt{c}}W(ct)\) is distributed in exactly the same way and has all the same properties as \(W(t)\) (it is another Wiener process). Thus \(W(t)\) exhibits self-similarity. 
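The limiting sum above can be simulated directly by accumulating independent \(N(0,\Delta t)\) increments; a minimal sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)

T, n = 1.0, 1_000            # time horizon and number of steps
dt = T / n

# One sample path: dW = z*sqrt(dt), so each increment is N(0, dt),
# independent of all the others.
dW = rng.standard_normal(n) * np.sqrt(dt)
W = np.concatenate(([0.0], np.cumsum(dW)))   # W(0) = 0

# Across many independent paths, W(T) should be N(0, T):
WT = (rng.standard_normal((5_000, n)) * np.sqrt(dt)).sum(axis=1)
print(WT.mean(), WT.var())   # approximately 0 and T
```

The empirical mean and variance of \(W(T)\) over many paths approximate \(0\) and \(T\), as the definition requires.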
It is also evident that, for the same process, the covariance of \(W(s)\) and \(W(t)\) is \(\min(s,t)\). Suppose that \(V(t)=W(f(t))\) where \(f(t)\) is a non-decreasing function. Then it follows that \(dV=\sqrt{f'(t)}dW\). <br />If we wish to simulate a Wiener process at the times \(t_k\), this can be done incrementally as: \[ W(t_{k+1})=W(t_k)+\sqrt{t_{k+1}-t_k}\cdot z_k \] Where the \(z_k\) are independent standard normal variables. <br />The Wiener process and its stochastic differential \(dW\) are fundamental to the study of stochastic differential equations. Stochastic differential equations have many applications in physics, finance, population growth, and econometrics. One common general form for these equations is \[ \bbox[5px,border:2px solid red] { dX=\mu(X,t)dt+\sigma(X,t)dW } \] In this equation, \(\mu(X,t)\) is the drift and \(\sigma(X,t)\) is the spread or diffusion. The density function for \(X\), \(f(x,t)\), satisfies the Fokker-Planck equation: \[ \frac{\partial }{\partial t}f(x,t)=-\frac{\partial }{\partial x}\left [ \mu(x,t)f(x,t) \right ]+\frac{1}{2}\frac{\partial^2 }{\partial x^2}\left [\sigma^2(x,t)f(x,t) \right ] \] Suppose we wish to find what happens to the new variable \(Y=g(X,t)\), where \(g(x,t)\) is a smooth, twice-differentiable function. We can expand \(dY\) in a Taylor series: \[ dY=\frac{\partial g}{\partial t}dt+\frac{\partial g}{\partial x}dX+\frac{1}{2}\frac{\partial^2 g}{\partial x^2}dX^2+... \] Substituting in the expansion for \(dX\), and taking advantage of the fact that \(dW^2=dt\) in the proper statistical sense: \[ \bbox[5px,border:2px solid red] { dY=\left [\frac{\partial g}{\partial t}+\mu\frac{\partial g}{\partial x}+\frac{\sigma^2}{2}\frac{\partial^2 g}{\partial x^2} \right ]dt+\sigma\frac{\partial g}{\partial x}dW } \] All the remaining terms are higher than first order and hence will not be significant. 
This is Ito's lemma; it allows us to make changes of variable in various stochastic processes, which in turn allows us to solve several types of stochastic differential equations. <br /><hr width="45%"><h3>General Integrated Brownian Motion</h3><br />Let us define: \[ V(t)=\int_{0}^{t}f'(s)W(s)ds=\int_{0}^{t}(f(t)-f(s))dW \] A fact that follows easily from the definition of the Wiener process, known as the Ito isometry, is: \[ \bbox[5px,border:2px solid red] { \mathrm{E}\left ( \left [\int_0^t F(s)dW \right ]^2 \right )=\mathrm{E}\left ( \int_0^t F(s)^2ds \right ) } \] Given this, it follows that (for \(a>0\)): \[ \mathrm{Var}\left ( V(t) \right)=\int_0^t \left ( f(t)-f(s) \right )^2ds \\ \mathrm{cov}\left ( V(t+a),V(t) \right)=\int_0^t \left ( f(t+a)-f(s) \right )\left ( f(t)-f(s) \right )ds \] For instance, when \(f(t)=t\), \(\mathrm{cov}\left ( V(t+a),V(t) \right)=t^2\tfrac{3a+2t}{6}\). Note that, as sums of zero-mean Gaussians, all of these random variables are Gaussian as well. <br />A way to simulate such a process at times \(t_k\) is to use the above to determine the covariance matrix \(\mathbf{\Sigma}\) of all the \(V(t_k)\). Then the process can be simulated by the method described in the section on correlated normal variables. Namely, we take a vector of iid standard normal variables \(\mathbf{z}\), determine a matrix \(M\) such that \(MM^T=\mathbf{\Sigma}\), then \(V(\mathbf{t})=M\mathbf{z}\). This technique applies to any correlated Gaussian process. <br />Note also that, from the Ito isometry, we easily derive the formula (an equality in distribution), for \(f(t)\) some function and constant \(A \geq 0\): \[ \bbox[5px,border:2px solid red] { \int_0^t A\cdot f(s)dW=A\cdot W\left ( \int_0^t f^2(s)ds \right ) } \] <br /><hr width="45%"><h3>Brownian Motion with Drift</h3><br />Suppose that \[ \bbox[5px,border:2px solid red] { dX=\mu dt+\sigma dW } \] Where \(\mu\) and \(\sigma\) are constants. Let us define \(Y=g(X,t)\), where \(g(x,t)=\frac{x-\mu t}{\sigma}\). 
By Ito's lemma, the differential becomes: \[ dY=\left [\frac{-\mu}{\sigma}+\mu\frac{1}{\sigma}+0 \right ]dt+\sigma\frac{1}{\sigma}dW=dW \] Thus \(\bbox[5px,border:2px solid red] {X(t)=\sigma W(t)+\mu t+X_0}\). It also follows that X is a normal random variable with mean \(\mu t+X_0\) and variance \(\sigma^2 t\). <br /><hr width="45%"><h3>Geometric Brownian Motion</h3><br />Suppose that \[ \bbox[5px,border:2px solid red] { dX=X\mu dt+X\sigma dW } \] Where \(\mu\) and \(\sigma\) are constants. This is a process where the percentage change \(\frac{dX}{X}\) follows a Brownian motion with drift. Let us define \(Y=\ln(X)\). By Ito's lemma, the differential becomes: \[ dY=\left [0+\mu X\frac{1}{X}-\frac{\sigma^2 X^2}{2}\frac{1}{X^2} \right ]dt+\sigma X\frac{1}{X}dW=\left ( \mu-\frac{\sigma^2}{2} \right )dt+\sigma dW \] But this has the same form as the case of Brownian motion with drift. Thus \(Y(t)=\sigma W(t)+\left (\mu-\frac{\sigma^2}{2} \right ) t+Y_0\), from which it follows that \[ \bbox[5px,border:2px solid red] { X(t)=X_0 \cdot e^{\left (\mu-\frac{\sigma^2}{2} \right ) t+\sigma W(t)} } \] Moreover, \(X\) will be log-normally distributed with \(\mu\)-parameter equal to \(\ln(X_0)+(\mu-\tfrac{\sigma^2}{2})t\) and \(\sigma^2\)-parameter equal to \(\sigma^2 t\). <br /><hr width="45%"><h3>Ornstein–Uhlenbeck process</h3><br />Suppose that \[ \bbox[5px,border:2px solid red] { dX=-X\theta dt+\sigma dW } \] Where \(\theta\) and \(\sigma\) are constants. Let us define \(Y=g(X,t)\), where \(g(x,t)=x e^{\theta t}\). 
By Ito's lemma, the differential becomes: \[ dY=\left [\theta X e^{\theta t}-X\theta e^{\theta t}+0 \right ]dt+\sigma e^{\theta t}dW=\sigma e^{\theta t} dW \] It follows that \[ Y=Y_0+\frac{\sigma}{\sqrt{2\theta}} W(e^{2\theta t}-1) \] And so \[ \bbox[5px,border:2px solid red] { X=X_0e^{-\theta t}+\frac{\sigma}{\sqrt{2\theta}}e^{-\theta t} W(e^{2\theta t}-1) } \] Another common formulation has \[ \bbox[5px,border:2px solid red] { dX=(\xi-X)\theta dt+\sigma dW } \] Which has the solution \[ \bbox[5px,border:2px solid red] { X(t)=X_0e^{-\theta t}+(1-e^{-\theta t})\xi +\frac{\sigma}{\sqrt{2\theta}}e^{-\theta t} W(e^{2\theta t}-1) } \] This shows that \(X\) is at each time a Gaussian with mean \(X_0e^{-\theta t}+(1-e^{-\theta t})\xi\), and variance \(\tfrac{\sigma^2}{2\theta}(1-e^{-2\theta t})\). It follows that, asymptotically, the distribution tends toward a Gaussian with mean \(\xi\) and variance \(\tfrac{\sigma^2}{2\theta}\). Thus the process is mean-reverting. The covariance of \(X\) between times \(a\) and \(b\) is given by \[ \mathrm{cov}\left ( X(a),X(b) \right )=\frac{\sigma^2}{2\theta}\left ( e^{-\theta|a-b|}-e^{-\theta(a+b)} \right ) \] <br /><hr><br /><h1>Generating Gaussian Random Samples</h1><br />Often we wish to simulate independent samples from a standard normal distribution. By the central limit theorem, one option is to simulate \(N\) uniform random variables, and take the suitably centered and scaled mean. However, this is quite inefficient. We describe two methods below: <br /><hr width="45%"><h3>2D-Distribution Based</h3><br />The typical method of evaluating the Gaussian integral is to square the integral and evaluate it as a two-dimensional integral by an advantageous change of variables. 
For example: \[ I=\int_{-\infty}^{\infty}e^{-x^2/2}dx \\ I^2=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-(x^2+y^2)/2}dxdy =\int_{0}^{2\pi}\int_{0}^{\infty}e^{-r^2/2}rdrd\theta \\ I^2=\left.\begin{matrix} \theta \end{matrix}\right|_0^{2\pi}\cdot \left [ -e^{-r^2/2} \right ]_0^\infty =2\pi \] This implies that \(\theta\) is uniformly distributed between \(0\) and \(2\pi\), and \(r\) is distributed with a CDF given by \(F(r)=1-e^{-r^2/2}\). Thus, if \[ \begin{align} x &=r\cos(\theta), &\: \: y &= r\sin(\theta)\nonumber\\ \theta &= 2\pi U_1, &\: \: r &= \sqrt{-2\ln(U_2)}\nonumber\\ \nonumber \end{align} \] Where \(U_1\) and \(U_2\) are uniform random variables on \((0,1]\), then \(x\) and \(y\) are independent standard normal variables. This is called the <b>Box-Muller Method</b>. <br />An equivalent method, <b>Marsaglia's Polar Method</b>, samples \((u,v)\) uniformly over the unit disk, then returns \[ x=u\sqrt{\frac{-2\ln(u^2+v^2)}{u^2+v^2}} \\ y=v\sqrt{\frac{-2\ln(u^2+v^2)}{u^2+v^2}} \] as two independent standard Gaussian variables. <br /><hr width="45%"><h3>Distribution-Geometry Based</h3><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-jsenLcvNYyY/XZeykAbnsKI/AAAAAAAAWJI/uSrrOkPql4IpG7C6zssEuAYMEG8uIqTfQCLcBGAsYHQ/s1600/zig1.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://4.bp.blogspot.com/-jsenLcvNYyY/XZeykAbnsKI/AAAAAAAAWJI/uSrrOkPql4IpG7C6zssEuAYMEG8uIqTfQCLcBGAsYHQ/s320/zig1.png" width="320" height="257" data-original-width="608" data-original-height="488" /></a></div>This method, known as the <b>Ziggurat Method</b>, is quite general: it can be applied to any random variable with a monotone decreasing density function, or one with a symmetric density that is monotone decreasing on either side of a central mode. 
The main principle we use is that the x-coordinate of a point uniformly selected under a given probability density function is distributed with that probability density. The Ziggurat algorithm cuts the distribution into N layers, stacked in the way a ziggurat appears. The less area there is in the tail and in the ziggurat but outside the curve, the more efficient the algorithm will be, as fewer resamplings will be needed. <br /> <br /> For the specific case of a normal distribution, the algorithm proceeds as follows: <ol><li><b>Pre-compute layer limits and probabilities</b>. We select N y-limits such that \(0 = y_0 < y_1 < y_2 < ... < y_{N-1} < y_N=1\). We then get the probabilities \[ p_k={\frac{2}{\sqrt{\pi}}}\int_{y_{k}}^{y_{k+1}}\sqrt{-\ln(y)}dy \] We also define \(x_k=\sqrt{-2\ln(y_k)}\). Once we have the limits and probabilities, they can be stored and used for all further calculations. Note that the first layer includes the tail (\(x_0=\infty\)), which will have to be dealt with differently than the other layers. </li><li><b>Randomly select a layer</b>. From the N layers, randomly pick one with probability corresponding to the layer probability computed above: save the selected index \(k\) (layer \(k\) spans from \(y_k\) to \(y_{k+1}\)). </li><li><b>Sample layer</b>. If \(k>0\), pick a uniform random value \(u\) between 0 and 1. Then \(x=u \cdot x_{k}\). If \(k=0\), use the tail algorithm. </li><li><b>Test sampled value</b>. If \(x < x_{k+1}\), return \(x\). Otherwise, pick a uniform random value \(v\) between 0 and 1 and define \(y=y_k+v \cdot (y_{k+1}-y_k)\). If \(y < e^{-x^2/2}\), return x. Otherwise, resample the layer (return to step 2). </li><li><b>Tail algorithm</b>. If \(k=0\), pick uniform random values \(u_1,u_2,u_3,u_4\) between 0 and 1. If \(u_1 < x_1\cdot y_1\), return \(x=x_1 \cdot u_2\). Otherwise, let \(\xi=-\ln(u_3)/x_1\) and \(y=-\ln(u_4)\). If \(2y > \xi^2\) return \(x=x_1+\xi\). Otherwise, resample the layer (return to step 2). 
</li><li><b>Sign of value</b>. Pick a random bit \(b\) which is equally likely to be 0 or 1. Set \(x \to (-1)^b x\). </li></ol> <br /><hr><br /><h1>Gaussian Kernel Smoothing and Density Estimation</h1><br />Gaussian functions serve as very convenient and useful kernels for a number of applications. A kernel is a function that depends on a location and a value at that location and spreads that value to nearby locations. This is useful for smoothing, interpolating, and making discrete sets continuous. The Gaussian is an attractive kernel because it is normalized, it has natural location (\(\mu\)) and width (\(\sigma\)) parameters, tails off quickly, and is not arbitrarily constrained to a finite window. Applying the kernel generally operates like a convolution: for discrete data \(\left \{ (x_1,y_1),(x_2,y_2),...,(x_N,y_N) \right \}\) we define a function as \[ F(x)=\nu(x)\cdot\sum_{k=1}^{N}y_k\cdot g_{0,\sigma^2}(x-x_k) \] Where \(\nu(x)\) is a normalizing or scaling function. For continuous data \(f(x)\), defined for all \(x\), the application of the kernel gives: \[ F(x)=\nu(x)\cdot [f*g_{0,\sigma^2}(x)] \] The parameter \(\sigma\) is the only one left unspecified, and allows us to have control over the degree of smoothing. In higher dimensions, \(\sigma^2\) becomes a covariance matrix, which may be anisotropic and include off-diagonal terms. It is possible to vary it for different \(x_k\). 
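The discrete case can be sketched in a few lines. This is a minimal sketch assuming NumPy; the helper name <code>gaussian_kernel_smooth</code> is ours, and we take \(\nu(x)=1/\sum_k g_{0,\sigma^2}(x-x_k)\) as one natural choice of normalizing function:

```python
import numpy as np

def gaussian_kernel_smooth(x, xk, yk, sigma):
    """Smooth the discrete data (xk, yk) at query points x with a Gaussian
    kernel, using nu(x) = 1 / sum_k g(x - x_k) as the normalization."""
    # w[i, k] = g_{0,sigma^2}(x_i - x_k); the 1/sqrt(2 pi sigma^2)
    # prefactor cancels in the normalized ratio, but we keep it explicit.
    d = x[:, None] - xk[None, :]
    w = np.exp(-d**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    return (w * yk).sum(axis=1) / w.sum(axis=1)

xk = np.array([0.0, 1.0, 2.0])
yk = np.array([0.0, 1.0, 0.0])
x = np.linspace(-1, 3, 9)
print(gaussian_kernel_smooth(x, xk, yk, sigma=0.3))
```

With this choice of \(\nu\), the smoothed function is a convex combination of the \(y_k\), so its values stay between the minimum and maximum data values, and it approaches \(y_k\) near \(x_k\) as \(\sigma\) shrinks.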
<br /><hr width="45%"><h3>Gaussian Blur</h3><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-8PTaeQUmAGs/XZDa0tDh88I/AAAAAAAAWFY/KOzsRCclrJQKzvpfPzDyixJfRpiQs8DKQCLcBGAsYHQ/s1600/Cappadocia_Gaussian_Blur.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://1.bp.blogspot.com/-8PTaeQUmAGs/XZDa0tDh88I/AAAAAAAAWFY/KOzsRCclrJQKzvpfPzDyixJfRpiQs8DKQCLcBGAsYHQ/s320/Cappadocia_Gaussian_Blur.png" width="180" height="360" data-original-width="320" data-original-height="639" /></a></div>Given an image, often it is desirable to remove noise, soften sharp edges, or remove detail. This is useful, for example, to make backgrounds draw less attention, or, in edge-detection, to reduce the number of detected features. Several processes tend to produce noise or sharp edges and so it is advantageous to follow them with a process to reduce such unwanted features. By using a Gaussian kernel, we can achieve this desired type of smoothing, with the degree of smoothing controlled by \(\sigma\) (in units of pixels). One major advantage of the Gaussian kernel, particularly for image processing, is that, for diagonal covariance matrices, it can be decomposed into sequential one-dimensional convolutions, and hence is generally much more efficient than other kernels. Note that, for color images, the process is done on each of the different RGB channels. <br />For practical purposes, the Gaussian kernels used are rarely infinite in extent, and need not be continuous. Rather, a \((2n+1)\times(2n+1)\) pixel kernel is used, where \(n\geq 3\sigma\), which is just as effective, nearly indistinguishable, and far more efficient. 
This kernel, for \(0\leq i,j\leq 2n\), can be given by: \[ K(i,j)=\frac{1}{\kappa^2}e^{-\tfrac{1}{2\sigma^2}\left [ \left (i-n \right )^2+\left (j-n \right )^2 \right ]} \] Where \[ \kappa=\sum_{j=-n}^{n}e^{-\tfrac{j^2}{2\sigma^2}} \] <br /><hr width="45%"><h3>Gaussian Kernel Smoothing</h3><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-DBCSVznI6kI/XZwhK_RKTmI/AAAAAAAAWK0/GK6CbGMI-fUEDz4zSryMucIH0s3tGzxqwCLcBGAsYHQ/s1600/ezgif.com-gif-maker.gif" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://4.bp.blogspot.com/-DBCSVznI6kI/XZwhK_RKTmI/AAAAAAAAWK0/GK6CbGMI-fUEDz4zSryMucIH0s3tGzxqwCLcBGAsYHQ/s320/ezgif.com-gif-maker.gif" width="301" height="320" data-original-width="343" data-original-height="365" /></a></div>Given a discrete set of x-values and their corresponding y-values, we define a function as: \[ F(x)=\frac{\sum_{k=1}^{N}y_k \cdot g_{0,\sigma^2}(x-x_k)}{\sum_{k=1}^{N}g_{0,\sigma^2}(x-x_k)} \] This function has the property that points near \(x_k\) will have values close to \(y_k\). In fact, in the limit as \(\sigma^2\) goes to zero, the function converges to the nearest-neighbor or Voronoi map. By adjusting \(\sigma^2\), we can adjust how quickly the function transitions between values. This is most easily seen from a one-dimensional example: suppose we have the two data points \(\{(x_1,y_1),(x_2,y_2)\}\). Then the function above can be equivalently written as: \[ F(x)=\frac{y_1+y_2}{2}+\frac{y_2-y_1}{2}\tanh\left ( \left [ \frac{x_2-x_1}{2\sigma^2} \right ]\left ( x-\tfrac{x_1+x_2}{2} \right ) \right ) \] The rise span of this function is proportional to \(\frac{\sigma^2}{x_2-x_1}\). This behavior is somewhat opposite to what may be desired, namely, that more widely separated points have shorter rise spans. 
This can either be simply accepted as a feature of the method, or different \(\sigma\) can be used at different points, depending on the distance to their nearest-neighbors. <br />This type of smoothing has a great number of applications, as it can readily be used on multivariate or vector data (for both \(x\) and \(y\)). It even permits easy adaptation to non-Euclidean spaces, e.g. over the surface of a sphere. <br />The method however has a few drawbacks. One is that care needs to be taken in cases where the numerator and denominator are very small. Another is that evaluating the function can be rather computationally intensive when it must be evaluated at a large number of locations or when \(N\) is large. However, the method is quite powerful and admits much flexibility in providing smooth interpolation and extrapolation for discrete data. <br /><hr width="45%"><h3>Kernel Density Estimation</h3><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-Nz750YxcbpQ/XbYgNqHFRtI/AAAAAAAAWpM/1eppASfFVaQDlEeTaKGg6FmOS6JiWfQdwCLcBGAsYHQ/s1600/ezgif.com-optimize.gif" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://3.bp.blogspot.com/-Nz750YxcbpQ/XbYgNqHFRtI/AAAAAAAAWpM/1eppASfFVaQDlEeTaKGg6FmOS6JiWfQdwCLcBGAsYHQ/s400/ezgif.com-optimize.gif" width="400" height="329" data-original-width="908" data-original-height="746" /></a></div>Suppose we sample \(N\) times from an unknown probability distribution, and then wish to estimate the density function. If we had a guess as to the form of the distribution, we could use the data to estimate the underlying parameters. However, often the distribution is not of a known kind, so this is not pragmatic. 
Instead we can estimate the density from the sample values \(x_1,x_2,...,x_N\) as \[ f(x)=\frac{1}{N}\sum_{k=1}^{N}g_{0,\sigma^2}\left ( x-x_k \right ) \] Note that this density function has mean \(\frac{1}{N}\sum_{k=1}^{N}x_k=\overline{x}\) and variance \(\sigma^2+\frac{1}{N}\sum_{k=1}^{N}(x_k-\overline{x})^2\). <br>This can be extended to multiple dimensions in a straightforward way, although in that case the variance may be replaced either by a matrix proportional to the sample covariance matrix, or by a diagonal matrix. For the one-dimensional case, a general rule of thumb for picking the bandwidth is \(\sigma=\sqrt{\mathrm{var}\left ( \mathbf{x} \right )}\left ( \frac{4}{3n} \right )^{1/5}\). Another, rather aggressive approximation is \(\sigma=\tfrac{1}{2}\max\left ( \Delta x \right )\) where \(\Delta x\) is the difference between successive sorted sample values. <br /><hr><br /><h1>Normal Distributions in Non-Euclidean Spaces</h1><br />The distributions we have so far looked at exist in a Euclidean space. That is, the space in which the random variables exist and interact is taken to be flat, having zero curvature everywhere. In general, for two vectors in an (N+1)-dimensional space, with constant curvature \(K\) (\(K=0\) means a flat or Euclidean space, \(K>0\) means a positively curved or elliptical space, and \(K<0\) means a negatively curved or hyperbolic space), we can define a sort of inner product, called a bilinear form: \[ \bbox[5px,border:2px solid red] { \mathbf{a}\cdot\mathbf{b}=a_0 b_0+K \sum_{j=1}^N a_jb_j } \] We will be looking exclusively at unit vectors, i.e. vectors that satisfy \(\mathbf{v}\cdot\mathbf{v}=|\mathbf{v}|^2=v_0^2+K \sum_{j=1}^N v_j^2=1\). 
<br>In general, the distance between the points corresponding to the unit vectors \(\mathbf{a}\) and \(\mathbf{b}\) is \[ \bbox[5px,border:2px solid red] { d(\mathbf{a},\mathbf{b})=\tfrac{1}{\sqrt{K}}\cos^{-1}\left ( \mathbf{a}\cdot\mathbf{b} \right )=\tfrac{1}{\sqrt{-K}}\cosh^{-1}\left ( \mathbf{a}\cdot\mathbf{b} \right ) } \] Where the second form is more easily applicable for \(K<0\). Note that Euclidean space must be approached as the limit as \(K \to 0\). Moreover, the zeroth dimension in Euclidean space is just a bookkeeping device: as only the zeroth dimension contributes to the magnitude of the vector, the other components are free to take on any magnitude. However, note that the Euclidean distance is indeed given as the limit of the general distance formula: \[ d(\mathbf{a},\mathbf{b})=\underset{K \to 0}{\lim} \frac{1}{\sqrt{K}}\cos^{-1}\left ( \sqrt{1-K\sum_{j=1}^{N}a_j^2} \sqrt{1-K\sum_{j=1}^{N}b_j^2}+K \sum_{j=1}^N a_jb_j \right ) \\ d(\mathbf{a},\mathbf{b})=\sqrt{\sum_{j=1}^N (a_j-b_j)^2} \] A more useful form for differential geometry is to impose the unit-vector condition directly, and use generalized polar coordinates with the parametrization \[ r=d(\mathbf{x},[1,\mathbf{0}]) \\ d\boldsymbol{\psi}^2=d\theta_1^2+\sin^2(\theta_1)d\theta_2^2+\sin^2(\theta_1)\sin^2(\theta_2)d\theta_3^2+... \] Where the \(\theta\)s are generalized angles. 
Then the distance metric (the differential distance between nearby points as a function of their coordinates) is given by \[ \bbox[5px,border:2px solid red] { ds^2=dr^2+\left [ \frac{\sin(r\sqrt{K})}{\sqrt{K}} \right ]^2 d\boldsymbol{\psi}^2 } \] For \(N=2\), if the metric is expressed as \(ds^2=A(v,w) dv^2+B(v,w)dw^2\), the Gaussian curvature \(G\) is given by \[ \bbox[5px,border:2px solid red] { G=\frac{-1}{2\sqrt{AB}}\left ( \frac{\partial }{\partial v}\left [ \frac{1}{\sqrt{AB}}\frac{\partial B}{\partial v} \right ]+\frac{\partial }{\partial w}\left [ \frac{1}{\sqrt{AB}}\frac{\partial A}{\partial w} \right ] \right ) } \] It is easy to verify that for our distance metric, we do indeed get \(G=K\). <br>Let us take the following probability distribution over unit vectors \(\mathbf{x}\): \[ \bbox[5px,border:2px solid red] { f(\mathbf{x})=C\cdot \exp\left (\tfrac{1}{K \sigma^2}\left [\boldsymbol{\mu}\cdot\mathbf{x}-1 \right ] \right ) } \] Where \(C\) is a normalization constant, \(\boldsymbol{\mu}\) is a constant unit vector, and \(\sigma^2\) is some positive constant. This is the generalized Von Mises–Fisher distribution, which extends the normal distribution to non-Euclidean geometries. In the limit as \(K \to 0\): \[ f(\mathbf{x})=\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left (\frac{-1}{2 \sigma^2}\sum_{j=1}^{N}(x_j-\mu_j)^2 \right ) \] Which is the expected Euclidean distribution. The more general multivariate distribution could be recovered by modifying the definition of the bilinear form. 
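As a numerical sanity check of the distance formula given earlier, here is a minimal NumPy sketch (the helper name <code>dist</code> is ours) confirming that the curved distance tends to the Euclidean distance as \(K \to 0\):

```python
import numpy as np

def dist(p, q, K):
    """Distance between the points with spatial coordinates p, q in a
    space of constant curvature K, via the unit-vector embedding
    v0 = sqrt(1 - K * |v|^2) and the bilinear form a.b."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    ab = np.sqrt(1 - K * (p @ p)) * np.sqrt(1 - K * (q @ q)) + K * (p @ q)
    if K > 0:
        return np.arccos(ab) / np.sqrt(K)       # elliptic form
    if K < 0:
        return np.arccosh(ab) / np.sqrt(-K)     # hyperbolic form
    return np.sqrt((p - q) @ (p - q))           # Euclidean (K -> 0 limit)

p, q = [0.3, 0.1], [-0.2, 0.4]
for K in (1.0, 1e-4, 0.0, -1e-4, -1.0):
    print(K, dist(p, q, K))   # tends to the Euclidean distance as K -> 0
```

For small \(|K|\) both curved branches agree with the flat distance \(\sqrt{\sum_j(p_j-q_j)^2}\) to within \(O(K)\).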
In one dimension, the distributions take the form \[ \bbox[5px,border:2px solid red] { f(r)= \left\{\begin{matrix} \frac{\sqrt{K}}{2\pi e^{-\tfrac{1}{K\sigma^2}} I_0\left (\tfrac{1}{K\sigma^2} \right )} \exp\left ( \frac{\cos(r\sqrt{K})-1}{K\sigma^2} \right ) & \: \: \: \: \: \: \: \: K > 0\\ \\ \frac{1}{\sqrt{2\pi\sigma^2} }\exp\left ( -\frac{r^2}{2\sigma^2} \right ) & \: \: \: \: \: \: \: \: K = 0\\ \\ \frac{\sqrt{-K}}{2e^{-\tfrac{1}{K\sigma^2}} K_0 \left (\tfrac{1}{-K\sigma^2} \right )}\exp\left ( \frac{\cosh(r\sqrt{-K})-1}{K\sigma^2} \right ) & \: \: \: \: \: \: \: \: K < 0 \end{matrix}\right. } \] Where \(I_0(x)\) and \(K_0(x)\) are the zeroth-order modified Bessel functions of the first and second kind. That all three cases agree in the limit \(K \to 0\) follows from the asymptotic limits \[ \underset{x\to \infty}{\lim}I_0(x)e^{-x}\sqrt{x}=\frac{1}{\sqrt{2\pi}} \\ \underset{x\to \infty}{\lim}K_0(x)e^{x}\sqrt{x}=\sqrt{\frac{\pi}{2}} \] Here we show two animations: one shows a centered, symmetric distribution with unit variance for different curvatures; the other shows the same distribution, but off-center. 
<br/><a href="https://1.bp.blogspot.com/-QhYaUwBKeMY/XaJA9-uHYgI/AAAAAAAAWLs/gCgSTPCxfK8r6eN_6OglcaciAtXPZnPJwCLcBGAsYHQ/s1600/ezgif.com-gif-maker%2B%25281%2529.gif" imageanchor="1" ><img border="0" src="https://1.bp.blogspot.com/-QhYaUwBKeMY/XaJA9-uHYgI/AAAAAAAAWLs/gCgSTPCxfK8r6eN_6OglcaciAtXPZnPJwCLcBGAsYHQ/s1600/ezgif.com-gif-maker%2B%25281%2529.gif" width="330" height="330" data-original-width="443" data-original-height="347" alt="1-D Generalized Von Mises-Fisher distribution (\(\mu=0\), \(\sigma^2=1\))"/></a><a href="https://3.bp.blogspot.com/-qgyc46Kxsro/XaJrh8UZ_rI/AAAAAAAAWMA/ZtvsKEM_L9cLYZ4UbcA_UHuqJvnU8ya0gCLcBGAsYHQ/s1600/ezgif.com-optimize%2B%25283%2529.gif" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/-qgyc46Kxsro/XaJrh8UZ_rI/AAAAAAAAWMA/ZtvsKEM_L9cLYZ4UbcA_UHuqJvnU8ya0gCLcBGAsYHQ/s1600/ezgif.com-optimize%2B%25283%2529.gif" width="330" height="330" data-original-width="443" data-original-height="347" alt="1-D Generalized Von Mises-Fisher distribution (\(\mu=-1\), \(\sigma^2=1\))"/></a><br/>In two dimensions: Let \(\boldsymbol{\mu}=[1,0,0]\). <br>Let us find the probability \(\mathrm{P}(r=d(\mathbf{x},\boldsymbol{\mu}) < t)\). It follows from the above that \(\boldsymbol{\mu}\cdot\mathbf{x}=\cos\left (r\sqrt{K} \right )\). From the distance metric, it can easily be seen that a circle with a radius \(r\) will have circumference \(2\pi\tfrac{\sin(r\sqrt{K})}{\sqrt{K}}\). From this the probability can be seen to be given by: \[ \mathrm{P}(r < t)=2\pi C\int_{0}^{t}\frac{\sin(r\sqrt{K})}{\sqrt{K}} \exp \left (\frac{\cos(r\sqrt{K})-1}{K\sigma^2} \right )dr \\ \mathrm{P}(r < t)=2\pi C \sigma^2\left [ 1-\exp \left (\frac{\cos(t\sqrt{K})-1}{K\sigma^2} \right ) \right ] \] For \(t\) as large as possible, this must be one, and so the normalization constant is given by \[ C=\frac{1}{2\pi\sigma^2}\left\{\begin{matrix} \left ( 1-e^{-\tfrac{2}{K\sigma^2}} \right )^{-1} & K>0\\ 1 & K \leq 0 \end{matrix}\right. 
\] Which makes the probability density function \[ \bbox[5px,border:2px solid red] { f(\mathbf{x})=\frac{\exp\left (\tfrac{1}{K \sigma^2}\left [\boldsymbol{\mu}\cdot\mathbf{x}-1 \right ] \right )}{2\pi\sigma^2}\left\{\begin{matrix} \left ( 1-e^{-\tfrac{2}{K\sigma^2}} \right )^{-1} & K>0\\ 1 & K \leq 0 \end{matrix}\right. } \] The radial CDF is then given by \[ \mathrm{P}(r < t)=\left [ 1-\exp \left (\frac{\cos(t\sqrt{K})-1}{K\sigma^2} \right ) \right ]\left\{\begin{matrix} \left ( 1-e^{-\tfrac{2}{K\sigma^2}} \right )^{-1} & K>0\\ 1 & K \leq 0 \end{matrix}\right. \] If \(K=0\), this is just the usual 2-dimensional Gaussian distribution with equal variances and zero covariance. <br>If \(K>0\), this is an elliptical Von Mises-Fisher distribution. <br>If \(K<0\), this is a hyperbolic generalized Von Mises-Fisher distribution. <br>On the left we show a 2-D Generalized Von Mises-Fisher distribution with \(\mu=1\), \(\sigma^2=1\), for \(|K| \leq 3\). <br> For the special case of \(K=-1\), we can use the Poincare disk model: on the right we show the 2 dimensional generalized Von Mises-Fisher distribution with \(|\mu|\leq3\) and \(\sigma=0.65\). 
<br/><a href="https://1.bp.blogspot.com/-xJxXKnOqvBw/XbHGjpt9EnI/AAAAAAAAWiA/YaqT1LduOIIxsdY7FJH429LOAUmohL31ACLcBGAsYHQ/s1600/ezgif.com-crop.gif" imageanchor="1" ><img border="0" src="https://1.bp.blogspot.com/-xJxXKnOqvBw/XbHGjpt9EnI/AAAAAAAAWiA/YaqT1LduOIIxsdY7FJH429LOAUmohL31ACLcBGAsYHQ/s1600/ezgif.com-crop.gif" width="330" height="330" data-original-width="443" data-original-height="347" alt="2-D Generalized Von Mises-Fisher distribution (\(\mu=1\), \(\sigma^2=1\))"/></a><a href="https://2.bp.blogspot.com/-FVyco22Q6HM/XaXcs-_hNCI/AAAAAAAAWPI/nMqsiHGI90kAxDum3pGuGNykKNeEdk8OACLcBGAsYHQ/s1600/ezgif.com-crop%2B%25281%2529.gif" imageanchor="1" ><img border="0" src="https://2.bp.blogspot.com/-FVyco22Q6HM/XaXcs-_hNCI/AAAAAAAAWPI/nMqsiHGI90kAxDum3pGuGNykKNeEdk8OACLcBGAsYHQ/s1600/ezgif.com-crop%2B%25281%2529.gif" width="330" height="330" data-original-width="443" data-original-height="347" alt="2-D Generalized Von Mises-Fisher distribution in the Poincare disk model, (\(-3\leq\mu\leq 3\), \(\sigma=0.65\))"/></a>Nadavhttp://www.blogger.com/profile/08410204738776306784noreply@blogger.com0tag:blogger.com,1999:blog-6860306700804724173.post-12368042562689226832018-02-22T09:08:00.000-08:002018-02-23T13:28:04.563-08:00A Theorem about Circles and a Volumizing Algorithm<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <br /><h2>A Circle Theorem</h2><br />Take a circle of radius \(R\). Select a point \(A\) inside it a distance \(a\) from the center, with \(a < R\). From \(A\), construct \(N>1\) line segments starting from \(A\) and touching the circle, segment \(k\) touching the circle at \(P_k\), such that if \(i-j\equiv \pm 1 \mod N\), then \(\measuredangle P_iAP_j=2\pi/N\); that is, all the segments are equally angularly spaced. Let \(d_k=\overline{AP_k}\). 
Then \[ \prod_{k=1}^{N}d_k=\prod_{k=1}^{N}\left ( a\cos \left ( \theta_0+\frac{2 k \pi}{N} \right )+\sqrt{R^2-a^2+a^2\cos^2 \left ( \theta_0+\frac{2 k \pi}{N} \right )} \right ) \\ \prod_{k=1}^{N}d_k=\prod_{k=1}^{N}\left (\sqrt{R^2-a^2}\exp\left ( \sinh^{-1}\left (\frac{a}{\sqrt{R^2-a^2}}\sin \left ( \theta'_0+\frac{2 k \pi}{N} \right ) \right ) \right ) \right ) \] Therefore \[ \sqrt[N]{\prod_{k=1}^{N}d_k}=\sqrt{R^2-a^2}\exp \left (\frac{1}{N}\sum_{k=1}^{N} \sinh^{-1}\left (\frac{a}{\sqrt{R^2-a^2}}\sin \left ( \theta'_0+\frac{2 k \pi}{N} \right ) \right ) \right ) \] It follows that, for \(N\) even, as the terms of the summation cancel in pairs, \[ \sqrt[N]{\prod_{k=1}^{N}d_k}=\sqrt{R^2-a^2} \] This also holds asymptotically as \(N \to \infty\), since the error approaches zero; it is generally not exact for \(N\) odd. <br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-yN1sSh6XNLs/Wo8j13fgzTI/AAAAAAAAR6U/p-8ZCvR9aHkGXCeWdJeqqZs5Je1uUY-fACLcBGAs/s1600/ezgif.com-crop.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/-yN1sSh6XNLs/Wo8j13fgzTI/AAAAAAAAR6U/p-8ZCvR9aHkGXCeWdJeqqZs5Je1uUY-fACLcBGAs/s320/ezgif.com-crop.gif" width="320" height="318" data-original-width="331" data-original-height="329" /></a></div><br />Note that this widely generalizes the well-known geometric mean theorem. This can be seen as a consequence of the power of a point theorem. <br /><br />A pleasant interpretation of this is that if we take a diametric cross-section of a sphere and choose a point on that disk, the height of the sphere above that point is the geometric mean of the legs of any \(2N\)-leg (\(N\geq 1\)) equiangular, planar, stellar net connecting that point to the boundary of the disk. <br /><br /><h2>A Related Volumizing Algorithm</h2><br />This theorem suggests an algorithm for producing a 3D volume given a closed 2D boundary shape. 
If we assume the 2D shape is of a diametric cross-section, we simply apply the method detailed above to produce the height above that point. That is, for a given point inside the shape, we take an N-leg equiangular stellar net emanating from that point to the boundary of the shape. The height of the surface at that point is then the geometric mean of the N legs of that net. <br /><br />This method ensures that circular shapes produce spherical surfaces. However, if N is low, for less regular boundary shapes, the resulting surface may be quite lumpy or sensitive to how the angles of each net are chosen. One solution, then, is simply to make N large enough. However, this may end up being computationally expensive. <br /><br />In theory, it may be possible to find the asymptotic value: find all parts of the boundary shape visible from the given point, and find the integral of the log of the distance, sweeping over the angle. If the boundary is a polygon, this involves evaluating (or approximating) integrals of the form \[ \int\ln\left ( \sin(x) \right)dx \] Which have no general closed form in terms of elementary functions. However, we can evaluate certain cases. One easy example is that of an infinite corridor formed from two parallel lines. We find that the height profile is double that of a circular cylinder. It may be desirable, then, to determine another function to multiply by which will halve the heights of corridors but leave hemispheres undisturbed. <br /><br />Below we give some visual examples of the results of the algorithm. The original 2D shapes are shown in red. 
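As a concrete illustration of the net-based height computation (a hypothetical sketch, not the code used to generate the figures below), consider the simplest boundary, a circle, for which the ray lengths have a closed form. For even N the geometric mean should reproduce the hemisphere height \(\sqrt{R^2-a^2}\) exactly, for any orientation of the net:

```python
import math

def height_above_point(a, R, N, theta0=0.0):
    """Geometric mean of the leg lengths of an N-leg equiangular stellar net
    from a point at distance a from the center of a circle of radius R."""
    total_log = 0.0
    for k in range(N):
        th = theta0 + 2 * math.pi * k / N
        # Length of the ray from (a, 0) in direction th to the circle |x| = R,
        # from the quadratic |(a, 0) + d*(cos th, sin th)|^2 = R^2 with d > 0.
        d = -a * math.cos(th) + math.sqrt(R**2 - (a * math.sin(th))**2)
        total_log += math.log(d)
    return math.exp(total_log / N)

# For even N this equals sqrt(R^2 - a^2), the hemisphere height,
# regardless of the net orientation theta0.
h = height_above_point(0.6, 1.0, 8, theta0=0.3)   # 0.8
```

For a general polygonal boundary, the closed-form ray length would be replaced by a ray-segment intersection routine, but the geometric-mean step is unchanged.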
<br /><a href="https://1.bp.blogspot.com/-VHGaGeRNvns/Wo9I2iyfnuI/AAAAAAAAR7c/pOgOcF7ucxkcLP7ASOsPdOQRtHZBGJ6fgCLcBGAs/s1600/ezgif.com-gif-maker%252835%2529.gif" imageanchor="1" ><img border="0" src="https://1.bp.blogspot.com/-VHGaGeRNvns/Wo9I2iyfnuI/AAAAAAAAR7c/pOgOcF7ucxkcLP7ASOsPdOQRtHZBGJ6fgCLcBGAs/s200/ezgif.com-gif-maker%252835%2529.gif" width="160" height="127" data-original-width="443" data-original-height="347" /></a> <a href="https://1.bp.blogspot.com/-yhjvFURs5gg/Wo9IJKDgRtI/AAAAAAAAR7U/hsjSGP5HYCIHWZnNQui-VeJeVErevGh_ACLcBGAs/s1600/ezgif.com-gif-maker%252834%2529.gif" imageanchor="1" ><img border="0" src="https://1.bp.blogspot.com/-yhjvFURs5gg/Wo9IJKDgRtI/AAAAAAAAR7U/hsjSGP5HYCIHWZnNQui-VeJeVErevGh_ACLcBGAs/s200/ezgif.com-gif-maker%252834%2529.gif" width="160" height="127" data-original-width="443" data-original-height="347" /></a> <a href="https://2.bp.blogspot.com/-3egkdN2zwic/Wo9AZgQOuqI/AAAAAAAAR64/6av7WE71DWEuU5_JFDb_pQIlDvqxCUsuQCLcBGAs/s1600/ezgif.com-gif-maker%252833%2529.gif" imageanchor="1" ><img border="0" src="https://2.bp.blogspot.com/-3egkdN2zwic/Wo9AZgQOuqI/AAAAAAAAR64/6av7WE71DWEuU5_JFDb_pQIlDvqxCUsuQCLcBGAs/s200/ezgif.com-gif-maker%252833%2529.gif" width="160" height="127" data-original-width="443" data-original-height="347" /></a> <a href="https://4.bp.blogspot.com/-gB5tjcLi_fM/Wo9AYzAY3yI/AAAAAAAAR60/51wK6MUbvhoeU2V64UWRLNXvKryLY9KFwCLcBGAs/s1600/ezgif.com-gif-maker%252832%2529.gif" imageanchor="1" ><img border="0" src="https://4.bp.blogspot.com/-gB5tjcLi_fM/Wo9AYzAY3yI/AAAAAAAAR60/51wK6MUbvhoeU2V64UWRLNXvKryLY9KFwCLcBGAs/s200/ezgif.com-gif-maker%252832%2529.gif" width="160" height="127" data-original-width="443" data-original-height="347" /></a> <a href="https://1.bp.blogspot.com/-wIZqVN7pQ2U/Wo9AY3TrYVI/AAAAAAAAR6w/BhnWxPz641Qk7J6p1yNdCfpHxy5wbqKcgCLcBGAs/s1600/ezgif.com-gif-maker%252830%2529.gif" imageanchor="1" ><img border="0" 
src="https://1.bp.blogspot.com/-wIZqVN7pQ2U/Wo9AY3TrYVI/AAAAAAAAR6w/BhnWxPz641Qk7J6p1yNdCfpHxy5wbqKcgCLcBGAs/s200/ezgif.com-gif-maker%252830%2529.gif" width="160" height="127" data-original-width="443" data-original-height="347" /></a> <a href="https://3.bp.blogspot.com/-o6psnVCwFc0/Wo9AY3kQosI/AAAAAAAAR6s/4zuJo9MffGcIbeLl6O49cR6uIHkf8GRLACLcBGAs/s1600/ezgif.com-gif-maker%252831%2529.gif" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/-o6psnVCwFc0/Wo9AY3kQosI/AAAAAAAAR6s/4zuJo9MffGcIbeLl6O49cR6uIHkf8GRLACLcBGAs/s200/ezgif.com-gif-maker%252831%2529.gif" width="160" height="127" data-original-width="443" data-original-height="347" /></a> <br /> In order, an equilateral triangle, an icosagon, a five-pointed star, an almost-donut, an Escherian tesselating lizard, and a tesselating spider. Nadavhttp://www.blogger.com/profile/08410204738776306784noreply@blogger.com0tag:blogger.com,1999:blog-6860306700804724173.post-30256697451432613842018-02-18T12:03:00.000-08:002018-02-18T14:39:57.351-08:00Rotating Fluid<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <br />Suppose we have an infinitely tall cylinder of radius R, filled to a height H with an incompressible fluid. We then set the fluid rotating about the cylindrical axis at angular speed \(\omega\). Suppose we take a differential chunk of fluid on the surface, a radius r from the axis. 
<div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-s75mpoX7q4w/WonVmdLOi9I/AAAAAAAAR5I/zSxTuWCB6MUKabGCfcaZKygon5DhDqBMgCLcBGAs/s1600/image.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-s75mpoX7q4w/WonVmdLOi9I/AAAAAAAAR5I/zSxTuWCB6MUKabGCfcaZKygon5DhDqBMgCLcBGAs/s320/image.png" width="320" height="252" data-original-width="503" data-original-height="396" /></a></div> The resulting normal force will then be \(N=F_c+W\). This normal force, as the name suggests, will be normal to the fluid surface. It follows by simple geometry, that \[ \frac{dy}{dr}=\frac{F_c}{W}=\frac{r \omega^2}{g} \] From which it follows that the height of the surface at any radius will be given by \[ y=\frac{r^2 \omega^2}{2g}+C \] Let us define \[ \omega_0=2\sqrt{gH}/R \\ u=\omega/\omega_0 \] Given that the fluid is incompressible, we know that the total volume does not change. From this, we can determine that the height of the surface at any radius will be given by: \[ y(r)=2H\left ( ru/R \right )^2+\left\{\begin{matrix} H(1-u^2) \\ 2H(u-u^2) \end{matrix}\right. \, \, \, \, \, \, \, \begin{matrix} u \leq 1\\ u > 1 \end{matrix} \] The highest point on the liquid surface is then given by: \[ y_{\textrm{max}}=\left\{\begin{matrix} H(1+u^2)\\ 2Hu \end{matrix}\right. \, \, \, \, \, \, \, \begin{matrix} u \leq 1\\ u > 1 \end{matrix} \] If \(u > 1\), the center of the base of the cylinder is not covered by fluid. There is a minimum radius at which fluid can be found. This minimum radius is given by: \[ r_{\textrm{min}}=R\sqrt{1-\frac{1}{u}} \] If the fluid is of uniform density and of total mass M, then the moment of inertia of the rotating fluid is given by \[ I=\left\{\begin{matrix} \frac{MR^2}{2}\left ( 1+\frac{u^2}{3} \right )\\ MR^2\left ( 1-\frac{1}{3u} \right ) \end{matrix}\right. 
\, \, \, \, \, \, \, \begin{matrix} u \leq 1\\ u > 1 \end{matrix} \] Note that each of these piecewise functions, together with its first derivative, is continuous at \(u=1\). Nadavhttp://www.blogger.com/profile/08410204738776306784noreply@blogger.com0tag:blogger.com,1999:blog-6860306700804724173.post-57960169623686249812018-02-06T13:06:00.001-08:002018-07-28T19:40:56.657-07:00Bias in Statistical Judgment<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <br><h2>Bias in Performance Evaluation</h2><br>Suppose you are an employer. You are looking to fill a position and you want the best person for the job. To do this, you take a pool of applicants and test each one N times on some metric X. From these N tests, you will develop some idea of what each applicant's performance will look like, and based on that, you will hire the applicant or applicants with the best probable performance. However, you know that each applicant comes from one of two populations which you believe to have different statistical characteristics, and you know immediately which population each applicant comes from. <br><br>We will use the following model: We will assume that the population from which the applicants are taken is made up of two sub-populations A and B. These two sub-populations have different distributions of individual mean performance that are both Gaussian. That is, an individual drawn from sub-population A will have an expected performance that is normally distributed with mean \(\mu_A\) and variance \(\sigma_A^2\). Likewise, an individual drawn from sub-population B will have an expected performance that is normally distributed with mean \(\mu_B\) and variance \(\sigma_B^2\). 
Individual performances are then taken to be normally distributed with the individual mean and individual variance \(\sigma_i^2\). <br><br>Suppose we take a given applicant who we know comes from sub-population B. We sample her performance N times and get performances of \(\{x_1,x_2,x_3,...,x_N\}=\textbf{x}\). We form the following complete pdf for the (N+1) variables of the individual mean and the N performances: \[ f_{\mu_i,\textbf{x}|B}(\mu_i,x_1,x_2,...,x_N)=\frac{1}{\sqrt{2\pi}^{N+1}}\frac{1}{\sigma_B \sigma_i^N} \exp\left ({-\frac{(\mu_i-\mu_B)^2}{2\sigma_B^2}} \right ) \prod_{k=1}^N\exp\left ({-\frac{(x_k-\mu_i)^2}{2\sigma_i^2}} \right ) \] It follows that the distribution conditioned on the test results is proportional to: \[ f_{\mu_i|\textbf{x},B}(\mu_i)\propto \exp\left ({-\frac{(\mu_i-\mu_B)^2}{2\sigma_B^2}} \right ) \prod_{k=1}^N\exp\left ({-\frac{(x_k-\mu_i)^2}{2\sigma_i^2}} \right ) \] By normalizing we find that this implies that the individual mean, given that it comes from sub-population B and given the N test results, is normally distributed with variance \[ \sigma_{\tilde{\mu_i}}^2=\left ( {\frac{1}{\sigma_B^2}+\frac{N}{\sigma_i^2}} \right )^{-1} \] and mean \[ \tilde{\mu_i}=\frac{\frac{\mu_B}{\sigma_B^2}+\frac{1}{\sigma_i^2}\sum_{k=1}^{N}x_k}{\frac{1}{\sigma_B^2}+\frac{N}{\sigma_i^2}} =\frac{\frac{\mu_B}{\sigma_B^2}+\frac{N}{\sigma_i^2}\bar{\textbf{x}}}{\frac{1}{\sigma_B^2}+\frac{N}{\sigma_i^2}} \] We will assume that this mean and variance are used as estimators to predict performance. Note that, in the limit of large N, \(\sigma_{\tilde{\mu_i}}^2\rightarrow \sigma_i^2/N\) and \(\tilde{\mu_i}\rightarrow \bar{\textbf{x}}\rightarrow \mu_i\), as expected. <br><br>Suppose we assume sub-populations A and B have the same variance \(\sigma_{AB}^2\), but \(\mu_A>\mu_B\). 
then we can note the following few implications: <ul><li>The belief about the sub-population the applicant comes from acts effectively as another performance sample of weight \(\sigma_i^2/\sigma_{AB}^2\).</li> <li>If applicant 1 comes from sub-population A and applicant 2 comes from sub-population B, even if they perform identically in their samples, applicant 1 would nevertheless still be preferred.</li> <li>The more samples are taken, the less the sub-population the applicant comes from matters.</li> <li>The larger the difference in means between the sub-populations is assumed to be, the better the lesser-viewed applicant will need to perform in order to be selected over the better-viewed applicant.</li> <li>Suppose we compare \(\tilde{\mu_i}\) to \(\bar{\textbf{x}}\). Our selection criterion will simply be whether the performance predictor is above \(x_m\). We want to find the probability of being from a given sub-population given that the applicant was selected by each predictor. For the sub-population-indifferent predictor: \[ P(A|\bar{\textbf{x}}\geq x_m)=\frac{P(\bar{\textbf{x}}\geq x_m|A)P(A)}{P(\bar{\textbf{x}}\geq x_m|A)P(A)+P(\bar{\textbf{x}}\geq x_m|B)P(B)} \\ \\ P(A|\bar{\textbf{x}}\geq x_m)= \frac{P(A)Q\left (\frac{x_m-\mu_A}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right )} {P(A)Q\left (\frac{x_m-\mu_A}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right ) + P(B)Q\left (\frac{x_m-\mu_B}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right )} \] Where \[ Q(z)=\int_{z}^{\infty}\frac{e^{-s^2/2}}{\sqrt{2\pi}}ds\approx \frac{e^{-z^2/2}}{z\sqrt{2\pi}} \] For the sub-population-sensitive predictor, we first note that, for an applicant from sub-population \(S\in\{A,B\}\), \[ \tilde{\mu_i} \geq x_m \Leftrightarrow \bar{\textbf{x}}\geq x_m+(x_m-\mu_S)\frac{\sigma_i^2}{N\sigma_{AB}^2}=x_{m,S}' \] Which then implies \[ P(A|\tilde{\mu_i}\geq x_m)=\frac{P(\tilde{\mu_i}\geq x_m|A)P(A)}{P(\tilde{\mu_i}\geq x_m|A)P(A)+P(\tilde{\mu_i}\geq x_m|B)P(B)} \\ \\ P(A|\tilde{\mu_i}\geq x_m)= \frac{P(A)Q\left (\frac{x_{m,A}'-\mu_A}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right )} 
{P(A)Q\left (\frac{x_{m,A}'-\mu_A}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right ) + P(B)Q\left (\frac{x_{m,B}'-\mu_B}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right )} \] As \(x_m > \mu_A > \mu_B\), and since \(x_{m,S}'-\mu_S=\left (1+\frac{\sigma_i^2}{N\sigma_{AB}^2} \right )(x_m-\mu_S)\), we have \(x_{m,B}'-\mu_B > x_{m,A}'-\mu_A > 0\), from which one can show that \(P(A) < P(A|\bar{\textbf{x}}\geq x_m) < P(A|\tilde{\mu_i}\geq x_m) \). Thus the sensitivity further biases the selection towards sub-population A. We can call \(\bar{\textbf{x}}\) the <b>meritocratic predictor</b> and \(\tilde{\mu_i}\) the <b>semi-meritocratic predictor</b>. </li></ul> <br><h2>Some Sociological Implications</h2>Though the above effects may be small in theory, in practice they may not be. Humans are not perfectly rational and are not perfect statistical computers. The above is meant to give motivation for taking seriously effects that are often much more pronounced. If there is a perceived difference in means, there is likely a tendency to exaggerate it, to think that the difference in means should be visible, and hence that the two distributions should be statistically separable. Likewise, population variances are often perceived as narrower than they really are, leading to further amplification of the biasing effect. Moreover, the parameter estimations are not based simply on objective observation of the sub-populations, but also, if not mainly, on subjective, sociological, psychological, and cultural factors. As high confidence in one's initial estimates makes one less likely to take more samples, the employer's judgment may rest heavily on subjective biases. Given this, if the employer's objective is simply to hire the best candidates, she should simply use the meritocratic predictor (or perhaps at least invest some time into getting accurate sub-population parameters). <br><br> However, it is worth noting some effects on the candidates themselves. As a rule, the candidates are not subjected to this bias just in this bid for employment alone, but rather serially and repeatedly, in bid after bid. 
This may have any of the following effects: driving applicants toward jobs where they will be more favored (or less disfavored) by the bias; affecting the applicants' self-evaluations, making them think their personal mean is closer to the broadly perceived sub-population mean; normalizing the broadly perceived sub-population mean, with an implicit devaluation of deviation from it. Also, we can note the following well-known problem: personal means tend to increase in challenging jobs, meaning that the unfavorable bias will perpetually stand in the way of the development of the negatively biased candidate, which then only serves to further feed into the bias. Both advantages and disadvantages tend to widen, making this a subtle case of "the rich get richer and the poor get poorer". <br><br>The moral of all this can be summarized as: the semi-meritocratic predictor should be avoided if possible, as it is very difficult to implement effectively and has a tendency to introduce a host of detrimental effects. Fortunately, the meritocratic predictor loses only a small amount by way of informativeness, and avoids the drawbacks mentioned above. Care should then be taken to implement the meritocratic selection system in a way that precludes the introduction of biasing effects. One way of washing out the effects of biasing in general is simply to give the applicants many opportunities to demonstrate their abilities. 
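The two predictors are easy to compare numerically. The following sketch (with hypothetical parameter values chosen for illustration) computes the semi-meritocratic posterior mean from the formulas above; two applicants with identical samples receive different predictions, and the gap shrinks as N grows:

```python
def semi_meritocratic(xbar, N, mu_pop, var_pop, var_i):
    """Posterior mean of an individual's true mean, given N samples with
    sample mean xbar and a sub-population prior N(mu_pop, var_pop)."""
    precision = 1 / var_pop + N / var_i
    return (mu_pop / var_pop + N * xbar / var_i) / precision

# Hypothetical parameters: mu_A > mu_B, common prior variance, noisy tests.
mu_A, mu_B, var_AB, var_i = 1.0, 0.0, 1.0, 4.0
xbar = 0.8  # identical sample mean for both applicants

for N in (5, 50, 500):
    pred_A = semi_meritocratic(xbar, N, mu_A, var_AB, var_i)
    pred_B = semi_meritocratic(xbar, N, mu_B, var_AB, var_i)
    # pred_A > pred_B for every N, but both approach xbar as N grows
```

The meritocratic predictor is simply `xbar` itself, independent of the sub-population prior.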
<br> Nadavhttp://www.blogger.com/profile/08410204738776306784noreply@blogger.com0tag:blogger.com,1999:blog-6860306700804724173.post-52954088498135132532017-08-01T14:36:00.001-07:002017-08-06T18:14:44.183-07:00Some Newtonian Gravitational Mechanics<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <br /><h2>Duration of a trajectory</h2><br />Suppose we launch an object straight up. We wish to find how long it will take to return. Suppose we launch it up at a speed \(v_0\). It is well known that the classical escape velocity is given by \[ v_e=\sqrt{\frac{2MG}{R}} \] By examining the energy equation, we find that the speed when the object is a distance r from the center of the planet is given by: \[ v(r)=v_e\sqrt{\frac{R}{r}-\gamma} \] Where \[ \gamma=1-\frac{v_0^2}{v_e^2} \] To find the travel time, we integrate: \[ T=2\int_{R}^{R/\gamma}\frac{dr}{v(r)}=\frac{2}{v_e}\int_{R}^{R/\gamma}\frac{dr}{\sqrt{\frac{R}{r}-\gamma}}=\frac{2R}{v_e}\int_{\gamma}^{1}\frac{du}{u^2\sqrt{u-\gamma}} \] \[ T=\frac{2R}{v_e}\frac{\tan^{-1}\left ( \sqrt{\frac{1}{\gamma}-1} \right )+\sqrt{\gamma-\gamma^2}}{\gamma^{3/2}}=\frac{2R}{v_e}\frac{\sin^{-1}(u)+u\sqrt{1-u^2}}{(1-u^2)^{3/2}} \] Where \(u=v_0/v_e\). <h2>Optimal Path through a Planet</h2><br />We want to find the best path through a planet of radius R, connecting two points \(2\alpha\) radians apart (great circle angle). We assume the planet is of uniform density. As is well known, the acceleration due to gravity a radius r from the center of the planet is given by: \[ a=-gr/R \] Where \(g\) is the surface gravitational acceleration. 
Thus, if an object falls from rest at the surface along a path through the planet, its speed at a distance r from the center will be given by \[ \tfrac{1}{2}mv^2=\tfrac{1}{2}m\frac{g}{R}\left ( R^2-r^2 \right ) \] \[ v(r)=\sqrt{\frac{g}{R}} \sqrt{R^2-r^2} \] Let us suppose it falls along the path specified by the function \(r(\theta)\), where r is an even function of \(\theta\) and \(r(\pm\alpha)=R\). The total time is given by \[ T=2\int_{0}^{\alpha}\frac{d\ell}{v}=2\sqrt{\frac{R}{g}}\int_{0}^{\alpha}\frac{\sqrt{r^2(\theta)+r'^2(\theta)}}{\sqrt{R^2-r^2(\theta)}}d\theta \] In order to obtain conditions for the optimal path, then, we use calculus of variations. The Lagrangian is \[ L(r,r',\theta)=\frac{\sqrt{r^2+r'^2}}{\sqrt{R^2-r^2}} \] Using the Beltrami Identity, we find: \[ \frac{\sqrt{r^2+r'^2}}{\sqrt{R^2-r^2}}-\frac{r'^2}{{\sqrt{r^2+r'^2}}{\sqrt{R^2-r^2}}}=\frac{r^2}{{\sqrt{r^2+r'^2}}{\sqrt{R^2-r^2}}}=C \] Let \(1+ \tfrac{1}{C^2}=1/q^2\). Rearranging, we find: \[ r'=r\sqrt{\frac{\left (1+ \tfrac{1}{C^2} \right )r^2-R^2}{R^2-r^2}}=\frac{r}{q}\sqrt{\frac{r^2-R^2q^2}{R^2-r^2}} \] As \(r'(0)=0\), this implies that \[ r(0)=Rq \] \[ q=\frac{r(0)}{R} \] Let us make the change of variables: \(u=r^2/R^2\). 
This then gives: \[ u'=2u\sqrt{\frac{\tfrac{1}{q^2}u-1}{1-u}} \] \[ u(0)=q^2 \] In order to determine this value, we can integrate the differential equation: \[ \frac{1}{2u} \sqrt{\frac{1-u}{\tfrac{1}{q^2}u-1}}du=d\theta \] \[ \int_{q^2}^{1}\frac{1}{2u} \sqrt{\frac{1-u}{\tfrac{1}{q^2}u-1}}du=\frac{\pi}{2}(1-q)=\int_{0}^{\alpha}d\theta=\alpha \] Thus \[ q=1-\frac{2\alpha}{\pi} \] We can then find the total travel time: \[ T=2\sqrt{\frac{R}{g}}\int_{0}^{\alpha}\frac{\sqrt{r^2(\theta)+r'^2(\theta)}}{\sqrt{R^2-r^2(\theta)}}d\theta=2\sqrt{\frac{R}{g}}\int_{Rq}^{R}\frac{\sqrt{r^2(\theta)+r'^2(\theta)}}{\sqrt{R^2-r^2(\theta)}}\frac{1}{r'}dr \] \[ T=2\sqrt{\frac{R}{g}}\int_{Rq}^{R} \frac{q}{r} \sqrt{r^2+\frac{r^2}{q^2}{\frac{r^2-R^2q^2}{R^2-r^2}}}\frac{dr}{\sqrt{r^2-R^2q^2}} \] \[ T=\sqrt{\frac{R}{g}}\sqrt{1-q^2}\int_{Rq}^{R} \frac{2rdr}{\sqrt{R^2-r^2} \sqrt{r^2-R^2q^2}} \] \[ T=\sqrt{\frac{R}{g}}\sqrt{1-q^2}\int_{q^2}^{1} \frac{dx}{\sqrt{1-x^2} \sqrt{x^2-q^2}} \] \[ T=\pi \sqrt{\frac{R}{g}}\sqrt{1-q^2}=2\sqrt{\frac{R}{g}}\sqrt{\pi\alpha-\alpha^2} \] Below we show several trajectories along the optimal path for several values of alpha: <div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-Oq7T-WZW90M/WYCSKWQsuqI/AAAAAAAAQrs/OhqG-Givxjs8atQkxnJdBNW3Sc2jZuqVgCLcBGAs/s1600/tunnels.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/-Oq7T-WZW90M/WYCSKWQsuqI/AAAAAAAAQrs/OhqG-Givxjs8atQkxnJdBNW3Sc2jZuqVgCLcBGAs/s640/tunnels.jpg" width="864" height="513" data-original-width="1260" data-original-height="749" /></a></div><br /><br />In fact, these solutions are hypocycloids. 
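As a quick numerical check of the closed form (Earth-like values assumed purely for illustration), the antipodal trip \(\alpha=\pi/2\) recovers the classic result \(T=\pi\sqrt{R/g}\), about 42 minutes:

```python
import math

def tunnel_time(alpha, R=6.371e6, g=9.81):
    """Travel time T = 2*sqrt(R/g)*sqrt(pi*alpha - alpha^2) along the optimal
    tunnel between surface points 2*alpha radians apart (uniform density)."""
    return 2 * math.sqrt(R / g) * math.sqrt(math.pi * alpha - alpha**2)

t_antipodal = tunnel_time(math.pi / 2)  # the straight diametric chord, ~42 min
t_quarter = tunnel_time(math.pi / 4)    # shorter trips take less time
```

Note that travel time grows with \(\alpha\) but is bounded by the antipodal value, consistent with the formula's maximum at \(\alpha=\pi/2\).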
Nadavhttp://www.blogger.com/profile/08410204738776306784noreply@blogger.com0tag:blogger.com,1999:blog-6860306700804724173.post-56025425361770109732017-07-27T08:17:00.002-07:002017-07-27T14:04:41.919-07:00Golomb's Sequence<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <br /><h2>Definition</h2><br />Golomb's sequence, named after Solomon Golomb, is a curious sequence of whole numbers that describes itself. It is defined in the following way: it is a non-decreasing sequence of whole numbers where the nth term gives the number of times n occurs in the sequence, and the first term is 1. From this we can begin constructing it: The second element must be greater than 1 as there is only one 1. It must be 2, and so must be the third element. Given this, there must be 2 threes, and from here on we may continue by reading off the terms of the sequence itself. The first several terms of the sequence are: \[ 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, \\ 9, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12,... \] <br /><br /><h2>Recurrence Relation</h2><br />The sequence can be given an explicit recurrence relation by stating it in the following way, using the self-describing property: To determine the next term in the sequence, go back the number of times that the previous term occurred (this will put you at the next-smallest value), then add one. For example, to determine the 12th term (6), count the number of times that the value of the 11th term (5) occurs (3 times). Step back that many terms (to the 9th term: 5) then add one to that value (6). This then gives the recurrence relation: \[ a(n+1)=1+a\left ( n+1-a(a(n)) \right ) \] Where \(a(1)=1\). 
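The recurrence translates directly into code; a minimal sketch (a hypothetical helper, 1-indexed to match the formula):

```python
def golomb(n):
    """First n terms of Golomb's sequence via a(k+1) = 1 + a(k+1 - a(a(k)))."""
    a = [0, 1]  # dummy entry at index 0 so that a[k] is the k-th term; a(1) = 1
    for k in range(1, n):
        a.append(1 + a[k + 1 - a[a[k]]])
    return a[1:]

terms = golomb(15)  # [1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6]
```

The self-describing property can be checked directly: in a long enough prefix, the value n occurs exactly a(n) times.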
<br /><br /><h2>Asymptotic Behavior</h2><br />The recurrence relation allows us to give an asymptotic expression for the value of the sequence. Let us suppose the sequence grows like \[ a(n)=A n^\alpha \] Let us put this into the recurrence relation: \[ A(n+1)^\alpha=1+A\left ( n+1-A(A n^\alpha)^\alpha \right )^\alpha \] Simplifying and rearranging, we obtain: \[ 1=\frac{1}{A(n+1)^\alpha}+\left (1-A^{1+\alpha}\frac{n^{\alpha^2}}{n+1} \right )^\alpha \] As \(\alpha<1\), \(\frac{n^{\alpha^2}}{n+1}\) goes to zero. For small x, \((1+x)^b\rightarrow 1+bx\). Thus, asymptotically: \[ 1\approx\frac{1}{A(n+1)^\alpha}+1-\alpha A^{1+\alpha}\frac{n^{\alpha^2}}{n+1} \] \[ \alpha A^{2+\alpha}n^{\alpha^2}(n+1)^{\alpha-1} \approx 1 \] Thus it must be the case that \[ \alpha^2+\alpha-1=0 \] \[ A=\alpha^{-\frac{1}{2+\alpha}} \] The solution to the first equation is \[ \alpha=\left \{\varphi-1,-\varphi \right \} \] Where \(\varphi\) is the golden ratio. As the exponent is clearly positive, we find the sequence is asymptotic to: \[ a(n)\rightarrow \varphi^{2-\varphi}n^{\varphi-1} \] Below we plot the ratio of these two expressions: <div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-GtraMPsc2XY/WXoDz1eqALI/AAAAAAAAQrM/_v8pTX5ZLr0XXF_6VdM4mRcnGZgUdAE3gCLcBGAs/s1600/Selection_191.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-GtraMPsc2XY/WXoDz1eqALI/AAAAAAAAQrM/_v8pTX5ZLr0XXF_6VdM4mRcnGZgUdAE3gCLcBGAs/s640/Selection_191.png" width="640" height="316" data-original-width="1600" data-original-height="790" /></a></div><br /><br />Nadavhttp://www.blogger.com/profile/08410204738776306784noreply@blogger.com0tag:blogger.com,1999:blog-6860306700804724173.post-6382304003783265632017-07-07T11:05:00.001-07:002017-07-07T15:09:05.345-07:00Continued Fractions<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script 
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <br /><h2>Definition and Background</h2><br />A continued fraction is a representation of a number \(x\) in the form \[ x=a_0+\cfrac{b_0}{a_1+\cfrac{b_1}{a_2+\cfrac{b_2}{a_3+\cfrac{b_3}{\ddots}}}} \] Often, the b's are taken to be all 1's and the a's are integers. This is called the canonical or simple form. There are numerous ways of representing continued fractions. For instance, \[ x=a_0+\cfrac{1}{a_1+\cfrac{1}{a_2+\cfrac{1}{a_3+\cfrac{1}{\ddots}}}} \] can be represented as \[ x=a_0+\overset{\infty}{\underset{k=1}{\mathrm{K}}}\frac{1}{a_k} \] Or as \[ \left [ a_0;a_1,a_2,a_3,... \right] \] <br /><br /><h2>Construction Algorithm</h2><br />The continued fraction terms can be determined as follows: Given \(x\), set \(x_0=x\). Then \[ a_k=\left \lfloor x_k \right \rfloor \] \[ x_{k+1}=\frac{1}{x_k-a_k} \] Continue until \(x_k=a_k\), which happens only if \(x\) is rational; for irrational \(x\) the process continues indefinitely. <br /><br /><h2>Convergents</h2><br />The convergents of a continued fraction are the rational numbers resulting from taking the first n terms of the continued fraction. Let \(P_n\) and \(Q_n\) be the numerators and denominators respectively of the nth convergent (the one that includes \(a_n\)). 
It is not difficult to show that \[ P_n=a_nP_{n-1}+P_{n-2} \] \[ Q_n=a_nQ_{n-1}+Q_{n-2} \] An alternate way of saying this is that \[ \begin{bmatrix} a_n & 1\\ 1 & 0 \end{bmatrix} \begin{bmatrix} P_{n-1} & Q_{n-1}\\ P_{n-2} & Q_{n-2} \end{bmatrix} = \begin{bmatrix} P_{n} & Q_{n}\\ P_{n-1} & Q_{n-1} \end{bmatrix} \] Where \[ \begin{bmatrix} P_{-1} & Q_{-1}\\ P_{-2} & Q_{-2} \end{bmatrix}= \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} \] And therefore \[ {}^L\prod^n_{k=0} \begin{bmatrix} a_k & 1\\ 1 & 0 \end{bmatrix} = \begin{bmatrix} P_{n} & Q_{n}\\ P_{n-1} & Q_{n-1} \end{bmatrix} \] Let \[p_n=\frac{P_n}{P_{n-1}}\] \[q_n=\frac{Q_n}{Q_{n-1}}\] Then \[p_n=a_n+\frac{1}{p_{n-1}}\] \[q_n=a_n+\frac{1}{q_{n-1}}\] We find that \[ \frac{P_{n+1}}{Q_{n+1}}=a_0+\sum_{k=0}^{n}\frac{(-1)^k}{Q_kQ_{k+1}} \] And thus \[ \left | x- \frac{P_{n}}{Q_{n}}\right |<\frac{1}{Q_nQ_{n+1}} \] As \(a_n \geq 1\), \(Q_n \geq F_n\), the nth Fibonacci number. This is closely related to Hurwitz's theorem: for every irrational number x, there exist infinitely many ratios \(P/Q\) such that \[ \left | x-\frac{P}{Q} \right |<\frac{k}{Q^2} \] if and only if \(k \geq 1/\sqrt{5}\); for smaller \(k\) the claim fails when x is the golden ratio. <br /><br /><h2>Periodic Continued Fractions</h2><br />Suppose that for \(k \geq N\), \(a_{k+M}=a_k\). Let \[ [a_0;a_1,a_2,...a_{N-2}]=\frac{P_{Y1}}{Q_{Y1}} \] \[ [a_0;a_1,a_2,...a_{N-1}]=\frac{P_{Y2}}{Q_{Y2}} \] \[ \left [a_N;a_{N+1},a_{N+2},...a_{N+M-2} \right ]=\frac{P_{Z1}}{Q_{Z1}} \] \[ \left [a_N;a_{N+1},a_{N+2},...a_{N+M-1} \right ]=\frac{P_{Z2}}{Q_{Z2}} \] Then x satisfies the formula \[ x=\frac{P_{Y2}\cdot y+P_{Y1}}{Q_{Y2} \cdot y+Q_{Y1}} \] Where \(y=[a_N;a_{N+1},a_{N+2},...]\) is the purely periodic tail, which satisfies \[ y=\frac{P_{Z2}\cdot y+P_{Z1}}{Q_{Z2} \cdot y+Q_{Z1}} \] Thus a simple continued fraction is eventually periodic if and only if its value is a quadratic irrational, i.e. an irrational root of a quadratic polynomial with integer coefficients. <br /><br /><h2>Generic Continued Fractions</h2><br /> Let x be uniformly chosen between 0 and 1. 
We define a sequence of random variables as follows \[ \xi_0=x \] \[ \xi_{n+1}=\frac{1}{\xi_n}-\left \lfloor \frac{1}{\xi_n} \right \rfloor \] Clearly, if \[x=[0;a_1,a_2,a_3,...] \] Then \[\xi_n=[0;a_{n+1},a_{n+2},a_{n+3},...]\] Let us assume that, asymptotically, the \(\xi\)'s approach a single distribution. Based on our definitions, this would imply that \[ P(\xi_{n+1} < z)=\sum_{k=1}^{\infty} P \left (\frac{1}{k+z} < \xi_n < \frac{1}{k} \right ) \] Differentiating both sides gives the required relationship: \[ f_\xi(z)=\sum_{k=1}^{\infty}\frac{f_\xi\left ( \tfrac{1}{k+z} \right )}{(k+z)^2} \] Let us test the function \[ f_\xi(z)=\frac{A}{1+z} \] \[ \sum_{k=1}^{\infty}\frac{A}{1+\tfrac{1}{k+z}}\frac{1}{(k+z)^2}=\sum_{k=1}^{\infty}\frac{A}{(1+k+z)(k+z)} \] \[ \sum_{k=1}^{\infty}\frac{A}{(1+k+z)(k+z)}=A\sum_{k=1}^{\infty}\left (\frac{1}{k+z}-\frac{1}{k+z+1} \right )=\frac{A}{1+z} \] It can be proved more rigorously that this is indeed the asymptotic probability density function, with \(A=1/\ln(2)\) fixed by normalization. Thus \[ P(\xi_{n} < z)=\log_2(1+z) \] From this we can easily find the asymptotic density function for the continued fraction terms. The probability that \(a_{n+1}=k\) is the same as the probability that \(\left \lfloor \tfrac{1}{\xi_n} \right \rfloor=k\). This is then \[ P(a_{n+1}=k)=P\left ( \frac{1}{k+1} < \xi_n \leq \frac{1}{k} \right )=\log_2(1+\tfrac{1}{k})-\log_2(1+\tfrac{1}{k+1}) \] \[ P(a_{n+1}=k)=\log_2\left ( \frac{(k+1)^2}{k(k+2)} \right )=\log_2\left ( 1+\frac{1}{k(k+2)} \right ) \] This is called the Gauss-Kuzmin Distribution. 
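Both of these claims are easy to sanity-check numerically. The sketch below (truncating the infinite sums at a large cutoff) verifies the functional equation for the proposed density at an arbitrarily chosen point, and checks that the Gauss-Kuzmin probabilities sum to 1:

```python
import math

A = 1 / math.log(2)

def gauss_density(z):
    """Candidate invariant density f(z) = 1 / (ln(2) * (1 + z))."""
    return A / (1 + z)

# Functional equation f(z) = sum_k f(1/(k+z)) / (k+z)^2, truncated at K terms.
z = 0.37
lhs = gauss_density(z)
rhs = sum(gauss_density(1 / (k + z)) / (k + z) ** 2 for k in range(1, 10**5))

# Gauss-Kuzmin probabilities P(a = k) = log2(1 + 1/(k(k+2))) should sum to 1.
total = sum(math.log2(1 + 1 / (k * (k + 2))) for k in range(1, 10**5))
```

Both truncation errors are of order \(1/K\), so with \(K=10^5\) the agreement is to roughly five digits.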
<br /><br />From this we can then easily find the asymptotic geometric mean of the terms \[ \underset{n \to \infty}{\lim}\sqrt[n]{\prod_{k=1}^{n}a_k}=\exp\left (\underset{n \to \infty}{\lim} \frac{1}{n}\sum_{k=1}^{n} \ln(a_k)\right )=\exp\left ( E(\ln(a_k)) \right ) \]\[ \underset{n \to \infty}{\lim}\sqrt[n]{\prod_{k=1}^{n}a_k}= \exp\left (\sum_{j=1}^{\infty}P(a_k=j)\ln(j) \right ) \]\[ \underset{n \to \infty}{\lim}\sqrt[n]{\prod_{k=1}^{n}a_k}= \prod_{j=1}^\infty \left ( 1+\frac{1}{j(j+2)} \right )^{\log_2(j)}=2.685452001...=K_0 \] This value is called Khinchin's Constant. <br /><br />Let us now look at the asymptotic behavior of the convergents. Namely, we wish to examine the asymptotic behavior of the denominators. First note that \[ \xi_n=\frac{1}{\xi_{n-1}}-a_n \] If we let \(y_n=1/\xi_n\), we then have \[ y_{n-1}=a_n+\frac{1}{y_n} \] From above we have that \[q_n=a_n+\frac{1}{q_{n-1}}\] As, asymptotically, \(\xi_n \sim \xi_{n-1}\), this implies that, asymptotically, \(y_n \sim y_{n-1} \sim 1/\xi_n\) and therefore \(q_n \sim q_{n-1} \sim 1/\xi_n\). Thus \[ f_q(z)=\left\{\begin{matrix} \frac{1}{z^2}\frac{1}{\ln(2)}\frac{1}{1+1/z} \\ 0 \end{matrix}\right.\; \; \begin{matrix} z > 1 \\ z \leq 1 \end{matrix} \] As \[ Q_n=\prod_{k=1}^{n}q_k \] We have \[ \underset{n \to \infty}{\lim}\sqrt[n]{Q_n}=\underset{n \to \infty}{\lim}\sqrt[n]{\prod_{k=1}^{n}q_k}=\exp\left (\underset{n \to \infty}{\lim}\frac{1}{n}\sum_{k=1}^{n}\ln(q_k) \right ) \]\[ \underset{n \to \infty}{\lim}\sqrt[n]{Q_n}=\exp\left (E(\ln(q_n)) \right )= \exp\left (\int_{1}^{\infty}\ln(z)\frac{1}{z^2}\frac{1}{\ln(2)}\frac{1}{1+1/z}dz \right ) \]\[ \underset{n \to \infty}{\lim}\sqrt[n]{Q_n}= \exp\left (-\frac{1}{\ln(2)}\int_{0}^{1}\frac{\ln(z)}{1+z}dz \right ) \]\[ \underset{n \to \infty}{\lim}\sqrt[n]{Q_n}=\exp\left ( \frac{\pi^2}{12\ln(2)} \right )=3.275823... \] This value (or sometimes its natural log) is called Levy's constant. 
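These limits can be observed empirically. The sketch below (hypothetical code, with an arbitrary seed) expands a pseudo-random many-digit rational, a stand-in for a typical real number, via the Euclidean algorithm, then compares the geometric mean of its terms and \(\sqrt[n]{Q_n}\) against Khinchin's and Levy's constants. With only ~150 terms the agreement is rough, since the convergence is slow:

```python
import math
import random

def cf_terms(p, q, nmax):
    """Continued-fraction terms of p/q via the Euclidean algorithm."""
    terms = []
    while q != 0 and len(terms) < nmax:
        terms.append(p // q)
        p, q = q, p % q
    return terms

random.seed(12345)
digits = 500
p = random.randrange(1, 10 ** digits)
a = cf_terms(p, 10 ** digits, 150)[1:]  # drop a_0 = 0

# Geometric mean of the terms: should hover near Khinchin's constant 2.685...
geo_mean = math.exp(sum(math.log(t) for t in a) / len(a))

# Denominators via Q_n = a_n Q_{n-1} + Q_{n-2}; n-th root near Levy's 3.275...
Qm2, Qm1 = 1, 0  # Q_{-2}, Q_{-1}
for t in a:
    Qm2, Qm1 = Qm1, t * Qm1 + Qm2
levy = Qm1 ** (1 / len(a))
```

Exact integer arithmetic is used because floating point would corrupt the terms after roughly fifteen of them.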
<br /><br />We want to know how efficient continued fractions are for representing numbers relative to place-value expansions. Suppose we are working in base b. We want to find how many terms in the continued fraction expansion are required to obtain an approximation good to m base-b digits. We will have obtained such an approximation when the error is less than \(b^{-m}\) but greater than \(b^{-(m+1)}\). From above we have \[ \left | x- \frac{P_{n}}{Q_{n}}\right |<\frac{1}{Q_nQ_{n+1}} \] Thus \[ b^{-(m+1)} < \left | x- \frac{P_{n}}{Q_{n}}\right | < \frac{1}{Q_nQ_{n+1}} < \frac{1}{Q_n^2} \leq b^{-m} \] Rearranging, we have \[ b^m \leq Q_n^2 < b^{m+1} \] \[ b^{\frac{m}{2n}} \leq \sqrt[n]{Q_n} < b^{\frac{m+1}{2n}} \] Thus, as the center expression approaches a limit for large n, it follows that \(m/n\) does as well. Namely, by rearranging, we find that for n the number of continued fraction terms needed to express x in base b up to m base-b digits, \[ \underset{m,n \to \infty}{\lim}\frac{m}{n}=\frac{\pi^2}{6\ln(2)\ln(b)} \] This is known as Lochs' theorem. In particular, for base 10, this implies that each continued fraction term provides on average 1.03064... digits of precision. In fact, base 10 is the largest integral base for which the continued fraction is more efficient. 
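To tie the pieces together, here is a sketch of the construction algorithm and the convergent recurrences applied to \(\pi\) (floating point limits this to the first several terms); the error of each convergent respects the \(1/(Q_nQ_{n+1})\) bound derived above:

```python
import math

def cf_expand(x, n):
    """First n simple continued-fraction terms of x (floor and invert)."""
    terms = []
    for _ in range(n):
        a = math.floor(x)
        terms.append(a)
        if x == a:
            break  # x was rational; expansion terminates
        x = 1 / (x - a)
    return terms

def convergents(terms):
    """Successive (P_n, Q_n) via P_n = a_n P_{n-1} + P_{n-2}, same for Q."""
    Pm2, Pm1, Qm2, Qm1 = 0, 1, 1, 0  # (P_{-2}, P_{-1}, Q_{-2}, Q_{-1})
    out = []
    for a in terms:
        Pm2, Pm1 = Pm1, a * Pm1 + Pm2
        Qm2, Qm1 = Qm1, a * Qm1 + Qm2
        out.append((Pm1, Qm1))
    return out

terms = cf_expand(math.pi, 5)  # [3, 7, 15, 1, 292]
convs = convergents(terms)     # ..., (355, 113), (103993, 33102)
```

The famous approximation 355/113 appears as the fourth convergent, and its error is below \(1/(Q_3Q_4)=1/(113\cdot 33102)\), as the bound predicts.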
Nadav<br /><br /><hr /><br /><h1>Iterated Radicals</h1>Posted 2017-05-31; last updated 2017-07-15.<br /><script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <br /><h2>The Case of Square Roots</h2><br />We wish to examine the behavior of the iterated radical expression \[ R_a(n)=\underbrace{\sqrt{a+\sqrt{a+...\sqrt{a+R_a(0)}}}}_{n\textrm{ radicals}} \] Let \[ A=\lim_{n \rightarrow \infty} R_a(n) \] Then clearly \[ A^2=a+A \] And so \[ A=\tfrac{1+\sqrt{1+4a}}{2} \] In order to determine the nature of the convergence to this limit, let us examine a function defined as follows: \[ f(x/q)=\sqrt{a+f(x)} \] Where q is a value yet to be determined. Clearly \(f(0)=A\), and it is not hard to see that \[ R_a(n)=f\left ( \tfrac{f^{-1}(R_a(0))}{q^n} \right ) \] Thus the behavior of f, as well as the value of q, will determine the convergence of \(R_a(n)\). We rearrange the above relation to get \[ f^2(x)=a+f(qx) \] Let us expand f in a Taylor series. \[ f(x)=A+b_1 x +b_2 x^2 +b_3 x^3+... \] We can substitute this into our functional equation to get \[ A^2+2Ab_1 x+(2A b_2+b_1^2)x^2+(2Ab_3+2b_1b_2)x^3+...=a+A+qb_1x+q^2b_2x^2+q^3b_3x^3+... \] By equating coefficients, we find that \(q=2A\). Note that changing \(b_1\) only affects the scaling of the function. Assuming we want the inverse to be positive as we approach from below, \(b_1\) must be negative, thus we simply set \(b_1=-1\). Now the rest of the coefficients can be found algorithmically in sequence. 
In general, the coefficient of \(x^k\) will be \[ b_k=\frac{1}{(2A)^k-2A}\sum_{j=1}^{k-1}b_jb_{k-j} \] And thus \[ f\left ( \tfrac{x}{2A} \right )=\sqrt{a+f(x)} \]\[ f^2(x)=a+f(2Ax) \\ R_a(n)=f\left ( \tfrac{f^{-1}(R_a(0))}{(2A)^n} \right ) \] Where f is defined by the polynomial with the given coefficients. It follows that \[ \lim_{n \rightarrow \infty} (2A)^n(A-R_a(n))=\lim_{n \rightarrow \infty} (2A)^n(f(0)-f(f^{-1}(R_a(0))/(2A)^n)) \]\[ \lim_{n \rightarrow \infty} (2A)^n(A-R_a(n))=-f'(0)f^{-1}(R_a(0))=f^{-1}(R_a(0)) \]\[ \lim_{n \rightarrow \infty} (2A)^n \left (A-\underbrace{\sqrt{a+\sqrt{a+...\sqrt{a+z}}}}_{n\textrm{ radicals}} \right )=f^{-1}(z) \] Another way to construct \(f(x)\) is by the following approach, which converges fairly quickly: Let \(f_0(x)=A-x\). We define \[ f_{k+1}(x)= f_k^2\left (\frac{x}{2A} \right )-a \] Then \[ \lim_{k \rightarrow \infty}f_k(x)=f(x) \] <h3>A Special Trigonometric Case</h3><br /> For the case of \(a=2\), it is easy to show by induction that \[ b_k=2(-1)^k\frac{1}{(2k)!} \] Which would imply that \[ f(x)=2\cos(\sqrt{x}) \] Therefore \[ \lim_{n \rightarrow \infty} 4^n \left( 2-\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2}}}}_{n \textrm{ radicals}}\right)=\pi^2/4 \] <h3>An Infinite Product</h3><br /> Beginning with \[ f^2(x)=a+f(2Ax) \] Let us differentiate to obtain \[ f(x)f'(x)=Af'(2Ax) \] Thus, if we define \[ g(x)=-xf'(x) \] Then we easily see that \[ g(2Ax)=2g(x)f(x) \] Clearly \(g(0)=0, g'(0)=1\). 
Then \[ g(x)=2f\left (\tfrac{x}{2A} \right )g\left (\tfrac{x}{2A} \right )=2^2f\left (\tfrac{x}{2A} \right ) f\left (\tfrac{x}{(2A)^2} \right ) g\left (\tfrac{x}{(2A)^2} \right ) \]\[ g(x)=2^N g\left (\tfrac{x}{(2A)^N} \right )\prod_{k=1}^{N} f\left (\tfrac{x}{(2A)^k} \right ) \]\[ g(x)=(2A)^N g\left (\tfrac{x}{(2A)^N} \right )\prod_{k=1}^{N} \tfrac{1}{A}f\left (\tfrac{x}{(2A)^k} \right ) \] Taking the limit \[ g(x)=\underset{N \to \infty}{\lim}(2A)^N g\left (\tfrac{x}{(2A)^N} \right )\prod_{k=1}^{N} \tfrac{1}{A}f\left (\tfrac{x}{(2A)^k} \right )=x\prod_{k=1}^{\infty} \tfrac{1}{A}f\left (\tfrac{x}{(2A)^k} \right ) \] Thus \[ -f'(x)=\prod_{k=1}^{\infty} \tfrac{1}{A}f\left (\tfrac{x}{(2A)^k} \right ) \] Thus we need only examine the zeros of f to find the zeros of f'. In fact, if f has zeros \[\left \{z_1,z_2,z_3,... \right \}\] then f will have extrema at \[ \bigcup_{k=1}^{\infty}\left \{(2A)^kz_1,(2A)^kz_2,(2A)^kz_3,... \right \} \] <br /><br /><h4>An Associated Infinite Series</h4><br />Differentiating the log of both sides of the result above, we find the infinite series: \[ \frac{d}{dx}\ln\left (-f'(x) \right )=\frac{d}{dx}\ln\left (\prod_{k=1}^{\infty} \tfrac{1}{A}f\left (\tfrac{x}{(2A)^k} \right ) \right ) \]\[ \frac{f''(x)}{f'(x)}=\sum_{k=1}^{\infty}\frac{1}{(2A)^k}\frac{f'\left (\tfrac{x}{(2A)^k} \right )}{f\left (\tfrac{x}{(2A)^k} \right )} \] <h3>Zeros of \(f(x)\)</h3><br /> Below is a plot of the zeros of \(f\) for different values of a on the vertical axis, plotted semi-logarithmically. 
<div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-8Y2gc0WUeQo/WTrQBwTgsEI/AAAAAAAAQg4/cHSzodRZ3pQnGXk8vTX3j2U_e3150W7LgCLcB/s1600/fzeros.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-8Y2gc0WUeQo/WTrQBwTgsEI/AAAAAAAAQg4/cHSzodRZ3pQnGXk8vTX3j2U_e3150W7LgCLcB/s1600/fzeros.jpg" data-original-width="1050" data-original-height="589" /></a></div><br /><br /> Below is a plot of the sign of f (yellow is positive, blue is negative), from which the zero contours can be seen. However, we can also see that some zeros of f for certain values of a are multiple roots, as f goes to zero without changing sign. <div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-UTO5d6cusr4/WTrP7swDJVI/AAAAAAAAQg0/QG3j7H-5sbIznkoTf6EoiNYecMde-05OQCLcB/s1600/ysign2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-UTO5d6cusr4/WTrP7swDJVI/AAAAAAAAQg0/QG3j7H-5sbIznkoTf6EoiNYecMde-05OQCLcB/s1600/ysign2.jpg" width="1050" height="589" data-original-width="1353" data-original-height="782" /></a></div><br /><br /> <h4>Special Cases</h4><br /> Two special cases bear mentioning. In the case \(a=1\), the zeros are given by \[ z_n=2.1973\cdot (1+\sqrt{5})^{2n} \] for \(n \geq 0\). In fact, in this case, after the first zero, f is always between -1 and 0. f is -1 at \[ x_n=2.1973\cdot (1+\sqrt{5})^{2n+1} \] for \(n \geq 0\). For \(a=2\), the zeros are at \[ z_n=\left ( (2n+1)\frac{\pi}{2} \right )^2 \] And, in fact, \(f(x)=2\) at \(x_n=\left ( 2n\pi \right )^2\), and \(f(x)=-2\) at \(x=\left ( (2n+1)\pi \right )^2\), for \(n\geq0\). <br /><br /><h3>Periodic and Possible Fractal Structure</h3><br />Although f is generally not very interesting close to zero, it exhibits remarkable behavior on larger scales. 
We find, namely, that if we take \[ h(x)=\left | f(x) \right |^{x^{-\log_{2A}(2)}} \] Then h is exponentially periodic, asymptotically. We define \[J(x)=h((2A)^x)\] This function has period 1, asymptotically. Below we show the behavior of J for some values of \(a\). <div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-8WSvdX5NrYI/WT1eNbJfsxI/AAAAAAAAQiQ/zLRwu5b8xDAQuemWo2UOz3lkA9bB_SXggCLcB/s1600/ezgif.com-gif-maker%25281%2529.gif" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/-8WSvdX5NrYI/WT1eNbJfsxI/AAAAAAAAQiQ/zLRwu5b8xDAQuemWo2UOz3lkA9bB_SXggCLcB/s640/ezgif.com-gif-maker%25281%2529.gif" width="640" height="478" data-original-width="568" data-original-height="424" /></a></div><br /><br />Note that the number of zeros remains constant. All seem to be single roots. In fact, the locations of the dominant maxima seem constant as well. However, within each period, J appears to have a fractal structure. Below we show a zoom of \(J(x)\) for \(a=3\). <div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-uVW9rum-a4k/WTuguLwg-hI/AAAAAAAAQhQ/Q_jsCK4KYR87YzC_YsmT2JH-4UEJwFqYgCLcB/s1600/testnew51.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/-uVW9rum-a4k/WTuguLwg-hI/AAAAAAAAQhQ/Q_jsCK4KYR87YzC_YsmT2JH-4UEJwFqYgCLcB/s640/testnew51.gif" width="640" height="478" data-original-width="568" data-original-height="424" /></a></div><br /><br /><h2>Complex Behavior</h2><br />We can take the series and functional definitions of the function and use them to extend the function to the entire complex plane. Below we plot the complex sign of \(f(Cz|z|)\) for different values of a, and a certain value of C (this rescaling done to make the regularities more evident). 
The complex sign is given by the color: <ul> <li><span style="background-color: #0000ff"><font color="white"><b>Dark Blue</b></font></span>\(\Leftrightarrow\textrm{Re}< 0 ,\textrm{Im} < 0 \)</li> <li><span style="background-color: #41c4f4"><b>Light Blue</b></span>\(\Leftrightarrow\textrm{Re} < 0 ,\textrm{Im} > 0 \)</li> <li><span style="background-color: #f4a742"><b>Orange</b></span>\(\Leftrightarrow\textrm{Re} > 0,\textrm{Im} < 0 \)</li> <li><span style="background-color: #FFFF00"><b>Yellow</b></span>\(\Leftrightarrow\textrm{Re} > 0,\textrm{Im} > 0 \)</li></ul>This allows us to find zeros, which correspond to points where all four colors meet. <div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-OnEmdVKFz5E/WWfTOa7Rf1I/AAAAAAAAQng/BYAmmtJKrK4iAsZoFlEdDAz-_hqT5q_4wCLcBGAs/s1600/cpxsign%25283%2529%25281%2529-iloveimg-cropped.gif" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-OnEmdVKFz5E/WWfTOa7Rf1I/AAAAAAAAQng/BYAmmtJKrK4iAsZoFlEdDAz-_hqT5q_4wCLcBGAs/s640/cpxsign%25283%2529%25281%2529-iloveimg-cropped.gif" width="1050" height="589" data-original-width="1307" data-original-height="810" /></a></div><br />We note several remarkable features: <ul> <li>The function is conjugate-symmetric. </li><li>The function displays remarkable regularity away from the real line. Note the persistent ripples which reach total regularity at \(a=2\). There is a structure of "fingers" that gradually join, each finger corresponding to one zero. The position of certain features on the real line remains fixed, e.g. the prominent feature at about 0.8. </li> <li>The evolution of the function over a can be broken into three eras. <ol type="I"> <li><b>Pre-Saturating</b>: For \(a< 1 \), there is exactly one real zero. </li> <li><b>Saturating</b>: For \(1\leq a < 2 \), zeros join to form pairs of real zeros. 
</li> <li><b>Saturated</b>: For \(2 \leq a\), all zeros are real.</li></ol> <li>The number and larger-scale density of zeros remains roughly constant. </li> <li>The function displays quasi-fractal properties, as it becomes increasingly self-similar on larger scales. In a sense, a cross between periodic and fractal behavior, as seen in the other figures. </li> <li>The process of the fusing of complex zeros into pairs of real zeros can also be seen in the plots of the real zeros above, giving a new view of the branching features. </li> <li>The fingers coalesce along elliptical paths. In fact, these ellipses are of the form \(x^2+2y^2=C'^2\) </li> </ul> <br /><br /><h2>The Case of Arbitrary Roots</h2><br />More generally, suppose we examine \[ R_a(n)=\underbrace{\sqrt[p]{a+\sqrt[p]{a+...\sqrt[p]{a+R_a(0)}}}}_{n\textrm{ radicals}} \] Let \[ A=\lim_{n \rightarrow \infty} R_a(n) \] Then \[ f(x/q)=\sqrt[p]{a+f(x)} \] Clearly \(f(0)=A\), and it is not hard to see that, again \[ R_a(n)=f(f^{-1}(R_a(0))/q^n) \] If we do the same analysis as before we find that \(q=pA^{p-1}=p(1+a/A)\). Let \(f_0(x)=A-x\). We define \[ f_{k+1}(x)= f_k^p\left (\frac{x}{q} \right )-a \] Then \[ \lim_{k \rightarrow \infty}f_k(x)=f(x) \] Then similarly we have \[ \lim_{n \rightarrow \infty} q^n(A-R_a(n))=f^{-1}(R_a(0)) \] MATLAB code for evaluating the function for a given a and given radical can be found <a href="https://drive.google.com/file/d/0B0wbt0p892o7ekxVZ29qN3JHcTQ/view?usp=sharing">here</a>. 
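In the same spirit as the linked MATLAB code, here is a rough, self-contained Python sketch checking several of the results above numerically (the values \(p=3\), \(a=1\) and the test point \(x=3.7\) are arbitrary choices for illustration):

```python
import math

# Coefficients from b_1 = -1, b_k = (sum_{j=1}^{k-1} b_j b_{k-j}) / ((2A)^k - 2A);
# for a = 2 (so A = 2) they should match b_k = 2(-1)^k/(2k)!, i.e. f(x) = 2cos(sqrt(x))
a = 2.0
A = (1.0 + math.sqrt(1.0 + 4.0 * a)) / 2.0
b = [A, -1.0]
for k in range(2, 10):
    b.append(sum(b[j] * b[k - j] for j in range(1, k)) / ((2 * A) ** k - 2 * A))
print([b[k] - 2.0 * (-1) ** k / math.factorial(2 * k) for k in range(1, 10)])  # all ~0

# The limit 4^n (2 - sqrt(2 + sqrt(2 + ...))) -> pi^2/4, starting from R_2(0) = 0
r = 0.0
for _ in range(20):
    r = math.sqrt(2.0 + r)
print(4.0 ** 20 * (2.0 - r), math.pi ** 2 / 4)   # both ~2.4674

# The product -f'(x) = prod_k (1/A) f(x/(2A)^k): for a = 2 it reduces to the
# classical identity prod_k cos(sqrt(x)/2^k) = sin(sqrt(x))/sqrt(x)
x = 3.7
prod = 1.0
for k in range(1, 60):
    prod *= math.cos(math.sqrt(x) / 2.0 ** k)
print(prod, math.sin(math.sqrt(x)) / math.sqrt(x))   # agree

# General p-th roots: A solves A^p = a + A, q = p*A^(p-1), and the scaled
# error q^n (A - R_a(n)) settles to a limit. Example: p = 3, a = 1.
p, a2 = 3, 1.0
A2 = 1.0
for _ in range(200):
    A2 = (a2 + A2) ** (1.0 / p)   # the nested radical is itself a contraction
q = p * A2 ** (p - 1)
r2, vals = 0.0, []
for n in range(1, 15):
    r2 = (a2 + r2) ** (1.0 / p)
    vals.append(q ** n * (A2 - r2))
print(vals[-2], vals[-1])          # nearly equal: the limit f^{-1}(R_a(0)) exists
```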
Nadav<br /><br /><hr /><br /><h1>Some Introductory Quantum Mechanics: Theorems of the Formalism</h1>Posted 2016-01-06; last updated 2018-07-28.<br /><script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> Quantum mechanics (QM) has a number of curious and interesting associated phenomena. Some of these were hinted at in the <a href="http://www.hyperphronesis.com/2015/10/some-introductory-quantum-mechanics.html">first part of this series</a>. The effects can be inferred from the mathematical formalism discussed in the <a href="http://www.hyperphronesis.com/2015/12/some-introductory-quantum-mechanics.html">previous post in this series</a>. Here we will discuss several of these, again without reference to interpretation. <br><br>This is part of a multi-part series giving a general introduction to quantum theory. This is part 3. <br><br><hr><br><h2>Heisenberg's Uncertainty Principle </h2><br> The variance of any observable is defined as \[ \sigma_A^2=\left \langle A^2 \right \rangle-\left \langle A \right \rangle^2=\left \langle \left ( A-\left \langle A \right \rangle \right )^2 \right \rangle \] Where \(\left \langle Q \right \rangle=\left. \langle \psi \right. | Q\left. |\psi \right \rangle\) is the <b>expected value</b> of the operator Q. Roughly speaking, \(\sigma_A\) is the "width" of the distribution of the potential values for A. We then define a new state vector as \[ \left. | a \right \rangle=\left (A-\left \langle A \right \rangle \right ) \left. |\psi \right \rangle \] So that \[ \sigma_A^2=\left \langle a \right. |\left. 
a \right \rangle \] We similarly define \[ \sigma_B^2=\left \langle B^2 \right \rangle-\left \langle B \right \rangle^2=\left \langle \left ( B-\left \langle B \right \rangle \right )^2 \right \rangle=\left \langle b \right. |\left. b \right \rangle \] Where \[ \left. | b \right \rangle=\left (B-\left \langle B \right \rangle \right ) \left. |\psi \right \rangle \] Then, by the <a href="https://en.wikipedia.org/wiki/Cauchy%E2%80%93Schwarz_inequality">Cauchy-Schwarz inequality</a>: \[ \sigma_A^2\sigma_B^2=\left \langle a \right. |\left. a \right \rangle\left \langle b \right. |\left. b \right \rangle \geq \left | \left \langle a \right. |\left. b \right \rangle \right |^2 \] Let \(z= \left \langle a \right. |\left. b \right \rangle\). Then \[ \left | z \right |^2=[\mathrm{Re}(z)]^2+[\mathrm{Im}(z)]^2\geq[\mathrm{Im}(z)]^2=\left [\frac{z-\bar{z}}{2i} \right ]^2=\left [\frac{\left \langle a \right. |\left. b \right \rangle-\left \langle b \right. |\left. a \right \rangle}{2i} \right ]^2 \] However, \[ \left \langle a \right. |\left. b \right \rangle=\left \langle \left ( A-\left \langle A \right \rangle \right ) \left ( B-\left \langle B \right \rangle \right ) \right \rangle=\left \langle AB \right \rangle-\left \langle A \right \rangle\left \langle B \right \rangle \] \[ \left \langle b \right. |\left. a \right \rangle=\left \langle \left ( B-\left \langle B \right \rangle \right ) \left ( A-\left \langle A \right \rangle \right ) \right \rangle=\left \langle BA \right \rangle-\left \langle B \right \rangle\left \langle A \right \rangle \] So \[ \left | z \right |^2\geq\left [\frac{\left \langle AB \right \rangle-\left \langle BA \right \rangle}{2i} \right ]^2=\left [\frac{\left \langle [A,B] \right \rangle}{2i} \right ]^2 \] Where \([A,B]=AB-BA\) is the commutator of the two operators A and B (In general, two operators need not commute, and so the commutator will not vanish). 
Thus, we can state the general uncertainty principle for any two operators: \[ \sigma_A \sigma_B \geq\tfrac{1}{2}| \left \langle \left [A,B \right ] \right \rangle | \] For example, let us take the one-dimensional position and momentum operators: \[ A=x,\;B=\frac{\hbar}{i}\frac{\partial }{\partial x} \] \[ [x,p_x]\left. | \psi \right \rangle=xp_x\left. | \psi \right \rangle-p_xx\left. | \psi \right \rangle =\frac{\hbar}{i}\left (x\frac{\partial}{\partial x}\left. | \psi \right \rangle-\frac{\partial }{\partial x}x\left. | \psi \right \rangle \right )=i \hbar \left. | \psi \right \rangle \] Thus \[ \sigma_x \sigma_{p_x} \geq\frac{\hbar}{2} \] This is the famous <b>position-momentum uncertainty relation</b>. <br><br><hr><br><h2>No Cloning and Related Theorems </h2><br> Suppose we want to find an operator that takes a quantum state and produces a copy of it. That is, we feed in a state and a "blank" state, operate on the two of them, and the result is the original state and a copy of it. Let this operator be called C, and the blank state be called b. That is: \[ C \left. | \psi \right \rangle_A\left. | b \right \rangle_B= \left. | \psi \right \rangle_A\left. | \psi \right \rangle_B \] As C is a transformation/evolution operator, it must be <a href="https://en.wikipedia.org/wiki/Unitary_operator">unitary</a>, so it preserves inner products, and \(C^\dagger C=I\). Therefore \[ C \left. | \phi \right \rangle_A\left. | b \right \rangle_B=\left. | \phi \right \rangle_A\left. | \phi \right \rangle_B \] \[ \left \langle b \right.|_B \left \langle \phi \right.|_A C^\dagger=\left \langle \phi \right.|_B \left \langle \phi \right.|_A \] \[ \left \langle b \right.|_B \left \langle \phi \right.|_A \left. | \psi \right \rangle_A\left. | b \right \rangle_B = \left \langle b \right.|_B \left \langle \phi \right.|_A C^\dagger C \left. | \psi \right \rangle_A\left. | b \right \rangle_B =\left \langle \phi \right.|_B \left \langle \phi \right.|_A \left. | \psi \right \rangle_A\left. 
| \psi \right \rangle_B \] However, \(\left \langle b|b \right \rangle=1\), so \[ \left \langle \phi|\psi \right \rangle=\left \langle \phi|\psi \right \rangle^2 \] Thus \(\left \langle \phi|\psi \right \rangle \in \left \{ 0,1 \right \}\), that is, the two wavefunctions are orthogonal or identical. But the two states can be chosen arbitrarily, and need not be identical or orthogonal (indeed we can always construct a wavefunction as a linear combination of an orthogonal state and an identical state, and so achieve any inner product). <br><br>Moreover, as C must be linear, if \(\left. | \chi \right \rangle=\alpha \left. | \phi \right \rangle+\beta \left. | \psi \right \rangle\), then \[ C\left. | \chi \right \rangle_A \left. | b \right \rangle_B= C \left ( \alpha \left. | \phi \right \rangle_A+\beta \left. | \psi \right \rangle_A \right ) \left. | b \right \rangle_B =\alpha C \left. | \phi \right \rangle_A \left. | b \right \rangle_B+\beta C \left. | \psi \right \rangle_A \left. | b \right \rangle_B \] \[ C\left. | \chi \right \rangle_A \left. | b \right \rangle_B = \alpha \left. | \phi \right \rangle_A \left. | \phi \right \rangle_B + \beta \left. | \psi \right \rangle_A \left. | \psi \right \rangle_B \] However, \[ C\left. | \chi \right \rangle_A \left. | b \right \rangle_B=\left. | \chi \right \rangle_A \left. | \chi \right \rangle_B=\alpha^2 \left. | \phi \right \rangle_A \left. | \phi \right \rangle_B+\alpha\beta \left ( \left. | \phi \right \rangle_A \left. | \psi \right \rangle_B + \left. | \psi \right \rangle_A \left. | \phi \right \rangle_B \right )+\beta^2 \left. | \psi \right \rangle_A \left. | \psi \right \rangle_B \] And these two expressions clearly need not be equivalent. We are free to choose \(\alpha, \beta, \phi\), and \(\psi\) arbitrarily, and, in general, the two expressions will be unequal. Thus there cannot be a way to copy arbitrary quantum states. 
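The linearity argument can be made concrete with a small numerical sketch (numpy, with the basis choice \(\left. |\uparrow \right\rangle = [1,0]\), \(\left. |\downarrow \right\rangle = [0,1]\); a CNOT gate plays the would-be cloner, since it does copy both basis states onto a blank \(\left. |\uparrow \right\rangle\)):

```python
import numpy as np

e0, e1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
blank = e0

# CNOT (control = first qubit, target = second): |x>|blank> -> |x>|x> for basis x
C = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.],
              [0., 0., 0., 1.],
              [0., 0., 1., 0.]])

# It clones the basis states...
assert np.allclose(C @ np.kron(e0, blank), np.kron(e0, e0))
assert np.allclose(C @ np.kron(e1, blank), np.kron(e1, e1))

# ...but on a superposition, linearity forces an entangled state, not a clone
psi = (e0 + e1) / np.sqrt(2)
out = C @ np.kron(psi, blank)    # (|uu> + |dd>)/sqrt(2)
clone = np.kron(psi, psi)        # (|uu> + |ud> + |du> + |dd>)/2
print(np.allclose(out, clone))   # False
```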
<br><br>Since there is no way to clone a quantum state, there is thus no way to go in the opposite direction, namely start with two identical states and transform that into a "blank" state and an original. The argument runs in much the same way, and can be seen as a dual of the no cloning theorem, called the <b>no-deleting theorem</b>. <br><br>Suppose it were possible to measure and communicate the state of an arbitrary quantum system as a sequence of classical bits. Since classical bits can be easily copied, it would then be possible to copy quantum states, in violation of the no cloning theorem. Thus it is not possible to measure and communicate the state of an arbitrary quantum system as a sequence of classical bits, and this is called the <b>no teleportation theorem</b>. <br><br>An extension of the no cloning theorem to <a href="https://en.wikipedia.org/wiki/Quantum_state#Mixed_states">mixed states</a> is the <b>no broadcast theorem</b>, which states that one can't convey a general quantum state to two or more recipients. <br><br><hr><br><h2>Correspondence Principle and the Ehrenfest Theorem </h2><br> A rather clear demand on quantum mechanics is that its predictions tend to those of standard classical mechanics in the appropriate limits. Given that we do not observe macroscopic objects to display unusual, characteristically quantum phenomena, quantum mechanics must make the same predictions as classical mechanics, asymptotically. The probabilities for macroscopic objects to display such phenomena must be vanishingly small. In general, the observed classical parameters will correspond to the expected values of the quantum analogues. 
<br><br>As an example, let us find the rate of change of the expected value of a generic observable \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle A \right \rangle=\frac{\mathrm{d} }{\mathrm{d} t}\left \langle \psi|A|\psi \right \rangle =\left \langle \frac{\partial }{\partial t}\psi|A|\psi \right \rangle + \left \langle \psi|\frac{\partial }{\partial t}A|\psi \right \rangle + \left \langle \psi|A|\frac{\partial }{\partial t}\psi \right \rangle \] However, since the wavefunction satisfies the Schrodinger equation, we have \[ i \hbar \frac{\partial }{\partial t}\left. | \psi \right \rangle= H\left. | \psi \right \rangle \] And, moreover \[ -i \hbar \frac{\partial }{\partial t}\left \langle \psi | \right.= \left \langle \psi | \right. H \] Thus \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle A \right \rangle =-\frac{1}{i\hbar}\left \langle\psi|HA|\psi \right \rangle + \left \langle \psi|\frac{\partial }{\partial t}A|\psi \right \rangle + \frac{1}{i\hbar}\left \langle \psi|AH|\psi \right \rangle \] \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle A \right \rangle = \frac{1}{i\hbar}\left \langle[A,H] \right \rangle+\left \langle \frac{\partial }{\partial t}A \right \rangle \] Let \(x=A\) \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle x \right \rangle = \frac{1}{i\hbar}\left \langle[x,H] \right \rangle+\left \langle \frac{\partial }{\partial t}x \right \rangle =\frac{1}{i\hbar}\left \langle[x,H] \right \rangle =\frac{1}{i\hbar}\left \langle[x,\frac{-\hbar^2}{2m}\frac{\partial^2 }{\partial x^2}+V] \right \rangle \] \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle x \right \rangle =\frac{1}{i\hbar}\left \langle[x,\frac{-\hbar^2}{2m}\frac{\partial^2 }{\partial x^2}+V] \right \rangle =\frac{\hbar}{im}\left \langle \frac{\partial }{\partial x} \right \rangle=\frac{\left \langle p \right \rangle}{m} \] Let \(p=A\) \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle p \right \rangle = \frac{1}{i\hbar}\left \langle[p,H] \right \rangle+\left \langle \frac{\partial }{\partial 
t}p \right \rangle \] \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle p \right \rangle = \frac{1}{i\hbar}\left \langle[p,H] \right \rangle+\left \langle \frac{\partial }{\partial t}p \right \rangle =\frac{1}{i\hbar}\left \langle[p,\frac{-\hbar^2}{2m}\frac{\partial^2 }{\partial x^2}+V] \right \rangle = -\left \langle \frac{\partial V}{\partial x} \right \rangle \] These are the same as the classical dynamical equations for the position and momentum. Thus, as it is often the case that the wavefunctions are highly localized, at least compared to macroscopic scales, quantum mechanics predicts the same macroscopic behavior as classical mechanics. <br><br>Another fact derivable from the Ehrenfest theorem is the following. Suppose Q is an operator that does not depend explicitly on time. Then we have \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle Q \right \rangle = \frac{1}{i\hbar}\left \langle[Q,H] \right \rangle \] From our discussion in the section of the Heisenberg uncertainty principle \[ \sigma_H\sigma_Q\geq\tfrac{1}{2}|\left \langle \left [ H,Q \right ] \right \rangle|=\frac{\hbar}{2}\left | \frac{\mathrm{d} }{\mathrm{d} t} \left \langle Q \right \rangle \right | \] Though time is not an observable, let us nevertheless define \[ \sigma_t=\frac{\sigma_Q}{|\mathrm{d}\left \langle Q \right \rangle/\mathrm{dt}|} \] We then have \[ \sigma_H\sigma_t\geq\frac{\hbar}{2} \] A result analogous to that of position and momentum. <br><br><hr><br><h2>Bell's Theorem and the Kochen-Specker Theorem </h2><br>Certain interpretations of quantum mechanics hold that the measurements and observations of the quantum systems are deterministic, and the only reason they seem indeterministic is because we lack full knowledge of the system. They hold that there are <b>hidden variables</b> in the system that we have not or maybe even can not uncover that govern the system, and it is only our ignorance of these that makes us unable to predict with certainty what we will observe. 
Models like these are called <b>realistic</b>, in the sense that, prior to the measurement, there is a definite, singular reality of what we will observe (or in some cases, counterfactually would observe). <br><br>Another principle typically regarded as fundamental is that the system is local, that is, causal effects cannot propagate faster than the speed of light. In principle, if the system could be appropriately manipulated, it would be possible to use non-local systems to send messages into the past. <br><br>Often, these interpretations are hard or impossible to test. However, certain versions can be tested, as they make predictions that would be inconsistent with those of standard quantum mechanics. Bell's inequality is one way to rule out certain types of local realistic models. <br><br>Let us take a source that produces a sequence of identical electron pairs in the entangled state \(\tfrac{1}{\sqrt{2}}\left (\left. | \uparrow\downarrow \right \rangle+\left. | \downarrow\uparrow \right \rangle \right )\). That is, the two particles are perfectly anti-correlated in the z direction. <br><br> We then send the particles in opposite directions to two detectors, A and B. These detectors measure the spin along axes at angles \(\alpha\) and \(\beta\) with respect to the z-axis respectively. Let us define \(p(\alpha,\beta)\) as +1 if the measured spins are the same (both up or both down) and -1 if they are different (one up, one down). \(P(\alpha,\beta)\) we then define as the average of p over many trials. Standard quantum theory predicts that \(P(\alpha,\beta)=-\cos(\alpha-\beta)\). <br><br>Suppose there are hidden variables that determine what will be measured. For simplicity, we consolidate them all, for the whole system, in the single variable \(\textbf{v}\). 
Let \(A(\alpha,\textbf{v})=1\) if the particle sent to A, which is set at angle \(\alpha\), with variables \(\textbf{v}\), will be found to have spin up, and similarly with \(A(\alpha,\textbf{v})=-1\) for spin down. Likewise with \(B(\beta,\textbf{v})=1\) and \(B(\beta,\textbf{v})=-1\) for detector B. Clearly \(p(\alpha,\beta,\textbf{v})=A(\alpha,\textbf{v})B(\beta,\textbf{v})\). Since the particles are perfectly anti-correlated when the detectors are aligned, we have \(A(\alpha,\textbf{v})=-B(\alpha,\textbf{v})\). <br><br>To average over many trials, we merely average over the different hidden variables, which are assumed to follow some sort of distribution, \(\rho(\textbf{v})\). Thus, we then have \[ P(\alpha,\beta)=\int_{\mathrm{all}\; \textbf{v}} \rho(\textbf{v})A(\alpha,\textbf{v})B(\beta,\textbf{v})d\textbf{v} =-\int_{\mathrm{all}\; \textbf{v}} \rho(\textbf{v})A(\alpha,\textbf{v})A(\beta,\textbf{v})d\textbf{v} \] \[ P(\alpha,\beta)-P(\alpha,\gamma)= -\int_{\mathrm{all}\; \textbf{v}} \rho(\textbf{v}) \left [A(\alpha,\textbf{v})A(\beta,\textbf{v})-A(\alpha,\textbf{v})A(\gamma,\textbf{v}) \right] d\textbf{v} \] As \(A^2(\alpha,\textbf{v})=1\) for any input variables, we can write: \[ P(\alpha,\beta)-P(\alpha,\gamma)= -\int_{\mathrm{all}\; \textbf{v}} \rho(\textbf{v}) \left [1-A(\beta,\textbf{v})A(\gamma,\textbf{v}) \right]A(\alpha,\textbf{v})A(\beta,\textbf{v}) d\textbf{v} \] Given that \[ \left |\int_{R} f(\textbf{x})d\textbf{x} \right |\leq\int_{R} |f(\textbf{x})|d\textbf{x}, \;\;\; | A(\alpha,\textbf{v})|=1, \;\;\; \rho(\textbf{v}) \left [1-A(\beta,\textbf{v})A(\gamma,\textbf{v}) \right]\geq 0 \] We then have \[ |P(\alpha,\beta)-P(\alpha,\gamma)|\leq \int_{\mathrm{all}\; \textbf{v}} \rho(\textbf{v}) \left [1-A(\beta,\textbf{v})A(\gamma,\textbf{v}) \right] d\textbf{v} \] \[ |P(\alpha,\beta)-P(\alpha,\gamma)|\leq1+P(\beta,\gamma) \] Which is <b>Bell's Inequality</b>. 
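To see the inequality in action, here is a Monte Carlo sketch of one simple local hidden-variable model (an illustrative choice of ours, not one prescribed by the argument above): the hidden variable \(\textbf{v}\) is a random axis angle, each detector returns the sign of \(\cos(\theta-v)\), and \(B=-A\) enforces the perfect anti-correlation. Bell's inequality holds for it, but its correlation function differs from the quantum \(-\cos(\alpha-\beta)\):

```python
import math, random

random.seed(0)

def A(theta, v):
    # deterministic detector outcome given the hidden axis angle v
    return 1 if math.cos(theta - v) >= 0 else -1

def P(alpha, beta, trials=100000):
    # average of A(alpha, v) * B(beta, v) over random hidden variables, B = -A
    total = 0
    for _ in range(trials):
        v = random.uniform(0, 2 * math.pi)
        total += A(alpha, v) * (-A(beta, v))
    return total / trials

alpha, beta, gamma = 1.0, 0.0, 2.0
lhs = abs(P(alpha, beta) - P(alpha, gamma))
rhs = 1 + P(beta, gamma)
print(lhs <= rhs)                               # True: the inequality is respected
print(P(alpha, beta), -math.cos(alpha - beta))  # the model deviates from quantum theory
```

For this model one can work out \(P(\alpha,\beta)=2|\alpha-\beta|/\pi-1\) exactly, which indeed obeys the inequality for all angles while disagreeing with \(-\cos(\alpha-\beta)\).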
This inequality must be satisfied by all local hidden-variable interpretations (the reason locality is required is to preclude instantaneous or faster-than-light signals being sent from one detector to the other). However, it is incompatible with standard quantum mechanics. For instance, let \(\alpha=0\), \(\beta=\pi/2\) and \(\gamma=\pi/4\). Then \[ |P(\alpha,\beta)-P(\alpha,\gamma) |=\tfrac{1}{\sqrt{2}}\nleq 1+P(\beta,\gamma) =1-\tfrac{1}{\sqrt{2}} \] As there have been experiments performed that violate Bell's inequality, this provides strong evidence against local hidden-variable interpretations. However, there are some loopholes: for instance, <b>superdeterminism</b>, a sort of conspiracy theory that not only are the systems we study deterministically governed, but so are our experiments, including us, and are so as to make us observe statistical violations of Bell's inequality regardless. <br><br>An associated result called the <b><a href="http://arxiv.org/pdf/quant-ph/9706009v1.pdf">Kochen-Specker (KS) Theorem</a></b> shows that <b>non-contextual</b> hidden-variable interpretations are incompatible with quantum mechanics. That is, interpretations in which the observables measured have a single definite value independent of how they are measured are incompatible with quantum mechanics. However, it leaves open the possibility for contextual hidden-variable interpretations, in which the manner of measurement is relevant to the obtained result. <br><br>One might think that one could use entanglements to send messages faster than light, given that the effects are instantaneous (The moment Alice's electron is observed to have spin up on the z-axis, Bob's electron will have spin down on the z-axis). However, a theorem called the <b>no communication theorem</b> shows that it is not possible for one observer, by measuring some subset of a system, to communicate information to another observer. 
While the effects may be instantaneous, they do not carry information, and it is only after the two observers meet up and compare results that they note that they have correlations that defy local realism. <br><br><hr><br><h2>The Quantum Zeno Effect </h2><br>Suppose we have a particle that can be in one of two states (spinning up or down, decayed or not decayed). We can represent it as a 2 by 1 matrix. Suppose it begins in the state \[ \left. |\psi(0) \right \rangle=\begin{bmatrix} 1\\ 0 \end{bmatrix} \] If it is allowed to evolve by itself, its time dependent state is given by \[ \left. |\psi(t) \right \rangle=\begin{bmatrix} \alpha(t)\\ \beta(t) \end{bmatrix} \] Where the functions satisfy the condition stated above at t=0, and the state is properly normalized. Suppose that the other state is stable, i.e., that once it "flips" it stays "flipped". <br><br>Suppose \(|\beta(t)|^2\approx (t/\tau)^n\) for t close to 0, where \(\tau\) is some characteristic time of the system. Suppose we allow the state to evolve unperturbed for a length of time T (small relative to \(\tau\)), and then measure it. The probability that it will be found in the original state is simply \[ P_1=|\alpha(T)|^2\approx 1- (T/\tau)^n \] However, suppose, instead, that we measure it N times, after each time of length T/N. Then the chance that it will be found in the original state is the chance that it hadn't been found to have changed after any interval. That can be found by the usual methods of probability theory: \[ P_N=(|\alpha(T/N)|^2)^N\approx (1- \left (\tfrac{T}{N\tau} \right)^n)^N\approx e^{- \left(\tfrac{T}{\tau}\right)^n N^{1-n}} \] Thus, if \(n>1\), the probability tends to 1 as N increases, and if \(n<1\) the probability tends to 0 as N increases (if \(n=1\), the probability tends to an exponential function of time). Thus, if the probability changes in the appropriate way, watching a system repeatedly tends to keep it in the same state. 
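Plugging in numbers makes the effect vivid. The sketch below (plain Python; \(n=2\) is the generic quadratic short-time behavior, and \(\tau=1\), \(T=0.1\) are arbitrary illustrative values) evaluates \(P_N=(1-(T/(N\tau))^n)^N\):

```python
# Survival probability after N measurements in time T, with |beta(t)|^2 = (t/tau)^n
tau, T, n = 1.0, 0.1, 2
probs = [(1.0 - (T / (N * tau)) ** n) ** N for N in (1, 10, 100, 1000)]
print(probs)   # each entry closer to 1 than the last: frequent observation freezes the state
```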
Moreover, if the system is measured continuously, it would never change at all. This has led some to remark that a quantum watched pot never boils. This phenomenon is called the <b>quantum Zeno effect</b>, after the philosophical paradoxes of a similar nature. <br><br><hr><br><h2>Quantum Teleportation and Indirect Entanglement </h2><br>Suppose Alice and Bob are in separate locations, but connected by classical communication channels. They also each have one of a pair of entangled particles in the state \[ \tfrac{1}{\sqrt{2}}\left. |\uparrow \right \rangle_A\left. |\uparrow \right \rangle_B+\tfrac{1}{\sqrt{2}}\left. |\downarrow \right \rangle_A\left. |\downarrow \right \rangle_B \] Where the subscripts denote whose particle it is. Alice also has another particle in the arbitrary state \[ \alpha\left. |\uparrow \right \rangle_C+\beta\left. |\downarrow \right \rangle_C \] The state of the entire system can be written as \[ \tfrac{\alpha}{\sqrt{2}}\left. |\uparrow \uparrow \uparrow \right \rangle_{ABC}+ \tfrac{\alpha}{\sqrt{2}}\left. |\downarrow \downarrow \uparrow \right \rangle_{ABC}+ \tfrac{\beta}{\sqrt{2}}\left. |\uparrow \uparrow \downarrow \right \rangle_{ABC}+ \tfrac{\beta}{\sqrt{2}}\left. |\downarrow \downarrow \downarrow \right \rangle_{ABC} \] This can also be written in the form \[ \frac{1}{2} \begin{pmatrix} \tfrac{\left. |\uparrow \uparrow \right \rangle_{AC}+\left. |\downarrow \downarrow \right \rangle_{AC}}{\sqrt{2}}\left (\alpha\left. |\uparrow \right \rangle_{B}+\beta\left. |\downarrow \right \rangle_{B} \right ) + \tfrac{\left. |\uparrow \uparrow \right \rangle_{AC}-\left. |\downarrow \downarrow \right \rangle_{AC}}{\sqrt{2}}\left (\alpha\left. |\uparrow \right \rangle_{B}-\beta\left. |\downarrow \right \rangle_{B} \right ) \\ + \tfrac{\left. |\uparrow \downarrow \right \rangle_{AC}+\left. |\downarrow \uparrow \right \rangle_{AC}}{\sqrt{2}} \left (\beta\left. |\uparrow \right \rangle_{B}+\alpha\left.
|\downarrow \right \rangle_{B} \right ) +\tfrac{\left. |\uparrow \downarrow \right \rangle_{AC}-\left. |\downarrow \uparrow \right \rangle_{AC}}{\sqrt{2}}\left (\beta\left. |\uparrow \right \rangle_{B}-\alpha\left. |\downarrow \right \rangle_{B} \right ) \end{pmatrix} \] Thus, if Alice measures her pair of particles to be in any of the four entangled states (all of which are mutually orthogonal, and so are completely distinguishable) \[ \tfrac{\left. |\uparrow \uparrow \right \rangle_{AC}+\left. |\downarrow \downarrow \right \rangle_{AC}}{\sqrt{2}},\; \tfrac{\left. |\uparrow \uparrow \right \rangle_{AC}-\left. |\downarrow \downarrow \right \rangle_{AC}}{\sqrt{2}},\; \tfrac{\left. |\uparrow \downarrow \right \rangle_{AC}+\left. |\downarrow \uparrow \right \rangle_{AC}}{\sqrt{2}},\; \tfrac{\left. |\uparrow \downarrow \right \rangle_{AC}-\left. |\downarrow \uparrow \right \rangle_{AC}}{\sqrt{2}} \] Bob's state will become, respectively \[ \left (\alpha\left. |\uparrow \right \rangle_{B}+\beta\left. |\downarrow \right \rangle_{B} \right ),\; \left (\alpha\left. |\uparrow \right \rangle_{B}-\beta\left. |\downarrow \right \rangle_{B} \right ),\; \left (\beta\left. |\uparrow \right \rangle_{B}+\alpha\left. |\downarrow \right \rangle_{B} \right ),\; \left (\beta\left. |\uparrow \right \rangle_{B}-\alpha\left. |\downarrow \right \rangle_{B} \right ) \] It then suffices for Alice to communicate to Bob which entangled state she measured, and then Bob can apply an appropriate operator to put his particle in the state in which particle C was originally. Thus, the state has been teleported from Alice to Bob. Indeed, neither Alice nor Bob need know what particle C's original state was, though they can know that it was perfectly teleported. Note that the entanglement between Alice's and Bob's particles is, in the end, broken, and Alice's two particles are left entangled.
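The protocol can be verified with a small numerical sketch in Python (the variable names and the random test state are ours). We write the three-particle state as a tensor, project Alice's (A, C) pair onto each of the four Bell states listed above, and check that, after the appropriate correction, Bob holds particle C's original state:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary normalized state of particle C: alpha|up> + beta|down>.
a, b = rng.normal(size=2) + 1j * rng.normal(size=2)
norm = np.sqrt(abs(a) ** 2 + abs(b) ** 2)
alpha, beta = a / norm, b / norm

# Total state as a tensor psi[A, B, C] (index 0 = up, 1 = down), as in the text.
psi = np.zeros((2, 2, 2), complex)
psi[0, 0, 0] = alpha / np.sqrt(2)
psi[1, 1, 0] = alpha / np.sqrt(2)
psi[0, 0, 1] = beta / np.sqrt(2)
psi[1, 1, 1] = beta / np.sqrt(2)

# The four Bell states of the (A, C) pair, in the order listed in the text.
s = 1 / np.sqrt(2)
bell = [np.array([[s, 0], [0, s]]),    # (|uu> + |dd>)/sqrt(2)
        np.array([[s, 0], [0, -s]]),   # (|uu> - |dd>)/sqrt(2)
        np.array([[0, s], [s, 0]]),    # (|ud> + |du>)/sqrt(2)
        np.array([[0, s], [-s, 0]])]   # (|ud> - |du>)/sqrt(2)

# Correction Bob applies, depending on which Bell state Alice reports.
I = np.eye(2); Z = np.diag([1.0, -1.0]); X = np.array([[0.0, 1.0], [1.0, 0.0]])
correction = [I, Z, X, X @ Z]

target = np.array([alpha, beta])
overlaps = []
for k in range(4):
    # Project Alice's pair (A, C) onto the k-th Bell state; the remaining
    # amplitude over B is Bob's (unnormalized) conditional state.
    bob = np.einsum('ac,abc->b', bell[k].conj(), psi)
    bob = bob / np.linalg.norm(bob)
    fixed = correction[k] @ bob
    overlaps.append(abs(np.vdot(target, fixed)))
print(overlaps)  # each entry ≈ 1.0: Bob recovers C's state in every branch
```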
<br><br><center>**********</center> <br>Another example of the odd nature of entanglement can be demonstrated with the following. Suppose we have two independent sources that produce the entangled particle pairs \[ \tfrac{\left. |\uparrow \uparrow \right \rangle_{AB}+\left. |\downarrow \downarrow \right \rangle_{AB}}{\sqrt{2}},\;\; \tfrac{\left. |\uparrow \uparrow \right \rangle_{CD}+\left. |\downarrow \downarrow \right \rangle_{CD}}{\sqrt{2}} \] Particle A is sent to Alice, D to Dave, and B and C to Becca. We can write the total state of the system as follows \[ \tfrac{1}{2}\left ( \left. |\uparrow \uparrow\uparrow \uparrow \right \rangle_{ABCD}+ \left. |\uparrow \uparrow\downarrow \downarrow \right \rangle_{ABCD}+ \left. |\downarrow \downarrow\uparrow \uparrow \right \rangle_{ABCD}+ \left. |\downarrow \downarrow\downarrow \downarrow \right \rangle_{ABCD} \right ) \] Alternatively, we could write it the following, equivalent way \[ \frac{1}{2} \begin{pmatrix} \tfrac{\left. |\uparrow \uparrow \right \rangle_{BC}+\left. |\downarrow \downarrow \right \rangle_{BC}}{\sqrt{2}} \tfrac{\left. |\uparrow \uparrow \right \rangle_{AD}+\left. |\downarrow \downarrow \right \rangle_{AD}}{\sqrt{2}}+ \tfrac{\left. |\uparrow \uparrow \right \rangle_{BC}-\left. |\downarrow \downarrow \right \rangle_{BC}}{\sqrt{2}} \tfrac{\left. |\uparrow \uparrow \right \rangle_{AD}-\left. |\downarrow \downarrow \right \rangle_{AD}}{\sqrt{2}} \\ + \tfrac{\left. |\uparrow \downarrow \right \rangle_{BC}+\left. |\downarrow \uparrow \right \rangle_{BC}}{\sqrt{2}} \tfrac{\left. |\uparrow \downarrow \right \rangle_{AD}+\left. |\downarrow \uparrow \right \rangle_{AD}}{\sqrt{2}} + \tfrac{\left. |\uparrow \downarrow \right \rangle_{BC}-\left. |\downarrow \uparrow \right \rangle_{BC}}{\sqrt{2}} \tfrac{\left. |\uparrow \downarrow \right \rangle_{AD}-\left. 
|\downarrow \uparrow \right \rangle_{AD}}{\sqrt{2}} \end{pmatrix} \] Thus, if Becca measures to see if her two particles are in any of the standard entangled states, as in the quantum teleportation setup, Alice's and Dave's particles will become entangled, and in the same entangled state as Becca's particles, no less. Becca can disentangle her particles from Alice's and Dave's, while entangling Alice's and Dave's particles, which initially bore no relation to one another. In this way, two particles can become entangled without ever having interacted, so entanglement need not require interaction. <br><br><hr><br><h2>Spatial Phenomena </h2><br>Let us look at the case of an electron in an infinite quantum well. That is, an electron in the potential that has the form \[ V(x)=\left\{\begin{matrix} 0\;\;\;0\leq x \leq L \\ \infty\;\;\mathrm{o.w.} \end{matrix}\right. \] As the wavefunction must be continuous, and clearly the wavefunction is zero outside the well, we have \(\psi(0)=0\) and \(\psi(L)=0\). Let us suppose the wavefunction is in an energy eigenstate. In that case, we solve the time-independent Schrodinger equation inside the well: \[ E\psi(x)=\frac{-\hbar^2}{2m}\frac{\partial^2 }{\partial x^2}\psi(x) \] This has the solutions \[ \psi(x)=A\cos(\lambda x)+B\sin(\lambda x) \] Where \(\lambda=\sqrt{2mE}/\hbar\). From the condition that \(\psi(0)=0\), \(A=0\). From the condition that \(\psi(L)=0\), \(\lambda=n\pi/L\), where n is a positive integer. From the normalization condition, we have \(B=\sqrt{2/L}\). Thus \[ \psi_n(x)=\sqrt{\frac{2}{L}}\sin\left ( \frac{n\pi x}{L} \right ) \] And the corresponding energy is \[ E_n=n^2\frac{\pi^2 \hbar^2}{2m L^2} \] Note that the energy is <b>quantized</b>, that is, it is only ever found to have a value in this discrete set of values. This feature is common in quantum mechanics: the boundary conditions, or conditions for convergence will restrict certain observables to fall into a discrete set. 
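A quick numerical sketch in Python (natural units \(\hbar=m=L=1\), chosen purely for illustration) confirming that the eigenfunctions above are orthonormal and that the energies grow as \(n^2\):

```python
import numpy as np

hbar = m = L = 1.0  # natural units, for illustration only

def psi(n, x):
    """n-th energy eigenfunction of the infinite quantum well."""
    return np.sqrt(2 / L) * np.sin(n * np.pi * x / L)

def E(n):
    """n-th energy level: E_n = n^2 pi^2 hbar^2 / (2 m L^2)."""
    return n ** 2 * np.pi ** 2 * hbar ** 2 / (2 * m * L ** 2)

x = np.linspace(0, L, 200001)
dx = x[1] - x[0]

# Orthonormality: the Gram matrix of inner products is the identity.
# (Trapezoid rule; the endpoint terms vanish since psi_n(0) = psi_n(L) = 0.)
gram = np.array([[np.sum(psi(j, x) * psi(k, x)) * dx
                  for k in (1, 2, 3)] for j in (1, 2, 3)])
print(np.round(gram, 6))

print([E(n) for n in (1, 2, 3)])  # energies scale as n^2: E_n = n^2 E_1
```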
A related phenomenon is when the set of possible values for an observable falls into a fragmented set, of the form \([a_1,a_2]\cup[a_3,a_4]\cup...\) where the a's are strictly increasing. In such a case, the system will have allowed <b>bands</b>, and will need sizable "kicks" to get over the gaps. This is the basis for how transistors work. <br><br>Note also that in the case of the quantum well, all the eigenfunctions are orthogonal and form a complete set. An arbitrary initial wavefunction \(\psi(x,0)\) will, at time t be equal to \[ \psi(x,t)=\frac{2}{L}\sum_{n=1}^{\infty} c_n \sin\left ( \frac{n\pi x}{L} \right ) e^{-itE_n/\hbar} , \;\; c_n=\int_{0}^{L}\psi(x,0)\sin\left ( \frac{n\pi x}{L}\right )dx \] We can also see something of the correspondence principle, namely, that for high energies, the probability distribution is nearly uniform in the well (it oscillates, as it goes above and below its average value, but the scale of the oscillations, for high enough energy, is imperceptible at macroscopic scales). Classically, for a particle bouncing back and forth in such a well, we would expect a uniform distribution (supposing we didn't know where the particle began). <br><br>Another result from this, which is true of quantum systems in general, is that even in the lowest energy state, when the most energy possible has been removed (the system is as "cold" as possible), the energy is non-zero. This is called the <b>zero-point energy</b>. Thus, even at "absolute" zero, the electron would still not be motionless, since, in this case \(0< E_1 =\left \langle p^2 \right \rangle/2m\), and so the root-mean-square momentum would be non-zero. <br><br>An important case of a sort of quantum well is the atom, in which the nucleus attracts the electrons and so confines them. In the case of the atom, there are likewise quantized energy states. Since these are stationary states, the wavefunction does not vary with time, and so the effective charge density likewise is constant. 
This explains why the atom does not radiate energy, as it would in the classical case. However, in the case of the atom, which is necessarily a three-dimensional system, the states are also quantized with respect to angular momentum. <br><br><center>**********</center> <br>Another interesting phenomenon is that of <b>quantum tunneling</b>. Suppose we have a particle moving in the +x direction impinging on a finite barrier of the form \[ V(x)=\left\{\begin{matrix} \tfrac{\hbar^2 q}{2m}\;\;\;0\leq x \leq L \\ 0\;\;\;\;\; \mathrm{o.w.} \end{matrix}\right. \] Let us call the regions before, in, and beyond the barrier regions I, II, and III respectively. Suppose it initially has momentum \(\hbar k\). Its energy will be given by \(\hbar^2 k^2/2m\), and further suppose that this energy is less than the potential barrier. Solving the Schrodinger equation inside the barrier, we easily find that the wavefunction will be of the form \(A e^{\lambda x}+Be^{-\lambda x}\), where \(\lambda=\sqrt{q-k^2}\). <br><br>Then we can write the wavefunction (ignoring normalization) in the three regions as \[ \psi(x)=\left\{\begin{matrix} A_1 e^{ikx}+B_1 e^{-ikx} \;\;\;\, \mathrm{ I} \\ A_2 e^{\lambda x}+B_2 e^{-\lambda x} \;\;\;\;\; \mathrm{ II} \\ A_3 e^{ikx}+B_3 e^{-ikx} \;\;\;\;\; \mathrm{III} \end{matrix}\right. \] However, \(B_3=0\), since that term corresponds to a wave moving to the left, which would not happen in the case of an incident wave going in the +x direction. The other coefficients can be found by ensuring that the wavefunction and its derivative are continuous at the two boundaries. In particular, we find that \[ T=|A_3/A_1|^2=\frac{1}{1+\tfrac{q^2}{4k^2(q-k^2)}\sinh^2(\lambda L)} \] This <b>transmission coefficient</b> represents the probability that the particle will be found on the opposite side of the barrier. Note that, contrary to classical mechanics, there is a definite, non-zero probability of finding the particle on the opposite side of the barrier.
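The transmission coefficient can be checked by solving the four matching conditions directly. Below is a Python sketch in units where \(\hbar=2m=1\) (so \(E=k^2\) and \(V_0=q\); the parameter values are illustrative), compared against the closed form \(T=\left[1+\tfrac{q^2}{4k^2(q-k^2)}\sinh^2(\lambda L)\right]^{-1}\):

```python
import numpy as np

# Units with hbar = 2m = 1, so E = k^2, V0 = q, lambda = sqrt(q - k^2).
k, q, L = 1.0, 2.0, 1.0
lam = np.sqrt(q - k ** 2)

# Continuity of psi and psi' at x = 0 and x = L, with A1 = 1:
# unknowns are (B1, A2, B2, A3).
eL, ekL = np.exp(lam * L), np.exp(1j * k * L)
M = np.array([
    [1,       -1,        -1,         0],             # psi continuous at 0
    [-1j * k, -lam,       lam,       0],             # psi' continuous at 0
    [0,        eL,        1 / eL,   -ekL],           # psi continuous at L
    [0,        lam * eL, -lam / eL, -1j * k * ekL],  # psi' continuous at L
], dtype=complex)
rhs = np.array([-1, -1j * k, 0, 0], dtype=complex)
B1, A2, B2, A3 = np.linalg.solve(M, rhs)

T_numeric = abs(A3) ** 2
T_closed = 1 / (1 + q ** 2 * np.sinh(lam * L) ** 2 / (4 * k ** 2 * (q - k ** 2)))
print(T_numeric, T_closed)       # the two agree
print(abs(A3) ** 2 + abs(B1) ** 2)  # transmission + reflection = 1
```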
This feature of particles doing classically impossible things is a frequent characteristic of quantum mechanics. This phenomenon helps explain why the sun continues to fuse hydrogen even though it is not hot enough for the nuclei to overcome their electrostatic repulsion: the particles have a probability of tunneling through the classically forbidden region. <br><br>Similarly, we can see that, even if the particle did have enough energy to cross the barrier, there is not a 100% chance of finding it on the other side of the barrier. Just as a particle may sometimes cross a classically forbidden barrier, sometimes it fails to cross a classically allowed barrier. <br /><br /><hr /><br /><h1>The Double Angle Formula</h1><br /><h2>Deriving the formula: \(\sin(2x)=2\sin(x)\cos(x)\) </h2><br /><h3>Way 1: From Geometry</h3><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-69JuPDgDN4g/VntaXaAVKWI/AAAAAAAAPkM/4qNbDpZ9NsA/s1600/trigproof.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="300" src="http://3.bp.blogspot.com/-69JuPDgDN4g/VntaXaAVKWI/AAAAAAAAPkM/4qNbDpZ9NsA/s200/trigproof.png" width="300" /></a></div>\[ RB=QA \;\;\;\;\;\;\;\;\;\; RQ=BA \] \[ \frac{RQ}{PQ}=\frac{QA}{OQ}=\sin(\alpha) \;\;\;\;\;\;\;\; \frac{PR}{PQ}=\frac{OA}{OQ}=\cos(\alpha) \] \[ \frac{PQ}{OP}=\sin(\beta) \;\;\;\;\;\;\;\; \frac{OQ}{OP}=\cos(\beta) \] \[ \frac{PB}{OP}=\sin(\alpha+\beta) \;\;\;\;\;\;\;\; \frac{OB}{OP}=\cos(\alpha+\beta) \] \[ PB=PR+RB=\frac{OA}{OQ}PQ+QA \] \[
\frac{PB}{OP}=\frac{OA}{OQ}\frac{PQ}{OP}+\frac{QA}{OP}=\frac{OA}{OQ}\frac{PQ}{OP}+\frac{QA}{OQ}\frac{OQ}{OP} \] \[ \sin(\alpha+\beta)=\cos(\alpha)\sin(\beta)+\sin(\alpha)\cos(\beta) \] Particularly, if \(\alpha=\beta=x, \;\;\;\; \sin(2x)=2\sin(x)\cos(x)\). <br /><br /><hr /><br /> <h3>Way 2: From the Product Formula</h3><br />Recall that the product formulas for sine and cosine are, respectively: \[ \sin(x)=x\prod_{n=1}^{\infty}\left ( 1-\frac{x^2}{\pi^2 n^2} \right ) \] And \[ \cos(x)=\prod_{n=1}^{\infty} \left (1-\frac{x^2}{\pi^2 (n-1/2)^2 } \right ) \] Thus \[ \sin(2x)=2x\prod_{n=1}^{\infty}\left ( 1-\frac{4 \cdot x^2}{\pi^2 n^2} \right ) =2\cdot x\prod_{n=\mathrm{even}\geq1}^{\infty}\left ( 1-\frac{4 \cdot x^2}{\pi^2 n^2} \right ) \cdot \prod_{n=\mathrm{odd}\geq1}^{\infty}\left ( 1-\frac{4 \cdot x^2}{\pi^2 n^2} \right ) \] \[ \sin(2x) =2\cdot x\prod_{n=1}^{\infty}\left ( 1-\frac{x^2}{\pi^2 n^2} \right ) \cdot \prod_{n=1}^{\infty}\left ( 1-\frac{x^2}{\pi^2 (n-1/2)^2} \right ) \] \[ \sin(2x)=2\cdot \sin(x) \cdot \cos(x) \] <br /><br /><hr /><br /><h3>Way 3: From the Taylor Series</h3> The Taylor series for sine and cosine can be construed as, respectively: \[ \frac{\sin(\sqrt{x})}{\sqrt{x}}=\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k+1)!}x^k \] \[ \cos(\sqrt{x})=\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}x^k \] Thus \[ \frac{\sin(\sqrt{x})\cos(\sqrt{x})}{\sqrt{x}}=\sum_{j=0}^{\infty}\frac{(-1)^j}{(2j+1)!}x^j \sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}x^k \] Using a Cauchy product, we find: \[ \frac{\sin(\sqrt{x})\cos(\sqrt{x})}{\sqrt{x}}=\sum_{j=0}^{\infty}c_j x^j \] Where \[ c_m=\sum_{n=0}^{m} \frac{(-1)^n}{(2n+1)!}\frac{(-1)^{m-n}}{(2(m-n))!} =\frac{(-1)^m}{(2m+1)!}\sum_{n=0}^{m} \binom{2m+1}{2n+1} =\frac{(-1)^m}{(2m+1)!}\sum_{n=0}^{m} \left [ \binom{2m}{2n+1}+\binom{2m}{2n} \right ] \] \[ c_m=\frac{(-1)^m}{(2m+1)!}\sum_{n=0}^{2m} \binom{2m}{n}=\frac{(-1)^m}{(2m+1)!}2^{2m} \] And thus \[
\frac{\sin(\sqrt{x})\cos(\sqrt{x})}{\sqrt{x}}=\sum_{m=0}^{\infty}\frac{(-1)^m}{(2m+1)!}(4x)^m=\frac{\sin(\sqrt{4x})}{\sqrt{4x}}=\frac{\sin(2\sqrt{x})}{2\sqrt{x}} \] Substituting \(x=y^2\) and rearranging, we find: \( 2\sin(y)\cos(y)=\sin(2y) \) <br /><br /><hr /><br /><h3>Way 4: From Euler's Formula</h3>Euler's formula is: \[ e^{ix}=\cos(x)+i\sin(x) \] Thus \[ e^{i2x}=\cos(2x)+i\sin(2x)=\left ( e^{ix} \right)^2=\left (\cos(x)+i\sin(x) \right )^2 \] \[ e^{i2x}=\left [\cos^2(x)-\sin^2(x) \right ]+i\left [ 2\sin(x)\cos(x) \right ] \] Thus, by equating real and imaginary parts, \(\sin(2x)=2\sin(x)\cos(x)\) and \(\cos(2x)=\cos^2(x)-\sin^2(x)\) <br /><br /><hr /><br /><h2>The Half-Angle Formulas</h2>We find from the last demonstration \[ \cos(2x)=\cos^2(x)-\sin^2(x)=2\cos^2(x)-1=1-2\sin^2(x) \] Substituting \(2x=y\) and solving, we find: \[ \sin\left ( \frac{y}{2} \right )=\sqrt{\frac{1-\cos(y)}{2}} \] \[ \cos\left ( \frac{y}{2} \right )=\sqrt{\frac{1+\cos(y)}{2}} \] (Here we take the positive square roots, which is valid for \(0\leq y\leq\pi\); in general the signs depend on the quadrant of \(y/2\).) <br /><br /><hr /><br /><h2>An Infinite Product Formula</h2>We can write the double-angle formula as \[ \sin(x)=2\sin\left ( \frac{x}{2} \right )\cos\left ( \frac{x}{2} \right ) \] Iterating this, we then have \[ \sin(x)=2^n\sin\left ( \frac{x}{2^n} \right ) \prod_{k=1}^{n}\cos\left ( \frac{x}{2^k} \right ) \] However, in the limit as n gets large, \(2^n\sin\left ( \frac{x}{2^n} \right )\rightarrow x\). Thus, letting n go to infinity, we have \[ \sin(x)=x \prod_{k=1}^{\infty}\cos\left ( \frac{x}{2^k} \right ) \] A simple consequence of this general result is \[ \frac{\pi}{2}=\frac{1}{\cos(\tfrac{\pi}{4})\cos(\tfrac{\pi}{8})\cos(\tfrac{\pi}{16})\cdots } =\frac{1}{\sqrt{\tfrac{1}{2}}\sqrt{\tfrac{1}{2}+\tfrac{1}{2}\sqrt{\tfrac{1}{2}}}\sqrt{\tfrac{1}{2}+\tfrac{1}{2}\sqrt{\tfrac{1}{2}+\tfrac{1}{2}\sqrt{\tfrac{1}{2}}}}\cdots }=\frac{2}{\sqrt{2}}\frac{2}{\sqrt{2+\sqrt{2}}}\frac{2}{\sqrt{2+\sqrt{2+\sqrt{2}}}}\cdots \] This is known as Viète's formula.
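Viète's formula converges quickly. A short Python sketch, generating the successive cosines by the half-angle recursion \(c_{k+1}=\sqrt{(1+c_k)/2}\) (derived in the half-angle section above):

```python
import math

# Viete's formula: 2/pi = cos(pi/4) * cos(pi/8) * cos(pi/16) * ...
# Each cosine comes from the half-angle formula c_{k+1} = sqrt((1 + c_k)/2),
# starting from c_1 = cos(pi/4) = sqrt(1/2).
c = math.sqrt(0.5)
product = c
for _ in range(40):
    c = math.sqrt((1 + c) / 2)
    product *= c

pi_estimate = 2 / product
print(pi_estimate)  # converges to pi
```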
<br /><br /><hr /><br /><h2>A Nested Radical Formula</h2>We note that \[ 2\cos(x/2)=\sqrt{2+2\cos(x)} \] Thus, by iterating, we find \[ 2\cos(x/2^n)=\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2+2\cos(x)}}}}} \] Thus \[ 2\sin(x/2^{n+1})=\sqrt{2-\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2+2\cos(x)}}}}}} \] And we can thus conclude that \[ x=\underset{n\rightarrow \infty}{\lim} 2^n\sqrt{2-\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2+2\cos(x)}}}}}} \] For example \[ \pi/3=\underset{n\rightarrow \infty}{\lim} 2^n\sqrt{2-\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2+1}}}}}} \] \[ \pi/2=\underset{n\rightarrow \infty}{\lim} 2^n\sqrt{2-\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2}}}}}} \] <br /><br /><hr /><br /><h2>An Infinite Series</h2>Above, we derived \[ \sin(x)=x \prod_{k=1}^{\infty}\cos\left ( \frac{x}{2^k} \right ) \] Taking the log of both sides and differentiating \[ \frac{\mathrm{d} }{\mathrm{d} x}\ln\left (\sin(x) \right )=\frac{\mathrm{d} }{\mathrm{d} x}\ln\left (x \prod_{k=1}^{\infty}\cos\left ( \frac{x}{2^k} \right ) \right ) \] \[ \cot(x)=\frac{1}{x}-\sum_{k=1}^{\infty}\frac{1}{2^k}\tan \left ( \frac{x}{2^k} \right ) \] \[ \frac{1}{x}-\cot(x)=\sum_{k=1}^{\infty}\frac{1}{2^k}\tan \left ( \frac{x}{2^k} \right ) \] From this we can easily derive \[ \frac{1}{\pi}=\sum_{k=2}^{\infty}\frac{1}{2^k}\tan \left ( \frac{\pi}{2^k} \right ) \] <br /><br /><hr /><br /><h2>A Definite Integral</h2>Let \[ I=\int_{0}^{\pi/2}\ln\left ( \sin(x) \right )dx =\int_{\pi/2}^{\pi}\ln\left ( \sin(x) \right )dx =\int_{0}^{\pi/2}\ln\left ( \cos(x) \right )dx \] Then \[ 2I=\int_{0}^{\pi}\ln\left ( \sin(x) \right )dx =2\int_{0}^{\pi/2}\ln\left ( \sin(x) \right )dx =\int_{0}^{\pi/2}\ln\left ( \sin(x) \right )+\ln\left ( \cos(x) \right )dx \] \[ 2I=\int_{0}^{\pi/2}\ln\left ( \sin(x) \cos(x) \right )dx=\int_{0}^{\pi/2}\ln\left (\tfrac{1}{2} \sin(2x) \right
)dx=-\frac{\pi}{2}\ln(2)+\int_{0}^{\pi/2}\ln\left (\sin(2x) \right )dx \] By the substitution \(u=2x\), we then have \[ 2I=-\frac{\pi}{2}\ln(2)+\tfrac{1}{2}\int_{0}^{\pi}\ln\left (\sin(u) \right )du=-\frac{\pi}{2}\ln(2)+I \] Therefore \[ I=\int_{0}^{\pi/2}\ln\left (\sin(x) \right )dx=-\frac{\pi}{2}\ln(2) \] <br /><br /><hr /><br /><h1>Some Introductory Quantum Mechanics: Mathematico-Theoretical Background</h1>Quantum mechanics (QM), being a novel and revolutionary framework for describing phenomena, requires a substantially different mathematical tool-set and way of thinking about physical systems and objects. There is dispute over how exactly to interpret the mathematical system used, but we will not discuss here the various interpretations. Rather, we will just describe and examine the framework and how it can be used to make predictions, all of which is agreed upon. <br /><br />This will be a multi-part series giving a general introduction to quantum theory. This is part 2. <br /><br /><hr /><br /><h2>Hilbert, State, and Dual Spaces </h2><br /> <b>Hilbert space</b> is a generalized <b>vector space</b>: a sort of extended analog of the usual Euclidean space. Elements of a Hilbert space are sorts of <b>vectors</b>, and are denoted using a label (basically just a name) and some indication of vector-hood. We will use "<b>bra-ket notation</b>", in which elements of the vector space are denoted as \(\left | \phi \right >\) (a <b>ket</b>) (\(\phi\) is merely a label. We may sometimes use numbers, or other symbols, but these are all merely labels).
Every such element has a corresponding "sister" in what is called the <b>dual space</b>, which is denoted by \(\left < \phi \right |\) (a <b>bra</b>). (The name is basically a joke: two halves of the word "bracket"). The use of the dual space will become apparent in our later discussion. In general, and in QM especially, the vector space is <b>complex</b>, meaning the vector's "components" (loosely speaking) are complex numbers. <br /><br /><hr /><br /><h2>Inner Products </h2><br /> To be a Hilbert space, there must also be an <b>inner product</b>, or a way of associating a complex number to each pair of vectors (the order may be important: the inner product of A and B need not be the same as that of B and A). The inner product of \(\left | \phi \right > \) and \(\left | \psi \right > \) is denoted by \(\left \langle \psi \right | \left. \phi \right \rangle\), that is the dual of \(\left | \psi \right > \) acting on \(\left | \phi \right > \). In particular, to be a Hilbert space, we must have that if \(\left \langle \psi \right | \left. \phi \right \rangle = z \), \(\left \langle \phi \right | \left. \psi \right \rangle = \bar{z} \), that is, the complex conjugate. If \(\left | \phi \right \rangle= r \left | \psi \right \rangle\) then \(\left \langle \phi \right |= \bar{r} \left \langle \psi \right |\). Also, we must have \(\left \langle \psi \right | \left. \psi \right \rangle \geq 0\), with equality holding iff \(\left | \psi \right >\) is the <b>zero vector</b>. Clearly \(\left \langle \psi \right | \left. \psi \right \rangle \) will be real. <br /><br />Beyond this, the inner product is <b>linear</b>. In general, if \(\left | \phi \right \rangle= a\left | \alpha \right \rangle+b\left | \beta \right \rangle \) and \( \left | \psi \right \rangle= c\left | \gamma \right \rangle+d\left | \delta \right \rangle \), then we have: \[ \left \langle \psi \right | \left. \phi \right \rangle =a\bar{c}\left \langle \gamma \right | \left. 
\alpha \right \rangle + a\bar{d}\left \langle \delta \right | \left. \alpha \right \rangle + b\bar{c}\left \langle \gamma \right | \left. \beta \right \rangle + b\bar{d}\left \langle \delta \right | \left. \beta \right \rangle \] We can also prove the famous <a href="https://en.wikipedia.org/wiki/Cauchy%E2%80%93Schwarz_inequality">Cauchy-Schwarz Inequality</a>, namely, that: \[ \left |\left \langle \psi \right | \left. \phi \right \rangle \right |^2 \leq \left \langle \psi \right | \left. \psi \right \rangle \left \langle \phi \right | \left. \phi \right \rangle \] Two vectors \(\left | \phi \right > \) and \(\left | \psi \right > \) are said to be <b>orthogonal</b> if \(\left \langle \psi \right | \left. \phi \right \rangle=0\). A vector is said to be <b>normal</b> or <b>normalized</b> if \(\left \langle \phi \right | \left. \phi \right \rangle =1\). If we have a set of vectors \({| \left. \phi_1 \right \rangle} , {| \left. \phi_2 \right \rangle} , {| \left. \phi_3 \right \rangle},...\) such that \( \left \langle \phi_j \right. | \left. \phi_k \right \rangle = 0 \) for all \(j \neq k\), then the set is called an <b>orthogonal set</b>. If it is also the case that \( \left \langle \phi_k \right. | \left. \phi_k \right \rangle = 1 \) for all k, then the set is called <b>orthonormal</b>. <br /><br /><hr /><br /><h2>Operators </h2><br /> An <b>operator</b> is something which acts on a vector to produce another vector: \(A \left | \phi \right \rangle= \left | \phi' \right \rangle\). The operator \(A\) is <b>linear</b> if, for any \(\left | \phi \right \rangle= a\left | \alpha \right \rangle+b\left | \beta \right \rangle\), we have \( A\left | \phi \right \rangle= a A\left | \alpha \right \rangle+b A\left | \beta \right \rangle \). <br />Let \(A \left | \phi \right \rangle= \left | \phi' \right \rangle\) and \(B \left | \psi \right \rangle= \left | \psi' \right \rangle\). If \(\left \langle \psi \right | \left. \phi' \right \rangle=\left \langle \psi' \right | \left.
\phi \right \rangle\) then A and B are called <b>conjugate operators</b> (each is the <b>adjoint</b> of the other), denoted \(A=B^{\dagger}\) and \(B=A^{\dagger}\), so \(A=\left (A^{\dagger} \right )^\dagger\). We also have \(\left \langle \phi' \right |= \left \langle \phi \right | A^\dagger\). If \(A=A^\dagger\), then A is called <b>Hermitian</b>. If \(A=-A^\dagger\), then A is called <b>anti-Hermitian</b>. If, taking \(B=A\) above, \(\left \langle \psi' \left | \right. \phi'\right \rangle = \left \langle \psi \left | \right. \phi\right \rangle \) for all pairs of vectors, then A is called <b>unitary</b>: it preserves all inner products. <br />We also have the following properties: \[ (A+B)\left | \phi \right \rangle= A\left | \phi \right \rangle+B\left | \phi \right \rangle \] \[ AB\left | \phi \right \rangle= A\left (B\left | \phi \right \rangle \right ) \] Note that it is not necessarily the case that \[ AB\left | \phi \right \rangle= BA\left | \phi \right \rangle \] That is, operators need not <b>commute</b>. In fact, we commonly use the notation \([A,B]=AB-BA\) (this is called the <b>commutator</b> of A and B). Non-commutativity will play an important role in the theory. <br /><br />For a given A, in some cases, for certain \(\left | \phi \right \rangle\), we have that \(A\left | \phi \right \rangle= \lambda \left | \phi \right \rangle \) for some constant \(\lambda\). In this case, we call \(\lambda\) an <b>eigenvalue</b> of the operator A and \(\left | \phi \right \rangle\) the corresponding <b>eigenvector</b>. <br />Often it is the case that we can find a set of orthonormal vectors that are the eigenvectors of a given linear operator, such that we can also write any vector as a linear sum of the eigenvectors. In that case, \[| \left. \psi \right \rangle = a_1 | \left. \phi_1 \right \rangle +a_2 | \left. \phi_2 \right \rangle+a_3 | \left. \phi_3 \right \rangle+...\]where \(a_k=\left \langle \phi_k \right. | \left. \psi \right \rangle\) (\(a_k\) is called the <b>projection </b> of \(\psi\) into the \(\phi_k\) direction). Then \[\left \langle \psi\left.
\right | \psi \right \rangle=|a_1|^2+|a_2|^2+|a_3|^2+...\] \[A\left| \psi \right \rangle = \lambda_1 a_1 | \left. \phi_1 \right \rangle + \lambda_2 a_2 | \left. \phi_2 \right \rangle+\lambda_3 a_3 | \left. \phi_3 \right \rangle+...\] \[\left \langle \psi \right | A\left| \psi \right \rangle = \lambda_1 \left |a_1 \right |^2 + \lambda_2 \left |a_2 \right |^2+\lambda_3 \left |a_3 \right |^2 +...\] If the operator is also Hermitian, then we call it an <b>observable</b>. Particularly, if an operator is Hermitian, all its eigenvalues are real. <br />If \(| \left. \psi \right \rangle \) is normalized, then we can use the notation \(\left \langle A \right \rangle_\psi=\left \langle \psi\left | A \right |\psi \right \rangle\) and \(\sigma^2_A=\left \langle A^2 \right \rangle_\psi-\left \langle A \right \rangle^2_\psi\). <br /><br /><hr /><br /><h2>Postulates of Quantum Mechanics </h2><br />Given that mathematical background, we can now lay out the fundamental postulates of QM. Exactly how to interpret these postulates will be left for later discussion. <ol><li><b>Wavefunction Postulate</b><br />The state of a physical system at a given time is defined by a <b>wavefunction</b> which is a ket vector in the Hilbert space of possible states. Generally, the vector is required to be normalized. </li><li><b>Observable Postulate</b><br />Every physically measurable quantity corresponds to an observable operator that acts on the vectors in the Hilbert space of possible states. </li><li><b>Eigenvalue Postulate</b><br />The possible results of a measurement of a physically measurable quantity are the eigenvalues of the corresponding observable. </li><li><b>Probability Postulate</b><br />Suppose the set of orthonormal eigenvectors of observable A \({| \left. \phi_{k_1} \right \rangle} , {| \left. \phi_{k_2} \right \rangle} , {| \left. \phi_{k_3} \right \rangle},...\) all have eigenvalue \(\lambda\). Suppose the initial wavefunction can be written as \(| \left. 
\psi \right \rangle = a_1 | \left. \phi_1 \right \rangle +a_2 | \left. \phi_2 \right \rangle+a_3 | \left. \phi_3 \right \rangle+...\) (i.e. the linear sum of orthonormal eigenvectors of A). Note that \(\psi\) is a <b>superposition</b> of other <b>eigenstates</b>. That is, it is a sort of combination of states that have definite properties. Each eigenstate has a well-defined value for the observable, but \(\psi\) does not. <br />The probability of measuring the observable to have the value \(\lambda\) is given by \(P(\lambda)=\left | a_{k_1} \right |^2+\left | a_{k_2} \right |^2+\left | a_{k_3} \right |^2+...\). More simply, if no two eigenvectors have the same eigenvalue, then the probability that we will measure the observable to have value \(\lambda_k\) is \(| \left \langle \phi_k\left | \right. \psi\right \rangle |^2\). This is called the <b>Born Rule</b>. <br />Given this, it is easy to see that \(\left \langle A\right \rangle_\psi=\left \langle \psi \left | A \right | \psi\right \rangle\) is the <b>expected value</b> of the operator A. </li><li><b>Collapse Postulate</b><br />Immediately after measurement, the wavefunction becomes the normalized projection of the prior wavefunction onto the sub-space of values that give the measured eigenvalue. That is, using the above description, the wavefunction immediately after measurement becomes \(\alpha \cdot( a_{k_1}| \left. \phi_{k_1}\right \rangle +a_{k_2}| \left. \phi_{k_2}\right \rangle+a_{k_3}| \left. \phi_{k_3}\right \rangle +...)\) where \(\alpha\) is a suitable normalization constant, chosen to make the resulting vector normalized. More simply, if no two eigenvectors have the same eigenvalue, then the wavefunction immediately after we measure the observable to have value \(\lambda_k\) is \(| \left. \psi \right \rangle=| \left. \phi_k \right \rangle\). 
</li><li><b>Evolution Postulate</b><br />The time-evolution of the wavefunction, in the absence of measurement, is given by the <b>time-dependent Schrodinger Equation</b>: \[ \hat{E} \left.|\psi \right \rangle=\hat{H}\left.|\psi \right \rangle \] Where \(\hat{E}\) is the <b>energy operator</b>, which is given by \(i \hbar \frac{\partial }{\partial t}\), and \(\hat{H}\) is the <b>Hamiltonian operator</b>, which is defined analogously as in classical mechanics. In particular, it is the sum of the kinetic and potential energy operators.<br /><br /> </li></ol> <br /><br /><hr /><br /><h2>Spatial Dimensions</h2><br />A common Hilbert space to use is that of functions of one spatial dimension and time. This is an example of an infinite dimensional Hilbert space (at any x-coordinate, the wavefunction could take on a completely independent value). We often speak of <b>eigenfunctions</b> instead of eigenvectors in such a space. In this Hilbert space, we define the inner product of two wavefunctions to be \[\left \langle \phi\left | \right. \psi\right \rangle =\int_{-\infty}^{\infty}\bar{\phi}(x,t)\psi(x,t)dx\] The <b>momentum operator</b> in the x-direction is given by \(P_x=\frac{\hbar}{i}\frac{\partial }{\partial x}\). The position operator is quite simply \(X=x\). The (un-normalizable) eigenfunctions for each are easily found to be, respectively \[ \left. | \psi\right \rangle_p=e^{ipx/\hbar} \] \[ \left. | \psi\right \rangle_{x_0}=\delta(x-x_0) \] <br />The classical kinetic energy is given by \(E_k=\frac{1}{2}mv^2=\frac{p^2}{2m}\). The potential energy is given simply by \(E_p=V(x,t)\), that is, merely a specification of the potential energy as a function of position and possibly time.
Thus, the time-dependent Schrodinger Equation can be written as \[ i \hbar \frac{\partial }{\partial t} \left.|\psi \right \rangle=\left ( \frac{-\hbar ^2}{2m} \frac{\partial^2 }{\partial x^2}+V(x,t) \right)\left.|\psi \right \rangle \] If the potential does not depend on time and the wavefunction is an eigenfunction of energy, with eigenvalue E, then its energy does not change with time and we can write the <b>time-independent Schrodinger Equation</b>: \[ E \left.|\psi \right \rangle=\left ( \frac{-\hbar ^2}{2m} \frac{\partial^2 }{\partial x^2}+V(x) \right)\left.|\psi \right \rangle \] That is, \(\psi\) is an eigenfunction of the Hamiltonian. We can often then solve this to find not only the wavefunction solutions, but the energy solutions: often such an equation will only be soluble with a discrete set of possible energies. The conditions of normalizability and normalization, as well as <b>boundary conditions</b>, contribute toward determining energies and solutions. <br />The extension to multiple dimensions follows analogously. <br /><br /><hr /><br /><h2>Spin </h2><br /> The Hilbert space used to describe the spin state of an electron (or other spin 1/2 particle) is typically that of two-component column vectors (two-by-one matrices). That is, a ket will be of the form \[ \left. |\psi \right \rangle= \begin{pmatrix} a\\ b \end{pmatrix} \] And the corresponding bra will be \[ \left \langle \psi | \right.= \begin{pmatrix} \bar{a} & \bar{b} \end{pmatrix} \] The condition for normalization is that \(|a|^2+|b|^2=1\). A similar description can be used for the polarization of photons. The operators for spin in the x, y and z directions are, respectively: \[ S_x=\frac{\hbar}{2}\begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix} \] \[ S_y=\frac{\hbar}{2}\begin{pmatrix} 0 & -i\\ i & 0 \end{pmatrix} \] \[ S_z=\frac{\hbar}{2}\begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix} \] All of these have eigenvalues \(+\frac{\hbar}{2}\) and \(-\frac{\hbar}{2}\), with corresponding eigenvectors: \[ \left. |+x \right \rangle=\left.
|+ \right \rangle=\frac{1}{\sqrt{2}}\begin{pmatrix} 1\\ 1 \end{pmatrix},\; \; \left. |-x \right \rangle=\left. |- \right \rangle=\frac{1}{\sqrt{2}}\begin{pmatrix} 1\\ -1 \end{pmatrix} \] \[ \left. |+y \right \rangle=\left. |\rightarrow \right \rangle=\frac{1}{\sqrt{2}}\begin{pmatrix} -i\\ 1 \end{pmatrix},\; \; \left. |-y \right \rangle=\left. |\leftarrow \right \rangle=\frac{1}{\sqrt{2}}\begin{pmatrix} 1\\ -i \end{pmatrix} \] \[ \left. |+z \right \rangle=\left. |\uparrow \right \rangle=\begin{pmatrix} 1\\ 0 \end{pmatrix},\; \; \left. |-z \right \rangle=\left. |\downarrow \right \rangle=\begin{pmatrix} 0\\ 1 \end{pmatrix} \] <br /><br /><hr /><br /><h2>Multiple Particles </h2><br />In the case of more than one particle, we can construct a total wavefunction by composing those of each particle. For instance, if we have two particles, the first with spin up and the second with spin down, we can write that in a variety of ways. For instance: \[ \left. |\uparrow \right \rangle_1 \otimes \left. |\downarrow \right \rangle_2=\left. |\uparrow \right \rangle_1\left. |\downarrow \right \rangle_2=\left. |\uparrow \downarrow \right \rangle \] Clearly this case can be described in a way that treats each particle separately: the first particle is in one state and the second particle is in another state. However, sometimes the total wavefunction cannot be described in such a way. For instance: \[ \left. |\psi \right \rangle=\frac{1}{\sqrt{2}}\left ( \left. |\uparrow \downarrow \right \rangle +\left. | \downarrow \uparrow \right \rangle \right ) \] In this case, if we measure the first particle to have spin up, the wavefunction collapses to the state \(\left. |\uparrow \downarrow \right \rangle\).
This is an example of <b>entanglement</b>, which is where two objects' states cannot be independently described. <br /><br /><hr /><br /><h1>Stirling's Approximation: Derivation and Corollaries</h1> <i>(2015-11-03, updated 2019-08-29)</i> <br /><h2>Lemma 1: \(\lim_{n \rightarrow \infty} \sqrt[n]{n!}/n=1/e\) </h2><br /><h3>Way 1 (somewhat rigorous)</h3><br />From elementary calculus, we have that: \[ \int_{0}^{1} \ln(x) dx =-1 \] Taking this as a Riemann sum, as done in introductory calculus, we have: \[ -1=\int_{0}^{1}\ln(x)dx=\lim_{N \rightarrow \infty} \sum_{k=1}^{N}\ln\left (\frac{k}{N} \right ) \cdot \frac{1}{N} \] \[ -1=\lim_{N \rightarrow \infty} -\ln(N)+\frac{1}{N} \sum_{k=1}^{N}\ln\left (k \right ) \] \[ -1=\lim_{N \rightarrow \infty} -\ln(N)+\frac{1}{N} \ln\left (N! \right ) \] Therefore, \[ \lim_{N \rightarrow \infty} \frac{\sqrt[N]{N!}}{N}=\frac{1}{e} \] <br /><br /><hr /><br /><h3>Way 2 (less rigorous)</h3><br />Suppose the limit exists and call it \(x\): \[ \lim_{n \rightarrow \infty} \frac{\sqrt[n]{n!}}{n}=x \] Then, for large \(n\), in a certain sense: \[ n!
\approx (nx)^n \] \[ \frac{(n+1)!}{n!(n+1)}=1 \approx \frac{((n+1)x)^{n+1}}{(nx)^n (n+1)}=\left ( 1+ \frac{1}{n} \right )^n x \] Thus, in order to get equality in the limit, we must have: \[ x = \lim_{n \rightarrow \infty} \left ( 1+ \frac{1}{n} \right )^{-n}=\frac{1}{e} \] <br /><br /><hr /><br /> <h2>Lemma 2: Wallis Product in Factorial Form </h2><br />Recall from <a href="http://www.hyperphronesis.com/2015/10/product-formula-for-sine-and-some.html">this article</a> the following expression for pi: \[ \frac{\pi}{2}=\prod_{k=1}^{\infty}\frac{2k \cdot 2k}{(2k-1)(2k+1)}=\lim_{N \rightarrow \infty}\prod_{k=1}^{N}\frac{2k \cdot 2k}{(2k-1)(2k+1)}=\lim_{N \rightarrow \infty} \frac{\left ( 2^N \cdot N! \right )^4}{\left ( (2N)! \right )^2(2N+1)} \] <br /><br /><hr /><br /> <h2>Lemma 3: An Inequality for the Natural Logarithm</h2><br />Let \(x,y > 0\). Clearly \[ 0 \leq \frac{1}{y^2 (1+y)^2 (2y+1)^2} \] Therefore \[ 0 \leq \int_{x}^{\infty}\frac{dy}{y^2 (1+y)^2 (2y+1)^2}=\frac{1}{x}+\frac{1}{x+1}+\frac{4}{x+1/2}-6\ln \left ( 1+\frac{1}{x} \right ) \] \[ 6\ln \left ( 1+\frac{1}{x} \right ) -\frac{6}{x+1/2} \leq \frac{1}{x}+\frac{1}{x+1}-\frac{2}{x+1/2} \] \[ (x+\tfrac{1}{2})\ln \left ( 1+\frac{1}{x} \right ) -1 \leq \frac{(x+\tfrac{1}{2})}{6}\left (\frac{1}{x}+\frac{1}{x+1} \right )-\frac{1}{3}=\frac{1}{12x(x+1)} \] Also, clearly \[ 0 \leq \frac{16y^2+41y+24}{y(1+y)^2 (2+y)^2 (2y+1)^2} \] Therefore \[ 0 \leq \int_{x}^{\infty}\frac{16y^2+41y+24}{y(1+y)^2 (2+y)^2 (2y+1)^2} dy=6\left (\ln \left ( 1+\frac{1}{x} \right )-\frac{1}{x+\tfrac{1}{2}} \right)-\frac{1}{2(x+\tfrac{1}{2})(x+1)(x+2)} \] And so \[ \frac{1}{12(x+1)(x+2)} \leq (x+\tfrac{1}{2})\ln \left ( 1+\frac{1}{x} \right )-1 \] <br /><br /><hr /><br /> <h2>Theorem: Stirling's Approximation </h2><br />Let us define a function and sequence of coefficients as follows: \[ g(n)=\ln\left ( \frac{n!}{\left ( \tfrac{n}{e} \right )^n \sqrt{2\pi n}} \right )=\sum_{k=-\infty}^{\infty} A_k n^k \] We then have, from lemma 1, \[ 
\frac{1}{e}=\lim_{n \rightarrow \infty} \frac{\sqrt[n]{n!}}{n}=\lim_{n \rightarrow \infty} \frac{\sqrt[n]{\left (\tfrac{n}{e} \right )^n \sqrt{2\pi n} \cdot e^{g(n)}}}{n}=\frac{1}{e} \lim_{n \rightarrow \infty} \sqrt[2n]{2\pi n} \cdot e^{g(n)/n} \] Thus \[ 1=\lim_{n \rightarrow \infty} e^{g(n)/n}=\exp\left (\lim_{n \rightarrow \infty} \sum_{k=-\infty}^{\infty} A_k n^{k-1} \right )=\exp\left (\lim_{n \rightarrow \infty} \sum_{k=1}^{\infty} A_k n^{k-1} \right ) \] And therefore \(A_k=0\) for \(k \geq 1\). From lemma 2, \[ \frac{\pi}{2}=\lim_{n \rightarrow \infty} \frac{\left ( 2^n \cdot n! \right )^4}{\left ( (2n)! \right )^2(2n+1)}=\lim_{n \rightarrow \infty} \frac{\left ( 2^n \cdot \left (\tfrac{n}{e} \right )^n \sqrt{2\pi n} \cdot e^{g(n)} \right )^4}{\left ( \left (\tfrac{2n}{e} \right )^{2n} \sqrt{4\pi n} \cdot e^{g(2n)} \right )^2(2n+1)} \] \[ \frac{\pi}{2}=\lim_{n \rightarrow \infty} \frac{n \pi}{2n+1} \cdot e^{4g(n)-2g(2n)} \] \[ 0=\lim_{n \rightarrow \infty} 2g(n)-g(2n)=2A_0-A_0=A_0 \] Therefore, \(A_k=0\) for \(k \geq 0\), and thus \(\lim_{n \rightarrow \infty} g(n)=0\). Thus it follows that \[ \lim_{n \rightarrow \infty} \frac{n!}{\left ( \tfrac{n}{e} \right )^n \sqrt{2\pi n}}=1 \] This fact is known as <b>Stirling's Approximation</b>.
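The convergence is easy to check numerically. A minimal sketch in Python (the helper name <code>stirling</code> is just illustrative):

```python
import math

def stirling(n):
    """Leading-order Stirling approximation to n!."""
    return (n / math.e) ** n * math.sqrt(2 * math.pi * n)

# The ratio n! / stirling(n) should tend to 1 as n grows.
for n in (5, 10, 50, 100):
    print(n, math.factorial(n) / stirling(n))
```

The ratio approaches 1 from above, roughly like \(1+\tfrac{1}{12n}\), consistent with the error bounds derived below.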
Moreover, we have \[ g(n)-g(n+1)=\ln\left ( \frac{n!\left ( \tfrac{n+1}{e} \right )^{n+1} \sqrt{2\pi (n+1)}}{(n+1)!\left ( \tfrac{n}{e} \right )^n \sqrt{2\pi n}} \right )=\ln\left ( \frac{(n+1)^{n+\tfrac{1}{2}}}{e \cdot n^{n+\tfrac{1}{2}}} \right ) \] \[ g(n)-g(n+1)=(n+\tfrac{1}{2})\ln\left ( 1+\frac{1}{n} \right )-1 \] By lemma 3, we then have \[ \frac{1}{12(n+1)(n+2)} \leq g(n)-g(n+1) \leq \frac{1}{12n(n+1)} \] \[ \sum_{k=n}^{\infty} \frac{1}{12(k+1)(k+2)}=\frac{1}{12(n+1)} \leq \sum_{k=n}^{\infty} g(k)-g(k+1)=g(n)-g(\infty)=g(n) \leq \sum_{k=n}^{\infty} \frac{1}{12k(k+1)}=\frac{1}{12n} \] That is, \(\tfrac{1}{12(n+1)} \leq g(n) \leq \tfrac{1}{12n}\). And therefore: \[ \left (\tfrac{n}{e} \right )^n \sqrt{2\pi n}\cdot e^{\tfrac{1}{12(n+1)}} \leq n! \leq \left (\tfrac{n}{e} \right )^n \sqrt{2\pi n} \cdot e^{\tfrac{1}{12n}} \] In fact, it is possible to obtain exact formulas for \(g(n)\). For example, by more advanced calculations, we can show that \[ g(n)=\int_{0}^{\infty}\frac{2 \tan^{-1}\left ( \tfrac{y}{n} \right )}{e^{2\pi y}-1}dy=\sum_{k=1}^{\infty} \frac{B_{2k}}{2k(2k-1)n^{2k-1}}=\frac{1}{12n}-\frac{1}{360n^3}+\frac{1}{1260n^5}- \cdots \] where \(B_m\) is the mth <a href="https://en.wikipedia.org/wiki/Bernoulli_number"><b>Bernoulli number</b></a>. These two expressions are, respectively, <a href="http://mathworld.wolfram.com/BinetsLogGammaFormulas.html">Binet's second expression</a> and <a href="http://mathworld.wolfram.com/StirlingsSeries.html">Stirling's series</a>.
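The bounds on \(g(n)\), and the truncated Stirling series, can likewise be verified numerically. A sketch in Python, computing \(g(n)\) via the log-gamma function (<code>math.lgamma</code>), so that large factorials cause no overflow:

```python
import math

def g(n):
    """g(n) = ln( n! / ((n/e)^n * sqrt(2*pi*n)) )."""
    return (math.lgamma(n + 1)
            - (n * math.log(n) - n)
            - 0.5 * math.log(2 * math.pi * n))

for n in (1, 2, 10, 100):
    lo, hi = 1 / (12 * (n + 1)), 1 / (12 * n)
    # First two terms of Stirling's series for g(n).
    series = 1 / (12 * n) - 1 / (360 * n ** 3)
    print(n, lo <= g(n) <= hi, g(n) - series)
```

The sandwich \( \tfrac{1}{12(n+1)} \leq g(n) \leq \tfrac{1}{12n} \) holds even at \(n=1\), and the two-term series already matches \(g(n)\) to within the next term, of order \(n^{-5}\).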
<br /><br /><hr /><br /> <h2>Corollary: Product of a Rational Function </h2><br />Firstly, since \[ \prod_{k=1}^N \left(ak+b\right)=a^N\prod_{k=1}^N \left(k+\frac{b}{a}\right) \] We will just evaluate \[ \prod_{k=1}^N \left(k+b\right)=\frac{(N+b)!}{b!} \approx \left(\frac{N+b}{e}\right)^{N+b}\frac{\sqrt{2\pi(N+b)}}{b!}=N^{N+b+\tfrac{1}{2}}e^{-N}\frac{\sqrt{2\pi}}{b!}e^{-b}\left(1+\frac{b}{N}\right)^{N+b+\tfrac{1}{2}} \] \[ \prod_{k=1}^N \left(k+b\right)=\frac{(N+b)!}{b!} \approx N^{N+b+\tfrac{1}{2}}e^{-N}\frac{\sqrt{2\pi}}{b!} \] More generally, given the above, it is not difficult to demonstrate the following generalization. Let \(m,n > 0\). Let \(a_1,a_2,...,a_m\) and \(b_1,b_2,...,b_n\) and \(r_1,r_2,...,r_m\) and \(s_1,s_2,...,s_n\) be sequences of numbers, such that \[ \sum_{k=1}^m r_k=\sum_{k=1}^n s_k \] and \[ \sum_{k=1}^m a_k r_k=\sum_{k=1}^n b_k s_k \] Then \[ \prod_{k=1}^\infty\frac{(k+a_1)^{r_1}(k+a_2)^{r_2}\cdots (k+a_m)^{r_m}}{(k+b_1)^{s_1}(k+b_2)^{s_2}\cdots (k+b_n)^{s_n}}=\frac{\prod_{j=1}^n (b_j!)^{s_j}}{\prod_{j=1}^m (a_j!)^{r_j}} \] In cases where the coefficients are non-integral, we use the <a href="https://en.wikipedia.org/wiki/Gamma_function">Gamma function</a> (an extension of the factorial to non-integers), instead of factorials: \[ \prod_{k=1}^\infty\frac{(k+a_1)^{r_1}(k+a_2)^{r_2}\cdots (k+a_m)^{r_m}}{(k+b_1)^{s_1}(k+b_2)^{s_2}\cdots (k+b_n)^{s_n}}=\frac{\prod_{j=1}^n (\Gamma (b_j+1))^{s_j}}{\prod_{j=1}^m (\Gamma (a_j+1))^{r_j}} \] For instance \[ \prod_{k=0}^\infty \frac{(k+1)(k+a+b)}{(k+a)(k+b)}=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}=B(a,b) \] \[ \frac{\sin(\pi x)}{\pi x}=\prod_{k=1}^\infty\frac{(k-x)(k+x)}{k^2}=\frac{\Gamma(1)^2}{\Gamma(1-x) \Gamma(1+x)}=\frac{1}{\Gamma(1-x)x \Gamma(x)} \] \[ \prod_{k=1}^\infty\frac{(1+\tfrac{1}{k})^x}{1+\tfrac{x}{k}}=\prod_{k=1}^\infty\frac{(k+1)^x k}{k^x (k+x)}=\frac{\Gamma(1)^x \Gamma (1+x)}{\Gamma(2)^x \Gamma(1)}=\Gamma(1+x) \] <br /><br /><hr /><br /> <h2>Corollary: Asymptotic Behavior of Bernoulli 
Numbers </h2><br />In <a href="http://www.hyperphronesis.com/2015/10/derivation-of-formula-for-even-values.html">this article</a>, we found that \[ \zeta(2n)=\frac{1}{2}\frac{(2\pi)^{2n}}{(2n)!}\left | B_{2n} \right | \] Combining this with Stirling's approximation, we find that \[ \left | B_{2n} \right |=2\zeta(2n)\frac{(2n)!}{(2\pi)^{2n}} \approx 4\left ( \frac{n}{\pi e} \right )^{2n} \sqrt{n\pi} \cdot e^{1/24n} \] <br /><br /><hr /><br /> <h2>Corollary: Approximation for Binomial Coefficients </h2><br />\[ \binom{a}{b}=\frac{a!}{b!(a-b)!} \approx \frac{\left (\tfrac{a}{e} \right )^a \sqrt{2\pi a}} {\left (\tfrac{b}{e} \right )^b \sqrt{2\pi b}\left (\tfrac{a-b}{e} \right )^{a-b} \sqrt{2\pi (a-b)}}=\frac{1}{\sqrt{2\pi}}\sqrt{\frac{a}{b(a-b)}} \frac{a^a}{b^b (a-b)^{a-b}} \] <br /><br /><hr /><br /> <h2>Corollary: Normal from Binomial </h2><br /> Let \(0 < p < 1\) and \(p+q=1\). Let \[ F_n(x)=\binom{n}{x}p^x q^{n-x} \] \[ f_n(x)=\sqrt{npq}F_n(np+x\sqrt{npq}) \] \[ \phi_n(x)=\ln(f_n(x)) \\ \\ \phi_n(x)=\ln(n!)-\ln((np+x\sqrt{npq})!)-\ln((nq-x\sqrt{npq})!)+(np+x\sqrt{npq})\ln(p)+(nq-x\sqrt{npq})\ln(q) \] Using Stirling's Approximation and some algebra \[ \phi_n(x) = -\tfrac{1}{2}\ln(2\pi)-\left (\tfrac{1}{2}+ np+x\sqrt{npq} \right )\ln\left ( 1+x\sqrt{\frac{q}{np}} \right)-\left (\tfrac{1}{2}+ nq-x\sqrt{npq} \right )\ln\left ( 1-x\sqrt{\frac{p}{nq}} \right)+O(\tfrac{1}{n}) \] Using the series expansion \(\ln(1+x)=x-\tfrac{1}{2}x^2+O(x^3) \) \[ \phi_n(x) = -\tfrac{1}{2}\ln(2\pi)-\tfrac{1}{2}x^2+O(\tfrac{1}{\sqrt{n}}) \] Thus, as \(n\) goes to infinity \[ \phi_\infty(x) = -\tfrac{1}{2}\ln(2\pi)-\tfrac{1}{2}x^2 \] \[ f_\infty(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}} \] Thus, in the limit, scaling for the changing means and variances, the <a href="https://en.wikipedia.org/wiki/Binomial_distribution">binomial distribution</a> tends to the normal distribution. 
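This convergence can be illustrated numerically: the rescaled binomial probabilities approach the standard normal density. A minimal sketch in Python (the parameter values \(n=1000\), \(p=0.3\) are arbitrary):

```python
import math

def binom_pmf(n, k, p):
    """F_n(k) = C(n, k) * p^k * q^(n-k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

n, p = 1000, 0.3
s = math.sqrt(n * p * (1 - p))  # sqrt(npq)
# f_n(x) = sqrt(npq) * F_n(np + x*sqrt(npq)) should be close to normal_pdf(x).
for x in (-2, -1, 0, 1, 2):
    k = round(n * p + x * s)
    print(x, s * binom_pmf(n, k, p), normal_pdf(x))
```

The agreement improves like \(O(1/\sqrt{n})\), matching the error term in the expansion of \(\phi_n\) above.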
Moreover, since the binomial distribution is normalized, we find that \[ \int_{-\infty}^{\infty}\frac{e^{-x^2/2}}{\sqrt{2\pi}}dx=1 \] <br /><br /><hr /><br /><h1>Occam+Bayes=Induction</h1> <i>(2015-10-27)</i> <br />A classic problem in philosophy and the philosophy of science is how to justify induction. That is, how to rationally go from the fact that X is true in N previously observed cases to the belief that it is true in all cases, or at least in an additional, unobserved case. We will here propose a quick and simple method to justify induction, based on the combination of Occam's razor (to assign prior probabilities to hypotheses) and Bayesian inference (to update epistemic probabilities). <br /><hr /><br /><h3>Notation </h3>Let us introduce the following notation. Let \(H\) be some hypothesis which we want to judge for plausibility. Let \(X_k\) be the fact that \(X\) is true in the kth instance. Let \(X^n\) be the fact that \(X\) is true in the first n cases, that is \[X^n=X_1 \cap X_2 \cap \cdots \cap X_n=\bigcap_{k=1}^{n}X_k\] so that \[X^{n-1}\cap X_n=X^n\] Thus \(P\left ( X^n|H \right ) \) is the (epistemic) probability that we observe X in n cases, supposing H is true, and \(P\left ( H|X^n \right ) \) is the (epistemic) probability that H is true, supposing we observe X to be the case in n cases. <br /><hr /><br /><h3>Occam's Razor </h3>There are three simplest hypotheses we can form, all the rest being more complex. These three are: <ul><li><b>Proinductive</b> (P) hypothesis: the chance of X happening again increases as we see more instances of it.
</li><li><b>Contrainductive</b> (C) hypothesis: the chance of X happening again decreases as we see more instances of it. </li><li><b>Uninductive</b> (U) hypothesis: the chance of X happening again stays the same as we see more instances of it. </li></ul>For concreteness, let \(F_H(n)=P\left ( X_{n}|H \cap X^{n-1} \right )\). Thus we say that, for \(m > 0\), \(F_P(n+m) > F_P(n)\), and \(\lim_{n \rightarrow \infty} F_P(n)=1\), and \(F_C(n+m) < F_C(n)\), and \(\lim_{n \rightarrow \infty} F_C(n)=0\), and \(F_U(n)=F_U(0)\). <br /><hr /><br /><h3>Bayesian Inference </h3>We want to find \(P\left ( H|X^n \right ) \) for the hypotheses listed in the previous section. We have \[ P\left ( X^n|H \right )=P\left ( X_n \cap X^{n-1}|H \right )=P\left ( X_n |X^{n-1} \cap H \right ) \cdot P\left ( X^{n-1} |H \right )=F_H(n) \cdot P\left ( X^{n-1} |H \right ) \] Therefore \[ P\left ( X^n|H \right )=\prod_{k=1}^{n} F_H(k) \] Suppose that there are \(N\) mutually exclusive and collectively exhaustive hypotheses. Then, Bayes' formula states: \[ P(H_m|A)=\frac{P(A|H_m)P(H_m)}{P(A|H_1)P(H_1)+P(A|H_2)P(H_2)+\cdots+P(A|H_N)P(H_N)} \] Thus, we have \[ P(H_m|X^n)=\frac{P(X^n|H_m)P(H_m)}{P(X^n|H_1)P(H_1)+P(X^n|H_2)P(H_2)+\cdots+P(X^n|H_N)P(H_N)} \] Therefore \[ P(H_m|X^n)=\frac{P(H_m)\prod_{k=1}^{n} F_{H_m}(k)}{P(H_1)\prod_{k=1}^{n} F_{H_1}(k)+P(H_2)\prod_{k=1}^{n} F_{H_2}(k) + \cdots + P(H_N)\prod_{k=1}^{n} F_{H_N}(k)} \] Let us suppose that the three hypotheses mentioned above are collectively exhaustive. Suppose, for concreteness that \(F_P(n)=\frac{n}{n+1}\), \(F_C(n)=\frac{1}{n+1}\), and \(F_U(n)=\frac{1}{2}\). Thus \(\prod_{k=1}^{n} F_{P}(k)=\frac{1}{n+1}\), and \(\prod_{k=1}^{n} F_{C}(k)=\frac{1}{(n+1)!}\), and \(\prod_{k=1}^{n} F_{U}(k)=\frac{1}{2^n}\). Let \(P(P)=p\) and \(P(C)=q\) and \(P(U)=r\) where \(p+q+r=1\). 
Then: \[ P(P|X^n)=\frac{p\frac{1}{n+1}}{p\frac{1}{n+1}+q\frac{1}{(n+1)!}+r\frac{1}{2^n}} \] \[ P(C|X^n)=\frac{q\frac{1}{(n+1)!}}{p\frac{1}{n+1}+q\frac{1}{(n+1)!}+r\frac{1}{2^n}} \] \[ P(U|X^n)=\frac{r\frac{1}{2^n}}{p\frac{1}{n+1}+q\frac{1}{(n+1)!}+r\frac{1}{2^n}} \] A simple assessment of limits shows that the first goes to 1 quite rapidly for increasing n, for any nonzero p, and the latter two go to zero. In fact, for \(p=q=r=1/3\), for \(n>10\), \(P(P|X^n)>0.99\), and for \(n>17\), \(P(P|X^n)>0.9999\). <br /><br />This example is meant to be only illustrative, to show the general way in which Occam's razor, combined with Bayesian inference, leads to a support of induction. The same thing happening repeatedly lends credence to the hypothesis that the same things tend to happen repeatedly, and detracts from the hypotheses that the same things are unlikely to happen repeatedly, or always happen with the same probability. In a very similar way, a coin repeatedly coming up heads supports the hypothesis that it is biased to come up heads, and detracts from the hypotheses that it is biased to come up tails or is fair. This may seem obvious, but it is beneficial to see exactly how the mathematical machinery supports this intuition. <br /><br />We may also wish to include other hypotheses, but we must first assess the prior probabilities that they are true, and Occam's razor advises taking the prior probability of a hypothesis to be inversely related to its complexity. Thus, even if some other hypothesis makes observing n X's more likely than the three discussed, it would need to be more complex or ad hoc, and so would have a significantly lower prior probability.
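The numbers quoted above are easy to reproduce. A minimal sketch in Python (the function name <code>posteriors</code> is just illustrative):

```python
import math

def posteriors(n, p=1/3, q=1/3, r=1/3):
    """Posterior probabilities of P, C, U after observing X in n cases,
    using F_P(k) = k/(k+1), F_C(k) = 1/(k+1), F_U(k) = 1/2."""
    like_p = 1 / (n + 1)                # prod_{k=1..n} k/(k+1)
    like_c = 1 / math.factorial(n + 1)  # prod_{k=1..n} 1/(k+1)
    like_u = 0.5 ** n                   # prod_{k=1..n} 1/2
    z = p * like_p + q * like_c + r * like_u
    return p * like_p / z, q * like_c / z, r * like_u / z

for n in (1, 5, 11, 18):
    print(n, posteriors(n))
```

With uniform priors, the posterior of the proinductive hypothesis crosses 0.99 at \(n=11\) and 0.9999 at \(n=18\), as stated in the text.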
<br /><br /><hr /><br /><h1>Some Introductory Quantum Mechanics: Classical Background and Non-Classical Phenomena</h1> <i>(2015-10-23, updated 2019-08-02)</i> <br />Quantum mechanics (QM) is a theoretical framework that describes the fundamental nature of reality, of particles of matter and light among potentially others. QM arose from and in contrast to classical mechanics (CM), with many formulations and features still relying heavily on CM ideas. However, several phenomena established that CM cannot be the whole story, and would need to be amended. A new theory would need to be introduced to account for these phenomena, a theory which would also predict some startling new ones. However, the best way to interpret the new theory is still disputed. <br><br>This will be a multi-part series giving a general introduction to quantum theory. <br><br><hr><br><h2>Classical Mechanics </h2><br> QM is distinct from CM, though similar in several respects. CM, in general, looks at the behavior of idealized geometrical bodies: rigid, elastic, and fluid. The state is always definite, and in this state, momentum, energy, position and the like are well-defined and definite (we may make an exception for statistical mechanics, but in that case, these quantities may take on distributions only in the sense of an ensemble: it would still be in principle possible to determine these properties for each element in the ensemble, as <a href="https://en.wikipedia.org/wiki/Maxwell's_demon">Maxwell's demon</a> would do). CM is how we tend naively to see the world.
Things look like they are definite, spatially constrained, like a bunch of tiny definite parts, or large definite volumes moving along definite paths. This is decidedly not the case in QM. <br><br>CM has three main, equivalent formulations: Newtonian, Lagrangian and Hamiltonian. <br> <br><ul><li><b>Newtonian</b>: <a href="https://en.wikipedia.org/wiki/Newtonian_mechanics">Newtonian mechanics</a> is the typical pedagogical formulation. It deals with the position and velocity of point masses, extended bodies, fluids, etc. in terms of forces, which relate back to position via <a href="https://en.wikipedia.org/wiki/Newton's_laws_of_motion#Newton.27s_second_law">Newton's second Law</a> (which is really more of a definition). That is, for each asymptotically infinitesimal bit of matter in the system, find the net forces (and torques/stresses), in terms of the positions of the other bits of matter, relate it to the acceleration via Newton's second Law, and then solve the big set of differential equations (or use an iterative approximation method like <a href="https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods">Runge-Kutta</a>) to find the trajectories of each bit of matter (often the problem is much simplified by various symmetries, homogeneities, localities, redundancies, and conservation considerations).</li><br><li><b>Lagrangian</b>: <a href="https://en.wikipedia.org/wiki/Lagrangian_mechanics">Lagrangian mechanics</a> deals more in energies, specifically a certain function of time, position and velocity (over all the degrees of freedom) called the Lagrangian, which is typically just the kinetic minus the potential energy. Lagrangian mechanics allows one to deal with constraints in a simpler and more elegant way. Integrating the Lagrangian over time gives the action.
The <a href="https://en.wikipedia.org/wiki/Principle_of_least_action">principle of stationary action</a> states that objects move so as to make the action stationary (typically a minimum). This can be roughly and loosely interpreted as saying that objects go along the "easiest" trajectories. Lagrangians are still used extensively in modern physics, such as in <a href="http://en.wikipedia.org/wiki/Quantum_field_theory">quantum field theory</a> and the <a href="https://en.wikipedia.org/wiki/Path_integral_formulation">path integral formulation</a>.</li><br><li><b>Hamiltonian</b>: <a href="https://en.wikipedia.org/wiki/Hamiltonian_mechanics">Hamiltonian mechanics</a> also deals in energies, specifically a certain function, related to the Lagrangian, called the Hamiltonian, which generally is equal to the total energy of the system. The trajectories are then found via Hamilton's equations, a set of differential equations relating the rates of change of position and momentum to derivatives of the Hamiltonian. This formalism uses rather abstract notions, such as frames of reference, generalized coordinates, phase space and the like. However, it is one of the most powerful formulations of classical mechanics and serves as one of the basic frameworks for the development of quantum mechanics.</li></ul><br>Measurement in CM is intuitive and simple. We measure the position of a thing by looking at where it is and recording that. Measurement need not affect the thing being measured, at least in principle. But even if it can't be done in practice, the information being sought is still there and definite regardless. A hypothetical <a href="https://en.wikipedia.org/wiki/Laplace's_demon">Laplace's demon</a> could know all the parameters as they really are. This is very plausibly not the case in QM. <br><br>As a rule, in CM, if an object requires energy \(E\) to do X, but only has energy \(E'< E\), then the object won't be able to do X.
For example, if a marble is in a bowl with sides at a height requiring energy E to overcome (i.e. if the object is of mass m, the height of the sides is \(E/mg\)), but the marble only has energy \(E'< E\), the marble cannot escape the bowl. There is no chance that anyone will ever make a measurement of the position of the marble and have that be outside the bowl. Interestingly, this is not the case in QM. <br><br>CM is generally deterministic in a rather strict sense (though there are certain rare <a href="http://www.pitt.edu/~jdnorton/papers/DomePSA2006.pdf">exceptions</a>). Given that all of the above formulations are equivalent, they are all reducible to a set of second-order differential equations for the positions. This means that if all initial positions and velocities are known, even if the relevant forces are time-dependent, the trajectory of each object at all future times is unique and determinable. Any apparent indeterminism is merely epistemic. Assigning probabilities to different states or outcomes is done not because the state is ill-defined or because some amount of indeterminism somehow emerges. Rather, it is due to not knowing the initial state or not knowing how the system evolves. Were we to know completely the initial state and how it evolves, there would be no indeterminism. Moreover, any correlations arise from definite underlying facts of which we are merely ignorant. For instance, suppose we have two marbles, one of mass 100g and one of mass 105g, and we give one to each of two experimenters, who do not know which they received. Once one experimenter weighs his marble, he immediately knows the mass of the other marble, even if it is very far away. We will find that this is not the case in QM. <br><br>A further development of CM was the inclusion of electromagnetic phenomena.
These were incorporated in <a href="https://en.wikipedia.org/wiki/Maxwell's_equations">Maxwell's equations</a>, which describe how electromagnetic fields are generated and changed by charges and currents. In essence, there is a ubiquitous, continuous electromagnetic field, which can be excited and disturbed in various ways, producing effects like <a href="https://en.wikipedia.org/wiki/Electromagnetic_radiation">radiation</a> and <a href="https://en.wikipedia.org/wiki/Electromagnetic_induction">induction</a> (which lend themselves to a huge array of engineering and technological applications). A relatively simple theorem of electromagnetic theory is that <a href="https://en.wikipedia.org/wiki/Larmor_formula">accelerating charges radiate energy</a>. This is most easily seen as being due to producing electric fields of varying strengths, combined with the fact that electromagnetic changes travel at a finite speed. For example, an oscillating charge will produce fields now weaker, now stronger, as it moves closer to and further from a point. If we put a charge on a spring a distance away, it would begin oscillating, too, due to the varying force acting on it. Thus we could extract energy from the oscillating charge, and so it must be radiating energy, and so its oscillations will gradually decay. (Note that this implies that charges in orbit around one another will gradually radiate off their energy and fall into one another.) One of the outcomes of Maxwell's electromagnetic theory was the demonstration that light was electromagnetic in nature: electromagnetic disturbances propagated at the speed of light, and thinking of light as electromagnetic radiation accounted for a huge array of optical phenomena. <br><br>Also, electromagnetism is decidedly a wave-theory. The electromagnetic field is continuous and ubiquitous: it doesn't come in discrete "chunks" or "lumps" and it can have any value. It can have arbitrary energy (or energy density, as the case may be).
This is opposed to particles, objects like little marbles, with definite extents and centers. When particles move, the stuff they are made of literally goes from one place to another. Whereas, when a wave moves, the field increases in one place and decreases in another: the pattern, as opposed to the substance, moves. Waves display interference effects: two waves can interfere constructively (increasing the size of the wave) or destructively (decreasing the size of the wave), whereas this seems impossible for particles. Destructive interference for particles would mean that when two particles came together, suddenly there was less substance there. We will return to this in discussing the two-slit experiment below. <br><br><hr><br><h2>Non-Classical Phenomena</h2><br> There were several phenomena that indicated that CM was not the whole story, that it failed to give a full description of the world. These then paved the way for the development of QM. <ul><li><h4>Millikan's and Rutherford's Experiments</h4>Millikan discovered, by a <a href="https://en.wikipedia.org/wiki/Oil_drop_experiment">very ingenious experiment</a>, that charge was <b>quantized</b>, i.e. it came in "chunks" or "lumps". There was a smallest unit of charge. The existence of electrons as objects with a definite mass had already been discovered by Thomson, experimenting with cathode ray tubes, but it was not known whether electrons had a definite, single charge. Millikan found that charge only came in integer multiples of the fundamental charge, known to be about \(1.6 \times 10^{-19} \mathrm{C}\). Rutherford then <a href="https://en.wikipedia.org/wiki/Geiger%E2%80%93Marsden_experiment">demonstrated</a> that the atom was structured, not as Thomson supposed, like a <a href="https://en.wikipedia.org/wiki/Plum_pudding_model">plum pudding</a>, but rather with a small, dense, positively charged nucleus with the electrons in some arrangement around it.
</li><br><li><h4>Stability and Discrete Radiation of the Atom</h4><a href="https://en.wikipedia.org/wiki/Rutherford_model">Rutherford's model of the atom</a> (as well as any similar model) is impossible, according to classical electromagnetic theory. As discussed above, orbiting charges cannot persist indefinitely, as they will radiate off energy, and the orbit will eventually decay, the particles eventually colliding. As this clearly does not happen, there must be some modification to the understanding of the atom. In addition, it was noticed that an excited atom <a href="https://en.wikipedia.org/wiki/Discrete_spectrum">only emitted radiation at definite frequencies</a>, not in a continuous spectrum. In the case of hydrogen, the radiation frequencies followed <a href="https://en.wikipedia.org/wiki/Rydberg_formula">a very simple pattern</a>. This behavior, however, could not be accounted for on classical mechanics, as the electron orbiting the nucleus could potentially have any energy. Moreover, if the electron could only have certain definite energies, it became difficult to see how it could go from one definite energy to another without taking on the intermediate energies. Clearly classical theory would have to be modified to allow for this. </li><br><li><h4>Photoelectric Effect</h4>It was observed that <a href="https://en.wikipedia.org/wiki/Photoelectric_effect">shining light on a metal induced a current</a>. This by itself was predictable by CM, given the understanding that the metal had electrons in it, and when light shone on the metal, some electrons absorbed the energy and so were able to escape the metal to produce a current. However, according to CM, the energy of the light depended solely on the amplitude (i.e. brightness): it would not depend on the frequency (i.e. color) of the light used. 
Also, for sufficiently dim light, there should be a lag time between when the light comes on and when electrons are emitted, due to the electrons needing to absorb a sufficient amount of light energy. However, neither of these predictions was correct: very bright light of sufficiently low frequency induced no current, while at sufficiently high frequencies, regardless of how dim the light was, the current began immediately, with no delay. This led Einstein to conclude, correctly, that light was <b>quantized</b>, in units called <b>photons</b>. The energy of each photon was related to the frequency of the light. The brighter the light, the greater the number of photons per unit time. This would entail that for light of a low frequency, even if bright, no electrons would be ejected from the metal, as each photon lacks enough energy to eject an electron, and the chance of multiple photons hitting the same electron is negligible (and the energy that is absorbed is dissipated as heat in the meantime). Moreover, for high enough frequencies, the maximum kinetic energy of the ejected electrons is linear in the frequency, with slope \(h= 6.626 \times 10^{-34} \mathrm{J}\cdot \mathrm{s}\), known as <b>Planck's constant</b> (the <b>current</b>, however, depends on the brightness of the light). This leads to the conclusion that the energy of each photon is given by \(E=hf\). </li><br><li><h4>Black Body Radiation</h4>A <a href="https://en.wikipedia.org/wiki/Black_body">black body</a> is defined as a perfect radiating source: it absorbs all radiation that falls on it, while remaining at a constant temperature. Such a body is known to radiate electromagnetic radiation, but finding and making sense of the spectrum of such a body is non-trivial. According to classical electromagnetic theory, the amount of radiation produced is expected to be <a href="https://en.wikipedia.org/wiki/Rayleigh%E2%80%93Jeans_law">proportional to the square of the frequency</a>. That is, the higher the frequency, the more radiation. 
This is clearly not what happens in nature: otherwise hot objects would emit huge amounts of X-rays and gamma rays, and would instantaneously reach absolute zero, transforming all the thermal energy into electromagnetic radiation, as the total radiation is unbounded. However, Planck found that, by postulating that electromagnetic radiation was emitted in discrete quanta, with energies given by \(E=hf\), the total radiation was bounded, and tailed off at higher frequencies. The <a href="https://en.wikipedia.org/wiki/Planck's_law">resulting formula</a> is <a href="https://xkcd.com/54/">well borne out by experiments</a>, lending support to his postulation. </li> <br><li><h4>Double Slit Experiment</h4> <a href="https://en.wikipedia.org/wiki/Double-slit_experiment">An experiment</a> was performed in which a very dim coherent light source was placed in front of a photographic plate, behind an opaque plate with two narrow slits. The light source was so dim that it emitted no more than one photon at a time. What was found was very strange, according to classical mechanics. The photographic plate produced a pattern of spots where each photon hit it, indicating that the light had been behaving like particles. However, the pattern produced is what the classical wave theory predicted: an <a href="https://en.wikipedia.org/wiki/Interference_%28wave_propagation%29">interference pattern</a>. Had the photons been acting like genuine classical particles, a different pattern would have emerged, one with only two peaks as opposed to many. In addition, whenever any sort of measuring apparatus was put in place to detect which slit the photon passed through (if it was behaving like a classical particle, it would need to have a definite position and hence pass through a definite slit), the wave-pattern disappeared and a particle-pattern emerged. Classical physics has no way to explain this. 
Moreover, the experiment has these same features even when performed with electrons, atoms, <a href="https://medium.com/the-physics-arxiv-blog/physicists-smash-record-for-wave-particle-duality-462c39db8e7b#.c2ovecnht">and even molecules</a>. In each case, the interference pattern produced is consistent with thinking of each object as if it were a wave with wavelength \(\lambda=h/p\), where \(p\) is the momentum of the object. More generally, \(\mathbf{p}=\frac{h}{2\pi}\mathbf{k}\), where \(\mathbf{k}\) is the <b><a href="https://en.wikipedia.org/wiki/Wave_vector">wave vector</a></b> (a vector pointing along the direction of propagation, with magnitude \(2\pi/\lambda\)). In fact, the quantity \(\frac{h}{2\pi}\) comes up so frequently that it is given its own symbol: \(\hbar\). </li> <br> <li><h4>Stern-Gerlach Experiment</h4> <a href="https://en.wikipedia.org/wiki/Stern%E2%80%93Gerlach_experiment">It was noticed</a> that when a stream of certain atoms passed through an inhomogeneous magnetic field, the stream separated into several beams, two in the case of silver atoms. This demonstrated not only that the atoms had a <b><a href="https://en.wikipedia.org/wiki/Magnetic_moment">magnetic dipole moment</a></b>, but also that this moment was quantized, as otherwise it would have produced a smear, as opposed to several beams. The magnetic moment was correctly attributed to the charged particles in the atom, in particular the electrons. This implied that the electron had angular momentum. In classical mechanics, an object has angular momentum purely in virtue of its structure and rotation. For example, a wheel has angular momentum given its distribution of mass combined with its rotation. A point particle in classical mechanics cannot have angular momentum about its own axis. Thus, as the electron was not known to have any internal structure, nor any literal rotation, the angular momentum could not be accounted for by classical physics. 
The angular momentum was thus given the name <b><a href="https://en.wikipedia.org/wiki/Spin_%28physics%29">spin</a></b>. An electron always has a measured angular momentum of either \(+\hbar/2\) (called spin up) or \(-\hbar/2\) (called spin down), relative to the axis of measurement. This itself is non-classical: classically, if an object has angular momentum about a certain axis, its angular momentum about an orthogonal axis will be zero, but electrons are never measured to have zero spin.</li> <br> <li><h4>Apparent Indeterminacy</h4>Suppose we have an electron with measured spin up along the x-axis. If it is measured along the y-axis, it will be found to have either spin up or spin down along that axis. Moreover, the spin measured along that axis will appear to be perfectly random: the results of such an experiment pass every known test for statistical randomness. This feature arises often in similar cases. For instance, in the two-slit experiment, where the next photon (or electron) hits the screen is also apparently random. A half-silvered mirror is a common device in optics that transmits half the light shone on it and reflects the other half. However, if we put two detectors at the points where transmitted and reflected light would go, and shine very dim light on it, such that no more than one photon reaches the half-silvered mirror at a time, the pattern of detections will also be apparently random: it passes every known test for statistical randomness. This type of behavior is very different from the usual CM sort. 
This <a href="https://en.wikipedia.org/wiki/Quantum_indeterminacy">apparent indeterminacy or randomness</a> is a major aspect of quantum mechanics, and underlies many of the disputes and misunderstandings surrounding it.</li> </ul>Nadavhttp://www.blogger.com/profile/08410204738776306784noreply@blogger.com0tag:blogger.com,1999:blog-6860306700804724173.post-53172120802040492382015-10-20T20:57:00.001-07:002019-01-11T07:39:01.160-08:00Product Formula for Sine and Some Interesting Corollaries<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <br /><h3>Deriving the Product Formula: The Easy Way </h3><br />Recall from <a href="http://www.hyperphronesis.com/2015/10/derivation-of-formula-for-even-values.html">this post</a> that: \[ \sum_{n=1}^{\infty} \frac{1}{x^2+n^2}=\frac{\pi}{2x} \coth(\pi x)-\frac{1}{2x^2} \] We then substitute \(x=i z\): \[ \sum_{n=1}^{\infty} \frac{1}{n^2-z^2}=-\frac{\pi}{2z} \cot(\pi z)+\frac{1}{2z^2} \] We then go down the following line of calculation: \[ \sum_{n=1}^{\infty} \frac{2z}{n^2-z^2}=\frac{1}{z}-\pi\cot(\pi z) \] \[ \int\sum_{n=1}^{\infty} \frac{2z}{n^2-z^2}dz=C+\int \frac{1}{z}-\pi\cot(\pi z) dz \] \[ \sum_{n=1}^{\infty} -\ln \left (1-\frac{z^2}{n^2} \right )=C+\ln (z) - \ln (\sin (\pi z) ) \] \[ \sin(\pi z)=C' z\prod_{n=1}^{\infty}\left ( 1-\frac{z^2}{n^2} \right ) \] We can find \(C'\) by looking at the behavior near zero, and so find that: \[ \sin(\pi z)=\pi z\prod_{n=1}^{\infty}\left ( 1-\frac{z^2}{n^2} \right ) \] Therefore: \[ \sin(z)=z\prod_{n=1}^{\infty}\left ( 1-\frac{z^2}{\pi^2 n^2} \right ) \] <br /><br /><hr /><br /> <h3>Deriving the Product Formula: The Overkill Way, by Weierstrass' Factorization Theorem </h3><br />Suppose a function can be expressed as \[ f(x)=A\frac{\prod_{n=1}^{M}\left ( x-z_n \right )}{\prod_{n=1}^{N}\left ( x-p_n 
\right )} \] Where \(M \leq N\) and \(N\) can be arbitrarily large, even tending to infinity. Assuming there are no poles of degree >1 (all poles are simple), we can rewrite this as \[ f(x)=K+\sum_{n=1}^{\infty} \frac{b_n}{x-p_n} \] Where some of the \(b_n\) may be zero. We can also write this as \[ f(x)=f(0)+\sum_{n=1}^{\infty} b_n \cdot \left ( \frac{1}{x-p_n}+\frac{1}{p_n} \right ) \] Suppose \(f(0) \neq 0\), and that \(f\) is an <a href="https://en.wiktionary.org/wiki/integral_function">integral function</a> (i.e. an entire function). In that case, the logarithmic derivative \(f'(x)/f(x)\) has poles of degree 1. Moreover, \[\lim_{x \rightarrow z_n} (x-z_n)\frac{f'(x)}{f(x)}=d_n \] Where \(d_n\) is the degree of the zero at \(z_n\). Thus: \[ \frac{f'(x)}{f(x)}=\frac{f'(0)}{f(0)}+\sum_{n=1}^{\infty} d_n \cdot \left ( \frac{1}{x-z_n}+\frac{1}{z_n} \right ) \] Integrating: \[ \ln(f(x))=\ln(f(0))+x \frac{f'(0)}{f(0)}+\sum_{n=1}^{\infty} d_n \cdot \left ( \ln \left (1-\frac{x}{z_n} \right ) +\frac{x}{z_n} \right ) \] \[ f(x)=f(0) e^{x \frac{f'(0)}{f(0)}} \prod_{n=1}^{\infty} \left (1-\frac{x}{z_n} \right )^{d_n} e^{x\frac{d_n}{z_n}} \] This is our main result, called the Weierstrass factorization theorem. 
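As a quick sanity check, the product formula for sine can be verified numerically by truncating the infinite product at finitely many factors. The following sketch is an illustration added here (the cutoff of \(10^5\) factors is arbitrary; the truncated product converges to \(\sin(x)\) only like \(1/N\), so many factors are needed):

```python
import math

def sine_product(x, terms=100_000):
    """Truncated product x * prod_{n=1}^{terms} (1 - x^2 / (pi^2 n^2))."""
    result = x
    for n in range(1, terms + 1):
        result *= 1.0 - x * x / (math.pi * math.pi * n * n)
    return result

# Compare the truncated product against the built-in sine
for x in [0.5, 1.0, 2.0, 10.0]:
    print(f"x={x}: product={sine_product(x):.8f}, sin(x)={math.sin(x):.8f}")
```

For \(x=1\), the \(10^5\) retained factors already agree with \(\sin(1)\) to roughly six decimal places, consistent with the \(1/N\) convergence of the tail.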
In particular, for the function \(f(x)=\sin(x)/x\) \[ \frac{\sin(x)}{x}=\prod_{n=-\infty, n \neq 0}^{\infty} \left (1-\frac{x}{n \pi} \right ) e^{x\frac{1}{n \pi}}=\prod_{n=1}^{\infty} \left (1-\frac{x^2}{n^2 \pi^2} \right ) \] Thus \[ \sin(x)=x\prod_{n=1}^{\infty} \left (1-\frac{x^2}{\pi^2 n^2 } \right ) \] <br /><br /><hr /><br /> <h3>Corollary 1: Wallis Product </h3><br />Let us plug in \(x=\pi/2\): \[ \sin(\pi/2)=1=\frac{\pi}{2}\prod_{n=1}^{\infty} \left (1-\frac{1}{4 n^2 } \right ) \] \[ \pi=2\prod_{n=1}^{\infty} \left (\frac{4 n^2}{4 n^2-1 } \right )=2\frac{2 \cdot 2}{1 \cdot 3} \cdot \frac{4 \cdot 4}{3 \cdot 5} \cdot \frac{6 \cdot 6}{5 \cdot 7} \cdot \frac{8 \cdot 8}{7 \cdot 9} \cdots \] More generally: \[ \pi=\frac{N}{M} \sin(\pi M/N) \prod_{n=1}^{\infty} \left (\frac{N^2 n^2}{N^2 n^2 -M^2} \right ) \] This is useful when \(\sin(\pi M/N)\) is easily computable, such as when \(\sin(\pi M/N)\) is algebraic (e.g. \(M=1\), \(N=2^m\) ). For example: \[ \pi=2 \sqrt{2} \prod_{n=1}^{\infty} \left (\frac{4^2 n^2}{4^2 n^2 -1^2} \right ) \] \[ \pi=\frac{2}{3} \sqrt{2} \prod_{n=1}^{\infty} \left (\frac{4^2 n^2}{4^2 n^2 -3^2} \right ) \] \[ \pi=\frac{3}{2} \sqrt{3} \prod_{n=1}^{\infty} \left (\frac{3^2 n^2}{3^2 n^2 -1^2} \right ) \] \[ \pi=\frac{3}{4} \sqrt{3} \prod_{n=1}^{\infty} \left (\frac{3^2 n^2}{3^2 n^2 -2^2} \right ) \] \[ \pi=3 \prod_{n=1}^{\infty} \left (\frac{6^2 n^2}{6^2 n^2 -1^2} \right ) \] \[ \pi=\frac{3}{5} \prod_{n=1}^{\infty} \left (\frac{6^2 n^2}{6^2 n^2 -5^2} \right ) \] \[ \pi=3\sqrt{2}(-1+\sqrt{3}) \prod_{n=1}^{\infty} \left (\frac{12^2 n^2}{12^2 n^2 -1^2} \right ) \] <br /><br /><hr /><br /> <h3>Corollary 2: Product Formula for Cosine </h3><br />Let us evaluate the sine formula at \(x+\pi/2\): \[ \sin(x+\pi/2)=\cos(x)=\left (x+\frac{\pi}{2} \right )\prod_{n=-\infty, n \neq 0}^{\infty} \left (1-\frac{x+\pi/2}{\pi n } \right ) \] \[ \cos(x)=\frac{\sin(x+\pi/2)}{\sin(\pi/2)}=\left (1+\frac{x}{\pi/2} \right )\prod_{n=-\infty, n \neq 0}^{\infty} 
\frac{\left (1-\frac{x+\pi/2}{\pi n } \right )}{\left (1-\frac{\pi/2}{\pi n } \right )} \] \[ \cos(x)=\left (1+\frac{x}{\pi/2} \right )\prod_{n=-\infty, n \neq 0}^{\infty} \left (1-\frac{x}{\pi (n-1/2) } \right )=\prod_{n=-\infty}^{\infty} \left (1-\frac{x}{\pi (n-1/2) } \right ) \] \[ \cos(x)=\prod_{n=1}^{\infty} \left (1-\frac{x^2}{\pi^2 (n-1/2)^2 } \right ) \] Alternatively, we can derive this directly from the Weierstrass factorization theorem. <br />Additionally, by using imaginary arguments, we can derive the formulae: \[ \sinh(x)=x\prod_{n=1}^{\infty} \left (1+\frac{x^2}{\pi^2 n^2 } \right ) \] \[ \cosh(x)=\prod_{n=1}^{\infty} \left (1+\frac{x^2}{\pi^2 (n-1/2)^2 } \right ) \] <br /><br /><hr /><br /> <h3>Corollary 3: Sine is Periodic </h3><br />Let us evaluate the sine formula at \(x+\pi\): \[ \sin(x+\pi)=\left (x+\pi \right )\prod_{n=-\infty, n \neq 0}^{\infty} \left (1-\frac{x+\pi}{\pi n } \right ) \] \[ \sin(x+\pi)=\cdots \left (1+\frac{x+\pi}{3\pi} \right ) \left (1+\frac{x+\pi}{2\pi} \right )\left (1+\frac{x+\pi}{\pi} \right )\left (x+\pi \right ) \left (1-\frac{x+\pi}{\pi} \right )\left (1-\frac{x+\pi}{2\pi} \right ) \left (1-\frac{x+\pi}{3\pi} \right ) \cdots \] \[ \sin(x+\pi)=\cdots \left (\frac{4}{3}+\frac{x}{3\pi} \right ) \left (\frac{3}{2}+\frac{x}{2\pi} \right )\left (2+\frac{x}{\pi} \right ) \pi \left (1+\frac{x}{\pi}\right ) \left (\frac{-x}{\pi} \right )\left (\frac{1}{2}-\frac{x}{2\pi} \right ) \left (\frac{2}{3}-\frac{x}{3\pi} \right ) \cdots \] \[ \sin(x+\pi)=\cdots \frac{4}{3}\left (1+\frac{x}{4\pi} \right ) \frac{3}{2}\left (1+\frac{x}{3\pi} \right )2\left (1+\frac{x}{2\pi} \right ) \pi \left (1+\frac{x}{\pi}\right ) \left (\frac{-x}{\pi} \right ) \frac{1}{2}\left (1-\frac{x}{\pi} \right ) \frac{2}{3}\left (1-\frac{x}{2\pi} \right ) \cdots \] \[ \sin(x+\pi)=-2x\left ( \prod_{k=2}^{\infty} \frac{k^2-1}{k^2} \right ) \left ( \prod_{n=1}^{\infty} \left (1-\frac{x^2}{n^2 \pi^2} \right ) \right )=-\sin(x) \] As the first product easily 
telescopes. Thus \(\sin(x+2\pi)=\sin((x+\pi)+\pi)=-\sin(x+\pi)=\sin(x)\). Therefore, sine is periodic with period \(2\pi\). <br /><br /><hr /><br /> <h3>Corollary 4: Some Zeta Values </h3><br />Let us begin expanding the product for sine in a power series \[ \sin(x)=x\prod_{n=1}^{\infty} \left (1-\frac{x^2}{\pi^2 n^2 } \right )=x-\frac{x^3}{\pi^2}\left (\frac{1}{1^2}+\frac{1}{2^2}+\cdots \right )+\frac{x^5}{\pi^4}\left (\frac{1}{1^2 \cdot2^2}+\frac{1}{1^2 \cdot3^2}+\cdots+\frac{1}{2^2 \cdot3^2}+\frac{1}{2^2 \cdot4^2}+\cdots \right )+\cdots \] \[ \sin(x)=x-\frac{x^3}{\pi^2}\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )+\frac{x^5}{\pi^4}\left (\sum_{1 \leq m < n}\frac{1}{m^2n^2} \right )+\cdots \] \[ \sin(x)=x-\frac{x^3}{\pi^2}\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )+\frac{x^5}{2\pi^4}\left (\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )^2- \sum_{k=1}^{\infty}\frac{1}{k^4} \right )+\cdots \] By comparing this to the Taylor series for sine, we find: \[ \frac{1}{3!}=\frac{1}{\pi^2}\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right ) \] \[ \frac{1}{5!}=\frac{1}{2\pi^4}\left (\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )^2- \sum_{k=1}^{\infty}\frac{1}{k^4} \right ) \] From which it follows that \[ \sum_{k=1}^{\infty}\frac{1}{k^2}=\frac{\pi^2}{6} \] \[ \sum_{k=1}^{\infty}\frac{1}{k^4}=\frac{\pi^4}{90} \] In fact, for the fourth term, we find, similarly, that \[ \frac{1}{7!}=\frac{1}{6\pi^6}\left ( \left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )^3-3\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )\left (\sum_{k=1}^{\infty}\frac{1}{k^4} \right )+2\left (\sum_{k=1}^{\infty}\frac{1}{k^6} \right ) \right ) \] From which it follows that \[ \sum_{k=1}^{\infty}\frac{1}{k^6}=\frac{\pi^6}{945} \] Nadavhttp://www.blogger.com/profile/08410204738776306784noreply@blogger.com0tag:blogger.com,1999:blog-6860306700804724173.post-76949884524375349042015-10-10T15:01:00.000-07:002015-10-21T08:28:01.229-07:00Derivation of a Formula for the Even Values of the Riemann Zeta 
Function<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <br /><h3>Lemma 1: Fourier Series of the Dirac Comb </h3><br />A <b>Dirac comb</b> of period T is defined as \[{\mathrm{III}}_T(x)=\sum_{k=-\infty}^{\infty} \delta(x-kT)\] Where \(\delta(x)\) is the Dirac delta function. Since the Dirac comb is periodic with period T, we can expand it as a Fourier series: \[\sum_{k=-\infty}^{\infty} \delta(x-kT)=\sum_{n=-\infty}^{\infty} A_n e^{i 2 \pi n x/T}\] We solve for the \(A_m\) in the usual way: \[ \int_{-T/2}^{T/2}\sum_{k=-\infty}^{\infty} \delta(x-kT)e^{-i 2 \pi m x/T} dx=1=\int_{-T/2}^{T/2}\sum_{n=-\infty}^{\infty} A_n e^{i 2 \pi (n-m) x/T} dx=T\cdot A_m \]\[ A_m=1/T \] Thus: \[\sum_{k=-\infty}^{\infty} \delta(x-kT)=\frac{1}{T}\sum_{n=-\infty}^{\infty} e^{i 2 \pi n x/T}\] <br /><br /><hr /><br /><h3>Lemma 2: An Infinite Series </h3><br />\[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=\frac{1}{x}+\sum_{n=1}^{\infty} \frac{1}{x+i n}+\frac{1}{x-i n}=\frac{1}{x}+2x\sum_{n=1}^{\infty} \frac{1}{x^2+n^2} \]\[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=\int_{0}^{\infty} \sum_{n=-\infty}^{\infty} e^{-y(x+i n)} dy \]\[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=\int_{0}^{\infty} e^{-yx} \sum_{n=-\infty}^{\infty} e^{-iyn} dy \]\[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=2\pi \int_{0}^{\infty} e^{-yx} \sum_{k=-\infty}^{\infty} \delta(y-2\pi k) dy \]\[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=2\pi \left (\frac{1}{2}+ \sum_{k=1}^{\infty} e^{-2\pi k x} \right ) \]\[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=2\pi \left (\frac{1}{2}+ \frac{e^{-2\pi x}}{1-e^{-2\pi x}} \right )= \pi \frac{e^{2\pi x}+1}{e^{2\pi x}-1} \] Therefore, combining the first and last expressions and rearranging, we find: \[ \sum_{n=1}^{\infty} \frac{1}{x^2+n^2}=\frac{\pi}{2x} \frac{e^{2\pi x}+1}{e^{2\pi 
x}-1}-\frac{1}{2x^2}=\frac{\pi}{2x} \coth(\pi x)-\frac{1}{2x^2} \] Additionally, by taking the limit as x approaches zero, we find: \[ \sum_{n=1}^{\infty} \frac{1}{n^2}=\frac{\pi^2}{6} \] <br /><br /><hr /><br /><h2>Theorem: Formula for the Even Values of the Riemann Zeta Function </h2><br />Recall that, by definition: \[ \zeta(n)=\sum_{k=1}^{\infty}\frac{1}{k^n} \] Let us then analyze \[ f(x)=1-\frac{x}{2}+\sum_{n=2}^{\infty}\frac{x^{n}}{n!} A_{n} \] Where \[ A_n=-2 \cdot n! \cdot \cos(n\pi/2) \cdot 2^{-n}\pi^{-n} \zeta(n) \] Thus: \[ f(x)=1-\frac{x}{2}-2\sum_{n=1}^{\infty}\left (\frac{-x^2}{4\pi^2} \right )^n \zeta(2n) \]\[ f(x)=1-\frac{x}{2}-2\sum_{n=1}^{\infty}\left (\frac{-x^2}{4\pi^2} \right )^n \sum_{k=1}^{\infty}\frac{1}{k^{2n}} \]\[ f(x)=1-\frac{x}{2}-2\sum_{k=1}^{\infty}\sum_{n=1}^{\infty}\left (\frac{-x^2}{4\pi^2 k^2} \right )^n \]\[ f(x)=1-\frac{x}{2}-2\sum_{k=1}^{\infty} \frac{-x^2}{4\pi^2 k^2}\frac{1}{1+\frac{x^2}{4\pi^2 k^2}} \]\[ f(x)=1-\frac{x}{2}+\frac{x^2}{2\pi^2}\sum_{k=1}^{\infty} \frac{1}{k^2+\frac{x^2}{4\pi^2}} \]\[ f(x)=1-\frac{x}{2}+\frac{x^2}{2\pi^2} \left ( \frac{\pi^2}{x} \frac{e^x+1}{e^x-1} -\frac{2\pi^2}{x^2} \right ) \]\[ f(x)=\frac{x}{2} \left ( \frac{e^x+1}{e^x-1} -1 \right )=\frac{x}{e^x-1} \] Therefore, for n>1, \[ A_n=\lim_{x \rightarrow 0} \frac{\mathrm{d}^n }{\mathrm{d} x^n} \frac{x}{e^x-1} \] These numbers are called the <b>Bernoulli Numbers</b>, symbolized as \(B_n\) and they are easily found to be all rational. Thus, by rearranging, we find: \[ \zeta(2n)=\frac{\pi^{2n} 2^{2n-1} \left | B_{2n} \right |} {(2n)!} \] Thus, all the even values of the zeta function can be found by finding the appropriate Bernoulli number, which itself can be found by simple differentiation. Moreover, we see that all the values are rational multiples of the corresponding power of pi. 
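This recipe is easy to carry out in practice. The sketch below is an illustration (not part of the original derivation): it computes the Bernoulli numbers exactly from the standard recurrence \( \sum_{k=0}^{m}\binom{m+1}{k}B_k=0 \), which is equivalent to repeatedly differentiating \(x/(e^x-1)\), and checks the resulting values of \(\zeta(2n)\) against partial sums of the defining series:

```python
from fractions import Fraction
from math import comb, factorial, pi

def bernoulli(m):
    """Exact Bernoulli numbers B_0..B_m (convention B_1 = -1/2),
    computed via the recurrence sum_{k=0}^{m} C(m+1, k) * B_k = 0."""
    B = [Fraction(1)]
    for n in range(1, m + 1):
        B.append(Fraction(-1, n + 1) * sum(comb(n + 1, k) * B[k] for k in range(n)))
    return B

B = bernoulli(10)
for n in range(1, 6):
    # zeta(2n) = pi^(2n) * 2^(2n-1) * |B_(2n)| / (2n)!
    zeta_formula = pi ** (2 * n) * 2 ** (2 * n - 1) * abs(B[2 * n]) / factorial(2 * n)
    # slowly converging partial sum of the defining series, for comparison
    zeta_series = sum(1.0 / k ** (2 * n) for k in range(1, 100_001))
    print(f"zeta({2 * n}): formula = {zeta_formula:.10f}, series = {zeta_series:.10f}")
```

Since the \(B_n\) are rational, exact fraction arithmetic gives \(\zeta(2n)\) to full floating-point accuracy, while the partial sums of the series converge only slowly (the tail after \(K\) terms is of order \(K^{1-2n}\)).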
Specifically, we find that: \[ \zeta(2)=\frac{\pi^2}{6} \]\[ \zeta(4)=\frac{\pi^4}{90} \]\[ \zeta(6)=\frac{\pi^6}{945} \]\[ \zeta(8)=\frac{\pi^8}{9450} \]\[ \zeta(10)=\frac{\pi^{10}}{93555} \] Nadavhttp://www.blogger.com/profile/08410204738776306784noreply@blogger.com0tag:blogger.com,1999:blog-6860306700804724173.post-71513839633524216422015-09-29T18:17:00.001-07:002019-08-03T09:48:41.624-07:00Liars, Logic, and Information Theory<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> One of the most common types of logic puzzles involves two tribes, one that always tells the truth and another that always tells lies. There are many versions and variations of puzzles with this setup, but we can develop a method of approach that will work generally. The puzzles fall into two main categories: <ol><li><b>Identification</b>: we have a group of N people with some known possible set of identifications, and we ask questions to determine what tribe each is from.</li><li><b>Information</b>: we have a group of N people with some known possible set of identifications, and we ask them questions to determine M bits of information (independent yes/no questions). We do not need to identify the tribe of each person. For concreteness, we will take the bits to be 1s or 0s (i.e. we want to find whether each bit is 1 or 0).</li></ol>The questions must be asked individually, and must be yes/no questions. We assume that the persons asked know all information relevant to the puzzle and understand the questions, provided they are comprehensible. <br><hr><br><h3>A Brief Primer on Some Information Concepts</h3><br> The fundamental unit of information is the bit. A single bit answers one yes/no question. 
If both answers are equally likely, the answer gives the most information; otherwise, you could more easily guess the answer. (In fact, if the chance of a "yes" answer is \(p\), the effective number of bits is given by \(-p\log_2(p)-(1-p)\log_2(1-p) \approx 4p(1-p)\).) If there are \(2^N\) equally likely options, it takes N bits of information to narrow them down to one: in general, an additional bit halves the possibility space. If there are M possibilities, and \(2^{N-1}< M \leq 2^N\), then N bits of information are required. From a deterministic source--that is, a source with known, predictable behavior--one answer to one yes/no question yields at most 1 bit of information, and exactly one if both answers are equally probable. In general, if M bits of information can be discovered with N questions, then fewer questions will be needed when only some of those bits are wanted. <br><br><hr>We will discuss some specific cases, describing some general methods of approaching the problem. We will forgo trivial cases, like asking a 1-bit question of someone of a known tribe, or identifying a person from an unknown tribe. <br><br> <h3>Information: One Person of Unknown Tribe, One Bit </h3><br> Clearly we must ask at least one question, but can we determine the bit in exactly one question? Indeed we can. Our goal is to formulate a question such that, regardless of whether the person is a liar or a truther, the answer will correspond to the truth. We thus construct the following table, and look for a question which would produce the listed given answers (answers taking into account whether the teller is a liar or a truther). 
<div><br><table style="width: 400px; border-collapse:collapse; border:1px solid black;"><colgroup><col width="800"><col width="800"><col width="800"><col width="800"></colgroup><tbody><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Bit Value</b></td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Identity</b></td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Given Answer</b></td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Honest Answer</b></td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">1</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Truther</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">1</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Liar</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">0</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Truther</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" 
valign="middle">0</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Liar</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td></tr> </tbody></table><br></div> The simplest way to construct such a question is just to ask one that corresponds to affirmative answers. In this case, the most easily constructed question is <blockquote>Is one of the following true: the bit is 1 and you are a truther, the bit is 0 and you are a liar?</blockquote>Regardless of whether the person asked is a truther or liar, the answer will always be "yes" if the bit is 1 and "no" if it is 0. The question may be found to simplify to something more natural sounding, but the question as given is sufficient. Moreover, if we require N bits of information, we can obtain them in exactly N questions. This will be our general approach. We will make a table in which the given answer corresponds to the information we seek. We will then formulate a question so as to produce the desired answer. This can be done most easily by forming a disjunction of the cases producing an affirmative answer. <br><br><h3>Identification: One Person of Unknown Tribe, Unknown Language </h3><br> In this case, the tribespeople have a language different from yours. They can understand your questions but reply in a way you can't understand. We will assume that you know the words for "yes" and "no" are "Da" and "Ja", but you don't know which corresponds to which. If you do not even know what the possible words for "yes" and "no" are, you can find this out with one additional question, merely by asking any question and noting that the response must mean either "yes" or "no". The question is then whether you can identify the tribe of the person, and in as few questions as possible. 
Given that we only seek one bit of information (the person is either a truther or a liar), we will attempt to do so with a single question. We will look for a question such that the response corresponds to the identity of the person. For concreteness, we will take "Da" to be indicative of a truther, "Ja" of a liar. <div><br><table style="width: 400px; border-collapse:collapse; border:1px solid black;"><colgroup><col width="800"><col width="800"><col width="800"><col width="800"></colgroup><tbody><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Identity</b></td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Translation of "Da"</b></td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Given Answer</b></td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Translated Answer</b></td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Honest Answer</b></td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Truther</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Da</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Truther</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Da</td> <td style="border-collapse:collapse; border:1px solid 
black;" align="center" valign="middle">No</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Liar</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Ja</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Liar</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Ja</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td></tr> </tbody></table><br></div> Again, the simplest way to construct such a question is just to ask one that corresponds to affirmative answers, by a simple disjunction. In this case, the honest answer is "yes" exactly when "Da" means "yes". So we simply ask: <blockquote>Does "Da" mean "yes"?</blockquote>A truther will always answer "Da", and a liar will always answer "Ja". Note that we cannot determine what "Da" actually means from this question, and this accords with information theory concepts. We can only get one bit of information from one question. 
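This reasoning can be checked mechanically. The sketch below (my own illustration of the argument, not part of the original puzzle) enumerates all four combinations of identity and meaning of "Da" and confirms that, for the question above, a truther always replies "Da" and a liar always replies "Ja":

```python
from itertools import product

def reply(is_truther, da_means_yes, honest_answer):
    """The word actually spoken, given the honest yes/no answer to a question."""
    conveyed = honest_answer if is_truther else not honest_answer  # liars negate
    return "Da" if conveyed == da_means_yes else "Ja"

# The question asked is: 'Does "Da" mean "yes"?'
# Its honest answer is simply the value of da_means_yes.
for is_truther, da_means_yes in product([True, False], repeat=2):
    word = reply(is_truther, da_means_yes, honest_answer=da_means_yes)
    print(f"truther={is_truther!s:5}  Da=yes={da_means_yes!s:5}  ->  answers {word!r}")
```

The reply depends only on the speaker's identity, never on what "Da" actually means, which is exactly the one bit the question was designed to extract.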
If we wanted to identify what "Da" meant without knowing the identity, by a similar method we would find that the following question achieves that: <blockquote>Is one of the following true: "Da" means "yes" and you are a truther, "Da" means "no" and you are a liar?</blockquote>If the answer is "Da", "Da" means "yes". <br><br><h3>Information: One Person of Unknown Tribe, Unknown Language, One Bit </h3><br> This case is much like the preceding one, except we require neither the meaning of "Da" nor the identity of the person. As we need only one bit of information, we require at least one question. We will show how to do it in exactly one question. As before, we construct a table, but this time with three independent variables: the value of the bit, the identity of the person, and the meaning of "Da". <div><br><table style="width: 400px; border-collapse:collapse; border:1px solid black;"><colgroup><col width="800"><col width="800"><col width="800"><col width="800"></colgroup><tbody><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Bit Value</b></td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Identity</b></td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Translation of "Da"</b></td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Given Answer</b></td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Translated Answer</b></td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Honest Answer</b></td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">1</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Truther</td> <td style="border-collapse:collapse; border:1px solid black;" 
align="center" valign="middle">Yes</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Da</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">1</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Truther</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Da</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">1</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Liar</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Da</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">1</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Liar</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Da</td> <td 
style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">0</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Truther</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Ja</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">0</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Truther</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Ja</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">0</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Liar</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Ja</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td> <td style="border-collapse:collapse; 
border:1px solid black;" align="center" valign="middle">Yes</td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">0</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Liar</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Ja</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Yes</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">No</td></tr> </tbody></table><br></div> By the same method, the easiest (though not simplest) question to ask is: <blockquote>Is one of the following true: you are a truther and "Da" means "yes" and the bit is 1, you are a liar and "Da" means "no" and the bit is 1, you are a truther and "Da" means "no" and the bit is 0, you are a liar and "Da" means "yes" and the bit is 0 ?</blockquote>A simpler way would be to ask: <blockquote>Is an odd number of the following true: the bit is 1, you are a truther, "Da" means "yes"?</blockquote>In general, we can see that we can always get exactly one bit of information from one question, given certain other constraints. Not knowing the language or the identity of the person asked are no hindrances to getting information. Also, if we have \(2^M\) people from potentially different tribes who speak the same unknown language, or even if we only know the potential words for "yes" and "no" for one of their languages, we can still identify all of them in exactly M questions just by asking the one person M questions. <br><br><h3>Identification: Truther, Liar, and Unhelpful in Unknown Order. </h3><br> In this case, we have three people known to be some permutation of truther, liar and a third kind we call unhelpful. 
The unhelpful is a third type of tribesperson who answers so as to be maximally unhelpful. That is, he will answer so as to prevent you from getting information. The goal is to identify him regardless, as well as the other two. The first question is whether we can identify all three at all, and then, if it is possible, how to do so in as few questions as we can. As there are 6 possible orderings, we will need \(\log_2 6 \approx 2.6\) bits of information, corresponding to at least 3 questions. We must ask each person at least one question, as directing our questions at only 2 or fewer of them risks asking only the unhelpful, who provides no information. However, if we ask each of them one question, we only get two bits of information, as the unhelpful provides none. Thus we must ask at least 4 questions, with the 4th question being asked of one of the non-unhelpfuls. <br><br>In fact there is a way to do this. We ask a question which the truther and the liar will answer differently. We then take the odd one out among the three, who is guaranteed to be either a truther or a liar (in fact, the way he answers will decide which), and then ask him for one more bit of information to identify one of the others, which we have already described how to do. So, for instance, we can ask all three "Do you exist?" (or, if the language is unknown, "Does 'Da' mean 'yes'?"). We then concoct a question to ask the odd one out to get the final requisite bit (left as an exercise for the reader). Thus we can achieve it in exactly 4 questions. In fact, for the first three questions, we only get 2/3 of a bit of information per answer, as, for each answer, we get 1 bit with 2/3 probability.
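As an aside, the parity question from the previous section can also be verified exhaustively over all eight rows of its table. A minimal sketch (the encoding is mine), checking that the native answer is "Da" exactly when the bit is 1:

```python
# Question: 'Is an odd number of the following true:
# the bit is 1, you are a truther, "Da" means "yes"?'

def native_answer(bit, identity, da_means_yes):
    truths = (bit == 1) + (identity == "truther") + da_means_yes
    honest = (truths % 2 == 1)                        # honest answer to the parity question
    spoken = honest if identity == "truther" else not honest
    return "Da" if spoken == da_means_yes else "Ja"   # render in the native tongue

for bit in (0, 1):
    for identity in ("truther", "liar"):
        for da_means_yes in (True, False):
            assert (native_answer(bit, identity, da_means_yes) == "Da") == (bit == 1)
print("all 8 cases check out: the answer is 'Da' iff the bit is 1")
```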
Nadavhttp://www.blogger.com/profile/08410204738776306784noreply@blogger.com0tag:blogger.com,1999:blog-6860306700804724173.post-20743963107191823782015-07-16T23:23:00.001-07:002018-02-04T12:52:41.232-08:00Preliminary Matters Relating to Morality<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <br /><h3>Obligations and Duties</h3><br />We wish to characterize obligations and duties (taken as essentially synonymous) in a more definite way than is typically used, specifically, as pertains to morality, as typically conceived. Many uses of the term have no relation to morality whatsoever. For instance, a legal obligation is merely something demanded of someone by the governing laws which, should he fail to fulfill it, would result in some sort of penalty. If the penalty were absent, the so-called obligation would be rendered irrelevant, as it would be merely up to the disposition of the one obligated whether to fulfill it or not, and no enforcement could be possible. Thus, legal obligations are no more than demands with enforced consequences: it is demanded of the person to do something, and, failing to do so, punishment will result. Another sort is a social or societal obligation. In this case, there is a certain expectation to behave in a certain way, and failing to behave results in some loss of social esteem, stigmatization, shunning, demotion, reduced access to social assets (like favors or company), etc. <br /><br />However, clearly moral obligations are not of either of these sorts: with a moral obligation, even if no punishment or repercussion would be visited from without, there would still be the internal drive to act. 
Moreover, even if it were demanded of us by law to act immorally, or our society expected us to do so, that would have no moral bearing on whether we should so act. The missing ingredient, then, if an obligation or duty is to be different from a mere demand or expectation, with or without penalties for transgression, is the drive from within: there is no duty without a sense of dutifulness. If one feels no obligation to do a thing, then one simply has no such obligation. <blockquote><i><b>"[D]uty has no hold on [a man] unless he desires to be dutiful."</b> </i><br />-B. Russell</blockquote> <br /><hr /><br /><h3>Truth and Objectivity</h3><br />The simplest way to analyze objective truth is to begin by looking at statements already agreed to be objectively true: (A) "If X is a triangle, X has three sides", (B) "Horses exist". How is it that these statements are objectively true? Surely it is that, when we interpret them correctly, we get a claim about the world that accurately describes it. The truth value of the propositions will depend on how we interpret the terms. For instance, if we interpret the term "triangle" (merely a word: a set of symbols) to mean what we normally mean by the word "square", then (A) would be false. It is only when the semantic content of the terms is specified (as well as the way in which the content of the sentence is to be educed from the terms, e.g. grammar) that the sentence or proposition can have an objective truth value. When the terms are left unspecified, or determined on a subject by subject basis, then the proposition is subjective. Thus, all that is needed to make a system as objective as, say, geometry, is to have the terms well-defined, be it a moral system or any other. <br /><hr /><br /><h3>Voluntary Action</h3><br />We will define a voluntary action as one a person does as a result of a choice they make. 
Involuntary actions are basically irrelevant to practical considerations, except insofar as they can be changed via voluntary actions. It is then also clear that voluntary actions are the only ones that can be considered in any plausible morality: someone is not moral or immoral based on actions they can't control. This is often summarized in the dictum "ought implies can": regardless of what, exactly, "ought" is taken to mean in the end, it must imply that the thing one ought to do is one that one can do (though "can" might itself need some further analysis). Furthermore, "ought" seems to imply also "can not", as in "can do otherwise". If one can't help but do something, it cannot be meaningfully said that she ought to do it. Thus, oughts imply a choice, where the alternatives can each be acted on. <br /><br />In any choice between alternatives, choosing one must mean that one wanted that option, for if one wanted a different option, she would have chosen it. "Want" here is to be taken in a more general sense than usual. You may want to go with your friends to the movies, but do homework instead. Why? Because though you may prefer movies to homework in general, you prefer doing well in the class at the cost of time with your friends to spending more time with your friends and doing worse in the class. In the greater context, you prefer doing homework to going to the movies in this case, as opposed to generally preferring movies to homework with no context. Thus, all voluntary choices are the result of the person doing what she wants: everyone always does what they want most, as far as they can. A clear corollary of this is that to change voluntary behavior, one must appeal to what the person in question wants or cares about. This is abundantly clear in experience as well.
Moreover, the converse is also manifestly true: if something affected someone's voluntary behavior, it must have appealed to what she wanted or cared about. For, as everyone does what they want most, as far as they can, what affects their voluntary actions must have appealed to what they wanted or cared about. Nadavhttp://www.blogger.com/profile/08410204738776306784noreply@blogger.com0tag:blogger.com,1999:blog-6860306700804724173.post-56566561949445120542015-02-22T09:59:00.000-08:002015-07-05T16:04:55.059-07:00Valuation Theory<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> <br /><h3> Valuation Systems </h3>A valuation system (VS) is any system by which value is assigned to things. That is, the way in which terms like "better" and "worse", "good" and "bad", are given meaning or are understood. For example, in choosing a hinge for a door, one system of saying "hinge X is better than hinge Y" is to consider price (cheaper being better, for instance), or resistance to rust, or weight, or color, or size, etc. After all, there is no unqualified way to say "hinge X is better than hinge Y", and any statement that does not explicitly state the way in which hinge X is deemed better than hinge Y will have some implicit VS.<br /><br />All VSs have a <b>domain</b>, which is the set of all things which can be valued by that VS. The VS used to compare hinges won't be able to compare the value of microprocessors, or political parties, or cake recipes. It is important to keep in mind the domain of a VS when discussing it.
We will denote the domain of VS X as D<sub>X</sub>.<br /><hr /><br /><h3>Types of Valuation Systems</h3><br />There are two general sorts of valuation systems:<br /><br /><ul><li><b>Comparative Valuation Systems </b>(CVSs): Determine only the ranking of value for the elements of a given countable set. If X is a CVS and X values A above B, we will write that as \( (A>B)_X\), which we can read as "A is better than B, according to X". Note that CVSs don't have any notion of "good" or "bad", but only "better" and "worse", and possibly "best", if there is some element better than the rest.</li><br /><ul><li>A subset of CVSs are Bi-comparative VSs (bCVSs, or C<sub>2</sub>VSs), which only rank sets with exactly two elements, either with one better and one worse, or with both equal. If the bCVS has the additional property of being <b>transitive</b>, then the system can be used to impose a <b>partial ordering</b> on the elements of its domain. </li><br /></ul><li><b>Evaluative Valuation Systems </b>(EVSs): Determine the plain value of every element in its domain, like a function. Namely, we can symbolize "the value of A, according to EVS X" as \(V_X(A)\). Without loss of generality, we can take the values assigned to be real numbers. If only order is important, we can take the range to be the numbers in the interval \([-1,1]\). Note that EVSs can have a notion of "good" and "bad", in that we can define "A is bad, according to EVS X" as \(V_X(A)< c \), for some number c, which we can take to be 0. Similar statements can be defined analogously. To keep notation consistent, we will write \((A>B)_X\) iff \(V_X(A)>V_X(B)\), for some EVS X.</li></ul> <br /><hr /><br /><h3>Indifferent Extensions</h3><br /> We can also define the <b>indifferent extension</b> of a valuation system X with domain D<sub>X</sub> as the valuation system that is identical to X for any elements in D<sub>X</sub>, and is indifferent to all other things.
More exactly, we can define it for the cases of CVSs and EVSs as follows: <ul><li>CVSs:<br />Let \(X\) be a CVS with domain \(D_X\). The CVS \(X'\) is the <b>indifferent extension</b> of \(X\), such that, for any \( a,b \notin D_X\) and \(c \in D_X\), \((a< c )_{X'} \), \((a=b)_{X'}\). </li><br /><li>EVSs:<br />Let \(X\) be an EVS with domain \(D_X\). The EVS \(X'\) is the <b>indifferent extension</b> of \(X\), such that, for any \( a\notin D_X\), \(V_{X'}(a)=0\). </li></ul> <br /><hr /><br /><h3>Optimal Elements</h3><br /> We can also give meaning to statements like "t is the best element in set S, according to X", in two senses. We can say that t is the <b>optimal</b> element of S according to VS X if, for every element s of S with \(s \neq t\), we have \( (t > s)_X \). We can say that t is an <b>equi-optimal</b> element of S according to VS X if, for every element s of S, \( (t \geq s)_X \). We can also say that "t is the best element in set S, according to set A", for some set A of VSs, if, for each VS X in A, t is the optimal element of S according to X. We might also stipulate that for every VS in A there is an optimal element in S. Similarly for equi-optimal. <br /><br />If we want to say something like "t is the best element in S" without qualifying it by a VS, it must be the case that all valuation systems agree (or perhaps there is some "best VS" which would deem t optimal, but we will get to that later). Namely, we say that t is the <b>universo-optimal</b> (UO) element of S if, for every VS X for which there is an optimal element in S, t is the optimal element of S according to X. We can also say that t is a <b>universo-equi-optimal</b> (UEO) element of S if, for every VS X for which there is an equi-optimal element in S, t is an equi-optimal element of S according to X. Note that for there to be a universo-optimal element, <u>all</u> relevant VSs must agree: if there is even one VS for which there is a different optimal element than another, then there is no universo-optimal element in S.
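These definitions are concrete enough to sketch in code. Below, an EVS is modeled as a plain value function (a dict); the hinge names and all the numbers are illustrative assumptions of mine, not part of the theory:

```python
# Model an EVS as a dict mapping elements to real values.

def optimal(S, V):
    """The strictly best element of S under value function V, or None if tied."""
    best = max(S, key=lambda s: V[s])
    return best if all(V[best] > V[s] for s in S if s != best) else None

def universo_optimal(S, systems):
    """The element optimal under every VS that has an optimal element in S."""
    opts = {optimal(S, V) for V in systems} - {None}
    return opts.pop() if len(opts) == 1 else None

S = {"hinge_A", "hinge_B"}
by_price = {"hinge_A": -3, "hinge_B": -5}   # cheaper is better, so negate price
by_rust  = {"hinge_A":  2, "hinge_B":  9}   # more rust resistance is better

print(optimal(S, by_price))                       # hinge_A
print(optimal(S, by_rust))                        # hinge_B
print(universo_optimal(S, [by_price, by_rust]))   # None: the two VSs disagree
```

The last line illustrates the point above: one dissenting VS is enough to leave S with no universo-optimal element.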
<br /><hr /><br /><h3>Meta-Valuation Systems, Optimal Valuation Systems, and Recommendation</h3><br /> We can also have VSs whose domain includes some subset of the set of all VSs. We can call these <b>meta-valuation systems</b> (MVS). We can also define the set of totally meta-VSs (TMVS), which is the set of all VSs whose domain includes the set of all VSs. <br />Now, if there is to be some VS that can be called "the best VS", it must be the case that it is UO (or at least UEO) in the set of all VSs. Thus we define: <br />A VS X is the objectively best VS iff, for every VS Y in the set of TMVSs for which there is an optimal element, X is the optimal element of Y in the set of all VSs. <br />However, it seems not hard to argue, if not prove, that there is no such VS: all it takes is two TMVSs whose optimal elements disagree, and such a pair seems very easy to construct. Thus there simply is no such objectively best VS. We can call this the <b>Universo-Optimality Absence Theorem</b>.<br /><br />Also, we can say that VS A <b>recommends</b> VS B if \((B>A)_A\). We denote this by \(A \rightarrow B\). Clearly A must be an MVS, as it includes the VS B in its domain. The relevance is that, if we hold to VS A, and A recommends B, then we should discard A and take up B instead. We may have some issues if A recommends multiple VSs, but the solution would then be to follow the recommendation that outranks the rest. For example, if \(A \rightarrow B\) and \(A \rightarrow C\), and \((B>C)_A\), then we should choose B, rather than C. However, we will say that a VS A is a consistent recommender if it is the case that if \(A \rightarrow B\), and \(A \rightarrow C\), and \((B>C)_A\), then \(C \rightarrow B\), and it is not the case that \(B \rightarrow C\).
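The recommendation relation can be sketched the same way, with an MVS modeled as a value function over names of VSs. The names and scores below are invented purely for illustration:

```python
# A -> B ("A recommends B") iff (B > A)_A, i.e. A values B above itself.

def recommendations(A, self_name):
    """Names of VSs in A's domain that A recommends over itself."""
    return [name for name in A if A[name] > A[self_name]]

# An MVS that happens to score itself and two other VSs:
A = {"A": 0.1, "B": 0.7, "C": 0.4}
recs = recommendations(A, "A")
print(recs)                              # ['B', 'C']: A recommends both
print(max(recs, key=lambda n: A[n]))     # 'B' outranks 'C', so we take up B
```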
Nadavhttp://www.blogger.com/profile/08410204738776306784noreply@blogger.com0tag:blogger.com,1999:blog-6860306700804724173.post-36433027594059653732014-07-01T21:15:00.005-07:002019-02-15T13:08:06.591-08:00Probability Problems<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> There are many interesting problems that can be studied with probability theory. Here I will discuss a few of my favorites. </br><hr></br><h3> Maxima in Data </h3>Suppose we have a sequence of values of random variables \(\left(X_{k} \right )_{k=1}^{n}\)that are independent and identically distributed with density function \(f_{X}(x)\). We wish to find the expected number of local maxima in the data, that is, values of \(X_{k}\) such that \(X_{k-1} < X_{k} > X_{k+1}\). Let \(y=X_k\). Then, the probability \(p\) that a certain value y is a maximum is that of having \(X_{k-1} < y > X_{k+1}\). As all the \(X\)s are independent and identically distributed, we can calculate this probability by finding \[p= \int_{-\infty}^{\infty}f_{X}(x)F_{X}^2(x)dx\] Where \(F_{X}(x)=\int_{-\infty}^{x}f_{X}(y)dy\) is the cumulative distribution function of \(f_{X}\). The form above is like \(\sum_y P(X_k=y)P(X_{k-1} < y)P(X_{k+1} < y)\), except we use the continuous formulation. However, clearly \(F_{X}'(x)=f_X(x)\), and so the formula becomes \[p= \int_{-\infty}^{\infty}F_{X}'(x)F_{X}^2(x)dx\] But, by elementary calculus, we can change this to \[p= \int_{0}^{1}F^2 dF=\frac{1}{3}\] (the change in bounds arises from the fact that \(F_{X}(-\infty)=0\) and \(F_{X}(+\infty)=1\)). Thus, we expect one-third of the values in the sequence to be local maxima. Likewise, we expect one-third to be local minima, and the remaining third to be neither. </br></br>By the same method, we can look for other patterns. 
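The one-third result is distribution-free, so it is easy to spot-check by simulation; a minimal sketch (uniform variates chosen purely for convenience):

```python
import random

# Count local maxima in a long i.i.d. sequence; about 1/3 of the
# interior points should satisfy x[k-1] < x[k] > x[k+1].
random.seed(0)
n = 200_000
xs = [random.random() for _ in range(n)]
maxima = sum(xs[k - 1] < xs[k] > xs[k + 1] for k in range(1, n - 1))
print(round(maxima / (n - 2), 3))   # close to 1/3
```

The same empirical check applies to the other neighbor patterns discussed next.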
For instance, the fraction of data points that are higher than all four of their closest neighbors is \(\frac {1}{5}\). The fraction of data points that are bigger than their closest neighbors and smaller than their next-closest neighbors is \(\frac {1}{30}\). In fact, all the calculations can be made by evaluating integrals of the form \( \int_{0}^{1} x^m (1-x)^n dx\). We can also use results like this to test for non-independent or non-identically distributed data. It may even be possible to use it in fraud or bias detection. Based on next-to-nothing-at-all, I would expect human-generated data to fail some of these tests. </br></br>We can also find that the distribution of the number of maxima in \(n\) data points, regardless of the probability distribution \(f_{X}\), is approximately normally distributed with mean \(\frac{n}{3}\) and variance \(\frac{2 n}{45}\). Thus, if we found fewer than 2960 or more than 3040 maxima in a list of 9000 data points, we could be 95% confident that the list was not of independent and identically distributed values. We can also run the same test for minima, but for non-maxima-non-minima, the variance is instead \(\frac{2 n}{15}\). </br>The values for the variances were found empirically. I don't really know how one would go about finding them analytically. </br></br>The distribution of the values of the maxima is easily found to be \[g(x)=3 f_{X}(x)F_{X}^2(x)\] Other distributions are similarly found. </br><hr></br><h3> Joint Lives </h3> Suppose we start with \(N\) couples (\(2N\) people), and at a later time, \(M\) of the original \(2N\) people remain. We want to find the expected number of intact couples remaining. Let \(C(M)\) and \(W(M)\) be the expected remaining number of couples and widows respectively when M total people are left.
We then note that, as any remaining person is equally likely to be eliminated next, we have: \[ C(M-1)=C(M)-2 \frac{C(M)}{M} \\ W(M-1)=W(M)-\frac{W(M)}{M}+2 \frac{C(M)}{M}\] We can solve this recurrence relation, subject to the constraints \(W(M)+2 C(M)=M\) and \(C(2N)=N, W(2N)=0\), and find that \[ C(M)=\frac{M(M-1)}{2(2N-1)} \\ W(M)=\frac{M(2N-M)}{2N-1} \] If we express M as a fraction of the total starting population: \(M=2xN\), and express \(C\) and \(W\) as fractions of the total population, we find, for \(N\) big: \[ C(x)=x^2 \\ W(x)=x(1-x) \] Also, for the general case of starting out with \(kN\) \(k\)-tuples, the expected number of intact \(k\)-tuples when \(M\) individuals remain is given by: \[K(M)=N \frac{\binom{M}{k}}{\binom{kN}{k}}\] For the case of triples, we have the number of triples, doubles and singles when M individuals remain is given by: \[K_3 (M)= \frac{M(M-1)(M-2)}{3(3N-1)(3N-2)} \\ K_2 (M)= \frac{M(M-1)(3N-M)}{(3N-1)(3N-2)} \\ K_1 (M)= \frac{M(3N-M)(3N-M-1)}{(3N-1)(3N-2)} \] Generally, with the same sense as discussed above, the fraction of the population in a m-tuple, beginning with only k-tuples, when fraction \(x\) of the population remains, is given by: \[K_m (x)=\binom{k-1}{m-1} x^m (1-x)^{k-m}\] In fact, the general form for the expected number can be given as \[ K_m (M)= N \frac{\binom{M}{m}\binom{kN-M}{k-m}}{\binom{kN}{k}} \] </br><hr></br><h3> Random Finite Discrete Distribution</h3> Suppose we have a discrete random variable that can take on the values \(1,2,3,...,n\) with probabilities \(p_1,p_2,p_3,...p_n\) respectively, subject to the constraint \(\sum_{k=1}^n p_k=1\). Let \(p\) be an arbitrary value among the \(p\)s. We will take any combination of values for the \(p\)s as equally likely. 
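One concrete way to make "any combination of values for the \(p\)s equally likely" precise is to sample uniformly from the simplex \(\sum_k p_k = 1\), e.g. via the spacings of \(n-1\) sorted uniform points on \([0,1]\); a sketch (the construction is mine):

```python
import random

def random_probabilities(n):
    """A point uniform on the simplex p_k >= 0, sum(p) = 1,
    taken as the spacings of n-1 sorted uniform points."""
    cuts = sorted(random.random() for _ in range(n - 1))
    points = [0.0] + cuts + [1.0]
    return [b - a for a, b in zip(points, points[1:])]

random.seed(1)
n = 10
trials = 20_000
mean_p = sum(random_probabilities(n)[0] for _ in range(trials)) / trials
print(round(mean_p, 2))   # close to the average value 1/n = 0.1
```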
By looking at the cases of n=2 and n=3, we find that the probability density function of \(p\) is given by \[ f_P(p)=(n-1)(1-p)^{n-2} \] And the cumulative distribution function is given by \[ F_P(p)=1-(1-p)^{n-1} \] The average value of \(p\) is then \[\int_{0}^{1} p(n-1)(1-p)^{n-2}dp=\frac{1}{n}\] And the variance is \[\int_{0}^{1} p^2 (n-1)(1-p)^{n-2}dp-\frac{1}{n^2}=\frac{n-1}{n^2 (n+1)}\] We thus find that the chance that \(p\) is above the average value is \[P\left ( p > \frac{1}{n} \right )=\left ( 1-\frac{1}{n} \right )^{n-1}\] In the limit as n becomes large, this value tends to \(\frac{1}{e}\). </br>A confidence interval containing fraction x of the total probability, for large n, is given by: \[ \frac{1}{n} \ln \left(\frac{e}{e x +1-x} \right) \leq p \leq \frac{1}{n} \ln \left(\frac{e}{1-x} \right) \] For instance, a \(50 \%\) confidence interval is given by \(\frac{1}{n}\ln \left(\frac{2 e}{1+e}\right) \leq p \leq \frac{1}{n}\ln(2 e)\). </br></br>We can also extend this to continuous distributions with finite support if we only consider the net probability of landing in equally-sized bins. While the calculation may break down if the number of possible values is actually infinite, it can be used to get some information about distributions with an arbitrarily large number of possible values. </br><hr></br><h3> Maximum of Exponential Random Variables </h3>Suppose we have \(N\) independent and identically distributed exponential random variables \(X_1,X_2,...X_N\) with means \(\mu\). That is, \(f_X (x_k)=\frac{1}{\mu} e^{-\frac{x}{\mu}}\) when \(x \geq 0\) and zero otherwise. Let us interpret the random values as lifetimes for \(N\) units. The exponential distribution has the interesting property of <b>memorylessness</b>, which means that \(P(x > a+b| x > b)=P(x > a)\). 
We can show this by using the definition: \[ P(x>a+b|x>b)=\frac{P(x>a+b \cap x>b)}{P(x>b)}=\frac{P(x>a+b)}{P(x>b)} \\ P(x>a+b|x>b)=\frac{\int_{a+b}^{\infty}e^{-\frac{x}{\mu}}dx}{\int_{b}^{\infty}e^{-\frac{x}{\mu}}dx}=\frac{e^{-\frac{a+b}{\mu}}}{e^{-\frac{b}{\mu}}}=e^{-\frac{a}{\mu}}=P(x>a) \] In other words, given that a unit lasted \(b\) minutes, the chance that it will last another \(a\) minutes is the same as that it would last \(a\) minutes. We now calculate the probability distribution of the minimum of the \(N\) random variables. The probability that the minimum of \(X_1,X_2,...X_N\) is no less than \(x\) is the same as the probability that \(X_1 \geq x \cap X_2 \geq x \cap...X_N \geq x \). As all the \(X\)s are independent, this can be simplified to a product, and as all the \(X\)s are identically-distributed, we can simplify this further: \[ P(\min(X_1,X_2,...)\geq x)=\left ( P(X_1 \geq x) \right )^N= \left ( \int_{x}^{\infty}\frac{1}{\mu} e^{-\frac{t}{\mu}}dt\right )^N \\ P(\min(X_1,X_2,...)\geq x)= e^{-\frac{xN}{\mu}} \\ P(\min(X_1,X_2,...)\leq x)= 1-e^{-\frac{xN}{\mu}} \\ f_{\min(X)}(x)=\frac{N}{\mu}e^{-\frac{xN}{\mu}} \] Thus, the average of the minimum of \(X_1,X_2,...X_N\) is \(\frac{\mu}{N}\). We now combine these two facts, the mean minimum value and the memorylessness. We start with all units operational, and we have to wait an average of \(\frac{\mu}{N}\) until the first one fails. However, given that the first one has failed, the expected additional wait time until the next one fails is just \(\frac{\mu}{N-1}\), that is, the expected minimum of \(N-1\) units. Thus, the expected time at which the \(m\)th unit fails is given by \[\mu\sum_{k=0}^{m-1}\frac{1}{N-k}\] Thus, the expected maximum time, when the \(N\)th unit fails, is \[\mu\sum_{k=1}^{N}\frac{1}{k}\] </br></br>More generally, we can look at the distributions of the kth <b>order statistic</b> of \(X_1,X_2,...X_N\).
The kth order statistic, denoted \(X_{(k)}\), is defined as the kth smallest value, so that \(X_{(1)}\) is the smallest (minimum) value, and \(X_{(N)}\) is the largest (maximum) value. The pdf is easily found to be: \[ f_{X_{(k)}}(x)=k {N \choose k} F_X^{k-1}(x)\left[1-F_X(x)\right]^{N-k}f_X(x) \] Where \(F_X(x)\) is the cdf of X, and \(f_X(x)\) is the pdf of X. So, in this case, \[ f_{X_{(k)}}(x)=\frac{k}{\mu} {N \choose k} e^{-(N-k+1)x/\mu}\left[1-e^{-x/\mu}\right]^{k-1} \] Thus, the <a href="http://en.wikipedia.org/wiki/Moment-generating_function">moment generating function</a> is given by: \[ g(t)=\frac{k}{\mu} {N \choose k} \int _0 ^\infty e^{-(N-k+1-\mu t)x/\mu}\left[1-e^{-x/\mu}\right]^{k-1} dx \] By a simple transformation, we find that: \[ g(t)=k {N \choose k} \int _0 ^1 u^{N-k-\mu t}(1-u)^{k-1} du \] This puts the integral in a <a href="http://en.wikipedia.org/wiki/Beta_function">well-known form</a>, which has the value \[ g(t)=\frac{N!}{\Gamma(N+1-\mu t)}\frac{\Gamma(N-k+1-\mu t)}{(N-k)!} \] By a simple calculation, the <b><a href="http://en.wikipedia.org/wiki/Cumulant">cumulants</a></b> are then given by the surprisingly simple form: \[ \kappa_n=\mu^n(n-1)!\sum_{j=N-k+1}^{N} \frac{1}{j^n} \] Several interesting results follow from this: <ul><li>For the Nth order statistic (the maximum), we already know that the mean value goes as \(\sum_{j=1}^{N} \frac{1}{j}\). But now we see that the other cumulants go as \((n-1)!\sum_{j=1}^{N} \frac{1}{j^n}\). Thus, the variance converges, in the limit, to \(\mu^2 \frac{\pi^2}{6}\). The <a href="http://en.wikipedia.org/wiki/Skewness">skewness</a> converges, in the limit, to \(\frac{12 \sqrt{6}}{\pi^3}\zeta(3)\), and the <a href="http://en.wikipedia.org/wiki/Kurtosis#excess_kurtosis">excess kurtosis</a> converges to \(\frac{12}{5}\). 
In fact, if we shift to take into account the unbounded mean, the distribution of the maximum converges to a <a href="http://en.wikipedia.org/wiki/Gumbel_distribution">Gumbel distribution</a>. This is a special case of a fascinating result known as the <a href="http://en.wikipedia.org/wiki/Fisher%E2%80%93Tippett%E2%80%93Gnedenko_theorem">extreme value theorem</a>. </li><li>For any given, fixed, finite \(k\geq 0\), \(X_{(N-k)}\) converges, as N goes to infinity, to a non-degenerate distribution with finite, positive variance, if we shift it to account for the unbounded mean. </li><li>For k of the form \(k=\alpha N\) (or the nearest integer thereto), for some fixed \(\alpha\) between 0 and 1 with \(\alpha \neq 1\), in the limit as N goes to infinity, the distribution of \(X_{(\alpha N)}\) becomes a <a href="http://en.wikipedia.org/wiki/Degenerate_distribution">degenerate distribution</a> with all the probability density located at \(\mu\ln\left(\frac{1}{1-\alpha}\right)\). These are, of course, the locations of the \(100\alpha \%\) quantiles, and so \(X_{(\alpha N)}\) is a <a href="http://en.wikipedia.org/wiki/Consistent_estimator">consistent estimator</a> for the \(100\alpha \%\) quantile. </li></ul></br></br>As a more general result, let us find the cdf of \(X_{(\alpha N)}\) for an arbitrarily distributed X, in the limit as N goes to infinity. The cdf of \(X_{(\alpha N)}\) is given by: \[ F_{X_{(\alpha N)}}(y)=\alpha N {N \choose \alpha N} \int _{-\infty} ^{y} F_X^{\alpha N-1}(x)\left[1-F_X(x)\right]^{N-\alpha N}f_X(x) dx \] As \(f_X(x)=\frac{d}{dx}F_X(x)\), we then have, by a simple substitution: \[ F_{X_{(\alpha N)}}(y)=\alpha N {N \choose \alpha N} \int _{0} ^{F_X(y)} u^{\alpha N-1}\left[1-u\right]^{N-\alpha N} du \] This is the cdf of a <a href="http://en.wikipedia.org/wiki/Beta_distribution">Beta distributed</a> random variable, with mean \(\mu=\frac{\alpha N}{N+1}\) and variance \(\sigma^2=\frac{\alpha N (N-\alpha N+1)}{(N+1)^2(N+2)}\).
Thus, as N goes to infinity, this will converge in distribution to a degenerate distribution with all the density at \(y=F_X^{-1}(\alpha)\), that is, at the \(100\alpha \%\) quantile of the distribution. </br> <hr></br><h3> Choosing a Secretary </h3>Suppose we need to hire a secretary. We have \(N\) applicants arrive and we interview them sequentially: once we interview and dismiss an applicant, we cannot hire her. The applicants all have differing skill levels, and we want to pick as qualified an applicant as we can. We want to find the optimal strategy for choosing whom to hire. It can be shown that the optimal strategy takes the following form. We consider and reject the first \(K\) applicants. We then choose the first applicant who is better than all the preceding ones. Thus, our problem reduces to finding the optimal value for \(K\). We will do so in a way that maximizes the probability that the most qualified secretary is selected. We thus have the probability: \[ P(\mathrm{best\, is\, chosen})=\sum_{n=1}^{N}P(\mathrm{n^{th}\, is\, chosen} \cap \mathrm{n^{th}\, is\, best}) \\ P(\mathrm{best\, is\, chosen})=\sum_{n=1}^{N}P(\mathrm{n^{th}\, is\, chosen}| \mathrm{n^{th}\, is\, best})P(\mathrm{n^{th}\, is\, best}) \] We then note that each applicant in line is the best applicant with equal probability. That is, \(P(\mathrm{n^{th}\, is\, best})=\frac{1}{N}\). Also, we can find the conditional probabilities. If \(M \leq K\), then \(P(\mathrm{M^{th}\, is\, chosen}| \mathrm{M^{th}\, is\, best})=0\). If the \((K+1)\)th applicant is best, she will certainly be chosen, that is \(P(\mathrm{(K+1)^{th}\, is\, chosen}| \mathrm{(K+1)^{th}\, is\, best})=1\). More generally, for \(m \geq 1\), we find that \(P(\mathrm{(K+m)^{th}\, is\, chosen}| \mathrm{(K+m)^{th}\, is\, best})=\frac{K}{K+m-1}\): given that the \((K+m)\)th applicant is best, she is chosen precisely when the best of the first \(K+m-1\) applicants (that is, the second-best of the first \(K+m\)) falls among the first \(K\) applicants, which happens with probability \(\frac{K}{K+m-1}\). Note that \(m=1\) recovers the previous case. 
We thus have \[ P(\mathrm{best\, is\, chosen})=\frac{K}{N}\sum_{n=K}^{N-1}\frac{1}{n} \] Let us assume we are dealing with a relatively large number of applicants. In that case, we can approximate \(\sum_{n=A}^{B-1}\frac{1}{n} \approx \ln \left(\frac{B}{A} \right )\). Thus \[ P(\mathrm{best\, is\, chosen})\approx\frac{K}{N}\ln \left(\frac{N}{K}\right )=-\frac{K}{N}\ln \left(\frac{K}{N}\right ) \] If we then let \(x=\frac{K}{N}\), we just need to maximize \(-x\ln(x)\), which happens at \(x=e^{-1}\). From this, we find that \(P(\mathrm{best\, is\, chosen})\approx e^{-1}\). Thus, the best strategy is to interview and reject the first \(36.8 \%\) of the applicants, and then choose the next applicant who is better than all the preceding ones. This will get us the best applicant with a probability of about \(36.8 \%\). </br></br>A related problem involves finding a strategy that minimizes the expected rank of the selected candidate (the best candidate has rank 1, the second best rank 2, etc.). <a href="http://www.math.upenn.edu/~ted/210F10/References/Expectations.pdf">Chow, Moriguti, Robbins and Samuels</a> have found that the optimal strategy takes the following form (in the limit of large \(N\)): skip the first \(c_0 N\) applicants; then, up to applicant number \(c_1 N\), select the first applicant who is the best so far. If we have not yet selected an applicant, we choose the first who is best or second best so far, up to number \(c_2 N\). If we have still not selected an applicant, we choose the first who is best, second best or third best so far, up to number \(c_3 N\). And so on. By choosing the \(c_n\) optimally, we can get an expected rank of \(3.8695\). This is quite surprising: we can expect to select an applicant of rank better than 4, even among billions of applicants! </br>The optimal values for the \(c_n\) are \[ c_0=0.2584... \\ c_1=0.4476... \\ c_2=0.5639... 
\] The <a href="http://www.math.ucla.edu/~tom/Stopping/sr2.pdf">general formula</a> for \(c_n\) is \[ c_n=\prod_{k=n+2}^{\infty}\left ( \frac{k-1}{k+1} \right )^{1/k}=\frac{1}{3.86951924...}\prod_{k=2}^{n+1}\left ( \frac{k+1}{k-1} \right )^{1/k} \]Nadavhttp://www.blogger.com/profile/08410204738776306784noreply@blogger.com0tag:blogger.com,1999:blog-6860306700804724173.post-45474377591235970292014-06-16T18:10:00.000-07:002014-08-16T16:02:57.324-07:00Some Set Theory<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> Set theory is a fundamental branch of mathematics that studies <b>sets</b>, that is, collections of objects that can be treated as wholes. <blockquote>"A set is a Many that allows itself to be thought of as a One."</blockquote> <div align="center">- Georg Cantor. </div></br>A set has various <b>elements</b> or <b>members</b>, which may either be individual objects or themselves sets. An extension to this is the inclusion of the <b>empty set</b> (the set containing no objects, symbolized \(\emptyset\) or \(\left \{\right\}\)) as an allowable set. If set A has a, b and c as elements, we symbolize that as A={a,b,c}. Note that, in set theory, order and multiplicity don't matter, so A is also equal to {b,c,a,a,b}. </br><hr></br><h3> Definitions and Notation </h3><table cellpadding="5"> <tr><td height="*">\( x \in A \) </td><td height="*"> </td><td height="*">x is an element/member of A. </td></tr> <tr><td height="*">\( x \notin A \) <td height="*"> </td><td height="*">x is not an element/member of A. </td></tr> <tr><td height="*">\( A \subseteq B \) </td><td height="*"> </td><td height="*">Set A is a <b>subset</b> of B. That is, for all x, if x is an element of A, then x is an element of B. Note that "A is a subset of B" is equivalent to "B is a <b>superset</b> of A". 
</td></tr> <tr><td height="*">\( A = B \) </td><td height="*"> </td><td height="*">A is equivalent to B. That is, for all x, x is an element of A if and only if x is an element of B. </td></tr> <tr><td height="*">\( A \neq B \) </td><td height="*"> </td><td height="*">A is not equivalent to B. </td></tr> <tr><td height="*">\( A \subset B \) </td><td height="*"> </td><td height="*">\( A \subseteq B \) and \( A \neq B \). If \( A \subset B \), we say A is a <b>proper subset</b> of B. </td></tr> <tr><td height="*">\(A \cap B \) </td><td height="*"> </td><td height="*">The <b>intersection</b> of sets A and B. That is, the set that contains all of the elements that are in both A and B. Sets A and B are said to be <b>disjoint</b> if \(A \cap B = \emptyset\). </td></tr> <tr><td height="*">\(A \cup B \) </td><td height="*"> </td><td height="*">The <b>union</b> of sets A and B. That is, the set that contains all of the elements that are in either A or B. </td></tr> <tr><td height="*">\(A \setminus B \) </td><td height="*"> </td><td height="*">The <b>set difference</b> of sets A and B. That is, the set that contains all of the elements in A that are not in B. </td></tr> <tr><td height="*">\(\mathbb{Z} =\left \{ ...,-2,-1,0,1,2,... \right \} \) </td><td height="*"> </td><td height="*">The set of all <b>integers</b>. That is, the set that includes 0 and every other number that can be achieved by successively adding or subtracting 1 from 0. </td></tr> <tr><td height="*">\(\mathbb{N} =\left \{1,2,3,... \right \} \) </td><td height="*"> </td><td height="*">The set of all <b>natural numbers</b>. That is, the set that includes 1 and every other number that can be achieved by successively adding 1 to 1. </td></tr> <tr><td height="*">\(\mathbb{W} =\left \{0,1,2,3,... \right \} \) </td><td height="*"> </td><td height="*">The set of all <b>whole numbers</b>. That is, the set that includes 0 and every other number that can be achieved by successively adding 1 to 0. 
</td></tr> <tr><td height="*">\( \left\{\begin{matrix} x \end{matrix}\right|\left.\begin{matrix} Px \end{matrix}\right\} \) </td><td height="*"> </td><td height="*">The set of all elements of the form \(x\) such that \(Px\) is true of x. For instance \(\mathbb{N}=\left\{\begin{matrix} x \end{matrix}\right|\left.\begin{matrix} x \in \mathbb{Z} \wedge x>0 \end{matrix}\right\} \). </td></tr> <tr><td height="*">\( \mathbb{Q}= \left\{\begin{matrix} \frac{x}{y} \end{matrix}\right|\left.\begin{matrix} x \in \mathbb{Z} \wedge y \in \mathbb{N} \end{matrix}\right\} \) </td><td height="*"> </td><td height="*">The set of all <b>rational numbers</b>. That is, the set of all real numbers expressible as a ratio of integers with non-zero denominator. </td></tr> <tr><td height="*">\( \mathbb{N}_{n}=\left\{1,2,3,...,n\right\} \) </td><td height="*"> </td><td height="*">The truncated set of the natural numbers up to \(n\). Another way of expressing it would be: \( \mathbb{N}_{n}=\left\{\begin{matrix} x \end{matrix}\right|\left.\begin{matrix} x \in \mathbb{N} \wedge x \leq n \end{matrix}\right\} \). </td></tr> <tr><td height="*">\( \mathbb{R}=\left (-\infty, \infty \right) \) </td><td height="*"> </td><td height="*">The set of all <b>real numbers</b>. As a note, we often see the shorthand \((a,b)\) (not to be confused with an ordered pair!) to mean \( \left\{\begin{matrix} x \end{matrix}\right|\left.\begin{matrix} x \in \mathbb{R} \wedge a < x \wedge x < b \end{matrix}\right\} \). We may also see the shorthand \([a,b]\) to mean \( \left\{\begin{matrix} x \end{matrix}\right|\left.\begin{matrix} x \in \mathbb{R} \wedge a \leq x \wedge x \leq b \end{matrix}\right\} \). The combined notation \((a,b]\) should be clear enough. 
</td></tr> <tr><td height="*">\(\mathbb{P} =\left \{2,3,5,7,11,... \right \} \) </td><td height="*"> </td><td height="*">The set of all <b>prime numbers</b>. That is, the subset of the natural numbers that have exactly two factors: 1 and themselves. By a well-known theorem, we know that there are infinitely many prime numbers. They are often represented as an increasing sequence: \(\mathbb{P}=\{p_{1},p_{2},p_{3},...\}\), where \(p_{k}\) is the kth prime. </td></tr> <tr><td height="*">\(\mathbb{A} \) </td><td height="*"> </td><td height="*">The set of all <b>algebraic numbers</b>. That is, the set of real numbers that are roots of polynomials with integer coefficients (not all zero). We can also introduce the notation \(\mathbb{A}_{n}\), which is the set of all <b>algebraic numbers of order \(n\)</b>, that is, the set of numbers that are roots of polynomials of order \(n\) with integer coefficients. In other words: \( \mathbb{A}_n=\left\{\begin{matrix} x \end{matrix}\right|\left.\begin{matrix} x \in \mathbb{R} \wedge \exists a_{0},a_{1},...,a_{n} \in \mathbb{Z}, a_{n} \neq 0, 0=\sum_{k=0}^{n} a_{k} x^{k} \end{matrix}\right\} \). </br>We then take the union: \(\mathbb{A}=\mathbb{A}_{1} \cup \mathbb{A}_{2} \cup \mathbb{A}_{3} \cup ... \). A number in the set \(\mathbb{R} \setminus \mathbb{A}\) is called a <b>transcendental number</b>. </td></tr> <tr><td height="*">\( (a,b) \) </td><td height="*"> </td><td height="*">An <b>ordered pair</b>, that is a pairing of mathematical objects in which the order matters. We can define them in the following way, for technical purposes: \( (a,b) = \left \{ \left \{ a \right \}, \left \{ a,b \right \} \right \} \). </td></tr> <tr><td height="*">\( A \times B \) </td><td height="*"> </td><td height="*">The <b>cartesian product</b> of sets A and B. That is, the set of all ordered pairs \( (a,b) \) such that \( a \in A \) and \( b \in B \). 
</td></tr> <tr><td height="*">\( \mathcal{P}(A) \) </td><td height="*"> </td><td height="*">The <b>power set</b> of set A. That is, the set of all subsets of \(A\). We can write this as \( \mathcal{P}(A)=\left\{\begin{matrix} X \end{matrix}\right|\left.\begin{matrix} X \subseteq A \end{matrix}\right\} \). </td></tr> <tr><td height="*">\( f: A \rightarrow B \) </td><td height="*"> </td><td height="*">f is a <b>function</b> from A to B. That is, f takes an element of A and returns an element of B. Strictly speaking, a function from A to B is a subset of \( A \times B \) with the properties <ul><li> \( \left\{\begin{matrix} a \end{matrix}\right|\left.\begin{matrix}(a,b)\in f \end{matrix}\right\} = A \) </li> <li> If \((a,b)\in f \) and \((a,c)\in f \), then \(b=c \). </li></ul>A is called the <b>domain</b> of f and B is called the <b>range</b> or <b>codomain</b> of f. Often we will see the notation \(y=f(x)\), which means that f associates y with x, or \((x,y)\in f\). In this case, y is called the <b>image</b> of x under f. More generally, if \(X \subseteq A \), then \(f(X)=Y= \left\{\begin{matrix} y \end{matrix}\right|\left.\begin{matrix} y=f(x) \wedge x \in X \end{matrix}\right\}\). Y is likewise called the image of X under f. Clearly \(Y \subseteq B \). </td></tr> <tr><td height="*">\( f: A \rightarrowtail B \) </td><td height="*"> </td><td height="*">f is an <b>injective</b> function from A to B. That is, the function is <b>one-to-one</b>, or, if \(f(a)=f(b)\) then \(a=b\). Every injective function f has an <b>inverse</b> function, \(f^{-1}: f(A) \rightarrow A \). We can say \(f^{-1}=\left\{\begin{matrix} (b,a) \end{matrix}\right|\left.\begin{matrix} (a,b) \in f \end{matrix}\right\} \). Clearly the inverse of an injective function is injective. We can also see that \(f^{-1}(f(a))=a\) for all \(a \in A\), and \(f(f^{-1}(b))=b\) for all \(b \in f(A)\). </td></tr> <tr><td height="*">\( f: A \twoheadrightarrow B \) </td><td height="*"> </td><td height="*">f is a <b>surjective</b> function from A to B. 
That is, the function is <b>onto</b>. In other words, \(f(A)=B\), i.e. \( \left\{\begin{matrix} b \end{matrix}\right|\left.\begin{matrix}(a,b)\in f \end{matrix}\right\} = B \). </td></tr> <tr><td height="*">\( f: A \leftrightarrow B \) </td><td height="*"> </td><td height="*">f is a <b>bijective</b> function from A to B. That is, f is both injective and surjective. The inverse of a bijective function is also bijective. </td></tr> <tr><td height="*">\( A \preceq B \) </td><td height="*"> </td><td height="*">There exists an injective function from \(A\) to \(B\). </td></tr> <tr><td height="*">\( A \sim B \) </td><td height="*"> </td><td height="*">There exists a bijective function from \(A\) to \(B\). </td></tr> <tr><td height="*">\( A \nsim B \) </td><td height="*"> </td><td height="*">There does not exist a bijective function from \(A\) to \(B\). </td></tr> <tr><td height="*">\( A \prec B \) </td><td height="*"> </td><td height="*">There exists an injective function from \(A\) to \(B\), but no bijective function from \(A\) to \(B\). That is, \( A \preceq B \) and \( A \nsim B \). </td></tr> <tr><td height="*">\( \left | A \right | \) </td><td height="*"> </td><td height="*">The <b>cardinality</b> of \(A\). That is, a value associated with a set that can be used to compare sets with respect to some sense of magnitude. </td></tr> <tr><td height="*">\( \left | A \right | \leq \left | B \right |\) </td><td height="*"> </td><td height="*">\( A \preceq B \). We say this as "the cardinality of A is less than or equal to the cardinality of B", though this is only something of a manner of speaking. </td></tr> <tr><td height="*">\( \left | A \right | = \left | B \right |\) </td><td height="*"> </td><td height="*">The cardinality of \(A\) is the same as the cardinality of \(B\), meaning there exists a bijective function from \(A\) to \(B\). In other words, \( A \sim B \). 
When this is the case, it is often convenient to say "A and B are <b>equinumerous</b>", but this is a technical term, and should not be naively interpreted. </td></tr> <tr><td height="*">\(\aleph_{0}\) </td><td height="*"> </td><td height="*">\(\aleph_{0}=\left | \mathbb{N} \right |\). \(\aleph_{0}\) is the "smallest" infinite cardinal. In other words, if \( \left | A \right |< \aleph_{0}\), then A is a finite set. Also, a set A is said to be <b>countable</b> iff \( \left | A \right | \leq \aleph_{0}\). If \(A \sim \mathbb{N} \), then A is said to be <b>countably infinite</b>. </td></tr> <tr><td height="*">\(\mathbb{S}_{n}\) </td><td height="*"> </td><td height="*">\(\mathbb{S}_{n}=\left | \mathbb{N}_{n}\right |\). A is a finite set iff for some \(n\), \(\left | A \right |= \mathbb{S}_{n} \), where \(n \in \mathbb{N} \). Typically, if \(\left | A \right |= \mathbb{S}_{n} \), where \(n \in \mathbb{N} \), we say "A has n elements", though we will here avoid all notions of the "number of elements". </td></tr> <tr><td height="*">\( \left | A \right |+\left |B \right |\) </td><td height="*"> </td><td height="*">\( \left | A \right |+\left |B \right |=\left |A \cup \tilde{B} \right |\), where \(\tilde{B} \sim B \) and \(A \cap \tilde{B}=\emptyset \). </td></tr> <tr><td height="*">\( \left | A \right |\times\left |B \right |\) </td><td height="*"> </td><td height="*">\( \left | A \right |\times\left |B \right |=\left | A \times B \right |\). </td></tr> <tr><td height="*">\( \max \left (\left | A \right |,\left |B \right | \right)\) </td><td height="*"> </td><td height="*">\( \max \left (\left | A \right |,\left |B \right | \right) = \left\{\begin{matrix} \left | A \right |, \\ \left | B \right |, \end{matrix}\right. \begin{matrix} B \preceq A \\ A \preceq B \end{matrix}\) . 
</td></tr> <tr><td height="*">\( \min \left (\left | A \right |,\left |B \right | \right)\) </td><td height="*"> </td><td height="*">\( \min \left (\left | A \right |,\left |B \right | \right) = \left\{\begin{matrix} \left | A \right |, \\ \left | B \right |, \end{matrix}\right. \begin{matrix} A \preceq B \\ B \preceq A \end{matrix}\) . </td></tr> </table> <hr><h3> Axioms of Set Theory </h3></br>The most typical axiomatization of set theory is that of Zermelo-Fraenkel with the <b>axiom of choice</b> (ZFC). We will here list these axioms: </br><table cellpadding="5"> <tr><td valign="top" width="20%"><b>Extensionality</b></td><td> </td><td valign="top">Sets A and B are equivalent (symbolized \(A=B\)), iff, for all x, \(x \in A\) iff \(x \in B\). </td></tr> <tr><td valign="top"><b>Regularity</b></td><td> </td><td valign="top">If A is a non-empty set, then, for some element x of A, \(A \cap x=\emptyset\). </td></tr> <tr><td valign="top"><b>Schema of Specification</b></td><td> </td><td valign="top">If A is a set and, for any x, we can determine whether or not Px is true of x (P being some predicate), then we can construct the set \( \left\{\begin{matrix} x \end{matrix}\right|\left.\begin{matrix} x \in A \wedge Px \end{matrix}\right\} \). Note that the predicate may only select elements from a pre-existing set A: unrestricted comprehension over all x leads to Russell's paradox. </td></tr> <tr><td valign="top"><b>Pairing</b></td><td> </td><td valign="top">If A and B are sets, then there exists a set C such that \(A \in C\) and \(B \in C\). </td></tr> <tr><td valign="top"><b>Union</b></td><td> </td><td valign="top">If A is a set, then there exists a set B such that, if \(Y \in A\) and \(x \in Y\) then \(x \in B\). More generally, the union of any collection of sets exists. </td></tr> <tr><td valign="top"><b>Schema of Replacement</b></td><td> </td><td valign="top">If \(f\) is a definable function, and A is a set contained in the domain of \(f\), then \(f(A)\) is a set. 
</td></tr> <tr><td valign="top"><b>Infinity</b></td><td> </td><td valign="top">There exists a set X such that \(\emptyset \in X\), and, if \(y \in X\) then \(\left \{y \right \} \in X\). That is, \(X=\left \{\emptyset, \left \{\emptyset\right \}, \left \{\left \{\emptyset\right \}\right \},...\right \} \). In particular, X contains infinitely many elements. </td></tr> <tr><td valign="top"><b>Power Set</b></td><td> </td><td valign="top">If A is a set, then \(\mathcal{P}(A) \) is also a set. </td></tr> <tr><td valign="top"><b>Choice</b></td><td> </td><td valign="top">If \(A_{1}\), \(A_{2}\), \(A_{3}\), ... is a sequence of non-empty sets, then there exists a set \(X=\left \{x_{1},x_{2},x_{3},...\right \}\), such that \(x_{n} \in A_{n}\). (As stated this is the countable form; the full axiom of choice makes the same assertion for an arbitrary collection of non-empty sets.) </td></tr> </table> <hr></br><h3> Some Elementary Theorems </h3></br> We will now discuss a few theorems. Theorems stated without proof are beyond the level of our discussion or are too involved. The reader is directed to a more technical discussion if more detail is desired. <dl> <dt><u>Theorem</u>: If \(A \subseteq B \) then \(A \preceq B \). </dt><dd><b>Proof</b>: Let \(f(x)=x\). Then \(f\) is injective from A to B. </dd></br> <dt><u>Theorem</u>: If \(f: A \rightarrowtail B\), \(C \subseteq A\), \(D \subseteq A\) and \(C \cap D=\emptyset\), then \(f(C) \cap f(D)=\emptyset\). </dt><dd><b>Proof</b>: Assume \(C \cap D=\emptyset\) and \(f(C) \cap f(D)=F\neq\emptyset\). Let \(\phi \in F\); then, for some \(c \in C\) and some \(d \in D\), \(f(c)=f(d)=\phi\). However, \(f\) is injective, and therefore, if \(f(c)=f(d)\), then \(c=d\). Therefore, \(c \in C \cap D\), and therefore \(C \cap D \neq \emptyset\), which contradicts our assumption. Therefore \(f(C) \cap f(D)=\emptyset\). </br><b>Corollary</b>: If \(f: A \rightarrowtail B\), then \(f(C \cap D)=f(C) \cap f(D)\). </br><b>Corollary</b>: If \(f: A \rightarrowtail B\), then \(f(C \cup D)=f(C) \cup f(D)\). </dd></br> <dt><u>Theorem</u>: \(A \sim A \). 
</dt><dd><b>Proof</b>: The identity function \(f(x)=x\) is a bijection from A to A. </dd></br> <dt><u>Theorem</u>: If \(A \sim B \) then \(B \sim A \). </dt><dd><b>Proof</b>: Let \(f\) be a bijective function from A to B. As all bijective functions have inverses, \(f^{-1}\) is a bijection from B to A. </dd></br> <dt><u>Theorem</u>: If \(A \sim B \) and \(B \sim C \) then \(A \sim C \). </dt><dd><b>Proof</b>: Let \(f\) be a bijective function from A to B, and \(g\) be a bijective function from B to C. Then the composition \(g \circ f\) is a bijective function from A to C. </dd></br> <dt><u>Theorem</u>: If \(a \notin A \) and \(A \sim \mathbb{N}_{n} \) then \(A \cup \left\{a \right\} \sim \mathbb{N}_{n+1}\). </dt><dd><b>Proof</b>: Let \(f\) be a bijective function from A to \(\mathbb{N}_{n}\). Let \(g(x)=\left\{\begin{matrix} f(x), \;\; \;\; \\ n+1, \;\; \;\; \end{matrix}\right. \begin{matrix} x \in A \\ x=a \end{matrix}\). </br> Then g is bijective from \(A \cup \left\{a \right\}\) to \(\mathbb{N}_{n+1}\). </br><b>Corollary</b>: If \(B \cap A=\emptyset \) and \(B \sim \mathbb{N}_{m} \) and \(A \sim \mathbb{N}_{n} \) then \(A \cup B \sim \mathbb{N}_{n+m}\). </dd></br> <dt><u>Theorem</u>: If \(a \in A \) and \(A \sim \mathbb{N}_{n} \) then \(A \setminus \left\{a \right\} \sim \mathbb{N}_{n-1}\). </dt><dd><b>Proof</b>: Let \(f\) be a bijective function from \(\mathbb{N}_{n}\) to A. Suppose \(a=f(m)\). Let \(g(x)=\left\{\begin{matrix} f(x), \;\; \;\; \\ f(x+1), \;\; \;\; \end{matrix}\right. \begin{matrix} x < m \\ x \geq m \end{matrix}\). </br> Then g is bijective from \(\mathbb{N}_{n-1}\) to \(A \setminus \left\{a \right\}\). </br><b>Corollary</b>: If \(B \subseteq A \) and \(B \sim \mathbb{N}_{m} \) and \(A \sim \mathbb{N}_{n} \) then \(A \setminus B \sim \mathbb{N}_{n-m}\). </dd></br> <dt><u>Theorem</u>: If \(A \sim B \) and \(a \in A \) and \(a \in B \) then \(A \setminus \left\{a \right\} \sim B \setminus \left\{a \right\} \). 
</dt><dd><b>Proof</b>: Let \(f\) be a bijective function from A to B. Suppose \(f(c)=a\). Define \(g(x)=\left\{\begin{matrix} a, \\ c, \\ x, \;\; \end{matrix}\right. \begin{matrix} x=c \;\;\;\; \;\; \\ x=a \;\;\;\; \;\; \\ x \neq a \wedge x \neq c \end{matrix}\). </br>Clearly g is bijective (in fact, it is its own inverse) and \(g(A)=A\). Also, \((f \circ g)(a)=a\), and \(f \circ g\) is bijective. Therefore, \(f \circ g\) is a bijection from \(A \setminus \left\{a \right\}\) to \(B \setminus \left\{a \right\}\). </br><b>Corollary</b>: If \(A \sim B \), and \(C \sim \mathbb{N}_{n}\) for some \(n \in \mathbb{N}\), where \(C \subseteq A \) and \(C \subseteq B \), then \(A \setminus C \sim B \setminus C \). </br><b>Proof</b>: Apply the theorem above \(n\) times. Note that the set subtracted must be finite. The theorem is inapplicable for infinite sets. In fact, as a counterexample for infinite sets: \(\mathbb{Z} \setminus \mathbb{N} \nsim \mathbb{W} \setminus \mathbb{N}\) even though \(\mathbb{Z} \sim \mathbb{W}\). </dd></br> <dt><u>Theorem</u>: \(\mathbb{N}_{n} \sim \mathbb{N}_{m} \) if and only if \( {n = m} \). </dt><dd><b>Proof</b>: Suppose that \({n \neq m}\) and \(\mathbb{N}_{n} \sim \mathbb{N}_{m} \). Without loss of generality, \(n < m \). By a theorem proved above, \(\mathbb{N}_{n} \setminus \mathbb{N}_{n} \sim \mathbb{N}_{m} \setminus \mathbb{N}_{n} \). But \(\mathbb{N}_{n} \setminus \mathbb{N}_{n} = \emptyset \). However, \(m \in \mathbb{N}_{m} \setminus \mathbb{N}_{n} \), so \(\mathbb{N}_{m} \setminus \mathbb{N}_{n} \neq \emptyset\). Clearly, then, it is impossible that \(\mathbb{N}_{n} \setminus \mathbb{N}_{n} \sim \mathbb{N}_{m} \setminus \mathbb{N}_{n} \), and so our assumption that \({n \neq m}\) must be false. Conversely, if \({n=m} \) then \(\mathbb{N}_{n}= \mathbb{N}_{m} \), and thus \(\mathbb{N}_{n} \sim \mathbb{N}_{m} \). 
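The swap construction in the proof of \(A \sim B \Rightarrow A \setminus \left\{a \right\} \sim B \setminus \left\{a \right\}\) above is entirely constructive, so for finite sets we can carry it out mechanically. A minimal Python sketch, representing a bijection as a dict (the representation and function name are ours, purely for illustration):

```python
def drop_common_point(f, a):
    """Given a bijection f : A -> B (as a dict) with a in A and a in B,
    build a bijection from A minus {a} to B minus {a}, following the proof
    above: compose f with the transposition g swapping a and c = f^{-1}(a)."""
    c = next(x for x, y in f.items() if y == a)   # c = f^{-1}(a)
    g = lambda x: {a: c, c: a}.get(x, x)          # the swap (a c); identity elsewhere
    return {x: f[g(x)] for x in f if x != a}

# A = {1, 2, 3, 'a'}, B = {'b', 'a', 1, 'c'}; 'a' lies in both sets
f = {1: 'b', 2: 'a', 3: 1, 'a': 'c'}
h = drop_common_point(f, 'a')
assert set(h) == {1, 2, 3}               # domain is A minus {'a'}
assert set(h.values()) == {'b', 'c', 1}  # image is B minus {'a'}, no repeats
```

The resulting dict is exactly the restriction of \(f \circ g\) from the proof; note the construction also works in the degenerate case \(f(a)=a\), where the swap is the identity.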
</dd></br> <dt><u>Theorem</u>: For any non-empty set \(A\) and any set \(B\), there is an injection from \(A\) to \(B\) if and only if there is a surjection from \(B\) to \(A\). </dt><dd><b>Proof</b>: Suppose \(f: A \rightarrowtail B\). Then, let \(a \in A\) be any element in \(A\). Now, define \(g(x)=\left\{\begin{matrix} f^{-1}(x), \;\; \;\; \\ a, \;\; \;\; \end{matrix}\right. \begin{matrix} x \in f(A) \\ x \notin f(A) \end{matrix}\) </br>Clearly \(g\) is surjective from \(B\) to \(A\). </br>Now suppose \(f: A \twoheadrightarrow B\). That means that, for every \(b\in B\) there is at least one \(a \in A\) such that \(f(a)=b\). Now define \(g(x)=a\), where \(a\) is arbitrarily chosen for any \(x\), from among those \(a \in A\) such that \(f(a)=x\) (an appeal to the axiom of choice). Clearly \(g\) is injective from \(B\) to \(A\). </br><b>Corollary</b>: "There is a surjection from \(A\) to \(B\) " \( \Leftrightarrow B \preceq A\). </dd></br> <dt><u>Theorem</u>: For any sets \(A\) and \(B\), there exists either an injection from \(A\) to \(B\) or a surjection from \(A\) to \(B\). </dt><dd>Proof (Informal): If there is no function from \(A\) to \(B\) that is either an injection or a surjection, then, for any function \(f\), there are \(a_1,a_2 \in A\) such that \(a_1 \neq a_2\) and \(f(a_1)=f(a_2)\) and a \(b \in B\) such that \(b \notin f(A)\). But clearly we can construct a new function such that \(f(a_2)=b\). Moreover, we can continue this for an arbitrary set of \(\left\{\begin{matrix} b \end{matrix}\right|\left.\begin{matrix} b \in B \wedge b \notin f(A) \end{matrix}\right\}\) and/or \(\left\{\begin{matrix} a \end{matrix}\right|\left.\begin{matrix} a \in A \wedge \exists a' \in A, a' \neq a, f(a)=f(a') \end{matrix}\right\}\). Thus, we can always exhaust at least one set, as each set has a set of larger cardinality, and we can continue this up to any cardinality. Thus, we can exhaust either the domain (which would make the function injective) or the co-domain (which would make the function surjective). 
</br><b>Corollary</b>: For any sets A and B, either \( A \preceq B \) or \( B \preceq A \). </br><b>Corollary</b>: For any sets A and B, exactly one of the following is true: <ul><li> \(A \prec B\) </li><li> \(B \prec A\) </li><li> \(A \sim B\) </li></ul></dd></br> <dt><u>Theorem</u>: If \( A \preceq B \) and \( B \preceq A \) then \(A \sim B \). </dt><dd><b>Proof</b>: Let \( f: A \rightarrowtail B \) and \( g: B \rightarrowtail A \) and \(h=g \circ f\). Let us introduce the notation \(h^{k}(x)\) defined by \(h^{0}(x)=x\) and \(h^{n+1}(x)=h(h^{n}(x))\). We now let \( C=A \setminus g(B)\) and \( X=h^{0}(C) \cup h^{1}(C) \cup h^{2}(C) \cup ...\). </br>We then define \(k(x)=\left\{\begin{matrix} f(x), \\ g^{-1}(x), \end{matrix}\right. \begin{matrix} x \in X \;\;\;\; \\ x \notin X \;\;\;\; \end{matrix}\). </br>We can then verify that \(k\) is bijective from \(A\) to \(B\): </br><ul><li>Let \(b \in B\). If \(b \in f(X)\), then, for some \(a \in X \subseteq A\), \(k(a)=b\). Otherwise, let \(a=g(b)\). By construction, \(a \notin C\); and if \(a \in h^{n}(C)\) for some \(n \geq 1\), then \(b=g^{-1}(a) \in f(h^{n-1}(C)) \subseteq f(X)\), contradicting \(b \notin f(X)\). Thus \(a \notin X\), and so \(k(a)=g^{-1}(a)=b\). Thus, for any \(b \in B\), \(b=k(a)\) for some \(a\in A\). Therefore \(k\) is surjective. </li><li>On \(X\), \(k=f\) is injective; off \(X\), \(k=g^{-1}\) is injective; it remains to check that the two cases never collide. Assume, for some \(c \in X\) and some \(d \notin X\), \(f(c)=g^{-1}(d)\), or \(d=g(f(c))\). Therefore, for some \(n \in \mathbb{W}\), \(c \in h^{n}(C)\). But then \(d \in g(f(h^{n}(C)))=h^{n+1}(C) \subseteq X\). But that contradicts our assumption that \(d\notin X\), and thus, if \(k(c)=k(d)\), then \(c=d\). Therefore, \(k\) is injective. </li></ul></dd></br> <dt><u>Theorem</u>: For any set A, \(A \prec \mathcal{P}(A) \). </dt><dd><b>Proof</b>: Clearly \(f(x)=\left\{x \right\}\) is an injection from A to \(\mathcal{P}(A)\). Let \(f: A \rightarrow \mathcal{P}(A)\) be any function. Define \(S= \left\{\begin{matrix} x \end{matrix}\right|\left.\begin{matrix} x \in A \wedge x \notin f(x) \end{matrix}\right\} \). 
Clearly, then, \(x \in S \Leftrightarrow x \notin f(x)\). Assume that, for some \(k \in A\), \(S=f(k)\). In that case, \(k \in f(k) \Leftrightarrow k \notin f(k)\), which is clearly impossible. Thus, our assumption fails, and there is no k such that \(k \in A\), \(S=f(k)\). Therefore, as \(S \in \mathcal{P}(A)\), \(f\) is not surjective, and thus not bijective. </br><b>Corollary</b>: For any set A, \(A \prec \mathcal{P}(A) \prec \mathcal{P}(\mathcal{P}(A)) \prec ... \). </br><b>Corollary</b>: \(\mathbb{N} \prec \mathcal{P}(\mathbb{N}) \prec \mathcal{P}(\mathcal{P}(\mathbb{N})) \prec ... \). </dd></br> <dt><u>Theorem</u>: \(\mathbb{N} \sim \mathbb{Z}\). </dt><dd><b>Proof</b>: Let \(f(n)=\left\{\begin{matrix} \frac{n-1}{2} \;\; n \textrm{ odd} \\ -\frac{n}{2} \;\; n \textrm{ even} \end{matrix}\right.\). Then \(f\) can be seen to be bijective from \(\mathbb{N}\) to \(\mathbb{Z}\). </dd></br> <dt><u>Theorem</u>: \(\mathbb{N} \sim \mathbb{Q}\). </dt><dd><b>Proof</b>: As \(\mathbb{N} \subset \mathbb{Q}\), we have \(\mathbb{N} \preceq \mathbb{Q}\). Let us now write each rational in lowest terms as \(\frac{a}{b}\), with \(a \in \mathbb{Z}\) and \(b \in \mathbb{N}\), and take \(f \left( \frac{a}{b} \right)= 2^{\textrm{sgn}(a)+1}3^{\left | a \right |}5^{b}\). Clearly \(f(\mathbb{Q}) \subset \mathbb{N}\), and, by the uniqueness of prime factorization, \(f\) is an injection from \(\mathbb{Q}\) to \(\mathbb{N}\), and thus \(\mathbb{Q} \preceq \mathbb{N}\). Therefore \(\mathbb{N} \sim \mathbb{Q}\). </dd></br> <dt><u>Theorem</u>: \(\mathbb{N} \sim \mathbb{A}\). </dt><dd><b>Proof</b>: As \(\mathbb{N} \subset \mathbb{A}\), we have \(\mathbb{N} \preceq \mathbb{A}\). For each \(x \in \mathbb{A}\), let \(0=\sum_{j=0}^{n} a_{j} x^{j}\) be the polynomial of least order that \(x\) solves, with coprime integer coefficients and \(a_{n}>0\) (these conditions make the polynomial unique), and let \(x\) be its kth real solution, sorted in increasing order. We then take \(f(x)=p_{1}^{k}p_{2}^{a_{0}}p_{3}^{a_{1}}...p_{n+2}^{a_{n}}\), where \(p_{k}\) is the kth prime number. Then, \(f(\mathbb{A}) \subset \mathbb{Q} \sim \mathbb{N}\) (the exponents may be negative), and \(f\) is injective. Thus \(\mathbb{A} \preceq \mathbb{N}\), and thus \(\mathbb{N} \sim \mathbb{A}\). 
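The countability proofs above are explicit enough to run. Here is a short Python sketch of the bijection \(\mathbb{N} \to \mathbb{Z}\), together with a prime-power injection \(\mathbb{Q} \to \mathbb{N}\); for the latter we use the variant \(2^{\mathrm{sgn}(a)+1}3^{\left | a \right |}5^{b}\) on fractions \(a/b\) in lowest terms, so that all exponents are non-negative (the function names are ours):

```python
from fractions import Fraction

def nat_to_int(n):
    """The bijection from N to Z above: odd n -> (n-1)/2, even n -> -n/2."""
    return (n - 1) // 2 if n % 2 == 1 else -(n // 2)

def rat_to_nat(q):
    """An injection from Q into N: with q = a/b in lowest terms (b >= 1,
    which Fraction guarantees), encode (sign, |a|, b) as 2^(sgn+1) 3^|a| 5^b."""
    sign = (q > 0) - (q < 0)   # sgn(q) in {-1, 0, 1}
    return 2 ** (sign + 1) * 3 ** abs(q.numerator) * 5 ** q.denominator

# nat_to_int hits every integer in [-50, 50) exactly once on the segment 1..100
assert {nat_to_int(n) for n in range(1, 101)} == set(range(-50, 50))

# rat_to_nat is injective on a sample of rationals
sample = {Fraction(a, b) for a in range(-12, 13) for b in range(1, 13)}
assert len({rat_to_nat(q) for q in sample}) == len(sample)
```

Unique prime factorization is what makes the encoding injective: from the value one can recover the exponents of 2, 3 and 5, and hence the sign, numerator and denominator.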
</dd></br> <dt><u>Theorem</u>: \((0,1) \sim \mathbb{R}\). </dt><dd><b>Proof</b>: The function \(f=\frac{2 x-1}{x-x^{2}}\) is bijective from \((0,1)\) to \(\mathbb{R}\). </dd></br> <dt><u>Theorem</u>: \((0,1) \sim [0,1]\). </dt><dd><b>Proof</b>: Let \(f(x)=\left\{\begin{matrix} \frac{1}{2}, \\ \frac{1}{2^{n+2}}, \\ x, \;\; \end{matrix}\right. \begin{matrix} x=0 \;\;\;\;\;\;\;\;\;\;\;\;\; \\ x=\frac{1}{2^{n}}, n \in \mathbb{W} \\ x \neq 0, x \neq \frac{1}{2^{n}} \;\; \end{matrix}\). Then f is bijective from \([0,1]\) to \((0,1)\). </dd></br> <dt><u>Theorem</u>: \([0,1) \sim [0,1]\). </dt><dd><b>Proof</b>: Let \(f(x)=\left\{\begin{matrix} \frac{1}{2^{n+1}}, \\ x, \;\; \end{matrix}\right. \begin{matrix} x=\frac{1}{2^{n}}, n \in \mathbb{W} \\ x \neq \frac{1}{2^{n}} \;\; \end{matrix}\). Then f is bijective from \([0,1]\) to \([0,1)\). </dd></br> <dt><u>Theorem</u>: \([0,1) \sim (0,1]\). </dt><dd><b>Proof</b>: Let \(f(x)= 1-x \). Then f is bijective from \((0,1]\) to \([0,1)\). </dd></br> <dt><u>Theorem</u>: \((a,b) \sim (0,1)\). </dt><dd><b>Proof</b>: Let \(f(x)= \frac{x-a}{b-a} \). Then f is bijective from \((a,b)\) to \((0,1)\). Relations of the form \([a,b) \sim [0,1) \) can be identically proven. </dd></br> <dt><u>Theorem</u>: \((a,\infty) \sim (0,1)\). </dt><dd><b>Proof</b>: Let \(f(x)=\frac{|x-a|}{|x-a|+1}\). Then f is bijective from \((a,\infty)\) to \((0,1)\). This function can also be used to prove \([a,\infty) \sim [0,1)\), \((-\infty, a) \sim (0,1)\) and \((-\infty, a] \sim [0,1)\). </dd></br> <dt><u>Theorem</u>: If \(a \in (0,1)\), then \([0,a)\cup (a,1]=[0,1]\setminus \left \{a \right \} \sim [0,1]\). </dt><dd><b>Proof</b>: Let \(f(x)=\left\{\begin{matrix} a^{n}, \\ x, \;\; \end{matrix}\right. \begin{matrix} x=a^{n+1}, n \in \mathbb{N} \\ x \neq a^{n+1} \;\;\;\; \;\; \end{matrix}\). </br>Then f is bijective from \([0,1]\setminus \left \{a \right \}\) to \([0,1]\). 
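The interval bijections above are concrete functions we can evaluate. The Python sketch below implements \(x \mapsto \frac{2x-1}{x-x^{2}}\) from \((0,1)\) onto \(\mathbb{R}\), together with its inverse (obtained by solving the quadratic \(yx^{2}+(2-y)x-1=0\) for the root lying in \((0,1)\)), and one correct way of setting the exponents in the dyadic-shift trick, sending \(0 \mapsto \frac{1}{2}\) and \(\frac{1}{2^{n}} \mapsto \frac{1}{2^{n+2}}\), which maps \([0,1]\) onto \((0,1)\). The function names are ours:

```python
import math

def open01_to_R(x):
    """The bijection (2x - 1)/(x - x^2) from (0,1) onto R."""
    return (2 * x - 1) / (x - x * x)

def R_to_open01(y):
    """Inverse of open01_to_R: the root of y x^2 + (2 - y) x - 1 = 0 in (0,1)."""
    if y == 0:
        return 0.5
    return (y - 2 + math.sqrt(y * y + 4)) / (2 * y)

def closed01_to_open01(x):
    """A bijection [0,1] -> (0,1): send 0 to 1/2, shift each dyadic point
    1/2^n (n >= 0) down to 1/2^(n+2), and fix everything else."""
    if x == 0.0:
        return 0.5
    m, _ = math.frexp(x)                 # x = m * 2^e with 0.5 <= m < 1
    return x / 4.0 if m == 0.5 else x    # m == 0.5 exactly when x = 1/2^n

# round trips (up to floating-point error)
for y in [-10.0, -1.0, -0.25, 0.0, 0.25, 1.0, 10.0]:
    assert abs(open01_to_R(R_to_open01(y)) - y) < 1e-9

# the dyadic shift is injective on a sample and lands strictly inside (0,1)
pts = [0.0, 1.0, 0.3, 0.7] + [2.0 ** -n for n in range(1, 30)]
images = [closed01_to_open01(p) for p in pts]
assert len(set(images)) == len(pts)
assert all(0.0 < v < 1.0 for v in images)
```

Powers of two are represented exactly in floating point, which is why the `frexp` test for \(x=1/2^{n}\) is reliable here; for a general real-analysis argument no such care is needed.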
</dd></br> <dt><u>Theorem</u>: \(\mathbb{R} \sim \mathbb{R} \setminus \mathbb{A}\) </dt><dd><b>Proof</b>: Let \( P= \left\{\begin{matrix} \frac{1}{p} \end{matrix}\right|\left.\begin{matrix} p \in \mathbb{P} \end{matrix}\right\}\). Clearly \(P \sim \mathbb{N}\), as \(P \sim \mathbb{P}\) and \(\mathbb{P} \sim \mathbb{N}\). Moreover, as \(\mathbb{N} \sim \mathbb{A}\), we have \(P \sim \mathbb{A}\), and since \((0,1) \sim \mathbb{R}\), we just need to show that \((0,1) \sim (0,1) \setminus P \). Let </br>\(f(x)=\left\{\begin{matrix} \frac{1}{p_k^{n+1}}, \\ x, \;\; \end{matrix}\right. \begin{matrix} x=\frac{1}{p_k^{n}}, n,k \in \mathbb{N} \\ x \neq \frac{1}{p_k^{n}} \;\;\;\;\;\;\;\;\;\;\; \end{matrix}\). </br>Then \(f\) is bijective from \((0,1)\) to \((0,1) \setminus P \). The same shifting trick works for any countable set: if \(D \subset \mathbb{R}\) is countable, pick a countably infinite \(E \subset \mathbb{R} \setminus D\); as \(D \cup E \sim E\), we can map \(D \cup E\) bijectively onto \(E\) and fix every other real, so that \(\mathbb{R} \sim \mathbb{R} \setminus D\). As \(\mathbb{A}\) is countable, it follows that \(\mathbb{R} \sim \mathbb{R} \setminus \mathbb{A}\). </dd></br> <dt><u>Theorem</u>: \(\mathcal{P}(\mathbb{N}_{n})\sim \mathbb{N}_{2^{n}}\). </dt><dd><b>Proof</b>: We use the fact that every number has a unique binary representation. Let \(A \in \mathcal{P}(\mathbb{N}_{n}) \). We then let \(f(A)=1+\sum_{x \in A} 2^{x-1}\) (so that \(f(\emptyset)=1\)). Then f is bijective from \(\mathcal{P}(\mathbb{N}_{n})\) to \(\mathbb{N}_{2^{n}}\). </dd></br> <dt><u>Theorem</u>: If \(A \sim B\) then \(\mathcal{P}(A) \sim \mathcal{P}(B)\). </dt><dd><b>Proof</b>: Let \(f\) be a bijective function from A to B. Then \(g(X)=\left \{ f(x) : x \in X \right \}\) is a bijective function from \(\mathcal{P}(A)\) to \(\mathcal{P}(B)\). However, the converse cannot be proved in ZFC (see <a href="http://math.stackexchange.com/questions/29366/do-sets-whose-power-sets-have-the-same-cardinality-have-the-same-cardinality">Easton's Theorem</a>); it does hold if we assume the <b>generalized continuum hypothesis</b>, which is the proposition that, if \(X\) is an infinite set, there is no set \(Y\) such that \(X \prec Y \prec \mathcal{P}(X)\). </dd></br> <dt><u>Theorem</u>: \(\mathbb{N}_{m} \times \mathbb{N}_{n} \sim \mathbb{N}_{mn}\).
</dt><dd><b>Proof</b>: Let \((a,b)\in \mathbb{N}_{m} \times \mathbb{N}_{n}\). Let \(f((a,b))= (a-1)n+b\); then f is bijective from \(\mathbb{N}_{m} \times \mathbb{N}_{n}\) to \(\mathbb{N}_{mn}\). </dd></br> <dt><u>Theorem</u>: \(\mathcal{P}(\mathbb{N}) \sim (0,1)\). </dt><dd><b>Proof</b>: We use the fact that every number has a binary representation. However, numbers with finite binary representations have two equivalent representations. For instance, \(\frac{3}{4}=0.11_{2}=0.101111..._{2}\). Let \(A\in \mathcal{P}(\mathbb{N})\). We then define </br>\(f(A)=\left\{\begin{matrix} \frac{1}{8}+\frac{1}{4}\sum_{k \in A} 2^{-k}, \\ \frac{1}{2}+\frac{1}{4}\sum_{k \in A} 2^{-k}, \end{matrix}\right. \begin{matrix} A \prec \mathbb{N} \\ A \sim \mathbb{N} \end{matrix}\). </br>Then we can see that \(f\) is injective from \(\mathcal{P}(\mathbb{N})\) to \((0,1)\). We then define \(g(x)= \left\{\begin{matrix} n \end{matrix}\right|\left.\begin{matrix} n\in \mathbb{N} \wedge \left \lfloor 2^n x \right \rfloor \equiv 1 \pmod 2 \end{matrix}\right\}\). Then we can see that \(g\) is injective from \((0,1)\) to \(\mathcal{P}(\mathbb{N})\). Thus, as \(\mathcal{P}(\mathbb{N}) \preceq (0,1)\) and \((0,1) \preceq \mathcal{P}(\mathbb{N})\), we find that \((0,1) \sim \mathcal{P}(\mathbb{N})\). </br><b>Corollary</b>: \(\mathcal{P}(\mathbb{N}) \sim \mathbb{R}\). </dd></br> <dt><u>Theorem</u>: \(\mathbb{N} \times \mathbb{N} \sim \mathbb{N}\). </dt><dd><b>Proof</b>: The function \(f(n)=(n,1)\) is injective from \(\mathbb{N}\) to \(\mathbb{N} \times \mathbb{N}\). Therefore \(\mathbb{N} \preceq \mathbb{N} \times \mathbb{N}\). The function \(g((a,b))=2^{a}3^{b}\) is injective from \(\mathbb{N} \times \mathbb{N}\) to \(\mathbb{N}\) by the uniqueness of prime factorization. Therefore \(\mathbb{N} \times \mathbb{N} \preceq \mathbb{N}\). Therefore \(\mathbb{N} \times \mathbb{N} \sim \mathbb{N}\). </dd></br> <dt><u>Theorem</u>: \([0,1] \times [0,1] \sim [0,1]\).
</dt><dd><b>Proof</b>: We fix for every number a single binary representation. Let \(a,b \in [0,1]\), and \(a=\sum_{k=1}^{\infty} a_{k} 2^{-k}\) and \(b=\sum_{k=1}^{\infty} b_{k} 2^{-k}\), where \( a_k , b_k \in \left \{ 0,1 \right \} \). Then, let \(f((a,b))=\sum_{k=1}^{\infty} \left ( 2 a_k+b_k \right ) 4^{-k}\); then we can see that \(f\) is injective, so \([0,1] \times [0,1] \preceq [0,1]\). As \(g(x)=(x,0)\) is injective from \([0,1]\) to \([0,1] \times [0,1]\), we also have \([0,1] \preceq [0,1] \times [0,1]\), and thus \([0,1] \times [0,1] \sim [0,1]\). </br><b>Corollary</b>: \(\mathbb{R} \times \mathbb{R} \sim \mathbb{R}\). </dd></br> <dt><u>Theorem</u>: If \(A\) and \(B\) are nonempty and \(\mathbb{N} \preceq A\) or \(\mathbb{N} \preceq B\), then \(|A|+|B|=|A|\times|B|=\max(|A|,|B|)\). </dt><dd></dd></br> <dt><u>Theorem</u>: \(\mathbb{S}_{m}+\mathbb{S}_{n}=\mathbb{S}_{m+n} \) and \(\mathbb{S}_{m}\times\mathbb{S}_{n}=\mathbb{S}_{mn} \). </dt><dd></dd></br> <dt><u>Theorem</u>: The following proposition is undecidable (cannot be proved or disproved) in ZFC: "There does not exist a set \(X \subseteq \mathbb{R}\) such that \(\mathbb{N} \prec X \prec \mathbb{R}\)." </dt><dd></dd></br> </dl>Nadavhttp://www.blogger.com/profile/08410204738776306784noreply@blogger.com0tag:blogger.com,1999:blog-6860306700804724173.post-34580470000331098712014-05-02T08:19:00.000-07:002019-01-11T11:35:04.025-08:00Syllogisms and Probability<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> A syllogism is a systematic way of laying out a sequence of logical steps. Syllogisms are very useful in formulating arguments, constructing proofs, and checking the coherence of a train of thought. A syllogism consists of three parts: two <b>premises</b> (which can sometimes be construed as <b>major</b> and <b>minor</b>) and a <b>conclusion</b>.
A syllogism is <b>valid</b> if the conclusion follows logically from the premises, whereas it is invalid if it does not. A syllogism is <b>sound</b> if both its premises are true, or at least plausibly true, and the syllogism is valid. Note that a syllogism may be invalid and yet have a true conclusion: it may even have true premises and a true conclusion and still be invalid. <br /><br /> We can also draw the distinction between a <b>propositional</b> and a <b>categorical</b> syllogism. In a propositional syllogism, we deal only with bare propositions connected by logical operators like "or", "and", "implies", and so on. In a categorical syllogism, we deal only with statements of inclusion or exclusion in certain categories that are quantified, using such terms as "all", "some", and "no/none". <br /><br />For example:<br />1) If Socrates is a man then Socrates is mortal.<br />2) Socrates is a man.<br />3) Therefore Socrates is mortal. <br /><br />This is an example of a propositional syllogism. We can also write this in a categorical form:<br />1) All men are mortal.<br />2) Socrates is a man.<br />3) Therefore Socrates is mortal.<br />We will henceforth focus on the propositional variety.<br /><br /> There are a number of standard propositional syllogisms (also called propositional forms):<br /><ul><li><b>Modus Ponens</b>: If A, then B. A. Therefore B.</li><li><b>Modus Tollens</b>: If A then B. Not B. Therefore, not A.</li><li><b>Hypothetical Syllogism</b>: If A then B. If B then C. Therefore, if A then C.</li><li><b>Disjunctive Syllogism</b>: Either A or B. Not A. Therefore B.</li><li><b>Conjunctive Syllogism</b>: A. B. Therefore A and B.</li></ul><br />We can further pare these down by noting that modus ponens and modus tollens are both actually disjunctive syllogisms when we state them using material conditionals. Remember that, in a material conditional, "if A then B" is equivalent to "Either not A, or B".
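The equivalence just stated, and the validity of these forms, can be verified by brute force over the four truth assignments. A quick sketch (Python, added purely as an illustration):

```python
from itertools import product

def implies(a, b):
    """The material conditional 'if a then b'."""
    return b if a else True

for a, b in product([False, True], repeat=2):
    # 'If A then B' agrees with 'either not A, or B' on every assignment.
    assert implies(a, b) == ((not a) or b)
    # Modus ponens: whenever 'if A then B' and A hold, B holds.
    if implies(a, b) and a:
        assert b
    # Modus tollens: whenever 'if A then B' and not-B hold, not-A holds.
    if implies(a, b) and not b:
        assert not a
```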
<br /><br />In a valid argument, if we grant that the premises are true, we must grant that the conclusion is also true. However, we typically are required to evaluate premises of which we are not certain. Namely, the premises have epistemic probabilities that are less than one. In that case, we can use our knowledge of probabilities to give bounds on the probability of the conclusion, and possibly even an estimate of the probability of the conclusion. That is, we can give limits on how plausible we ought to take the conclusion to be. We state the results in the following chart. In each case x is the probability of the first premise (major premise), y is the probability of the second premise (minor premise), and z is the probability of the conclusion. That is, \(P(p1)=x\), \(P(p2)=y\), \(P(c)=z\). <div><br /><table style="width: 800px; border-collapse:collapse; border:1px solid black;"><colgroup><col width="800"></col><col width="800"></col><col width="1200"></col><col width="1200"></col></colgroup><tbody><tr> <th colspan="4" style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><h3>Bounds on Syllogism Probabilities</h3></th></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Name</b></td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Form</b></td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Bounds on Conclusion</b></td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Estimate of Conclusion</b></td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Disjunctive Syllogism</td> <td style="border-collapse:collapse; border:1px solid black;">\(p1)\> A \cup B\)<br /> \(p2)\> \sim A\)<br />\(c)\>\> \therefore B\)</td> <td style="border-collapse:collapse; border:1px solid black;" align="center"
valign="middle">\(x+y-1 \leq z \leq x\)</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">\(z=x+\frac {y-1}{2}\)</td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Conjunctive Syllogism</td> <td style="border-collapse:collapse; border:1px solid black;">\(p1)\> A\)<br /> \(p2)\> B\)<br />\(c)\>\> \therefore A \cap B\)</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">\(x+y-1 \leq z \leq \min(x,y)\)</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">\(z=xy\)</td></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Hypothetical Syllogism</td> <td style="border-collapse:collapse; border:1px solid black;">\(p1)\> A \Rightarrow B\)<br /> \(p2)\> B \Rightarrow C\)<br />\(c)\>\> \therefore A \Rightarrow C\)</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">\(x+y-1 \leq z \leq 1\)</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">\(z= \frac {x+y}{2}\)</td></tr></tbody></table></div><br /><br />For example, suppose we say "If it is raining, then Bob will bring an umbrella. It is raining. Therefore Bob will bring an umbrella." We are 80% sure that Bob will bring an umbrella if it is raining and 70% sure that it is raining. Reading the conditional materially, as in the disjunctive syllogism row above, the probability that Bob does bring an umbrella is somewhere between 50% and 80%, with an estimate of 65%. That is not very confident. <br />Suppose we have two propositions of which we are 70% certain of each. Then the probability that both are true can be anywhere from 40% to 70% with an estimated value of 49%. Thus it is not enough that each of the premises be more plausible than their opposites: we must demand more in order that an argument be a good one.
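The bounds and estimates in the chart are easy to compute mechanically. A sketch of the two examples just discussed (Python; the code and function names are my own, not part of the post):

```python
def disjunctive_bounds(x, y):
    """P(A or B) = x, P(not A) = y: bounds and estimate for z = P(B),
    per the chart: x + y - 1 <= z <= x, with estimate z = x + (y - 1)/2."""
    return x + y - 1, x, x + (y - 1) / 2

def conjunctive_bounds(x, y):
    """P(A) = x, P(B) = y: bounds and estimate for z = P(A and B),
    per the chart: x + y - 1 <= z <= min(x, y), with estimate z = x*y."""
    return x + y - 1, min(x, y), x * y

# Umbrella example: P(not raining, or umbrella) = 0.8, P(raining) = 0.7.
lo, hi, est = disjunctive_bounds(0.8, 0.7)
assert abs(lo - 0.5) < 1e-9 and abs(hi - 0.8) < 1e-9 and abs(est - 0.65) < 1e-9

# Two premises held at 70% each: the conjunction lies in [0.4, 0.7].
lo2, hi2, est2 = conjunctive_bounds(0.7, 0.7)
assert abs(lo2 - 0.4) < 1e-9 and abs(hi2 - 0.7) < 1e-9 and abs(est2 - 0.49) < 1e-9
```

Note that the lower bound \(x+y-1\) can be negative, in which case it is trivially true and zero is the effective bound.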
<br /><br />Sometimes when people say "if A then B", they mean something different from the material conditional ("Either not A, or B"). Frequently, they mean something like "Given A, B will happen/be true". Thus the probability of "if A, then B" wouldn't be \(P(\sim A \cup B)\), but rather \(P(B|A)\). We will here give the table for the conditional probability case. That is, in every case that a conditional statement is given a probability, that probability is the probability of the consequent given the antecedent. <br /><div><br /><table style="width: 800px; border-collapse:collapse; border:1px solid black;"><colgroup><col width="800"></col><col width="800"></col><col width="1200"></col><col width="1200"></col></colgroup><tbody><tr> <th colspan="4" style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><h3>Bounds on Syllogism Probabilities</h3></th></tr><tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Name</b></td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Form</b></td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle"><b>Bounds on Conclusion</b></td> <td style="border-collapse:collapse; border:1px solid black;"><b>Estimate of Conclusion</b></td></tr> <tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Modus Ponens</td> <td style="border-collapse:collapse; border:1px solid black;">\(p1)\> A \Rightarrow B\)<br /> \(p2)\> A\)<br />\(c)\>\> \therefore B\)</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">\(xy \leq z \leq xy+1-y\)</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">\(z=xy+\frac {1-y}{2}\)</td></tr> <tr> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">Modus Tollens</td> <td
style="border-collapse:collapse; border:1px solid black;">\(p1)\> A \Rightarrow B\)<br /> \(p2)\> \sim B\)<br />\(c)\>\> \therefore \sim A\)</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">\(\frac {x+y-1}{x} \leq z < 1\)</td> <td style="border-collapse:collapse; border:1px solid black;" align="center" valign="middle">\(z=\frac {2x+y-1}{2x}\)</td></tr></tbody></table></div><br /> For a hypothetical syllogism with conditional probabilities, we can give no bounds or estimates.Nadavhttp://www.blogger.com/profile/08410204738776306784noreply@blogger.com0tag:blogger.com,1999:blog-6860306700804724173.post-52822433817571835092014-04-23T16:28:00.000-07:002014-04-23T18:06:03.919-07:00Rosmarus et Naupegus<head><style>table, th, td { border-collapse:initial; border:initial; } th, td { padding:initial; } </style></head> <br /><u>The Walrus and the Carpenter</u> is another poem from Lewis Carroll's <u>Through the Looking-Glass</u>. It is the semi-nonsensical story of a walrus, a carpenter and some oysters. The poem is one of my favorites, and thus an obvious candidate to be translated into Latin. A literal de-translation is given below the parallel translation.<br /><br /><table border="0" cellpadding="0" cellspacing="10"><tbody><tr><td valign="top" width="50%"><b><u>Rosmarus et Naupegus</u></b><br /><br /> A Luviso Carollo</br> Latinatum a Nadavo Cravito<br /><br />Sol nitebat in mare, <br />Nitens summa cum vire <br />Fluctus conans facere <br />Blandos et clarosque<br />Visu mirabilissime <br />In media nocte<br /><br />Maeste luna nitebat, <br />Putabat nam soli<br />Haud opus esse adesse <br />Post finem quin diei. 
<br />Dicit "venisse, irruere <br />Improba sunt mihi," <br /><br />Mare umet udissimum, <br />Litus aridissimum<br />Nimbus nullus videatur <br />Abest nam videndum:<br />Super volat avis nulla—<br />Abest nam volandum.<br /><br />Rosmarus et Naupegus <br />Juxtim ambulabant;<br />Moles harenae videre <br />Quam misere flebant<br />"Si modo hoc depurgatum, <br />Quam bonum." dicebant<br /><br />"Si servae septem scopis quot <br />Averrent quot menses,<br />"Num possent" dicit Rosmarus <br />"Purgarene putes?"<br />Naupegus dicit "Dubito," <br />Flens lacrimas tristes. <br /><br />"O Ostreae, sequamini" <br />Obsecrat Rosmarus<br />“Loquamur salsam per actam <br />atque ambulemus:<br />Sed summum quattuor sumamus, <br />Manum cuique demus.”<br /> <br />Ostrea maxima spicit, <br />Nulla verba dicit:<br />Ostrea maxima nictat, <br />Grave caput quatit—<br />Linquere ostrearium <br />Videlicet nolit.<br /><br />Minores quattuor festinant, <br />Muneri stundentes:<br />Aliclae tersae, frontes lautae, <br />Calcei elegantes—<br />Sane mirabilissime, <br />Non pedes habentes.<br /><br />Ostreae quattuor sequuntur, <br />Et quattuor aliae.<br />Veniunt iam et gregatim, <br />Et crescenti multae.<br />Per undas spumas saliunt, <br />Petunt portum harenae.<br /><br />Rosmarus et Naupegus <br />Stadium gradiuntur<br />Tunc commode in scopulo <br />Demisso nituntur<br />Ordineque ostreulae <br />Nunc opperiuntur.<br /><br />"Adest tempus," dicit Rosmarus <br />"Semonis de multis:<br />Naves, calcei, cera sigilli, <br />Et reges et caulis<br />Ac porci num alarentur, <br />Causa fervoris maris."<br /><br />“Mora sodes,” testae clamant <br />“Antequam loquaris;<br />Exanimatae et pingues <br />Sunt namque e nobis!”<br />“Festina lente!” Naupegus <br />Dicit valde gratis.<br /><br />“Massae panis” dicit Rosmarus <br />“Nobis opus maximum”<br />“Enimvero optimi sunt <br />Piper et acetum—<br />Ostreulae, si paratae, <br />Inchoamus comesum.”<br /><br />Ostreae “Atqui non 
nostrum!” <br />Clamant livescentes.<br />“Factum atrum sit nobis <br />Post gratias tales!”<br />Rosmarus dicit “Bella nox, <br />Miramini species?”<br /><br />“Quam comes sunt quod venisitis, <br />Et estis quam belli!”<br />Tacebat Naupegus nisi <br />“Seca frustum mihi:<br />Opto ut minus surdus sis—<br />Iam bis te poposci!”<br /><br />“Turpe eas dolo capere <br />Est” dicit Rosmarus<br />“Post tam longe eduximus <br />Tam cursim egimus!”<br />Tacebat Naupegus nisi <br />“Butyrum crassius!”<br /><br />“Vos lacrimo” ait Rosmarus <br />“Vostrum me miseret.” <br />Singultibus et lacrimis <br />Maximas diribet<br />Mucinium ante oculos <br />Fundentes praetendet.<br /><br />“Ostreae,” dicit Naupegus, <br />“Bene cucurrerunt! <br />Debemus domum redire?” <br />At voces nullae reddunt—<br />Et haud mirabilissime, <br />Quod quamque ederunt.<br /></td> <td valign="top" width="50%"><b><u>The Walrus and the Carpenter</u></b><br /><br /> By Lewis Carroll<br /><br /><br />The sun was shining on the sea,<br />Shining with all his might:<br />He did his very best to make<br />The billows smooth and bright--<br />And this was odd, because it was<br />The middle of the night.<br /><br />The moon was shining sulkily,<br />Because she thought the sun<br />Had got no business to be there<br />After the day was done--<br />"It's very rude of him," she said,<br />"To come and spoil the fun!"<br /><br />The sea was wet as wet could be,<br />The sands were dry as dry.<br />You could not see a cloud, because<br />No cloud was in the sky:<br />No birds were flying overhead--<br />There were no birds to fly.<br /><br />The Walrus and the Carpenter<br />Were walking close at hand;<br />They wept like anything to see<br />Such quantities of sand:<br />"If this were only cleared away,"<br />They said, "it would be grand!"<br /><br />"If seven maids with seven mops<br />Swept it for half a year.<br />Do you suppose," the Walrus said,<br />"That they could get it clear?"<br />"I doubt it," 
said the Carpenter,<br />And shed a bitter tear.<br /><br />"O Oysters, come and walk with us!"<br />The Walrus did beseech.<br />"A pleasant walk, a pleasant talk,<br />Along the briny beach:<br />We cannot do with more than four,<br />To give a hand to each."<br /><br />The eldest Oyster looked at him,<br />But never a word he said:<br />The eldest Oyster winked his eye,<br />And shook his heavy head--<br />Meaning to say he did not choose<br />To leave the oyster-bed.<br /><br />But four young Oysters hurried up,<br />All eager for the treat:<br />Their coats were brushed, their faces washed,<br />Their shoes were clean and neat--<br />And this was odd, because, you know,<br />They hadn't any feet.<br /><br />Four other Oysters followed them,<br />And yet another four;<br />And thick and fast they came at last,<br />And more, and more, and more--<br />All hopping through the frothy waves,<br />And scrambling to the shore.<br /><br />The Walrus and the Carpenter<br />Walked on a mile or so,<br />And then they rested on a rock<br />Conveniently low:<br />And all the little Oysters stood<br />And waited in a row.<br /><br />"The time has come," the Walrus said,<br />"To talk of many things:<br />Of shoes--and ships--and sealing-wax--<br />Of cabbages--and kings--<br />And why the sea is boiling hot--<br />And whether pigs have wings."<br /><br />"But wait a bit," the Oysters cried,<br />"Before we have our chat;<br />For some of us are out of breath,<br />And all of us are fat!"<br />"No hurry!" said the Carpenter.<br />They thanked him much for that.<br /><br />"A loaf of bread," the Walrus said,<br />"Is what we chiefly need:<br />Pepper and vinegar besides<br />Are very good indeed--<br />Now if you're ready, Oysters dear,<br />We can begin to feed."<br /><br />"But not on us!" 
the Oysters cried,<br />Turning a little blue.<br />"After such kindness, that would be<br />A dismal thing to do!"<br />"The night is fine," the Walrus said.<br />"Do you admire the view?<br /><br />"It was so kind of you to come!<br />And you are very nice!"<br />The Carpenter said nothing but<br />"Cut us another slice:<br />I wish you were not quite so deaf--<br />I've had to ask you twice!"<br /><br />"It seems a shame," the Walrus said,<br />"To play them such a trick,<br />After we've brought them out so far,<br />And made them trot so quick!"<br />The Carpenter said nothing but<br />"The butter's spread too thick!"<br /><br />"I weep for you," the Walrus said:<br />"I deeply sympathize."<br />With sobs and tears he sorted out<br />Those of the largest size,<br />Holding his pocket-handkerchief<br />Before his streaming eyes.<br /><br />"O Oysters," said the Carpenter,<br />"You've had a pleasant run!<br />Shall we be trotting home again?'<br />But answer came there none--<br />And this was scarcely odd, because<br />They'd eaten every one. </td></tr></tbody></table><hr><br /><b><u>The Walrus and the Carpenter (De-translated)</u></b><br /><br />The sun was shining on the sea<br />shining with [his] greatest might. 
<br />Trying to make the billows<br />Both smooth and bright<br />[A sight] most astonishing to see<br />in the middle of the night<br /><br />The moon was shining gloomily<br />For she thought, for the sun<br />it was not [his] business to be present<br />Indeed, after the end of the day.<br />She says “To have come and to intrude<br />are rude [acts] to me.”<br /><br />The very wet sea is wet<br />And the shore is most dry<br />No cloud might be seen<br />For anything to be seen is absent<br />No bird flies above—<br />For anything to be flown is absent.<br /><br />The Walrus and the [ship-wright] Carpenter<br />Were walking together<br />To see such masses of sand,<br />how wretchedly they wept.<br />“If only this might be cleaned away<br />How good [it would be]” they said.<br /><br />“If seven maids with as many brooms<br />were to sweep for as many months,<br />don’t you think” says the Walrus<br />“they couldn’t clean [it]?”<br />The Carpenter says “I doubt [it]”,<br />Weeping sad tears.<br /><br />“O oysters, let you follow [us]”<br />Entreats the Walrus<br />“Let us talk along the briny beach<br />and also let us walk:<br />But we might only select at most four,<br />that we might give a hand to each.”<br /><br />The oldest oyster looks<br />[and] says no words:<br />The oldest oyster winks,<br />[and] gravely shakes [her] head—<br />To leave the oyster-bed<br />evidently, she would be unwilling.<br /><br />Four younger [oysters] hurry,<br />eager for [the] gift<br />[Their] child’s-cloaks [were] wiped, [their] faces washed<br />[And their] shoes were handsome—<br />Certainly most astonishingly,<br />[as they] had no feet.<br /><br />Four [more] oysters follow,<br />and four others.<br />Now they come even in flocks<br />and to an increasing[ly] many [oysters].<br />Through waves [and] foam they leap<br />Seeking the refuge of the sand.<br /><br />The Walrus and the Carpenter<br />walk for a furlong<br />Then lean upon a rock <br />conveniently low.<br 
/>And in a line, the little oysters<br />now wait.<br /><br />“The time has arrived” says the Walrus<br />“of speaking concerning many things:<br />Ships, shoes, wax of a seal,<br />both kings and cabbage<br />And whether pigs be winged,<br />and the cause of the boiling of the sea.”<br /><br />“A pause, please” the shellfish cry<br />Before you might speak;<br />For exhausted and fat<br />are there among us!”<br />“Hasten slowly!” said the Carpenter<br />To the greatly thankful [oysters].<br /><br />“A lump of bread,” the Walrus says<br />“is our greatest need:<br />Certainly, pepper and vinegar<br />are also very good—<br />Little oysters, if [you are] ready,<br />we begin the eating.”<br /><br />The oysters cry “But not of us!”<br />becoming blue.<br />“[That] would be a terrible deed,<br />after so great kindnesses!”<br />The Walrus says “The night [is] pretty,<br />Do you admire the sights?”<br /><br />How kind you are that you came<br />And you are so charming!”<br />The Carpenter was silent except [for saying]<br />“Cut a scrap for me:<br />I wish that you might be less deaf—<br />I have already asked you twice!”<br /><br />“A disgraceful [thing] it is <br />to catch them with a trick” says the Walrus<br />“After we have led them out so far<br />and drove them so swiftly.”<br />The Carpenter was silent except [for saying]<br />“The butter is too thick!”<br /><br />“I weep for you” says the Walrus<br />“I feel sorry for you.”<br />With sobbing and with tears<br />he sorted out the biggest ones.<br />He extended a handkerchief <br />before his pouring eyes.<br /><br />“Oysters,” says the Carpenter<br />“You have run well!<br />Ought we return home?”<br />But no voices return—<br />And hardly is it most astonishing<br />because they had eaten every one.<br 
/>Nadavhttp://www.blogger.com/profile/08410204738776306784noreply@blogger.com0tag:blogger.com,1999:blog-6860306700804724173.post-74920194366249403502014-04-14T08:01:00.003-07:002015-07-02T22:22:17.063-07:00Probability<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script> Probability is a concept that is intuitively fairly easy to understand, yet difficult to give a comprehensive, universally acceptable interpretation. In general, probabilities are given with respect to events or propositions and give a way of quantitatively answering such questions as “How certain are we that X will occur/is true?”. Probabilities range from 0, (<a href="#almost">almost*</a>) certain not to happen/be true, to 1, (<a href="#almost">almost*</a>) certain to happen/be true. <br /><br />There is division as to whether probabilities are objective facts or only subjective. Some say the probability of an event is a measure of the propensity of a certain situation to yield a certain outcome, while others say that the probability of an event is the relative frequency of that event in the limit of a large number of relevantly identical cases, or <b>trials</b>. Those who say it is subjective give, for instance, the conception that the probability of an event can be defined as “the price (in arbitrary units) at which you would buy or sell a bet that paid 1 unit if the event occurred and 0 if it did not occur”. <br /><br />One way to circumvent all of these is to leave probability somewhat vague and give it a thorough mathematical basis. This can readily be done. We will deal with the probability of some event which will occur as a result of some experiment in individual trials. 
What is needed for a probabilistic model are two things: <br /><ul> <li>The <b>sample space</b> \(\Omega\): the set of all possible outcomes of the experiment.</li> <li>The <b>probability law</b> \(P\). This is a function that takes a subset of the sample space and returns a real number. This law, to qualify as a proper probability law, must satisfy three conditions. <ol> <li><b>Non-negativity</b>: For any subset \(A\) of \(\Omega\), \(P(A) \geq 0\).</li> <li><b>Countable Additivity</b>: Let \(A_{1}, A_{2}, ...\) be a countable sequence of mutually disjoint sets (that is, no element in one set is in any other set), each a subset of \(\Omega\). Then \(P(A_{1} \cup A_{2} \cup ...)=P(A_{1})+P(A_{2})+...\).</li> <li><b>Normalization</b>: the probability of the entire sample space is unity; that is, \(P(\Omega)=1\).</li></ol></li></ul>If the model satisfies these conditions, it is at least admissible, though typically we have other considerations that help us choose a model, such as simplicity. These conditions imply that the empty set, that is, the set containing no elements, has probability zero. <br /><br />Very typical in probability theory is the use of set-theoretic or logical-operator notation. While notation varies, the fundamental concepts remain consistent. When we want the probability that events \(A\) and \(B\) will both happen (e.g. a die lands on an even number and on a number above three), we ask for the probability of their conjunction, represented as \(P(A \cap B)\) or \(P(A \& B)\) or \(P(A \wedge B)\). When we want the probability that at least one event of the events \(A\) and \(B\) will happen (e.g. a die lands on an even number or on a number above three, or both) we ask for the probability of their disjunction, represented as \(P(A \cup B)\) or \(P(A \vee B)\). When we want the probability that some event will not happen (e.g.
a die does not land on an even number), we ask for the probability of the complement of the event, represented \(P(\sim A)\) or \(P(\bar{A})\) or \(P(A^{c})\) or \(P(\neg A)\). The empty set is symbolized as \(\varnothing\) and represents the set with no elements. Thus, taking the union of \(\varnothing\) with any other set gives the latter set, and taking the intersection yields the empty set. In addition, we can say \(\varnothing=\sim \Omega \) and \(\sim \varnothing= \Omega \). Lastly, a partition of set \(C\) is a countable sequence of sets such that no two sets in the partition share an element (the sets are mutually exclusive) and every element in \(C\) is in some set (the collection is collectively exhaustive). <br /><br />Also important in the study of probability is the concept of <b>conditional probability</b>. This is the measure of probability given some information: assuming that something is the case, what is the chance that some event will occur? For instance, we could ask what the chance is that a die landed six, given that it landed on an even number. While a more thorough discussion of conditional probability can be found elsewhere, we will here merely give the formula. \(P(A|B)\), the probability that \(A\) will occur given \(B\) (read “the probability of \(A\) given \(B\)” or “the probability of \(A\) on \(B\)”), is given by the expression \[ P(A|B)=\frac{P(A \cap B)}{P(B)}\] whenever \(P(B) \neq 0\). Sometimes it is possible to assign a meaningful value to \(P(A|B)\) when \(P(B)=0\). For instance, suppose we ask “what is the probability that a homogeneous, spherical marble, when rolled, will land on point A, given that it landed either on point A or point B?” The answer then seems clearly to be 0.5. A good interpretation of the conditional is a change in the sample space: when we condition on \(B\), we are changing the sample space from \(\Omega\) to \(B\). We find that all the axioms and theorems are consistent with this view.
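The defining formula can be made concrete with the die example from the text. A minimal sketch (Python, my own illustration), modeling the uniform probability law on a six-sided die:

```python
from fractions import Fraction

# Sample space for one roll of a fair die; every outcome is equally likely.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """The uniform probability law: P(A) = |A| / |omega|."""
    return Fraction(len(set(event) & omega), len(omega))

def cond(a, b):
    """Conditional probability P(A | B) = P(A and B) / P(B), for P(B) != 0."""
    return prob(set(a) & set(b)) / prob(b)

even = {2, 4, 6}
# The chance the die landed six, given that it landed even, is 1/3.
assert cond({6}, even) == Fraction(1, 3)
# Conditioning on the whole sample space changes nothing.
assert cond(even, omega) == prob(even)
```

Using exact `Fraction` arithmetic keeps the checks free of floating-point error; conditioning on `even` is exactly the change of sample space described above.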
We can also mention here the notion of <b>independence</b>. Two events \(A\) and \(B\) are independent iff \(P(A \cap B)=P(A) P(B)\). This implies that \(P(A|B)=P(A)\) and \(P(B|A)=P(B)\). This means that, given the one, we gain no information about the other: it remains just as probable. <br /><br />While the probability of events is relatively easy to understand, the probability of propositions is not as easy, as propositions can have only two values: true and false. How is it that we can say “The probability that you are female is 51%” when you are either definitely male or definitely female? This is where the notion of <b>epistemic probability</b> comes into play. Epistemic probability has to do with how likely something seems to us, or some other (rational) person, given some set of background information. For instance, in some murder, given that we see Joe’s fingerprints on the murder weapon, we deem it likely that Joe committed the murder. Though it is very difficult to give a good account, a rough way to quantify it would be in the following sense: <br /><blockquote>\(X\) has epistemic probability \(p\) given background information \(B\) (i.e. \(P(X|B)=p\)) iff the following is true: supposing we encountered this set of information \(B\) in many scenarios, we would expect \(X\) to be true in fraction \(p\) of those scenarios. </blockquote>Again, this may not be a perfect analysis, but it does give a rough way to understand it. However, we must note that epistemic probability is of a significantly inferior sort than, say, experimental probability (observing that \(X\) happens in fraction p of experimental cases), or even a good theoretical probability (theory predicts that a homogeneous cube of material will, when haphazardly tossed, land on any given face with equal probability). 
There is a principle called the <b>principle of indifference</b> that says one should assign equal epistemic probabilities to two events or propositions when we have no justification to prefer one to the other. That may be a good principle as far as epistemic probability goes, but it is deeply dependent on background information (lacking any background information to prefer one possibility to another, we are to assign them equal probabilities), and at least somewhat subjective. It is thus greatly limited by what we know: in fact, what we think is a possibility, based on our background information, may not be a possibility at all (it could be what is called a merely epistemic possibility). Thus, while epistemic probability may be the best we can do, given our background information, it may not be very good at all. <br/><br/><b>Statistical probability</b> is of the epistemic sort: suppose that fraction \(p\) of population \(S\) has property \(X\). We then come across a member \(M\) of \(S\). Suppose we have no way to tell immediately whether \(M\) has property \(X\), but we know \(M\) comes from \(S\). We therefore say that \(M\) has property \(X\) with (epistemic) probability \(p\). This is a statistical probability: based on facts about the population, we deduce a probability as regards a given individual, even though, if we had more information, we could say that \(M\) had \(X\) with probability either zero or one. This is to be contrasted with what we might call <b>stochastic probability</b>. If we have a perfect coin, and flip it fairly, then before we do so, there is no information anywhere, even in principle, as to what its outcome will be. We don't know what will happen when we flip it, not because we aren't privy to some information, but because there is no information to be had. This will be the case with any genuinely indeterministic event. 
We might demonstrate the difference between statistical and stochastic probabilities as the difference between a coin that was flipped but is hidden from view and a coin yet to be flipped, respectively. Most physicists believe many quantum processes are genuinely stochastic, and some philosophers believe free will is also stochastic in some sense: on this view, "You will probably choose X" does not mean merely that, based on what I know now, there is a high epistemic probability that you will choose X, such that, if I knew more (e.g. that you chose X most of the time in similar circumstances), I could predict with certainty whether you will choose X; rather, it means that you are genuinely more disposed to choose X. <br /><br /><hr /><br />We will here give a few theorems of probability theory. We will try to present them such that their derivation is clear, but if not, any introductory text on probability theory can give a more thorough exposition. Below, \(A\) and \(B\) are some subsets of \(\Omega\):<br />\[P(\Omega)=1;\;\;\;\ P(\varnothing)=0\] \[P(A \cap \Omega)=P(A);\;\;\;\ P(A \cap \varnothing)=P(\varnothing)=0\] \[P(A \cup \Omega)=P(\Omega)=1;\;\;\;\ P(A \cup \varnothing)=P(A)\] \[0 \leq P(A) \leq 1\] \[0 \leq P(A|B) \leq 1\] \[P(A \cup \sim A)=P(A)+P(\sim A)=P(\Omega)=1\] \[P(A \cup B)+P(A \cap B)=P(A)+P(B)\] \[P(A \cap B)\leq \min(P(A),P(B))\] \[P(A\cup B) \leq P(A)+P(B)\] \[P(A \cup B) \geq \max(P(A),P(B))\] \[P(A \cap B)+P(A \cap \sim B)=P(A)\] \[P(A \cap B)=P(A)P(B|A)\] \[P(A|B)=\frac{P(B|A)P(A)}{P(B)}\] \[\frac{P(A|B)}{P(A)}=\frac{P(B|A)}{P(B)}\] <br /><hr /><br />Let \(B_{1},B_{2},…\) be a partition of \(C\); then: \[P(A \cap B_{1})+P(A \cap B_{2})+…=P(A \cap C) \] \[P(A \cap C)=P(A|B_{1})P(B_{1})+ P(A|B_{2})P(B_{2})+…\] \[P(C|A)=P(B_{1}|A)+ P(B_{2}|A)+…\] Particularly, \[1=P(B|A)+P(\sim B|A)\] <br /><hr /><h4>De Morgan’s Laws</h4><div><br /></div>\[P(\sim A \cap \sim B)+P(A \cup B)=1\]\[P(\sim A \cup \sim B)+P(A \cap B)=1\] <br /><hr /><br /><h4>Bayes’ theorem</h4><div><br 
/></div>Let \(H_{1}, H_{2}, …\) be a partition of \(\Omega\). Then: \[P(H_{m}|E)=\frac{P(H_{m}) P(E|H_{m})}{P(H_{1})P(E|H_{1})+ P(H_{2})P(E|H_{2})+…}\] This is typically applied when choosing a hypothesis to explain a certain fact or set of evidence. \(P(E|H_{m})\) is the (epistemic) probability that we would get evidence \(E\) supposing hypothesis \(H_{m}\) is true, and \(P(H_{m}|E)\) is the (epistemic) probability that hypothesis \(H_{m}\) is true, given evidence \(E\). Thus, hypothesis \(H_{m}\) becomes more likely on evidence \(E\) the more probable it is without the evidence, the more likely the evidence would be on that hypothesis, the less likely the evidence would be on alternate hypotheses, and the less likely the alternate hypotheses are without the evidence. <br /><br /><hr /><br /><h4>Probability of a Union</h4><div><br /></div>We can here give a useful formula for determining the probability of the union of events, which we can deduce from De Morgan’s laws. Suppose we want to find \(Q=P(A_{1} \cup A_{2} \cup …)\). <br />We first expand the formal product \(1-\prod_{n}(1-A_{n})\), treating the \(A_{n}\) as symbols. <br />We then replace every occurrence of \(A_{m_{1}}A_{m_{2}}…\) with \(P(A_{m_{1}} \cap A_{m_{2}} \cap …) \). For instance, to find \(P(A \cup B \cup C) \), we expand \(1-(1-A)(1-B)(1-C)=A+B+C-AB-AC-BC+ABC\) <br />and then make replacements as described to get \[P(A \cup B \cup C)=P(A)+P(B)+P(C)-P(A \cap B) -P(A \cap C) -P(B \cap C)+ P(A \cap B \cap C) \] If the events are all independent, the formula simplifies to: \[Q=1-\prod \nolimits_{n}(1-P(A_{n}))\] If \(P(A_{m})=p\) for all \(m\), we can further simplify: \[Q=1-(1-p)^{N}\] where \(N\) is the number of events. 
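As an illustrative check (not from the original post), the independent-events formula \(Q=1-(1-p)^{N}\) can be compared numerically against the inclusion-exclusion expansion for three events:

```python
# Probability that at least one of N independent events occurs,
# each with probability p: Q = 1 - (1 - p)^N.
def union_prob(p, n):
    return 1 - (1 - p) ** n

# Inclusion-exclusion for three independent events A, B, C, each of
# probability p: P(A∪B∪C) = 3p - 3p^2 + p^3, since every intersection
# of k of the events has probability p^k by independence.
p = 0.3
by_formula = union_prob(p, 3)
by_inclusion_exclusion = 3 * p - 3 * p ** 2 + p ** 3
print(abs(by_formula - by_inclusion_exclusion) < 1e-12)  # True
```

Both routes give the same number, as they must: the product formula is just the fully collapsed inclusion-exclusion sum.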
For \(p\) fairly small, we can approximate this as \[Q \approx 1-e^{-pN}\] And from this we can rearrange to get \[N \approx \frac{-\ln(1-Q)}{p}\] This gives the number of independent trials necessary to get a probability \(Q\) of at least one success, if the probability of success in each trial is \(p\). <br />As an application, we can ask what probability an event must have of happening on a given day if it has a 50% probability of happening at least once in a year. In this case, we want to solve for \(p\) given \(Q=0.5\) and \(N=365\). We find that \(p \approx \frac{-\ln(1-Q)}{N}=0.19\%\). <br /><br />Using this, we can also show that improbable events are likely in a collection of many trials. Suppose we have \(N\) trials, in each of which X happens with probability \(p\). The probability that X never happens is then given by \((1-p)^{N} \approx e^{-pN}\). We thus see that, as \(N\) increases, the probability of no X occurring tends to zero; in fact, it tends to zero exponentially. Thus, given enough trials we would expect to see the individually improbable: long strings of all heads while flipping a coin, the same person winning the lottery multiple times, someone having two unrelated rare diseases, etc. Coincidences will always crop up given enough opportunities. These coincidences, combined with <b>confirmation bias</b>--remembering the hits and forgetting the misses--result in muddled thinking. A coincidence happens and is interpreted as a sign from on high, while the hundreds of other times no coincidence happened are ignored. It is important to remember that coincidences are basically inevitable in large enough samples: if something has a one in a billion chance of happening to any given person on any given day, we can expect it to happen about seven times per day worldwide. <br /><br /><hr /><br /><h4>Implication and Conditional Probability</h4><div><br /></div>We can also prove the following interesting theorem. 
Note that “if \(A\), then \(B\)” or “\(A\rightarrow B\)” is logically equivalent to “\( \sim A \) or \(B\)”, i.e. “\( \sim A \cup B\)”. Thus \(P(A \rightarrow B)=P(\sim A \cup B)\). We then have <br />\(1.\;\; 1 \geq P(A)\) <br />\(2.\;\; P(\sim B|A) \geq P(\sim B|A)P(A)=P(\sim B \cap A)\) <br />\(3.\;\; 1-P(\sim B|A) \leq 1-P(A \cap \sim B)\) <br />\(4.\;\; P(B|A) \leq P(\sim A \cup B)=P(A \rightarrow B)\) <br />That is, the probability of “if \(A\) then \(B\)” is not less than that of “\(B\), given \(A\)”. <br /><br />We can also note that, as \(P(A)=P(A|B)P(B)+P(A|\sim B)(1-P(B))\), \[\min(P(A|B),P(A|\sim B)) \leq P(A) \leq \max(P(A|B),P(A|\sim B))\] <br /><hr /><br /><h4>Conditional Changes in Probability and How it Relates to Evidence</h4><div><br /></div>We can demonstrate the following: <br />Suppose \(P(A|B)>P(A)\). Then \(P(A \cap B)>P(A)P(B)\) and \(P(B|A)>P(B)\); in fact, all three conditions are equivalent. In that case: <br />\( 1.\;\; P(A \cap B)> P(A)P(B) \) <br />\( 2.\;\; P(A) - P(A \cap B)< P(A)- P(A)P(B)=P(A)(1-P(B)) \) <br />\( 3.\;\; P(A \cap \sim B) < P(A)P(\sim B) \) <br />\( 4.\;\; P(A | \sim B) < P(A) \) <br />The reverse case, with all the inequalities flipped, can be proved in the same way. <br />In English: "if A is more probable on B, B is more probable on A" and "if A is more probable on B, A is less probable on not-B". <br /><br />An important consequence of this theorem concerns discerning what counts as evidence. In a loose sense, we can say that \(A\) provides evidence for \(B\) if \(P(B|A)>P(B)\). We thus see that a necessary and sufficient condition for \(A\) providing evidence for \(B\) is that \(\sim A\) provide evidence <u>against</u> \(B\). Thus, if we do some experiment to test a claim, we must be willing to accept failure as evidence against the claim if we would be willing to accept success as evidence for the claim, and vice versa. 
We must be willing to accept the possibility of weakening the claim if we are willing to accept the possibility of strengthening it by some test. It is often said that "absence of evidence is not evidence of absence", but this needs some qualification. Suppose we want to test the claim that there is life on Mars. We then do some test, like looking at a sample of Martian soil under a microscope, and it comes up negative: is that evidence against life on Mars? Certainly, albeit very weak evidence. If we had found microbes in that sample, we would certainly have said that was evidence for life on Mars; therefore we must necessarily admit that the lack of microbes is evidence against life on Mars. It may only reduce the (epistemic) probability that there is life on Mars by something like a millionth of a percent, but if we do a million tests, that amounts to about a whole percent. If we do a hundred million tests, that amounts to over 60%.<br /><br />In short, <b>absence of evidence does count as evidence of absence in any and every instance where a presence of evidence would count as evidence of presence</b>. <br /><br /><hr /><br /><h4>"Extraordinary Claims Require Extraordinary Evidence"</h4><br />This phrase is declared nearly as often as it is denounced. However, it is clearly not specific enough to be definitively evaluated. One way of interpreting it is to say "initially improbable hypotheses require improbable evidence to make them probable". This formulation is relatively easy to demonstrate as true: \[ P(E)=\frac{P(H \cap E)}{P(H|E)} \leq \frac{P(H)}{P(H|E)} \] For example, if \(P(H)=1 \%\) and \(P(H|E)=75 \%\) then \(P(E) \leq 1.33 \%\). <br />If \(P(H|E) \geq 0.5\), then \(P(E) \leq 2 P(H)\). Thus, it is clear that the evidence required to make an initially improbable hypothesis probable must be comparably improbable. 
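The figures in the two passages above can be verified directly. This is an illustrative sketch (not from the original post); the per-test reduction of one millionth of a percent and the test counts are the ones assumed in the text:

```python
# Many weak negative tests compound. If each negative test multiplies
# the probability of life on Mars by (1 - d), with d equal to a
# millionth of a percent (1e-8), the total reduction after n tests is
# 1 - (1 - d)^n, approximately 1 - e^(-d*n) for small d.
d = 1e-8
print(1 - (1 - d) ** 10**6)   # about 0.01: a whole percent
print(1 - (1 - d) ** 10**8)   # about 0.63: over 60%

# The extraordinary-claims bound: P(E) <= P(H) / P(H|E).
p_h, p_h_given_e = 0.01, 0.75
print(p_h / p_h_given_e)      # about 0.0133, i.e. P(E) <= 1.33%
```

Treating each reduction multiplicatively mirrors the \((1-p)^{N}\) argument from the earlier section on many trials.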
<br /><br /><br /><hr /><br /><h4>Inscrutable Probabilities, Meta-probabilities and Widening Epistemic Probability</h4><br />Sometimes we cannot estimate a probability, either at all or to an adequate degree. We call such probabilities <b>inscrutable</b>. For all we know, these probabilities could have any value. We can use the concept of inscrutable probabilities to improve the descriptive accuracy of our epistemic probability judgements. For instance, suppose we have a die, and we are \(90\%\) sure that it is fair. We then want to find the probability that a six will be rolled. We make use of the formula: \[P(A)=P(A|B)P(B)+P(A|\sim B)P(\sim B)\] In this case, A is the event "a six is rolled" and B is the event "the die is fair". Here \(P(A|\sim B)\) is inscrutable: given that the die is not fair, we cannot predict what the outcome will be. However, we do know that this probability, like all probabilities, is between zero and one. Thus: \[P(A|B)P(B) \le P(A) \le P(A|B)P(B)+P(\sim B)\] In this case, we find \[P(6|\text{fair})P(\text{fair}) \le P(6) \le P(6|\text{fair})P(\text{fair})+P(\sim \text{fair})\] \[\frac{1}{6} \cdot 0.9 \le P(6) \le \frac{1}{6} \cdot 0.9 + 0.1\] \[0.15 \le P(6) \le 0.25\] Here we may introduce the concept of <b>meta-probabilities</b>. These take the form of the probability that something is true about a probability; for instance, \(P(P(X) \ge \alpha)\) is the probability that the probability of X is not less than \(\alpha\). Returning to our example, suppose we are only \(80\%\) confident that the probability that the die is fair is \(90\%\). Applying the above formula: \[P(\text{fair}|P(\text{fair})=0.9)P(P(\text{fair})=0.9) \le P(\text{fair}) \le P(\text{fair}|P(\text{fair})=0.9)P(P(\text{fair})=0.9)+P(P(\text{fair}) \neq 0.9)\] \[0.9 \cdot 0.8 \le P(\text{fair}) \le 0.9 \cdot 0.8+0.2\] \[0.72 \le P(\text{fair}) \le 0.92\] This then implies \(0.08 \le P(\sim \text{fair}) \le 0.28\). 
<br />Returning to our former equation, we then have: \[P(6|\text{fair})P(\text{fair}) \le P(6) \le P(6|\text{fair})P(\text{fair})+P(\sim \text{fair})\] \[\frac{1}{6} \cdot 0.72 \le P(6) \le \frac{1}{6} \cdot 0.92 + 0.28\] \[0.12 \le P(6) \le 0.433...\] We thus see that adding our meta-probabilistic uncertainty in the estimate of \(P(\text{fair})\) has further widened our uncertainty in the likelihood of rolling a six. This highlights the importance of both accounting for and minimizing any potential sources of uncertainty. We must factor in our confidence in a model when assessing the results it predicts to be likely or unlikely, if we are to use that model to form our epistemic probabilities of the predicted results. <br /><br /><hr /><br /><a id="almost">*</a> \(P(X)=0\) does not mean that X cannot and will never happen. If you roll a marble, the chance of it landing on any given point is zero (ideally), and yet it will land on some point. What \(P(X)=0\) means is specific: it means that the measure of the space in which X holds is zero relative to the measure of \(\Omega\). There may still be a possibility that X happens; it is just that the region in which X happens is of zero "area" compared to \(\Omega\) (e.g. it is a point, and \(\Omega\) is a line segment). 
If \(P(X)=0\) we say that X will <a href="http://en.wikipedia.org/wiki/Almost_surely">almost surely</a> not happen, as opposed to \(\varnothing\), which will surely not happen. <br /><br /><hr /><br /><h2>Language as a Tool for Contemplation and Communication</h2><i>Nadav, 2014-03-18</i><div><br /><table style="width: 400px; border-collapse:collapse; border:1px solid black;"><colgroup><col width="50"></col><col width="350"></col></colgroup><tbody><tr> <th colspan="2" style="border-collapse:collapse; border:1px solid black;"><h3>Table of Contents</h3></th></tr><tr> <td style="border-collapse:collapse; border:1px solid black;">I</td> <td style="border-collapse:collapse; border:1px solid black;"><a href="#sI" style="display: block; height: 100%; text-decoration: none; width: 100%;"> Language Development and Conventions</a></td> </tr><tr> <td style="border-collapse:collapse; border:1px solid black;">II</td> <td style="border-collapse:collapse; border:1px solid black;"><a href="#sII" style="display: block; height: 100%; text-decoration: none; width: 100%;"> Communication</a></td> </tr><tr> <td style="border-collapse:collapse; border:1px solid black;">III</td> <td style="border-collapse:collapse; border:1px solid black;"><a href="#sIII" style="display: block; height: 100%; text-decoration: none; width: 100%;"> Errors and Obstacles to Communication</a></td> </tr><tr> <td style="border-collapse:collapse; border:1px solid black;">IV</td> <td style="border-collapse:collapse; border:1px solid black;"><a href="#sIV" style="display: block; height: 100%; text-decoration: none; width: 100%;"> Priming</a></td> </tr><tr> <td style="border-collapse:collapse; border:1px solid black;">V</td> <td style="border-collapse:collapse; border:1px solid black;"><a href="#sV" style="display: block; height: 100%; text-decoration: none; width: 100%;"> 
Problems with Coreferential Communication</a></td> </tr><tr> <td style="border-collapse:collapse; border:1px solid black;">VI</td> <td style="border-collapse:collapse; border:1px solid black;"><a href="#sVI" style="display: block; height: 100%; text-decoration: none; width: 100%;"> Definitions</a></td> </tr><tr> <td style="border-collapse:collapse; border:1px solid black;">VII</td> <td style="border-collapse:collapse; border:1px solid black;"><a href="#sVII" style="display: block; height: 100%; text-decoration: none; width: 100%;"> Problems with Poorly Primed Communication</a></td> </tr><tr> <td style="border-collapse:collapse; border:1px solid black;">VIII</td> <td style="border-collapse:collapse; border:1px solid black;"><a href="#sVIII" style="display: block; height: 100%; text-decoration: none; width: 100%;"> Relevance and Applications to Discussion</a></td> </tr><tr> <td style="border-collapse:collapse; border:1px solid black;">IX</td> <td style="border-collapse:collapse; border:1px solid black;"><a href="#sIX" style="display: block; height: 100%; text-decoration: none; width: 100%;"> Glossary</a></td> </tr></tbody></table></div><br /><br /><h3><span style="font-family: Times, Times New Roman, serif;"><a id="sI">I. Language Development and Conventions</a></span></h3><br /><span style="font-family: Times, Times New Roman, serif;">You awake one day in a new and strange place. You have no memories of how you got there, indeed, few or no memories. All you can tell is that you are having certain experiences: experiences that do not yet make much sense to you. You feel certain instinctive urges. You are confused and scared and you do not understand what is happening. <br /><br />You are an infant. <br /><br />We have all gone through something like this in the earliest stages of our lives. Our life begins as a submersion into the new and unfamiliar. 
What we are forced to do, to make sense of all of our experiences, is something like the following: from our experiences, we form perceptions; from these perceptions, we get impressions; and from these impressions, we form concepts. As an example, from a certain visual experience, I form an image (of a tree), the perception; from this image, I get certain impressions (it has a certain texture and color, it has various dimensions, some parts seem farther away than others, etc.). Finally, from all these impressions, I form the concept (of a tree standing before me). <br /><br />Once we have concepts, what we need is some way to understand them, some way to handle them so as to get something from them. We thus embark on a project of organization and systematization. What we want, or what would be useful, is a way to categorize, generalize, simplify and synthesize these concepts into something meaningful. <br /><br />This system is the seed of a language, which we use to handle the concepts we have obtained, and to derive new concepts. Were we never introduced to formal, communicative languages (were we, for instance, stranded from birth on a desert island, what philosophers call a <b><a href="#g1">lifelong Crusoe</a></b>), we might still develop a rudimentary language for our own use. <br /><br />Shakespeare, through Juliet, famously said: “What’s in a name? that which we call a rose/ By any other name would smell as sweet”. We can take this to mean that a name is an arbitrary label we attach to a thing. The thing a name refers to is some particular set or class of objects, descriptions or designations: whatever its name, the thing remains constant, as the name itself is arbitrary and variable. In some sense, the name is almost abstract. <br /><br />Every language has a system by which it attaches names to various concepts, and these names constitute the units of the language: the words. 
We call this system of naming the <b><a href="#g4">nominal convention</a></b> of the language, which comprises a large part, if not most, of it. Let us call the entirety of all the concepts a person knows his <b><a href="#g3">concipulary</a></b>, as contrasted with and similar to a <b><a href="#g2">vocabulary</a></b>. A nominal convention is then a two-way mapping from the concipulary to a vocabulary: this concept of a tree corresponds to the word “tree”, and vice versa. However, not every concept might be mapped to a name, some concepts might be mapped to more than one name, and some names to more than one concept. It is then clear that for a nominal convention to be fully grasped, the user must have a certain concipulary as well, from which and to which to map. This instantiation of the requisite concipulary is one of the important aspects of learning a language, and of learning in general. <br /><br />Using a nominal convention has a great number of advantages and uses. It permits us to identify something and refer back to it: we label it, and can then keep it for later consideration. The name itself can serve as a mnemonic, as with onomatopoetic names. Names also allow us easily to refer to and handle various concepts, consider how they relate to each other, construct new ideas, and, importantly, open up the possibility of communication. <br /><br />The rest of a language can be loosely broken up into two more components: a syntactical convention and an expressive/social convention. The <b><a href="#g5">syntactical convention</a></b> delineates how the concepts are to be arranged into propositions, including grammar and punctuation. It describes how the various names and designations can be put together into a complete idea. These units of ideas, composed of concepts, are sentences or propositions. The syntactical convention is then the system by which words are formed into sentences, or by which concepts are formed into propositions. 
Note that a given nominal convention can have multiple syntactical conventions: for example, we can consider standard English and <a href="http://itre.cis.upenn.edu/~myl/languagelog/archives/002173.html">Yoda English</a> (“John is a boy” vs. “A boy John is”). <br /><br />Lastly, the <b><a href="#g6">expressive/social convention</a></b> encompasses the various ways of expressing or conveying the ideas. This also covers the various social norms expected in communication. For instance, this would include spelling and pronunciation, tone and accent, but also social aspects, such as context-dependent vocabularies, honorifics, euphemisms, etc. Punctuation falls somewhere between this convention and the syntactical convention. This convention is an important component of language, as it constitutes the system whereby the language is applied: it establishes the language as a communal effort. In addition, it establishes the language in the context of a culture, in which language has always played an important role. <br /><br />Note that the various conventions have different roles in conveying meaning. The nominal convention specifies the referents in the communication, that is, what is being talked about. The syntactical convention specifies the content of the communication, that is, what relation or information is being conveyed. The expressive/social convention specifies how it is being communicated: what the context is and the means by which communication is taking place. </span><br /><br /><h3><span style="font-family: Times, Times New Roman, serif;"><a id="sII">II. Communication</a></span></h3><br /><span style="font-family: Times, Times New Roman, serif;">Communication is one of the main purposes of a language: language is the means by which the concepts in one head can appear in another head, so to speak. We will call the “speaking” party the sender and the “listening” party the recipient. 
There are essentially three components in communication: composition, transmission, and interpretation. <b><a href="#g7">Composition</a></b> involves the translation of concepts by the sender into a message via a language. <b><a href="#g8">Transmission</a></b> involves moving the message from the sender to the recipient. <b><a href="#g9">Interpretation</a></b> involves the recipient translating the message back into concepts using a language. Note that the interpretation we speak of is only first level; it is only the translation of words into concepts and sentences into propositions and ideas, not the full extraction of implications. (For instance, “John washed his hands before lunch” could be interpreted to mean that John takes care to have sterile hands before having a meal, or it could be interpreted to mean that John had dirty hands sometime before lunch, or that John wanted to get something incriminating off his hands before a social occasion, etc. All the interpretation stage pertains to is the literal, first-level meaning of the sentence.) All of these steps are necessary and are found in any type of communication we recognize. <br /><br />It is thus key that both communicators have a mutual agreement on the nominal, syntactical and expressive/social conventions being used. If the nominal convention is not fixed, the recipient may get a message in characters she recognizes, with a syntax she can comprehend, but full of words that do not mean anything to her. If the syntactical convention is not fixed, the recipient may get a message with words she understands, in characters she recognizes, but in a structure she can’t understand, or one that is ambiguous (“Tom saw Joe” means something different than “Joe saw Tom”). If the expressive/social convention is not fixed, the recipient may get a message in a way she can’t understand (mispronounced, written in strange characters, using obscure jargon, etc.). 
<br /><br />Unspecified conventions thus lead mainly to difficulty for the recipient. If, however, the recipient is versatile enough, she may be able to infer some of the conventions (a new word, an unusual sentence structure, a new euphemism or honorific, etc.), but some background knowledge is essential, and a thorough knowledge of all the conventions makes communication much easier. We may here differentiate between <b><a href="#g10">coreferential communication</a></b> and <b><a href="#g11">areferential communication</a></b>. The former is distinguished from the latter by the use of a common reference or experience. For instance, in coreferential communication, the sender can point something out with just a demonstrative (“that thing in the sky”, “that sound”), while in areferential communication, whatever is specified must be so specified using only language (“the bright, yellow orb in the sky”, “the harsh, ringing clang”). One of the goals of language is to move away from coreferential communication toward areferential communication. We will return to this later. <br /><br /></span><span style="font-family: Times, Times New Roman, serif;"><h3><a id="sIII">III. Errors and Obstacles to Communication</a></h3><br />In attempts to communicate and achieve mutual understanding, many problems can arise, both in language and in communication. In a previous section, we divided communication into three stages: composition, transmission, and interpretation. Errors can occur at every stage, but there are problems even with the very idea of communication. <br /><br />The trouble with communication is that before communication can even take place, there has to be some base understanding of what will be communicated and how. For instance, before I tell someone something in English, there must be an implicit understanding that the communication will be done using spoken English (I can’t walk up to a Russian and begin speaking English, assuming that he will understand). 
Before I use a word, I have to assume that the word will be understood; if the recipient does not understand it, she may be able to infer the meaning from context, or else I will have to provide an explanation. This requisite preceding step we will call <b><a href="#g12">priming</a></b>; it poses some serious difficulties for communication, as it seems impossible to prime the conversation without communication. Since priming precedes communication, it seems priming must precede itself. As we will discuss later, priming is one of the main difficulties of communication. For now, we will assume priming has taken place. <br /><br />An error in composition is one of the more trivial ones. It amounts to an incompetence on the part of the sender, in which the sender has either a malformed thought or has mistranslated that thought into language. A similar error can occur in interpretation due to incompetent translation of the message into concepts. This error is easily dealt with through diligence and care. We will thus henceforth assume both communicators are competent once primed. <br /><br />An error in transmission is simply where the message cannot get from sender to recipient, either wholly or in part. It amounts to nothing more than an engineering problem: setting up an adequate system for relaying the message, intact, from sender to recipient. An error in this case would be a downed telegraph line, wax in one’s ear, a crummy postal service: something at least in principle correctable. There may be other, complex cases, but these are mere physical limitations. The problem is either physically irresolvable, which would preclude communication altogether, or it is resolvable, in which case a suitable apparatus would be sufficient. <br /><br />Lastly, if priming is successful, the only thing that can impede interpretation is incompetence, and thus we can move straight to priming. Clearly, a proper priming will prevent any misunderstandings in the various conventions. 
But how can we prime at all? It seems that to prime, we need to communicate something, and to communicate anything, we first need to prime. So communication seems impossible, unless there is some sort of priming that comes standard, and allows further communication to be built up from it. As we shall see, there is indeed such a priming. <br /><br /></span><span style="font-family: Times, Times New Roman, serif;"><h3><a id="sIV">IV. Priming</a></h3><br />As noted above, priming is the step, required before any communication, of establishing the various conventions that will be used. In effect, priming is the declaration of the language in which the rest of communication will be conducted. But how can we do this at all? I can’t say to someone who doesn’t understand English “this conversation will be conducted in English”, as the recipient won’t even understand this establishing message: in order to understand the message, the recipient must already know that we are communicating in English, and, moreover, have an adequate knowledge of the English language. <br /><br />Communication from the ground up thus seems impossible, unless there is some sort of priming that comes “built-in”, and allows further communication to be built up from it. As it happens, there is such a priming: it is the sort of fundamental understanding we would have of certain sorts of communication, which we shall call the <b><a href="#g13">natural priming</a></b>. For instance, when someone points to something (or holds up an object, or draws a picture of something) and articulates a sound, we take that to mean that they are referring to the thing by the name given by the sound. Or, if the sender points to a certain sort of action, or imitates it, we can understand the name to refer to the action. 
The nominal convention here is one of gestures (gesturing in such and such a way means this), and seems innate: children and even animals seem to have such an understanding. The syntactical convention consists merely in connecting the uttered noise or attached characters to the gesture. <br /><br />The basis of natural priming is repetition and induction. Given that the sender does something repeatedly, the recipient infers that such an action means a certain thing. This is a sort of Pavlovian response: the sender repeatedly says “tree” while pointing at a tree, and the recipient learns that, when the sender says “tree”, he is still referring to a physical tree, though he does not point to it. In essence, this is the installation of a certain nominal convention, whereby a meaning (the natural understanding of what is meant by pointing at a tree) is attached to a word (the association of the word/sound “tree” with the pointing at a tree). <br /><br />Thus, clearly, this sort of communication is wholly coreferential: there must be some shared experience for establishing this sort of understanding. Moreover, this sort of priming is mostly if not only of the nominal convention, and is restricted to particular instances. For example, I can point to an object and say “tree”, but the recipient cannot understand that I mean “this, and everything like it in certain respects, is called ‘tree’,” but only “this particular thing is called ‘tree’.” We can perhaps ameliorate this by giving several different instances and using the same word to describe them all, thereby setting the stage for a generalization, but we cannot directly impart this generalization from examples. The generalization must be made by the recipient. Once a nominal convention basis has been established, the other conventions can be established fairly easily, either by imitation or instruction. 
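The repetition-and-induction mechanism just described can be sketched in a few lines of Python. This is a toy model: the episodes, names and referents below are invented for illustration, and real natural priming is of course far less tidy.

```python
from collections import Counter, defaultdict

def learn_convention(episodes):
    """Infer a nominal convention from repeated (utterance, referent)
    pairings: for each name, adopt the referent it most often
    co-occurs with. Note the limitation discussed above: this yields
    only particular pairings; any generalization ("this, and everything
    like it, is called 'tree'") is still left to the recipient."""
    counts = defaultdict(Counter)
    for name, referent in episodes:
        counts[name][referent] += 1
    return {name: seen.most_common(1)[0][0] for name, seen in counts.items()}

# The sender repeatedly points and speaks; the recipient tallies.
episodes = [("tree", "the oak"), ("tree", "the oak"),
            ("tree", "the elm"), ("rock", "the boulder")]
convention = learn_convention(episodes)
# convention == {"tree": "the oak", "rock": "the boulder"}
```

Note that the recipient here learns only that “tree” names the particular things most often pointed at, exactly the restriction to particular instances described above.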
<br /><br />We can also establish a <b><a href="#g14">primitive basis</a></b>, that is, a basis of concepts and their nominal designations derived from experience alone that do not themselves depend on other concepts. From this primitive basis we can construct other concepts and attach other names to them. This is the way language and its understanding develop, and it will feature importantly in our discussion of definitions. <br /><br />Thus, as far as priming goes, if we have a reliable way to engage in coreferential communication, using nothing but natural priming, we can then build communication up to be quite widely useful. The goal is to be able to communicate areferentially, such that the sender and receiver can communicate with language alone. However, as we will see, even coreferential communication faces some problems. <br /><br /></span><span style="font-family: Times, Times New Roman, serif;"><h3><a id="sV">V. Problems with Coreferential Communication</a></h3><br />One would think that this coreferential communication based solely on natural priming would be a sound foundation on which to begin establishing communication. We convey to the recipient the names of certain objects or actions by gestures, imitation or depiction, and thence proceed to generalizations. This can be used to generate the primitive basis, from which higher levels of complexity and abstraction can be developed, leading to all concepts and all language. Everything seems to fall into place. <br /><br />However, let us take a specific example: the sender points to a tree, says “tree”, and the recipient from this learns that the thing pointed to is called a tree. But all this means is that the concept formed of the thing experienced has the same name according to both. How can we be sure that both experiences are the same and that both concepts are the same? 
It certainly seems intuitive to think that they would be, and, lacking any reason to think otherwise, it seems reasonable to think they are, but how can we be sure? This is the classic problem of <a href="http://plato.stanford.edu/entries/qualia/">qualia</a>, the individual instances of subjective experience. Two men can both agree that a thing is green (they both agree to call it that), but how can the one man be sure that the other man’s green is identical to his own? Perhaps what is red through the eyes of one is green through the eyes of the other, though the two both agree to give it the same name. Though it may in theory be possible to transmit or check qualia directly (some sort of device to transmit qualia as-is from one mind to another), doing so remains a far cry in practice. This difficulty, which rarely causes problems, is thus correctable in theory, though intractable in practice. <br /><br />But there is also trouble in the fact that each individual must (whether by necessity or practicality), or at least often does, make her own generalizations or categorizations. A generalization or categorization is the procedure of taking a group of things all called by the same name and extracting common features so that new things can be included. This entails equating the categorization to a list of necessary and sufficient criteria, whether vague or specific. This set of criteria is used as a definition of what it means to be in that group, and will be very important in our later discussion of definitions. <br /><br />However, often there is no specific set of criteria that is transmitted, or even one that has widespread acceptance. This can leave even an individual’s own understanding of a category vague: based on unknown or uncertain criteria. This often makes it difficult to discern the boundaries of a category (the necessary or sufficient conditions to qualify for inclusion). 
An example of this is the famous <a href="http://plato.stanford.edu/entries/sorites-paradox/">sorites problem</a> of determining what a heap is. Namely, it asks how many grains make up the smallest heap. Other cases are the height of the tallest “short” man, or the hairiness of the hairiest “bald” man. This vagueness also leads to many disputes over border cases, in which the vagueness of common understanding allows for multiple conceptualizations that cover the same typical cases, but differ with respect to bordering cases. This shows why priming is so important, and also why it can be so difficult. <br /><br />Some coreferential communication can be quite reliable, in that we can fairly accurately check some concepts. It seems mathematical or geometric concepts can generally be well checked, owing to the exactness of what they correspond to and the precision that must hold for the many theorems that result from their application. In particular, numbers seem to be perfectly communicable (or at least nearly so), and thus anything that can be quantified can be perfectly well communicated as well. This plays into the importance of quantification in specificity, exactitude and understanding. But nevertheless there are, it seems, many qualitative properties that cannot be exactly communicated. </span><span style="font-family: Times, Times New Roman, serif;"><br /><br /><h3><a id="sVI">VI. Definitions</a></h3><br />As we have been discussing, prior to any communication, a stage of priming must take place, wherein the stage is set for communication. This involves specifying the various conventions (nominal, syntactical, expressive/social) such that the message sent by the sender can be properly understood by the recipient. A very important part of priming is the introduction of definitions, that is, explanations in other terms of what a word or term means, what it will be used to mean, or how it will be used. 
We will call the term being defined the <b><a href="#g21">definiendum</a></b>, and the collection of other terms used to define it the <b><a href="#g22">definiens</a></b>. The terms used in the definiens we will call <b><a href="#g25">subterms</a></b> (note how even now I am priming the discussion by introducing definitions!). There are various sorts of definitions that we will discuss, but to begin, we will introduce a basic theory of definitions. <br /><br />For a definition to be useful, all the subterms in the definiens must be understood: if we know the subterms, we can know the new term, given the definition. Thus, for the definition to be useful, its subterms cannot include the definiendum (as then we would need to know the term before we could understand the term, which is obviously not possible). Sometimes the subterms themselves will need to be defined, and so on. But clearly this cannot go on forever: there must be some point of understanding at which we can stop. If not, then either this presents a vicious infinite regress, wherein to understand a term we first need to understand another term, and so on ad infinitum, or it becomes circular, wherein we need to understand A to understand B, B to understand C, and C to understand A. <br /><br />Thus, there must be terms that do not require other terms to be understood. These are called <b><a href="#g32">primitives or primitive notions</a></b>, and serve as the axioms of the definition system (examples typically given are color, being, number, and duration). Typically, however, it is unnecessary to reduce all definitions to the level of primitives, and so we merely must reduce the definitions to the level of acceptability: the point at which the meaning of the terms is mutually accepted as understood. 
It is possible that two conversers could hold some mutual level of acceptability and then a third comes along with a lower level, in which case definitions would need to be added to get everyone on the same page (this often happens when a new person is introduced into a technical field). As a whole, our terminology forms, as it were, a pedigree, with each of the various terms tracing its lineage back to some combination of primitives. <br /><br />Let us now distinguish between various sorts of definitions. The classic distinction is between intensive and extensive definitions. An <b><a href="#g15">intensive definition</a></b> gives necessary and sufficient conditions: what, categorically, would include something under the term. For instance, we can define a “square” to be “a closed planar figure of four congruent straight line segments and all angles equal”. In contrast, rather than provide the conditions for inclusion and exclusion, an <b><a href="#g16">extensive definition</a></b> gives an exhaustive list of all the things in the specified category. For example, we can define a continent to be any one of the landmasses Africa, Asia, Europe, Antarctica, Australia, North America, or South America. An intensive definition is generally preferred, as it allows us to understand what it means to be in a category, and is in no way contingent upon extant examples. A third kind of definition is called an <b><a href="#g17">ostensive definition</a></b>, which is formed through coreferential communication. Essentially, it is what we have already described: we gesture toward an object and say a name, and thereby define that name by that object (or name that object by that word). <br /><br />We may also distinguish between definitions with different functions. The first is a <b><a href="#g18">stipulative definition</a></b>, in which a term is applied to a novel concept as a way to refer to it. 
This is typical in such fields as mathematics and philosophy (“stipulative definition”, for instance, is here given a stipulative definition). Another sort is a <b><a href="#g19">descriptive definition</a></b>, which attempts to clarify or specify a common term that does not have a robust definition. This sort of definition can be disproven by showing that there is something the definition includes or excludes that the common notion does not. A <b><a href="#g20">precising definition</a></b> is something of an intersection of the two, in which we take a subset of what is commonly considered part of the meaning of a term and specify that we will use the term to mean that subset. For instance, though “ball” can often be used to describe many sorts of round objects (e.g. footballs), we can say that, in our discussion, we will only use it to refer to the spherical variety. <br /><br />The general form of a definition is a <b><a href="#g23">genus</a></b> with <b><a href="#g24">differentia</a></b>, that is, the class to which the thing belongs and what distinguishes it within that class. For example, we earlier defined a square as “a closed planar figure of four congruent straight line segments and all angles equal”. In this case, the genus is closed planar figures, and the differentia are that it has four congruent straight sides and all angles equal. We can also define terms as the intersection of genera. For instance, we can define squares as the intersection of regular polygons and quadrilaterals, or as the intersection of rectangles and rhombi, etc. (this is still basically using the genus/differentia approach, as the differens in this case would be inclusion in the other genus). In this way, we can keep generalizing, using classes that take in more and more elements, but we will somewhere need to stop, and this is arguably the point at which basic concepts come in (such as “entity” or “class”, etc.). 
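The pedigree picture above lends itself to a small sketch. Here definitions are modeled as a mapping from each definiendum to the subterms of its definiens, with primitives simply accepted as understood; the toy lexicon and the choice of primitives are illustrative assumptions, not a serious analysis. A recursive check then verifies that every term traces back to primitives without circularity:

```python
# Toy lexicon: each term maps to the subterms of its definiens.
# "square" is here defined by intersection of genera, as in the text.
DEFINITIONS = {
    "square": {"rectangle", "rhombus"},
    "rectangle": {"quadrilateral", "angle"},
    "rhombus": {"quadrilateral", "segment"},
    "quadrilateral": {"figure", "segment"},
}
PRIMITIVES = {"figure", "segment", "angle"}  # accepted without definition

def is_well_founded(term, seen=frozenset()):
    """True if the term's definitional pedigree bottoms out in
    primitives, with no circularity and no unexplained subterms."""
    if term in PRIMITIVES:
        return True
    if term in seen or term not in DEFINITIONS:
        return False  # circular, or an undefined non-primitive
    return all(is_well_founded(sub, seen | {term})
               for sub in DEFINITIONS[term])
```

On this lexicon, `is_well_founded("square")` holds, while a circular pair, say defining “heap” by “pile” and “pile” by “heap”, would fail the check, just as the regress argument predicts.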
<br /><br />In forming definitions, there are several precepts that are generally if not universally helpful. As we have discussed, whenever possible, it is best to define terms using simpler or more well-understood subterms. If the subterms are as complex or obscure as (or more so than) the definiendum, then the definition is clearly counterproductive, and doing so is said to be <b><a href="#g26">obscurum per obscurius</a></b> ([clarifying] the obscure through the more obscure). The definition should not be circular, either individually or globally. The definition should be intensive wherever possible (that is, it should delineate the essence of the definiendum). The definition should be adequate, in the sense of including all the things to which the term applies, and excluding all the things to which the term does not apply. It is typically preferable, where possible, not merely to recast a definition in terms of a synonym or the negation of an antonym. <br /><br />As communication and discussion hinge on mutual understanding, defining terms is always necessary (unless the terms fall into the “acceptable” category already discussed). If mutual understanding is the goal, definitions are indispensable. Thus, in discussion, we should always be able and willing to define our terms down to the acceptable level if needed, and we should preemptively define new, vague, or unusual terms, and communicate how they will be used. A failure to do so is a failure to prime the communication, and leaves communication gravely at risk. <br /><br /></span><span style="font-family: Times, Times New Roman, serif;"><h3><a id="sVII">VII. Problems with Poorly Primed Communication</a></h3><br />As we have seen, priming is an essential prerequisite for successful communication. Priming mainly involves, in practice, introducing the requisite definitions (mainly stipulative, descriptive and precising) so that both communicators are on the same page. 
But too often it happens that priming does not take place, or is inadequate (or even that it is impossible or impracticable), and this can lead to serious misunderstanding and miscommunication. <br /><br />As we have mentioned, an important part of priming is the fixing of the nominal convention that will be used, which is mainly done through definitions. Also as we have mentioned, the nominal convention is the two-way mapping between the concipulary (set of concepts) and the vocabulary (set of names). A definition is merely the introduction or specification of the correspondence of one concept to one name and vice versa, that is, a word (this may be a nonstandard usage, in that “word” typically refers to the name with potentially multiple meanings, but I will use it to mean a name-meaning pair. Thus “can” meaning a “cylindrical metal container” is a different word from “can” meaning “be able to”). <br /><br />Let us examine the case of one word used in communication: the sender has a concept, attaches to it a name, and sends this (via some medium) to the recipient, who takes the name and finds what concept it maps to in her vocabulary. All we will assume is that the two communicators may not be using the same nominal convention. Let us call the concept of the sender’s word the intended concept. We can distinguish five cases, not all of which are mutually exclusive: (1) the recipient connects the name to the intended concept; (2) the recipient connects the name to a different concept; (3) the recipient connects the name to no concept; (4) the recipient attaches a different name to the intended concept; (5) the recipient does not have the intended concept in her concipulary. <br /><br />Case (1) is clearly a case of successful communication, priming notwithstanding. <br /><br />Case (2) is a conceptual confusion, in which a name is mistaken to mean something other than the intended concept. This is a language glut: the same name is doing too much. 
We will call such a case a <b><a href="#g27">homonymopathy</a></b> (a mistaken use of the same name to mean something unintended: the name is a pathological homonym). <br /><br />Case (3) is a nominal failure, in which the recipient receives what is to her nonsense. Perhaps she then tries to assign it a concept by guessing, in which case it would become (1) or (2) if she guessed rightly or wrongly respectively. <br /><br />Case (4) is a nominal confusion, in which, had the sender used a different name, communication could have been successful. This may easily be combined with cases (2) or (3), but the point is that communication was unsuccessful because of a poor choice of words. This is a language gap: the same concept can be reached, but the two users are separated by different names for it (both an Englishman and a Frenchman know what water is, but “water” means nothing to one and “eau” means nothing to the other). We will call such a case a <b><a href="#g28">synonymopathy</a></b> (a mistaken attempt to communicate a concept by an unknown name: the two names are pathologically synonymous). <br /><br />Case (5) is merely a case of conceptual failure: priming and education must come first in order to get the recipient to apprehend the intended concept, much less recognize it by name. <br /><br />We thus see that the five cases amount to one success, two failures and two confusions. The success obviously needs no mending, and the failures are ameliorable, in that they serve to halt communication, not lead it astray. A synonymopathy, too, merely halts communication, so the major troublemaker is the homonymopathy. As I will later argue, I think homonymopathy is one of the greatest impediments to successful communication. <br /><br />As discussed, nominal and conceptual failures can be remedied by definitions and explanations respectively. But what about nominal and conceptual confusions? 
The mythical case of the tower of Babel in the Old Testament is a clear case of nominal confusion: in order to halt the progress of the construction of a tower, god says “let us go down, and there confound their language, that they may not understand one another's speech.” That is, formerly, they had well-synchronized nominal conventions, but god perturbs these conventions, so that, though they are referring to the same thing (the building of the tower), they cannot understand one another. Thus, we may say that a name for such a case of nominal confusion is a Babel, or a <b><a href="#g29">Babel phenomenon</a></b>. The case of conceptual confusion is somewhat similar, and so we will give it the inventive name of inverse Babel, or an <b><a href="#g30">inverse Babel phenomenon</a></b>. Again, an inverse Babel is a case in which the users use names that the other recognizes, but the meaning to one is different from the meaning to the other. For example, both an Englishman and a Frenchman recognize the word “pain”, but the meaning to one is not the meaning to the other: if one said “pain” to the other, the second might give the first a baguette when he wanted an aspirin. <br /><br />Before we address solutions to each, it is important to identify how we can recognize each sort of confusion. A Babel can be identified if two people point to the same things and say something different, or if the explanation of the term in the other's language amounts to “here, we call that…”. Another, simpler way is if we have some background knowledge: if you are an architect and you go to an architecture conference in a foreign country, you will expect the people there to be discussing architecture, even if you don’t know what their language means when you encounter them. Much more simply, if you believe the sender is competent, but can’t understand what he is sending you, there is probably a language gap (though there may also be a conceptual failure). 
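The five cases, and the water/pain examples, can be sketched by modeling each communicator's nominal convention as a name-to-concept mapping; the mini-vocabularies below are invented for illustration. Since cases (3) through (5) overlap, an unrecognized name (case 3) is subdivided here by whether the intended concept exists elsewhere in the recipient's concipulary:

```python
def classify(name, sender, recipient):
    """Classify what happens when the sender's `name` reaches a
    recipient who may hold a different nominal convention."""
    intended = sender[name]
    if recipient.get(name) == intended:
        return "success"            # (1) name -> intended concept
    if name in recipient:
        return "homonymopathy"      # (2) same name, different concept
    if intended in recipient.values():
        return "synonymopathy"      # (3)+(4) concept known by another name
    return "conceptual failure"     # (3)+(5) concept absent entirely

english = {"water": "H2O", "pain": "unpleasant sensation"}
french = {"eau": "H2O", "pain": "baked bread"}

classify("water", english, french)  # "synonymopathy": a Babel
classify("pain", english, french)   # "homonymopathy": an inverse Babel
```

The two calls at the end reproduce the examples in the text: the shared concept reachable only by different names is a Babel, while the shared name attached to different concepts is an inverse Babel.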
The ways to correct a Babel are well studied and well documented (they basically involve developing a method of translation, or choosing a standard nominal convention to use). It is possible that the nominal conventions are so dissimilar that coreferential priming is required. <br /><br />An inverse Babel is much trickier to discern. Typically, the sender will send you something, but the way he uses a term will be jarring or strange. This can be verified by asking “what do you mean?” and seeing if his reply matches your understanding. The lower down in the conceptual hierarchy the homonymopathy lies, the more difficult it is to remedy: this was raised in the section on problems with coreferential communication. It may be the case, as with a vague or contested concept, that his concept and yours will be similar, but nevertheless, you will be using the same word to mean different things. Sometimes a loose, imperfect definition will do, as we have discussed, but one should not expect these vague conceptions to overlap everywhere. In general, once a homonymopathy has been identified, all that is needed is to assign each concept a different, mutually agreed-upon name. If your conception of a “pile” is different from mine, we need not call them by the same name. <br /><br />We can here identify a sort of fallacy which I will call the <b><a href="#g31">nominal fallacy</a></b> (note that there is already a fallacy of this name but with a different meaning). One commits a nominal fallacy when one thinks that two words must mean different things because they have different names (synonymopathy) or that two words must mean the same thing because they have the same name (homonymopathy). The fallacy can be easily demonstrated by showing that “little” means the same thing as “small” and that “bow (archery)” has the same name as “bow (knot)”. 
Though this fallacy can be easily addressed, it can also be quite tenacious, especially when the difference in meaning is slight or when personal stock is put in the names. <br /><br />Priming is a vital step which cannot be forgone. Failing to prime adequately leads to such problems as synonymopathies and homonymopathies, in which the communicators fail to use language the other understands, or worse, use language the other merely thinks he understands. Avoiding these takes care, but whatever the cost of preventing them, it is far outweighed by the hazards of ignoring them. In the final section, we will look at the relevance and applications of the preceding discussion. <br /><br /></span><span style="font-family: Times, Times New Roman, serif;"><h3><a id="sVIII">VIII. Relevance and Applications to Discussion</a></h3><br />Discussion and dialogue depend upon communication. Communication, as we have been discussing, depends crucially on both communicators first agreeing on the conventions that will be used, particularly the nominal convention. If communicators fail to prime the conversation by establishing the requisite conventions, the discussion can devolve into synonymopathies and homonymopathies, such that the communicators are merely speaking past one another. In many areas, these can manifest as major controversies. <br /><br />Armed with our previous discussion, we can distinguish two different sorts of disputes that arise: terminological disputes, over the meaning and use of terminology; and factual disputes, over what is the case, as described by the agreed-upon terminology (some may add other sorts, such as normative disputes). Clearly any (or at least most) terminological disputes can be dealt with through proper priming and clarification, or at least headed off before they arise. <br /><br />Philosophy is the study of ideas generally, though even this definition is not uncontroversial. 
It is known more for providing questions and perspectives than hard answers, for broadening perspectives rather than narrowing them. Indeed, there is just about no area of philosophy that is not at least somewhat controversial. Philosophy is also known as being home to many disputes which have persisted since antiquity. I hypothesize, however, that many of these disputes are terminological, perhaps even a majority, or at least arise therefrom. <br /><br />Vagueness is a common source of terminological dispute. Two people have two different criteria for what qualifies for inclusion in a certain category, with the vast majority of cases mutually agreed upon. However, certain marginal cases are included under one but excluded under the other. Each thinks he is correct, and in a sense both are: they are just speaking about different things: things that are very similar, and yet subtly different. There is no shortage of vague or essentially contested concepts in philosophy: knowledge, self, consciousness, explanation, cause, abstract, existence, good, obligation, nature, reason, beauty, purpose, essence, time, meaning, experience, person, to name a few. Vagueness also breeds uncertainty, which can be loosely characterized as a dispute with oneself: part of you thinks it should be interpreted one way, another part thinks it should be another way. If we are to resolve both disputes and uncertainty, we must find a way to eliminate or at least reduce vagueness. <br /><br />We can thus suggest ways in which to present and defend an argument. Before anything, the first step must be to define the relevant terms at the outset, or else to be careful to define them as they come up. Definitions cannot in practice be perfect, but one must endeavor to make them as clear and complete as necessary and possible. If one is presenting an argument, one must define to acceptability all important terms, making sure to limit the senses of vague terms especially. 
If one is rebutting an argument, one must examine the possible senses in which the terms of one’s opponent’s argument can be taken. If there are multiple, interacting terms, for charity, one must examine the various combinations of meanings and address each. This is especially important in order to avoid and demonstrate equivocation: one must be sure that the terms are being used in the same way throughout the argument, or else that the various meanings all conspire to reach the same conclusion. If one is merely engaged in a less argumentative discussion, one must be sure to consider, evaluate and decide on the terms before the discussion has proceeded too far. It is no use discussing something for hours only to find out that the entire dispute could have been avoided had you recognized an inverse Babel early on. <br /><br />This is perhaps the most important point: fixing definitions is an indispensable part of any discussion. Sometimes, an entire discussion can be wholly or mostly forgone if the definitions are made clear at the outset. I am convinced that a significant part of current philosophical discussion results from little more than vagueness and homonymopathy. It may be just as difficult to get another to understand what you mean as it is to get another to agree with you. What is most needed is clarity and mutual understanding: conversation must always precede discussion. <br /><br /></span><hr /><div><h3><span style="font-family: Times, Times New Roman, serif;"><a id="sIX">IX. Glossary</a></span></h3><br /><span style="font-family: Times, Times New Roman, serif;"><b><a id="g1">Lifelong Crusoe</a></b></span><br /><ul><span style="font-family: Times, Times New Roman, serif;">A person who, for his whole life, has never come into contact with others or society. 
</span></ul><span style="font-family: Times, Times New Roman, serif;"><b><a id="g2">Vocabulary</a></b><ul>The set of names in a certain language an individual can recognize as corresponding to certain concepts or which an individual can use to convey certain concepts. </ul><b><a id="g3">Concipulary</a></b><ul>The set of concepts an individual understands, can recognize as corresponding to certain names in a language, or can convey using names in a certain language. </ul><b><a id="g4">Nominal Convention</a></b><ul>An agreed upon system of attaching names to various concepts (and concepts to certain names) which comprises one of the main components of a language. </ul><b><a id="g5">Syntactical Convention</a></b><ul>An agreed upon system of forming sentences from words, that is, forming propositions or ideas from concepts. This typically involves a grammar, punctuation, syntax and various constructions. </ul><b><a id="g6">Expressive/Social Convention</a></b><ul>An agreed upon system of expressing ideas, typically including situation-dependent vocabulary (e.g. honorifics, jargon, complexity level) and methods of communication (e.g. spelling, script, diction, enunciation, accent, intonation). </ul><b>Steps of Communication:<br /> <a id="g7">Composition</a></b><ul>The step of communication that involves the sender translating the idea to be communicated into a message via a language. </ul><b> <a id="g8">Transmission</a></b><ul>The step of communication that involves moving the message from sender to recipient. </ul><b> <a id="g9">Interpretation</a></b><ul>The step of communication that involves the recipient translating the received message into ideas via a language. </ul><b><a id="g10">Coreferential Communication</a></b><ul>Communication in which there is a common reference or experience that both communicators can directly experience. 
</ul><b><a id="g11">Areferential Communication</a></b><ul>Communication in which there is no common reference or experience that both communicators can directly experience. </ul><b><a id="g12">Priming</a></b><ul>The necessary prerequisite for successful communication that involves establishing the various conventions that will be used. </ul><b><a id="g13">Natural Priming</a></b><ul>Priming one has naturally that allows one to infer meaning by gestures, repetition or imitation. </ul><b><a id="g14">Primitive Basis</a></b><ul>The fundamental concepts which themselves cannot be formed or described using more basic concepts, generally derived from raw experiences or even a priori. </ul><b>Definitions:<br /> <a id="g15">Intensive</a></b><ul>A definition that specifies the necessary and sufficient conditions for qualifying for inclusion. </ul><b> <a id="g16">Extensive</a></b><ul>A definition that gives an exhaustive listing of all things that qualify for inclusion. </ul><b> <a id="g17">Ostensive</a></b><ul>A definition in which, by pointing to a certain thing or set of things (either figuratively or literally), one either gives instances which can be generalized or determines what qualifies for inclusion. </ul><b> <a id="g18">Stipulative</a></b><ul>A definition in which one specifies by assertion what a term means or will be used to mean. </ul><b> <a id="g19">Descriptive</a></b><ul>A definition in which one attempts to give a meaning to a term that reflects how it is commonly understood. </ul><b> <a id="g20">Precising</a></b><ul>A definition in which one narrows the scope of a term from how it is commonly understood to reflect how it will be used. </ul><b><a id="g21">Definiendum</a></b><ul>The word or term to be defined. </ul><b><a id="g22">Definiens</a></b><ul>Whatever is used to define a word or a term, typically composed of a set of subterms. 
</ul><b><a id="g32">Primitive (notion)</a></b><ul>Terms that do not require other terms to be understood, and that function as axioms of conceptual understanding. </ul><b><a id="g23">Genus</a></b><ul>The super-category into which the thing being defined falls. </ul><b><a id="g24">Differens</a></b><ul>The property that sets apart the thing being defined from other things in its genus. </ul><b><a id="g25">Subterm</a></b><ul>One of the terms in the definition of a term. </ul><b><a id="g26">Obscurum per Obscurius</a></b><ul>Defining a term using subterms that are more difficult to understand than the term itself. </ul><b><a id="g27">Homonymopathy</a></b><ul>A failure of communication in which a term with an intended meaning is mistaken to mean something else instead. </ul><b><a id="g28">Synonymopathy</a></b><ul>A failure of communication in which a term is not understood to have the intended meaning, but a different term could have been used to achieve an understanding of the intended meaning. </ul><b><a id="g29">Babel (phenomenon)</a></b><ul>A case in which communicators are trying to communicate the same meaning using different terms or languages. </ul><b><a id="g30">Inverse Babel (phenomenon)</a></b><ul>A case in which the same term is interpreted to mean different things by different communicators, or in which the same term is being used in different ways. </ul><b><a id="g31">Nominal Fallacy</a></b><ul>Any of the following fallacious lines of reasoning (X and Y being words or terms): <ol><li> X and Y have different names. </li><li> Therefore X and Y refer to different things.</li></ol>Or <ol><li> X and Y refer to different things.</li><li> Therefore X and Y have different names.</li></ol>Or <ol><li> X and Y have the same name.</li><li> Therefore X and Y refer to the same thing.</li></ol>Or <ol><li> X and Y refer to the same thing.</li><li> Therefore X and Y have the same name.</li></ol>(Not to be confused with <a href="http://edge.org/response-detail/11730">a different fallacy of the same name</a>). 
</ul></span></div>