Friday, December 25, 2015

The Double Angle Formula


Deriving the formula: \(\sin(2x)=2\sin(x)\cos(x)\)

Way 1: From Geometry

\[ RB=QA \;\;\;\;\;\;\;\;\;\; RQ=BA \] \[ \frac{RQ}{PQ}=\frac{QA}{OQ}=\sin(\alpha) \;\;\;\;\;\;\;\; \frac{PR}{PQ}=\frac{OA}{OQ}=\cos(\alpha) \] \[ \frac{PQ}{OP}=\sin(\beta) \;\;\;\;\;\;\;\; \frac{OQ}{OP}=\cos(\beta) \] \[ \frac{PB}{OP}=\sin(\alpha+\beta) \;\;\;\;\;\;\;\; \frac{OB}{OP}=\cos(\alpha+\beta) \] \[ PB=PR+RB=\frac{OA}{OQ}PQ+QA \] \[ \frac{PB}{OP}=\frac{OA}{OQ}\frac{PQ}{OP}+\frac{QA}{OP}=\frac{OA}{OQ}\frac{PQ}{OP}+\frac{QA}{OQ}\frac{OQ}{OP} \] \[ \sin(\alpha+\beta)=\cos(\alpha)\sin(\beta)+\sin(\alpha)\cos(\beta) \] Particularly, if \(\alpha=\beta=x, \;\;\;\; \sin(2x)=2\sin(x)\cos(x)\).

Way 2: From the Product Formula

Recall from this post that the product formulas for sine and cosine are, respectively: \[ \sin(x)=x\prod_{n=1}^{\infty}\left ( 1-\frac{x^2}{\pi^2 n^2} \right ) \] And \[ \cos(x)=\prod_{n=1}^{\infty} \left (1-\frac{x^2}{\pi^2 (n-1/2)^2 } \right ) \] Thus \[ \sin(2x)=2x\prod_{n=1}^{\infty}\left ( 1-\frac{4 \cdot x^2}{\pi^2 n^2} \right ) =2\cdot x\prod_{n=\mathrm{even}\geq1}^{\infty}\left ( 1-\frac{4 \cdot x^2}{\pi^2 n^2} \right ) \cdot \prod_{n=\mathrm{odd}\geq1}^{\infty}\left ( 1-\frac{4 \cdot x^2}{\pi^2 n^2} \right ) \] \[ \sin(2x) =2\cdot x\prod_{n=1}^{\infty}\left ( 1-\frac{x^2}{\pi^2 n^2} \right ) \cdot \prod_{n=1}^{\infty}\left ( 1-\frac{x^2}{\pi^2 (n-1/2)^2} \right ) \] \[ \sin(2x)=2\cdot \sin(x) \cdot \cos(x) \]

Way 3: From the Taylor Series

The Taylor series for sine and cosine can be written as, respectively: \[ \frac{\sin(\sqrt{x})}{\sqrt{x}}=\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k+1)!}x^k \] \[ \cos(\sqrt{x})=\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}x^k \] Thus \[ \frac{\sin(\sqrt{x})\cos(\sqrt{x})}{\sqrt{x}}=\sum_{j=0}^{\infty}\frac{(-1)^j}{(2j+1)!}x^j \sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}x^k \] Using a Cauchy product, we find: \[ \frac{\sin(\sqrt{x})\cos(\sqrt{x})}{\sqrt{x}}=\sum_{m=0}^{\infty}c_m x^m \] where \[ c_m=\sum_{n=0}^{m} \frac{(-1)^n}{(2n+1)!}\frac{(-1)^{m-n}}{(2(m-n))!} =\frac{(-1)^m}{(2m+1)!}\sum_{n=0}^{m} \binom{2m+1}{2n+1} =\frac{(-1)^m}{(2m+1)!}\sum_{n=0}^{m} \left [ \binom{2m}{2n+1}+\binom{2m}{2n} \right ] \] The bracketed terms run over every entry of row \(2m\) of Pascal's triangle exactly once (with \(\binom{2m}{2m+1}=0\)), so \[ c_m=\frac{(-1)^m}{(2m+1)!}\sum_{n=0}^{2m} \binom{2m}{n}=\frac{(-1)^m}{(2m+1)!}2^{2m} \] And thus \[ \frac{\sin(\sqrt{x})\cos(\sqrt{x})}{\sqrt{x}}=\sum_{m=0}^{\infty}\frac{(-1)^m}{(2m+1)!}(4x)^m=\frac{\sin(\sqrt{4x})}{\sqrt{4x}}=\frac{\sin(2\sqrt{x})}{2\sqrt{x}} \] Substituting \(x=y^2\) and rearranging, we find: \( 2\sin(y)\cos(y)=\sin(2y) \)
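The closed form for the Cauchy-product coefficients can be checked exactly with rational arithmetic. The following is only a sketch: the function names are chosen for illustration and are not part of the derivation.

```python
from fractions import Fraction
from math import factorial

def cauchy_coefficient(m):
    """c_m computed directly from the Cauchy product, as an exact fraction."""
    return sum(
        Fraction((-1) ** n, factorial(2 * n + 1))
        * Fraction((-1) ** (m - n), factorial(2 * (m - n)))
        for n in range(m + 1)
    )

def closed_form(m):
    """The claimed closed form (-1)^m * 2^(2m) / (2m+1)!."""
    return Fraction((-1) ** m * 4 ** m, factorial(2 * m + 1))

# Print the first few coefficients side by side.
for m in range(6):
    print(m, cauchy_coefficient(m), closed_form(m))
```

Because `Fraction` is exact, agreement here is not subject to rounding error, although it only tests the finitely many values of m that are tried.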

Way 4: From Euler's Formula

Euler's formula is: \[ e^{ix}=\cos(x)+i\sin(x) \] Thus \[ e^{i2x}=\cos(2x)+i\sin(2x)=\left ( e^{ix} \right)^2=\left (\cos(x)+i\sin(x) \right )^2 \] \[ e^{i2x}=\left [\cos^2(x)-\sin^2(x) \right ]+i\left [ 2\sin(x)\cos(x) \right ] \] Thus, by equating real and imaginary parts, \(\sin(2x)=2\sin(x)\cos(x)\) and \(\cos(2x)=\cos^2(x)-\sin^2(x)\)

The Half-Angle Formulas

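We find from the last demonstration \[ \cos(2x)=\cos^2(x)-\sin^2(x)=2\cos^2(x)-1=1-2\sin^2(x) \] Substituting \(2x=y\) and solving, we find, for \(0 \leq y \leq \pi\) (where both left-hand sides are non-negative): \[ \sin\left ( \frac{y}{2} \right )=\sqrt{\frac{1-\cos(y)}{2}} \] \[ \cos\left ( \frac{y}{2} \right )=\sqrt{\frac{1+\cos(y)}{2}} \] Outside this range, the sign of each root must be chosen to match the sign of the left-hand side.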

An Infinite Product Formula

We can write the double-angle formula as \[ \sin(x)=2\sin\left ( \frac{x}{2} \right )\cos\left ( \frac{x}{2} \right ) \] Iterating this, we then have \[ \sin(x)=2^n\sin\left ( \frac{x}{2^n} \right ) \prod_{k=1}^{n}\cos\left ( \frac{x}{2^k} \right ) \] However, in the limit as n gets large, \(2^n\sin\left ( \frac{x}{2^n} \right )\rightarrow x\). Thus, letting n go to infinity, we have \[ \sin(x)=x \prod_{k=1}^{\infty}\cos\left ( \frac{x}{2^k} \right ) \] A simple consequence of this general result, obtained by setting \(x=\pi/2\), is \[ \frac{\pi}{2}=\frac{1}{\cos(\tfrac{\pi}{4})\cos(\tfrac{\pi}{8})\cos(\tfrac{\pi}{16})\cdots } =\frac{1}{\sqrt{\tfrac{1}{2}}\sqrt{\tfrac{1}{2}+\tfrac{1}{2}\sqrt{\tfrac{1}{2}}}\sqrt{\tfrac{1}{2}+\tfrac{1}{2}\sqrt{\tfrac{1}{2}+\tfrac{1}{2}\sqrt{\tfrac{1}{2}}}}\cdots }=\frac{2}{\sqrt{2}}\frac{2}{\sqrt{2+\sqrt{2}}}\frac{2}{\sqrt{2+\sqrt{2+\sqrt{2}}}}\cdots \] This is known as Viète's formula.
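Viète's formula converges quickly, which is easy to see numerically. The sketch below builds the nested radicals one at a time; the function name and the choice of floating-point arithmetic are assumptions made for illustration.

```python
import math

def viete_partial_product(n_factors):
    """Product of the first n_factors terms 2/sqrt(2), 2/sqrt(2+sqrt(2)), ..."""
    radical = 0.0
    product = 1.0
    for _ in range(n_factors):
        radical = math.sqrt(2.0 + radical)  # next nested radical
        product *= 2.0 / radical
    return product

# Compare a few partial products with pi/2.
for n in (1, 5, 10, 20):
    print(n, viete_partial_product(n), math.pi / 2)
```

The partial product with n factors equals \(2^n\sin(\pi/2^{n+1})\), so the error shrinks by roughly a factor of four with each additional factor.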

A Nested Radical Formula

We note that \[ 2\cos(x/2)=\sqrt{2+2\cos(x)} \] Thus, by iterating, we find \[ 2\cos(x/2^n)=\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2+2\cos(x)}}}}} \] Thus \[ 2\sin(x/2^{n+1})=\sqrt{2-\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2+2\cos(x)}}}}}} \] And we can thus conclude that \[ x=\underset{n\rightarrow \infty}{\lim} 2^n\sqrt{2-\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2+2\cos(x)}}}}}} \] For example \[ \pi/3=\underset{n\rightarrow \infty}{\lim} 2^n\sqrt{2-\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2+1}}}}}} \] \[ \pi/2=\underset{n\rightarrow \infty}{\lim} 2^n\sqrt{2-\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2}}}}}} \]
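The nested radical limit can be explored numerically, with one caveat: the subtraction \(2-\sqrt{2+\cdots}\) loses precision in floating point as n grows, so only moderate n is useful. The sketch below assumes \(|x| \leq \pi\), so that every square root takes the non-negative branch; the function name is illustrative.

```python
import math

def nested_radical_estimate(x, n):
    """Approximate x by 2^n * sqrt(2 - R_n), where R_n has n nested radicals."""
    inner = 2.0 * math.cos(x)           # innermost term, 2cos(x)
    for _ in range(n):
        inner = math.sqrt(2.0 + inner)  # after the loop: 2cos(x / 2^n)
    return 2.0 ** n * math.sqrt(2.0 - inner)

# Moderate n only: larger n suffers from cancellation in 2 - inner.
for n in (2, 5, 10):
    print(n, nested_radical_estimate(math.pi / 3, n), math.pi / 3)
```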

An Infinite Series

Above, we derived \[ \sin(x)=x \prod_{k=1}^{\infty}\cos\left ( \frac{x}{2^k} \right ) \] Taking the log of both sides and differentiating \[ \frac{\mathrm{d} }{\mathrm{d} x}\ln\left (\sin(x) \right )=\frac{\mathrm{d} }{\mathrm{d} x}\ln\left (x \prod_{k=1}^{\infty}\cos\left ( \frac{x}{2^k} \right ) \right ) \] \[ \cot(x)=\frac{1}{x}-\sum_{k=1}^{\infty}\frac{1}{2^k}\tan \left ( \frac{x}{2^k} \right ) \] \[ \frac{1}{x}-\cot(x)=\sum_{k=1}^{\infty}\frac{1}{2^k}\tan \left ( \frac{x}{2^k} \right ) \] From this, setting \(x=\pi/2\) and dividing by 2, we can easily derive \[ \frac{1}{\pi}=\sum_{k=2}^{\infty}\frac{1}{2^k}\tan \left ( \frac{\pi}{2^k} \right ) \]
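A quick numerical check of the series for \(1/\pi\) is sketched below. The terms decay like \(\pi/4^k\), so a few dozen terms reach machine precision; the function name is an illustrative choice.

```python
import math

def tangent_series(terms):
    """Partial sum of tan(pi / 2^k) / 2^k for k = 2, ..., terms + 1."""
    return sum(math.tan(math.pi / 2 ** k) / 2 ** k for k in range(2, terms + 2))

for terms in (1, 3, 10, 30):
    print(terms, tangent_series(terms), 1 / math.pi)
```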

A Definite Integral

Let \[ I=\int_{0}^{\pi/2}\ln\left ( \sin(x) \right )dx =\int_{\pi/2}^{\pi}\ln\left ( \sin(x) \right )dx =\int_{0}^{\pi/2}\ln\left ( \cos(x) \right )dx \] Then \[ 2I=\int_{0}^{\pi}\ln\left ( \sin(x) \right )dx =2\int_{0}^{\pi/2}\ln\left ( \sin(x) \right )dx =\int_{0}^{\pi/2}\ln\left ( \sin(x) \right )+\ln\left ( \cos(x) \right )dx \] \[ 2I=\int_{0}^{\pi/2}\ln\left ( \sin(x) \cos(x) \right )dx=\int_{0}^{\pi/2}\ln\left (\tfrac{1}{2} \sin(2x) \right )dx=-\frac{\pi}{2}\ln(2)+\int_{0}^{\pi/2}\ln\left (\sin(2x) \right )dx \] By the substitution \(u=2x\), we then have \[ 2I=-\frac{\pi}{2}\ln(2)+\tfrac{1}{2}\int_{0}^{\pi}\ln\left (\sin(u) \right )du=-\frac{\pi}{2}\ln(2)+I \] Therefore \[ I=\int_{0}^{\pi/2}\ln\left (\sin(x) \right )dx=-\frac{\pi}{2}\ln(2) \]
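The value \(-\tfrac{\pi}{2}\ln(2) \approx -1.0888\) can be compared against a direct numerical estimate. The sketch below uses a plain midpoint rule, which never evaluates the integrand at the singular endpoint \(x=0\); because of the logarithmic singularity the error only decreases in proportion to the cell width, so this is a rough check and not a precise quadrature method.

```python
import math

def log_sine_integral(n_cells=100_000):
    """Midpoint-rule estimate of the integral of ln(sin(x)) over [0, pi/2]."""
    h = (math.pi / 2) / n_cells
    return h * sum(math.log(math.sin((k + 0.5) * h)) for k in range(n_cells))

print(log_sine_integral(), -math.pi / 2 * math.log(2))
```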

Tuesday, December 15, 2015

Some Introductory Quantum Mechanics: Mathematico-Theoretical Background

  Quantum mechanics (QM), being a novel and revolutionary framework for describing phenomena, requires a substantially different mathematical tool-set and way of thinking about physical systems and objects. There is dispute over how exactly to interpret the mathematical system used, but we will not discuss here the various interpretations. Rather, we will just describe and examine the framework and how it can be used to make predictions, all of which is agreed upon.

This will be a multi-part series giving a general introduction to quantum theory. This is part 2.

Hilbert, State, and Dual Spaces

Hilbert space is a generalized vector space: a sort of extended analog of the usual Euclidean space. Elements of a Hilbert space are sorts of vectors, and are denoted using a label (basically just a name) and some indication of vector-hood. We will use "bra-ket notation", in which elements of the vector space are denoted as \(\left | \phi \right >\) (a ket). Here \(\phi\) is merely a label; we may sometimes use numbers or other symbols, but these are all merely labels. Every such element has a corresponding "sister" in what is called the dual space, which is denoted by \(\left < \phi \right |\) (a bra). (The names are basically a joke: the two halves of the word "bracket".) The use of the dual space will become apparent in our later discussion. In general, and in QM especially, the vector space is complex, meaning the vector's "components" (loosely speaking) are complex numbers.

Inner Products

To be a Hilbert space, there must also be an inner product, or a way of associating a complex number to each pair of vectors (the order may be important: the inner product of A and B need not be the same as that of B and A). The inner product of \(\left | \phi \right > \) and \(\left | \psi \right > \) is denoted by \(\left \langle \psi \right | \left. \phi \right \rangle\), that is the dual of \(\left | \psi \right > \) acting on \(\left | \phi \right > \). In particular, to be a Hilbert space, we must have that if \(\left \langle \psi \right | \left. \phi \right \rangle = z \), \(\left \langle \phi \right | \left. \psi \right \rangle = \bar{z} \), that is, the complex conjugate. If \(\left | \phi \right \rangle= r \left | \psi \right \rangle\) then \(\left \langle \phi \right |= \bar{r} \left \langle \psi \right |\). Also, we must have \(\left \langle \psi \right | \left. \psi \right \rangle \geq 0\), with equality holding iff \(\left | \psi \right >\) is the zero vector. Clearly \(\left \langle \psi \right | \left. \psi \right \rangle \) will be real.

Beyond this, the inner product is linear in the ket and conjugate-linear in the bra. In general, if \(\left | \phi \right \rangle= a\left | \alpha \right \rangle+b\left | \beta \right \rangle \) and \( \left | \psi \right \rangle= c\left | \gamma \right \rangle+d\left | \delta \right \rangle \), then we have: \[ \left \langle \psi \right | \left. \phi \right \rangle =a\bar{c}\left \langle \gamma \right | \left. \alpha \right \rangle + a\bar{d}\left \langle \delta \right | \left. \alpha \right \rangle + b\bar{c}\left \langle \gamma \right | \left. \beta \right \rangle + b\bar{d}\left \langle \delta \right | \left. \beta \right \rangle \] We can also prove the famous Cauchy-Schwarz Inequality, namely, that: \[ \left |\left \langle \psi \right | \left. \phi \right \rangle \right |^2 \leq \left \langle \psi \right | \left. \psi \right \rangle \left \langle \phi \right | \left. \phi \right \rangle \] Two vectors \(\left | \phi \right > \) and \(\left | \psi \right > \) are said to be orthogonal if \(\left \langle \psi \right | \left. \phi \right \rangle=0\). A vector is said to be normal or normalized if \(\left \langle \phi \right | \left. \phi \right \rangle =1\). If we have a set of vectors \({| \left. \phi_1 \right \rangle} , {| \left. \phi_2 \right \rangle} , {| \left. \phi_3 \right \rangle},...\) such that \( \left \langle \phi_j \right. | \left. \phi_k \right \rangle = 0 \) for all \(j \neq k\), then the set is called an orthogonal set. If it is also the case that \( \left \langle \phi_k \right. | \left. \phi_k \right \rangle = 1 \) for all k, then the set is called orthonormal.


An operator is something which acts on a vector to produce another vector: \(A \left | \phi \right \rangle= \left | \phi' \right \rangle\). The operator \(A\) is linear if, for any \(\left | \phi \right \rangle= a\left | \alpha \right \rangle+b\left | \beta \right \rangle\), we have \( A\left | \phi \right \rangle= a A\left | \alpha \right \rangle+b A\left | \beta \right \rangle \).
Let \(A \left | \phi \right \rangle= \left | \phi' \right \rangle\) and \(B \left | \psi \right \rangle= \left | \psi' \right \rangle\). If \(\left \langle \psi \right | \left. \phi' \right \rangle=\left \langle \psi' \right | \left. \phi \right \rangle\) for all pairs of vectors, then A and B are called conjugate operators, denoted \(A=B^{\dagger}\) and \(B=A^{\dagger}\), so \(A=\left (A^{\dagger} \right )^\dagger\). We also have \(\left \langle \phi' \right |= \left \langle \phi \right | A^\dagger\). If \(A=A^\dagger\), then A is called Hermitian. If \(A=-A^\dagger\), then A is called anti-Hermitian. If A preserves inner products, that is, if with \(\left | \psi' \right \rangle=A \left | \psi \right \rangle\) and \(\left | \phi' \right \rangle=A \left | \phi \right \rangle\) we have \(\left \langle \psi' \left | \right. \phi'\right \rangle = \left \langle \psi \left | \right. \phi\right \rangle \) for all pairs of vectors, then A is called unitary.
We also have the following properties: \[ (A+B)\left | \phi \right \rangle= A\left | \phi \right \rangle+B\left | \phi \right \rangle \] \[ AB\left | \phi \right \rangle= A\left (B\left | \phi \right \rangle \right ) \] Note that it is not necessarily the case that \[ AB\left | \phi \right \rangle= BA\left | \phi \right \rangle \] That is, operators need not commute. In fact, we commonly use the notation \([A,B]=AB-BA\) (this is called the commutator of A and B). Non-commutativity will play an important role in the theory.

For a given A, in some cases, for certain \(\left | \phi \right \rangle\), we have that \(A\left | \phi \right \rangle= \lambda \left | \phi \right \rangle \) for some constant \(\lambda\). In this case, we call \(\lambda\) an eigenvalue of the operator A and \(\left | \phi \right \rangle\) the corresponding eigenvector.
Often it is the case that we can find a set of orthonormal vectors that are the eigenvectors of a given linear operator, such that we can also write any vector as a linear sum of the eigenvectors. In that case, \[| \left. \psi \right \rangle = a_1 | \left. \phi_1 \right \rangle +a_2 | \left. \phi_2 \right \rangle+a_3 | \left. \phi_3 \right \rangle+...\]where \(a_k=\left \langle \phi_k \right. | \left. \psi \right \rangle\) (\(a_k\) is called the projection of \(\psi\) into the \(\phi_k\) direction). Then \[\left \langle \psi\left. \right | \psi \right \rangle=|a_1|^2+|a_2|^2+|a_3|^2+...\] \[A\left| \psi \right \rangle = \lambda_1 a_1 | \left. \phi_1 \right \rangle + \lambda_2 a_2 | \left. \phi_2 \right \rangle+\lambda_3 a_3 | \left. \phi_3 \right \rangle+...\] \[\left \langle \psi \right | A\left| \psi \right \rangle = \lambda_1 \left |a_1 \right |^2 + \lambda_2 \left |a_2 \right |^2+\lambda_3 \left |a_3 \right |^2 +...\] If the operator is also Hermitian, then we call it an observable. Particularly, if an operator is Hermitian, all its eigenvalues are real.
If \(| \left. \psi \right \rangle \) is normalized, then we can use the notation \(\left \langle A \right \rangle_\psi=\left \langle \psi\left | A \right |\psi \right \rangle\) and \(\sigma^2_A=\left \langle A^2 \right \rangle_\psi-\left \langle A \right \rangle^2_\psi\).

Postulates of Quantum Mechanics

Given that mathematical background, we can now lay out the fundamental postulates of QM. Exactly how to interpret these postulates will be left for later discussion.
  1. Wavefunction Postulate
    The state of a physical system at a given time is defined by a wavefunction which is a ket vector in the Hilbert space of possible states. Generally, the vector is required to be normalized.
  2. Observable Postulate
    Every physically measurable quantity corresponds to an observable operator that acts on the vectors in the Hilbert space of possible states.
  3. Eigenvalue Postulate
    The possible results of a measurement of a physically measurable quantity are the eigenvalues of the corresponding observable.
  4. Probability Postulate
    Suppose the set of orthonormal eigenvectors of observable A \({| \left. \phi_{k_1} \right \rangle} , {| \left. \phi_{k_2} \right \rangle} , {| \left. \phi_{k_3} \right \rangle},...\) all have eigenvalue \(\lambda\). Suppose the initial wavefunction can be written as \(| \left. \psi \right \rangle = a_1 | \left. \phi_1 \right \rangle +a_2 | \left. \phi_2 \right \rangle+a_3 | \left. \phi_3 \right \rangle+...\) (i.e. the linear sum of orthonormal eigenvectors of A). Note that \(\psi\) is a superposition of other eigenstates. That is, it is a sort of combination of states that have definite properties. Each eigenstate has a well-defined value for the observable, but \(\psi\) does not.
    The probability of measuring the observable to have the value \(\lambda\) is given by \(P(\lambda)=\left | a_{k_1} \right |^2+\left | a_{k_2} \right |^2+\left | a_{k_3} \right |^2+...\). More simply, if no two eigenvectors have the same eigenvalue, then the probability that we will measure the observable to have value \(\lambda_k\) is \(| \left \langle \phi_k\left | \right. \psi\right \rangle |^2\). This is called the Born Rule.
    Given this, it is easy to see that \(\left \langle A\right \rangle_\psi=\left \langle \psi \left | A \right | \psi\right \rangle\) is the expected value of the operator A.
  5. Collapse Postulate
    Immediately after measurement, the wavefunction becomes the normalized projection of the prior wavefunction onto the sub-space of values that give the measured eigenvalue. That is, using the above description, the wavefunction immediately after measurement becomes \(\alpha \cdot( a_{k_1}| \left. \phi_{k_1}\right \rangle +a_{k_2}| \left. \phi_{k_2}\right \rangle+a_{k_3}| \left. \phi_{k_3}\right \rangle +...)\) where \(\alpha\) is a suitable normalization constant, chosen to make the resulting vector normalized. More simply, if no two eigenvectors have the same eigenvalue, then the wavefunction immediately after we measure the observable to have value \(\lambda_k\) is \(| \left. \psi \right \rangle=| \left. \phi_k \right \rangle\).
  6. Evolution Postulate
    The time-evolution of the wavefunction, in the absence of measurement, is given by the time-dependent Schrodinger Equation: \[ \hat{E} \left.|\psi \right \rangle=\hat{H}\left.|\psi \right \rangle \] Where \(\hat{E}\) is the energy operator, which is given by \(i \hbar \frac{\partial }{\partial t}\), and \(\hat{H}\) is the Hamiltonian operator, which is defined analogously as in classical mechanics. In particular, it is the sum of the kinetic and potential energy operators.
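The probability and collapse postulates can be made concrete with a small finite-dimensional example. The sketch below is purely illustrative: the three eigenvalues and the amplitudes are hypothetical, the state is represented only by its coefficients \(a_k\) in an orthonormal eigenbasis, and the eigenvalue 1 is deliberately degenerate so that the sum over \(k_1, k_2, \ldots\) is exercised.

```python
import math

# Hypothetical observable: eigenvalues attached to an orthonormal eigenbasis.
# In that basis the state is just its list of coefficients a_k.
eigenvalues = [1.0, 1.0, 2.0]               # eigenvalue 1 is degenerate
amplitudes = [0.5, 0.5j, 1 / math.sqrt(2)]  # normalized: 1/4 + 1/4 + 1/2 = 1

def probability(value):
    """Born rule: sum of |a_k|^2 over eigenvectors with the given eigenvalue."""
    return sum(abs(a) ** 2 for a, lam in zip(amplitudes, eigenvalues) if lam == value)

def collapsed_state(value):
    """Normalized projection onto the eigenspace of the measured eigenvalue."""
    kept = [a if lam == value else 0.0 for a, lam in zip(amplitudes, eigenvalues)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in kept))
    return [a / norm for a in kept]

# Expected value <A> = sum of lambda_k * |a_k|^2.
expectation = sum(lam * abs(a) ** 2 for a, lam in zip(amplitudes, eigenvalues))

print(probability(1.0), probability(2.0), expectation)
print(collapsed_state(1.0))
```

Here the normalization constant \(\alpha\) of the collapse postulate is `1 / norm`, and the expected value is the probability-weighted average of the eigenvalues.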

Spatial Dimensions

A common Hilbert space to use is that of functions of one spatial dimension and time. This is an example of an infinite dimensional Hilbert space (at any x-coordinate, the wavefunction could take on a completely independent value). We often speak of eigenfunctions instead of eigenvectors in such a space. In this Hilbert space, we define the inner product of two wavefunctions to be \[\left \langle \phi\left | \right. \psi\right \rangle =\int_{-\infty}^{\infty}\bar{\phi}(x,t)\psi(x,t)dx\] The momentum operator in the x-direction is given by \(P_x=\frac{\hbar}{i}\frac{\partial }{\partial x}\). The position operator is quite simply \(X=x\). The (un-normalizable) eigenfunctions for each, with eigenvalues \(p\) and \(x_0\) respectively, are \[ \left. | \psi\right \rangle_p=e^{ipx/\hbar} \] \[ \left. | \psi\right \rangle_{x_0}=\delta(x-x_0) \]
The classical kinetic energy is given by \(E_k=\frac{1}{2}mv^2=\frac{p^2}{2m}\). The potential energy is given simply by \(E_p=V(x,t)\), that is, merely a specification of the potential energy as a function of position and possibly time. Thus, the time-dependent Schrodinger Equation can be written as \[ i \hbar \frac{\partial }{\partial t} \left.|\psi \right \rangle=\left ( \frac{-\hbar ^2}{2m} \frac{\partial^2 }{\partial x^2}+V(x,t) \right)\left.|\psi \right \rangle \] If the wavefunction is an eigenfunction of energy, with eigenvalue E, then its energy does not change with time and we can write the time-independent Schrodinger Equation: \[ E \left.|\psi \right \rangle=\left ( \frac{-\hbar ^2}{2m} \frac{\partial^2 }{\partial x^2}+V(x,t) \right)\left.|\psi \right \rangle \] That is, \(\psi\) is an eigenfunction of the Hamiltonian. We can often then solve this to find not only the wavefunction solutions, but the energy solutions: often such an equation will only be soluble with a discrete set of possible energies. The conditions of normalizability and normalization, as well as boundary conditions contribute toward determining energies and solutions.
The extension to multiple dimensions follows analogously.


The Hilbert space to describe the spin state of an electron (or other spin 1/2 particle) is typically that of two-component column vectors (two-by-one matrices). That is, a ket will be of the form \[ \left. |\psi \right \rangle= \begin{pmatrix} a\\ b \end{pmatrix} \] And the corresponding bra will be \[ \left \langle \psi | \right.= \begin{pmatrix} \bar{a} & \bar{b} \end{pmatrix} \] The condition for normalization is that \(|a|^2+|b|^2=1\). A similar description can be used for polarization for photons. The operators for spin in the x, y and z directions, are, respectively: \[ S_x=\frac{\hbar}{2}\begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix} \] \[ S_y=\frac{\hbar}{2}\begin{pmatrix} 0 & -i\\ i & 0 \end{pmatrix} \] \[ S_z=\frac{\hbar}{2}\begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix} \] All of these have eigenvalues \(+\frac{\hbar}{2}\) and \(-\frac{\hbar}{2}\), with corresponding eigenvectors: \[ \left. |+x \right \rangle=\left. |+ \right \rangle=\frac{1}{\sqrt{2}}\begin{pmatrix} 1\\ 1 \end{pmatrix},\; \; \left. |-x \right \rangle=\left. |- \right \rangle=\frac{1}{\sqrt{2}}\begin{pmatrix} 1\\ -1 \end{pmatrix} \] \[ \left. |+y \right \rangle=\left. |\rightarrow \right \rangle=\frac{1}{\sqrt{2}}\begin{pmatrix} -i\\ 1 \end{pmatrix},\; \; \left. |-y \right \rangle=\left. |\leftarrow \right \rangle=\frac{1}{\sqrt{2}}\begin{pmatrix} 1\\ -i \end{pmatrix} \] \[ \left. |+z \right \rangle=\left. |\uparrow \right \rangle=\begin{pmatrix} 1\\ 0 \end{pmatrix},\; \; \left. |-z \right \rangle=\left. |\downarrow \right \rangle=\begin{pmatrix} 0\\ 1 \end{pmatrix} \]
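The spin operators give a compact illustration of non-commuting operators. The sketch below checks the standard relation \([S_x,S_y]=i\hbar S_z\), which is not derived in the text above; it uses plain nested lists and works in units where \(\hbar=1\), both of which are assumptions made for the example.

```python
HBAR = 1.0  # assumption: units in which hbar = 1

SX = [[0, HBAR / 2], [HBAR / 2, 0]]
SY = [[0, -1j * HBAR / 2], [1j * HBAR / 2, 0]]
SZ = [[HBAR / 2, 0], [0, -HBAR / 2]]

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(a, b):
    """[A, B] = AB - BA."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

lhs = commutator(SX, SY)
rhs = [[1j * HBAR * SZ[i][j] for j in range(2)] for i in range(2)]
print(lhs)
print(rhs)
```

Since the commutator is non-zero, \(S_x\) and \(S_y\) share no common eigenvectors, consistent with the different eigenvectors listed for each direction.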

Multiple Particles

In the case of more than one particle, we can construct a total wavefunction by composing those of each particle. For instance, if we have two particles, the first with spin up and the second with spin down, we can write that in a variety of ways. For instance: \[ \left. |\uparrow \right \rangle_1 \otimes \left. |\downarrow \right \rangle_2=\left. |\uparrow \right \rangle_1\left. |\downarrow \right \rangle_2=\left. |\uparrow \downarrow \right \rangle \] Clearly this case can be described in a way that treats each particle separately: the first particle is in one state and the second particle is in another state. However, sometimes it can be the case that the total wavefunction cannot be described in such a way. For instance: \[ \left. |\psi \right \rangle=\frac{1}{\sqrt{2}}\left ( \left. |\uparrow \downarrow \right \rangle +\left. | \downarrow \uparrow \right \rangle \right ) \] In this case, if we measure the first particle to have spin up, the wavefunction collapses to the state \(\left. |\uparrow \downarrow \right \rangle\). This is an example of entanglement, which is where two objects' states cannot be independently described.

Tuesday, November 3, 2015

Stirling's Approximation: Derivation and Corollaries


Lemma 1: \(\lim_{n \rightarrow \infty} \sqrt[n]{n!}/n=1/e\)

Way 1 (somewhat rigorous)

From elementary calculus, we have that: \[ \int_{0}^{1} \ln(x) dx =-1 \] Taking this as a Riemann sum, as done in introductory calculus, we have: \[ -1=\int_{0}^{1}\ln(x)dx=\lim_{N \rightarrow \infty} \sum_{k=1}^{N}\ln\left (\frac{k}{N} \right ) \cdot \frac{1}{N} \] \[ -1=\lim_{N \rightarrow \infty} -\ln(N)+\frac{1}{N} \sum_{k=1}^{N}\ln\left (k \right ) \] \[ -1=\lim_{N \rightarrow \infty} -\ln(N)+\frac{1}{N} \ln\left (N! \right ) \] Therefore, \[ \lim_{N \rightarrow \infty} \frac{\sqrt[N]{N!}}{N}=\frac{1}{e} \]
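Lemma 1 can be sanity-checked numerically. The sketch below uses the log-gamma function so that \(n!\) never has to be formed explicitly; the function name is an illustrative choice. Convergence is slow, since the ratio exceeds \(1/e\) by a relative amount of roughly \(\ln(2\pi n)/(2n)\).

```python
import math

def root_factorial_ratio(n):
    """(n!)^(1/n) / n, computed via log-gamma to avoid overflow."""
    return math.exp(math.lgamma(n + 1) / n) / n

for n in (10, 1_000, 1_000_000):
    print(n, root_factorial_ratio(n), 1 / math.e)
```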

Way 2 (less rigorous)

Suppose the limit exists, and call it x: \[ \lim_{n \rightarrow \infty} \frac{\sqrt[n]{n!}}{n}=x \] So, for n big, in a certain sense: \[ n! \approx (nx)^n \] \[ \frac{(n+1)!}{n!(n+1)}=1 \approx \frac{((n+1)x)^{n+1}}{(nx)^n (n+1)}=\left ( 1+ \frac{1}{n} \right )^n x \] Thus, in order to get equality in the limit, we must have: \[ x = \lim_{n \rightarrow \infty} \left ( 1+ \frac{1}{n} \right )^{-n}=\frac{1}{e} \]

Lemma 2: Wallis Product in Factorial Form

Recall from this article the following expression for pi: \[ \frac{\pi}{2}=\prod_{k=1}^{\infty}\frac{2k \cdot 2k}{(2k-1)(2k+1)}=\lim_{N \rightarrow \infty}\prod_{k=1}^{N}\frac{2k \cdot 2k}{(2k-1)(2k+1)}=\lim_{N \rightarrow \infty} \frac{\left ( 2^N \cdot N! \right )^4}{\left ( (2N)! \right )^2(2N+1)} \]

Lemma 3: An Inequality for the Natural Logarithm

Let \(x,y > 0\). Clearly \[ 0 \leq \frac{1}{y^2 (1+y)^2 (2y+1)^2} \] Therefore \[ 0 \leq \int_{x}^{\infty}\frac{dy}{y^2 (1+y)^2 (2y+1)^2}=\frac{1}{x}+\frac{1}{x+1}+\frac{4}{x+1/2}-6\ln \left ( 1+\frac{1}{x} \right ) \] \[ 6\ln \left ( 1+\frac{1}{x} \right ) -\frac{6}{x+1/2} \leq \frac{1}{x}+\frac{1}{x+1}-\frac{2}{x+1/2} \] \[ (x+\tfrac{1}{2})\ln \left ( 1+\frac{1}{x} \right ) -1 \leq \frac{(x+\tfrac{1}{2})}{6}\left (\frac{1}{x}+\frac{1}{x+1} \right )-\frac{1}{3}=\frac{1}{12x(x+1)} \] Also, clearly \[ 0 \leq \frac{16y^2+41y+24}{y(1+y)^2 (2+y)^2 (2y+1)^2} \] Therefore \[ 0 \leq \int_{x}^{\infty}\frac{16y^2+41y+24}{y(1+y)^2 (2+y)^2 (2y+1)^2} dy=6\left (\ln \left ( 1+\frac{1}{x} \right )-\frac{1}{x+\tfrac{1}{2}} \right)-\frac{1}{2(x+\tfrac{1}{2})(x+1)(x+2)} \] And so \[ \frac{1}{12(x+1)(x+2)} \leq (x+\tfrac{1}{2})\ln \left ( 1+\frac{1}{x} \right )-1 \]

Theorem: Stirling's Approximation

Let us define a function and sequence of coefficients as follows: \[ g(n)=\ln\left ( \frac{n!}{\left ( \tfrac{n}{e} \right )^n \sqrt{2\pi n}} \right )=\sum_{k=-\infty}^{\infty} A_k n^k \] We then have, from lemma 1, \[ \frac{1}{e}=\lim_{n \rightarrow \infty} \frac{\sqrt[n]{n!}}{n}=\lim_{n \rightarrow \infty} \frac{\sqrt[n]{\left (\tfrac{n}{e} \right )^n \sqrt{2\pi n} \cdot e^{g(n)}}}{n}=\frac{1}{e} \lim_{n \rightarrow \infty} \sqrt[2n]{2\pi n} \cdot e^{g(n)/n} \] Thus \[ 1=\lim_{n \rightarrow \infty} e^{g(n)/n}=\exp\left (\lim_{n \rightarrow \infty} \sum_{k=-\infty}^{\infty} A_k n^{k-1} \right )=\exp\left (\lim_{n \rightarrow \infty} \sum_{k=1}^{\infty} A_k n^{k-1} \right ) \] And therefore \(A_k=0\) for \(k \geq 1\). From lemma 2, \[ \frac{\pi}{2}=\lim_{n \rightarrow \infty} \frac{\left ( 2^n \cdot n! \right )^4}{\left ( (2n)! \right )^2(2n+1)}=\lim_{n \rightarrow \infty} \frac{\left ( 2^n \cdot \left (\tfrac{n}{e} \right )^n \sqrt{2\pi n} \cdot e^{g(n)} \right )^4}{\left ( \left (\tfrac{2n}{e} \right )^{2n} \sqrt{4\pi n} \cdot e^{g(2n)} \right )^2(2n+1)} \] \[ \frac{\pi}{2}=\lim_{n \rightarrow \infty} \frac{n \pi}{2n+1} \cdot e^{4g(n)-2g(2n)} \] \[ 0=\lim_{n \rightarrow \infty} 2g(n)-g(2n)=2A_0-A_0=A_0 \] Therefore, \(A_k=0\) for \(k \geq 0\), and thus \(\lim_{n \rightarrow \infty} g(n)=0\). Thus it follows that \[ \lim_{n \rightarrow \infty} \frac{n!}{\left ( \tfrac{n}{e} \right )^n \sqrt{2\pi n}}=1 \] This fact is known as Stirling's Approximation.
Moreover, we have \[ g(n)-g(n+1)=\ln\left ( \frac{n!\left ( \tfrac{n+1}{e} \right )^{n+1} \sqrt{2\pi (n+1)}}{(n+1)!\left ( \tfrac{n}{e} \right )^n \sqrt{2\pi n}} \right )=\ln\left ( \frac{(n+1)^{n+\tfrac{1}{2}}}{e \cdot n^{n+\tfrac{1}{2}}} \right ) \] \[ g(n)-g(n+1)=(n+\tfrac{1}{2})\ln\left ( 1+\frac{1}{n} \right )-1 \] By lemma 3, we then have \[ \frac{1}{12(n+1)(n+2)} \leq g(n)-g(n+1) \leq \frac{1}{12n(n+1)} \] \[ \sum_{k=n}^{\infty} \frac{1}{12(k+1)(k+2)}=\frac{1}{12(n+1)} \leq \sum_{k=n}^{\infty} g(k)-g(k+1)=g(n)-g(\infty)=g(n) \leq \sum_{k=n}^{\infty} \frac{1}{12k(k+1)}=\frac{1}{12n} \] That is \(\tfrac{1}{12(n+1)} \leq g(n) \leq \tfrac{1}{12n}\). And therefore: \[ \left (\tfrac{n}{e} \right )^n \sqrt{2\pi n}\cdot e^{\tfrac{1}{12(n+1)}} \leq n! \leq \left (\tfrac{n}{e} \right )^n \sqrt{2\pi n} \cdot e^{\tfrac{1}{12n}} \] In fact, it is possible to obtain an exact formula for \(g(n)\), as well as an asymptotic expansion. For example, by more advanced calculations, we can show that \[ g(n)=\int_{0}^{\infty}\frac{2 \tan^{-1}\left ( \tfrac{y}{n} \right )}{e^{2\pi y}-1}\,dy \sim \sum_{k=1}^{\infty} \frac{B_{2k}}{2k(2k-1)n^{2k-1}}=\frac{1}{12n}-\frac{1}{360n^3}+\frac{1}{1260n^5}- \cdots \] where \(B_m\) is the mth Bernoulli number, and \(\sim\) indicates that the series is asymptotic: it does not converge for fixed n, but its partial sums approximate \(g(n)\) increasingly well as n grows. These two expressions are, respectively, Binet's second expression and Stirling's series.
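The two-sided bound on \(n!\) is tight enough to be worth seeing numerically. A minimal sketch, with an illustrative function name, that prints the lower bound, the exact factorial, and the upper bound:

```python
import math

def stirling_bounds(n):
    """Lower and upper bounds on n! from 1/(12(n+1)) <= g(n) <= 1/(12n)."""
    base = (n / math.e) ** n * math.sqrt(2 * math.pi * n)
    return base * math.exp(1 / (12 * (n + 1))), base * math.exp(1 / (12 * n))

for n in (1, 5, 10, 20):
    lower, upper = stirling_bounds(n)
    print(n, lower, math.factorial(n), upper)
```

Floating-point overflow limits this direct form to n below roughly 170; for larger n one would compare logarithms instead.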

Corollary: Product of a Rational Function

Firstly, since \[ \prod_{k=1}^N \left(ak+b\right)=a^N\prod_{k=1}^N \left(k+\frac{b}{a}\right) \] We will just evaluate \[ \prod_{k=1}^N \left(k+b\right)=\frac{(N+b)!}{b!} \approx \left(\frac{N+b}{e}\right)^{N+b}\frac{\sqrt{2\pi(N+b)}}{b!}=N^{N+b+\tfrac{1}{2}}e^{-N}\frac{\sqrt{2\pi}}{b!}e^{-b}\left(1+\frac{b}{N}\right)^{N+b+\tfrac{1}{2}} \] \[ \prod_{k=1}^N \left(k+b\right)=\frac{(N+b)!}{b!} \approx N^{N+b+\tfrac{1}{2}}e^{-N}\frac{\sqrt{2\pi}}{b!} \] More generally, given the above, it is not difficult to demonstrate the following generalization. Let \(m,n > 0\). Let \(a_1,a_2,...,a_m\) and \(b_1,b_2,...,b_n\) and \(r_1,r_2,...,r_m\) and \(s_1,s_2,...,s_n\) be sequences of numbers, such that \[ \sum_{k=1}^m r_k=\sum_{k=1}^n s_k \] and \[ \sum_{k=1}^m a_k r_k=\sum_{k=1}^n b_k s_k \] Then \[ \prod_{k=1}^\infty\frac{(k+a_1)^{r_1}(k+a_2)^{r_2}\cdots (k+a_m)^{r_m}}{(k+b_1)^{s_1}(k+b_2)^{s_2}\cdots (k+b_n)^{s_n}}=\frac{\prod_{j=1}^n (b_j!)^{s_j}}{\prod_{j=1}^m (a_j!)^{r_j}} \] In cases where the coefficients are non-integral, we use the Gamma function (an extension of the factorial to non-integers), instead of factorials: \[ \prod_{k=1}^\infty\frac{(k+a_1)^{r_1}(k+a_2)^{r_2}\cdots (k+a_m)^{r_m}}{(k+b_1)^{s_1}(k+b_2)^{s_2}\cdots (k+b_n)^{s_n}}=\frac{\prod_{j=1}^n (\Gamma (b_j+1))^{s_j}}{\prod_{j=1}^m (\Gamma (a_j+1))^{r_j}} \] For instance \[ \prod_{k=0}^\infty \frac{(k+1)(k+a+b)}{(k+a)(k+b)}=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}=B(a,b) \] \[ \frac{\sin(\pi x)}{\pi x}=\prod_{k=1}^\infty\frac{(k-x)(k+x)}{k^2}=\frac{\Gamma(1)^2}{\Gamma(1-x) \Gamma(1+x)}=\frac{1}{\Gamma(1-x)x \Gamma(x)} \] \[ \prod_{k=1}^\infty\frac{(1+\tfrac{1}{k})^x}{1+\tfrac{x}{k}}=\prod_{k=1}^\infty\frac{(k+1)^x k}{k^x (k+x)}=\frac{\Gamma(1)^x \Gamma (1+x)}{\Gamma(2)^x \Gamma(1)}=\Gamma(1+x) \]
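The last example, the product converging to \(\Gamma(1+x)\), can be checked against a library Gamma function. The sketch below accumulates the logarithm of the partial product for numerical stability and assumes \(x>-1\) so that every factor is positive; the function name and the default number of factors are illustrative. The relative error of the partial product after N factors is roughly \(x(1-x)/(2N)\), so convergence is slow.

```python
import math

def gamma_via_product(x, n_factors=100_000):
    """Partial product of (1 + 1/k)^x / (1 + x/k), which tends to Gamma(1 + x)."""
    log_product = 0.0
    for k in range(1, n_factors + 1):
        log_product += x * math.log1p(1.0 / k) - math.log1p(x / k)
    return math.exp(log_product)

for x in (0.5, 1.0, 2.5):
    print(x, gamma_via_product(x), math.gamma(1 + x))
```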

Corollary: Asymptotic Behavior of Bernoulli Numbers

In this article, we found that \[ \zeta(2n)=\frac{1}{2}\frac{(2\pi)^{2n}}{(2n)!}\left | B_{2n} \right | \] Combining this with Stirling's approximation, and using the fact that \(\zeta(2n) \rightarrow 1\) as \(n\) grows, we find that \[ \left | B_{2n} \right |=2\zeta(2n)\frac{(2n)!}{(2\pi)^{2n}} \approx 4\left ( \frac{n}{\pi e} \right )^{2n} \sqrt{n\pi} \cdot e^{\tfrac{1}{24n}} \]

Corollary: Approximation for Binomial Coefficients

\[ \binom{a}{b}=\frac{a!}{b!(a-b)!} \approx \frac{\left (\tfrac{a}{e} \right )^a \sqrt{2\pi a}} {\left (\tfrac{b}{e} \right )^b \sqrt{2\pi b}\left (\tfrac{a-b}{e} \right )^{a-b} \sqrt{2\pi (a-b)}}=\frac{1}{\sqrt{2\pi}}\sqrt{\frac{a}{b(a-b)}} \frac{a^a}{b^b (a-b)^{a-b}} \]
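The quality of this approximation is easy to gauge against exact binomial coefficients. The sketch below assumes integers with \(0<b<a\) and relies on Python's exact big-integer powers before the final division; the function name is an illustrative choice.

```python
import math

def binomial_stirling(a, b):
    """Stirling-based approximation to C(a, b), for integers 0 < b < a."""
    prefactor = math.sqrt(a / (2 * math.pi * b * (a - b)))
    return prefactor * (a ** a / (b ** b * (a - b) ** (a - b)))

for a, b in ((10, 5), (100, 50), (100, 10)):
    exact = math.comb(a, b)
    print(a, b, exact, binomial_stirling(a, b), binomial_stirling(a, b) / exact)
```

The approximation is poorer when b or a − b is small, since Stirling's formula is then being applied to a small factorial.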

Corollary: Normal from Binomial

Let \(0 < p < 1\) and \(p+q=1\). Let \[ F_n(x)=\binom{n}{x}p^x q^{n-x} \] \[ f_n(x)=\sqrt{npq}F_n(np+x\sqrt{npq}) \] \[ \phi_n(x)=\ln(f_n(x)) \] \[ \phi_n(x)=\tfrac{1}{2}\ln(npq)+\ln(n!)-\ln((np+x\sqrt{npq})!)-\ln((nq-x\sqrt{npq})!)+(np+x\sqrt{npq})\ln(p)+(nq-x\sqrt{npq})\ln(q) \] Using Stirling's Approximation and some algebra \[ \phi_n(x) = -\tfrac{1}{2}\ln(2\pi)-\left (\tfrac{1}{2}+ np+x\sqrt{npq} \right )\ln\left ( 1+x\sqrt{\frac{q}{np}} \right)-\left (\tfrac{1}{2}+ nq-x\sqrt{npq} \right )\ln\left ( 1-x\sqrt{\frac{p}{nq}} \right)+O(\tfrac{1}{n}) \] Using the series expansion \(\ln(1+x)=x-\tfrac{1}{2}x^2+O(x^3) \) \[ \phi_n(x) = -\tfrac{1}{2}\ln(2\pi)-\tfrac{1}{2}x^2+O(\tfrac{1}{\sqrt{n}}) \] Thus, as \(n\) goes to infinity \[ \phi_\infty(x) = -\tfrac{1}{2}\ln(2\pi)-\tfrac{1}{2}x^2 \] \[ f_\infty(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}} \] Thus, in the limit, scaling for the changing means and variances, the binomial distribution tends to the normal distribution. Moreover, since the binomial distribution is normalized, we find that \[ \int_{-\infty}^{\infty}\frac{e^{-x^2/2}}{\sqrt{2\pi}}dx=1 \]
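The convergence of the rescaled binomial to the normal density can be observed numerically. In the sketch below the factorials are evaluated through the log-gamma function, so the argument \(np+x\sqrt{npq}\) need not be an integer; the function names and the sample values of n and p are illustrative assumptions.

```python
import math

def scaled_binomial(n, p, x):
    """f_n(x) = sqrt(npq) * F_n(np + x*sqrt(npq)), factorials via log-gamma."""
    q = 1.0 - p
    sigma = math.sqrt(n * p * q)
    k = n * p + x * sigma  # generally not an integer
    log_f = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(p) + (n - k) * math.log(q))
    return sigma * math.exp(log_f)

def standard_normal_density(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, scaled_binomial(10_000, 0.3, x), standard_normal_density(x))
```

The residual differences are of order \(1/\sqrt{n}\), in line with the error term in the expansion above.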

Tuesday, October 27, 2015


  A classic problem in philosophy and the philosophy of science is how to justify induction. That is, how to rationally go from the fact that X is true in N previously observed cases to the belief that it is true in all cases, or at least in an additional, unobserved case. We will here propose a quick and simple method to justify induction, based on the combination of Occam's razor (to choose hypotheses) and Bayesian inference to update epistemic probabilities.


Let us introduce the following notation. Let \(H\) be some hypothesis which we want to judge for plausibility. Let \(X_k\) be the fact that \(X\) is true in the kth instance. Let \(X^n\) be the fact that \(X\) is true in the first n cases, that is \[X^n=X_1 \cap X_2 \cap \cdots \cap X_n=\bigcap_{k=1}^{n}X_k\] so that \[X^{n-1}\cap X_n=X^n\] Thus \(P\left ( X^n|H \right ) \) is the (epistemic) probability that we observe X in n cases, supposing H is true, and \(P\left ( H|X^n \right ) \) is the (epistemic) probability that H is true, supposing we observe X to be the case in n cases.

Occam's Razor

There are three basic, simplest hypotheses we can form, all the rest being more complex. These three are the
  • Proinductive (P) hypothesis: the chance of X happening again increases as we see more instances of it.
  • Contrainductive (C) hypothesis: the chance of X happening again decreases as we see more instances of it.
  • Uninductive (U) hypothesis: the chance of X happening again stays the same as we see more instances of it.
For concreteness, let \(F_H(n)=P\left ( X_{n}|H \cap X^{n-1} \right )\). Thus we say that, for \(m > 0\), \(F_P(n+m) > F_P(n)\), and \(\lim_{n \rightarrow \infty} F_P(n)=1\), and \(F_C(n+m) < F_C(n)\), and \(\lim_{n \rightarrow \infty} F_C(n)=0\), and \(F_U(n)=F_U(0)\).

Bayesian Inference

We want to find \(P\left ( H|X^n \right ) \) for the hypotheses listed in the previous section. We have \[ P\left ( X^n|H \right )=P\left ( X_n \cap X^{n-1}|H \right )=P\left ( X_n |X^{n-1} \cap H \right ) \cdot P\left ( X^{n-1} |H \right )=F_H(n) \cdot P\left ( X^{n-1} |H \right ) \] Therefore \[ P\left ( X^n|H \right )=\prod_{k=1}^{n} F_H(k) \] Suppose that there are \(N\) mutually exclusive and collectively exhaustive hypotheses. Then, Bayes' formula states: \[ P(H_m|A)=\frac{P(A|H_m)P(H_m)}{P(A|H_1)P(H_1)+P(A|H_2)P(H_2)+\cdots+P(A|H_N)P(H_N)} \] Thus, we have \[ P(H_m|X^n)=\frac{P(X^n|H_m)P(H_m)}{P(X^n|H_1)P(H_1)+P(X^n|H_2)P(H_2)+\cdots+P(X^n|H_N)P(H_N)} \] Therefore \[ P(H_m|X^n)=\frac{P(H_m)\prod_{k=1}^{n} F_{H_m}(k)}{P(H_1)\prod_{k=1}^{n} F_{H_1}(k)+P(H_2)\prod_{k=1}^{n} F_{H_2}(k) + \cdots + P(H_N)\prod_{k=1}^{n} F_{H_N}(k)} \] Let us suppose that the three hypotheses mentioned above are collectively exhaustive. Suppose, for concreteness that \(F_P(n)=\frac{n}{n+1}\), \(F_C(n)=\frac{1}{n+1}\), and \(F_U(n)=\frac{1}{2}\). Thus \(\prod_{k=1}^{n} F_{P}(k)=\frac{1}{n+1}\), and \(\prod_{k=1}^{n} F_{C}(k)=\frac{1}{(n+1)!}\), and \(\prod_{k=1}^{n} F_{U}(k)=\frac{1}{2^n}\). Let \(P(P)=p\) and \(P(C)=q\) and \(P(U)=r\) where \(p+q+r=1\). Then: \[ P(P|X^n)=\frac{p\frac{1}{n+1}}{p\frac{1}{n+1}+q\frac{1}{(n+1)!}+r\frac{1}{2^n}} \] \[ P(C|X^n)=\frac{q\frac{1}{(n+1)!}}{p\frac{1}{n+1}+q\frac{1}{(n+1)!}+r\frac{1}{2^n}} \] \[ P(U|X^n)=\frac{r\frac{1}{2^n}}{p\frac{1}{n+1}+q\frac{1}{(n+1)!}+r\frac{1}{2^n}} \] A simple assessment of limits shows that the former goes to 1 quite rapidly, for increasing n, for any nonzero p, and the latter two go to zero. In fact, for \(p=q=r=1/3\), for \(n>10\), \(P(P|X^n)>0.99\), and for \(n>17\), \(P(P|X^n)>0.9999\).
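
The thresholds quoted above are easy to reproduce. Below is a small sketch using exact rational arithmetic. The function names are ours, and the likelihoods \(\frac{1}{n+1}\), \(\frac{1}{(n+1)!}\) and \(\frac{1}{2^n}\) are the ones assumed in the text for the three hypotheses.

```python
from fractions import Fraction
from math import factorial

def posteriors(n, p=Fraction(1, 3), q=Fraction(1, 3), r=Fraction(1, 3)):
    """Return (P(P|X^n), P(C|X^n), P(U|X^n)) for priors p, q, r."""
    like_P = Fraction(1, n + 1)             # product of k/(k+1) for k = 1..n
    like_C = Fraction(1, factorial(n + 1))  # product of 1/(k+1) for k = 1..n
    like_U = Fraction(1, 2 ** n)            # product of 1/2, n times
    total = p * like_P + q * like_C + r * like_U
    return p * like_P / total, q * like_C / total, r * like_U / total

def first_n_exceeding(threshold):
    """Smallest n for which the proinductive posterior exceeds the threshold."""
    n = 0
    while posteriors(n)[0] <= threshold:
        n += 1
    return n

print(first_n_exceeding(Fraction(99, 100)))      # smallest n with P(P|X^n) > 0.99
print(first_n_exceeding(Fraction(9999, 10000)))  # smallest n with P(P|X^n) > 0.9999
```

With equal priors, the two searches stop at \(n=11\) and \(n=18\), which agrees with the statements "for \(n>10\)" and "for \(n>17\)".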

This example is meant to be only illustrative, to show the general way in which Occam's razor, combined with Bayesian inference, leads to a support of induction. The same things happening repeatedly lends credence to the hypothesis that the same things happen repeatedly, and detracts from the hypothesis that the same things are unlikely to happen repeatedly, or always happen with the same probability. In a very similar way, a coin repeatedly coming up heads supports the hypothesis that it is biased to come up heads, and detracts from the hypotheses that it is biased to come up tails or is fair. This may seem obvious, but it is beneficial to see exactly how the mathematical machinery supports this intuition.

We may also wish to include other hypotheses, but we must first assess the prior probabilities that they are true, and Occam's razor advises taking the prior probability to be inversely related to the complexity of the hypothesis. Thus, even if observing n X's is more likely on some other hypothesis than on the three discussed, that hypothesis would need to be more complex or ad hoc, and so would have a significantly lower prior probability.

Friday, October 23, 2015

Some Introductory Quantum Mechanics: Classical Background and Non-Classical Phenomena

  Quantum mechanics (QM) is a theoretical framework that describes the fundamental nature of reality: of particles of matter and light, and potentially of other things as well. QM arose from, and in contrast to, classical mechanics (CM), with many formulations and features still relying heavily on CM ideas. However, several phenomena established that CM cannot be the whole story and would need to be amended. A new theory was needed to account for these phenomena, and it would also predict some other startling ones. However, the best way to interpret the new theory is still disputed.

This will be a multi-part series giving a general introduction to quantum theory.

Classical Mechanics

QM is distinct from CM, though similar in several respects. CM, in general, looks at the behavior of idealized geometrical bodies, rigid, elastic, and fluid. The state is always definite, and in this state, momentum, energy, position and the like are well-defined and definite (we may make an exception for statistical mechanics, but in that case, these quantities may take on distributions only in the sense of an ensemble: it would still be in principle possible to determine these properties for each element in the ensemble, as Maxwell's demon would do). CM is how we tend naively to see the world. Things look like they are definite, spatially constrained, like a bunch of tiny definite parts, or large definite volumes moving along definite paths. This is decidedly not the case in QM.

CM has three main, equivalent formulations: Newtonian, Lagrangian and Hamiltonian.

  • Newtonian: Newtonian mechanics is the typical pedagogical formulation. It deals with the position and velocity of point masses, extended bodies, fluids, etc. in terms of forces, which relate back to position via Newton's second Law (which is really more of a definition). That is, for each asymptotically infinitesimal bit of matter in the system, find the net forces (and torques/stresses), in terms of the positions of the other bits of matter, relate it to the acceleration via Newton's second Law, and then solve the big set of differential equations (or use an iterative approximation method like Runge-Kutta) to find the trajectories of each bit of matter (often the problem is much simplified by various symmetries, homogeneities, localities, redundancies, and conservation considerations).

  • Lagrangian: Lagrangian mechanics deals more in energies, specifically a certain function of time, position and velocity (of all the degrees of freedom) called the Lagrangian, which is typically just the kinetic minus the potential energy. Lagrangian mechanics allows one to deal with constraints in a simpler and more elegant way. Integrating the Lagrangian over time gives the action. The principle of stationary action states that objects move so as to make the action stationary with respect to small variations of the path (usually a minimum, though not always). This can be roughly and loosely interpreted as saying that objects go along the "easiest" trajectories. Lagrangians are still used extensively in modern physics, such as in quantum field theory and the path integral formulation.

  • Hamiltonian: Hamiltonian mechanics also deals in energies, specifically a certain function, related to the Lagrangian, called the Hamiltonian, which generally is equal to the total energy of the system. The trajectories are then found via Hamilton's equations, which are a set of differential equations relating changes of the Hamiltonian to changes of the position and momentum. This formalism uses rather abstract notions, such as frames of reference, generalized coordinates, phase space and the like. However, it is one of the most powerful formulations of classical mechanics and serves as one of the basic frameworks for the development of quantum mechanics.
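
The Newtonian recipe in the first bullet above (find the forces, relate them to acceleration, then integrate numerically) can be sketched as follows. This is only an illustration under our own assumptions: a single mass on an ideal spring with \(k=m=1\), integrated with the classical fourth-order Runge-Kutta method. The function names are hypothetical.

```python
import math

def rk4_step(f, t, y, dt):
    """One classical fourth-order Runge-Kutta step for y' = f(t, y), with y a list."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + dt, [yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def oscillator(t, y, k=1.0, m=1.0):
    """Newton's second law for a mass on a spring: x' = v, v' = -(k/m) x."""
    x, v = y
    return [v, -k / m * x]

def simulate_one_period(steps=1000):
    """Integrate from x = 1, v = 0 over one period 2*pi (valid for k = m = 1)."""
    dt = 2 * math.pi / steps
    t, y = 0.0, [1.0, 0.0]
    for _ in range(steps):
        y = rk4_step(oscillator, t, y, dt)
        t += dt
    return y

x, v = simulate_one_period()
print(x, v)  # should be very close to the initial state (1, 0)
```

Because the initial position and velocity fix the whole trajectory, the integrator returns (up to a small numerical error) to the starting state after one period, which is the determinism discussed in this post.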

Measurement in CM is intuitive and simple. We measure the position of a thing by looking at where it is and recording that. Measurement need not affect the thing being measured, at least in principle. But even if it can't be done in practice, the information being sought is still there and definite regardless. A hypothetical Laplace's demon could know all the parameters as they really are. This is very plausibly not the case in QM.

As a rule, in CM, if an object requires energy \(E\) to do X, but only has energy \(E'< E\), then the object won't be able to do X. For example, if a marble is in a bowl with sides at a height requiring energy E to overcome (i.e. if the object is of mass m, the height of the sides is \(E/mg\)), but the marble only has energy \(E'< E\), the marble cannot escape the bowl. There is no chance that anyone will ever make a measurement of the position of the marble and have that be outside the bowl. Interestingly, this is not the case in QM.

CM is generally deterministic in a rather strict sense (though there are certain rare exceptions). Given that all of the above formulations are equivalent, they are all reducible to a set of second-order differential equations for the positions as functions of time. This means that if all initial positions and velocities are known, even if the relevant forces are time-dependent, the trajectory of each object at all future times is unique and determinable. Any apparent indeterminism is merely epistemic. Assigning probabilities to different states or outcomes is done not because the state is ill-defined or there is some amount of indeterminism that emerges somehow. Rather, it is due to not knowing the initial state or not knowing how the system evolves. Were we to know completely the initial state and how it evolves, there would be no indeterminism. Moreover, any correlations are definite correlations about which we are merely uncertain. For instance, if we have two marbles, one of mass 100g and one of mass 105g, and give one to one experimenter and the other to another, though they do not know which they received, then once one experimenter weighs his marble, he immediately knows the weight of the other marble, even if it is very far away. We will find that this is not the case in QM.

A further development of CM was the inclusion of electromagnetic phenomena. These were incorporated in Maxwell's equations, which describe how electromagnetic fields are generated and changed by charges and currents. In essence, there is a ubiquitous, continuous electromagnetic field, which can be excited and disturbed in various ways, producing effects like radiation and induction (which lend themselves to a huge array of engineering and technological applications). A relatively simple theorem of electromagnetic theory is that accelerating charges radiate energy. This is most easily seen as being due to producing electric fields of varying strengths, combined with the fact that electromagnetic changes travel at a finite speed. For example, an oscillating charge will produce fields now weaker now stronger as it moves closer and further from a point. If we put a charge on a spring a distance away, it would begin oscillating, too, due to the varying force acting on it. Thus we could extract energy from the oscillating charge, and so it must be radiating energy, and so its oscillations will gradually decay. (Note that this implies that charges in orbit around one another will gradually radiate off their energy and fall into one another.) One of the outcomes of Maxwell's electromagnetic theory was the demonstration that light was electromagnetic in nature: electromagnetic disturbances propagated at the speed of light, and thinking of light as electromagnetic radiation accounted for a huge array of optical phenomena.

Also, electromagnetism is decidedly a wave theory. The electromagnetic field is continuous and ubiquitous: it doesn't come in discrete "chunks" or "lumps" and it can have any value. It can have arbitrary energy (or energy density, as the case may be). This is opposed to particles, objects like little marbles, with definite extents and centers. When particles move, the stuff they are made of literally goes from one place to another. When a wave moves, by contrast, the field in one place increases, and decreases in another place: the pattern as opposed to the substance moves. Waves display interference effects: two waves could interfere constructively (increasing the size of the wave) or destructively (decreasing the size of the wave), whereas this seems impossible for particles. Destructive interference for particles would mean that when two particles came together, suddenly there was less substance there. We will return to this in discussing the two-slit experiment below.

Non-Classical Phenomena

There were several phenomena that indicated that CM was not the whole story, that it failed to give a full description of the world. These then paved the way for the development of QM.
  • Millikan's and Rutherford's Experiments

    Millikan discovered, by a very ingenious experiment, that charge was quantized, i.e. it came in "chunks" or "lumps". There was a smallest unit of charge. The existence of electrons as objects with a definite mass had already been discovered by Thomson, experimenting with cathode ray tubes, but it was not known whether electrons had a definite, single charge. Millikan found that charge only came in integer multiples of the fundamental charge, known to be about \(1.6 \times 10^{-19} \mathrm{C}\). Rutherford then demonstrated that the atom was structured, not as Thomson supposed, like a plum pudding, but rather with a small, dense, positively charged nucleus with the electrons in some arrangement around it.

  • Stability and Discrete Radiation of the Atom

    Rutherford's model of the atom (as well as any similar model) is impossible, according to classical electromagnetic theory. As discussed above, orbiting charges cannot persist indefinitely, as they will radiate off energy, and the orbit will eventually decay, the particles eventually colliding. As this clearly does not happen, there must be some modification to the understanding of the atom. In addition, it was noticed that an excited atom only emitted radiation at definite frequencies, not in a continuous spectrum. In the case of hydrogen, the radiation frequencies followed a very simple pattern. This behavior, however, could not be accounted for on classical mechanics, as the electron orbiting the nucleus could potentially have any energy. Moreover, if the electron could only have certain definite energies, it became difficult to see how it could go from one definite energy to another without taking on the intermediate energies. Clearly classical theory would have to be modified to allow for this.

  • Photoelectric Effect

    It was observed that shining light on a metal induced a current. This by itself was predictable by CM, given the understanding that the metal had electrons in it, and when light shone on the metal, some electrons absorbed the energy and so were able to escape the metal to produce a current. However, according to CM, the energy of the light depended solely on the amplitude (i.e. brightness): it would not depend on the frequency (i.e. color) of the light used. Also, for sufficiently dim light, there should be a lag time between when the light comes on and when electrons are emitted, due to the electrons needing to absorb a sufficient amount of light energy. However, neither of these predictions was correct: very bright light of sufficiently low frequency induced no current. And at sufficiently high frequencies, regardless of how dim the light was, the current began immediately, with no delay. This led Einstein correctly to conclude that light was quantized, in units called photons. The energy of each photon was related to the frequency of the light. The brighter the light, the greater the number of photons per unit time. This would entail that for light of a low frequency, even if bright, no electrons would be ejected from the metal, as each photon lacks enough energy to eject an electron, and the chance of multiple photons hitting the same electron is negligible (and the energy that is absorbed is dissipated as heat in the meantime). Moreover, for high enough frequencies, the energy per electron is linear with respect to frequency, with slope \(h= 6.626 \times 10^{-34} \mathrm{J}\cdot \mathrm{s}\), known as Planck's constant (the current, however, depends on the brightness of the light). This leads to the conclusion that the energy of each photon is given by \(E=hf\).

  • Black Body Radiation

    A black body is defined as a perfect absorber and radiator: it absorbs all radiation that falls on it, and is held at a constant temperature. Such a body is known to radiate electromagnetic radiation, but finding and making sense of the spectrum of such a body is non-trivial. According to classical electromagnetic theory, the amount of radiation produced is expected to be proportional to the square of the frequency. That is, the higher the frequency, the more radiation. This is clearly not what happens in nature: otherwise hot objects would emit huge amounts of X-rays and gamma rays, and would instantaneously reach absolute zero, transforming all the thermal energy into electromagnetic radiation, as the total radiation is unbounded. However, Planck found that, by postulating that electromagnetic radiation was quantized as photons, with energies given by \(E=hf\), the total radiation was bounded, and tailed off at higher frequencies. The resulting formula is well borne out by experiments, lending support to his postulate.

  • Double Slit Experiment

    An experiment was performed in which a very dim coherent light source was placed in front of a photographic plate, behind an opaque plate with two narrow slits. The light source was so dim that it emitted no more than one photon at a time. What was found was very strange, according to classical mechanics. The photographic plate produced a pattern of spots where each photon hit it, indicating that the light had been behaving like particles. However the pattern produced is what the classical wave theory predicted: an interference pattern. Had the photons been acting like genuine classical particles, a different pattern would have emerged, one with only two peaks as opposed to many. Classical theory had no way to account for this. In addition, whenever any sort of measuring apparatus was put in place to detect which slit the photon passed through (if it was behaving like a classical particle, it would need to have a definite position and hence pass through a definite slit), the wave-pattern disappeared and a particle-pattern emerged. Classical physics has no way to explain this. Moreover, the experiment has these same features, even when performed with electrons, atoms and even molecules. In each case, the interference pattern produced is consistent with thinking of each object as if it were a wave with wavelength \(\lambda=h/p\), where p is the momentum of the object. More generally, \(\mathbf{p}=\frac{h}{2\pi}\mathbf{k}\), where \(\mathbf{k}\) is the wave vector (a sort of generalized, multidimensional wavelength). In fact, the quantity \(\frac{h}{2\pi}\) comes up so frequently that it is given its own symbol: \(\hbar\).

  • Stern-Gerlach Experiment

    It was noticed that when a stream of certain atoms passed through an inhomogeneous magnetic field, the stream separated into several beams, two in the case of silver atoms. This demonstrated not only that the atoms had a magnetic dipole moment, but also that this moment was quantized, as otherwise it would have produced a smear, as opposed to several beams. The magnetic moment was correctly attributed to the charged particles in the atom, in particular the electrons. This implied that the electron had angular momentum. In classical mechanics, an object has angular momentum purely in terms of its structure and rotation. For example, a wheel has angular momentum given its distribution of mass combined with its rotation. A point particle in classical mechanics cannot have angular momentum. Thus, as the electron was not known to have any internal structure, nor any literal rotation, the angular momentum could not be accounted for by classical physics. The angular momentum was thus given the name spin. An electron always has a measured angular momentum of either \(+\hbar/2\) (called spin up) or \(-\hbar/2\) (called spin down), relative to the axis of measurement. This itself is non-classical: classically, if an object has angular momentum about a certain axis, its angular momentum about an orthogonal axis will be zero, but electrons are never measured to have zero spin.

  • Apparent Indeterminacy

    Suppose we have an electron with measured spin up along the x-axis. If it is measured along the y-axis, it will be found to have either spin up or spin down along that axis. Moreover, the spin measured along that axis will appear to be perfectly random: the results of such an experiment pass every known test for statistical randomness. This feature arises often in similar cases. For instance, in the two-slit experiment, where the next photon (or electron) hits the screen is also apparently random. A half-silvered mirror is a common device in optics, which transmits half the light shone on it and reflects the other half. However, if we put two detectors at the points where transmitted and reflected light would go, and shine very dim light on the mirror, such that no more than one photon reaches it at a time, the pattern of detectors registering will also be apparently random. The pattern of detection passes every known test for statistical randomness. This type of behavior is very different from the usual CM sort. This apparent indeterminacy or randomness is a major aspect of quantum mechanics, and underlies many of the disputes and misunderstandings surrounding it.
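
To get a feel for the scales involved in the relations \(E=hf\) and \(\lambda=h/p\) from the list above, here is a short numerical sketch. The value of \(h\) is the one quoted in the text. The light frequency, the electron mass and the electron speed are our own illustrative inputs (not taken from the discussion above), and the momentum is taken as the non-relativistic \(p=mv\).

```python
H = 6.626e-34  # Planck's constant in J*s, as quoted in the text

def photon_energy(frequency_hz):
    """Energy of one photon, E = h f, in joules."""
    return H * frequency_hz

def de_broglie_wavelength(mass_kg, speed_m_s):
    """Wavelength lambda = h / p, with non-relativistic momentum p = m v (assumption)."""
    return H / (mass_kg * speed_m_s)

# Illustrative inputs (assumptions): visible light near 6e14 Hz,
# and an electron of mass about 9.109e-31 kg moving at 1e6 m/s.
print(photon_energy(6.0e14))                 # about 4e-19 J per photon
print(de_broglie_wavelength(9.109e-31, 1.0e6))  # about 7e-10 m
```

The tiny energy per photon is why ordinary light looks continuous, and the sub-nanometre electron wavelength is why interference of matter only shows up with very fine slits or crystal lattices.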

Tuesday, October 20, 2015

Product Formula for Sine and Some Interesting Corollaries


Deriving the Product Formula: The Easy Way

Recall from this post that: \[ \sum_{n=1}^{\infty} \frac{1}{x^2+n^2}=\frac{\pi}{2x} \coth(\pi x)-\frac{1}{2x^2} \] We then substitute \(x=i z\): \[ \sum_{n=1}^{\infty} \frac{1}{n^2-z^2}=-\frac{\pi}{2z} \cot(\pi z)+\frac{1}{2z^2} \] We then go down the following line of calculation: \[ \sum_{n=1}^{\infty} \frac{2z}{n^2-z^2}=\frac{1}{z}-\pi\cot(\pi z) \] \[ \int\sum_{n=1}^{\infty} \frac{2z}{n^2-z^2}dz=C+\int \frac{1}{z}-\pi\cot(\pi z) dz \] \[ \sum_{n=1}^{\infty} -\ln \left (1-\frac{z^2}{n^2} \right )=C+\ln (z) - \ln (\sin (\pi z) ) \] \[ \sin(\pi z)=C' z\prod_{n=1}^{\infty}\left ( 1-\frac{z^2}{n^2} \right ) \] We can find \(C'\) by looking at the behavior near zero, and so find that: \[ \sin(\pi z)=\pi z\prod_{n=1}^{\infty}\left ( 1-\frac{z^2}{n^2} \right ) \] Therefore: \[ \sin(z)=z\prod_{n=1}^{\infty}\left ( 1-\frac{z^2}{\pi^2 n^2} \right ) \]
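
As a sanity check, the product can be truncated and compared with the built-in sine. This is a minimal sketch with a function name of our choosing. The truncated product converges slowly, with a relative error of roughly \(z^2/(\pi^2 N)\) after \(N\) factors.

```python
import math

def sine_product(z, terms=100_000):
    """Truncation of z * prod_{n=1}^{terms} (1 - z^2 / (pi^2 n^2))."""
    product = z
    for n in range(1, terms + 1):
        product *= 1.0 - z * z / (math.pi ** 2 * n * n)
    return product

for z in (0.5, 1.0, 2.5):
    print(z, sine_product(z), math.sin(z))
```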

Deriving the Product Formula: The Overkill Way, by Weierstrass' Factorization Theorem

Suppose a function can be expressed as \[ f(x)=A\frac{\prod_{n=1}^{M}\left ( x-z_n \right )}{\prod_{n=1}^{N}\left ( x-p_n \right )} \] Where \(M \leq N\) and \(N\) can be arbitrarily large, even tending to infinity. Assuming there are no poles of degree >1 (all poles are simple), we can rewrite this as \[ f(x)=K+\sum_{n=1}^{\infty} \frac{b_n}{x-p_n} \] Where some of the \(b_n\) may be zero. We can also write this as \[ f(x)=f(0)+\sum_{n=1}^{\infty} b_n \cdot \left ( \frac{1}{x-p_n}+\frac{1}{p_n} \right ) \] Suppose \(f(0) \neq 0\), and that \(f\) is an integral function (i.e. an entire function). In that case, the logarithmic derivative \(f'(x)/f(x)\) has poles of degree 1. Moreover, \[\lim_{x \rightarrow z_n} (x-z_n)\frac{f'(x)}{f(x)}=d_n \] Where \(d_n\) is the degree of the zero at \(z_n\). Thus: \[ \frac{f'(x)}{f(x)}=\frac{f'(0)}{f(0)}+\sum_{n=1}^{\infty} d_n \cdot \left ( \frac{1}{x-z_n}+\frac{1}{z_n} \right ) \] Integrating: \[ \ln(f(x))=\ln(f(0))+x \frac{f'(0)}{f(0)}+\sum_{n=1}^{\infty} d_n \cdot \left ( \ln \left (1-\frac{x}{z_n} \right ) +\frac{x}{z_n} \right ) \] \[ f(x)=f(0) e^{x \frac{f'(0)}{f(0)}} \prod_{n=1}^{\infty} \left (1-\frac{x}{z_n} \right )^{d_n} e^{x\frac{d_n}{z_n}} \] This is our main result, called the Weierstrass factorization theorem. In particular, for the function \(f(x)=\sin(x)/x\) \[ \frac{\sin(x)}{x}=\prod_{n=-\infty, n \neq 0}^{\infty} \left (1-\frac{x}{n \pi} \right ) e^{x\frac{1}{n \pi}}=\prod_{n=1}^{\infty} \left (1-\frac{x^2}{n^2 \pi^2} \right ) \] Thus \[ \sin(x)=x\prod_{n=1}^{\infty} \left (1-\frac{x^2}{\pi^2 n^2 } \right ) \]

Corollary 1: Wallis Product

Let us plug in \(x=\pi/2\): \[ \sin(\pi/2)=1=\frac{\pi}{2}\prod_{n=1}^{\infty} \left (1-\frac{1}{4 n^2 } \right ) \] \[ \pi=2\prod_{n=1}^{\infty} \left (\frac{4 n^2}{4 n^2-1 } \right )=2\frac{2 \cdot 2}{1 \cdot 3} \cdot \frac{4 \cdot 4}{3 \cdot 5} \cdot \frac{6 \cdot 6}{5 \cdot 7} \cdot \frac{8 \cdot 8}{7 \cdot 9} \cdots \] More generally: \[ \pi=\frac{N}{M} \sin(\pi M/N) \prod_{n=1}^{\infty} \left (\frac{N^2 n^2}{N^2 n^2 -M^2} \right ) \] This is useful when \(\sin(\pi M/N)\) is easily computable, such as when \(\sin(\pi M/N)\) is algebraic (e.g. \(M=1\), \(N=2^m\) ). For example: \[ \pi=2 \sqrt{2} \prod_{n=1}^{\infty} \left (\frac{4^2 n^2}{4^2 n^2 -1^2} \right ) \] \[ \pi=\frac{2}{3} \sqrt{2} \prod_{n=1}^{\infty} \left (\frac{4^2 n^2}{4^2 n^2 -3^2} \right ) \] \[ \pi=\frac{3}{2} \sqrt{3} \prod_{n=1}^{\infty} \left (\frac{3^2 n^2}{3^2 n^2 -1^2} \right ) \] \[ \pi=\frac{3}{4} \sqrt{3} \prod_{n=1}^{\infty} \left (\frac{3^2 n^2}{3^2 n^2 -2^2} \right ) \] \[ \pi=3 \prod_{n=1}^{\infty} \left (\frac{6^2 n^2}{6^2 n^2 -1^2} \right ) \] \[ \pi=\frac{3}{5} \prod_{n=1}^{\infty} \left (\frac{6^2 n^2}{6^2 n^2 -5^2} \right ) \] \[ \pi=3\sqrt{2}(-1+\sqrt{3}) \prod_{n=1}^{\infty} \left (\frac{12^2 n^2}{12^2 n^2 -1^2} \right ) \]
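
These products can be checked numerically. In the sketch below (function and variable names are ours), the algebraic prefactor \(\frac{N}{M}\sin(\pi M/N)\) is supplied by hand, so that \(\pi\) itself is never used in the estimate. Smaller ratios \(M/N\) converge faster, since the truncation error after \(T\) factors is roughly \((M/N)^2/T\) in relative terms.

```python
import math

def pi_estimate(prefactor, M, N, terms=200_000):
    """prefactor * prod_{n=1}^{terms} N^2 n^2 / (N^2 n^2 - M^2)."""
    product = 1.0
    for n in range(1, terms + 1):
        d = (N * n) ** 2
        product *= d / (d - M * M)
    return prefactor * product

# (label, algebraic prefactor (N/M) sin(pi M/N), M, N)
cases = [
    ("classic Wallis, M/N = 1/2", 2.0, 1, 2),
    ("M/N = 1/4", 2.0 * math.sqrt(2.0), 1, 4),
    ("M/N = 1/6", 3.0, 1, 6),
    ("M/N = 1/12", 3.0 * math.sqrt(2.0) * (math.sqrt(3.0) - 1.0), 1, 12),
]
for label, prefactor, M, N in cases:
    print(label, pi_estimate(prefactor, M, N))
```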

Corollary 2: Product Formula for Cosine

Let us evaluate the sine formula at \(x+\pi/2\): \[ \sin(x+\pi/2)=\cos(x)=\left (x+\frac{\pi}{2} \right )\prod_{n=-\infty, n \neq 0}^{\infty} \left (1-\frac{x+\pi/2}{\pi n } \right ) \] \[ \cos(x)=\frac{\sin(x+\pi/2)}{\sin(\pi/2)}=\left (1+\frac{x}{\pi/2} \right )\prod_{n=-\infty, n \neq 0}^{\infty} \frac{\left (1-\frac{x+\pi/2}{\pi n } \right )}{\left (1-\frac{\pi/2}{\pi n } \right )} \] \[ \cos(x)=\left (1+\frac{x}{\pi/2} \right )\prod_{n=-\infty, n \neq 0}^{\infty} \left (1-\frac{x}{\pi (n-1/2) } \right )=\prod_{n=-\infty}^{\infty} \left (1-\frac{x}{\pi (n-1/2) } \right ) \] \[ \cos(x)=\prod_{n=1}^{\infty} \left (1-\frac{x^2}{\pi^2 (n-1/2)^2 } \right ) \] Alternatively, we can derive this directly from the Weierstrass factorization theorem.
Additionally, by using imaginary arguments, we can derive the formulae: \[ \sinh(x)=x\prod_{n=1}^{\infty} \left (1+\frac{x^2}{\pi^2 n^2 } \right ) \] \[ \cosh(x)=\prod_{n=1}^{\infty} \left (1+\frac{x^2}{\pi^2 (n-1/2)^2 } \right ) \]

Corollary 3: Sine is Periodic

Let us evaluate the sine formula at \(x+\pi\): \[ \sin(x+\pi)=\left (x+\pi \right )\prod_{n=-\infty, n \neq 0}^{\infty} \left (1-\frac{x+\pi}{\pi n } \right ) \] \[ \sin(x+\pi)=\cdots \left (1+\frac{x+\pi}{3\pi} \right ) \left (1+\frac{x+\pi}{2\pi} \right )\left (1+\frac{x+\pi}{\pi} \right )\left (x+\pi \right ) \left (1-\frac{x+\pi}{\pi} \right )\left (1-\frac{x+\pi}{2\pi} \right ) \left (1-\frac{x+\pi}{3\pi} \right ) \cdots \] \[ \sin(x+\pi)=\cdots \left (\frac{4}{3}+\frac{x}{3\pi} \right ) \left (\frac{3}{2}+\frac{x}{2\pi} \right )\left (2+\frac{x}{\pi} \right ) \pi \left (1+\frac{x}{\pi}\right ) \left (\frac{-x}{\pi} \right )\left (\frac{1}{2}-\frac{x}{2\pi} \right ) \left (\frac{2}{3}-\frac{x}{3\pi} \right ) \cdots \] \[ \sin(x+\pi)=\cdots \frac{4}{3}\left (1+\frac{x}{4\pi} \right ) \frac{3}{2}\left (1+\frac{x}{3\pi} \right )2\left (1+\frac{x}{2\pi} \right ) \pi \left (1+\frac{x}{\pi}\right ) \left (\frac{-x}{\pi} \right ) \frac{1}{2}\left (1-\frac{x}{\pi} \right ) \frac{2}{3}\left (1-\frac{x}{2\pi} \right ) \cdots \] \[ \sin(x+\pi)=-2x\left ( \prod_{k=2}^{\infty} \frac{k^2-1}{k^2} \right ) \left ( \prod_{n=1}^{\infty} \left (1-\frac{x^2}{n^2 \pi^2} \right ) \right )=-\sin(x) \] As the first product easily telescopes. Thus \(\sin(x+2\pi)=\sin((x+\pi)+\pi)=-\sin(x+\pi)=\sin(x)\). Therefore, sine is periodic with period \(2\pi\).
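
The telescoping step can be made explicit: since \(\frac{k^2-1}{k^2}=\frac{(k-1)(k+1)}{k \cdot k}\), the partial product up to \(K\) collapses to \(\frac{K+1}{2K}\), which tends to \(\frac{1}{2}\). A short sketch in exact arithmetic (the function name is ours) confirms the closed form:

```python
from fractions import Fraction

def partial_product(K):
    """Exact value of prod_{k=2}^{K} (k^2 - 1) / k^2."""
    product = Fraction(1)
    for k in range(2, K + 1):
        product *= Fraction(k * k - 1, k * k)
    return product

for K in (2, 3, 10, 100):
    print(K, partial_product(K), Fraction(K + 1, 2 * K))
```

The two printed fractions agree for every \(K\), so the infinite product equals \(\frac{1}{2}\) and the prefactor \(-2x\) becomes \(-x\).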

Corollary 4: Some Zeta Values

Let us begin expanding the product for sine in a power series: \[ \sin(x)=x\prod_{n=1}^{\infty} \left (1-\frac{x^2}{\pi^2 n^2 } \right )=x-\frac{x^3}{\pi^2}\left (\frac{1}{1^2}+\frac{1}{2^2}+\cdots \right )+\frac{x^5}{\pi^4}\left (\frac{1}{1^2 \cdot2^2}+\frac{1}{1^2 \cdot3^2}+\cdots+\frac{1}{2^2 \cdot3^2}+\frac{1}{2^2 \cdot4^2}+\cdots \right )+\cdots \] \[ \sin(x)=x-\frac{x^3}{\pi^2}\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )+\frac{x^5}{\pi^4}\left (\sum_{m=1,n=1, m < n}^{\infty}\frac{1}{m^2n^2} \right )+\cdots \] \[ \sin(x)=x-\frac{x^3}{\pi^2}\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )+\frac{x^5}{2\pi^4}\left (\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )^2- \sum_{k=1}^{\infty}\frac{1}{k^4} \right )+\cdots \] By comparing this to the Taylor series for sine, we find: \[ \frac{1}{3!}=\frac{1}{\pi^2}\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right ) \] \[ \frac{1}{5!}=\frac{1}{2\pi^4}\left (\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )^2- \sum_{k=1}^{\infty}\frac{1}{k^4} \right ) \] From which it follows that \[ \sum_{k=1}^{\infty}\frac{1}{k^2}=\frac{\pi^2}{6} \] \[ \sum_{k=1}^{\infty}\frac{1}{k^4}=\frac{\pi^4}{90} \] In fact, for the fourth term, we find, similarly, that \[ \frac{1}{7!}=\frac{1}{6\pi^6}\left ( \left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )^3-3\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )\left (\sum_{k=1}^{\infty}\frac{1}{k^4} \right )+2\left (\sum_{k=1}^{\infty}\frac{1}{k^6} \right ) \right ) \] From which it follows that \[ \sum_{k=1}^{\infty}\frac{1}{k^6}=\frac{\pi^6}{945} \]
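
The arithmetic behind these three values can be replayed exactly. In the sketch below, the variables \(s_1, s_2, s_3\) are our shorthand for \(\zeta(2)/\pi^2\), \(\zeta(4)/\pi^4\) and \(\zeta(6)/\pi^6\), and each line simply solves one of the coefficient equations above for the next unknown.

```python
from fractions import Fraction
from math import factorial

# s_k stands for zeta(2k) / pi^(2k) (our notation)
s1 = Fraction(1, factorial(3))                               # from 1/3! = s1
s2 = s1 ** 2 - 2 * Fraction(1, factorial(5))                 # from 1/5! = (s1^2 - s2) / 2
s3 = (6 * Fraction(1, factorial(7)) - s1 ** 3 + 3 * s1 * s2) / 2  # from 1/7! = (s1^3 - 3 s1 s2 + 2 s3) / 6

print(s1, s2, s3)  # the rational coefficients of pi^2, pi^4, pi^6
```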

Saturday, October 10, 2015

Derivation of a Formula for the Even Values of the Riemann Zeta Function


Lemma 1: Fourier Series of the Dirac Comb

A Dirac comb of period T is defined as \[{\mathrm{III}}_T(x)=\sum_{k=-\infty}^{\infty} \delta(x-kT)\] Where \(\delta(x)\) is the Dirac delta function. Since the Dirac comb is periodic with period T, we can expand it as a Fourier series: \[\sum_{k=-\infty}^{\infty} \delta(x-kT)=\sum_{n=-\infty}^{\infty} A_n e^{i 2 \pi n x/T}\] We solve for the \(A_m\) in the usual way: \[ \int_{-T/2}^{T/2}\sum_{k=-\infty}^{\infty} \delta(x-kT)e^{-i 2 \pi m x/T} dx=1=\int_{-T/2}^{T/2}\sum_{n=-\infty}^{\infty} A_n e^{i 2 \pi (n-m) x/T} dx=T\cdot A_m \]\[ A_m=1/T \] Thus: \[\sum_{k=-\infty}^{\infty} \delta(x-kT)=\frac{1}{T}\sum_{n=-\infty}^{\infty} e^{i 2 \pi n x/T}\]

Lemma 2: An Infinite Series

\[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=\frac{1}{x}+\sum_{n=1}^{\infty} \left ( \frac{1}{x+i n}+\frac{1}{x-i n} \right )=\frac{1}{x}+2x\sum_{n=1}^{\infty} \frac{1}{x^2+n^2} \]\[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=\int_{0}^{\infty} \sum_{n=-\infty}^{\infty} e^{-y(x+i n)} dy \]\[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=\int_{0}^{\infty} e^{-yx} \sum_{n=-\infty}^{\infty} e^{-iyn} dy \] By Lemma 1 with \(T=2\pi\), the inner sum is \(2\pi\) times a Dirac comb in \(y\): \[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=2\pi \int_{0}^{\infty} e^{-yx} \sum_{k=-\infty}^{\infty} \delta(y-2\pi k) dy \] The delta function at \(y=0\) sits on the boundary of the integration range and so contributes only half its weight: \[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=2\pi \left (\frac{1}{2}+ \sum_{k=1}^{\infty} e^{-2\pi k x} \right ) \]\[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=2\pi \left (\frac{1}{2}+ \frac{e^{-2\pi x}}{1-e^{-2\pi x}} \right )= \pi \frac{e^{2\pi x}+1}{e^{2\pi x}-1} \] Therefore, combining the first and last expressions and rearranging, we find: \[ \sum_{n=1}^{\infty} \frac{1}{x^2+n^2}=\frac{\pi}{2x} \frac{e^{2\pi x}+1}{e^{2\pi x}-1}-\frac{1}{2x^2}=\frac{\pi}{2x} \coth(\pi x)-\frac{1}{2x^2} \] Additionally, by taking the limit as x approaches zero, we find: \[ \sum_{n=1}^{\infty} \frac{1}{n^2}=\frac{\pi^2}{6} \]
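
Since the derivation above manipulates delta functions rather freely, a numerical check of the final identity is reassuring. The following sketch (function names are ours) compares a truncated sum with the closed form, and also looks at the small-\(x\) limit.

```python
import math

def series_sum(x, terms=200_000):
    """Truncation of sum_{n=1}^{terms} 1 / (x^2 + n^2); the neglected tail is about 1/terms."""
    return sum(1.0 / (x * x + n * n) for n in range(1, terms + 1))

def closed_form(x):
    """pi/(2x) * coth(pi x) - 1/(2 x^2), valid for x != 0."""
    return math.pi / (2.0 * x) / math.tanh(math.pi * x) - 1.0 / (2.0 * x * x)

for x in (0.5, 1.0, 3.0):
    print(x, series_sum(x), closed_form(x))

# As x approaches zero the closed form approaches pi^2 / 6
print(closed_form(1e-3), math.pi ** 2 / 6)
```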

Theorem: Formula for the Even Values of the Riemann Zeta Function

Recall that, by definition: \[ \zeta(n)=\sum_{k=1}^{\infty}\frac{1}{k^n} \] Let us then analyze \[ f(x)=1-\frac{x}{2}+\sum_{n=2}^{\infty}\frac{x^{n}}{n!} A_{n} \] Where \[ A_n=-2 \cdot n! \cdot \cos(n\pi/2) \cdot 2^{-n}\pi^{-n} \zeta(n) \] Thus: \[ f(x)=1-\frac{x}{2}-2\sum_{n=1}^{\infty}\left (\frac{-x^2}{4\pi^2} \right )^n \zeta(2n) \]\[ f(x)=1-\frac{x}{2}-2\sum_{n=1}^{\infty}\left (\frac{-x^2}{4\pi^2} \right )^n \sum_{k=1}^{\infty}\frac{1}{k^{2n}} \]\[ f(x)=1-\frac{x}{2}-2\sum_{k=1}^{\infty}\sum_{n=1}^{\infty}\left (\frac{-x^2}{4\pi^2 k^2} \right )^n \]\[ f(x)=1-\frac{x}{2}-2\sum_{k=1}^{\infty} \frac{-x^2}{4\pi^2 k^2}\frac{1}{1+\frac{x^2}{4\pi^2 k^2}} \]\[ f(x)=1-\frac{x}{2}+\frac{x^2}{2\pi^2}\sum_{k=1}^{\infty} \frac{1}{k^2+\frac{x^2}{4\pi^2}} \]\[ f(x)=1-\frac{x}{2}+\frac{x^2}{2\pi^2} \left ( \frac{\pi^2}{x} \frac{e^x+1}{e^x-1} -\frac{2\pi^2}{x^2} \right ) \]\[ f(x)=\frac{x}{2} \left ( \frac{e^x+1}{e^x-1} -1 \right )=\frac{x}{e^x-1} \] Therefore, for n>1, \[ A_n=\lim_{x \rightarrow 0} \frac{\mathrm{d}^n }{\mathrm{d} x^n} \frac{x}{e^x-1} \] These numbers are called the Bernoulli Numbers, symbolized as \(B_n\) and they are easily found to be all rational. Thus, by rearranging, we find: \[ \zeta(2n)=\frac{\pi^{2n} 2^{2n-1} \left | B_{2n} \right |} {(2n)!} \] Thus, all the even values of the zeta function can be found by finding the appropriate Bernoulli number, which itself can be found by simple differentiation. Moreover, we see that all the values are rational multiples of the corresponding power of pi. Specifically, we find that: \[ \zeta(2)=\frac{\pi^2}{6} \]\[ \zeta(4)=\frac{\pi^4}{90} \]\[ \zeta(6)=\frac{\pi^6}{945} \]\[ \zeta(8)=\frac{\pi^8}{9450} \]\[ \zeta(10)=\frac{\pi^{10}}{93555} \]
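
The listed values can be reproduced with exact rational arithmetic. Note that the sketch below does not differentiate \(\frac{x}{e^x-1}\) as described above. Instead it uses the standard recurrence \(\sum_{j=0}^{m}\binom{m+1}{j}B_j=0\) (for \(m \geq 1\), with \(B_0=1\)), which generates the same numbers. The function names are ours.

```python
from fractions import Fraction
from math import comb, factorial

def bernoulli_numbers(n_max):
    """B_0..B_{n_max} via sum_{j=0}^{m} C(m+1, j) B_j = 0, so that B_1 = -1/2."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

def zeta_even_coefficient(n, B):
    """Rational number c with zeta(2n) = c * pi^(2n), i.e. 2^(2n-1) |B_2n| / (2n)!."""
    return Fraction(2 ** (2 * n - 1)) * abs(B[2 * n]) / factorial(2 * n)

B = bernoulli_numbers(10)
for n in range(1, 6):
    print(2 * n, zeta_even_coefficient(n, B))
```

The printed coefficients should be \(\frac{1}{6}, \frac{1}{90}, \frac{1}{945}, \frac{1}{9450}, \frac{1}{93555}\), matching the list above.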

Tuesday, September 29, 2015

Liars, Logic, and Information Theory

One of the most common types of logic puzzles involves two tribes, one that always tells the truth and another that always tells lies. There are many versions and variations of puzzles with this setup, but we can develop a method of approach that will work generally. The puzzles fall into two main categories:
  1. Identification: we have a group of N people with some known possible set of identifications, and we ask questions to determine what tribe each is from.
  2. Information: we have a group of N people with some known possible set of identifications, and we ask them questions to determine M bits of information (the answers to M independent yes/no questions). We do not need to identify the tribe of each person. For concreteness, we will take the bits to be 1s or 0s (i.e. we want to find whether each bit is 1 or 0).
The questions must be asked individually, and must be yes/no questions. We assume that the persons asked know all information relevant to the puzzle and understand the questions, supposing they are comprehensible.

A Brief Primer in some Information concepts

The fundamental unit of information is the bit. A single bit answers one yes/no question. If both answers are equally likely, the answer gives the most information, as otherwise you could guess the answer more easily. (In fact, the formula for the effective number of bits, if the chance of a "yes" answer is p, is given by: \(-p\log_2(p)-(1-p)\log_2(1-p) \approx 4p(1-p)\)). If there are \(2^N\) equally possible options, it takes N bits of information to narrow it down to one: in general, an additional bit halves the possibility space. If there are M possibilities, and \(2^{N-1}< M \leq 2^N\), then N bits of information are required. From a deterministic source--that is, a source with known, predictable behavior--one answer to one yes/no question yields at most 1 bit of information, and exactly one if both answers are equally probable. In general, if N questions suffice to discover M bits of information, then no more than N questions are needed to discover fewer than M bits.
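The two quantities in this primer are easy to compute. The following is a small sketch with names of our own choosing: `binary_entropy` is the effective number of bits in an answer with "yes" probability p, and `bits_needed` is the N for which \(2^{N-1}< M \leq 2^N\). The comparison column shows that \(4p(1-p)\) is only a rough approximation, exact at p = 0, 1/2 and 1.

```python
import math

def binary_entropy(p):
    """Effective bits in one yes/no answer when P(yes) = p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain answer carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bits_needed(m):
    """Smallest N with m <= 2**N, for m >= 1 equally possible options."""
    return (m - 1).bit_length()

for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(p, round(binary_entropy(p), 4), round(4 * p * (1 - p), 4))

for m in (2, 6, 8, 9):
    print(m, "possibilities need", bits_needed(m), "bits")
```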

We will discuss some specific cases, describing some general methods of approaching the problem. We will forgo trivial cases, like asking a 1-bit question to someone of a known tribe, or identifying a person from an unknown tribe.

Information: One Person of Unknown Tribe, One Bit

Clearly we must ask at least one question, but can we determine the bit in exactly one question? Indeed we can. Our goal is to formulate a question such that, regardless of whether the person is a liar or a truther, the answer given will correspond to the truth. We thus construct the following table, and look for a question which would have the listed "honest answers" (the true answers to the question that each teller, being a liar or a truther, must be faced with in order to give the desired answer).

Bit Value | Identity | Given Answer | Honest Answer
1 | Truther | Yes | Yes
1 | Liar | Yes | No
0 | Truther | No | No
0 | Liar | No | Yes

The simplest way to construct such a question is just to ask whether we are in one of the cases for which the honest answer is affirmative. In this case, the most easily constructed question is
Is one of the following true: the bit is 1 and you are a truther, the bit is 0 and you are a liar?
Regardless of whether the person asked is a truther or liar, the answer will always be "yes" if the bit is 1 and "no" if it is 0. The question may be found to simplify to something more natural sounding, but the question as given is sufficient. Moreover, if we require N bits of information, we can obtain them in exactly N questions. This will be our general approach. We will make a table in which the given answer corresponds to the information we seek. We will then formulate a question so as to produce the desired answer. This can be done most easily by forming a disjunction of the cases in which the honest answer is affirmative.
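The table can be checked mechanically by enumerating all four cases. This is a minimal sketch; the function names are our own, and the model assumes only that a truther reports the honest answer while a liar reports its negation.

```python
from itertools import product

def spoken_answer(honest_answer, is_truther):
    """A truther reports the honest answer; a liar reports its negation."""
    return honest_answer if is_truther else not honest_answer

def question(bit, is_truther):
    """Honest answer to: 'Is one of the following true: the bit is 1 and you
    are a truther, the bit is 0 and you are a liar?'"""
    return (bit == 1 and is_truther) or (bit == 0 and not is_truther)

for bit, is_truther in product((1, 0), (True, False)):
    said_yes = spoken_answer(question(bit, is_truther), is_truther)
    identity = "truther" if is_truther else "liar"
    print(bit, identity, "yes" if said_yes else "no")
```

In every row the spoken answer is "yes" exactly when the bit is 1, whoever is asked.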

Identification: One Person of Unknown Tribe, Unknown Language

In this case, the tribespeople have a language different from yours. They can understand your questions but reply in a way you can't understand. We will assume that you know the words for "yes" and "no" are "Da" and "Ja", but you don't know which corresponds to which. If you do not even know what the possible words for "yes" and "no" are, you can find this out with one additional question, merely by asking anything and then knowing that the response either means "yes" or "no". The question is then whether you can identify the tribe of the person, and in as few questions as possible. Given that we only seek one bit of information (the person is either a truther or a liar), we will attempt to do so with a single question. We will look for a question such that the response corresponds to the identity of the person. For concreteness, we will take "Da" to be indicative of a truther, "Ja" of a liar.

Identity | Translation of "Da" | Given Answer | Translated Answer | Honest Answer
Truther | Yes | Da | Yes | Yes
Truther | No | Da | No | No
Liar | Yes | Ja | No | Yes
Liar | No | Ja | Yes | No

Again, the simplest way to construct such a question is just to ask one that corresponds to affirmative answers, by a simple disjunction. In this case, the honest answer is "yes" exactly when "Da" means "yes". So we simply ask:
Does "Da" mean "yes"?
A truther will always answer "Da", and a liar will always answer "Ja". Note that we cannot determine what "Da" actually means from this question, and this accords with information theory: we can only get one bit of information from one question. If we instead wanted to identify what "Da" means without learning the identity, the same tabular method shows that the honest answer must be "yes" exactly when the person is a truther, so we ask:
Are you a truther?
Truthers and liars alike will claim to be truthers, so the reply is always the word for "yes": if the answer is "Da", "Da" means "yes", and if it is "Ja", "Da" means "no".
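The claim that the question 'Does "Da" mean "yes"?' separates the two tribes whatever the language can be checked by enumeration. The sketch below uses our own function name and models the reply in two steps: first whether the person intends "yes" or "no", then which word that intention comes out as.

```python
from itertools import product

def uttered_word(honest_answer, is_truther, da_means_yes):
    """Word spoken in reply to a question whose true answer is honest_answer."""
    said_yes = honest_answer if is_truther else not honest_answer
    # "Da" is spoken when the intended reply matches the meaning of "Da".
    return "Da" if said_yes == da_means_yes else "Ja"

for is_truther, da_means_yes in product((True, False), repeat=2):
    # The honest answer to 'Does "Da" mean "yes"?' is simply da_means_yes.
    word = uttered_word(da_means_yes, is_truther, da_means_yes)
    identity = "truther" if is_truther else "liar"
    meaning = "yes" if da_means_yes else "no"
    print(identity, "| Da means", meaning, "| replies", word)
```

Every truther row prints "Da" and every liar row prints "Ja", independent of the translation.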

Information: One Person of Unknown Tribe, Unknown Language, One Bit

This case is much like the preceding one, except we require neither the meaning of "Da" nor the identity of the person. As we need only one bit of information, we require at least one question. We will show how to do it in exactly one question. As before, we construct a table, but this time with three independent variables: the value of the bit, the identity of the person, and the meaning of "Da".

Bit Value | Identity | Translation of "Da" | Given Answer | Translated Answer | Honest Answer
1 | Truther | Yes | Da | Yes | Yes
1 | Truther | No | Da | No | No
1 | Liar | Yes | Da | Yes | No
1 | Liar | No | Da | No | Yes
0 | Truther | Yes | Ja | No | No
0 | Truther | No | Ja | Yes | Yes
0 | Liar | Yes | Ja | No | Yes
0 | Liar | No | Ja | Yes | No

By the same method, the easiest (though not simplest) question to ask is:
Is one of the following true: you are a truther and "Da" means "yes" and the bit is 1, you are a liar and "Da" means "no" and the bit is 1, you are a truther and "Da" means "no" and the bit is 0, you are a liar and "Da" means "yes" and the bit is 0 ?
A simpler way would be to ask:
Is an odd number of the following true: the bit is 1, you are a truther, "Da" means "yes"?
In general, we can see that we can always get exactly one bit of information from one question, given certain other constraints. Not knowing the language or the identity of the person asked is no hindrance to getting information. Also, if we have M people from potentially different tribes (so \(2^M\) possible identifications) who speak the same unknown language, or even if we only know the potential words for "yes" and "no" for one of their languages, we can still identify all of them in exactly M questions just by asking the one person M questions.
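Both forms of the question can be checked against all eight combinations of bit, identity and translation. As before, this is a sketch with our own function names, under the assumption that a liar negates the honest answer and that the reply is then spoken in the unknown language.

```python
from itertools import product

def uttered_word(honest_answer, is_truther, da_means_yes):
    """Word spoken in reply to a question whose true answer is honest_answer."""
    said_yes = honest_answer if is_truther else not honest_answer
    return "Da" if said_yes == da_means_yes else "Ja"

def disjunction_question(bit, is_truther, da_means_yes):
    """Honest answer to the four-case disjunction read off the table."""
    return ((is_truther and da_means_yes and bit == 1)
            or (not is_truther and not da_means_yes and bit == 1)
            or (is_truther and not da_means_yes and bit == 0)
            or (not is_truther and da_means_yes and bit == 0))

def parity_question(bit, is_truther, da_means_yes):
    """Honest answer to: 'Is an odd number of the following true: the bit
    is 1, you are a truther, "Da" means "yes"?'"""
    return (int(bit == 1) + int(is_truther) + int(da_means_yes)) % 2 == 1

for bit, is_truther, da_means_yes in product((1, 0), (True, False), (True, False)):
    honest = parity_question(bit, is_truther, da_means_yes)
    word = uttered_word(honest, is_truther, da_means_yes)
    print(bit, is_truther, da_means_yes, word)
```

The two questions have the same honest answer in every case, and the reply is "Da" exactly when the bit is 1.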

Identification: Truther, Liar, and Unhelpful in Unknown Order.

In this case, we have three people known to be some permutation of truther, liar and a third kind we call unhelpful. The unhelpful is a third type of tribesperson who answers so as to be maximally unhelpful. That is, he will answer so as to prevent you from getting information. The goal is to identify him regardless, as well as the other two. The first question is whether we can identify the three, and then, if it is possible, to do so in as few questions as we can. As there are 6 possible orderings, and \(4 < 6 \leq 8\), we will need 3 bits of information, corresponding to at least 3 questions. We must ask each person at least one question, as asking only 2 or fewer people risks asking the unhelpful, who provides no information. However, if we ask each of them one question, we only get two bits of information, as the unhelpful provides none. Thus we must ask at least 4 questions, with the 4th question being asked of one of the non-unhelpfuls.

In fact there is a way to do this. We ask a question which the truther and the liar will answer differently. We then take the odd one out among the three, who is guaranteed to be either a truther or a liar (in fact, the way he answers will decide which) and then ask him for one more bit of information to identify one of the others, which we have already described how to do. So, for instance, we can ask all three "Do you exist?" (or, if the language is unknown, "Does 'Da' mean 'yes'?"), and then concoct a question to ask the odd one out to get the final requisite bit (left as an exercise for the reader). Thus we can achieve it in exactly 4 questions. In fact, for the first three questions, we only get 2/3 of a bit of information per answer, as, for each answer, we get 1 bit with 2/3 probability.
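The key step, that the odd one out is never the unhelpful, can be verified by brute force over all six orderings. The sketch below makes one modeling assumption of our own: whatever his motives, the unhelpful must still reply either "yes" or "no" to "Do you exist?", so we simply try both replies.

```python
from itertools import permutations

ROLES = ("truther", "liar", "unhelpful")

def answer_do_you_exist(role, unhelpful_says_yes):
    """Reply (True = yes) to 'Do you exist?'; the honest answer is yes."""
    if role == "truther":
        return True
    if role == "liar":
        return False
    return unhelpful_says_yes  # assumption: he may pick either reply

def odd_one_out(answers):
    """Index of the person whose answer differs from the other two."""
    for i, a in enumerate(answers):
        if answers.count(a) == 1:
            return i
    raise ValueError("all three answers agree")

for order in permutations(ROLES):
    for unhelpful_says_yes in (True, False):
        answers = [answer_do_you_exist(r, unhelpful_says_yes) for r in order]
        i = odd_one_out(answers)
        print(order, answers, "-> odd one out:", order[i])
```

In all twelve cases the odd one out is the truther (if he said "yes") or the liar (if he said "no"), so the fourth question can safely be put to him.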

Thursday, July 16, 2015

Preliminary Matters Relating to Morality


Obligations and Duties

We wish to characterize obligations and duties (taken as essentially synonymous) in a more definite way than is typically used, specifically, as pertains to morality, as typically conceived. Many uses of the term have no relation to morality whatsoever. For instance, a legal obligation is merely something demanded of someone by the governing laws which, should he fail to fulfill it, would result in some sort of penalty. If the penalty were absent, the so-called obligation would be rendered irrelevant, as it would be merely up to the disposition of the one obligated whether to fulfill it or not, and no enforcement could be possible. Thus, legal obligations are no more than demands with enforced consequences: it is demanded of the person to do something, and, failing to do so, punishment will result. Another sort is a social or societal obligation. In this case, there is a certain expectation to behave in a certain way, and failing to behave results in some loss of social esteem, stigmatization, shunning, demotion, reduced access to social assets (like favors or company), etc.

However, clearly moral obligations are not of either of these sorts: with a moral obligation, even if no punishment or repercussion would be visited from without, there would still be the internal drive to act. Moreover, even if it were demanded of us by law to act immorally, or our society expected us to do so, that would have no moral bearing on whether we should so act. The missing ingredient, then, if an obligation or duty is to be different from a mere demand or expectation, with or without penalties for transgression, is the drive from within: there is no duty without a sense of dutifulness. If one feels no obligation to do a thing, then one simply has no such obligation.
"[D]uty has no hold on [a man] unless he desires to be dutiful."
-B. Russell

Truth and Objectivity

The simplest way to analyze objective truth is to begin by looking at statements already agreed to be objectively true: (A) "If X is a triangle, X has three sides", (B) "Horses exist". How is it that these statements are objectively true? Surely it is that, when we interpret them correctly, we get a claim about the world that accurately describes it. The truth value of the propositions will depend on how we interpret the terms. For instance, if we interpret the term "triangle" (merely a word: a set of symbols) to mean what we normally mean by the word "square", then (A) would be false. It is only when the semantic content of the terms is specified (as well as the way in which the content of the sentence is to be educed from the terms, e.g. grammar) that the sentence or proposition can have an objective truth value. When the terms are left unspecified, or determined on a subject by subject basis, then the proposition is subjective. Thus, all that is needed to make a system as objective as, say, geometry, is to have the terms well-defined, be it a moral system or any other.

Voluntary Action

We will define a voluntary action as one a person does as a result of a choice they make. Involuntary actions are basically irrelevant to our considerations, in any practical sense, except insofar as they can be changed via voluntary actions. It is then also clear that voluntary actions are the only ones that can be considered in any plausible morality: someone is not moral or immoral based on actions they can't control. This is often summarized in the dictum "ought implies can": regardless of what, exactly, "ought" is taken to mean in the end, it must imply that the thing one ought to do is one that one can do (though "can" might itself need some further analysis). Furthermore, "ought" seems to imply also "can not", as in "can do otherwise". If one can't help but do something, it cannot be meaningfully said that she ought to do it. Thus, oughts imply a choice, where the alternatives can each be acted on.

In any choice between alternatives, choosing one must mean that one wanted that option, for if one wanted a different option, she would have chosen it. "Want" here is to be taken in a more general sense than it may often be. You may want to go with your friends to the movies, but do homework instead, and why? Because though you may prefer movies to homework in general, you prefer doing well in a class at the expense of spending less time with your friends to spending more time with your friends and doing worse in the class. In the greater context, you prefer doing homework to going to the movies in this case, as opposed to generally preferring movies to homework with no context. Thus, all voluntary choices are the result of the person doing what she wants: everyone always does what they want most, as far as they can. A clear corollary of this is that to change voluntary behavior, one must appeal to what the person in question wants or cares about. This is abundantly clear in experience as well. Moreover, the converse is also manifestly true: if something affected someone's voluntary behavior, it must have appealed to what she wanted or cared about. For, as everyone does what they want most, as far as they can, what affects their voluntary actions must have appealed to what they want or care about.

Sunday, February 22, 2015

Valuation Theory


Valuation Systems

A valuation system (VS) is any system by which value is assigned to things, that is, the way in which terms like "better" and "worse", "good" and "bad", are given meaning or are understood. For example, in choosing a hinge for a door, one system of saying "hinge A is better than hinge B" is to consider price (cheaper being better, for instance), or resistance to rust, or weight, or color, or size, etc. After all, there is no unqualified way to say "hinge A is better than hinge B", and any statement that does not explicitly state the way in which hinge A is deemed better than hinge B will have some implicit VS.

All VSs have a domain, which is the set of all things which can be valued by that VS. The VS used to compare hinges won't be able to compare the value of microprocessors, or political parties, or cake recipes. It is important to keep in mind the domain of a VS when discussing it. We will denote the domain of VS X as \(D_X\).

Types of Valuation Systems

There are two general sorts of valuation systems:

  • Comparative Valuation Systems (CVSs): These determine only the ranking of value for the elements of a given, countable set. If X is a CVS and X values A above B, we will write that as \( (A>B)_X\), which we can read as "A is better than B, according to X". Note that CVSs don't have any notion of "good" or "bad", but only "better" and "worse", and possibly "best", if there is some element better than the rest.

    • A subset of CVSs are Bi-comparative VSs (bCVSs, or C2VSs), which only rank sets with exactly two elements, either with one better and one worse, or with both equal. If the bCVS has the additional property of being transitive, then the system can be used to impose a partial ordering on the elements of its domain.

  • Evaluative Valuation Systems (EVSs): These determine the plain value of every element in their domain, like a function. Namely, we can symbolize "the value of A, according to EVS X" as \(V_X(A)\). Without loss of generality, we can take the values assigned to be real numbers. If only order is important, we can take the range to be the numbers in the interval \([-1,1]\). Note that EVSs can have a notion of "good" and "bad", in that we can define "A is bad, according to EVS X" as \(V_X(A)< c \), for some number c, which we can take to be 0. Similar statements can be similarly defined. To keep notation consistent, we will write \((A>B)_X\) iff \(V_X(A)>V_X(B)\), for some EVS X.

Indifferent Extensions

We can also define the indifferent extension of a valuation system X with domain \(D_X\) as the valuation system that is identical to X for any elements in \(D_X\), and is indifferent to all other things. More exactly, we can define it for the cases of CVSs and EVSs as follows:
  • CVSs:
    Let \(X\) be a CVS with domain \(D_X\). The CVS \(X'\) is the indifferent extension of \(X\), such that, for any \( a,b \notin D_X\) and \(c \in D_X\), \((a< c )_{X'} \), \((a=b)_{X'}\).

  • EVSs:
    Let \(X\) be an EVS with domain \(D_X\). The EVS \(X'\) is the indifferent extension of \(X\), such that, for any \( a\notin D_X\), \(V_{X'}(a)=0\).

Optimal Elements

We can also give meaning to statements like "t is the best element in set S, according to X", in two senses. We can say that t is the optimal element of S according to VS X if, for every element s of S such that \(s \neq t\), \( (t > s)_X \). We can say that t is an equi-optimal element of S according to VS X if, for every element s of S, \( (t \geq s)_X \). We can also say that "t is the best element in set S, according to set A", for some set A of VSs, if, for each VS X in A, t is the optimal element of S according to X. We might also stipulate that for every VS in A there is an optimal element in S. Similarly for equi-optimal.
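For an EVS, these two definitions translate directly into code. The sketch below represents an EVS by its value function \(V_X\); the hinge data and the two valuation systems (cheapness and rust resistance) are hypothetical, invented only to illustrate the definitions.

```python
def is_optimal(t, S, value):
    """True if t is strictly better than every other element of S."""
    return all(value(t) > value(s) for s in S if s != t)

def is_equi_optimal(t, S, value):
    """True if t is at least as good as every element of S."""
    return all(value(t) >= value(s) for s in S)

# Hypothetical hinge data, made up for this example.
hinges = {
    "A": {"price": 4.0, "rust_resistance": 2},
    "B": {"price": 6.0, "rust_resistance": 5},
    "C": {"price": 7.0, "rust_resistance": 5},
}
S = list(hinges)

def cheapness(h):
    """EVS in which cheaper is better."""
    return -hinges[h]["price"]

def rust(h):
    """EVS in which more rust resistance is better."""
    return hinges[h]["rust_resistance"]

for name, value in (("cheapness", cheapness), ("rust", rust)):
    optimal = [t for t in S if is_optimal(t, S, value)]
    equi = [t for t in S if is_equi_optimal(t, S, value)]
    print(name, "| optimal:", optimal, "| equi-optimal:", equi)
```

Here hinge A is optimal under cheapness, while under rust resistance B and C are both equi-optimal and no element is optimal, so the two valuation systems do not agree on a best hinge.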

If we want to say something like "t is the best element in S" without qualifying it by a VS, it must be the case that all valuation systems agree (or perhaps there is some "best VS" which would deem t optimal, but we will get to that later). Namely, we say that t is the universo-optimal (UO) element of S if, for every VS X for which there is an optimal element in S, t is the optimal element of S according to X. We also can say that t is a universo-equi-optimal (UEO) element of S if, for every VS X for which there is an equi-optimal element in S, t is an equi-optimal element of S according to X. Note that for there to be a universo-optimal element, all relevant VSs must agree: if there is even one VS for which there is a different optimal element than another, then there is no universo-optimal element in S.

Meta-Valuation Systems, Optimal Valuation Systems, and Recommendation

We can also have VSs whose domain includes some subset of the set of all VSs. We can call these meta-valuation systems (MVS). We can also define the set of totally meta-VSs (TMVS), which is the set of all VSs whose domain includes the set of all VSs.
Now, if there is to be some VS that can be called "the best VS", it must be the case that it is UO (or at least UEO) in the set of all VSs. Thus we define:
a VS X is the objectively best VS iff, for every VS Y in the set of TMVSs for which there is an optimal element, X is the optimal element of the set of all VSs according to Y.
However, it is not hard to argue strongly, if not to prove, that there is no such VS: all it takes is two TMVSs whose optimal elements differ, and such a pair seems very easy to construct. Thus there is no objectively best VS. We can call this the Universo-Optimality Absence Theorem.

Also, we can say that VS A recommends VS B if \((B>A)_A\). We denote this by \(A \rightarrow B\). Clearly A must be an MVS, as it includes the VS B in its domain. The relevance is that, if we hold to VS A, and A recommends B, then we should discard A and take up B instead. We may have some issues if A recommends multiple VSs, but the solution would then be to follow the recommendation that outranks the rest. For example, if \(A \rightarrow B\) and \(A \rightarrow C\), and \((B>C)_A\), then we should choose B, rather than C. However, we will say that a VS A is a consistent recommender if it is the case that if \(A \rightarrow B\), and \(A \rightarrow C\), and \((B>C)_A\), then \(C \rightarrow B\), and it is not the case that \(B \rightarrow C\).
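To make the definitions of recommendation and consistent recommendation concrete, here is a small sketch. It assumes each meta-valuation system is an EVS over a finite set of named VSs, stored as a table with `values[X][Y]` standing for \(V_X(Y)\); the numbers are hypothetical and chosen only to exercise the definitions.

```python
def recommends(values, a, b):
    """A -> B: A ranks B above itself, i.e. (B > A)_A."""
    return values[a][b] > values[a][a]

def is_consistent_recommender(values, a):
    """Check the stated condition for every pair B, C distinct from A."""
    others = [v for v in values if v != a]
    for b in others:
        for c in others:
            if b == c:
                continue
            both = recommends(values, a, b) and recommends(values, a, c)
            if both and values[a][b] > values[a][c]:
                # Require C -> B, and not B -> C.
                if not recommends(values, c, b) or recommends(values, b, c):
                    return False
    return True

# Hypothetical values: A recommends B and C with B above C, and C defers to B.
consistent = {
    "A": {"A": 0, "B": 2, "C": 1},
    "B": {"A": 0, "B": 1, "C": 0},
    "C": {"A": 0, "B": 2, "C": 1},
}
# Same as above, except that C now ranks itself highest.
inconsistent = dict(consistent, C={"A": 0, "B": 0, "C": 1})

print(is_consistent_recommender(consistent, "A"))
print(is_consistent_recommender(inconsistent, "A"))
```

In the first table A is a consistent recommender; in the second it is not, because A prefers B to C while C does not recommend B.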