
Monday, September 14, 2020

A Fairly Rigorous Derivation of Euler's Formula

 


Exponential Functions


The general exponential function \(b^x\) for base \(b > 0 \) and real number \(x\) (*) is defined as the function that satisfies the conditions \[ b^x > 0 \\ b^x\cdot b^y=b^{x+y} \\ b^1=b \] It follows that: \[ \prod_{k=1}^{N}b^{x_k}=b^{\sum_{k=1}^{N}x_k} \\ b^0=1 \\ b^{-x}=1/b^x \\ (ab)^x=a^x b^x \\ b^{m/n}=\sqrt[n]{b^m}=\left ( \sqrt[n]{b} \right )^m \\ b^x=\underset{n \to \infty}{\lim}b^{\left \lfloor xn \right \rfloor/n} \]
(*) We will extend this definition to complex \(x\), for which, we will find, \(b^x>0\) may not hold. Moreover, there is some ambiguity for non-integer \(x\): for example, \(4^{1/2}\) may be \(2\) or \(-2\).
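As a quick numerical illustration, here is a minimal sketch (plain Python/NumPy; the helper name `rational_power_limit` is ours, not from any library) of the limit characterization \(b^x=\lim_{n\to\infty}b^{\left\lfloor xn\right\rfloor/n}\), compared against the built-in power operator.

```python
import numpy as np

def rational_power_limit(b, x, n):
    """Approximate b**x using the rational exponent floor(x*n)/n."""
    m = np.floor(x * n)          # integer numerator of the rational approximation
    return b ** (m / n)          # b raised to a rational power

if __name__ == "__main__":
    for b, x in [(2.0, 0.75), (3.0, -1.3), (0.5, 2.2)]:
        for n in (10, 1000, 100000):
            approx = rational_power_limit(b, x, n)
            print(f"b={b}, x={x}, n={n}: {approx:.10f} vs b**x = {b**x:.10f}")
```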


Some Exponential Inequalities


Let \(b>0\). By a simple argument we find: \[ 0 \leq \left ( b^{(y-x)/2}-1 \right )^2 \\ b^{(y-x)/2} \leq \frac{b^{(y-x)}+1}{2} \\ b^xb^{(y-x)/2}\leq b^x\left (\frac{b^{y-x}+1}{2} \right ) \\ b^{(y+x)/2} \leq \tfrac{1}{2}b^y+\tfrac{1}{2}b^x \] Suppose that \(0 \leq \alpha,\beta \leq 1\) and that \[ b^\alpha\leq\alpha b + (1-\alpha) \\ b^\beta\leq\beta b + (1-\beta) \] Then \[ b^{(\alpha+\beta)/2} \leq \tfrac{1}{2}b^\alpha+\tfrac{1}{2}b^\beta \\ b^{(\alpha+\beta)/2} \leq \tfrac{1}{2}(\alpha b + (1-\alpha))+\tfrac{1}{2} (\beta b + (1-\beta)) \\ b^{(\alpha+\beta)/2} \leq \tfrac{\alpha+\beta}{2}b+(1-\tfrac{\alpha+\beta}{2}) \] As \(b^0=1\leq 0 \cdot b + (1-0)=1\), and \(b^1=b\leq 1 \cdot b + (1-1)=b\), it follows that, for all dyadic fractions of the form \(x=M/2^N\) for some whole numbers M and N with \(0 \leq M \leq 2^N\): \[ b^x \leq x b + (1-x) \] Moreover, as all real numbers \(0 \leq x \leq 1\) can be written as the limit \[ x=\underset{N \to \infty}{\lim} \frac{\left \lfloor x \cdot 2^N \right \rfloor}{2^N} \] It follows that \[ b^x \leq x b + (1-x) \] holds for all real \(x\) in the interval \([0,1]\) and all \(b>0\), with equality holding (for \(b \neq 1\)) only at the endpoints. In particular, for \(0 < x < 1\), \(2^x < 1+x\). Additionally, \((1/2)^x < 1-x/2\). We may then make the following argument: for \(0 < x < 1\) \[ x^2 > 0 \\ 1-x^2=(1+x)(1-x) < 1 \\ 1+x < \frac{1}{1-x} \\ (1/2)^x < 1-x/2 \\ 2^x > \frac{1}{1-x/2} \\ 2^x >{1+x/2} \\ 4^x >(1+x/2)^2 > 1+x \] Thus, we have \(2^x < 1+x < 4^x\).
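As a sanity check on these bounds, a minimal numerical sketch (plain NumPy, nothing assumed beyond it) that samples \(x\) in \((0,1)\) and confirms both the chord bound \(b^x \leq xb+(1-x)\) for several bases and the chain \(2^x < 1+x < 4^x\):

```python
import numpy as np

x = np.linspace(0.001, 0.999, 999)   # interior of (0, 1)

# Chord bound: b**x <= x*b + (1 - x) for any b > 0 on [0, 1]
for b in (0.3, 0.5, 2.0, 4.0, 10.0):
    assert np.all(b ** x <= x * b + (1 - x)), f"chord bound failed for b={b}"

# The chain 2**x < 1 + x < 4**x on (0, 1)
assert np.all(2 ** x < 1 + x) and np.all(1 + x < 4 ** x)
print("All inequalities hold on the sampled grid.")
```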


Derivatives and Derivatives of Exponentials


The definition of a derivative of a function is: \[ \frac{\mathrm{d} }{\mathrm{d} x}f(x)=f'(x) \triangleq \underset{h \to 0}{\lim}\frac{f(x+h)-f(x)}{h} \] Thus, for an exponential, the derivative would be given by: \[ \frac{\mathrm{d} }{\mathrm{d} x}b^x\triangleq \underset{h \to 0}{\lim}\frac{b^{x+h}-b^x}{h}=b^x\underset{h \to 0}{\lim}\frac{b^{h}-1}{h}=b^x L(b) \] Where \(L(b)=\underset{h \to 0}{\lim}\frac{b^{h}-1}{h}\), provided this limit exists. This limit can be proven to exist as follows: for \(0 < q < 1 \) and \(0 < x\), setting \(y = q x < x\), the derived inequality gives \[ (b^x)^q = b^{qx} < (b^x -1) q +1 \\ \frac{b^{qx}-1}{qx} < \frac{(b^x -1)}{x} \\ \frac{b^{y}-1}{y} < \frac{(b^x -1)}{x} \] Thus, the difference quotient is monotonically decreasing as \(x\) decreases toward zero from the right (and increasing from the left). Moreover, the difference quotient is bounded from below and above (for \(|x| < 1\)), as \[ 1-\tfrac{1}{b} < \frac{(b^x -1)}{x} < b-1 \] Thus, the limit exists, and so \(b^x\) is everywhere differentiable. As exponentials with \(b > 0\) are everywhere differentiable and thus continuous, we may take the limit for \(h > 0\). From the above inequalities, we have \[ L(2)=\underset{h \to 0}{\lim}\frac{2^{h}-1}{h} < \underset{h \to 0}{\lim}\frac{1+h-1}{h}=1 \\ L(4)=\underset{h \to 0}{\lim}\frac{4^{h}-1}{h} > \underset{h \to 0}{\lim}\frac{1+h-1}{h}=1 \] As the difference quotients are decreasing and both are bounded below ( \(L(2) > 1/2, \; L(4) > 1 \)), it follows that both limits converge. Thus \(L(2) < 1 < L(4) \). As L is clearly continuous, by the intermediate value theorem, it follows that there is some real number \(2 < e < 4 \) such that \( L(e)=1 \). Let us define this number \(e\) to be that number that satisfies \[L(e)=\underset{h \to 0}{\lim}\frac{e^{h}-1}{h}=1\] This implies that \[ \frac{\mathrm{d} }{\mathrm{d} x}e^x=e^x \] This is a defining feature of the number \(e\). We may also notice that, for any real x, if \(h \to 0\) then \(xh \to 0\). Thus \[ \underset{h \to 0}{\lim}\frac{e^{xh}-1}{xh}=1 \\ \underset{h \to 0}{\lim}\frac{e^{xh}-1}{h}=x \] Thus \(L(e^x)=x\). This implies that, by definition, \(L(x)=\log_e (x)=\ln (x)\). Moreover, given the chain rule \(\frac{\mathrm{d} }{\mathrm{d} x}f(g(x))=f'(g(x))g'(x)\), we find \[ \frac{\mathrm{d} }{\mathrm{d} x} L(e^x)=L'(e^x)e^x=1 \] And thus \(L'(x)=1/x\). This is a very helpful result. For example, by rewriting and using the chain rule, we find: \[ \frac{\mathrm{d} }{\mathrm{d} x} x^a=\frac{\mathrm{d} }{\mathrm{d} x} e^{aL(x)}=e^{aL(x)} \frac{a}{x}=a x^{a-1} \] A result that is otherwise difficult to establish in the general case. We may write the limit derived above in an equivalent way as \[ \underset{n \to \infty}{\lim}n \cdot (e^{x/n}-1)=x \] Which directly implies that \[ e^x=\underset{n \to \infty}{\lim} \left ( 1+\frac{x}{n} \right )^n \] Let us expand the above expression using the binomial theorem: \[ e^x=\underset{n \to \infty}{\lim} \left ( 1+\frac{x}{n} \right )^n \\ e^x= \underset{n \to \infty}{\lim}\sum_{k=0}^{n}\binom{n}{k}\left ( \frac{x}{n} \right )^k \\ e^x= \underset{n \to \infty}{\lim}1+\sum_{k=1}^{n}\frac{x^k}{k!}\prod_{j=1}^{k}\left ( 1-\frac{j-1}{n} \right ) \] Clearly, in the limit, all the factors in the products from 1 to k go to 1. Thus, we find: \[ e^x=1+\sum_{k=1}^{\infty}\frac{x^k}{k!} \] It can be checked that this series converges for all real x by the ratio test. This is an extremely useful formula, and can be taken to be a more robust and easy-to-work-with definition for \(e^x=\exp(x)\).
Note this formula directly implies that: \[ e=\sum_{k=0}^{\infty}\frac{1}{k!} \] As a verification, we can check that \(e^0=1=1+\sum_{k=1}^{\infty}\frac{0^k}{k!}\) and \[ \frac{\mathrm{d} }{\mathrm{d} x} e^x=e^x=1+\sum_{k=2}^{\infty}\frac{kx^{k-1}}{k!}=1+\sum_{k=2}^{\infty}\frac{x^{k-1}}{(k-1)!}=1+\sum_{k=1}^{\infty}\frac{x^{k}}{k!} \] Which verifies the differentiation formula.
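A small numerical sketch (plain Python; the function names are ours) comparing the characterizations above: the truncated limit \((1+x/n)^n\), the partial sums of the series, and the standard-library reference value.

```python
import math

def exp_limit(x, n):
    """(1 + x/n)**n, the limit definition truncated at a finite n."""
    return (1 + x / n) ** n

def exp_series(x, terms):
    """Partial sum 1 + sum_{k=1}^{terms} x**k / k!."""
    total, term = 1.0, 1.0
    for k in range(1, terms + 1):
        term *= x / k          # builds x**k / k! incrementally
        total += term
    return total

x = 1.7
print("limit  (n=10**6):", exp_limit(x, 10**6))
print("series (20 terms):", exp_series(x, 20))
print("math.exp        :", math.exp(x))
print("e from series   :", exp_series(1.0, 20), "vs math.e =", math.e)
```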


Trigonometric Functions and Inequalities


The definitions of the basic trigonometric functions are given by Figure 1. The curve between points C and D is the set of points equidistant from A between the line segments \(AC\) and \(AD\), i.e. a circular arc. Let us call the length of this curve \(L\). Then the standard definitions of the basic trigonometric functions are given by: \[ \theta=\frac{L}{\overline{AD}} \\ \\ \sin(\theta)\triangleq \frac{\overline{BD}}{\overline{AD}}, \; \;\; \cos(\theta)\triangleq\frac{\overline{AB}}{\overline{AD}}, \; \;\; \tan(\theta)\triangleq\frac{\overline{BD}}{\overline{AB}} \] Using these, let us look at figure 2. This figure will serve to evaluate bounds on the trigonometric functions for small angles (\(0 < \theta < 1 \)). Let us denote the length of the curve \(BE\), which is a circular arc, by \(L\). It is clear that \[ \overline{BD} < L < \overline{BF} \] (An alternative way to demonstrate this is through areas, as triangle ABD is a strict subset of sector ABE which is a strict subset of triangle ABF.) Using the definitions above, and defining \(\theta=L/\overline{AB}\), we have: \[ \frac{\overline{BD}}{\overline{AB}}=\sin(\theta) < \frac{L}{\overline{AB}}=\theta < \frac{\overline{BF}}{{\overline{AB}}}=\tan(\theta)=\frac{\sin(\theta)}{\cos(\theta)} \] And so it follows that \[ \theta\cdot\cos(\theta) < \sin(\theta) <\theta \] It follows from the Pythagorean theorem that \[ \sin(\theta)^2+\cos(\theta)^2=1 \] From which we find: \[ \cos(\theta)^2 > 1-\theta^2 > (1-\theta^2)^2 \] The last inequality following from the fact that \(0 < \theta < 1 \). We thus find \[ 1-\theta^2 < \cos(\theta) < 1 \\ \theta-\theta^3 < \sin(\theta) <\theta \] Let us now find the addition formulas for sine and cosine. These are easily found using the construction in figure 3. \[ RB=QA \;\;\;\;\;\;\;\;\;\; RQ=BA \] \[ \frac{RQ}{PQ}=\frac{QA}{OQ}=\sin(\alpha) \;\;\;\;\;\;\;\; \frac{PR}{PQ}=\frac{OA}{OQ}=\cos(\alpha) \] \[ \frac{PQ}{OP}=\sin(\beta) \;\;\;\;\;\;\;\; \frac{OQ}{OP}=\cos(\beta) \] \[ \frac{PB}{OP}=\sin(\alpha+\beta) \;\;\;\;\;\;\;\; \frac{OB}{OP}=\cos(\alpha+\beta) \] \[ PB=PR+RB=\frac{OA}{OQ}PQ+QA \] \[ \frac{PB}{OP}=\frac{OA}{OQ}\frac{PQ}{OP}+\frac{QA}{OP}=\frac{OA}{OQ}\frac{PQ}{OP}+\frac{QA}{OQ}\frac{OQ}{OP} \] \[ \sin(\alpha+\beta)=\cos(\alpha)\sin(\beta)+\sin(\alpha)\cos(\beta) \] \[ OB=OA-BA=\frac{OA}{OQ}OQ-\frac{BA}{PQ}PQ \] \[ \frac{OB}{OP}=\frac{OA}{OQ}\frac{OQ}{OP}-\frac{BA}{PQ}\frac{PQ}{OP} \] \[ \cos(\alpha+\beta)=\cos(\alpha)\cos(\beta)-\sin(\alpha)\sin(\beta) \]


Complex Numbers


Complex numbers can be defined and used in the usual way, namely, as algebraic objects with the symbol \(i\) having the property that \(i^2=-1\). Additionally, we can define the norm of a complex number as \(|a+bi|^2=a^2+b^2\). Some simple theorems we will make use of: \[ (a+bi)\cdot (c+di)=(ac-bd)+i(ad+bc) \\ |(a+bi)\cdot (c+di)|=|a+bi|\cdot|c+di| \\ \frac{1}{a+bi}=\frac{a-bi}{a^2+b^2} \] Let us define the function \[ \mathrm{cis}(x)=\cos(x)+i\sin(x) \] This function has the property that \[ \mathrm{cis}(\alpha)\cdot \mathrm{cis}(\beta)= \left (\cos(\alpha)+i\sin(\alpha) \right ) \cdot \left(\cos(\beta)+i\sin(\beta) \right ) \\ \mathrm{cis}(\alpha)\cdot \mathrm{cis}(\beta)= \left (\cos(\alpha)\cos(\beta)-\sin(\alpha)\sin(\beta) \right ) + i\left(\sin(\alpha)\cos(\beta)+\sin(\beta)\cos(\alpha) \right ) \\ \mathrm{cis}(\alpha)\cdot \mathrm{cis}(\beta)= \cos(\alpha+\beta) + i\sin(\alpha+\beta) \] And thus \(\mathrm{cis}(\alpha)\cdot \mathrm{cis}(\beta)=\mathrm{cis}(\alpha+\beta)\). It follows by induction and the definition of the exponential that, for any natural number \(n\): \[ (\mathrm{cis}(x))^n=\mathrm{cis}(nx) \] And, thus \[ \mathrm{cis}(x)=(\mathrm{cis}(x/n))^n \] Importantly, this is true in the limit of large \(n\). We can always pick \(n\) large enough to make \(x/n\) as small as needed. Thus, we can use the inequalities derived above, namely: \[ \mathrm{cis}\left ( \frac{x}{n} \right )=1+i\frac{x}{n}-\frac{x^2}{n^2}g(x) \] Let \(g(x)=g_r(x)+i g_i(x)\) where \(g_r, g_i\) are real. Then \(0 < g_r(x) < 1\) and \(0 < g_i(x) < \tfrac{x}{n} \). Clearly, then, for \(n > |x|\), \[ |g(x)|^2 < 1+\frac{x^2}{n^2} < 2 \] And so \(|g(x)| < 2\). Also important to note is that a generic complex number can be written as \[ z=a+bi=r \cdot\mathrm{cis}(\theta) \] Where \(r=|z|\) and \(\theta\) satisfies \(r \cos(\theta)=a, \;\;\; r \sin(\theta)=b\). From the above geometric argument, assuming \(a,b > 0\) we have \[ \sin(\theta)=\frac{b}{|z|} < \theta < \tan(\theta)=\frac{b}{a} \] From the fact that \((\mathrm{cis}(x))^n=\mathrm{cis}(nx)\), we find that \[ z^n=(a+bi)^n=r^n \cdot\mathrm{cis}(n\theta) \]
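A brief numerical sketch (plain Python, using the built-in `complex` type; the function name `cis` mirrors the text) of the two facts used next: \(\mathrm{cis}(\alpha)\mathrm{cis}(\beta)=\mathrm{cis}(\alpha+\beta)\) and \(\mathrm{cis}(x)=(\mathrm{cis}(x/n))^n\).

```python
import math

def cis(t):
    """cos(t) + i*sin(t) as a Python complex number."""
    return complex(math.cos(t), math.sin(t))

alpha, beta, x, n = 0.7, 1.1, 2.4, 10**6

# Product rule: cis(a) * cis(b) == cis(a + b)
print(abs(cis(alpha) * cis(beta) - cis(alpha + beta)))   # ~1e-16

# De Moivre-style identity: cis(x) == cis(x/n)**n
print(abs(cis(x / n) ** n - cis(x)))                     # small, limited only by rounding
```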


A Lemma for a Family of Limits


From the above we have, for \(0 < x < 1\): \[ 2^{x} < 1+x < 4^{x} \] Let \(x=B/n^2 \), for \(B>0\) and sufficiently large \(n\). Then \[ 2^{B/n^2} < 1+\frac{B}{n^2} < 4^{B/n^2} \\ 2^{B/n} < \left (1+\frac{B}{n^2} \right )^n < 4^{B/n} \] In the limit of large \(n\), \(B/n \to 0\). As \(2^0=4^0=1\), we have \[ \underset{n \to \infty}{\lim} \left (1+\frac{B}{n^2} \right )^n=1 \] A similar argument applies to the case that \(B < 0\). In fact, suppose B is complex, then: \[ \underset{n \to \infty}{\lim} \left (1+\frac{B}{n^2} \right )^n =\underset{n \to \infty}{\lim} \left |1+\frac{B}{n^2} \right |^n \mathrm{cis}\left ( n\theta \right ) \] Where \[ \frac{1+\frac{B}{n^2}}{\left | 1+\frac{B}{n^2} \right |}=\mathrm{cis}(\theta) \] For sufficiently large \(n\), the real part is always positive. Writing \(1+\frac{B}{n^2}=a+bi\), it is clear that \(-|B|/n^2 \leq b \leq |B|/n^2\), and, since \(a \to 1\), for sufficiently large \(n\) we have \(-\frac{2|B|}{n^2} \leq \theta \leq \frac{2|B|}{n^2}\). It follows that \(-\frac{2|B|}{n} \leq n\theta \leq \frac{2|B|}{n}\). Thus, in the limit of large n, \(n\theta \to 0\), and so \(\mathrm{cis}(n\theta)\to 1\). Therefore, for all complex \(B\): \[ \underset{n \to \infty}{\lim} \left (1+\frac{B}{n^2} \right )^n=1 \] Finally, let us note that \[ 1+\frac{A}{n}+\frac{B}{n^2}=\left ( 1+\frac{A}{n} \right )\frac{1+\frac{A}{n}+\frac{B}{n^2}}{1+\frac{A}{n}}= \left ( 1+\frac{A}{n} \right )\left ( 1+\frac{1}{n^2}\frac{B}{1+\frac{A}{n}} \right ) \] For sufficiently large \(n\), we then have \[ \left |\frac{B}{1+\frac{A}{n}} \right | < 2|B| \] It follows from the above that \[ \underset{n \to \infty}{\lim}\left (1+\frac{A}{n}+\frac{B}{n^2} \right )^n=\underset{n \to \infty}{\lim}\left ( 1+\frac{A}{n} \right )^n \] Clearly this applies to any eventually bounded \(B(n)\): any sequence for which there is some \(M\) and some real \(K>0\) such that \(|B(n)| < K\) for all \(n>M\).
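A numerical sketch of the lemma (plain Python complex arithmetic, no external libraries; the specific values of \(A\) and \(B\) are arbitrary illustrations): the \(B/n^2\) term has no effect in the limit, even for complex \(A\) and \(B\).

```python
import cmath

A = 0.5 + 1.2j      # arbitrary complex A
B = -3.0 + 7.0j     # arbitrary bounded perturbation B

for n in (10**2, 10**4, 10**6):
    with_B    = (1 + A / n + B / n**2) ** n
    without_B = (1 + A / n) ** n
    # Both differences shrink as n grows.
    print(n, abs(with_B - without_B), abs(without_B - cmath.exp(A)))
```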


Euler's Formula and Identity


We recall the following from a previous section: \[ \mathrm{cis}(x)=(\mathrm{cis}(x/n))^n \] And, for sufficiently large \(n\): \[ \mathrm{cis}\left ( \frac{x}{n} \right )=1+i\frac{x}{n}-\frac{x^2}{n^2}g(x) \] Where \(|g(x)| < 2\). Combining yields: \[ \mathrm{cis}(x)=\left ( 1+i\frac{x}{n}-\frac{x^2}{n^2}g(x) \right )^n \] Equality must hold in the limit of large \(n\), and so, using the above lemma, we have: \[ \mathrm{cis}(x)=\underset{n \to \infty}{\lim}\left ( 1+i\frac{x}{n} \right )^n \] Using the limit definition of \(e^x\), this yields, at last, Euler's celebrated formula: \[ e^{ix}=\cos(x)+i\sin(x) \] This has the special case, by the definition of \(\pi\) and the trigonometric functions: \[ e^{i\pi}+1=0 \] Using the power series expansion for the exponential function, and equating real and imaginary parts yields the two power series expansions: \[ \cos(x)=1+\sum_{k=1}^{\infty}\frac{(-x^2)^k}{(2k)!} \\ \sin(x)=\sum_{k=0}^{\infty}\frac{(-1)^k x^{2k+1}}{(2k+1)!} \]
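Putting the pieces together numerically (standard library `cmath` and `math` only): the truncated product \((1+ix/n)^n\) approaches \(\cos x + i\sin x\), and the identity \(e^{i\pi}+1=0\) holds to rounding error.

```python
import cmath
import math

x = 2.0
for n in (10, 1000, 100000):
    approx = (1 + 1j * x / n) ** n
    exact = complex(math.cos(x), math.sin(x))
    print(n, abs(approx - exact))           # shrinks as n grows

print("e^{i*pi} + 1 =", cmath.exp(1j * math.pi) + 1)   # essentially zero
```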

Thursday, February 22, 2018

A Theorem about Circles and a Volumizing Algorithm

 

A Circle Theorem


Take a circle of radius \(R\). Select a point \(A\) inside it a distance \(a\) from the center, with \(a < R\). From \(A\), construct \(N>1\) line segments starting from A and touching the circle, segment k touching the circle at \(P_k\), such that if \(j-k\equiv \pm 1 \mod N\), then \(\measuredangle P_jAP_k=2\pi/N\); that is, all the segments are equally angularly spaced. Let \(d_k=\overline{AP_k}\). Then \[ \prod_{k=1}^{N}d_k=\prod_{k=1}^{N}\left ( a\cos \left ( \theta_0+\frac{2 k \pi}{N} \right )+\sqrt{R^2-a^2+a^2\cos^2 \left ( \theta_0+\frac{2 k \pi}{N} \right )} \right ) \\ \prod_{k=1}^{N}d_k=\prod_{k=1}^{N}\left (\sqrt{R^2-a^2}\exp\left ( \sinh^{-1}\left (\frac{a}{\sqrt{R^2-a^2}}\sin \left ( \theta'_0+\frac{2 k \pi}{N} \right ) \right ) \right ) \right ) \] Therefore \[ \sqrt[N]{\prod_{k=1}^{N}d_k}=\sqrt{R^2-a^2}\exp \left (\frac{1}{N}\sum_{k=1}^{N} \sinh^{-1}\left (\frac{a}{\sqrt{R^2-a^2}}\sin \left ( \theta'_0+\frac{2 k \pi}{N} \right ) \right ) \right ) \] It follows that, for N even, the terms of the sum cancel in pairs (\(\sinh^{-1}\) is odd, and the sines at \(k\) and \(k+N/2\) are negatives of each other), so \[ \sqrt[N]{\prod_{k=1}^{N}d_k}=\sqrt{R^2-a^2} \] It is generally not true for N odd, though it does hold asymptotically for large \(N\), as the error approaches zero.
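A quick numerical check of the theorem (NumPy; the function name is ours): for even \(N\), the geometric mean of the \(N\) equiangular segment lengths equals \(\sqrt{R^2-a^2}\), regardless of the offset angle.

```python
import numpy as np

def segment_lengths(R, a, N, theta0):
    """Segment lengths d_k from the formula in the text, with angles measured
    from the ray pointing from the interior point toward the center."""
    phi = theta0 + 2 * np.pi * np.arange(1, N + 1) / N
    return a * np.cos(phi) + np.sqrt(R**2 - a**2 + a**2 * np.cos(phi)**2)

R, a = 2.0, 1.3
for N in (2, 4, 6, 12):
    d = segment_lengths(R, a, N, theta0=0.37)
    geo_mean = np.exp(np.mean(np.log(d)))
    print(N, geo_mean, np.sqrt(R**2 - a**2))   # agree for even N
```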

Note that this is a broad generalization of the well-known geometric mean theorem, which can itself be seen as a consequence of the power of a point theorem.

A pleasant interpretation of this is that if we take a diametric cross-section of a sphere and choose a point on that disk, the height of the sphere above that point is the geometric mean of the legs of any equiangular, planar, stellar net with an even number of legs \(2N \geq 2\) connecting that point to the boundary of the disk.

A Related Volumizing Algorithm


This theorem suggests an algorithm for producing a 3D volume given a closed 2D boundary shape. If we assume the 2D shape is of a diametric cross-section, we simply apply the method detailed above to produce the height above that point. That is, for a given point inside the shape, we take an N-leg equiangular stellar net emanating from that point to the boundary of the shape. The height of the surface at that point is then the geometric mean of the N legs of that net.
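A minimal sketch of the algorithm (NumPy; all function names are ours, and the boundary is assumed to be a simple polygon given as a list of vertices): for an interior point, cast \(N\) equiangular rays, find where each first meets the boundary, and take the geometric mean of the leg lengths as the height.

```python
import numpy as np

def ray_leg_length(p, u, poly):
    """Smallest positive distance from point p along unit direction u to the polygon
    boundary; poly is an (M, 2) array of vertices, closed implicitly."""
    best = np.inf
    M = len(poly)
    for i in range(M):
        v1, v2 = poly[i], poly[(i + 1) % M]
        e = v2 - v1
        A = np.array([[u[0], -e[0]], [u[1], -e[1]]])
        if abs(np.linalg.det(A)) < 1e-12:          # ray parallel to this edge
            continue
        t, s = np.linalg.solve(A, v1 - p)          # p + t*u = v1 + s*e
        if t > 1e-9 and 0.0 <= s <= 1.0:
            best = min(best, t)
    return best

def height(p, poly, N=32, angle0=0.0):
    """Volumizing height at interior point p: geometric mean of N equiangular legs."""
    angles = angle0 + 2 * np.pi * np.arange(N) / N
    legs = [ray_leg_length(p, np.array([np.cos(a), np.sin(a)]), poly) for a in angles]
    return float(np.exp(np.mean(np.log(legs))))

# Sanity check: a finely sampled circle of radius R should give ~sqrt(R^2 - a^2).
R, a = 1.0, 0.4
circle = np.array([[R * np.cos(t), R * np.sin(t)]
                   for t in np.linspace(0, 2 * np.pi, 400, endpoint=False)])
print(height(np.array([a, 0.0]), circle, N=32), np.sqrt(R**2 - a**2))
```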

This method ensures that circular shapes produce spherical surfaces. However, if N is low, for less regular boundary shapes, the resulting surface may be quite lumpy or sensitive to how the angles of each net are chosen. One solution, then, is simply to make N large enough. However, this may end up being computationally expensive.

In theory, it may be possible to find the asymptotic value: find all parts of the boundary shape visible from the given point, and find the integral of the log of the distance, sweeping over the angle. If the boundary is a polygon, this involves evaluating (or approximating) integrals of the form \[ \int\ln\left ( \sin(x) \right)dx \] Which have no general closed form in terms of elementary functions. However, we can evaluate certain cases. One easy example is that of an infinite corridor formed from two parallel lines. We find that the height profile is double that of a circular cylinder. It may be desirable, then, to determine another function to multiply by which will halve the heights of corridors but leave hemispheres undisturbed.

Below we give some visual examples of the results of the algorithm. The original 2D shapes are shown in red.

In order, an equilateral triangle, an icosagon, a five-pointed star, an almost-donut, an Escherian tessellating lizard, and a tessellating spider.

Sunday, February 18, 2018

Rotating Fluid

 
Suppose we have an infinitely tall cylinder of radius R, filled to a height H with an incompressible fluid. We then set the fluid rotating about the cylindrical axis at angular speed \(\omega\). Suppose we take a differential chunk of fluid on the surface, a radius r from the axis.
In the rotating frame, the chunk experiences its weight \(W=mg\) (downward) and a centrifugal force \(F_c=m r \omega^2\) (outward); the resulting normal force is then \(N=F_c+W\) (as a vector sum). This normal force, as the name suggests, will be normal to the fluid surface. It follows by simple geometry that \[ \frac{dy}{dr}=\frac{F_c}{W}=\frac{r \omega^2}{g} \] From which it follows that the height of the surface at any radius will be given by \[ y=\frac{r^2 \omega^2}{2g}+C \] Let us define \[ \omega_0=2\sqrt{gH}/R \\ u=\omega/\omega_0 \] Given that the fluid is incompressible, we know that the total volume does not change. From this, we can determine that the height of the surface at any radius will be given by: \[ y(r)=2H\left ( ru/R \right )^2+\left\{\begin{matrix} H(1-u^2) \\ 2H(u-u^2) \end{matrix}\right. \, \, \, \, \, \, \, \begin{matrix} u \leq 1\\ u > 1 \end{matrix} \] For \(u>1\) this expression applies for \(r \geq r_{\textrm{min}}\) (defined below); inside that radius the base is dry. The highest point on the liquid surface is then given by: \[ y_{\textrm{max}}=\left\{\begin{matrix} H(1+u^2)\\ 2Hu \end{matrix}\right. \, \, \, \, \, \, \, \begin{matrix} u \leq 1\\ u > 1 \end{matrix} \] If \(u > 1\), the center of the base of the cylinder is not covered by fluid. There is a minimum radius at which fluid can be found. This minimum radius is given by: \[ r_{\textrm{min}}=R\sqrt{1-\frac{1}{u}} \] If the fluid is of uniform density and of total mass M, then the moment of inertia of the rotating fluid is given by \[ I=\left\{\begin{matrix} \frac{MR^2}{2}\left ( 1+\frac{u^2}{3} \right )\\ MR^2\left ( 1-\frac{1}{3u} \right ) \end{matrix}\right. \, \, \, \, \, \, \, \begin{matrix} u \leq 1\\ u > 1 \end{matrix} \] Note that, for each of these piecewise functions, the function and its first derivative are continuous at \(u=1\).
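A short numerical check (NumPy; parameter names mirror the text, and the specific values are arbitrary) that the stated surface profile conserves the fluid volume \(\pi R^2 H\) in both regimes.

```python
import numpy as np

def surface_height(r, R, H, u):
    """Surface height y(r) from the text, with u = omega/omega_0."""
    y = 2 * H * (r * u / R) ** 2
    y += H * (1 - u**2) if u <= 1 else 2 * H * (u - u**2)
    return np.maximum(y, 0.0)        # below r_min (u > 1) the base is dry: height 0

R, H = 1.0, 1.0
r = np.linspace(0, R, 200001)
dr = r[1] - r[0]
for u in (0.5, 1.0, 2.0, 3.0):
    volume = np.sum(surface_height(r, R, H, u) * 2 * np.pi * r) * dr
    print(u, volume, np.pi * R**2 * H)    # should agree closely
```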

Tuesday, February 6, 2018

Bias in Statistical Judgment

 

Bias in Performance Evaluation


Suppose you are an employer. You are looking to fill a position and you want the best person for the job. To do this, you take a pool of applicants, and for each one, you test them N times on some metric X. From these N tests, you will develop some idea of what each applicant's performance will look like, and based on that, you will hire the applicant or applicants with the best probable performance. However, you know that each applicant comes from one of two populations which you believe to have different statistical characteristics, and you know immediately which population each applicant comes from.

We will use the following model: We will assume that the population from which the applicants are taken is made up of two sub-populations A and B. These two sub-populations have different distributions of individual mean performance that are both Gaussian. That is, an individual drawn from sub-population A will have an expected performance that is normally distributed with mean \(\mu_A\) and variance \(\sigma_A^2\). Likewise, an individual drawn from sub-population B will have an expected performance that is normally distributed with mean \(\mu_B\) and variance \(\sigma_B^2\). Individual performances are then taken to be normally distributed with the individual mean and individual variance \(\sigma_i^2\).

Suppose we take a given applicant who we know comes from sub-population B. We sample her performance N times and get performances of \(\{x_1,x_2,x_3,...,x_N\}=\textbf{x}\). We form the following complete pdf for the (N+1) variables of the individual mean and the N performances: \[ f_{\mu_i,\textbf{x}|B}(\mu_i,x_1,x_2,...,x_N)=\frac{1}{\sqrt{2\pi}^{N+1}}\frac{1}{\sigma_B \sigma_i^N} \exp\left ({-\frac{(\mu_i-\mu_B)^2}{2\sigma_B^2}} \right ) \prod_{k=1}^N\exp\left ({-\frac{(x_k-\mu_i)^2}{2\sigma_i^2}} \right ) \] It follows that the distribution conditioned on the test results is proportional to: \[ f_{\mu_i|\textbf{x},B}(\mu_i)\propto \exp\left ({-\frac{(\mu_i-\mu_B)^2}{2\sigma_B^2}} \right ) \prod_{k=1}^N\exp\left ({-\frac{(x_k-\mu_i)^2}{2\sigma_i^2}} \right ) \] By normalizing we find that this implies that the individual mean, given that it comes from sub-population B and given the N test results, is normally distributed with variance \[ \sigma_{\tilde{\mu_i}}^2=\left ( {\frac{1}{\sigma_B^2}+\frac{N}{\sigma_i^2}} \right )^{-1} \] and mean \[ \tilde{\mu_i}=\frac{\frac{\mu_B}{\sigma_B^2}+\frac{1}{\sigma_i^2}\sum_{k=1}^{N}x_k}{\frac{1}{\sigma_B^2}+\frac{N}{\sigma_i^2}} =\frac{\frac{\mu_B}{\sigma_B^2}+\frac{N}{\sigma_i^2}\bar{\textbf{x}}}{\frac{1}{\sigma_B^2}+\frac{N}{\sigma_i^2}} \] We will assume that this mean and variance are used as estimators to predict performance. Note that, in the limit of large N, \(\sigma_{\tilde{\mu_i}}^2\rightarrow \sigma_i^2/N\) and \(\tilde{\mu_i}\rightarrow \bar{\textbf{x}}\rightarrow \mu_i\), as expected.
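A small sketch of this posterior update (NumPy; the variable names follow the text, and the numbers are arbitrary): combine the sub-population prior with \(N\) samples, and confirm the posterior mean approaches the sample mean as \(N\) grows.

```python
import numpy as np

def posterior(mu_B, sigma_B, sigma_i, x):
    """Posterior mean and variance of an individual's mean, given a N(mu_B, sigma_B^2)
    prior and samples x with known per-sample standard deviation sigma_i."""
    N = len(x)
    precision = 1 / sigma_B**2 + N / sigma_i**2
    mean = (mu_B / sigma_B**2 + np.sum(x) / sigma_i**2) / precision
    return mean, 1 / precision

rng = np.random.default_rng(0)
mu_true, sigma_i = 1.5, 2.0
for N in (1, 10, 1000):
    x = rng.normal(mu_true, sigma_i, size=N)
    mean, var = posterior(mu_B=0.0, sigma_B=1.0, sigma_i=sigma_i, x=x)
    print(N, mean, var, x.mean())
```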

Suppose we assume sub-populations A and B have the same variance \(\sigma_{AB}^2\), but \(\mu_A>\mu_B\). Then we can note the following implications:
  • The belief about the sub-population the applicant comes from acts effectively as another performance sample of weight \(\sigma_i^2/\sigma_{AB}^2\).
  • If applicant 1 comes from sub-population A and applicant 2 comes from sub-population B, even if they perform identically in their samples, applicant 1 would nevertheless still be preferred.
  • The more samples are taken, the less the sub-population the applicant comes from matters.
  • The larger the difference in means between the sub-populations is assumed to be, the better the lesser-viewed applicant will need to perform in order to be selected over the better-viewed applicant.
  • Suppose we compare \(\tilde{\mu_i}\) to \(\bar{\textbf{x}}\). Our selection criterion will simply be whether the performance predictor is above \(x_m\). We want to find the probability of being from a given sub-population given that the applicant was selected by each predictor. For the sub-population-indifferent predictor: \[ P(A|\bar{\textbf{x}}\geq x_m)=\frac{P(\bar{\textbf{x}}\geq x_m|A)P(A)}{P(\bar{\textbf{x}}\geq x_m|A)P(A)+P(\bar{\textbf{x}}\geq x_m|B)P(B)} \\ \\ P(A|\bar{\textbf{x}}\geq x_m)= \frac{P(A)Q\left (\frac{x_m-\mu_A}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right )} {P(A)Q\left (\frac{x_m-\mu_A}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right ) + P(B)Q\left (\frac{x_m-\mu_B}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right )} \] Where \[ Q(z)=\int_{z}^{\infty}\frac{e^{-s^2/2}}{\sqrt{2\pi}}ds\approx \frac{e^{-z^2/2}}{z\sqrt{2\pi}} \] For the sub-population-sensitive predictor, we first note that, for an applicant from sub-population \(S \in \{A,B\}\), \[ \tilde{\mu_i} \geq x_m \Leftrightarrow \bar{\textbf{x}}\geq x_m+(x_m-\mu_S)\frac{\sigma_i^2}{N\sigma_{AB}^2}=x_m'(S) \] Assuming \(x_m > \mu_A > \mu_B\), we have \(x_m'(B) > x_m'(A) > x_m\). This then implies \[ P(A|\tilde{\mu_i}\geq x_m)=\frac{P(\tilde{\mu_i}\geq x_m|A)P(A)}{P(\tilde{\mu_i}\geq x_m|A)P(A)+P(\tilde{\mu_i}\geq x_m|B)P(B)} \\ \\ P(A|\tilde{\mu_i}\geq x_m)= \frac{P(A)Q\left (\frac{x_m'(A)-\mu_A}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right )} {P(A)Q\left (\frac{x_m'(A)-\mu_A}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right ) + P(B)Q\left (\frac{x_m'(B)-\mu_B}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right )} \] As both effective thresholds exceed \(x_m\), and the threshold for B exceeds the threshold for A, it is easy to see that \(P(A) < P(A|\bar{\textbf{x}}\geq x_m) < P(A|\tilde{\mu_i}\geq x_m) \). Thus the sensitivity further biases the selection towards sub-population A. We can call \(\bar{\textbf{x}}\) the meritocratic predictor and \(\tilde{\mu_i}\) the semi-meritocratic predictor; a numerical comparison of the two is sketched below.
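A Monte Carlo sketch of this comparison (NumPy; all parameter values are arbitrary illustrations, and equal prior proportions \(P(A)=P(B)\) are assumed): simulate applicants from both sub-populations, apply each predictor with the same threshold, and estimate \(P(A\mid\text{selected})\).

```python
import numpy as np

rng = np.random.default_rng(1)
mu_A, mu_B, sigma_AB, sigma_i, N, x_m = 1.0, 0.0, 1.0, 2.0, 4, 2.0
M = 200_000                                   # applicants per sub-population

def simulate(mu_pop):
    """Sample means of N performances for M applicants from one sub-population."""
    indiv_mean = rng.normal(mu_pop, sigma_AB, size=M)
    samples = rng.normal(indiv_mean[:, None], sigma_i, size=(M, N))
    return samples.mean(axis=1)

def posterior_mean(xbar, mu_pop):
    """Semi-meritocratic predictor: posterior mean under the sub-population prior."""
    w = 1 / sigma_AB**2 + N / sigma_i**2
    return (mu_pop / sigma_AB**2 + N * xbar / sigma_i**2) / w

xbar_A, xbar_B = simulate(mu_A), simulate(mu_B)

for name, sel_A, sel_B in [
    ("meritocratic", xbar_A >= x_m, xbar_B >= x_m),
    ("semi-meritocratic", posterior_mean(xbar_A, mu_A) >= x_m, posterior_mean(xbar_B, mu_B) >= x_m),
]:
    p_A_given_selected = sel_A.sum() / (sel_A.sum() + sel_B.sum())
    print(name, p_A_given_selected)   # semi-meritocratic selects A more often
```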

Some Sociological Implications

Though the above effects may, in theory, be small, their effects in practice may not be. Humans are not perfectly rational and are not perfect statistical computers. The above is meant to give motivation for taking seriously effects that are often much more pronounced. If there is a perceived difference in means, there is likely a tendency to exaggerate it, to think that the difference in means should be visible, and hence that the two distributions should be statistically separable. Likewise, population variances are often perceived as narrower than they really are, leading to further amplification of the biasing effect. Moreover, the parameter estimations are not based simply on objective observation of the sub-populations, but also if not mainly on subjective, sociological, psychological, and cultural factors. As high confidence in one's initial estimates makes one less likely to take more samples, the employer's judgment may rest heavily on subjective biases. Given this, if the employer's objective is simply to hire the best candidates, she should simply use the meritocratic predictor (or perhaps at least invest some time into getting accurate sub-population parameters).

However, it is worth noting some effects on the candidates themselves. As a rule, the candidates are not subjected to this bias just in this bid for employment alone, but rather serially and repeatedly, in bid after bid. This may have any of the following effects: driving applicants toward jobs where they will be more favored (or less dis-favored) by the bias; affecting the applicant's self-evaluations, making them think their personal mean is closer to the broadly perceived sub-population mean; normalizing the broadly perceived sub-population mean, with an implicit devaluation of deviation from it. Also, we can note the following well-known problem: personal means tend to increase in challenging jobs, meaning that the unfavorable bias will perpetually stand in the way of the development of the negatively biased candidate, which then only serves to further feed into the bias. Both advantages and disadvantages tend to widen, making this a subtle case of "the rich get richer and the poor get poorer".

The moral of all this can be summarized as: the semi-meritocratic predictor should be avoided if possible as it is very difficult to implement effectively and has a tendency to introduce a host of detrimental effects. Fortunately, the meritocratic predictor loses only a small amount by way of informativeness, and avoids the drawbacks mentioned above. Care should then be taken to ensure that the meritocratic selection system is implemented as carefully as can be managed to preclude the introduction of biasing effects. One way of washing out the effects of biasing in general is simply to give the applicants many opportunities to demonstrate their abilities.

Tuesday, August 1, 2017

Some Newtonian Gravitational Mechanics

 

Duration of a Trajectory


Suppose we launch an object straight up. We wish to find how long it will take to return. Suppose we launch it up at a speed \(v_0\). It is well known that the classical escape velocity is given by \[ v_e=\sqrt{\frac{2MG}{R}} \] By examining the energy equation, we find that the speed when the object is a distance r from the center of the planet is given by: \[ v(r)=v_e\sqrt{\frac{R}{r}-\gamma} \] Where \[ \gamma=1-\frac{v_0^2}{v_e^2} \] To find the travel time, we integrate: \[ T=2\int_{R}^{R/\gamma}\frac{dr}{v(r)}=\frac{2}{v_e}\int_{R}^{R/\gamma}\frac{dr}{\sqrt{\frac{R}{r}-\gamma}}=\frac{2R}{v_e}\int_{\gamma}^{1}\frac{du}{u^2\sqrt{u-\gamma}} \] \[ T=\frac{2R}{v_e}\frac{\tan^{-1}\left ( \sqrt{\frac{1}{\gamma}-1} \right )+\sqrt{\gamma-\gamma^2}}{\gamma^{3/2}}=\frac{2R}{v_e}\frac{\sin^{-1}(u)+u\sqrt{1-u^2}}{(1-u^2)^{3/2}} \] Where \(u=v_0/v_e\).
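A numerical check of the closed form (NumPy and SciPy's `quad`; the physical constants are Earth-like values chosen for illustration): integrate \(2\int dr/v(r)\) directly and compare with the formula in \(u=v_0/v_e\).

```python
import numpy as np
from scipy.integrate import quad

G, M, R = 6.674e-11, 5.972e24, 6.371e6
v_e = np.sqrt(2 * G * M / R)

def duration(v0):
    gamma = 1 - (v0 / v_e) ** 2
    integrand = lambda r: 1.0 / (v_e * np.sqrt(R / r - gamma))
    T_num, _ = quad(integrand, R, R / gamma, limit=200)   # integrable singularity at the top
    u = v0 / v_e
    T_formula = (2 * R / v_e) * (np.arcsin(u) + u * np.sqrt(1 - u**2)) / (1 - u**2) ** 1.5
    return 2 * T_num, T_formula

for v0 in (1e3, 5e3, 9e3):
    print(duration(v0))   # the two values should agree
```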

Optimal Path through a Planet


We want to find the best path through a planet of radius R, connecting two points \(2\alpha\) radians apart (great circle angle). We assume the planet is of uniform density. As is well known, the acceleration due to gravity a radius r from the center of the planet is given by: \[ a=-gr/R \] Where \(g\) is the surface gravitational acceleration. Thus, if an object falls from rest at the surface along a path through the planet, its speed at a distance r from the center will be given by \[ \tfrac{1}{2}mv^2=\tfrac{1}{2}m\frac{g}{R}\left ( R^2-r^2 \right ) \] \[ v(r)=\sqrt{\frac{g}{R}} \sqrt{R^2-r^2} \] Let us suppose it falls along the path specified by the function \(r(\theta)\), where r is even and \(r(\pm\alpha)=R\). The total time is given by \[ T=2\int_{0}^{\alpha}\frac{d\ell}{v}=2\sqrt{\frac{R}{g}}\int_{0}^{\alpha}\frac{\sqrt{r^2(\theta)+r'^2(\theta)}}{\sqrt{R^2-r^2(\theta)}}d\theta \] In order to obtain conditions for the optimal path, then, we use calculus of variations. The Lagrangian is \[ L(r,r',\theta)=\frac{\sqrt{r^2+r'^2}}{\sqrt{R^2-r^2}} \] Using the Beltrami Identity, we find: \[ \frac{\sqrt{r^2+r'^2}}{\sqrt{R^2-r^2}}-\frac{r'^2}{{\sqrt{r^2+r'^2}}{\sqrt{R^2-r^2}}}=\frac{r^2}{{\sqrt{r^2+r'^2}}{\sqrt{R^2-r^2}}}=C \] Let \(1+ \tfrac{1}{C^2}=1/q^2\). Rearranging, we find: \[ r'=r\sqrt{\frac{\left (1+ \tfrac{1}{C^2} \right )r^2-R^2}{R^2-r^2}}=\frac{r}{q}\sqrt{\frac{r^2-R^2q^2}{R^2-r^2}} \] As \(r'(0)=0\), this implies that \[ r(0)=Rq \] \[ q=\frac{r(0)}{R} \] Let us make the change of variables: \(u=r^2/R^2\). This then gives: \[ u'=2u\sqrt{\frac{\tfrac{1}{q^2}u-1}{1-u}} \] \[ u(0)=q^2 \] In order to determine \(q\) in terms of \(\alpha\), we can integrate the differential equation: \[ \frac{1}{2u} \sqrt{\frac{1-u}{\tfrac{1}{q^2}u-1}}du=d\theta \] \[ \int_{q^2}^{1}\frac{1}{2u} \sqrt{\frac{1-u}{\tfrac{1}{q^2}u-1}}du=\frac{\pi}{2}(1-q)=\int_{0}^{\alpha}d\theta=\alpha \] Thus \[ q=1-\frac{2\alpha}{\pi} \] We can then find the total travel time: \[ T=2\sqrt{\frac{R}{g}}\int_{0}^{\alpha}\frac{\sqrt{r^2(\theta)+r'^2(\theta)}}{\sqrt{R^2-r^2(\theta)}}d\theta=2\sqrt{\frac{R}{g}}\int_{Rq}^{R}\frac{\sqrt{r^2(\theta)+r'^2(\theta)}}{\sqrt{R^2-r^2(\theta)}}\frac{1}{r'}dr \] \[ T=2\sqrt{\frac{R}{g}}\int_{Rq}^{R} \frac{q}{r} \sqrt{r^2+\frac{r^2}{q^2}{\frac{r^2-R^2q^2}{R^2-r^2}}}\frac{dr}{\sqrt{r^2-R^2q^2}} \] \[ T=\sqrt{\frac{R}{g}}\sqrt{1-q^2}\int_{Rq}^{R} \frac{2rdr}{\sqrt{R^2-r^2} \sqrt{r^2-R^2q^2}} \] Substituting \(x=r^2/R^2\), \[ T=\sqrt{\frac{R}{g}}\sqrt{1-q^2}\int_{q^2}^{1} \frac{dx}{\sqrt{1-x} \sqrt{x-q^2}} \] \[ T=\pi \sqrt{\frac{R}{g}}\sqrt{1-q^2}=2\sqrt{\frac{R}{g}}\sqrt{\pi\alpha-\alpha^2} \] Below we show the optimal paths for several values of \(\alpha\):
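A numerical sketch (SciPy's `quad`) of the two integral evaluations above: the angle integral that fixes \(q\), and the travel-time integral that evaluates to \(\pi\).

```python
import numpy as np
from scipy.integrate import quad

def alpha_of_q(q):
    """Half-angle alpha as a function of q, from the integral in the text."""
    integrand = lambda u: (1.0 / (2 * u)) * np.sqrt((1 - u) / (u / q**2 - 1))
    val, _ = quad(integrand, q**2, 1)
    return val

for q in (0.2, 0.5, 0.8):
    print(alpha_of_q(q), np.pi * (1 - q) / 2)       # should match

# Travel-time integral: int_{q^2}^{1} dx / (sqrt(1-x) sqrt(x-q^2)) = pi
q = 0.5
val, _ = quad(lambda x: 1 / (np.sqrt(1 - x) * np.sqrt(x - q**2)), q**2, 1)
print(val, np.pi)
```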


In fact, these solutions are hypocycloids.

Thursday, July 27, 2017

Golomb's Sequence

 

Definition


Golomb's sequence, named after Solomon Golomb, is a curious sequence of whole numbers that describes itself. It is defined in the following way: it is a non-decreasing sequence of whole numbers where the nth term gives the number of times n occurs in the sequence, and the first term is 1. From this we can begin constructing it: the second element must be greater than 1, as there is only one 1; it must be 2, and so must be the third element. Given this, there must be two 3s, and from here on we can simply read off the terms already determined to see how many times each successive value must occur. The first several terms of the sequence are: \[ 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, \\ 9, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12,... \]

Recurrence Relation


The sequence can be given an explicit recurrence relation by stating it in the following way, using the self-describing property: to determine the next term in the sequence, go back the number of times that the previous term's value occurs (this will put you at the next-smallest value), then add one. For example, to determine the 12th term (6), count the number of times that the value of the 11th term (5) occurs (3 times). Step back that many terms (to the 9th term, which is 5), then add one to that value (6). This then gives the recurrence relation: \[ a(n+1)=1+a\left ( n+1-a(a(n)) \right ) \] Where \(a(1)=1\).
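A direct implementation of this recurrence (plain Python; the function name is ours):

```python
def golomb(n_terms):
    """First n_terms of Golomb's sequence via a(n+1) = 1 + a(n+1 - a(a(n))), a(1) = 1."""
    a = [0, 1]                               # 1-indexed: a[1] = 1, a[0] unused
    for n in range(1, n_terms):
        a.append(1 + a[n + 1 - a[a[n]]])
    return a[1:]

print(golomb(20))   # 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8
```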

Asymptotic Behavior


The recurrence relation allows us to give an asymptotic expression for the value of the sequence. Let us suppose the sequence grows like \[ a(n)=A n^\alpha \] Let us put this into the recurrence relation: \[ A(n+1)^\alpha=1+A\left ( n+1-A(A n^\alpha)^\alpha \right )^\alpha \] Simplifying and rearranging, we obtain: \[ 1=\frac{1}{A(n+1)^\alpha}+\left (1-A^{1+\alpha}\frac{n^{\alpha^2}}{n+1} \right )^\alpha \] As \(\alpha<1\), \(\frac{n^{\alpha^2}}{n+1}\) goes to zero. For small x, \((1+x)^b\rightarrow 1+bx\). Thus, asymptotically: \[ 1\approx\frac{1}{A(n+1)^\alpha}+1-\alpha A^{1+\alpha}\frac{n^{\alpha^2}}{n+1} \] \[ \alpha A^{2+\alpha}n^{\alpha^2}(n+1)^{\alpha-1} \approx 1 \] Thus it must be the case that \[ \alpha^2+\alpha-1=0 \] \[ A=\alpha^{-\frac{1}{2+\alpha}} \] The solution to the first equation is \[ \alpha=\left \{\varphi-1,-\varphi \right \} \] Where \(\varphi\) is the golden ratio. As the exponent is clearly positive, we find the sequence is asymptotic to: \[ a(n)\rightarrow \varphi^{2-\varphi}n^{\varphi-1} \] Below we plot the ratio of these two expressions:
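A minimal numerical sketch of this asymptotic (plain Python, self-contained; the convergence is known to be quite slow, so the printed ratios drift toward 1 only gradually):

```python
import math

def golomb(n_terms):
    a = [0, 1]
    for n in range(1, n_terms):
        a.append(1 + a[n + 1 - a[a[n]]])
    return a[1:]

phi = (1 + math.sqrt(5)) / 2
seq = golomb(100000)
for n in (100, 10000, 100000):
    asymptotic = phi ** (2 - phi) * n ** (phi - 1)
    print(n, seq[n - 1] / asymptotic)        # ratio tends toward 1
```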


Friday, July 7, 2017

Continued Fractions

 

Definition and Background


A continued fraction is a representation of a number \(x\) in the form \[ x=a_0+\cfrac{b_0}{a_1+\cfrac{b_1}{a_2+\cfrac{b_2}{a_3+\cfrac{b_3}{\ddots}}}} \] Often, the b's are taken to be all 1's and the a's are integers. This is called the canonical or simple form. There are numerous ways of representing continued fractions. For instance, \[ x=a_0+\cfrac{1}{a_1+\cfrac{1}{a_2+\cfrac{1}{a_3+\cfrac{1}{\ddots}}}} \] can be represented as \[ x=a_0+\overset{\infty}{\underset{k=1}{\mathrm{K}}}\frac{1}{a_k} \] Or as \[ \left [ a_0;a_1,a_2,a_3,... \right] \]

Construction Algorithm


The continued fraction terms can be determined as follows: Given \(x\), set \(x_0=x\). Then \[ a_k=\left \lfloor x_k \right \rfloor \] \[ x_{k+1}=\frac{1}{x_k-a_k} \] Continue until \(x_k=a_k\).
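A direct implementation of this construction algorithm (plain Python; the function name is ours). Floating-point roundoff limits how many terms are reliable, so the sketch stops after a fixed count as well as at the termination condition.

```python
import math

def continued_fraction(x, max_terms=15):
    """Simple continued fraction terms [a0; a1, a2, ...] via the floor/reciprocal algorithm."""
    terms = []
    for _ in range(max_terms):
        a = math.floor(x)
        terms.append(a)
        if x == a:                 # exact termination (rational x, up to rounding)
            break
        x = 1.0 / (x - a)
    return terms

print(continued_fraction(math.pi))                   # [3, 7, 15, 1, 292, ...]
print(continued_fraction((1 + math.sqrt(5)) / 2))    # the golden ratio: all 1s
```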

Convergents


The convergents of a continued fraction are the rational numbers resulting from taking the first n terms of the continued fraction. Let \(P_n\) and \(Q_n\) be the numerators and denominators respectively of the nth convergent (the one that includes \(a_n\)). It is not difficult to show that \[ P_n=a_nP_{n-1}+P_{n-2} \] \[ Q_n=a_nQ_{n-1}+Q_{n-2} \] An alternate way of saying this is that \[ \begin{bmatrix} a_n & 1\\ 1 & 0 \end{bmatrix} \begin{bmatrix} P_{n-1} & Q_{n-1}\\ P_{n-2} & Q_{n-2} \end{bmatrix} = \begin{bmatrix} P_{n} & Q_{n}\\ P_{n-1} & Q_{n-1} \end{bmatrix} \] Where \[ \begin{bmatrix} P_{-1} & Q_{-1}\\ P_{-2} & Q_{-2} \end{bmatrix}= \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} \] And therefore \[ {}^L\prod^n_{k=0} \begin{bmatrix} a_k & 1\\ 1 & 0 \end{bmatrix} = \begin{bmatrix} P_{n} & Q_{n}\\ P_{n-1} & Q_{n-1} \end{bmatrix} \] Let \[p_n=\frac{P_n}{P_{n-1}}\] \[q_n=\frac{Q_n}{Q_{n-1}}\] Then \[p_n=a_n+\frac{1}{p_{n-1}}\] \[q_n=a_n+\frac{1}{q_{n-1}}\] We find that \[ \frac{P_{n+1}}{Q_{n+1}}=a_0+\sum_{k=0}^{n}\frac{(-1)^k}{Q_kQ_{k+1}} \] And thus \[ \left | x- \frac{P_{n}}{Q_{n}}\right |<\frac{1}{Q_nQ_{n+1}} \] As \(a_n \geq 1\), \(Q_n \geq F_n\), i.e. the nth Fibonacci number. This is closely related to Hurwitz's theorem: for any irrational number x, there exist infinitely many ratios \(P/Q\) such that \[ \left | x-\frac{P}{Q} \right |<\frac{k}{Q^2} \] whenever \(k \geq 1/\sqrt{5}\), and this constant cannot be improved for every irrational x (the golden ratio, whose \(Q_n\) are exactly the Fibonacci numbers, is the extremal case).
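A sketch of the convergent recurrence (plain Python, exact integer arithmetic; the function name is ours) together with a check of the error bound \(|x - P_n/Q_n| < 1/(Q_nQ_{n+1})\), using the leading continued fraction terms of \(\pi\) as input.

```python
import math

def convergents(terms):
    """Successive (P_n, Q_n) from P_n = a_n P_{n-1} + P_{n-2}, and likewise for Q_n."""
    P_prev, P_prev2 = 1, 0          # P_{-1} = 1, P_{-2} = 0
    Q_prev, Q_prev2 = 0, 1          # Q_{-1} = 0, Q_{-2} = 1
    out = []
    for a in terms:
        P = a * P_prev + P_prev2
        Q = a * Q_prev + Q_prev2
        out.append((P, Q))
        P_prev, P_prev2 = P, P_prev
        Q_prev, Q_prev2 = Q, Q_prev
    return out

pi_terms = [3, 7, 15, 1, 292, 1, 1, 1, 2]       # leading terms of the CF of pi
cs = convergents(pi_terms)
for (P, Q), (P1, Q1) in zip(cs, cs[1:]):
    err = abs(math.pi - P / Q)
    print(f"{P}/{Q}: error {err:.3e} < 1/(Q_n Q_n+1) = {1 / (Q * Q1):.3e}")
```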

Periodic Continued Fractions


Suppose that for \(k \geq N\), \(a_{k+M}=a_k\). Let \[ [a_0;a_1,a_2,...a_{N-2}]=\frac{P_{Y1}}{Q_{Y1}} \] \[ [a_0;a_1,a_2,...a_{N-1}]=\frac{P_{Y2}}{Q_{Y2}} \] \[ \left [a_N;a_{N+1},a_{N+2},...a_{N+M-2} \right ]=\frac{P_{Z1}}{Q_{Z1}} \] \[ \left [a_N;a_{N+1},a_{N+2},...a_{N+M-1} \right ]=\frac{P_{Z2}}{Q_{Z2}} \] Then x satisfies the formula \[ x=\frac{P_{Y2}\cdot y+P_{Y1}}{Q_{Y2} \cdot y+Q_{Y1}} \] Where \(y=[a_N;a_{N+1},a_{N+2},...]\) is the periodic tail, which satisfies \[ y=\frac{P_{Z2}\cdot y+P_{Z1}}{Q_{Z2} \cdot y+Q_{Z1}} \] This last equation is quadratic in y, and x is a rational function of y, so x itself satisfies a quadratic polynomial with integer coefficients. Thus an eventually periodic continued fraction represents a quadratic irrational; by Lagrange's theorem the converse also holds, so a continued fraction is eventually periodic if and only if the number it represents is a quadratic irrational.

Generic Continued Fractions


Let x be uniformly chosen between 0 and 1. We define a sequence of random variables as follows \[ \xi_0=x \] \[ \xi_{n+1}=\frac{1}{\xi_n}-\left \lfloor \frac{1}{\xi_n} \right \rfloor \] Clearly, if \[x=[0;a_1,a_2,a_3,...] \] Then \[\xi_n=[0;a_{n+1},a_{n+2},a_{n+3},...]\] Let us assume that, asymptotically, the \(\xi\)'s approach a single distribution. Based on our definitions, this would imply that \[ P(\xi_{n+1} < z)=\sum_{k=1}^{\infty} P \left (\frac{1}{k+z} < \xi_n < \frac{1}{k} \right ) \] Differentiating both sides gives the required relationship: \[ f_\xi(z)=\sum_{k=1}^{\infty}\frac{f_\xi\left ( \tfrac{1}{k+z} \right )}{(k+z)^2} \] Let us test the function \[ f_\xi(z)=\frac{A}{1+z} \] \[ \sum_{k=1}^{\infty}\frac{A}{1+\tfrac{1}{k+z}}\frac{1}{(k+z)^2}=\sum_{k=1}^{\infty}\frac{A}{(1+k+z)(k+z)} \] \[ \sum_{k=1}^{\infty}\frac{A}{(1+k+z)(k+z)}=A\sum_{k=1}^{\infty}\left (\frac{1}{k+z}-\frac{1}{k+z+1} \right )=\frac{A}{1+z} \] It can be proved more rigorously that this is indeed the asymptotic probability density function, with \(A=1/\ln(2)\). Thus \[ P(\xi_{n} < z)=\log_2(1+z) \] From this we can easily find the asymptotic density function for the continued fraction terms. The probability that \(a_{n+1}=k\) is the same as the probability that \(\left \lfloor \tfrac{1}{\xi_n} \right \rfloor=k\). This is then \[ P(a_{n+1}=k)=P\left ( \frac{1}{k+1} < \xi_n \leq \frac{1}{k} \right )=\log_2(1+\tfrac{1}{k})-\log_2(1+\tfrac{1}{k+1}) \] \[ P(a_{n+1}=k)=\log_2\left ( \frac{(k+1)^2}{k(k+2)} \right )=\log_2\left ( 1+\frac{1}{k(k+2)} \right ) \] This is called the Gauss-Kuzmin Distribution.

From this we can then easily find the asymptotic geometric mean of the terms \[ \underset{n \to \infty}{\lim}\sqrt[n]{\prod_{k=1}^{n}a_k}=\exp\left (\underset{n \to \infty}{\lim} \frac{1}{n}\sum_{k=1}^{n} \ln(a_k)\right )=\exp\left ( E(\ln(a_k)) \right ) \]\[ \underset{n \to \infty}{\lim}\sqrt[n]{\prod_{k=1}^{n}a_k}= \exp\left (\sum_{j=1}^{\infty}P(a_k=j)\ln(j) \right ) \]\[ \underset{n \to \infty}{\lim}\sqrt[n]{\prod_{k=1}^{n}a_k}= \prod_{j=1}^\infty \left ( 1+\frac{1}{j(j+2)} \right )^{\log_2(j)}=2.685452001...=K_0 \] This value is called Khinchin's Constant.
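A quick numerical evaluation of that product (plain Python; the truncation point is arbitrary, and the tail contributes a small correction):

```python
import math

product = 1.0
for j in range(1, 10**6):
    product *= (1 + 1 / (j * (j + 2))) ** math.log2(j)
print(product)   # approaches Khinchin's constant 2.685452...
```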

Let us now look at the asymptotic behavior of the convergents. Namely, we wish to examine the asymptotic behavior of the denominators. First note that \[ \xi_n=\frac{1}{\xi_{n-1}}-a_n \] If we let \(y_n=1/\xi_n\), we then have \[ y_{n-1}=a_n+\frac{1}{y_n} \] From above we have that \[q_n=a_n+\frac{1}{q_{n-1}}\] As, asymptotically, \(\xi_n \sim \xi_{n-1}\), this implies that, asymptotically, \(y_n \sim y_{n-1} \sim 1/\xi_n\) and therefore \(q_n \sim q_{n-1} \sim 1/\xi_n\). Thus \[ f_q(z)=\left\{\begin{matrix} \frac{1}{z^2}\frac{1}{\ln(2)}\frac{1}{1+1/z} \\ 0 \end{matrix}\right.\; \; \begin{matrix} z > 1 \\ z \leq 1 \end{matrix} \] As \[ Q_n=\prod_{k=1}^{n}q_k \] We have \[ \underset{n \to \infty}{\lim}\sqrt[n]{Q_n}=\underset{n \to \infty}{\lim}\sqrt[n]{\prod_{k=1}^{n}q_k}=\exp\left (\underset{n \to \infty}{\lim}\frac{1}{n}\sum_{k=1}^{n}\ln(q_k) \right ) \]\[ \underset{n \to \infty}{\lim}\sqrt[n]{Q_n}=\exp\left (E(\ln(q_n)) \right )= \exp\left (\int_{1}^{\infty}\ln(z)\frac{1}{z^2}\frac{1}{\ln(2)}\frac{1}{1+1/z}dz \right ) \]\[ \underset{n \to \infty}{\lim}\sqrt[n]{Q_n}= \exp\left (-\frac{1}{\ln(2)}\int_{0}^{1}\frac{\ln(z)}{1+z}dz \right ) \]\[ \underset{n \to \infty}{\lim}\sqrt[n]{Q_n}=\exp\left ( \frac{\pi^2}{12\ln(2)} \right )=3.275823... \] This value (or sometimes its natural log) is called Lévy's constant.

We want to know how efficient continued fractions are for representing numbers relative to place-value expansions. Suppose we are working in base b. We want to find how many terms in the continued fraction expansion are required to obtain an approximation good to m base-b digits. We will have obtained such an approximation when the error is less than \(b^{-m}\) but greater than \(b^{-(m+1)}\). From above we have \[ \left | x- \frac{P_{n}}{Q_{n}}\right |<\frac{1}{Q_nQ_{n+1}} \] Thus \[ b^{-(m+1)} < \left | x- \frac{P_{n}}{Q_{n}}\right | < \frac{1}{Q_nQ_{n+1}} < \frac{1}{Q_n^2} \leq b^{-m} \] Rearranging, we have \[ b^m \leq Q_n^2 < b^{m+1} \] \[ b^{\frac{m}{2n}} \leq \sqrt[n]{Q_n} < b^{\frac{m+1}{2n}} \] Thus, as the center expression approaches a limit for large n, it follows that \(m/n\) does as well. Namely, by rearranging, we find that for n the number of continued fraction terms needed to express x in base b up to m base-b places, \[ \underset{m,n \to \infty}{\lim}\frac{m}{n}=\frac{\pi^2}{6\ln(2)\ln(b)} \] This is known as Lochs' theorem. In particular, for base 10, this implies that each continued fraction term provides on average 1.03064... digits of precision. In fact, base 10 is the largest integral base for which the continued fraction is more efficient.

Wednesday, May 31, 2017

Iterated Radicals

 

The Case of Square Roots


We wish to examine the behavior of the iterated radical expression \[ R_a(n)=\underbrace{\sqrt{a+\sqrt{a+...\sqrt{a+R_a(0)}}}}_{n\textrm{ radicals}} \] Let \[ A=\lim_{n \rightarrow \infty} R_a(n) \] Then clearly \[ A^2=a+A \] And so \[ A=\tfrac{1+\sqrt{1+4a}}{2} \] In order to determine the nature of the convergence to this limit, let us examine a function defined as follows: \[ f(x/q)=\sqrt{a+f(x)} \] Where q is a value yet to be determined. Clearly \(f(0)=A\), and it is not hard to see that \[ R_a(n)=f\left ( \tfrac{f^{-1}(R_a(0))}{q^n} \right ) \] Thus the behavior of f, as well as the value of q, will determine the convergence of \(R_a(n)\). We rearrange the above relation to get \[ f^2(x)=a+f(qx) \] Let us expand f in a Taylor series. \[ f(x)=A+b_1 x +b_2 x^2 +b_3 x^3+... \] We can substitute this into our functional equation to get \[ A^2+2Ab_1 x+(2A b_2+b_1^2)x^2+(2Ab_3+2b_1b_2)x^3+...=a+A+qb_1x+q^2b_2x^2+q^3b_3x^3+... \] By equating coefficients, we find that \(q=2A\). Note that changing \(b_1\) only affects the scaling of the function. Assuming we want the inverse to be positive as we approach from below, \(b_1\) must be negative, thus we simply set \(b_1=-1\). Now the rest of the coefficients can be found algorithmically in sequence. In general, the coefficient of \(x^k\) will be \[ b_k=\frac{1}{(2A)^k-2A}\sum_{j=1}^{k-1}b_jb_{k-j} \] And thus \[ f\left ( \tfrac{x}{2A} \right )=\sqrt{a+f(x)} \]\[ f^2(x)=a+f(2Ax) \\ R_a(n)=f\left ( \tfrac{f^{-1}(R_a(0))}{(2A)^n} \right ) \] Where f is defined by the polynomial with the given coefficients. It follows that \[ \lim_{n \rightarrow \infty} (2A)^n(A-R_a(n))=\lim_{n \rightarrow \infty} (2A)^n(f(0)-f(f^{-1}(R_a(0))/(2A)^n)) \]\[ \lim_{n \rightarrow \infty} (2A)^n(A-R_a(n))=-f'(0)f^{-1}(R_a(0))=f^{-1}(R_a(0)) \]\[ \lim_{n \rightarrow \infty} (2A)^n \left (A-\underbrace{\sqrt{a+\sqrt{a+...\sqrt{a+z}}}}_{n\textrm{ radicals}} \right )=f^{-1}(z) \] Another way to construct \(f(x)\) is by the following approach, which converges fairly quickly: Let \(f_0(x)=A-x\). We define \[ f_{k+1}(x)= f_k^2\left (\frac{x}{2A} \right )-a \] Then \[ \lim_{k \rightarrow \infty}f_k(x)=f(x) \]
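A numerical sketch (plain Python; double precision limits how far the iteration can be pushed) of the scaled-error limit: the quantity \((2A)^n\,(A - R_a(n))\) settles to a constant, which the analysis above identifies as \(f^{-1}(R_a(0))\).

```python
import math

a, R0 = 3.0, 0.0                      # radicand and innermost value R_a(0)
A = (1 + math.sqrt(1 + 4 * a)) / 2    # limit of the nested radical

R = R0
for n in range(1, 19):
    R = math.sqrt(a + R)              # one more radical
    print(n, (2 * A) ** n * (A - R))  # scaled error: converges to f^{-1}(R_a(0))
```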

A Special Trigonometric Case


For the case of \(a=2\), it is easy to show by induction that \[ b_k=2(-1)^k\frac{1}{(2k)!} \] Which would imply that \[ f(x)=2\cos(\sqrt{x}) \] Therefore \[ \lim_{n \rightarrow \infty} 4^n \left( 2-\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2}}}}_{n \textrm{ radicals}}\right)=\pi^2/4 \]

An Infinite Product


Beginning with \[ f^2(x)=a+f(2Ax) \] Let us differentiate to obtain \[ f(x)f'(x)=Af'(2Ax) \] Thus, if we define \[ g(x)=-xf'(x) \] Then we easily see that \[ g(2Ax)=2g(x)f(x) \] Clearly \(g(0)=0, g'(0)=1\). Then \[ g(x)=2f\left (\tfrac{x}{2A} \right )g\left (\tfrac{x}{2A} \right )=2^2f\left (\tfrac{x}{2A} \right ) f\left (\tfrac{x}{(2A)^2} \right ) g\left (\tfrac{x}{(2A)^2} \right ) \]\[ g(x)=2^N g\left (\tfrac{x}{(2A)^N} \right )\prod_{k=1}^{N} f\left (\tfrac{x}{(2A)^k} \right ) \]\[ g(x)=(2A)^N g\left (\tfrac{x}{(2A)^N} \right )\prod_{k=1}^{N} \tfrac{1}{A}f\left (\tfrac{x}{(2A)^k} \right ) \] Taking the limit \[ g(x)=\underset{N \to \infty}{\lim}(2A)^N g\left (\tfrac{x}{(2A)^N} \right )\prod_{k=1}^{N} \tfrac{1}{A}f\left (\tfrac{x}{(2A)^k} \right )=x\prod_{k=1}^{\infty} \tfrac{1}{A}f\left (\tfrac{x}{(2A)^k} \right ) \] Thus \[ -f'(x)=\prod_{k=1}^{\infty} \tfrac{1}{A}f\left (\tfrac{x}{(2A)^k} \right ) \] Thus we need only examine the zeros of f to find the zeros of f'. In fact, if f has zeros \[\left \{z_1,z_2,z_3,... \right \}\] Then f will have extrema at \[ \bigcup_{k=1}^{\infty}\left \{(2A)^kz_1,(2A)^kz_2,(2A)^kz_3,... \right \} \]

An Associated Infinite Series


Differentiating the log of both sides of the result above, we find the infinite series: \[ \frac{d}{dx}\ln\left (-f'(x) \right )=\frac{d}{dx}\ln\left (\prod_{k=1}^{\infty} \tfrac{1}{A}f\left (\tfrac{x}{(2A)^k} \right ) \right ) \]\[ \frac{f''(x)}{f'(x)}=\sum_{k=1}^{\infty}\frac{1}{(2A)^k}\frac{f'\left (\tfrac{x}{(2A)^k} \right )}{f\left (\tfrac{x}{(2A)^k} \right )} \]

Zeros of \(f(x)\)


Below is a plot of the zeros of \(f\) for different values of \(a\) on the vertical axis, plotted semi-logarithmically.


Below is a plot of the sign of \(f\) (yellow is positive, blue is negative), from which the zero contours can be seen. However, we can also see that some zeros of \(f\) for certain values of \(a\) are multiple roots, as \(f\) goes to zero without changing sign.


Special Cases


Two special cases bear mentioning. In the case \(a=1\), the zeros are given by \[ z_n=2.1973\cdot (1+\sqrt{5})^{2n} \] for \(n \geq 0\). In fact, in this case, after the first zero, f is always between -1 and 0. f is -1 at \[ x_n=2.1973\cdot (1+\sqrt{5})^{2n+1} \] for \(n \geq 0\). For \(a=2\), the zeros are at \[ z_n=\left ( (2n+1)\frac{\pi}{2} \right )^2 \] And, in fact, \(f(x)=2\) at \(x_n=\left ( 2n\pi \right )^2\), and \(f(x)=-2\) at \(x=\left ( (2n+1)\pi \right )^2\), for \(n\geq0\).

Periodic and Possible Fractal Structure


Although f is generally not very interesting close to zero, it exhibits remarkable behavior on larger scales. We find, namely, that if we take \[ h(x)=\left | f(x) \right |^{x^{-\log_{2A}(2)}} \] Then h is exponentially periodic, asymptotically. We define \[J(x)=h((2A)^x)\] This function has period 1, asymptotically. Below we show the behavior of J for some values of \(a\):


Note that the number of zeros remains constant. All seem to be single roots. In fact, the locations of the dominant maxima seem constant as well. However, within the periodicity, J appears to have a fractal structure. Below we show a zoom of \(J(x)\) for \(a=3\).


Complex Behavior


We can take the series and functional definitions of the function and use them to extend the function to the entire complex plane. Below we plot the complex sign of \(f(Cz|z|)\) for different values of a, and a certain value of C (this rescaling done to make the regularities more evident). The complex sign is given by the color:
  • Dark Blue\(\Leftrightarrow\textrm{Re}< 0 ,\textrm{Im} < 0 \)
  • Light Blue\(\Leftrightarrow\textrm{Re} < 0 ,\textrm{Im} > 0 \)
  • Orange\(\Leftrightarrow\textrm{Re} > 0,\textrm{Im} < 0 \)
  • Yellow\(\Leftrightarrow\textrm{Re} > 0,\textrm{Im} > 0 \)
This allows us to find zeros, which correspond to points where all four colors meet.

We note several remarkable features:
  • The function is conjugate-symmetric.
  • The function displays remarkable regularity away from the real line. Note the persistent ripples which reach total regularity at \(a=2\). There is a structure of "fingers" that gradually join, each finger corresponding to one zero. The position of certain features on the real line remains fixed, e.g. the prominent feature at about 0.8.
  • The evolution of the function over a can be broken into three eras.
    1. Pre-Saturating: For \(a< 1 \), there is exactly one real zero.
    2. Saturating: For \(1\leq a < 2 \), zeros join to form pairs of real zeros.
    3. Saturated: For \(2 \leq a\), all zeros are real.
  • The number and larger-scale density of zeros remains roughly constant.
  • The function displays quasi-fractal properties, as it becomes increasingly self-similar on larger scales. In a sense, a cross between periodic and fractal behavior, as seen in the other figures.
  • The process of the fusing of complex zeros into pairs of real zeros can also be seen in the plots of the real zeros above, giving a new view of the branching features.
  • The fingers coalesce along elliptical paths. In fact, these ellipses are of the form \(x^2+2y^2=C'^2\).


The Case of Arbitrary Roots


More generally, suppose we examine \[ R_a(n)=\underbrace{\sqrt[p]{a+\sqrt[p]{a+...\sqrt[p]{a+R_a(0)}}}}_{n\textrm{ radicals}} \] Let \[ A=\lim_{n \rightarrow \infty} R_a(n) \] Then \[ f(x/q)=\sqrt[p]{a+f(x)} \] Clearly \(f(0)=A\), and it is not hard to see that, again \[ R_a(n)=f(f^{-1}(R_a(0))/q^n) \] If we do the same analysis as before we find that \(q=pA^{p-1}=p(1+a/A)\). Let \(f_0(x)=A-x\). We define \[ f_{k+1}(x)= f_k^p\left (\frac{x}{q} \right )-a \] Then \[ \lim_{k \rightarrow \infty}f_k(x)=f(x) \] Then similarly we have \[ \lim_{n \rightarrow \infty} q^n(A-R_a(n))=f^{-1}(R_a(0)) \] MATLAB code for evaluating the function for a given a and given radical can be found here.

Wednesday, January 6, 2016

Some Introductory Quantum Mechanics: Theorems of the Formalism

  Quantum mechanics (QM) has a number of curious and interesting associated phenomena. Some of these were hinted at in the first part of this series. The effects can be inferred from the mathematical formalism discussed in the previous post in this series. Here we will discuss several of these, again without reference to interpretation.

This is part of a multi-part series giving a general introduction to quantum theory. This is part 3.



Heisenberg's Uncertainty Principle


The variance of any observable is defined as \[ \sigma_A^2=\left \langle A^2 \right \rangle-\left \langle A \right \rangle^2=\left \langle \left ( A-\left \langle A \right \rangle \right )^2 \right \rangle \] Where \(\left \langle Q \right \rangle=\left. \langle \psi \right. | Q\left. |\psi \right \rangle\) is the expected value of the operator Q. Roughly speaking, \(\sigma_A\) is the "width" of the distribution of the potential values for A. We then define a new state vector as \[ \left. | a \right \rangle=\left (A-\left \langle A \right \rangle \right ) \left. |\psi \right \rangle \] So that \[ \sigma_A^2=\left \langle a \right. |\left. a \right \rangle \] We similarly define \[ \sigma_B^2=\left \langle B^2 \right \rangle-\left \langle B \right \rangle^2=\left \langle \left ( B-\left \langle B \right \rangle \right )^2 \right \rangle=\left \langle b \right. |\left. b \right \rangle \] Where \[ \left. | b \right \rangle=\left (B-\left \langle B \right \rangle \right ) \left. |\psi \right \rangle \] Then, by the Cauchy-Schwarz inequality: \[ \sigma_A^2\sigma_B^2=\left \langle a \right. |\left. a \right \rangle\left \langle b \right. |\left. b \right \rangle \geq \left | \left \langle a \right. |\left. b \right \rangle \right |^2 \] Let \(z= \left \langle a \right. |\left. b \right \rangle\). Then \[ \left | z \right |^2=[\mathrm{Re}(z)]^2+[\mathrm{Im}(z)]^2\geq[\mathrm{Im}(z)]^2=\left [\frac{z-\bar{z}}{2i} \right ]^2=\left [\frac{\left \langle a \right. |\left. b \right \rangle-\left \langle b \right. |\left. a \right \rangle}{2i} \right ]^2 \] However, \[ \left \langle a \right. |\left. b \right \rangle=\left \langle \left ( A-\left \langle A \right \rangle \right ) \left ( B-\left \langle B \right \rangle \right ) \right \rangle=\left \langle AB \right \rangle-\left \langle A \right \rangle\left \langle B \right \rangle \] \[ \left \langle b \right. |\left. a \right \rangle=\left \langle \left ( B-\left \langle B \right \rangle \right ) \left ( A-\left \langle A \right \rangle \right ) \right \rangle=\left \langle BA \right \rangle-\left \langle B \right \rangle\left \langle A \right \rangle \] So \[ \left | z \right |^2\geq\left [\frac{\left \langle AB \right \rangle-\left \langle BA \right \rangle}{2i} \right ]^2=\left [\frac{\left \langle [A,B] \right \rangle}{2i} \right ]^2 \] Where \([A,B]=AB-BA\) is the commutator of the two operators A and B (In general, two operators need not commute, and so the commutator will not vanish). Thus, we can state the general uncertainty principle for any two operators: \[ \sigma_A \sigma_B \geq\tfrac{1}{2}| \left \langle \left [A,B \right ] \right \rangle | \] For example, let us take the one-dimensional position and momentum operators: \[ A=x,\;B=\frac{\hbar}{i}\frac{\partial }{\partial x} \] \[ [x,p_x]\left. | \psi \right \rangle=xp_x\left. | \psi \right \rangle-p_xx\left. | \psi \right \rangle =\frac{\hbar}{i}\left (x\frac{\partial}{\partial x}\left. | \psi \right \rangle-\frac{\partial }{\partial x}x\left. | \psi \right \rangle \right )=i \hbar \left. | \psi \right \rangle \] Thus \[ \sigma_x \sigma_{p_x} \geq\frac{\hbar}{2} \] This is the famous position-momentum uncertainty relation.
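A small numerical illustration (NumPy) of the position-momentum relation for a discretized Gaussian wavepacket, in units where \(\hbar=1\): for a Gaussian the product \(\sigma_x\sigma_p\) comes out at the minimum value \(1/2\), up to discretization error. The grid sizes and wavepacket parameters are arbitrary choices.

```python
import numpy as np

hbar = 1.0
x = np.linspace(-20, 20, 4096)
dx = x[1] - x[0]
sigma = 1.7
psi = np.exp(-x**2 / (4 * sigma**2)) * np.exp(1j * 0.5 * x)   # Gaussian with a momentum kick
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)                    # normalize

def expval(op_psi):
    """<psi| O |psi> given the array O|psi>."""
    return np.real(np.sum(np.conj(psi) * op_psi) * dx)

x_mean, x2_mean = expval(x * psi), expval(x**2 * psi)
dpsi = np.gradient(psi, dx)                # first derivative of psi
p_psi = -1j * hbar * dpsi                  # momentum operator applied to psi
p2_psi = -hbar**2 * np.gradient(dpsi, dx)  # second derivative for p^2
p_mean, p2_mean = expval(p_psi), expval(p2_psi)

sigma_x = np.sqrt(x2_mean - x_mean**2)
sigma_p = np.sqrt(p2_mean - p_mean**2)
print(sigma_x * sigma_p, hbar / 2)     # >= hbar/2, with equality for a Gaussian
```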



No Cloning and Related Theorems


Suppose we want to find an operator that takes a quantum state and produces a copy of it. That is, we feed in a state and a "blank" state, operate on the two of them, and the result is the original state and a copy of it. Let this operator be called C, and the blank state be called b. That is: \[ C \left. | \psi \right \rangle_A\left. | b \right \rangle_B= \left. | \psi \right \rangle_A\left. | \psi \right \rangle_B \] As C is a transformation/evolution operator, it must be unitary, so it preserves inner products, and \(C^\dagger C=I\). Therefore \[ C \left. | \phi \right \rangle_A\left. | b \right \rangle_B=\left. | \phi \right \rangle_A\left. | \phi \right \rangle_B \] \[ \left \langle b \right.|_B \left \langle \phi \right.|_A C^\dagger=\left \langle \phi \right.|_B \left \langle \phi \right.|_A \] \[ \left \langle b \right.|_B \left \langle \phi \right.|_A \left. | \psi \right \rangle_A\left. | b \right \rangle_B = \left \langle b \right.|_B \left \langle \phi \right.|_A C^\dagger C \left. | \psi \right \rangle_A\left. | b \right \rangle_B =\left \langle \phi \right.|_B \left \langle \phi \right.|_A \left. | \psi \right \rangle_A\left. | \psi \right \rangle_B \] However, \(\left \langle b|b \right \rangle=1\), so \[ \left \langle \phi|\psi \right \rangle=\left \langle \phi|\psi \right \rangle^2 \] Thus \(\left \langle \phi|\psi \right \rangle \in \left \{ 0,1 \right \}\), that is, the two wavefunctions are orthogonal or identical. But the two states can be chosen arbitrarily, and need not be identical or orthogonal (indeed we can always construct a wavefunction as a linear combination of an orthogonal state and an identical state, and so achieve any inner product).

Moreover, as C must be linear, if \(\left. | \chi \right \rangle=\alpha \left. | \phi \right \rangle+\beta \left. | \psi \right \rangle\), then \[ C\left. | \chi \right \rangle_A \left. | b \right \rangle_B= C \left ( \alpha \left. | \phi \right \rangle_A+\beta \left. | \psi \right \rangle_A \right ) \left. | b \right \rangle_B =\alpha C \left. | \phi \right \rangle_A \left. | b \right \rangle_B+\beta C \left. | \psi \right \rangle_A \left. | b \right \rangle_B \] \[ C\left. | \chi \right \rangle_A \left. | b \right \rangle_B = \alpha \left. | \phi \right \rangle_A \left. | \phi \right \rangle_B + \beta \left. | \psi \right \rangle_A \left. | \psi \right \rangle_B \] However, \[ C\left. | \chi \right \rangle_A \left. | b \right \rangle_B=\left. | \chi \right \rangle_A \left. | \chi \right \rangle_B=\alpha^2 \left. | \phi \right \rangle_A \left. | \phi \right \rangle_B+\alpha\beta \left ( \left. | \phi \right \rangle_A \left. | \psi \right \rangle_B + \left. | \psi \right \rangle_A \left. | \phi \right \rangle_B \right )+\beta^2 \left. | \psi \right \rangle_A \left. | \psi \right \rangle_B \] And these two expressions clearly need not be equivalent. We are free to choose \(\alpha, \beta, \phi\), and \(\psi\) arbitrarily, and, in general, the two expressions will be unequal. Thus there cannot be a way to copy arbitrary quantum states.

Since there is no way to clone a quantum state, there is thus no way to go in the opposite direction, namely start with two identical states and transform that into a "blank" state and an original. The argument runs in much the same way, and can be seen as a dual of the no cloning theorem, called the no-deleting theorem.

Suppose it were possible to measure an arbitrary quantum state and communicate the result as a sequence of classical bits from which the state could be reconstructed. Since classical bits can easily be copied, it would then be possible to copy quantum states, in violation of the no cloning theorem. Thus it is not possible to convert an arbitrary quantum state into a sequence of classical bits and back, and this is called the no teleportation theorem.

An extension of the no cloning theorem to mixed states is the no broadcast theorem, which states that an arbitrary (generally mixed) quantum state cannot be broadcast to two or more recipients.



Correspondence Principle and the Ehrenfest Theorem


A rather clear demand on quantum mechanics is that its predictions tend to those of standard classical mechanics in the appropriate limits. Given that we do not observe macroscopic objects to display unusual, characteristically quantum phenomena, quantum mechanics must make the same predictions as classical mechanics, asymptotically. The probabilities for macroscopic objects to display such phenomena must be vanishingly small. In general, the observed classical parameters will correspond to the expected values of the quantum analogues.

As an example, let us find the rate of change of the expected value of a generic observable \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle A \right \rangle=\frac{\mathrm{d} }{\mathrm{d} t}\left \langle \psi|A|\psi \right \rangle =\left \langle \frac{\partial }{\partial t}\psi|A|\psi \right \rangle + \left \langle \psi|\frac{\partial }{\partial t}A|\psi \right \rangle + \left \langle \psi|A|\frac{\partial }{\partial t}\psi \right \rangle \] However, since the wavefunction satisfies the Schrodinger equation, we have \[ i \hbar \frac{\partial }{\partial t}\left. | \psi \right \rangle= H\left. | \psi \right \rangle \] And, moreover \[ -i \hbar \frac{\partial }{\partial t}\left \langle \psi | \right.= \left \langle \psi | \right. H \] Thus \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle A \right \rangle =-\frac{1}{i\hbar}\left \langle\psi|HA|\psi \right \rangle + \left \langle \psi|\frac{\partial }{\partial t}A|\psi \right \rangle + \frac{1}{i\hbar}\left \langle \psi|AH|\psi \right \rangle \] \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle A \right \rangle = \frac{1}{i\hbar}\left \langle[A,H] \right \rangle+\left \langle \frac{\partial }{\partial t}A \right \rangle \] This is the Ehrenfest theorem. Let \(A=x\), which has no explicit time dependence, so the last term vanishes: \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle x \right \rangle = \frac{1}{i\hbar}\left \langle[x,H] \right \rangle+\left \langle \frac{\partial }{\partial t}x \right \rangle =\frac{1}{i\hbar}\left \langle[x,H] \right \rangle =\frac{1}{i\hbar}\left \langle[x,\frac{-\hbar^2}{2m}\frac{\partial^2 }{\partial x^2}+V] \right \rangle \] \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle x \right \rangle =\frac{1}{i\hbar}\left \langle[x,\frac{-\hbar^2}{2m}\frac{\partial^2 }{\partial x^2}+V] \right \rangle =\frac{\hbar}{im}\left \langle \frac{\partial }{\partial x} \right \rangle=\frac{\left \langle p \right \rangle}{m} \] Similarly, let \(A=p\): \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle p \right \rangle = \frac{1}{i\hbar}\left \langle[p,H] \right \rangle+\left \langle \frac{\partial }{\partial t}p \right \rangle \] \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle p \right \rangle = \frac{1}{i\hbar}\left \langle[p,H] \right \rangle+\left \langle \frac{\partial }{\partial t}p \right \rangle =\frac{1}{i\hbar}\left \langle[p,\frac{-\hbar^2}{2m}\frac{\partial^2 }{\partial x^2}+V] \right \rangle = -\left \langle \frac{\partial V}{\partial x} \right \rangle \] These are the same as the classical dynamical equations for position and momentum. Thus, as it is often the case that wavefunctions are highly localized, at least compared to macroscopic scales, quantum mechanics predicts the same macroscopic behavior as classical mechanics.
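As a numerical illustration of the first relation (a minimal sketch assuming numpy, with \(\hbar=m=1\); the free Gaussian packet and grid parameters are arbitrary choices), one can evolve a free-particle wave packet exactly in momentum space and check that the numerical derivative of \(\left \langle x \right \rangle\) equals \(\left \langle p \right \rangle/m\).

import numpy as np

hbar = m = 1.0
N, L = 4096, 200.0
x = np.linspace(-L/2, L/2, N, endpoint=False)
dx = x[1] - x[0]
k = 2*np.pi*np.fft.fftfreq(N, d=dx)

# Free Gaussian packet with mean momentum hbar*k0
sigma, k0 = 2.0, 1.5
psi0 = np.exp(-x**2/(4*sigma**2) + 1j*k0*x)
psi0 /= np.sqrt(np.sum(np.abs(psi0)**2)*dx)
psi0_k = np.fft.fft(psi0)

def mean_x(t):
    # Exact free evolution: each momentum component picks up the phase e^{-i hbar k^2 t/2m}
    psi = np.fft.ifft(psi0_k*np.exp(-1j*hbar*k**2*t/(2*m)))
    return np.real(np.sum(np.conj(psi)*x*psi)*dx)

# <p> is conserved for a free particle, so compute it once from the momentum distribution
mean_p = np.real(np.sum(np.conj(psi0_k)*hbar*k*psi0_k))/np.sum(np.abs(psi0_k)**2)

t, dt = 5.0, 1e-3
dx_dt = (mean_x(t + dt) - mean_x(t - dt))/(2*dt)   # numerical d<x>/dt
print(dx_dt, mean_p/m)                              # the two agree (Ehrenfest)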

Another fact derivable from the Ehrenfest theorem is the following. Suppose Q is an operator that does not depend explicitly on time. Then we have \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle Q \right \rangle = \frac{1}{i\hbar}\left \langle[Q,H] \right \rangle \] From our discussion in the section on the Heisenberg uncertainty principle, \[ \sigma_H\sigma_Q\geq\tfrac{1}{2}|\left \langle \left [ H,Q \right ] \right \rangle|=\frac{\hbar}{2}\left | \frac{\mathrm{d} }{\mathrm{d} t} \left \langle Q \right \rangle \right | \] Though time is not an observable, let us nevertheless define \[ \sigma_t=\frac{\sigma_Q}{|\mathrm{d}\left \langle Q \right \rangle/\mathrm{d}t|} \] which is roughly the time it takes \(\left \langle Q \right \rangle\) to change by one standard deviation \(\sigma_Q\). We then have \[ \sigma_H\sigma_t\geq\frac{\hbar}{2} \] A result analogous to that for position and momentum, often called the energy-time uncertainty relation.



Bell's Theorem and the Kochen-Specker Theorem


Certain interpretations of quantum mechanics hold that the measurements and observations of the quantum systems are deterministic, and the only reason they seem indeterministic is because we lack full knowledge of the system. They hold that there are hidden variables in the system that we have not or maybe even can not uncover that govern the system, and it is only our ignorance of these that makes us unable to predict with certainty what we will observe. Models like these are called realistic, in the sense that, prior to the measurement, there is a definite, singular reality of what we will observe (or in some cases, counterfactually would observe).

Another principle typically regarded as fundamental is that the system is local, that is, that causal effects cannot propagate faster than the speed of light. If locality failed, then, in principle, a suitably manipulated non-local system could be used to send messages into the past.

Often, these interpretations are hard or impossible to test. However, certain versions can be tested, as they make predictions that would be inconsistent with those of standard quantum mechanics. Bell's inequality is one way to rule out certain types of local realistic models.

Let us take a source that produces a sequence of identical electron pairs in the entangled singlet state \(\tfrac{1}{\sqrt{2}}\left (\left. | \uparrow\downarrow \right \rangle-\left. | \downarrow\uparrow \right \rangle \right )\). In this state the two particles are perfectly anti-correlated: whenever their spins are measured along a common axis, one is found up and the other down.

We then send the particles in opposite directions to two detectors, A and B. These detectors measure the spin along axes at angles \(\alpha\) and \(\beta\) with respect to the z-axis, respectively. Let us define \(p(\alpha,\beta)\) to be +1 if the measured spins are the same (both up or both down) and -1 if they are different (one up, one down), and let \(P(\alpha,\beta)\) be the average of p over many trials. Standard quantum theory predicts that \(P(\alpha,\beta)=-\cos(\alpha-\beta)\).

Suppose there are hidden variables that determine what will be measured. For simplicity, we consolidate them all, for the whole system, in the single variable \(\textbf{v}\). Let \(A(\alpha,\textbf{v})=1\) if the particle sent to A, which is set at angle \(\alpha\), with variables \(\textbf{v}\), will be found to have spin up, and similarly with \(A(\alpha,\textbf{v})=-1\) for spin down. Likewise with \(B(\beta,\textbf{v})=1\) and \(B(\beta,\textbf{v})=-1\) for detector B. Clearly \(p(\alpha,\beta,\textbf{v})=A(\alpha,\textbf{v})B(\beta,\textbf{v})\). Since the particles are perfectly anti-correlated when the detectors are aligned, we have \(A(\alpha,\textbf{v})=-B(\alpha,\textbf{v})\).

To average over many trials, we merely average over the different hidden variables, which are assumed to follow some sort of distribution, \(\rho(\textbf{v})\). Thus, we then have \[ P(\alpha,\beta)=\int_{\mathrm{all}\; \textbf{v}} \rho(\textbf{v})A(\alpha,\textbf{v})B(\beta,\textbf{v})d\textbf{v} =-\int_{\mathrm{all}\; \textbf{v}} \rho(\textbf{v})A(\alpha,\textbf{v})A(\beta,\textbf{v})d\textbf{v} \] \[ P(\alpha,\beta)-P(\alpha,\gamma)= -\int_{\mathrm{all}\; \textbf{v}} \rho(\textbf{v}) \left [A(\alpha,\textbf{v})A(\beta,\textbf{v})-A(\alpha,\textbf{v})A(\gamma,\textbf{v}) \right] d\textbf{v} \] As \(A^2(\alpha,\textbf{v})=1\) for any input variables, we can write: \[ P(\alpha,\beta)-P(\alpha,\gamma)= -\int_{\mathrm{all}\; \textbf{v}} \rho(\textbf{v}) \left [1-A(\beta,\textbf{v})A(\gamma,\textbf{v}) \right]A(\alpha,\textbf{v})A(\beta,\textbf{v}) d\textbf{v} \] Given that \[ \left |\int_{R} f(\textbf{x})d\textbf{x} \right |\leq\int_{R} |f(\textbf{x})|d\textbf{x}, \;\;\; | A(\alpha,\textbf{v})|=1, \;\;\; \rho(\textbf{v}) \left [1-A(\beta,\textbf{v})A(\gamma,\textbf{v}) \right]\geq 0 \] We then have \[ |P(\alpha,\beta)-P(\alpha,\gamma)|\leq \int_{\mathrm{all}\; \textbf{v}} \rho(\textbf{v}) \left [1-A(\beta,\textbf{v})A(\gamma,\textbf{v}) \right] d\textbf{v} \] \[ |P(\alpha,\beta)-P(\alpha,\gamma)|\leq1+P(\beta,\gamma) \] Which is Bell's inequality. This inequality must be satisfied by all local hidden variable interpretations (the reason locality is required is to preclude instantaneous or faster-than-light signals being sent from one detector to the other). However, it is incompatible with standard quantum mechanics. For instance, let \(\alpha=0\), \(\beta=\pi/2\) and \(\gamma=\pi/4\). Then \[ |P(\alpha,\beta)-P(\alpha,\gamma) |=\tfrac{1}{\sqrt{2}}\nleq 1+P(\beta,\gamma) =1-\tfrac{1}{\sqrt{2}} \] As there have been experiments performed that violate Bell's inequality, this provides strong evidence against local hidden variable interpretations. However, there are some loopholes: for instance, superdeterminism, a sort of conspiracy theory according to which not only the systems we study but also our experiments, and we ourselves, are deterministically governed, and in just such a way as to make us observe statistical violations of Bell's inequality regardless.
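To make the conflict concrete, the sketch below (assuming numpy; the particular hidden-variable model is just one simple illustration, not the only possibility) compares the quantum prediction \(P(\alpha,\beta)=-\cos(\alpha-\beta)\) with a toy local model in which each pair carries a uniformly random hidden axis and each detector reports the sign of the spin projection onto its own axis. The local model obeys Bell's inequality (at these angles it exactly saturates it), while the quantum correlation violates it.

import numpy as np

alpha, beta, gamma = 0.0, np.pi/2, np.pi/4

def P_quantum(a, b):
    return -np.cos(a - b)

def P_hidden(a, b):
    # Toy local model: the hidden variable is a random angle v; detector A outputs
    # sign(cos(a - v)) and detector B outputs -sign(cos(b - v)), so A = -B when a = b.
    # Averaging over v gives the exact correlation 2|a - b|/pi - 1 (with |a - b| reduced to [0, pi]).
    d = abs(a - b) % (2*np.pi)
    d = min(d, 2*np.pi - d)
    return 2*d/np.pi - 1

for P in (P_quantum, P_hidden):
    lhs = abs(P(alpha, beta) - P(alpha, gamma))
    rhs = 1 + P(beta, gamma)
    print(P.__name__, lhs, "<=", rhs, ":", lhs <= rhs)
# P_quantum: 0.707 <= 0.293 is False (Bell's inequality is violated)
# P_hidden:  0.5   <= 0.5   is True  (the bound is respected)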

An associated result called the Kochen-Specker (KS) Theorem shows that non-contextual hidden variable interpretations are incompatible with quantum mechanics. That is, interpretations in which the observables measured have a single definite value independent of how they are measured are incompatible with quantum mechanics. However, it leaves open the possibility for contextual hidden-variables interpretations, in which the manner of measurement is relevant to the obtained result.

One might think that one could use entanglement to send messages faster than light, given that the effects are instantaneous (the moment Alice's electron is observed to have spin up along the z-axis, Bob's electron will have spin down along the z-axis). However, a theorem called the no communication theorem shows that it is not possible for one observer, by measuring some subset of a system, to communicate information to another observer. While the effects may be instantaneous, they do not carry information, and it is only after the two observers meet up and compare results that they notice correlations that defy local realism.



The Quantum Zeno Effect


Suppose we have a particle that can be in one of two states (spinning up or down, decayed or not decayed). We can represent it as a 2 by 1 matrix. Suppose it begins in the state \[ \left. |\psi(0) \right \rangle=\begin{bmatrix} 1\\ 0 \end{bmatrix} \] If it is allowed to evolve by itself, its time dependent state is given by \[ \left. |\psi(t) \right \rangle=\begin{bmatrix} \alpha(t)\\ \beta(t) \end{bmatrix} \] Where the functions satisfy the condition stated above at t=0, and the state is properly normalized. Suppose that the other state is stable, i.e., that once it "flips" it stays "flipped".

Suppose \(|\beta(t)|^2\approx (t/\tau)^n\) for t close to 0, where \(\tau\) is some characteristic time of the system. Suppose we allow the state to evolve unperturbed for a length of time T (small relative to \(\tau\)), and then measure it. The probability that it will be found in the original state is simply \[ P_1=|\alpha(T)|^2\approx 1- (T/\tau)^n \] However, suppose, instead, that we measure it N times, once after each interval of length T/N. Then the chance that it will be found in the original state is the chance that it hadn't been found to have changed after any interval. That can be found by the usual methods of probability theory: \[ P_N=(|\alpha(T/N)|^2)^N\approx (1- \left (\tfrac{T}{N\tau} \right)^n)^N\approx e^{- \left(\tfrac{T}{\tau}\right)^n N^{1-n}} \] Thus, if \(n>1\), the probability tends to 1 as N increases, and if \(n<1\) the probability tends to 0 as N increases (if \(n=1\), the probability is approximately \(e^{-T/\tau}\), independent of N). Thus, if the probability changes in the appropriate way, watching a system repeatedly tends to keep it in the same state. Moreover, if the system were measured continuously, it would never change at all. This has led some to remark that a watched quantum pot never boils. This phenomenon is called the quantum Zeno effect, after Zeno's philosophical paradoxes, which have a similar flavor.
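A quick numerical look at this behavior (a sketch assuming numpy; the survival exponent n and the ratio \(T/\tau\) are arbitrary choices):

import numpy as np

def survival(n, N, T_over_tau=0.1):
    # Probability of never being seen to flip after N equally spaced measurements,
    # assuming |beta(t)|^2 ~ (t/tau)^n for small t
    return (1 - (T_over_tau/N)**n)**N

for n in (2, 1, 0.5):
    print(f"n={n}:", [round(survival(n, N), 4) for N in (1, 10, 100, 1000)])
# n=2  : survival probability tends to 1 as N grows (the Zeno effect)
# n=1  : survival probability stays near exp(-T/tau), independent of N
# n=0.5: survival probability tends to 0 as N grows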



Quantum Teleportation and Indirect Entanglement


Suppose Alice and Bob are in separate locations, but connected by classical communication channels. They also each have one of a pair of entangled particles in the state \[ \tfrac{1}{\sqrt{2}}\left. |\uparrow \right \rangle_A\left. |\uparrow \right \rangle_B+\tfrac{1}{\sqrt{2}}\left. |\downarrow \right \rangle_A\left. |\downarrow \right \rangle_B \] Where the subscripts denote whose particle it is. Alice also has another particle in the arbitrary state \[ \alpha\left. |\uparrow \right \rangle_C+\beta\left. |\downarrow \right \rangle_C \] The state of the entire system can be written as \[ \tfrac{\alpha}{\sqrt{2}}\left. |\uparrow \uparrow \uparrow \right \rangle_{ABC}+ \tfrac{\alpha}{\sqrt{2}}\left. |\downarrow \downarrow \uparrow \right \rangle_{ABC}+ \tfrac{\beta}{\sqrt{2}}\left. |\uparrow \uparrow \downarrow \right \rangle_{ABC}+ \tfrac{\beta}{\sqrt{2}}\left. |\downarrow \downarrow \downarrow \right \rangle_{ABC} \] This can also be written in the form \[ \frac{1}{2} \begin{pmatrix} \tfrac{\left. |\uparrow \uparrow \right \rangle_{AC}+\left. |\downarrow \downarrow \right \rangle_{AC}}{\sqrt{2}}\left (\alpha\left. |\uparrow \right \rangle_{B}+\beta\left. |\downarrow \right \rangle_{B} \right ) + \tfrac{\left. |\uparrow \uparrow \right \rangle_{AC}-\left. |\downarrow \downarrow \right \rangle_{AC}}{\sqrt{2}}\left (\alpha\left. |\uparrow \right \rangle_{B}-\beta\left. |\downarrow \right \rangle_{B} \right ) \\ + \tfrac{\left. |\uparrow \downarrow \right \rangle_{AC}+\left. |\downarrow \uparrow \right \rangle_{AC}}{\sqrt{2}} \left (\beta\left. |\uparrow \right \rangle_{B}+\alpha\left. |\downarrow \right \rangle_{B} \right ) +\tfrac{\left. |\uparrow \downarrow \right \rangle_{AC}-\left. |\downarrow \uparrow \right \rangle_{AC}}{\sqrt{2}}\left (\beta\left. |\uparrow \right \rangle_{B}-\alpha\left. |\downarrow \right \rangle_{B} \right ) \end{pmatrix} \] Thus, if Alice measures her pair of particles to be in any of the four entangled states (all of which are mutually orthogonal, and so are completely distinguishable) \[ \tfrac{\left. |\uparrow \uparrow \right \rangle_{AC}+\left. |\downarrow \downarrow \right \rangle_{AC}}{\sqrt{2}},\; \tfrac{\left. |\uparrow \uparrow \right \rangle_{AC}-\left. |\downarrow \downarrow \right \rangle_{AC}}{\sqrt{2}},\; \tfrac{\left. |\uparrow \downarrow \right \rangle_{AC}+\left. |\downarrow \uparrow \right \rangle_{AC}}{\sqrt{2}},\; \tfrac{\left. |\uparrow \downarrow \right \rangle_{AC}-\left. |\downarrow \uparrow \right \rangle_{AC}} {\sqrt{2}} \] Bob's state will become, respectively \[ \left (\alpha\left. |\uparrow \right \rangle_{B}+\beta\left. |\downarrow \right \rangle_{B} \right ),\; \left (\alpha\left. |\uparrow \right \rangle_{B}-\beta\left. |\downarrow \right \rangle_{B} \right ),\; \left (\beta\left. |\uparrow \right \rangle_{B}+\alpha\left. |\downarrow \right \rangle_{B} \right ),\; \left (\beta\left. |\uparrow \right \rangle_{B}-\alpha\left. |\downarrow \right \rangle_{B} \right ) \] It then suffices for Alice to communicate to Bob which of the four entangled states she measured (two classical bits), and then Bob can apply an appropriate operator to put his particle in the state in which particle C was originally. Thus, the state has been teleported from Alice to Bob. Indeed, neither Alice nor Bob need know what particle C's original state was, though they can know that it was perfectly teleported. Note that the entanglement between Alice's and Bob's particles is, in the end, broken, and Alice's two particles are left entangled.
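The Bell-basis decomposition above can be checked directly. The sketch below (assuming numpy; the qubit ordering A, B, C and the randomly chosen \(\alpha,\beta\) are assumptions of the illustration) projects Alice's particles A and C onto each Bell state and confirms that Bob's particle is left in the four states listed, each outcome occurring with probability 1/4.

import numpy as np

rng = np.random.default_rng(1)
alpha, beta = rng.normal(size=2) + 1j*rng.normal(size=2)
norm = np.sqrt(abs(alpha)**2 + abs(beta)**2)
alpha, beta = alpha/norm, beta/norm                   # arbitrary normalized state of particle C

up, down = np.array([1.0, 0.0]), np.array([0.0, 1.0])

pair_AB = (np.kron(up, up) + np.kron(down, down))/np.sqrt(2)    # shared entangled pair
psi_C = alpha*up + beta*down
total = np.kron(pair_AB, psi_C).reshape(2, 2, 2)                # tensor indices (A, B, C)

# The four Bell states Alice can find particles A and C in, and what Bob should then hold
bells = [(np.kron(up, up) + np.kron(down, down))/np.sqrt(2),
         (np.kron(up, up) - np.kron(down, down))/np.sqrt(2),
         (np.kron(up, down) + np.kron(down, up))/np.sqrt(2),
         (np.kron(up, down) - np.kron(down, up))/np.sqrt(2)]
expected = [alpha*up + beta*down, alpha*up - beta*down,
            beta*up + alpha*down, beta*up - alpha*down]

for bell, target in zip(bells, expected):
    # Project A and C onto the Bell state; what remains is Bob's (unnormalized) state
    bob = np.einsum('ac,abc->b', np.conj(bell.reshape(2, 2)), total)
    prob = np.linalg.norm(bob)**2
    print(np.allclose(bob/np.linalg.norm(bob), target), round(float(prob), 3))
# Prints "True 0.25" four times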

**********

Another example of the odd nature of entanglement can be demonstrated with the following. Suppose we have two independent sources that produce the entangled particle pairs \[ \tfrac{\left. |\uparrow \uparrow \right \rangle_{AB}+\left. |\downarrow \downarrow \right \rangle_{AB}}{\sqrt{2}},\;\; \tfrac{\left. |\uparrow \uparrow \right \rangle_{CD}+\left. |\downarrow \downarrow \right \rangle_{CD}}{\sqrt{2}} \] Particle A is sent to Alice, D to Dave, and B and C to Becca. We can write the total state of the system as follows \[ \tfrac{1}{2}\left ( \left. |\uparrow \uparrow\uparrow \uparrow \right \rangle_{ABCD}+ \left. |\uparrow \uparrow\downarrow \downarrow \right \rangle_{ABCD}+ \left. |\downarrow \downarrow\uparrow \uparrow \right \rangle_{ABCD}+ \left. |\downarrow \downarrow\downarrow \downarrow \right \rangle_{ABCD} \right ) \] Alternatively, we could write it in the following, equivalent way \[ \frac{1}{2} \begin{pmatrix} \tfrac{\left. |\uparrow \uparrow \right \rangle_{BC}+\left. |\downarrow \downarrow \right \rangle_{BC}}{\sqrt{2}} \tfrac{\left. |\uparrow \uparrow \right \rangle_{AD}+\left. |\downarrow \downarrow \right \rangle_{AD}}{\sqrt{2}}+ \tfrac{\left. |\uparrow \uparrow \right \rangle_{BC}-\left. |\downarrow \downarrow \right \rangle_{BC}}{\sqrt{2}} \tfrac{\left. |\uparrow \uparrow \right \rangle_{AD}-\left. |\downarrow \downarrow \right \rangle_{AD}}{\sqrt{2}} \\ + \tfrac{\left. |\uparrow \downarrow \right \rangle_{BC}+\left. |\downarrow \uparrow \right \rangle_{BC}}{\sqrt{2}} \tfrac{\left. |\uparrow \downarrow \right \rangle_{AD}+\left. |\downarrow \uparrow \right \rangle_{AD}}{\sqrt{2}} + \tfrac{\left. |\uparrow \downarrow \right \rangle_{BC}-\left. |\downarrow \uparrow \right \rangle_{BC}}{\sqrt{2}} \tfrac{\left. |\uparrow \downarrow \right \rangle_{AD}-\left. |\downarrow \uparrow \right \rangle_{AD}}{\sqrt{2}} \end{pmatrix} \] Thus, if Becca measures which of the standard entangled states her two particles are in, as in the quantum teleportation setup, Alice's and Dave's particles will become entangled, and in the same entangled state as Becca's particles, no less. Becca can disentangle her particles from Alice's and Dave's, while entangling Alice's and Dave's particles, which initially bore no relation to one another. In this way, two particles can become entangled without ever having interacted, so entanglement need not require interaction.



Spatial Phenomena


Let us look at the case of an electron in an infinite quantum well. That is, an electron in the potential that has the form \[ V(x)=\left\{\begin{matrix} 0\;\;\;0\leq x \leq L \\ \infty\;\;\mathrm{o.w.} \end{matrix}\right. \] As the wavefunction must be continuous, and clearly the wavefunction is zero outside the well, we have \(\psi(0)=0\) and \(\psi(L)=0\). Let us suppose the wavefunction is in an energy eigenstate. In that case, we solve the time-independent Schrodinger equation inside the well: \[ E\psi(x)=\frac{-\hbar^2}{2m}\frac{\partial^2 }{\partial x^2}\psi(x) \] This has the solutions \[ \psi(x)=A\cos(\lambda x)+B\sin(\lambda x) \] Where \(\lambda=\sqrt{2mE}/\hbar\). From the condition that \(\psi(0)=0\), \(A=0\). From the condition that \(\psi(L)=0\), \(\lambda=n\pi/L\), where n is a positive integer. From the normalization condition, we have \(B=\sqrt{2/L}\). Thus \[ \psi_n(x)=\sqrt{\frac{2}{L}}\sin\left ( \frac{n\pi x}{L} \right ) \] And the corresponding energy is \[ E_n=n^2\frac{\pi^2 \hbar^2}{2m L^2} \] Note that the energy is quantized, that is, it is only ever found to have a value in this discrete set of values. This feature is common in quantum mechanics: the boundary conditions, or conditions for convergence, will restrict certain observables to a discrete set of values. A related phenomenon is when the set of possible values for an observable falls into a fragmented set, of the form \([a_1,a_2]\cup[a_3,a_4]\cup...\) where the a's are strictly increasing. In such a case, the system will have allowed energy bands separated by gaps, and will need sizable "kicks" to get across the gaps. This band structure is the basis for how semiconductors, and hence transistors, work.
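As a numerical check (a sketch assuming numpy, with \(\hbar=m=L=1\); the grid resolution is an arbitrary choice), one can diagonalize a finite-difference Hamiltonian with the wavefunction pinned to zero at the walls and compare the lowest eigenvalues with \(E_n=n^2\pi^2\hbar^2/2mL^2\).

import numpy as np

hbar = m = L = 1.0
N = 1000                                   # number of interior grid points
x = np.linspace(0, L, N + 2)[1:-1]
dx = x[1] - x[0]

# Finite-difference kinetic energy with psi = 0 at both walls (Dirichlet boundary conditions)
main = np.full(N, -2.0)
off = np.full(N - 1, 1.0)
H = -(hbar**2/(2*m*dx**2))*(np.diag(main) + np.diag(off, 1) + np.diag(off, -1))

E = np.linalg.eigvalsh(H)[:5]
exact = np.array([n**2*np.pi**2*hbar**2/(2*m*L**2) for n in range(1, 6)])
print(np.round(E, 4))
print(np.round(exact, 4))    # agreement to a few parts in 10^5, improving as the grid is refined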

Note also that in the case of the quantum well, all the eigenfunctions are orthogonal and form a complete set. An arbitrary initial wavefunction \(\psi(x,0)\) will, at time t, be equal to \[ \psi(x,t)=\frac{2}{L}\sum_{n=1}^{\infty} c_n \sin\left ( \frac{n\pi x}{L} \right ) e^{-itE_n/\hbar} , \;\; c_n=\int_{0}^{L}\psi(x,0)\sin\left ( \frac{n\pi x}{L}\right )dx \] We can also see something of the correspondence principle, namely, that for high energies, the probability distribution is nearly uniform in the well (it oscillates, as it goes above and below its average value, but the scale of the oscillations, for high enough energy, is imperceptible at macroscopic scales). Classically, for a particle bouncing back and forth in such a well, we would expect a uniform distribution (supposing we didn't know where the particle began).

Another result from this, which is true of quantum systems in general, is that even in the lowest energy state, when the most energy possible has been removed (the system is as "cold" as possible), the energy is non-zero. This is called the zero-point energy. Thus, even at "absolute" zero, the electron would still not be motionless, since, in this case, \(0< E_1 =\left \langle p^2 \right \rangle/2m\), and so the root-mean-square momentum is non-zero.

An important case of a sort of quantum well is the atom, in which the nucleus attracts the electrons and so confines them. In the case of the atom, there are likewise quantized energy states. Since these are stationary states, the probability density, and hence the effective charge density, does not vary with time (the wavefunction itself only changes by an overall phase). This explains why the atom does not radiate energy, as it would in the classical case. However, in the case of the atom, which is necessarily a three-dimensional system, the states are also quantized with respect to angular momentum.

**********

Another interesting phenomenon is that of quantum tunneling. Suppose we have a particle moving in the +x direction impinging on a finite barrier of the form \[ V(x)=\left\{\begin{matrix} \tfrac{\hbar^2 q}{2m}\;\;\;0\leq x \leq L \\ 0\;\;\;\;\; \mathrm{o.w.} \end{matrix}\right. \] Let us call the regions before, in, and beyond the barrier regions I, II, and III, respectively. Suppose the particle initially has momentum \(\hbar k\). Its energy will be given by \(\hbar^2 k^2/2m\), and further suppose that this energy is less than the barrier height, that is, \(k^2 < q\). Solving the Schrodinger equation inside the barrier, we easily find that the wavefunction there will be of the form \(A e^{\lambda x}+Be^{-\lambda x}\), where \(\lambda=\sqrt{q-k^2}\).

Then we can write the wavefunction (ignoring normalization) in the three regions as \[ \psi(x)=\left\{\begin{matrix} A_1 e^{ikx}+B_1 e^{-ikx} \;\;\;\, \mathrm{ I} \\ A_2 e^{\lambda x}+B_2 e^{-\lambda x} \;\;\;\;\; \mathrm{ II} \\ A_3 e^{ikx}+B_3 e^{-ikx} \;\;\;\;\; \mathrm{III} \end{matrix}\right. \] However, \(B_3=0\), since that term corresponds to a wave moving to the left, which would not happen in the case of an incident wave going in the +x direction. The other coefficients can be found by requiring that the wavefunction and its derivative be continuous at \(x=0\) and \(x=L\). In particular, we find that \[ T=|A_3/A_1|^2=\frac{1}{1+\tfrac{q^2}{4k^2(q-k^2)}\sinh^2(\lambda L)} \] This transmission coefficient represents the probability that the particle will be found on the opposite side of the barrier. Note that, contrary to classical mechanics, there is a definite, non-zero probability of finding the particle on the opposite side of the barrier. This feature of particles doing classically impossible things is a frequent characteristic of quantum mechanics. This phenomenon helps explain why the sun continues to fuse hydrogen even though it is not hot enough for the nuclei to overcome their electrostatic repulsion, as the particles have a probability of tunneling through the classically forbidden region.
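The closed-form transmission coefficient can be checked by imposing the matching conditions directly. The sketch below (assuming numpy; the values of q, k, and L are arbitrary, chosen so that \(k^2<q\)) requires \(\psi\) and \(\psi'\) to be continuous at \(x=0\) and \(x=L\), solves the resulting linear system with the incident amplitude set to 1, and compares \(|A_3/A_1|^2\) with the formula above.

import numpy as np

q, k, L = 4.0, 1.2, 1.5                    # barrier strength, incident wavenumber, barrier width
lam = np.sqrt(q - k**2)

# Unknowns: [B1, A2, B2, A3], with the incident amplitude A1 = 1
M = np.array([
    [1,     -1,                  -1,                    0],
    [-1j*k, -lam,                 lam,                  0],
    [0,      np.exp(lam*L),       np.exp(-lam*L),      -np.exp(1j*k*L)],
    [0,      lam*np.exp(lam*L),  -lam*np.exp(-lam*L),  -1j*k*np.exp(1j*k*L)],
], dtype=complex)
rhs = np.array([-1, -1j*k, 0, 0], dtype=complex)

B1, A2, B2, A3 = np.linalg.solve(M, rhs)
T_numeric = abs(A3)**2
T_formula = 1/(1 + q**2*np.sinh(lam*L)**2/(4*k**2*(q - k**2)))
print(T_numeric, T_formula)                # the two agree (about 0.03 for these parameters)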

Similarly, we can see that, even if the particle did have enough energy to cross the barrier, there is not a 100% chance of finding it on the other side of the barrier. Just as a particle may sometimes cross a classically forbidden barrier, sometimes it fails to cross a classically allowed barrier.

Friday, December 25, 2015

The Double Angle Formula

 

Deriving the formula: \(\sin(2x)=2\sin(x)\cos(x)\)


Way 1: From Geometry

Consider the usual construction: O is the origin, OQ makes angle \(\alpha\) with the horizontal, OP makes a further angle \(\beta\) with OQ, and PQ is perpendicular to OQ. A and B are the feet of the perpendiculars dropped from Q and P to the horizontal, and R is the point where the horizontal line through Q meets PB (so that the angle \(\angle RPQ\) also equals \(\alpha\)). Then: \[ RB=QA \;\;\;\;\;\;\;\;\;\; RQ=BA \] \[ \frac{RQ}{PQ}=\frac{QA}{OQ}=\sin(\alpha) \;\;\;\;\;\;\;\; \frac{PR}{PQ}=\frac{OA}{OQ}=\cos(\alpha) \] \[ \frac{PQ}{OP}=\sin(\beta) \;\;\;\;\;\;\;\; \frac{OQ}{OP}=\cos(\beta) \] \[ \frac{PB}{OP}=\sin(\alpha+\beta) \;\;\;\;\;\;\;\; \frac{OB}{OP}=\cos(\alpha+\beta) \] \[ PB=PR+RB=\frac{OA}{OQ}PQ+QA \] \[ \frac{PB}{OP}=\frac{OA}{OQ}\frac{PQ}{OP}+\frac{QA}{OP}=\frac{OA}{OQ}\frac{PQ}{OP}+\frac{QA}{OQ}\frac{OQ}{OP} \] \[ \sin(\alpha+\beta)=\cos(\alpha)\sin(\beta)+\sin(\alpha)\cos(\beta) \] In particular, if \(\alpha=\beta=x\), then \(\sin(2x)=2\sin(x)\cos(x)\).



Way 2: From the Product Formula


Recall from this post that the product formulas for sine and cosine are, respectively: \[ \sin(x)=x\prod_{n=1}^{\infty}\left ( 1-\frac{x^2}{\pi^2 n^2} \right ) \] And \[ \cos(x)=\prod_{n=1}^{\infty} \left (1-\frac{x^2}{\pi^2 (n-1/2)^2 } \right ) \] Thus \[ \sin(2x)=2x\prod_{n=1}^{\infty}\left ( 1-\frac{4 \cdot x^2}{\pi^2 n^2} \right ) =2\cdot x\prod_{n=\mathrm{even}\geq1}^{\infty}\left ( 1-\frac{4 \cdot x^2}{\pi^2 n^2} \right ) \cdot \prod_{n=\mathrm{odd}\geq1}^{\infty}\left ( 1-\frac{4 \cdot x^2}{\pi^2 n^2} \right ) \] \[ \sin(2x) =2\cdot x\prod_{n=1}^{\infty}\left ( 1-\frac{x^2}{\pi^2 n^2} \right ) \cdot \prod_{n=1}^{\infty}\left ( 1-\frac{x^2}{\pi^2 (n-1/2)^2} \right ) \] \[ \sin(2x)=2\cdot \sin(x) \cdot \cos(x) \]
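A quick numerical check of this manipulation (a sketch assuming numpy; the truncation at 2000 factors is arbitrary, and the truncated products converge rather slowly):

import numpy as np

def sin_product(x, terms=2000):
    n = np.arange(1, terms + 1)
    return x*np.prod(1 - x**2/(np.pi**2*n**2))

def cos_product(x, terms=2000):
    n = np.arange(1, terms + 1)
    return np.prod(1 - x**2/(np.pi**2*(n - 0.5)**2))

x = 0.7
print(sin_product(2*x))                   # ~ sin(1.4)
print(2*sin_product(x)*cos_product(x))    # same value, via the split of the product
print(2*np.sin(x)*np.cos(x))              # exact, for comparison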



Way 3: From the Taylor Series

The Taylor series for sine and cosine can be written as, respectively: \[ \frac{\sin(\sqrt{x})}{\sqrt{x}}=\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k+1)!}x^k \] \[ \cos(\sqrt{x})=\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}x^k \] Thus \[ \frac{\sin(\sqrt{x})\cos(\sqrt{x})}{\sqrt{x}}=\sum_{j=0}^{\infty}\frac{(-1)^j}{(2j+1)!}x^j \sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}x^k \] Using a Cauchy product, we find: \[ \frac{\sin(\sqrt{x})\cos(\sqrt{x})}{\sqrt{x}}=\sum_{j=0}^{\infty}c_j x^j \] Where, using Pascal's rule in the last step, \[ c_m=\sum_{n=0}^{m} \frac{(-1)^n}{(2n+1)!}\frac{(-1)^{m-n}}{(2(m-n))!} =\frac{(-1)^m}{(2m+1)!}\sum_{n=0}^{m} \binom{2m+1}{2n+1} =\frac{(-1)^m}{(2m+1)!}\sum_{n=0}^{m} \left [ \binom{2m}{2n+1}+\binom{2m}{2n} \right ] \] \[ c_m=\frac{(-1)^m}{(2m+1)!}\sum_{n=0}^{2m} \binom{2m}{n}=\frac{(-1)^m}{(2m+1)!}2^{2m} \] And thus \[ \frac{\sin(\sqrt{x})\cos(\sqrt{x})}{\sqrt{x}}=\sum_{m=0}^{\infty}\frac{(-1)^m}{(2m+1)!}(4x)^m=\frac{\sin(\sqrt{4x})}{\sqrt{4x}}=\frac{\sin(2\sqrt{x})}{2\sqrt{x}} \] Substituting \(x=y^2\) and rearranging, we find: \( 2\sin(y)\cos(y)=\sin(2y) \)
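The key combinatorial step, \(\sum_{n=0}^{m}\binom{2m+1}{2n+1}=2^{2m}\), is easy to spot-check numerically (a sketch using only the Python standard library):

import math

for m in range(8):
    lhs = sum(math.comb(2*m + 1, 2*n + 1) for n in range(m + 1))
    print(m, lhs, 4**m, lhs == 4**m)      # True for every m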



Way 4: From Euler's Formula

Euler's formula is: \[ e^{ix}=\cos(x)+i\sin(x) \] Thus \[ e^{i2x}=\cos(2x)+i\sin(2x)=\left ( e^{ix} \right)^2=\left (\cos(x)+i\sin(x) \right )^2 \] \[ e^{i2x}=\left [\cos^2(x)-\sin^2(x) \right ]+i\left [ 2\sin(x)\cos(x) \right ] \] Thus, by equating real and imaginary parts, \(\sin(2x)=2\sin(x)\cos(x)\) and \(\cos(2x)=\cos^2(x)-\sin^2(x)\)



The Half-Angle Formulas

We find from the last demonstration \[ \cos(2x)=\cos^2(x)-\sin^2(x)=2\cos^2(x)-1=1-2\sin^2(x) \] Substituting \(2x=y\) and solving, taking the positive square roots (valid when \(\sin(y/2)\) and \(\cos(y/2)\) are non-negative), we find: \[ \sin\left ( \frac{y}{2} \right )=\sqrt{\frac{1-\cos(y)}{2}} \] \[ \cos\left ( \frac{y}{2} \right )=\sqrt{\frac{1+\cos(y)}{2}} \]



An Infinite Product Formula

We can write the double-angle formula as \[ \sin(x)=2\sin\left ( \frac{x}{2} \right )\cos\left ( \frac{x}{2} \right ) \] Iterating this, we then have \[ \sin(x)=2^n\sin\left ( \frac{x}{2^n} \right ) \prod_{k=1}^{n}\cos\left ( \frac{x}{2^k} \right ) \] However, in the limit as n gets large, \(2^n\sin\left ( \frac{x}{2^n} \right )\rightarrow x\). Thus, letting n go to infinity, we have \[ \sin(x)=x \prod_{k=1}^{\infty}\cos\left ( \frac{x}{2^k} \right ) \] A simple consequence of this general result (taking \(x=\pi/2\)) is \[ \frac{\pi}{2}=\frac{1}{\cos(\tfrac{\pi}{4})\cos(\tfrac{\pi}{8})\cos(\tfrac{\pi}{16})\cdots } =\frac{1}{\sqrt{\tfrac{1}{2}}\sqrt{\tfrac{1}{2}+\tfrac{1}{2}\sqrt{\tfrac{1}{2}}}\sqrt{\tfrac{1}{2}+\tfrac{1}{2}\sqrt{\tfrac{1}{2}+\tfrac{1}{2}\sqrt{\tfrac{1}{2}}}}\cdots }=\frac{2}{\sqrt{2}}\frac{2}{\sqrt{2+\sqrt{2}}}\frac{2}{\sqrt{2+\sqrt{2+\sqrt{2}}}}\cdots \] This is known as Viète's formula.
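Viète's formula converges very quickly; a short numerical check (a sketch assuming numpy, though the standard math module would do just as well):

import numpy as np

approx, c = 2.0, 0.0
for _ in range(20):
    c = np.sqrt(0.5 + 0.5*c)     # successively cos(pi/4), cos(pi/8), cos(pi/16), ...
    approx /= c                  # divide by the next factor of the product
print(approx, np.pi)             # agrees with pi to about 12 decimal places after 20 factors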



A Nested Radical Formula

We note that \[ 2\cos(x/2)=\sqrt{2+2\cos(x)} \] Thus, by iterating, we find \[ 2\cos(x/2^n)=\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2+2\cos(x)}}}}} \] Thus \[ 2\sin(x/2^{n+1})=\sqrt{2-\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2+2\cos(x)}}}}}} \] And we can thus conclude that \[ x=\underset{n\rightarrow \infty}{\lim} 2^n\sqrt{2-\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2+2\cos(x)}}}}}} \] For example \[ \pi/3=\underset{n\rightarrow \infty}{\lim} 2^n\sqrt{2-\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2+1}}}}}} \] \[ \pi/2=\underset{n\rightarrow \infty}{\lim} 2^n\sqrt{2-\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2}}}}}} \]
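The nested-radical limit can be evaluated numerically as well (a sketch assuming numpy; note that the subtraction \(2-\sqrt{2+\cdots}\) loses precision for large n, so moderate n works best):

import numpy as np

def nested(n, inner):
    # n radicals: sqrt(2 + sqrt(2 + ... sqrt(2 + inner)))
    r = inner
    for _ in range(n):
        r = np.sqrt(2 + r)
    return r

n = 12
for x, name in ((np.pi/3, "pi/3"), (np.pi/2, "pi/2")):
    approx = 2**n*np.sqrt(2 - nested(n, 2*np.cos(x)))
    print(name, approx, x)       # accurate to roughly 8 decimal places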



An Infinite Series

Above, we derived \[ \sin(x)=x \prod_{k=1}^{\infty}\cos\left ( \frac{x}{2^k} \right ) \] Taking the log of both sides and differentiating term by term, \[ \frac{\mathrm{d} }{\mathrm{d} x}\ln\left (\sin(x) \right )=\frac{\mathrm{d} }{\mathrm{d} x}\ln\left (x \prod_{k=1}^{\infty}\cos\left ( \frac{x}{2^k} \right ) \right ) \] \[ \cot(x)=\frac{1}{x}-\sum_{k=1}^{\infty}\frac{1}{2^k}\tan \left ( \frac{x}{2^k} \right ) \] \[ \frac{1}{x}-\cot(x)=\sum_{k=1}^{\infty}\frac{1}{2^k}\tan \left ( \frac{x}{2^k} \right ) \] From this we can easily derive (setting \(x=\pi/2\) and re-indexing the sum) \[ \frac{1}{\pi}=\sum_{k=2}^{\infty}\frac{1}{2^k}\tan \left ( \frac{\pi}{2^k} \right ) \]
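Again, a short numerical check of the final series (a sketch assuming numpy; 60 terms is far more than needed):

import numpy as np

k = np.arange(2, 60)
total = np.sum(np.tan(np.pi/2**k)/2**k)
print(total, 1/np.pi)            # both print approximately 0.31830988618...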



A Definite Integral

Let \[ I=\int_{0}^{\pi/2}\ln\left ( \sin(x) \right )dx =\int_{\pi/2}^{\pi}\ln\left ( \sin(x) \right )dx =\int_{0}^{\pi/2}\ln\left ( \cos(x) \right )dx \] Then \[ 2I=\int_{0}^{\pi}\ln\left ( \sin(x) \right )dx =2\int_{0}^{\pi/2}\ln\left ( \sin(x) \right )dx =\int_{0}^{\pi/2}\ln\left ( \sin(x) \right )+\ln\left ( \cos(x) \right )dx \] \[ 2I=\int_{0}^{\pi/2}\ln\left ( \sin(x) \cos(x) \right )dx=\int_{0}^{\pi/2}\ln\left (\tfrac{1}{2} \sin(2x) \right )dx=-\frac{\pi}{2}\ln(2)+\int_{0}^{\pi/2}\ln\left (\sin(2x) \right )dx \] By the substitution \(u=2x\), we then have \[ 2I=-\frac{\pi}{2}\ln(2)+\tfrac{1}{2}\int_{0}^{\pi}\ln\left (\sin(u) \right )du=-\frac{\pi}{2}\ln(2)+I \] Therefore \[ I=\int_{0}^{\pi/2}\ln\left (\sin(x) \right )dx=-\frac{\pi}{2}\ln(2) \]
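Finally, the value can be confirmed numerically (a sketch assuming numpy; a simple midpoint rule suffices because the logarithmic singularity at \(x=0\) is integrable):

import numpy as np

N = 1_000_000
x = (np.arange(N) + 0.5)*(np.pi/2)/N            # midpoints of N subintervals of (0, pi/2)
integral = np.sum(np.log(np.sin(x)))*(np.pi/2)/N
print(integral, -np.pi/2*np.log(2))              # both are about -1.0888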