Hyperphronesis

the Origin of the Easter Faith

2025-04-21T14:04:00.000-07:00

An Account of the Origin of the Easter Faith

(11,300 words — about a 50-minute read)

Introduction: Deciphering the Nicodemus Scroll

In one of the most remarkable archaeological discoveries of recent decades, researchers have deciphered a previously unreadable carbonized scroll recovered from the ruins of Herculaneum. The text, a narrative concerning Jesus of Nazareth and the early years of the Christian movement, is interwoven with excerpts from the epistles of Paul, for reasons that remain debated. The narrative portion claims authorship by Nicodemus of Jerusalem, the Pharisee who appears in the Gospel of John as a secret inquirer and later defender of Jesus. This attribution has led scholars to refer to the manuscript as the “Nicodemus Scroll.” Though its origin, authorship, and intent remain under scholarly debate, the scroll offers what may be the earliest known account of the events at the heart of the Easter faith that became Christianity. It is presented here for the first time in full English translation.

The scroll was first uncovered in October of 2018 during excavations southeast of the Palestra in the ancient Roman city of Herculaneum. Archaeologists found it in a partially collapsed wooden chest in the storeroom of an ancient private home, sealed beneath the volcanic debris from the eruption of Mount Vesuvius in 79 CE. As with the scrolls from the Villa of the Papyri, the heat of the eruption had left it carbonized and brittle, and thus impossible to open and read by conventional means.

The structure where it was discovered, designated Domus 24N, has since been informally nicknamed the “Domus of Theophilus” by researchers. The house—likely the residence of a prosperous Herculanean merchant or ship owner with ties to the eastern Mediterranean—featured a private east-facing exedra or meeting room with built-in benches and what some researchers think could be a menorah graffito, details that suggest it belonged to an educated Roman god-fearer, a non-Jewish sympathizer of Judaism and its scriptures. How, why, and from where the owner of the domus procured the text remains unclear, leaving the provenance a matter of speculation.

The scroll was recovered alongside several other carbonized papyri forming part of the same collection. These similarly deciphered texts included treatises by Philo of Alexandria, a number of complete books from the Septuagint (the Greek translation of the Old Testament), and partial texts of Paul's epistles to the Galatians and the Corinthians. The Nicodemus Scroll itself measures approximately 28 cm tall and 8 cm across in its rolled form, unrolling to about 1.8 meters. It is composed of fine papyrus, written in Koine Greek using a carbon-based ink, and appears to be the work of a single practiced scribe.

The decipherment of the Nicodemus Scroll was made possible through a combination of conservation, advanced imaging techniques, and sophisticated data analysis, work that was carried out between 2019 and 2024. The project was led by a team from the University of Naples in collaboration with the ENEA research center, the Italian National Institute of Nuclear Physics, and the Vesuvius Scrolls Project. Using a technique known as X-ray phase-contrast tomography and a synchrotron X-ray source at the European Synchrotron Radiation Facility in Grenoble, researchers were able to virtually unroll each scroll from the Theophilus Domus without physically opening it. By detecting subtle variations in carbon density, the team reconstructed ink traces and letter forms layer by layer, ultimately revealing a near-complete Greek text. The successful decipherment was announced in February 2025, paving the way for translation and scholarly analysis.

The scroll contains a structured literary composition blending two distinct strands: a narrative account in the voice of Nicodemus of Jerusalem and a series of interwoven excerpts from the letters of Paul, primarily drawn from 1 Corinthians, but also including material from Romans, Galatians, and 2 Corinthians. The narrative purports to be a personal letter from Nicodemus, a Pharisee mentioned in the Gospel of John, addressed to Simon of Samaria, also known as Simon Magus—the infamous magician encountered by Peter in the Acts of the Apostles. The text is written in an elevated formal Greek and framed as an orderly eyewitness account of Jesus's life, death, and the emergence of his followers.

The voice of Nicodemus clearly narrates the major events, while the Pauline passages appear as inserted quotations, sometimes aligning with the narrative and sometimes in tension with it, creating a layered and complex literary structure. The narrative, though spanning the life, death, and purported resurrection appearances of Jesus of Nazareth, compresses much of his Galilean ministry into a few brief yet vivid episodes before moving swiftly to the final week in Jerusalem. It then extends into the very early history of the church under James, Peter, and John, followed by a section recounting Paul's autobiography: first as a persecutor of the movement, then as the apostle to the Gentiles.

Jesus's baptism by John, the calling of disciples, and the preaching in Galilee are summarized with minimal elaboration. Particular emphasis is placed on the passion in Jerusalem: his confrontation in the temple, his arrest, and his execution under Pontius Pilate. The burial by Joseph of Arimathea is described in detail, including a striking and much-debated episode in which Joseph returns to the tomb on the night before Easter morning and reburies the body of Jesus. The resurrection appearances that follow align closely with the list of appearances Paul transmits in 1 Corinthians 15 and include moments that strongly resemble events from Jesus's earthly ministry as recounted in the Gospels, sparking debate about chronology and retrojection.

The trial scenes before the Sanhedrin, while similar in outline to those found in the book of Acts, contain significant dialogues from the apostles and council members not preserved in the New Testament. Whoever combined the narrative with the Pauline excerpts was clearly aware of the connections between the two and may simply have been juxtaposing them for comparison.

Dr. Alessia Manfredi, professor of New Testament Studies at the University of Naples, remarked in a press release:

“What we're looking at is not just another fragment of early Christian literature. This is a sustained, structured, and dispassionate narrative that appears to predate or run parallel to the gospels, and it engages directly with Paul in theology. The layers may go back to the very earliest years of the church. So the value to the study of Christian origins cannot be overstated. Whether it was composed in support of Paul or in response to him is still unclear—but it is not derivative.”

Scholars remain sharply divided over the scroll’s purpose and point of view. Some argue that it reflects a pro-Pauline stance, using the embedded narrative to reinforce Paul's teachings, validate his list of resurrection witnesses, and support his authority. Others see it as anti-Pauline, subtly challenging Paul's version of events by embedding his quotations in a framework that emphasizes the authority of the Jerusalem apostles and portrays Paul as a latecomer and upstart. A third view holds that the juxtaposition is deliberately ambiguous, intended to explore, or even satirize, the growing tensions between competing interpretations of Jesus's legacy in the mid-first century. The literary strategy of placing Pauline material within a non-Pauline narrative has fueled ongoing debate about whether the scroll represents a synthesis, a critique, or a theological provocation.

While the physical scroll has been securely dated to the first century through radiocarbon analysis and stratigraphic context, the question of when the text itself was composed remains unsettled. Most scholars agree that the core narrative, dubbed “The History of the Nazoreans,” likely originated around 35 to 40 CE—within just a few years of the events it describes. The finished composition, including the Pauline material, is generally dated somewhat later, possibly by 60 CE.

Despite the early manuscript date, some evangelical scholars have dismissed the scroll as a later theological fiction retrojected into an earlier period. A few voices, particularly within apologetic circles, have even raised the possibility of a modern forgery. But this view has not gained traction among papyrologists or archaeologists, who cite the material integrity, handwriting, and archaeological context as compelling evidence of authenticity.

A spokesperson from the Vatican responded to our request for comment:

“Whatever its origin, this document does not alter the Church's understanding of the apostolic tradition or the authority of the canonical Gospels. It may reflect a fringe voice from an unsettled time, but it holds no theological weight.”

Dr. Nathan Carr, senior fellow at the Evangelical Theological Research Institute, dismissed the scroll’s implications more bluntly:

“We've seen alleged discoveries like this before. People once said the Dead Sea Scrolls proved a proto-Christian movement existed before Jesus, but they ended up proving only how reliably the Bible was transmitted. This discovery will pass like all the others.”

Yet for some observers, the confidence in these responses only underscores the unease with which the scroll is being received. A small number of fringe voices have proposed that the scroll is an elaborate modern forgery planted to undermine Christian faith.

Reverend Dr. Alan Brewer, an independent apologist and former professor at the now-defunct Baptist Bible College of Clarks Summit, Pennsylvania, suggested on his weekly podcast:

“We can't rule out the possibility that this so-called scroll was planted during the excavation. It could have been carbon dated using compromised methods and forged in Greek by someone with an anti-theological agenda and access to advanced imaging knowledge. For all we know, the enemy himself may have placed it there to lead the faithful astray or to test our discernment in these last days.”

Such claims have been widely dismissed by experts in archaeology, paleography, and material science.

The scroll is currently housed in a secure conservation facility within the National Archaeological Museum of Naples, under the joint stewardship of the University of Naples and the Herculaneum Conservation Project. It is not on public display, and access remains restricted to qualified researchers. While debate continues over its theological significance, its historical and archaeological importance is already beyond dispute.

What follows is the first full English-language translation of the deciphered text. Readers, especially those of faith, are advised to continue with discretion and an openness to historical curiosity.

The History of the Nazoreans

Paul, an apostle—not of men, neither by man, but by Jesus Christ and God the Father, who raised him from the dead—grace be to you and peace from God the Father and from our Lord Jesus Christ, who gave himself for our sins, that he might deliver us from this present evil world, according to the will of God and our Father, to whom be glory forever and ever. Amen.

I marvel that ye are so soon removed from him that called you into the grace of Christ unto another gospel, which is not another; but there be some that trouble you and would pervert the gospel of Christ. But though we, or an angel from heaven, preach any other gospel unto you than that which we have preached unto you, let him be accursed. As we said before, so say I now again: if any man preach any other gospel unto you than that ye have received, let him be accursed. But I certify you, brethren, that the gospel which was preached of me is not after man. For I neither received it of man, neither was I taught it, but by the revelation of Jesus Christ.

Moreover, brethren, I declare unto you the gospel which I preached unto you, which also ye have received, and wherein ye stand; by which also ye are saved, if ye keep in memory what I preached unto you—unless ye have believed in vain.

I, Nicodemus, son of Gurion of Jerusalem, being myself an eyewitness and having carefully investigated all things from the beginning, write to you, most excellent Simon of Samaria, Magus, an orderly account of the man Jesus called Christ and of the course of his followers after he was put to death. For it seemed good to you to know the certainty and truth of the many things said of him.

Now concerning the sect of the Nazoreans—those who have faith in the Christ crucified and raised from the dead—the origin was as follows:

Jesus had been a mason from the town of Nazareth in Galilee, the son of Mary, and, as was supposed, the son of Joseph, son of Jacob, of the line of David. He went down to the Jordan in the days of John the Baptizer and was baptized by him and was numbered among his closest disciples.

After Herod Antipas put John to death, Jesus returned to Galilee, preaching repentance throughout the synagogues and proclaiming the nearness of the kingdom of God. He debated with scribes and Pharisees and was a doer of wondrous deeds, healing many and casting out devils, and was held in great respect by the elders and teachers of the law.

Jesus ordained twelve disciples and sent them out. And they went forth, preaching that all should repent and that the kingdom of God had come near. They cast out many devils and anointed with oil many that were sick and healed them.

When his fame spread, many believed he was John the Baptizer raised from the dead, and some believed he was a prophet like those in the days of old. Some said Elijah, others Jeremiah, but others said he was the Christ of the line of David, the King. And great crowds gathered to him to hear his teaching, to be healed, and to receive the baptism of John for the remission of sins.

In the seventeenth year of Tiberius Caesar, when Pontius Pilate was governor of Judea and Caiaphas was high priest, Jesus went up to Jerusalem with his disciples for the feast of unleavened bread.

And when they drew near to Bethany at the Mount of Olives, they brought a colt to Jesus, threw their garments on it, and he sat upon it. Many spread their garments on the road, and others cut branches from the trees and laid them down before him. Those who went before and those who followed cried out, saying:

Hosanna! Blessed is he who comes in the name of the Lord!
Blessed is the kingdom of our father David that comes in the name of the Lord!
Hosanna in the highest!

Jesus entered Jerusalem and went into the temple. He began to drive out those who bought and sold in the temple, overturning the tables of the money changers and the seats of those who sold doves. He would not allow anyone to carry wares through the temple.

Then he taught, saying to them:

Is it not written, My house shall be called a house of prayer for all nations? But you have made it a den of thieves.

When the Roman guards heard this, they reported it to the centurion, who sought how they might arrest him, for great crowds had gathered, astonished at his teaching. And though they sought to lay hold of him, they feared the people: the crowds seemed ready to do whatever he commanded, and his great influence might lead them to rebellion.

So the Romans watched him and sent spies from among the Herodians, who pretended to be righteous, that they might seize on his words in order to deliver him to the power and authority of the governor. But they were not able to catch him in his words in the presence of the people.

When evening had come, he went out of the city. During the daytime, he taught in the temple, but at night he went out and stayed on the Mount of Olives. Early each morning, all the people came to him in the temple to hear him.

On the day before the Passover lambs were slain, Jesus came with the twelve into the city to a guest chamber in a large upper room. When the hour had come, he sat down, and the twelve apostles with him. Then he said to them:

With fervent desire I have desired to eat this coming Passover with you. For I say to you, I will not eat of it again until it is fulfilled in the kingdom of God.

Then he took the cup, gave thanks, and said:

Take this and divide it among yourselves. For I say to you, I will not drink of the fruit of the vine until the kingdom of God comes.

They sat and ate and drank. And when they had sung a hymn, they went out to the Mount of Olives. They came to a place called Gethsemane, and there they prayed and slept.

Then Judas, son of Simon Iscariot, one of the twelve, having slipped away, came with a detachment of Roman soldiers bearing swords and clubs, along with officers from the chief priests, the scribes, and the elders.

Now his betrayer had given them a sign, saying:

Whomever I kiss, he is the one. Seize him and lead him away safely.

As soon as he arrived, he went straight to Jesus and kissed him. Then they laid hands on him and took him.

His disciple Simon, surnamed Cephas—who is called Peter—drew his sword and struck the servant of the high priest, cutting off his ear. Then they all forsook him and fled.

They led Jesus away and bound him in prison. In the night, the chief priests, the elders, and the scribes came to visit him. Though they debated with him and, in fear for his life, urged him to recant, Jesus answered:

O you of little faith, do you think that I cannot now pray to my Father, and he will provide me with more than twelve legions of angels?

And though they continued to plead with him, he answered nothing. They went away astonished and grieving, for they all knew he would be found guilty of death.

And immediately in the morning, when the sun had risen, the guards bound Jesus, led him away, and delivered him to Pilate. The centurion came, saying:

We found this man inciting the nation, forbidding them to pay taxes to Caesar, and saying that he himself is the Messiah, a king. We heard him say, ‘I will destroy this temple made with hands, and within three days I will build another made without hands.’ He stirs up the people, teaching throughout all Judea, beginning from Galilee even to this place.

Then Pilate called together the chief priests, the leaders, and the elders of the people to question Jesus. And Caiaphas the high priest stood up in the midst and asked him, saying:

Are you the Christ, the son of the Blessed David?

Jesus said:

I am. And you will see the Son of Man sitting at the right hand of power and coming with the clouds of heaven.

Then the high priest tore his robe and said:

You have heard him. What do you think?

And Pilate replied:

Am I a Jew?

For he did not understand the interpretation, being:

The Lord shall make my enemies my footstool, and the kingdom and dominion and the greatness of the kingdom under the whole heaven shall be given to us, the people of the saints of the Most High.

Then he asked Jesus plainly:

Are you the king of the Jews?

And Jesus answered and said to him:

You have said it.

The witnesses continued making accusations against him, so that Pilate asked him again, saying:

Do you answer nothing? See how many things they testify against you.

But Jesus still answered nothing, so that Pilate marveled.

For I delivered unto you first of all that which I also received: how that Christ died for our sins according to the Scriptures.

And so Pilate, after he had scourged Jesus, delivered him to be crucified.

Then the soldiers led him away into the hall called the Praetorium and called together the whole garrison. They twisted a crown of thorns and put it on his head, and they began to salute him:

Hail, King of the Jews!

They struck him on the head with a reed and spat on him. And when they had mocked him, they led him out to crucify him.

They compelled a certain man to bear his cross—Simon, a Cyrenian, the father of Alexander and Rufus, who would be deacons of the church in Rome—as he was coming out of the country. And they brought him to Golgotha, which is translated "Place of a Skull."

It was the third hour, and they crucified him. The inscription of his accusation was written above:

The King of the Jews.

With him, they also crucified two other men, one on his right and the other on his left. The inscription of their accusation was insurrectionist.

There were also women looking on from afar, among whom were Mary Magdalene; Mary the mother of Jesus, of James the Just, and of Joseph; and Salome, who also followed him and ministered to him when he was in Galilee, as well as many other women who came up with him to Jerusalem.

Those who passed by blasphemed Jesus, wagging their heads and mocking him. Others mourned and turned their faces away from him, and Jesus wept.

At the ninth hour, Jesus cried out with a loud voice:

Eloi, Eloi, lama sabachthani?

When some of the bystanders heard this, they said:

Look, he is calling for Elijah.

Then someone ran, filled a sponge with sour wine, put it on a reed, and gave it to him to drink, saying:

Wait, let us see if Elijah will come to take him down.

Then Jesus cried out with a loud voice and breathed his last.

And that he was buried...

Now when evening had come—because it was the preparation day and the bodies should not remain on the cross on the Sabbath—the chief priests and elders sent forth a man from among them named Joseph of Arimathea, an honorable council member who himself was waiting for the kingdom of God.

He came and went in boldly to Pilate and asked for the bodies of the condemned, saying:

For Moses said in our law: The body shall not remain overnight on the tree, but you shall surely bury him that day, for he who is hanged is accursed of God, that your land may not be defiled, which the Lord your God is giving you as an inheritance. And it is a holy day.

Though Pilate at first refused, Joseph earnestly pleaded with him. And Pilate marveled that they were already dead. At last, calling for the centurion, he commanded that their legs be broken so that they might be buried.

Then the soldiers came and broke the legs of Jesus and of the others who were crucified with him. When he had confirmed it with the centurion, Pilate gave over the bodies to Joseph. Joseph bought fine linen, took them down, and wrapped each of them in linen.

Now in the place where they were crucified, there was a garden, and in the garden, a tomb hewn out of the rock. There they laid Jesus and the other crucified men because of the preparation day—for it was evening, and the tomb was nearby. They rolled a great stone against the door of the tomb.

And Mary Magdalene and Mary the mother of Jesus saw where he was laid. As the evening was quickly approaching, each of them returned to their homes to eat the Passover. And they rested on the Sabbaths according to the commandments.

When both the Sabbaths had passed, Mary Magdalene and the other women went out after evening and bought sweet spices, that they might come in the morning to anoint the body of Jesus.

And that he rose again the third day according to the Scriptures...

Joseph of Arimathea, however, came to the tomb after the Sabbaths in the first watch of the night, with his servants bearing torches and shovels. And Joseph said:

This Jesus, called Christ—was he not a son of David and a son of God? But he has borne the cross of the Law, being made a curse for us. For it is written: Cursed is everyone who hangs on a tree. Wise and righteous as he was, now he is numbered with the transgressors. Come, let us bury these men in cursed ground.

Then they rolled back the great stone, lifted the bodies, and carried them through the valley of Gehenna to that field known to all the inhabitants of Jerusalem, called in their own tongue Akeldama—that is to say, the field of blood. There they dug graves in the earth and laid the bodies of Jesus and the brigands therein. Then, having purified themselves, Joseph and his servants returned to the city and to their homes.

And it was night.

Very early in the morning on the first day of the week, as the sun was rising, Mary Magdalene, Mary the mother of James, Salome, and the other women came to the tomb. They said among themselves:

Who will roll away the stone from the door of the tomb for us?

But when they looked up, they saw that the stone had been rolled away—for it was very large. And entering the tomb, they saw a man sitting on the right side, handling the long white linen cloths, and they were afraid. The man's name was Malchus, who was the gardener.

He turned and asked them:

Whom are you seeking?

Mary Magdalene, taking courage, said to him:

Sir, if you have carried my Lord away, tell me where you have laid him, and I will take him away.

And he said to them:

Do not be afraid. You seek Jesus of Nazareth, who was crucified. He has been raised. He is not here. Behold the place where they laid him.

Then they went out quickly and fled from the tomb, for they trembled and were amazed. And they said nothing to anyone, for they were afraid.

But Mary Magdalene alone ran and came to Simon Peter and said to him:

They have taken away the Lord out of the tomb, and we do not know where they have laid him.

Then Peter and Mary set out and went toward the tomb. The two ran together—Peter following Mary—but she outran Peter and reached the tomb first. She stooped down to look in and saw the linen wrappings lying there, but she did not go in.

Then Simon Peter came, following after her, and went into the tomb. He saw the linen wrappings lying there, and the cloth that had been on Jesus’s head—not lying with the linen wrappings, but folded together in a place by itself.

Then Mary Magdalene, who had reached the tomb first, also went in, and she saw and believed that he must rise from the dead. And she cried out:

Rabboni! He has been taken up!

But Peter was amazed, and he departed, marveling to himself at what had happened.

Then Mary returned from the tomb and told all these things to the eleven and to all the rest. But her words seemed to them like idle tales, and they did not believe them.

And that he was seen of Cephas...

Now it was the final day of the Feast of Unleavened Bread, and many went out, returning to their homes, since the feast was over. But the twelve disciples of the Lord were weeping and sorrowful, and each one, grieving over what had come to pass, departed to his home.

But Simon Peter and his brother Andrew, having taken their nets, went off to the Sea of Galilee with several of the other disciples. Simon Peter said to them:

I am going fishing.

They said to him:

We will go with you.

So they went out and got into the boat and fished late into the evening. Now it was dark, and they set out across the sea toward Capernaum. But the sea became rough, for a strong wind was blowing, and they were straining at the oars against the wind.

Now about the fourth watch of the night, when they had rowed about three or four miles, they saw the figure of a man coming toward them, walking on the sea—and he would have passed by them. They supposed it was a ghost and cried out, for they all saw him and were troubled.

Just as day was breaking, Simon Peter cried out:

Lord, if it is you, command me to come to you on the water.

Then he put on his outer garment, for he had removed it, and cast himself into the sea, coming toward the man—for they were not far from the land, about a hundred yards off. But seeing the wind, he became afraid, and beginning to sink, he cried out:

Lord, save me!

Immediately, the man stretched out his hand, took hold of him, and said:

Be of good cheer. It is I. Do not be afraid.

And when Peter lifted up his eyes, he saw only a fisherman standing on the shore. Yet the disciples did not recognize who he was. When they reached land, they saw a charcoal fire in place, with fish laid on it and bread.

None of them dared ask the man, “Who are you?” Yet Peter knew it was the Lord. He came, took the bread, and gave it to them, and likewise the fish—and they ate.

Having fished all night, they became drowsy and slept. But when they fully awoke, they saw no one.

... then of the twelve.

Then the disciples of Jesus went out into the towns of Caesarea Philippi, to the mountain where Jesus had appointed them his twelve disciples, and where he had said to them:

In the coming kingdom, when I shall sit on the throne of my glory, you who have followed me will also sit on twelve thrones, judging the twelve tribes of Israel.

And along the way they asked among themselves, “Who do men say that Jesus is?” And some answered, “John the Baptizer,” but some said, “Elijah,” and others, “One of the prophets.”

Then they asked among themselves, “But who do you say that he is?”

And Peter answered and said to them:

He is the Christ.

Then he began to teach them that the Christ must suffer many things and be rejected by the elders, the chief priests, and the scribes, and be killed—and after three days rise again. And he spoke this saying openly.

Then the disciples took Peter and began to rebuke him. But when he had turned and looked at the other disciples, Peter rebuked them, saying:

Get behind me, you of Satan, for you do not savor the things of God, but the things of men.

And after six days, the twelve went up a high mountain by themselves. There they saw Jesus, and he was transfigured before them. His form had become radiant, exceedingly white, as no launderer on earth could whiten. And Elijah appeared also, with Moses, and they were talking with Jesus.

When they saw him in his glory, they worshiped him—but some doubted. And as they were departing, Peter said to Jesus:

Master, it is good for us to be here. Let us make three tabernacles—one for you, one for Moses, and one for Elijah,

—not knowing what he was saying.

While he was still speaking, a cloud came and overshadowed them. And they were afraid as they entered the cloud. And from the cloud, Peter heard a voice saying:

Listen to him.

And when the disciples heard it, they fell on their faces and were greatly afraid. When they lifted up their eyes, they saw no one.

Some of the disciples standing there heard it and said it was thunder. Others said, “An angel has spoken to him.”

As they came down from the mountain, Peter said:

Let us tell this to everyone, for the Christ is risen from the dead.

Then the other disciples asked him, “Why then do the scribes say that Elijah must come first?”

And Peter answered and said to them:

Indeed, Elijah comes first and restores all things. And how can it be of the Christ, that he must suffer many things and be treated with contempt? But I say to you that Elijah has already come, and they did to him whatever they wished. So also the Christ suffered at their hands.

Then the disciples understood that he spoke to them of John the Baptizer.

After that, he was seen of above five hundred brethren at once—of whom the greater part remain until this present, but some are fallen asleep.

Afterwards, having come to Capernaum, they departed in a boat to a city called Bethsaida. Some who had followed Jesus saw them going and recognized them, and many ran there together on foot from all the surrounding cities and arrived ahead of them.

When the disciples went ashore, they saw a large crowd and had compassion on them, because they were like sheep without a shepherd. And they began to teach them that the Christ, who was the Son of Man, must suffer many things, be rejected by the elders, the chief priests, and the scribes, and be killed—and after three days rise again, saying:

This is Jesus, whom God has raised up, of which we are all witnesses.

The people rejoiced and praised God and were filled with the Holy Spirit.

Then Stephen, a man full of faith and the Holy Spirit, gazed into heaven and saw the glory of God, and Jesus standing at the right hand of God. And he said:

Look, I see the heavens opened and the Son of Man standing at the right hand of God!

And the people looked up in fear and amazement. For many believed this was the coming of the Son of Man on the clouds of heaven in judgment, as foretold by Daniel.

Some said, “There he is,” and others, “Look, here is the Christ.” Some cried out, “Look, he is in the wilderness,” and others, “Look, he is in the inner rooms.” And many ran to and fro in fear and excitement.

And on that day there were more than five hundred men who saw the Lord—besides women and children.

After that, he was seen of James…

Now the feast of Weeks was near, and the disciples, along with many from Galilee, went up to Jerusalem. Every day they continued to meet together in the temple courts. They broke bread in their homes and ate together with gladness and sincerity of heart.

Now James, the brother of Jesus, had sworn that he would abide in the temple and not eat bread from the hour in which he had drunk the Lord’s cup until he should see him risen from the dead. For James was so zealous for God that he was known as James the Just, and was respected among the scribes and elders. He never left the temple but worshiped night and day, fasting and praying.

Then Jesus appeared to James in a vision and said,

Bring a table and bread.

He took bread, blessed it, broke it, and gave it to James his brother, saying:

My brother, eat your bread, for the Son of Man is risen from the dead.

And when he had eaten, James saw him no more. He arose and joined the other disciples, telling them what he had seen and glorifying God.

… Then of all the apostles.

When the day of Pentecost had come, they gathered at the house of Mary, the wife of Shabbatai the priest, the mother of John whose surname was Mark, where many were assembled together in prayer. And when they had entered, they went up into the upper room where they had eaten with the Lord. For they were staying there—Peter, James, John, and Andrew, Philip, James the Just, Simon the Zealot, and Judas, the brothers of Jesus.

These all continued with one accord in prayer and supplication with the women, among whom was Mary the mother of Jesus, and the other Mary, with the sons of her husband Shabbatai—Joseph surnamed Justus, and Matthias. And the apostles were all together in one place.

While they were praying, Jesus himself stood among them and said to them,

Peace be with you.

They were startled and terrified and thought that they were seeing a ghost. Then their eyes were opened and they recognized him—and he vanished from their sight.

And suddenly from heaven there came a sound like the rush of a violent wind, and it filled the entire house where they were sitting. Divided tongues, as of fire, appeared among them, and a tongue rested on each of them. All of them were filled with the Spirit and began to prophesy and speak in tongues, as the Spirit gave them ability.

And as it was the time of the feast, there were dwelling in Jerusalem Jews—devout men from every nation under heaven. And when this sound occurred, the multitude came together and were confused, saying to one another,

Whatever could this mean?

Some said,

The Spirit of the Lord has come upon them.

Others mocked them, saying,

They are full of new wine.

Thereafter James began to teach in the temple, saying:

"Men of Israel, hear these words: Jesus of Nazareth, a man attested by God to us by miracles, wonders, and signs which God did through him in our midst, as you yourselves also know—him, being delivered by the determined counsel and foreknowledge of God, the Gentiles have taken, and by lawless hands have crucified and put to death, whom God raised up, having loosed the pains of death, because it was not possible that he should be held by it."

And he said to them:

"Oh foolish ones and slow of heart to believe all that the prophets have spoken! Ought not the Christ to have suffered these things and to enter into his glory? These are the words which he spoke to us while he was still with us, that all things must be fulfilled which were written in the law of Moses and the prophets and the psalms concerning him."

Then he opened their understanding, that they might comprehend the scriptures, saying:

Moses declared:
‘The Lord your God will raise up for you a prophet like me from among your brethren; him you shall hear.’

David spoke concerning the Messiah:
‘I foresaw the Lord always before my face, for he is at my right hand that I may not be shaken. My flesh also will rest in hope, for you will not leave my soul in Sheol, nor allow your holy one to see corruption.’
And again:
‘The Lord said to my Lord, Sit at my right hand till I make your enemies your footstool.’

Isaiah proclaimed:
‘A root of Jesse shall stand as a banner to the people; the nations shall seek him, and his rest shall be glorious. Behold my servant, my elect one. I have put my spirit upon him; he will bring forth justice.’
And again:
‘He was despised and rejected, a man of sorrows acquainted with grief. Surely he has borne our griefs and carried our sorrows. He was wounded for our transgressions, bruised for our iniquities. The chastisement for our peace was upon him, and by his stripes we are healed.’

Hosea foretold:
‘After two days he will revive us; on the third day he will raise us up, that we may live in his sight.’

Joel spoke of the last days:
‘I will pour out my spirit on all flesh. Your sons and daughters shall prophesy. Your old men shall dream dreams. Your young men shall see visions.’

Jeremiah prophesied of a renewed covenant:
‘I will make a new covenant with the house of Israel and Judah. I will put my law in their minds and write it on their hearts. I will be their God, and they shall be my people. For I will forgive their iniquity and remember their sin no more.’

Daniel saw in visions:
‘Behold, one like the Son of Man coming with the clouds of heaven. To him was given dominion and a kingdom, that all nations should serve him. His dominion is everlasting, and his kingdom shall not be destroyed.’
And again:
‘There shall be a time of trouble such as never was since there was a nation. But at that time your people shall be delivered. And many who sleep in the dust shall awake—some to everlasting life, some to everlasting contempt.’

"Thus it is written, and thus it was necessary for the Christ to suffer and to rise from the dead on the third day, and that repentance and remission of sins should be preached in his name. Therefore, let all the house of Israel know assuredly that God has raised up this Jesus, whom they crucified, as his Christ—of which we are all witnesses."

Now when the people heard this, they were cut to the heart and said to James and the rest of the apostles:

"Men and brethren, what shall we do?"

Then James said to them:

"Repent, for the kingdom of heaven is at hand, and receive the baptism of John, every one of you, for the remission of sins, and you shall receive the gift of the Holy Spirit. For the promise is to you and to your children, as many as the Lord our God will call. Sell everything and give to the poor, the widows, and the orphans, for theirs is the kingdom of heaven. If you truly fulfill the royal law according to the scripture—‘You shall love your neighbor as yourself’—you do well. Keep yourselves unstained from the world, but store up for yourselves treasure in heaven. Do you not know that friendship with the world is enmity with God? Therefore whoever wishes to be a friend of the world becomes an enemy of God. Indeed I say, the time is short, and we shall live to see the Christ return in glory and judgment. For judgment will be without mercy to anyone who has shown no mercy. Mercy triumphs over judgment. Humble yourselves before the Lord, and he will exalt you."

And with many other words he testified and exhorted them, saying:

"Save yourselves from this perverse generation."

Then those who gladly received his word were baptized. And in those days about six hundred souls were added to them. And the apostles all agreed that James should be the chiefest among them for his great wisdom and righteousness and his nearness to Jesus. With him, Simon Peter and John the son of Zebedee were next most chief, and the three of them were called the pillars of the church. And so James the brother of the Lord has ruled the church in Jerusalem to this day. Indeed, he is so revered among the churches that it is commonly said:

"Wherever you go, you will turn to James the Just, for whose sake heaven and earth were created."

And they continued steadfastly in the apostles’ doctrine and fellowship, in the breaking of bread and in prayers. Then fear came upon every soul, and many wonders and signs were done through the apostles.

Now all who believed were together and had all things in common, and sold their possessions and goods, and divided them among all, as anyone had need. And thus they were known as Ebionim—that is, the Poor of Jerusalem—asking alms for their daily bread.

So they continued daily with one accord in the temple, and breaking bread from house to house, they ate their food with gladness and simplicity of heart, praising God and having favor with all the people. And they added to the church daily those who were being saved.

Now it was the festival of Booths, and James the brother of Jesus, with Peter and John, went up together to the temple at the hour of prayer—the ninth hour. And many people ran together to them in the porch of Solomon, and they preached and taught of Jesus and the resurrection.

Now as they spoke to the people, the priests, the captain of the temple, and the Sadducees came upon them, being greatly disturbed that they taught the people and preached Jesus as Christ and the resurrection from the dead. And they laid hands on them and put them in custody until the next day, for it was already evening. However, many of those who heard the word believed, and the number of the men came to be about five hundred.

And it came to pass on the next day that their rulers, elders, and scribes—and Joseph of Arimathea, as well as Annas the high priest, Caiaphas, John, and Alexander, and as many as were of the family of the high priest—were gathered together at Jerusalem. And when they had set them in the midst, they asked:

“By what authority do you teach these things? For this man who did wonders and was called the Christ was crucified, as all of us know. Now the Christ is to liberate Israel, not die as a hanged man. For any man hung on a tree is cursed, and no savior. So this Jesus of Nazareth cannot have been the Christ. And his mighty works must then have been done of Beelzebul. How then do you say he is the Christ?”

And James boldly answered and said:

“I proclaim it by the authority of the living God, who raised up from the dead Jesus the Christ, my own brother, and set him at his right hand. For we profess we have seen him with our own eyes, and he shall return in power and glory to judge the living and the dead and to restore the kingdom to the righteous of Israel.”

Then the council murmured against them, saying:

“Shall we listen again to these deceivers? Have we not seen how their leader perished? Did not the Lord strike him down for his blasphemy?”

But Joseph of Arimathea, a man of great honor, stood up among them and said:

“Hear me, O men of Israel. I know well this Jesus they speak of, for I was there when he was crucified. I was the one who asked Pilate for the bodies of Jesus and the others, to bury them that they might not hang on the cross after evening. And when Pilate had handed them over to me, I had them taken down and placed in the tomb in the garden. Never had I in mind to keep them there. But I placed them there only because the Sabbath was approaching and the tomb was at hand. But after the Sabbaths, I and my servants returned in the night, opened the tomb, carried away the bodies of Jesus and the others, and buried them in the potter’s field. Behold, come, and I will show you where he lies.”

And there was a great silence as the council sat amazed and awaiting a reply, but none of the apostles dared speak. The high priest arose and asked them:

“Men of Galilee, shall we then go to the grave and dig up the remains of this Jesus? Will you know him by his bones? For if you see them, truly, you cannot declare he is either risen or Christ.”

And others cried out:

“Surely by now there should be a great stench, for he has been dead these six months.”

Many were astonished, yet others scoffed, saying:

“If this Christ is not raised, then your faith is in vain.”

But John, surnamed Boanerges, filled with indignation and the Spirit, stepped forward and said:

“By the living God, I too will testify. We have seen Jesus our Lord, who was crucified, risen in glory from the dead. O you learned ones—Pharisees, Sadducees, and all who search the Scriptures—do you not know that the body which is raised is not the same as the body which is sown, but God gives a new body: glorious, powerful, eternal, heavenly, and shining as the stars?

As it was written by Isaiah:
‘All flesh is grass, and all its loveliness is like the flower of the field. The grass withers, the flower fades, but the word of our God stands forever.’

And as it was spoken by Daniel:
‘Those who are wise shall shine like the brightness of the firmament, and those who turn many to righteousness like the stars forever and ever.’

There is a body of earth, and there is a body of heaven. For flesh and blood cannot inherit the kingdom of God. What we saw was not the corpse of dust but the risen Lord—glorious and powerful, in a body of spirit given by God. For he is the firstfruits of those who sleep in Sheol.”

Then Peter, taking courage from John’s boldness, said:

“Men of Israel, the word which God sent to the children of Israel, proclaiming peace through Jesus the Christ—this word you know, for it was spread throughout all Judea, beginning from Galilee after the baptism which John preached. How God anointed Jesus of Nazareth with the Holy Spirit and with power, who went about doing good and healing all who were oppressed by the evil one—for God was with him.

And we are witnesses of all that he did both in Galilee and in Jerusalem, whom they put to death by hanging upon a tree. Him God raised on the third day and showed him openly—not to all the people, but to witnesses chosen beforehand by God, even to us who saw him in his glory after he rose from the dead.

And so we preach to the people and testify that he is the one appointed by God to be judge of the living and the dead. To him all the prophets bear witness that through him the kingdom of God shall be brought to Israel.

For this Jesus, whom you rejected, is the stone which the builders despised—yet he has become the chief cornerstone. And there is salvation in no other, for there is no other name under heaven given among men by which we must be saved.”

When they saw the courage of James, Peter, and John, and realized that they were unschooled, ordinary men, they were astonished, and they took note that these men had been with Jesus. So they ordered them to withdraw from the Sanhedrin and then conferred together:

“What are we going to do with these men?”

Some said:

“Let us hand them over to the Romans to be crucified also, for do they not follow their own king of the Jews?”

But others replied:

“Heaven forbid, or the last martyrdom will be worse than the first.”

So they agreed to stop this thing from spreading any further among the people.

“We must warn them to speak no longer to anyone in this name.”

Then they called them in again, and after flogging them, they commanded them not to speak or teach at all in the name of Jesus. But the apostles replied:

“Which is right in God’s eyes—to listen to you or to him? You be the judges. As for us, we cannot help speaking about what we have seen and heard.”

Seeing they could do no more, after further threats, they let them go.

And being let go, they went to their own companions and reported all that the chief priests and elders had said to them. So when they heard that, they raised their voice to God with one accord and prayed for boldness to preach Jesus, the risen Christ.

Now the multitude of those who believed were of one heart and one soul. Neither did anyone say that any of the things he possessed was his own, but they had all things in common. And with great power the apostles gave witness to the resurrection of the Lord Jesus, and great grace was upon them all. Nor was there anyone among them who lacked, for all who were possessors of lands or houses sold them and brought the proceeds of the things that were sold, and laid them at the apostles' feet, and they distributed to each as anyone had need.

Ananias with Sapphira his wife, and Joses a Levite of the country of Cyprus, the nephew of Shabbatai the priest, having land, sold it and brought the money and laid it at the apostles' feet.

And through the hands of the apostles, many signs and wonders were done among the people, and they were all with one accord in Solomon's porch. Yet none of the rest dared join them, though many esteemed them highly. And believers were added to the Lord, gathered from the surrounding cities to Jerusalem, bringing sick people and those who were tormented by unclean spirits—and they were healed.

Then the high priest rose up, and all those who were with him (which is the sect of the Sadducees), and seeing the apostles' arrogance and contempt for the council's threats, they were filled with indignation and laid their hands on the apostles and put them in the common prison.

Early in the morning the captain went with the officers and brought them without violence, for they feared the people lest the prisoners should be stoned. And when they had brought them, they set them before the council. And the high priest asked them, saying:

“Did we not strictly command you not to teach in this name? And look, you have filled Jerusalem with your doctrine and intend to bring this man's blood on us.”

But the apostles answered and said:

“We ought to obey God rather than men. The God of our fathers raised up Jesus, whom you murdered by hanging on a tree. Him God has exalted to his right hand to be Prince and Savior, to give deliverance to Israel and forgiveness of our sins. And we are his witnesses to these things.”

When they heard this, many were furious and plotted to kill them. Then one in the council stood up, a Pharisee named Gamaliel, a teacher of the law held in respect by all the people, and commanded them to put the apostles outside for a little while. And he said to them:

“Men of Israel, take heed to yourselves what you intend to do regarding these men. For some time ago, Judas son of Hezekiah rose up, claiming to be somebody. A number of men, about four hundred, joined him. He was slain, and all who obeyed him were scattered and came to nothing.

After this man, his son Judas of Galilee rose up in the days of the census and drew away many people after him. He also perished, and all who obeyed him were dispersed.

And now I say to you, keep away from these men and let them alone. For shall we join the side of Rome, striking at those who profess this is the Christ? And behold, the one these men call Christ is crucified and buried in the potter's field.

If this plan or this work is of men, it will come to nothing. But if it is of God, you cannot overthrow it—lest you even be found to fight against God.”

And they agreed with him. And when they had called for the apostles and beaten them, they again commanded that they should not speak in the name of Jesus, but did not threaten them, and let them go.

So they departed from the presence of the council, rejoicing that they were once more counted worthy to suffer shame for his name. And daily in the temple, and in every house, they did not cease teaching and preaching Jesus as the Christ.

And the sect of the Nazoreans—those who call themselves the followers of the Way—has endured to this day. Persecuted yet growing in number, they are led in faith and righteousness in the holy city Jerusalem by the one they call the brother of the Lord.

As for myself, O great Simon, if you wonder whether I believe in the crucified Christ, I can only say this: I wish that I could help my unbelief. Within me burns the longing for a savior—one who will put all enemies beneath his feet, raised in glory from the dead to the right hand of the Father, coming with the clouds and the hosts of heaven to bring the kingdom of God and restore the throne of David to Israel.

But knowing what became of the man Jesus, and what his disciples are said to have seen—as I have already told you—I cannot help but confess: to me, it seems a faith in vain. The true Christ is yet to come.

This is the testimony of Nicodemus of Jerusalem.

I write to you what I have seen and known.

The grace of God be with you.

Amen.

And last of all he was seen of me also, as of one born out of due time. For I am the least of the apostles, that am not meet to be called an apostle, because I persecuted the church of God.

For ye have heard of my conversation in time past in the Jews’ religion, how that beyond measure I persecuted the church of God and wasted it, and profited in the Jews’ religion above many my equals in my own nation, being more exceedingly zealous of the traditions of my fathers.

I verily thought with myself that I ought to do many things contrary to the name of Jesus of Nazareth—which thing I also did in Jerusalem. And many of the saints did I shut up in prison, having received authority from the chief priests. And when they were put to death, I gave my voice against them. I punished them oft in every synagogue and compelled them to blaspheme; and being exceedingly mad against them, I persecuted them even unto strange cities.

Whereupon, as I went to Damascus with authority and commission from the chief priests, at midday I saw in the way a light from heaven, above the brightness of the sun, shining round about me and them which journeyed with me. And when we were all fallen to the earth, I heard a voice speaking unto me and saying in the Hebrew tongue,

“Saul, Saul, why persecutest thou me? It is hard for thee to kick against the pricks.”

And I said,

“Who art thou, Lord?”

And he said,

“I am Jesus whom thou persecutest. But rise and stand upon thy feet, for I have appeared unto thee for this purpose—to make thee a minister and a witness both of these things which thou hast seen, and of those things in the which I will appear unto thee, delivering thee from the people and from the Gentiles, unto whom now I send thee, to open their eyes and to turn them from darkness to light, and from the power of Satan unto God; that they may receive forgiveness of sins and inheritance among them which are sanctified by faith that is in me.”

But when it pleased God, who separated me from my mother's womb and called me by his grace, to reveal his Son in me, that I might preach him among the heathen—immediately I conferred not with flesh and blood. Neither went I up to Jerusalem to them which were apostles before me, but I went into Arabia and returned again unto Damascus.

Then after three years I went up to Jerusalem to see Peter, and abode with him fifteen days. But other of the apostles saw I none, save James the Lord's brother. Now the things which I write unto you, behold, before God, I lie not. Afterwards I came into the regions of Syria and Cilicia and was unknown by face unto the churches of Judea which were in Christ. But they had heard only that he which persecuted us in times past now preacheth the faith which once he destroyed—and they glorified God in me.

Am I not an apostle? Am I not free? Have I not seen Jesus Christ our Lord?

I knew a man in Christ above fourteen years ago (whether in the body I cannot tell, or whether out of the body I cannot tell—God knoweth); such a one caught up to the third heaven. And I knew such a man (whether in the body, or out of the body, I cannot tell—God knoweth), how that he was caught up into paradise and heard unspeakable words, which it is not lawful for a man to utter.

Wherefore henceforth know we no man after the flesh: yea, though we have known Christ after the flesh, yet now henceforth know we him no more.

For I have received of the Lord that which also I delivered unto you: that the Lord Jesus, the same night in which he was betrayed, took bread; and when he had given thanks, he broke it and said,

“Take, eat. This is my body, which is broken for you. This do in remembrance of me.”

After the same manner also he took the cup, when he had supped, saying,

“This cup is the new testament in my blood. This do ye, as often as ye drink it, in remembrance of me.”

And I, brethren, when I came to you, came not with excellency of speech or of wisdom, declaring unto you the testimony of God. For I determined not to know anything among you, save Jesus Christ and him crucified. And I was with you in weakness and in fear and in much trembling. And my speech and my preaching were not with enticing words of man’s wisdom, but in demonstration of the Spirit and of power, that your faith should not stand in the wisdom of men, but in the power of God.

Howbeit we speak wisdom among them that are perfect—yet not the wisdom of this world, nor of the princes of this world, that come to nought. But we speak the wisdom of God in a mystery, even the hidden wisdom which God ordained before the world unto our glory—which none of the princes of this world knew, for had they known it, they would not have crucified the Lord of glory.

I am crucified with Christ. Nevertheless I live; yet not I, but Christ liveth in me. And the life which I now live in the flesh I live by the faith of the Son of God, who loved me and gave himself for me. I do not frustrate the grace of God, for if righteousness come by the law, then Christ is dead in vain.

Then, fourteen years after, I went up again to Jerusalem with Barnabas, and took Titus with me also. And I went up by revelation, and communicated unto them that gospel which I preach among the Gentiles. And when James, Cephas, and John, who seemed to be pillars, perceived the grace that was given unto me, they gave to me and Barnabas the right hands of fellowship—that we should go unto the heathen, and they unto the circumcision.

For they who seemed to be somewhat, in conference, added nothing to me. For I suppose I was not a whit behind the very chiefest apostles.

Are they Hebrews? So am I.
Are they Israelites? So am I.
Are they the seed of Abraham? So am I.
Are they ministers of Christ? (I speak as a fool) I am more—in labors more abundant, in stripes above measure, in prisons more frequent, in deaths oft.

But when Peter was come to Antioch, I withstood him to the face, because he was to be blamed. For before that certain came from James, he did eat with the Gentiles. But when they were come, he withdrew and separated himself, fearing them which were of the circumcision. And the other Jews dissembled likewise with him; insomuch that Barnabas also was carried away with their dissimulation.

But when I saw that they walked not uprightly according to the truth of the gospel, I said unto Peter before them all:

“If thou, being a Jew, livest after the manner of Gentiles and not as do the Jews, why compellest thou the Gentiles to live as do the Jews?”

We who are Jews by nature, and not sinners of the Gentiles, knowing that a man is not justified by the works of the law but by the faith of Jesus Christ—even we have believed in Jesus Christ, that we might be justified by the faith of Christ, and not by the works of the law—for by the works of the law shall no flesh be justified.

For I, through the law, am dead to the law, that I might live unto God.

And unto the Jews I became as a Jew, that I might gain the Jews; to them that are under the law, as under the law, that I might gain them that are under the law; to them that are without law, as without law—being not without law to God, but under the law to Christ—that I might gain them that are without law.

To the weak became I as weak, that I might gain the weak.
I am made all things to all men, that I might by all means save some.
And this I do for the gospel's sake, that I might be partaker thereof with you.

For ye are all the children of God by faith in Christ Jesus.
For as many of you as have been baptized into Christ have put on Christ.
There is neither Jew nor Greek, there is neither bond nor free, there is neither male nor female—for ye are all one in Christ Jesus. And if ye be Christ's, then are ye Abraham’s seed and heirs according to the promise.

But I certify you, brethren, that the gospel which was preached of me is not after man. For I neither received it of man, neither was I taught it, but by the revelation of Jesus Christ.

As we said before, so say I now again: If any man preach any other gospel unto you than that ye have received, let him be accursed.

For such are false apostles, deceitful workers, transforming themselves into the apostles of Christ. And no marvel—for Satan himself is transformed into an angel of light. Therefore it is no great thing if his ministers also be transformed as the ministers of righteousness, whose end shall be according to their works.

But if our gospel be hid, it is hid to them that are lost—in whom the god of this world hath blinded the minds of them which believe not, lest the light of the glorious gospel of Christ, who is the image of God, should shine unto them.

But by the grace of God I am what I am. And his grace which was bestowed upon me was not in vain. But I labored more abundantly than they all—yet not I, but the grace of God which was with me. Therefore whether it were I or they, so we preach, and so ye believed.

Now to him that is of power to establish you according to my gospel and the preaching of Jesus Christ—according to the revelation of the mystery which was kept secret since the world began, but now is made manifest, and by the scriptures of the prophets, according to the commandment of the everlasting God—made known to all nations for the obedience of faith: to God only wise, be glory through Jesus Christ forever.

If any man love not the Lord Jesus Christ, let him be Anathema. Maranatha.
The grace of our Lord Jesus Christ be with you.
My love be with you all in Christ Jesus.
Amen.

Μὴ ἐκθαμβεῖσθε· Ἰησοῦν ζητεῖτε τὸν ἐσταυρωμένον·
ἠγέρθη, οὐκ ἔστιν ὧδε· ἴδε ὁ τόπος ὅπου ἔθηκαν αὐτόν.

[Do not be afraid. You seek Jesus who was crucified:
He has been raised, he is not here. Behold the place where they laid him.]

Constantinism in Exegesis: "Meek" Doesn't Mean "Meek"?

2022-10-30T17:38:00.013-07:00

Blessed are the Meek

We have all heard the famous beatitudes, offered by Jesus at the opening of the Sermon on the Mount in the Gospel of Matthew, delivered to his disciples and the gathered crowds at the beginning of his ministry. Jesus begins by blessing the poor in spirit, the mourners, and later the hungry and thirsty for righteousness, the merciful, the pure in heart, the peacemakers, and finally the persecuted for righteousness. But in between we find:

Μακάριοι οἱ πραεῖς, ὅτι αὐτοὶ κληρονομήσουσιν τὴν γῆν.

Matthew 5:5. Blessed [are] the meek, for they will inherit the land.

The word translated as "meek" is πραΰς, corresponding to Strong's G4235 or G4239. The standard definition of this word is "meek, humble, gentle, mild of disposition, tame, quiet", with its antonyms being "angry, aggressive, resistant, violent, harsh, wild".

Πραΰς is likely derived from the Proto-Indo-European *preyH- meaning "to love, to please". It is thus likely related to the Sanskrit प्रिय (priya) "beloved, favored", Old Church Slavonic приꙗзнь (prijaznĭ) “friendship, fidelity”, and, through Germanic languages, to English "free, friend".

In Luke 6:20-22, we find a similar set of beatitudes in the less-famous Sermon on the Plain. Luke's set is shorter, simpler, and less spiritualized than Matthew's. For example, Jesus blesses the poor rather than the poor in spirit, and the hungry rather than the hungry for righteousness. The overlap extends far beyond this case: Matthew and Luke share many highly similar sections not found in Mark. The usual explanation is a now-lost hypothetical document named "Q" (short for "Quelle," German for "source"). Presumably, the original document would have been closer to the simpler form found in Luke, which is also more similar to some sayings in the non-canonical gnostic Gospel of Thomas (e.g. sayings 54, 68, 69). The author of the Gospel of Matthew chose to fill out his version with the verse in question, found nowhere else.

Or, nearly nowhere else, for a clear parallel can be found in Psalm 37, verse 11:

וַעֲנָוִים יִירְשׁוּ-אָרֶץ וְהִתְעַנְּגוּ עַל-רֹב שָׁלוֹם

οἱ δὲ πραεῖς κληρονομήσουσιν γῆν καὶ κατατρυφήσουσιν ἐπὶ πλήθει εἰρήνης (LXX)

But the meek shall inherit the land and delight themselves in abundant prosperity.

Note that the Septuagint translation is nearly identical to the Greek of Matt 5:5. The word being translated is ענו/עני, corresponding to Strong's H6035. Other translations of this word are "poor, needy, lowly, weak, afflicted, humble". Another Hebrew word the Septuagint translates as πραυς is עָנִי, obviously related to the other. The only other occurrence is in Job 36:15, though here the Greek differs substantially from the Hebrew (Hebrew: "He delivers the afflicted by their affliction, and opens their ear by adversity." vs. Greek: "Because they afflicted the weak and helpless, and he will vindicate the judgment of the meek.").

One particular usage of πραυς is worth noting, namely Zechariah 9:9, as it is (inexactly) quoted in Matthew 21:5:

גִּילִי מְאֹד בַּת-צִיּוֹן, הָרִיעִי בַּת יְרוּשָׁלִַם, הִנֵּה מַלְכֵּךְ יָבוֹא לָךְ, צַדִּיק וְנוֹשָׁע הוּא; עָנִי וְרֹכֵב עַל-חֲמוֹר, וְעַל-עַיִר בֶּן-אֲתֹנוֹת

Εἴπατε τῇ θυγατρὶ Σιών, Ἰδοὺ ὁ βασιλεύς σου ἔρχεταί σοι, πραῢς καὶ ἐπιβεβηκὼς ἐπὶ ὄνον, καὶ ἐπὶ πῶλον υἱὸν ὑποζυγίου.

“Tell the daughter of Zion, Look, your king is coming to you, humble and mounted on a donkey, and on a colt, the foal of a donkey.”

As the historical Jesus likely spoke Aramaic rather than Greek (though it's not impossible he knew some Greek), it would be a safe bet that, if the saying in question goes back to the historical Jesus, the word he used was probably ענו/עני, with the aforementioned meaning. This is strongly backed up by the comparison of its usage in the Septuagint.

The only other usages of πραυς in the New Testament are:

Matthew 11:29 "Take my yoke upon you, and learn from me, for I am gentle [πραΰς] and humble [ταπεινὸς] in heart, and you will find rest for your souls."
1 Peter 3:4 "rather, let your adornment be the inner self with the lasting beauty of a gentle [πραέως] and quiet [ἡσυχίου] spirit, which is very precious in God’s sight."

There are a dozen usages of the related words πραυτης (Strong's G4240) and πραοτης/πραοτητος (Strong's G4240/G4236), both nominalizations of πραυς, translated as "gentleness, humility, affliction, meekness, the quality of being πραυς". We can look at the instances below:

1 Cor 4:21 "What would you prefer? Am I to come to you with a stick, or with love in a spirit of gentleness [πραΰτητος]?"
2 Cor 10:1 "I myself, Paul, appeal to you by the meekness [πραΰτητος] and gentleness [ἐπιεικείας] of Christ—I who am humble [ταπεινὸς] when face to face with you, but bold toward you when I am away!"
Gal 5:22-23 "But the fruit of the Spirit is love, joy, peace, forbearance, kindness, goodness, faithfulness, gentleness [πραΰτης] and self-control. Against such things there is no law."
Gal 6:1 "My brothers and sisters, if anyone is detected in a transgression, you who have received the Spirit should restore such a one in a spirit of gentleness [πραΰτητος]. Take care that you yourselves are not tempted."
Eph 4:1-3 "I, therefore, the prisoner in the Lord, beg you to walk in a manner worthy of the calling to which you have been called, with all humility and gentleness [πραΰτητος], with patience, bearing with one another in love, making every effort to maintain the unity of the Spirit in the bond of peace."
Col 3:12 "Therefore, as God’s chosen ones, holy and beloved, clothe yourselves with compassion, kindness, humility, meekness [πραΰτητα], and patience."
1 Tim 6:9-11 "But those who want to be rich fall into temptation and are trapped by many senseless and harmful desires that plunge people into ruin and destruction. For the love of money is a root of all kinds of evil, and in their eagerness to be rich some have wandered away from the faith and pierced themselves with many pains. But as for you, man of God, shun all this; pursue righteousness, godliness, faith, love, endurance, gentleness [πραϋπαθίαν]."
2 Tim 2:24-26 "And the Lord’s servant must not be quarrelsome but kindly to everyone, an apt teacher, patient, correcting opponents with gentleness [πραΰτητι]. God may perhaps grant that they will repent and come to know the truth and that they may escape from the snare of the Devil, having been held captive by him to do his will."
Titus 3:1-2 "Remind them to be subject to rulers and authorities, to be obedient, to be ready for every good work, to speak evil of no one, to avoid quarreling, to be gentle, and to show every courtesy [πραΰτητα] to everyone."
James 1:19-21 "You must understand this, my beloved brothers and sisters: let everyone be quick to listen, slow to speak, slow to anger, for human anger does not produce God’s righteousness. Therefore rid yourselves of all sordidness and rank growth of wickedness, and welcome with meekness [πραΰτητι] the implanted word that has the power to save your souls."
James 3:13-18 "Who is wise and knowledgeable among you? Show by your good life that your works are done with gentleness [πραΰτητι] born of wisdom. But if you have bitter envy and selfish ambition in your hearts, do not be arrogant and lie about the truth. This is not wisdom that comes down from above but is earthly, unspiritual, devilish. For where there is envy and selfish ambition, there will also be disorder and wickedness of every kind. But the wisdom from above is first pure, then peaceable, gentle, willing to yield, full of mercy and good fruits, without a trace of partiality or hypocrisy. And the fruit of righteousness is sown in peace by those who make peace."
1 Peter 3:13-17 "Now who will harm you if you are eager to do what is good? But even if you do suffer for doing what is right, you are blessed. Do not fear what they fear, and do not be intimidated, but in your hearts sanctify Christ as Lord. Always be ready to make your defense to anyone who demands from you an accounting for the hope that is in you, yet do it with gentleness [πραΰτητος] and respect. Maintain a good conscience so that, when you are maligned, those who abuse you for your good conduct in Christ may be put to shame. For it is better to suffer for doing good, if suffering should be God’s will, than to suffer for doing evil."

This article by Margaret Mowczko offers many usage examples, notably several from Second Temple Jewish literature:

In 2 Maccabees 15:12, which was written sometime between 150 and 120 BCE, Onias the High Priest is presented as “virtuous, good, modest in all things, gentle (πρᾶον/ praon) of manners, and well-spoken.”[14]
In the Testament of Abraham 1.3, possibly written in the first century CE, it is said that Abraham lived all his life “in quietness (hēsuchia) and gentleness (πραότητι/ praotēti) . . .” [15] (Cf. 1 Peter 3:4.)
In Against Apion 1.29 §267, Josephus (b. 37 CE) used the word praoteroi/ πρᾳότεροι to describe the attitudes of people who had been badly treated by the king of Egypt; they had a reason to be angry and hateful but had rather grown “milder.”[16]
In Jewish Antiquities 19.3 §330, Josephus describes Herod Agrippa’s manner as “mild” (πραῢς/ praus). He then explains how Agrippa is praus: he was “equally liberal to all men. He was humane to foreigners, and made them sensible of his liberality. He was in like manner rather of a ‘gentle and compassionate’ (chrēstos kai sympathēs) temper.” In §333, Herod Agrippa addresses a man who had slandered him and speaks to him “quietly (ērema) and gently (πρᾴως/ praōs).”[17]

From all these examples, we can confidently state that, strictly from the textual evidence in the New and Old Testaments, the word πραυς (and related terms) has the connotation space of "gentle, humble, meek, mild, lowly, peaceful, courteous, respectful, yielding, forgiving, merciful, patient, longsuffering, forbearing, returning good for evil, submissive and obedient to a higher will." If we take it as a Greek translation of ענו/עני (as well it may be, given that it is a near-verbatim duplicate of Psalm 37:11), then it would mean "poor, needy, lowly, weak, afflicted, humble." The Hebrew word has a somewhat different meaning, but the Greek is not too dissimilar. By contrast, the meaning is antithetical to "aggressive, violent, arrogant, ambitious, retaliatory, haughty, selfish, harsh."

How and why will the meek inherit the earth? Clearly not because they will conquer it themselves. Instead, they will inherit the earth in one of two circumstances: (1) the advent of a just world-order built from the bottom up by human beings or with God's help in which all will be or become meek and live in peace and goodwill, or (2) the advent of the End of Days, the Eschaton, in which God will destroy or reform the wicked and invite the righteous meek, led by Jesus, the Meek King himself, into the inaugurated Kingdom of God, the New Jerusalem (as in Revelation 5:10). Any suggestion that "the meek" are themselves the conquerors is incoherent. Importantly, they will "inherit" as in "be given". They will not win it for themselves and will not have to.

As a final note, "Blessed are the meek" does not imply that everyone is or even should be meek (any more than "Blessed are those who mourn" implies we all should mourn) but rather that there is something positive to being meek. It does not necessarily imply "Un-blessed are the un-meek." There may well be a way for the non-πραυς to get some other blessing or benefit or even the same blessing by another means.

Translations in Other Languages

Let's look at how this specific verse has been translated into other languages.

English translations translate πραυς in Matt 5:5 as "meek, humble, gentle"; the Amplified Bible gives the full "gentle: kind-hearted, the sweet-spirited, the self-controlled."

The Vulgate translates πραυς in Matt 5:5 as "mitis" meaning "mild, mellow, light, calm, gentle, placid, peaceful" when applied to non-humans, but specifically "meek, peaceful, gentle, mild, tolerable, soft, harmless" when applied to humans.

Italian translations translate πραυς in Matt 5:5 as "mite"="mild, moderate, meek" or "mansueto"="tame, gentle, docile".

French translations translate πραυς in Matt 5:5 as "doux" meaning "soft, sweet, mild, gentle, meek, quiet, genial" or "débonnaire"="kind, gentle, good (weak-willed, soft)".

Spanish translations translate πραυς in Matt 5:5 as "humilde"="humble, low" or "manso"="tame, meek, non-threatening". These are substantially the same as the words used in Portuguese translations.

Romanian translations translate πραυς in Matt 5:5 as "blând"="mild, tame, gentle, harmless, kind, calm".

German translations translate πραυς in Matt 5:5 as "auf Frieden bedacht" = "intent on peace" or "Sanftmütig" = "gentle-minded, gentle, meek".

Dutch translations translate πραυς in Matt 5:5 as "vriendelijk en geduldig"="friendly/kind/obliging and patient" or "zachtmoedig"="mild, gentle, meek".

Swedish translations translate πραυς in Matt 5:5 as "ödmjuk"="meek, submissive, humble, unobtrusive, modest" or "saktmodig"="sweet/soft-minded, meek, gentle" or "milda och anspråkslösa"= "gentle and unassuming".

Norwegian translations translate πραυς in Matt 5:5 as "ydmyk"="humble, meek" or "saktmodig"="meek, gentle". The Danish translations are substantially the same.

Serbian translations translate πραυς in Matt 5:5 as "кротак"= "meek, tame, gentle, pacific". This is substantially the same as the Russian word used in their translations: "кро́ткий"="gentle, meek, mild", and the Bulgarian "кро́тък"="gentle, meek".

Polish translations translate πραυς in Matt 5:5 as "cichy"="quiet, silent" or "pokorny"="humble, modest".

Hungarian translations translate πραυς in Matt 5:5 as "szelíd"="gentle, meek, empathic, tame" or "alázatos"="humble, submissive, servile."

Arabic translations translate πραυς in Matt 5:5 as "مُتَوَاضِع"="humble, modest; insignificant; condescending."

Chinese translations translate πραυς in Matt 5:5 as "谦和" = "modest and gentle" or "温和" = "mild, temperate" or "温柔" = "gentle". This is basically the same as the Japanese translation "柔和な"="meek, bland, gentle, mild-mannered."

In Tagalog, "maaamo"="gentle, tame, docile, domestic" or "mapagpakumbaba"="humble, modest, lowly."

In Thai, "อ่อน น้อม"="meek, docile, submissive, biddable, tame" or "ใจอ่อนโยน"="gentle".

In Punjabi "ਦੀਨ"="forlorn, humble, indigent, lowly, meek, miserable, needy, poor".

Hindi translations translate πραυς in Matt 5:5 as "नम्र" = "gentle, mild, subservient, humble, meek."

These are quite consistent, giving a consensus connotation of "gentle, meek, humble, mild-mannered", as can be expected from an honest and accurate translation of the Greek.

Aristotle and other Extra-Biblical Comparanda

A primary reference for the application of this term before Jesus is Aristotle in his Nicomachean Ethics (book 4 chapter 5, or Bekker page 1125b and 1126). There, he defines the virtue of πραότης (gentleness, meekness) in his discussion of dispositions related to anger:

Gentleness [πραότης] is the observance of the mean in relation to anger. There is as a matter of fact no recognized name for the mean in this respect—indeed there can hardly be said to be names for the extremes either—, so we apply the word Gentleness to the mean though really it inclines to the side of the defect. This has no name, but the excess may be called a sort of Irascibility, for the emotion concerned is anger, though the causes producing it are many and various.

Aristotle, however, famously adds that anger is not itself a vice, but has its place if it is applied with proper measure, with the proper object, and at the proper place and time, though this is hard to generalize. This is worth quoting at length:

Now we praise a man who feels anger on the right grounds and against the right persons, and also in the right manner and at the right moment and for the right length of time. He may then be called gentle-tempered, if we take gentleness to be a praiseworthy quality (for ‘gentle’ really denotes a calm temper, not led by emotion but only becoming angry in such a manner, for such causes and for such a length of time as principle may ordain; although the quality is thought rather to err on the side of defect, since the gentle-tempered man is not prompt to seek redress for injuries, but rather inclined to forgive them). The defect, on the other hand, call it a sort of Lack of Spirit or what not, is blamed; since those who do not get angry at things at which it is right to be angry are considered foolish, and so are those who do not get angry in the right manner, at the right time, and with the right people. It is thought that they do not feel or resent an injury, and that if a man is never angry he will not stand up for himself; and it is considered servile to put up with an insult to oneself or suffer one's friends to be insulted... We consider the excess to be more opposed to Gentleness than the defect, because it occurs more frequently, human nature being more prone to seek redress than to forgive; and because the harsh-tempered are worse to live with than the unduly placable... [I]t is not easy to define in what manner and with whom and on what grounds and how long one ought to be angry, and up to what point one does right in so doing and where error begins. For he who transgresses the limit only a little is not held blameworthy, whether he errs on the side of excess or defect; in fact, we sometimes praise those deficient in anger and call them gentle-tempered, and we sometimes praise those who are harsh-tempered as manly, and fitted to command.
It is therefore not easy to pronounce on principle what degree and manner of error is blameworthy, since this is a matter of the particular circumstances, and judgement rests with the faculty of perception. But thus much at all events is clear, that the middle disposition is praiseworthy, which leads us to be angry with the right people for the right things in the right manner and so on, while the various forms of excess and defect are blameworthy—when of slight extent, but little so, when greater, more, and when extreme, very blameworthy indeed. It is clear therefore that we should strive to attain the middle disposition.

However, Aristotle is giving a narrowed philosophical definition, as opposed to a broader descriptive definition based on popular usage. Obviously, the word preceded Aristotle, and he is giving a philosophical and hence somewhat idiosyncratic and specific definition as he uses it in his ethical system. In his system, it is a technical term, and so cannot be taken to represent any other usage or broader usage in general Greek culture. That is, it would be a mistake to assume that the word πραυς should be taken in the Aristotelian sense in Matt 5:5. We must always give preference to how the term is used elsewhere in the New Testament or Septuagint.

Other extra-biblical usages of this word or related words do not substantially change our understanding. It is sometimes applied to animals, specifically horses, where it has the general meaning of "tame", "docile", or "un-wild" and thus usable in agriculture, transportation, or the military, and not dangerous to their masters. Other times it is applied to winds to describe them as mild or soothing, to sounds to describe them as soft or gentle, or to medicines if they produce a soothing, palliative, or healing effect. If anything, we only slightly expand the breadth of meaning to include "reasonable, quiet, pleasant, soothed/soothing".

A curious idea comes from an article by Sam Whatley in River Region's Journey Magazine, which claims that πραυς is "A Greek military term":

The Greek word “praus” (prah-oos) [πραυς] was used to define a horse trained for battle. Wild stallions were brought down from the mountains and broken for riding. Some were used to pull wagons, some were raced, and the best were trained for warfare. They retained their fierce spirit, courage, and power, but were disciplined to respond to the slightest nudge or pressure of the rider’s leg. They could gallop into battle at 35 miles per hour and come to a sliding stop at a word. They were not frightened by arrows, spears, or torches. Then they were said to be meeked.

To be meeked was to be taken from a state of wild rebellion and made completely loyal to, and dependent upon, one’s master. It is also to be taken from an atmosphere of fearfulness and made unflinching in the presence of danger. Some war horses dove from ravines into rivers in pursuit of their quarry. Some charged into the face of exploding cannons as Lord Tennyson expressed in his poem, “The Charge of the Light Brigade.”
These stallions became submissive, but certainly not spineless. They embodied power under control, strength with forbearance.

However, quoting the article by Mowczko, again:

From these passages, we can see that prau– words may be translated into English as “most gentle,” “soothing,” “to calm down/ be calm,” “gentle,” “to tame,” “tame,” and “more reasonable/ more quietly.”

I could not find any ancient source that mentions or alludes to implicit ideas of strength or fierceness in the word praus or a source that indicates an intrinsic, or original, military sense.

That a word can be applied to strong or powerful creatures does not imply that the word itself connotes strength or power. In the above example, the horses were strong and powerful before they were "meeked". But this does not mean that to be πραυς requires strength or power. A mouse could likewise be "meeked" if it was rendered docile, gentle, obedient, friendly, etc.

In short, we seem to have established the meaning of πραυς quite decisively from the foregoing examination of its usage in the New Testament, Septuagint, Second Temple Judaism, and the broader Greek context. We can now confidently call into question any substantially differing interpretation: we can identify them as having some addition of personal interpretation that does not derive from philology, but rather from some hermeneutical bent. That isn't to cast aspersions on a more theologically loaded reading, as meekness is clearly a central Christian virtue, and so can be expected to have collected some baggage over the millennia of interpretation. However, we can distinguish this from a strictly philological interpretation as laid out above by which we can interpret the first-century text of Matthew 5:5. We must also be sure not to read into it any technical usage, for example, its specific meaning in Aristotle's ethical system.

Biblical Commentaries

This might be a good point to look at some well-known commentaries on this verse. Notably, the website BibleHub.com offers a collection in an easily accessible, consolidated location. As this will be of some relevance later, they are worth quoting liberally:

Ellicott: "The meek.—The word so rendered was probably used by St. Matthew in its popular meaning, without any reference to the definition which ethical writers had given of it, but it may be worth while to recall Aristotle’s account of it (Eth. Nicom. v. 5) as the character of one who has the passion of resentment under control, and who is therefore tranquil and untroubled, as in part determining the popular use of the word, and in part also explaining the beatitude."

Benson: "Blessed [or happy] are the meek — Persons of a mild, gentle, long-suffering, and forgiving disposition, who are slow to anger, and averse from wrath; not easily provoked, and if at any time at all provoked, soon pacified; who never resent an injury, nor return evil for evil; but make it their care to overcome evil with good; who by the sweetness, affability, courteousness, and kindness of their disposition, endeavour to reconcile such as may be offended, and to win them over to peace and love."

Matthew Henry: "...The meek are happy. The meek are those who quietly submit to God; who can bear insult; are silent, or return a soft answer; who, in their patience, keep possession of their own souls, when they can scarcely keep possession of anything else. These meek ones are happy, even in this world. Meekness promotes wealth, comfort, and safety, even in this world."

Barnes: "The meek - Meekness is patience in the reception of injuries. It is neither meanness nor a surrender of our rights, nor cowardice; but it is the opposite of sudden anger, of malice, of long-harbored vengeance. Christ insisted on his right when he said, "If I have done evil, bear witness of the evil; but if well, why smitest thou me?" John 18:23. Paul asserted his right when he said, "They have beaten us openly uncondemned, being Romans, and have cast us into prison; and now do they thrust us out privily? nay verily; but let them come themselves, and fetch us out," Acts 16:37. And yet Christ was the very model of meekness. It was one of his characteristics, "I am meek," Matthew 11:29. So of Paul. No man endured more wrong, or endured it more patiently than he. Yet the Saviour and the apostle were not passionate. They bore all patiently. They did not press their rights through thick and thin, or trample down the rights of others to secure their own. Meekness is the reception of injuries with a belief that God will vindicate us. "Vengeance is his; he will repay," Romans 12:19. It little becomes us to take his place, and to do what he has promised to do. Meekness produces peace. It is proof of true greatness of soul. It comes from a heart too great to be moved by little insults. It looks upon those who offer them with pity. He that is constantly ruffled; that suffers every little insult or injury to throw him off his guard and to raise a storm of passion within, is at the mercy of every mortal that chooses to disturb him. He is like "the troubled sea that cannot rest, whose waters cast up mire and dirt.""

Jamieson-Fausset-Brown: "Blessed are the meek: for they shall inherit the earth—This promise to the meek is but a repetition of Ps 37:11; only the word which our Evangelist renders "the meek," after the Septuagint, is the same which we have found so often translated "the poor," showing how closely allied these two features of character are. It is impossible, indeed, that "the poor in spirit" and "the mourners" in Zion should not at the same time be "meek"; that is to say, persons of a lowly and gentle carriage. How fitting, at least, it is that they should be so, may be seen by the following touching appeal: "Put them in mind to be subject to principalities and powers, to obey magistrates, to be ready to every good work, to speak evil of no man, to be no brawlers, but gentle, showing all meekness unto all men: FOR WE OURSELVES WERE ONCE FOOLISH, disobedient, deceived, serving divers lusts and pleasures … But after that the kindness and love of God our Saviour toward man appeared: … according to His mercy He saved us," &c. (Tit 3:1-7). But He who had no such affecting reasons for manifesting this beautiful carriage, said, nevertheless, of Himself, "Take My yoke upon you, and learn of Me; for I am meek and lowly in heart: and ye shall find rest unto your souls" (Mt 11:29); and the apostle besought one of the churches by "the meekness and gentleness of Christ" (2Co 10:1). In what esteem this is held by Him who seeth not as man seeth, we may learn from 1Pe 3:4, where the true adorning is said to be that of "a meek and quiet spirit, which in the sight of God is of great price." 
Towards men this disposition is the opposite of high-mindedness, and a quarrelsome and revengeful spirit; it "rather takes wrong, and suffers itself to be defrauded" (1Co 6:7); it "avenges not itself, but rather gives place unto wrath" (Ro 12:19); like the meek One, "when reviled, it reviles not again; when it suffers, it threatens not: but commits itself to Him that judgeth righteously" (1Pe 2:19-22)."

Matthew Poole: "Men count the hectors of the world happy, whom none can provoke but they must expect as good as they bring, an eye for an eye, and a tooth for a tooth: but I tell you these are not truly happy; they are tortured with their own passions; as their hand is against every one, so every man’s hand is against them; besides that there is a God, who will revenge the wrongs they do. But the meek, who can be angry, but restrain their wrath in obedience to the will of God, and will not be angry unless they can be angry and not sin; nor will easily be provoked by others, but rather use soft words to pacify wrath, and give place to the passions of others; these are the blessed men. For though others may by their sword and their bow conquer a great deal of the earth to their will and power, yet they will never quietly and comfortably inherit or possess it; they are possessors malae fidei, forcible possessors, and they will enjoy what they have, as rapacious birds enjoy theirs, loudly, every one hath his gun ready charged and cocked against them; but those who are of meek and quiet spirits, though they may not take so deep root in the earth as others more boisterous, yet they shall enjoy what God giveth them with more quiet and certainty; and God will provide for them, verily they shall be fed."

Gill: "Blessed are the meek,.... Who are not easily provoked to anger; who patiently bear, and put up with injuries and affronts; carry themselves courteously, and affably to all; have the meanest thoughts of themselves, and the best of others; do not envy the gifts and graces of other men; are willing to be instructed and admonished, by the meanest of the saints; quietly submit to the will of God, in adverse dispensations of providence; and ascribe all they have, and are, to the grace of God. Meekness, or humility, is very valuable and commendable... Here meekness is to be considered, not as a moral virtue, but as a Christian grace, a fruit of the Spirit of God; which was eminently in Christ, and is very ornamental to believers; and of great advantage and use to them, in hearing and receiving the word; in giving an account of the reason of the hope that is in them; in instructing and restoring such, who have backslidden, either in principle or practice; and in the whole of their lives and conversations; and serves greatly to recommend religion to others: such who are possessed of it, and exercise it, are well pleasing to God; when disconsolate, he comforts them; when hungry, he satisfies them; when they want direction, he gives it to them; when wronged, he will do them right; he gives them more grace here, and glory hereafter."

Meyer: "The πραεῖς ... are the calm, meek sufferers relying on God’s help, who, without bitterness or revenge as the ταπεινοὶ κ. ἡσύχιοι (Isaiah 66:2), suffer the cruelties of their tyrants and oppressors."

Cambridge: "... Thirdly, meekness, implying submission to the will of God, a characteristic of Jesus Himself, who says “I am meek and lowly in heart.”... Meekness is mentioned with very faint praise by the greatest of heathen moralists, Aristotle. He calls it “a mean inclining to a defect.” It is indeed essentially a Christian virtue."

Bengel: "Οἱ πρᾳεῖς, the meek. Those are here named for the most part, whom the world tramples on.—πρᾷος is connected with the Latin pravus, which has frequently the meaning of segnis, slow, sluggish, etc... The meek are seen everywhere to yield to the importunity of the inhabitants of the earth; and yet they shall obtain possession of the earth, not by their own arm, but by inheritance, through the aid of the Father: cf. Revelation 5:10. In the mean time, even whilst the usurpation of the ungodly continues, all the produce of the earth is ordered for the comfort of the meek. In all these sentences, blessedness in heaven and blessedness on earth mutually imply each other."

Pulpit: "Blessed are the meek...The meaning attributed by our Lord to the word meek is not clear. The ordinary use of the words πραΰς, πραΰτης, in the New Testament refers solely to the relation of men to men, and this is the sense in which οἱ πραεῖς is taken by most commentators here...Meekness is rather the attitude of the soul towards another when that other is in a state of activity towards it. It is the attitude of the disciple to the teacher when teaching; of the son to the father when exercising his paternal authority; of the servant to the master when giving him orders. It is therefore essentially as applicable to the relation of man to God as to that of man to man. It is for this reason that we find ענוה ענו very frequently used of man's relation to God, in fact, more often than of man's relation to man; and this common meaning of ענו must be specially remembered here, where the phrase is taken directly from the Old Testament. Weiss ('Matthaus-ev.') objects to Tholuck adducing the evidence of the Hebrew words, on the ground that the Greek terms are used solely of the relation to man, and that this usage is kept to throughout the New Testament. But the latter statement is hardly true. For, not to mention Matthew 11:29, in which the reference is doubtful, James 1:21 certainly refers to the meekness shown towards God in receiving his word. "The Scriptural πραότης," says Trench, loc. cit., "is not in a man's outward behaviour only; nor yet in his relations to his fellow-men; as little in his mere natural disposition. Rather is it an inwrought grace of the soul; and the exercises of it are first and chiefly towards God (Matthew 11:29; James 1:21). It is that temper of spirit in which we accept his dealings with us as good, and therefore without disputing or resisting; and it is closely linked with the ταπεινοφροσύνη, and follows directly upon it (Ephesians 4:2; Colossians 3:12; cf. Zephaniah 3:12), because it is only the humble heart which is also the meek; and which, as such, does not fight against God, and more or less struggle and contend with him." Yet, as this meekness must be felt towards God not only in his direct dealings with the soul, but also in his indirect dealings (i.e. by secondary means and agents), it must also be exhibited towards men. Meekness towards God necessarily issues in meekness towards men. Our Lord's concise teaching seizes, therefore, on this furthest expression of meekness. Thus it is not meekness in the relation of man to man barely stated, of which Christ here speaks, but meekness in the relation of man to man, with its prior and presupposed fact of meekness in the relation of man to God. Shall inherit the earth..."

Vincent: "The meek (οἱ πραεῖς). Another word which, though never used in a bad sense, Christianity has lifted to a higher plane, and made the symbol of a higher good. Its primary meaning is mild, gentle. It was applied to inanimate things, as light, wind, sound, sickness. It was used of a horse; gentle. As a human attribute, Aristotle defines it as the mean between stubborn anger and that negativeness of character which is incapable of even righteous indignation: according to which it is tantamount to equanimity. Plato opposes it to fierceness or cruelty, and uses it of humanity to the condemned; but also of the conciliatory demeanor of a demagogue seeking popularity and power. Pindar applies it to a king, mild or kind to the citizens, and Herodotus uses it as opposed to anger. These pre-Christian meanings of the word exhibit two general characteristics. 1. They express outward conduct merely. 2. They contemplate relations to men only. The Christian word, on the contrary, describes an inward quality, and that as related primarily to God. The equanimity, mildness, kindness, represented by the classical word, are founded in self-control or in natural disposition. The Christian meekness is based on humility, which is not a natural quality but an outgrowth of a renewed nature. To the pagan the word often implied condescension, to the Christian it implies submission. The Christian quality, in its manifestation, reveals all that was best in the heathen virtue - mildness, gentleness, equanimity - but these manifestations toward men are emphasized as outgrowths of a spiritual relation to God. The mildness or kindness of Plato or Pindar imply no sense of inferiority in those who exhibit them; sometimes the contrary. Plato's demagogue is kindly from self-interest and as a means to tyranny. Pindar's king is condescendingly kind. The meekness of the Christian springs from a sense of the inferiority of the creature to the Creator, and especially of the sinful creature to the holy God. While, therefore, the pagan quality is redolent of self-assertion, the Christian quality carries the flavor of self-abasement. As toward God, therefore, meekness accepts his dealings without murmur or resistance as absolutely good and wise. As toward man, it accepts opposition, insult, and provocation, as God's permitted ministers of a chastening demanded by the infirmity and corruption of sin; while, under this sense of his own sinfulness, the meek bears patiently "the contradiction of sinners against himself," forgiving and restoring the erring in a spirit of meekness, considering himself, lest he also be tempted (see Galatians 6:1-5). The ideas of forgiveness and restoration nowhere attach to the classical word. They belong exclusively to Christian meekness, which thus shows itself allied to love. As ascribed by our Lord to himself, see Matthew 11:29. Wyc. renders "Blessed be mild men." "

Thayer's: mildness of disposition, gentleness of spirit, meekness. Meekness toward God is that disposition of spirit in which we accept His dealings with us as good, and therefore without disputing or resisting. In the OT, the meek are those wholly relying on God rather than their own strength to defend them against injustice. Thus, meekness toward evil people means knowing God is permitting the injuries they inflict, that He is using them to purify His elect, and that He will deliver His elect in His time. (Is. 41:17, Lu. 18:1-

The Catena Aurea, (commentaries on the four Gospels; collected out of the works of the [Church] Fathers) by St. Thomas Aquinas (pg. 148-149) offers a number of interpretations from the Church Fathers:

Ambrose: When I have learned contentment in poverty, the next lesson is to govern my heart and temper. For what good is it to me to be without worldly things, unless I have besides a meek spirit? It suitably follows therefore, Blessed are the meek.[11]
Augustine: The meek are they who resist not wrongs, and give way to evil; but overcome evil of good.
Ambrose: Soften therefore your temper that you be not angry, at least that you be angry, and sin not. It is a noble thing to govern passion by reason; nor is it a less virtue to check anger, than to be entirely without anger, since one is esteemed the sign of a weak, the other of a strong, mind. [See Aristotle's account below]
Augustine: Let the unyielding then wrangle and quarrel about earthly and temporal things, the meek are blessed, for they shall inherit the earth, and not be rooted out of it; that earth of which it is said in the Psalms, Thy lot is in the land of the living, (Ps. 142:5.) meaning the fixedness of a perpetual inheritance, in which the soul that hath good dispositions rests as in its own place, as the body does in an earthly possession, it is fed by its own food, as the body by the earth; such is the rest and the life of the saints.
Pseudo-Chrysostom: This earth as some interpret, so long as it is in its present condition is the land of the dead, seeing it is subject to vanity; but when it is freed from corruption it becomes the land of the living, that the mortal may inherit an immortal country. I have read another exposition of it, as if the heaven in which the saints are to dwell is meant by the land of the living, because compared with the regions of death it is heaven, compared with the heaven above it is earth. Others again say, that this body as long as it is subject to death is the land of the dead, when it shall be made like unto Christ's glorious body, it will be the land of the living.
Hilary of Poitiers: Or, the Lord promises the inheritance of the earth to the meek, meaning of that Body, which Himself took on Him as His tabernacle; and as by the gentleness of our minds Christ dwells in us, we also shall be clothed with the glory of His renewed body.
Chrysostom: Otherwise; Christ here has mixed things sensible with things spiritual. Because it is commonly supposed that he who is meek loses all that he possesses, Christ here gives a contrary promise, that he who is not forward shall possess his own in security, but that he of a contrary disposition many times loses his soul and his paternal inheritance. But because the Prophet had said, The meek shall inherit the earth, (Ps. [37]:11.) He used these well-known words in conveying His meaning.
Glossa Ordinaria: The meek, who have possessed themselves, shall possess hereafter the inheritance of the Father; to possess is more than to have, for we have many things which we lose immediately.

The website preceptaustin.org, run by Bruce Hurt, offers a wealth of discussion on the scriptures, including Matthew 5:5. It goes into depth on a particular theological perspective informed by the 17th-century English Puritan Thomas Watson and by 18th-20th century British and American theologians such as Adam Clarke, Martyn Lloyd-Jones, William Barclay, Charles Spurgeon, William Edwyn Vine, John Charles Ryle, John Vernon McGee, Rod Mattoon, R. Kent Hughes, and John MacArthur, as well as some already mentioned. Their interpretations largely agree with the foregoing, but some notable excerpts can be given:

William Barclay gives an extra "amplified" translation of this verse: "O THE BLISS OF THE MAN WHO IS ALWAYS ANGRY AT THE RIGHT TIME AND NEVER ANGRY AT THE WRONG TIME, WHO HAS EVERY INSTINCT, AND IMPULSE, AND PASSION UNDER CONTROL BECAUSE HE HIMSELF IS GOD-CONTROLLED, WHO HAS THE HUMILITY TO REALISE HIS OWN IGNORANCE AND HIS OWN WEAKNESS, FOR SUCH A MAN IS A KING AMONG MEN!" [Recall Aristotle's account above]

D. Martyn Lloyd-Jones, in his classic treatise on the Sermon on the Mount, draws a parallel with much of the modern church movement, asking "is there not a rather pathetic tendency to think in terms of fighting the world, and sin, and the things that are opposed to Christ, by means of great organizations? Am I wrong when I suggest that the controlling and prevailing thought of the Christian Church throughout the world seems to be the very opposite of what is indicated in this text? 'There', they say, 'is the powerful enemy set against us, and here is the divided Christian Church. We must all get together, we must have one huge organization to face that organized enemy. Then we shall make an impact, and then we shall conquer.' But 'Blessed are the meek', not those who trust to their own organizing, not those who trust to their own powers and abilities and their own institutions. Rather it is the very reverse of that. And this is true, not only here, but in the whole message of the Bible. You get it in that perfect story of Gideon where God went on reducing the numbers, not adding to them. That is the spiritual method, and here it is once more emphasized in this amazing statement in the Sermon on the Mount."

MacArthur writes that "Meekness is the opposite of violence and vengeance. The meek person, for example, accepts joyfully the seizing of his property, knowing that he has infinitely better and more permanent possessions awaiting him in heaven (Heb. 10:34). The meek person has died to self, and he therefore does not worry about injury to himself, or about loss, insult, or abuse. The meek person does not defend himself, first of all because that is His Lord’s command and example, and second because he knows that he does not deserve defending. Being poor in spirit and having mourned over his great sinfulness, the gentle person stands humbly before God, knowing he has nothing to commend himself."

F. B. Meyer: Even now the meek soul gets the best out of life. The world does not think so. It thinks that the meek must be worsted because they will not stand upon their rights, nor wield the sword in self-defence, nor meet men on their own terms. But, as ever, Christ's words stand the test of experience. The meek find more pleasure in simple joys than wrong-doers in all their wealth. Pure hearts find wells of peace and bliss in common sights and sounds. There is no twinge of conscience or bitter memory of wrong-doing to jar on the sweet consent of holy song ever arising in nature.

Both Eduard Schweizer (The Good News According to Matthew) and John Nolland (The Gospel of Matthew: a commentary on the Greek text) give an interpretation of "powerless" for πραυς.

Summing up, we might state the consensus position of these commentators: The term πραυς denotes a virtue of mildness, gentleness, humility, suffering injury or insult patiently and without retaliation, forgoing revenge (or entrusting to God to exact due vengeance), submission to the will of God, restraining anger, and bearing wrongs patiently. Jesus himself is a prime exemplar, who underwent punishment and insult and even execution with patient endurance, not retaliating but rather willing to suffer wrongs (in the synoptic gospels, he does not defend himself at his own trial). Going with the principle of Imitatio Christi, we ought to do likewise.

Strength And Weakness

What exactly does "meek" mean? Quoting from etymonline.com:

late 12c., mēk, "gentle or mild of temper; forbearing under injury or annoyance; humble, unassuming;" of a woman, "modest," from a Scandinavian source such as Old Norse mjukr "soft, pliant, gentle," from Proto-Germanic *meukaz (source also of Gothic muka-modei "humility," Dutch muik "soft"), a word of uncertain origin, perhaps from PIE *meug- "slippery, slimy." In the Bible, it translates Latin mansuetus [tame, mild, gentle, literally "accustomed to the hand"] from Vulgate (for which see mansuetude). Sense of "submissive, obedient, docile" is from c. 1300.

In commonly used dictionaries, we find such definitions as:

Showing patience and humility; gentle. Easily imposed upon; submissive. (American Heritage Dictionary)

Enduring injury with patience and without resentment: MILD. Deficient in spirit and courage : SUBMISSIVE. Not violent or strong : MODERATE. (Merriam Webster)

Quiet, gentle, and not willing to argue or express your opinions in a forceful way (Cambridge)

Having or showing a quiet and gentle nature : not wanting to fight or argue with other people. (Britannica)

These all seem broadly consistent with what we have discussed so far. Thus, "meek" as defined above is a reasonably fair and accurate translation of the Greek πραυς, though the connotations of "meek" don't perfectly align with those of πραυς laid out above.

Some people might think "meekness" connotes "weakness", perhaps because the two sound similar, but this is not the meaning of the word (though it can be a shade of meaning). In fact, nothing about physical capacities is necessarily implied by the word "meek". One can be weak/powerless and meek, or strong/powerful and meek, or anywhere in between. Nor does it imply cowardice: in fact, to sustain meekness often involves the courage to endure insult or injury without retaliation or losing one's temper. The word itself cannot be blamed for how some people tend to misinterpret it. As should be clear by this point, meekness (specifically πραυτης) is a quality of character and thus is available to everyone no matter how weak or strong they are. Aristotle's usage makes this evident: he treats it as a character virtue, and thus as a choice one must make or a habit one must cultivate. If you can keep your anger, resentment, and violence in check, which anyone, no matter how strong they are, can do, then you can succeed in being πραυς.

Perhaps an argument can be made that the quality of "strength, prowess, physical competence, power, ability to do harm" should also be cultivated in addition to meekness. That may or may not be the case, depending on one's sense of virtue, but those qualities are not themselves implied by πραυς, nor are they necessarily ruled out. Πραυς does not define what one can do, but how one chooses to be. It is defined more by what one chooses not to do (not to retaliate, vent rage, etc.) than by what one can or chooses to do.

Let us look at some notable New Testament verses about strength, weakness, and violence.

Matt 5:38-45: "You have heard that it was said, ‘An eye for an eye and a tooth for a tooth.’ But I say to you: Do not resist an evildoer. But if anyone strikes you on the right cheek, turn the other also, and if anyone wants to sue you and take your shirt, give your coat as well, and if anyone forces you to go one mile, go also the second mile. Give to the one who asks of you, and do not refuse anyone who wants to borrow from you. You have heard that it was said, ‘You shall love your neighbor and hate your enemy.’ But I say to you: Love your enemies and pray for those who persecute you, so that you may be children of your Father in heaven, for he makes his sun rise on the evil and on the good and sends rain on the righteous and on the unrighteous."
Matt 26:52 " 'Put your sword back in its place,' Jesus said to him [Peter], 'for all who draw the sword will die by the sword.' "
Rom 12:14-21: "Bless those who persecute you; bless and do not curse them. Rejoice with those who rejoice; weep with those who weep. Live in harmony with one another; do not be arrogant, but associate with the lowly; do not claim to be wiser than you are. Do not repay anyone evil for evil, but take thought for what is noble in the sight of all. If it is possible, so far as it depends on you, live peaceably with all. Beloved, never avenge yourselves, but leave room for the wrath of God, for it is written, 'Vengeance is mine; I will repay, says the Lord.'[Deut 32:35] Instead, 'if your enemies are hungry, feed them; if they are thirsty, give them something to drink, for by doing this you will heap burning coals on their heads.'[Prov 25:21-22] Do not be overcome by evil, but overcome evil with good."
Rom 15:1 "We who are strong ought to put up with the failings of the weak and not to please ourselves."
1 Cor 1:25-29: "For God’s foolishness is wiser than human wisdom, and God’s weakness is stronger than human strength. Consider your own call, brothers and sisters: not many of you were wise by human standards, not many were powerful, not many were of noble birth. But God chose what is foolish in the world to shame the wise; God chose what is weak in the world to shame the strong; God chose what is low and despised in the world, things that are not, to abolish things that are, so that no one might boast in the presence of God."
2 Cor 12:5, 9-10: "On behalf of such a one I will boast, but on my own behalf I will not boast, except of my weaknesses... But he [Jesus] said to me, 'My grace is sufficient for you, for power is made perfect in weakness.' So I will boast all the more gladly of my weaknesses, so that the power of Christ may dwell in me. Therefore I am content with weaknesses, insults, hardships, persecutions, and calamities for the sake of Christ, for whenever I am weak, then I am strong."
Phil 2:5-8 "Let the same mind be in you that was in Christ Jesus, who, though he existed in the form of God, did not regard equality with God as something to be grasped, but emptied himself, taking the form of a slave, assuming human likeness. And being found in appearance as a human, he humbled himself and became obedient to the point of death—even death on a cross."

All these support the plain meaning of Matthew 5:5 and the repudiation of any claim that the New Testament advocates strength, the capacity to harm, or worldly power. The early Church fathers understood this perfectly well, notably the anti-militaristic Tertullian. The rise of "Christian warriors" (in any literal sense) as any sort of norm or ideal came much later and flies in the face of a fair and honest reading of the New Testament. A plain, straightforward reading of the New Testament would find an endorsement of pacifism over militarism.

Finally, let us also look at three problematic verses, sometimes offered against a pacifistic message, and offer a refutation for each:

1) Matt 10:34 "Do not think that I have come to bring peace to the earth; I have not come to bring peace but a sword."

The suggestion is that Jesus is advocating for or at least may sometimes advocate for violence. But is this a fair reading, given all the foregoing? More context is revealing:

Matt 10:32-39: “Everyone, therefore, who acknowledges me before others, I also will acknowledge before my Father in heaven, but whoever denies me before others, I also will deny before my Father in heaven. Do not think that I have come to bring peace to the earth; I have not come to bring peace but a sword. For I have come to set a man against his father, and a daughter against her mother, and a daughter-in-law against her mother-in-law, and one’s foes will be members of one’s own household. Whoever loves father or mother more than me is not worthy of me, and whoever loves son or daughter more than me is not worthy of me, and whoever does not take up the cross and follow me is not worthy of me. Those who find their life will lose it, and those who lose their life for my sake will find it.”

It's clear that the "sword" is not any literal sword but stands in contrast to "peace" as a metaphor for strife and division between those obedient to Christ and those not, proverbially, the sheep and the goats. But there is another possibility, also metaphorical: many other passages in the New Testament use "sword" as a metaphor for the word of God (Eph 6:17, Heb 4:12, Rev 1:16, 2:16, 19:15, 19:21). Thus, Jesus brings the word of God, a cause of division and a means of warring with spiritual evil. What Jesus, of course, did not mean was any sort of literal sword, especially given that he never does bring any sort of actual sword.

2) Luke 22:36 "[Jesus] said to them, 'But now, the one who has a purse must take it, and likewise a bag. And the one who has no sword must sell his cloak and buy one.' "

Is Jesus advising his disciples, and his followers down through the ages, literally to go out and purchase weapons? Again, a bit of context makes this clear:

Luke 22:35-38: [Jesus] said to them, “When I sent you out without a purse, bag, or sandals [Luke 10:4], did you lack anything?” They [the 12 apostles] said, “No, not a thing.” He said to them, “But now, the one who has a purse must take it, and likewise a bag. And the one who has no sword must sell his cloak and buy one. For I tell you, this scripture must be fulfilled in me, ‘And he was counted among the lawless,’[Isaiah 53:12] and indeed what is written about me is being fulfilled.” They said, “Lord, look, here are two swords.” He replied, “It is enough.”

A few things to note:

Those to whom Jesus spoke the command did not even carry it out literally: they purchased no swords and remained 10 swords short.
Jesus negates his previous teaching in Luke 10:4, so it would be impossible faithfully to follow both. Does this teaching supersede the former?
The disciples are to buy swords in order to fulfill the prophecy that Jesus will be "counted among the lawless." In buying a sword, they are becoming lawless, since they would be forming an armed uprising, carrying weapons where it would be illegal to do so, as it would be for would-be-revolutionary Jews under Roman law. Is Jesus instructing his followers to become "lawless"?
Jesus' response is terse and dismissive, and the conversation ends: "It is enough [to fulfill the scripture]." He could just as easily be saying, colloquially, "That's enough [so don't bother further]", or "That's enough [out of you/on the matter]."
The verse is not generalized to all his followers or even for all times. He is speaking only to his closest disciples and is giving them instruction for a specific time (now) and reason (to fulfill prophecy). There is no suggestion this is a general precept later Christians should follow.

The meaning is not terribly subtle, though it is worded in a less than direct way: Jesus knows he will be found to be a "lawless [one]" i.e. a criminal. He tells his disciples that they may as well go and buy swords since he will "be counted among the lawless" (found guilty and executed) and that would fulfill the scripture quite literally. When they produce two swords, he gives them an ambiguous dismissive answer and the conversation ends. This bit of dialogue is an element of the Passion story, not a maxim: there is no suggestion that later Christians ought to do this. Jesus does not endorse the arming of Christians, as a rule.

3) The Cleansing of the Temple (Matthew 21:12–17, Mark 11:15–19, Luke 19:45–48, John 2:13–16). Two versions--with sufficient context--will suffice:

Mark 11:15-17: On reaching Jerusalem, Jesus entered the temple courts and began driving out those who were buying and selling there. He overturned the tables of the money changers and the benches of those selling doves, and would not allow anyone to carry merchandise through the temple courts. And as he taught them, he said, “Is it not written: ‘My house will be called a house of prayer for all nations’[Isaiah 56:7]? But you have made it ‘a den of robbers.’[Jer. 7:11]”

John 2:13–25: The Passover of the Jews was near, and Jesus went up to Jerusalem. In the temple he found people selling cattle, sheep, and doves and the money changers seated at their tables. Making a whip of cords, he drove all of them out of the temple, with the sheep and the cattle. He also poured out the coins of the money changers and overturned their tables. He told those who were selling the doves, “Take these things out of here! Stop making my Father’s house a marketplace!” His disciples remembered that it was written, “Zeal for your house will consume me.” [Psalm 69:9] The Jews then said to him, “What sign can you show us for doing this?” Jesus answered them, “Destroy this temple, and in three days I will raise it up.” The Jews then said, “This temple has been under construction for forty-six years, and will you raise it up in three days?” But he was speaking of the temple of his body. After he was raised from the dead, his disciples remembered that he had said this, and they believed the scripture and the word that Jesus had spoken. When he was in Jerusalem during the Passover festival, many believed in his name because they saw the signs that he was doing. But Jesus on his part would not entrust himself to them, because he knew all people and needed no one to testify about anyone, for he himself knew what was in everyone.

These points all support a symbolic reading, specific to Jesus in particular at that specific time and place and point in history, with no suggestion at all that this is something Jesus would want his followers to emulate:

This event would have happened in the outermost court of the gentiles, which was massive: 36 acres of area. There is no way Jesus alone could have cleared and policed the entire space. Even with the help of his followers, unless there were hundreds, this would have been impossible, and even then it would take an hour or more. If this really did take place, it couldn't have taken place in the whole space, but only in one small corner. Unless it was symbolic, it would have been pointless.
Assuming the act is purely symbolic, its significance is hard to miss: Jesus is cleansing the temple of uncleanliness, stating that it is unsuitable for its purpose as a house of prayer (Synoptics), possibly from the noise and bustle of the market, or simply oughtn't to be a marketplace (John), and declaring that there are many "thieves", presumably the priests or those taking advantage of a captive market. Whether he succeeded is not relevant for establishing what Jesus's preferred ideal is, and thus what his followers ought to prefer as well.
In Mark, this takes place on Monday of Holy Week, on Sunday in Matthew and Luke. In either case, one week later, Jesus will have died and been resurrected, and initiated the destruction of death and Satan, prefiguring the cleansing of the world of evil at the eschaton. This also supports a symbolic reading: Jesus is symbolically cleansing the temple of evil, prefiguring his cleansing of death (evil) at his resurrection, prefiguring the general resurrection, and the general cleansing of evil at the Final Judgment. In John, this takes place near the beginning of his ministry, years from Holy Week, but at the Passover, on which day Jesus will be killed as a sacrifice, the paschal "lamb of God".
In John, the Jews ask him “What sign can you show us for doing this?”, and he replies “Destroy this temple, and in three days I will raise it up.” So the "sign for" whipping and driving out the money changers and others and overturning the tables and making his claim is, maybe, that the temple will be destroyed, and in or within three days Jesus will raise it up. The author explains: he was speaking of the temple of his body, thus, "destroy my body and in three days I will raise it up". Indeed, that is what the gospels say transpired at the crucifixion followed by the resurrection three days later. So what he did in the temple is like what will happen to his body. And Jesus himself knew what was in everyone, namely, evil, uncleanliness one needed to be cleansed of. Thus, he cleansed the Temple just as his body would be cleansed, through death and resurrection. Indeed, about 40 years later, as the author of the gospel of John clearly knows, the city and Temple of Jerusalem would be destroyed by the Roman general Titus. But the heavenly Jerusalem would be rebuilt from that destroyed body. There are clearly many layers of significance to this act if it is seen to be symbolic.
In the Synoptics, Jesus quotes an eschatological prophecy from Isaiah 56, describing all nations, morally perfected and having seen the truth of the Jewish faith, coming to the Jerusalem temple at the end of days. He also references an episode from Jeremiah 7-8, in which the prophet stands in the Jerusalem temple gate and exhorts all Judeans to "amend your ways and your doings" in ways both moral (interpersonal, social) and religious (heterolatry, idolatry, impiety, not heeding prophets), not merely in words, and if they do, YHWH "will dwell with you in this place, in the land that I gave to your ancestors forever and ever." Indeed, Jeremiah claims some of them "have built the high places of Topheth, which is in the valley of the son of Hinnom [Gehenna], to burn their sons and their daughters in the fire; which [YHWH] commanded not, neither came it into [His] mind." But if they do not, then "[YHWH's] anger and [YHWH's] fury shall be poured out upon this place, upon man, and upon beast, and upon the trees of the field, and upon the fruit of the land; and it shall burn, and shall not be quenched." The people will be bird-food and "the land will be desolate". Their bones will be unburied, "dung upon the face of the earth." Christians can obviously find the significance in Jer 8:4 "Thus saith [YHWH]: do men fall, and not rise up again? Doth one turn away, and not return?" The parallels between Jer 8:8-9 and 1 Cor 1:20-25 are striking. The temple would be destroyed as Jeremiah predicted, as described in Jeremiah 52, making this reference itself a prophecy of the destruction of the temple, which did take place 40 years later. The symbolic significance of the act is thus matched by a rich significance in the prophetic references he makes.

In conclusion, then, the cleansing of the temple clearly was a symbolic act that was relevant to Jesus and the Temple, at his specific time and place and point in Jewish/Christian history, and not in any way an example we ought to follow. Anyone who takes Imitatio Christi to the point of imitating him in this ought only to do so in the temple in Jerusalem, which doesn't exist, and only if they are the Messiah.

HELPS

This difficult-to-translate root (pra-) means more than "meek." Biblical meekness is not weakness but rather refers to exercising God's strength under His control – i.e. demonstrating power without undue harshness.
The English term "meek" often lacks this blend – i.e. of gentleness (reserve) and strength.

This source finds in the term πραυς a connotation of "...and strength", "exercising (God's) strength" and "demonstrating power without undue harshness" (thus, more simply: demonstrating power with due harshness). It even goes so far as to implicitly disparage other sources and translations for failing to include ("often lack") this crucial hidden meaning. None of the many translations and commentaries we have looked at have made any such claim. There is no reasoning given, no explanation as to how they came up with this connotation or why so many centuries of translators and commentators have failed to come up with it, or why "biblical meekness" should be so substantially different than non-biblical meekness. It is simply asserted here without any basis. One wonders where they are getting this sense of "strength" or "due harshness".

It is worth pointing out that the reference given, the Discovery Bible, is a Bible study software endorsed by a handful of evangelical scholars from evangelical universities. It advertises itself with the slogan:

Read Your Bible And Instantly See What Is Lost In Translation… (Without Knowing Any Greek Or Hebrew!)

With this software, you can allegedly, without the trouble of learning Greek or Hebrew, get quickly to the underlying meaning of the Bible that is obscured or absent in other Bible translations. You, the ignorant monolingual layperson, can get access to the true meaning of the Bible that those professional, elite Bible scholars and translators don't (or can't) put in the standard translations. The appeal to those ignorant of the original biblical languages serves many purposes: 1) it allows the sellers of the software to frame their perspective as both "deep" and "hidden", 2) it draws in those who don't know any better and pushes away those who think they do (e.g. those who bother to learn Hebrew or Greek), and 3) it ensures that any false claims won't be found out. The website says "No Greek Or Hebrew Experience Required" but it would be more accurate to say "Required: No Greek or Hebrew Experience." Otherwise, you might see through the interpretational bias.

Looking at some of their promotional videos, we can see that many of their additions seem to be good-faith attempts to add value to Bible studies for those who are not expert, multilingual exegetes. Insights on word order and emphasis, verb forms, intertextuality, commentaries, and subtleties of translation are all perfectly acceptable, but there is also a clear ideological payload tucked inside the ostensible "insights". The example of πραυς is a case in point, reading into the term a masculine bent: meekness is, perhaps, perceived as uncomfortably feminine, gentle, soft, and yielding, as opposed to the strength, power, and even violence they would prefer to find in it. One imagines the thought process as something like this: " 'Blessed are the meek'? That can't be right. 'Meek' must not really mean 'meek'. "

However, this reading is entirely without any justifiable philological basis and is a flagrant case of eisegesis. It is an abdication of the responsibilities of interpretation and translation, succumbing to the temptation to find in the text what one wishes were there, rather than the restraint to limit oneself to what the text itself can support. Indeed, if "biblical-X" can mean something substantially different than "non-biblical X", how can we possibly get at this meaning? At best we can look for other usages in the biblical corpus, as we have done, but even these must be informed by the usage of the word more generally. The original readers, before the compilation and canonization of the Bible, had no recourse but to take the word in something like its standard or typical usage, and we must follow suit if we want to get at the original, fundamental meaning that the original author meant to express. If inclusion between the covers of the Bible transforms the meaning of a word, this transformation has no constraints, there is no way to verify or falsify any such claim of meaning, and if two people disagree over this transformation, there is no way to determine which has the better claim. In short, it becomes a dogma deprived of any verifiable basis. However, any standard, ecumenical exegetical resource should not cater to such dogmatic infiltrations. BibleHub ought to remove this spurious claim from its website and limit itself to strictly philological hermeneutical resources. Or, if this Discovery Bible entry remains up, it ought to come with a disclaimer making its evangelical (masculinist) bias transparent, and expressly stating that this interpretation rests not on any close reading of the text but rather on a particular ideological agenda.

One charitable interpretation of this variant interpretation is that so many years and layers of theology had been put on this little word πραυς that the meaning slowly evolved. This might derive from Aristotle's definition of the term, which has more of a sense of "self-restraint" or "self-mastery." Recall that he said: "we sometimes praise those who are harsh-tempered as manly, and fitted to command." The evangelical authors of this entry likely agree. But as we and other commentators have pointed out, this need not inform the biblical usage: Aristotle was prescriptively giving his definition of what is a technical term in his ethical system, rather than a descriptive definition of typical usage. It is a mistake to think that the word as Aristotle defined it must match how other users of the word meant it. But supposing Aristotle's meaning is involved, it's understandable how this meaning of "strength" or "due harshness" could creep in. Understandable but not excusable, however, as this is still ultimately an ideological insertion not supported by philological analysis. There are certainly plenty of more effusive commentators who take the liberty to add theological color to this word as they see or preach it, and there's nothing wrong with that for what it's worth. But that should always be separable from the meaning of the term before and absent any later ideological accretions.

As we shall see, this small inclusion has had some wide ripples in the broader culture, particularly through a certain Canadian psychology professor turned public intellectual/self-help guru and amateur Bible interpreter by the name of Jordan Peterson.

Jordan Peterson

Jordan Peterson has offered comments on Matthew 5:5 on a number of occasions, leaning on the imagery of a "sheathed sword". Let's look at several to establish his view.

In 2017, Jordan Peterson produced a series of biblical lectures, focusing mostly on the book of Genesis. In the 9th lecture, titled "The Call to Abraham", he discusses Genesis 14, in which Abraham rescues his nephew Lot. After reading Genesis 14:14 (in which Abraham learns of the capture of Lot, musters 318 armed servants, and pursues the captors), Peterson says (between around 1:44:00 and 1:46:00):

Well, so now we also know that Abraham's a pretty brave guy, right? He gets word that this horrible war has broken out in the worst of all possible places, and that his nephew is involved, and the first thing he does is, you know, mount up his posse and get the hell in there and rescue his nephew. So Abraham's... Oh, whatever goodness is, from the Old Testament perspective, it isn't harmlessness, right? It isn't emasculation and castration. It's not that. It's not weakness. It's not the inability to fight. None of that is associated with virtue. It's a sort of strength that enables someone to mount an armed team of [more than] 300 people when he finds out that his nephew is being kidnapped in a terrible war and to get the hell out there and take them back. And so that's it. That's a call to - it's a call to power, not a kind of peaceful meekness. That's funny, too, because, you know, there's a line in the New Testament: "The meek shall inherit the earth." (I've got to look at my phone for a sec, here, I don't know what time it is) There's a line in the New Testament that says--and it's in the Sermon on the Mount--it says "The meek shall inherit the earth." And that-- I read that line and it always bothered me. I thought "No way. That's not, that's not right. "Meek" can't be the right word." So when I was doing this story of Noah [lecture 7] and talking about the Sermon on the Mount, I spent a bunch of time looking at commentaries on that line looking at the roots, you know, the Greek roots and the Hebrew roots, and trying to figure out what that meant. And "meek" does not mean "meek": that's wrong. Here's what it means: "Those who have weapons and know how to use them but still keep them sheathed will inherit the earth." Jesus! That's a lot different, man. It's a lot better! Right? Because the way it's normally interpreted is: "If you're so weak that you're harmless then things will go well for you." It's like: No. That's not right, that's not. 
That can't be right: it doesn't fit with the narrative. It certainly doesn't fit with this narrative.

We can note a few things:

Peterson is put in mind of Matt 5:5 when reading an entirely different passage in Genesis 14. There is no direct connection between these two. This points to something in his personal psychology.
Peterson is operating under the (erroneous) belief that the Bible is homogeneous in its message, with no logical or thematic inconsistencies. He seems to consider it a single unified book with a single unified message, despite its being written in multiple languages over centuries (one part literally called "Old" and another "New"), and despite Jesus explicitly reforming some of the Mosaic laws ("You have heard it said... but I tell you..."). Clearly, he prefers the model of Abraham's brave and violent exploits to Jesus' exhortations to "turn the other cheek" and to "Put your sword back into its place, for all who take the sword will die by the sword." He notes that "it certainly doesn't fit with this narrative" without considering that the two may be at odds: instead, he wants to find some way to reconcile them. He sees Abraham "mount up his posse and get the hell in there" to rescue his nephew and he admires and wants to advocate for that sort of behavior, but he is haunted by Jesus' claim praising meekness. The unequivocal claim by the Son of God must be taken as given.
Peterson misquotes the scripture, albeit in a fairly benign, paraphrastic way. However, this is the first indication that he is not trying to be careful or exact. Strictly speaking, he is incorrect that the Bible states, in these terms: "The meek shall inherit the earth".
Peterson is evidently preoccupied with masculinity ("It isn't emasculation and castration") and seeking a way to reconcile his ideal of masculinity with the scripture. As noted earlier, this tension between harsh masculinity and meekness is a driving issue in this heterodox interpretation.
Peterson's belief that "meek" is not "the right word" and it's "wrong" is purely based on his own intuitions, rather than on evidence. This is a telling suggestion that he is using motivated reasoning.
He claims he spent "a bunch of time looking at commentaries on that line looking at the [Greek/Hebrew] roots", yet he cites no source and his interpretation does not match any other source. He claims in other places, as seen below, that he did his research on BibleHub, looking at their commentaries. Yet as we have shown above, none of the commentaries on BibleHub come anywhere close to his interpretation. Could this be a bald-faced lie? Is he really so arrogant as to think he knows better than the consensus of all those expert interpreters? Does he not expect anyone to check? As it happens, he is right: his fans don't check.
Peterson confidently offers his interpretation as the authoritative, singular ("Here's what it means"), best ("It's a lot better!") interpretation, with no source or basis. Why not just say "Here's how I like to think about it"? His interpretation flies in the face of the consensus of interpretations given so far, possibly with the exception of HELPS (which is very likely his source), but even in that case, he goes far beyond it. Note that πραυς has gone from "gentle, humble" to, effectively, "ready to attack". Nearly a 180-degree turn!
Peterson offers a misinterpretation that likewise seems unique to him: "If you're so weak that you're harmless then things will go well for you." One cannot help but think that this was how he himself had misinterpreted it. It may also be derived from Nietzsche's criticisms of Christianity (see below), so it seems as though Peterson is trying retroactively to change the meaning of this verse to preempt that criticism, as he understands it. As noted above, meekness has no connotation of "weakness", though it does arguably imply voluntary "harmlessness".
By Peterson's own interpretation, then, Abraham and his allies would not be "meek" since they clearly unsheathed their swords. Does it follow that they won't "inherit the earth"?

In late January of 2018, in his appearance on the Joe Rogan Experience (#1070, between 1:06:22 and 1:08:45), Peterson and Rogan have the following exchange:

Peterson: Well the other thing I've been telling young men is that--and and this is something I think that you could relate to tremendously--is... I read this New Testament line, well, decades ago, and I could never understand it. It's... The line is "The meek shall inherit the earth" and I thought "There's something wrong with that. That line, it just doesn't make sense to me, meek[ness] just doesn't seem to me to be a moral virtue." And so I did a series of biblical lectures this year--like 15 of them, and that was also a weird little experience that we can talk about--but I was looking through these sayings, these maxims, and that was one of them: "The meek shall inherit the earth." But I've been using this [web]site called Biblehub[.com]and it's very interesting, it's very, it's organized very interesting. So you have a biblical line and then they they have like three pages of commentary on each line, and so, because people have commented on every verse in the Bible, like, to the to degree that's almost unimaginable, so you can look and see all the interpretations and all the translations and get some sense of what the genuine meaning might be, and the line "The meek shall inherit the earth": "meek" is not a good translation -- or the word has moved in the 300 years or so since it was translated. What it means is this: "those who have swords and know how to use them but keep them sheathed will inherit the world." and that's--
Rogan: Hmmm!
Peterson: --another thing I've been telling you... Yeah no kidding! That's a lot different.
Rogan: That's a big difference!
Peterson: It's so great! And so, like, one of the things I tell young men-- well, young women as well-- but the young men really need to hear this more, I think, is that: you should be a monster. You know, because everyone says: "Well you should be harmless, virtuous, you shouldn't do anyone any harm, you should sheathe your competitive instinct, you shouldn't try to win. You know, you you don't want to be too aggressive, you don't want to be too assertive, you want to take a back seat and all of that." It's like: No. Wrong. You should be a monster, an absolute monster, and then you should learn how to control it.
Rogan: Do you know the expression: "It's better to be a warrior in a garden than a gardener in a war"?
Peterson: Right, right, exactly, that's exactly it.
Rogan: Yeah,
Peterson: And that's exactly right. And so, when I tell young men that, they think... Well, lots of them are competitive, they're low in agreeableness, you know, because that's part of being competitive, temperamentally. It's like: is there something wrong with being competitive? There's nothing wrong with it. There's something wrong with cheating. There's something wrong with being a tyrant. There's something wrong with winning unfairly. All of those things are bad, but you don't want people to win? What's the difference between trying to win and striving? You want to eradicate striving?

Let's note a few things:

"Meekness" is indeed a moral virtue, both in classical Christianity and in Aristotelian philosophy. Thus, whatever "seems" to Peterson is a poor guide to what is actually the case.
Peterson here claims to have consulted the many commentaries on BibleHub, which we have looked at above, and yet his interpretation does not come from any of them and doesn't even seem to be informed by any of them. Is he lying about reading the commentaries, or does he know he is not faithfully representing them? Or does he only mean HELPS? It's not clear, but he seems not to have done his research very well in any case.
Peterson is wrong in his speculation that the meaning of "meek" has substantially changed since (presumably) the King James Bible was written (1611, so closer to 400 years), which he'd know if he read the commentaries and other translations as he claims he did.
Rogan accurately notes "that's a big difference" yet doesn't seem to think that this "big difference" is indicative of an error on Peterson's part. Rather he seems implicitly to trust him and thus, presumably, thinks that there has been some big systemic or conspiratorial error on the part of biblical scholars and translators. This wouldn't be the first time Rogan has endorsed conspiratorial thinking.
Peterson's maxim "be a monster" and his endorsement of personal, competitive ambition is directly antithetical to the meaning and sense of πραυς as discussed above. The saying in James 3 is applicable: "But if you have bitter envy and selfish ambition in your hearts, do not be arrogant and lie about the truth. This is not wisdom that comes down from above but is earthly, unspiritual, devilish." It would thus be fair to characterize Peterson's maxim "be a monster", biblically speaking, as satanic.
Jesus likely would admonish against striving after worldly things but instead after heavenly things (gentleness, peace, charity, brotherly love). Competition and ambition ("trying to win") are precisely such worldly things Jesus and his followers would recommend against. The preacher of the book of Ecclesiastes, who was "king in Jerusalem" would likewise denounce all worldly striving as "vanity and vexation of spirit."
Peterson seems fixated on competition, focusing on a segment of young men whom he describes as "competitive... low in agreeableness". One gets the sense Peterson thinks of himself this way, and as hampered by exhortations to be more agreeable and cooperative. He seems to neglect cooperation entirely, seeing life as a competitive zero-sum game of winners and losers. In his world of lobster dominance hierarchies, this makes some sense, but he is neglecting the message of brotherly love, compassion, charity, and community in the New Testament. Peterson seems to essentialize and thus legitimize disagreeableness, rather than seeing it as, at best, a quality of temperament with both benefits and detriments which should be held in proper check, and at worst, an impediment to proper full moral growth. Peterson seems to want to turn competitive disagreeableness into a good in itself.

In January of 2018, Peterson sat for an interview with Timon Dias for the Dutch publisher GeenStijl, titled "Jordan Peterson's Philosophy of 'How to be in the World' distilled down to its 5 strongest points". The fifth point (around 1:40:00) is "Minimize your persona and cultivate your essence and live in its closest possible proximity" (a long and complex way of saying "be yourself and be minimally duplicitous"). Peterson explains the Jungian ideas of the persona and the shadow, especially "integrating the shadow", obedience to norms, and the usefulness of the shadow in resisting that. His example is Nazi Germany, in which, he claims, the society was gradually "bent" over to fascism. He mentions his opposition to Bill C-16, which rocketed him into celebrity, and frames his opposition to the bill, which expanded the protections for trans people, as a principled opposition to the creep of fascism. In response to the question "Do you think weak men can be virtuous?" Peterson gives an unqualified "No." "Weakness" is framed as lacking the option to sin. "Without the possibility for evil, there cannot be good," Peterson says. According to Peterson, rule-breaking, presumably as in doing what is wrong, is better than cowardice and weakness (the inability even to break rules), but it is less than being good. He ends with:

One of the most amazing things that I discovered this year or stumbled upon was-- I was puzzling over a line in the New Testament which I've always been curious about because it never sat right with me: "The meek shall inherit the earth." and so as I said before if you go online-- Bible-hub, I think it's called. Biblehub. It's really good for this because it contains a collection of commentaries so you can look at a verse and other translations, multiple translations and multiple commentators so each verse is taken apart by many many people-- and I found out that the word "meek" either doesn't mean now what it meant when people first translated the text or it was a mistranslation. Either way... But because meek sounds like powerless and harmless (it's something like that, right?)... But what meek actually means-- it's the derivation of a word-- it's the translation of a word that meant something more like "those who have swords and know how to use them but keep them sheathed." I thought "Oh yes that's exactly it: those who have swords and know how to use them but choose to keep them sheath will inherit the world" It's like: Yes. Exactly right. Exactly right. That's much different.

Let's note a few things:

Here, Peterson again makes reference to the many commentaries on Biblehub and gives his unsourced and unsupported interpretation that directly contradicts those interpretations.
As we have already noted, meekness does not itself connote powerlessness and harmlessness. He seems never to have actually checked. Thus, he is reacting to a misconstrual, likely his own. But even if it did mean that, he would have no basis for rejecting it given that he
He begins with the more qualified and hedged "something more like" but ends with the more decisive "that's exactly it". How he knows "that's exactly it" is based not on any evidence but merely on his own baseless and demonstrably inaccurate intuitions.
It's clear based on the narrative of tension ("never sat right") to release ("Oh yes that's exactly it") that his emotions are what is driving his selection of interpretation. His conclusion follows with no reluctance or even detachment but rather the contrary: it follows with relief and exultation, the overcoming of a long-held anxiety. One cannot help but get the sense that Peterson sees or wants to see himself as meeting his own interpretation, thereby rendering him virtuous, "in the proper moral position". Indeed, there is a note of shame in having once seen himself as virtuous because harmless, and now he is eager to distance himself from that former self-concept.
"That's much different" is taken as a good thing, a relief, rather than indicative of an error. Presumably, he would have to think that the "much different" interpretations are in error relative to his "exactly right" interpretation, bespeaking a profound arrogance.

Finally, in another clip, as part of a Q&A, Peterson offers the following:

I read this interesting commentary a little while ago on a statement by Christ in the New Testament, and the statement generally interpreted is: "The meek shall inherit the earth." But I was looking up the multiple translations of the word "meek," and "meek" is actually derived from a Greek word, of course, --because the Bible, or at least some of the original forms of the Bible were in Greek-- and that word [πραυς] didn't exactly mean "meek." It meant something like: "those who have weapons and the ability to use them, but determine to keep them sheathed will inherit the world." And that means people who are capable of force, let's say, but decide not to use it are in the proper moral position... So, now, if you're an axe murderer but don't have an axe that doesn't mean that you're moral.

And a few notes:

His ultimate interpretation of those in "the proper moral position" is highly simplistic and monochromatic: does being moral consist in being capable of force and not using it? Needless to say, this capacity for force is not at all implied by the word πραυς, though the restraint of retaliation or harsh action is implied. A half-truth, but likewise a half-lie, and not in any way faithful to the "multiple translations" he claims to have consulted.
His concept is governed by physical force and physical harm. His metaphors are always weapons and even murder. This is a particularly masculine, worldly, and literal concept altogether lacking from the Bible or any of the sources he claims he consulted.
Presumably, then, those incapable of force, such as women and children and weak men, cannot be in "the proper moral position". Thus, Peterson betrays a clear masculinist bias that denigrates weakness, making it grounds for exclusion from virtue. Only the strong can be good, as he sees it, as he explicitly says in the previous clip. ("Do you think weak men can be virtuous?" "No.")

As seen from all the foregoing, Peterson's interpretation is unique to him. No other commentator, translator, or interpreter uses the language or imagery of "sheathed weapons". Even the HELPS source he is probably drawing from doesn't mention it. So where did he get it from? And what makes him so confident as to offer it as "the meaning"?

The answer might be as simple as: narcissism. Note his reasoning. He comes upon this interpretation and rather than check to see if it is validated by the data, rather than try to falsify it like a good scientist or exegete, instead he checks to see if he himself approves of it ("I thought oh yes that's exactly it" "It's a lot better!"). This is mere confirmation bias. The "genuine meaning" is not defined by the history of usage or by the consensus of interpreters, both of which would weigh entirely against his preferred interpretation, but simply by his own internal, infallible, semantic compass. In short: he likes it because it comports with his masculinist ideology, or because of his personal history of self-image anxieties, or both, or whatever the case may be. He is the authority and his authoritativeness amounts to oracular access to "the truth", "the real meaning" that he simply knows when he sees it, that others simply don't have such access to. His confidence is based not on any argument or data, but on his personal feelings about what it should mean. He admits that the standard interpretation "never sat right with me" and that he thought "There's something wrong with that. That line, it just doesn't make sense to me, meek just doesn't seem to me to be a moral virtue." Clearly, then, he disagrees with both Aristotle and Jesus and centuries of historical Christianity and has to twist the meaning to suit his feelings. I'd guess he saw the direction the HELPS source went in and extrapolated from there to even greater heights of eisegesis. Clearly, this beatitude produces a lot of angst for those who want to reconcile the gospels with a tough, stern, strong concept of masculinity and by hook or by crook they will find a way to do so. If it comes to changing either the meaning of the scriptures or their idea of masculinity, the former will lose every time.

As pointed out, Peterson claims to have consulted the many commentaries on Biblehub: "if you go online-- Bible-hub, I think it's called. Biblehub. It's really good for this because it contains a collection of commentaries so you can look at a verse and other translations, multiple translations and multiple commentators so each verse is taken apart by many many people." Yet he never cites which of the many commentaries he is drawing from. It's not hard to see why. If we ourselves go to the commentaries on Biblehub and look through the collection of commentaries, none gives an interpretation coming anywhere close to Peterson's. Peterson also says he "was looking up the multiple translations", but as we showed above, all the translations are quite consistent, whether in English or any other language, and none comes close to his interpretation. If he has done the research he claims to have done, how can he hold to such a different view than the sources offer? None of the possibilities are terribly promising: 1) He is a bald-faced liar, 2) He didn't understand what he read, 3) He thinks he knows better than respected, expert exegetes and commentators, 4) He found the one interpretation (presumably HELPS) and decided (on no other grounds than his personal feelings) that that one, other commentaries notwithstanding, was the best and truest interpretation. But even in the last case, why not quote that source in the terms it uses? Why change or add to it? And why not cite his source (he almost never does)? Not even HELPS tries to construe the πραυς as trained swordsmen (literally or metaphorically) who keep their weapons sheathed, ready to attack. The answer is obvious: not only does he know he doesn't need to, but he knows it would only backfire if he did since then others could go and check that he was wrong. 
And nothing strikes more terror in Peterson's heart than the idea that he could be shown to be wrong: one searches in vain for a single instance of him unambiguously admitting he was wrong or apologizing for a single error.

In some of his other work, Peterson will talk about the "dark triad" of resentment, arrogance, and deceit as habits or facets of character that should be minimized or avoided. But as we have seen above, he is enacting this "dark triad" himself. He resents the standard definition of πραυς, arrogantly thinks he knows better than the expert commentators and translators he claims to have consulted, and deceptively states his interpretation as the singular correct one, despite it being antithetical to the actual meaning. He ought to listen to his own advice, and the advice of Jesus himself: "You hypocrite, first take the plank out of your own eye, and then you will see clearly to remove the speck from your brother’s eye."

Suppose we try to give Peterson as much charity and benefit of the doubt as we can. In what sense can he be right? Supposing we all had "swords" (means of doing harm) of some sort or other, he is right that meekness involves keeping them sheathed, but this is only a more convoluted way of saying "self-control" or "restraint". Indeed, if everyone has a "sword", then "those who have swords" is everyone and so is unnecessary: it would be clearer and more direct to say this otherwise, and no one would be "weak" as in "incapable of doing harm". Peterson should take his own advice and be more precise in his speech. He is also right that those who don't "have swords" are not rendered meek-- and thus blessed-- ipso facto because they lack the means to do harm. Meekness is a trait of character, not of action directly: in his terms, it is not wanting to be an axe murderer regardless of whether one has an axe or not. However, this is about as far as charity can take us. It cannot be avoided that the capacity to do harm is in no way implied by or part of the meaning of πραυς. It's also hard to give him much charity given that he has told us his sources, which we can check for ourselves, and find that he hasn't accurately represented any of them. Peterson has no escape here: he is a fraud, a hypocrite, and a perverter of holy scripture. He needs to put his house in order and tell the truth – or, at least, don't lie.

Nietzsche and Christianity

In the last video clip we gave, Peterson goes on to add a note about the 19th-century German philosopher Friedrich Nietzsche:

Nietzsche commented on that, too, a fair bit, you know. He thought of most morality as cowardice, not because morality itself is cowardice but because most people who are cowards disguise their cowardice as morality and they claim that their harmlessness--which is actually a consequence of their fear and inability to be harmful, say, or to be dangerous--is actually a sign of moral integrity and that's a really bad idea.

Why did Peterson mention this? In his books and lectures, Peterson frequently quotes or references Nietzsche, and has numerous videos discussing quotations or ideas from Nietzsche. It would be fair to assume Nietzsche has had a significant effect on Peterson's thought. Though it's not clear exactly where he is drawing this idea from, we can offer some guesses:

Daybreak section 101: "Suspicious. – To admit a belief merely because it is a custom – but that means to be dishonest, cowardly, lazy! – And so could dishonesty, cowardice and laziness be the preconditions of morality?"

The Genealogy of Morals section 14: "Will any one look a little into—right into—the mystery of how ideals are manufactured in this world? Who has the courage to do it? Come! Here we have a vista opened into these grimy workshops. Wait just a moment, dear Mr. Inquisitive and Foolhardy; your eye must first grow accustomed to this false changing light—Yes! Enough! Now speak! What is happening below down yonder? Speak out that what you see, man of the most dangerous curiosity—for now I am the listener. "I see nothing, I hear the more. It is a cautious, spiteful, gentle whispering and muttering together in all the corners and crannies. It seems to me that they are lying; a sugary softness adheres to every sound. Weakness is turned to merit, there is no doubt about it—it is just as you say." Further! "And the impotence which requites not, is turned to 'goodness,' craven baseness to meekness, submission to those whom one hates, to obedience (namely, obedience to one of whom they say that he ordered this submission—they call him God). The inoffensive character of the weak, the very cowardice in which he is rich, his standing at the door, his forced necessity of waiting, gain here fine names, such as 'patience,' which is also called 'virtue'; not being able to avenge one's self, is called not wishing to avenge one's self, perhaps even forgiveness (for they know not what they do—we alone know what they do). They also talk of the 'love of their enemies' and sweat thereby"...

In Beyond Good and Evil, the Genealogy of Morals, and even in the AntiChrist, Nietzsche describes the evolution of morality that began with "master morality"--the positivistic values of the nobility that contrast themselves not with "evil" but with "bad, mean, lowly"-- and progressed, through later Judaism and Christianity, to "slave morality", characterized by resentment, reaction, envy, and inability reframed as superior, "weakness is turned to merit". The latter Nietzsche considers intrinsic to and definitive of Christianity, as exemplified in the expression of Paul: "God chose what is weak in the world to shame the strong; God chose what is low and despised in the world, things that are not, to abolish things that are." Or, in the formulas of Jesus: “The last shall be first and the first last.” or "Whoever finds their life will lose it, and whoever loses their life for my sake will find it." This world is not what matters, but rather the world to come. The great reversal of fortunes at the eschaton represents precisely this inversion of values, as can also be seen in the "woes" of Luke 6:24-26, "if your enemies are hungry, feed them; if they are thirsty, give them something to drink, for by doing this you will heap burning coals on their heads," and in numerous other passages. This inversion of values cannot be separated from Christianity without doing great damage to its coherence and leaving it mired in absurdity and contradiction.

Peterson's interpretation of "meekness" seems to be an attempt to reconcile so-called "master morality" to Christianity. He is committed to Christianity, in one way or another, and yet also seems to agree with Nietzsche that "master morality" is preferable to "slave morality". Thus, he wants to have his cake and eat it, too: he wants to find "master morality" in Christianity. Nietzsche would obviously laugh at the attempt, as this reconciliation is impossible and doomed from the outset. Only by compartmentalization and doublethink, or by a methodology of confirmation bias and willful ignorance of contrary evidence can one undo the inversion of values evident in the Christian scriptures. Ergo, "meek" cannot mean "meek", rather it must mean "unmeek".

This is not unlike what Constantine and his successors did through the edicts of Milan and Thessalonica, resulting in Catholic Nicene Christianity becoming the state religion of the Roman Empire. It is a supreme irony of history that the empire responsible for executing Jesus--ruled by the "god of this age (2 Cor 4:4)" (i.e. the Devil), the New Babylon whose destruction is celebrated in Revelation 19--nominally adopted (i.e. cynically co-opted) the movement that yearned for its destruction. But even Constantine never went so far as to assert that Christianity promoted "master morality": he waited until he was near death to be baptized and washed of his sins, demonstrating that he did not think of himself as moral up to that point. But Peterson wants to endorse "master morality" and still construe it as being a "good Christian", and to do that, he must invert through convoluted or baseless interpretations the plain meaning of the New Testament.

Peterson is not alone in this, by any means: ever since Constantine, there have always been Christians seeking to legitimize negating the negation of--and thereby recover--master morality. Most often this manifests as the nobles affirming slave morality in name only, with a significant admixture of guilt, perhaps, but de facto "master morality for me, slave morality for thee." The modern evangelical movement--documented extensively in Kobes Du Mez's book Jesus and John Wayne--with its emphasis on masculinity and militarism, is a particularly notable case, operating primarily through selective readings and compartmentalizations, and Peterson shows his influence from (or appeal to) this movement. Recall once more Aristotle's comment, especially apt here: "We sometimes praise those who are harsh-tempered as manly, and fitted to command." But the cost to achieve this recovery of "master morality" is too high: that cost is intellectual integrity and coherence. They would be more honest simply to deny Christianity outright. As Nietzsche writes: "It is at least certain that sub hoc signo Israel, with its revenge and transvaluation of all values, has up to the present always triumphed again over all other ideals, over all more aristocratic ideals."

If Peterson admires or aspires to "master morality," he should simply reject Christianity. And if he is committed to accepting Christianity, he cannot accept "master morality". Or, if he insists on having both, he will be plagued by the contradiction, requiring the lie (so much for rule 8: "Tell the truth – or, at least, don't lie."). Perhaps Peterson has succeeded in convincing himself it is true, but this would only raise further the degree to which he has dissociated himself from reality. In Nietzsche's description, he has become a theologian: "I find the arrogant habit of the theologian among all who regard themselves as 'idealists'--among all who, by virtue of a higher point of departure, claim a right to rise above reality, and to look upon it with suspicion... Whoever has theological blood in his veins is shifty and dishonorable in all things." Peterson's thought represents a failed Hegelian synthesis of the thesis of Christianity and the antithesis of Nietzsche, resulting in an incoherent and falsified abomination. In his unconcealed motivated reasoning, his unwillingness to tolerate uncomfortable truths (indeed, even to consider some truths uncomfortable, to allow comfort to be any indicator or guide to truth), and his evidently compromised intellectual integrity, he shows that he is not one of Nietzsche's "true readers", as explained in the AntiChrist:

The conditions under which any one understands me, and necessarily understands me—I know them only too well. Even to endure my seriousness, my passion, he must carry intellectual integrity to the verge of hardness. He must be accustomed to living on mountain tops—and to looking upon the wretched gabble of politics and nationalism as beneath him. He must have become indifferent; he must never ask of the truth whether it brings profit to him or a fatality to him.... Very well, then! of that sort only are my readers, my true readers, my readers foreordained: of what account are the rest?—The rest are merely humanity.—One must make one’s self superior to humanity, in power, in loftiness of soul,—in contempt.

...Why in the world should it be assumed that true judgments give more pleasure than false ones, and that, in conformity to some pre-established harmony, they necessarily bring agreeable feelings in their train?—The experience of all disciplined and profound minds teaches the contrary. Man has had to fight for every atom of the truth, and has had to pay for it almost everything that the heart, that human love, that human trust cling to. Greatness of soul is needed for this business: the service of truth is the hardest of all services.—What, then, is the meaning of integrity in things intellectual? It means that a man must be severe with his own heart, that he must scorn “beautiful feelings,” and that he makes every Yea and Nay a matter of conscience!

The window into Peterson's mind occasioned by this interpretation of this single biblical verse has indeed offered us much insight into his mental machinations, possibly more than the evidence will allow. I intend at some later time to give a fuller account of Peterson and his interpretations of Christianity and Nietzsche. But from what we have seen so far, it is plain that, at the very least, Peterson ought not to be taken simply at face value in any matter of exegesis or theology. He has demonstrated a willingness to interpret, not according to the evidence of philology or historical commentary, but rather by what he personally feels the interpretation must be, and rather than framing this as a personal interpretation, is content to assert that it is the meaning. One wonders how far this fault extends into his other claims (a lot, it turns out).

Epilogue: A Conversation with a Peterson Acolyte

"The Unconventional Compass" is a fairly small YouTube Channel with about 3000 subscribers, run by Josh Rueff. He describes himself as "an entrepreneur and digital marketing coach with a passion for helping people build remote businesses and driving revenue, especially using content, SEO and digital ads" as well as "a writer, Marine Corps vet, dog lover, traveler, fly-fisherman, and plenty of other things nobody cares about." His channel mostly produces self-help-style videos, with his earlier videos offering writing advice or videos about his dogs or fly-fishing. Starting in September 2021, his content shifted to focus mostly on Jordan Peterson, with Peterson's name in most of the titles and his face in most of the thumbnails. Other figures frequently featured are Jung, Nietzsche, and Jesus. Much of what he has to say merely echoes or expands on what Peterson has already said.

In a video titled "Guide To Integrating With Your Shadow - NEW Jordan Peterson Insights & Old + Carl Jung," Rueff gives a five-minute clip from one of Peterson's lectures, then has the following to say (The bolded words appear serially on-screen):

So he [Peterson] covers a few things there and basically says: "You need to become (more or less) a monster with self-control and morals." Right? It reminds me of when he talks about the Bible verse that says "Blessed are the meek for they will inherit the earth. [Matthew 5:5]" He says, correctly, that the Greek word we translate into meek which is πραυς actually has a fuller definition. This word does not mean weakness of any sort, but actually refers to exercising strength under control; demonstrating power without undue harshness. It's what you are when you have a sword and you know how to use it and are courageous enough too [sic] if necessary. But you keep it sheathed unless it's necessary to use it. (That's the example Peterson likes to give) You're a monster, or can become one at the drop of the hat but you keep that part of your psyche holstered mostly, because you're properly integrated [with respect to your shadow].

He offers no sources in the video, but he is clearly drawing from the HELPS entry (which he plagiarizes almost verbatim, notably excluding the mention of "God") and Peterson's repeated comments on this verse. The rest of the video is largely Jung quotes and clips of Peterson, sparsely interspersed with commentary.

I engaged with him in the comments in a fairly lengthy back-and-forth (which I encourage anyone with the patience to read) about what I saw as a failure on his part to do proper research. I have already covered all the claims I made to him in this article (this article is my consummate revenge, in a sense), and I won't do a line-by-line analysis of the debate, so I will only discuss patterns in his responses. In short, he was committed to his conclusion and, to avoid the points and arguments I put forth, he resorted to a number of fallacies which we will discuss below. I could have been clearer that I was approaching this purely philologically, though I repeatedly asked him to offer even a single usage that evidenced his claimed meaning, which he never provided. He also made mention on several occasions of sources for his claim, which never materialized, and it seems clear that his main sources were Peterson, HELPS, and possibly blogs like Sam Whatley's. For him, it wasn't a matter of showing that his interpretation was justified, but only that I couldn't prove to his satisfaction that it was unjustified. As far as he was concerned, the say-so of Peterson was enough to legitimize the claim, and so all he had to do was find some way of dismissing my points.

Argument from Authority: Ultimately, this was the clincher. Despite my trying, I couldn't get him to understand that no person has access to special knowledge on this, and even the experts must base their interpretations on the data. Despite offering much data, it was never enough and he felt content to stick with his preferred authorities. He even went so far as to defend the argument from authority as non-fallacious. This is the effect of seemingly-legitimate authorities making bogus claims: uneducated people see the claims as validated by the imprimatur of the authority and spread them. As I had no authority, what I said was never enough to gainsay his authorities, no matter what evidence or arguments I put forth, and no matter the absence of any arguments or evidence from his authorities. He is simply authoritarian in such matters.
Circular Reasoning/Confirmation bias: By assuming that his interpretation was valid, he read other interpretations through that lens and found compatibility which he saw as justification. This is entirely backward thinking: conclusions should follow from the evidence and should never precede their evidence, unless there is then an attempt at falsification, as in the scientific method. I tried to point out that the idea of "strength" etc. must come from somewhere and that there is no evidence whatsoever for any legitimate place it could possibly come from, but it was pointless. Once he had this interpretation--and liked it--he read it back into whatever data was available. Being based on nothing, his position was unfalsifiable, but to him, this was more a feature than a flaw.
Red Herrings: the conversation was often sidetracked into irrelevancies clearly meant to divert attention away from the point of disagreement. Nietzsche came up, interestingly.
Moving Goalposts/Vagueness: Partway in, he suddenly expressed that he never meant "physical" strength or power (or at least didn't exclusively mean it). Rather, he meant mental or emotional strength. I pointed out that Peterson, with his reference to Abraham's warring, axe murderers, harmfulness, and force is clearly referring to physical prowess. He claimed that anger was a sort of sword, drawing on Aristotle's discussion. I conceded that there may be some sense if it's taken to that metaphorical extreme, but that it should be specified as such given that "strength," unqualified, especially in light of Peterson's comments, would be taken to be physical.

He later came out with a long explanatory comment laying out his thought on it. It contained exclusively those bits of information that could be taken to support his conclusion but failed to include even a single instance of non-confirmatory evidence (that is, it's entirely confirmation-biased). He writes:

Within the context of the following for example, praus seems to mean something like "gentleness" to a subordinate or lower creature - in the scriptures, it's often used in the context of a more powerful person or person of authority correcting someone "lower" in a sense, or disciplining them, and in ancient Greece it was clearly often used to express gentleness with a "lower" creature, or to describe a powerful animal's gentle or mild nature; self control or gentleness despite great power implied. The "gentle" creature spoken of often has great strength, power and/or nobility, like a warhorse, or a god... Above all, this word does NOT mean weak. Or timid. "Gentle, despite having strength or power" - something like that is closer to correct if not completely correct. I believe this is oftentimes and perhaps always a perfectly reasonable translation.

Much of this is simply false. Rueff is right that πραυς does not itself (necessarily) imply weak or timid, and he is right that it can be applied to powerful beings like warhorses. But what he misses is that the word does not itself connote power or strength. (Also, being an adjective, it would mean "gentle," not "gentleness".) Again, that it can be applied to powerful or strong beings doesn't mean that it connotes power or strength, as it could just as well be applied to weak or powerless ones. Rueff claims "in the scriptures, it's often used in the context of a more powerful person or person of authority correcting someone 'lower' in a sense, or disciplining them." But he offers no evidence for this and it's simply not the case, as we have already shown. I tried asking him the clear yes/no question "Can a weak person be πραυς?" (the objectively correct answer is obviously "yes") which he never unambiguously answered. Rueff is just plain wrong that there is any connotation of "...despite having strength or power." This is a bogus addition. It is never a reasonable translation. Granted, beings with strength or power can be πραυς, but so can weak or powerless ones. The word πραυς itself is simply independent of strength and power. It doesn't mean weak, it doesn't mean strong, it means "gentle, humble, meek".

But Rueff does claim he's open to other evidence, though my interactions with him suggest this is not terribly sincere. At any rate, I hope he will read all of this and consider changing his mind. Josh, please leave behind the satanic lies of Peterson and come over to the side of Jesus and of truth, "Then you will know the truth, and the truth will set you free." For the rest of us, you serve as a cautionary tale of the dangers of authoritarian thinking and motivated reasoning. You can do better than Peterson, and I hope you someday do.

He has since blocked me and won't respond to my questions or comments. I tried sending him this blog post but have yet to get any reply or acknowledgment. So much for free speech or the free marketplace of ideas, huh?

Valuation Theory

2021-04-06T09:08:00.000-07:00

Valuation Systems

A valuation system (VS) is any system by which value is assigned to things. That is, the way in which terms like "better" and "worse", "good" and "bad", are given meaning or are understood. For example, in choosing a hinge for a door, one system of saying "hinge A is better than hinge B" is to consider price (cheaper being better, for instance), or resistance to rust, or weight, or color, or size, etc. After all, there is no unqualified way to say "hinge A is better than hinge B", and any statement that does not explicitly state the way in which hinge A is deemed better than hinge B will have some implicit VS.

All VSs have a domain, which is the set of all things which can be valued by that VS. The VS used to compare hinges won't be able to compare the value of microprocessors, or political parties, or cake recipes. It is important to keep in mind the domain of a VS when discussing it. We will denote the domain of VS X as D_X.

Types of Valuation Systems

There are two general sorts of valuation systems:

Comparative Valuation Systems (CVSs): Determines only the ranking of value for the elements of a given, countable set. If X is a CVS and X values A above B, we will write that as \( (A>B)_X\), which we can read as "A is better than B, according to X". Note that CVSs don't have any notion of "good" or "bad", but only "better" and "worse", and possibly "best", if there is some element better than the rest.

A subset of CVSs are Bi-comparative VSs (bCVSs, or C₂VSs), which only rank sets with exactly two elements, either with one better and one worse, or with both equal. If the bCVS has the additional property of being transitive, then the system can be used to impose a partial ordering on the elements of its domain.

Evaluative Valuation Systems (EVSs): Determines the plain value of every element in its domain, like a function. Namely, we can symbolize "the value of A, according to EVS X" as \(V_X(A)\). Without loss of generality, we can take the values assigned to be real numbers. If only order is important, we can take the range to be the numbers in the interval \([-1,1]\). Note that EVSs can have a notion of "good" and "bad", in that we can define "A is bad, according to EVS X" as \(V_X(A)< c \), for some number c, which we can take to be 0. Similar statements can be similarly defined. To keep notation consistent, we will write \((A>B)_X\) iff \(V_X(A)>V_X(B)\), for some EVS X.
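To make the two kinds of system concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the theory: the toy domain of hinges, the name `cheapness`, and the choice to represent an EVS as a dictionary of real values and a CVS as a three-way comparison function.

```python
# Hypothetical EVS "cheapness" over a toy domain of hinges.
# The numbers are invented; they sit in [-1, 1], with 0 as the good/bad cutoff.
cheapness = {"brass": -0.5, "steel": 0.2, "zinc": 0.8}

def value(evs, a):
    """V_X(A): the plain value the EVS assigns to A (KeyError outside D_X)."""
    return evs[a]

def is_bad(evs, a, c=0.0):
    """'A is bad, according to X' iff V_X(A) < c."""
    return value(evs, a) < c

def better(evs, a, b):
    """(A > B)_X iff V_X(A) > V_X(B)."""
    return value(evs, a) > value(evs, b)

def cvs_from_evs(evs):
    """Forget the numbers and keep only the ranking.

    Returns compare(a, b) giving 1 if a is better, -1 if worse, 0 if equal.
    The resulting CVS has 'better' and 'worse' but no notion of 'good' or 'bad'.
    """
    def compare(a, b):
        va, vb = evs[a], evs[b]
        return (va > vb) - (va < vb)
    return compare
```

Note that the passage from EVS to CVS loses information: the comparison function can say that zinc outranks steel, but it can no longer say that brass is "bad".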

Indifferent Extensions

We can also define the indifferent extension of a valuation system X with domain D_X as the valuation system that is identical to X for any elements in D_X, and is indifferent to all other things. More exactly, we can define it for the cases of CVSs and EVSs as follows:

CVSs:
Let \(X\) be a CVS with domain \(D_X\). The CVS \(X'\) is the indifferent extension of \(X\), such that, for any \( a,b \notin D_X\) and \(c \in D_X\), \((a< c )_{X'} \), \((a=b)_{X'}\).

EVSs:
Let \(X\) be an EVS with domain \(D_X\). The EVS \(X'\) is the indifferent extension of \(X\), such that, for any \( a\notin D_X\), \(V_{X'}(a)=0\).
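The two definitions above can be sketched in Python as follows. The representation is an assumption made for illustration: an EVS is a dictionary of values, a CVS is a comparison function returning 1, 0, or -1, and the function names are hypothetical.

```python
def extend_evs(evs):
    """Indifferent extension of an EVS: V_{X'}(a) = 0 for any a outside D_X."""
    def v(a):
        return evs.get(a, 0.0)
    return v

def extend_cvs(compare, domain):
    """Indifferent extension of a CVS, following the definition above.

    Inside D_X the original ranking is kept; any two outside elements are
    equal; an outside element ranks below every element of D_X.
    """
    def compare_ext(a, b):
        a_in, b_in = a in domain, b in domain
        if a_in and b_in:
            return compare(a, b)
        if not a_in and not b_in:
            return 0                 # (a = b)_{X'}
        return 1 if a_in else -1     # (a < c)_{X'} for a outside, c inside
    return compare_ext

# Toy example: a hinge-ranking system asked about a cake recipe.
hinge_values = {"brass": -0.5, "zinc": 0.8}
v_ext = extend_evs(hinge_values)

def hinge_compare(a, b):
    return (hinge_values[a] > hinge_values[b]) - (hinge_values[a] < hinge_values[b])

c_ext = extend_cvs(hinge_compare, set(hinge_values))
```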

Optimal Elements

We can also give meaning to statements like "t is the best element in set S, according to X", in two senses. We can say that t is the optimal element of S according to VS X if, for every element s of S such that \(s \neq t\), then \( (t > s)_X \). We can say that t is an equi-optimal element of S according to VS X if, for every element s of S, \( (t \geq s)_X \). We can also say that "t is the best element in set S, according to set A", for some set A of VSs, if, for each VS X in A, t is the optimal element of S according to X. We might also stipulate that for every VS in A there is an optimal element in S. Similarly for equi-optimal.

If we want to say something like "t is the best element in S" without qualifying it by a VS, it must be the case that all valuation systems agree (or perhaps there is some "best VS" which would deem t optimal, but we will get to that later). Namely, we say that t is the universo-optimal (UO) element of S if, for every VS X for which there is an optimal element in S, t is the optimal element of S according to X. We also can say that t is a universo-equi-optimal (UEO) element of S if, for every VS X for which there is an equi-optimal element in S, t is an equi-optimal element of S according to X. Note that for there to be a universo-optimal element, all relevant VSs must agree: if there is even one VS for which there is a different optimal element than another, then there is no universo-optimal element in S.
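A small Python sketch may help fix these definitions. It is only an approximation under stated assumptions: each VS is a comparison function returning 1, 0, or -1, and "every VS" is replaced by a finite, hand-picked family of systems, since the set of all VSs cannot be enumerated. The helper names and the hinge data are hypothetical.

```python
def from_values(values):
    """Build a comparison function from a dictionary of values (toy helper)."""
    def compare(a, b):
        return (values[a] > values[b]) - (values[a] < values[b])
    return compare

def optimal(compare, S):
    """Return t if t is strictly better than every other element of S, else None."""
    for t in S:
        if all(compare(t, s) > 0 for s in S if s != t):
            return t
    return None

def equi_optimal(compare, S):
    """Return every t in S that is at least as good as every element of S."""
    return [t for t in S if all(compare(t, s) >= 0 for s in S)]

def universo_optimal(systems, S):
    """Optimal element shared by every system in the family that has one.

    Systems with no optimal element in S are ignored, as in the definition.
    Returns None if the remaining systems disagree (or none has an optimum).
    """
    winners = {optimal(c, S) for c in systems} - {None}
    if len(winners) == 1:
        return next(iter(winners))
    return None

S = ["brass", "steel", "zinc"]
by_price = from_values({"brass": 1, "steel": 2, "zinc": 3})   # cheaper is better
by_rust = from_values({"brass": 3, "steel": 2, "zinc": 1})    # rust resistance
by_color = from_values({"brass": 1, "steel": 1, "zinc": 0})   # a tie at the top
```

With this data, `by_price` and `by_rust` each have an optimal element, but they pick different ones, so the family `[by_price, by_rust]` has no universo-optimal element. A single dissenting system is enough.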

Meta-Valuation Systems, Optimal Valuation Systems, and Recommendation

We can also have VSs whose domain includes some subset of the set of all VSs. We can call these meta-valuation systems (MVS). We can also define the set of totally meta-VSs (TMVS), which is the set of all VSs whose domain includes the set of all VSs.
Now, if there is to be some VS that can be called "the best VS", it must be the case that it is UO (or at least UEO) in the set of all VSs. Thus we define:
a VS X is the objectively best VS iff, for every VS Y in the set of TMVSs for which there is an optimal element, X is the optimal element of Y in the set of all VSs.
However, it seems easy to suggest very strongly, if not to prove, that there is no such VS: all it takes is two TMVSs whose optimal elements differ, and such a pair seems very easy to construct. Thus there simply is no such objectively best VS. We can call this the Universo-Optimality Absence Theorem.

Also, we can say that VS A recommends VS B if \((B>A)_A\). We denote this by \(A \rightarrow B\). Clearly A must be an MVS, as it includes the VS B in its domain. The relevance is that, if we hold to VS A, and A recommends B, then we should discard A and take up B instead. We may have some issues if A recommends multiple VSs, but the solution would then be to follow the recommendation that outranks the rest. For example, if \(A \rightarrow B\) and \(A \rightarrow C\), and \((B>C)_A\), then we should choose B, rather than C. However, we will say that a VS A is a consistent recommender if it is the case that if \(A \rightarrow B\), and \(A \rightarrow C\), and \((B>C)_A\), then \(C \rightarrow B\), and it is not the case that \(B \rightarrow C\).
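Recommendation can be sketched in Python under some simplifying assumptions: each meta-valuation system is an EVS named by a string, and the nested dictionary `meta[a][b]` stores the value that system a assigns to system b. The names A, B, C and all the numbers are hypothetical.

```python
def recommends(meta, a, b):
    """A -> B iff (B > A)_A; requires that A values both itself and B."""
    va = meta.get(a, {})
    return a in va and b in va and va[b] > va[a]

def top_recommendation(meta, a):
    """Among the systems A recommends, return the one A ranks highest (or None)."""
    candidates = [b for b in meta[a] if recommends(meta, a, b)]
    if not candidates:
        return None
    return max(candidates, key=lambda b: meta[a][b])

def is_consistent_recommender(meta, a):
    """If A -> B, A -> C and (B > C)_A, then require C -> B and not B -> C."""
    recs = [b for b in meta[a] if recommends(meta, a, b)]
    for b in recs:
        for c in recs:
            if meta[a][b] > meta[a][c]:
                if not recommends(meta, c, b) or recommends(meta, b, c):
                    return False
    return True

# Toy family in which all three systems rank B highest, then C, then A.
agree = {
    "A": {"A": 0, "B": 2, "C": 1},
    "B": {"A": 0, "B": 2, "C": 1},
    "C": {"A": 0, "B": 2, "C": 1},
}
# Same family, except that C now prefers itself to B.
disagree = dict(agree, C={"A": 0, "B": 0, "C": 1})
```

In the first family, A recommends both B and C, B outranks C, and C itself recommends B, so A is a consistent recommender. In the second, C no longer recommends B, so following A's lower-ranked recommendation would not lead back to its top one.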

Antagonist Valuation Systems and the Universo-Optimality Absence Theorem

Take any VS \(X\). We define the antagonist valuation system to \(X\) (denoted \(X^A\)) as follows: If \(X\) is a CVS, and \( (P>Q)_X\), then \( (P<Q)_{X^A}\). Similarly, if \(X\) is an EVS, then \(V_X(P)=-V_{X^A}(P)\). It is clear that if any VS is specifiable, its antagonist will likewise be specifiable merely by reversing all the valuations. It also clearly follows that a VS and its antagonist can never recommend the same thing, as a VS and its antagonist never agree (except in the case of indifference). Thus, for a given valuation, however many VSs can be found that agree with that valuation, precisely the same number of antagonists can be found or formed which disagree with the valuation. It follows that there cannot possibly be any universo-optimal valuation system, and thus the theorem is proven.
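The antagonist construction is easy to sketch for an EVS. As before, the dictionary representation and the toy values are assumptions made for illustration only.

```python
def antagonist(evs):
    """X^A for an EVS: V_{X^A}(P) = -V_X(P) for every P in the domain."""
    return {p: -v for p, v in evs.items()}

def better(evs, p, q):
    """(P > Q)_X iff V_X(P) > V_X(Q)."""
    return evs[p] > evs[q]

def best(evs):
    """Element with the highest value (assumes the maximum is unique)."""
    return max(evs, key=evs.get)

x = {"brass": -0.5, "steel": 0.2, "zinc": 0.8}
x_a = antagonist(x)
```

Every strict preference of `x` is reversed in `x_a`, the antagonist of the antagonist is the original system, and the two systems pick different best elements whenever the values are not all equal.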

Some Implications for Morality

Morality is always associated with valuation. Specifically, every moral system corresponds to a valuation system, which we may call a moral valuation system. The domain of a moral valuation system would be decisions made in response to a scenario (the scenario may be implicit, but it is always there. Murder, for instance, is incoherent unless there is someone to murder and some way to murder them). For the same scenario, one decision may be better than another, or one may be good while another bad. A decision (or subset of decisions) may be called "obligatory" if that decision is the only one that is good while the complement is bad. A decision may be supererogatory if the decision is better than other good (or not-bad) decisions. Similarly, a moral system may recommend another moral system if the first system deems the second better than the first. However, by the universo-optimality absence theorem, there cannot be any one moral system universally judged better than all others. As no valuation can be made apart from some (implicit) valuation system, and there is no intrinsically or universally preferred valuation system, there cannot be such a thing as "objective moral value". Even a deity could not have access to something like that. Nothing, not even a deity, could be called "good" except in reference to a VS which must of necessity be, at base, arbitrary (why pick out that VS instead of its antagonist? Or that VS rather than any other VS? The answer cannot possibly be that the VS is "the best" as that has no meaning apart from some VS).

Thus, there may be three general projects for moral philosophy:

1) Descriptive ethics: what moral valuation system do people use? Can we form a description that fully/maximally captures the way the person/population actually values things?

2) Educing and following recommendations: given a certain VS, what VS does it recommend? Can we iterate this until we have a self-recommending VS?

3) Axiomatics/foundations: what minimal set of axioms sufficiently characterize a VS? How can we simplify a VS so that it is maximally described with the smallest set of principles? Can this be algorithmized in some way? What information is morally relevant/irrelevant to a given VS?

A Brief Discussion of a YouTube Debate (Which is Really Mostly About Some Finitist Arguments)

2020-09-29T14:13:00.003-07:00

I would like to discuss a case study in the contemporary internet apologetics and counter-apologetics ecosystem, namely an ongoing YouTube debate between Cameron Bertuzzi (CB) of Capturing Christianity, and Steven Woodford (SW) of Rationality Rules. Just the names of these channels should give you a fair idea of the sort of discourse in store. Unsurprisingly, this debate is on no more precise a topic than the existence of God. CB has given the opening argument, to which SW has replied and CB has recently given his response in turn.

The videos can be found here. I wish mainly to examine this debate as a window into what apologetics can (and often does) look like today online, and to some extent, in certain academic circles.

Cameron's Main Argument

CB states that he will be defending a version of the Kalam Cosmological argument (KCA). This is a favorite argument of the well-known apologist William Lane Craig (WLC), who almost invariably states it as follows:

1) Everything that begins to exist has a cause.
2) The universe began to exist.
3) Therefore the universe has a cause.
4) If the universe has a cause, that cause is God.
5) Therefore, God is the cause of the universe (and thus exists).

The premises are usually confined to 1-3, with the others being less formally explored once (3) is established. Once arriving at (3), WLC will proceed into what the cause of the universe must be like, arguing that it must be spaceless, timeless, powerful, etc. and thus worthy of being called God. This is not too dissimilar from an argument of St. Thomas Aquinas, who argues to divine properties in a similar sort of way, thus the ground there is fairly well-trod.

CB, however, does not offer the argument in this exact way. After going over some of what he sees as historical inaccuracies on SW's part (and alleging that SW consulted only the Wikipedia article), he instead offers the following:

1) There is a First Cause.
2) If there is a First Cause, then God exists.
3) Therefore, God exists.

Cameron makes a point of the fact that this is a valid syllogism (stating, for anyone having trouble keeping up, that it follows the form of modus ponens). In his response, SW points out that a valid syllogism is really only what is expected of anyone who understands the basics of logic and thus is not something to be so stressed. CB retorts that validity is crucial for an argument. All I'll say is that validity is akin to showing up to a duel with a pistol rather than a broom, and is really hardly something that should be brandished as worthy of much attention or emphasis.

Let us make some remarks on the argument as given. It relies on the notion of a First Cause (FC), which will presumably function in some metaphysically important way. Moreover, it is arguing through causes. Now, the concept of a cause is ubiquitous in daily locution, and is used in all manner of diverse and mutually incompatible ways. Language that sees much use in many forms often acquires a breadth of meaning, and as such is notoriously difficult to work with in a philosophical argument, which is ideally as precise and unambiguous as possible. That is, it is prone to equivocation, ambiguity, concept creep, etc. It's not even clear how much validity such a concept has, at least in its most basic form. Perhaps we can one day come to an analysis of Edward Feser's Thomistic arguments and the issues that arise there, but those likewise rest on this idea of causation, which is somehow built from everyday usage and then ends up resembling it very tenuously, or not at all (I will argue). Thus, we may already have a very good idea of what sort of argument CB will offer and likely what issues it will encounter.

CB explains what he means by a First Cause: "As we look back further and further into the past, we'll eventually arrive at something that has no prior causes. In other words, there is an uncaused starting point." Thus the First Cause can be read as an Uncaused Cause. By way of analogy, he says that there must be a "first domino" in the cascade of dominos that is the past series of causes. Seemingly as a way to calm the antagonistic viewer, he assures us that this First Cause could still, at this point in the argument, be considered potentially to be something that isn't God, perhaps some physical object or state of affairs.

Cameron's First Premise

CB states that he would like to get at an argument for his first premise through an argument for causal finitism, which is just the position that every event has a finite causal history. That is, for all events, there is some finite number N of causes that lead to that event. This is defended by the philosopher Alexander Pruss, known for offering and defending such arguments. CB claims that causal finitism would lead to his premise (1) being justified. We will return to this claim later.

For a sort of warm-up, CB offers a version of Thomson's Lamp. As CB describes it, the lamp is programmed to switch positions (if off, turn on; if on, turn off) at all of {11:00 PM - \(2^{-n}\) hours}, for n={0,1,2,3,...}, that is, at 10:00PM, at 10:30PM, at 10:45PM, etc. Assuming the lamp was off at 9:59PM, what is the state of the lamp at 11:01 PM? It seems that there can't be any good reason to say it is off or that it is on, as that is equivalent to saying infinity is even or odd, which seems absurd. There has, of course, been much argumentation on this question, but CB takes the difficulty in answering it as evidence that there can't be an infinite number of switches. This is not a common conclusion drawn by philosophers examining this problem, and this serves as an insight into the issues to come. For what it's worth, I think the solution is simple: there is not sufficient information to determine the answer, or, stated another way, there is no contradiction either way. We can be assured the lamp is either off or on, but each is compatible with the problem as given. It is both true that \(\frac{\infty}{2}=\infty\) and that \(\frac{\infty-1}{2}=\infty\), and thus infinite quantities can function as both even and odd.
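To make the parity point concrete, here is a minimal sketch of the lamp restricted to finitely many switches. The function names are my own illustrative choices, and the truncation to a finite number of switches is an assumption of the sketch, not part of CB's setup. It only shows that the state after N switches flips with the parity of N, so the finite stages never settle on one answer.

```python
from fractions import Fraction


def switch_time(n):
    """Hours before 11:00 PM at which the n-th switch occurs (n = 0, 1, 2, ...)."""
    return Fraction(1, 2 ** n)


def state_after(num_switches, initially_on=False):
    """Lamp state (True = on) after a finite number of toggles.

    Assumption of this sketch: only finitely many switches are performed.
    """
    return initially_on ^ (num_switches % 2 == 1)


if __name__ == "__main__":
    for n in range(6):
        # The switch times crowd toward 11:00 PM; the state keeps alternating.
        print(n, switch_time(n), state_after(n + 1))
```

Nothing in the finite stages picks out "on" or "off" as the state at 11:01 PM, which is all the claim that the problem is underdetermined requires.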

The Grim Reaper Argument

CB offers a rendition of one of Pruss' arguments, namely the Grim Reaper Argument (GRA). Summarizing, the argument goes like this:

A Grim Reaper (GR) is an entity that has certain god-like powers, only to the extent that it can instantaneously kill someone at a given specific time, if they are alive at that time (it is assumed that death and life are perfectly well defined with no intermediate states, etc.). We suppose that, for all the numbers \(\{0,1,2,3,4,...\}\), there is a corresponding GR (GR(n)) which is programmed (fated?) to kill a certain Fred at \(2^{-n}\) hours after 12:00PM. That is, GR(0) will kill at 1:00PM, GR(1) will kill at 12:30PM, GR(2) at 12:15PM, etc. Fred is alive at 11:59 AM.

The question is, then, is Fred alive at, say, 1:01PM? The argument goes that, clearly, he can't survive past 1:00PM, as if he's alive by then, GR(0) would kill him. But which GR kills him, then? If you say GR(n) kills him, that would mean that GR(n+1) should have killed him instead, as that GR's allotted time came before GR(n)'s, and if Fred was alive for GR(n) to kill, Fred would have been alive for GR(n+1) to kill him. Thus, we cannot point to which GR killed Fred. This inability to point to the guilty GR is offered as a true paradox (as CB puts it, Fred is dead and also not-dead) that shows the impossibility of a past-infinite causal chain. CB claims that the fatal flaw in the GRA setup is the infinite number of GRs.
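The step "GR(n+1) should have killed him instead" can be sketched as follows. The names `kill_time` and `earlier_reaper` are hypothetical helpers of mine, used only to illustrate the ordering of the appointed times.

```python
from fractions import Fraction


def kill_time(n):
    """Hours after 12:00 PM at which GR(n) is scheduled to act."""
    return Fraction(1, 2 ** n)


def earlier_reaper(n):
    """Index of a reaper whose appointed time is strictly before GR(n)'s."""
    return n + 1


if __name__ == "__main__":
    for n in range(5):
        m = earlier_reaper(n)
        # Every candidate is pre-empted by another reaper, yet all times are after noon.
        print(f"GR({n}) at {kill_time(n)} h is preceded by GR({m}) at {kill_time(m)} h")
```

Whatever index is proposed, the sketch returns a reaper with an earlier appointed time, while every appointed time remains strictly after 12:00PM.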

CB then proceeds as follows:

(4) The GR scenario of the GRA is impossible.
(5) If the GR scenario of the GRA is impossible, then causal finitism is true.
(6) Thus, causal finitism is true.
(7) If causal finitism is true, then there is a First Cause.
(8) Thus, there is a First Cause.

A (Lengthy) Response to the Grim Reaper Argument

So let's examine this argument, beginning with (4). The impossibility of the scenario is clearly derived from the difficulty in naming the specific GR that killed Fred. We can see this as being the same difficulty in answering "what is the lowest element in the set {1/1,1/2, 1/3,1/4,...}?" The answer is that the set does not contain a minimum (though it has an infimum of 0). This is equivalent to the question "What is the largest member of the set {1,2,3,...}?" to which the answer is that there is no largest member: we can find an element of the set exceeding any given value.

These questions and their replies are instructive. Surely the metaphysical properties of the GRs are related to the mathematical way they are constructed and placed. The GR "paradox" arises from the fact that there is no smallest number greater than 0: for any given x>0, we can form x/2, and as x>x/2>0, x isn't the smallest. This was well-known to the ancient Greeks, even: each time I move my hand a distance D, I must move it through all the distances D/2, D/3, D/4, D/5, etc. Thus, there is no first point for my hand to move to, as it must move to the preceding point first. There is no way to begin moving my hand, then, as it's impossible to specify the first point to move to. Does this mean movement is impossible? Most philosophers (and people generally) don't think so. So how can it be resolved?

There are several solutions that can be proposed, and we will explore a few.

One is that Fred was killed, but not by any nameable GR, or perhaps by a GR not strictly bearing a natural number. This may be referred to as "the infinity-th GR" or "the omega-th GR" (omega being the smallest infinite ordinal). This need not be taken to refer to a GR with the name or index of "infinity" or "\(\omega\)", but rather as a way of saying "The GR beyond any GR one names". Though not rigidly specific, this does always have a referent: for any given GR, there is a GR that came before it. If we lined up all the GRs in ascending index, and asked each if the guilty GR was before (of smaller index) or after (of larger index) him, all would answer "after". The inability to single out a specific slayer may seem problematic, but that is merely the byproduct of dealing with the infinite. Given that "the largest natural number" is meaningless on its face, we shouldn't be surprised when an attempt to find it in a more (meta)physical context likewise fails to be meaningful.

Another potential solution is to introduce a temporal metaphysical principle that precludes the difficulty. As a motivating example, suppose a lamp is off at 12:00PM and it turned on at exactly 1:00PM. What was the last time that it was off? Either the question can't be answered (for reasons like those discussed above), or else we must bring up some metaphysical notions of time. Surely we would like to answer that the last time it was off was at 1:00PM, even though, explicitly, it was on at that time. Perhaps we say that transitions may be counted in both, or in neither, in a meaningful sense.

In the GR case, we may say that temporal sequences must contain their infima/suprema, or that they de facto do. That is, in terms of temporal metaphysics, there either must be or de facto is a GR at 12:00PM, and it is that GR (which may be labeled by its time, or perhaps by "infinity" in the context of index) that killed Fred. This sort of reasoning may apply to any sort of "completed infinite" set.

Suppose a Grim Apparition is, by definition, an opaque planar phantasm of the size of a normal human (say 6 feet tall and 3 feet wide) positioned and facing in a given location and direction. Now suppose that there is an infinite set of these, each indexed by natural number n. All are facing me, as I look due North, but the nth one is \(5+2^{-n}\) feet away. As the apparitions are opaque, I can only see one, the rest being hidden behind the ones preceding it. Which one do I see? It seems that I can't name the one that I see, supposing I do in fact see one. Does this prove that all infinite sets are impossible? Say this is not impossible, and I merely see some apparition but don't know its index, as all are indistinguishable apart from their positions. What I do is command each to form a number on itself (they may do this) that I may distinguish and count them. The first one forms a 1, then moves 5 feet East. The second, infinitesimally behind it (is it not even in the same position?) forms a 2, and steps to the East, behind the first. This continues for some time, perhaps some googolplex apparitions get a number and move aside. But it would seem I have not made any progress: the apparitions are precisely as far from me as they ever were (as \(2^n\) times an infinitesimal is still an infinitesimal, for any finite n). So I can remove any (finite) number and not make any perceptible progress. Effectively, there are arbitrarily many apparitions just stacked at (or just behind) 5 feet North of me.

This suggests another possibility: that "completed" infinite sequences cannot have all their members named. The natural numbers can be used to index arbitrarily deep, but cannot provide an index to all of them. They thus demand some sort of extension in order to be applied to "completed" cases, or cases where the order is reversed. The following may serve as a useful example: it is easily provable (e.g. by induction) that

\[\frac{1}{2^1}+\frac{1}{2^2}+\frac{1}{2^3}+...+\frac{1}{2^N}=1-\frac{1}{2^N}\]
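For completeness, the induction can be sketched as follows (the presentation of the steps is mine). The base case \(N=1\) is just \(\frac{1}{2}=1-\frac{1}{2}\). For the inductive step, assuming the identity holds for N, adding the next term gives

\[\left(1-\frac{1}{2^N}\right)+\frac{1}{2^{N+1}}=1-\frac{2}{2^{N+1}}+\frac{1}{2^{N+1}}=1-\frac{1}{2^{N+1}}\]

which is the same identity with N+1 in place of N.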

As this is a finite sum, we may permute the order of addition arbitrarily without changing the sum (and, as the corresponding infinite series converges absolutely, rearranging its terms likewise doesn't affect convergence or the sum). Thus, let's write:

\[\frac{1}{2^N}+\frac{1}{2^{N-1}}+\frac{1}{2^{N-2}}+...+\frac{1}{2^1}=1-\frac{1}{2^N}\]

This clearly follows from the first. Likewise, we can prove that

\[\frac{1}{2^1}+\frac{1}{2^2}+\frac{1}{2^3}+...+\frac{1}{2^n}+...=1\]

That is, the sum of the reciprocals of 2 to the power of each natural number is 1. But what happens if we try to reverse this expression, as we did with the first? It's not clear that we meaningfully can. We could write something like:

\[\frac{1}{2^\infty}+\frac{1}{2^{\infty-1}}+\frac{1}{2^{\infty-2}}+...+\frac{1}{2^2}+\frac{1}{2^1}=1\]

But this seems like an abuse of notation, to some extent, though it does capture the sense of the Grim Apparitions scenario. We could merely write it as

\[...+\frac{1}{2^3}+\frac{1}{2^2}+\frac{1}{2^1}=1\]

But then this seems not to get at the "reversal" we had in mind, at least not fully (also an ellipsis at the beginning of an expression seems at the very least quite difficult to interpret). Each expression expresses something truthful, and yet seems not to fully capture it. Is there merely a deficiency in language or description? Ought some new mathematical way of description be formulated to fill this gap, if indeed there is a gap? That doesn't seem altogether wrongheaded. As \(\omega\) often represents the smallest infinite ordinal, perhaps we can signify the largest finite ordinal by \(\psi\), for obvious reasons. Then \(\psi+1=\omega\) in a certain meaningful sense, though this results in passing from finite to infinite. We can then write:

\[\frac{1}{2^\psi}+\frac{1}{2^{\psi-1}}+\frac{1}{2^{\psi-2}}+...+\frac{1}{2^2}+\frac{1}{2^1}=1\]

in a perfectly meaningful way, the notation making clear that the sum is across all natural numbers. This then would give a clear way to label the guilty GR. Granted this is mathematically speculative and dubious. But then so are such extensions to the real numbers as the hyper-reals and surreal numbers, both of which would label the guilty reaper as "\(\omega\)".

Yet another potential avenue could be to examine the set of GRs as a whole, or perhaps just the concept of "the limit of GR(n) as n increases". Recall that we defined a GR as an entity that has certain god-like powers, only to the extent that it can instantaneously kill someone at a given specific time, if they are alive at that time. Given such a vague notion (that is, unless this is further specified), it is altogether arguable that the set or limit of GRs is itself a sort of GR, and one that kills Fred at 12:00PM (or, what is metaphysically the same thing, at a time infinitesimally removed from 12:00PM). Thus, the guilty GR could be meaningfully said to be an entity of this sort.

The seemingly counterintuitive or absurd is merely what one may run into when dealing with the infinite. All it means is our intuitions often break down when approaching the infinite. Is this a surprise? We don't have much contact with the infinite, in all its various forms, in everyday life, after all. At the end of the day, arguments against the infinite seem always to make that clear: the main reason they give for outlawing the infinite is that the infinite is difficult and gives rise to difficulties. Rather than working top-down from intuitions, an alternate (if not strictly superior) approach is to work bottom-up: that is, we would use intuitions to build the fundamental framework and derive consequences of that framework. As long as the consequences are compatible with the underlying fundamental intuitions, we would simply go with the derived consequences, any resulting top-down counter-intuitiveness notwithstanding.

As an example, the notion that the part cannot ever be equinumerous to the whole certainly seems intuitive and holds true for all finite wholes and parts, but it fails in the case of the infinite set of whole and even numbers. That doesn't mean we discard the idea of an infinite number of whole and even numbers (some may disagree), but rather that we discard or qualify the intuition. Surely not all intuitions are valid, after all, and it is only through examination and testing that we can discern the valid ones from the invalid ones.
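The equinumerosity of the whole numbers and the even numbers comes down to a one-to-one pairing, which can be sketched as below. The names `to_even` and `from_even` are my own, and the check over a finite range is only an illustration of the pairing, not a proof about the infinite sets.

```python
def to_even(n):
    """Pair the whole number n with the even number 2n."""
    return 2 * n


def from_even(m):
    """Recover the whole number paired with the even number m."""
    if m % 2 != 0:
        raise ValueError("m must be even")
    return m // 2


if __name__ == "__main__":
    for n in range(5):
        print(n, "<->", to_even(n))
```

Each whole number gets exactly one even partner and vice versa, even though the even numbers are a proper part of the whole numbers.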

The Rest of the Argument for the First Premise

Getting at long last back to the argument, let's see how the GR argument would even fit into the place where CB uses it. Having cast serious doubt on (4), let's now examine (5), which is "If the GR scenario of the GRA is impossible, then causal finitism is true." As he doesn't offer any independent argument for (5), CB seems to take it as obvious, but is it? First of all, it is worth noting that in the GRA, there is no infinite chain of causes. The cause of any GR killing Fred is only that Fred is alive when that GR's appointed killing time arrives. In a weak sense, Fred's being alive is "caused" by previous GRs not killing him, but that seems incorrect in a major way: absences are now causes of lack of change? Is a window remaining unbroken in any meaningful sense caused by the absence of rocks thrown at it? That seems both silly and metaphysically incorrect. The only way that the GRA could support causal finitism is by supporting a more general notion of finitism which would then be applied to the case of causes. Thus the argument is more accurately framed as follows:

(9) If the GR scenario of the GRA is impossible, then finitism over class X is true.
(10) If finitism over class X is true, then causal finitism is true.
(11) Therefore causal finitism is true.

Let's assume class X merely includes causal chains as a subset, rendering (10) unarguably true. But what of (9)? The GRA is merely one instance of a type of infinite set or sequence. How could one case imply that a much broader set of cases shares the same property? As a rule, this is not the case: P being true of an element rarely serves to show that P is true of the whole set (not even the composition fallacy partakes of this!). So there must be yet another line of reasoning implicit in the argument, namely:

(12) If a case of non-finitism of type T is impossible, then finitism over class X is true.
(13) The GR scenario of the GRA is a case of non-finitism of type T.
(14) Therefore, if the GR scenario of the GRA is impossible, then finitism over class X is true.

But what could the type T be that would render (12) true? The only possibility is that it is in some way equivalent to "belonging to the class X, of which causal chains are a subset". However, this seems hopeless for the finitist: the class X/type T must be such as to include the GR scenario and causal chains generally, but this connection seems simply nonexistent, as the GR scenario doesn't include infinite causal chains at all. We may conclude from the GR scenario that, perhaps, scenarios that rely on identifying a "largest natural number" are impossible (as this is the singular difficulty of the GR scenario, evidenced by its being resolved if such a concept is given a referent). But beyond this, there is nothing standing in the way, as we go into below.

Something worth noting, as a general metaphysical principle, is that impossibilities are, well, impossible. That is, an impossible sort of thing is not merely sometimes impossible, but rather always so, i.e. necessarily. Married bachelors aren't impossible only in certain circumstances, but rather in all circumstances. This seems to follow directly from the principle, valid in S5 modal logic (where it is usually called axiom 4), that "necessarily X" implies "necessarily, necessarily X". Given that "necessarily not-X" is equivalent to "not possibly X," it follows that "not possibly X" implies "necessarily, not possibly X." Let us then make the following tiny variation to the GR scenario, with the addition of a single GR (let's call it GR(-1)) which is to kill Fred at exactly 12:00PM. Now we can easily answer which GR killed Fred: it's GR(-1). The difficulty disappears. But we still have an infinite number of GRs! Wasn't it allegedly the infinity of GRs that was at issue? How come we still have infinitely many GRs but now with no paradox? The answer is clear on even a cursory consideration: now there is a well-defined "earliest GR", and it was the absence of this that was the cause of the difficulty in the original GR scenario. This serves as a conclusive proof that it is only a circumstance that can arise with infinity that was at issue, rather than the infinity itself. We can distinguish between infinite scenarios that are and aren't problematic, and thus it cannot possibly be the case that all infinities are impermissible.
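The effect of adding GR(-1) can be illustrated with a small sketch. The helper names are mine, and since a program can only inspect finitely many reapers, the sketch works with truncated sets of indices, which is an assumption of the illustration rather than part of the scenario.

```python
from fractions import Fraction


def kill_time(n):
    """Hours after 12:00 PM for GR(n); GR(-1) is the added reaper at exactly 12:00 PM."""
    return Fraction(0) if n == -1 else Fraction(1, 2 ** n)


def earliest(indices):
    """Index of the reaper with the earliest appointed time among the given indices."""
    return min(indices, key=kill_time)


if __name__ == "__main__":
    for cutoff in (10, 100, 1000):
        # Without GR(-1) the answer drifts with the cutoff; with GR(-1) it is stable.
        print(cutoff, earliest(range(0, cutoff)), earliest(range(-1, cutoff)))
```

Without GR(-1), the "earliest reaper" is always just whichever index the truncation happens to stop at; with GR(-1) included, the answer is the same no matter how many reapers are considered.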

Here's a simple demonstration that you can even do at home that shows that it's not the infinite that's the problem, but rather the scenario in which it arises. It is easy to show that the fraction \(1/99\) has the decimal expansion 0.01010101... My challenge is simply this: write the digits 0 and 1 on a piece of paper. Now underline the digit that is the last decimal digit of \(1/99\) (if you don't think it's either of these, feel free to list out all 10 decimal digits and then underline the correct one). Can you do it? I'm quite certain you can't, and simply for the reason that the number has no final decimal digit. Suppose I define a p-rock as any solid rock larger than 1 inch in each dimension that has the last digit (excluding any trailing 0s) of the number p carved on its surface. Some p-rocks are entirely possible: a \(p=1/2\)-rock is possible. However, a \(p=1/99\)-rock is not possible. Thus p-rocks generally are neither all possible nor all impossible: it depends on the p of the p-rock. Suppose I say "Imagine a world with a \(p=1/99\)-rock...". At that point, it can be assured that such a world is impossible, as it includes an impossible object. It is not different than supposing a world with a married bachelor, and concluding that such a world is impossible. This serves merely to highlight an important point: among a class of things related to the infinite, some may be possible, while some may not be: the impossibility of some of them doesn't prove the impossibility of all of them.
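The difference between \(p=1/2\) and \(p=1/99\) can be checked by long division. The sketch below is mine: `decimal_digits` and `last_nonzero_digit` are hypothetical helper names, and the second function assumes a proper fraction (0 < numerator < denominator).

```python
def decimal_digits(numerator, denominator, count):
    """First `count` digits after the decimal point, computed by long division."""
    digits = []
    remainder = numerator % denominator
    for _ in range(count):
        remainder *= 10
        digits.append(remainder // denominator)
        remainder %= denominator
    return digits


def last_nonzero_digit(numerator, denominator):
    """Last digit of a terminating expansion, or None if the digits repeat forever.

    Assumes 0 < numerator < denominator (an assumption of this sketch).
    """
    remainder = numerator % denominator
    seen = set()
    last = None
    while remainder != 0:
        if remainder in seen:
            return None  # a remainder recurred, so the digits cycle without end
        seen.add(remainder)
        remainder *= 10
        digit = remainder // denominator
        if digit != 0:
            last = digit
        remainder %= denominator
    return last


if __name__ == "__main__":
    print(decimal_digits(1, 99, 8))
    print(last_nonzero_digit(1, 2), last_nonzero_digit(1, 99))
```

For \(1/2\) the division stops and there is a digit to carve; for \(1/99\) the remainders cycle, so there is nothing for the function to return, just as there is nothing to carve on the rock.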

Another quick illustrative example is to consider the infinite sequences {1,1,0,0,0,0,...,0,0,0,...} and {1,0,1,0,1,0,...,1,0,1,0,...}. The infinite is equally present in both cases. And yet, if we ask the question "what is the index of the last '1'?" we can answer simply "2" in the case of the first, while there is no such answer in the second. Clearly, then, it is not that the sequence is infinite that causes the difficulty, even if its being infinite allows for the difficulty to arise: the infinite makes possible that certain questions won't have answers, but it doesn't guarantee it. If it were important that we answer such questions of infinite sequences, we could merely restrict ourselves to those of the first sort and outlaw those of the second, or perhaps allow for such answers as "infinity".
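A sketch of the same contrast, looking at ever longer finite prefixes of the two sequences. The names `seq_a`, `seq_b`, and `last_one_in_prefix` are mine, and indices are taken to start at 1, matching the answer "2" above.

```python
def seq_a(i):
    """The sequence 1, 1, 0, 0, 0, ... (1-based index)."""
    return 1 if i <= 2 else 0


def seq_b(i):
    """The sequence 1, 0, 1, 0, ... (1-based index)."""
    return 1 if i % 2 == 1 else 0


def last_one_in_prefix(seq, length):
    """Largest index i <= length with seq(i) == 1, or None if there is none."""
    for i in range(length, 0, -1):
        if seq(i) == 1:
            return i
    return None


if __name__ == "__main__":
    for length in (10, 100, 1000):
        print(length, last_one_in_prefix(seq_a, length), last_one_in_prefix(seq_b, length))
```

For the first sequence the answer stabilizes at 2 however long the prefix; for the second it keeps moving outward with the prefix, so no index ever qualifies as "the last".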

From Causal Finitism to a First Cause?

However, there is still the matter of CB's premise (7): does causal finitism show there is a First Cause? Here is a scenario in which causal finitism holds but there is no universal First Cause: supposing finitism generally doesn't hold, let's imagine an infinite array of dominos separated by one unit (an inch, say), and we will specify the locations of the dominos as though they were in a Cartesian plane. There are dominos at all of (m,n) where m is any integer {...,-2,-1,0,1,2,...} and n is any non-negative integer {0,1,2,3,...}. The dominos are oriented so that the domino at (m,n) can only be knocked over by the one at (m,n-1) and will then proceed to knock over the domino at (m,n+1). It takes one second between a domino getting hit and striking the next one. This leaves all the dominos at (m,0) free of any domino to knock them over. However, there is also an infinite set of demons D(m) corresponding to each column of dominos. The Demon D(m) knocks over the domino at (m,0) at precisely the time m, measured relative to the time at which the domino at (0,0) was knocked over. Thus, the domino at (m,n) gets knocked over at exactly the time m+n, and its causal chain was started at time m. As all of these chains are finite, causal finitism holds. However, there is no First Cause for all the dominos: each column has a separate "First Cause", while there is no single First Cause for all domino falls. This demonstrates, then, that CB is making a quantifier shift fallacy: "all causal chains have a first cause" doesn't imply that "there is a (singular) first cause of all causal chains" (confer "all people have a mother" vs. "there is a (singular) mother of all people").
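The domino array can be sketched in a few lines. The function names and the tuple encoding of demons and dominos are my own illustrative choices; the times and the structure of each chain follow the description above.

```python
def fall_time(m, n):
    """Time (in seconds) at which the domino at (m, n) is knocked over."""
    return m + n


def causal_chain(m, n):
    """Causal history of the domino at (m, n): demon D(m), then dominos (m, 0) .. (m, n-1)."""
    return [("demon", m)] + [("domino", m, k) for k in range(n)]


def first_cause(m, n):
    """The first element of the chain leading to the domino at (m, n)."""
    return causal_chain(m, n)[0]


if __name__ == "__main__":
    print(fall_time(3, 4), len(causal_chain(3, 4)), first_cause(3, 4))
    # Each column starts from its own demon, and for every column there is an earlier one.
    print(first_cause(0, 5), first_cause(-1, 5), fall_time(-1, 0) < fall_time(0, 0))
```

Every chain has finite length n+1 and its own first element, but the first elements differ from column to column, and no column's starting time is the earliest.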

Thus (4), (5), and (7) have been called into serious doubt, which leaves (1) quite unsupported. This is obviously a grave issue for CB's argument.

Cameron's Second Premise

Going on, CB proceeds to argue in support of his second main premise, namely "If there is a First Cause, then God exists." In support of this, he begins by inviting us to consider what distinguishes the caused and the uncaused. Despite correctly observing that we have no experience of any such uncaused things (thus rendering almost all his argument moot), he proceeds along the standard apologetics story of some hikers who find a strange object in the woods. The usual apologetics tack (following WLC) is to say that no matter how strange the object (a glowing orb, say) is, we nevertheless would be incorrect to conclude that it's uncaused, and that would be independent of its particular size or sort, such that even if it were the size of the universe, we would still be warranted in concluding it has a cause.

CB proceeds along these lines at some (tiresome) length, then stressing that "being very different" is not in itself a relevant difference for concluding that something is uncaused. He (again tiresomely) considers then eliminates such features as shape, size, color, and power (oddly referring to battery wattage or FLOPS rather than the more abstract potency), before arriving at his point, which is that limits are how we can distinguish the caused vs. the uncaused. He argues that we understand that rocks, people, cars, etc. have causes because they have limits, and "limits have causes". As the First Cause has itself no cause, he argues, the First Cause must be unlimited, and unlimited in the usual maximal-god ways (e.g. omnipotent, omniscient, etc.).

It's not clear where even to begin with this. CB hasn't given much specificity to the notion of what a cause even is. The causes of the first premise seem to be efficient causes, whereas the causes of the second are pretty clearly formal causes. Aristotelians are often happy to lump these together, but it's worth keeping a solid logical distinction between them. Suffice it to say that the logic of the defense of the first premise can't be used to substantiate the second without transgressing into equivocation. Implicitly, CB argues that something having a limit implies that it has some cause in the sense of not being the first part of a causal chain. Is this true? It's easy to imagine a violation of this, perhaps an eternal infinite void, or a stone forever floating alone in such a void, or perhaps an inferior deity that is not maximal but still functions as a first cause. Is this not a refutation of CB's principle? It seems perfectly coherent that the first element of the universe's causal chain (supposing there is one) has limits: try to imagine it and you'll likely find that you can with almost no difficulty. (You'll notice that many world religions think of a highly limited and imperfect being causing the universe. If nothing else, this is some amount of evidence for the conceivability of it.)

In fact, CB is arguing from "if caused then limited" (using as evidence all manner of everyday examples) to then illicitly conclude "if limited then caused". This doesn't follow logically, of course, but CB seems to want to argue it inductively. Induction has notorious issues (just because every swan we've seen thus far is white doesn't mean all swans are white, after all), but then applying induction here seems to open quite the can of worms. From experience, all limited effects have limited causes (in fact, there is often a certain proportionality: larger causes generally have larger effects), thus unlimited causes would have unlimited effects. Can something perfect produce something imperfect? From our experience, all imperfect effects have imperfect causes. Very arguably producing an imperfect effect would seem to render the cause itself imperfect. Perhaps a better case could be made if we all lived in Eden, but we clearly don't (arguments from the high imperfection of the world can go here).

CB seems to be quietly invoking some sort of "Principle of Sufficient Reason". Perhaps such a principle is strong enough to justify his desired conclusion that limits have causes (read: reasons), but then not only should he state so plainly, but then also defend such a principle.

Regardless, is the God that CB argues the First Cause has to be really limitless? Does God have infinite density and spatial curvature like a black hole? How many lies can God tell? How many miracles would God do to make his nonexistence totally untenable to even the staunchest atheist? The answer to all of these is zero, and so zero seems to be another "good" sort of limit. Christians would also answer that other limits on God have the values "three" or "one" so these seem allowable as well. If Occam's razor is acceptable, is not the simplest cause of the universe some entity with exactly one ability, namely, whatever it takes to cause that something that we call the universe? We have no justification to infer that the cause of the universe is perfectly good when "amoral" seems a much more justifiable lack of limitation. We have no reason to suppose that it is omniscient when "lacking any knowledge of any kind" is a much more natural limit.

Thus, CB's premise (2) likewise runs into serious, perhaps insurmountable difficulties. As (1) and (2) are both now gravely undermined, the argument itself needs considerable work and certainly can't be considered to have succeeded.

Cameron's Response

Rather than discuss SW's rebuttal, we will skip straight to CB's response to that rebuttal. This is mainly because CB's arguments are my main object of interest, and, frankly, SW's rebuttal was not nearly as strong as it could have been. My hope is to offer what I consider the best rebuttal available, perhaps, even, in hopes that SW will employ arguments more to this effect.

The Grim Messenger Scenario

As a way to reformulate the GR scenario to cover an infinite past, CB offers a variant of the original GR scenario as follows. We imagine an infinite sequence of Grim Messengers (GMs), each assigned to a natural number index n. GM(n) is assigned to January 1st of the year 2021-n. Each GM is tasked with receiving and then transmitting a message, according to the following rules: GM(n) receives a message from GM(n+1), the messenger of the preceding year. If the message doesn't contain a natural number, GM(n) is to pass to GM(n-1) the message of "n" (i.e. GM(n)'s index). If the message does contain a natural number, GM(n) passes the message, unaltered, to GM(n-1). CB describes this as "writing down a number", but since, for instance, (100!)! has well over a googol digits, writing it down on any sort of actual piece of paper seems infeasible, thus merely transmitting the message by whatever means seems better. The alleged paradox is thus that the message can't contain any number, as each number has a predecessor (if the message contains N, it should really contain N+1, N+2, etc.).

The response to this is quite simple: it is an impossible scenario, but not because of an infinite past. The only number the message could contain is "the largest natural number" which doesn't exist. Perhaps the message could contain \(\omega\) which each and every messenger passes to his successor, but I suspect CB would say this is not a natural number. The scenario stipulates rules which cannot be followed, and they cannot be followed simply because the rules are unfollowable, just as unfollowable as a form telling you to fill in a space with the last digit of pi. At least the GR scenario could be described in a reasonably coherent way. The GM scenario, in contrast, can at no time be described meaningfully and coherently. Either some GMs (always an infinite subset) fail to faithfully follow their rules or all of them do (by, e.g. passing an \(\omega\)). The scenario is as meaningfully possible as a \(p=1/99\)-rock.

As discussed above, this has no implication that an infinite past is impossible. Here is a simple proof: instead of being required to follow the rules as described, we simply modify the rules, changing nothing else. Now the messengers follow this rule: GM(n) always passes "n" to his successor. Now there is still an infinite number of GMs and yet any paradox has vanished. Thus, it is not that there are infinitely many GMs that caused the problem but merely the rule that they were stipulated to follow. It is not surprising that some rules can't be followed, as there are some rules that can't be followed even in finite time (e.g. write down the last digit of pi).
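The contrast between the two rules can be sketched on a truncated relay. Everything here is illustrative: the function names are mine, messengers are assumed to be indexed from 0, each messenger is assumed to receive from the one a year earlier, and the relay is cut off at some earliest messenger who receives a blank message. That last assumption is exactly what the actual scenario lacks, so the sketch shows only how the outcome depends on the cutoff.

```python
def original_rule(n, incoming):
    """CB's rule: pass your own index if the message is blank, else pass it on unaltered."""
    return n if incoming is None else incoming


def modified_rule(n, incoming):
    """The modified rule: always pass your own index, whatever you received."""
    return n


def run_truncated(rule, earliest_index):
    """Relay from GM(earliest_index) down to GM(0), starting from a blank message.

    Assumption of this sketch: the relay has an earliest messenger.
    """
    message = None
    for n in range(earliest_index, -1, -1):
        message = rule(n, message)
    return message


if __name__ == "__main__":
    for cutoff in (10, 100, 1000):
        print(cutoff, run_truncated(original_rule, cutoff), run_truncated(modified_rule, cutoff))
```

Under the original rule the final message is just the cutoff index, so it has no stable value as the cutoff recedes; under the modified rule the final message is the same for every cutoff, with just as many messengers involved.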

CB's Responses to SW's Objections to the GRA

SW objects that the GRA can't be applied to an infinite past. CB responds that, while the GRA doesn't itself imply a First Cause, his proposed resolution of the difficulty (i.e. causal finitism) does imply a First Cause. As we've discussed above, this is highly questionable if not fallacious. CB argues that the Grim Messenger scenario does imply the impossibility of an infinite past. Again, this is not at all clear or established and, as I've argued, fails quite resoundingly.

SW objects that one sort of infinity being impossible doesn't imply that all infinities are then impossible. CB responds that he didn't argue that all infinities are impossible. Rather than strict or general finitism, he argues, he has only concluded with causal finitism. Now, this seems either disingenuous or altogether misguided. As discussed above, the GRA has no clear connection to causal finitism except through finitism generally. This is the thrust of SW's objection and CB has tried to avoid it in a rather obtuse way.

CB's Responses to SW's Objections to the 2nd Premise

CB begins by latching on to a comment SW makes and spending a good chunk of time rebutting it. Namely, SW states "As an empiricist, I would say that, to concretely know something, we must verify it through the scientific method". CB labels this as "scientism" and proceeds to offer some standard apologetics rebuttals to it. No real attempt seems to have been made to interpret it in as favorable a light as is reasonable. Moreover, all the efforts of the well-known empiricists (e.g. Hume) are not touched on. One simple way to interpret SW's claim more favorably is as stating that intuitions, especially when they conflict with other intuitions, often need to be tested in order to be validated and used confidently. In fact, CB is making what is fundamentally an empiricist argument, appealing to our experiences of things in order to infer certain conclusions. Further, it seems CB missed that SW follows up his statement with "I agree", which would render CB's knee-jerk reaction a case of responding to the part, not the whole.

SW objects that advancements in physics call our notions of causation deeply into question, as can be seen in the cases of Special and General Relativity, and Quantum Mechanics. CB seems to miss this point entirely and argues that such things as scale and celerity are irrelevant to metaphysics. He illustrates his point with such arguments as "if size is relevant to what boxes I can carry, it's relevant to what size boxes can spontaneously appear above my head," and "A semi-truck or race car aren't uncaused in virtue of their being larger or faster than a sedan." These arguments are technically true, but they seem hopelessly to miss the point. He's clearly working in the so-called "classical" regime. Are virtual particles caused? In a sense yes and no. Is radioactive decay caused or uncaused? In a sense yes and no. I strongly suspect cases like these are more what SW had in mind. It seems entirely reasonable that our everyday physics is really only a sort of special case or regime of a much more general and different physics, a physics where causation may take on a very different cast or be altogether inapplicable. CB would do better not to patronize either his audience or his interlocutor so grossly.

SW objects that CB has offered no examples of anything uncaused, and yet has confidently concluded that any such thing must be unlimited. He also argues that CB has inferred from all caused things having limits (i.e. since everything we observe is both caused and limited) that uncaused things don't have limits, which doesn't follow. CB responds "I don't see any reason at all to think that reasonable conclusions require concrete examples."

Ironically, CB then offers the example that he can know that a fair 200-sided die has an equal chance of 0.5% of landing on any face. In response to this, we can simply point out that this example is extremely poor. The probability of landing on any given face follows simply from the definition of what it means for an N-sided die to be fair, namely, that there is a 1/N chance of landing on each face. That is, we can conclude the probability is 0.5% purely analytically, from the definitions of the terms. Secondly, I believe there are, in fact, 200-sided dice. But even if there weren't, there are still dice, even dice with a pretty large number of sides. That is, there are things that are quite like (if not exactly like) 200-sided dice. However, we have no experience of anything like an uncaused cause or an unlimited being. Our intuition about dice derives, in part, from our experiences with other dice, dice we've used and understand pretty well. There is no such experience from which intuitions about uncaused or unlimited beings can derive. Thus, the dice example is terrible for substantiating CB's point.

The example notwithstanding, is CB's claim true? Do reasonable conclusions require examples? This is difficult to argue in general, but examples are, at the very least, extremely useful when examining statements like "All X are Y". This sort of statement is equivalent to "There is no X that is not also a Y," and to "All non-Y are not X." Justifying such a statement, then, often goes along the lines of supposing that something is an X and not a Y and deriving some sort of contradiction. This is why examples are so important: often if we have an example of something that is both an X and a Y to examine, the relationship between X-ness and Y-ness becomes more apparent and generalization becomes more accessible. Epistemically, examples are invaluable. Argumentatively, examples are extremely useful (as an example, CB's own example-giving demonstrates this) as means of analogy. It's obvious that if CB did have an example to back up his claim he would not hesitate to offer it, so his attempt to justify not offering one seems rather dishonest.

CB then offers that his formula "if uncaused then unlimited" is only a proposal, and invites his audience to think about it and offer alternatives. In the spirit of CB's openness, here is my own proposal: the difference between the caused and the uncaused is that the one is caused and the other is not, and nothing beyond this can be deduced.

CB states that his premise "if there is a First Cause, then it has no limits" has wide empirical support, namely, that in our experience, all limited things have causes. This runs into the problems mentioned both in my response and SW's. For example, we may instead conclude that all caused things have limits, from which the premise can't be logically derived. Or we can conclude that all things are both caused and limited, which would serve to refute the claim that there is a first cause. This is why examples would be so valuable to CB: an example of an uncaused and unlimited being would provide significant insight into how causes relate to limits. CB denies that his conclusion doesn't follow, as his argument is valid, but that isn't the allegation: SW's point is that the inference doesn't follow from the data, not that the conclusion doesn't follow from the premises.

Arguments in Favor of "Unlimited Implies Perfect"

CB points out that "a perfect being" is simply what he means by "god" so all that he needs to do is argue that (A) If there is a First Cause, then it is unlimited, and (B) if X is unlimited, then X is perfect. Satisfied with his defense of (A), he moves on to offer some arguments in defense of (B).

His first argument is summarized as "imperfection implies limitation". As this is logically equivalent to the claim to be proved, it's difficult for this argument to get off the ground. But beyond this, the objections I raised above would apply here. Unlimited could mean "unlimitedly small", namely zero. Thus, the First Cause may have exactly zero knowledge, exactly zero love, and exactly zero hate. As these seem totally unnecessary for causing things (just ask electrons), this seems far more natural and justifiable than assuming it has infinite love and yet still zero hate. In fact, the more justifiable premise would be "limitation implies imperfection" rather than vice versa, though even this seems vexed.

His next argument "moves from some degree of value to unlimited value to god" (He never offers any sort of definition of what he means by "value"). CB asserts that "different things exemplify different degrees of value." He uses, as an example (I thought these were unnecessary?) his house and his daughter. The philosophical problems here are many. It assumes some sense of objective value which many people, particularly many atheists, would never permit. Presumably, CB has some sense of possession of his house and paid something for it, while the same can't be said of his daughter (hopefully). He seems to be flagrantly equivocating "value" among moral value, sentimental value, and monetary value. To make matters worse, he asserts that the First Cause "is the ground or source of everything else that does have value." This is extremely debatable, and at best totally unsubstantiated. Finally, CB argues that if the First Cause has some amount of value, as it is unlimited, it must have an unlimited degree of value. Given that most people would say the Big Bang singularity had no moral value and yet the sum of all humanity has quite a bit of moral value, value can grow over time, and it would not be beyond thinking that the cause of the universe has no value at all. Is not the value of the universe increased every time a child is born and does it not decrease somewhat every time someone dies? Perhaps you disagree but such a position seems not unsupportable. In sum, I find this argument totally dead on arrival.

His third and final argument is almost identical to the second, but substituting "power" for "value". He argues that the cause must at least have the power to cause what it causes. Proceeding from his argument from limits, he concludes that the First Cause must have unlimited power (i.e. be omnipotent). "We have the power to know things," he argues, and uses this formulation to conclude that the First Cause is omniscient as well. In response, we can merely refer to the above, as none of this is really new, but we can make some remarks about "power" and "power to know" in particular.

At the risk of repeating myself, we can point out that "knowing nothing, having no ability to know anything" is a natural lack of limitation, in the same way that electrons can't know things. What even are the powers of something like an electron, or a rock, or anything inanimate? What does it even mean to have power? Abilities? Potentials? I have the potential to lie: does the First Cause have the unlimited ability to lie? I have the ability to feel sexual pleasure: does the First Cause have the ability to feel unlimited sexual pleasure? I have the ability to change, and the ability to die, and the ability to be ignorant or make mistakes. Does the First Cause have these also to an unlimited degree? I can already hear in my head all the theistic arguments in response, but then I can also envision my replies to them. We can merely conclude that this line of argumentation that CB offers is riddled with holes, to the point that it seems more hole than not.

Occam's Razor?

CB invokes Occam's Razor as a way to justify considering the First Cause to be singular and not a plurality. He gives the example of a used cereal bowl lying in his sink and compares the explanation that his daughter placed it there rather than "a billion tiny aliens". His response to the second alternative is to make a funny face and scoot offscreen.

Amateur theatrics notwithstanding, is it, in fact, reasonable to conclude, using Occam's Razor, that the First Cause is singular rather than numerous? This seems to badly misunderstand the Razor itself. A better form of the Razor is "the simplest explanation is likely correct." CB's form would have us conclude that desks are solid rather than made of billions of atoms, and yet we nevertheless think desks are, in fact, made of billions of atoms. Beyond this, the Razor would have us shave off unnecessary properties of the supposed cause, like knowledge or goodness, which CB would clearly like to avoid. Going strictly by Occam's Razor, we would conclude that the First Cause has just the ability to cause what it causes and nothing more, unless any other properties can be shown to be necessary for this or entailed by this. Indeed, we would conclude that there is at least one such thing, but possibly more. But then this assumes the First Cause has already been established, which has yet to be done.

The Razor is something theists would do best to employ with some care, as it is so very often used as a way to exclude God from one's ontology. If there is nothing for God to do, nothing for him to explain or cause or ground etc., then Occam's Razor would suggest shaving him away entirely. To be candid, this is precisely my view.

Conclusion

This discussion by no means offers anything like a comprehensive or representative view of all apologetics. However, it does serve as a characteristic example of the sorts of arguments one can expect to encounter in mainstream apologetics. Many other such arguments don't differ very greatly in spirit or thrust, though obviously some are better phrased or are given at more length. CB is perhaps close to the average, though he seems to lack something of originality in the arguments he presents. This, however, is not necessarily a fault of his, as his abilities are concentrated elsewhere and certainly exceed mine in a number of other areas. He is quite well-read in the realm of contemporary apologetics, though it would likely behoove him to invest some time in the fundamentals (as it would for almost all of us).

The responses, likewise, can be considered to follow a certain pattern. You may note that the refutations of the arguments require considerably more effort to expound than the stated arguments and their defenses. If there's any broadly applicable truth that I've learned from my time engaging with such arguments, it is that they are formulated and offered so that their refutations require considerably more discussion than their presentation. I consider this one of their definitive qualities. In my view, this is their greatest strength and explains their pervasiveness. It makes them well-suited for debate formats where participants are given equal time. They are strong rhetorically and quite weak philosophically. (Don't believe me? Almost all philosophers, scientists, and highly-educated people don't believe in god. Can this be said of any other position for which there are strong philosophical arguments? Note: this is not an appeal to authority or popularity, but just an invitation to consider some relevant evidence. That many people believe something isn't why one should believe a thing, but that many and especially intelligent and educated people believe something ought to give one some good reason to consider it closely.)

This is not to say that they aren't worth examining and discussing, far from it. I think they serve as superb jumping-off points for almost all philosophical topics. A broad course or survey of philosophy could involve simply going into each of the various sorts of theistic arguments and developing an understanding of how they work, what ideas they root themselves in. Above, we have discussed such fundamental topics as the infinite, causation, and concepts of limitation, perfection, and explanation. Other arguments introduce us to such topics as metaethics, modality, ontology, design, and so on. Theists are a natural, valuable, and likely inevitable sub-species of the philosophical ecosystem. Opinions vary rather widely on their overall utility (from essential and ultimately correct, to worthless and harmful), but for my purposes, they offer a useful function. They may not like the description, but it is the same function that pseudoscience quacks offer. That is, they have a nose for the cracks in any intellectual edifice and take as firm root there as they can, thus calling to attention those areas that need to be strengthened and made less susceptible to such infestation.

I hope this has been interesting and instructive. I hope CB and SW don't mind my honest criticism, though it may have been at their expense at times. My main motivation for writing this was my disappointment both at the weakness of CB's arguments and SW's rebuttals, and my sincere hope is that both improve so as to offer us some better, more philosophically robust arguments in the future. There's no real way to do so without some amount of patronization, so I won't bother. It's all with the best of intentions if that counts for anything, the road to hell notwithstanding.

A Fairly Rigorous Derivation of Euler's Formula

2020-09-14T14:43:00.005-07:00

Exponential Functions

The general exponential function \(b^x\) for base \(b > 0 \) and real number \(x\) (*) is defined as the function that satisfies the conditions \[ b^x > 0 \\ b^x\cdot b^y=b^{x+y} \\ b^1=b \] It follows that: \[ \prod_{k=1}^{N}b^{x_k}=b^{\sum_{k=1}^{N}x_k} \\ b^0=1 \\ b^{-x}=1/b^x \\ (ab)^x=a^x b^x \\ b^{m/n}=\sqrt[n]{b^m}=\left ( \sqrt[n]{b} \right )^m \\ b^x=\underset{n \to \infty}{\lim}b^{\left \lfloor xn \right \rfloor/n} \]

(*) We will extend this definition to complex \(x\), for which, we will find, that \(b^x>0\) may not hold. Moreover, there is some ambiguity for non-integer \(x\), as, for example, \(4^{1/2}\) may be \(2\) or \(-2\).

Some Exponential Inequalities

Let \(b>0\). By a simple argument we find: \[ 0 \leq \left ( b^{(y-x)/2}-1 \right )^2 \\ b^{(y-x)/2} \leq \frac{b^{(y-x)}+1}{2} \\ b^xb^{(y-x)/2}\leq b^x\left (\frac{b^{y-x}+1}{2} \right ) \\ b^{(y+x)/2} \leq \tfrac{1}{2}b^y+\tfrac{1}{2}b^x \] Suppose that \(0 \leq \alpha,\beta \leq 1\) and that \[ b^\alpha\leq\alpha b + (1-\alpha) \\ b^\beta\leq\beta b + (1-\beta) \] Then \[ b^{(\alpha+\beta)/2} \leq \tfrac{1}{2}b^\alpha+\tfrac{1}{2}b^\beta \\ b^{(\alpha+\beta)/2} \leq \tfrac{1}{2}(\alpha b + (1-\alpha))+\tfrac{1}{2} (\beta b + (1-\beta)) \\ b^{(\alpha+\beta)/2} \leq \tfrac{\alpha+\beta}{2}b+(1-\tfrac{\alpha+\beta}{2}) \] As \(b^0=1\leq 0 \cdot b + (1-0)=1\), and \(b^1=b\leq 1 \cdot b + (1-1)=b\), it follows that, for all dyadic fractions of the form \(x=M/2^N\) for some whole numbers M and N with \(0 \leq M \leq 2^N\): \[ b^x \leq x b + (1-x) \] Moreover, as all real numbers \(0 \leq x \leq 1\) can be written as the limit \[ x=\underset{N \to \infty}{\lim} \frac{\left \lfloor x \cdot 2^N \right \rfloor}{2^N} \] It follows that \[ b^x \leq x b + (1-x) \] Holds for all real x in the interval \( [0,1]\) for all \(b>0\), with equality holding only at the extremes. It follows that \(2^x < 1+x\). Additionally, \((1/2)^x < 1-x/2\). We may then make the following argument: for \(0 < x < 1\) \[ x^2 > 0 \\ 1-x^2=(1+x)(1-x) < 1 \\ 1+x < \frac{1}{1-x} \\ (1/2)^x < 1-x/2 \\ 2^x > \frac{1}{1-x/2} \\ 2^x >{1+x/2} \\ 4^x >(1+x/2)^2 > 1+x \] Thus, we have \(2^x < 1+x < 4^x\).
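As a quick sanity check on the inequalities above, here is a minimal numerical sketch. It is not part of the derivation; the function names `chord_bound_holds` and `sandwich_holds`, the grid of sample points, and the small rounding tolerance are all assumptions made for illustration.

```python
def chord_bound_holds(b: float, x: float) -> bool:
    """Check b**x <= x*b + (1 - x) for one base b > 0 and one x in [0, 1]."""
    # A tiny tolerance absorbs floating-point rounding near equality.
    return b**x <= x * b + (1 - x) + 1e-12


def sandwich_holds(x: float) -> bool:
    """Check the strict sandwich 2**x < 1 + x < 4**x for one x in (0, 1)."""
    return 2**x < 1 + x < 4**x


# Sample points strictly inside (0, 1) and a handful of positive bases.
xs = [k / 1000 for k in range(1, 1000)]
bases = [0.1, 0.5, 2.0, 4.0, 10.0]

all_chord = all(chord_bound_holds(b, x) for b in bases for x in xs)
all_sandwich = all(sandwich_holds(x) for x in xs)
print(all_chord, all_sandwich)
```

A check on a finite grid proves nothing, of course, but it is a cheap way to catch a sign error in the chain of inequalities.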

Derivatives and Derivatives of Exponentials

The definition of a derivative of a function is: \[ \frac{\mathrm{d} }{\mathrm{d} x}f(x)=f'(x) \triangleq \underset{h \to 0}{\lim}\frac{f(x+h)-f(x)}{h} \] Thus, for an exponential, the derivative would be given by: \[ \frac{\mathrm{d} }{\mathrm{d} x}b^x\triangleq \underset{h \to 0}{\lim}\frac{b^{x+h}-b^x}{h}=b^x\underset{h \to 0}{\lim}\frac{b^{h}-1}{h}=b^x L(b) \] Where \(L(b)=\underset{h \to 0}{\lim}\frac{b^{h}-1}{h}\), provided this limit exists. This limit can be proven to exist as follows: for \(0 < q < 1 \), and \(0 < x\), for \(y = q x < x\) by the derived inequality \[ (b^x)^q = b^{qx} < (b^x -1) q +1 \\ \frac{b^{qx}-1}{qx} < \frac{(b^x -1)}{x} \\ \frac{b^{y}-1}{y} < \frac{(b^x -1)}{x} \] Thus, the difference quotient decreases monotonically as \(x \to 0\) from the right (and increases as \(x \to 0\) from the left). Moreover, the quotient is bounded from below and above (for |x| < 1), as \[ 1-\tfrac{1}{b} < \frac{(b^x -1)}{x} < b-1 \] Thus, the limit exists, and so \(b^x\) is everywhere differentiable. As exponentials with \(b > 0\) are everywhere differentiable and thus continuous, we may take the limit for \(h > 0\). From the above inequalities, we have \[ L(2)=\underset{h \to 0}{\lim}\frac{2^{h}-1}{h} < \underset{h \to 0}{\lim}\frac{1+h-1}{h}=1 \\ L(4)=\underset{h \to 0}{\lim}\frac{4^{h}-1}{h} > \underset{h \to 0}{\lim}\frac{1+h-1}{h}=1 \] As the quotients are decreasing and both are bounded below ( \(L(2) > 1/2, \; L(4) > 1 \)), it follows that both limits converge. Thus \(L(2) < 1 < L(4) \). As L is clearly continuous, by the intermediate value theorem, it follows that there is some real number \(2 < e < 4 \) such that \( L(e)=1 \). Let us define this number \(e\) to be that number that satisfies \[L(e)=\underset{h \to 0}{\lim}\frac{e^{h}-1}{h}=1\] This implies that \[ \frac{\mathrm{d} }{\mathrm{d} x}e^x=e^x \] This is a defining feature of the number \(e\). We may also notice that, for any real x, if \(h \to 0\) then \(xh \to 0\).
Thus \[ \underset{h \to 0}{\lim}\frac{e^{xh}-1}{xh}=1 \\ \underset{h \to 0}{\lim}\frac{e^{xh}-1}{h}=x \] Thus \(L(e^x)=x\). This implies that, by definition, \(L(x)=\log_e (x)=\ln (x)\). Moreover, given the chain rule \(\frac{\mathrm{d} }{\mathrm{d} x}f(g(x))=f'(g(x))g'(x)\), we find \[ \frac{\mathrm{d} }{\mathrm{d} x} L(e^x)=L'(e^x)e^x=1 \] And thus \(L'(x)=1/x\). This is a very helpful result. For example, by rewriting and using the chain rule, we find: \[ \frac{\mathrm{d} }{\mathrm{d} x} x^a=\frac{\mathrm{d} }{\mathrm{d} x} e^{aL(x)}=e^{aL(x)} \frac{a}{x}=a x^{a-1} \] A result that is otherwise difficult to establish in the general case. We may write the limit derived above in an equivalent way as \[ \underset{n \to \infty}{\lim}n \cdot (e^{x/n}-1)=x \] Which directly implies that \[ e^x=\underset{n \to \infty}{\lim} \left ( 1+\frac{x}{n} \right )^n \] Let us expand the above expression using the binomial theorem: \[ e^x=\underset{n \to \infty}{\lim} \left ( 1+\frac{x}{n} \right )^n \\ e^x= \underset{n \to \infty}{\lim}\sum_{k=0}^{n}\binom{n}{k}\left ( \frac{x}{n} \right )^k \\ e^x= \underset{n \to \infty}{\lim}\left [ 1+\sum_{k=1}^{n}\frac{x^k}{k!}\prod_{j=1}^{k}\left ( 1-\frac{j-1}{n} \right ) \right ] \] Clearly, in the limit, all the factors in the products from 1 to k go to 1. Thus, we find: \[ e^x=1+\sum_{k=1}^{\infty}\frac{x^k}{k!} \] It can be checked that this series converges for all real x by the ratio test. This is an extremely useful formula, and can be taken to be a more robust and easy-to-work-with definition for \(e^x=\exp(x)\). Note this formula directly implies that: \[ e=\sum_{k=0}^{\infty}\frac{1}{k!} \] Note that, as a verification, we can check that \(e^0=1=1+\sum_{k=1}^{\infty}\frac{0^k}{k!}\) and \[ \frac{\mathrm{d} }{\mathrm{d} x} e^x=e^x=1+\sum_{k=2}^{\infty}\frac{kx^{k-1}}{k!}=1+\sum_{k=2}^{\infty}\frac{x^{k-1}}{(k-1)!}=1+\sum_{k=1}^{\infty}\frac{x^{k}}{k!} \] Which verifies the differentiation formula.
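The three characterizations of \(e\) used in this section (the limit \(L(e)=1\), the limit \((1+x/n)^n\), and the power series) can be compared numerically. The following is only an illustrative sketch: the helper names `L`, `exp_limit`, and `exp_series`, the step size `h`, and the truncation points are assumptions chosen for the example, and `math.e` is used purely as a reference value.

```python
import math


def L(b: float, h: float = 1e-6) -> float:
    """Difference quotient (b**h - 1)/h, a finite-h stand-in for the limit L(b)."""
    return (b**h - 1) / h


def exp_limit(x: float, n: int) -> float:
    """Finite-n approximation (1 + x/n)**n of e**x."""
    return (1 + x / n) ** n


def exp_series(x: float, terms: int) -> float:
    """Partial sum 1 + sum_{k=1}^{terms} x**k / k! of the exponential series."""
    total, term = 1.0, 1.0
    for k in range(1, terms + 1):
        term *= x / k  # turns x**(k-1)/(k-1)! into x**k/k!
        total += term
    return total


print(L(2.0), L(4.0), L(math.e))   # L(2) below 1, L(4) above 1, L(e) near 1
print(exp_limit(1.0, 10**6), exp_series(1.0, 20), math.e)
```

The limit form converges slowly (the error shrinks roughly like \(1/n\)), while the series reaches machine precision after a couple of dozen terms, which is one practical reason to prefer the series as a working definition.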

Trigonometric Functions and Inequalities

Figure 1

The definitions of the basic trigonometric functions are given by Figure 1. The curve between points C and D is the set of points equidistant from A between the line segments \(AC\) and \(AD\), i.e. a circular arc. Let us call the length of this curve \(L\). Then the standard definition for the basic trigonometric functions is given by: \[ \theta=\frac{L}{\overline{AD}} \\ \\ \sin(\theta)\triangleq \frac{\overline{BD}}{\overline{AD}}, \; \;\; \cos(\theta)\triangleq\frac{\overline{AB}}{\overline{AD}}, \; \;\; \tan(\theta)\triangleq\frac{\overline{BD}}{\overline{AB}} \]

Figure 2

Using these, let us look at Figure 2. This figure will serve to evaluate bounds on the trigonometric functions for small angles (\(0 < \theta < 1 \)). Let us denote the length of the curve \(BE\), which is a circular arc, by \(L\). It is clear that \[ \overline{BD} < L < \overline{BF} \] (An alternative way to demonstrate this is through areas, as triangle ABD is a strict subset of sector ABE which is a strict subset of triangle ABF.) Using the definitions above, and defining \(\theta=L/\overline{AB}\), we have: \[ \frac{\overline{BD}}{\overline{AB}}=\sin(\theta) < \frac{L}{\overline{AB}}=\theta < \frac{\overline{BF}}{{\overline{AB}}}=\tan(\theta)=\frac{\sin(\theta)}{\cos(\theta)} \] And so it follows that \[ \theta\cdot\cos(\theta) < \sin(\theta) <\theta \] It follows from the Pythagorean theorem that \[ \sin(\theta)^2+\cos(\theta)^2=1 \] From which we find: \[ \cos(\theta)^2 > 1-\theta^2 > (1-\theta^2)^2 \] The last inequality following from the fact that \(0 < \theta < 1 \). We thus find \[ 1-\theta^2 < \cos(\theta) < 1 \\ \theta-\theta^3 < \sin(\theta) <\theta \]
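The two final bounds can be spot-checked numerically for angles in \((0,1)\). This is a small sketch rather than a proof; the name `trig_bounds_hold` and the sample grid are illustrative assumptions, and the library `math.sin` and `math.cos` stand in for the geometric definitions.

```python
import math


def trig_bounds_hold(theta: float) -> bool:
    """Check 1 - t^2 < cos(t) < 1 and t - t^3 < sin(t) < t for one angle t."""
    cos_ok = 1 - theta**2 < math.cos(theta) < 1
    sin_ok = theta - theta**3 < math.sin(theta) < theta
    return cos_ok and sin_ok


# Sample angles strictly inside (0, 1), in radians.
thetas = [k / 1000 for k in range(1, 1000)]
all_bounds = all(trig_bounds_hold(t) for t in thetas)
print(all_bounds)
```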

Figure 3

Let us now find the summation formulas for sine and cosine. These are easily found using the construction in figure 3. \[ RB=QA \;\;\;\;\;\;\;\;\;\; RQ=BA \] \[ \frac{RQ}{PQ}=\frac{QA}{OQ}=\sin(\alpha) \;\;\;\;\;\;\;\; \frac{PR}{PQ}=\frac{OA}{OQ}=\cos(\alpha) \] \[ \frac{PQ}{OP}=\sin(\beta) \;\;\;\;\;\;\;\; \frac{OQ}{OP}=\cos(\beta) \] \[ \frac{PB}{OP}=\sin(\alpha+\beta) \;\;\;\;\;\;\;\; \frac{OB}{OP}=\cos(\alpha+\beta) \] \[ PB=PR+RB=\frac{OA}{OQ}PQ+QA \] \[ \frac{PB}{OP}=\frac{OA}{OQ}\frac{PQ}{OP}+\frac{QA}{OP}=\frac{OA}{OQ}\frac{PQ}{OP}+\frac{QA}{OQ}\frac{OQ}{OP} \] \[ \sin(\alpha+\beta)=\cos(\alpha)\sin(\beta)+\sin(\alpha)\cos(\beta) \] \[ OB=OA-BA=\frac{OA}{OQ}OQ-\frac{BA}{PQ}PQ \] \[ \frac{OB}{OP}=\frac{OA}{OQ}\frac{OQ}{OP}-\frac{BA}{PQ}\frac{PQ}{OP} \] \[ \cos(\alpha+\beta)=\cos(\alpha)\cos(\beta)-\sin(\alpha)\sin(\beta) \]

Complex Numbers

Complex numbers can be defined and used in the usual way, namely, as algebraic objects with the symbol \(i\) having the property that \(i^2=-1\). Additionally, we can define the norm of a complex number as \(|a+bi|^2=a^2+b^2\). Some simple theorems we will make use of: \[ (a+bi)\cdot (c+di)=(ac-bd)+i(ad+bc) \\ |(a+bi)\cdot (c+di)|=|a+bi|\cdot|c+di| \\ \frac{1}{a+bi}=\frac{a-bi}{a^2+b^2} \] Let us define the function \[ \mathrm{cis}(x)=\cos(x)+i\sin(x) \] This function has the property that \[ \mathrm{cis}(\alpha)\cdot \mathrm{cis}(\beta)= \left (\cos(\alpha)+i\sin(\alpha) \right ) \cdot \left(\cos(\beta)+i\sin(\beta) \right ) \\ \mathrm{cis}(\alpha)\cdot \mathrm{cis}(\beta)= \left (\cos(\alpha)\cos(\beta)-\sin(\alpha)\sin(\beta) \right ) + i\left(\sin(\alpha)\cos(\beta)+\sin(\beta)\cos(\alpha) \right ) \\ \mathrm{cis}(\alpha)\cdot \mathrm{cis}(\beta)= \cos(\alpha+\beta) + i\sin(\alpha+\beta) \] And thus \(\mathrm{cis}(\alpha)\cdot \mathrm{cis}(\beta)=\mathrm{cis}(\alpha+\beta)\). It follows by induction and the definition of the exponential that, for any natural number \(n\): \[ (\mathrm{cis}(x))^n=\mathrm{cis}(nx) \] And, thus \[ \mathrm{cis}(x)=(\mathrm{cis}(x/n))^n \] Importantly, this is true in the limit of large \(n\). We can always pick \(n\) large enough to make \(x/n\) as small as needed. Thus, we can use the inequalities derived above, namely: \[ \mathrm{cis}\left ( \frac{x}{n} \right )=1+i\frac{x}{n}-\frac{x^2}{n^2}g(x) \] Let \(g(x)=g_r(x)+i g_i(x)\) where \(g_r, g_i\) are real. Then \(0 < g_r(x) < 1\) and \(0 < g_i(x) < \tfrac{x}{n} \). Clearly, then, for \(n > |x|\), \[ |g(x)|^2 < 1+\frac{x^2}{n^2} < 2 \] And so \(|g(x)| < 2\). Also important to note is that a generic complex number can be written as \[ z=a+bi=r \cdot\mathrm{cis}(\theta) \] Where \(r=|z|\) and \(\theta\) satisfies \(r \cos(\theta)=a, \;\;\; r \sin(\theta)=b\). 
From the above geometric argument, assuming \(a,b > 0\) we have \[ \sin(\theta)=\frac{b}{|z|} < \theta < \tan(\theta)=\frac{b}{a} \] From the fact that \((\mathrm{cis}(x))^n=\mathrm{cis}(nx)\), we find that \[ z^n=(a+bi)^n=r^n \cdot\mathrm{cis}(n\theta) \]
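The multiplicative property of \(\mathrm{cis}\) and the resulting power formula are easy to test with Python's built-in complex numbers. This is a minimal sketch under stated assumptions: `cis` and `close` are illustrative helper names, the tolerances are arbitrary, and `cmath.polar` is used only to obtain \(r\) and \(\theta\) for a sample \(z\).

```python
import cmath
import math


def cis(x: float) -> complex:
    """cis(x) = cos(x) + i*sin(x)."""
    return complex(math.cos(x), math.sin(x))


def close(z: complex, w: complex, tol: float = 1e-12) -> bool:
    """True when z and w agree to within tol in absolute value."""
    return abs(z - w) < tol


alpha, beta = 0.7, 1.9
product_ok = close(cis(alpha) * cis(beta), cis(alpha + beta))

# cis(x) = cis(x/n)**n for a natural number n.
x, n = 2.0, 50
power_ok = close(cis(x / n) ** n, cis(x))

# Polar form: z = r*cis(theta), hence z**m = r**m * cis(m*theta).
z, m = 3 + 4j, 5
r, theta = cmath.polar(z)
polar_ok = close(z**m, r**m * cis(m * theta), tol=1e-9)

print(product_ok, power_ok, polar_ok)
```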

A Lemma for a Family of Limits

From the above we have, for \(0 < x < 1\): \[ 2^{x} < 1+x < 4^{x} \] Let \(x=B/n^2 \), for \(B>0\) and sufficiently large \(n\). Then \[ 2^{B/n^2} < 1+\frac{B}{n^2} < 4^{B/n^2} \\ 2^{B/n} < \left (1+\frac{B}{n^2} \right )^n < 4^{B/n} \] In the limit of large \(n\), \(B/n \to 0\). As \(2^0=4^0=1\), we have \[ \underset{n \to \infty}{\lim} \left (1+\frac{B}{n^2} \right )^n=1 \] A similar argument applies to the case that \(B < 0\). In fact, suppose B is complex, then: \[ \underset{n \to \infty}{\lim} \left (1+\frac{B}{n^2} \right )^n =\underset{n \to \infty}{\lim} \left |1+\frac{B}{n^2} \right |^n \mathrm{cis}\left ( n\theta \right ) \] Where \[ \frac{1+\frac{B}{n^2}}{\left | 1+\frac{B}{n^2} \right |}=\mathrm{cis}(\theta) \] For sufficiently large \(n\), the real part is always positive. Writing \(b\) for the imaginary part of \(1+B/n^2\), it's clear that \(-|B|/n^2 \leq b \leq |B|/n^2\), and so \(-\frac{|B|}{n^2} \leq \theta \leq \frac{|B|}{n^2}\). It clearly follows that \(-\frac{|B|}{n} \leq n\theta \leq \frac{|B|}{n}\). Thus, in the limit of large n, \(n\theta \to 0\), and so \(\mathrm{cis}(n\theta)\to 1\). Therefore, for all complex \(B\): \[ \underset{n \to \infty}{\lim} \left (1+\frac{B}{n^2} \right )^n=1 \] Finally, let us note that \[ 1+\frac{A}{n}+\frac{B}{n^2}=\left ( 1+\frac{A}{n} \right )\frac{1+\frac{A}{n}+\frac{B}{n^2}}{1+\frac{A}{n}}= \left ( 1+\frac{A}{n} \right )\left ( 1+\frac{1}{n^2}\frac{B}{1+\frac{A}{n}} \right ) \] For sufficiently large \(n\), we have, then \[ \left |\frac{B}{1+\frac{A}{n}} \right | < 2|B| \] It follows from the above that \[ \underset{n \to \infty}{\lim}\left (1+\frac{A}{n}+\frac{B}{n^2} \right )^n=\underset{n \to \infty}{\lim}\left ( 1+\frac{A}{n} \right )^n \] Clearly this applies to any \(B(n)\) that is eventually bounded, i.e. for which there are some \(M\) and some real \(K>0\) such that \(|B(n)| < K\) for all \(n>M\).
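To see the lemma at work, one can evaluate both sides at a large but finite \(n\). The sketch below is only illustrative: the function names `lemma_term`, `perturbed`, and `unperturbed`, and the sample values of \(A\), \(B\), and \(n\), are assumptions and not part of the argument.

```python
def lemma_term(B: complex, n: int) -> complex:
    """(1 + B/n^2)^n, which the lemma says tends to 1 as n grows."""
    return (1 + B / n**2) ** n


def perturbed(A: complex, B: complex, n: int) -> complex:
    """(1 + A/n + B/n^2)^n."""
    return (1 + A / n + B / n**2) ** n


def unperturbed(A: complex, n: int) -> complex:
    """(1 + A/n)^n."""
    return (1 + A / n) ** n


A, B = 1j, 3 - 4j
for n in (10**2, 10**4, 10**6):
    gap = abs(lemma_term(B, n) - 1)
    ratio_gap = abs(perturbed(A, B, n) / unperturbed(A, n) - 1)
    print(n, gap, ratio_gap)
```

Both printed gaps should shrink roughly like \(|B|/n\), which matches the \(2^{B/n}\) and \(4^{B/n}\) sandwich used in the real case.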

Euler's Formula and Identity

We recall the following from a previous section: \[ \mathrm{cis}(x)=(\mathrm{cis}(x/n))^n \] And, for sufficiently large \(n\): \[ \mathrm{cis}\left ( \frac{x}{n} \right )=1+i\frac{x}{n}-\frac{x^2}{n^2}g(x) \] Where \(|g(x)| < 2\). Combining yields: \[ \mathrm{cis}(x)=\left ( 1+i\frac{x}{n}-\frac{x^2}{n^2}g(x) \right )^n \] Equality must hold in the limit of large \(n\), and so, using the above lemma, we have: \[ \mathrm{cis}(x)=\underset{n \to \infty}{\lim}\left ( 1+i\frac{x}{n} \right )^n \] Using the limit definition of \(e^x\), this yields, at last, Euler's celebrated formula: \[ e^{ix}=\cos(x)+i\sin(x) \] This has the special case, by the definition of \(\pi\) and the trigonometric functions: \[ e^{i\pi}+1=0 \] Using the power series expansion for the exponential function, and equating real and imaginary parts yields the two power series expansions: \[ \cos(x)=1+\sum_{k=1}^{\infty}\frac{(-x^2)^k}{(2k)!} \\ \sin(x)=\sum_{k=0}^{\infty}\frac{(-1)^k x^{2k+1}}{(2k+1)!} \]
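Euler's formula and the two series can be checked numerically in a few lines. This is a hedged sketch, not a verification of the proof: `euler_limit`, `cos_series`, and `sin_series` are illustrative names, the values of `n` and `terms` are arbitrary, and the library `math.cos` and `math.sin` serve as the reference.

```python
import math


def cis(x: float) -> complex:
    """cis(x) = cos(x) + i*sin(x)."""
    return complex(math.cos(x), math.sin(x))


def euler_limit(x: float, n: int) -> complex:
    """Finite-n approximation (1 + i*x/n)**n of e**(i*x)."""
    return (1 + 1j * x / n) ** n


def cos_series(x: float, terms: int) -> float:
    """Partial sum 1 + sum_{k=1}^{terms} (-x^2)^k / (2k)!."""
    total, term = 1.0, 1.0
    for k in range(1, terms + 1):
        term *= -x * x / ((2 * k - 1) * (2 * k))
        total += term
    return total


def sin_series(x: float, terms: int) -> float:
    """Partial sum sum_{k=0}^{terms} (-1)^k x^(2k+1) / (2k+1)!."""
    total, term = x, x
    for k in range(1, terms + 1):
        term *= -x * x / ((2 * k) * (2 * k + 1))
        total += term
    return total


x = 1.3
print(abs(euler_limit(x, 10**6) - cis(x)))   # small, shrinking like x^2/(2n)
print(abs(euler_limit(math.pi, 10**6) + 1))  # Euler's identity, approximately
print(cos_series(x, 12) - math.cos(x), sin_series(x, 12) - math.sin(x))
```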

Gaussians and Normal Distributions

2019-09-19T15:01:00.003-07:00

Definition

The standard Gaussian function is defined as \[ \bbox[5px,border:2px solid red] { g_{0,1}(x)=g(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2} } \] This is by definition the probability density function of a standard normal random variable. It has zero mean and unit variance. A general Gaussian function (with mean \(\mu\) and variance \(\sigma^2\)) is defined as \[ \bbox[5px,border:2px solid red] { g_{\mu,\sigma^2}(x)=\frac{1}{\sigma}g\left ( \frac{x-\mu}{\sigma} \right )=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\tfrac{1}{2\sigma^2}\left ( x-\mu \right )^2} } \] This is the probability density function of a general normal distribution. Note that if \(Z\) is a standard normal random variable, then \(X=\mu+\sigma Z\) will be a normal random variable with mean \(\mu\) and variance \(\sigma^2\).
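The definition can be made concrete with a short numerical sketch that evaluates the density and estimates its total mass, mean, and variance by a midpoint rule. The names `gaussian` and `moments`, the integration window of ten standard deviations, and the step count are assumptions chosen for illustration.

```python
import math


def gaussian(x: float, mu: float = 0.0, var: float = 1.0) -> float:
    """Gaussian density with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def moments(mu: float, var: float, steps: int = 20000, width: float = 10.0):
    """Midpoint-rule estimates of (total mass, mean, variance) of the density."""
    sigma = math.sqrt(var)
    a = mu - width * sigma              # left edge of the integration window
    dx = 2 * width * sigma / steps
    mass = mean = second = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * dx          # midpoint of the k-th cell
        w = gaussian(x, mu, var) * dx
        mass += w
        mean += x * w
        second += x * x * w
    return mass, mean, second - mean**2


mass, mean, variance = moments(mu=3.0, var=4.0)
print(mass, mean, variance)

# Scaling identity: g_{mu,sigma^2}(x) = g((x - mu)/sigma) / sigma.
x, mu, sigma = 1.7, 3.0, 2.0
print(gaussian(x, mu, sigma**2), gaussian((x - mu) / sigma) / sigma)
```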

Historical Timeline

1600	In astronomy, discrepancies in measurements demanded some form of resolution to settle on a single number. Different astronomers had different ways of inferring the true value from a set of measurements, some using medians, some means, but varying widely in their techniques for calculation. Notably, Tycho Brahe and Johannes Kepler use unclear methods to obtain representative values from multiple measurements.
1632	Galileo Galilei is concerned with the ambiguity in obtaining a single value from multiple measurements, and notes that there must be a true value, and that errors are symmetrically distributed around this true value, with smaller errors being more likely than larger ones.
1654	Fermat and Pascal develop the modern theory of probability. Pascal develops his triangle (the binomial coefficients), and the probabilities of the binomial distribution for success probability \(p=1/2\).
1710	James Bernoulli finds the binomial distribution for general success probability.
1712	The Dutch mathematician Willem ’s Gravesande uses Pascal's calculation of probability to investigate birthrate discrepancies of male and female children.
1722	Roger Cotes suggests the true value of a set of measurements be at the "center of mass" of the observed values (the modern definition of the mean). This is the same as the value with the minimum square deviation from the measurements.
1733	Abraham De Moivre finds that \[\binom{N}{k}\approx \frac{2^{N+1}}{\sqrt{2\pi N}}e^{-\tfrac{2}{N}\left (k-\tfrac{N}{2} \right )^2}\] and uses it in relation to the binomial distribution. This is very clearly the first instance of the Gaussian function in a probability context.
1756	Thomas Simpson proves that the expected error is bounded when the error is distributed by what would today be called a two-sided geometric distribution, and in the case where it has a rectangular or triangular distribution.
1774	Pierre-Simon, marquis de Laplace argues that the distribution of errors should take the form \(p(x)=\frac{m}{2}e^{-m|x|}\).
1777	Daniel Bernoulli becomes concerned about the center-of-mass mean being universally accepted without justification. He favors a semicircular distribution of errors. He also proves that with fixed errors, the accumulated error tends toward a Gaussian-like distribution.
1801	Giuseppe Piazzi observes a celestial object proposed to be a planet (later identified as Ceres). It goes behind the sun and astronomers try to predict where it will re-emerge. Carl Friedrich Gauss guesses correctly while most others guess incorrectly.
1809	Gauss proposes the method of least squares, which leads him to conclude that the error curve must be a Gaussian function.
1810	Laplace proves the Central Limit Theorem, which concludes that the Gaussian is the limiting distribution of averages from almost any distribution.
1846	Adolphe Quetelet applies the Gaussian distribution to sociology and anthropometry, particularly to the distribution of chest sizes of Scottish soldiers.
1860	James Clerk Maxwell shows that the Gaussian distribution applies to the velocities of gas particles.
1869	Sir Francis Galton applies Quetelet's work and the central limit theorem more broadly, to try to prove that intelligence is hereditary, and spends considerable time and energy investigating heredity and statistics. However, one of his primary motivations was to apply these to eugenics.
1873	C. S. Peirce, Galton, and Wilhelm Lexis refer to the Gaussian distribution as the "normal" distribution.
1900	Karl Pearson popularizes "normal distribution" as a shorter and uncredited name for the "Laplace-Gauss curve".
1947-50	P.G. Hoel and A.M. Mood publish popular textbooks that refer to the distribution with zero mean and unit variance as the "standard normal" distribution.

More information

Normalization

We wish to verify that the functions above are normalized, i.e. that they are proper probability density functions. To do this, let's evaluate the integral \[ I=\int_{-\infty}^{\infty}e^{-x^2}dx \] Recall the limit definition of the exponential: \(e^x=\underset{n \to \infty}{\lim}\left ( 1+\tfrac{x}{n} \right )^n\). Using this, the integral becomes the following limit: \[ I=\underset{n \to \infty}{\lim}\int_{-\sqrt{n}}^{\sqrt{n}}\left ( 1-\frac{x^2}{n} \right )^n dx =\underset{n \to \infty}{\lim}\sqrt{n}\int_{-1}^{1}\left ( 1-x^2 \right )^n dx \] Let's define \[ I_p=\int_{-1}^{1}\left ( 1-x^2 \right )^{p/2} dx \] Note: this implies that \(I=\underset{p \to \infty}{\lim} \sqrt{\frac{p}{2}} \cdot I_p\). Using integration by parts: \[ \begin{matrix} I_p & = & \left.\ x\left ( 1-x^2 \right )^{p/2}\right|_{x=-1}^{1}+p\int_{-1}^{1}x^2\left ( 1-x^2 \right )^{(p-2)/2}dx \\ \\ & = & p\left [\int_{-1}^{1}\left ( 1-x^2 \right )^{(p-2)/2}dx-\int_{-1}^{1}\left ( 1-x^2 \right )^{p/2}dx \right ] \end{matrix} \] It follows that \(I_p = p\left [ I_{p-2}-I_p \right ] \), and so \(I_p=\frac{p}{p+1}I_{p-2}\). As clearly \(I_0=2\), and \(I_{1}=\pi/2\), we have the two formulas, where m is some natural number: \[ I_{2m}=2\prod_{k=1}^{m}\frac{2k}{2k+1} \\ I_{2m+1}=\frac{\pi}{2(m+1)}\prod_{k=1}^{m}\frac{2k+1}{2k} \] Which reveals the relationship, valid for any integer p \[ I_{p} I_{p+1}=\frac{2\pi}{p+2} \] As for any p, \(I_{p+1} < I_p \) it follows that \[ \frac{2\pi}{p+2} < I_p^2 < \frac{2\pi}{p+1} \] From which it follows that \[\underset{p \to \infty}{\lim} \sqrt{\frac{p}{2}} \cdot I_p=I=\sqrt{\pi} \] Which is the desired and expected expression. Note that we also obtain the following limit as a byproduct: \[ \bbox[5px,border:2px solid red] { \underset{n \to \infty}{\lim}2\sqrt{n}\prod_{k=1}^{n}\frac{2k}{2k+1} =\underset{n \to \infty}{\lim} \frac{\left ( 2^n \cdot n! \right )^2}{(2n)!\cdot \sqrt{n}}= \sqrt{\pi} } \] which can be seen as equivalent to the Wallis product. 
This suffices to show that the Gaussian functions defined above are indeed normalized, as expected.
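As a quick numerical sanity check of the byproduct limit above, the following minimal sketch evaluates \(2\sqrt{n}\prod_{k=1}^{n}\frac{2k}{2k+1}\) for increasing \(n\). The function name `wallis_sqrt_pi` is only an illustrative choice, and the product is accumulated in log space to avoid underflow.

```python
import math

def wallis_sqrt_pi(n: int) -> float:
    """Return 2*sqrt(n) * prod_{k=1..n} 2k/(2k+1), which should tend to sqrt(pi)."""
    log_prod = 0.0
    for k in range(1, n + 1):
        # Accumulate the logarithm of each factor 2k/(2k+1).
        log_prod += math.log(2 * k) - math.log(2 * k + 1)
    return 2.0 * math.sqrt(n) * math.exp(log_prod)

for n in (10, 1000, 100000):
    print(n, wallis_sqrt_pi(n), math.sqrt(math.pi))
```

The values approach \(\sqrt{\pi}\) from below, consistent with the upper bound \(I_p^2 < \tfrac{2\pi}{p+1}\) derived above.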

General Integral and Corollaries

Using the result above, let us evaluate \[ \int_{-\infty}^{\infty}e^{-ax^2+bx+c}dx \] This is easily done by completing the square: \(-ax^2+bx+c=-a\left ( x-\tfrac{b}{2a} \right )^2+\tfrac{b^2}{4a}+c\). This immediately gives \[ \int_{-\infty}^{\infty}e^{-ax^2+bx+c}dx=\int_{-\infty}^{\infty}e^{-a\left ( x-\tfrac{b}{2a} \right )^2+\tfrac{b^2}{4a}+c}dx=e^{\tfrac{b^2}{4a}+c}\int_{-\infty}^{\infty}e^{-au^2}du \] \[ \bbox[5px,border:2px solid red] { \int_{-\infty}^{\infty}e^{-ax^2+bx+c}dx=e^{\tfrac{b^2}{4a}+c}\sqrt{\pi/a} } \] Based on this, we can easily find: \[ \bbox[5px,border:2px solid red] { \int_{-\infty}^{\infty}g_{\mu,\sigma^2}(x)e^{t x}dx=e^{\mu t+\tfrac{1}{2}\sigma^2t^2} } \] This is the same as the moment generating function for a Gaussian distribution. Several results can be deduced from this. For instance, the Fourier transform of a Gaussian function: \[ \bbox[5px,border:2px solid red] { \int_{-\infty}^{\infty}g_{\mu,\sigma^2}(x)e^{-i\omega x}dx=e^{-\tfrac{1}{2}\sigma^2\omega^2-i\mu\omega} } \] The Fourier transform of a Gaussian is another Gaussian, with variance equal to one over the input variance. 
By taking the real and imaginary parts, we find that: \[ \int_{-\infty}^{\infty}g_{\mu,\sigma^2}(x)\cos(\omega x)dx=e^{-\tfrac{1}{2}\sigma^2\omega^2}\cos(\mu\omega) \\ \int_{-\infty}^{\infty}g_{\mu,\sigma^2}(x)\sin(\omega x)dx=e^{-\tfrac{1}{2}\sigma^2\omega^2}\sin(\mu\omega) \] Substituting a complex number for \(a\) gives us: \[ \int_{0}^{\infty}e^{-\alpha e^{-i\theta}x^2}dx=\frac{1}{2}\sqrt{\frac{\pi}{\alpha e^{-i\theta}}}=\frac{1}{2}\sqrt{\frac{\pi}{\alpha }}e^{i\theta/2} \\ \\ \int_{0}^{\infty}e^{-\alpha \cos(\theta)x^2}\cos\left (\alpha\sin(\theta)x^2\right ) dx= \frac{1}{2}\sqrt{\frac{\pi}{\alpha }}\cos(\theta/2) \\ \\ \int_{0}^{\infty}e^{-\alpha \cos(\theta)x^2}\sin\left (\alpha\sin(\theta)x^2\right ) dx= \frac{1}{2}\sqrt{\frac{\pi}{\alpha }}\sin(\theta/2) \] Particularly, when \(\theta=\pi/2\), we have \[ \bbox[5px,border:2px solid red] { \int_{0}^{\infty}\cos\left (\alpha x^2\right ) dx= \int_{0}^{\infty}\sin\left (\alpha x^2\right ) dx= \frac{1}{2}\sqrt{\frac{\pi}{2\alpha }} } \] Let us evaluate the integrals \[ I_n=\int_{0}^{\infty}x^ne^{-\tfrac{1}{2\sigma^2}x^2}dx \] By integrating by parts we find \[ I_n=\frac{1}{\sigma^2(n+1)}\int_{0}^{\infty}x^{n+2}e^{-\tfrac{1}{2\sigma^2}x^2}dx=\frac{I_{n+2}}{\sigma^2(n+1)} \] which can be written as \(I_{n+2}=\sigma^2(n+1)I_n\). As \(I_0=\sigma\sqrt{\pi/2}\) and \(I_1=\sigma^2\), we find that \[ \bbox[5px,border:2px solid red] { \int_{0}^{\infty}x^ne^{-\tfrac{1}{2\sigma^2}x^2}dx=\sigma^{n+1}\cdot\left\{\begin{matrix} \sqrt{\tfrac{\pi}{2}}\cdot 1\cdot3\cdot5\cdots (n-1) & \: \: \: n\: \: \mathrm{even}\\ 2\cdot4\cdot6\cdots (n-1) & \: \: \: n\: \: \mathrm{odd}\\ \end{matrix}\right. 
} \] Finally, let us examine the following parametrized integral \[ I(a,b)=\int_{0}^{\infty}e^{-a^2x^2-\tfrac{b^2}{x^2}}dx \] Let us differentiate with respect to \(b\) \[ \frac{\partial }{\partial b}I(a,b)=\int_{0}^{\infty}\frac{\partial }{\partial b}e^{-a^2x^2-\tfrac{b^2}{x^2}}dx=-2b\int_{0}^{\infty}\frac{1}{x^2}e^{-a^2x^2-\tfrac{b^2}{x^2}}dx \] Taking \(a,b>0\), we can then make the substitution \(y=\tfrac{b}{ax}\) to find \[\frac{\partial }{\partial b}I(a,b)=-2a\int_{0}^{\infty}e^{-a^2y^2-\tfrac{b^2}{y^2}}dy=-2aI(a,b)\] Solving this differential equation with the initial value \(I(a,0)=\tfrac{\sqrt{\pi}}{2a}\), it follows that \[ \bbox[5px,border:2px solid red] { \int_{0}^{\infty}e^{-a^2x^2-\tfrac{b^2}{x^2}}dx=\frac{\sqrt{\pi}}{2|a|}e^{-2|ab|} } \] Additionally, if we replace \(a^2\) by \(a\) and \(b^2\) by \(bi\) (with \(a,b>0\)), we find \[ \int_{0}^{\infty}e^{-ax^2-\tfrac{bi}{x^2}}dx=\frac{1}{2}\sqrt{\frac{\pi}{a}}e^{-2\sqrt{abi}} =\frac{1}{2}\sqrt{\frac{\pi}{a}}e^{-\sqrt{2ab}}e^{-i\sqrt{2ab}} \\ \\ \int_{0}^{\infty}e^{-ax^2}\cos\left ( \frac{b}{x^2} \right )dx=\frac{1}{2}\sqrt{\frac{\pi}{a}}e^{-\sqrt{2ab}}\cos({\sqrt{2ab}}) \\ \\ \int_{0}^{\infty}e^{-ax^2}\sin\left ( \frac{b}{x^2} \right )dx=\frac{1}{2}\sqrt{\frac{\pi}{a}}e^{-\sqrt{2ab}}\sin({\sqrt{2ab}}) \] Using the last expression and taking the limit as \(a\) goes to zero, we find \[ \bbox[5px,border:2px solid red] { \int_{0}^{\infty}\sin\left ( \frac{b}{x^2} \right )dx=\sqrt{\frac{b\pi}{2}} } \] Similar results follow using complex substitutions in the above general cases.
A remarkably related integral that we can evaluate with these results is \[ I(a,b)=\int_{-\infty}^{\infty}\frac{\cos(ax)}{b^2+x^2}dx \] We use the elementary fact that \(t^{-1}=\int_{0}^{\infty}e^{-xt}dx\). Namely, we write: \[ I(a,b)=\int_{-\infty}^{\infty}\int_{0}^{\infty}\cos(ax)e^{-t(b^2+x^2)}dtdx \] Interchanging the order of integration and using the general formula we derived above, we find \[ I(a,b)=\int_{0}^{\infty}e^{-tb^2}\int_{-\infty}^{\infty}\cos(ax)e^{-tx^2}dxdt =\int_{0}^{\infty}e^{-tb^2}\sqrt{\frac{\pi}{t}}e^{-\frac{a^2}{4t}}dt \] Letting \(t=u^2\), we put it into the form above \[ I(a,b)=2\sqrt{\pi}\int_{0}^{\infty}e^{-u^2b^2}e^{-\frac{a^2}{4u^2}}du \] \[ \bbox[5px,border:2px solid red] { \int_{-\infty}^{\infty}\frac{\cos(ax)}{b^2+x^2}dx=\frac{\pi}{|b|}e^{-|ab|} } \] Particularly, if \(a=b=1\) \[ \bbox[5px,border:2px solid red] { \int_{-\infty}^{\infty}\frac{\cos(x)}{1+x^2}dx=\frac{\pi}{e} } \]
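The closed form for \(\int_{0}^{\infty}e^{-a^2x^2-b^2/x^2}dx\) derived above can be checked numerically. The sketch below uses a plain midpoint rule on a truncated interval; the function names, the cutoff of \(10/|a|\), and the step count are illustrative assumptions rather than anything prescribed by the derivation.

```python
import math

def integral_numeric(a: float, b: float, steps: int = 200_000) -> float:
    """Midpoint-rule estimate of the integral of exp(-a^2 x^2 - b^2/x^2) over (0, 10/|a|)."""
    upper = 10.0 / abs(a)          # beyond this the integrand is below exp(-100)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h          # midpoints never touch x = 0
        total += math.exp(-a * a * x * x - b * b / (x * x))
    return total * h

def integral_closed_form(a: float, b: float) -> float:
    """The boxed result: sqrt(pi) / (2|a|) * exp(-2|ab|)."""
    return math.sqrt(math.pi) / (2.0 * abs(a)) * math.exp(-2.0 * abs(a * b))

print(integral_numeric(1.3, 0.7), integral_closed_form(1.3, 0.7))
```

The two printed numbers should agree to many digits, since the integrand and all of its derivatives vanish at both ends of the truncated interval.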

Maximum Likelihood

Suppose we wish to find a distribution with two parameters (\(\mu\) and \(\sigma^2\)) with the property that the maximum likelihood estimators for \(\mu\) and \(\sigma^2\) are the sample mean and variance, respectively. Additionally, we require that the distribution be symmetric about the mean. We will suppose that \[ f(x;\mu,\sigma^2)=\tfrac{1}{\sigma}g\left ( \tfrac{x-\mu}{\sigma} \right ) \] Where \(g(x)\) is a distribution with zero mean and unit variance, such that \(g(-x)=g(x)\). Let us also define \(\eta(x)=g'(x)/g(x)\), which satisfies \(\eta(-x)=-\eta(x)\). Then the log-likelihood for samples \(x_1,x_2,...,x_N\) is given by: \[ \ell(\mu,\sigma^2;\mathbf{x})=\sum_{k=1}^{N}\ln\left (f(x_k;\mu,\sigma^2) \right )=\sum_{k=1}^{N}\ln\left (\tfrac{1}{\sigma}g\left ( \tfrac{x_k-\mu}{\sigma} \right ) \right ) \] The maximization constraints for each parameter are: \[ \frac{\partial }{\partial \mu}\ell(\mu,\sigma^2;\mathbf{x})= \tfrac{-1}{\sigma}\sum_{k=1}^{N}\eta\left (\tfrac{x_k-\mu}{\sigma} \right ) =0 \\ \frac{\partial }{\partial \sigma^2}\ell(\mu,\sigma^2;\mathbf{x})= -\frac{N}{2\sigma^2}-\frac{1}{2\sigma^3}\sum_{k=1}^{N}(x_k-\mu) \eta\left ( \tfrac{x_k-\mu}{\sigma} \right )=0 \] By the supposition, these conditions hold when \[ \mu=\tfrac{1}{N}\sum_{k=1}^{N}x_k=\overline{\mathbf{x}} \\ \sigma^2=\tfrac{1}{N}\sum_{k=1}^{N}(x_k-\overline{\mathbf{x}})^2=\overline{\mathbf{x}^2}-\overline{\mathbf{x}}^2=\mathrm{var}(\mathbf{x}) \] Now, suppose we set \(x_1=a\) and \(x_2=x_3=....=x_N=a-Nb\). Then \(\overline{\mathbf{x}}=a-(N-1)b\). We then have from the first condition: \[ 0=\sum_{k=1}^{N}\eta\left (\tfrac{x_k-(a-(N-1)b)}{\sigma} \right )=\eta\left (\tfrac{(N-1)b}{\sigma} \right )+(N-1)\eta\left (\tfrac{-b}{\sigma} \right ) \] So that \(\eta\left (\tfrac{(N-1)b}{\sigma} \right )=(N-1)\eta\left (\tfrac{b}{\sigma} \right )\). As \(\eta\) is continuous and must satisfy this for any choice of variables, it must be the case that \(\eta(x)=\frac{g'(x)}{g(x)}=Bx\). 
This differential equation can be easily solved to give \(g(x)=g(0) \cdot e^{\tfrac{B}{2}x^2}\). Requiring that this be a normalized function puts it in the form \(g(x)=\frac{b}{\sqrt{\pi}} \cdot e^{-b^2x^2}\) for some \(b>0\), which makes \(\eta(x)=-2b^2x\). Using the fact that \(g(x)\) has unit variance gives \(b=\tfrac{1}{\sqrt{2}}\). Now we turn to the second constraint \[ 0=-\frac{N}{2\sigma^2}+\frac{1}{2\sigma^4}\sum_{k=1}^{N}(x_k-\mu)^2 \] Which easily gives: \[ \sigma^2=\frac{1}{N}\sum_{k=1}^{N}(x_k-\overline{\mathbf{x}})^2=\mathrm{var}(\mathbf{x}) \] Thus the normal distribution has the required properties. Moreover, the sample mean and variance are the maximum likelihood estimators for the normal distribution. This is the original method Carl Friedrich Gauss used to derive the distribution.
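To illustrate the conclusion, the following sketch draws a sample, computes the sample mean and the (divide-by-\(N\)) sample variance, and confirms that nudging either parameter lowers the normal log-likelihood. The helper names and the chosen sample are hypothetical and serve only as an illustration.

```python
import math
import random

def normal_log_likelihood(xs, mu, var):
    """Log-likelihood of the sample xs under a normal with mean mu and variance var."""
    n = len(xs)
    return (-0.5 * n * math.log(2.0 * math.pi * var)
            - sum((x - mu) ** 2 for x in xs) / (2.0 * var))

def mle(xs):
    """Sample mean and sample variance (dividing by N, not N - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var

rng = random.Random(0)
data = [rng.gauss(3.0, 2.0) for _ in range(500)]
mu_hat, var_hat = mle(data)
best = normal_log_likelihood(data, mu_hat, var_hat)
for dmu, dvar in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.3), (0.0, -0.3)]:
    shifted = normal_log_likelihood(data, mu_hat + dmu, var_hat + dvar)
    print(dmu, dvar, shifted < best)
```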

Convolution

Recall that the convolution of two functions is defined as \[ a(x)*b(x)=\int_{-\infty}^{\infty}a(t)b(x-t)dt \] For the case of two general Gaussians, we use the general integral above, collecting the exponent into the form \(-at^2+bt+c\): \[ g_{\mu_1,\sigma_1^2}*g_{\mu_2,\sigma_2^2} =\frac{1}{\sqrt{2\pi\sigma_1^2}}\frac{1}{\sqrt{2\pi\sigma_2^2}} \int_{-\infty}^{\infty}e^{-\tfrac{1}{2\sigma_1^2}\left ( t-\mu_1 \right )^2}e^{-\tfrac{1}{2\sigma_2^2}\left ( x-t-\mu_2 \right )^2}dt \\ \\ a=\tfrac{1}{2\sigma_1^2}+\tfrac{1}{2\sigma_2^2},\: \: b=\mu_1\tfrac{1}{\sigma_1^2}+(x-\mu_2)\tfrac{1}{\sigma_2^2},\: \: c=-\tfrac{\mu_1^2}{2\sigma_1^2}-\tfrac{(x-\mu_2)^2}{2\sigma_2^2} \\ \\ g_{\mu_1,\sigma_1^2}*g_{\mu_2,\sigma_2^2}=\frac{1}{\sqrt{2\pi\sigma_1^2}}\frac{1}{\sqrt{2\pi\sigma_2^2}}e^{\tfrac{b^2}{4a}+c}\sqrt{\pi/a} \\ \\ g_{\mu_1,\sigma_1^2}*g_{\mu_2,\sigma_2^2}=\frac{1}{\sqrt{2\pi(\sigma_1^2+\sigma_2^2)}}e^{-\tfrac{1}{2(\sigma_1^2+\sigma_2^2)}\left ( x-(\mu_1+\mu_2) \right )^2} \] Thus: \[ \bbox[5px,border:2px solid red] { g_{\mu_1,\sigma_1^2}*g_{\mu_2,\sigma_2^2}=\frac{1}{\sqrt{2\pi(\sigma_1^2+\sigma_2^2)}}e^{-\tfrac{1}{2(\sigma_1^2+\sigma_2^2)}\left ( x-(\mu_1+\mu_2) \right )^2}=g_{\mu_1+\mu_2,\sigma_1^2+\sigma_2^2} } \] That is, the convolution of two Gaussians is another Gaussian with mean and variance equal to the sum of the convolved means and variances. This could be much more easily seen from the convolution property of the Fourier transform.
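The convolution identity can be checked pointwise by evaluating the convolution integral numerically and comparing it with \(g_{\mu_1+\mu_2,\sigma_1^2+\sigma_2^2}(x)\). This is a minimal sketch: the function names, the \(\pm 12\sigma_1\) integration window, and the step count are assumptions made for the illustration.

```python
import math

def gaussian(x, mu, var):
    """Normal density with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def convolve_at(x, mu1, var1, mu2, var2, half_width=12.0, steps=20_000):
    """Midpoint-rule estimate of (g1 * g2)(x), integrating t over mu1 +/- half_width*sigma1."""
    s1 = math.sqrt(var1)
    lo = mu1 - half_width * s1
    h = 2.0 * half_width * s1 / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * h
        total += gaussian(t, mu1, var1) * gaussian(x - t, mu2, var2)
    return total * h

x = 0.7
print(convolve_at(x, 1.0, 0.5, -2.0, 2.0), gaussian(x, 1.0 + (-2.0), 0.5 + 2.0))
```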

Entropy

The entropy of a probability distribution \(f(x)\) is defined as \[ \bbox[5px,border:2px solid red] { H=-\int_{-\infty}^{\infty}f(x)\ln\left ( f(x) \right )dx } \] We wish to find a function that maximizes this value subject to the constraints that it is normalized and has mean \(\mu\) and variance \(\sigma^2\). This can be done using the method of Lagrange multipliers. Namely, we define the function \[ \begin{align*} L(f,a,b,c)=& -\int_{-\infty}^{\infty}f(x)\ln(f(x))dx+a\left [ 1-\int_{-\infty}^{\infty}f(x)dx \right ]\\ & +b \left[\mu-\int_{-\infty}^{\infty}xf(x)dx \right ]+c\left[ \sigma^2-\int_{-\infty}^{\infty}(x-\mu)^2f(x)dx \right ] \end{align*} \\ \\ L(f,a,b,c)=a+b\mu+c\sigma^2-\int_{-\infty}^{\infty}f(x)\left [\ln(f(x))+a+bx+c(x-\mu)^2 \right ]dx \] Using the Euler-Lagrange Equation, we find that \(f(x)\) must satisfy \[ \frac{\partial }{\partial f}f(x)\left [\ln(f(x))+a+bx+c(x-\mu)^2 \right ]=0 \\ \ln(f(x))+a+bx+c(x-\mu)^2+1=0 \\ f(x)=e^{-1-a-bx-c(x-\mu)^2} \] Imposing the normalization, mean, and variance constraints to fix \(a\), \(b\), and \(c\), we find: \[ f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\tfrac{1}{2\sigma^2}(x-\mu)^2}=g_{\mu,\sigma^2}(x) \] That is, a normal distribution has the maximal entropy of all distributions with a given mean and variance. The entropy is then given by \[ \bbox[5px,border:2px solid red] { H=\frac{1}{2}\ln \left( 2\pi e\sigma^2 \right ) } \] This can be seen as a way of understanding the central limit theorem, described below, as well as the stability under convolution: summing independent random variables increases the entropy, and already maximum-entropy distributions can only add their variances.
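As an illustration of the maximum-entropy property, the sketch below numerically integrates \(-f\ln f\) for three zero-mean densities sharing the same variance: a normal, a uniform, and a Laplace density. The comparison densities, the function names, and the integration window are choices made for this example, not part of the derivation above.

```python
import math

def entropy(pdf, lo, hi, steps=200_000):
    """Midpoint-rule estimate of -integral of f ln f over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        f = pdf(lo + (i + 0.5) * h)
        if f > 0.0:                 # treat 0 * ln(0) as 0
            total -= f * math.log(f) * h
    return total

def compare_entropies(sigma):
    """Entropies of normal, Laplace, and uniform densities, all with variance sigma^2."""
    width = sigma * math.sqrt(12.0)          # uniform of this width has variance sigma^2
    scale = sigma / math.sqrt(2.0)           # Laplace with this scale has variance sigma^2
    normal = lambda x: math.exp(-x * x / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)
    uniform = lambda x: 1.0 / width if abs(x) <= width / 2.0 else 0.0
    laplace = lambda x: math.exp(-abs(x) / scale) / (2.0 * scale)
    lo, hi = -40.0 * sigma, 40.0 * sigma
    return {
        "normal": entropy(normal, lo, hi),
        "laplace": entropy(laplace, lo, hi),
        "uniform": entropy(uniform, lo, hi),
    }

result = compare_entropies(1.5)
print(result, 0.5 * math.log(2.0 * math.pi * math.e * 1.5 ** 2))
```

The normal entry should match \(\tfrac{1}{2}\ln(2\pi e\sigma^2)\) and exceed the other two.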

Central Limit Theorem

Suppose \(X\) is a random variable with a well-defined mean and variance. That is, \(\mathrm{E}(X)=\mu\) and \(\mathrm{E}((X-\mu)^2)=\sigma^2\) both exist. We want to find the distribution of \[ \bbox[5px,border:2px solid red] { Z_n=\frac{\overline{X_n}-\mu}{\sigma/\sqrt{n}} } \] Where \[ \overline{X_n}=\frac{1}{n}\sum_{k=1}^{n}X_k \] and the \(X_k\) are all independent variables distributed in the same way as \(X\). It is easy to show that if \(A\) is a random variable with distribution \(a(x)\) and \(B\) is a random variable with distribution \(b(x)\), independent of \(A\), then the distribution of \(A+B\) is \(a(x)*b(x)\). Given this, it is useful to use the Fourier transforms of the distributions, or something similar. In fact, it is most advantageous to use the moment generating function (assuming it exists), where the moment generating function of \(X\) is defined as \(M_X(t)=\mathrm{E}(e^{Xt})\). For independent \(A\) and \(B\), this has the property that \(M_{A+B}(t)=M_{A}(t)M_{B}(t)\). Another useful property is \[ M_{aX+b}(t)=\mathrm{E}(e^{t(aX+b)})=e^{tb}\mathrm{E}(e^{atX})=e^{tb}M_X(at) \] Using this, let us find the moment generating function of \(Z_n\). \[ M_{Z_n}(t)=\mathrm{E}\left ( e^{tZ_n} \right ) =\mathrm{E}\left ( e^{t\frac{\overline{X_n}-\mu}{\sigma/\sqrt{n}}} \right ) \\ \\ M_{Z_n}(t)=e^{-t\sqrt{n}\frac{\mu}{\sigma}}\mathrm{E}\left ( e^{\frac{t}{\sigma\sqrt{n}}\sum_{k=1}^{n}X_k} \right )=e^{-t\sqrt{n}\frac{\mu}{\sigma}}M_X^n\left (\frac{t}{\sigma\sqrt{n}} \right ) \] Based on the definition of the moment generating function, we find \[ M_X(t)=\mathrm{E}\left ( e^{tX} \right )=\mathrm{E}\left ( 1+\frac{t}{1!}X+\frac{t^2}{2!}X^2+\frac{t^3}{3!}X^3+... \right ) \\ M_X(t)=1+t\mathrm{E}(X)+\frac{t^2}{2!}\mathrm{E}(X^2)+\frac{t^3}{3!}\mathrm{E}(X^3)+... 
\] Using this in our expression above, together with \(\mathrm{E}(X^2)=\mu^2+\sigma^2\), we get \[ M_{Z_n}(t)=e^{-t\sqrt{n}\frac{\mu}{\sigma}}\left ( 1+\frac{t}{\sigma\sqrt{n}}\mu+\frac{t^2}{2n\sigma^2}\left ( \mu^2+\sigma^2 \right )+O\left ( n^{-3/2} \right ) \right )^n \] Taking the limit as \(n\) goes to infinity, we get \[ \bbox[5px,border:2px solid red] { \underset{n \to \infty}{\lim} M_{Z_n}(t)=e^{t^2/2} } \] That is, the scaled mean is asymptotically a standard normal. Generally, the mean of \(n\) i.i.d. random variables with mean \(\mu\) and variance \(\sigma^2\) approaches a normal distribution with mean \(\mu\) and variance \(\sigma^2/n\).
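The theorem can be illustrated by simulation. The sketch below standardizes the mean of \(n\) exponential draws (a deliberately skewed choice, with \(\mu=\sigma=1\)) and compares summary statistics with those of a standard normal. The choice of the exponential distribution, the function name, and the sample sizes are assumptions made for the example.

```python
import math
import random

def clt_experiment(n, trials, seed):
    """Return (mean, variance, P(Z < 1)) of Z_n over many trials of Exponential(1) samples."""
    rng = random.Random(seed)
    zs = []
    for _ in range(trials):
        sample_mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        # Exponential(1) has mu = 1 and sigma = 1, so Z_n = (mean - 1) / (1 / sqrt(n)).
        zs.append((sample_mean - 1.0) * math.sqrt(n))
    m = sum(zs) / trials
    v = sum((z - m) ** 2 for z in zs) / trials
    frac_below_one = sum(z < 1.0 for z in zs) / trials
    return m, v, frac_below_one

# Standard normal CDF at 1, for comparison with the simulated fraction.
phi_1 = 0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))
print(clt_experiment(100, 4000, seed=1), phi_1)
```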

Gaussian Limits of Other Distributions

For several families of probability distributions, both discrete and continuous, with both finite and infinite support, in the limit of certain parameters, and scaling for the mean and variance, the distribution converges to a Gaussian.
The general approach will be as follows: suppose the random variable is \(x\) and the distribution \(f(x;\mathbf{p})\) is parametrized by some list of parameters \(\mathbf{p}\). We will turn these parameters into functions of \(n\) (which we will let tend to infinity), \(\mathbf{p}(n)\), and find the mean \(\mu(n)\) and variance \(\sigma^2(n)\) as functions of \(n\). We then rescale to obtain the new distribution \[ g(z;n)=\sigma(n)\cdot f(\mu(n)+z\cdot\sigma(n);\mathbf{p}(n)) \] This distribution will have zero mean and unit variance. Clearly if \(f\) is normalized, \(g\) will be as well. All we will be looking for is how \(g\) varies with \(z\). To this end, we will take \[ \underset{n \to \infty}{\lim}\frac{\partial }{\partial z} \ln \left (g(z;n) \right ) \] As we expect \(g\) to converge to a standard normal, we expect this limit to be \(-z\).
Another approach may be to use the moment generating function. In particular, we expect \[ \underset{n \to \infty}{\lim} e^{-\tfrac{\mu(n)}{\sigma(n)}t}M_{X;\mathbf{p}(n)}\left ( \frac{t}{\sigma(n)} \right )=e^{t^2/2} \]

Binomial Distribution

\[ f(x;n,p)=\binom{n}{x}p^x(1-p)^{n-x} \\ \mu(n)=np,\: \: \sigma(n)=\sqrt{np(1-p)} \\ M_{X;n}(t)=\left ( 1-p+pe^t \right )^n \] Using the distribution method is rather involved and a discussion of it can be seen in the article on Stirling's approximation. However, the moment generating function method is much simpler: \[ M_{Z;n}(t)=\left ( 1+p\left [e^{t/\sqrt{np(1-p)}}-1 \right ] \right )^ne^{-t\sqrt{\frac{np}{1-p}}} \\ M_{Z;n}(t)=\left ( 1+\frac{pt}{\sqrt{np(1-p)}}+\frac{t^2}{2n(1-p)}+O(n^{-3/2}) \right )^ne^{-t\sqrt{\frac{np}{1-p}}} \] Thus, in the limit of large n \[ \underset{n \to \infty}{\lim}M_{Z;n}(t)=e^{\frac{t^2}{2(1-p)}}e^{-\frac{pt^2}{2(1-p)}}=e^{t^2/2} \]

Poisson Distribution

\[ f(x;\lambda)=e^{-\lambda}\frac{\lambda^x}{x!} \\ \lambda(n)=n,\: \: \mu(n)=n,\: \: \sigma(n)=\sqrt{n} \\ M_{X;n}(t)=e^{n\left ( e^t-1 \right )} \] We will make use of the fact that \[ \ln(x!)=(x+\tfrac{1}{2})\ln(x)-x+O(1) \] which follows from Stirling's approximation. It follows that \(\ln(g)\) and its derivative take the form \[ \ln(g(z;n))=\tfrac{1}{2}\ln(n)+(n+z\sqrt{n})\ln(n)-\left (n+z\sqrt{n}+\tfrac{1}{2} \right )\ln(n+z\sqrt{n})+z\sqrt{n}+O(1) \\ \frac{\partial }{\partial z}\ln(g(z;n))=\sqrt{n}\ln(n)-\sqrt{n}\ln\left (n+z\sqrt{n} \right )-\frac{\sqrt{n}}{2\left (n+z\sqrt{n} \right )} \] And so it follows that \[ \underset{n \to \infty}{\lim}\frac{\partial }{\partial z}\ln(g(z;n))=-z \] This analysis is much simpler when approached with the moment generating function method \[ M_{Z;n}(t)=e^{n\left ( e^\frac{t}{\sqrt{n}}-1 \right )}e^{-t\frac{n}{\sqrt{n}}} =e^{\frac{t^2}{2}+O(n^{-1/2})} \] Clearly then, in the limit \[ \underset{n \to \infty}{\lim}M_{Z;n}(t)=e^{t^2/2} \]

Gamma Distribution

\[ f(x;n,\theta)=\frac{x^{n-1}}{(n-1)!\theta^n}e^{-x/\theta} \\ \mu(n)=n\theta,\: \: \sigma(n)=\sqrt{n}\theta \\ M_{X;n}(t)=\left ( 1-\theta t \right )^{-n} \] Proceeding as above \[ \ln(g(z;n))=(n-1)\ln(n\theta+z\sqrt{n}\theta)-(n\theta+z\sqrt{n}\theta)/ \theta \\ \frac{\partial }{\partial z}\ln(g(z;n))=\frac{n-1}{\sqrt{n}+z}-\sqrt{n} \\ \underset{n \to \infty}{\lim}\frac{\partial }{\partial z}\ln(g(z;n))=-z \] However, it is simpler to use \[ M_{Z;n}(t)=\left ( 1-\frac{t}{\sqrt{n}} \right )^{-n}e^{-t\sqrt{n}} \\ \underset{n \to \infty}{\lim}M_{Z;n}(t)=e^{t^2/2} \]
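The moment generating function limits for the binomial, Poisson, and gamma cases can be checked numerically by evaluating \(\ln M_{Z;n}(t)\) for large \(n\) and comparing with \(t^2/2\). In this sketch the function names are illustrative, and `math.log1p` and `math.expm1` are used to keep the nearly cancelling terms accurate.

```python
import math

def log_mgf_binomial(t, n, p):
    """ln of the standardized binomial MGF: n*ln(1 + p(e^{t/sigma} - 1)) - t*mu/sigma."""
    sigma = math.sqrt(n * p * (1.0 - p))
    return n * math.log1p(p * math.expm1(t / sigma)) - t * n * p / sigma

def log_mgf_poisson(t, n):
    """ln of the standardized Poisson MGF with lambda = n."""
    return n * math.expm1(t / math.sqrt(n)) - t * math.sqrt(n)

def log_mgf_gamma(t, n):
    """ln of the standardized gamma MGF; theta cancels, and t < sqrt(n) is required."""
    return -n * math.log1p(-t / math.sqrt(n)) - t * math.sqrt(n)

t = 1.0
for n in (10 ** 2, 10 ** 4, 10 ** 6):
    print(n, log_mgf_binomial(t, n, 0.3), log_mgf_poisson(t, n), log_mgf_gamma(t, n), t * t / 2.0)
```

Each column should drift toward \(t^2/2\) at a rate of roughly \(n^{-1/2}\).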

Beta Distribution

\[ f(x;a,b)=\frac{x^{a-1}(1-x)^{b-1}}{\mathrm{B}(a,b)} \\ a=\alpha n,\: \: b=\beta n,\: \: \mu(n)=\frac{a}{a+b}=\frac{\alpha}{\alpha+\beta}, \nu=\frac{\beta}{\alpha+\beta}, \\ \sigma(n)=\sqrt{\frac{ab}{(a+b)^2(a+b+1)}}=\frac{1}{\sqrt{n}}\sqrt{\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+\tfrac{1}{n})}}\approx\frac{1}{\sqrt{n}}s \] Where \(s=\sqrt{\tfrac{\alpha\beta}{(\alpha+\beta)^3}}\). This approximation is not significant since we will be taking the limit of \(n\), and the approximation is quite good even for moderately sized \(n\). \[ \ln(g(z;n))=(\alpha n-1)\ln\left ( \mu+\tfrac{s}{\sqrt{n}}z \right )+(\beta n-1)\ln\left ( \nu-\tfrac{s}{\sqrt{n}}z \right )-\ln(\mathrm{B}(\alpha n,\beta n)) \\ \frac{\partial }{\partial z}\ln(g(z;n))=s\left [\frac{\alpha n-1}{\mu\sqrt{n}+sz}-\frac{\beta n-1}{\nu\sqrt{n}-sz} \right ] \\ \underset{n \to \infty}{\lim}\frac{\partial }{\partial z}\ln(g(z;n))=-s^2z\frac{\alpha+\beta}{\mu\nu}=-z \]

Correlated Normal Variables

Often we encounter variables that are not independent. These can be normally distributed or closely modeled as normally distributed. Suppose \(X,Y,Z\) are independent standard normal random variables. Let us define: \[ V=aX+bY+c \\ W=dX+fZ+g \] As sums of independent Gaussian random variables, \(V\) and \(W\) will each be themselves Gaussian. Since \(X,Y,Z\) are independent with zero mean, it follows that \(\mathrm{E}(XY)=\mathrm{E}(XZ)=\mathrm{E}(YZ)=0\). We then find the means and variances of \(V\) and \(W\): \[ \mu_V=\mathrm{E}(V)=c,\: \:\: \: \: \sigma_V^2=\mathrm{E}((V-\mu_V)^2)=a^2+b^2 \\ \mu_W=\mathrm{E}(W)=g,\: \:\: \: \: \sigma_W^2=\mathrm{E}((W-\mu_W)^2)=d^2+f^2 \] The covariance of \(V\) and \(W\) is given by \[\sigma_{VW}=\mathrm{E}((V-\mu_V)(W-\mu_W))=ad\] Thus the two variables are correlated, and we can achieve any desired correlation. We can achieve the same effect using only two variables: \[ \begin{bmatrix} V\\ W \end{bmatrix}=\begin{bmatrix} \sigma_V\sqrt{1-\rho^2} & \rho\sigma_V\\ 0 & \sigma_W \end{bmatrix} \begin{bmatrix} X\\ Y \end{bmatrix}+\begin{bmatrix} \mu_V\\ \mu_W \end{bmatrix} \] Where \(\rho\) is the correlation coefficient between \(V\) and \(W\). When defined this way, the joint probability density function is given by \[ f(v,w)=\frac{1}{2\pi\sigma_V\sigma_W\sqrt{1-\rho^2}}e^{-\tfrac{1}{2(1-\rho^2)}\left [ \tfrac{(v-\mu_V)^2}{\sigma_V^2}+\tfrac{(w-\mu_W)^2}{\sigma_W^2}-2\rho\tfrac{(v-\mu_V)(w-\mu_W)}{\sigma_V\sigma_W} \right ]} \] More generally, suppose that we have \(n\) correlated Gaussian variables represented in a column matrix: \(\mathbf{x}\). Let \(\boldsymbol{\mu}=\mathrm{E}(\mathbf{x})\). The covariance matrix is defined as \(\mathbf{\Sigma}=\mathrm{E}\left (\mathbf{(x-\boldsymbol{\mu})}\mathbf{(x-\boldsymbol{\mu})}^T \right )\), so that \(\mathbf{\Sigma}_{a,b}=\mathrm{E}\left ((x_a-\mu_a)(x_b-\mu_b) \right ) \). 
Then the probability density function is given by: \[ \bbox[5px,border:2px solid red] { f(\mathbf{x})=\frac{1}{\sqrt{(2\pi)^n\left | \mathbf{\Sigma} \right |}}e^{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\mathbf{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})} } \] Let \(\mathbf{z}\) be a column matrix of \(n\) independent standard normal random variables. Let \(M\) be an \(n\) by \(n\) matrix such that \(MM^T=\mathbf{\Sigma}\) (for example, a Cholesky factor of \(\mathbf{\Sigma}\)). Then if we take \(\mathbf{x}=M\mathbf{z}+\boldsymbol{\mu}\), \(\mathbf{x}\) will be distributed with the density function given just above.
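A minimal sketch of this construction follows, using a hand-written Cholesky factorization so that the lower-triangular factor \(L\) plays the role of \(M\) with \(LL^T=\mathbf{\Sigma}\). The function names and the example covariance matrix (\(\sigma_V=2\), \(\sigma_W=1\), \(\rho=0.6\)) are assumptions chosen for the illustration.

```python
import math
import random

def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma, for a symmetric positive-definite matrix."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L

def sample_correlated(mu, L, rng):
    """Return x = L z + mu for a fresh vector z of independent standard normals."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(mu))]

cov = [[4.0, 1.2], [1.2, 1.0]]
mu = [1.0, -2.0]
L = cholesky(cov)
rng = random.Random(0)
points = [sample_correlated(mu, L, rng) for _ in range(20000)]
sample_cov = sum((v - mu[0]) * (w - mu[1]) for v, w in points) / len(points)
print(L, sample_cov)
```

The sample covariance should land near the target off-diagonal entry of 1.2.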

Wiener Processes and Stochastic Differential Equations

Let \(z_k\) be independent standard normal random variables for each \(k\). We define a random variable \(W\) as a function of \(t \geq 0\) to be \[ \bbox[5px,border:2px solid red] { W(t)=\underset{n \to \infty}{\lim}\frac{1}{\sqrt{n}}\sum_{1 \leq k \leq nt} z_k } \] It can be seen from the preceding sections that at each \(t\), \(W(t)\) is a normal random variable with zero mean and variance \(t\). Moreover, \(W(s+t)-W(s)\) can be seen to have the same distribution and is independent of \(W(s)\). This is the definition of a Wiener process, also known as Brownian motion. Note that the summed variables don't even need to be normal, but just zero-mean and unit-variance, and the central limit theorem will guarantee that the process still tends to the same distribution.
One way to understand this is using the notation \(dW=z_t\sqrt{dt}\) where \(z_t\) is understood as an independent standard normal random variable for each \(t\). That way \(dW\) is a normal random variable with variance \(dt\). Then \[W(t)=\int_{0}^{t}dW\] It is clear from the definition that, for \(c>0\): \(\frac{1}{\sqrt{c}}W(ct)\) is distributed in exactly the same way and has all the same properties as \(W(t)\) (It's another Wiener process). Thus \(W(t)\) exhibits self-similarity. It is also evident that, for the same process, the covariance of \(W(s)\) and \(W(t)\) is \(\min(s,t)\). Suppose that \(V(t)=W(f(t))\) where \(f(t)\) is a non-decreasing function. Then it follows that \(dV=\sqrt{f'(t)}dW\).
If we wish to simulate a Wiener process at the times \(t_k\), this can be done incrementally as: \[ W(t_{k+1})=W(t_k)+\sqrt{t_{k+1}-t_k}\cdot z_k \] Where the \(z_k\) are independent standard normal variables.
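The incremental recipe can be sketched as follows, with a check that the simulated variance of \(W(t)\) is close to \(t\) and the covariance of \(W(s)\) and \(W(t)\) is close to \(\min(s,t)\). The function name and the chosen time grid are illustrative assumptions.

```python
import math
import random

def simulate_wiener(times, rng):
    """Return W at the given increasing times, starting from W(0) = 0."""
    w, t_prev, path = 0.0, 0.0, []
    for t in times:
        # Each increment is an independent normal with variance t - t_prev.
        w += math.sqrt(t - t_prev) * rng.gauss(0.0, 1.0)
        path.append(w)
        t_prev = t
    return path

rng = random.Random(0)
times = [0.5, 1.0, 2.0]
paths = [simulate_wiener(times, rng) for _ in range(20000)]
var_end = sum(p[2] ** 2 for p in paths) / len(paths)      # compare with t = 2
cov_first_last = sum(p[0] * p[2] for p in paths) / len(paths)  # compare with min(0.5, 2)
print(var_end, cov_first_last)
```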
The Wiener process and its stochastic differential \(dW\) are fundamental to the study of stochastic differential equations. Stochastic differential equations have many applications in physics, finance, population growth, and econometrics. One common general form for these equations is \[ \bbox[5px,border:2px solid red] { dX=\mu(X,t)dt+\sigma(X,t)dW } \] In this equation, \(\mu(X,t)\) is the drift and \(\sigma(X,t)\) is the spread or diffusion. The density function for \(X\), \(f(x,t)\), satisfies the Fokker-Planck equation: \[ \frac{\partial }{\partial t}f(x,t)=-\frac{\partial }{\partial x}\left [ \mu(x,t)f(x,t) \right ]+\frac{1}{2}\frac{\partial^2 }{\partial x^2}\left [\sigma^2(x,t)f(x,t) \right ] \] Suppose we wish to find what happens to the new variable \(Y=g(X,t)\), where \(g(x,t)\) is a smooth, multiply differentiable function. We can expand \(dY\) in a Taylor series: \[ dY=\frac{\partial g}{\partial t}dt+\frac{\partial g}{\partial x}dX+\frac{1}{2}\frac{\partial^2 g}{\partial x^2}dX^2+... \] Substituting in the expansion for \(dX\), and taking advantage of the fact that \(dW^2=dt\) in the proper statistical sense: \[ \bbox[5px,border:2px solid red] { dY=\left [\frac{\partial g}{\partial t}+\mu\frac{\partial g}{\partial x}+\frac{\sigma^2}{2}\frac{\partial^2 g}{\partial x^2} \right ]dt+\sigma\frac{\partial g}{\partial x}dW } \] All the remaining terms are higher than first order and hence will not be significant. This is Ito's lemma, which allows us to make changes of variable for various stochastic processes and thereby solve several types of stochastic differential equations.

General Integrated Brownian Motion

Let us define: \[ V(t)=\int_{0}^{t}f'(s)W(s)ds=\int_{0}^{t}(f(t)-f(s))dW \] A fact that follows easily from the definition of the Wiener process, known as Ito isometry, is: \[ \bbox[5px,border:2px solid red] { \mathrm{E}\left ( \left [\int_0^t F(s)dW \right ]^2 \right )=\mathrm{E}\left ( \int_0^t F(s)^2ds \right ) } \] Given this, it follows that (for \(a>0\)): \[ \mathrm{Var}\left ( V(t) \right)=\int_0^t \left ( f(t)-f(s) \right )^2ds \\ \mathrm{cov}\left ( V(t+a),V(t) \right)=\int_0^t \left ( f(t+a)-f(s) \right )\left ( f(t)-f(s) \right )ds \] For instance, when \(f(t)=t\), \(\mathrm{cov}\left ( V(t+a),V(t) \right)=t^2\tfrac{3a+2t}{6}\). Note that as the sum of zero-mean Gaussians, all of the random variables are Gaussian as well.
A way to simulate such a process at times \(t_k\) is to use the above to determine the covariance matrix \(\mathbf{\Sigma}\) of all the \(V(t_k)\). Then the process can be simulated by the method described in the section on correlated normal variables. Namely, we take a vector of i.i.d. standard normal variables \(\mathbf{z}\), determine a matrix \(M\) such that \(MM^T=\mathbf{\Sigma}\), and set \(V(\mathbf{t})=M\mathbf{z}\). This technique applies to any correlated Gaussian process.
Note also that, from the Ito isometry above, we easily derive the following formula (an equality in distribution), for \(f(t)\) some function and constant \(A \geq 0\): \[ \bbox[5px,border:2px solid red] { \int_0^t A\cdot f(s)dW=A\cdot W\left ( \int_0^t f^2(s)ds \right ) } \]

Brownian Motion with Drift

Suppose that \[ \bbox[5px,border:2px solid red] { dX=\mu dt+\sigma dW } \] Where \(\mu\) and \(\sigma\) are constants. Let us define \(Y=g(X,t)\), where \(g(x,t)=\frac{x-\mu t}{\sigma}\). By Ito's lemma, the differential becomes: \[ dY=\left [\frac{-\mu}{\sigma}+\mu\frac{1}{\sigma}+0 \right ]dt+\sigma\frac{1}{\sigma}dW=dW \] Thus \(\bbox[5px,border:2px solid red] {X(t)=\sigma W(t)+\mu t+X_0}\). It also follows that X is a normal random variable with mean \(\mu t+X_0\) and variance \(\sigma^2 t\).

Geometric Brownian Motion

Suppose that \[ \bbox[5px,border:2px solid red] { dX=X\mu dt+X\sigma dW } \] Where \(\mu\) and \(\sigma\) are constants. This is a process where the percentage change \(\frac{dX}{X}\) follows a Brownian motion with drift. Let us define \(Y=\ln(X)\). By Ito's lemma, the differential becomes: \[ dY=\left [0+\mu X\frac{1}{X}-\frac{\sigma^2 X^2}{2}\frac{1}{X^2} \right ]dt+\sigma X\frac{1}{X}dW=\left ( \mu-\frac{\sigma^2}{2} \right )dt+\sigma dW \] But this has the same form as the case of Brownian motion with drift. Thus \(Y(t)=\sigma W(t)+\left (\mu-\frac{\sigma^2}{2} \right ) t+Y_0\), from which it follows that \[ \bbox[5px,border:2px solid red] { X(t)=X_0 \cdot e^{\left (\mu-\frac{\sigma^2}{2} \right ) t+\sigma W(t)} } \] Moreover, \(X\) will be log-normally distributed with \(\mu\)-parameter equal to \(\ln(X_0)+(\mu-\tfrac{\sigma^2}{2})t\) and \(\sigma^2\)-parameter equal to \(\sigma^2 t\).
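Because the solution is explicit, \(X(t)\) can be sampled directly by drawing \(W(t)\) as \(\sqrt{t}\) times a standard normal. The sketch below does this and checks the stated log-normal parameters; the function name and the parameter values are hypothetical choices for the example.

```python
import math
import random

def gbm_sample(x0, mu, sigma, t, rng):
    """Draw X(t) = x0 * exp((mu - sigma^2/2) t + sigma W(t)), with W(t) ~ N(0, t)."""
    w_t = math.sqrt(t) * rng.gauss(0.0, 1.0)
    return x0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * w_t)

x0, mu, sigma, t = 100.0, 0.05, 0.2, 2.0
rng = random.Random(0)
logs = [math.log(gbm_sample(x0, mu, sigma, t, rng)) for _ in range(20000)]
log_mean = sum(logs) / len(logs)
log_var = sum((v - log_mean) ** 2 for v in logs) / len(logs)
# Compare with ln(x0) + (mu - sigma^2/2) t and sigma^2 t respectively.
print(log_mean, math.log(x0) + (mu - 0.5 * sigma ** 2) * t)
print(log_var, sigma ** 2 * t)
```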

Ornstein–Uhlenbeck process

Suppose that \[ \bbox[5px,border:2px solid red] { dX=-X\theta dt+\sigma dW } \] Where \(\theta\) and \(\sigma\) are constants. Let us define \(Y=g(X,t)\), where \(g(x,t)=x e^{\theta t}\). By Ito's lemma, the differential becomes: \[ dY=\left [\theta X e^{\theta t}-X\theta e^{\theta t}+0 \right ]dt+\sigma e^{\theta t}dW=\sigma e^{\theta t} dW \] It follows that \[ Y=Y_0+\frac{\sigma}{\sqrt{2\theta}} W(e^{2\theta t}-1) \] And so \[ \bbox[5px,border:2px solid red] { X=X_0e^{-\theta t}+\frac{\sigma}{\sqrt{2\theta}}e^{-\theta t} W(e^{2\theta t}-1) } \] Another common formulation has \[ \bbox[5px,border:2px solid red] { dX=(\xi-X)\theta dt+\sigma dW } \] Which has the solution \[ \bbox[5px,border:2px solid red] { X(t)=X_0e^{-\theta t}+(1-e^{-\theta t})\xi +\frac{\sigma}{\sqrt{2\theta}}e^{-\theta t} W(e^{2\theta t}-1) } \] This shows that \(X\) is at each time a Gaussian with mean \(X_0e^{-\theta t}+(1-e^{-\theta t})\xi\), and variance \(\tfrac{\sigma^2}{2\theta}(1-e^{-2\theta t})\). It follows that, asymptotically, the distribution tends toward a Gaussian with mean \(\xi\) and variance \(\tfrac{\sigma^2}{2\theta}\). Thus the process is mean-reverting. The covariance of \(X\) between times \(a\) and \(b\) is given by \[ \mathrm{cov}\left ( X(a),X(b) \right )=\frac{\sigma^2}{2\theta}\left ( e^{-\theta|a-b|}-e^{-\theta(a+b)} \right ) \]
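Since \(X(t)\) is Gaussian with the mean and variance given above, and the process restarts afresh from its current value, one time step of length \(\Delta\) can be sampled exactly rather than by discretizing the differential equation. The following sketch assumes this step-by-step use of the stated formulas; the function name and parameter values are illustrative.

```python
import math
import random

def ou_step(x, dt, theta, sigma, xi, rng):
    """Advance the mean-reverting process by dt using the exact Gaussian transition."""
    decay = math.exp(-theta * dt)
    mean = x * decay + (1.0 - decay) * xi
    var = sigma ** 2 / (2.0 * theta) * (1.0 - decay ** 2)
    return mean + math.sqrt(var) * rng.gauss(0.0, 1.0)

def ou_endpoint(x0, total_time, steps, theta, sigma, xi, rng):
    """Value of the process at total_time, reached through equal exact steps."""
    x, dt = x0, total_time / steps
    for _ in range(steps):
        x = ou_step(x, dt, theta, sigma, xi, rng)
    return x

theta, sigma, xi = 1.0, 0.5, 2.0
rng = random.Random(0)
ends = [ou_endpoint(0.0, 5.0, 50, theta, sigma, xi, rng) for _ in range(5000)]
m = sum(ends) / len(ends)
v = sum((e - m) ** 2 for e in ends) / len(ends)
# After several multiples of 1/theta these approach xi and sigma^2 / (2 theta).
print(m, v, xi, sigma ** 2 / (2.0 * theta))
```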

Generating Gaussian Random Samples

Often we wish to simulate independent samples from a standard normal distribution. By the central limit theorem, one option is to simulate \(N\) uniform random variables, and take the normalized mean. However, this is quite inefficient. We describe two methods below:

2D-Distribution Based

The typical method of evaluating the Gaussian integral is to square the integral and evaluate it as a two-dimensional integral by an advantageous change of variables. For example: \[ I=\int_{-\infty}^{\infty}e^{-x^2/2}dx \\ I^2=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-(x^2+y^2)/2}dxdy =\int_{0}^{2\pi}\int_{0}^{\infty}e^{-r^2/2}rdrd\theta \\ I^2=\left.\begin{matrix} \theta \end{matrix}\right|_0^{2\pi}\cdot \left [ -e^{-r^2/2} \right ]_0^\infty =2\pi \] This implies that \(\theta\) is uniformly distributed between \(0\) and \(2\pi\), and \(r\) is distributed with a CDF given by \(F(r)=1-e^{-r^2/2}\). Thus, if \[ \begin{align} x &=r\cos(\theta), &\: \: y &= r\sin(\theta)\nonumber\\ \theta &= 2\pi U_1, &\: \: r &= \sqrt{-2\ln(U_2)}\nonumber\\ \nonumber \end{align} \] Where \(U_1\) and \(U_2\) are uniform random variables on \((0,1]\), then \(x\) and \(y\) are independent standard normal variables. This is called the Box-Muller Method.
An equivalent method, Marsaglia's Polar Method, samples \((u,v)\) uniformly over the unit disk, then returns \[ x=u\sqrt{\frac{-2\ln(u^2+v^2)}{u^2+v^2}} \\ y=v\sqrt{\frac{-2\ln(u^2+v^2)}{u^2+v^2}} \] as two independent standard Gaussian variables.
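Both methods translate directly into code. The following is a minimal Python sketch using only the standard library; the function names are our own, and the polar method draws points in the unit disk by rejection from the enclosing square, which is one common way to do it.

```python
import math
import random


def box_muller(rng):
    """Return two independent standard normal samples (Box-Muller)."""
    u1 = rng.random()
    u2 = 1.0 - rng.random()  # lies in (0, 1], so the logarithm is finite
    theta = 2.0 * math.pi * u1
    r = math.sqrt(-2.0 * math.log(u2))
    return r * math.cos(theta), r * math.sin(theta)


def marsaglia_polar(rng):
    """Return two independent standard normal samples (polar method)."""
    while True:
        # Draw a point in the square [-1, 1)^2 and keep it if it is in the disk.
        u = 2.0 * rng.random() - 1.0
        v = 2.0 * rng.random() - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:
            f = math.sqrt(-2.0 * math.log(s) / s)
            return u * f, v * f
```

The polar method avoids the trigonometric calls at the cost of rejecting about 21% of the candidate points (those outside the disk).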

Distribution-Geometry Based

This method, known as the Ziggurat Method, is quite general: it can be applied to any random variable with a monotone decreasing density function, or with a symmetric density that is monotone decreasing away from a central mode. The main principle we use is that the x-coordinate of a point uniformly selected under a given probability density function is distributed with that probability density. The Ziggurat algorithm cuts the distribution into N layers, stacked in the way a ziggurat appears. The less area there is in the tail and in the ziggurat but outside the curve, the more efficient the algorithm will be, as fewer resamplings will be needed.

For the specific case of a normal distribution, the algorithm proceeds as follows:

Pre-compute layer limits and probabilities. We select N y-limits such that \(0 = y_0 < y_1 < y_2 < ... < y_{N-1} < y_N=1\). We then get the probabilities \[ p_k={\frac{2}{\sqrt{\pi}}}\int_{y_{k}}^{y_{k+1}}\sqrt{-\ln(x)}dx \] We also define \(x_k=\sqrt{-2\ln(y_k)}\), so that \(y_k=e^{-x_k^2/2}\). Once we have the limits and probabilities, they can be stored and used for all further calculations. Note that the first layer includes the tail (\(x_0=\infty\)), which will have to be dealt with differently than the other layers.
Randomly select a layer. From the N layers, randomly pick one with probability corresponding to the layer probability computed above: save the selected index \(k\) (layer \(k\) spans from \(y_k\) to \(y_{k+1}\)).
Sample layer. If \(k>0\), pick a uniform random value \(u\) between 0 and 1. Then \(x=u \cdot x_{k}\). If \(k=0\), use the tail algorithm.
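Test sampled value. If \(x < x_{k+1}\), return \(x\). Otherwise, pick a uniform random value \(v\) between 0 and 1 and define \(y=y_k+v \cdot (y_{k+1}-y_k)\). If \(y < e^{-x^2/2}\), return \(x\). Otherwise, resample within the same layer (return to step 3): as \(p_k\) is already the exact probability of the layer, a rejected point must be redrawn without reselecting the layer.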
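Tail algorithm. If \(k=0\), pick uniform random values \(u_1,u_2\) between 0 and 1. If \(u_1 < x_1 y_1/\left (p_0\sqrt{\pi/2} \right )\), which is the fraction of the first layer's area lying in the rectangle \(x < x_1\), return \(x=x_1 \cdot u_2\). Otherwise, pick uniform random values \(u_3,u_4\) between 0 and 1, and let \(\xi=-\ln(u_3)/x_1\) and \(y=-\ln(u_4)\). If \(2y > \xi^2\) return \(x=x_1+\xi\). Otherwise, draw new values of \(u_3\) and \(u_4\) and repeat the tail test.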
Sign of value. Pick a random bit \(b\) which is equally likely to be 0 or 1. Set \(x \to (-1)^b x\).
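The steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a tuned implementation: the y-limits are taken to be equally spaced (any increasing choice would do), the layer probabilities are computed in closed form with the error function instead of by numerical integration, a rejected point is redrawn within the same layer, and the rectangular part of the first layer is chosen with probability equal to its share of that layer's area. All function names are hypothetical.

```python
import math
import random

HALF_AREA = math.sqrt(math.pi / 2.0)  # area under exp(-x^2/2) for x > 0


def build_layers(n_layers):
    """Pre-compute y-limits, x-limits, layer areas and layer probabilities."""
    ys = [k / n_layers for k in range(n_layers + 1)]  # assumed equal spacing
    xs = [math.inf] + [math.sqrt(-2.0 * math.log(y)) for y in ys[1:]]
    areas = []
    for k in range(n_layers):
        inner = xs[k + 1] * (ys[k + 1] - ys[k])  # rectangle fully under the curve
        if k == 0:
            # Tail beyond x_1.
            wedge = HALF_AREA * math.erfc(xs[1] / math.sqrt(2.0))
        else:
            # Area between the curve and y_k for x in [x_{k+1}, x_k].
            wedge = HALF_AREA * (math.erf(xs[k] / math.sqrt(2.0))
                                 - math.erf(xs[k + 1] / math.sqrt(2.0)))
            wedge -= ys[k] * (xs[k] - xs[k + 1])
        areas.append(inner + wedge)
    probs = [a / HALF_AREA for a in areas]
    return ys, xs, areas, probs


def sample_tail(x1, rng):
    """Sample x > x1 from the normal tail, repeating until accepted."""
    while True:
        xi = -math.log(1.0 - rng.random()) / x1
        y = -math.log(1.0 - rng.random())
        if 2.0 * y > xi * xi:
            return x1 + xi


def ziggurat_normal(layers, rng):
    """Draw one standard normal sample."""
    ys, xs, areas, probs = layers
    k = rng.choices(range(len(probs)), weights=probs)[0]
    if k == 0:
        if rng.random() < xs[1] * ys[1] / areas[0]:
            x = xs[1] * rng.random()
        else:
            x = sample_tail(xs[1], rng)
    else:
        while True:  # redraw within the same layer until accepted
            x = rng.random() * xs[k]
            if x < xs[k + 1]:
                break
            y = ys[k] + rng.random() * (ys[k + 1] - ys[k])
            if y < math.exp(-0.5 * x * x):
                break
    return -x if rng.random() < 0.5 else x
```

A production implementation would typically choose the layers so that selection is a single table lookup; here `rng.choices` is used for clarity.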

Gaussian Kernel Smoothing and Density Estimation

Gaussian functions serve as very convenient and useful kernels for a number of applications. A kernel is a function that depends on a location and a value at that location and spreads that value to nearby locations. This is useful for smoothing, interpolating, and making discrete sets continuous. The Gaussian is an attractive kernel because it is normalized, it has natural location (\(\mu\)) and width (\(\sigma\)) parameters, tails off quickly, and is not arbitrarily constrained to a finite window. Applying the kernel generally operates like a convolution: for discrete data \(\left \{ (x_1,y_1),(x_2,y_2),...,(x_N,y_N) \right \}\) we define a function as \[ F(x)=\nu(x)\cdot\sum_{k=1}^{N}y_k\cdot g_{0,\sigma^2}(x-x_k) \] Where \(\nu(x)\) is a normalizing or scaling function. For continuous data, with \(f(x)\) defined for all \(x\), applying the kernel gives: \[ F(x)=\nu(x)\cdot [f*g_{0,\sigma^2}(x)] \] The parameter \(\sigma\) is the only one left unspecified, and allows us to have control over the degree of smoothing. In higher dimensions, it may be asymmetric, including off-diagonal terms. It is possible to vary it for different \(x_k\).

Gaussian Blur

Given an image, often it is desirable to remove noise, soften sharp edges, or remove detail. This is useful, for example, to make backgrounds draw less attention, or, in edge-detection, to reduce the number of detected features. Several processes tend to produce noise or sharp edges and so it is advantageous to follow them with a process to reduce such unwanted features. By using a Gaussian kernel, we can achieve this desired type of smoothing, with the degree of smoothing controlled by \(\sigma\) (in units of pixels). One major advantage of the Gaussian kernel, particularly for image processing, is that, for diagonal covariance matrices, it can be decomposed into sequential one-dimensional convolutions, and hence is generally much more efficient than other kernels. Note that, for color images, the process is done on each of the different RGB channels.
For practical purposes, the Gaussian kernels used are rarely infinite in extent, and need not be continuous. Rather, a \((2n+1)\times(2n+1)\) pixel kernel is used, where \(n\geq 3\sigma\), which is just as effective, nearly indistinguishable, and far more efficient. This kernel, for \(0\leq i,j\leq 2n\), can be given by: \[ K(i,j)=\frac{1}{\kappa^2}e^{-\tfrac{1}{2\sigma^2}\left [ \left (i-n \right )^2+\left (j-n \right )^2 \right ]} \] Where \[ \kappa=\sum_{j=-n}^{n}e^{-\tfrac{j^2}{2\sigma^2}} \]
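As \(K(i,j)\) is the product of two one-dimensional kernels, the blur can be carried out as a pass over the rows followed by a pass over the columns. Below is a minimal Python sketch for a single-channel image stored as a list of rows. The function names are our own, and clamping the indices at the image border is an assumed edge-handling choice that the discussion above does not specify.

```python
import math


def gaussian_kernel_1d(sigma, n=None):
    """Return the 2n+1 normalized weights exp(-j^2 / (2 sigma^2)) / kappa."""
    if n is None:
        n = math.ceil(3.0 * sigma)  # smallest integer n with n >= 3 sigma
    weights = [math.exp(-(j * j) / (2.0 * sigma * sigma)) for j in range(-n, n + 1)]
    kappa = sum(weights)
    return [w / kappa for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the edges."""
    n = len(kernel) // 2
    out = []
    for row in image:
        width = len(row)
        out.append([
            sum(kernel[j + n] * row[min(max(c + j, 0), width - 1)]
                for j in range(-n, n + 1))
            for c in range(width)
        ])
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable blur: a horizontal pass followed by a vertical pass."""
    kernel = gaussian_kernel_1d(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))
```

For a \((2n+1)\times(2n+1)\) kernel this costs on the order of \(2(2n+1)\) multiplications per pixel instead of \((2n+1)^2\).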

Gaussian Kernel Smoothing

Given a discrete set of x-values and their corresponding y-values, we define a function as: \[ F(x)=\frac{\sum_{k=1}^{N}y_k \cdot g_{0,\sigma^2}(x-x_k)}{\sum_{k=1}^{N}g_{0,\sigma^2}(x-x_k)} \] This function has the property that points near \(x_k\) will have values close to \(y_k\). In fact, in the limit as \(\sigma^2\) goes to zero, the function converges to the nearest-neighbor or Voronoi map. By adjusting \(\sigma^2\), we can adjust how quickly the function transitions between values. This is most easily seen from a one-dimensional example: suppose we have the two data points \(\{(x_1,y_1),(x_2,y_2)\}\). Then the function above can be equivalently written as: \[ F(x)=\frac{y_1+y_2}{2}+\frac{y_2-y_1}{2}\tanh\left ( \left [ \frac{x_2-x_1}{2\sigma^2} \right ]\left ( x-\tfrac{x_1+x_2}{2} \right ) \right ) \] The rise span of this function is proportional to \(\frac{\sigma^2}{x_2-x_1}\), so more widely separated points have shorter rise spans. This behavior is somewhat opposite to what may be desired. It can either be simply accepted as a feature of the method, or a different \(\sigma\) can be used at different points, depending on the distance to their nearest neighbors.
This type of smoothing has a great number of applications, as it can readily be used in multivariate or vector data (for both \(x\) and \(y\)). It even permits easy adaptation to non-euclidean spaces, e.g. over the surface of a sphere.
The method, however, has a few drawbacks. One is that care needs to be taken in cases where the numerator and denominator are both very small. Another is that evaluating the function can be rather computationally intensive when it must be evaluated at a large number of locations or when \(N\) is large. Nevertheless, the method is quite powerful and admits much flexibility in providing smooth interpolations and extrapolations for discrete data.
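A minimal Python sketch of the smoother for scalar data is given below, with hypothetical function names. To deal with the small numerator and denominator mentioned above, it subtracts the smallest squared distance before exponentiating; this rescales all weights by a common factor, which cancels in the ratio. The second function is the two-point \(\tanh\) form, included only so the two can be compared.

```python
import math


def kernel_smooth(x, xs, ys, sigma):
    """Gaussian-weighted average of ys, with weights centered on xs."""
    d2 = [(x - xk) ** 2 for xk in xs]
    d2_min = min(d2)  # shifting by the minimum keeps the largest weight at 1
    w = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
    return sum(wk * yk for wk, yk in zip(w, ys)) / sum(w)


def two_point_tanh(x, x1, y1, x2, y2, sigma):
    """Closed form of the smoother for exactly two data points."""
    mid = 0.5 * (x1 + x2)
    slope = (x2 - x1) / (2.0 * sigma ** 2)
    return 0.5 * (y1 + y2) + 0.5 * (y2 - y1) * math.tanh(slope * (x - mid))
```

Far from all data points the shifted weights still sum to at least one, so the function returns the value of the nearest point rather than failing on a zero denominator.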

Kernel Density Estimation

Suppose we sample \(N\) times from an unknown probability distribution, and then wish to estimate the density function. If we had a guess as to the form of the distribution, we could use the data to estimate the underlying parameters. However, often the distribution is not of a known kind, so this is not practical. Instead we can estimate the density from the sample values \(x_1,x_2,...,x_N\) as \[ f(x)=\frac{1}{N}\sum_{k=1}^{N}g_{0,\sigma^2}\left ( x-x_k \right ) \] Note that this density function has mean \(\frac{1}{N}\sum_{k=1}^{N}x_k=\overline{x}\) and variance \(\sigma^2+\frac{1}{N}\sum_{k=1}^{N}(x_k-\overline{x})^2\).
This can be extended to multiple dimensions in a straightforward way, although in that case the variance may be replaced either by a matrix proportional to the sample covariance matrix, or by a diagonal matrix. For the one-dimensional case, a general rule of thumb for picking the kernel width is \(\sigma=\sqrt{\mathrm{var}\left ( \mathbf{x} \right )}\left ( \frac{4}{3N} \right )^{1/5}\). Another, rather aggressive approximation is \(\sigma=\tfrac{1}{2}\max\left ( \Delta x \right )\) where \(\Delta x\) is the difference between successive sorted sample values.
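The one-dimensional estimator and the rule of thumb can be sketched in a few lines of Python. The function names are hypothetical, and the rule of thumb is implemented with the unbiased sample variance, which is an assumption, as the text does not say which variance estimate is meant.

```python
import math


def silverman_bandwidth(samples):
    """Rule-of-thumb kernel width: std * (4 / (3 N))^(1/5)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)  # assumed: unbiased
    return math.sqrt(var) * (4.0 / (3.0 * n)) ** 0.2


def kde(x, samples, sigma):
    """Average of Gaussians of variance sigma^2 centered on each sample."""
    norm = 1.0 / (len(samples) * sigma * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-((x - s) ** 2) / (2.0 * sigma ** 2))
                      for s in samples)
```

Integrating `kde` numerically over a wide grid recovers a total probability of one, the sample mean, and a variance of \(\sigma^2\) plus the (population-normalized) spread of the samples, as noted above.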

Normal Distributions in Non-Euclidean Spaces

The distributions we have so far looked at exist in a Euclidean space. That is, the space in which the random variables exist and interact is taken to be flat, having zero curvature everywhere. In general, for two vectors in an (N+1)-dimensional space, with constant curvature \(K\) (\(K=0\) means a flat or Euclidean space, \(K>0\) means a positively curved or elliptical space, and \(K<0\) means a negatively curved or hyperbolic space), we can define a sort of inner product, called a bilinear form: \[ \bbox[5px,border:2px solid red] { \mathbf{a}\cdot\mathbf{b}=a_0 b_0+K \sum_{j=1}^N a_jb_j } \] We will be looking exclusively at unit vectors, i.e. vectors that satisfy \(\mathbf{v}\cdot\mathbf{v}=|\mathbf{v}|^2=v_0^2+K \sum_{j=1}^N v_j^2=1\).
In general, the distance between the points corresponding to the unit vectors \(\mathbf{a}\) and \(\mathbf{b}\) is \[ \bbox[5px,border:2px solid red] { d(\mathbf{a},\mathbf{b})=\tfrac{1}{\sqrt{K}}\cos^{-1}\left ( \mathbf{a}\cdot\mathbf{b} \right )=\tfrac{1}{\sqrt{-K}}\cosh^{-1}\left ( \mathbf{a}\cdot\mathbf{b} \right ) } \] Where the second form is more easily applicable for \(K<0\). Note that Euclidean space must be approached as the limit as \(K \to 0\). Moreover, the zeroth dimension in Euclidean space is just a bookkeeping device: as only the zeroth dimension contributes to the magnitude of the vector, the other components are free to take on any magnitude. However, note that the Euclidean distance is indeed given as the limit of the general distance formula: \[ d(\mathbf{a},\mathbf{b})=\underset{K \to 0}{\lim} \frac{1}{\sqrt{K}}\cos^{-1}\left ( \sqrt{1-K\sum_{j=1}^{N}a_j^2} \sqrt{1-K\sum_{j=1}^{N}b_j^2}+K \sum_{j=1}^N a_jb_j \right ) \\ d(\mathbf{a},\mathbf{b})=\sqrt{\sum_{j=1}^N (a_j-b_j)^2} \] A more useful form for differential geometry is to impose the unit-vector condition directly, and use generalized polar coordinates with the parametrization \[ r=d(\mathbf{x},[1,\mathbf{0}]) \\ d\boldsymbol{\psi}^2=d\theta_1^2+\sin^2(\theta_1)d\theta_2^2+\sin^2(\theta_1)\sin^2(\theta_2)d\theta_3^2+... \] Where the \(\theta\)s are generalized angles.
Then the distance metric (the differential distance between nearby points as a function of their coordinates) is given by \[ \bbox[5px,border:2px solid red] { ds^2=dr^2+\left [ \frac{\sin(r\sqrt{K})}{\sqrt{K}} \right ]^2 d\boldsymbol{\psi}^2 } \] For \(N=2\), if the metric is expressed as \(ds^2=A(v,w) dv^2+B(v,w)dw^2\), the Gaussian curvature \(G\) is given by \[ \bbox[5px,border:2px solid red] { G=\frac{-1}{2\sqrt{AB}}\left ( \frac{\partial }{\partial v}\left [ \frac{1}{\sqrt{AB}}\frac{\partial B}{\partial v} \right ]+\frac{\partial }{\partial w}\left [ \frac{1}{\sqrt{AB}}\frac{\partial A}{\partial w} \right ] \right ) } \] It is easy to verify that for our distance metric, we do indeed get \(G=K\).
Let us take the following probability distribution over unit vectors \(\mathbf{x}\): \[ \bbox[5px,border:2px solid red] { f(\mathbf{x})=C\cdot \exp\left (\tfrac{1}{K \sigma^2}\left [\boldsymbol{\mu}\cdot\mathbf{x}-1 \right ] \right ) } \] Where \(C\) is a normalization constant, \(\boldsymbol{\mu}\) is a constant unit vector, and \(\sigma^2\) is some positive constant. This is the generalized Von Mises–Fisher distribution, which extends the normal distribution to non-Euclidean geometries. In the limit as \(K \to 0\): \[ f(\mathbf{x})=\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left (\frac{-1}{2 \sigma^2}\sum_{j=1}^{N}(x_j-\mu_j)^2 \right ) \] Which is the expected Euclidean distribution. The more general multivariate distribution could be recovered by modifying the definition of the bilinear form. In one dimension, the distributions take the form \[ \bbox[5px,border:2px solid red] { f(r)= \left\{\begin{matrix} \frac{\sqrt{K}}{2\pi e^{-\tfrac{1}{K\sigma^2}} I_0\left (\tfrac{1}{K\sigma^2} \right )} \exp\left ( \frac{\cos(r\sqrt{K})-1}{K\sigma^2} \right ) & \: \: \: \: \: \: \: \: K > 0\\ \\ \frac{1}{\sqrt{2\pi\sigma^2} }\exp\left ( -\frac{r^2}{2\sigma^2} \right ) & \: \: \: \: \: \: \: \: K = 0\\ \\ \frac{\sqrt{-K}}{2e^{-\tfrac{1}{K\sigma^2}} K_0 \left (\tfrac{1}{-K\sigma^2} \right )}\exp\left ( \frac{\cosh(r\sqrt{-K})-1}{K\sigma^2} \right ) & \: \: \: \: \: \: \: \: K < 0 \end{matrix}\right. } \] Where \(I_0(x)\) and \(K_0(x)\) are the zero-order modified Bessel functions of the first and second kind. As both curved cases must reduce to the Euclidean one as \(K \to 0\), it is clear from this that \[ \underset{x\to \infty}{\lim}I_0(x)e^{-x}\sqrt{x}=\frac{1}{\sqrt{2\pi}} \\ \underset{x\to \infty}{\lim}K_0(x)e^{x}\sqrt{x}=\sqrt{\frac{\pi}{2}} \] Here we show two animations, one showing a centered, symmetric distribution with unit variance for different curvatures. Then we also show the same distribution, but off-center.

In two dimensions: Let \(\boldsymbol{\mu}=[1,0,0]\).
Let us find the probability \(\mathrm{P}(r=d(\mathbf{x},\boldsymbol{\mu}) < t)\). It follows from the above that \(\boldsymbol{\mu}\cdot\mathbf{x}=\cos\left (r\sqrt{K} \right )\). From the distance metric, it can easily be seen that a circle with a radius \(r\) will have circumference \(2\pi\tfrac{\sin(r\sqrt{K})}{\sqrt{K}}\). From this the probability can be seen to be given by: \[ \mathrm{P}(r < t)=2\pi C\int_{0}^{t}\frac{\sin(r\sqrt{K})}{\sqrt{K}} \exp \left (\frac{\cos(r\sqrt{K})-1}{K\sigma^2} \right )dr \\ \mathrm{P}(r < t)=2\pi C \sigma^2\left [ 1-\exp \left (\frac{\cos(t\sqrt{K})-1}{K\sigma^2} \right ) \right ] \] For \(t\) as large as possible, this must be one, and so the normalization constant is given by \[ C=\frac{1}{2\pi\sigma^2}\left\{\begin{matrix} \left ( 1-e^{-\tfrac{2}{K\sigma^2}} \right )^{-1} & K>0\\ 1 & K \leq 0 \end{matrix}\right. \] Which makes the probability density function \[ \bbox[5px,border:2px solid red] { f(\mathbf{x})=\frac{\exp\left (\tfrac{1}{K \sigma^2}\left [\boldsymbol{\mu}\cdot\mathbf{x}-1 \right ] \right )}{2\pi\sigma^2}\left\{\begin{matrix} \left ( 1-e^{-\tfrac{2}{K\sigma^2}} \right )^{-1} & K>0\\ 1 & K \leq 0 \end{matrix}\right. } \] The radial CDF is then given by \[ \mathrm{P}(r < t)=\left [ 1-\exp \left (\frac{\cos(t\sqrt{K})-1}{K\sigma^2} \right ) \right ]\left\{\begin{matrix} \left ( 1-e^{-\tfrac{2}{K\sigma^2}} \right )^{-1} & K>0\\ 1 & K \leq 0 \end{matrix}\right. \] If \(K=0\), this is just the usual 2-dimensional Gaussian distribution with equal variances and zero covariance.
If \(K>0\), this is an elliptical Von Mises-Fisher distribution.
If \(K<0\), this is a hyperbolic generalized Von Mises-Fisher distribution.
On the left we show a 2-D Generalized Von Mises-Fisher distribution with \(\mu=1\), \(\sigma^2=1\), for \(|K| \leq 3\).
For the special case of \(K=-1\), we can use the Poincare disk model: on the right we show the 2 dimensional generalized Von Mises-Fisher distribution with \(|\mu|\leq3\) and \(\sigma=0.65\).
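The radial CDF for the two-dimensional case is simple enough to evaluate directly for any curvature. Below is a minimal Python sketch with a hypothetical function name; for \(K<0\) it uses \(\cos(t\sqrt{K})=\cosh(t\sqrt{-K})\), and the threshold used to treat \(K\) as zero is an arbitrary choice of ours.

```python
import math


def radial_cdf(t, K, sigma):
    """P(r < t) for the 2-D generalized Von Mises-Fisher distribution."""
    if abs(K) < 1e-12:
        # Euclidean limit: the radial CDF of a 2-D isotropic Gaussian.
        return 1.0 - math.exp(-t * t / (2.0 * sigma ** 2))
    if K > 0:
        t = min(t, math.pi / math.sqrt(K))  # largest distance in elliptical space
        c = math.cos(t * math.sqrt(K))
        norm = 1.0 / (1.0 - math.exp(-2.0 / (K * sigma ** 2)))
    else:
        # Note: math.cosh overflows for very large arguments.
        c = math.cosh(t * math.sqrt(-K))
        norm = 1.0
    return norm * (1.0 - math.exp((c - 1.0) / (K * sigma ** 2)))
```

For small \(|K|\) the curved cases agree closely with the Euclidean one, which gives a quick numerical check of the limit.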

A Theorem about Circles and a Volumizing Algorithm

2018-02-22T09:08:00.000-08:00

A Circle Theorem

Take a circle of radius \(R\). Select a point \(A\) inside it a distance \(a\) from the center, with \(a < R\). From \(A\), construct \(N>1\) line segments starting from \(A\) and touching the circle, segment \(k\) touching the circle at \(P_k\), such that if \(i-j\equiv \pm 1 \mod N\), then \(\measuredangle P_iAP_j=2\pi/N\); that is, all the segments are equally angularly spaced. Let \(d_k=\overline{AP_k}\). Then \[ \prod_{k=1}^{N}d_k=\prod_{k=1}^{N}\left ( a\cos \left ( \theta_0+\frac{2 k \pi}{N} \right )+\sqrt{R^2-a^2+a^2\cos^2 \left ( \theta_0+\frac{2 k \pi}{N} \right )} \right ) \\ \prod_{k=1}^{N}d_k=\prod_{k=1}^{N}\left (\sqrt{R^2-a^2}\exp\left ( \sinh^{-1}\left (\frac{a}{\sqrt{R^2-a^2}}\sin \left ( \theta'_0+\frac{2 k \pi}{N} \right ) \right ) \right ) \right ) \] Therefore \[ \sqrt[N]{\prod_{k=1}^{N}d_k}=\sqrt{R^2-a^2}\exp \left (\frac{1}{N}\sum_{k=1}^{N} \sinh^{-1}\left (\frac{a}{\sqrt{R^2-a^2}}\sin \left ( \theta'_0+\frac{2 k \pi}{N} \right ) \right ) \right ) \] For \(N\) even, the terms of the summation cancel in pairs: the terms \(k\) and \(k+N/2\) have sines of opposite sign, and \(\sinh^{-1}\) is odd. It follows that \[ \sqrt[N]{\prod_{k=1}^{N}d_k}=\sqrt{R^2-a^2} \] This also holds asymptotically as \(N\to\infty\), as the error approaches zero. It is generally not true for \(N\) odd.

Note that this greatly generalizes the well-known geometric mean theorem. It can be seen as a consequence of the power of a point theorem.

A pleasant interpretation of this is that if we take a diametric cross-section of a sphere and choose a point on that disk, the height of the sphere above that point is the geometric mean of the legs of any equiangular, planar, stellar net with an even number \(2N\) of legs connecting that point to the boundary of the disk.
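The theorem is easy to check numerically. The following Python sketch, with hypothetical function names, computes the leg lengths from the first expression for \(d_k\) above and takes their geometric mean.

```python
import math


def leg_lengths(R, a, n_legs, theta0=0.0):
    """Lengths of n_legs equally spaced segments from A to the circle."""
    legs = []
    for k in range(1, n_legs + 1):
        c = math.cos(theta0 + 2.0 * math.pi * k / n_legs)
        legs.append(a * c + math.sqrt(R * R - a * a + a * a * c * c))
    return legs


def geometric_mean(values):
    """Geometric mean, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

For example, with \(R=2\) and \(a=1.2\), any even number of legs and any starting angle gives a geometric mean of \(\sqrt{R^2-a^2}=1.6\) up to rounding, while three legs starting at \(\theta_0=0\) give a visibly different value.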

A Related Volumizing Algorithm

This theorem suggests an algorithm for producing a 3D volume given a closed 2D boundary shape. If we assume the 2D shape is of a diametric cross-section, we simply apply the method detailed above to produce the height above that point. That is, for a given point inside the shape, we take an N-leg equiangular stellar net emanating from that point to the boundary of the shape. The height of the surface at that point is then the geometric mean of the N legs of that net.

This method ensures that circular shapes produce spherical surfaces. However, if N is low, for less regular boundary shapes, the resulting surface may be quite lumpy or sensitive to how the angles of each net are chosen. One solution, then, is simply to make N large enough. However, this may end up being computationally expensive.
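For a polygonal boundary, the algorithm can be sketched in Python as below. This is an illustration under assumptions of our own: the boundary is a closed polygon given as a list of vertices, the point is strictly inside it, and each leg ends at the nearest boundary point along its direction (the part of the boundary visible from the point). The function names are hypothetical.

```python
import math


def _ray_hit(px, py, dx, dy, ax, ay, bx, by):
    """Distance along the ray P + t*D to segment AB, or None if it misses."""
    ex, ey = bx - ax, by - ay
    denom = dx * ey - dy * ex
    if abs(denom) < 1e-12:  # ray parallel to the segment
        return None
    t = ((ax - px) * ey - (ay - py) * ex) / denom
    s = ((ax - px) * dy - (ay - py) * dx) / denom
    if t > 0.0 and -1e-12 <= s <= 1.0 + 1e-12:
        return t
    return None


def height(px, py, polygon, n_legs=64, theta0=0.0):
    """Geometric mean of n_legs equiangular legs from (px, py) to the boundary."""
    log_sum = 0.0
    for k in range(n_legs):
        ang = theta0 + 2.0 * math.pi * k / n_legs
        dx, dy = math.cos(ang), math.sin(ang)
        hits = []
        for i in range(len(polygon)):
            ax, ay = polygon[i]
            bx, by = polygon[(i + 1) % len(polygon)]
            t = _ray_hit(px, py, dx, dy, ax, ay, bx, by)
            if t is not None:
                hits.append(t)
        log_sum += math.log(min(hits))  # nearest boundary point on this leg
    return math.exp(log_sum / n_legs)
```

Evaluating `height` on a grid of interior points gives the surface; the cost grows as the number of grid points times the number of legs times the number of boundary edges, which is the expense mentioned above.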

In theory, it may be possible to find the asymptotic value: find all parts of the boundary shape visible from the given point, and find the integral of the log of the distance, sweeping over the angle. If the boundary is a polygon, this involves evaluating (or approximating) integrals of the form \[ \int\ln\left ( \sin(x) \right)dx \] Which have no general closed form in terms of elementary functions. However, we can evaluate certain cases. One easy example is that of an infinite corridor formed from two parallel lines. We find that the height profile is double that of a circular cylinder. It may be desirable, then, to determine another function to multiply by which will halve the heights of corridors but leave hemispheres undisturbed.

Below we give some visual examples of the results of the algorithm. The original 2D shapes are shown in red.

In order: an equilateral triangle, an icosagon, a five-pointed star, an almost-donut, an Escherian tessellating lizard, and a tessellating spider.

Rotating Fluid

2018-02-18T12:03:00.000-08:00

Suppose we have an infinitely tall cylinder of radius R, filled to a height H with an incompressible fluid. We then set the fluid rotating about the cylindrical axis at angular speed \(\omega\). Suppose we take a differential chunk of fluid on the surface, a radius r from the axis.

The resulting normal force will then be \(N=F_c+W\). This normal force, as the name suggests, will be normal to the fluid surface. It follows by simple geometry, that \[ \frac{dy}{dr}=\frac{F_c}{W}=\frac{r \omega^2}{g} \] From which it follows that the height of the surface at any radius will be given by \[ y=\frac{r^2 \omega^2}{2g}+C \] Let us define \[ \omega_0=2\sqrt{gH}/R \\ u=\omega/\omega_0 \] Given that the fluid is incompressible, we know that the total volume does not change. From this, we can determine that the height of the surface at any radius will be given by: \[ y(r)=2H\left ( ru/R \right )^2+\left\{\begin{matrix} H(1-u^2) \\ 2H(u-u^2) \end{matrix}\right. \, \, \, \, \, \, \, \begin{matrix} u \leq 1\\ u > 1 \end{matrix} \] The highest point on the liquid surface is then given by: \[ y_{\textrm{max}}=\left\{\begin{matrix} H(1+u^2)\\ 2Hu \end{matrix}\right. \, \, \, \, \, \, \, \begin{matrix} u \leq 1\\ u > 1 \end{matrix} \] If \(u > 1\), the center of the base of the cylinder is not covered by fluid. There is a minimum radius at which fluid can be found. This minimum radius is given by: \[ r_{\textrm{min}}=R\sqrt{1-\frac{1}{u}} \] If the fluid is of uniform density and of total mass M, then the moment of inertia of the rotating fluid is given by \[ I=\left\{\begin{matrix} \frac{MR^2}{2}\left ( 1+\frac{u^2}{3} \right )\\ MR^2\left ( 1-\frac{1}{3u} \right ) \end{matrix}\right. \, \, \, \, \, \, \, \begin{matrix} u \leq 1\\ u > 1 \end{matrix} \] Note for each of these piecewise functions, the functions and their first derivatives are continuous.
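The piecewise surface profile can be checked numerically against the volume constraint. Below is a minimal Python sketch with hypothetical function names; where the formula would give a negative height (inside \(r_{\textrm{min}}\) when \(u>1\)) the height is clamped to zero, since the base is exposed there.

```python
import math


def surface_height(r, u, H, R):
    """Height of the fluid surface at radius r, for u = omega / omega_0."""
    offset = H * (1.0 - u * u) if u <= 1.0 else 2.0 * H * (u - u * u)
    return max(0.0, 2.0 * H * (r * u / R) ** 2 + offset)


def min_radius(u, R):
    """Smallest radius at which fluid is found (zero unless u > 1)."""
    return R * math.sqrt(1.0 - 1.0 / u) if u > 1.0 else 0.0


def fluid_volume(u, H, R, n=20000):
    """Integrate 2 pi r y(r) dr over the cylinder with the midpoint rule."""
    dr = R / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        total += 2.0 * math.pi * r * surface_height(r, u, H, R) * dr
    return total
```

For any \(u\), the integrated volume should equal \(\pi R^2 H\), the volume of the fluid at rest.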

Bias in Statistical Judgment

2018-02-06T13:06:00.001-08:00

Bias in Performance Evaluation

Suppose you are an employer. You are looking to fill a position and you want the best person for the job. To do this, you take a pool of applicants, and for each one, you test them N times on some metric X. From these N tests, you will develop some idea of what each applicant's performance will look like, and based on that, you will hire the applicant or applicants with the best probable performance. However, you know that each applicant comes from one of two populations which you believe to have different statistical characteristics, and you know immediately which population each applicant comes from.

We will use the following model: We will assume that the population from which the applicants are taken is made up of two sub-populations A and B. These two sub-populations have different distributions of individual mean performance that are both Gaussian. That is, an individual drawn from sub-population A will have an expected performance that is normally distributed with mean \(\mu_A\) and variance \(\sigma_A^2\). Likewise, an individual drawn from sub-population B will have an expected performance that is normally distributed with mean \(\mu_B\) and variance \(\sigma_B^2\). Individual performances are then taken to be normally distributed with the individual mean and individual variance \(\sigma_i^2\).

Suppose we take a given applicant who we know comes from sub-population B. We sample her performance N times and get performances of \(\{x_1,x_2,x_3,...,x_N\}=\textbf{x}\). We form the following complete pdf for the (N+1) variables of the individual mean and the N performances: \[ f_{\mu_i,\textbf{x}|B}(\mu_i,x_1,x_2,...,x_N)=\frac{1}{\sqrt{2\pi}^{N+1}}\frac{1}{\sigma_B \sigma_i^N} \exp\left ({-\frac{(\mu_i-\mu_B)^2}{2\sigma_B^2}} \right ) \prod_{k=1}^N\exp\left ({-\frac{(x_k-\mu_i)^2}{2\sigma_i^2}} \right ) \] It follows that the distribution conditioned on the test results is proportional to: \[ f_{\mu_i|\textbf{x},B}(\mu_i)\propto \exp\left ({-\frac{(\mu_i-\mu_B)^2}{2\sigma_B^2}} \right ) \prod_{k=1}^N\exp\left ({-\frac{(x_k-\mu_i)^2}{2\sigma_i^2}} \right ) \] By normalizing we find that this implies that the individual mean, given that it comes from sub-population B and given the N test results, is normally distributed with variance \[ \sigma_{\tilde{\mu_i}}^2=\left ( {\frac{1}{\sigma_B^2}+\frac{N}{\sigma_i^2}} \right )^{-1} \] and mean \[ \tilde{\mu_i}=\frac{\frac{\mu_B}{\sigma_B^2}+\frac{1}{\sigma_i^2}\sum_{k=1}^{N}x_k}{\frac{1}{\sigma_B^2}+\frac{N}{\sigma_i^2}} =\frac{\frac{\mu_B}{\sigma_B^2}+\frac{N}{\sigma_i^2}\bar{\textbf{x}}}{\frac{1}{\sigma_B^2}+\frac{N}{\sigma_i^2}} \] We will assume that this mean and variance are used as estimators to predict performance. Note that, in the limit of large N, \(\sigma_{\tilde{\mu_i}}^2\rightarrow \sigma_i^2/N\) and \(\tilde{\mu_i}\rightarrow \bar{\textbf{x}}\rightarrow \mu_i\), as expected.

Suppose we assume sub-populations A and B have the same variance \(\sigma_{AB}^2\), but \(\mu_A>\mu_B\). Then we can note the following few implications:

The belief about the sub-population the applicant comes from acts effectively as another performance sample of weight \(\sigma_i^2/\sigma_{AB}^2\).
If applicant 1 comes from sub-population A and applicant 2 comes from sub-population B, even if they perform identically in their samples, applicant 1 would nevertheless still be preferred.
The more samples are taken, the less the sub-population the applicant comes from matters.
The larger the difference in means between the sub-populations is assumed to be, the better the lesser-viewed applicant will need to perform in order to be selected over the better-viewed applicant.
Suppose we compare \(\tilde{\mu_i}\) to \(\bar{\textbf{x}}\). Our selection criterion will simply be whether the performance predictor is above \(x_m\). We want to find the probability of being from a given sub-population given that the applicant was selected by each predictor. For the sub-population-indifferent predictor: \[ P(A|\bar{\textbf{x}}\geq x_m)=\frac{P(\bar{\textbf{x}}\geq x_m|A)P(A)}{P(\bar{\textbf{x}}\geq x_m|A)P(A)+P(\bar{\textbf{x}}\geq x_m|B)P(B)} \\ \\ P(A|\bar{\textbf{x}}\geq x_m)= \frac{P(A)Q\left (\frac{x_m-\mu_A}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right )} {P(A)Q\left (\frac{x_m-\mu_A}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right ) + P(B)Q\left (\frac{x_m-\mu_B}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right )} \] Where \[ Q(z)=\int_{z}^{\infty}\frac{e^{-s^2/2}}{\sqrt{2\pi}}ds\approx \frac{e^{-z^2/2}}{z\sqrt{2\pi}} \] with the approximation holding for large \(z\). For the sub-population-sensitive predictor, we first note that, for an applicant from sub-population \(S\in\{A,B\}\), \[ \tilde{\mu_i} \geq x_m \Rightarrow \bar{\textbf{x}}\geq x_m+(x_m-\mu_S)\frac{\sigma_i^2}{N\sigma_{AB}^2}=x_{m,S}' \] Which then implies \[ P(A|\tilde{\mu_i}\geq x_m)=\frac{P(\tilde{\mu_i}\geq x_m|A)P(A)}{P(\tilde{\mu_i}\geq x_m|A)P(A)+P(\tilde{\mu_i}\geq x_m|B)P(B)} \\ \\ P(A|\tilde{\mu_i}\geq x_m)= \frac{P(A)Q\left (\frac{x_{m,A}'-\mu_A}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right )} {P(A)Q\left (\frac{x_{m,A}'-\mu_A}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right ) + P(B)Q\left (\frac{x_{m,B}'-\mu_B}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right )} \] As \(x_m > \mu_A > \mu_B\) and thus \(x_{m,B}' > x_{m,A}' > x_m\), it is easy to see that \(P(A) < P(A|\bar{\textbf{x}}\geq x_m) < P(A|\tilde{\mu_i}\geq x_m) \). Thus the sensitivity further biases the selection towards sub-population A. We can call \(\bar{\textbf{x}}\) the meritocratic predictor and \(\tilde{\mu_i}\) the semi-meritocratic predictor.
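The two predictors can be compared numerically. Below is a minimal Python sketch with hypothetical function names. It assumes equal sub-population variances, and for the sub-population-sensitive predictor it applies to each sub-population the threshold on \(\bar{\textbf{x}}\) obtained by solving \(\tilde{\mu_i}\geq x_m\) with that sub-population's own mean.

```python
import math


def posterior(mu_pop, var_pop, var_ind, samples):
    """Posterior mean and variance of the individual mean given the samples."""
    n = len(samples)
    precision = 1.0 / var_pop + n / var_ind
    mean = (mu_pop / var_pop + sum(samples) / var_ind) / precision
    return mean, 1.0 / precision


def q_function(z):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def prob_a_given_selected(x_m, mu_a, mu_b, var_pop, var_ind, n, p_a, sensitive):
    """P(A | selected), for the indifferent or the sensitive predictor."""
    spread = math.sqrt(var_pop + var_ind / n)  # std of the sample mean
    tails = []
    for mu in (mu_a, mu_b):
        threshold = x_m
        if sensitive:
            # Sample-mean threshold equivalent to posterior mean >= x_m.
            threshold += (x_m - mu) * var_ind / (n * var_pop)
        tails.append(q_function((threshold - mu) / spread))
    return p_a * tails[0] / (p_a * tails[0] + (1.0 - p_a) * tails[1])
```

With, say, \(\mu_A=1\), \(\mu_B=0\), \(\sigma_{AB}^2=1\), \(\sigma_i^2=4\), \(N=3\), \(x_m=2\) and \(P(A)=1/2\), the selected pool is skewed toward A under both predictors, and more strongly under the sensitive one.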

Some Sociological Implications

Though the above effects may, in theory, be small, their effects in practice may not be. Humans are not perfectly rational and are not perfect statistical computers. The above is meant to give motivation for taking seriously effects that are often much more pronounced. If there is a perceived difference in means, there is likely a tendency to exaggerate it, to think that the difference in means should be visible, and hence that the two distributions should be statistically separable. Likewise, population variances are often perceived as narrower than they really are, leading to further amplification of the biasing effect. Moreover, the parameter estimations are not based simply on objective observation of the sub-populations, but also, if not mainly, on subjective, sociological, psychological, and cultural factors. As high confidence in one's initial estimates makes one less likely to take more samples, the employer's judgment may rest heavily on subjective biases. Given this, if the employer's objective is simply to hire the best candidates, she should simply use the meritocratic predictor (or perhaps at least invest some time into getting accurate sub-population parameters).

However, it is worth noting some effects on the candidates themselves. As a rule, the candidates are not subjected to this bias just in this bid for employment alone, but rather serially and repeatedly, in bid after bid. This may have any of the following effects: driving applicants toward jobs where they will be more favored (or less dis-favored) by the bias; affecting the applicant's self-evaluations, making them think their personal mean is closer to the broadly perceived sub-population mean; normalizing the broadly perceived sub-population mean, with an implicit devaluation of deviation from it. Also, we can note the following well-known problem: personal means tend to increase in challenging jobs, meaning that the unfavorable bias will perpetually stand in the way of the development of the negatively biased candidate, which then only serves to further feed into the bias. Both advantages and disadvantages tend to widen, making this a subtle case of "the rich get richer and the poor get poorer".

The moral of all this can be summarized as: the semi-meritocratic predictor should be avoided if possible, as it is very difficult to implement effectively and has a tendency to introduce a host of detrimental effects. Fortunately, the meritocratic predictor loses only a small amount of informativeness, and avoids the drawbacks mentioned above. Care should then be taken to ensure that the meritocratic selection system is implemented as carefully as can be managed to preclude the introduction of biasing effects. One way of washing out the effects of biasing in general is simply to give the applicants many opportunities to demonstrate their abilities.

Some Newtonian Gravitational Mechanics

2017-08-01T14:36:00.001-07:00

Duration of a trajectory

Suppose we launch an object straight up. We wish to find how long it will take to return. Suppose we launch it up at a speed \(v_0\). It is well known that the classical escape velocity is given by \[ v_e=\sqrt{\frac{2MG}{R}} \] By examining the energy equation, we find that the speed when the object is a distance r from the center of the planet is given by: \[ v(r)=v_e\sqrt{\frac{R}{r}-\gamma} \] Where \[ \gamma=1-\frac{v_0^2}{v_e^2} \] To find the travel time, we integrate: \[ T=2\int_{R}^{R/\gamma}\frac{dr}{v(r)}=\frac{2}{v_e}\int_{R}^{R/\gamma}\frac{dr}{\sqrt{\frac{R}{r}-\gamma}}=\frac{2R}{v_e}\int_{\gamma}^{1}\frac{du}{u^2\sqrt{u-\gamma}} \] \[ T=\frac{2R}{v_e}\frac{\tan^{-1}\left ( \sqrt{\frac{1}{\gamma}-1} \right )+\sqrt{\gamma-\gamma^2}}{\gamma^{3/2}}=\frac{2R}{v_e}\frac{\sin^{-1}(u)+u\sqrt{1-u^2}}{(1-u^2)^{3/2}} \] Where \(u=v_0/v_e\).
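As a sanity check, for launch speeds much smaller than the escape velocity the result should reduce to the familiar uniform-gravity value \(2v_0/g\), with \(g=v_e^2/(2R)\). The Python sketch below, with hypothetical function names, evaluates both closed forms so they can be compared.

```python
import math


def round_trip_time(v0, R, v_e):
    """Time to return, written in terms of u = v0 / v_e (requires u < 1)."""
    u = v0 / v_e
    if not 0.0 <= u < 1.0:
        raise ValueError("the object only returns if 0 <= v0 < v_e")
    return (2.0 * R / v_e) * (math.asin(u) + u * math.sqrt(1.0 - u * u)) / (1.0 - u * u) ** 1.5


def round_trip_time_gamma(v0, R, v_e):
    """The same time, written in terms of gamma = 1 - v0^2 / v_e^2."""
    gamma = 1.0 - (v0 / v_e) ** 2
    num = math.atan(math.sqrt(1.0 / gamma - 1.0)) + math.sqrt(gamma - gamma * gamma)
    return (2.0 * R / v_e) * num / gamma ** 1.5
```

The time diverges as \(v_0\) approaches \(v_e\), as the object then takes arbitrarily long to turn around.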

Optimal Path through a Planet

We want to find the best path through a planet of radius R, connecting two points \(2\alpha\) radians apart (great circle angle). We assume the planet is of uniform density. As is well known, the acceleration due to gravity a radius r from the center of the planet is given by: \[ a=-gr/R \] Where \(g\) is the surface gravitational acceleration. Thus, if an object falls from rest at the surface along a path through the planet, its speed at a distance r from the center will be given by \[ \tfrac{1}{2}mv^2=\tfrac{1}{2}m\frac{g}{R}\left ( R^2-r^2 \right ) \] \[ v(r)=\sqrt{\frac{g}{R}} \sqrt{R^2-r^2} \] Let us suppose it falls along the path specified by the function \(r(\theta)\), where r is even and \(r(\pm\alpha)=R\). The total time is given by \[ T=2\int_{0}^{\alpha}\frac{d\ell}{v}=2\sqrt{\frac{R}{g}}\int_{0}^{\alpha}\frac{\sqrt{r^2(\theta)+r'^2(\theta)}}{\sqrt{R^2-r^2(\theta)}}d\theta \] In order to obtain conditions for the optimal path, then, we use calculus of variations. The Lagrangian is \[ L(r,r',\theta)=\frac{\sqrt{r^2+r'^2}}{\sqrt{R^2-r^2}} \] Using the Beltrami Identity, we find: \[ \frac{\sqrt{r^2+r'^2}}{\sqrt{R^2-r^2}}-\frac{r'^2}{{\sqrt{r^2+r'^2}}{\sqrt{R^2-r^2}}}=\frac{r^2}{{\sqrt{r^2+r'^2}}{\sqrt{R^2-r^2}}}=C \] Let \(1+ \tfrac{1}{C^2}=1/q^2\). Rearranging, we find: \[ r'=r\sqrt{\frac{\left (1+ \tfrac{1}{C^2} \right )r^2-R^2}{R^2-r^2}}=\frac{r}{q}\sqrt{\frac{r^2-R^2q^2}{R^2-r^2}} \] As \(r'(0)=0\), this implies that \[ r(0)=Rq \] \[ q=\frac{r(0)}{R} \] Let us make the change of variables: \(u=r^2/R^2\).
This then gives: \[ u'=2u\sqrt{\frac{\tfrac{1}{q^2}u-1}{1-u}} \] \[ u(0)=q^2 \] In order to determine \(q\), we can integrate the differential equation: \[ \frac{1}{2u} \sqrt{\frac{1-u}{\tfrac{1}{q^2}u-1}}du=d\theta \] \[ \int_{q^2}^{1}\frac{1}{2u} \sqrt{\frac{1-u}{\tfrac{1}{q^2}u-1}}du=\frac{\pi}{2}(1-q)=\int_{0}^{\alpha}d\theta=\alpha \] Thus \[ q=1-\frac{2\alpha}{\pi} \] We can then find the total travel time: \[ T=2\sqrt{\frac{R}{g}}\int_{0}^{\alpha}\frac{\sqrt{r^2(\theta)+r'^2(\theta)}}{\sqrt{R^2-r^2(\theta)}}d\theta=2\sqrt{\frac{R}{g}}\int_{Rq}^{R}\frac{\sqrt{r^2(\theta)+r'^2(\theta)}}{\sqrt{R^2-r^2(\theta)}}\frac{1}{r'}dr \] \[ T=2\sqrt{\frac{R}{g}}\int_{Rq}^{R} \frac{q}{r} \sqrt{r^2+\frac{r^2}{q^2}{\frac{r^2-R^2q^2}{R^2-r^2}}}\frac{dr}{\sqrt{r^2-R^2q^2}} \] \[ T=\sqrt{\frac{R}{g}}\sqrt{1-q^2}\int_{Rq}^{R} \frac{2rdr}{\sqrt{R^2-r^2} \sqrt{r^2-R^2q^2}} \] Substituting \(x=r^2/R^2\): \[ T=\sqrt{\frac{R}{g}}\sqrt{1-q^2}\int_{q^2}^{1} \frac{dx}{\sqrt{1-x} \sqrt{x-q^2}} \] \[ T=\pi \sqrt{\frac{R}{g}}\sqrt{1-q^2}=2\sqrt{\frac{R}{g}}\sqrt{\pi\alpha-\alpha^2} \] Below we show several trajectories along the optimal path for several values of \(\alpha\):

In fact, these solutions are hypocycloids.
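As a quick numerical check of the closed-form results above, here is a minimal Python sketch. The function names `travel_time` and `apex_ratio`, and the Earth-like default values for R and g, are illustrative assumptions rather than part of the derivation.

```python
import math

def travel_time(alpha, R=6.371e6, g=9.81):
    """Fall time T = 2*sqrt(R/g)*sqrt(pi*alpha - alpha^2) along the optimal path
    joining two surface points 2*alpha radians apart (SI units assumed)."""
    return 2.0 * math.sqrt(R / g) * math.sqrt(math.pi * alpha - alpha**2)

def apex_ratio(alpha):
    """q = r(0)/R = 1 - 2*alpha/pi, the fractional radius of the deepest point."""
    return 1.0 - 2.0 * alpha / math.pi

# Antipodal points (alpha = pi/2): q = 0, so the path passes through the center,
# and the time reduces to pi*sqrt(R/g), half a period of the straight-tunnel oscillation.
print(travel_time(math.pi / 2) / 60.0, "minutes")
```

For antipodal points the formula reproduces the familiar straight-tunnel result, which is a useful consistency check on the constant factors.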

Golomb's Sequence

2017-07-27T08:17:00.002-07:00

Definition

Golomb's sequence, named after Solomon Golomb, is a curious sequence of whole numbers that describes itself. It is defined in the following way: it is a non-decreasing sequence of whole numbers where the nth term gives the number of times n occurs in the sequence, and the first term is 1. From this we can begin constructing it: The second element must be greater than 1 as there is only one 1. It must be 2, and so must be the third element. Given this, there must be 2 threes, and from here on we may merely refer to the terms in the sequence and continue from there. The first several terms of the sequence are: \[ 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, \\ 9, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12,... \]

Recurrence Relation

The sequence can be given an explicit recurrence relation by stating it in the following way, using the self-describing property: To determine the next term in the sequence, go back the number of times that the previous term's value occurs (this will put you at the next-smallest value), then add one. For example, to determine the 12th term (6), count the number of times that the value of the 11th term (5) occurs (3 times). Step back that many terms (to the 9th term: 5) then add one to that value (6). This then gives the recurrence relation: \[ a(n+1)=1+a\left ( n+1-a(a(n)) \right ) \] Where \(a(1)=1\).
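The recurrence can be turned into a short program directly. The sketch below is one possible Python rendering; the function name `golomb` and the 1-indexed list layout are choices made here for illustration.

```python
def golomb(n_terms):
    """Return the first n_terms of Golomb's sequence using
    a(n+1) = 1 + a(n + 1 - a(a(n))), with a(1) = 1."""
    a = [0, 1]  # a[0] is a dummy entry so that a[n] matches the 1-indexed a(n)
    for n in range(1, n_terms):
        a.append(1 + a[n + 1 - a[a[n]]])
    return a[1:]

print(golomb(12))
```

The list is padded with a dummy zeroth entry so the indices in the code read exactly like the indices in the recurrence.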

Asymptotic Behavior

The recurrence relation allows us to give an asymptotic expression for the value of the sequence. Let us suppose the sequence grows like \[ a(n)=A n^\alpha \] Let us put this into the recurrence relation: \[ A(n+1)^\alpha=1+A\left ( n+1-A(A n^\alpha)^\alpha \right )^\alpha \] Simplifying and rearranging, we obtain: \[ 1=\frac{1}{A(n+1)^\alpha}+\left (1-A^{1+\alpha}\frac{n^{\alpha^2}}{n+1} \right )^\alpha \] As \(\alpha<1\), \(\frac{n^{\alpha^2}}{n+1}\) goes to zero. For small x, \((1+x)^b\rightarrow 1+bx\). Thus, asymptotically: \[ 1\approx\frac{1}{A(n+1)^\alpha}+1-\alpha A^{1+\alpha}\frac{n^{\alpha^2}}{n+1} \] \[ \alpha A^{2+\alpha}n^{\alpha^2}(n+1)^{\alpha-1} \approx 1 \] Thus it must be the case that \[ \alpha^2+\alpha-1=0 \] \[ A=\alpha^{-\frac{1}{2+\alpha}} \] The solution to the first equation is \[ \alpha=\left \{\varphi-1,-\varphi \right \} \] Where \(\varphi\) is the golden ratio. As the exponent is clearly positive, we find the sequence is asymptotic to: \[ a(n)\rightarrow \varphi^{2-\varphi}n^{\varphi-1} \] Below we plot the ratio of these two expressions:
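The ratio between the sequence and its asymptotic form can be computed with a few lines of Python. This is only a sketch: the helper names `golomb_term` and `asymptotic` are assumptions introduced here, and the sequence is regenerated from the recurrence so the block stands alone.

```python
import math

PHI = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio

def golomb_term(n):
    """a(n) of Golomb's sequence, built from the recurrence with a(1) = 1."""
    a = [0, 1]
    for m in range(1, n):
        a.append(1 + a[m + 1 - a[a[m]]])
    return a[n]

def asymptotic(n):
    """The asymptotic form phi^(2 - phi) * n^(phi - 1)."""
    return PHI ** (2.0 - PHI) * n ** (PHI - 1.0)

for n in (10, 100, 1000, 10000):
    print(n, golomb_term(n) / asymptotic(n))
```

The printed ratios should settle close to 1 as n grows, in line with the asymptotic expression.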

Continued Fractions

2017-07-07T11:05:00.001-07:00

Definition and Background

A continued fraction is a representation of a number \(x\) in the form \[ x=a_0+\cfrac{b_0}{a_1+\cfrac{b_1}{a_2+\cfrac{b_2}{a_3+\cfrac{b_3}{\ddots}}}} \] Often, the b's are taken to be all 1's and the a's are integers. This is called the canonical or simple form. There are numerous ways of representing continued fractions. For instance, \[ x=a_0+\cfrac{1}{a_1+\cfrac{1}{a_2+\cfrac{1}{a_3+\cfrac{1}{\ddots}}}} \] can be represented as \[ x=a_0+\overset{\infty}{\underset{k=1}{\mathrm{K}}}\frac{1}{a_k} \] Or as \[ \left [ a_0;a_1,a_2,a_3,... \right] \]

Construction Algorithm

The continued fraction terms can be determined as follows: Given \(x\), set \(x_0=x\). Then \[ a_k=\left \lfloor x_k \right \rfloor \] \[ x_{k+1}=\frac{1}{x_k-a_k} \] Continue until \(x_k=a_k\).
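The construction algorithm is easy to run exactly on rational inputs. The following Python sketch uses the standard-library `fractions.Fraction` type to avoid rounding; the function name `continued_fraction` and the `max_terms` cutoff are assumptions added for illustration.

```python
from fractions import Fraction
import math

def continued_fraction(x, max_terms=20):
    """Return the terms [a0, a1, ...] of the simple continued fraction of x,
    using a_k = floor(x_k) and x_{k+1} = 1 / (x_k - a_k)."""
    x = Fraction(x)
    terms = []
    for _ in range(max_terms):
        a = math.floor(x)
        terms.append(a)
        if x == a:  # the expansion terminates (always happens for rationals)
            break
        x = 1 / (x - a)
    return terms

print(continued_fraction(Fraction(415, 93)))
```

Passing a float works too, but the result is then the expansion of the exact binary value stored in the float, so the later terms are not meaningful for the intended real number.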

Convergents

The convergents of a continued fraction are the rational numbers resulting from taking the first n terms of the continued fraction. Let \(P_n\) and \(Q_n\) be the numerators and denominators respectively of the nth convergent (the one that includes \(a_n\)). It is not difficult to show that \[ P_n=a_nP_{n-1}+P_{n-2} \] \[ Q_n=a_nQ_{n-1}+Q_{n-2} \] An alternate way of saying this is that \[ \begin{bmatrix} a_n & 1\\ 1 & 0 \end{bmatrix} \begin{bmatrix} P_{n-1} & Q_{n-1}\\ P_{n-2} & Q_{n-2} \end{bmatrix} = \begin{bmatrix} P_{n} & Q_{n}\\ P_{n-1} & Q_{n-1} \end{bmatrix} \] Where \[ \begin{bmatrix} P_{-1} & Q_{-1}\\ P_{-2} & Q_{-2} \end{bmatrix}= \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} \] And therefore \[ {}^L\prod^n_{k=0} \begin{bmatrix} a_k & 1\\ 1 & 0 \end{bmatrix} = \begin{bmatrix} P_{n} & Q_{n}\\ P_{n-1} & Q_{n-1} \end{bmatrix} \] Let \[p_n=\frac{P_n}{P_{n-1}}\] \[q_n=\frac{Q_n}{Q_{n-1}}\] Then \[p_n=a_n+\frac{1}{p_{n-1}}\] \[q_n=a_n+\frac{1}{q_{n-1}}\] We find that \[ \frac{P_{n+1}}{Q_{n+1}}=a_0+\sum_{k=0}^{n}\frac{(-1)^k}{Q_kQ_{k+1}} \] And thus \[ \left | x- \frac{P_{n}}{Q_{n}}\right |<\frac{1}{Q_nQ_{n+1}} \] As \(a_n \geq 1\), \(Q_n \geq F_n\), i.e. the nth Fibonacci number. This is related to Hurwitz's theorem: for any irrational number x, there exist infinitely many ratios \(P/Q\) such that \[ \left | x-\frac{P}{Q} \right |<\frac{k}{Q^2} \] This holds for every irrational x only if \(k \geq 1/\sqrt{5}\).
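The recurrences for \(P_n\) and \(Q_n\) translate directly into code. A minimal Python sketch follows; the function name `convergents` is an assumption, and the starting values are the \(P_{-1},Q_{-1},P_{-2},Q_{-2}\) given above.

```python
def convergents(terms):
    """Return [(P_0, Q_0), (P_1, Q_1), ...] for the terms [a0, a1, ...],
    using P_n = a_n P_{n-1} + P_{n-2} and Q_n = a_n Q_{n-1} + Q_{n-2}."""
    p_prev, q_prev = 1, 0    # P_{-1}, Q_{-1}
    p_prev2, q_prev2 = 0, 1  # P_{-2}, Q_{-2}
    result = []
    for a in terms:
        p = a * p_prev + p_prev2
        q = a * q_prev + q_prev2
        result.append((p, q))
        p_prev2, q_prev2 = p_prev, q_prev
        p_prev, q_prev = p, q
    return result

# First few terms of the expansion of pi: [3; 7, 15, 1, ...]
print(convergents([3, 7, 15, 1]))
```

With the leading terms of \(\pi\), the output contains the classical approximations 22/7 and 355/113.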

Periodic Continued Fractions

Suppose that for \(k \geq N\), \(a_{k+M}=a_k\). Let \[ [a_0;a_1,a_2,...a_{N-2}]=\frac{P_{Y1}}{Q_{Y1}} \] \[ [a_0;a_1,a_2,...a_{N-1}]=\frac{P_{Y2}}{Q_{Y2}} \] \[ \left [a_N;a_{N+1},a_{N+2},...a_{N+M-2} \right ]=\frac{P_{Z1}}{Q_{Z1}} \] \[ \left [a_N;a_{N+1},a_{N+2},...a_{N+M-1} \right ]=\frac{P_{Z2}}{Q_{Z2}} \] Then x satisfies the formula \[ x=\frac{P_{Y2}\cdot y+P_{Y1}}{Q_{Y2} \cdot y+Q_{Y1}} \] Where \(y=[a_N;a_{N+1},a_{N+2},...]\) is the purely periodic tail, which satisfies \[ y=\frac{P_{Z2}\cdot y+P_{Z1}}{Q_{Z2} \cdot y+Q_{Z1}} \] Thus y, and hence x, is a root of a quadratic polynomial with integer coefficients. In fact, a continued fraction is eventually periodic if and only if it represents a quadratic irrational, that is, an irrational root of some quadratic polynomial with integer coefficients.

Generic Continued Fractions

Let x be uniformly chosen between 0 and 1. We define a sequence of random variables as follows \[ \xi_0=x \] \[ \xi_{n+1}=\frac{1}{\xi_n}-\left \lfloor \frac{1}{\xi_n} \right \rfloor \] Clearly, if \[x=[0;a_1,a_2,a_3,...] \] Then \[\xi_n=[0;a_{n+1},a_{n+2},a_{n+3},...]\] Let us assume that, asymptotically, the \(\xi\)'s approach a single distribution. Based on our definitions, this would imply that \[ P(\xi_{n+1} < z)=\sum_{k=1}^{\infty} P \left (\frac{1}{k+z} < \xi_n \leq \frac{1}{k} \right ) \] Differentiating both sides gives the required relationship: \[ f_\xi(z)=\sum_{k=1}^{\infty}\frac{f_\xi\left ( \tfrac{1}{k+z} \right )}{(k+z)^2} \] Let us test the function \[ f_\xi(z)=\frac{A}{1+z} \] \[ \sum_{k=1}^{\infty}\frac{A}{1+\tfrac{1}{k+z}}\frac{1}{(k+z)^2}=\sum_{k=1}^{\infty}\frac{A}{(1+k+z)(k+z)} \] \[ \sum_{k=1}^{\infty}\frac{A}{(1+k+z)(k+z)}=A\sum_{k=1}^{\infty}\left (\frac{1}{k+z}-\frac{1}{k+z+1} \right )=\frac{A}{1+z} \] It can be proved more rigorously that this is indeed the asymptotic probability density function, with \(A=1/\ln(2)\). Thus \[ P(\xi_{n} < z)=\log_2(1+z) \] From this we can easily find the asymptotic probability distribution for the continued fraction terms. The probability that \(a_{n+1}=k\) is the same as the probability that \(\left \lfloor \tfrac{1}{\xi_n} \right \rfloor=k\). This is then \[ P(a_{n+1}=k)=P\left ( \frac{1}{k+1} < \xi_n \leq \frac{1}{k} \right )=\log_2(1+\tfrac{1}{k})-\log_2(1+\tfrac{1}{k+1}) \] \[ P(a_{n+1}=k)=\log_2\left ( \frac{(k+1)^2}{k(k+2)} \right )=\log_2\left ( 1+\frac{1}{k(k+2)} \right ) \] This is called the Gauss-Kuzmin Distribution.
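To get a feel for the Gauss-Kuzmin distribution, the probabilities can be tabulated numerically. The Python sketch below is illustrative only; the function name `gauss_kuzmin` is an assumption.

```python
import math

def gauss_kuzmin(k):
    """Asymptotic probability that a continued fraction term equals k."""
    return math.log2(1.0 + 1.0 / (k * (k + 2)))

for k in range(1, 6):
    print(k, round(gauss_kuzmin(k), 4))

# The probabilities telescope, so partial sums approach 1.
print(sum(gauss_kuzmin(k) for k in range(1, 100001)))
```

The first value, \(\log_2(4/3)\), shows that a term equal to 1 occurs a little over 41% of the time.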

From this we can then easily find the asymptotic geometric mean of the terms \[ \underset{n \to \infty}{\lim}\sqrt[n]{\prod_{k=1}^{n}a_k}=\exp\left (\underset{n \to \infty}{\lim} \frac{1}{n}\sum_{k=1}^{n} \ln(a_k)\right )=\exp\left ( E(\ln(a_k)) \right ) \]\[ \underset{n \to \infty}{\lim}\sqrt[n]{\prod_{k=1}^{n}a_k}= \exp\left (\sum_{j=1}^{\infty}P(a_k=j)\ln(j) \right ) \]\[ \underset{n \to \infty}{\lim}\sqrt[n]{\prod_{k=1}^{n}a_k}= \prod_{j=1}^\infty \left ( 1+\frac{1}{j(j+2)} \right )^{\log_2(j)}=2.685452001...=K_0 \] This value is called Khinchin's Constant.
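The infinite product for Khinchin's constant converges slowly, but a truncated version already gives a few correct digits. Here is a hedged Python sketch; the function name `khinchin_estimate` and the truncation point are assumptions chosen for illustration.

```python
import math

def khinchin_estimate(n_terms=200_000):
    """Truncated product prod_j (1 + 1/(j(j+2)))^(log2 j), evaluated via logarithms."""
    log_k = sum(
        math.log1p(1.0 / (j * (j + 2))) * math.log2(j)
        for j in range(2, n_terms + 1)  # the j = 1 factor equals 1
    )
    return math.exp(log_k)

print(khinchin_estimate())
```

Summing logarithms with `math.log1p` avoids the loss of precision that multiplying many factors close to 1 would cause. The truncation slightly underestimates the constant, since every omitted factor is greater than 1.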

Let us now look at the asymptotic behavior of the convergents. Namely, we wish to examine the asymptotic behavior of the denominators. First note that \[ \xi_n=\frac{1}{\xi_{n-1}}-a_n \] If we let \(y_n=1/\xi_n\), we then have \[ y_{n-1}=a_n+\frac{1}{y_n} \] From above we have that \[q_n=a_n+\frac{1}{q_{n-1}}\] As, asymptotically, \(\xi_n \sim \xi_{n-1}\), this implies that, asymptotically, \(y_n \sim y_{n-1} \sim 1/\xi_n\) and therefore \(q_n \sim q_{n-1} \sim 1/\xi_n\). Thus \[ f_q(z)=\begin{cases} \frac{1}{z^2}\frac{1}{\ln(2)}\frac{1}{1+1/z} & z > 1 \\ 0 & z \leq 1 \end{cases} \] As \[ Q_n=\prod_{k=1}^{n}q_k \] We have \[ \underset{n \to \infty}{\lim}\sqrt[n]{Q_n}=\underset{n \to \infty}{\lim}\sqrt[n]{\prod_{k=1}^{n}q_k}=\exp\left (\underset{n \to \infty}{\lim}\frac{1}{n}\sum_{k=1}^{n}\ln(q_k) \right ) \]\[ \underset{n \to \infty}{\lim}\sqrt[n]{Q_n}=\exp\left (E(\ln(q_n)) \right )= \exp\left (\int_{1}^{\infty}\ln(z)\frac{1}{z^2}\frac{1}{\ln(2)}\frac{1}{1+1/z}dz \right ) \]\[ \underset{n \to \infty}{\lim}\sqrt[n]{Q_n}= \exp\left (-\frac{1}{\ln(2)}\int_{0}^{1}\frac{\ln(z)}{1+z}dz \right ) \]\[ \underset{n \to \infty}{\lim}\sqrt[n]{Q_n}=\exp\left ( \frac{\pi^2}{12\ln(2)} \right )=3.275823... \] This value (or sometimes its natural log) is called Lévy's constant.

We want to know how efficient continued fractions are for representing numbers relative to place-value expansions. Suppose we are working in base b. We want to find how many terms in the continued fraction expansion are required to obtain an approximation good to m base-b digits. We will have obtained such an approximation when the error is less than \(b^{-m}\) but greater than \(b^{-(m+1)}\). From above we have \[ \left | x- \frac{P_{n}}{Q_{n}}\right |<\frac{1}{Q_nQ_{n+1}} \] Thus \[ b^{-(m+1)} < \left | x- \frac{P_{n}}{Q_{n}}\right | < \frac{1}{Q_nQ_{n+1}} < \frac{1}{Q_n^2} \leq b^{-m} \] Rearranging, we have \[ b^m \leq Q_n^2 < b^{m+1} \] \[ b^{\frac{m}{2n}} \leq \sqrt[n]{Q_n} < b^{\frac{m+1}{2n}} \] Thus, as the center expression approaches a limit for large n, it follows that \(m/n\) does as well. Namely, by rearranging, we find that for n the number of continued fraction terms needed to express x in base b up to m digits, \[ \underset{m,n \to \infty}{\lim}\frac{m}{n}=\frac{\pi^2}{6\ln(2)\ln(b)} \] This is known as Lochs' theorem. In particular, for base 10, this implies that each continued fraction term provides on average 1.03064... digits of precision. In fact, base 10 is the largest integral base for which a continued fraction term provides, on average, more than one digit.
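Both Lévy's constant and the digits-per-term ratio in Lochs' theorem are simple closed forms, so they are easy to evaluate. The following Python sketch does so; the names `LEVY` and `digits_per_term` are assumptions introduced here.

```python
import math

# Levy's constant: the limit of the n-th root of the n-th convergent denominator.
LEVY = math.exp(math.pi**2 / (12.0 * math.log(2.0)))

def digits_per_term(base):
    """Average number of base-`base` digits gained per continued fraction term."""
    return math.pi**2 / (6.0 * math.log(2.0) * math.log(base))

print(LEVY)
for b in (2, 10, 11, 16):
    print(b, digits_per_term(b))
```

The value drops below 1 between base 10 and base 11, which is the sense in which base 10 is the largest integer base where a term is worth more than a digit.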

Iterated Radicals

2017-05-31T13:34:00.002-07:00

The Case of Square Roots

We wish to examine the behavior of the iterated radical expression \[ R_a(n)=\underbrace{\sqrt{a+\sqrt{a+...\sqrt{a+R_a(0)}}}}_{n\textrm{ radicals}} \] Let \[ A=\lim_{n \rightarrow \infty} R_a(n) \] Then clearly \[ A^2=a+A \] And so \[ A=\tfrac{1+\sqrt{1+4a}}{2} \] In order to determine the nature of the convergence to this limit, let us examine a function defined as follows: \[ f(x/q)=\sqrt{a+f(x)} \] Where q is a value yet to be determined. Clearly \(f(0)=A\), and it is not hard to see that \[ R_a(n)=f\left ( \tfrac{f^{-1}(R_a(0))}{q^n} \right ) \] Thus the behavior of f, as well as the value of q, will determine the convergence of \(R_a(n)\). We rearrange the above relation to get \[ f^2(x)=a+f(qx) \] Let us expand f in a Taylor series. \[ f(x)=A+b_1 x +b_2 x^2 +b_3 x^3+... \] We can substitute this into our functional equation to get \[ A^2+2Ab_1 x+(2A b_2+b_1^2)x^2+(2Ab_3+2b_1b_2)x^3+...=a+A+qb_1x+q^2b_2x^2+q^3b_3x^3+... \] By equating the coefficients of \(x\), we find that \(q=2A\). Note that changing \(b_1\) only affects the scaling of the function. Assuming we want the inverse to be positive as we approach from below, \(b_1\) must be negative, thus we simply set \(b_1=-1\). Now the rest of the coefficients can be found algorithmically in sequence. In general, for \(k \geq 2\), the coefficient of \(x^k\) will be \[ b_k=\frac{1}{(2A)^k-2A}\sum_{j=1}^{k-1}b_jb_{k-j} \] And thus \[ f\left ( \tfrac{x}{2A} \right )=\sqrt{a+f(x)} \]\[ f^2(x)=a+f(2Ax) \]\[ R_a(n)=f\left ( \tfrac{f^{-1}(R_a(0))}{(2A)^n} \right ) \] Where f is defined by the power series with the given coefficients.
It follows that \[ \lim_{n \rightarrow \infty} (2A)^n(A-R_a(n))=\lim_{n \rightarrow \infty} (2A)^n(f(0)-f(f^{-1}(R_a(0))/(2A)^n)) \]\[ \lim_{n \rightarrow \infty} (2A)^n(A-R_a(n))=-f'(0)f^{-1}(R_a(0))=f^{-1}(R_a(0)) \]\[ \lim_{n \rightarrow \infty} (2A)^n \left (A-\underbrace{\sqrt{a+\sqrt{a+...\sqrt{a+z}}}}_{n\textrm{ radicals}} \right )=f^{-1}(z) \] Another way to construct \(f(x)\) is by the following approach, which converges fairly quickly: Let \(f_0(x)=A-x\). We define \[ f_{k+1}(x)= f_k^2\left (\frac{x}{2A} \right )-a \] Then \[ \lim_{k \rightarrow \infty}f_k(x)=f(x) \]
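The coefficient recurrence is straightforward to implement. The Python sketch below computes the first few Taylor coefficients for a given a, with \(b_0=A\) and \(b_1=-1\) as chosen above; the function name `series_coefficients` is an assumption, and this is not the MATLAB code referred to elsewhere in the post.

```python
import math

def series_coefficients(a, n_coeffs=10):
    """Return [b_0, b_1, ..., b_{n_coeffs-1}] of f(x) = A + b_1 x + b_2 x^2 + ...,
    where f(x)^2 = a + f(2 A x), b_0 = A and b_1 = -1."""
    A = (1.0 + math.sqrt(1.0 + 4.0 * a)) / 2.0
    q = 2.0 * A
    b = [A, -1.0]
    for k in range(2, n_coeffs):
        conv = sum(b[j] * b[k - j] for j in range(1, k))
        b.append(conv / (q**k - q))
    return b

print(series_coefficients(1.0, 5))
```

Because the denominators grow like \((2A)^k\), the coefficients shrink very quickly, so a short truncation is usually adequate for moderate x.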

A Special Trigonometric Case

For the case of \(a=2\), it is easy to show by induction that \[ b_k=2(-1)^k\frac{1}{(2k)!} \] Which would imply that \[ f(x)=2\cos(\sqrt{x}) \] Therefore \[ \lim_{n \rightarrow \infty} 4^n \left( 2-\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2}}}}_{n \textrm{ radicals}}\right)=\pi^2/4 \]
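This limit can be checked numerically by evaluating the nested radical directly. A minimal Python sketch follows; the function name `nested_radical` is an assumption, and the starting value \(R_a(0)=0\) corresponds to the innermost \(\sqrt{2}\) in the formula.

```python
import math

def nested_radical(a, n, r0=0.0):
    """R_a(n): apply r -> sqrt(a + r) n times, starting from R_a(0) = r0."""
    r = r0
    for _ in range(n):
        r = math.sqrt(a + r)
    return r

n = 10
print(4**n * (2.0 - nested_radical(2.0, n)), math.pi**2 / 4.0)
```

Keeping n moderate matters in floating point: for large n the difference \(2-R\) falls below machine precision and the product becomes meaningless.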

An Infinite Product

Beginning with \[ f^2(x)=a+f(2Ax) \] Let us differentiate to obtain \[ f(x)f'(x)=Af'(2Ax) \] Thus, if we define \[ g(x)=-xf'(x) \] Then we easily see that \[ g(2Ax)=2g(x)f(x) \] Clearly \(g(0)=0, g'(0)=1\). Then \[ g(x)=2f\left (\tfrac{x}{2A} \right )g\left (\tfrac{x}{2A} \right )=2^2f\left (\tfrac{x}{2A} \right ) f\left (\tfrac{x}{(2A)^2} \right ) g\left (\tfrac{x}{(2A)^2} \right ) \]\[ g(x)=2^N g\left (\tfrac{x}{(2A)^N} \right )\prod_{k=1}^{N} f\left (\tfrac{x}{(2A)^k} \right ) \]\[ g(x)=(2A)^N g\left (\tfrac{x}{(2A)^N} \right )\prod_{k=1}^{N} \tfrac{1}{A}f\left (\tfrac{x}{(2A)^k} \right ) \] Taking the limit \[ g(x)=\underset{N \to \infty}{\lim}(2A)^N g\left (\tfrac{x}{(2A)^N} \right )\prod_{k=1}^{N} \tfrac{1}{A}f\left (\tfrac{x}{(2A)^k} \right )=x\prod_{k=1}^{\infty} \tfrac{1}{A}f\left (\tfrac{x}{(2A)^k} \right ) \] Thus \[ -f'(x)=\prod_{k=1}^{\infty} \tfrac{1}{A}f\left (\tfrac{x}{(2A)^k} \right ) \] Thus we need only examine the zeros of f to find the zeros of f'. In fact, if f has zeros \[\left \{z_1,z_2,z_3,... \right \}\] Then f will have extrema at \[ \bigcup_{k=1}^{\infty}\left \{(2A)^kz_1,(2A)^kz_2,(2A)^kz_3,... \right \} \]

An Associated Infinite Series

Differentiating the log of both sides of the result above, we find the infinite series: \[ \frac{d}{dx}\ln\left (-f'(x) \right )=\frac{d}{dx}\ln\left (\prod_{k=1}^{\infty} \tfrac{1}{A}f\left (\tfrac{x}{(2A)^k} \right ) \right ) \]\[ \frac{f''(x)}{f'(x)}=\sum_{k=1}^{\infty}\frac{1}{(2A)^k}\frac{f'\left (\tfrac{x}{(2A)^k} \right )}{f\left (\tfrac{x}{(2A)^k} \right )} \]

Zeros of \(f(x)\)

Below is a plot of the zeros of f for different values of a on the vertical axis, plotted semi-logarithmically.

Below is a plot of the sign of f (yellow is positive, blue is negative), from which the zero contours can be seen. However, we can also see that some zeros of f for certain values of a are multiple roots, as f goes to zero without changing sign.

Special Cases

Two special cases bear mentioning. In the case \(a=1\), the zeros are given by \[ z_n=2.1973\cdot (1+\sqrt{5})^{2n} \] for \(n \geq 0\). In fact, in this case, after the first zero, f is always between -1 and 0. f is -1 at \[ x_n=2.1973\cdot (1+\sqrt{5})^{2n+1} \] for \(n \geq 0\). For \(a=2\), the zeros are at \[ z_n=\left ( (2n+1)\frac{\pi}{2} \right )^2 \] And, in fact, \(f(x)=2\) at \(x_n=\left ( 2n\pi \right )^2\), and \(f(x)=-2\) at \(x=\left ( (2n+1)\pi \right )^2\), for \(n\geq0\).

Periodic and Possible Fractal Structure

Although f is generally not very interesting close to zero, it exhibits remarkable behavior on larger scales. We find, namely, that if we take \[ h(x)=\left | f(x) \right |^{x^{-\log_{2A}(2)}} \] Then h is exponentially periodic, asymptotically. We define \[J(x)=h((2A)^x)\] This function has period 1, asymptotically. Below we show the behavior of J for some values of A.

Note that the number of zeros remains constant. All seem to be single roots. In fact, the locations of the dominant maxima seem constant as well. However, within the periodicity, J appears to have a fractal structure. Below we show a zoom of \(J(x)\) for \(a=3\).

Complex Behavior

We can take the series and functional definitions of the function and use them to extend the function to the entire complex plane. Below we plot the complex sign of \(f(Cz|z|)\) for different values of a, and a certain value of C (this rescaling is done to make the regularities more evident). The complex sign is given by the color:

Dark Blue\(\Leftrightarrow\textrm{Re}< 0 ,\textrm{Im} < 0 \)
Light Blue\(\Leftrightarrow\textrm{Re} < 0 ,\textrm{Im} > 0 \)
Orange\(\Leftrightarrow\textrm{Re} > 0,\textrm{Im} < 0 \)
Yellow\(\Leftrightarrow\textrm{Re} > 0,\textrm{Im} > 0 \)

This allows us to find zeros, which correspond to points where all four colors meet.

We note several remarkable features:

The function is conjugate-symmetric.
The function displays remarkable regularity away from the real line. Note the persistent ripples which reach total regularity at \(a=2\). There is a structure of "fingers" that gradually join, each finger corresponding to one zero. The position of certain features on the real line remains fixed, e.g. the prominent feature at about 0.8.
The evolution of the function over a can be broken into three eras.
1. Pre-Saturating: For \(a< 1 \), there is exactly one real zero.
2. Saturating: For \(1\leq a < 2 \), zeros join to form pairs of real zeros.
3. Saturated: For \(2 \leq a\), all zeros are real.
The number and larger-scale density of zeros remains roughly constant.
The function displays quasi-fractal properties, as it becomes increasingly self-similar on larger scales. In a sense, it is a cross between periodic and fractal behavior, as seen in the other figures.
The process of the fusing of complex zeros into pairs of real zeros can also be seen in the plots of the real zeros above, giving a new view of the branching features.
The fingers coalesce along elliptical paths. In fact, these ellipses are of the form \(x^2+2y^2=C'^2\).

The Case of Arbitrary Roots

More generally, suppose we examine \[ R_a(n)=\underbrace{\sqrt[p]{a+\sqrt[p]{a+...\sqrt[p]{a+R_a(0)}}}}_{n\textrm{ radicals}} \] Let \[ A=\lim_{n \rightarrow \infty} R_a(n) \] Then \[ f(x/q)=\sqrt[p]{a+f(x)} \] Clearly \(f(0)=A\), and it is not hard to see that, again \[ R_a(n)=f(f^{-1}(R_a(0))/q^n) \] If we do the same analysis as before we find that \(q=pA^{p-1}=p(1+a/A)\). Let \(f_0(x)=A-x\). We define \[ f_{k+1}(x)= f_k^p\left (\frac{x}{q} \right )-a \] Then \[ \lim_{k \rightarrow \infty}f_k(x)=f(x) \] Then similarly we have \[ \lim_{n \rightarrow \infty} q^n(A-R_a(n))=f^{-1}(R_a(0)) \] MATLAB code for evaluating the function for a given a and given radical can be found here.
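The claimed convergence rate \(q=pA^{p-1}\) can be observed numerically: successive errors \(A-R_a(n)\) should shrink by a factor approaching q. The Python sketch below is an illustration with assumed names (`radical_errors`), and it is not the MATLAB code mentioned above. The example uses \(p=3\), \(a=6\), for which \(A=2\) (since \(2^3=6+2\)) and \(q=12\).

```python
def radical_errors(a, p, A, n_max, r0=0.0):
    """Return the errors A - R_a(n) for n = 1..n_max, with p-th roots
    and starting value R_a(0) = r0."""
    r, errs = r0, []
    for _ in range(n_max):
        r = (a + r) ** (1.0 / p)
        errs.append(A - r)
    return errs

errs = radical_errors(6.0, 3, 2.0, 6)
for e_prev, e_next in zip(errs, errs[1:]):
    print(e_prev / e_next)  # ratios approach q = p * A**(p - 1) = 12
```

The same check with \(p=2\) recovers the square-root case, where the ratio approaches \(2A\).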

Some Introductory Quantum Mechanics: Theorems of the Formalism

2016-01-06T10:01:00.002-08:00

Quantum mechanics (QM) has a number of curious and interesting associated phenomena. Some of these were hinted at in the first part of this series. The effects can be inferred from the mathematical formalism discussed in the previous post in this series. Here we will discuss several of these, again without reference to interpretation.

This is part of a multi-part series giving a general introduction to quantum theory. This is part 3.

Heisenberg's Uncertainty Principle

The variance of any observable is defined as \[ \sigma_A^2=\left \langle A^2 \right \rangle-\left \langle A \right \rangle^2=\left \langle \left ( A-\left \langle A \right \rangle \right )^2 \right \rangle \] Where \(\left \langle Q \right \rangle=\left. \langle \psi \right. | Q\left. |\psi \right \rangle\) is the expected value of the operator Q. Roughly speaking, \(\sigma_A\) is the "width" of the distribution of the potential values for A. We then define a new state vector as \[ \left. | a \right \rangle=\left (A-\left \langle A \right \rangle \right ) \left. |\psi \right \rangle \] So that \[ \sigma_A^2=\left \langle a \right. |\left. a \right \rangle \] We similarly define \[ \sigma_B^2=\left \langle B^2 \right \rangle-\left \langle B \right \rangle^2=\left \langle \left ( B-\left \langle B \right \rangle \right )^2 \right \rangle=\left \langle b \right. |\left. b \right \rangle \] Where \[ \left. | b \right \rangle=\left (B-\left \langle B \right \rangle \right ) \left. |\psi \right \rangle \] Then, by the Cauchy-Schwarz inequality: \[ \sigma_A^2\sigma_B^2=\left \langle a \right. |\left. a \right \rangle\left \langle b \right. |\left. b \right \rangle \geq \left | \left \langle a \right. |\left. b \right \rangle \right |^2 \] Let \(z= \left \langle a \right. |\left. b \right \rangle\). Then \[ \left | z \right |^2=[\mathrm{Re}(z)]^2+[\mathrm{Im}(z)]^2\geq[\mathrm{Im}(z)]^2=\left [\frac{z-\bar{z}}{2i} \right ]^2=\left [\frac{\left \langle a \right. |\left. b \right \rangle-\left \langle b \right. |\left. a \right \rangle}{2i} \right ]^2 \] However, \[ \left \langle a \right. |\left. b \right \rangle=\left \langle \left ( A-\left \langle A \right \rangle \right ) \left ( B-\left \langle B \right \rangle \right ) \right \rangle=\left \langle AB \right \rangle-\left \langle A \right \rangle\left \langle B \right \rangle \] \[ \left \langle b \right. |\left. a \right \rangle=\left \langle \left ( B-\left \langle B \right \rangle \right ) \left ( A-\left \langle A \right \rangle \right ) \right \rangle=\left \langle BA \right \rangle-\left \langle B \right \rangle\left \langle A \right \rangle \] So \[ \left | z \right |^2\geq\left [\frac{\left \langle AB \right \rangle-\left \langle BA \right \rangle}{2i} \right ]^2=\left [\frac{\left \langle [A,B] \right \rangle}{2i} \right ]^2 \] Where \([A,B]=AB-BA\) is the commutator of the two operators A and B (in general, two operators need not commute, and so the commutator will not vanish). Thus, we can state the general uncertainty principle for any two operators: \[ \sigma_A \sigma_B \geq\tfrac{1}{2}| \left \langle \left [A,B \right ] \right \rangle | \] For example, let us take the one-dimensional position and momentum operators: \[ A=x,\;B=p_x=\frac{\hbar}{i}\frac{\partial }{\partial x} \] \[ [x,p_x]\left. | \psi \right \rangle=xp_x\left. | \psi \right \rangle-p_xx\left. | \psi \right \rangle =\frac{\hbar}{i}\left (x\frac{\partial}{\partial x}\left. | \psi \right \rangle-\frac{\partial }{\partial x}x\left. | \psi \right \rangle \right )=i \hbar \left. | \psi \right \rangle \] Thus \[ \sigma_x \sigma_{p_x} \geq\frac{\hbar}{2} \] This is the famous position-momentum uncertainty relation.

No Cloning and Related Theorems

Suppose we want to find an operator that takes a quantum state and produces a copy of it. That is, we feed in a state and a "blank" state, operate on the two of them, and the result is the original state and a copy of it. Let this operator be called C, and the blank state be called b. That is: \[ C \left. | \psi \right \rangle_A\left. | b \right \rangle_B= \left. | \psi \right \rangle_A\left. | \psi \right \rangle_B \] As C is a transformation/evolution operator, it must be unitary, so it preserves inner products, and \(C^\dagger C=I\). Therefore \[ C \left. | \phi \right \rangle_A\left. | b \right \rangle_B=\left. | \phi \right \rangle_A\left. | \phi \right \rangle_B \] \[ \left \langle b \right.|_B \left \langle \phi \right.|_A C^\dagger=\left \langle \phi \right.|_B \left \langle \phi \right.|_A \] \[ \left \langle b \right.|_B \left \langle \phi \right.|_A \left. | \psi \right \rangle_A\left. | b \right \rangle_B = \left \langle b \right.|_B \left \langle \phi \right.|_A C^\dagger C \left. | \psi \right \rangle_A\left. | b \right \rangle_B =\left \langle \phi \right.|_B \left \langle \phi \right.|_A \left. | \psi \right \rangle_A\left. | \psi \right \rangle_B \] However, \(\left \langle b|b \right \rangle=1\), so \[ \left \langle \phi|\psi \right \rangle=\left \langle \phi|\psi \right \rangle^2 \] Thus \(\left \langle \phi|\psi \right \rangle \in \left \{ 0,1 \right \}\), that is, the two wavefunctions are orthogonal or identical. But the two states can be chosen arbitrarily, and need not be identical or orthogonal (indeed we can always construct a wavefunction as a linear combination of an orthogonal state and an identical state, and so achieve any inner product).

Moreover, as C must be linear, if \(\left. | \chi \right \rangle=\alpha \left. | \phi \right \rangle+\beta \left. | \psi \right \rangle\), then \[ C\left. | \chi \right \rangle_A \left. | b \right \rangle_B= C \left ( \alpha \left. | \phi \right \rangle_A+\beta \left. | \psi \right \rangle_A \right ) \left. | b \right \rangle_B =\alpha C \left. | \phi \right \rangle_A \left. | b \right \rangle_B+\beta C \left. | \psi \right \rangle_A \left. | b \right \rangle_B \] \[ C\left. | \chi \right \rangle_A \left. | b \right \rangle_B = \alpha \left. | \phi \right \rangle_A \left. | \phi \right \rangle_B + \beta \left. | \psi \right \rangle_A \left. | \psi \right \rangle_B \] However, \[ C\left. | \chi \right \rangle_A \left. | b \right \rangle_B=\left. | \chi \right \rangle_A \left. | \chi \right \rangle_B=\alpha^2 \left. | \phi \right \rangle_A \left. | \phi \right \rangle_B+\alpha\beta \left ( \left. | \phi \right \rangle_A \left. | \psi \right \rangle_B + \left. | \psi \right \rangle_A \left. | \phi \right \rangle_B \right )+\beta^2 \left. | \psi \right \rangle_A \left. | \psi \right \rangle_B \] And these two expressions clearly need not be equivalent. We are free to choose \(\alpha, \beta, \phi\), and \(\psi\) arbitrarily, and, in general, the two expressions will be unequal. Thus there cannot be a way to copy arbitrary quantum states.
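The linearity argument can be made concrete with a two-qubit example. The Python sketch below uses the CNOT gate as a candidate "cloner": it copies the basis states \(|0\rangle\) and \(|1\rangle\) onto a blank \(|0\rangle\), yet fails on their superposition. The choice of CNOT, the amplitude ordering, and the helper names `cnot` and `tensor` are assumptions made here for illustration.

```python
import math

def tensor(u, v):
    """Tensor product of two single-qubit states, ordered |00>, |01>, |10>, |11>."""
    return [x * y for x in u for y in v]

def cnot(state):
    """CNOT with the first qubit as control: swaps the |10> and |11> amplitudes."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]

zero = [1.0, 0.0]
one = [0.0, 1.0]
plus = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]  # (|0> + |1>)/sqrt(2)

print(cnot(tensor(zero, zero)) == tensor(zero, zero))  # basis state |0> is copied
print(cnot(tensor(one, zero)) == tensor(one, one))     # basis state |1> is copied
print(cnot(tensor(plus, zero)))                        # (|00> + |11>)/sqrt(2)
print(tensor(plus, plus))                              # the state a true clone would give
```

The last two lines differ: by linearity the gate produces an entangled state rather than two independent copies of the superposition.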

Since there is no way to clone a quantum state, there is thus no way to go in the opposite direction, namely start with two identical states and transform that into a "blank" state and an original. The argument runs in much the same way, and can be seen as a dual of the no cloning theorem, called the no-deleting theorem.

Suppose it were possible to measure and communicate the state of an arbitrary quantum state as a sequence of classical bits. Since classical bits can be easily copied, it would then be possible to copy quantum states, in violation of the no cloning theorem. Thus it is not possible to measure and communicate the state of an arbitrary quantum state as a sequence of classical bits, and this is called the no teleportation theorem.

An extension of the no cloning theorem to mixed states is the no broadcast theorem, which states that one can't convey a general quantum state to two or more recipients.

Correspondence Principle and the Ehrenfest Theorem

A rather clear demand on quantum mechanics is that its predictions tend to those of standard classical mechanics in the appropriate limits. Given that we do not observe macroscopic objects to display unusual, characteristically quantum phenomena, quantum mechanics must make the same predictions as classical mechanics, asymptotically. The probabilities for macroscopic objects to display such phenomena must be vanishingly small. In general, the observed classical parameters will correspond to the expected values of the quantum analogues.

As an example, let us find the rate of change of the expected value of a generic observable \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle A \right \rangle=\frac{\mathrm{d} }{\mathrm{d} t}\left \langle \psi|A|\psi \right \rangle =\left \langle \frac{\partial }{\partial t}\psi|A|\psi \right \rangle + \left \langle \psi|\frac{\partial }{\partial t}A|\psi \right \rangle + \left \langle \psi|A|\frac{\partial }{\partial t}\psi \right \rangle \] However, since the wavefunction satisfies the Schrödinger equation, we have \[ i \hbar \frac{\partial }{\partial t}\left. | \psi \right \rangle= H\left. | \psi \right \rangle \] And, moreover \[ -i \hbar \frac{\partial }{\partial t}\left \langle \psi | \right.= \left \langle \psi | \right. H \] Thus \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle A \right \rangle =-\frac{1}{i\hbar}\left \langle\psi|HA|\psi \right \rangle + \left \langle \psi|\frac{\partial }{\partial t}A|\psi \right \rangle + \frac{1}{i\hbar}\left \langle \psi|AH|\psi \right \rangle \] \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle A \right \rangle = \frac{1}{i\hbar}\left \langle[A,H] \right \rangle+\left \langle \frac{\partial }{\partial t}A \right \rangle \] Let \(A=x\) \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle x \right \rangle = \frac{1}{i\hbar}\left \langle[x,H] \right \rangle+\left \langle \frac{\partial }{\partial t}x \right \rangle =\frac{1}{i\hbar}\left \langle[x,H] \right \rangle =\frac{1}{i\hbar}\left \langle[x,\frac{-\hbar^2}{2m}\frac{\partial^2 }{\partial x^2}+V] \right \rangle \] \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle x \right \rangle =\frac{1}{i\hbar}\left \langle[x,\frac{-\hbar^2}{2m}\frac{\partial^2 }{\partial x^2}+V] \right \rangle =\frac{\hbar}{im}\left \langle \frac{\partial }{\partial x} \right \rangle=\frac{\left \langle p \right \rangle}{m} \] Let \(A=p\) \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle p \right \rangle = \frac{1}{i\hbar}\left \langle[p,H] \right \rangle+\left \langle \frac{\partial }{\partial t}p \right \rangle =\frac{1}{i\hbar}\left \langle[p,\frac{-\hbar^2}{2m}\frac{\partial^2 }{\partial x^2}+V] \right \rangle = -\left \langle \frac{\partial V}{\partial x} \right \rangle \] These have the same form as the classical dynamical equations for the position and momentum. Thus, as it is often the case that the wavefunctions are highly localized, at least compared to macroscopic scales, quantum mechanics predicts the same macroscopic behavior as classical mechanics.

Another fact derivable from the Ehrenfest theorem is the following. Suppose Q is an operator that does not depend explicitly on time. Then we have \[ \frac{\mathrm{d} }{\mathrm{d} t}\left \langle Q \right \rangle = \frac{1}{i\hbar}\left \langle[Q,H] \right \rangle \] From our discussion in the section on the Heisenberg uncertainty principle \[ \sigma_H\sigma_Q\geq\tfrac{1}{2}|\left \langle \left [ H,Q \right ] \right \rangle|=\frac{\hbar}{2}\left | \frac{\mathrm{d} }{\mathrm{d} t} \left \langle Q \right \rangle \right | \] Though time is not an observable, let us nevertheless define \[ \sigma_t=\frac{\sigma_Q}{|\mathrm{d}\left \langle Q \right \rangle/\mathrm{d}t|} \] We then have \[ \sigma_H\sigma_t\geq\frac{\hbar}{2} \] This result is analogous to that for position and momentum.

Bell's Theorem and the Kochen-Specker Theorem

Certain interpretations of quantum mechanics hold that the measurements and observations of quantum systems are deterministic, and the only reason they seem indeterministic is that we lack full knowledge of the system. They hold that there are hidden variables in the system, which we have not uncovered or perhaps cannot uncover, that govern the system, and it is only our ignorance of these that makes us unable to predict with certainty what we will observe. Models like these are called realistic, in the sense that, prior to the measurement, there is a definite, singular reality of what we will observe (or in some cases, counterfactually would observe).

Another principle typically regarded as fundamental is that the system is local, that is, causal effects cannot propagate faster than the speed of light. In principle, if the system could be appropriately manipulated, it would be possible to use non-local systems to send messages into the past.

Often, these interpretations are hard or impossible to test. However, certain versions can be tested, as they make predictions that would be inconsistent with those of standard quantum mechanics. Bell's inequality is one way to rule out certain types of local realistic models.

Let us take a source that produces a sequence of identical electron pairs in the entangled singlet state \(\tfrac{1}{\sqrt{2}}\left (\left. | \uparrow\downarrow \right \rangle-\left. | \downarrow\uparrow \right \rangle \right )\). That is, the two particles are perfectly anti-correlated in the z direction, and indeed along any common axis.

We then send the particles in opposite directions to two detectors, A and B. These detectors measure the spin along axes at angles \(\alpha\) and \(\beta\) with respect to the z-axis respectively. Let us define \(p(\alpha,\beta)\) as +1 if the measured spins are the same (both up or both down) and -1 if they are different (one up, one down). \(P(\alpha,\beta)\) we then define as the average of p over many trials. Standard quantum theory predicts that \(P(\alpha,\beta)=-\cos(\alpha-\beta)\).

Suppose there are hidden variables that determine what will be measured. For simplicity, we consolidate them all, for the whole system, in the single variable \(\textbf{v}\). Let \(A(\alpha,\textbf{v})=1\) if the particle sent to A, which is set at angle \(\alpha\), with variables \(\textbf{v}\), will be found to have spin up, and similarly with \(A(\alpha,\textbf{v})=-1\) for spin down. Likewise with \(B(\beta,\textbf{v})=1\) and \(B(\beta,\textbf{v})=-1\) for detector B. Clearly \(p(\alpha,\beta,\textbf{v})=A(\alpha,\textbf{v})B(\beta,\textbf{v})\). Since the particles are perfectly anti-correlated when the detectors are aligned, we have \(A(\alpha,\textbf{v})=-B(\alpha,\textbf{v})\).

To average over many trials, we merely average over the different hidden variables, which are assumed to follow some sort of distribution, \(\rho(\textbf{v})\). We then have \[ P(\alpha,\beta)=\int_{\mathrm{all}\; \textbf{v}} \rho(\textbf{v})A(\alpha,\textbf{v})B(\beta,\textbf{v})d\textbf{v} =-\int_{\mathrm{all}\; \textbf{v}} \rho(\textbf{v})A(\alpha,\textbf{v})A(\beta,\textbf{v})d\textbf{v} \] \[ P(\alpha,\beta)-P(\alpha,\gamma)= -\int_{\mathrm{all}\; \textbf{v}} \rho(\textbf{v}) \left [A(\alpha,\textbf{v})A(\beta,\textbf{v})-A(\alpha,\textbf{v})A(\gamma,\textbf{v}) \right] d\textbf{v} \] As \(A^2(\alpha,\textbf{v})=1\) for any input variables, we can write: \[ P(\alpha,\beta)-P(\alpha,\gamma)= -\int_{\mathrm{all}\; \textbf{v}} \rho(\textbf{v}) \left [1-A(\beta,\textbf{v})A(\gamma,\textbf{v}) \right]A(\alpha,\textbf{v})A(\beta,\textbf{v}) d\textbf{v} \] Given that \[ \left |\int_{R} f(\textbf{x})d\textbf{x} \right |\leq\int_{R} |f(\textbf{x})|d\textbf{x}, \;\;\; | A(\alpha,\textbf{v})|=1, \;\;\; \rho(\textbf{v}) \left [1-A(\beta,\textbf{v})A(\gamma,\textbf{v}) \right]\geq 0 \] we then have \[ |P(\alpha,\beta)-P(\alpha,\gamma)|\leq \int_{\mathrm{all}\; \textbf{v}} \rho(\textbf{v}) \left [1-A(\beta,\textbf{v})A(\gamma,\textbf{v}) \right] d\textbf{v} \] \[ |P(\alpha,\beta)-P(\alpha,\gamma)|\leq1+P(\beta,\gamma) \] which is Bell's inequality. This inequality should be satisfied by all local hidden-variable interpretations (the reason locality is required is to preclude instantaneous or faster-than-light signals being sent from one detector to the other). However, it is incompatible with standard quantum mechanics. For instance, let \(\alpha=0\), \(\beta=\pi/2\) and \(\gamma=\pi/4\). Then \[ |P(\alpha,\beta)-P(\alpha,\gamma) |=\tfrac{1}{\sqrt{2}}\nleq 1+P(\beta,\gamma) =1-\tfrac{1}{\sqrt{2}} \] Since experiments have been performed that violate Bell's inequality, this provides strong evidence against local hidden-variable interpretations.
However, there are some loopholes. One is superdeterminism, a sort of conspiracy theory on which not only the systems we study but also our experiments, and we ourselves, are deterministically governed, and governed in just such a way that we observe statistical violations of Bell's inequality regardless.
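The violation worked out above can be checked numerically. The following Python sketch evaluates both sides of Bell's inequality, first with the quantum prediction \(P(\alpha,\beta)=-\cos(\alpha-\beta)\) quoted in the text, and then with a toy local hidden-variable model. The toy model (a single hidden angle v spread uniformly around the circle, with the outcome given by the sign of \(\cos(\theta-v)\)) and all function names are illustrative assumptions, not part of the derivation.

```python
import math

def P_quantum(a, b):
    """Quantum prediction for the correlation, as quoted in the text."""
    return -math.cos(a - b)

def A(theta, v):
    """Toy hidden-variable outcome at a detector set to angle theta: +1 or -1."""
    return 1 if math.cos(theta - v) >= 0 else -1

def P_hidden(a, b, samples=20000):
    """Average of A(a,v)*B(b,v) with B = -A, for v spread evenly over [0, 2*pi)."""
    total = 0
    for i in range(samples):
        v = 2 * math.pi * (i + 0.5) / samples
        total += A(a, v) * (-A(b, v))
    return total / samples

def bell_sides(P, a, b, c):
    """Return the pair (|P(a,b) - P(a,c)|, 1 + P(b,c))."""
    return abs(P(a, b) - P(a, c)), 1 + P(b, c)

alpha, beta, gamma = 0.0, math.pi / 2, math.pi / 4
lhs_q, rhs_q = bell_sides(P_quantum, alpha, beta, gamma)
lhs_h, rhs_h = bell_sides(P_hidden, alpha, beta, gamma)
print(f"quantum: lhs={lhs_q:.4f}  rhs={rhs_q:.4f}")
print(f"hidden:  lhs={lhs_h:.4f}  rhs={rhs_h:.4f}")
```

For these angles the quantum correlations exceed the bound, while the toy model stays within it, as any model of the form assumed in the derivation must.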

An associated result called the Kochen-Specker (KS) Theorem shows that non-contextual hidden variable interpretations are incompatible with quantum mechanics. That is, interpretations in which the observables measured have a single definite value independent of how they are measured are incompatible with quantum mechanics. However, it leaves open the possibility for contextual hidden-variables interpretations, in which the manner of measurement is relevant to the obtained result.

One might think that one could use entanglement to send messages faster than light, given that the effects are instantaneous (the moment Alice's electron is observed to have spin up on the z-axis, Bob's electron will have spin down on the z-axis). However, a result called the no-communication theorem shows that it is not possible for one observer, by measuring some subset of a system, to communicate information to another observer. While the effects may be instantaneous, they do not carry information, and it is only after the two observers meet up and compare results that they note that their results have correlations that defy local realism.

The Quantum Zeno Effect

Suppose we have a particle that can be in one of two states (spin up or down, decayed or not decayed). We can represent its state as a 2 by 1 column matrix. Suppose it begins in the state \[ \left. |\psi(0) \right \rangle=\begin{bmatrix} 1\\ 0 \end{bmatrix} \] If it is allowed to evolve by itself, its time-dependent state is given by \[ \left. |\psi(t) \right \rangle=\begin{bmatrix} \alpha(t)\\ \beta(t) \end{bmatrix} \] where the functions satisfy the condition stated above at t=0, and the state is properly normalized. Suppose that the other state is stable, i.e., that once it "flips" it stays "flipped".

Suppose \(|\beta(t)|^2\approx (t/\tau)^n\) for t close to 0, where \(\tau\) is some characteristic time of the system. Suppose we allow the state to evolve unperturbed for a length of time T (small relative to \(\tau\)), and then measure it. The probability that it will be found in the original state is simply \[ P_1=|\alpha(T)|^2\approx 1- (T/\tau)^n \] However, suppose, instead, that we measure it N times, once after each interval of length T/N. Then the chance that it will be found in the original state is the chance that it was not found to have changed after any interval. That can be found by the usual methods of probability theory: \[ P_N=(|\alpha(T/N)|^2)^N\approx (1- \left (\tfrac{T}{N\tau} \right)^n)^N\approx e^{- \left(\tfrac{T}{\tau}\right)^n N^{1-n}} \] Thus, if \(n>1\), the probability tends to 1 as N increases, and if \(n<1\) the probability tends to 0 as N increases (if \(n=1\), the probability tends to an exponential function of time). Thus, if the probability changes in the appropriate way, watching a system repeatedly tends to keep it in the same state. Moreover, if the system were measured continuously, it would never change at all. This has led some to remark that a quantum watched pot never boils. This phenomenon is called the quantum Zeno effect, after the philosophical paradoxes of a similar nature.
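To see the effect with concrete numbers, here is a small Python sketch. It assumes a specific two-level model in which \(|\alpha(t)|^2=\cos^2(t/\tau)\), so that \(|\beta(t)|^2=\sin^2(t/\tau)\approx(t/\tau)^2\) and \(n=2\). That model, the values of T and \(\tau\), and the function name are illustrative choices, not something fixed by the argument above.

```python
import math

def survival_probability(T, tau, N):
    """Probability of still finding the initial state after N equally spaced
    measurements over a total time T, assuming |alpha(t)|^2 = cos^2(t/tau)."""
    single = math.cos(T / (N * tau)) ** 2
    return single ** N

T, tau = 0.5, 1.0
for N in (1, 10, 100, 1000):
    exact = survival_probability(T, tau, N)
    # Estimate from the text's formula with n = 2: exp(-(T/tau)^2 * N^(1-2))
    estimate = math.exp(-((T / tau) ** 2) * N ** (1 - 2))
    print(f"N={N:5d}  P_N={exact:.6f}  estimate={estimate:.6f}")
```

As N grows, the survival probability climbs toward 1, which is the quantum Zeno effect for this model.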

Quantum Teleportation and Indirect Entanglement

Suppose Alice and Bob are in separate locations, but connected by classical communication channels. They also each have one of a pair of entangled particles in the state \[ \tfrac{1}{\sqrt{2}}\left. |\uparrow \right \rangle_A\left. |\uparrow \right \rangle_B+\tfrac{1}{\sqrt{2}}\left. |\downarrow \right \rangle_A\left. |\downarrow \right \rangle_B \] Where the subscripts denote whose particle it is. Alice also has another particle in the arbitrary state \[ \alpha\left. |\uparrow \right \rangle_C+\beta\left. |\downarrow \right \rangle_C \] The state of the entire system can be written as \[ \tfrac{\alpha}{\sqrt{2}}\left. |\uparrow \uparrow \uparrow \right \rangle_{ABC}+ \tfrac{\alpha}{\sqrt{2}}\left. |\downarrow \downarrow \uparrow \right \rangle_{ABC}+ \tfrac{\beta}{\sqrt{2}}\left. |\uparrow \uparrow \downarrow \right \rangle_{ABC}+ \tfrac{\beta}{\sqrt{2}}\left. |\downarrow \downarrow \downarrow \right \rangle_{ABC} \] This can also be written in the form \[ \frac{1}{2} \begin{pmatrix} \tfrac{\left. |\uparrow \uparrow \right \rangle_{AC}+\left. |\downarrow \downarrow \right \rangle_{AC}}{\sqrt{2}}\left (\alpha\left. |\uparrow \right \rangle_{B}+\beta\left. |\downarrow \right \rangle_{B} \right ) + \tfrac{\left. |\uparrow \uparrow \right \rangle_{AC}-\left. |\downarrow \downarrow \right \rangle_{AC}}{\sqrt{2}}\left (\alpha\left. |\uparrow \right \rangle_{B}-\beta\left. |\downarrow \right \rangle_{B} \right ) \\ + \tfrac{\left. |\uparrow \downarrow \right \rangle_{AC}+\left. |\downarrow \uparrow \right \rangle_{AC}}{\sqrt{2}} \left (\beta\left. |\uparrow \right \rangle_{B}+\alpha\left. |\downarrow \right \rangle_{B} \right ) +\tfrac{\left. |\uparrow \downarrow \right \rangle_{AC}-\left. |\downarrow \uparrow \right \rangle_{AC}}{\sqrt{2}}\left (\beta\left. |\uparrow \right \rangle_{B}-\alpha\left. 
|\downarrow \right \rangle_{B} \right ) \end{pmatrix} \] Thus, if Alice measures her pair of particles to be in any of the four entangled states (all of which are mutually orthogonal, and so are completely distinguishable) \[ \tfrac{\left. |\uparrow \uparrow \right \rangle_{AC}+\left. |\downarrow \downarrow \right \rangle_{AC}}{\sqrt{2}},\; \tfrac{\left. |\uparrow \uparrow \right \rangle_{AC}-\left. |\downarrow \downarrow \right \rangle_{AC}}{\sqrt{2}},\; \tfrac{\left. |\uparrow \downarrow \right \rangle_{AC}+\left. |\downarrow \uparrow \right \rangle_{AC}}{\sqrt{2}},\; \tfrac{\left. |\uparrow \downarrow \right \rangle_{AC}-\left. |\downarrow \uparrow \right \rangle_{AC}} {\sqrt{2}} \] Bob's state will become, respectively \[ \left (\alpha\left. |\uparrow \right \rangle_{B}+\beta\left. |\downarrow \right \rangle_{B} \right ),\; \left (\alpha\left. |\uparrow \right \rangle_{B}-\beta\left. |\downarrow \right \rangle_{B} \right ),\; \left (\beta\left. |\uparrow \right \rangle_{B}+\alpha\left. |\downarrow \right \rangle_{B} \right ),\; \left (\beta\left. |\uparrow \right \rangle_{B}-\alpha\left. |\downarrow \right \rangle_{B} \right ) \] It then suffices for Alice to communicate to Bob which entangled state she measured, and then Bob can apply an appropriate operator to put his particle in the state in which particle C was originally. Thus, the state has been teleported from Alice to Bob. Indeed, neither Alice nor Bob need know what particle C's original state was, though they can know that it was perfectly teleported. Note that the entanglement between Alice's and Bob's particles is, in the end, broken, and Alice's two particles are left entangled.
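The decomposition above can be verified mechanically. The Python sketch below builds the three-particle state as a table of amplitudes, projects particles A and C onto each of the four entangled states, and reads off what is left for Bob. The labels "Phi+", "Phi-", "Psi+", "Psi-" for the four states, the encoding 0 = up and 1 = down, and the sample values of \(\alpha\) and \(\beta\) are assumptions made for the illustration.

```python
import math

def teleport_outcomes(alpha, beta):
    """Return Bob's (unnormalized) state for each of Alice's four outcomes.

    Bits: 0 = spin up, 1 = spin down. Keys of `psi` are (a, b, c).
    """
    s = 1 / math.sqrt(2)
    # Shared pair (A,B) in (|uu> + |dd>)/sqrt(2); particle C in alpha|u> + beta|d>.
    psi = {}
    for ab in (0, 1):
        for c, amp in ((0, alpha), (1, beta)):
            psi[(ab, ab, c)] = s * amp
    # The four entangled states of (A, C), in the order used in the text.
    bell = {
        "Phi+": {(0, 0): s, (1, 1): s},
        "Phi-": {(0, 0): s, (1, 1): -s},
        "Psi+": {(0, 1): s, (1, 0): s},
        "Psi-": {(0, 1): s, (1, 0): -s},
    }
    results = {}
    for name, coeffs in bell.items():
        bob = [0j, 0j]  # amplitudes of Bob's |up> and |down>
        for (a, c), coeff in coeffs.items():
            for b in (0, 1):
                bob[b] += coeff.conjugate() * psi.get((a, b, c), 0)
        results[name] = bob
    return results

alpha, beta = 0.6, 0.8j
for name, bob in teleport_outcomes(alpha, beta).items():
    print(name, [2 * x for x in bob])  # the factor 2 removes the overall 1/2
```

Each outcome leaves Bob with \(\alpha\) and \(\beta\) in one of the four arrangements listed above, so a fixed correction per outcome recovers the original state.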

**********
Another example of the odd nature of entanglement can be demonstrated with the following. Suppose we have two independent sources that produce the entangled particle pairs \[ \tfrac{\left. |\uparrow \uparrow \right \rangle_{AB}+\left. |\downarrow \downarrow \right \rangle_{AB}}{\sqrt{2}},\;\; \tfrac{\left. |\uparrow \uparrow \right \rangle_{CD}+\left. |\downarrow \downarrow \right \rangle_{CD}}{\sqrt{2}} \] Particle A is sent to Alice, D to Dave, and B and C to Becca. We can write the total state of the system as follows \[ \tfrac{1}{2}\left ( \left. |\uparrow \uparrow\uparrow \uparrow \right \rangle_{ABCD}+ \left. |\uparrow \uparrow\downarrow \downarrow \right \rangle_{ABCD}+ \left. |\downarrow \downarrow\uparrow \uparrow \right \rangle_{ABCD}+ \left. |\downarrow \downarrow\downarrow \downarrow \right \rangle_{ABCD} \right ) \] Alternatively, we could write it the following, equivalent way \[ \frac{1}{2} \begin{pmatrix} \tfrac{\left. |\uparrow \uparrow \right \rangle_{BC}+\left. |\downarrow \downarrow \right \rangle_{BC}}{\sqrt{2}} \tfrac{\left. |\uparrow \uparrow \right \rangle_{AD}+\left. |\downarrow \downarrow \right \rangle_{AD}}{\sqrt{2}}+ \tfrac{\left. |\uparrow \uparrow \right \rangle_{BC}-\left. |\downarrow \downarrow \right \rangle_{BC}}{\sqrt{2}} \tfrac{\left. |\uparrow \uparrow \right \rangle_{AD}-\left. |\downarrow \downarrow \right \rangle_{AD}}{\sqrt{2}} \\ + \tfrac{\left. |\uparrow \downarrow \right \rangle_{BC}+\left. |\downarrow \uparrow \right \rangle_{BC}}{\sqrt{2}} \tfrac{\left. |\uparrow \downarrow \right \rangle_{AD}+\left. |\downarrow \uparrow \right \rangle_{AD}}{\sqrt{2}} + \tfrac{\left. |\uparrow \downarrow \right \rangle_{BC}-\left. |\downarrow \uparrow \right \rangle_{BC}}{\sqrt{2}} \tfrac{\left. |\uparrow \downarrow \right \rangle_{AD}-\left. 
|\downarrow \uparrow \right \rangle_{AD}}{\sqrt{2}} \end{pmatrix} \] Thus, if Becca measures to see if her two particles are in any of the standard entangled states, as in the quantum teleportation setup, Alice's and Dave's particles will become entangled, and in the same entangled state as Becca's particles, no less. Becca can disentangle her particles from Alice's and Dave's, while entangling Alice's and Dave's particles, which initially bore no relation to one another. In this way, two particles can become entangled without ever having interacted, so entanglement need not require interaction.

Spatial Phenomena

Let us look at the case of an electron in an infinite quantum well. That is, an electron in the potential that has the form \[ V(x)=\left\{\begin{matrix} 0\;\;\;0\leq x \leq L \\ \infty\;\;\mathrm{o.w.} \end{matrix}\right. \] As the wavefunction must be continuous, and clearly the wavefunction is zero outside the well, we have \(\psi(0)=0\) and \(\psi(L)=0\). Let us suppose the wavefunction is in an energy eigenstate. In that case, we solve the time-independent Schrodinger equation inside the well: \[ E\psi(x)=\frac{-\hbar^2}{2m}\frac{\partial^2 }{\partial x^2}\psi(x) \] This has the solutions \[ \psi(x)=A\cos(\lambda x)+B\sin(\lambda x) \] Where \(\lambda=\sqrt{2mE}/\hbar\). From the condition that \(\psi(0)=0\), \(A=0\). From the condition that \(\psi(L)=0\), \(\lambda=n\pi/L\), where n is a positive integer. From the normalization condition, we have \(B=\sqrt{2/L}\). Thus \[ \psi_n(x)=\sqrt{\frac{2}{L}}\sin\left ( \frac{n\pi x}{L} \right ) \] And the corresponding energy is \[ E_n=n^2\frac{\pi^2 \hbar^2}{2m L^2} \] Note that the energy is quantized, that is, it is only ever found to have a value in this discrete set of values. This feature is common in quantum mechanics: the boundary conditions, or conditions for convergence will restrict certain observables to fall into a discrete set. A related phenomenon is when the set of possible values for an observable falls into a fragmented set, of the form \([a_1,a_2]\cup[a_3,a_4]\cup...\) where the a's are strictly increasing. In such a case, the system will have allowed bands, and will need sizable "kicks" to get over the gaps. This is the basis for how transistors work.
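To get a feel for the scale of the quantized energies derived above, the following Python sketch evaluates \(E_n\) for an electron. The well width of one nanometre is an assumed example value, and the constant and function names are mine.

```python
import math

HBAR = 1.054571817e-34          # reduced Planck constant, J*s
M_ELECTRON = 9.1093837015e-31   # electron mass, kg
EV = 1.602176634e-19            # joules per electronvolt

def well_energy(n, L, m=M_ELECTRON):
    """Energy E_n = n^2 pi^2 hbar^2 / (2 m L^2), in joules."""
    return n**2 * math.pi**2 * HBAR**2 / (2 * m * L**2)

L = 1e-9  # assumed well width: 1 nanometre
for n in range(1, 5):
    print(f"E_{n} = {well_energy(n, L) / EV:.3f} eV")
```

The levels grow as \(n^2\), so the spacing between neighbouring levels widens with n rather than staying constant.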

Note also that in the case of the quantum well, all the eigenfunctions are orthogonal and form a complete set. An arbitrary initial wavefunction \(\psi(x,0)\) will, at time t be equal to \[ \psi(x,t)=\frac{2}{L}\sum_{n=1}^{\infty} c_n \sin\left ( \frac{n\pi x}{L} \right ) e^{-itE_n/\hbar} , \;\; c_n=\int_{0}^{L}\psi(x,0)\sin\left ( \frac{n\pi x}{L}\right )dx \] We can also see something of the correspondence principle, namely, that for high energies, the probability distribution is nearly uniform in the well (it oscillates, as it goes above and below its average value, but the scale of the oscillations, for high enough energy, is imperceptible at macroscopic scales). Classically, for a particle bouncing back and forth in such a well, we would expect a uniform distribution (supposing we didn't know where the particle began).

Another result from this, which is true of quantum systems in general, is that even in the lowest energy state, when the most energy possible has been removed (the system is as "cold" as possible), the energy is non-zero. This is called the zero-point energy. Thus, even at "absolute" zero, the electron would still not be motionless, since, in this case \(0< E_1 =\left \langle p^2 \right \rangle/2m\), and so the root-mean-square momentum would be non-zero.

An important case of a sort of quantum well is the atom, in which the nucleus attracts the electrons and so confines them. In the case of the atom, there are likewise quantized energy states. Since these are stationary states, the probability density does not vary with time (the wavefunction changes only by an overall phase), and so the effective charge density likewise is constant. This explains why the atom does not radiate energy, as it would in the classical case. However, in the case of the atom, which is necessarily a three-dimensional system, the states are also quantized with respect to angular momentum.

**********
Another interesting phenomenon is that of quantum tunneling. Suppose we have a particle moving in the +x direction impinging on a finite barrier of the form \[ V(x)=\left\{\begin{matrix} \tfrac{\hbar^2 q}{2m}\;\;\;0\leq x \leq L \\ 0\;\;\;\;\; \mathrm{o.w.} \end{matrix}\right. \] Let us call the regions before, in, and beyond the barrier regions I, II, and III respectively. Suppose the particle initially has momentum \(\hbar k\). Its energy will be given by \(\hbar^2 k^2/2m\), and further suppose that this energy is less than the potential barrier. Solving the Schrödinger equation inside the barrier, we easily find that the wavefunction will be of the form \(A e^{\lambda x}+Be^{-\lambda x}\), where \(\lambda=\sqrt{q-k^2}\).

Then we can write the wavefunction (ignoring normalization) in the three regions as \[ \psi(x)=\left\{\begin{matrix} A_1 e^{ikx}+B_1 e^{-ikx} \;\;\;\, \mathrm{ I} \\ A_2 e^{\lambda x}+B_2 e^{-\lambda x} \;\;\;\;\; \mathrm{ II} \\ A_3 e^{ikx}+B_3 e^{-ikx} \;\;\;\;\; \mathrm{III} \end{matrix}\right. \] However, \(B_3=0\), since that term corresponds to a wave moving to the left, which would not happen in the case of an incident wave going in the +x direction. The other coefficients can be found by ensuring that the wavefunction and its derivative are continuous. In particular, we find that the ratio of transmitted to incident intensity is \[ T=|A_3/A_1|^2=\frac{1}{1+\tfrac{q^2}{4k^2(q-k^2)}\sinh^2(\lambda L)} \] This represents the probability that the particle will be found on the opposite side of the barrier. Note that, contrary to classical mechanics, there is a definite, non-zero probability of finding the particle on the opposite side of the barrier. This feature of particles doing classically impossible things is a frequent characteristic of quantum mechanics. This phenomenon helps explain why the sun continues to fuse hydrogen even though it is not hot enough for the nuclei to overcome their electrostatic repulsion, as the particles have a probability of tunneling through the classically forbidden region.

Similarly, we can see that, even if the particle did have enough energy to cross the barrier, there is not a 100% chance of finding it on the other side of the barrier. Just as a particle may sometimes cross a classically forbidden barrier, sometimes it fails to cross a classically allowed barrier.
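Both behaviours can be explored numerically. The Python sketch below uses the standard textbook transmission probability for a rectangular barrier, in which the hyperbolic sine appears squared below the barrier and an ordinary squared sine appears above it. The above-barrier branch and the limiting case \(k^2=q\) are not derived in the text and are included as assumptions, as are the function name and the sample parameters (in units where lengths and wavenumbers are dimensionless).

```python
import math

def transmission(k, q, L):
    """Transmission probability through a rectangular barrier.

    k: incident wavenumber; q: barrier height, with V = hbar^2 q / (2m);
    L: barrier width.
    """
    if k * k < q:                      # energy below the barrier: tunneling
        lam = math.sqrt(q - k * k)
        factor = math.sinh(lam * L) ** 2 / (4 * k * k * (q - k * k))
    elif k * k > q:                    # energy above the barrier
        kappa = math.sqrt(k * k - q)
        factor = math.sin(kappa * L) ** 2 / (4 * k * k * (k * k - q))
    else:                              # k^2 == q: common limit of both branches
        factor = L * L / (4 * k * k)
    return 1 / (1 + q * q * factor)

q, L = 1.0, 2.0
for k in (0.5, 0.9, 1.5, 3.0):
    regime = "below" if k * k < q else "above"
    print(f"k={k:.1f} ({regime} barrier)  T={transmission(k, q, L):.4f}")
```

Below the barrier T is small but non-zero, and above it T is generally less than 1, reaching 1 only when \(\sin(\kappa L)=0\).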

The Double Angle Formula

2015-12-25T11:26:00.001-08:00

Deriving the formula: \(\sin(2x)=2\sin(x)\cos(x)\)

Way 1: From Geometry

\[ RB=QA \;\;\;\;\;\;\;\;\;\; RQ=BA \] \[ \frac{RQ}{PQ}=\frac{QA}{OQ}=\sin(\alpha) \;\;\;\;\;\;\;\; \frac{PR}{PQ}=\frac{OA}{OQ}=\cos(\alpha) \] \[ \frac{PQ}{OP}=\sin(\beta) \;\;\;\;\;\;\;\; \frac{OQ}{OP}=\cos(\beta) \] \[ \frac{PB}{OP}=\sin(\alpha+\beta) \;\;\;\;\;\;\;\; \frac{OB}{OP}=\cos(\alpha+\beta) \] \[ PB=PR+RB=\frac{OA}{OQ}PQ+QA \] \[ \frac{PB}{OP}=\frac{OA}{OQ}\frac{PQ}{OP}+\frac{QA}{OP}=\frac{OA}{OQ}\frac{PQ}{OP}+\frac{QA}{OQ}\frac{OQ}{OP} \] \[ \sin(\alpha+\beta)=\cos(\alpha)\sin(\beta)+\sin(\alpha)\cos(\beta) \] Particularly, if \(\alpha=\beta=x, \;\;\;\; \sin(2x)=2\sin(x)\cos(x)\).

Way 2: From the Product Formula

Recall from this post that the product formulas for sine and cosine are, respectively: \[ \sin(x)=x\prod_{n=1}^{\infty}\left ( 1-\frac{x^2}{\pi^2 n^2} \right ) \] And \[ \cos(x)=\prod_{n=1}^{\infty} \left (1-\frac{x^2}{\pi^2 (n-1/2)^2 } \right ) \] Thus \[ \sin(2x)=2x\prod_{n=1}^{\infty}\left ( 1-\frac{4 \cdot x^2}{\pi^2 n^2} \right ) =2\cdot x\prod_{n=\mathrm{even}\geq1}^{\infty}\left ( 1-\frac{4 \cdot x^2}{\pi^2 n^2} \right ) \cdot \prod_{n=\mathrm{odd}\geq1}^{\infty}\left ( 1-\frac{4 \cdot x^2}{\pi^2 n^2} \right ) \] \[ \sin(2x) =2\cdot x\prod_{n=1}^{\infty}\left ( 1-\frac{x^2}{\pi^2 n^2} \right ) \cdot \prod_{n=1}^{\infty}\left ( 1-\frac{x^2}{\pi^2 (n-1/2)^2} \right ) \] \[ \sin(2x)=2\cdot \sin(x) \cdot \cos(x) \]

Way 3: From the Taylor Series

The Taylor series for sine and cosine can be construed as, respectively: \[ \frac{\sin(\sqrt{x})}{\sqrt{x}}=\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k+1)!}x^k \] \[ \cos(\sqrt{x})=\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}x^k \] Thus \[ \frac{\sin(\sqrt{x})\cos(\sqrt{x})}{\sqrt{x}}=\sum_{j=0}^{\infty}\frac{(-1)^j}{(2j+1)!}x^j \sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}x^k \] Using a Cauchy product, we find: \[ \frac{\sin(\sqrt{x})\cos(\sqrt{x})}{\sqrt{x}}=\sum_{m=0}^{\infty}c_m x^m \] where \[ c_m=\sum_{n=0}^{m} \frac{(-1)^n}{(2n+1)!}\frac{(-1)^{m-n}}{(2(m-n))!} =\frac{(-1)^m}{(2m+1)!}\sum_{n=0}^{m} \binom{2m+1}{2n+1} =\frac{(-1)^m}{(2m+1)!}\sum_{n=0}^{m} \left [\binom{2m}{2n+1}+\binom{2m}{2n} \right ] \] \[ c_m=\frac{(-1)^m}{(2m+1)!}\sum_{n=0}^{2m} \binom{2m}{n}=\frac{(-1)^m}{(2m+1)!}2^{2m} \] And thus \[ \frac{\sin(\sqrt{x})\cos(\sqrt{x})}{\sqrt{x}}=\sum_{m=0}^{\infty}\frac{(-1)^m}{(2m+1)!}(4x)^m=\frac{\sin(\sqrt{4x})}{\sqrt{4x}}=\frac{\sin(2\sqrt{x})}{2\sqrt{x}} \] Substituting \(x=y^2\) and rearranging, we find: \( 2\sin(y)\cos(y)=\sin(2y) \)
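The closed form for the Cauchy-product coefficients can be confirmed with exact rational arithmetic. This Python sketch computes \(c_m\) directly from its defining sum and compares it with \((-1)^m 4^m/(2m+1)!\); the function names are mine.

```python
from fractions import Fraction
from math import factorial

def cauchy_coefficient(m):
    """c_m from the Cauchy product of the two series, as an exact fraction."""
    return sum(
        Fraction((-1) ** n, factorial(2 * n + 1))
        * Fraction((-1) ** (m - n), factorial(2 * (m - n)))
        for n in range(m + 1)
    )

def closed_form(m):
    """The closed form (-1)^m 4^m / (2m+1)! obtained in the text."""
    return Fraction((-1) ** m * 4 ** m, factorial(2 * m + 1))

for m in range(6):
    print(m, cauchy_coefficient(m), closed_form(m))
```

Because `Fraction` arithmetic is exact, agreement here is an identity check for the listed values of m rather than a floating-point approximation.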

Way 4: From Euler's Formula

Euler's formula is: \[ e^{ix}=\cos(x)+i\sin(x) \] Thus \[ e^{i2x}=\cos(2x)+i\sin(2x)=\left ( e^{ix} \right)^2=\left (\cos(x)+i\sin(x) \right )^2 \] \[ e^{i2x}=\left [\cos^2(x)-\sin^2(x) \right ]+i\left [ 2\sin(x)\cos(x) \right ] \] Thus, by equating real and imaginary parts, \(\sin(2x)=2\sin(x)\cos(x)\) and \(\cos(2x)=\cos^2(x)-\sin^2(x)\)

The Half-Angle Formulas

We find from the last demonstration \[ \cos(2x)=\cos^2(x)-\sin^2(x)=2\cos^2(x)-1=1-2\sin^2(x) \] Substituting \(2x=y\) and solving, we find: \[ \sin\left ( \frac{y}{2} \right )=\pm\sqrt{\frac{1-\cos(y)}{2}} \] \[ \cos\left ( \frac{y}{2} \right )=\pm\sqrt{\frac{1+\cos(y)}{2}} \] where the sign is fixed by the quadrant in which \(y/2\) lies (both roots are positive for \(0\leq y\leq\pi\)).

An Infinite Product Formula

We can write the double-angle formula as \[ \sin(x)=2\sin\left ( \frac{x}{2} \right )\cos\left ( \frac{x}{2} \right ) \] Iterating this, we then have \[ \sin(x)=2^n\sin\left ( \frac{x}{2^n} \right ) \prod_{k=1}^{n}\cos\left ( \frac{x}{2^k} \right ) \] However, in the limit as n gets large, \(2^n\sin\left ( \frac{x}{2^n} \right )\rightarrow x\). Thus, letting n go to infinity, we have \[ \sin(x)=x \prod_{k=1}^{\infty}\cos\left ( \frac{x}{2^k} \right ) \] A simple corollary of this general result, obtained by setting \(x=\pi/2\), is \[ \frac{\pi}{2}=\frac{1}{\cos(\tfrac{\pi}{4})\cos(\tfrac{\pi}{8})\cos(\tfrac{\pi}{16})\cdots } =\frac{1}{\sqrt{\tfrac{1}{2}}\sqrt{\tfrac{1}{2}+\tfrac{1}{2}\sqrt{\tfrac{1}{2}}}\sqrt{\tfrac{1}{2}+\tfrac{1}{2}\sqrt{\tfrac{1}{2}+\tfrac{1}{2}\sqrt{\tfrac{1}{2}}}}\cdots }=\frac{2}{\sqrt{2}}\frac{2}{\sqrt{2+\sqrt{2}}}\frac{2}{\sqrt{2+\sqrt{2+\sqrt{2}}}}\cdots \] This is known as Viète's formula.
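Viète's formula converges quickly, which is easy to see numerically. The sketch below multiplies the factors \(2/\sqrt{2}\), \(2/\sqrt{2+\sqrt{2}}\), and so on, building each nested radical from the previous one; the function name and the numbers of terms are arbitrary choices.

```python
import math

def viete_partial_product(terms):
    """Product of the first `terms` factors 2/sqrt(2), 2/sqrt(2+sqrt(2)), ..."""
    radical = 0.0
    product = 1.0
    for _ in range(terms):
        radical = math.sqrt(2 + radical)   # sqrt(2), sqrt(2+sqrt(2)), ...
        product *= 2 / radical
    return product

for terms in (1, 5, 10, 25):
    print(terms, viete_partial_product(terms), math.pi / 2)
```

By the finite identity above, the partial product with n factors equals \(2^n\sin(\pi/2^{n+1})\), so the error shrinks by roughly a factor of four with each extra term.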

A Nested Radical Formula

We note that \[ 2\cos(x/2)=\sqrt{2+2\cos(x)} \] Thus, by iterating, we find \[ 2\cos(x/2^n)=\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2+2\cos(x)}}}}} \] Thus \[ 2\sin(x/2^{n+1})=\sqrt{2-\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2+2\cos(x)}}}}}} \] And we can thus conclude that \[ x=\underset{n\rightarrow \infty}{\lim} 2^n\sqrt{2-\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2+2\cos(x)}}}}}} \] For example \[ \pi/3=\underset{n\rightarrow \infty}{\lim} 2^n\sqrt{2-\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2+1}}}}}} \] \[ \pi/2=\underset{n\rightarrow \infty}{\lim} 2^n\sqrt{2-\underset{n\;\, \mathrm{radicals}}{\underbrace{\sqrt{2+\sqrt{2+...\sqrt{2}}}}}} \]

An Infinite Series

Above, we derived \[ \sin(x)=x \prod_{k=1}^{\infty}\cos\left ( \frac{x}{2^k} \right ) \] Taking the log of both sides and differentiating \[ \frac{\mathrm{d} }{\mathrm{d} x}\ln\left (\sin(x) \right )=\frac{\mathrm{d} }{\mathrm{d} x}\ln\left (x \prod_{k=1}^{\infty}\cos\left ( \frac{x}{2^k} \right ) \right ) \] \[ \cot(x)=\frac{1}{x}-\sum_{k=1}^{\infty}\frac{1}{2^k}\tan \left ( \frac{x}{2^k} \right ) \] \[ \frac{1}{x}-\cot(x)=\sum_{k=1}^{\infty}\frac{1}{2^k}\tan \left ( \frac{x}{2^k} \right ) \] From this, setting \(x=\pi/2\) and dividing by 2, we can easily derive \[ \frac{1}{\pi}=\sum_{k=2}^{\infty}\frac{1}{2^k}\tan \left ( \frac{\pi}{2^k} \right ) \]
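As a sanity check on the final series, the following Python sketch sums its first few dozen terms and compares the result with \(1/\pi\). The function name and the number of terms are arbitrary choices.

```python
import math

def tangent_series(terms):
    """Partial sum of tan(pi / 2^k) / 2^k for k = 2, 3, ..., terms + 1."""
    return sum(math.tan(math.pi / 2**k) / 2**k for k in range(2, terms + 2))

print(tangent_series(30), 1 / math.pi)
```

Since \(\tan(\pi/2^k)\approx\pi/2^k\) for large k, the terms fall off like \(4^{-k}\) and the sum settles after a handful of terms.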

A Definite Integral

Let \[ I=\int_{0}^{\pi/2}\ln\left ( \sin(x) \right )dx =\int_{\pi/2}^{\pi}\ln\left ( \sin(x) \right )dx =\int_{0}^{\pi/2}\ln\left ( \cos(x) \right )dx \] Then \[ 2I=\int_{0}^{\pi}\ln\left ( \sin(x) \right )dx =2\int_{0}^{\pi/2}\ln\left ( \sin(x) \right )dx =\int_{0}^{\pi/2}\ln\left ( \sin(x) \right )+\ln\left ( \cos(x) \right )dx \] \[ 2I=\int_{0}^{\pi/2}\ln\left ( \sin(x) \cos(x) \right )dx=\int_{0}^{\pi/2}\ln\left (\tfrac{1}{2} \sin(2x) \right )dx=-\frac{\pi}{2}\ln(2)+\int_{0}^{\pi/2}\ln\left (\sin(2x) \right )dx \] By the substitution \(u=2x\), we then have \[ 2I=-\frac{\pi}{2}\ln(2)+\tfrac{1}{2}\int_{0}^{\pi}\ln\left (\sin(u) \right )du=-\frac{\pi}{2}\ln(2)+I \] Therefore \[ I=\int_{0}^{\pi/2}\ln\left (\sin(x) \right )dx=-\frac{\pi}{2}\ln(2) \]
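The value of the integral can be checked by direct numerical quadrature. The sketch below uses a simple midpoint rule, which never evaluates the integrand at \(x=0\), where the logarithm diverges. The choice of rule, the step count, and the function name are assumptions for the illustration; because of the logarithmic singularity the estimate converges only slowly.

```python
import math

def log_sine_integral(steps=200_000):
    """Midpoint-rule estimate of the integral of ln(sin x) over [0, pi/2]."""
    h = (math.pi / 2) / steps
    return h * sum(math.log(math.sin((i + 0.5) * h)) for i in range(steps))

print(log_sine_integral(), -(math.pi / 2) * math.log(2))
```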

Some Introductory Quantum Mechanics: Mathematico-Theoretical Background

2015-12-15T20:17:00.000-08:00

Quantum mechanics (QM), being a novel and revolutionary framework for describing phenomena, requires a substantially different mathematical tool-set and way of thinking about physical systems and objects. There is dispute over how exactly to interpret the mathematical system used, but we will not discuss here the various interpretations. Rather, we will just describe and examine the framework and how it can be used to make predictions, all of which is agreed upon.

This will be a multi-part series giving a general introduction to quantum theory. This is part 2.

Hilbert, State, and Dual Spaces

Hilbert space is a generalized vector space: a sort of extended analog of the usual Euclidean space. Elements of a Hilbert space are vectors of a sort, and are denoted using a label (basically just a name) and some indication of vector-hood. We will use "bra-ket notation", in which elements of the vector space are denoted as \(\left | \phi \right >\) (a ket). Here \(\phi\) is merely a label; we may sometimes use numbers or other symbols, but these are all merely labels. Every such element has a corresponding "sister" in what is called the dual space, which is denoted by \(\left < \phi \right |\) (a bra). (The names are basically a joke: two halves of the word "bracket".) The use of the dual space will become apparent in our later discussion. In general, and in QM especially, the vector space is complex, meaning the vector's "components" (loosely speaking) are complex numbers.

Inner Products

To be a Hilbert space, there must also be an inner product, or a way of associating a complex number to each pair of vectors (the order may be important: the inner product of A and B need not be the same as that of B and A). The inner product of \(\left | \phi \right > \) and \(\left | \psi \right > \) is denoted by \(\left \langle \psi \right | \left. \phi \right \rangle\), that is the dual of \(\left | \psi \right > \) acting on \(\left | \phi \right > \). In particular, to be a Hilbert space, we must have that if \(\left \langle \psi \right | \left. \phi \right \rangle = z \), \(\left \langle \phi \right | \left. \psi \right \rangle = \bar{z} \), that is, the complex conjugate. If \(\left | \phi \right \rangle= r \left | \psi \right \rangle\) then \(\left \langle \phi \right |= \bar{r} \left \langle \psi \right |\). Also, we must have \(\left \langle \psi \right | \left. \psi \right \rangle \geq 0\), with equality holding iff \(\left | \psi \right >\) is the zero vector. Clearly \(\left \langle \psi \right | \left. \psi \right \rangle \) will be real.

Beyond this, the inner product is linear in its second argument and conjugate-linear in its first. In general, if \(\left | \phi \right \rangle= a\left | \alpha \right \rangle+b\left | \beta \right \rangle \) and \( \left | \psi \right \rangle= c\left | \gamma \right \rangle+d\left | \delta \right \rangle \), then we have: \[ \left \langle \psi \right | \left. \phi \right \rangle =a\bar{c}\left \langle \gamma \right | \left. \alpha \right \rangle + a\bar{d}\left \langle \delta \right | \left. \alpha \right \rangle + b\bar{c}\left \langle \gamma \right | \left. \beta \right \rangle + b\bar{d}\left \langle \delta \right | \left. \beta \right \rangle \] We can also prove the famous Cauchy-Schwarz inequality, namely, that: \[ \left |\left \langle \psi \right | \left. \phi \right \rangle \right |^2 \leq \left \langle \psi \right | \left. \psi \right \rangle \left \langle \phi \right | \left. \phi \right \rangle \] Two vectors \(\left | \phi \right > \) and \(\left | \psi \right > \) are said to be orthogonal if \(\left \langle \psi \right | \left. \phi \right \rangle=0\). A vector is said to be normal or normalized if \(\left \langle \phi \right | \left. \phi \right \rangle =1\). If we have a set of vectors \({| \left. \phi_1 \right \rangle} , {| \left. \phi_2 \right \rangle} , {| \left. \phi_3 \right \rangle},...\) such that \( \left \langle \phi_j \right. | \left. \phi_k \right \rangle = 0 \) for all \(j \neq k\), then the set is called an orthogonal set. If it is also the case that \( \left \langle \phi_k \right. | \left. \phi_k \right \rangle = 1 \) for all k, then the set is called orthonormal.
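For finite-dimensional vectors these properties are easy to try out. The Python sketch below represents kets as lists of complex components and takes the inner product to be the usual sum of conjugated components of the first vector times components of the second. The example vectors and the function name are arbitrary.

```python
def inner(psi, phi):
    """Inner product <psi|phi> for vectors given as lists of complex components."""
    return sum(p.conjugate() * f for p, f in zip(psi, phi))

phi = [1 + 2j, 0.5j, -1]
psi = [2, 1 - 1j, 3j]

z = inner(psi, phi)
print("<psi|phi> =", z)
print("<phi|psi> =", inner(phi, psi))   # the complex conjugate of z
bound = (inner(psi, psi) * inner(phi, phi)).real
print("Cauchy-Schwarz:", abs(z) ** 2, "<=", bound)
```

Swapping the arguments conjugates the result, and the squared magnitude of the inner product never exceeds the product of the two squared norms.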

Operators

An operator is something which acts on a vector to produce another vector: \(A \left | \phi \right \rangle= \left | \phi' \right \rangle\). The operator \(A\) is linear if, for any \(\left | \phi \right \rangle= a\left | \alpha \right \rangle+b\left | \beta \right \rangle\), we have \( A\left | \phi \right \rangle= a A\left | \alpha \right \rangle+b A\left | \beta \right \rangle \).
Let \(A \left | \phi \right \rangle= \left | \phi' \right \rangle\) and \(B \left | \psi \right \rangle= \left | \psi' \right \rangle\). If \(\left \langle \psi \right | \left. \phi' \right \rangle=\left \langle \psi' \right | \left. \phi \right \rangle\) for all \(\left | \phi \right \rangle\) and \(\left | \psi \right \rangle\), then A and B are called conjugate operators, denoted \(A=B^{\dagger}\) and \(B=A^{\dagger}\), so \(A=\left (A^{\dagger} \right )^\dagger\). We also have \(\left \langle \phi' \right |= \left \langle \phi \right | A^\dagger\). If \(A=A^\dagger\), then A is called Hermitian. If \(A=-A^\dagger\), then A is called anti-Hermitian. If \(\left \langle \psi \right | A^\dagger A \left | \phi \right \rangle = \left \langle \psi \left | \right. \phi\right \rangle \) for all pairs of vectors, that is, if A preserves inner products, then A is called unitary.
We also have the following properties: \[ (A+B)\left | \phi \right \rangle= A\left | \phi \right \rangle+B\left | \phi \right \rangle \] \[ AB\left | \phi \right \rangle= A\left (B\left | \phi \right \rangle \right ) \] Note that it is not necessarily the case that \[ AB\left | \phi \right \rangle= BA\left | \phi \right \rangle \] That is, operators need not commute. In fact, we commonly use the notation \([A,B]=AB-BA\) (this is called the commutator of A and B). Non-commutativity will play an important role in the theory.
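A concrete case of non-commuting operators may help. The Python sketch below computes commutators of 2 by 2 matrices using plain nested lists. The Pauli matrices used as the example are a standard choice that the text has not introduced, so treat them, and the helper names, as assumptions of the illustration.

```python
def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(X, Y):
    """[X, Y] = XY - YX."""
    XY, YX = matmul(X, Y), matmul(Y, X)
    return [[XY[i][j] - YX[i][j] for j in range(2)] for i in range(2)]

# Pauli matrices: a standard example, not introduced in the text above.
sigma_x = [[0, 1], [1, 0]]
sigma_y = [[0, -1j], [1j, 0]]
sigma_z = [[1, 0], [0, -1]]

print(commutator(sigma_x, sigma_y))   # equals 2i times sigma_z
print(commutator(sigma_x, sigma_x))   # the zero matrix
```

The first commutator is non-zero, so the order in which these two operators act matters, while any operator trivially commutes with itself.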

For a given A, in some cases, for certain \(\left | \phi \right \rangle\), we have that \(A\left | \phi \right \rangle= \lambda \left | \phi \right \rangle \) for some constant \(\lambda\). In this case, we call \(\lambda\) an eigenvalue of the operator A and \(\left | \phi \right \rangle\) the corresponding eigenvector.
Often it is the case that we can find a set of orthonormal vectors that are the eigenvectors of a given linear operator, such that we can also write any vector as a linear sum of the eigenvectors. In that case, \[| \left. \psi \right \rangle = a_1 | \left. \phi_1 \right \rangle +a_2 | \left. \phi_2 \right \rangle+a_3 | \left. \phi_3 \right \rangle+...\]where \(a_k=\left \langle \phi_k \right. | \left. \psi \right \rangle\) (\(a_k\) is called the projection of \(\psi\) into the \(\phi_k\) direction). Then \[\left \langle \psi\left. \right | \psi \right \rangle=|a_1|^2+|a_2|^2+|a_3|^2+...\] \[A\left| \psi \right \rangle = \lambda_1 a_1 | \left. \phi_1 \right \rangle + \lambda_2 a_2 | \left. \phi_2 \right \rangle+\lambda_3 a_3 | \left. \phi_3 \right \rangle+...\] \[\left \langle \psi \right | A\left| \psi \right \rangle = \lambda_1 \left |a_1 \right |^2 + \lambda_2 \left |a_2 \right |^2+\lambda_3 \left |a_3 \right |^2 +...\] If the operator is also Hermitian, then we call it an observable. Particularly, if an operator is Hermitian, all its eigenvalues are real.
If \(| \left. \psi \right \rangle \) is normalized, then we can use the notation \(\left \langle A \right \rangle_\psi=\left \langle \psi\left | A \right |\psi \right \rangle\) and \(\sigma^2_A=\left \langle A^2 \right \rangle_\psi-\left \langle A \right \rangle^2_\psi\).

Postulates of Quantum Mechanics

Given that mathematical background, we can now lay out the fundamental postulates of QM. Exactly how to interpret these postulates will be left for later discussion.

Wavefunction Postulate
The state of a physical system at a given time is defined by a wavefunction which is a ket vector in the Hilbert space of possible states. Generally, the vector is required to be normalized.
Observable Postulate
Every physically measurable quantity corresponds to an observable operator that acts on the vectors in the Hilbert space of possible states.
Eigenvalue Postulate
The possible results of a measurement of a physically measurable quantity are the eigenvalues of the corresponding observable.
Probability Postulate
Suppose the set of orthonormal eigenvectors of observable A \({| \left. \phi_{k_1} \right \rangle} , {| \left. \phi_{k_2} \right \rangle} , {| \left. \phi_{k_3} \right \rangle},...\) all have eigenvalue \(\lambda\). Suppose the initial wavefunction can be written as \(| \left. \psi \right \rangle = a_1 | \left. \phi_1 \right \rangle +a_2 | \left. \phi_2 \right \rangle+a_3 | \left. \phi_3 \right \rangle+...\) (i.e. the linear sum of orthonormal eigenvectors of A). Note that \(\psi\) is a superposition of other eigenstates. That is, it is a sort of combination of states that have definite properties. Each eigenstate has a well-defined value for the observable, but \(\psi\) does not.
The probability of measuring the observable to have the value \(\lambda\) is given by \(P(\lambda)=\left | a_{k_1} \right |^2+\left | a_{k_2} \right |^2+\left | a_{k_3} \right |^2+...\). More simply, if no two eigenvectors have the same eigenvalue, then the probability that we will measure the observable to have value \(\lambda_k\) is \(| \left \langle \phi_k\left | \right. \psi\right \rangle |^2\). This is called the Born Rule.
Given this, it is easy to see that \(\left \langle A\right \rangle_\psi=\left \langle \psi \left | A \right | \psi\right \rangle\) is the expected value of the operator A.
Collapse Postulate
Immediately after measurement, the wavefunction becomes the normalized projection of the prior wavefunction onto the sub-space of values that give the measured eigenvalue. That is, using the above description, the wavefunction immediately after measurement becomes \(\alpha \cdot( a_{k_1}| \left. \phi_{k_1}\right \rangle +a_{k_2}| \left. \phi_{k_2}\right \rangle+a_{k_3}| \left. \phi_{k_3}\right \rangle +...)\) where \(\alpha\) is a suitable normalization constant, chosen to make the resulting vector normalized. More simply, if no two eigenvectors have the same eigenvalue, then the wavefunction immediately after we measure the observable to have value \(\lambda_k\) is \(| \left. \psi \right \rangle=| \left. \phi_k \right \rangle\).
Evolution Postulate
The time-evolution of the wavefunction, in the absence of measurement, is given by the time-dependent Schrödinger Equation: \[ \hat{E} \left.|\psi \right \rangle=\hat{H}\left.|\psi \right \rangle \] where \(\hat{E}\) is the energy operator, which is given by \(i \hbar \frac{\partial }{\partial t}\), and \(\hat{H}\) is the Hamiltonian operator, which is defined analogously to the Hamiltonian of classical mechanics. In particular, it is the sum of the kinetic and potential energy operators.
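As a small illustration of the Probability and Collapse postulates, here is a sketch in Python. The amplitudes and eigenvalues are made-up values chosen only so that two eigenvectors share an eigenvalue; the function names are our own and not part of any library.

```python
import math

def born_probabilities(amplitudes, eigenvalues):
    """Return {eigenvalue: probability}, summing |a_k|^2 over degenerate eigenvectors."""
    probs = {}
    for a, lam in zip(amplitudes, eigenvalues):
        probs[lam] = probs.get(lam, 0.0) + abs(a) ** 2
    return probs

def collapse(amplitudes, eigenvalues, measured):
    """Project onto the eigenspace of `measured`, then renormalize."""
    projected = [a if lam == measured else 0.0
                 for a, lam in zip(amplitudes, eigenvalues)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in projected))
    return [a / norm for a in projected]

# Hypothetical state: three eigenvectors, the first two sharing eigenvalue 1.0.
amplitudes = [0.5, 0.5j, math.sqrt(0.5)]
eigenvalues = [1.0, 1.0, -1.0]

probs = born_probabilities(amplitudes, eigenvalues)
expectation = sum(lam * p for lam, p in probs.items())  # <A> as a weighted average
post = collapse(amplitudes, eigenvalues, 1.0)           # state after measuring 1.0

print(probs, expectation, post)
```

The expectation value computed here as a probability-weighted average of eigenvalues is the same quantity as \(\left \langle \psi \left | A \right | \psi\right \rangle\) for a normalized state.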

Spatial Dimensions

A common Hilbert space to use is that of functions of one spatial dimension and time. This is an example of an infinite-dimensional Hilbert space (at any x-coordinate, the wavefunction could take on a completely independent value). We often speak of eigenfunctions instead of eigenvectors in such a space. In this Hilbert space, we define the inner product of two wavefunctions to be \[\left \langle \phi\left | \right. \psi\right \rangle =\int_{-\infty}^{\infty}\bar{\phi}(x,t)\psi(x,t)dx\] The momentum operator in the x-direction is given by \(P_x=\frac{\hbar}{i}\frac{\partial }{\partial x}\). The position operator is quite simply \(X=x\). The (un-normalized) eigenfunctions for each are easily found to be, respectively \[ \left. | \psi\right \rangle_p=e^{ipx/\hbar} \] \[ \left. | \psi\right \rangle_{x_0}=\delta(x-x_0) \]
The classical kinetic energy is given by \(E_k=\frac{1}{2}mv^2=\frac{p^2}{2m}\). The potential energy is given simply by \(E_p=V(x,t)\), that is, merely a specification of the potential energy as a function of position and possibly time. Thus, the time-dependent Schrödinger Equation can be written as \[ i \hbar \frac{\partial }{\partial t} \left.|\psi \right \rangle=\left ( \frac{-\hbar ^2}{2m} \frac{\partial^2 }{\partial x^2}+V(x,t) \right)\left.|\psi \right \rangle \] If the potential does not depend on time, so that \(V(x,t)=V(x)\), and the wavefunction is an eigenfunction of energy with eigenvalue E, then its energy does not change with time and we can write the time-independent Schrödinger Equation: \[ E \left.|\psi \right \rangle=\left ( \frac{-\hbar ^2}{2m} \frac{\partial^2 }{\partial x^2}+V(x) \right)\left.|\psi \right \rangle \] That is, \(\psi\) is an eigenfunction of the Hamiltonian. We can often then solve this to find not only the wavefunction solutions, but also the allowed energies: often such an equation will only be soluble for a discrete set of possible energies. The conditions of normalizability and normalization, as well as boundary conditions, contribute toward determining the energies and solutions.
The extension to multiple dimensions follows analogously.

Spin

The Hilbert space to describe the spin state of an electron (or other spin 1/2 particle) is typically that of a two-by-one matrix (a two-component column vector). That is, a ket will be of the form \[ \left. |\psi \right \rangle= \begin{pmatrix} a\\ b \end{pmatrix} \] And the corresponding bra will be \[ \left \langle \psi | \right.= \begin{pmatrix} \bar{a} & \bar{b} \end{pmatrix} \] The condition for normalization is that \(|a|^2+|b|^2=1\). A similar description can be used for polarization for photons. The operators for spin in the x, y and z directions are, respectively: \[ S_x=\frac{\hbar}{2}\begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix} \] \[ S_y=\frac{\hbar}{2}\begin{pmatrix} 0 & -i\\ i & 0 \end{pmatrix} \] \[ S_z=\frac{\hbar}{2}\begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix} \] All of these have eigenvalues \(+\frac{\hbar}{2}\) and \(-\frac{\hbar}{2}\), with corresponding eigenvectors: \[ \left. |+x \right \rangle=\left. |+ \right \rangle=\frac{1}{\sqrt{2}}\begin{pmatrix} 1\\ 1 \end{pmatrix},\; \; \left. |-x \right \rangle=\left. |- \right \rangle=\frac{1}{\sqrt{2}}\begin{pmatrix} 1\\ -1 \end{pmatrix} \] \[ \left. |+y \right \rangle=\left. |\rightarrow \right \rangle=\frac{1}{\sqrt{2}}\begin{pmatrix} 1\\ i \end{pmatrix},\; \; \left. |-y \right \rangle=\left. |\leftarrow \right \rangle=\frac{1}{\sqrt{2}}\begin{pmatrix} 1\\ -i \end{pmatrix} \] \[ \left. |+z \right \rangle=\left. |\uparrow \right \rangle=\begin{pmatrix} 1\\ 0 \end{pmatrix},\; \; \left. |-z \right \rangle=\left. |\downarrow \right \rangle=\begin{pmatrix} 0\\ 1 \end{pmatrix} \] (Eigenvectors are only defined up to an overall phase, so \(\frac{1}{\sqrt{2}}\begin{pmatrix} -i\\ 1 \end{pmatrix}\) is an equally valid choice for \(\left. |+y \right \rangle\).)
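The eigenvector claims for the spin operators are easy to check by direct multiplication. The following sketch does so in plain Python, working in units where \(\hbar=1\) (an assumption made only for readability); the helper names are our own. It checks that \(\frac{1}{\sqrt{2}}(1, i)\) and \(\frac{1}{\sqrt{2}}(1, -i)\) are the \(+\) and \(-\) eigenvectors of \(S_y\), and that \(\frac{1}{\sqrt{2}}(-i, 1)\) differs from the former only by a phase.

```python
import math

HBAR = 1.0  # units with hbar = 1, an assumption for readability

def matvec(m, v):
    """Multiply a 2x2 matrix by a 2-component column vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def is_eigenvector(m, v, lam, tol=1e-12):
    """True if m v = lam v within tolerance."""
    mv = matvec(m, v)
    return all(abs(mv[i] - lam * v[i]) < tol for i in range(2))

h = HBAR / 2
S_x = [[0, h], [h, 0]]
S_y = [[0, -1j * h], [1j * h, 0]]
S_z = [[h, 0], [0, -h]]

s = 1 / math.sqrt(2)
checks = {
    "+x": is_eigenvector(S_x, [s, s], +h),
    "-x": is_eigenvector(S_x, [s, -s], -h),
    "+y": is_eigenvector(S_y, [s, 1j * s], +h),
    "-y": is_eigenvector(S_y, [s, -1j * s], -h),
    "+y (other phase)": is_eigenvector(S_y, [-1j * s, s], +h),
    "+z": is_eigenvector(S_z, [1, 0], +h),
    "-z": is_eigenvector(S_z, [0, 1], -h),
}
print(checks)
```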

Multiple Particles

In the case of more than one particle, we can construct a total wavefunction by composing those of each particle. For instance, if we have two particles, the first with spin up and the second with spin down, we can write that in a variety of ways. For instance: \[ \left. |\uparrow \right \rangle_1 \otimes \left. |\downarrow \right \rangle_2=\left. |\uparrow \right \rangle_1\left. |\downarrow \right \rangle_2=\left. |\uparrow \downarrow \right \rangle \] Clearly this case can be described in a way that treats each particle separately: the first particle is in one state and the second particle is in another state. However, sometimes it can be the case that the total wavefunction cannot be described in such a way. For instance: \[ \left. |\psi \right \rangle=\frac{1}{\sqrt{2}}\left ( \left. |\uparrow \downarrow \right \rangle +\left. | \downarrow \uparrow \right \rangle \right ) \] In this case, if we measure the first particle to have spin up, the wavefunction collapses to the state \(\left. |\uparrow \downarrow \right \rangle\). This is an example of entanglement, which is where two objects' states cannot be independently described.
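To make the distinction concrete, here is a rough sketch that tests whether a two-particle spin state factorizes. It relies on the standard fact that a state with amplitudes \(c_{ij}\) for \(|i\rangle_1|j\rangle_2\) is a product state exactly when the 2x2 matrix \(c\) has zero determinant. The function names are our own, and index 0 stands for spin up, 1 for spin down.

```python
import math

def is_product_state(c, tol=1e-12):
    """c[i][j] is the amplitude of |i>_1 |j>_2 (0 = up, 1 = down).

    The state factorizes exactly when det(c) = 0.
    """
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    return abs(det) < tol

def collapse_first(c, outcome):
    """State after the first particle's z-spin is measured to be `outcome`."""
    row = c[outcome]
    norm = math.sqrt(sum(abs(a) ** 2 for a in row))
    collapsed = [[0, 0], [0, 0]]
    collapsed[outcome] = [a / norm for a in row]
    return collapsed

s = 1 / math.sqrt(2)
up_down = [[0, 1], [0, 0]]      # |up, down>
entangled = [[0, s], [s, 0]]    # (|up, down> + |down, up>) / sqrt(2)

after = collapse_first(entangled, 0)  # first particle found spin up
print(is_product_state(up_down), is_product_state(entangled), after)
```

After the measurement, the state is again a product state: all of the amplitude sits on \(\left. |\uparrow \downarrow \right \rangle\).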

Stirling's Approximation: Derivation and Corollaries

2015-11-03T08:10:00.001-08:00

Lemma 1: \(\lim_{n \rightarrow \infty} \sqrt[n]{n!}/n=1/e\)

Way 1 (somewhat rigorous)

From elementary calculus, we have that: \[ \int_{0}^{1} \ln(x) dx =-1 \] Taking this as a Riemann sum, as done in introductory calculus, we have: \[ -1=\int_{0}^{1}\ln(x)dx=\lim_{N \rightarrow \infty} \sum_{k=1}^{N}\ln\left (\frac{k}{N} \right ) \cdot \frac{1}{N} \] \[ -1=\lim_{N \rightarrow \infty} -\ln(N)+\frac{1}{N} \sum_{k=1}^{N}\ln\left (k \right ) \] \[ -1=\lim_{N \rightarrow \infty} -\ln(N)+\frac{1}{N} \ln\left (N! \right ) \] Therefore, \[ \lim_{N \rightarrow \infty} \frac{\sqrt[N]{N!}}{N}=\frac{1}{e} \]

Way 2 (less rigorous)

\[ \lim_{n \rightarrow \infty} \frac{\sqrt[n]{n!}}{n}=x \] So, for n big, in a certain sense: \[ n! \approx (nx)^n \] \[ \frac{(n+1)!}{n!(n+1)}=1 \approx \frac{((n+1)x)^{n+1}}{(nx)^n (n+1)}=\left ( 1+ \frac{1}{n} \right )^n x \] Thus, in order to get equality in the limit, we must have: \[ x = \lim_{n \rightarrow \infty} \left ( 1+ \frac{1}{n} \right )^{-n}=\frac{1}{e} \]
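A quick numerical sanity check of Lemma 1 (not a proof) can be done in Python. To avoid overflow for large n, this sketch works with \(\ln(n!)\) via `math.lgamma(n + 1)`; the function name `root_ratio` is our own.

```python
import math

def root_ratio(n):
    """Compute (n!)^(1/n) / n using logarithms to avoid overflow."""
    return math.exp(math.lgamma(n + 1) / n - math.log(n))

for n in (10, 1000, 10**6):
    print(n, root_ratio(n), 1 / math.e)
```

The convergence is slow, since the ratio exceeds \(1/e\) by a relative amount of roughly \(\ln(2\pi n)/(2n)\).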

Lemma 2: Wallis Product in Factorial Form

Recall from this article the following expression for pi: \[ \frac{\pi}{2}=\prod_{k=1}^{\infty}\frac{2k \cdot 2k}{(2k-1)(2k+1)}=\lim_{N \rightarrow \infty}\prod_{k=1}^{N}\frac{2k \cdot 2k}{(2k-1)(2k+1)}=\lim_{N \rightarrow \infty} \frac{\left ( 2^N \cdot N! \right )^4}{\left ( (2N)! \right )^2(2N+1)} \]
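The factorial form can be checked numerically against \(\pi/2\). The sketch below evaluates it through logarithms (`math.lgamma`) so that large N does not overflow; the function name is our own.

```python
import math

def wallis_factorial_form(N):
    """(2^N N!)^4 / (((2N)!)^2 (2N+1)), evaluated via logarithms."""
    log_value = (4 * (N * math.log(2) + math.lgamma(N + 1))
                 - 2 * math.lgamma(2 * N + 1)
                 - math.log(2 * N + 1))
    return math.exp(log_value)

for N in (10, 1000, 10**5):
    print(N, wallis_factorial_form(N), math.pi / 2)
```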

Lemma 3: An Inequality for the Natural Logarithm

Let \(x,y > 0\). Clearly \[ 0 \leq \frac{1}{y^2 (1+y)^2 (2y+1)^2} \] Therefore \[ 0 \leq \int_{x}^{\infty}\frac{dy}{y^2 (1+y)^2 (2y+1)^2}=\frac{1}{x}+\frac{1}{x+1}+\frac{4}{x+1/2}-6\ln \left ( 1+\frac{1}{x} \right ) \] \[ 6\ln \left ( 1+\frac{1}{x} \right ) -\frac{6}{x+1/2} \leq \frac{1}{x}+\frac{1}{x+1}-\frac{2}{x+1/2} \] \[ (x+\tfrac{1}{2})\ln \left ( 1+\frac{1}{x} \right ) -1 \leq \frac{(x+\tfrac{1}{2})}{6}\left (\frac{1}{x}+\frac{1}{x+1} \right )-\frac{1}{3}=\frac{1}{12x(x+1)} \] Also, clearly \[ 0 \leq \frac{16y^2+41y+24}{y(1+y)^2 (2+y)^2 (2y+1)^2} \] Therefore \[ 0 \leq \int_{x}^{\infty}\frac{16y^2+41y+24}{y(1+y)^2 (2+y)^2 (2y+1)^2} dy=6\left (\ln \left ( 1+\frac{1}{x} \right )-\frac{1}{x+\tfrac{1}{2}} \right)-\frac{1}{2(x+\tfrac{1}{2})(x+1)(x+2)} \] And so \[ \frac{1}{12(x+1)(x+2)} \leq (x+\tfrac{1}{2})\ln \left ( 1+\frac{1}{x} \right )-1 \]
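As a spot check of the two-sided inequality in Lemma 3, the following sketch evaluates both bounds at a handful of sample points. The sample values are arbitrary; we stop at \(x=100\) because for much larger x the gap to the upper bound falls below double-precision resolution.

```python
import math

def middle(x):
    """(x + 1/2) ln(1 + 1/x) - 1"""
    return (x + 0.5) * math.log1p(1 / x) - 1

def lower(x):
    return 1 / (12 * (x + 1) * (x + 2))

def upper(x):
    return 1 / (12 * x * (x + 1))

samples = [0.1, 0.5, 1, 2, 10, 100]
holds = all(lower(x) <= middle(x) <= upper(x) for x in samples)
print(holds)
```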

Theorem: Stirling's Approximation

Let us define a function and sequence of coefficients as follows: \[ g(n)=\ln\left ( \frac{n!}{\left ( \tfrac{n}{e} \right )^n \sqrt{2\pi n}} \right )=\sum_{k=-\infty}^{\infty} A_k n^k \] We then have, from lemma 1, \[ \frac{1}{e}=\lim_{n \rightarrow \infty} \frac{\sqrt[n]{n!}}{n}=\lim_{n \rightarrow \infty} \frac{\sqrt[n]{\left (\tfrac{n}{e} \right )^n \sqrt{2\pi n} \cdot e^{g(n)}}}{n}=\frac{1}{e} \lim_{n \rightarrow \infty} \sqrt[2n]{2\pi n} \cdot e^{g(n)/n} \] Thus \[ 1=\lim_{n \rightarrow \infty} e^{g(n)/n}=\exp\left (\lim_{n \rightarrow \infty} \sum_{k=-\infty}^{\infty} A_k n^{k-1} \right )=\exp\left (\lim_{n \rightarrow \infty} \sum_{k=1}^{\infty} A_k n^{k-1} \right ) \] And therefore \(A_k=0\) for \(k \geq 1\). From lemma 2, \[ \frac{\pi}{2}=\lim_{n \rightarrow \infty} \frac{\left ( 2^n \cdot n! \right )^4}{\left ( (2n)! \right )^2(2n+1)}=\lim_{n \rightarrow \infty} \frac{\left ( 2^n \cdot \left (\tfrac{n}{e} \right )^n \sqrt{2\pi n} \cdot e^{g(n)} \right )^4}{\left ( \left (\tfrac{2n}{e} \right )^{2n} \sqrt{4\pi n} \cdot e^{g(2n)} \right )^2(2n+1)} \] \[ \frac{\pi}{2}=\lim_{n \rightarrow \infty} \frac{n \pi}{2n+1} \cdot e^{4g(n)-2g(2n)} \] \[ 0=\lim_{n \rightarrow \infty} 2g(n)-g(2n)=2A_0-A_0=A_0 \] Therefore, \(A_k=0\) for \(k \geq 0\), and thus \(\lim_{n \rightarrow \infty} g(n)=0\). Thus it follows that \[ \lim_{n \rightarrow \infty} \frac{n!}{\left ( \tfrac{n}{e} \right )^n \sqrt{2\pi n}}=1 \] This fact is known as Stirling's Approximation.
Moreover, we have \[ g(n)-g(n+1)=\ln\left ( \frac{n!\left ( \tfrac{n+1}{e} \right )^{n+1} \sqrt{2\pi (n+1)}}{(n+1)!\left ( \tfrac{n}{e} \right )^n \sqrt{2\pi n}} \right )=\ln\left ( \frac{(n+1)^{n+\tfrac{1}{2}}}{e \cdot n^{n+\tfrac{1}{2}}} \right ) \] \[ g(n)-g(n+1)=(n+\tfrac{1}{2})\ln\left ( 1+\frac{1}{n} \right )-1 \] By lemma 3, we then have \[ \frac{1}{12(n+1)(n+2)} \leq g(n)-g(n+1) \leq \frac{1}{12n(n+1)} \] \[ \sum_{k=n}^{\infty} \frac{1}{12(k+1)(k+2)}=\frac{1}{12(n+1)} \leq \sum_{k=n}^{\infty} g(k)-g(k+1)=g(n)-g(\infty)=g(n) \leq \sum_{k=n}^{\infty} \frac{1}{12k(k+1)}=\frac{1}{12n} \] That is, \(\tfrac{1}{12(n+1)} \leq g(n) \leq \tfrac{1}{12n}\). And therefore: \[ \left (\tfrac{n}{e} \right )^n \sqrt{2\pi n}\cdot e^{\tfrac{1}{12(n+1)}} \leq n! \leq \left (\tfrac{n}{e} \right )^n \sqrt{2\pi n} \cdot e^{\tfrac{1}{12n}} \] In fact, it is possible to obtain an exact formula and an asymptotic series for \(g(n)\). For example, by more advanced calculations, we can show that \[ g(n)=\int_{0}^{\infty}\frac{2 \tan^{-1}\left ( \tfrac{y}{n} \right )}{e^{2\pi y}-1}dy \sim \sum_{k=1}^{\infty} \frac{B_{2k}}{2k(2k-1)n^{2k-1}}=\frac{1}{12n}-\frac{1}{360n^3}+\frac{1}{1260n^5}- \cdots \] where \(B_m\) is the mth Bernoulli number. These two expressions are, respectively, Binet's second formula, which is exact, and Stirling's series, which is asymptotic rather than convergent: for fixed n, its terms eventually grow, so it is truncated after a few terms in practice.
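The bounds \(\tfrac{1}{12(n+1)} \leq g(n) \leq \tfrac{1}{12n}\) can be checked numerically. This is only a sketch: it computes \(\ln(n!)\) with `math.lgamma`, and the sample values of n are arbitrary.

```python
import math

def g(n):
    """ln( n! / ((n/e)^n sqrt(2 pi n)) )"""
    stirling_log = n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)
    return math.lgamma(n + 1) - stirling_log

for n in (1, 2, 5, 10, 50):
    print(n, 1 / (12 * (n + 1)), g(n), 1 / (12 * n))

bounds_hold = all(1 / (12 * (n + 1)) <= g(n) <= 1 / (12 * n)
                  for n in (1, 2, 5, 10, 50))
```

Even at \(n=1\) the bounds are informative: \(g(1)=1-\tfrac{1}{2}\ln(2\pi)\approx 0.0811\), which lies between \(1/24\) and \(1/12\).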

Corollary: Product of a Rational Function

Firstly, since \[ \prod_{k=1}^N \left(ak+b\right)=a^N\prod_{k=1}^N \left(k+\frac{b}{a}\right) \] We will just evaluate \[ \prod_{k=1}^N \left(k+b\right)=\frac{(N+b)!}{b!} \approx \left(\frac{N+b}{e}\right)^{N+b}\frac{\sqrt{2\pi(N+b)}}{b!}=N^{N+b+\tfrac{1}{2}}e^{-N}\frac{\sqrt{2\pi}}{b!}e^{-b}\left(1+\frac{b}{N}\right)^{N+b+\tfrac{1}{2}} \] Since \(\left(1+\tfrac{b}{N}\right)^{N+b+\tfrac{1}{2}}\) tends to \(e^b\) for large N, \[ \prod_{k=1}^N \left(k+b\right)=\frac{(N+b)!}{b!} \approx N^{N+b+\tfrac{1}{2}}e^{-N}\frac{\sqrt{2\pi}}{b!} \] More generally, given the above, it is not difficult to demonstrate the following generalization. Let \(m,n > 0\). Let \(a_1,a_2,...,a_m\) and \(b_1,b_2,...,b_n\) and \(r_1,r_2,...,r_m\) and \(s_1,s_2,...,s_n\) be sequences of numbers, such that \[ \sum_{k=1}^m r_k=\sum_{k=1}^n s_k \] and \[ \sum_{k=1}^m a_k r_k=\sum_{k=1}^n b_k s_k \] Then \[ \prod_{k=1}^\infty\frac{(k+a_1)^{r_1}(k+a_2)^{r_2}\cdots (k+a_m)^{r_m}}{(k+b_1)^{s_1}(k+b_2)^{s_2}\cdots (k+b_n)^{s_n}}=\frac{\prod_{j=1}^n (b_j!)^{s_j}}{\prod_{j=1}^m (a_j!)^{r_j}} \] In cases where the coefficients are non-integral, we use the Gamma function (an extension of the factorial to non-integers), instead of factorials: \[ \prod_{k=1}^\infty\frac{(k+a_1)^{r_1}(k+a_2)^{r_2}\cdots (k+a_m)^{r_m}}{(k+b_1)^{s_1}(k+b_2)^{s_2}\cdots (k+b_n)^{s_n}}=\frac{\prod_{j=1}^n (\Gamma (b_j+1))^{s_j}}{\prod_{j=1}^m (\Gamma (a_j+1))^{r_j}} \] For instance \[ \prod_{k=1}^\infty \frac{k(k+a+b)}{(k+a)(k+b)}=\frac{\Gamma(a+1)\Gamma(b+1)}{\Gamma(a+b+1)}=\frac{ab}{a+b}B(a,b) \] \[ \frac{\sin(\pi x)}{\pi x}=\prod_{k=1}^\infty\frac{(k-x)(k+x)}{k^2}=\frac{\Gamma(1)^2}{\Gamma(1-x) \Gamma(1+x)}=\frac{1}{\Gamma(1-x)x \Gamma(x)} \] \[ \prod_{k=1}^\infty\frac{(1+\tfrac{1}{k})^x}{1+\tfrac{x}{k}}=\prod_{k=1}^\infty\frac{(k+1)^x k}{k^x (k+x)}=\frac{\Gamma(1)^x \Gamma (1+x)}{\Gamma(2)^x \Gamma(1)}=\Gamma(1+x) \]
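The general product formula can be tested numerically by comparing a long partial product with the Gamma-function expression. The sketch below does this for the \(\sin(\pi x)/(\pi x)\) and \(\Gamma(1+x)\) examples at \(x=1/2\). The helper names and the `(shift, exponent)` encoding are our own; `math.prod` requires Python 3.8 or later. Convergence is only of order \(1/N\), so agreement is to a few digits.

```python
import math

def partial_product(num, den, N):
    """Product over k = 1..N of prod (k+a)^r / prod (k+b)^s.

    num and den are lists of (shift, exponent) pairs.
    """
    total = 0.0
    for k in range(1, N + 1):
        term = sum(r * math.log(k + a) for a, r in num)
        term -= sum(s * math.log(k + b) for b, s in den)
        total += term
    return math.exp(total)

def closed_form(num, den):
    """prod Gamma(b+1)^s / prod Gamma(a+1)^r"""
    top = math.prod(math.gamma(b + 1) ** s for b, s in den)
    bottom = math.prod(math.gamma(a + 1) ** r for a, r in num)
    return top / bottom

x = 0.5
N = 10**5

# sin(pi x)/(pi x): numerator (k-x)(k+x), denominator k^2
sinc_num, sinc_den = [(-x, 1), (x, 1)], [(0, 2)]
# Gamma(1+x): numerator (k+1)^x k, denominator k^x (k+x)
gam_num, gam_den = [(1, x), (0, 1)], [(0, x), (x, 1)]

sinc_partial = partial_product(sinc_num, sinc_den, N)
gam_partial = partial_product(gam_num, gam_den, N)
print(sinc_partial, closed_form(sinc_num, sinc_den), math.sin(math.pi * x) / (math.pi * x))
print(gam_partial, closed_form(gam_num, gam_den), math.gamma(1 + x))
```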

Corollary: Asymptotic Behavior of Bernoulli Numbers

In this article, we found that \[ \zeta(2n)=\frac{1}{2}\frac{(2\pi)^{2n}}{(2n)!}\left | B_{2n} \right | \] Combining this with Stirling's approximation, and using the fact that \(\zeta(2n)\) tends to 1 for large n, we find that \[ \left | B_{2n} \right |=2\zeta(2n)\frac{(2n)!}{(2\pi)^{2n}} \approx 4\left ( \frac{n}{\pi e} \right )^{2n} \sqrt{n\pi} \cdot e^{\tfrac{1}{24n}} \]

Corollary: Approximation for Binomial Coefficients

\[ \binom{a}{b}=\frac{a!}{b!(a-b)!} \approx \frac{\left (\tfrac{a}{e} \right )^a \sqrt{2\pi a}} {\left (\tfrac{b}{e} \right )^b \sqrt{2\pi b}\left (\tfrac{a-b}{e} \right )^{a-b} \sqrt{2\pi (a-b)}}=\frac{1}{\sqrt{2\pi}}\sqrt{\frac{a}{b(a-b)}} \frac{a^a}{b^b (a-b)^{a-b}} \]
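To see how good this approximation is in practice, the following sketch compares it with the exact value from `math.comb`. The choice \(a=100\), \(b=30\) is arbitrary, and the function name is our own.

```python
import math

def binomial_stirling(a, b):
    """Stirling-based approximation to C(a, b), evaluated via logarithms."""
    log_value = (0.5 * math.log(a / (2 * math.pi * b * (a - b)))
                 + a * math.log(a)
                 - b * math.log(b)
                 - (a - b) * math.log(a - b))
    return math.exp(log_value)

a, b = 100, 30
exact = math.comb(a, b)
approx = binomial_stirling(a, b)
ratio = exact / approx
print(exact, approx, ratio)
```

From the bounds on \(g(n)\), the ratio of exact to approximate value is \(e^{g(a)-g(b)-g(a-b)}\), which is slightly below 1 here.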

Corollary: Normal from Binomial

Let \(0 < p < 1\) and \(p+q=1\). Let \[ F_n(x)=\binom{n}{x}p^x q^{n-x} \] \[ f_n(x)=\sqrt{npq}F_n(np+x\sqrt{npq}) \] \[ \phi_n(x)=\ln(f_n(x)) \] \[ \phi_n(x)=\tfrac{1}{2}\ln(npq)+\ln(n!)-\ln((np+x\sqrt{npq})!)-\ln((nq-x\sqrt{npq})!)+(np+x\sqrt{npq})\ln(p)+(nq-x\sqrt{npq})\ln(q) \] Using Stirling's Approximation and some algebra \[ \phi_n(x) = -\tfrac{1}{2}\ln(2\pi)-\left (\tfrac{1}{2}+ np+x\sqrt{npq} \right )\ln\left ( 1+x\sqrt{\frac{q}{np}} \right)-\left (\tfrac{1}{2}+ nq-x\sqrt{npq} \right )\ln\left ( 1-x\sqrt{\frac{p}{nq}} \right)+O(\tfrac{1}{n}) \] Using the series expansion \(\ln(1+u)=u-\tfrac{1}{2}u^2+O(u^3) \) \[ \phi_n(x) = -\tfrac{1}{2}\ln(2\pi)-\tfrac{1}{2}x^2+O(\tfrac{1}{\sqrt{n}}) \] Thus, as \(n\) goes to infinity \[ \phi_\infty(x) = -\tfrac{1}{2}\ln(2\pi)-\tfrac{1}{2}x^2 \] \[ f_\infty(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}} \] Thus, in the limit, scaling for the changing means and variances, the binomial distribution tends to the normal distribution. Moreover, since the binomial distribution is normalized, we find that \[ \int_{-\infty}^{\infty}\frac{e^{-x^2/2}}{\sqrt{2\pi}}dx=1 \]
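The convergence to the normal density can be observed numerically. In this sketch the values \(n=10000\) and \(p=0.3\) are arbitrary choices, the function names are our own, and the binomial probability is computed through `math.lgamma` to avoid overflow.

```python
import math

def scaled_binomial(n, p, k):
    """Return (x, f_n(x)) where x = (k - np)/sqrt(npq) and f_n = sqrt(npq) * P(K = k)."""
    q = 1 - p
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(q))
    sigma = math.sqrt(n * p * q)
    return (k - n * p) / sigma, sigma * math.exp(log_pmf)

def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

n, p = 10000, 0.3
results = []
for k in (3000, 3046, 2954):
    x, f = scaled_binomial(n, p, k)
    results.append((x, f, normal_pdf(x)))
    print(k, x, f, normal_pdf(x))
```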

Occam+Bayes=Induction

2015-10-27T14:55:00.000-07:00

A classic problem in philosophy and the philosophy of science is how to justify induction. That is, how to rationally go from the fact that X is true in N previously observed cases to the belief that it is true in all cases, or at least in an additional, unobserved case. We will here propose a quick and simple method to justify induction, based on the combination of Occam's razor (to choose hypotheses) and Bayesian inference to update epistemic probabilities.

Notation

Let us introduce the following notation. Let \(H\) be some hypothesis which we want to judge for plausibility. Let \(X_k\) be the fact that \(X\) is true in the kth instance. Let \(X^n\) be the fact that \(X\) is true in the first n cases, that is \[X^n=X_1 \cap X_2 \cap \cdots \cap X_n=\bigcap_{k=1}^{n}X_k\] so that \[X^{n-1}\cap X_n=X^n\] Thus \(P\left ( X^n|H \right ) \) is the (epistemic) probability that we observe X in n cases, supposing H is true, and \(P\left ( H|X^n \right ) \) is the (epistemic) probability that H is true, supposing we observe X to be the case in n cases.

Occam's Razor

There are three basic, simplest hypotheses we can form, all the rest being more complex. These three are the

Proinductive (P) hypothesis: the chance of X happening again increases as we see more instances of it.
Contrainductive (C) hypothesis: the chance of X happening again decreases as we see more instances of it.
Uninductive (U) hypothesis: the chance of X happening again stays the same as we see more instances of it.

For concreteness, let \(F_H(n)=P\left ( X_{n}|H \cap X^{n-1} \right )\). Thus we say that, for \(m > 0\), \(F_P(n+m) > F_P(n)\), and \(\lim_{n \rightarrow \infty} F_P(n)=1\), and \(F_C(n+m) < F_C(n)\), and \(\lim_{n \rightarrow \infty} F_C(n)=0\), and \(F_U(n)=F_U(1)\) for all n.

Bayesian Inference

We want to find \(P\left ( H|X^n \right ) \) for the hypotheses listed in the previous section. We have \[ P\left ( X^n|H \right )=P\left ( X_n \cap X^{n-1}|H \right )=P\left ( X_n |X^{n-1} \cap H \right ) \cdot P\left ( X^{n-1} |H \right )=F_H(n) \cdot P\left ( X^{n-1} |H \right ) \] Therefore \[ P\left ( X^n|H \right )=\prod_{k=1}^{n} F_H(k) \] Suppose that there are \(N\) mutually exclusive and collectively exhaustive hypotheses. Then, Bayes' formula states: \[ P(H_m|A)=\frac{P(A|H_m)P(H_m)}{P(A|H_1)P(H_1)+P(A|H_2)P(H_2)+\cdots+P(A|H_N)P(H_N)} \] Thus, we have \[ P(H_m|X^n)=\frac{P(X^n|H_m)P(H_m)}{P(X^n|H_1)P(H_1)+P(X^n|H_2)P(H_2)+\cdots+P(X^n|H_N)P(H_N)} \] Therefore \[ P(H_m|X^n)=\frac{P(H_m)\prod_{k=1}^{n} F_{H_m}(k)}{P(H_1)\prod_{k=1}^{n} F_{H_1}(k)+P(H_2)\prod_{k=1}^{n} F_{H_2}(k) + \cdots + P(H_N)\prod_{k=1}^{n} F_{H_N}(k)} \] Let us suppose that the three hypotheses mentioned above are collectively exhaustive. Suppose, for concreteness that \(F_P(n)=\frac{n}{n+1}\), \(F_C(n)=\frac{1}{n+1}\), and \(F_U(n)=\frac{1}{2}\). Thus \(\prod_{k=1}^{n} F_{P}(k)=\frac{1}{n+1}\), and \(\prod_{k=1}^{n} F_{C}(k)=\frac{1}{(n+1)!}\), and \(\prod_{k=1}^{n} F_{U}(k)=\frac{1}{2^n}\). Let \(P(P)=p\) and \(P(C)=q\) and \(P(U)=r\) where \(p+q+r=1\). Then: \[ P(P|X^n)=\frac{p\frac{1}{n+1}}{p\frac{1}{n+1}+q\frac{1}{(n+1)!}+r\frac{1}{2^n}} \] \[ P(C|X^n)=\frac{q\frac{1}{(n+1)!}}{p\frac{1}{n+1}+q\frac{1}{(n+1)!}+r\frac{1}{2^n}} \] \[ P(U|X^n)=\frac{r\frac{1}{2^n}}{p\frac{1}{n+1}+q\frac{1}{(n+1)!}+r\frac{1}{2^n}} \] A simple assessment of limits shows that the former goes to 1 quite rapidly, for increasing n, for any nonzero p, and the latter two go to zero. In fact, for \(p=q=r=1/3\), for \(n>10\), \(P(P|X^n)>0.99\), and for \(n>17\), \(P(P|X^n)>0.9999\).
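The posterior probabilities for the three illustrative hypotheses are simple enough to compute directly. The sketch below uses the same \(F_P\), \(F_C\) and \(F_U\) as in the text, with equal priors by default; the function name is our own.

```python
import math

def posteriors(n, p=1/3, q=1/3, r=1/3):
    """Return (P(P|X^n), P(C|X^n), P(U|X^n)) for the example likelihoods."""
    like_P = 1 / (n + 1)                  # product of k/(k+1) for k = 1..n
    like_C = 1 / math.factorial(n + 1)    # product of 1/(k+1) for k = 1..n
    like_U = 0.5 ** n                     # product of 1/2 for k = 1..n
    total = p * like_P + q * like_C + r * like_U
    return p * like_P / total, q * like_C / total, r * like_U / total

for n in (1, 5, 10, 11, 17, 18):
    print(n, posteriors(n))
```

With equal priors, the proinductive posterior first exceeds 0.99 at \(n=11\) and first exceeds 0.9999 at \(n=18\), in agreement with the thresholds quoted above.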

This example is meant to be only illustrative, to show the general way in which Occam's razor, combined with Bayesian inference, leads to a support of induction. The same things happening repeatedly lends credence to the hypothesis that the same things happen repeatedly, and detracts from the hypothesis that the same things are unlikely to happen repeatedly, or always happen with the same probability. In a very similar way, a coin repeatedly coming up heads supports the hypothesis that it is biased to come up heads, and detracts from the hypotheses that it is biased to come up tails or is fair. This may seem obvious, but it is beneficial to see exactly how the mathematical machinery supports this intuition.

We may also wish to include other hypotheses, but we must first assess their prior probabilities, and Occam's razor advises taking the prior probability of a hypothesis to be inversely related to its complexity. Thus, even if observing n X's is more likely on such a hypothesis than on the three discussed, it would necessarily be more complex or ad hoc, and so would have a significantly lower prior probability.

Some Introductory Quantum Mechanics: Classical Background and Non-Classical Phenomena

2015-10-23T12:56:00.001-07:00

Quantum mechanics (QM) is a theoretical framework that describes the fundamental nature of reality: of particles of matter and light, and potentially of other things besides. QM arose from and in contrast to classical mechanics (CM), with many formulations and features still relying heavily on CM ideas. However, several phenomena established that CM cannot be the whole story, and would need to be amended. A new theory would need to be introduced to account for these phenomena, and it would also predict some other, startling ones. However, the best way to interpret the new theory is still disputed.

This will be a multi-part series giving a general introduction to quantum theory.

Classical Mechanics

QM is distinct from CM, though similar in several respects. CM, in general, looks at the behavior of idealized geometrical bodies, rigid, elastic, and fluid. The state is always definite, and in this state, momentum, energy, position and the like are well-defined and definite (we may make an exception for statistical mechanics, but in that case, these quantities may take on distributions only in the sense of an ensemble: it would still be in principle possible to determine these properties for each element in the ensemble, as Maxwell's demon would do). CM is how we tend naively to see the world. Things look like they are definite, spatially constrained, like a bunch of tiny definite parts, or large definite volumes moving along definite paths. This is decidedly not the case in QM.

CM has three main, equivalent formulations: Newtonian, Lagrangian and Hamiltonian.

Newtonian: Newtonian mechanics is the typical pedagogical formulation. It deals with the position and velocity of point masses, extended bodies, fluids, etc. in terms of forces, which relate back to position via Newton's second Law (which is really more of a definition). That is, for each asymptotically infinitesimal bit of matter in the system, find the net forces (and torques/stresses), in terms of the positions of the other bits of matter, relate it to the acceleration via Newton's second Law, and then solve the big set of differential equations (or use an iterative approximation method like Runge-Kutta) to find the trajectories of each bit of matter (often the problem is much simplified by various symmetries, homogeneities, localities, redundancies, and conservation considerations).

Lagrangian: Lagrangian mechanics deals more in energies, specifically a certain function of time, position and velocity (for all the degrees of freedom) called the Lagrangian, which is typically just the kinetic minus the potential energy. Lagrangian mechanics allows one to deal with constraints in a simpler and more elegant way. Integrating the Lagrangian over time gives the action. The principle of stationary action states that objects move so as to make the action stationary (usually a minimum, though in general only a stationary point). This can be roughly and loosely interpreted as saying that objects go along the "easiest" trajectories. Lagrangians are still used extensively in modern physics, such as in quantum field theory and the path integral formulation.

Hamiltonian: Hamiltonian mechanics also deals in energies, specifically a certain function, related to the Lagrangian, called the Hamiltonian, which generally is equal to the total energy of the system. The trajectories are then found via Hamilton's equations, which are a set of differential equations relating changes of the Hamiltonian to changes of the position and momentum. This formalism uses rather abstract notions, such as frames of reference, generalized coordinates, phase space and the like. However, it is one of the most powerful formulations of classical mechanics and serves as one of the basic frameworks for the development of quantum mechanics.
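As a concrete instance of the above, consider a mass on a spring, with Hamiltonian \(H=\frac{p^2}{2m}+\frac{1}{2}kx^2\). Hamilton's equations then read \(\dot{x}=p/m\) and \(\dot{p}=-kx\), which is just Newton's second law split into two first-order equations. The sketch below integrates them with the classical fourth-order Runge-Kutta method over one period. The system, parameter values and function names are our own choices for illustration.

```python
import math

def hamilton_rhs(state, m, k):
    """Hamilton's equations for H = p^2/(2m) + k x^2 / 2."""
    x, p = state
    return (p / m, -k * x)

def rk4_step(state, dt, m, k):
    """One classical fourth-order Runge-Kutta step."""
    def nudge(s, d, h):
        return (s[0] + h * d[0], s[1] + h * d[1])
    k1 = hamilton_rhs(state, m, k)
    k2 = hamilton_rhs(nudge(state, k1, dt / 2), m, k)
    k3 = hamilton_rhs(nudge(state, k2, dt / 2), m, k)
    k4 = hamilton_rhs(nudge(state, k3, dt), m, k)
    return tuple(
        state[i] + dt / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
        for i in range(2)
    )

def energy(state, m, k):
    x, p = state
    return p * p / (2 * m) + 0.5 * k * x * x

m, k = 1.0, 1.0                                # illustrative mass and spring constant
steps = 1000
dt = 2 * math.pi * math.sqrt(m / k) / steps    # one full period in `steps` steps
state = (1.0, 0.0)                             # start at rest, displaced by 1
e0 = energy(state, m, k)
for _ in range(steps):
    state = rk4_step(state, dt, m, k)
print(state, energy(state, m, k) - e0)
```

After one period the system returns very nearly to its initial state with the same energy, reflecting the deterministic character of the classical equations.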

Measurement in CM is intuitive and simple. We measure the position of a thing by looking at where it is and recording that. Measurement need not affect the thing being measured, at least in principle. But even if it can't be done in practice, the information being sought is still there and definite regardless. A hypothetical Laplace's demon could know all the parameters as they really are. This is very plausibly not the case in QM.

As a rule, in CM, if an object requires energy \(E\) to do X, but only has energy \(E'< E\), then the object won't be able to do X. For example, if a marble is in a bowl with sides at a height requiring energy E to overcome (i.e. if the object is of mass m, the height of the sides is \(E/mg\)), but the marble only has energy \(E'< E\), the marble cannot escape the bowl. There is no chance that anyone will ever make a measurement of the position of the marble and have that be outside the bowl. Interestingly, this is not the case in QM.

CM is generally deterministic in a rather strict sense (though there are certain rare exceptions). Given that all of the above formulations are equivalent, they are all reducible to a set of second-order differential equations for the positions. This means that if all initial positions and velocities are known, even if the relevant forces are time-dependent, the trajectory of each object at all future times is unique and determinable. Any apparent indeterminism is merely apparent, namely epistemic. Assigning probabilities to different states or outcomes is done not because the state is ill-defined or there is some amount of indeterminism that emerges somehow. Rather, it is due to not knowing the initial state or not knowing how the system evolves. Were we to know completely the initial state and how it evolves, there would be no indeterminism. Moreover, any correlations reflect definite facts about which we are merely ignorant. For instance, if we have two marbles, one of mass 100g and one of mass 105g, and give one to one experimenter and the other to another, though they do not know which they received, then once one experimenter weighs his marble, he immediately knows the weight of the other marble, even if it is very far away. We will find that this is not the case in QM.

A further development of CM was the inclusion of electromagnetic phenomena. These were incorporated in Maxwell's equations, which describe how electromagnetic fields are generated and changed by charges and currents. In essence, there is a ubiquitous, continuous electromagnetic field, which can be excited and disturbed in various ways, producing effects like radiation and induction (which lend themselves to a huge array of engineering and technological applications). A relatively simple theorem of electromagnetic theory is that accelerating charges radiate energy. This is most easily seen as being due to producing electric fields of varying strengths, combined with the fact that electromagnetic changes travel at a finite speed. For example, an oscillating charge will produce fields now weaker now stronger as it moves closer and further from a point. If we put a charge on a spring a distance away, it would begin oscillating, too, due to the varying force acting on it. Thus we could extract energy from the oscillating charge, and so it must be radiating energy, and so its oscillations will gradually decay. (Note that this implies that charges in orbit around one another will gradually radiate off their energy and fall into one another.) One of the outcomes of Maxwell's electromagnetic theory was the demonstration that light was electromagnetic in nature: electromagnetic disturbances propagated at the speed of light, and thinking of light as electromagnetic radiation accounted for a huge array of optical phenomena.

Also, electromagnetism is decidedly a wave theory. The electromagnetic field is continuous and ubiquitous: it doesn't come in discrete "chunks" or "lumps" and it can have any value. It can have arbitrary energy (or energy density, as the case may be). This is opposed to particles, objects like little marbles, with definite extents and centers. When particles move, the stuff they are made of literally goes from one place to another. Whereas, when a wave moves, the field in one place increases, and decreases in another place: the pattern as opposed to the substance moves. Waves display interference effects: two waves could interfere constructively (increasing the size of the wave) or destructively (decreasing the size of the wave), whereas this seems impossible for particles. Destructive interference for particles would mean that when two particles came together, suddenly there was less substance there. We will return to this in discussing the two-slit experiment below.

Non-Classical Phenomena

There were several phenomena that indicated that CM was not the whole story, that it failed to give a full description of the world. These then paved the way for the development of QM.

Millikan's and Rutherford's Experiments
Millikan discovered, by a very ingenious experiment, that charge was quantized, i.e. it came in "chunks" or "lumps". There was a smallest unit of charge. The existence of electrons as objects with a definite mass had already been discovered by Thomson, experimenting with cathode ray tubes, but it was not known whether electrons had a definite, single charge. Millikan found that charge only came in integer multiples of the fundamental charge, known to be about \(1.6 \times 10^{-19} \mathrm{C}\). Rutherford then demonstrated that the atom was structured, not as Thomson supposed, like a plum pudding, but rather with a small, dense, positively charged nucleus with the electrons in some arrangement around it.

Stability and Discrete Radiation of the Atom
Rutherford's model of the atom (as well as any similar model) is impossible, according to classical electromagnetic theory. As discussed above, orbiting charges cannot persist indefinitely, as they will radiate off energy, and the orbit will eventually decay, the particles eventually colliding. As this clearly does not happen, there must be some modification to the understanding of the atom. In addition, it was noticed that an excited atom only emitted radiation at definite frequencies, not in a continuous spectrum. In the case of hydrogen, the radiation frequencies followed a very simple pattern. This behavior, however, could not be accounted for on classical mechanics, as the electron orbiting the nucleus could potentially have any energy. Moreover, if the electron could only have certain definite energies, it became difficult to see how it could go from one definite energy to another without taking on the intermediate energies. Clearly classical theory would have to be modified to allow for this.

Photoelectric Effect
It was observed that shining light on a metal induced a current. This by itself was predictable by CM, given the understanding that the metal had electrons in it, and when light shone on the metal, some electrons absorbed the energy and so were able to escape the metal to produce a current. However, according to CM, the energy of the light depended solely on the amplitude (i.e. brightness): it would not depend on the frequency (i.e. color) of the light used. Also, for sufficiently dim light, there should be a lag time between when the light comes on and when electrons are emitted, due to the electrons needing to absorb a sufficient amount of light energy. However, neither of these predictions was correct: very bright light of sufficiently low frequency induced no current. And at sufficiently high frequencies, regardless of how dim the light was, the current began immediately, with no delay. This led Einstein correctly to conclude that light was quantized, in units called photons. The energy of each photon was related to the frequency of the light. The brighter the light, the greater the number of photons per unit time. This would entail that for light of a low frequency, even if bright, no electrons would be ejected from the metal, as each photon lacks enough energy to eject an electron, and the chance of multiple photons hitting the same electron is negligible (and the energy that is absorbed is dissipated as heat in the meantime). Moreover, for high enough frequencies, the energy per electron is linear with respect to frequency, with slope \(h= 6.626 \times 10^{-34} \mathrm{J}\cdot \mathrm{s}\), known as Planck's constant (the current, however, depends on the brightness of the light). This leads to the conclusion that the energy of each photon is given by \(E=hf\).
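As a rough numerical illustration of the threshold behavior described above, the following sketch computes the photon energy \(E=hf\) and the surplus energy left over for an ejected electron. The work function (the minimum energy needed to free an electron) and its value of 2.3 eV are assumptions introduced for the example; they are not given in the text.

```python
# Planck's constant in J*s (value quoted in the text) and the elementary charge in C.
H = 6.626e-34
E_CHARGE = 1.602e-19

def photon_energy_ev(frequency_hz):
    """Energy of one photon, E = h f, converted from joules to electron-volts."""
    return H * frequency_hz / E_CHARGE

def max_kinetic_energy_ev(frequency_hz, work_function_ev):
    """Surplus energy of an ejected electron, or None below the threshold.

    The work function is an assumed input, not something stated in the text.
    """
    surplus = photon_energy_ev(frequency_hz) - work_function_ev
    return surplus if surplus > 0 else None

WORK_FUNCTION_EV = 2.3  # illustrative value only (an assumption)

red = max_kinetic_energy_ev(4.3e14, WORK_FUNCTION_EV)     # red light
violet = max_kinetic_energy_ev(7.5e14, WORK_FUNCTION_EV)  # violet light
print("red:", red)
print("violet:", violet)
```

Note that brightness appears nowhere in the calculation: on this picture it only sets how many photons arrive per unit time, and hence the size of the current.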

Black Body Radiation
A black body is defined as a perfect radiating source: it absorbs all radiation that falls on it, at a constant temperature. Such a body is known to radiate electromagnetic radiation, but finding and making sense of the spectrum of such a body is non-trivial. According to classical electromagnetic theory, the amount of radiation produced is expected to be proportional to the square of the frequency. That is, the higher the frequency, the more radiation. This is clearly not what happens in nature: otherwise hot objects would emit huge amounts of X-rays and gamma rays, and would instantaneously reach absolute zero, transforming all the thermal energy into electromagnetic radiation, as the total radiation is unbounded. However, Planck found that, by postulating that electromagnetic radiation was quantized as photons, with energies given by \(E=hf\), the total radiation was bounded, and tailed off at higher frequencies. The resulting formula is well borne out by experiments, lending support to his postulation.
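To see the difference numerically, here is a minimal sketch comparing the classical prediction with Planck's formula. The text does not write out either formula; the standard textbook forms of the Rayleigh-Jeans law and Planck's law for spectral radiance are assumed here, along with a temperature of 5000 K chosen only for illustration.

```python
import math

H = 6.626e-34    # Planck's constant, J*s
K_B = 1.381e-23  # Boltzmann's constant, J/K
C = 2.998e8      # speed of light, m/s

def rayleigh_jeans(freq, temp):
    """Classical spectral radiance: grows as the square of the frequency."""
    return 2.0 * freq**2 * K_B * temp / C**2

def planck(freq, temp):
    """Planck's spectral radiance: the exponential suppresses high frequencies."""
    return (2.0 * H * freq**3 / C**2) / math.expm1(H * freq / (K_B * temp))

T = 5000.0  # illustrative temperature in kelvin (an assumption)
for f in (1e11, 1e13, 1e15, 1e16):
    print(f"{f:.0e} Hz  classical={rayleigh_jeans(f, T):.3e}  planck={planck(f, T):.3e}")
```

At low frequencies the two agree closely, while at high frequencies the classical value keeps growing and Planck's value falls away, which is the bounded behavior described above.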

Double Slit Experiment
An experiment was performed in which a very dim coherent light source was placed in front of a photographic plate, behind an opaque plate with two narrow slits. The light source was so dim that it emitted no more than one photon at a time. What was found was very strange, according to classical mechanics. The photographic plate produced a pattern of spots where each photon hit it, indicating that the light had been behaving like particles. However the pattern produced is what the classical wave theory predicted: an interference pattern. Had the photons been acting like genuine classical particles, a different pattern would have emerged, one with only two peaks as opposed to many. Classical theory had no way to account for this. In addition, whenever any sort of measuring apparatus was put in place to detect which slit the photon passed through (if it was behaving like a classical particle, it would need to have a definite position and hence pass through a definite slit), the wave-pattern disappeared and a particle-pattern emerged. Classical physics has no way to explain this. Moreover, the experiment has these same features, even when performed with electrons, atoms and even molecules. In each case, the interference pattern produced is consistent with thinking of each object as if it were a wave with wavelength \(\lambda=h/p\), where p is the momentum of the object. More generally, \(\mathbf{p}=\frac{h}{2\pi}\mathbf{k}\), where \(\mathbf{k}\) is the wave vector (a sort of generalized, multidimensional wavelength). In fact, the quantity \(\frac{h}{2\pi}\) comes up so frequently that it is given its own symbol: \(\hbar\).

Stern-Gerlach Experiment
It was noticed that when a stream of certain atoms passed through an inhomogeneous magnetic field, the stream separated into several beams, two in the case of silver atoms. This demonstrated not only that the atoms had a magnetic dipole moment, but also that this moment was quantized, as otherwise it would have produced a smear, as opposed to several beams. The magnetic moment was correctly attributed to the charged particles in the atom, in particular the electrons. This implied that the electron had angular momentum. In classical mechanics, an object has angular momentum purely in terms of its structure and rotation. For example a wheel has angular momentum given its distribution of mass combined with its rotation. A point particle in classical mechanics cannot have angular momentum. Thus, as the electron was not known to have any internal structure, nor any literal rotation, the angular momentum could not be accounted for by classical physics. The angular momentum was thus given the name spin. An electron always has a measured angular momentum of either \(+\hbar/2\) (called spin up) or \(-\hbar/2\) (called spin down), relative to the axis of measurement. This itself is non-classical: classically, if an object has angular momentum about a certain axis, its angular momentum about an orthogonal axis will be zero, but electrons are never measured to have zero spin.

Apparent Indeterminacy
Suppose we have an electron with measured spin up along the x-axis. If it is measured along the y-axis, it will be found to have either spin up or spin down along that axis. Moreover, the spin measured along that axis will appear to be perfectly random: the results of such an experiment pass every known test for statistical randomness. This feature arises often in similar cases. For instance, in the two-slit experiment, where the next photon (or electron) hits the screen is also apparently random. A half-silvered mirror is a common device in optics, which transmits half the light shone on it and reflects the other half. However, if we put two detectors at points where transmitted and reflected light would go, and shine very dim light on it, such that no more than one photon is reaching the half-silvered mirror at a time, the pattern of detectors registering will also be apparently random. The pattern of detection passes every known test for statistical randomness. This type of behavior is very different from the usual CM sort. This apparent indeterminacy or randomness is a major aspect of quantum mechanics, and underlies many of the disputes and misunderstandings surrounding it.

Product Formula for Sine and Some Interesting Corollaries

2015-10-20T20:57:00.001-07:00

Deriving the Product Formula: The Easy Way

Recall from this post that: \[ \sum_{n=1}^{\infty} \frac{1}{x^2+n^2}=\frac{\pi}{2x} \coth(\pi x)-\frac{1}{2x^2} \] We then substitute \(x=i z\): \[ \sum_{n=1}^{\infty} \frac{1}{n^2-z^2}=-\frac{\pi}{2z} \cot(\pi z)+\frac{1}{2z^2} \] We then go down the following line of calculation: \[ \sum_{n=1}^{\infty} \frac{2z}{n^2-z^2}=\frac{1}{z}-\pi\cot(\pi z) \] \[ \int\sum_{n=1}^{\infty} \frac{2z}{n^2-z^2}dz=C+\int \frac{1}{z}-\pi\cot(\pi z) dz \] \[ \sum_{n=1}^{\infty} -\ln \left (1-\frac{z^2}{n^2} \right )=C+\ln (z) - \ln (\sin (\pi z) ) \] \[ \sin(\pi z)=C' z\prod_{n=1}^{\infty}\left ( 1-\frac{z^2}{n^2} \right ) \] We can find \(C'\) by looking at the behavior near zero, and so find that: \[ \sin(\pi z)=\pi z\prod_{n=1}^{\infty}\left ( 1-\frac{z^2}{n^2} \right ) \] Therefore: \[ \sin(z)=z\prod_{n=1}^{\infty}\left ( 1-\frac{z^2}{\pi^2 n^2} \right ) \]
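The product formula can be sanity-checked numerically. The sketch below truncates the infinite product after a finite number of factors (the function name and the cutoff are choices made for this example) and compares the result with the library sine.

```python
import math

def sine_product(x, terms):
    """Partial product x * prod_{n=1}^{terms} (1 - x^2 / (pi^2 n^2))."""
    result = x
    for n in range(1, terms + 1):
        result *= 1.0 - x * x / (math.pi**2 * n * n)
    return result

for x in (0.5, 1.0, 2.0):
    print(x, sine_product(x, 100_000), math.sin(x))
```

Convergence is slow: the neglected factors differ from 1 by roughly \(x^2/(\pi^2 N)\) after \(N\) terms, so many factors are needed for each additional digit.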

Deriving the Product Formula: The Overkill Way, by Weierstrass' Factorization Theorem

Suppose a function can be expressed as \[ f(x)=A\frac{\prod_{n=1}^{M}\left ( x-z_n \right )}{\prod_{n=1}^{N}\left ( x-p_n \right )} \] Where \(M \leq N\) and \(N\) can be arbitrarily large, even tending to infinity. Assuming there are no poles of degree >1 (all poles are simple), we can rewrite this as \[ f(x)=K+\sum_{n=1}^{\infty} \frac{b_n}{x-p_n} \] Where some of the \(b_n\) may be zero. We can also write this as \[ f(x)=f(0)+\sum_{n=1}^{\infty} b_n \cdot \left ( \frac{1}{x-p_n}+\frac{1}{p_n} \right ) \] Suppose \(f(0) \neq 0\), and that \(f\) is an integral function (i.e. an entire function). In that case, the logarithmic derivative \(f'(x)/f(x)\) has poles of degree 1. Moreover, \[\lim_{x \rightarrow z_n} (x-z_n)\frac{f'(x)}{f(x)}=d_n \] Where \(d_n\) is the degree of the zero at \(z_n\). Thus: \[ \frac{f'(x)}{f(x)}=\frac{f'(0)}{f(0)}+\sum_{n=1}^{\infty} d_n \cdot \left ( \frac{1}{x-z_n}+\frac{1}{z_n} \right ) \] Integrating: \[ \ln(f(x))=\ln(f(0))+x \frac{f'(0)}{f(0)}+\sum_{n=1}^{\infty} d_n \cdot \left ( \ln \left (1-\frac{x}{z_n} \right ) +\frac{x}{z_n} \right ) \] \[ f(x)=f(0) e^{x \frac{f'(0)}{f(0)}} \prod_{n=1}^{\infty} \left (1-\frac{x}{z_n} \right )^{d_n} e^{x\frac{d_n}{z_n}} \] This is our main result, called the Weierstrass factorization theorem. In particular, for the function \(f(x)=\sin(x)/x\) \[ \frac{\sin(x)}{x}=\prod_{n=-\infty, n \neq 0}^{\infty} \left (1-\frac{x}{n \pi} \right ) e^{x\frac{1}{n \pi}}=\prod_{n=1}^{\infty} \left (1-\frac{x^2}{n^2 \pi^2} \right ) \] Thus \[ \sin(x)=x\prod_{n=1}^{\infty} \left (1-\frac{x^2}{\pi^2 n^2 } \right ) \]

Corollary 1: Wallis Product

Let us plug in \(x=\pi/2\): \[ \sin(\pi/2)=1=\frac{\pi}{2}\prod_{n=1}^{\infty} \left (1-\frac{1}{4 n^2 } \right ) \] \[ \pi=2\prod_{n=1}^{\infty} \left (\frac{4 n^2}{4 n^2-1 } \right )=2\frac{2 \cdot 2}{1 \cdot 3} \cdot \frac{4 \cdot 4}{3 \cdot 5} \cdot \frac{6 \cdot 6}{5 \cdot 7} \cdot \frac{8 \cdot 8}{7 \cdot 9} \cdots \] More generally: \[ \pi=\frac{N}{M} \sin(\pi M/N) \prod_{n=1}^{\infty} \left (\frac{N^2 n^2}{N^2 n^2 -M^2} \right ) \] This is useful when \(\sin(\pi M/N)\) is easily computable, such as when \(\sin(\pi M/N)\) is algebraic (e.g. \(M=1\), \(N=2^m\) ). For example: \[ \pi=2 \sqrt{2} \prod_{n=1}^{\infty} \left (\frac{4^2 n^2}{4^2 n^2 -1^2} \right ) \] \[ \pi=\frac{2}{3} \sqrt{2} \prod_{n=1}^{\infty} \left (\frac{4^2 n^2}{4^2 n^2 -3^2} \right ) \] \[ \pi=\frac{3}{2} \sqrt{3} \prod_{n=1}^{\infty} \left (\frac{3^2 n^2}{3^2 n^2 -1^2} \right ) \] \[ \pi=\frac{3}{4} \sqrt{3} \prod_{n=1}^{\infty} \left (\frac{3^2 n^2}{3^2 n^2 -2^2} \right ) \] \[ \pi=3 \prod_{n=1}^{\infty} \left (\frac{6^2 n^2}{6^2 n^2 -1^2} \right ) \] \[ \pi=\frac{3}{5} \prod_{n=1}^{\infty} \left (\frac{6^2 n^2}{6^2 n^2 -5^2} \right ) \] \[ \pi=3\sqrt{2}(-1+\sqrt{3}) \prod_{n=1}^{\infty} \left (\frac{12^2 n^2}{12^2 n^2 -1^2} \right ) \]
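A short numerical sketch of these products follows. To avoid using \(\pi\) in order to compute \(\pi\), the prefactor \(\frac{N}{M}\sin(\pi M/N)\) is passed in explicitly as its algebraic value; the function name and the number of terms are choices made for this example.

```python
import math

def pi_estimate(prefactor, M, N, terms):
    """Partial product prefactor * prod_{n=1}^{terms} N^2 n^2 / (N^2 n^2 - M^2)."""
    product = prefactor
    for n in range(1, terms + 1):
        q = (N * n) ** 2
        product *= q / (q - M * M)
    return product

wallis = pi_estimate(2.0, 1, 2, 10_000)            # M=1, N=2: the Wallis product
quarter = pi_estimate(2.0 * math.sqrt(2.0), 1, 4, 10_000)  # M=1, N=4
sixth = pi_estimate(3.0, 1, 6, 10_000)             # M=1, N=6
print(wallis, quarter, sixth, math.pi)
```

For a fixed number of terms, the products with smaller \(M/N\) land closer to \(\pi\), since each factor is then closer to 1.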

Corollary 2: Product Formula for Cosine

Let us evaluate the sine formula at \(x+\pi/2\): \[ \sin(x+\pi/2)=\cos(x)=\left (x+\frac{\pi}{2} \right )\prod_{n=-\infty, n \neq 0}^{\infty} \left (1-\frac{x+\pi/2}{\pi n } \right ) \] \[ \cos(x)=\frac{\sin(x+\pi/2)}{\sin(\pi/2)}=\left (1+\frac{x}{\pi/2} \right )\prod_{n=-\infty, n \neq 0}^{\infty} \frac{\left (1-\frac{x+\pi/2}{\pi n } \right )}{\left (1-\frac{\pi/2}{\pi n } \right )} \] \[ \cos(x)=\left (1+\frac{x}{\pi/2} \right )\prod_{n=-\infty, n \neq 0}^{\infty} \left (1-\frac{x}{\pi (n-1/2) } \right )=\prod_{n=-\infty}^{\infty} \left (1-\frac{x}{\pi (n-1/2) } \right ) \] \[ \cos(x)=\prod_{n=1}^{\infty} \left (1-\frac{x^2}{\pi^2 (n-1/2)^2 } \right ) \] Alternatively, we can derive this directly from the Weierstrass factorization theorem.
Additionally, by using imaginary arguments, we can derive the formulae: \[ \sinh(x)=x\prod_{n=1}^{\infty} \left (1+\frac{x^2}{\pi^2 n^2 } \right ) \] \[ \cosh(x)=\prod_{n=1}^{\infty} \left (1+\frac{x^2}{\pi^2 (n-1/2)^2 } \right ) \]

Corollary 3: Sine is Periodic

Let us evaluate the sine formula at \(x+\pi\): \[ \sin(x+\pi)=\left (x+\pi \right )\prod_{n=-\infty, n \neq 0}^{\infty} \left (1-\frac{x+\pi}{\pi n } \right ) \] \[ \sin(x+\pi)=\cdots \left (1+\frac{x+\pi}{3\pi} \right ) \left (1+\frac{x+\pi}{2\pi} \right )\left (1+\frac{x+\pi}{\pi} \right )\left (x+\pi \right ) \left (1-\frac{x+\pi}{\pi} \right )\left (1-\frac{x+\pi}{2\pi} \right ) \left (1-\frac{x+\pi}{3\pi} \right ) \cdots \] \[ \sin(x+\pi)=\cdots \left (\frac{4}{3}+\frac{x}{3\pi} \right ) \left (\frac{3}{2}+\frac{x}{2\pi} \right )\left (2+\frac{x}{\pi} \right ) \pi \left (1+\frac{x}{\pi}\right ) \left (\frac{-x}{\pi} \right )\left (\frac{1}{2}-\frac{x}{2\pi} \right ) \left (\frac{2}{3}-\frac{x}{3\pi} \right ) \cdots \] \[ \sin(x+\pi)=\cdots \frac{4}{3}\left (1+\frac{x}{4\pi} \right ) \frac{3}{2}\left (1+\frac{x}{3\pi} \right )2\left (1+\frac{x}{2\pi} \right ) \pi \left (1+\frac{x}{\pi}\right ) \left (\frac{-x}{\pi} \right ) \frac{1}{2}\left (1-\frac{x}{\pi} \right ) \frac{2}{3}\left (1-\frac{x}{2\pi} \right ) \cdots \] \[ \sin(x+\pi)=-2x\left ( \prod_{k=2}^{\infty} \frac{k^2-1}{k^2} \right ) \left ( \prod_{n=1}^{\infty} \left (1-\frac{x^2}{n^2 \pi^2} \right ) \right )=-\sin(x) \] As the first product easily telescopes. Thus \(\sin(x+2\pi)=\sin((x+\pi)+\pi)=-\sin(x+\pi)=\sin(x)\). Therefore, sine is periodic with period \(2\pi\).

Corollary 4: Some Zeta Values

Let us begin expanding the product for sine in a power series \[ \sin(x)=x\prod_{n=1}^{\infty} \left (1-\frac{x^2}{\pi^2 n^2 } \right )=x-\frac{x^3}{\pi^2}\left (\frac{1}{1^2}+\frac{1}{2^2}+\cdots \right )+\frac{x^5}{\pi^4}\left (\frac{1}{1^2 \cdot2^2}+\frac{1}{1^2 \cdot3^2}+\cdots \frac{1}{2^2 \cdot3^2}+\frac{1}{2^2 \cdot4^2}+\cdots \right )+\cdots \] \[ \sin(x)=x-\frac{x^3}{\pi^2}\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )+\frac{x^5}{\pi^4}\left (\sum_{m=1,n=1, m < n}^{\infty}\frac{1}{m^2n^2} \right )+\cdots \] \[ \sin(x)=x-\frac{x^3}{\pi^2}\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )+\frac{x^5}{2\pi^4}\left (\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )^2- \sum_{k=1}^{\infty}\frac{1}{k^4} \right )+\cdots \] By comparing this to the Taylor series for sine, we find: \[ \frac{1}{3!}=\frac{1}{\pi^2}\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right ) \] \[ \frac{1}{5!}=\frac{1}{2\pi^4}\left (\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )^2- \sum_{k=1}^{\infty}\frac{1}{k^4} \right ) \] From which it follows that \[ \sum_{k=1}^{\infty}\frac{1}{k^2}=\frac{\pi^2}{6} \] \[ \sum_{k=1}^{\infty}\frac{1}{k^4}=\frac{\pi^4}{90} \] In fact, for the fourth term, we find, similarly, that \[ \frac{1}{7!}=\frac{1}{6\pi^6}\left ( \left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )^3-3\left (\sum_{k=1}^{\infty}\frac{1}{k^2} \right )\left (\sum_{k=1}^{\infty}\frac{1}{k^4} \right )+2\left (\sum_{k=1}^{\infty}\frac{1}{k^6} \right ) \right ) \] From which it follows that \[ \sum_{k=1}^{\infty}\frac{1}{k^6}=\frac{\pi^6}{945} \]

Derivation of a Formula for the Even Values of the Riemann Zeta Function

2015-10-10T15:01:00.000-07:00

Lemma 1: Fourier Series of the Dirac Comb

A Dirac comb of period T is defined as \[{\mathrm{III}}_T(x)=\sum_{k=-\infty}^{\infty} \delta(x-kT)\] Where \(\delta(x)\) is the Dirac delta function. Since the Dirac comb is periodic with period T, we can expand it as a Fourier series: \[\sum_{k=-\infty}^{\infty} \delta(x-kT)=\sum_{n=-\infty}^{\infty} A_n e^{i 2 \pi n x/T}\] We solve for the \(A_m\) in the usual way: \[ \int_{-T/2}^{T/2}\sum_{k=-\infty}^{\infty} \delta(x-kT)e^{-i 2 \pi m x/T} dx=1=\int_{-T/2}^{T/2}\sum_{n=-\infty}^{\infty} A_n e^{i 2 \pi (n-m) x/T} dx=T\cdot A_m \]\[ A_m=1/T \] Thus: \[\sum_{k=-\infty}^{\infty} \delta(x-kT)=\frac{1}{T}\sum_{n=-\infty}^{\infty} e^{i 2 \pi n x/T}\]

Lemma 2: An Infinite Series

\[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=\frac{1}{x}+\sum_{n=1}^{\infty} \left ( \frac{1}{x+i n}+\frac{1}{x-i n} \right )=\frac{1}{x}+2x\sum_{n=1}^{\infty} \frac{1}{x^2+n^2} \]\[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=\int_{0}^{\infty} \sum_{n=-\infty}^{\infty} e^{-y(x+i n)} dy \]\[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=\int_{0}^{\infty} e^{-yx} \sum_{n=-\infty}^{\infty} e^{-iyn} dy \]\[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=2\pi \int_{0}^{\infty} e^{-yx} \sum_{k=-\infty}^{\infty} \delta(y-2\pi k) dy \]\[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=2\pi \left (\frac{1}{2}+ \sum_{k=1}^{\infty} e^{-2\pi k x} \right ) \]\[ \sum_{n=-\infty}^{\infty} \frac{1}{x+i n}=2\pi \left (\frac{1}{2}+ \frac{e^{-2\pi x}}{1-e^{-2\pi x}} \right )= \pi \frac{e^{2\pi x}+1}{e^{2\pi x}-1} \] Here the fourth line uses Lemma 1 with \(T=2\pi\), and the factor of \(\frac{1}{2}\) in the fifth line arises because the delta function at \(y=0\) sits on the endpoint of the integration range. Therefore, combining the first and last expressions and rearranging, we find: \[ \sum_{n=1}^{\infty} \frac{1}{x^2+n^2}=\frac{\pi}{2x} \frac{e^{2\pi x}+1}{e^{2\pi x}-1}-\frac{1}{2x^2}=\frac{\pi}{2x} \coth(\pi x)-\frac{1}{2x^2} \] Additionally, by taking the limit as x approaches zero, we find: \[ \sum_{n=1}^{\infty} \frac{1}{n^2}=\frac{\pi^2}{6} \]
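Since the derivation above handles delta functions rather freely, a numerical check of the final result is reassuring. The sketch below compares a truncated sum with the closed form; the function names and the truncation point are choices made for this example.

```python
import math

def partial_sum(x, terms):
    """Truncated series sum_{n=1}^{terms} 1 / (x^2 + n^2)."""
    return sum(1.0 / (x * x + n * n) for n in range(1, terms + 1))

def closed_form(x):
    """pi/(2x) * coth(pi x) - 1/(2 x^2), using coth = 1/tanh."""
    return math.pi / (2.0 * x) / math.tanh(math.pi * x) - 1.0 / (2.0 * x * x)

for x in (0.5, 1.0, 3.0):
    print(x, partial_sum(x, 200_000), closed_form(x))

# The small-x limit should approach pi^2 / 6.
print(closed_form(1e-3), math.pi**2 / 6.0)
```

The truncated sum falls short of the closed form by about \(1/N\) after \(N\) terms, which is the size of the neglected tail.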

Theorem: Formula for the Even Values of the Riemann Zeta Function

Recall that, by definition: \[ \zeta(n)=\sum_{k=1}^{\infty}\frac{1}{k^n} \] Let us then analyze \[ f(x)=1-\frac{x}{2}+\sum_{n=2}^{\infty}\frac{x^{n}}{n!} A_{n} \] Where \[ A_n=-2 \cdot n! \cdot \cos(n\pi/2) \cdot 2^{-n}\pi^{-n} \zeta(n) \] Thus: \[ f(x)=1-\frac{x}{2}-2\sum_{n=1}^{\infty}\left (\frac{-x^2}{4\pi^2} \right )^n \zeta(2n) \]\[ f(x)=1-\frac{x}{2}-2\sum_{n=1}^{\infty}\left (\frac{-x^2}{4\pi^2} \right )^n \sum_{k=1}^{\infty}\frac{1}{k^{2n}} \]\[ f(x)=1-\frac{x}{2}-2\sum_{k=1}^{\infty}\sum_{n=1}^{\infty}\left (\frac{-x^2}{4\pi^2 k^2} \right )^n \]\[ f(x)=1-\frac{x}{2}-2\sum_{k=1}^{\infty} \frac{-x^2}{4\pi^2 k^2}\frac{1}{1+\frac{x^2}{4\pi^2 k^2}} \]\[ f(x)=1-\frac{x}{2}+\frac{x^2}{2\pi^2}\sum_{k=1}^{\infty} \frac{1}{k^2+\frac{x^2}{4\pi^2}} \]\[ f(x)=1-\frac{x}{2}+\frac{x^2}{2\pi^2} \left ( \frac{\pi^2}{x} \frac{e^x+1}{e^x-1} -\frac{2\pi^2}{x^2} \right ) \]\[ f(x)=\frac{x}{2} \left ( \frac{e^x+1}{e^x-1} -1 \right )=\frac{x}{e^x-1} \] Therefore, for n>1, \[ A_n=\lim_{x \rightarrow 0} \frac{\mathrm{d}^n }{\mathrm{d} x^n} \frac{x}{e^x-1} \] These numbers are called the Bernoulli Numbers, symbolized as \(B_n\) and they are easily found to be all rational. Thus, by rearranging, we find: \[ \zeta(2n)=\frac{\pi^{2n} 2^{2n-1} \left | B_{2n} \right |} {(2n)!} \] Thus, all the even values of the zeta function can be found by finding the appropriate Bernoulli number, which itself can be found by simple differentiation. Moreover, we see that all the values are rational multiples of the corresponding power of pi. Specifically, we find that: \[ \zeta(2)=\frac{\pi^2}{6} \]\[ \zeta(4)=\frac{\pi^4}{90} \]\[ \zeta(6)=\frac{\pi^6}{945} \]\[ \zeta(8)=\frac{\pi^8}{9450} \]\[ \zeta(10)=\frac{\pi^{10}}{93555} \]
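The rational coefficients listed above can be reproduced exactly. The sketch below does not differentiate \(x/(e^x-1)\) as in the text; as a substitute it uses the standard recurrence \(\sum_{k=0}^{m}\binom{m+1}{k}B_k=0\) (for \(m \geq 1\), with \(B_0=1\)), which yields the same Bernoulli numbers. The function names are chosen for this example.

```python
from fractions import Fraction
from math import comb, factorial

def bernoulli_numbers(count):
    """Return [B_0, ..., B_count] for the generating function x / (e^x - 1)."""
    B = [Fraction(1)]
    for m in range(1, count + 1):
        # Recurrence: sum_{k=0}^{m} C(m+1, k) B_k = 0, solved for B_m.
        acc = sum(comb(m + 1, k) * B[k] for k in range(m))
        B.append(-acc / (m + 1))
    return B

def zeta_even_coefficient(n, B):
    """Rational r such that zeta(2n) = r * pi^(2n)."""
    return Fraction(2 ** (2 * n - 1)) * abs(B[2 * n]) / factorial(2 * n)

B = bernoulli_numbers(10)
coefficients = [zeta_even_coefficient(n, B) for n in range(1, 6)]
for n, r in enumerate(coefficients, start=1):
    print(f"zeta({2 * n}) = pi^{2 * n} * {r}")
```

Working in exact fractions makes the rationality of the coefficients visible directly, with no floating-point rounding involved.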

Liars, Logic, and Information Theory

2015-09-29T18:17:00.001-07:00

One of the most common types of logic puzzles involves two tribes, one that always tells the truth and another that always tells lies. There are many versions and variations of puzzles with this setup, but we can develop a method of approach that will work generally. The puzzles fall into two main categories:

Identification: we have a group of N people with some known possible set of identifications, and we ask questions to determine what tribe each is from.
Information: we have a group of N people with some known possible set of identifications, and we ask them questions to determine M bits of information (independent yes/no questions). We do not need to identify the tribe of each person. For concreteness, we will take the bits to be 1s or 0s (i.e. we want to find whether the bit is 1 or 0).

The questions must be asked individually, and must be yes/no questions. We assume that the persons asked know all information relevant to the puzzle and understand the questions, supposing they are comprehensible.

A Brief Primer in some Information concepts

The fundamental unit of information is the bit. A single bit answers one yes/no question. If both answers are equally likely, the answer gives the most information, as otherwise you could guess the answer more easily. (In fact, the formula for the effective number of bits, if the chance of a "yes" answer is p, is given by: \(-p\log_2(p)-(1-p)\log_2(1-p) \approx 4p(1-p)\)). If there are \(2^N\) equally possible options, it takes N bits of information to narrow it down to one: in general, an additional bit halves the possibility space. If there are M possibilities, and \(2^{N-1}< M \leq 2^N\), then N bits of information are required. From a deterministic source--that is, a source with known, predictable behavior--one answer to one yes/no question yields at most 1 bit of information, and exactly one if both answers are equally probable. In general, if M bits of information can be discovered with N questions, then a smaller number of bits can be discovered with fewer questions.
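These two quantities are easy to compute. The sketch below evaluates the effective number of bits for a given probability p, alongside the rough approximation \(4p(1-p)\) quoted above, and finds the number of bits N needed for M possibilities; the function names are chosen for this example.

```python
import math

def binary_entropy(p):
    """Effective bits from a yes/no answer whose chance of 'yes' is p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain answer carries no information
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bits_needed(possibilities):
    """Smallest N such that possibilities <= 2**N."""
    return (possibilities - 1).bit_length()

for p in (0.5, 0.25, 0.1):
    print(p, binary_entropy(p), 4.0 * p * (1.0 - p))

for m in (2, 6, 8, 9):
    print(m, "possibilities need", bits_needed(m), "bits")
```

The approximation is exact at p = 1/2 and at the endpoints, and underestimates the entropy in between.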

We will discuss some specific cases, describing some general methods of approaching the problem. We will forgo trivial cases, like asking a 1-bit question to someone of a known tribe, or identifying a person from an unknown tribe.

Information: One Person of Unknown Tribe, One Bit

Clearly we must ask at least one question, but can we determine it in exactly one question? Indeed we can. Our goal is to formulate a question such that, regardless of whether the person is a liar or a truther, the answer will correspond to the truth. We thus construct the following table, and look for a question that would produce the listed "honest answers" (the true answers to the question, which a truther reports as they are and a liar reverses).

Bit Value	Identity	Given Answer	Honest Answer
1	Truther	Yes	Yes
1	Liar	Yes	No
0	Truther	No	No
0	Liar	No	Yes

The simplest way to construct such a question is just to ask one that corresponds to affirmative answers. In this case, the most easily constructed question is

Is one of the following true: the bit is 1 and you are a truther, the bit is 0 and you are a liar?

Regardless of whether the person asked is a truther or liar, the answer will always be "yes" if the bit is 1 and "no" if it is 0. The question may be found to simplify to something more natural sounding, but the question as given is sufficient. Moreover, if we require N bits of information, we can achieve such in exactly N questions. This will be our general approach. We will make a table in which the given answer corresponds to the information we seek. We will then formulate a question such as to produce the desired answer. This can be done most easily by forming a disjunction of the answers producing an affirmative.
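The table-based method can be checked by brute force. The sketch below enumerates every combination of bit value and tribe, evaluates the question given above, and records what the person would actually say; the function names are chosen for this example.

```python
from itertools import product

def spoken_answer(is_truther, statement_is_true):
    """A truther reports the truth value; a liar reports its negation."""
    return statement_is_true if is_truther else not statement_is_true

def question(bit, is_truther):
    """Truth value of: the bit is 1 and you are a truther, or the bit is 0 and you are a liar."""
    return (bit == 1 and is_truther) or (bit == 0 and not is_truther)

# Map each (bit, tribe) case to the answer actually given (True means "yes").
results = {
    (bit, is_truther): spoken_answer(is_truther, question(bit, is_truther))
    for bit, is_truther in product((0, 1), (True, False))
}

for (bit, is_truther), says_yes in results.items():
    tribe = "truther" if is_truther else "liar"
    print(bit, tribe, "yes" if says_yes else "no")
```

In every case the spoken answer is "yes" exactly when the bit is 1, which is the property the table was built to guarantee.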

Identification: One Person of Unknown Tribe, Unknown Language

In this case, the tribespeople have a language different than yours. They can understand your questions but reply in a way you can't understand. We will assume that you know the words for "yes" and "no" are "da" and "ja", but you don't know which corresponds to which. If you do not even know what the possible words for "yes" and "no" are, you can find this out with one additional question, merely by asking anything and then knowing that the response either means "yes" or "no". The question is then whether you can identify the tribe of the person, and in as few questions as possible. Given that we only seek one bit of information (the person is either a truther or a liar), we will attempt to do so with a single question. We will look for a question such that the response corresponds to the identity of the person. For concreteness, we will take "Da" to be indicative of a truther, "Ja" of a liar.

Identity	Translation of "Da"	Given Answer	Translated Answer	Honest Answer
Truther	Yes	Da	Yes	Yes
Truther	No	Da	No	No
Liar	Yes	Ja	No	Yes
Liar	No	Ja	Yes	No

Again, the simplest way to construct such a question is just to ask one that corresponds to affirmative answers, by a simple disjunction. In this case, the honest answer is "yes" exactly when "Da" means "yes". So we simply ask:

Does "Da" mean "yes"?

A truther will always answer "Da", and a liar will always answer "Ja". Note that we cannot determine what "Da" actually means from this question, and this accords with information theory concepts. We can only get one bit of information from one question. If we wanted to identify what "Da" meant without knowing the identity, by a similar method we would find that the following question achieves that:

Are you a truther?

Both a truther and a liar will answer this with their word for "yes", so whichever word is given in reply means "yes": if the answer is "Da", "Da" means "yes".

Information: One Person of Unknown Tribe, Unknown Language, One Bit

This case is much like the preceding one, except we require neither the meaning of "Da" nor the identity of the person. As we need only one bit of information, we require at least one question. We will show how to do it in exactly one question. As before, we construct a table, but this time with three independent variables: the value of the bit, the identity of the person, and the meaning of "Da".

Bit Value	Identity	Translation of "Da"	Given Answer	Translated Answer	Honest Answer
1	Truther	Yes	Da	Yes	Yes
1	Truther	No	Da	No	No
1	Liar	Yes	Da	Yes	No
1	Liar	No	Da	No	Yes
0	Truther	Yes	Ja	No	No
0	Truther	No	Ja	Yes	Yes
0	Liar	Yes	Ja	No	Yes
0	Liar	No	Ja	Yes	No

By the same method, the easiest (though not simplest) question to ask is:

Is one of the following true: you are a truther and "Da" means "yes" and the bit is 1, you are a liar and "Da" means "no" and the bit is 1, you are a truther and "Da" means "no" and the bit is 0, you are a liar and "Da" means "yes" and the bit is 0 ?

A simpler way would be to ask:

Is an odd number of the following true: the bit is 1, you are a truther, "Da" means "yes"?
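The parity question can be verified against all eight rows of the table by brute force. The sketch below models the reply as a word in the unknown language; the function names are chosen for this example.

```python
from itertools import product

def spoken_word(is_truther, da_means_yes, statement_is_true):
    """Word uttered in reply to a question whose true answer is statement_is_true."""
    says_yes = statement_is_true if is_truther else not statement_is_true
    # Translate the intended yes/no into the tribe's language.
    return "Da" if says_yes == da_means_yes else "Ja"

def parity_question(bit, is_truther, da_means_yes):
    """True when an odd number of: bit is 1, person is a truther, 'Da' means 'yes'."""
    return (int(bit == 1) + int(is_truther) + int(da_means_yes)) % 2 == 1

# Map each (bit, tribe, meaning of "Da") case to the word actually spoken.
words = {
    case: spoken_word(case[1], case[2], parity_question(*case))
    for case in product((0, 1), (True, False), (True, False))
}

for (bit, is_truther, da_means_yes), word in words.items():
    print(bit, "truther" if is_truther else "liar", "Da=yes" if da_means_yes else "Da=no", word)
```

The reply is "Da" in all four cases where the bit is 1 and "Ja" in all four where it is 0, matching the "Given Answer" column of the table.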

In general, we can see that we can always get exactly one bit of information from one question, given certain other constraints. Not knowing the language or the identity of the person asked is no hindrance to getting information. Also, if we have M people from potentially different tribes (and so \(2^M\) possible identifications) who speak the same unknown language, or even if we only know the potential words for "yes" and "no" for one of their languages, we can still identify all of them in exactly M questions just by asking the one person M questions.

Identification: Truther, Liar, and Unhelpful in Unknown Order.

In this case, we have three people known to be some permutation of truther, liar and a third kind we call unhelpful. The unhelpful is a third type of tribesperson who answers so as to be maximally unhelpful. That is, he will answer so as to prevent you from getting information. The goal is to identify him regardless, as well as the other two. The first question is whether we can identify the three, and then, if it is possible, to do so in as few questions as we can. As there are 6 possible orderings, we will need 3 bits of information, corresponding to at least 3 questions. We must ask each person at least one question, as only asking 2 or fewer risks only asking the unhelpful, who provides no information. However, if we ask each of them one question, we only get two bits of information, as the unhelpful provides none. Thus we must ask at least 4 questions, with the 4th question being asked of one of the non-unhelpfuls.

In fact there is a way to do this. We ask a question which the truther and the liar will answer differently. We then take the odd one out among the three, who is guaranteed to be either a truther or a liar (in fact, the way he answers will decide which) and then ask him for one more bit of information to identify one of the others, which we have already described how to do. So, for instance, we can ask all three "Do you exist?" (or, if the language is unknown "Does 'Da' mean 'yes'?"). And then concoct a question to ask the odd one out to get the final requisite bit (left as an exercise for the reader). Thus we can achieve it in exactly 4 questions. In fact, for the first three questions, we only get 2/3 of a bit of information per answer, as, for each answer, we get 1 bit with 2/3 probability.

Preliminary Matters Relating to Morality

2015-07-16T23:23:00.001-07:00

Obligations and Duties

We wish to characterize obligations and duties (taken as essentially synonymous) in a more definite way than is typically used, specifically, as pertains to morality, as typically conceived. Many uses of the term have no relation to morality whatsoever. For instance, a legal obligation is merely something demanded of someone by the governing laws which, should he fail to fulfill it, would result in some sort of penalty. If the penalty were absent, the so-called obligation would be rendered irrelevant, as it would be merely up to the disposition of the one obligated whether to fulfill it or not, and no enforcement could be possible. Thus, legal obligations are no more than demands with enforced consequences: it is demanded of the person to do something, and, failing to do so, punishment will result. Another sort is a social or societal obligation. In this case, there is a certain expectation to behave in a certain way, and failing to behave results in some loss of social esteem, stigmatization, shunning, demotion, reduced access to social assets (like favors or company), etc.

However, clearly moral obligations are not of either of these sorts: with a moral obligation, even if no punishment or repercussion would be visited from without, there would still be the internal drive to act. Moreover, even if it were demanded of us by law to act immorally, or our society expected us to do so, that would have no moral bearing on whether we should so act. The missing ingredient, then, if an obligation or duty is to be different from a mere demand or expectation, with or without penalties for transgression, is the drive from within: there is no duty without a sense of dutifulness. If one feels no obligation to do a thing, then one simply has no such obligation.

"[D]uty has no hold on [a man] unless he desires to be dutiful."
-B. Russell

Truth and Objectivity

The simplest way to analyze objective truth is to begin by looking at statements already agreed to be objectively true: (A) "If X is a triangle, X has three sides", (B) "Horses exist". How is it that these statements are objectively true? Surely it is that, when we interpret them correctly, we get a claim about the world that accurately describes it. The truth value of the propositions will depend on how we interpret the terms. For instance, if we interpret the term "triangle" (merely a word: a set of symbols) to mean what we normally mean by the word "square", then (A) would be false. It is only when the semantic content of the terms is specified (as well as the way in which the content of the sentence is to be educed from the terms, e.g. grammar) that the sentence or proposition can have an objective truth value. When the terms are left unspecified, or determined on a subject by subject basis, then the proposition is subjective. Thus, all that is needed to make a system as objective as, say, geometry, is to have the terms well-defined, be it a moral system or any other.

Voluntary Action

We will define a voluntary action as one a person does as a result of a choice they make. Involuntary actions are basically irrelevant to considerations, in any practical sense, except insofar as they can be changed via voluntary actions. It is then also clear that voluntary actions are the only ones that can be considered in any plausible morality: someone is not moral or immoral based on actions they can't control. This is often summarized in the dictum "ought implies can": regardless of what, exactly, "ought" is taken to mean in the end, it must imply that the thing one ought to do is one that one can do (though "can" might itself need some further analysis). Furthermore, "ought" seems to imply also "can not", as in "can do otherwise". If one can't help but do something, it cannot be meaningfully said that she ought to do it. Thus, oughts imply a choice, where the alternatives can each be acted on.

In any choice between alternatives, choosing one must mean that one wanted that option, for if one wanted a different option, one would have chosen it. "Want" here is to be taken in a more general sense than it may often be. You may want to go with your friends to the movies, but do homework instead, and why? Because though you may prefer movies to homework in general, you prefer doing well in a class at the expense of spending less time with your friends to spending more time with your friends and doing worse in the class. In the greater context, you prefer doing homework to going to the movies in this case, as opposed to generally preferring movies to homework with no context. Thus, all voluntary choices are the result of the person doing what she wants: everyone always does what they want most, as far as they can. A clear corollary of this is that to change voluntary behavior, one must appeal to what the person in question wants or cares about. This is abundantly clear in experience as well. Moreover, the converse is also manifestly true: if something affected someone's voluntary behavior, it must have appealed to what she wanted or cared about. For, as everyone does what they want most, as far as they can, whatever affects their voluntary actions must have appealed to what they want or care about.

Valuation Theory

2015-02-22T09:59:00.001-08:00

Valuation Systems

Types of Valuation Systems

There are two general sorts of valuation systems:

Comparative Valuation Systems (CVSs): Determine only the ranking of value for the elements of a given, countable set. If X is a CVS and X values A above B, we will write that as \( (A>B)_X\), which we can read as "A is better than B, according to X". Note that CVSs don't have any notion of "good" or "bad", but only "better" and "worse", and possibly "best", if there is some element better than the rest.

A subset of CVSs are Bi-comparative VSs (bCVSs, or C₂VSs), which only rank sets with exactly two elements, either with one better and one worse, or with both equal. If the bCVS has the additional property of being transitive, then the system can be used to impose a partial ordering on the elements of its domain.

Evaluative Valuation Systems (EVSs): Determine the plain value of every element in their domain, like a function. Namely, we can symbolize "the value of A, according to EVS X" as \(V_X(A)\). Without loss of generality, we can take the values assigned to be real numbers. If only order is important, we can take the range to be the numbers in the interval \([-1,1]\). Note that EVSs can have a notion of "good" and "bad", in that we can define "A is bad, according to EVS X" as \(V_X(A)< c \), for some number c, which we can take to be 0. Similar statements can be similarly defined. To keep notation consistent, we will write \((A>B)_X\) iff \(V_X(A)>V_X(B)\), for some EVS X.

Indifferent Extensions

CVSs:
Let \(X\) be a CVS with domain \(D_X\). The CVS \(X'\) is the indifferent extension of \(X\), such that, for any \( a,b \notin D_X\) and \(c \in D_X\), \((a< c )_{X'} \), \((a=b)_{X'}\).

EVSs:
Let \(X\) be an EVS with domain \(D_X\). The EVS \(X'\) is the indifferent extension of \(X\), such that, for any \( a\notin D_X\), \(V_{X'}(a)=0\).
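
As an illustration of the definitions above, here is a minimal sketch of an EVS and its indifferent extension. The dictionary `v_x`, its keys, and the function names are hypothetical, chosen only for the example: the EVS is stored as a mapping from domain elements to real values, and the extension assigns 0 to everything outside the domain.

```python
from typing import Callable, Dict, Hashable


def indifferent_extension(values: Dict[Hashable, float]) -> Callable[[Hashable], float]:
    """Return V_{X'}: equal to V_X on the domain of X, and 0 elsewhere."""
    def v_ext(a: Hashable) -> float:
        return values.get(a, 0.0)
    return v_ext


def better(v: Callable[[Hashable], float], a: Hashable, b: Hashable) -> bool:
    """(a > b)_X for an EVS, i.e. V_X(a) > V_X(b)."""
    return v(a) > v(b)


# Hypothetical EVS X with a three-element domain (values are made up).
v_x = {"help": 0.8, "ignore": -0.1, "harm": -0.9}
v_ext = indifferent_extension(v_x)

print(v_ext("help"))                     # value inside the domain
print(v_ext("whistle"))                  # outside the domain: indifferent
print(better(v_ext, "whistle", "harm"))  # 0 beats a negative value
```

With the threshold \(c=0\), an element outside the domain is neither good nor bad under the extension, which is what "indifferent" suggests.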

Optimal Elements

Meta-Valuation Systems, Optimal Valuation Systems, and Recommendation

Antagonist Valuation Systems and the Universo-Optimality Absence Theorem

Some Implications for Morality

Probability Problems

2014-07-01T21:15:00.005-07:00

There are many interesting problems that can be studied with probability theory. Here I will discuss a few of my favorites.

Maxima in Data

Suppose we have a sequence of values of random variables \(\left(X_{k} \right )_{k=1}^{n}\) that are independent and identically distributed with density function \(f_{X}(x)\). We wish to find the expected number of local maxima in the data, that is, values of \(X_{k}\) such that \(X_{k-1} < X_{k} > X_{k+1}\). Let \(y=X_k\). Then, the probability \(p\) that a certain value \(y\) is a maximum is that of having \(X_{k-1} < y > X_{k+1}\). As all the \(X\)s are independent and identically distributed, we can calculate this probability by finding \[p= \int_{-\infty}^{\infty}f_{X}(x)F_{X}^2(x)dx\] where \(F_{X}(x)=\int_{-\infty}^{x}f_{X}(y)dy\) is the cumulative distribution function corresponding to \(f_{X}\). The form above is like \(\sum_y P(X_k=y)P(X_{k-1} < y)P(X_{k+1} < y)\), except we use the continuous formulation. However, clearly \(F_{X}'(x)=f_X(x)\), and so the formula becomes \[p= \int_{-\infty}^{\infty}F_{X}'(x)F_{X}^2(x)dx\] But, by elementary calculus, we can change this to \[p= \int_{0}^{1}F^2 dF=\frac{1}{3}\] (the change in bounds arises from the fact that \(F_{X}(-\infty)=0\) and \(F_{X}(+\infty)=1\)). Thus, we expect one-third of the values in the sequence to be local maxima. Likewise, we expect one-third to be local minima, and the remaining third to be neither.

By the same method, we can look for other patterns. For instance, the fraction of data points that are higher than all four of their closest neighbors is \(\frac {1}{5}\). The fraction of data points that are bigger than their closest neighbors and smaller than their next-closest neighbors is \(\frac {1}{30}\). In fact, all the calculations can be made by evaluating integrals of the form \( \int_{0}^{1} x^m (1-x)^n dx\). We can also use results like this to test for non-independent or non-identically distributed data. It may even be possible to use it in fraud or bias detection. Based on next-to-nothing-at-all, I would expect human generated data to fail some of these tests.
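
The integrals mentioned above have the closed form \(\int_0^1 x^m(1-x)^n dx=\frac{m!\,n!}{(m+n+1)!}\), so the quoted fractions can be checked exactly. A small sketch (the function name `beta_integral` is just a label for this example):

```python
from fractions import Fraction
from math import factorial


def beta_integral(m: int, n: int) -> Fraction:
    """Exact value of the integral of x^m (1-x)^n over [0, 1]: m! n! / (m+n+1)!."""
    return Fraction(factorial(m) * factorial(n), factorial(m + n + 1))


print(beta_integral(2, 0))  # above both neighbors             -> 1/3
print(beta_integral(4, 0))  # above all four closest neighbors -> 1/5
print(beta_integral(2, 2))  # above 2 closest, below 2 next    -> 1/30
```

Here \(m\) counts the neighbors required to be smaller than the point and \(n\) the neighbors required to be larger.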

We can also find that the number of maxima in \(n\) data points, regardless of the probability distribution \(f_{X}\), is approximately normally distributed with mean \(\frac{n}{3}\) and variance \(\frac{2 n}{45}\). Thus, if we found fewer than 2960 or more than 3040 maxima in a list of 9000 data points, we could reject, at the 5% significance level, the hypothesis that the list consists of independent and identically distributed values. We can also run the same test for minima, but for non-maxima-non-minima, the variance is instead \(\frac{8 n}{45}\).
The values for the variances can be found empirically. They also follow analytically by summing the variances and covariances of the indicator variables for each position, since only indicators at most two positions apart are correlated.
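
Because only the relative ordering of the values matters, the variances can also be checked exactly for a short sequence by enumerating every ordering. The following is a sketch under that assumption (continuous i.i.d. data, so all orderings are equally likely); the function name and the choice \(n=7\) are arbitrary.

```python
from fractions import Fraction
from itertools import permutations


def count_variances(n: int):
    """Exact variance of the number of interior local maxima, and of the number
    of interior points that are neither maxima nor minima, over all orderings
    of n distinct values."""
    peaks, neither = [], []
    for perm in permutations(range(n)):
        p = sum(perm[i - 1] < perm[i] > perm[i + 1] for i in range(1, n - 1))
        t = sum(perm[i - 1] > perm[i] < perm[i + 1] for i in range(1, n - 1))
        peaks.append(p)
        neither.append((n - 2) - p - t)

    def var(xs):
        mean = Fraction(sum(xs), len(xs))
        return Fraction(sum(x * x for x in xs), len(xs)) - mean ** 2

    return var(peaks), var(neither)


var_peaks, var_neither = count_variances(7)
print(var_peaks)    # 16/45, which equals (2n + 2)/45 at n = 7
print(var_neither)  # 83/90, which equals (16n - 29)/90 at n = 7
```

The printed values can be compared with the large-\(n\) rates per data point quoted above; the small constant offsets come from the two endpoints, which cannot be maxima or minima.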

We can also find the distribution of the values of the maxima, which is easily found to be \[g(x)=3 f_{X}(x)F_{X}^2(x)\] Other distributions are similarly found.

Joint Lives

Suppose we start with \(N\) couples (\(2N\) people), and at a later time, \(M\) of the original \(2N\) people remain. We want to find the expected number of intact couples remaining. Let \(C(M)\) and \(W(M)\) be the expected remaining number of couples and widows respectively when \(M\) total people are left. We then note that, as any remaining person is equally likely to be eliminated next, we have: \[ C(M-1)=C(M)-2 \frac{C(M)}{M} \\ W(M-1)=W(M)-\frac{W(M)}{M}+2 \frac{C(M)}{M}\] We can solve this recurrence relation, subject to the constraints \(W(M)+2 C(M)=M\) and \(C(2N)=N, W(2N)=0\), and find that \[ C(M)=\frac{M(M-1)}{2(2N-1)} \\ W(M)=\frac{M(2N-M)}{2N-1} \] If we express \(M\) as a fraction of the total starting population, \(M=2xN\), and express the number of people still in couples (\(2C\)) and the number of widows (\(W\)) as fractions of the starting population, we find, for \(N\) big: \[ C(x)=x^2 \\ W(x)=x(1-x) \] Also, for the general case of starting out with \(N\) \(k\)-tuples (\(kN\) people), the expected number of intact \(k\)-tuples when \(M\) individuals remain is given by: \[K(M)=N \frac{\binom{M}{k}}{\binom{kN}{k}}\] For the case of triples, the expected numbers of triples, doubles and singles when \(M\) individuals remain are given by: \[K_3 (M)= \frac{M(M-1)(M-2)}{3(3N-1)(3N-2)} \\ K_2 (M)= \frac{M(M-1)(3N-M)}{(3N-1)(3N-2)} \\ K_1 (M)= \frac{M(3N-M)(3N-M-1)}{(3N-1)(3N-2)} \] Generally, with the same sense as discussed above, the fraction of the population in an \(m\)-tuple, beginning with only \(k\)-tuples, when fraction \(x\) of the population remains, is given by: \[K_m (x)=\binom{k-1}{m-1} x^m (1-x)^{k-m}\] In fact, the general form for the expected number can be given as \[ K_m (M)= N \frac{\binom{M}{m}\binom{kN-M}{k-m}}{\binom{kN}{k}} \]
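
The closed forms above can be cross-checked with a direct hypergeometric count, assuming that every set of \(M\) survivors is equally likely (which is what "any remaining person is equally likely to be eliminated next" implies). A sketch, with `expected_groups` as a made-up name:

```python
from fractions import Fraction
from math import comb


def expected_groups(N: int, k: int, M: int, m: int) -> Fraction:
    """Expected number of original k-tuples with exactly m survivors when M of
    the k*N people remain (requires 0 <= m <= min(k, M))."""
    # A given k-tuple keeps exactly m members with hypergeometric probability.
    p = Fraction(comb(k, m) * comb(k * N - k, M - m), comb(k * N, M))
    return N * p


N, M = 10, 12
print(expected_groups(N, 2, M, 2), Fraction(M * (M - 1), 2 * (2 * N - 1)))  # couples
print(expected_groups(N, 2, M, 1), Fraction(M * (2 * N - M), 2 * N - 1))    # widows
```

Each printed pair should agree, the first value coming from the direct count and the second from the closed-form expression.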

Random Finite Discrete Distribution

Suppose we have a discrete random variable that can take on the values \(1,2,3,...,n\) with probabilities \(p_1,p_2,p_3,...p_n\) respectively, subject to the constraint \(\sum_{k=1}^n p_k=1\). Let \(p\) be an arbitrary value among the \(p\)s. We will take any combination of values for the \(p\)s as equally likely. By looking at the cases of n=2 and n=3, we find that the probability density function of \(p\) is given by \[ f_P(p)=(n-1)(1-p)^{n-2} \] And the cumulative distribution function is given by \[ F_P(p)=1-(1-p)^{n-1} \] The average value of \(p\) is then \[\int_{0}^{1} p(n-1)(1-p)^{n-2}dp=\frac{1}{n}\] And the variance is \[\int_{0}^{1} p^2 (n-1)(1-p)^{n-2}dp-\frac{1}{n^2}=\frac{n-1}{n^2 (n+1)}\] We thus find that the chance that \(p\) is above the average value is \[P\left ( p > \frac{1}{n} \right )=\left ( 1-\frac{1}{n} \right )^{n-1}\] In the limit as n becomes large, this value tends to \(\frac{1}{e}\).
A confidence interval containing fraction x of the total probability, for large n, is given by: \[ \frac{1}{n} \ln \left(\frac{e}{e x +1-x} \right) \leq p \leq \frac{1}{n} \ln \left(\frac{e}{1-x} \right) \] For instance, a \(50 \%\) confidence interval is given by \(\frac{1}{n}\ln \left(\frac{2 e}{1+e}\right) \leq p \leq \frac{1}{n}\ln(2 e)\).
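
A quick Monte Carlo sketch of the claim that \(p\) exceeds its mean with probability \(\left(1-\frac{1}{n}\right)^{n-1}\). It assumes that "equally likely" means uniform on the probability simplex, and samples it using the gaps between sorted uniform points (a standard construction); the function names, seed, and trial count are arbitrary choices for the example.

```python
import random


def random_pmf(n: int, rng: random.Random) -> list:
    """Draw (p_1, ..., p_n) uniformly from the probability simplex using the
    gaps between n-1 sorted uniform points on [0, 1]."""
    cuts = sorted(rng.random() for _ in range(n - 1))
    edges = [0.0] + cuts + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]


def fraction_above_mean(n: int, trials: int, seed: int = 0) -> float:
    """Estimate P(p_1 > 1/n) by simulation."""
    rng = random.Random(seed)
    hits = sum(random_pmf(n, rng)[0] > 1.0 / n for _ in range(trials))
    return hits / trials


n = 10
print(fraction_above_mean(n, 20_000))  # simulated estimate
print((1 - 1 / n) ** (n - 1))          # formula from the text
```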

We can also extend this to continuous distributions with finite support if we only consider the net probability of landing in equally-sized bins. While the calculation may break down if the number of possible values is actually infinite, it can be used to get some information about distributions with an arbitrarily large number of possible values.

Maximum of Exponential Random Variables

Suppose we have \(N\) independent and identically distributed exponential random variables \(X_1,X_2,...X_N\) with mean \(\mu\). That is, \(f_X (x)=\frac{1}{\mu} e^{-\frac{x}{\mu}}\) when \(x \geq 0\) and zero otherwise. Let us interpret the random values as lifetimes for \(N\) units. The exponential distribution has the interesting property of memorylessness, which means that \(P(x > a+b| x > b)=P(x > a)\). We can show this by using the definition: \[ P(x>a+b|x>b)=\frac{P(x>a+b \cap x>b)}{P(x>b)}=\frac{P(x>a+b)}{P(x>b)} \\ P(x>a+b|x>b)=\frac{\int_{a+b}^{\infty}\frac{1}{\mu}e^{-\frac{x}{\mu}}dx}{\int_{b}^{\infty}\frac{1}{\mu}e^{-\frac{x}{\mu}}dx}=\frac{e^{-\frac{a+b}{\mu}}}{e^{-\frac{b}{\mu}}}=e^{-\frac{a}{\mu}}=P(x>a) \] In other words, given that a unit has lasted \(b\) minutes, the chance that it will last another \(a\) minutes is the same as the chance that a new unit lasts \(a\) minutes. We now calculate the probability distribution of the minimum of the \(N\) random variables. The probability that the minimum of \(X_1,X_2,...X_N\) is no less than \(x\) is the same as the probability that \(X_1 \geq x \cap X_2 \geq x \cap...X_N \geq x \). As all the \(X\)s are independent, this can be simplified to a product, and as all the \(X\)s are identically distributed, we can simplify this further: \[ P(\min(X_1,X_2,...)\geq x)=\left ( P(X_1 \geq x) \right )^N= \left ( \int_{x}^{\infty}\frac{1}{\mu} e^{-\frac{t}{\mu}}dt\right )^N \\ P(\min(X_1,X_2,...)\geq x)= e^{-\frac{xN}{\mu}} \\ P(\min(X_1,X_2,...)\leq x)= 1-e^{-\frac{xN}{\mu}} \\ f_{\min(X)}(x)=\frac{N}{\mu}e^{-\frac{xN}{\mu}} \] Thus, the average of the minimum of \(X_1,X_2,...X_N\) is \(\frac{\mu}{N}\). We now combine these two facts, the mean minimum value and the memorylessness. We start with all units operational, and we have to wait an average of \(\frac{\mu}{N}\) until the first one fails.
However, once the first one has failed, the expected additional wait time until the next one fails is just \(\frac{\mu}{N-1}\), that is, the expected minimum of \(N-1\) units, since by memorylessness the surviving units are as good as new. Thus, the expected time at which the \(m\)th unit fails is given by \[\mu\sum_{k=0}^{m-1}\frac{1}{N-k}\] Thus, the expected maximum time, when the \(N\)th unit fails, is \[\mu\sum_{k=1}^{N}\frac{1}{k}\]
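
The harmonic-sum formulas can be evaluated exactly and compared against a simulation. This is only a sketch: the function names, seed, and trial count are arbitrary, and note that Python's `random.expovariate` takes the rate \(1/\mu\) rather than the mean.

```python
import random
from fractions import Fraction


def expected_failure_time(N: int, m: int, mu: Fraction = Fraction(1)) -> Fraction:
    """Expected time of the m-th failure among N units: mu * sum_{k=0}^{m-1} 1/(N-k)."""
    return mu * sum(Fraction(1, N - k) for k in range(m))


def simulated_max(N: int, mu: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected maximum of N exponential lifetimes."""
    rng = random.Random(seed)
    total = sum(max(rng.expovariate(1.0 / mu) for _ in range(N)) for _ in range(trials))
    return total / trials


print(expected_failure_time(5, 1))   # first failure: mu/N = 1/5
print(expected_failure_time(5, 5))   # last failure: 1 + 1/2 + ... + 1/5 = 137/60
print(simulated_max(5, 1.0, 20_000)) # should land near 137/60, about 2.28
```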

More generally, we can look at the distributions of the kth order statistic of \(X_1,X_2,...X_N\). The kth order statistic, denoted \(X_{(k)}\), is defined as the kth smallest value, so that \(X_{(1)}\) is the smallest (minimum) value, and \(X_{(N)}\) is the largest (maximum) value. The pdf is easily found to be: \[ f_{X_{(k)}}(x)=k {N \choose k} F_X^{k-1}(x)\left[1-F_X(x)\right]^{N-k}f_X(x) \] Where \(F_X(x)\) is the cdf of X, and \(f_X(x)\) is the pdf of X. So, in this case, \[ f_{X_{(k)}}(x)=\frac{k}{\mu} {N \choose k} e^{-(N-k+1)x/\mu}\left[1-e^{-x/\mu}\right]^{k-1} \] Thus, the moment generating function is given by: \[ g(t)=\frac{k}{\mu} {N \choose k} \int _0 ^\infty e^{-(N-k+1-\mu t)x/\mu}\left[1-e^{-x/\mu}\right]^{k-1} dx \] By a simple transformation, we find that: \[ g(t)=k {N \choose k} \int _0 ^1 u^{N-k-\mu t}(1-u)^{k-1} du \] This puts the integral in a well-known form, which has the value \[ g(t)=\frac{N!}{\Gamma(N+1-\mu t)}\frac{\Gamma(N-k+1-\mu t)}{(N-k)!} \] By a simple calculation, the cumulants are then given by the surprisingly simple form: \[ \kappa_n=\mu^n(n-1)!\sum_{j=N-k+1}^{N} \frac{1}{j^n} \] Several interesting results follow from this:

For the Nth order statistic (the maximum), we already know that the mean value goes as \(\sum_{j=1}^{N} \frac{1}{j}\). But now we see that the other cumulants go as \((n-1)!\sum_{j=1}^{N} \frac{1}{j^n}\). Thus, the variance converges, in the limit, to \(\mu^2 \frac{\pi^2}{6}\). The skewness converges, in the limit, to \(\frac{12 \sqrt{6}}{\pi^3}\zeta(3)\), and the excess kurtosis converges to \(\frac{12}{5}\). In fact, if we shift to take into account the unbounded mean, the distribution of the maximum converges to a Gumbel distribution. This is a special case of a fascinating result known as the extreme value theorem.
For any given, fixed, finite \(k\geq 0\), \(X_{(N-k)}\) converges, as N goes to infinity, to a non-degenerate distribution with finite, positive variance, if we shift it to account for the unbounded mean.
For \(k\) of the form \(k=\alpha N\) (or the nearest integer thereto), for some fixed \(\alpha\) with \(0<\alpha<1\), in the limit as \(N\) goes to infinity, the distribution of \(X_{(\alpha N)}\) becomes degenerate, with all the probability located at \(\mu\ln\left(\frac{1}{1-\alpha}\right)\). This is, of course, the location of the \(100\alpha \%\) quantile, and so \(X_{(\alpha N)}\) is a consistent estimator for the \(100\alpha \%\) quantile.
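
The limiting values quoted in the list above can be checked numerically from the cumulant formula \(\kappa_n=\mu^n(n-1)!\sum_{j=N-k+1}^{N} j^{-n}\). A sketch, with an arbitrary large \(N\) standing in for the limit and `cumulant` as a made-up function name:

```python
from math import factorial, log, pi, sqrt


def cumulant(n: int, k: int, N: int, mu: float = 1.0) -> float:
    """n-th cumulant of the k-th order statistic of N iid exponentials with mean mu."""
    return mu ** n * factorial(n - 1) * sum(1.0 / j ** n for j in range(N - k + 1, N + 1))


N = 100_000
var = cumulant(2, N, N)                 # variance of the maximum
skew = cumulant(3, N, N) / var ** 1.5   # skewness of the maximum
ex_kurt = cumulant(4, N, N) / var ** 2  # excess kurtosis of the maximum

print(var, pi ** 2 / 6)
print(skew)     # compare with 12*sqrt(6)*zeta(3)/pi^3, roughly 1.1395
print(ex_kurt)  # compare with 12/5

# Middle order statistic (alpha = 1/2): mean near mu*ln(2), variance near 0.
print(cumulant(1, N // 2, N), log(2), cumulant(2, N // 2, N))
```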

As a more general result, let us find the cdf of \(X_{(\alpha N)}\) for an arbitrarily distributed X, in the limit as N goes to infinity. The cdf of \(X_{(\alpha N)}\) is given by: \[ F_{X_{(\alpha N)}}(y)=\alpha N {N \choose \alpha N} \int _{-\infty} ^{y} F_X^{\alpha N-1}(x)\left[1-F_X(x)\right]^{N-\alpha N}f_X(x) dx \] As \(f_X(x)=\frac{d}{dx}F_X(x)\), we then have, by a simple substitution: \[ F_{X_{(\alpha N)}}(y)=\alpha N {N \choose \alpha N} \int _{0} ^{F_X(y)} u^{\alpha N-1}\left[1-u\right]^{N-\alpha N} du \] This is the cdf of a Beta distributed random variable, evaluated at \(F_X(y)\), with mean \(\frac{\alpha N}{N+1}\) and variance \(\frac{\alpha N (N-\alpha N+1)}{(N+1)^2(N+2)}\). As N goes to infinity, the mean tends to \(\alpha\) and the variance tends to zero. Thus, the distribution of \(X_{(\alpha N)}\) will converge to a degenerate distribution with all the density at \(y=F_X^{-1}(\alpha)\), that is, at the \(100\alpha \%\) quantile of the distribution.

Choosing a Secretary

Suppose we need to hire a secretary. We have \(N\) applicants arrive and we interview them sequentially: once we interview and dismiss an applicant, we cannot hire her. The applicants all have differing skill levels, and we want to pick as qualified an applicant as we can. We want to find the optimal strategy for choosing whom to hire. We easily see that the optimal strategy is something like the following. We consider and reject the first \(K\) applicants. We then choose the first applicant who is better than all the preceding ones. Thus, our problem reduces to finding the optimal value for \(K\). We will do so in a way that maximizes the probability that the most qualified secretary is selected. We thus have the probability: \[ P(\mathrm{best\, is\, chosen})=\sum_{n=1}^{N}P(\mathrm{n^{th}\, is\, chosen} \cap \mathrm{n^{th}\, is\, best}) \\ P(\mathrm{best\, is\, chosen})=\sum_{n=1}^{N}P(\mathrm{n^{th}\, is\, chosen}| \mathrm{n^{th}\, is\, best})P(\mathrm{n^{th}\, is\, best}) \] We then note that each applicant in line is the best applicant with equal probability. That is, \(P(\mathrm{n^{th}\, is\, best})=\frac{1}{N}\). Also, we can find the conditional probabilities. If \(M \leq K\), then \(P(\mathrm{M^{th}\, is\, chosen}| \mathrm{M^{th}\, is\, best})=0\). If the \((K+1)\)th applicant is best, she will certainly be chosen, that is \(P(\mathrm{(K+1)^{th}\, is\, chosen}| \mathrm{(K+1)^{th}\, is\, best})=1\). Also, we find that \(P(\mathrm{(K+m)^{th}\, is\, chosen}| \mathrm{(K+m)^{th}\, is\, best})=\frac{K}{K+m-1}\), as that is the chance that the best of the first \(K+m-1\) applicants (that is, the second-best among the first \(K+m\)) is in the first \(K\) applicants, so that nobody is hired before the \((K+m)\)th. We thus have \[ P(\mathrm{best\, is\, chosen})=\frac{K}{N}\sum_{n=K+1}^{N}\frac{1}{n-1}=\frac{K}{N}\sum_{n=K}^{N-1}\frac{1}{n} \] Let us assume we are dealing with a relatively large number of applicants. In that case, we can approximate \(\sum_{n=A}^{B-1}\frac{1}{n} \approx \ln \left(\frac{B}{A} \right )\).
Thus \[ P(\mathrm{best\, is\, chosen})\approx\frac{K}{N}\ln \left(\frac{N}{K}\right )=-\frac{K}{N}\ln \left(\frac{K}{N}\right ) \] If we then let \(x=\frac{K}{N}\), we just need to maximize \(-x\ln(x)\), which happens at \(x=e^{-1}\). From this, we find that \(P(\mathrm{best\, is\, chosen})\approx e^{-1}\). Thus, the best strategy is to interview and reject the first \(36.8 \%\) of the applicants, and then choose the next applicant who is better than all the preceding ones. This will get us the best applicant with a probability of \(36.8 \%\).
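
For a small number of applicants, the cutoff strategy can be checked by brute force, without relying on any closed-form expression. The sketch below enumerates every arrival order for \(N=7\) (an arbitrary small value) and assumes that nobody is hired if no later applicant beats the first \(K\), which counts as a failure.

```python
from fractions import Fraction
from itertools import permutations


def success_probability(N: int, K: int) -> Fraction:
    """Exact chance of hiring the best of N applicants when the first K are
    rejected and the first later applicant who beats all predecessors is hired."""
    wins = total = 0
    for order in permutations(range(N)):  # larger number = better applicant
        total += 1
        best_seen = max(order[:K], default=-1)
        for skill in order[K:]:
            if skill > best_seen:         # first applicant beating everyone so far
                wins += skill == N - 1    # success only if she is the overall best
                break
    return Fraction(wins, total)


N = 7
probs = {K: success_probability(N, K) for K in range(N)}
best_K = max(probs, key=probs.get)
print(best_K, probs[best_K], float(probs[best_K]))
```

For \(N=7\) the best cutoff is \(K=2\), with success probability \(\frac{29}{70}\approx 0.414\), already fairly close to the large-\(N\) value \(e^{-1}\approx 0.368\).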

A related problem involves finding a strategy that minimizes the expected rank of the selected candidate (the best candidate has rank 1, the second best rank 2, etc.). Chow, Moriguti, Robbins and Samuels have found that the optimal strategy involves the following (in the limit of large \(N\)): skip the first \(c_0 N\) applicants, then, for all applicants before the number \(c_1 N\), we stop looking if the applicant is the best so far. If we have not yet selected an applicant, we choose the best or second best so far before the number \(c_2 N\). If we have not yet selected an applicant, we choose the best or second best or third best so far before the number \(c_3 N\). And so on. By choosing the \(c_n\) optimally, we can get an expected rank of \(3.8695\). This is quite surprising: we can expect an applicant in the top 4, even among billions of applicants!
The optimal values for the \(c_n\) are \[ c_0=0.2584... \\ c_1=0.4476... \\ c_2=0.5639... \] The general formula for \(c_n\) is \[ c_n=\prod_{k=n+2}^{\infty}\left ( \frac{k-1}{k+1} \right )^{1/k}=\frac{1}{3.86951924...}\prod_{k=2}^{n+1}\left ( \frac{k+1}{k-1} \right )^{1/k} \]
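
The constants can be reproduced numerically by truncating the infinite product. This is a sketch only: the cutoff of 200,000 terms is an arbitrary choice, and because each factor is roughly \(e^{-2/k^2}\), the truncation error is of relative size about \(10^{-5}\).

```python
from math import exp, log


def c(n: int, terms: int = 200_000) -> float:
    """Approximate c_n = prod_{k=n+2}^{inf} ((k-1)/(k+1))^(1/k), truncated at `terms`."""
    log_c = sum(log((k - 1) / (k + 1)) / k for k in range(n + 2, terms + 1))
    return exp(log_c)


for n in range(3):
    print(n, round(c(n), 4))  # compare with 0.2584, 0.4476, 0.5639
print(1 / c(0))               # compare with the expected rank 3.8695
```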