Notes on the History of Mathematics


Jeremy Martin and Judith Roitman
January 14, 2014
Length, area, volume
Length, area, and volume have been studied from the beginning of recorded history (and probably before) because of the importance of accurate measurement— consider what happens when
rivers flood and recede, changing property lines. Every major ancient civilization has been able to
calculate the areas of rectangles and triangles, and the volumes of rectangular solids.
In ancient Egypt, the Rhind papyrus (an Egyptian mathematical text dated to approximately
1650 BCE, although the writer said it was actually a transcription of another document from 200 years before that) had sample problems finding the area of a rounded field, finding the amount of grain in
a cylindrical granary (i.e., calculating the volume of a cylinder), calculating the area of an octagon
inscribed in a square (not the same as a regular octagon), and so on. By 300 BCE (the Cairo
papyrus) the Egyptians knew the Pythagorean theorem. They had an incorrect general formula for
the area of convex quadrilaterals, but their formula was correct for special cases (e.g., trapezoids).
Mesopotamia also had basic area formulas for triangles and trapezoids (hence rectangles); knew
the Pythagorean theorem; knew the area of a regular hexagon; approximated π as 3; calculated √2
as 30547/21600 (accurate to 5 decimal places); and so on.
Ancient China knew the Pythagorean theorem (there's a theme here, isn't there); knew most
of the polygonal and polyhedral area and volume formulas; while they used 3 to approximate π
they knew it was wrong (and Liu Hui in 263 CE calculated π using a 192-sided regular polygon to
get π ≈ 3.1416); by the 5th century CE Zu Chongzhi and Zu Gengzhi (father and son) knew the
volume of a sphere and approximated π to 7 decimal places.
And here is the Chinese diagram for the proof of the Pythagorean theorem, taken from the
Zhoubi suanjing (Zhou Shadow Gauge Manual; various parts were written sometime between
1100 BCE and 100 CE):
Ancient India was obsessed by the importance of accurately constructing altars according to
Vedic instructions; in particular, how do you construct a square altar with the same area as a given
circular altar? They knew that π is a little bigger than 3; they knew how to construct a square
whose area is twice that of a given square; they had many constructions using, and several proofs
of, the Pythagorean theorem (proofs via diagram). By the 7th century CE Brahmagupta knew the
area of any quadrilateral inscribed in a circle; knew Heron's formula; knew the volume of a cone. By
the 9th century Sridhara calculated the volume of a frustum of a cone (= what you get when you
cut off the top) using π ≈ √10. In the 9th century Mahavira had many correct area formulas. By
the 12th century Bhaskara approximated π to be about 3.1429, and knew the surface area and volume
of a sphere as follows: if d = diameter and C = circumference, then S = dC and V = Sd/6; these are
very elegant formulas. Plug in C = 2πr to see that indeed these are the usual formulas.
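Bhaskara's sphere formulas are easy to check numerically. The Python snippet below (ours, not part of the notes) verifies that S = dC and V = Sd/6 agree with the usual formulas 4πr² and (4/3)πr³:

```python
import math

def bhaskara_surface(r):
    # Bhaskara's rule: S = d * C, where d = diameter and C = circumference.
    d = 2 * r
    C = math.pi * d          # C = 2*pi*r = pi*d
    return d * C

def bhaskara_volume(r):
    # Bhaskara's rule: V = S * d / 6.
    d = 2 * r
    return bhaskara_surface(r) * d / 6

r = 3.0
assert math.isclose(bhaskara_surface(r), 4 * math.pi * r ** 2)
assert math.isclose(bhaskara_volume(r), (4 / 3) * math.pi * r ** 3)
```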
The ancient Greeks were obsessed with geometry — as we will learn later, they saw algebra as,
in some sense, a branch of geometry — hence they were very sophisticated regarding measurement.
Now for some examples.
1. The octagon in the square
This comes from ancient Babylonia.
Take a square with side s, divide each side into equal thirds, and connect as in the picture to get a nonregular octagon.
What's its area compared to the area of the square? Each of the small triangles you're throwing
out has legs of length s/3, hence area (1/2)(s/3)² = s²/18. You're throwing away four of them, so you're throwing away an area of 2s²/9.
Hence you're left with an area of 7s²/9, so the octagon has 7/9 the area of the square.
2. Egyptian calculation of the area of a circle (from the Rhind Papyrus)
Example of a round field of diameter 9 khet. What is its area? Take away 1/9 of
the diameter, 1; the remainder is 8. Multiply 8 times 8; it makes 64. Therefore it
contains 64 setat of land.
Here are some features of this excerpt that are characteristic of ancient Babylonian and Egyptian mathematics:
• It's pretty accurate. The relationship between area A and radius r is A = 64d²/81 =
(256/81)r², which is equivalent to approximating π ≈ 256/81 = 3.16049.... This is not
bad at all, and would be perfectly fine for any applications the Egyptians used it for.
• Mathematics is described verbally instead of symbolically. In modern notation, we can rewrite
the relationship between area A and diameter d given in the excerpt as
A = (d − d/9)² = (8d/9)² = 64d²/81,
but the Egyptians lacked the notational tools to do this (to be fair, Western mathematics
didn't come up with modern algebraic notation until the 1600s or so; one wonders what elements
of modern mathematical notation will become obsolete in a millennium or two).
• In these cultures, mathematics was concerned with solving applied, practical problems. Rather
than talking about the area of a circle, the problem talks about a “round field”. There is
little, if any, geometric abstraction in extant Babylonian and Egyptian texts.
• We have no idea what a “khet” or a “setat” is, but we can infer it from context; one setat is
presumably one square khet. In particular, they had units of measurement.
• The Babylonian and Egyptian writings tend not to include explanations (much less formal
proofs). There’s more focus on how to solve a problem (by following an algorithm) than why
the given solution works.
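To see just how accurate the Rhind-papyrus rule is, here's a small Python check (ours, not the papyrus's) of the sample problem and the implied value of π:

```python
import math

def egyptian_area(d):
    # Rhind-papyrus rule: take away 1/9 of the diameter, then square the rest.
    return (d - d / 9) ** 2

A = egyptian_area(9)                           # the 9-khet field: 64 setat
true_area = math.pi * (9 / 2) ** 2             # about 63.6 setat
implied_pi = egyptian_area(9) / (9 / 2) ** 2   # 256/81, about 3.16049
```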
3. Eratosthenes’ calculation of the size of the earth
It wouldn’t be a history of math course without showing how Eratosthenes (who lived in the
Greek areas of Egypt, 276 - 194 BCE) calculated the size of the earth.
First of all, the ancient Greeks (and they weren’t the only ones) knew perfectly well that the
earth was round — looking at ships’ masts as they disappeared over the horizon was one piece
of evidence. Lunar eclipses (where the earth is directly between the sun and moon) are further
evidence: you can see the earth’s curved shadow falling on the moon. Once you believe that the
earth is curved, a sphere is the simplest possibility. They also knew that the sun was very very
very very far away from the earth (our modern measurement is about 93 million miles).
So Eratosthenes started by assuming the earth was a sphere. He also assumed that, because
the sun was so far away, the sun's rays striking two places on earth that weren't very far away from
each other were nearly parallel. He also knew that the sun shone directly into a particular well at
Syene, Egypt (modern-day Aswan) at the summer solstice. He knew that Alexandria was 5000
stadia (a unit of length; the singular is stadium) due north of Syene, and that a staff in Alexandria
cast a short shadow when the sun was at its zenith (that is, when the sun was as high as it would possibly get that day). Eratosthenes put all this information together
in the following diagram.
[Diagram: measuring the circumference of the earth, with the sun's rays drawn parallel. Labels: B = base of staff, C = center of earth, D = shadowless place, S = tip of staff, T = tip of shadow.]
The diagram, of course, is not at all to scale—the staff is much smaller in real life than in the
diagram. Be that as it may, Eratosthenes made the following observations about the diagram.
First, the fraction of the earth's circumference swept out by the arc BD is precisely φ/2π (if we
measure φ in radians). We know that the arc length is 5000 stadia, so this reduces the problem
to figuring out the value of φ.
Second, while we obviously can’t determine the angle φ directly (without a very powerful drill!),
it must be almost equal to θ, because the two red lines (the sun’s rays) are very close to parallel.
(This is a direct application of Euclid’s alternate interior angle axiom: if two parallel lines are cut
by a third line, then alternate interior angles are equal. This fact is in Euclid’s Elements and was
certainly known to Eratosthenes.)
Third, we can determine θ from knowing SB and BT, which are easy to measure directly. In
modern trigonometric notation,
tan θ = BT/SB, or θ = arctan(BT/SB).
Eratosthenes would not have expressed this equation using the term “arctangent”, but he was able
to determine angles from their tangents, at least approximately. He found that θ was 1/50 of a
complete angle (i.e., 2π/50 or π/25), which meant that the earth’s circumference was 50 · 5000 =
250,000 stadia.
We don’t know how long Eratosthenes’ stadium was. If he was using the Egyptian stadium of
515.7 ft5 , his estimate would have been remarkably accurate: 24419 miles, compared to our modern
value of about 24900 miles. His stadium may have been a bit longer or shorter than this (and there
are additional complications – how do you know the well points toward the center of the earth, for
example?), but whatever its exact value is, it doesn’t diminish the insightfulness of his method.
Close to 3 khet, if you’re wondering.
About 500 years later the great Indian mathematician Aryabhata I (476–550 CE) gave an even
more accurate estimate of the earth’s circumference: 24,835 miles in our modern units. This
remained the best measurement for over a thousand years.
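Eratosthenes' arithmetic is short enough to redo in a few lines. This Python sketch (ours; the 515.7-ft Egyptian stadium is the assumption discussed above) recovers both the 250,000-stadia figure and the rough conversion to miles:

```python
distance_stadia = 5000       # Alexandria to Syene
theta_fraction = 1 / 50      # the shadow angle, as a fraction of a full circle

# The arc from Syene to Alexandria is the same fraction of the whole circumference.
circumference_stadia = distance_stadia / theta_fraction   # 250,000 stadia

# Converting with the (assumed) Egyptian stadium of 515.7 ft:
STADIUM_FT = 515.7
FT_PER_MILE = 5280
circumference_miles = circumference_stadia * STADIUM_FT / FT_PER_MILE
# roughly 24,400 miles, vs. the modern value of about 24,900
```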
4. The volume of the cone
It’s not known who first figured out that the volume of a right circular cone is 13 the volume of
the cylinder in which it is inscribed, but this fact was known to (and probably known long before)
Archimedes. Its discovery might have come about by pouring liquid from a cone to a cylinder. Or
it could have come about by proportions, in the following manner:
Suppose we know that the volume of a pyramid with a square base is 1/3 the volume of the
parallelepiped in which it's embedded. Since there's a constant ratio between the area of a circle
and the area of the square in which it's inscribed (πr²/(2r)² = π/4), and since volume is really an infinite
sum of stacked-up areas (we think of this as integral calculus, but it was known to the ancients),
there is a constant ratio of π/4 between the volume of a pyramid with a square base and the right
circular cone which is inscribed in it. So it suffices to establish that the ratio between the volume
of a square pyramid and the parallelepiped in which it's inscribed is 1/3. For then
V(cone)/V(cylinder) = [(π/4) · V(pyramid)] / [(π/4) · V(parallelepiped)] = V(pyramid)/V(parallelepiped) = 1/3.
So the next step is to sketch how to establish, geometrically, that the ratio between the volume of a square pyramid
and the parallelepiped in which it's inscribed is 1/3. (The technique we'll use is associated
with Liu Hui.) First, we define a yangma. It's what you get when you take a parallelepiped with a
square base, pick the midpoint of one of the top edges, and connect that point to the vertices of
the bottom square:
I.e., a yangma is a slanted pyramid, so it has the same volume as a pyramid with the same base.
The fact that the yangma and the pyramid have the same volume is based on a principle known as
Cavalieri's principle (17th century).
You can fit three yangmas together to get the parallelepiped. (For an illustration go to id=1408.) So the volume of the yangma is 1/3 the
volume of the parallelepiped, and hence the volume of the square pyramid is 1/3 the volume of the parallelepiped.
Cavalieri's principle says that you can calculate a volume by adding up the cross-sections. This principle was
well known long before Cavalieri; e.g., it was used by both Archimedes and Liu Hui, and we will meet it again in the
next section.
Now let’s prove by a different method that the volume of a right circular cone is 13 the volume of
the cylinder in which it’s inscribed. This is a method that Archimedes could have done (but using
modern notation).
The diagram below is a cross section through the center of a right circular cone inscribed in a
r is the radius, h is the height. At an intermediate stage (the red triangle) x is the radius and
y is the height.
By similar triangles, r/h = x/y, so x = ry/h. The ancients knew that you calculated a volume by
stacking up infinitely many small areas, so, in modern notation, the volume of the big cone is gotten
by stacking up all the πx²'s. Since each πx² = π(r²/h²)y², this means the volume of the cone is, in
modern notation, π(r²/h²) ∫₀^h y² dy = π(r²/h²) · (1/3)h³ = (1/3)πr²h.
Now Archimedes didn’t have calculus, but what he did know was how
R hto2 find the area of a
parabolic segment. And this was enough to find the value of the integral 0 y dy.
y 2 dy is the area of the wedge on the right.8 We know that the parabolic segment cut off by
the top horizontal line has area 34 · 21 · 2h · h2 = 43 h3 . We know that the area of the rectangle in
which the parabolic segment is inscribed is 2h · h2 = 2h3 . We are interested in 21 the difference, i.e.,
4 3
1 3
r2 1 3
2 (2h − 3 h ) = 3 h . So the volume of the cone is, once again, π h2 · 3 h = 3 πr h.
Since the volume of the cylinder is πr2 h, we’re done.
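The "stacking up areas" argument is easy to imitate numerically. Here's a Python sketch (ours, not in the notes) that stacks thin disks to approximate the cone's volume and compares it with (1/3)πr²h:

```python
import math

def cone_volume_stacked(r, h, n=100000):
    # Stack n thin disks: at height y from the apex the radius is x = r*y/h,
    # so the disk contributes pi*x**2 * dy.
    dy = h / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * dy            # midpoint of each thin slab
        x = r * y / h
        total += math.pi * x * x * dy
    return total

r, h = 2.0, 5.0
approx = cone_volume_stacked(r, h)
exact = math.pi * r * r * h / 3
# approx and exact agree to many decimal places
```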
Archimedes went on to use the volume of a cone to establish the volume of a sphere, and Liu
Hui used the volume of a pyramid to establish the volume of a sphere, roughly 500 years apart.
Archimedes' version is the next thing we'll do; we'll do Liu Hui's version in chapter 2.
5. Archimedes' derivation of the volume of the sphere
This derivation uses a little bit of physics. We conflate volume with mass (which you can do if
the mass is uniform) and then we balance things on a lever.
We start off with two facts: (1) the volume of a right circular cone is 1/3 the volume of the cylinder
in which it's inscribed, and (2) the law of the lever (which Archimedes discovered): two objects are balanced to the left and right
of the fulcrum of a lever iff Aa = Bb, where A is the mass of the object a units to the left of the
fulcrum (or balance point) and B is the mass of the object b units to the right of the fulcrum.
[Diagram: the sphere with weight A weighs more than the sphere with weight B, so it is closer to the fulcrum.]
We also notice that if a slice of a solid is very thin (a lamina), then its area essentially is (modulo
a constant factor) its mass. We will assume that the material we're using has a mass density
of 1, that is, the area of a lamina is its mass.
We are going to hang a sphere and a cone on one string to the left of a fulcrum, and place a
cylinder on its side to the right so that one side goes through the fulcrum. The radius of the sphere
is r, the radius and height of the cone are 2r, the radius and height of the cylinder are 2r, and the
distance of the string from the fulcrum is 2r.
The claim is that this configuration is balanced.
Suppose we know that. Then Aa = Bb where A is the combined volume of the sphere and the
cone, a = 2r, B is the volume of the cylinder, b = r (because the center of mass is halfway through
the cylinder).
The volume of the cylinder is π(4r²)(2r) = 8πr³.
The volume of the cone is (1/3)π(4r²)(2r) = (8/3)πr³.
So, letting V = the volume of the sphere, by the law of the lever, (V + (8/3)πr³)(2r) = 8πr³ · r,
i.e., V = 4πr³ − (8/3)πr³ = (4/3)πr³.
So we are done if we can prove that this configuration is balanced.
We look at balancing the lamina we get from cross sections. The cross sections we consider are:
(a) a horizontal cross section of the sphere a distance x from the top of the sphere, the circle Sx ;
(b) a horizontal cross section of the cone the same distance x from the top of the cone, the circle
Cx ; (c) a vertical cross section of the cylinder a distance x to the right of the fulcrum, the circle
Dx .
In this diagram, x is the length of each red segment. To find the areas of each lamina, we have
to find the length of each associated black segment, the radius of each circular lamina. Which we
will do shortly. First we explain why this will be sufficient.
Note that the distance of the weight of Sx from the fulcrum is 2r. Similarly, the distance of the
weight of Cx from the fulcrum is 2r. And the distance of the weight of Dx from the fulcrum is x.
Let A(E) denote the area of some figure E. Suppose, for each x,
(†) 2r(A(Sx ) + A(Cx )) = x(A(Dx ))
Then, by the law of the lever, each set of laminae balances. Since, by Cavalieri's principle,
2r(V(sphere) + V(cone)) = 2r ∫₀^{2r} (A(Sx) + A(Cx)) dx and r·V(cylinder) = ∫₀^{2r} x·A(Dx) dx, (†) finishes the proof.
First we find the radius of Sx:
The radius of Sx is y = √(r² − (x − r)²) = √(2rx − x²), so A(Sx) = π(2rx − x²). (The alert reader will notice we are not considering the case x < r; this is left to the reader.)
Now we turn to Cx:
By similar triangles, since the height of the cone equals its radius, y = x. So A(Cx) = πx².
Finally, A(Dx ) = 4πr2 .
We are ready to prove (†):
2r(A(Sx) + A(Cx)) = 2r(π · 2rx − πx² + πx²) = 4πr²x
x(A(Dx)) = x · 4πr² = 4πr²x
So (†) holds and we are done.
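Since all the areas involved are simple formulas, we can also check the balance numerically. The Python below (a sketch of ours, with r = 1) integrates both sides' moments over 0 ≤ x ≤ 2r and then solves the lever equation for V:

```python
import math

def moments(r, n=200000):
    # Numerically integrate the moments of the laminae for 0 <= x <= 2r.
    dx = 2 * r / n
    left = right = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        A_S = math.pi * (2 * r * x - x * x)   # sphere cross-section
        A_C = math.pi * x * x                 # cone cross-section
        A_D = 4 * math.pi * r * r             # cylinder cross-section
        left += 2 * r * (A_S + A_C) * dx      # everything hangs at distance 2r
        right += x * A_D * dx                 # lamina at distance x
    return left, right

r = 1.0
left, right = moments(r)   # these agree: the lever balances

# Solving (V + (8/3)*pi*r^3) * 2r = 8*pi*r^3 * r for the sphere's volume V:
V = (8 * math.pi * r ** 3 * r) / (2 * r) - (8 / 3) * math.pi * r ** 3
# V equals (4/3)*pi*r^3
```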
Thales and Pythagoras
The great achievement of the Greek mathematicians was developing the idea of proof. As opposed to
their Babylonian and Egyptian predecessors who were mostly concerned with how to solve practical
problems, the Greeks were interested in why mathematics worked.
The mathematicians in ancient Babylon and Egypt were priests and government officials, and
focused on practical administrative problems: measurement of land, division of goods, tax assessment, architecture, etc. The intended audience of mathematical texts was probably other administrators, and so there was no perceived need to explain why the rules worked—just how to use the
rules. The excerpt from the Rhind Papyrus is an excellent example of this.
By contrast, many Greek mathematicians were independently wealthy and had spare time on
their hands to concern themselves with knowledge for its own sake. Furthermore, the Greeks
discovered that this abstract knowledge could often be put to practical use: e.g., measuring the
height of the Great Pyramid, or estimating the circumference of the earth.
Thales (624–547 BCE) is often considered the first Greek mathematician in this tradition.
Here are some of the theorems credited to him:
• The base angles of an isosceles triangle are equal. (I.e., if ∆ABC is a triangle and AB = AC,
then m∠ABC = m∠ACB.)
• Any angle inscribed in a semicircle is a right angle (you observed this in Math 409 in problem SA4).
• A circle is bisected by any diameter.
These are simple geometric results from a modern standpoint, but the important difference
between them and earlier geometry was that Thales stated them as abstract observations about
lines, circles, angles and triangles, rather than about counting oxen or measuring fields.
There is a legend that Thales impressed the Egyptians by determining the height of the Great
Pyramid. (The square base could be measured directly, but not the height.) He placed his staff on
the ground and measured the lengths of the shadows cast by the pyramid and by his staff.
[Figure adapted from Burton, p. 86.]
(Presumably these units are in khet; see chapter 1.) In the figure, the number 378 is half the
length of a side of the square base; this could be measured directly. So could 342 (the length of
the shadow) and 6 and 9 (the height of the staff and the length of its shadow).
The source for this and much other material in these notes is chapters 2–3 of D. Burton, Burton's History of
Mathematics: An Introduction, 3rd edn., Wm. C. Brown Publishers, 1991.
Now, said Thales, we have two similar triangles, so the height h of the pyramid is given by the proportion
h/(378 + 342) = 6/9,
which can be solved to give h = 480 khet.
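The similar-triangles computation, in Python (our check, not the notes'), just to confirm the arithmetic:

```python
# All lengths in khet, as in the figure.
staff, staff_shadow = 6, 9
half_base, pyramid_shadow = 378, 342

# Similar triangles: h / (half_base + pyramid_shadow) = staff / staff_shadow.
h = (half_base + pyramid_shadow) * staff / staff_shadow
# h = 480
```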
Thales was able to realize that an abstract theorem about similar triangles could be applied to
easily solve this practical problem. This is one of the first examples of modeling: solving a real-life
problem by replacing it by a mathematical problem in order to be able to apply general theorems.
Abstract mathematical knowledge can have concrete benefits!
Pythagoras (569–475 BCE) was a mystic whose religious beliefs included a strong mathematical component: “All is number” (by which he and his followers, the Pythagoreans, meant
“integer”). This meant that they believed any two quantities should be able to be compared by a
ratio n : m where n, m are integers.
For an example of how useful ratios can be, Pythagoras (or his followers) noticed that if you
take two strings made of the same material, one twice the length of the other, and pluck them, then
the longer string will produce a sound an octave lower. (We know today that musical pitch depends on frequency, and that doubling the length will halve the frequency.) They noticed that other good-sounding
musical intervals arose from pairs of strings whose lengths were in ratios of small integers.
Here’s a question that the Pythagoreans might have asked: what pitch is half an octave? That
is, we have two strings of lengths L and 2L, and we want to determine the length M of a string that
will produce a sound halfway in between. Since relative pitch is controlled by ratios, the lengths M
and L must satisfy the equation
Clearing denominators gives
M 2 = 2L2 , so M = 2L. If you are a Pythagorean, this raises
natural question: what is 2? The Babylonians had found an excellent approximation to 2,
but the Pythagoreans were not interested in approximations; they wanted to understand the exact
value. And, because they believed quantities should be comparable by means of ratios, they wanted
√ exact value to be what we call a rational number. To their consternation, they discovered that
2 cannot be expressed that way. It is what we call irrational. Here is their famous proof.
Theorem 1. √2 is an irrational number. That is, it is impossible to express it as a fraction P/Q,
where P and Q are integers.
Proof. Suppose that √2 is rational. That is, suppose that there exist positive integers P, Q such
that
√2 = P/Q. (1)
(No one is disputing that √2 > 0, so P and Q have to be the same sign, and we might as well make them both positive instead of both negative.)
If we square both sides of equation (1) and clear denominators, we get
2Q² = P². (2)
One thing to notice from this equation is that P > Q. Another observation is that P has to be
even. If it were odd, then P² would be odd as well, and equation (2) would say that an even
number (namely 2Q²) equals an odd number, and that's impossible. Since P is even, we can write
it as P = 2R, where R is another positive integer. If we substitute P = 2R into equation (2), we
get 2Q² = P² = (2R)² = 4R², which implies that
Q² = 2R². (3)
Looking at equation (3), we now see in turn that Q > R and that Q has to be even; otherwise we
would again have an even number equalling an odd number. So we may write Q = 2S, where S is
another positive integer. Substituting Q = 2S into equation (3), we obtain 2R² = Q² = (2S)² =
4S², that is,
R² = 2S². (4)
This looks just like equation (2); we see that R > S and that R is even. Have we gotten anywhere,
or are we just chasing ourselves around in circles?
In fact, this process has to end. Otherwise, we will end up with a sequence of positive integers
P, Q, R, S, T, . . . , Z, α, β, . . . , ω, ℵ, . . .
that keeps getting smaller and smaller (remember, P > Q, Q > R, and so on). But this can’t
happen! This means that eventually it is impossible to continue the process. The only way to
resolve this is to realize that our original assumption (namely, that we could find positive integers
P and Q such that √2 = P/Q) had to be false. Therefore, √2 is irrational.
In case you don’t like going down, here’s a proof where you go up:
Proof. Putting together (2) and (3) says that P² = 2Q² = 2(2R²) = 4R². Therefore P is divisible
by 4, and in particular P ≥ 4.
Putting this together with (4) says that P² = 4R² = 4(2S²) = 8S². Therefore P is divisible by
8, and so P ≥ 8.
In the next step of the proof, we'll be able to deduce that P is at least as big as 16, and then
32, and then 64. . . P turns out to be bigger than every power of 2. But every number is smaller
than some power of 2. So P doesn't exist.
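Both proofs can be illustrated computationally. The sketch below (ours, not the notes') implements the descent step from the first proof and brute-force checks that no pair P, Q with P² = 2Q² exists for Q up to 2000:

```python
import math

def descend(P, Q):
    # The descent step from the proof: a solution P**2 == 2 * Q**2 yields the
    # strictly smaller solution (Q, P // 2).
    assert P * P == 2 * Q * Q
    return Q, P // 2

# If any solution existed, descend() could be applied forever, which is
# impossible; and indeed a brute-force search finds none:
solutions = []
for Q in range(1, 2001):
    P = math.isqrt(2 * Q * Q)
    if P * P == 2 * Q * Q:
        solutions.append((P, Q))
# solutions == []
```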
There is a widespread legend, almost certainly false, that the Pythagoreans were so upset over
the existence of irrational numbers that they killed the discoverer of the theorem or the person
who revealed it to the rest of the world. (Pythagoras left no writings himself, but he did acquire a
mythical status after his death, with the result that we have to rely on secondary sources, many
of whom basically made up stories about him.)
This was one of the first proofs by contradiction. In a proof by contradiction, instead of deriving
conclusions directly from your assumptions, you start by assuming that what you are trying to
prove is false, and then show that this necessarily leads to something false—for example, that
2 + 2 = 6, or that there exists an infinitely long decreasing sequence of positive integers. Proof by
contradiction is a vital tool in mathematics.
More and mostly Archimedes
Archimedes was perhaps the greatest mathematician of the ancient Greek world. Also perhaps the
greatest astronomer, physicist, and engineer of the ancient Greek world.
Archimedes was from a noble family, and was related to King Hieron. He was born in 287 BCE
and died in 212 BCE. According to legend, he was killed by a Roman soldier while working on a
geometry problem he had drawn in the sand — in one version of the story, he enrages the soldier by
complaining of the soldier’s shadow on his diagram. If any version of this story is true, the Roman
soldier would have gotten into big trouble — Archimedes’ expertise would have been invaluable to
the Roman army. Capturing him alive would have been a high priority.
Much of what we think of as Greek did not happen in Greece. Archimedes lived in Syracuse
on the island of Sicily, one of the most wealthy and learned cities of the ancient Greek civilization,
visited Egypt as a young man, and studied at Euclid's academy in Alexandria, Egypt. But Syracuse
was lost to Greek influence when the Romans won the Second Punic War, the one during which
he died, and now Sicily is one of the poorest parts of Italy, best known for being the home of the
Mafia.
It is often said that ancient Greek mathematics and science were uninterested in practical matters, and that the nobility considered such things beneath them. Archimedes is a strong counterexample to these stereotypes.
Here is a brief list of some of the things Archimedes explored in mathematics:
• area
• center of gravity
• precursors to integral calculus & infinite series
• identified π (but didn’t call it such) as the ratio of the circumference of a circle to its diameter;
showed that this was the same ratio as the area of a circle to its radius squared.
• approximated π: between 3 + 10/71 and 3 + 1/7
• first to be interested in curves traced by moving points, e.g., Archimedean spiral: point moves
with constant speed and constant angular velocity; r = a + bθ
• area of parabolic segment (quadrature of parabola) = 4/3 area of triangle with same base
and height
• Inscribe a sphere of radius r in a cylinder with same radius, h = 2r. Deduced from this that
the ratio of the volume of the sphere to the volume of the cylinder = ratio of the surface area
of the sphere to the surface area of the cylinder = 2/3.
And here is a brief list of some of the things Archimedes explored in physics (note the connection
with the mathematical topics above — he was perhaps the first mathematical physicist):
• parabolic mirror focuses light to a point
• statics (= analysis of loads in static equilibrium)
• properties of the lever (used to analyze area and center of gravity)
• hydrostatics: law of buoyancy, law of equilibrium of fluids
• noticed gravity (does this seem trivial? noticing something as universal as gravity takes a special kind of imagination)
• center of gravity
Finally, here is a very brief list of some of the things Archimedes worked on as an engineer (note
the connection to physics):
• ship designer
• improved catapult
• mirror as weapon? (the question mark is because it's not clear whether this was ever used, or even whether he actually came up with it; it's part of the Archimedean legend)
• Archimedean screw (still used)
Now for some mathematics that Archimedes actually did.
I. The approximation of π.
Archimedes knew that the ratio [circumference : diameter] is constant for all circles (although he
didn't call it π; William Jones was the first to do so, in 1706). Here is
a diagram that shows one way he approximated π:
[Diagram: how Archimedes estimated π. s = old side, s* = new side; π ≈ (1/2)ns; when n = 6, s = 1.]
Here’s what’s behind the diagram:
Step 1: Inscribe a regular polygon (all sides the same, all angles the same) with n sides in a
circle of radius 1. Call the side sn . (In the diagram, this is called s.)
Step 2: Now bisect a side and extend a radius through that point to get a regular polygon with
2n sides. Call the new side s2n . (In the diagram, this is called s∗ ).
Step 3: Calculate s2n in terms of sn .
Step 4: Now go back to steps 2 and 3, to find s4n in terms of s2n . And so on.
Oops — we need a place to start.
Step 0: Start with a regular hexagon. Why? Because its side, i.e., s6 , has length 1.
Let’s do step 3: By the Pythagorean theorem x = 1 − s4 = 12 4 − s2 . Another application
of the Pythagorean theorem gives s2n = (1 − x)2 + s4 . After algebraic simplification we get
s2n = 2 − 4 − (sn )2 .
Note that 2πr = C. And C ≈ n·sn. And r = 1. So π ≈ (1/2)·n·sn. In fact, π = lim_{n→∞} (1/2)·n·sn.
For n = 6, n·sn = 6, so π ≈ 3. At the next stage, s12 = √(2 − √(4 − 1)) ≈ .517 and π ≈ 3.106.
At the next stage, s24 = √(2 − √(4 − (s12)²)) ≈ .261 and π ≈ 3.132. At the next stage, s48 =
√(2 − √(4 − (s24)²)) ≈ .131 and π ≈ 3.139. At the next stage s96 ≈ .0654 and π ≈ 3.141. In only
5 stages (and the first was trivial) we've calculated π to within 1/1000. That's converging fairly fast.
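The side-doubling recurrence is a pleasant thing to run on a machine. Here is a Python sketch of ours (note that the recurrence as written loses floating-point precision if you iterate far beyond these few stages):

```python
import math

def archimedes_pi(stages=5):
    # Start with a regular hexagon inscribed in a unit circle: n = 6, s_6 = 1.
    n, s = 6, 1.0
    estimates = [n * s / 2]                       # pi is roughly (1/2) * n * s
    for _ in range(stages - 1):
        s = math.sqrt(2 - math.sqrt(4 - s * s))   # s_{2n} from s_n
        n *= 2
        estimates.append(n * s / 2)
    return estimates

est = archimedes_pi(5)   # the hexagon up through the 96-gon
# est[-1] is about 3.1410
```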
The problem with this method, from Archimedes' point of view, is that while he could calculate
various steps in an infinite process, he couldn't estimate the error at each step. In particular, this
method gave him a lower bound for π, but without an upper bound he had no way to estimate the
error. (If he could have calculated the exact error then, of course, he would have had an exact value for π.) In the homework we'll look at another method that Archimedes used which gave him an
upper bound for π as well as a lower one. Knowing that π was squeezed between the two bounds
could give him an idea of how good his approximation at any stage was.
II. The quadrature of the parabola.
First, a quick detour: How would Archimedes have thought of a parabola? He would not have
thought of it as the set of all points (x, y) in the plane such that y = ax² + b for some a, b ∈ R.
He certainly would have thought of it as the curve you get by cutting a cone with a plane parallel
to a side, and that's what he used.
The “quadrature of the parabola” is the following problem: Take a piece of a parabola and find
its area.
Here’s how Archimedes did it.
We’re interested in the area of the region bounded between the straight line segment QQ0 and
the parabola that has point P on it. Let’s call this region the parabolic segment QP Q0 .
We could have chosen any point on the parabola between Q and Q0 but we chose P for a reason
— it has the special property that the tangent to the parabola through P is parallel to QQ0 . We’ll
take the area of ∆QP Q0 as our first approximation to the area of the parabolic segment.
Now let’s consider the parabolic segments P RQ and P SQ0 . Any parabolic segment has points
that act like P — the tangent line through the point is parallel to the straight line defining the
segment. We pick R and S so the tangent line through R is parallel to QP , and the tangent line
through S is parallel to P Q0 .
From the definition of the parabola by means of conic sections,20 Archimedes deduced that the area of ∆QPQ′ is 8 times the area of ∆QRP and also 8 times the area of ∆PSQ′. We'll write this as ∆QPQ′ = 8∆PRQ = 8∆PSQ′.21
Thus, the two new triangles you added (∆QRP and ∆PSQ′) have, when added together, 1/4 the area of the original triangle QPQ′.
So our second approximation of the area of the original segment is ∆QPQ′ + ∆QRP + ∆PSQ′ = (1 + 1/4)∆QPQ′.
Do this again: you'll now be adding 4 new triangles, two for each of ∆QRP and ∆PSQ′. You've now added an area of (1/4)(∆QRP + ∆PSQ′) = (1/4) · (1/4)∆QPQ′ = (1/4)²∆QPQ′.
And so on. You end up with: area of the parabolic segment = [1 + 1/4 + (1/4)² + (1/4)³ + ...] · ∆QPQ′.
So all we need to know is: what is 1 + Σ_{n=1}^∞ (1/4)^n?
You could use geometric series to calculate this, but there is a lovely calculation using only basic geometry.
Take a square with area 1 and cut it into 4 congruent squares. Shade the upper left square dark yellow: that's 1/4. Leave the lower right square alone and color the other two squares light yellow. You have a yellow "L."
Now take the lower right square and cut it into 4 congruent squares. Shade its upper left square dark pink: that's (1/4)². Leave the new lower right square alone and shade the other two squares light pink. You have a pink "L" disjoint from the yellow "L." Do it again (using blue). And again. And again.
[Footnote 20: We're leaving out a lot of detail here.]
[Footnote 21: I.e., we're conflating our notation for a triangle and its area.]
Every time you do this you're working with the lower right square. Each time, the square that you shade is 1/3 the area of the L-shaped region that you're not going to work with any more: e.g., the dark yellow square is 1/3 the area of the yellow region, the dark pink square is 1/3 the area of the pink region, the dark blue square (which somehow doesn't show up as dark blue on my screen) is 1/3 the area of the blue region...
The area of the original square (which is 1) is the sum of all these "L" shapes. And each dark square is 1/3 the area of its "L" shape.
So Σ_{n=1}^∞ (1/4)^n = 1/3.
And the area of the parabolic segment QPQ′ is 1 + 1/3 = 4/3 the area of ∆QPQ′.
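Just to double-check the arithmetic with modern tools Archimedes didn't have, here is a short Python computation of the partial sums; the function name is ours:

```python
# Numeric check that the series produced by Archimedes' construction
# converges to 4/3: each round of triangles adds 1/4 of the previous round.
def segment_over_triangle(terms: int) -> float:
    """Partial sum 1 + 1/4 + (1/4)**2 + ... with `terms` terms."""
    return sum((1 / 4) ** n for n in range(terms))

ratio = segment_over_triangle(50)
print(ratio)  # very close to 4/3
```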
III. A modern application
Let's use Archimedes' method to calculate ∫₀ʰ x² dx. Of course Archimedes wouldn't have had that notation, but he might have asked himself what happened if you inscribed a parabolic segment in a rectangle and looked at what's left over:
[figure: the parabola, with the rectangle's corner at (h, h²)]
The shaded portion is half of what's left over, and its area is ∫₀ʰ x² dx. Let's calculate it without calculus.
The triangle has base 2h and height h², so its area is h³.
Hence the parabolic segment has area (4/3)h³.
The rectangle has area 2h³.
The shaded portion has area (1/2)(2h³ − (4/3)h³) = h³(1 − 2/3) = (1/3)h³ which, as we know from first-semester calculus, is ∫₀ʰ x² dx.
IV. A detour
Aristaeus, who was roughly contemporary with Archimedes, proved that a conic section is a
parabola iff it has the following property: Start with a single point (called the focus) and a line
(called the directrix). The parabola is the set of points whose distance from the focus is the same
as their distance from the directrix.22
Let’s prove half of Aristaeus’ theorem using modern algebraic methods: we’ll prove that if a
curve has this property then it’s a parabola.
First, it doesn't matter where the parabola is in the plane or how it's oriented, so let's assume the focus is the point (0, 0) and that the directrix is the line y = c. (In the diagram above, c is negative, but this doesn't matter for the algebraic argument.) The distance between a point (x, y) and the focus is √(x² + y²). And the distance between a point and the directrix is |y − c| = √((y − c)²).
The distance from (x, y) to the focus = the distance from (x, y) to the directrix iff √(x² + y²) = √(y² − 2cy + c²) iff x² + y² = y² − 2cy + c² iff x² = c² − 2cy iff y = (1/(2c))(c² − x²). Which is the equation of a parabola.
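We can spot-check the algebra numerically; the names here are ours, and c = −2 is an arbitrary choice:

```python
from math import hypot

# Check that points on the curve we derived, y = (c^2 - x^2) / (2c),
# really are equidistant from the focus (0, 0) and the directrix y = c.
def curve_y(x: float, c: float) -> float:
    return (c * c - x * x) / (2 * c)

c = -2.0  # directrix below the focus, as in the diagram
for x in (-3.0, -1.0, 0.0, 0.5, 4.0):
    y = curve_y(x, c)
    assert abs(hypot(x, y) - abs(y - c)) < 1e-12  # focus dist = directrix dist
```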
The other direction — start with a parabola and find its focus and directrix — is a little harder.
Note that none of this is what Aristaeus did. He proved that the focus/directrix characterization
was equivalent to the conic section definition. He didn’t have the algebraic tools we take for granted.
[Footnote 22: This means that all parabolas are similar; see chapter 3.]
Conics
Conics consist of circles, ellipses, parabolas, and hyperbolas. They are named “conics” because they
arise from slicing cones in certain ways (see below). They are studied using geometric, algebraic,
and analytical (e.g., calculus) points of view, thus are a good case study in the inter-relationship
among various fields of mathematics. Conics also provide a good case study on how mathematics
moves through cultures, since their study began in ancient Greece, moved through Persia and
Arabia, into Renaissance Europe and is fundamental in much of modern mathematics.
The first definition of conics was due to Menaechmus (380 - 320 BCE).23 He defined conics via
planes slicing through cones (hence the name), and he studied tangents, normals, and evolutes of
conics.24 Hypatia (c. 370 - 415 CE) studied conics intensively, but her work is now lost. A charismatic figure, she was the intellectual leader of Alexandria, brutally murdered by a Christian mob. In addition to her work on conics, she edited Euclid's Elements and worked on Diophantine equations.25
Over 600 years after Hypatia, the great Persian mathematician and poet Omar Khayyam (1048 - 1131) translated Apollonius' work into Arabic, which is how Apollonius' work was preserved.
Arabian mathematics made its way into Europe, and eventually Descartes (1596 - 1650) found the
algebraic expressions for conics that most of us first met in high school.
Conics can most easily be described by slices through double cones:
A plane slicing parallel to the slant of the double cone gives a parabola. A slicing plane which is not parallel to the slant but still intersects only one of the cones gives an ellipse (of which a circle is a special case). A slicing plane which intersects both cones gives a hyperbola.
There are many other ways to construct conics. For example, if you attach a string to a piece of paper at each end, hold it taut with a pencil, and move your pencil around, you get an ellipse.
[figure: an ellipse drawn with a string attached at ends A and B, held taut at the point of the pencil]
If you Google “ellipse construction” you'll find a bewildering array of methods. One worth talking about is the trammel of Archimedes, which we'll do on Sketchpad.
For simple constructions of parabolas and hyperbolas see: via locus/.
[Footnote 23: His work was motivated by the problem of duplicating a cube, and he solved it via the intersection of two parabolas, in effect by solving a cubic equation.]
[Footnote 24: An evolute of a curve is the locus, or path, of the centers of curvature as a point moves around the curve.]
[Footnote 25: Diophantine equations, due to Diophantus, are algebraic equations with integer (including negative integer) solutions.]
While geometric constructions differ from conic to conic, there is in fact a uniform definition of conics: via focus and directrix. You start with a fixed point F in the plane (the focus) and a line l in the plane (the directrix) where F is not on l. You also fix a positive number e called the eccentricity. Then you consider the curve C = {p : d(p, F) = e · d(p, l)}.
If 0 < e < 1, C is an ellipse. The circle is the limiting case e = 0, with l infinitely far away from F.
If e = 1, C is a parabola. Note that, by this definition, all parabolas are similar.26
If e > 1, C is a hyperbola (perhaps surprisingly, both branches satisfy the condition).
There is also a uniform definition using polar coordinates.27 Consider the function r = x/(1 + e cos θ), where e, x are constant and e ≥ 0 (e is called the eccentricity).
If e < 1 the curve is an ellipse. The circle is the special case e = 0.
If e = 1 the curve is a parabola. This gives us another proof that all parabolas are similar.
If e > 1 the curve is a hyperbola.
Finally, we have the general Cartesian form for conics:28 every conic is the set of solutions to some equation of the form Ax² + Bxy + Cy² + Dx + Ey + F = 0 where at least one of A, B, C ≠ 0.
To determine which conic is which, we define the discriminant to be B² − 4AC.29
If B² − 4AC < 0, the curve is either an ellipse or a degenerate form, i.e., an equation with no real solutions, hence no graph on R². For example, x² + y² + 1 = 0 is a degenerate form.
If B² − 4AC = 0, the curve is a parabola.
If B² − 4AC > 0, the curve is a hyperbola.
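Here is a small Python sketch of the classification rule; the function name is ours, and degenerate cases are lumped in with the curve their discriminant points to, as in the text:

```python
def classify_conic(A: float, B: float, C: float) -> str:
    """Classify Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0 by B^2 - 4AC."""
    disc = B * B - 4 * A * C
    if disc < 0:
        return "ellipse"
    if disc == 0:
        return "parabola"
    return "hyperbola"

print(classify_conic(1, 0, 1))   # circle x^2 + y^2 = 1
print(classify_conic(0, 0, 1))   # y^2 = x
print(classify_conic(1, 0, -1))  # x^2 - y^2 = 1
```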
In the late 18th century people started noticing that you could generalize the notion of conics
in useful ways in other areas of mathematics. Here are three generalizations, mathematical details
not included.
Second order partial differential equations (PDE’s)
Second order PDE’s (i.e., those which involve first and second partial derivatives) are associated
with quadratic forms, i.e. symmetric polynomial equations in several variables where the degree
of each term is 2.30 The quadratic forms of interest here have the form Ax² + Bxy + Cy² = 0 (or
similar symmetric polynomial equations with perhaps more variables). Quadratic forms allow us
to classify differential equations. (For details on this see a differential equations text.)
To give examples of this classification, we need to mention the Laplacian operator ∇². Instead of defining it I'll give the three-variable example: ∇²u = ∂²u/∂x² + ∂²u/∂y² + ∂²u/∂z².
The Laplace equation ∇²u = 0 is elliptic.
The heat equation ∂u/∂t − α∇²u = 0 is parabolic.
The wave equation ∂²u/∂t² − c²∇²u = 0 is hyperbolic.
[Footnote 26: We used this in chapter 2.]
[Footnote 27: This definition is the first aspect in our discussion of conics which was not known in the ancient world. Polar coordinates appeared around the 17th century; I'm not sure when the conic equations first appeared.]
[Footnote 28: This needed the development of the Cartesian plane (Descartes wrote it up in 1637) and co-ordinate geometry; again, I'm not sure when the general Cartesian form was noticed or by whom.]
[Footnote 29: If this looks familiar from the quadratic formula – it isn't. This B is the coefficient of xy, not x; this C is the coefficient of y², not the constant term.]
[Footnote 30: Quadratic forms are important in several areas of mathematics; their study traces back at least to the 7th century Indian mathematician Brahmagupta, who studied them from an algebraic point of view.]
The terminology here comes from considering a matrix M related to the general form of a second order differential equation.31 If det M < 0, the equation is elliptic. If det M = 0, the equation is parabolic. If det M > 0, the equation is hyperbolic.
Gaussian curvature
In calculus you studied the local geometry — i.e., the geometric attributes that change from
point to point — of a curve: tangent lines, curvature, and so on. Similarly, there is a notion
of local geometry of a surface. One of the important local properties of a surface is called the
Gaussian curvature. If it is positive, the geometry is called elliptic, defined as: no two distinct
lines are parallel. If the Gaussian curvature is 0, the geometry is called Euclidean (i.e., all the
standard two-dimensional axioms apply). If the Gaussian curvature is negative, the geometry is
called hyperbolic: given a line l and a point p on the surface, there are infinitely many lines through
p parallel to l.
The terminology here is easy to explain. If you rotate an ellipse about an axis you get an elliptic
surface (a sphere is the most familiar one). If you rotate a hyperbola about its semi-minor axis
(perpendicular to the line connecting the two foci) you get a hyperbolic surface.
Discrete probability distributions
Discrete probability distributions describe behavior that is not continuous. For example, a coin
flip can either be heads or tails, there is nothing in between. It turns out that these can be
related to quadratic forms. The binomial distribution (the probability of k successes in n trials)
is elliptical. The Poisson distribution (used for analyzing rare events) is parabolic. The negative
binomial distribution (e.g., the probability of getting 7 heads before the fourth tail in a sequence
of coin tosses) is hyperbolic.
The terminology here is not so easy to explain.
For details, check
Trigonometry
In the ancient world, trigonometry was largely motivated by astronomy. Other uses, such as surveying, navigation, and optics, didn't achieve prominence until the 13th century (Arab mathematicians
used trigonometry for surveying and optics) and the 16th century (European mathematicians used
trigonometry for surveying and navigation).
In the ancient world they wouldn’t have spoken about “the trigonometric functions” because
they didn’t have the notion of function, and thought of what we call trigonometry rather differently.
For example, here’s how they thought of the sine of an angle: Take a circle of diameter one. Inscribe
the angle in the circle. Look at the base of the resulting triangle. Its length is the sine.
Let’s sketch a proof of why this gives the same value as our definition of sine.
In this diagram, the inscribed angle α = ∠ACB, and the ancients' definition of sin α was AB.32 By a theorem on inscribed angles in a circle (in the geometry notes, this is homework problem EP7),33 ∠BCA = (1/2)∠BOA, so ∠DOA also equals α. Since AD ⊥ OD34, our definition of sin gives sin α = AD/OA. But OA has length 1/2, so sin α = 2 · AD = AB, which is how the ancients defined sin α. So the two definitions are the same.
Similarly, tangents were thought of in the context of calculating lengths of shadows. (Yes, this
is implicitly a ratio...)
It wasn’t until the 16th century that Copernicus’ student Rheticus explicitly defined the trigonometric functions as ratios.
Why chords on circles? The general model for the universe in the ancient world was of the stars
and planets moving around the earth. The general notion was that the movement of a particular
heavenly body was restricted to a particular sphere around the earth. The line segments connecting two points of such a body's movement through space were chords on a great circle. So knowing the
length of such chords was crucial to understanding planetary and stellar motion. Later this notion
developed with smaller spheres of motion moving along spherical paths, and then even smaller
spheres... the whole thing became unwieldy and collapsed when Copernicus proposed instead that
the earth and planets moved around the sun. Copernicus still proposed spherical orbits; Kepler
was the one who proposed elliptical orbits. But that’s outside our story.
The development of trigonometry was driven by the need for better calculations of (what we
[Footnote 32: We're conflating the length of a line segment with the name of the line segment.]
[Footnote 33: If you aren't taking Math 409 this semester, EP7 says that if O is the center of a circle, and A, B, C are points on the circle, then ∠ABC = (1/2)∠AOC.]
call) trigonometric functions, largely because these were needed for other calculations. Many of the
theorems about them — the formulas for sin(a+b), sin(a−b), etc. — were driven by such calculation.
The real theoretical breakthroughs (trigonometric functions as ratios, their association with the unit circle, the notion of trigonometric functions as functions, their periodicity, etc.) came much later.
While several aspects of trigonometry were studied by many Greek mathematicians, including Eudoxus (4th century BC), Euclid (@300 BC), and Archimedes (3rd century BC), it was Hipparchus (2nd century BC) who was in some sense the inventor of systematic trigonometry. In particular, Hipparchus published 12 books of trigonometric tables (all of which have been lost).
Plane trigonometry developed simultaneously with spherical trigonometry (essentially the study of
triangles on a sphere — AAA is a congruence axiom on the sphere, and SSS isn’t)35 and this was
systematized by Menelaus of Alexandria (@ 100 CE). Ptolemy (@ 150 CE) systematized and greatly developed trigonometry (and much other mathematics) in his Mathematical Synthesis (which Arab mathematicians called the Almagest, i.e., “The Greatest,” and that is the name that caught on).
Here is some of the trigonometry published in the Almagest, as we would express it:
• sine tables in increments of 1/4°
• sin²x + cos²x = 1
• formulas for sin(a + b), sin(a − b), and for sin(x/2)
• law of sines: A/sin a = B/sin b = C/sin c = 2r, where A, B, C are the lengths of the sides of a triangle, a is the angle opposite side A, etc., and r is the radius of the circle circumscribed around the triangle
Euclid knew a form of the law of cosines (C² = A² + B² − 2AB cos c, where A, B, C, a, b, c are as above).
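The law of sines in that list is easy to spot-check numerically; the setup below (the radius and the three points) is an arbitrary choice of ours:

```python
from math import sin, sqrt, acos, cos, dist

# For a triangle inscribed in a circle of radius r, the Almagest's law of
# sines says (side) / sin(opposite angle) = 2r.
r = 3.0
P = [(r * cos(t), r * sin(t)) for t in (0.3, 2.0, 4.5)]  # arbitrary vertices

def interior_angle(i: int) -> float:
    """Angle of the triangle at vertex P[i], via the dot product."""
    a, b, c = P[i], P[(i + 1) % 3], P[(i + 2) % 3]
    u = (b[0] - a[0], b[1] - a[1])
    v = (c[0] - a[0], c[1] - a[1])
    dot = u[0] * v[0] + u[1] * v[1]
    return acos(dot / (sqrt(u[0]**2 + u[1]**2) * sqrt(v[0]**2 + v[1]**2)))

for i in range(3):
    opposite = dist(P[(i + 1) % 3], P[(i + 2) % 3])  # side opposite vertex i
    assert abs(opposite / sin(interior_angle(i)) - 2 * r) < 1e-9
```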
None of these people had the modern notion of trigonometric function. Instead, they talked
about chords of a circle (recall the definition of sine that this section starts with). Everything was
done in the language of chords of a circle.
Mathematicians in India learned about Greek trigonometry and took it further. In the 6th
century CE they developed all six trigonometric functions, and thought of them as ratios – i.e.,
they had our modern notion of sin, cos, etc. In 1150 CE Bhaskara knew how to calculate the sine
of any angle. By the 15th or 16th century they had power series for sine, cosine, and inverse tangent
— this was two centuries before Euler developed such series in Europe. Some sources suggest that
they used tangent lines of trigonometric functions to predict eclipses — I haven’t seen details of
how this was done.
China imported Hindu astronomers (who were necessarily also mathematicians) who had a
strong influence on Chinese mathematics. The Chinese mathematician I-Hsing (also a Buddhist
monk) in 724 CE published tangent tables.
Both Hindu and Greek mathematics made it to the Arabian peninsula. Around 860 CE, tangent and cotangent ratios were developed by Arab mathematicians. In the late 9th and early 10th centuries, al-Battani developed better sine and tangent tables. The great 10th century mathematician Abu'l-Wafa was the first to consider the trigonometric functions as being related to the unit circle
[Footnote 35: AAA is the statement that if two triangles have corresponding angles congruent, then they are congruent; SSS is the statement that if two triangles have corresponding sides congruent then they are congruent.]
(today we think of this as the coordinates of points on the unit circle: (x, y) = (cos α, sin α), where
α is the angle between the x-axis and the vector from the origin to the point). By the 13th century
Arab mathematicians had broken free of the identification of trigonometry with astronomy. They
knew of all six trigonometric functions, knew many identities, knew how to construct trigonometric
tables by interpolations, and used trigonometry for surveying and optics.
Greek and Arab mathematics in turn influenced European mathematicians in the 15th century. Regiomontanus wrote the first systematic treatment in Europe of both plane and spherical
trigonometry. Rheticus defined the trigonometric functions as ratios. Work on trigonometric tables
continued, and by 1700 European trigonometric tables were accurate up to 15 decimal places —
this was without modern decimal notation. Such tables were crucial for surveying, navigation, and
telling time (in the calendrical sense). In the 16th century Viete united trigonometry with algebra,
which was a crucial step, and after that its development exploded. That trigonometric functions
could be calculated for any number and were periodic functions was a major realization, and in 1635
Roberval created the first graph of the sine function. By the late 17th and mid 18th centuries the
Bernoulli brothers were considering trigonometric functions of complex numbers. In the 18th century Euler considered trigonometric functions in all of their aspects: as ratios, as periodic functions,
and as infinite series. From the latter he developed the formula e^(ix) = cos x + i sin x, from which he concluded (since sin π = 0 and cos π = −1) that e^(πi) = −1. Fourier (1763 - 1830) realized that every
continuous function on a closed interval is equal to an infinite sum of trigonometric functions. In
particular, if the interval is [−π, π] then the series has the form a_0/2 + Σ_{n=1}^∞ (a_n cos nx + b_n sin nx).
With this, trigonometry became firmly embedded in the rest of mathematics; the mathematics that
comes out of the study of Fourier series is known as harmonic analysis.
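As a modern illustration (nothing here is Fourier's notation), partial sums of such a series can be computed directly; for f(x) = x on [−π, π] the standard closed-form coefficients are a_n = 0 and b_n = 2(−1)^(n+1)/n:

```python
from math import sin, pi

# Partial sums of the Fourier series of f(x) = x on [-pi, pi].
def fourier_partial_sum(x: float, terms: int) -> float:
    return sum(2 * (-1) ** (n + 1) / n * sin(n * x)
               for n in range(1, terms + 1))

approx = fourier_partial_sum(1.0, 2000)  # should be close to f(1) = 1
```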
Trigonometry theorems from geometry: Ptolemy’s theorem
Ptolemy’s theorem is another one of those theorems that need to be included in any history of
mathematics course. But where to put it? Since it can be used to derive the law of cosines and the
formula for sin(α + β), I’m putting it here.
Ptolemy's theorem says that if a quadrilateral can be inscribed in a circle, then the product of the diagonals is the sum of the products of the opposite sides. I.e., in the following picture, AC · BD = AD · BC + AB · CD.36
Proof. To prove Ptolemy’s theorem, we need to add a point E so ∠ABE = ∠DBC, as follows:
[Footnote 36: Again we're conflating names of sides with their lengths.]
Note that if the center of the circle is on BD then E will be on BD. The proof in that case is
much easier, so we leave it to the reader.
The angles marked α are the same by construction. The angles marked β are the same because they are inscribed in the circle with the same base, BC (this is a generalization of EP 7 in the geometry notes). Similarly, the angles marked δ are the same.
So, since plane triangles whose angles are congruent are similar, ∆ABE ∼ ∆DBC. Hence AB/DB = AE/DC, i.e., AB · DC = AE · DB.
By the same reasoning, ∆ABD ∼ ∆EBC, so AD/DB = EC/BC, i.e., AD · BC = DB · EC.
Putting this all together,
AB · DC + AD · BC = AE · DB + EC · DB = (AE + EC) · DB = AC · DB
as desired.
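Ptolemy's theorem is easy to check numerically; the four angles below are an arbitrary choice of ours, taken in order around the circle:

```python
from math import cos, sin, dist

# Numeric check of Ptolemy's theorem on a cyclic quadrilateral ABCD
# (the points must be in order around the circle).
r = 2.0
A, B, C, D = [(r * cos(t), r * sin(t)) for t in (0.2, 1.1, 2.9, 5.0)]

lhs = dist(A, C) * dist(B, D)                            # product of diagonals
rhs = dist(A, B) * dist(C, D) + dist(A, D) * dist(B, C)  # opposite sides
assert abs(lhs - rhs) < 1e-9
```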
Ptolemy's theorem immediately gives us the Pythagorean theorem when ABCD is a rectangle: c is the length of each diagonal, and a, b are the lengths of the sides, so the theorem reads c² = a² + b².
The law of cosines comes from considering Ptolemy’s theorem applied to a trapezoid, as follows:
Start with an arbitrary triangle (the red triangle) and inscribe it in a circle. Reflect about the
perpendicular bisector of one side to get an isosceles trapezoid (i.e., two sides have the same length,
in this case AD and BC). If the diagonals have length c, the identical sides have length a, and the
other sides have length b and d respectively (we’ll set AB = b and CD = d) then, by Ptolemy’s
theorem, a2 + bd = c2 .
The claim is that d = b − 2a cos ∠ABC. If that were true, then we’d have
a2 + b2 − 2ab cos ∠ABC = c2
which is exactly the law of cosines.
So we need to prove that d = b − 2a cos ∠ABC.
First we extend the trapezoid into a rectangle.
[Footnote: The alert reader will notice that the proof needs to be adapted when ∠ABC > π/2.]
We denote ∠ABC = γ and note that, by construction, γ = ∠DAB, so by transversals of parallel lines, γ = ∠ADE. Also, ED = (1/2)(b − d).
Since cos γ = cos ∠ADE we have cos γ = ED/AD = ((1/2)(b − d))/a. So (1/2)(b − d) = a cos γ, i.e., b − d = 2a cos γ and d = b − 2a cos γ as desired.
To derive the formula for sin(α + β) (at least where α, β < π/2) we consider the following picture, where BD is a diameter of a circle of radius 1/2, and α, β are the desired angles:
By EP 7 in the geometry notes, ∠BAD = ∠BCD = π/2. Since BD = 1, CD = sin α and AD = sin β; also, BC = cos α and AB = cos β. Meanwhile, by the definition of sin in the beginning of this chapter, AC = sin(α + β).
By Ptolemy's theorem, AC · BD = AB · CD + BC · AD. But BD = 1, so AC = AB · CD + BC · AD. I.e., sin(α + β) = cos β sin α + cos α sin β, as desired.
A quick and dirty proof of the law of sines
Let’s inscribe a triangle in a circle (the red triangle) and consider one angle of the triangle (α).
Drawing a diameter through another vertex of the triangle we can construct (using the theorem
about angles inscribed in a circle with the same base) a right triangle with one angle equal to α:
If a is the length of the side opposite α and r is the radius of the circle, from the blue triangle we have sin α = a/(2r), i.e., a/sin α = 2r. But this is true for all angles of the triangle, so if α, β, γ are the angles and a, b, c are respectively the opposite sides, we have
2r = a/sin α = b/sin β = c/sin γ
which implies the law of sines:
a/sin α = b/sin β = c/sin γ.
An application of trigonometry to the path of a point moving under a constraint: the
versed sine curve
Maria Agnesi was a mathematician in the 18th century who published an important book on curves. One of the curves she studied was called the “versed sine curve” or versiera (derived from the Latin word for “turn,” vertere). This somehow became avversiera or “witch,” so the curve is often known as the witch of Agnesi.
It was defined as the locus of a moving point. A still picture of it looks like this:
The curve traces out the path of a point p, but before sketching it let's define p carefully. We start with
a circle tangent to two parallel lines. For convenience, we assume the lines are horizontal. We take
the point at the bottom of the circle (not labeled) and call it s. We draw a line from s through the
circle (the straight red line) and call it l; q is where l meets the circle. We extend l until it meets the top horizontal line at point r (also not labeled) and draw the vertical line m through r. The point p is the point on m whose height is the height of q.
That’s the set-up. Then we start moving q around the circle, i.e., we move l. As q moves, we
sketch the movement of p — that’s the red curve below:
We need to refer to this diagram, so let’s repeat it on the same page as our calculations:
Let x be the x-coordinate of p and y the y-coordinate. We want to find x and y as a pair of parametric equations: x is the larger dotted green line; y is the thick part of the diameter of the circle. We are given that the diameter of the circle is 2a.
We let t be the angle between the red and blue solid lines; t is our parameter.
x is easy: x/(2a) = tan t, so x = 2a tan t.
Finding y is a bit trickier. First let z be the short thick green line inside the circle. By the definition of sin, z = (1/2) · 2a sin 2t. (The 2a is because that's the diameter of the circle, not 1.) And sin 2t = 2 sin t cos t. So z = 2a sin t cos t.
By similar triangles, z/y = x/(2a), so
y = (2a · 2a sin t cos t)/(2a sin t/cos t) = 2a cos² t.
If you eliminate t in the parametric equations, you get y = 8a³/(x² + 4a²).
You can check this by substituting the parametric functions for x and y, and using the fact that 1 + tan² t = sec² t = 1/cos² t.
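We can also spot-check the elimination numerically; the function name and the value of a are ours:

```python
from math import tan, cos

# The parametric equations x = 2a*tan(t), y = 2a*cos(t)^2 should satisfy
# the Cartesian form y = 8a^3 / (x^2 + 4a^2).
def witch_point(t: float, a: float):
    return 2 * a * tan(t), 2 * a * cos(t) ** 2

a = 1.5
for t in (0.1, 0.4, 1.0, -0.7):
    x, y = witch_point(t, a)
    assert abs(y - 8 * a ** 3 / (x * x + 4 * a * a)) < 1e-9
```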
Curves before functions
Mathematicians were studying functions for thousands of years before the notion of “function” was
defined — we have already seen conics, what we would call the trigonometric functions, formulas for
finding area and volume, and curves such as the versed sine curve which are defined by constrained
paths, all of which were studied long before our modern notion of function, and all of which are
either functions or closely related to functions. 38
Because algebraic notation did not exist until a few hundred years ago, curves were described
geometrically, usually as a constrained path (called a locus).39 For example, a circle (not a function)
is described as “the locus of all points equidistant from a given point” (called the center). The idea
is that you have a center and a single point p at the desired distance from the center. As p travels
around the center, keeping the same distance, it traces out a circle. The versed sine curve (which
is a function) is another example of a curve described by a locus.
In this chapter we discuss a few more of these curves which arose before functions. Their geometric definitions are complicated, but it’s important to understand that they arose from considerations
of very concrete problems. For example here are three major problems of ancient Greece:
• Duplicating a cube: given a side of a cube s, can you construct (using only straightedge and
compass) a side s∗ of a cube whose volume is exactly twice that of the first cube?
• Trisecting an angle: given an angle α, can you construct (using only straightedge and compass)
another angle whose measure is α3 ?
• Squaring a circle: given a circle, can you construct (using only straightedge and compass) a
square with the same area?
It's important to note that if you take away the phrase “using only straightedge and compass”
all of these constructions can be done (and we’ll see a few examples). It’s the constraint that makes
the following theorem true:
Theorem 2. Doubling a cube, trisecting an angle, and squaring a circle are impossible using only
straightedge and compass.
Here’s a sketch of the proof.
Proof. For doubling a cube: Given a cube whose sides have length a, if x³ = 2a³, then x = a·∛2. But (by methods similar to what's in Stahl's algebra text on the construction of regular polygons) you can't construct the cube root of 2 by straightedge and compass.
For trisecting an angle: If you could trisect α you could construct sin(α/3). So let x = α/3. By the triple angle formula for sin, 4 sin³x − 3 sin x + sin α = 0. I.e., you'd be solving a cubic. By straightedge and compass. Which you can't.
For squaring a circle: Given the radius r you'd need to construct a line of length r√π. But if you can construct √a using straightedge and compass, then you can construct a. And transcendental numbers such as π can't be constructed.
[Footnote 38: It's important to note that not every function describes a curve and not every curve can be described as a function.]
[Footnote 39: Although we have already seen conic curves described in other ways, e.g., by slicing cones.]
The alert reader will notice that we've already used some functions: a simple cubic in the first proof, and trigonometric functions and cubics in the second. But the main purpose of this section is to show that much more complicated functions arise naturally.
Let’s look at trisecting an angle when you’re allowed more techniques than just straightedge
and compass. If you Google “trisecting an angle” you’ll find many techniques for doing this. These
generally involve complicated curves, i.e., functions. We will try to look at these curves and their
uses the way they were originally described, but will find ourselves naturally falling into algebraic
terminology because that’s just how folks think these days.
We give two detailed examples of trisecting an angle: using the Archimedean spiral, and using
the quadratix of Hippias. These are both curves but, failing the vertical line test, neither of them is a function.
As a preliminary step let’s show that, using just straightedge and compass, you can trisect a
straight line segment.
Suppose you want to trisect AB. Draw another line m through A. On this line, mark off some length Ap three times. Now draw parallel lines as in the diagram.
By similar triangles, you’ve just trisected AB.
How Archimedes trisected an angle
Using this, let's trisect an angle using the Archimedean spiral. Given positive constants a, b, the Archimedean spiral determined by a, b is the locus of points p such that if r is the distance from p to the origin, then r − a is b times the measure of the angle θ formed by the line Op (where O is the origin) and the x-axis.40 (By using the variables r and θ we're already being anachronistic. But it's really hard to make sense of the Archimedean spiral without this algebraic notation.) In modern terms, the Archimedean spiral has the following equation in polar coordinates: r = a + bθ, where a, b are constants. That's a lot easier to make sense of. Try sketching it without a graphing calculator when a = 0 and b = 1; when a = 0 and b = 2 [hint: let θ = 2π, π, π/2, π/3, ...].
Now suppose you have an Archimedean spiral and you want to trisect an angle α. Place the angle α so the origin O is the vertex, and the x-axis is one side of the angle. Let p be the point where the spiral intersects the other side of the angle, and let R = Op. Trisect Op to get the length R/3. Construct a circle centered at O with radius R/3. This circle intersects the Archimedean spiral at a point s. Os is a side of the desired angle α/3.
[Footnote 40: Note that angular measurement and linear measurement are equated here. This might not seem so natural to us, but it would have seemed natural to the ancient Greeks, since they thought of an angle measure as a measurement of the length of a segment of a circle.]
In this picture, the dark lines form the original angle α and the blue curve is the spiral r = θ.
How do we know that we've trisected α? Os = Oq = (1/3)Op = (1/3)α, and on the spiral r = θ the point at distance (1/3)α from O sits at angle (1/3)α.
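A small sketch of this for the spiral r = θ; the bisection search is our own stand-in for "finding where the circle meets the spiral":

```python
# For the spiral r = theta (a = 0, b = 1 in r = a + b*theta), the spiral
# meets the ray at angle alpha at distance alpha from O. Finding where the
# circle of radius alpha/3 meets the spiral means solving theta = alpha/3;
# we do it with a generic bisection search on the increasing radius function.
def spiral_theta_for_radius(r_target: float, lo: float = 0.0,
                            hi: float = 10.0) -> float:
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid < r_target:   # spiral radius at angle theta is just theta
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 2.4
theta = spiral_theta_for_radius(alpha / 3)  # where the circle meets the spiral
assert abs(theta - alpha / 3) < 1e-9
```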
How Hippias trisected an angle
Next, let’s trisect an angle using the quadratix of Hippias. What is the quadratix of Hippias?
Here’s how Hippias thought of it: Pick a point O, draw a horizontal line through the point, and let
α be an angle (in radian measure) whose vertex is O with one side coinciding with the horizontal
line; call the other side m. Now let p be the point on the vertical line through O whose distance
41 Draw a horizontal line l through p. The point q where l meets m is a point on the
from O is 2α
π .
quadratix. As α varies, the points q trace out the quadratix.
To see how this curve is constructed, ask yourself what happens when α = π/2, π/3, π/4, ... and so on.
In modern notation, the equation for the quadratix is x = y cot(πy/2). Or, in polar coordinates,
r = 2θ/(π sin θ).
Now let’s return to trisecting an angle. Because of the way p is defined from α in the quadratix,
if you trisect Oq to get the point r (so Or = Oq
3 ), and then draw a horizontal line from r meeting
the quadratix at point s, Os forms the desired angle. [Proof not given.]
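The reason this works is that in Hippias's construction the height of a point on the quadratix is linear in the angle, so trisecting a length trisects the angle. A minimal sketch (modern code, names are mine):

```python
import math

def quadratrix_height(alpha):
    # In Hippias's construction, the point q at angle alpha sits at
    # height 2*alpha/pi above O: height is linear in the angle.
    return 2 * alpha / math.pi

def angle_at_height(h):
    # Invert the relation: the quadratix point at height h corresponds
    # to the angle pi*h/2.
    return math.pi * h / 2

def trisect_with_quadratrix(alpha):
    # Trisect the height, then read the angle back off the curve.
    return angle_at_height(quadratrix_height(alpha) / 3)

print(trisect_with_quadratrix(1.2))  # ≈ 0.4, which is 1.2 / 3
```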
Other curves invented by the ancient Greeks include the conchoid of Nicomedes (about 200 BC),
which satisfies the polar equation r = a + b sec θ — this curve can be used both to duplicate the
cube and trisect an angle — and the cissoid of Diocles (about 180 BC), which satisfies the polar
equation r = 2a sin θ tan θ. And so on — the ancient Greeks discovered many interesting curves
via loci.
Another locus problem due to Apollonius, and discussed by Pappus in the 3rd century CE, was
the following: suppose you have four lines l, k, m, n, and a positive real number r. Find all points
p such that d(p, l) · d(p, k) = r · d(p, m) · d(p, n). This is known as the four line problem. It is worth
mentioning because, when Descartes wrote his Geometry in the 17th century, setting forth much
of what we now know as Cartesian geometry (a.k.a. analytic geometry) (which was independently
developed by Fermat) he specifically cited this problem. I.e., people worked on this problem for
1900 years and still hadn’t solved it.
Once you think of lines having equations of the form y = mx + b, then, because of the formula
for the distance from a point to a line (which you learned in calculus), this problem reduces to
finding all solutions (x, y) to the equation

|m1 x − y + b1|/√((m1)² + 1) · |m2 x − y + b2|/√((m2)² + 1) = r · |m3 x − y + b3|/√((m3)² + 1) · |m4 x − y + b4|/√((m4)² + 1).

Finding all x, y satisfying such an equation is not a trivial task, but look how the problem has
shifted — from geometry to algebra.
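To make the algebraic shift concrete, here is a small sketch (a modern, hypothetical example, not anything Apollonius or Descartes wrote) that evaluates the four line condition for lines given as y = mx + b:

```python
import math

def dist_to_line(x, y, m, b):
    # distance from the point (x, y) to the line y = m*x + b
    return abs(m * x - y + b) / math.sqrt(m * m + 1)

def on_locus(x, y, lines, r, tol=1e-9):
    # the four line condition: d(p,l) * d(p,k) = r * d(p,m) * d(p,n)
    (m1, b1), (m2, b2), (m3, b3), (m4, b4) = lines
    lhs = dist_to_line(x, y, m1, b1) * dist_to_line(x, y, m2, b2)
    rhs = r * dist_to_line(x, y, m3, b3) * dist_to_line(x, y, m4, b4)
    return abs(lhs - rhs) < tol

# A degenerate sanity check: if the first pair of lines equals the
# second pair and r = 1, then every point is on the locus.
lines = [(1.0, 0.0), (2.0, 1.0), (1.0, 0.0), (2.0, 1.0)]
print(on_locus(3.7, -2.2, lines, 1.0))  # True
```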
Note that Decartes’ Geometry was an appendix to his major philosophical work A Discourse on
Method. Scholarship wasn’t split up back then.
which you learned in calculus
Functions and relations
So far we’ve seen curves largely described as paths or locii: how a point moves according to certain
constraints. We also discussed other geometric definitions of hyperbolas, ellipses, and parabolas.
In the ancient world people deduced enough information about curves defined in these ways to
come very close to being able to describe coordinates on a coordinate system. But they didn’t have
coordinate systems.
Meanwhile there were instructions that, when translated into symbols, would have looked like
functions. E.g., "Given a number, square it and add two." We would write this as: x² + 2.
For the two approaches to come together, algebra and geometry had to be conflated. I.e., you
needed a coordinate system, you needed algebraic language, you needed to be able to express
geometric notions (such as distance) algebraically, and you needed to put this all together.
This point of view gives the following: a circle in the coordinate plane centered at (a, b) with
radius R > 0 is defined by the equation (x − a)² + (y − b)² = R². That is, it consists of all points with
coordinates (x, y) which satisfy this equation. This is the point of view you first met in 6th or 7th
grade. Like many other things you met in elementary or middle school (e.g., the alphabet, or place
value notation) the development of these ideas was non-trivial and took a long time. The work of
Fermat and Descartes was crucial here.
Some of these geometric objects behaved, from an algebraic point of view, very nicely: one
variable was completely determined by the other one (or by the other ones in a multivariable
situation). I.e., in modern terminology, they are algebraically defined as functions. The algebraic
descriptions of objects that don’t satisfy this property, such as circles, ellipses, and hyperbolas, are
called relations.
These sorts of objects were very useful to physicists, who wanted to quantify phenomena as
theoretical relationships between variables. In particular, Galileo thought of physics in terms of
what we would now call functions: the value of all but one of the variables predicts the value of
the remaining variable.
Calculus could not have developed without the notion of function, and much of the early theory
of functions (and of physics) came from calculus. The (independent) inventors of calculus were
Newton and Leibniz (who first used the term "function" in 1673). And while we've been discussing
functions in the context of algebraic equations, it's important to note that infinite series were a
major way of describing functions right from the beginning. For example, y = Σ_{n=0}^∞ xⁿ/n!
describes the function y = eˣ.
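For instance, truncating that series gives a quick sketch of how well it describes eˣ (modern code, of course, and the function name is mine):

```python
import math

def exp_series(x, terms=20):
    # partial sum of the series: x^n / n! for n = 0 .. terms-1
    return sum(x ** n / math.factorial(n) for n in range(terms))

print(exp_series(1.0))  # ≈ 2.718281828..., i.e., e
print(math.exp(1.0))    # the same, to within floating-point error
```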
In the late 18th century, mathematicians worked hard on the problem of describing the motion
of a vibrating string, i.e., fix an elastic string at two points, pull a point in between — where
is that point at time t? Major contributors here were d'Alembert, Euler, Daniel Bernoulli, and
Lagrange. This problem led to the notion of partial differential equations, and its solutions were
infinite trigonometric series.
Throughout this history, there was an informal notion of a function as necessarily being very
smooth (we would call such functions infinitely differentiable, and there are generalizations for
higher dimensions), largely because of the connection with physics.
In the 19th century, however, people started thinking of functions that did not have this property.
For example, the Dirichlet function (f(x) = 1 if x is rational, 0 otherwise) has a derivative at no
point (indeed, it is continuous at no point). The notion of function was caught up in the drive to
put mathematics on a formally sound foundation. Fairly simple objects were ambiguous without
this kind of foundation. For example, what is the infinite sum 1 − 1 + 1 − 1 + 1 − ... ? You can
make a case for 1, for −1, for 0, and even (this was Euler's preference) 1/2.
The case for 1: 1 − 1 + 1 − 1 + ... = 1 + (−1 + 1) + (−1 + 1) + ...
The case for −1: 1 − 1 + 1 − 1 + ... = −1 + 1 − 1 + 1 − ... = −1 + (1 − 1) + (1 − 1) + ...
The case for 0: 1 − 1 + 1 − 1 + ... = (1 − 1) + (1 − 1) + ...
The case for 1/2: By definition of an infinite series, 1 − 1 + 1 − 1 + ... = Σ_{k=0}^∞ (−1)ᵏ = lim_{n→∞} Σ_{k=0}^n (−1)ᵏ, and the partial sum Σ_{k=0}^n (−1)ᵏ is 0 if n is odd and 1 if n is even.
Rather than throw up his hands and say the sum was undefined, as we do, Euler took the mean
(0 + 1)/2 to get 1/2.
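You can watch the ambiguity in the partial sums, and see that averaging them (Cesàro's later formalization of Euler's instinct) lands on 1/2. A small sketch:

```python
def partial_sums(n_terms):
    # partial sums of 1 - 1 + 1 - 1 + ...
    s, sums = 0, []
    for k in range(n_terms):
        s += (-1) ** k
        sums.append(s)
    return sums

print(partial_sums(8))        # [1, 0, 1, 0, 1, 0, 1, 0] -- no limit
sums = partial_sums(1000)
print(sum(sums) / len(sums))  # 0.5: the mean of the partial sums, Euler's value
```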
Why did Euler do this? Because of the human propensity to reify (you can apply this analysis
to a lot of things, from unicorns to beauty): when we have a nice formal expression for something
(in this case Σ_{k=0}^∞ (−1)ᵏ), it's very difficult to admit that maybe it doesn't describe anything.
Major work on the problem of finding a firm foundation for the notions of convergence and
smoothness (some of which you learn in Math 500) was done by Bolzano, Cauchy, Abel, Dirichlet,
Weierstrass, Riemann, Cantor, and Fourier.
The modern notion of function really derives from the work of Cantor via tweaking some of
what's in Bourbaki (a conglomerate of French mathematicians in the mid 20th century who fairly
successfully sought to provide the kind of foundational and comprehensive treatment of mathematics
that Euclid managed in an earlier time): a function is a set of ordered pairs f such that if (x, y) ∈ f
and (x, z) ∈ f then y = z.
Which means that “function” is a notion that then can detach itself from the notion of continuous
curve. In fact it doesn’t need geometry at all. You can have a function that takes one function to
another (e.g., the derivative, or the anti-derivative). You can have functions that take one kind of
mathematical object to another (e.g., a ring to its additive group). And so on. This general notion
of function was revolutionary, and is essential to much of contemporary mathematics.
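The set-of-ordered-pairs definition is easy to state in code. A minimal sketch (the function name is mine, not Bourbaki's):

```python
def is_function(pairs):
    """A set of ordered pairs is a function iff no first coordinate
    is paired with two different second coordinates."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True

print(is_function({(1, 2), (2, 3)}))  # True
print(is_function({(1, 2), (1, 3)}))  # False: 1 is paired with both 2 and 3
```

Note that nothing in the definition says the x's and y's are numbers; the pairs could relate functions to functions, or rings to groups.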
Some definitions of function
Below are some quotes (taken from the Mathematical Association of America's CD Historical
Modules for the Teaching and Learning of Mathematics) which give an idea of how mathematicians
eventually found their way to our modern notion of function.
Isaac Newton, 1713: I call any quantity a genitum which is. . . generated or produced in
arithmetic by the multiplication, division, or extraction of the root of any terms whatsoever. . .
These quantities I here consider as variable and indetermined, and increasing or decreasing, as it
were, by a continual motion or flux.
Comment. A function (here called genitum) takes numbers to numbers, is algebraic, and smooth.
Multivariable functions are allowed, but the variables take only numerical values.
Johann Bernoulli, 1718: I call a function of a variable magnitude a quantity composed in any
manner whatsoever from this variable magnitude and from constants.
Comment. Not restricted to algebraic definitions — perhaps he was thinking of functions generated by physics where a formula is not known — nor restricted to smooth functions (but Bernoulli
didn't consider any other kinds). Restricted to functions of one variable. (In fact, no matter how
they defined "function," each of the mathematicians quoted here worked with what we would call
functions of several variables; but, as with Bernoulli, they might not have considered those to be
functions.)
Leonhard Euler, 1748: A function of a variable quantity is an analytic expression composed in
any way whatsoever of the variable quantity and numbers or constant quantities. If, therefore, x
denotes a variable quantity, then all quantities which depend upon x in any way or are determined
by it are called functions of it.
Comment. "Analytic expression" is key here. Again, numbers go to numbers, and multi-variable
functions aren't described. Smoothness isn't mentioned, but Euler didn't consider other sorts of
functions.
Leonhard Euler, 1755: If some quantities so depend on other quantities that if the latter are
changed the former undergo change, then the former quantities are called functions of the latter.
Comment. Here Euler doesn’t mention analytic expression or any other sort of expression. This
seems closer to physics. Multi-variable functions are allowed.
Joseph-Louis Lagrange, 1797: We define a function of one or more quantities [as] any mathematical expression in which those quantities appear in any manner, linked or not with some other
quantities that are regarded as having given and constant values, whereas the quantities of the
function may take all possible values.
Comment. Numbers go to numbers: “mathematical expression” is key here. Multi-variable
functions allowed.
Jean Baptiste Joseph Fourier, 1822: In general, the function f (x) represents a succession of
values or ordinates each of which is arbitrary. An infinity of values being given to the abscissa x,
there is an equal number of ordinates f (x) . . . We do not suppose these ordinates to be subject to
a common law; they succeed each other in any manner whatever, and each of them is given as if it
were a single quantity.
Comment. Fourier is freeing the notion of function from intelligibility — a function is a function
whether or not we know how to generate it. He is gluing the notion of function to the xy-plane.
No multi-variable functions.
Nikolai Ivanovich Lobachevsky, 1834: General conception demands that a function of x be called
a number which is given for each x and which changes gradually together with x. The value of
the function could be given by an analytical expression, or by a condition which offers a means
for testing all numbers and selecting one of them; or, lastly, the dependence may exist but remain
unknown.
Comment. Lobachevsky, like Fourier, doesn’t demand that functions come with generating rules,
but he’s still talking about numbers turning into numbers. The use of “gradually” implies some
kind of continuity. Again, not multi-variable.
Karl Weierstrass, 1861: Two variable magnitudes may be related in such a way that to every
definite value of one there corresponds a definite value of the other; then the latter is called a
function of the former.
Comment. Functions are still about numbers. Like Lobachevsky and Fourier, intelligibility isn’t
necessary for a function to be a function. Not multi-variable.
Hermann Hankel, 1870: y is called a function of x when to every value of the variable quantity
x inside of a certain interval there corresponds a definite value of y, no matter whether y depends
on x according to the same law in the entire interval or not, or whether the dependence can be
expressed by a mathematical operation or not.
Comment. Hankel wants the domain to be a union of intervals. Functions are still about
numbers. Not multi-variable. Not requiring "the same law" allows piecewise definitions.
Nicolas Bourbaki, 1939: Let E and F be two sets, which may or may not be distinct. A relation
between a variable element x of E and a variable element y of F is called a functional relation in y
if, for all x an element of E, there exists a unique y an element of F which is in the given relation
with x.
Comment. Now a function can go from any set to any other set. We aren't restricted to numbers.
Multi-variable functions are included by allowing E to be a set of tuples, e.g., if E is a set of pairs,
then the function can be described by f(x, y) = z. The word "relation" comes from formal logic
and doesn't imply intelligibility (i.e., we need not have a formula or any other way to find y given
x).
Which is where we stop, because Bourbaki's is the modern notion of function, often expressed
as: a function is a set of ordered pairs S where if (x, y) ∈ S and (x, z) ∈ S then y = z.
Bourbaki’s definition builds on the exposition of set theory by Cantor and, later, Zermelo: their analysis of all of
mathematics in terms of sets, and the consequent formal definition of ordered pair.
Algebra
For thousands of years there was sophisticated mathematics centered around what we would call
polynomials, even though there was no notation in which one could easily denote anything we
would recognize as a polynomial. Eventually people started looking at systems of numbers, then
systems of functions (e.g., permutations; real-valued functions). It is only within the last 200 years
that the more abstract systems into which all of these things could be embedded were developed.
Here's a rough outline of how this played out.
Babylonia 3500 BCE to 600 CE: Could calculate square roots, understood linear interpolation,
had something analogous to exponential and logarithmic tables; could solve linear and quadratic
equations and even a few special cases of equations of higher degree. Largely practical; carefully
worked out examples took the place of proofs. For example, 4000 years ago they were asking
problems like: "Given an interest rate of 60 per month, compute the doubling time."
China from 1600 BCE: Centralized bureaucracy meant the central importance of mathematics
— taxes, standardized weights and measures, commerce and salaries, and so on — from a very early
time. Yet in 212 BCE, nearly all of China’s mathematical works (as well as many other written
works) were destroyed at the command of the emperor. Chinese mathematics recovered quickly,
as witnessed by a major text from the Han dynasty (circa 200 BCE to 200 CE), the Nine Chapters
on the Mathematical Art. It consisted of 246 problems divided into 9 chapters:
1. Field Measurement: finding area and computing with fractions.
2. Cereals: proportion problems (exchanging one kind of grain for another).
3. Distribution by proportion: more proportion.
4. What width?: given the area or volume, find the lengths of sides, i.e., finding square roots and cube roots.
5. Construction calculations: calculations involved in construction, especially volume.
6. Fair taxes: how to distribute grain and labor based on population and distance.
7. Excess and deficiency: the method of false position.
8. Rectangular arrays: simultaneous linear equations, adding and subtracting positive and negative numbers.
9. Gougu: the Chinese name for the Pythagorean theorem, hence the Pythagorean theorem and applications.
Nine Chapters on the Mathematical Art was largely practical. Carefully worked out examples mostly took the place of proofs, but there was some theoretical
discussion. Later Chinese mathematicians developed sophisticated algebraic concepts such as negative numbers and matrices. It’s important to note that, until the last two or three hundred years,
Chinese mathematics developed essentially independently from mathematics elsewhere, with very
little communication with other cultures.
The following statement from the Zhoubi sianjing is a good explanation of how most ancient
cultures thought about mathematical reasoning: “A person gains knowledge by analogy, that is,
after understanding a particular line of argument they can infer various kinds of similar reasoning...
Whoever can draw inferences about other cases from one instance can generalize... To be able to
deduce and then generalize... is the mark of an intelligent person." (Quoted in the MacTutor overview of Chinese mathematics.)
India 1000 BCE to 1200 CE: Ancient India was crucial to the development of algebra in Europe.
They invented 0 and our decimal number notation. They were nuts about number theory and
problems we would describe as solutions to algebraic equations. The surviving literature is both
theoretical and practical. In particular, the translation of Brahmagupta’s Siddhanta into Arabic
was seminal.
Greece 800 BCE to 800 CE: Unlike the Babylonians and the Chinese, Greek mathematicians were
largely theoretical. They could solve some quadratic and cubic equations. But the way they thought
about algebra was quite different from the way we do, since they thought of most mathematics in
geometrical terms. For example, they didn’t have a clear concept of a variable quantity (although
they did have a clear concept of a variable point, that is, a point moving according to constraints).
We’ll do some activities relating to their notion of geometric algebra, which was central to what
they did.
Islamic culture 700 to 1200 CE: Algebra really explodes in the Islamic culture of Arabia and
Persia. In particular, al-Khwarizmi’s discussion of quadratic equations had a revolutionary effect;
in some sense the notion of variable can be traced to him, and he developed the unifying theory we
now call algebra in which, for example, numbers of different kinds are all treated as (what we would
now call) algebraic objects. The way we think of number theory and algebra — so different from
the Greek geometric approach — largely is rooted in his work, and this approach was taken up by
other Arabian and Persian mathematicians, although they did not abandon the geometric approach
(using it, for example, to solve cubic equations). The word algebra comes from al-Khwarizmi's
use of the Arabic term al-jabr, "the reunion of broken parts." (And al-Khwarizmi's name gave us our
word algorithm.)
Europe 1200 CE to present: European mathematicians learned a lot from Arabian ones and
took the direction even further. Explicit algebraic solutions to cubics and quartics were found. If
you’re curious about these formulas, look them up on Google — they are quite complicated. What
about quintics (= degree 5)? In the mid 1820’s, Niels Henrik Abel showed that there is no formula
by which you can solve an arbitrary quintic equation. None. Zero. Zip. This was absolutely
revolutionary. Also in the early 19th century, Evariste Galois generalized this kind of thing into
notions that we now call groups and fields — Math 558 deals with this stuff. Over the course of the
19th century, algebra became less and less about individual polynomials and more and more about
abstract structures (such as groups and rings and fields) — the set of polynomials, for example,
forms a ring under the operations of + and ×.
Over the same centuries, in a reverse of the Greek attitude, algebra became a way of doing
geometry (think of the Cartesian coordinate system) and both algebra and geometry intertwined
in ways very different from how they did two thousand years earlier. Geometric objects (such
as symmetries) became objects in algebraic systems — this is discussed in Math 409 — and the
interplay between algebra and geometry (and its generalizations) led to entire new fields, such as
algebraic geometry, or algebraic topology.
So a good way to describe the development of what we call algebra would be: India to Arabia
(which includes, in this shorthand, Persia) to Europe. The sophisticated methods of the Babylonians, Chinese, and ancient Greeks — and their work was highly sophisticated — had little influence.
Now for some examples of how people thought of algebra before there was algebra.
Square roots
Here’s how the Babylonians calculated square roots:
To calculate r: given an approximation sn , define sn+1 = 21 (sn +
s1 .
sn ).
You can start from any
In other words, the Babylonian technique provided a sequence of better and better approximations. The further out you took this, the closer your result was the the actual square root.
Since the Babylonians didn’t provide proofs, we don’t know how they knew this worked. Here’s
a modern proof that it works.
Suppose s = limn→∞ sn where sn is defined as above. Then s = limn→∞ sn+1 , i.e., s =
limn→∞ 12 (sn +
sn ).
+ rs ),
So s = 12 (s
i.e. 2s2 = s2 + r. I.e., s2 = r, as desired.
Let’s try 2, s1 = 1. The function we want to iterate is y = 12 (x + xr ).
s2 = y(1) = 1.5; s3 = y(s2 ) = y(y(1)) = 1.466....; s3 = y(s2 ) = y(y(y(1))) = 1.414215...
The exact value of 2 is 1.414213... — in three steps we’ve come within 5 decimal places.
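Here is the Babylonian rule as a few lines of (very non-Babylonian) Python:

```python
def babylonian_sqrt(r, s1=1.0, steps=10):
    # iterate s_{n+1} = (s_n + r/s_n) / 2, starting from s1
    s = s1
    for _ in range(steps):
        s = (s + r / s) / 2
    return s

print(babylonian_sqrt(2))   # ≈ 1.4142135623730951
print(babylonian_sqrt(50))  # ≈ 7.0710678..., even from the bad start s1 = 1
```

The convergence is very fast: the number of correct digits roughly doubles at each step.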
Try this with √3 [= 1.73205...], again with s_1 = 1. How many steps until you're within 5 decimal
places?
Just for fun, try it with √50 [= 7.071067...], again with s_1 = 1, just to see how things converge
when s_1 is a really bad estimate. Again, how many steps until you're within 5 decimal places?
The proof we used to show that this method worked actually only shows: if the sequence (s_n)_n
converges, then its limit s satisfies s = √r.
Here's a sequence that doesn't converge, but you can prove that if it converged it would converge
to √r: s_{n+1} = 2s_n − r/s_n.
If the sequence did converge, and s = lim_{n→∞} s_n, then s = 2s − r/s, so (after a little junior high
algebra) r = s², i.e., s = √r.
But try this when r = 2. The function being iterated is y = 2x − r/x. First let s_1 = 1:
s_2 = 2 − 2 = 0; s_3 = ... whoops, s_3 isn't defined. Try s_1 = 1.5; you get a sequence that goes to ∞.
Try s_1 = 1.4; you get a sequence that goes to −∞. The hypothesis (s = lim_{n→∞} s_n) is false, so the
conclusion (s = √r) need not be true and, in these examples, is in fact meaningless — there is no
s to talk about.
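The badly-behaved iteration, sketched the same way (starting from s_1 = 1 would crash with a division by zero at the next step, matching the undefined s_3 above):

```python
def bad_iteration(r, s1, steps):
    # iterate s_{n+1} = 2*s_n - r/s_n; sqrt(r) is a fixed point, but
    # the iteration runs away from it instead of settling toward it
    s = s1
    seq = [s]
    for _ in range(steps):
        s = 2 * s - r / s
        seq.append(s)
    return seq

print(bad_iteration(2, 1.5, 8))  # blows up toward +infinity
print(bad_iteration(2, 1.4, 8))  # drifts down, then shoots toward -infinity
```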
By the way, if you go to MathWorld you’ll find this algorithm credited to Newton. He came
several thousand years later. But by then what the Babylonians knew had been forgotten.
For the algorithm I learned in school (back in the Pleistocene era when things like this were
taught in fifth grade),50 go to the second method (“Finding square roots using an algorithm”) at
Geometric algebra
When people realized that not every number was rational (this is attributed to the Pythagoreans,
5th century BCE), the notion of number became a bit fraught. Lengths were more comfortable
to work with, because they seemed more concrete, more like things in the real world. So a very
elaborate algebraic apparatus arose which allowed people to prove what we would call algebraic
theorems, but which they thought of as theorems about length, area, and volume.
The easiest example is (a + b)² = a² + 2ab + b². Remember that this notation didn't arise until
well over 2000 years after Pythagoras. How did the ancient Greeks do it?
Note that taught ≠ learned.
The area of the big square is (a + b)². The area of the yellow square is a². The area of the green
square is b². The area of each blue rectangle is ab. So the area of the big square is also a² + 2ab + b².
Now let’s sketch a more complicated example.
First of all, it was known (by an argument similar to the one about (a + b)2 , but much more
a−b 2
complicated) that ab = ( a+b
2 ) − ( 2 ) . The question we’ll ask is: how can you find x so that
x = ab? To translate into geometric terms, given a, b how can you construct the side of a square
whose area is ab? Here’s how it was done.
In the diagram below, AB has length a + b, O is the midpoint of AB, and OC has length
d = a−b
2 . DC ⊥ AB, where D is on the circle about O of radius c = 2 .
O d
a−b 2
Hence, by the Pythagorean theorem, x2 = ( a+b
2 ) − ( 2 ) = ab.
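The construction can be sanity-checked numerically, in a modern sketch of the Greek picture (the function name is mine):

```python
import math

def square_side_for_area_ab(a, b):
    # x = DC in the construction: x^2 = ((a+b)/2)^2 - ((a-b)/2)^2 = ab
    c = (a + b) / 2   # radius of the circle about O
    d = (a - b) / 2   # the length OC
    return math.sqrt(c * c - d * d)

print(square_side_for_area_ab(9, 4))  # 6.0, and indeed 6^2 = 36 = 9 * 4
```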
Note what is going on here: instead of looking for ways to calculate a number, the ancient
Greeks were looking for ways to construct a length.
The general question of "construct a square whose area is a given area" was of great interest
in antiquity. The particular instance "construct a square whose area is the area of a
given circle" was known as the problem of "squaring a circle," which was discussed in chapter
6. Recall that it can't be done, because then π could be constructed as the length of a straight
line segment, and while some irrational lengths (e.g., √2) can be constructed, the lengths that
can be constructed are algebraic, that is, they come about by iterations of arithmetic operations
(including taking integer roots) on integers. But π is not algebraic — this was not known until
the 19th century, when it was proved by von Lindemann in 1882, a mere 38 years after Liouville
proved the existence of transcendental (= non-algebraic) numbers in 1844.
Prime numbers
Recall that a positive integer is prime iff it has exactly two factors: itself and 1. Thus 1 is not
prime, while 2, 3, 5, 7, 11, ... are prime.
How can you tell if a number is prime? The oldest method we know is the sieve of Eratosthenes
(dating back to at least 2nd century BCE Greece), conceptually very simple but in practice quite
difficult to execute. Do you want to know if n is prime? Simply try dividing n by all positive
integers k with 1 < k < n. If none of them divides n, then n is prime. It's not hard to see that in
fact you only need to try to divide n by all positive integers k ≤ √n. The fundamental theorem of
arithmetic (every positive number ≥ 2 is a product of primes in a unique way; see below) makes
the algorithm a little easier: you only need to try to divide n by all primes k ≤ √n.
But if, say, n has 10,000 digits, then the original version would take roughly 10^10,000 steps if n
were prime; in the second version of the sieve of Eratosthenes you'd be dividing n by all k with 5,000
or fewer digits, hence you'd have about 10^5,000 steps; and in the third version you'd be dividing by
all primes with 5,000 or fewer digits, and think of how much work you had to do to find those.
This is not an academic exercise: much of modern computer security depends on the difficulty of factoring large numbers.
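Trial division up to √n, the improved version described above, fits in a short sketch:

```python
def is_prime(n):
    # trial division by all k with k*k <= n, i.e., k <= sqrt(n)
    if n < 2:
        return False
    k = 2
    while k * k <= n:
        if n % k == 0:
            return False
        k += 1
    return True

print([n for n in range(2, 30) if is_prime(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Dividing only by primes up to √n would be faster still, but first you'd have to find those primes.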
How many prime numbers are there? As you probably learned in elementary school
Theorem 3. (Euclid’s prime number theorem) There are infinitely many prime numbers.
The first recorded proof is due to Euclid:
Proof. Suppose S is the set of all primes, and suppose S is finite. Let N be the product of all the
primes in S. What about N + 1? It’s bigger than any element of S, i.e., bigger than any prime, so
it isn’t prime. By the fundamental theorem of arithmetic (see below) some prime, say n, divides
it. Since n ∈ S, n divides N and n divides N + 1. So n divides N + 1 − N = 1, i.e., n = 1 which is
not a prime. Contradiction. Therefore S is infinite.
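Euclid's argument is effective: from any finite list of primes it actually produces a prime not on the list. A sketch (the function name is mine):

```python
def euclid_new_prime(primes):
    # Form N = (product of the given primes) + 1. The smallest factor
    # of N greater than 1 is prime, and it is not among the given
    # primes, since each of those leaves remainder 1 when dividing N.
    N = 1
    for p in primes:
        N *= p
    N += 1
    k = 2
    while N % k != 0:
        k += 1
    return k

print(euclid_new_prime([2, 3, 5, 7]))  # 211: 2*3*5*7 + 1 = 211, itself prime
```

Note that N + 1 itself need not be prime (e.g., 2 · 7 + 1 = 15); the point is only that its prime factors are new.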
Let’s ask a similar question: how many pairs of twin primes are there? Here we define n, m to
be twin primes if n, m are both prime and either n = m + 2 or m = n + 2. For example, 3, 5 is such
a pair, as is 5, 7, and 11, 13, and 17, 19... The twin prime conjecture says that there are infinitely
many such pairs. Is the twin prime conjecture true? Nobody knows. This simple, elegant problem
remains unsolved.
3,5,7 is a finite sequence of primes with an interesting property: you get from one to the next
by adding a constant, in this case 2: 3 + 2 = 5, 5 + 2 = 7. Such a sequence is called an arithmetic
sequence. Since 7 + 2 = 9, which is not a prime, this particular arithmetic sequence of primes
stops: 3, 5, 7 and no more.
There are many infinite arithmetic sequences, for example 2, 4, 6, 8, ...; or 2, 5, 8, 11, ... Infinite
arithmetic sequences cannot consist of only primes. Why? Every infinite arithmetic sequence has
the same form: a, a + b, a + 2b, a + 3b, a + 4b, ... But then a + ab = a(1 + b) is in the sequence,
and when a > 1 it is not prime. (If a = 1, note instead that 1 + (2 + b)b = (1 + b)² is in the
sequence, and it is not prime.)
However, there are infinite arithmetic sequences which contain infinitely many primes. In 1837
Dirichlet proved that a, b are relatively prime51 iff the sequence a, a + b, a + 2b, a + 3b... contains
infinitely many primes. You can think of this theorem as a consolation prize for not having an
arithmetic sequence all of whose members are prime.
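A quick sketch of Dirichlet's theorem in action for the sequence 2, 7, 12, 17, ... (here a = 2, b = 5, which are relatively prime; the function names are mine):

```python
def primes_in_progression(a, b, count):
    # the first `count` primes of the form a + k*b; Dirichlet's theorem
    # guarantees the search terminates when gcd(a, b) = 1
    def is_prime(n):
        return n >= 2 and all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))
    found, k = [], 0
    while len(found) < count:
        if is_prime(a + k * b):
            found.append(a + k * b)
        k += 1
    return found

print(primes_in_progression(2, 5, 5))  # [2, 7, 17, 37, 47]
```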
i.e., have no common factors other than 1
Is there a bound on finite arithmetic sequences of primes? That is, is there some number N
so that any arithmetic sequence of primes has at most N elements? Put another way, are there
arbitrarily long arithmetic sequences all of whose members are prime? This would be another
consolation prize for having no infinite arithmetic sequence of primes.
The answer is due to Ben Green and Terence Tao, and was proved in 2004:
Theorem 4. The set of prime numbers contains arbitrarily long arithmetic sequences.
You can access their paper on the arXiv. It's worth seeing
the abstract, which refers to notions like positive density and pseudorandomness. These concepts
come from areas like analysis and probability theory, and give yet more evidence for the close
interrelationships among various areas of mathematics.
Now let’s go back to the ancient world. Why did people pay attention to prime numbers? A
major reason is that they form the building blocks of the natural numbers.
Theorem 5. (The fundamental theorem of arithmetic) Every positive integer ≥ 2 factors into
primes and, except for rearrangement, this factorization is unique.
The fundamental theorem of arithmetic (FTA) is thousands of years old. You implicitly learned
it soon after you learned how to multiply, most probably by means of factor trees. Here's an
example: 72 = 4 × 18 = 2 × 2 × 3 × 6 = 2 × 2 × 3 × 2 × 3 = 2³ × 3².
We could have done it another way, for example: 72 = 2 × 36 = 2 × 4 × 9 = 2 × 2 × 2 × 3 × 3 = 2³ × 3².
And so on. But however we started we would have ended up in the same place: 72 = 2³ × 3².
FTA has two parts: (1) Any positive integer ≥ 2 is a product of primes. (2) Except for
rearrangement, this factorization is unique. Most ancient cultures did not write down formal
proofs, so it’s not clear why people believed the FTA other than the fact that it worked whenever
you tried it. The Greeks did write down formal proofs, and Euclid’s Elements contained a proof.
Or, rather, a “proof” — his attempt at proving (2) was not sufficient. We prove (1).
Suppose some positive integer ≥ 2 is not a product of primes. Then there is a smallest positive
number n > 1 which is not a product of primes.52 Since n is not prime, there are k, m with
1 < k ≤ m < n and n = mk. By definition of n, both m and k are products of primes. But then n
is a product of primes. Which contradicts our assumption about n.53
FTA looks backward: given a positive integer n ≥ 2 you factor it into smaller primes. Another
way of looking backward is to ask, given a number n, how many primes are smaller than n. We
define π(n) = the number of primes ≤ n. For example, π(2) = 1, π(−1) = 0, π(π) = 2 (because
2, 3 ≤ π — note that we’re using π in two ways: as a function and as a number), π(5) = π(6) = 3
and π(7) = 4. What can we say about the function π?
This step uses the fact that every set of positive integers has a least element. This is a fundamental property
of whole numbers, equivalent to the principle of induction. It’s so fundamental that it’s often used without being
noticed, in the same way that we breathe without realizing we are breathing oxygen.
You can also prove (1) by mathematical induction. If you’re familiar with induction, this is a nice exercise.
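The counting function π defined above is easy to sketch in code (the names `is_prime` and `prime_pi` are ours, not the notes'):

```python
# pi(x) = the number of primes <= x, computed by testing each candidate.
def is_prime(k):
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def prime_pi(x):
    """Number of primes <= x (x may be any real number)."""
    return sum(1 for k in range(2, int(x) + 1) if is_prime(k))

print(prime_pi(2), prime_pi(3.14159), prime_pi(7))  # 1 2 4
```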
Theorem 6. (prime number theorem) lim_{n→∞} π(n)/(n/ln n) = 1.
Why is this theorem given such an important name? The prime number theorem is a deep
result about the distribution of primes among the positive integers. It says that if n is very large,
then π(n) ≈ n/ln n. It was first conjectured in 1796 by Legendre, and proved independently by
Hadamard and de la Vallée Poussin in the late 19th century. Its proof involves the deep analytic theory of
complex numbers, in particular the Riemann ζ-function.54
Not every non-ancient theorem about primes involves analysis. Here’s a typical “elementary
number-theory” type theorem about primes.
Theorem 7. (Fermat's little theorem) Let p be prime. Then for all n, n^p − n is divisible by p.
This theorem was first stated by Fermat, and proved by Leibniz in 1683. It is called Fermat’s
little theorem to distinguish it from Fermat’s last theorem, also not proved by Fermat (in fact,
proved only in the late 20th century).55
The proof of Fermat’s little theorem is a straightforward induction proof, so let’s do it.
Fix p, and let n vary.
If n = 1, then 1^p − 1 = 0, which is divisible by every nonzero integer, so certainly by p.
Hence Fermat’s little theorem holds for n = 1.
Now suppose Fermat's little theorem holds for n. We will show that it holds for n + 1:

(n + 1)^p − (n + 1) = [the binomial expansion of (n + 1)^p] − (n + 1)
= [n^p + (the sum of a whole bunch of terms, all of which are multiples of p) + 1] − (n + 1)
= (n^p − n) + (p × [something])

(The middle terms of the binomial expansion are multiples of p because each binomial coefficient C(p, k) with 0 < k < p is divisible by the prime p.) By induction hypothesis, n^p − n is divisible by p. And p × [something] is divisible by p. So
(n + 1)^p − (n + 1) is divisible by p.
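We can also check the theorem empirically (a quick sketch of ours, not part of the proof):

```python
# For each prime p, n**p - n should be divisible by p for every n we try.
for p in [2, 3, 5, 7, 11, 13]:
    assert all((n ** p - n) % p == 0 for n in range(1, 200))

# Primality matters: the analogous statement can fail for composite p.
print((2 ** 4 - 2) % 4)  # 2, not 0
```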
When we talked about the sieve of Eratosthenes we mentioned that factoring and primes are
crucial in making the security codes work. The basic idea is: it’s easy to multiply numbers. But
there are numbers which are hard to factor. Here “easy” and “hard” refer to how many steps are
used by an algorithm.
Example. It's easy to multiply 47 × 31 = 1457. The usual algorithm is 3 steps: 1 × 47; 30 × 47;
add. In fact any multiplication of two two-digit numbers (an input of 4 digits, two for each number)
takes at most three steps. But going backwards is harder: you have to test all the prime numbers
≤ √1457 to find the first factor... I.e., you have to check 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31 —
11 steps before you get the first factor! So this input of 4 digits gives us 11 steps to work through.
And of course a 4 digit prime will give us even more steps, possibly as many as 25.
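The step-counting in this example can be sketched as follows (our illustration; `first_factor_steps` is a made-up helper, and "step" here means "one prime tested"):

```python
# To find the first prime factor of n, test each prime up to sqrt(n) in turn.
import math

def first_factor_steps(n):
    steps = 0
    for k in range(2, math.isqrt(n) + 1):
        if all(k % d for d in range(2, k)):  # k is prime
            steps += 1                       # one more prime tested
            if n % k == 0:
                return k, steps
    return n, steps                          # no factor found: n is prime

print(first_factor_steps(1457))  # (31, 11): 11 primes checked, 31 divides 1457
```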
The most famous algorithm for codes is the RSA algorithm, named for its inventors (Rivest,
Shamir, Adleman). It's famous because it's the first algorithm in which a key is sent publicly, a
message is sent publicly back, but nobody except the person sending the key (we'll call her Ann)
and the person sending the message (we'll call her Betty) can decode the message reliably. I.e.,
Ann publicly sends out a key to the code that Betty must use. Betty codes a message, transforms
the code, and then sends her transformation to Ann. And even if I knew both messages it still
would be prohibitively difficult for me to decode them.
54 ζ(z) = Σ_{n=1}^∞ 1/n^z.
55 Fermat's last theorem says that if there are positive integers a, b, c, k with a^k + b^k = c^k then k ≤ 2. Its proof was
announced by Andrew Wiles in 1993, an error was discovered in the proof, and a correct proof was given in 1994,
published in 1995. Unlike the simple proof of Fermat's little theorem, the proof of Fermat's last theorem involves
advanced areas of mathematics, including algebraic geometry, Galois theory, ring theory, and analytic number theory.
Step 1. Ann chooses two (very large) primes p, q and lets n = pq. She also chooses a number
e < (p − 1)(q − 1) so that e has no common factors with (p − 1)(q − 1). e is called the public key
exponent. She publicly sends a message (n, e) to Betty — she can put it on Facebook, Twitter,
talk about it on Ellen, engrave it on her silverware and write it on the sidewalk in permanent ink.
None of this matters. The pair (n, e) is called the public key.
Step 2. Betty codes her message by a pre-determined method whose details depend on Ann’s
key56 and gets a single number as the code, call it m. She doesn’t send m to Ann; instead she
sends c = me (mod n). Maybe one of the neighbors works for Wikileaks57 and publishes Betty’s
message. Betty doesn’t care.
Step 3. Meanwhile Ann has secretly calculated d = e^−1 (mod (p − 1)(q − 1)). d is called the private key.
Step 4. Ann receives Betty's message! With trembling hands she unlocks the code by calculating
c^d (mod n). Because that's m: m = c^d (mod n). And since Ann knows the predetermined method,
if she knows m she knows the message.
But if you knew the predetermined method and the key and c, you still wouldn't know the message.
Why does this work? There are two aspects:
1. Why does m = c^d (mod n)? The answer to this is complicated. You can't say "because
(m^e)^d = m" because it is not in general true in modular arithmetic that if x = y^−1 (mod n)
then (a^x)^y = a (mod n). Example: (3^2)^3 = 4 (mod 5), even though 3 = 2^−1 (mod 5). But in
this situation n is a product of two primes, n = pq, and e is relatively prime to (p − 1)(q − 1). In this
situation you can prove that (m^e)^d = m (mod n).58
2. Why is this code hard to crack? Because finding d is difficult. If you know p, q then it’s not
difficult — there are formulas that help you. But if you don’t know p, q then it’s hard. And if both
p, q have, say, 1500 digits, then there isn’t enough computer time in the universe to factor n.
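A toy run of the scheme just described, with tiny illustrative primes (all specific numbers here are our own example; real keys use primes with hundreds of digits):

```python
# The private key d is the inverse of e modulo (p-1)(q-1), which is easy to
# compute if you know p and q, and believed hard to compute if you only know n.
p, q = 61, 53
n = p * q                    # public modulus: 3233
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent, no common factor with phi
d = pow(e, -1, phi)          # private key: 2753 (modular inverse, Python 3.8+)
m = 65                       # Betty's encoded message
c = pow(m, e, n)             # the transformed message Betty sends: 2790
assert pow(c, d, n) == m     # Ann recovers m with the private key
print(n, e, d, c)            # 3233 17 2753 2790
```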
A final comment on factoring: the problem of factoring is an example of an NP problem. NP
stands for non-deterministic polynomial time, and an NP problem is one whose answer is easy to
check, though perhaps hard to find. Problems that are easy to do are called P (for polynomial time)
problems. For example, if I give you a factorization of a number n, you can easily check whether
my factorization is correct. But finding that factorization seems hard. So factoring is NP.
Multiplication, as we have seen, is P.
Maybe the problems we think are hard only seem this way. Maybe there are fantastically clever
ways of making them easy. Maybe P = NP. If P = NP then our entire way of life, based as it is
on codes to protect privacy and keep secrets, is at risk: just about everything
we do online or with a credit card will no longer be secure; everything banks do to transfer money,
everything stock exchanges do to sell stocks, all kinds of records that governments and businesses
keep, all kinds of information transfer in industry and defense — all of that will come crashing down.
56 agreed on between Ann and Betty ahead of time; even if you knew this method you still wouldn't be able to
figure out what Betty's code is.
57 which still exists.
58 The proof is non-trivial, so we won't give it.
Whether P = NP is such an important problem that the Clay Institute has offered $1,000,000 to
whoever solves it. But an interesting aspect of this problem is that it might be solved and we just
don't know it. This is because if whoever solves it works for the National Security Agency
or a similar organization, that person is sworn to secrecy, and not only cannot collect the
$1,000,000 but can't tell anyone without a very high security clearance.59
Homework problem #7 is about a different approach to P = NP.
59 This might sound unlikely, but aspects of the RSA algorithm were apparently known in the NSA long before
anyone else had thought of them. It was only after outsiders figured them out that people in the NSA were able to
talk about them publicly.
Negative and complex numbers
Negative numbers were known in China (200 CE), Greek Alexandria (Diophantus, 250 CE), India
(500 CE), and Arabia (800 CE) but were not known in Europe until the 13th century (Fibonacci).
What took everyone so long? Early algebraic reasoning was often geometric (ancient Greece to
Arabia, until al-Khwarizmi) and numbers were seen as lengths, so what kind of number could a
negative number be?
On the other hand, negative numbers arise somewhat naturally in commerce. Consider debt, or
loss. But rather than saying that someone had net assets of -$300, say, people would say that they
owed $50 and had lost their remaining $250 in assets when the town flooded. Even today, people
tend to do that.
Early negative numbers often seemed to have an operative meaning. For example, if you said
“add -2 to 3” this was seen as shorthand for “subtract 2 from 3.” Remember, there was no real
algebraic or even arithmetic notation. People did mathematics using words.
Negative numbers could appear in equations (that is, sentences that we would translate into
equations), but they were not accepted as solutions. They were described pejoratively: “absurd,”
or “unacceptable...”
The history of negative numbers is closely linked to the history of mathematical notation. For
example, Shang numerals, used in China in the 16th century BC, had place notation, notation
for 0 (as a space, not a symbol), and by 263 CE had been adapted to include negative numbers.
Similarly, negative numbers had a place in Diophantus’ syncopated algebra, and in Indo-Arabic
notation (which developed in India and was adopted by Arabian and Persian mathematicians). In
all of these cultures the rules for adding and subtracting negative numbers were known early, and
the rules for multiplication and division of negative numbers came later.
Because of the close relation between the historical development of our understanding of negative
numbers and the historical development of notation, the two can’t be considered separately, and
we will take a short detour into India to see how people talked mathematics and talked about numbers.
In 625 CE the great mathematician Brahmagupta knew quite a bit about negative numbers,
and even considered division by 0, claiming that n/0 is infinity, since “In this quantity consisting of
that which has zero for its divisor, there is no alteration, though many be inserted or extracted;
as no change takes place in the infinite and immutable God, at the period of the destruction or
creation of worlds, though numerous orders of beings are absorbed or put forth.”
The 9th century Indian mathematician Mahavira made an eloquent claim for the importance of
mathematics: “In all those transactions which relate to worldly ... or ... religious affairs, calculation
is of use. In the science of love, in the science of wealth, in music and in the drama, in the art of
cooking, and similarly in medicine and in things like the knowledge of architecture; in prosody, in
poetics and poetry, in logic and grammar and such other things,...the science of computation is held
in high esteem. In relation to movements of the sun and other heavenly bodies, in connection with
eclipses and the conjunction of the planets ... it is utilized. The number, the diameter and the perimeter
of islands, oceans and mountains, the extensive dimensions of the rows of habitations and halls
belonging to the inhabitants of the world, ... all of these are made out by means of computation.”
You might guess that the line between mathematical language and the rest of language was
not tightly drawn, and this is clear from the following mathematical problem stated by the 12th
century mathematician Bhaskara:
“The square root of half the number of bees in a swarm
Has flown out upon a jasmine bush;
Eight ninths of the swarm has remained behind;
A female bee flies about a male who is buzzing inside a lotus flower;
In the night, allured by the flower’s sweet odor, he went inside it
And now he is trapped!
Tell me, most enchanting lady, the number of bees.”
Note that the problem is asked of a woman he is trying to please — instead of bringing flowers,
he brings math problems. Mahavira's "science of love" indeed.
End of detour.
While notation for numbers was highly developed in a number of cultures (China, India, Arabia
and Persia), notation that would be useful for algebra developed later, in Europe.
In the 12th century the Europeans started getting active. In 1489 the German mathematician
Johann Widman used the symbols + and - for the first time. That’s right, it wasn’t until the late
15th century that people wrote things like 2+3.
In 1557, the English mathematician Robert Recorde introduced the symbol = for equality.
That’s right, it wasn’t until the mid 16th century that people wrote things like 2+3 = 5. In 1584
the Dutch mathematician Stevin introduced decimals and the
sign. This is about the time
that people really started shaking off the notion of number as (necessarily positive) magnitude,
and Stevin was an important figure in arguing that negative numbers should be accepted as, well,
numbers, just like 3 and 1/3. But even then, in Europe at least, people were not quite comfortable
with the notion of a negative number.
So imagine people’s shock when, again in the mid 16th century, Tartaglia and Cardano discovered
that if you wanted to solve quadratic and cubic equations you found yourself considering, not just
negative numbers, but square roots of negative numbers. For example, Cardano solved the system
of equations x + y = 10 and xy = 40 (i.e., 10x − x^2 = 40) with the solutions 5 + √−15, 5 − √−15.
A little later in the century, Bombelli was the first to become comfortable with square roots of
negative numbers. Barry Mazur's wonderful book Imagining Numbers (particularly the square root
of minus fifteen) is about exactly this development, and in some sense the birth of modern algebra can be
traced to the development of complex numbers.
Here’s an example of how square roots of negative numbers arose in solving cubics:
x^3 = 15x + 4 has the solution ∛(2 + √−121) + ∛(2 − √−121), which turns out to equal 4. For
how this works, see below. At this point, the important thing to note is that square roots of
negative numbers turn out to be closely connected to plain old garden variety positive integers.
This had an emotional impact similar to discovering that your mother is actually the daughter of
By the 17th century negative numbers were used, like infinitesimals in calculus, without a firm
agreement on their meaning. But even without such agreement on what they were, and even given
some of the strange distortions that allowed their presence60 they had permeated mathematics, as
had complex numbers. For example, in 1629 Girard first stated the fundamental theorem of algebra
(a polynomial of degree n is the product of n linear factors, and has exactly n roots up to
multiplicity)60 — this only works if complex numbers are allowed.
60 for example, a negative root is actually a positive root of an equation which changes the sign of the odd powers
– compare the roots of x^2 − x + 6 and x^2 + x + 6
Newton thought of positive/negative as
modeling several distinct types of phenomena: asset/debt, forward/backward motion, direction of
vectors... The familiar number line was taking shape, with negative numbers to the left and positive
to the right. Yet, still, Descartes called negative numbers "false roots" and positive numbers "true roots."
In the 18th century, Euler started to explicitly distinguish algebra and arithmetic, i.e., algebra
became more than just generalized arithmetic; this gave negative numbers (and hence complex
numbers) a firmer logical base.
Yet even in the 19th century some folks still objected. It wasn’t until Peacock and DeMorgan
further separated algebra from arithmetic, Hamilton invented the notion of an abstract algebraic
system with his quaternions61 , and Weierstrass, Dedekind and Cantor firmly put all our number
systems on solid logical grounds that "negative number" stopped being seen as somehow second-class.
An application. Why does ∛(2 + √−121) + ∛(2 − √−121) = 4?
Note that √−121 = 11i, so we are talking about ∛(2 + 11i) + ∛(2 − 11i). This number
turns out to be a root of x^3 = 15x + 4. Why?
First we need to note Dal Ferro's formula for finding a root of the equation x^3 = bx + c:

x = ∛(c/2 + √(c^2/4 − b^3/27)) + ∛(c/2 − √(c^2/4 − b^3/27))

From Dal Ferro's formula62 for solving a cubic, we know that one solution of x^3 = 15x + 4 is

∛(4/2 + √(4^2/4 − 15^3/27)) + ∛(4/2 − √(4^2/4 − 15^3/27)).

Now 4^2/4 − 15^3/27 = 4 − 125 = −121, and 4/2 = 2. So the solution is ∛(2 + √−121) + ∛(2 − √−121).
Maybe you don't believe Dal Ferro's formula. We'll show outright that ∛(2 + 11i) + ∛(2 − 11i)
really is a root of the equation x^3 = 15x + 4.
Suppose x = (2 + 11i)^(1/3) + (2 − 11i)^(1/3). Then 15x = 15[(2 + 11i)^(1/3) + (2 − 11i)^(1/3)] and

x^3 = (2 + 11i) + 3(2 + 11i)^(2/3)(2 − 11i)^(1/3) + 3(2 + 11i)^(1/3)(2 − 11i)^(2/3) + (2 − 11i)
= 4 + 3[(2 + 11i)(2 − 11i)]^(1/3)[(2 + 11i)^(1/3) + (2 − 11i)^(1/3)]

But x = (2 + 11i)^(1/3) + (2 − 11i)^(1/3). And (2 + 11i)(2 − 11i) = 4 − 121i^2 = 4 + 121 = 125.
And 125^(1/3) = 5.
So indeed x^3 = 4 + 15x.
Now why should ∛(2 + 11i) + ∛(2 − 11i) = 4? If you try a few whole numbers, it turns out that
4 is also a solution of x^3 = 15x + 4. But there are two other solutions, so just because 4 and
∛(2 + 11i) + ∛(2 − 11i) are both solutions of x^3 = 4 + 15x doesn't mean they are equal.
61 which are now used in writing game software, don't ask
62 passed down to Tartaglia, and either stolen by or given to Cardano
A little algebra shows that x^3 − 15x − 4 = (x − 4)(x^2 + 4x + 1). So its roots are 4 and the roots
of x^2 + 4x + 1. The roots of x^2 + 4x + 1 are −2 + √3 and −2 − √3. Both of these roots are negative real
numbers. So we'll be done if we can show that ∛(2 + 11i) + ∛(2 − 11i) is a positive real number.
If you remember how to represent complex numbers on the plane and how to add them and how
to take their roots (see below), you can see that ∛(2 + 11i) + ∛(2 − 11i) is a positive real number. So
it must be 4.
So we need to represent complex numbers and their arithmetic on the plane.
A complex number a + bi corresponds to the point (a, b) on the plane. Draw the line segment
between the point and the origin. This line segment has a length, r, and makes an angle, θ, with
the x-axis:
To take the cube root of a + bi, simply take the point whose associated angle is θ/3 and whose
length is ∛r:
Now suppose we’ve got z = (a + bi)1/3 , w = (a − bi)1/3 . w is a reflection of z about the x-axis,
and to add them together (you saw this in linear algebra) you complete the parallelogram, getting
a picture like this (if a, b > 0):
[Figure: z = (a+bi)^(1/3) above the x-axis, its reflection w = (a−bi)^(1/3) below, and the
parallelogram whose diagonal z + w lies on the positive x-axis.]
I.e., the sum z + w is a positive real number.
Since 2, 11 > 0, ∛(2 + 11i) + ∛(2 − 11i) is a positive real number.
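As a numerical sanity check (our sketch, assuming principal cube roots, which is what Python's complex power gives):

```python
# (2+i)^3 = 2+11i, so the principal cube root of 2+11i is exactly 2+i,
# and the cube root of its conjugate is the reflection 2-i.
z = (2 + 11j) ** (1 / 3)   # ~ 2+1j
w = (2 - 11j) ** (1 / 3)   # ~ 2-1j
print(z + w)               # ~ (4+0j), up to floating-point rounding
```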
Classifying numbers
So far we've talked about integers (including a section on primes), reals, and complex numbers.
The purpose of this note is to talk about real and complex numbers more carefully, dividing them
into different classes than we have before.
1. Rational vs. irrational
We've already talked about rational and irrational numbers. A rational number is a fraction
(ratio) of the form m/n where n, m are integers and n ≠ 0. An irrational number is a real number which is not
rational. For example (a very famous example) √2.
An important characterization is the following:
Theorem 8. A real number r is rational iff its decimal expansion repeats.
Proof (sketch). Suppose r = k/n and use the long division algorithm. There are only finitely many
possible remainders (0 through n − 1), so eventually a remainder repeats, and from then on the digits repeat.
On the other hand, suppose the expansion of r in base b eventually repeats with a block of length m. Then
b^m · r − r = (b^m − 1)r has a terminating expansion, so it is rational; hence r
is rational.
For example, 2.3565656... [the block 56 repeats] is rational. But 2.02002000200002... [keep adding one more
zero before the next 2] is not rational.
This characterization is not dependent on the number base. Any number base will work. Thus,
2.02002000200002... [keep adding one more zero before the next 2] in base 3 (where it equals
2 + 2/32 + 2/35 + 2/39 ...) is not rational, nor is 2.02002000200002... [keep adding one more zero
before the next 2] in base 4 (where it equals 2 + 2/42 + 2/45 + 2/49 ...) rational, etc.
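The long-division argument in the proof of Theorem 8 can be run directly (our sketch; the function name `expand` is made up): track the remainders, and a repeated remainder means the digits repeat from that point on.

```python
# Expand k/n in base 10, writing a repeating block in parentheses.
def expand(k, n):
    whole, r = divmod(k, n)
    seen = {}                      # remainder -> index of digit it produced
    digits = []
    while r and r not in seen:     # at most n distinct remainders can occur
        seen[r] = len(digits)
        r *= 10
        digits.append(str(r // n))
        r %= n
    if r == 0:                     # terminating expansion
        return str(whole) + "." + "".join(digits)
    i = seen[r]                    # expansion repeats from position i on
    return str(whole) + "." + "".join(digits[:i]) + "(" + "".join(digits[i:]) + ")"

print(expand(1, 7))      # 0.(142857)
print(expand(2333, 990)) # 2.3(56), i.e. the 2.3565656... of the example above
```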
It wasn't until the mid-18th century that Lambert proved that π is not rational, nor is e^q where
q is rational, q ≠ 0.
It turns out that the rationals and irrationals look different geometrically. In topology, the
rational numbers are a countable dense linear order with no endpoints, and any other such space
looks exactly like (is homeomorphic to) the rational numbers. Meanwhile, the irrational numbers
turn out to be homeomorphic to the product of countably many countable discrete spaces. This is
all late 19th - early 20th century stuff.
2. Algebraic vs. transcendental
√2 may be irrational, but it still is easily described: x = √2 iff x > 0 and x^2 = 2. Similarly,
i satisfies the equation x^2 = −1. Numbers, whether real or complex, which are roots of polynomials with integer coefficients are called algebraic. Numbers which are not algebraic are called transcendental.
There are only countably many algebraic numbers.63 I.e., almost all real or complex numbers
are transcendental.64
Any sum, difference, product, or quotient of algebraic numbers is algebraic. You can find a
sketch of a proof at; go to Applications.
Most transcendental numbers can't be described (because language is countable65) but some of
them can. For example, π and e^q where q is rational non-zero have nice descriptions that you
learned in, respectively, elementary school and your first calculus course.
63 We'll talk about countable soon. Right now, you just need to know it's the smallest kind of infinity.
64 I.e., there are uncountably many real or complex numbers; see below.
65 See previous footnote.
It wasn't until the late
19th century that Hermite proved e was transcendental, and Lindemann proved the same for π.
Lindemann's proof was improved over the next 10 or 20 years by a number of mathematicians,
including Weierstrass, Hilbert, and, finally, Hurwitz and Gordan, who made it elementary enough
that people stopped trying to simplify it further.
Here’s a nice question: is e + π algebraic? You’d think we’d know the answer by now, but we
don’t. We also don’t know if e − π is algebraic. But we do know that at least one of e + π, e − π
is not algebraic. Why? Let a = e + π, b = e − π. Then a + b = 2e, which is transcendental. Since
the sum of two algebraic numbers is algebraic, at least one of a, b is transcendental.
Note: The proofs that π and eq for rational non-zero q are irrational and the later proofs that
they are transcendental involve quite a bit of analysis, and in fact the people who proved these
theorems were among the major people who developed analysis.
3. What is a real number anyway?
The set of rationals, thought of as sitting on the real line, has holes, lots of them. For example,
√2 is not rational; it represents a hole in the rationals. Same for π, e, etc. Any irrational number
is, in some sense, a hole in the rationals. How do we fill these holes?
In the 19th century, this was an urgent question, since it was equivalent to the question: what
is a real number anyway? A number of approaches were given (all of them equivalent, in the sense
that they all give us the same set of real numbers).
First, we’ll use the approach of Cauchy sequences.
A sequence {a_n : n ∈ N} is a Cauchy sequence iff its elements get closer and closer to each
other. More precisely, {a_n : n ∈ N} is Cauchy iff ∀ε > 0 there is N ∈ N so that if N < n < m then
|a_n − a_m| < ε. I.e., no matter how close you want the elements in your sequence to get (that's the
ε — we want the elements to be within ε of each other) you can get far enough out (that's the N) so that
any two elements that far out are within ε distance of each other.
Every time we have a Cauchy sequence, it should converge to some point on the number line,
i.e., to a real number. Because of the holes, not every Cauchy sequence converges to a rational
number. So we just say: okay, if a Cauchy sequence doesn’t converge to a rational number, fill the
hole with a real number.
The problem with this construction is that you can have lots of Cauchy sequences which converge
to the same number, and you have to have some way of filling the hole only once. You use equivalence
classes for this, and it’s kind of messy. The basic idea is to say that two Cauchy sequences are
equivalent iff they converge to the same number. But that’s a little circular. So you have to come
up with a definition (similar to the definition of a single Cauchy sequence) of when two Cauchy
sequences are similar. This is left to the reader.
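To make this concrete, here is a Cauchy sequence of rationals heading for the hole at √2 (our example, using the Babylonian square-root iteration, in exact rational arithmetic so every term really is rational):

```python
# a -> (a + 2/a)/2 squeezes a toward sqrt(2); with Fraction, no rounding occurs.
from fractions import Fraction

seq = [Fraction(1)]
for _ in range(5):
    a = seq[-1]
    seq.append((a + 2 / a) / 2)

for a in seq:
    print(a, float(a))   # 1, 3/2, 17/12, 577/408, ... -> 1.41421356...
# Every term is rational and the terms get arbitrarily close to each other,
# but the limit, sqrt(2), is not rational: the sequence converges to a hole.
```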
A second way of filling holes is very simple. Instead of worrying about Cauchy sequences, we
worry about intervals of rationals which go all the way down to −∞, are bounded above, but don't have
a maximum element. For example {q ∈ Q : q < 2}, or, to capture √2, {q ∈ Q : q ≤ 0 or q^2 < 2}. We call these Dedekind cuts. Then we define,
for every Dedekind cut, a least upper bound. It's the set of these least upper bounds of Dedekind
cuts that make up the real line.66
66 Technically, the Dedekind cuts themselves are the real line, but this is a little strange to anyone but a set theorist...
What does “infinite” mean? How do you measure the size of an infinite set? Do line segments of
different lengths have the same number of points? Are there more points in a big line segment or a
small triangle? These questions are hard, and have occupied both mathematicians and philosophers
for millennia (and still do).
To make things a bit more concrete, consider two line segments of lengths 1 and 2; for example,
the intervals A = [0, 1] and B = [0, 2] on the number line. (Strictly speaking, we're thinking of
each of A and B as a set of real numbers. For example, π/4 ∈ A, π/2 ∈ B, and π/2 ∉ A.)67
On the one hand, A is a proper subset of B (that is, every point in A is also in B, but not vice
versa), so it is plausible that A should have fewer points than B does. On the other hand, both
sets are clearly infinite, so they ought to have the same size.
Actually, neither of these arguments is correct. Just because A is a subset of B does not mean
that A has fewer points than B. In fact, these two intervals have exactly the same size. On the
other hand, something even more mind-boggling is true: there are lots of different sizes of infinite
sets — in fact, infinitely many different infinities!
The ancient Greeks (and the ancient Babylonians — recall the Babylonian procedure for approximating a square root — and others in the ancient world) were familiar with infinite processes,
which they thought of as potentially infinite (you could go as far as you want but couldn’t reach the
end). But they did not think of infinity as a number, as something you could use to measure things
(“this infinite set is bigger than that one”). It was illegal to use infinity in certain ways when doing
mathematics.68 Aristotle codified this by distinguishing between “actual infinity” (which doesn’t
exist) and “potential infinity” (which does). An example of this is Euclid’s prime number theorem
(see “Prime numbers” from the course notes). Today, we would phrase the theorem as
“There are infinitely many prime numbers,”
or “The set of all prime numbers is infinite,”
but Euclid said,
“No finite set of prime numbers can possibly be the complete list of all prime numbers.”
Another kind of phrasing you might see in ancient mathematical writings is
“No finite set of prime numbers exhausts all the prime numbers.”
The Greeks found ways to do mathematics without resorting to infinity. For example, let’s
consider Archimedes’ “quadrature of the parabola”, developed in chapter 3, whose ideas anticipate
modern integral calculus, but are phrased differently.
To summarize, Archimedes showed that each triangle in the diagram on p. 16 has one-eighth
the area of its parent triangle. etc.
67 Remember, the symbols ∈ and ∉ mean "is a member of" and "is not a member of" respectively.
68 At times — as recently as the late 19th century — this issue had religious overtones: only God could conceive of
an infinite object, and it was blasphemy to think that a human being could do so.
So the total area covered in the first n steps is

α(1 + 2 · (1/8) + 4 · (1/8)^2 + · · · + 2^(n−1) · (1/8)^(n−1)) = α(1 + 1/4 + 1/16 + · · · + 1/4^(n−1)).

This is a partial sum of a geometric series; it is equal to

(4α/3)(1 − (1/4)^n).    (5)

A modern mathematician would say that the area of the parabolic segment is therefore

Area(P) = lim_{n→∞} (4α/3)(1 − (1/4)^n) = 4α/3.
But Archimedes didn’t have tools like limits. Instead, he reasoned as follows:
• Any number x < 4α/3 must be less than the area of P , because if you repeat this process
enough times, the area covered by the triangles eventually exceeds x.
• Any number y > 4α/3 must be greater than the actual area of P , because no matter how
many triangles you draw, the area they cover (namely the quantity in formula (5)) is always
less than 4α/3.
• Therefore, the area of P must be exactly 4α/3.
This reasoning is perfectly correct, and actually contains some fairly deep ideas about limits
and sequences and things hidden inside it — but notice how it’s phrased so as to avoid any explicit
mention of infinity, or limits, or convergence, or infinitesimals.
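A quick numerical check of Archimedes' bounds (our sketch, taking α = 1 for simplicity): the partial sums climb toward 4α/3 but never reach it.

```python
# Partial sums 1 + 1/4 + ... + 1/4^(n-1) approach 4/3 from below.
alpha = 1.0
total = 0.0
for n in range(10):
    total += alpha / 4 ** n
    print(n + 1, total)
assert total < 4 * alpha / 3   # always strictly below 4*alpha/3
```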
Now let's look at the question of how you can decide whether one infinite set is bigger than
another. The fact that we use a singular noun — infinity — tells us that people were conflating all infinite
sets. Until the late 19th century and the work of Georg Cantor, everyone thought that all infinite
sets had the same size. Cantor's realization that this is false was a pioneering insight.
To do this, he had to precisely define what “same size” and “smaller size” meant.
Suppose that F and G are two sets. How can we tell if F and G are the same size? For that
matter, what does “size” (or “cardinality”, which is the technical term) mean?
To say that F has n elements (symbolically, |F | = n) is to say that there’s a function
q : F → {1, 2, . . . , n}
that is one-to-one and onto. Such a function is called a bijection.
So for finite sets F, G, |F | = |G| iff there exist bijections
q : {1, 2, . . . , n} → F,
r : {1, 2, . . . , n} → G
for some nonnegative integer n.
On the other hand, we don’t need to know the actual size of two sets to know that they are the
same cardinality. We can just say
Definition 1. |F | = |G| iff there exists a bijection b : F → G; |F | ≤ |G| iff there is a 1-1 map
from F into G.
In other words, |F | = |G| iff it is possible to pair the elements of F with the elements of G so
that every element of F has exactly one “mate” in G and vice versa; |F | ≤ |G| iff it’s possible to
pair each element of F with some element of G so that no two elements of F have the same “mate”
(but you could have some elements of G left over).
Our instinct that “subset” means “smaller” isn’t totally off the mark. From definition 1 it’s
immediate that
Fact 1. If F ⊆ G then |F | ≤ |G|.
The definition of “same size” and “smaller size” in definition 1 has the advantage that it applies
to infinite sets as well. A good way to think about a bijection between two sets is as a way of
labelling the elements of one set with the elements of the other set, using each label exactly once.
Under this definition, the intervals [0, 1] and [0, 2] have the same cardinality, because there is a
bijection between them, namely f : [0, 1] → [0, 2] defined by f (x) = 2x. It doesn’t matter that the
first interval is a proper subset of second — in fact, the rule “If A is a proper subset of B, then
|A| < |B|” applies if and only if A is finite.
This definition, while sensible, has a number of striking consequences. For example, the function
f (x) = 2x defines a bijection between the infinite sets
N = {0, 1, 2, 3, . . . },
E = {0, 2, 4, 6, . . . }.
So |N| = |E|, even though there are infinitely many numbers that are in N but not in E.
Similarly, |N| = |Z| : Z = {0, −1, 1, −2, 2, −3, 3 . . . }.
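The enumeration of Z just listed can be written as an explicit bijection (our sketch; the name `f` is ours):

```python
# f : N -> Z with f(0), f(1), f(2), ... = 0, -1, 1, -2, 2, -3, 3, ...
def f(n):
    return n // 2 if n % 2 == 0 else -(n + 1) // 2

print([f(n) for n in range(7)])  # [0, -1, 1, -2, 2, -3, 3]
```

Every integer appears exactly once, so |N| = |Z|.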
Comparing a set to N is of some interest:
Definition 2. X is countable iff X is finite or |X| = |N|. X is uncountable iff |N| < |Y|.
So E and Z are countable. What about Q and R?
Definition 1 is due to Georg Cantor, who also proved the following theorem:
Theorem 9. |N| = |Q| < |R|.
In other words: Q is countable and R is uncountable. Or, to put it another way: There are the
same number of rational numbers as non-negative integers, but there are more real numbers than
rational numbers.
Proof. First we show that |N| = |Q|. We begin by considering non-negative rationals.
For each q ∈ Q with q ≠ 0, we write q = m_q/n_q, where n_q > 0 and n_q, m_q have no common
divisor except 1. (I.e., q is in lowest terms.) We define a 1-1 onto function f : N → Q by induction,
as follows:
f(0) = 0/1. Suppose we know f(k) = m/n. If m > 0, f(k + 1) = −m/n. If m ≤ 0, we ask: is there
some positive integer i with |m| − i > 0 so that n + i and |m| − i have no common divisor except 1?
If so, let i* be the smallest such, and let f(k + 1) = (|m| − i*)/(n + i*). If not, let f(k + 1) = (n + |m|)/1.
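The inductive rule can be checked by machine. Here is a minimal Python sketch (the function name is ours), using exact fractions:

```python
from fractions import Fraction
from math import gcd

def enumerate_rationals(count):
    """Generate f(0), f(1), ..., following the inductive rule in the proof."""
    q = Fraction(0, 1)
    out = [q]
    while len(out) < count:
        m, n = q.numerator, q.denominator
        if m > 0:
            q = Fraction(-m, n)  # follow each positive rational by its negative
        else:
            a = abs(m)
            # smallest i with |m| - i > 0 and n + i, |m| - i coprime:
            # the next lowest-terms fraction on the same diagonal
            i = next((j for j in range(1, a) if gcd(n + j, a - j) == 1), None)
            q = Fraction(a - i, n + i) if i is not None else Fraction(n + a, 1)
        out.append(q)
    return out

print([str(q) for q in enumerate_rationals(13)])
# ['0', '1', '-1', '2', '-2', '1/2', '-1/2', '3', '-3', '1/3', '-1/3', '4', '-4']
```

Each positive rational is followed immediately by its negative, and the positives sweep through the finite diagonals exactly as in the red-path picture.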
f is 1-1, because each rational number has exactly one representation in lowest terms. An
inductive argument (not given here) shows that f is onto.
Another way to understand this proof (we’ll just do the non-negative rationals) is to follow the
red path in the diagram below:
[Diagram: the positive rationals laid out in a grid, one row per denominator, with fractions not in
lowest terms omitted (so the row of thirds reads 1/3, 2/3, 4/3, 5/3, . . . and the row of fifths reads
1/5, 2/5, 3/5, 4/5, 6/5, . . . ), and a red path zigzagging along the finite diagonals.]
You go down each successive finite diagonal, from right to left. When you hit the left margin,
scoot up to the first number in the top right you haven’t hit yet. Be sure to skip all fractions not
in lowest terms (I simply left them out of the diagram). Repeat.
To get from non-negative rationals to all of Q, consider the following table:

Table 1: Counting Q

k      0   1    2   3    4    5     6      7   8    9     10    ...
f(k)   0   1   −1   2   −2   1/2   −1/2   3   −3   1/3   −1/3   ...
We’ve just proved |N| = |Q| twice. Now we give Cantor’s proof — by contradiction — that
|N| < |R|, i.e., that R is uncountable.
For technical reasons, we’ll show that X = {x ∈ [0, 1] : 9 does not appear in the decimal
expansion of x} is uncountable. Since X ⊂ R, by Fact 1 we’ll have shown that R is uncountable.
Suppose that f : N → [0, 1] is any function. Make a table of values of f, where the 1st row
contains the decimal expansion of f(1), the 2nd row contains the decimal expansion of f(2), . . . , the
nth row contains the decimal expansion of f(n), . . . Perhaps the table starts out like this:

n    f(n)
1    0.315820...
2    0.873463...
3    0.852573...
4    0.231104...
5    0.441082...
Of course, only part of the table can be shown on a piece of paper — it goes on forever down
and to the right.
(This method of proof is called a diagonal argument. The technical reason for excluding 9’s is that
with 9’s allowed, the decimal representation isn’t unique, e.g., 0.29999... = 0.3, and that would
mess up our proof.)
Can f possibly be onto? That is, can every number in [0, 1] appear somewhere in the table?
In fact, the answer is no — there are lots and lots of numbers that can’t possibly appear! For
example, let’s highlight (here, with brackets) the digits in the main diagonal of the table.

n    f(n)
1    0.[3]15820...
2    0.8[7]3463...
3    0.85[2]573...
4    0.231[1]04...
5    0.4410[8]2...
The highlighted digits are 0.37218 . . . . Suppose that we add 1 to each of these digits — because
of our technical condition, we are adding them mod 9, so if a digit is 8, “adding 1” gives us 0 — to
get the number
0.48320 . . . .
Now, this number can’t be in the table. Why not? Because
• it differs from f (1) in its first digit;
• it differs from f (2) in its second digit;
• ...
• it differs from f (n) in its nth digit;
• ...
So it can’t equal f (n) for any n — that is, it can’t appear in the table.
This looks like a trick, but in fact there are lots of numbers that are not in the table. For
example, we could subtract 1 from each of the highlighted digits (changing 0’s to 8’s), getting
0.26107 — by the same argument, this number isn’t in the table. Or we could subtract 3 from the
odd-numbered digits and add 4 to the even-numbered digits. Or we could even highlight a different
set of digits:
n    f(n)
1    0.315[8]2[0]...
2    0.[8]73463...
3    0.8525[7]3...
4    0.2[3]1104...
5    0.44[1]082...
As long as we highlight at least one digit in each row and at most one digit in each column, we
can change each of the highlighted digits to get another number not in the table. Here, if we add 1
to all the highlighted digits and read them off down the columns, we end up with 0.042081 . . . —
and, again, this is a real number that does not equal f(n) for any positive integer n.
What is the point of all this? Precisely that the function f can’t possibly be onto — there will
always be (infinitely many, in fact uncountably many) missing values. Therefore, there does not
exist a bijection between N and [0, 1].
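The construction is easy to simulate. In the Python sketch below, the function f standing in for the table is made up purely for illustration (any rule producing digits 0–8 will do); the bumped diagonal always differs from every row:

```python
def diagonal_missing(f, k):
    """First k digits of a number that differs from f(1), ..., f(k):
    bump the nth digit of f(n) by 1 mod 9 (so 8 becomes 0, and no 9s appear)."""
    return [(f(n)[n - 1] + 1) % 9 for n in range(1, k + 1)]

def f(n):
    # a made-up stand-in for the table: digit i of f(n) is (i * n) mod 9
    return [(i * n) % 9 for i in range(50)]

d = diagonal_missing(f, 10)
print(d)  # [1, 3, 7, 4, 3, 4, 7, 3, 1, 1]
# d differs from each f(n) in the nth digit, so it cannot appear in the table:
assert all(d[n - 1] != f(n)[n - 1] for n in range(1, 11))
```

Adding 1 mod 9 always changes the digit, which is exactly why the constructed number escapes every row at once.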
The basic idea of this proof — the diagonal argument — can be applied in other contexts.
Definition 3. If S is a set, then the power set P(S) is defined as the set of all subsets of S.
For example, if S = {1, 3, 4}, then
P(S) = {∅, {1}, {3}, {4}, {1, 3}, {1, 4}, {3, 4}, {1, 3, 4}}.
When S is finite, it’s not hard to see that |P(S)| = 2^|S| (because to choose a subset R of S, you
need to decide whether each element of S does or does not belong to R, and there are 2^|S| many
such choices). In the above example, |S| = 3 and |P(S)| = 8 = 2^3.
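The counting argument can be seen directly in code. A small Python sketch (helper name ours):

```python
from itertools import combinations

def power_set(s):
    """All subsets of s: each element is either in or out of a given subset."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

P = power_set({1, 3, 4})
print(len(P))  # 8 = 2**3
```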
What about infinite sets? Using a version of Cantor’s argument, it is possible to prove the
following theorem:
Theorem 10. For every set S, |S| < |P(S)|.
Proof. Let f : S → P(S) be any function and define
X = {s ∈ S : s ∉ f(s)}.
Now, is it possible that X = f (s) for some s ∈ S? If so, then either s belongs to X or it doesn’t.
But by the very definition of X, if s belongs to X then it doesn’t belong to X, and if it doesn’t
then it does. This situation is impossible — so X cannot equal f (s) for any s. But, just as in the
original diagonal argument, this proves that f cannot be onto.
To give a sense of how the proof works, here is a finite example:
If S = {1, 2, 3, 4}, then perhaps f(1) = {1, 3}, f(2) = {1, 3, 4}, f(3) = ∅ and f(4) = {2, 4}. In
this case X does not contain 1 (because 1 ∈ f(1)), X does contain 2 (because 2 ∉ f(2)), X does
contain 3 (because 3 ∉ f(3)), and X does not contain 4 (because 4 ∈ f(4)), so X = {2, 3}.
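Here is the same finite example in Python; whatever f we pick, the set X it produces is never one of the values of f:

```python
S = {1, 2, 3, 4}
f = {1: {1, 3}, 2: {1, 3, 4}, 3: set(), 4: {2, 4}}   # the example above

X = {s for s in S if s not in f[s]}
print(X)  # {2, 3}

# X disagrees with f(s) about s itself, for every s — so f is not onto P(S)
assert all(X != f[s] for s in S)
```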
As a corollary to Theorem 10, the set P(N) — whose elements are all the sets of natural numbers —
has more elements than N itself; that is, P(N) is uncountable.
As a consequence of this result, the sequence of infinite sets
N, P(N), P(P(N)), P(P(P(N))), . . .
must keep increasing in cardinality. That is, there are infinitely many different sizes of infinity!
The diagonal argument is also useful in recursion theory and theoretical computer science, and it is
at the heart of the proof of Gödel’s incompleteness theorem which, roughly, says that you can’t know
everything — if you have a way of knowing all of the axioms of a mathematical system, and if
the system is complicated enough to produce very basic number theory, then there are statements
which cannot be proved true and cannot be proved false in the system.
And it gets worse, because we’ve only indicated how to generate countably many different sizes of infinity. There
are many, many, many more than that.
The origins of graph theory
In the 18th century, the city of Königsberg in Prussia lay along the Pregel River. The river has
several branches, which divided the city into four districts connected by seven bridges, as in the
figure shown below (rivers in blue, bridges in red). A longstanding puzzle for the residents of
Königsberg was as follows: Is it possible to design a stroll around the city which would cross each
of the seven bridges exactly once?
Here’s another problem: Draw a figure made out of line segments — a square, or a triangle, or
a square inside another square, or any of the examples below or anything similar.
Is it possible to draw that figure without ever (a) picking your pencil up off the paper or (b)
retracing any segment you’ve already drawn? A drawing with these properties is called “unicursal”,
and, in fact, the Königsberg bridge problem is a problem about a particular unicursal drawing, albeit
differently stated. And both problems are about graphs — not the graph of a curve like y = x^2, but
a collection of points and lines defined below in Definition 4.
In 1736, the great Swiss mathematician Leonhard Euler solved the Königsberg bridge problem.
Euler’s key insight was that the islands and bridges could be modeled by a simple mathematical
structure called a graph. Graph theory — the theory of Euler’s kind of graphs — has since developed
into an extremely beautiful and useful area of mathematics, with deep theorems and surprisingly
diverse applications. (The seven bridges are not all there any more: two were destroyed in World
War II, two have been demolished, and two more bridges have been built. Königsberg is now
Kaliningrad, Russia, and the Pregel is now the Pregolya River.)
Definition 4. A graph G consists of a set of vertices and a set of edges, where each edge connects
two vertices.
For example, the map of Königsberg can be represented by a graph with four vertices a, b, c, d,
representing the districts of the city, and seven edges 1, . . . , 7, representing the bridges. Edge #1
connects vertices a and b; edge #2 connects vertices b and c; etc.
It looks like there’s a mistake in the figure — shouldn’t edges 2 and 4 be interchanged? (After
all, in the map, bridge 2 is west of bridge 4.) Actually, it doesn’t matter. The definition of a graph
has nothing to do with location; the only information the graph knows is the names of the vertices
and edges and which is attached to which. So as long as both these edges connect the same pair of
vertices (namely b and c), it doesn’t matter how we draw them.
Here are the graph-theoretic definitions we need to talk about the Königsberg bridge problem:
Definition 5. The degree of a vertex in a graph is the number of edges attached to that vertex.
For example, vertex a has degree 3 (because it is attached to edges 1, 3, and 5) and vertex b has
degree 5 (because it is attached to edges 1, 2, 3, 4, 6).
Definition 6. An Eulerian path in a graph is a way to walk through the vertices of a graph, one
edge at a time, so as to traverse every edge exactly once. (So an Eulerian path can be thought of
as an order for the set of edges.)
An Eulerian circuit is an Eulerian path whose starting vertex is the same as its ending vertex.
For example, in the graph below, the sequence of edges 1, 2, 3, 4, 5, 6 (pictured) and the sequence
2, 6, 3, 4, 5, 1 (not pictured) form Eulerian paths. They are not Eulerian circuits because the starting
and ending vertex are not the same.
The Königsberg bridge problem is simply this: If G is the graph whose vertices are regions of
Königsberg and whose edges are the bridges, then does G have an Eulerian path or an Eulerian
circuit?
(There’s one slight complication. It is permissible for an edge to connect a vertex to itself; such
an edge is called a loop. Many of the graphs we want to study don’t have any loops — for example,
no bridge connects one of the districts of Königsberg to itself. However, if an edge connects a vertex
to itself, we usually think of that edge as contributing 2, not 1, to the degree of the vertex. The
reason for this will become clear later on.)
Here is Euler’s answer.
Theorem 11. If G is a connected graph, then:
• If G has no vertices of odd degree, then G has an Eulerian circuit.
• If G has 2 vertices of odd degree, then G has an Eulerian path but no Eulerian circuit.
• If G has 4 or more vertices of odd degree, then G has no Eulerian path (or Eulerian circuit).
Here’s part of Euler’s reasoning. Suppose that a graph G has an Eulerian path P . If v is a
vertex that is neither the first nor last vertex of P , then P must enter v exactly as many times as
it leaves v. Since every edge incident to v is traversed exactly once, this means that the number
of such edges — that is, the degree of v — must be even. Therefore, G has at most two vertices
of odd degree, namely the first and last vertices of P . On the other hand, if P was actually an
Eulerian circuit (not just an Eulerian path), then the first and last vertices are the same vertex x,
and in fact x has even degree (because, again, P entered x exactly as many times as it left x). So
in this case G has no vertices of odd degree.
This argument is not a complete proof; it is still necessary to show that if G has zero or two
odd-degree vertices, then G really does have an Eulerian circuit or Eulerian path, respectively.
(Another way of saying this is that we have to rule out the possibility of a connected graph that
happens to have, say, zero odd-degree vertices but happens to have no Eulerian circuit.) But this
can be done.
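The degree test in Theorem 11 is easy to carry out by machine. Below is a Python sketch for the Königsberg graph; the edge list follows the text where it is explicit (edges 1 and 3 join a and b, edges 2 and 4 join b and c) and fills in the remaining three bridges in one way consistent with the stated degrees — an assumption, since the text doesn’t spell out every bridge:

```python
from collections import Counter

# bridges as (vertex, vertex) pairs; the last three are an assumption
edges = [("a", "b"), ("b", "c"), ("a", "b"), ("b", "c"),
         ("a", "d"), ("b", "d"), ("c", "d")]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

odd = sorted(v for v in degree if degree[v] % 2 == 1)
print(dict(degree))  # {'a': 3, 'b': 5, 'c': 3, 'd': 3}
if len(odd) == 0:
    print("Eulerian circuit exists")
elif len(odd) == 2:
    print("Eulerian path, but no circuit")
else:
    print("no Eulerian path at all")  # Königsberg: all four vertices are odd
```

Since all four vertices have odd degree, the theorem says no stroll crosses every bridge exactly once.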
There are a couple of obvious missing cases. What if G has one or three vertices of odd degree? Investigating that is a homework problem.
By the way, remember that a loop contributes 2, not 1, to the degree of the vertex x to which
it is attached. (See the footnote about loops.) This rule makes sense in light of Euler’s theorem.
On the one hand, adding a loop at x doesn’t affect the existence or non-existence of an Eulerian
path or circuit — just insert the loop into an Euler path whenever you’re standing at x. On the
other hand, if the loop contributes 2 to the degree of its vertex, then ignoring loops doesn’t affect
the number of odd-degree vertices in the graph.
Why should we care about graphs? Aside from their intrinsic interest, graphs come up in all
kinds of real-world situations. Here are just a few:
• The Web can be thought of as a graph, where the vertices are webpages and the edges are hyperlinks between them.
• Facebook can be modeled as a graph: vertices are people and edges are friendships. (Likewise
for “Six Degrees of Kevin Bacon.”)
• Family trees are graphs — vertices represent people and edges represent relationships such as
marriage or parenthood. Similarly, so are the trees that evolutionary biologists use to model
relationships between different species.
• The GPS device in your car uses graph theory to calculate the shortest driving route between
two points. For example, edges are blocks and vertices are intersections.
(In Theorem 11, “connected” means that it is possible to walk from any vertex to any other by some sequence of edges.)
Statistics and probability
Here’s a coarse outline of what we’re talking about:
• data analysis: various ways of analyzing data. Visual representations are important.
• probability: theoretical probability calculates precisely what to expect from a random process;
experimental probability conjectures probabilities (e.g., what to expect) from data and then
uses those conjectures to calculate what to expect from a random process.
• statistics: predictions based on models — the normal curve, the t-distribution, and so on. I.e.,
given summary data from some kind of sample, you analyze the summary data according to
the appropriate model and then predict things about the population. For example, “cigarette
smoking approximately doubles the risk of stroke.”
Probability and statistics are truly modern. They developed in tandem beginning in 17th century
Europe; there really wasn’t anything like them in the ancient or pre-modern world, or anywhere
outside Europe until Europeans started colonizing everywhere. Data analysis, on the other hand,
was necessary in many societies (although visual representations didn’t develop until mid 19th
century Europe) and especially important in large societies with centralized government. In these
notes we’ll try to trace the historical development. In some places our notes get very technical.
Feel free to skip over the technical parts.
First came data analysis in the form of a census. Ancient societies would do a census of land, or
of people, or of property, or of certain people (for example, free citizens only), or of certain kinds of
land (for example, farmland), or of certain kinds of property; or of various combinations. The first
census of which we have a record was in Egypt around 3050 BC. Several censuses are mentioned
in the Bible, in both the old and new testaments. Here’s an extract from the English Domesday
Book, a census written up in 1086:
“In (North) Allerton there are 44 carucates of land taxable, which 30 ploughs can plough. Earl
Edwin held this as one manor before 1066, and he had 66 villagers with 35 ploughs. To this manor
are attached 11 outliers [i.e., other estates, listed in the original; we won’t list them here]... Now
it is in the King’s hands. Waste. Value then 80 pounds. There is there, meadow, 40 acres; wood
and open land, 5 leagues long and as wide.”
The first thing you learn these days about data analysis is to have a clear purpose (or clear
purposes) that guides your gathering of data. The second thing is to describe your data clearly. By
modern standards the Domesday Book is quite jumbled and unclear. Do the “66 villagers” include
women? children? old people? Why are estates counted but not the nobility? (Surely some estates
had more than one noble man in residence). Why count ploughs but no other kind of property?
Presumably “taxable land” means farmland. What is worth 80 pounds? Does “waste” refer to all
the outlying estates? Or the meadow, wood, and open land?
The next step in data analysis was gathering together parish records on deaths in order to keep
track of the plague, begun by the English King Henry VIII in 1532. The data was not accurate —
a parish clerk might miss a week, and then make up for it the next week by reporting two weeks’
worth of data as one; families might not correctly report the cause of death out of fear of being
shunned; and, because medicine was far from an exact science, an honest report could simply be
mistaken. (The earlier statistic about smoking and stroke is from a Centers for Disease Control
website, information/health effects/heart disease/.)
In the mid 17th century, John Graunt systematically studied many decades of this data, looking
for patterns and trends, and using what we would now recognize as statistical processes to draw
conclusions about the data. For example, he noticed that in 1625 the number of reported deaths
not due to plague formed a sharp spike in the data. He found that unbelievable, and concluded
that there were more plague deaths in 1625 than reported.
While much of modern statistics has come about through such fields as quality control in manufacturing
(e.g., the t-distribution was discovered by the statistician William S. Gosset, who worked
for Guinness Brewery), agriculture (the USDA was established in 1862, and its Division of Statistics
was formed one year later), public health (the work of Florence Nightingale), or social science
(especially psychology), the first real statistical problem that was widely studied was the problem
of astronomical measurement. Even in the second century BC, astronomers knew that their measurements were not precise. They knew that different measurements of the same event would be
different because of conditions they could not control, such as the vagaries of the instrument, or
of the atmosphere, or of the weather. Different astronomers dealt with this situation differently:
some used what we now call the mean, others the median, others grouped data, or resorted to using
ad hoc formulas (which they often didn’t report). It wasn’t until the mid 18th century that the
mean became the standard method of summing up repeated observations, and even then there was
controversy over whether you shouldn’t just take one observation and stick to it — why complicate
your thinking with all these other observations? What could they really add to our understanding?
Meanwhile, back in the mid 17th century, Fermat and Pascal were dealing with the queries of
the aristocratic (and somewhat intellectual) gambler Chevalier de Méré. This was the birth of
probability theory: essentially all the basic rules of probability theory came from this work, as well
as quite a bit of advanced probability theory.
You are probably (excuse the expression) familiar with some basic probability. For example,
the probability of flipping a fair coin and getting a tail is 1/2, and the probability of flipping two fair
coins and getting two tails is 1/4. You probably know that, as a consequence, if you flip a fair coin
100 times you can expect to get about 50 tails. You may not know that if you get exactly 50 you
should be suspicious of the randomness of the process. And while you probably know that a run
of 100 tails would be highly improbable, you may not know that you can expect runs of 3, 4, or 5
tails in a sequence of 100 coin flips.
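You can see runs showing up in simulation. A quick Python sketch (the parameters are arbitrary):

```python
import random

random.seed(1)  # fixed seed so the experiment is reproducible

def longest_tail_run(flips):
    """Length of the longest consecutive run of tails in a flip sequence."""
    best = cur = 0
    for flip in flips:
        cur = cur + 1 if flip == "T" else 0
        best = max(best, cur)
    return best

trials = 2000
runs = [longest_tail_run(random.choice("HT") for _ in range(100))
        for _ in range(trials)]
frac = sum(r >= 3 for r in runs) / trials
print(frac)  # nearly every 100-flip sequence contains a run of 3 or more tails
```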
Trying to come up with probabilities for things like sequences of coin flips led to the normal
distribution, which began its mathematical life as a way to calculate probabilities but which was
re-interpreted by Gauss as a statistical model, and is in most common use as a statistical model
for certain kinds of mass behavior — that’s the bell-shaped curve (although there are many other
curves that are bell-shaped): you expect a lot of stuff in the middle, and not so much at the ends
(think of how much people weigh, or how tall they are, or scores on the ACT...).
The next page or so contains a technical discussion of how this came about. Probability theorists
found themselves calculating the binomial distribution, used to figure out what you could expect in
a repetition of n trials when there were only two possible outcomes (for example, flipping a coin
n times). There are handy formulas for the binomial distribution, and if n is small it’s easy to
calculate. But if n is large this is hard. So people tried to find good ways of approximating the
binomial distribution. De Moivre, in 1733, proved that the probability of getting exactly n/2 + d
heads in n flips was approximately (2/√(2πn)) e^(−2d²/n), which means that the probability of
getting between n/2 and n/2 + d heads in n flips of a coin is approximately

(2/√(2π)) ∫_0^(d/√n) e^(−2y²) dy.

(A side note on coin flipping: a good sleight-of-hand magician can easily produce a run of 100 tails,
or a run of 100 heads, etc.; coin tossing isn’t really random, it just seems random because most of
us can’t consciously control it.)

If you’ve had a calculus-based statistics course, you recognize (2/√(2πn)) e^(−2d²/n) as one of the family of
curves we now call the normal distribution, and you may remember that part of the course involved
using the normal distribution to approximate the binomial distribution. Given the centrality of the
normal distribution to the rest of statistics this might have struck you as somewhat quaint, but in
fact it is exactly the effort to approximate the binomial distribution that is the origin of the normal
distribution.
Going back to astronomy: mathematicians were trying to figure out what curves could best
describe the errors of observation they saw in astronomical data. Some principles were clear, for
example:
1. Small errors are more likely than large errors.
2. For any real number ε, the likelihoods of errors of magnitude ε and −ε are equal.
The goal was to find a function y = φ(x) so that the probability of an error between r and s
was ∫_r^s y dx. This automatically adds a third principle: the total area between the curve and the
x-axis must be 1.
Using these principles, Laplace proposed two curves, which had the disadvantage of either not
being differentiable at 0 (y = (m/2) e^(−m|x|) for some constant m) or having a vertical asymptote at 0
(y = (1/2a) ln(a/|x|), where −a ≤ x ≤ a).
This meant that these principles were not sufficient to determine the error curve.
Gauss added a fourth principle: Given several measurements of the same quantity, the most
likely value of the quantity is their average.
Using these four principles he determined that φ(x) = (1/(σ√(2π))) e^(−x²/2σ²), where σ is the standard
deviation. (Actually, he didn’t quite do this, since he didn’t have the notion of standard deviation.
Instead, he had a quantity h which he thought of as the “precision of the measurement process.”)
This is, of course, yet another normal distribution.
Gauss did not do this in an abstract context. On January 1, 1801, an astronomical object (in
fact, the first asteroid to be noticed by humans) was discovered by the Italian astronomer Giuseppe
Piazzi, who named it Ceres. To ascertain its orbit, many people observed it and recorded their
observations. But six weeks later Ceres disappeared behind the sun. Where would it reappear?
Gauss suggested searching an area of the sky that differed from the one most astronomers predicted,
and he was right. It was this error curve that enabled him to make the prediction.
When Gauss published his work in 1809, he claimed that his fourth principle depended on the
method of least squares — what we call least squares regression. This was first published by
Legendre in 1805, although Gauss claimed that he had known about it since 1795. This method is
used to find a straight line that best fits the data, and since it’s very technical, that’s all I’ll say
about it.
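Though technical to derive, least squares is simple to compute for a single line. A minimal Python sketch with made-up data:

```python
def least_squares(xs, ys):
    """Slope and intercept of the line minimizing the sum of squared errors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up data, roughly y = 2x
m, b = least_squares(xs, ys)
print(m, b)  # slope near 2, intercept near 0
```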
The error curve has the property that its maximum height is at x = 0. Generalizing the formula
to account for maximum heights elsewhere gives the family of normal distributions: N(µ, σ) =
(1/(σ√(2π))) e^(−(x−µ)²/2σ²), where µ is the mean and σ the standard deviation. End of technical discussion.
In the mid 19th century, Quetelet was the first to apply the normal distribution to social science
data. He was the first to hypothesize the mythical creature known as the “average man.” To Quetelet,
this creature was not mythological but ideal, and the rest of us simply represent deviations from
this norm. If indeed the rest of us (including Quetelet) are deviations from this ideal creature, it would
be important to know its dimensions. The first data set he dealt with were measurements of the
chest circumference of Scottish soldiers. (By modern standards this is ridiculously biased — only
Scottish? only soldiers?) Quetelet set out to prove that the distribution of these measurements was
normal. In fact he was wrong on several counts: the original measurements did not form a normal
distribution; he copied several of the measurements incorrectly; and the notion of an “average man”
(or woman, or bird, or fish, or earthworm, or...) compared to whom the rest of us are some kind of
error was useless. But his ideas that certain kinds of data should fit the normal distribution (later
there were other distributions to fit data to) and that you should be able to prove that data fit a
particular distribution, were important to the development of statistics.
The true importance of the normal distribution didn’t become apparent until the end of the
19th century when Lyapunov proved the central limit theorem (his proof published in 1901; the
theorem itself is often attributed in an early form to Laplace), the main idea of which is: if you
take all possible samples of size n from a very large population, the distribution of the means is
essentially a normal curve, whose mean is the population mean. (This was actually implicit in the
early work on binomial probabilities.)
The explosion of applications of statistics and data analysis came not only from high-level
mathematics (such as the normal distribution and the central limit theorem) but also from carefully
developed ways of presenting data visually. William Playfair (late 18th century) was a leader in
coming up with clever ways of presenting data. He developed the time series graph, the bar
graph, and the pie chart. Florence Nightingale’s famous graph of the monthly death rate during
the Crimean War raised the pie chart to a level of sophistication it has seldom reached again.
Playfair’s circle graph (not a pie chart) was another way to present complex data in a visually clear
manner. (Edward Tufte’s book The Visual Display of Quantitative Information, written in the late
20th century, is the classic work on such innovative data displays.) The histogram was invented by
A.M. Guerry in 1833 (although not named until 1895, in Karl Pearson’s The Mathematical Theory
of Evolution). The psychologist Galton invented the ogive (cumulative frequency distribution) in
1875. The history of the scatterplot is more obscure — nobody knows who invented it — but it
was popularized by Galton. In 1952 Mary Eleanor Spear essentially created the box plot, later
refined by J.W. Tukey in 1977; Tukey also invented the stem and leaf graph.
And, finally, the explosion of these applications came about from the 19th century’s desire to
put things on a scientific, i.e., mathematical, footing, so that issues of public health and welfare,
agriculture, psychology, and so on were to be decided not only on political or theological or philosophical grounds, but by looking at trends in data (data analysis) and making predictions based
on the data (statistics). We are still reaping the benefits and flaws of this approach.
Women in mathematics
The history of women in mathematics is both more extensive than you might think, and far less
extensive (due to cultural norms) than it should be. Here are brief biographies of some important
women mathematicians.
Theano (? - ?, Greek)
Theano was the wife of Pythagoras. According to legend, she took over the Pythagorean cult
after he died. But note that just about everything about the Pythagoreans is legend, and almost
nothing is verifiable.
Hypatia (360 CE? 355 CE? - 415, lived in the Greek culture of Alexandria, Egypt)
Rather unusually, her father Theon — himself a major intellectual figure — encouraged
her education, which was both wide and deep. She was a major orator, and recognized as a
leading scholar. She worked in astronomy, astrology, and mathematics, doing major work on conic
sections. Towards the end of her life, Alexandria moved from a society tolerant of many religions
to one dominated by Christianity. Although Hypatia corresponded with and taught many highly
placed Christians, she was seen as a symbol of pagan culture, and was pulled from her chariot
and brutally murdered by Christian rioters. Once Christianity became securely dominant, she was
(long after her death of course) considered a model of virtue and chastity.
Maria Agnesi (1718 - 1799, Italian)
The eldest of 21 children, she was a child prodigy, giving a speech at the age of 9 on the
importance of educating women. Her major book on analysis, Analytical Institutions, appeared in
1748. It covered maxima and minima, tangent lines, inflection points, differentials, integral calculus,
and differential equations. It was a major contribution, perhaps the first systematic presentation
of the state of the art at the time. Highly influential, it was widely translated and used as a text.
She is famous for the versed sine curve — see chapter 5 — unfortunately known as the Witch of
Agnesi, which was intensively studied by a number of mathematicians, including Fermat.
She was elected to the Bologna Academy of Sciences — a high honor — and appointed as only
the second woman professor in a European university. But when her father died, she gave up
mathematics and spent the rest of her life studying theology and leading a life of pious service.
Sophie Germain (1776 - 1831, French)
Her parents tried to stop her learning mathematics because it was unsuitable for women. So
she obtained lecture notes from the Ecole Polytechnique and taught herself from them. When she
anonymously submitted a paper on analysis to Lagrange, he was so impressed that he became her
mentor. She corresponded with Gauss on number theory, in particular on Fermat’s last theorem,
and proved that if x5 + y 5 = z 5 with x, y, z integers, then at least one is divisible by 5. (We now
know there are no such integer triples.) This was one of the major results in early 19th century
number theory. She switched to the study of elastic surfaces, and in 1816 won a major prize on this
work (again, her entry was anonymous). Fourier was another mentor. She was allowed to attend
sessions of the Institut de France, and was offered an honorary degree from Göttingen just before she died.
Ada Byron, Lady Lovelace (1815 - 1852, English)
Her father was the poet Lord Byron. Her mother, who herself loved mathematics, left him when
Ada was an infant and raised her daughter to be a mathematician and scientist. Ada liked both
mathematics and poetry, and, marrying into nobility, moved in both high society and intellectual
society. She worked with Babbage on early computing machines (never built). In a theoretical
sense — because there were no computers on which to test her ideas — she was the inventor of
computer programming. She led a fairly stable life as an upper-class wife and mother, but is often
seen through a romantic lens, and has appeared as or inspired fictional characters in novels, plays,
cartoons, movies, and comic strips.
Sofia Kovalevskaya (1850 - 1891, Russian, also lived in France, Germany, and Sweden)
Born in Russia to a wealthy family, she taught herself calculus from wallpaper made from a
calculus book. Because women could not get university degrees in Russia, she had to leave Russia
for a university education; because Russian women could not get their own passports, she had to
marry to be allowed to leave. She worked with Weierstrass in Berlin (but was not allowed to
take classes from him; he tutored her privately). Her major work was in PDE’s and other areas
of analysis and mathematical physics. She was one of the first women to receive a PhD from
Göttingen. She found permanent employment in Stockholm. Her work On the Rotation of a Solid
Body about a Fixed Point won the Prix Bordin in spectacular fashion (the judges were so impressed
they increased the value of the prize). A member of the political left, she moved back and forth
among Germany, Russia, France, and Sweden. A major mathematician, she was also a talented
writer. At times she would suspend her work in mathematics for literature; at times she would
suspend her work in literature for mathematics; at times she would do both. Her personal life was
dramatic and, in many ways, tragic — she married for convenience; after many years convenience
turned into deep love; she gave birth to a daughter; her husband killed himself because of financial
reverses, leaving her a single mother; late in life she had a scandalous and passionate affair; until
she was hired by the University of Stockholm, her financial situation was insecure; and she died
of pneumonia incurred during a difficult journey in winter. She wrote a charming autobiography,
Recollections of Childhood, which says almost nothing about mathematics. An asteroid is named
after her.
Florence Nightingale (1820 - 1910, English)
She founded modern nursing. And she was a major figure in statistics. She did important work
on how to present data visually, and also was a major pioneer in applied statistics.
Emmy Noether (1882 - 1935, German, moved to the U.S. in 1933)
She far surpassed her father, the well-known mathematician Max Noether. She was one of the
(perhaps the major) founders of modern algebra and also did important work on relativity theory.
The University of Erlangen wouldn’t let her take undergraduate classes but let her audit; when
she passed the test for admission to doctoral study they let her become an official student. But
they wouldn’t hire her after she got her PhD. So she worked for no salary at the Mathematics
Institute there. She was invited by Felix Klein and David Hilbert to Göttingen, where she was an
unpaid lecturer for three years (her mentor Hilbert angrily asked, “Is the university a bathhouse
that it keeps women out?”) before receiving a small salary. She mentored many graduate students
and did important work at Göttingen, but, being a Jew, left for the U.S. because of Hitler. No
major university in America would hire her, so she went to Bryn Mawr College where she continued
influencing students and producing seminal mathematics until her death two years later from cancer.
When she began, the study of algebra was focused on specific objects, such as algebraic curves.
Her work embedded these notions in far more general and abstract notions, such as ideals in rings.
This was a fundamental shift in the way mathematicians thought about algebra. Her life was a
quiet one, devoted to mathematics and to her students, and she was greatly loved for her warmth
and caring; she was unquestionably among the greatest mathematicians of the 20th century.
Africa, Pre-Columbian America, and U.S. minorities: an overview
Comparatively little is known about mathematics in southern Africa before the European invasions, although we know much more than we used to. For example, the Fida in Benin were able
to do complex calculations without pencil and paper and essentially memorized their financial
records — this sort of thing can’t be done without a sophisticated arithmetic that goes beyond
basic algorithms. The Fulani Nigerian Muhammad ibn Muhammad al-Fullani al-Kishnawi (his
Arabic name) travelled to Egypt and wrote a major treatise on magic squares. And design and
architecture, as in every culture, showed that there was extensive study of geometry, including
transformational geometry. Some information about ancient mathematics in southern Africa can
be found at ASP/historymaf.htm, but much of this site is about northern Africa, especially ancient Egypt. More information on mathematics south of the Sahara can
be found at chma 09.html#2 — the AMU is the
African Mathematical Union.
As in Africa, there was significant mathematical activity in the Americas before the European
invasions. For example, recent scholarship on the Mayan calendar (done by both archeologists and
mathematicians) has uncovered the complex abstract algebra notions that underlay its construction,
and there is other evidence that there was a lot of sophisticated mathematics going on; scholars are
still not quite sure what. A good reference on pre-Columbian American culture in general is the
book 1491, and a more technical overview is found in the anthology Native American Mathematics
edited by Michael Closs.
Studies of mathematics in cultures outside the web of Mediterranean/European/Arabic/south
Asian/east Asian cultures belong to the field of ethnomathematics, and major scholars in the
field include Michael Closs, Marcia Ascher, and Ubiratan D'Ambrosio. Claudia Zaslavsky was a
popularizer in the field, and her books are fairly accessible (although necessarily not as deep).
A significant difficulty in doing such studies is the tendency to project our ways of thinking
about mathematics on cultures with different ways of looking at things. For example, here is a
description from the AMU web page about techniques of traditional sand drawings in contemporary
(i.e., 1970’s) Angola: “After cleaning and smoothing the ground, they [i.e., the artists] first set out
with their fingertips an orthogonal net of equidistant points. Now one or more lines are drawn
that ’embrace’ the points of the reference frame. By applying their method the drawing experts
reduce the memorisation of a whole drawing to that of mostly two numbers (the dimensions of the
reference frame) and a geometric algorithm (the rule of how to draw the embracing line(s)). Most
drawings belong to a long tradition." I.e., undoubtedly these sand drawings use sophisticated
mathematical techniques, but the language we use to describe them is our mathematical language,
the way we think about mathematics. Whatever the artists in Angola are doing, we can be pretty
sure that they are not thinking of it as “an orthogonal net” — they are almost certainly working
in a different context. This is, of course, a general problem in the history of mathematics — it
is essentially impossible to think the way Archimedes did — but the gap is much larger between
cultures that did not particularly influence each other.
Why do we know so little? The mathematics we use comes from a complex lineage involving
ancient India, ancient Greece, not-so-ancient Arabia, and semi-modern Europe. Chinese mathematics — remember the silk road — was not so different. But it is sometimes difficult to recognize
mathematics that is not explicitly stated in terms that we relate to as mathematics, e.g., the Mayan
calendar (which, from our point of view, is all about modular arithmetic). And many of these cultures embedded their mathematics in what they did, without writing it down — we know the Fida
in Benin were up to something, but whatever they were up to is lost. We never thought to ask
what they were doing, and now it may be too late to know. Or maybe not. For example, there has
been extensive work on trying to understand how the Maya thought about their calendar. In
this way, intellectual historians are able to at least partially reconstruct ways of thought that no
longer exist.
As for modern Africa, in the last 80 years or so there have been a number of excellent African
mathematicians, for example: George Okikiolu from Nigeria (whose daughter Katherine Okikiolu,
born in Britain and now at UCLA, is an important American mathematician), James Ezeilo, also
from Nigeria, Themba Dube from South Africa (who was a featured speaker at a conference here
in June 2011). Many Africans who do serious research in math and science tend not to live in
Africa, but there are more and more exceptions to this generalization. For example, Ezeilo helped
create a thriving research environment in Nigeria, and most recently was working to do the same
in Swaziland; Dube has created a strong research community in South Africa. Mathematicians in
modern Africa generally have high teaching loads, which makes research difficult, but that too is changing.
What about the situation for U.S. minorities? There were a handful of known prodigiously
mathematically talented African American youths in the late 18th through early 20th centuries,
and while some of them were highly accomplished, their later accomplishments generally were not
mathematical. For example Kelly Miller, in the late 19th century, was the first African American
mathematics graduate student (at Johns Hopkins). Forced to leave Johns Hopkins for financial
reasons, he began teaching at Howard University (a historically black university). While there
he also got an MA in mathematics and a law degree (all from Howard) but focused most of his
energies on both general administration (as a dean) and especially on the relatively new discipline
of sociology.
A relatively small number of African Americans received PhD’s in mathematics in the first half of
the 20th century, and while the number has continued to grow, the proportion remains disproportionately
small. The same is true of Hispanic American and Native American mathematicians, where the
proportions are even smaller. The amount of discrimination U.S. minorities faced was shocking,
even when their talents were both remarkable and obvious. For example, J. Ernest Wilkins earned
a PhD from the University of Chicago at the age of 19, only the eighth African-American to earn
a PhD in mathematics. He was unable to get a job in a research institution, and eventually left
mathematics for engineering. David Blackwell, who earned a PhD from the University of Illinois at
the age of 22, could not get a job at a research institution for 13 years. He persevered, was hired
at Berkeley, and went on to receive many honors as a distinguished statistician.
Even today, much depends on a small number of relatively welcoming departments. For example, by 1945 there were 14 African-Americans with math PhD’s; half of them were from the
University of Michigan. This pattern of a small number of places granting a disproportionately
large percentage of degrees continues to this day, with the University of Maryland (which had
a prominent African-American mathematician, Ray Johnson, in its administration) and Howard
University accounting for a disproportionately large number of relatively recent African-American
mathematics PhD's. 1943 saw the first African-American woman to get a PhD in mathematics
(Euphemia Lofton Haynes, at Catholic University). The basic situation of African-American mathematicians in the 21st century, as far as numbers go, is about the same as the situation for women
mathematicians in the beginning of the 20th century. The major difference is the lack of overt
discrimination and stereotyping, but covert discrimination and stereotyping remain, especially at
crucial early levels in K-12.
A comprehensive Web site devoted to black mathematicians is Mathematicians of the African
Diaspora, founded by Prof. Scott Williams of SUNY Buffalo.
It is encyclopedic and quite wonderful. Unfortunately, there doesn't seem to be a central website
devoted to Hispanic or Native American mathematicians.
Because race and ethnicity are social constructs, just who counts as what is problematic, and
neither names nor life histories are helpful. For example, the mathematician Bob Megginson had
a British father and a Sioux (or maybe part-Sioux) mother; he considers himself an Oglala Sioux.
Cora Sadosky's name is far from Hispanic, but she was Argentinean-American (she came to the
U.S. after obtaining her PhD). Further complicating matters, from the mid-19th century to the early
20th century, people who could “pass” as white sometimes chose to do so, severing later generations
from their origins, so there are mathematicians with significant Native American, Hispanic, or
African-American ancestry who have been raised with little or no connection to their non-European heritage.
Rules of the game If a problem (or part of a problem) is marked with an asterisk (*) you cannot
use the internet in any way. For unmarked problems you can look things up.
* 1. (a) In the 6th century, the Indian mathematician Aryabhata wrote: "Half the circumference
multiplied by half the diameter is the area of a circle." Is his statement correct? Why or why not?
(b) Aryabhata also gave a method to approximate π: "Add four to one hundred, multiply by
eight, and then add sixty-two thousand. The result is approximately the circumference of a circle
of diameter twenty thousand. By this rule the relation of the circumference to diameter is given."
What numerical value of π does this method give?
2. In chapter 3 we showed one way in which Archimedes calculated π. Here’s another way he
did it.
Start with a circle, inscribe a regular n-gon inside it, and circumscribe a regular n-gon outside
it, as in the picture below (in general, An is the perimeter of the circumscribed polygon, and Bn is
the perimeter of the inscribed polygon):
[Figure: a circle with inscribed and circumscribed squares (perimeters B4 and A4) and inscribed and circumscribed hexagons (perimeters B6 and A6)]
He knew that the circumference of the circle was squeezed between An, the perimeter of the
large n-gon, and Bn, the perimeter of the small n-gon; so, with both an upper and a lower bound,
Archimedes could get an idea of how good his approximation was. Using this, he got an estimate
for π using regular 96-gons, by hand and with no calculator.
After that lengthy introduction, here's the homework problem: find the formulas (figure them out or
go online) for the perimeters An and Bn (we've essentially already done one of these), and explain
why they work.
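Since the problem asks you to find the closed formulas yourself, none are given here. Instead, here is a sketch (in Python) of Archimedes' doubling procedure itself; the starting hexagon perimeters (for a circle of diameter 1) and the perimeter recurrences are standard facts we are supplying, not something stated in the text above:

```python
import math

# Perimeters for a circle of diameter 1: the inscribed regular hexagon
# has perimeter 3, the circumscribed regular hexagon has perimeter 2*sqrt(3).
a, b = 2 * math.sqrt(3), 3.0  # a = A6 (outside), b = B6 (inside)
n = 6
while n < 96:
    # Archimedes' doubling recurrences: a harmonic mean, then a geometric mean.
    a = 2 * a * b / (a + b)   # A_{2n}
    b = math.sqrt(a * b)      # B_{2n}
    n *= 2
print(f"{b:.4f} < pi < {a:.4f}")  # the squeeze at the 96-gon
```

Running the loop from the hexagon up to the 96-gon reproduces the kind of two-sided bound Archimedes obtained.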
* 3. The year is 500 BCE. You are a Greek mathematician, currently working for the Egyptian
government as a consultant. You’ve been asked to determine the distance from a lighthouse, which
stands on a rock in the sea, to the mainland (see figure).
[Figure: a lighthouse on a rock in the sea; the distance to be determined runs from the lighthouse to the shore]
Explain to your Egyptian employers how you’re going to carry out the project, and why your
method works. You can assume you know the point on shore that is closest to the rock. You can
use your compass, and you can measure as many line segments as you need, as long as they are all
between points on land. You can’t use a protractor since they haven’t been invented yet.
Bonus: How do you do it even if you don’t know where the nearest point on shore is?
[Hint: if you need a hint, ask me the Tuesday before the problem is due.]
* 4. Consider the Babylonian method of approximating √r in chapter 8. (You can find an
extended discussion at Babylon and the Square Root of 2.)
(a) Use this to approximate √3, starting with an estimate of 1. How many iterations does it
take to be accurate to within 7 decimal places? (Use your calculator to find the correct 7 decimal places.)
(b) What happens if you start with the really bad estimate of 5?
(c) And what happens if you start with the even worse estimate of -1? (Which the Babylonians
wouldn't have done because, as far as we know, they didn't know about negative numbers.)
(d) Now try using this method to approximate √-3 with a starting estimate of -1. (Which, if
you'd tried it in Babylonia, probably would have gotten you burned as a witch or something: -3?
What's that?) What happens?
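To see the method in action without giving away parts (a)–(d), here is a short Python sketch of the Babylonian (Heron's) iteration applied to √2, the case the Babylonians themselves worked out; the starting estimate and number of steps below are our own choices:

```python
import math

# Babylonian / Heron's method for sqrt(r): repeatedly replace the
# current guess x by the average of x and r/x.
r = 2.0
x = 1.0  # starting estimate
for step in range(1, 7):
    x = (x + r / x) / 2
    print(step, x)

print("error:", abs(x - math.sqrt(2)))
```

The convergence is very fast (the number of correct digits roughly doubles each step), which is what problem (a) asks you to observe for √3.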
5. (a) State the following: (i) the Goldbach conjecture; (ii) Dirichlet’s theorem on primes; (iii)
the prime number theorem; (iv) the Green-Tao theorem.
(b) In the last year there has been a tremendous leap forward in our understanding of the twin
prime conjecture. What happened and who did it?
* 6. Here is a test for divisibility by 3: A number is divisible by 3 iff the sum of its digits is
divisible by 3. Example: 72,465,702 is divisible by 3 because 7 + 2 + 4 + 6 + 5 + 7 + 0 + 2 =
33, which is divisible by 3, but 69,428,123 is not, because 6 + 9 + 4 + 2 + 8 + 1 + 2 + 3 = 35,
which is not divisible by 3.
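The digit-sum test can be checked mechanically against ordinary remainder arithmetic; the repeated-digit-sum loop below is our own packaging of the test, not something specified above:

```python
def divisible_by_3(n: int) -> bool:
    """Test divisibility by 3 by summing digits repeatedly
    until a single digit remains."""
    while n >= 10:
        n = sum(int(d) for d in str(n))
    return n in (0, 3, 6, 9)

# Compare the test with the ordinary remainder for the two examples.
for n in (72465702, 69428123):
    print(n, divisible_by_3(n), n % 3 == 0)
```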
Here is a test for divisibility by a mystery number m: A number is divisible by m iff the
alternating sum of its digits is 0. That is, 3,438,556 is divisible by m because 3 - 4 + 3 - 8 + 5 -
5 + 6 = 0, but 4,438,557 is not divisible by m because 4 - 4 + 3 - 8 + 5 - 5 + 7 = 2, which is not
divisible by m.
(a) Make a conjecture: what’s m? [Hint: experiment with numbers less than 100.]
(b) Test your conjecture on the numbers 2009, 3124, 4567, 8481, and 218,702,088. That is, for
each of these numbers, form the alternating sum and see whether or not it’s 0. Then divide by
your conjectured value of m from (a). Does your conjecture work out?
(c) Honors problem: prove that your conjecture works. [Hint: think of multiplication as repeated
addition and use induction, if you know how to use induction.]
7. Do a Web search on Shor’s algorithm and quantum computing.
(a) In a short sentence or two, explain what this has to do with P = NP.
(b) If indeed quantum computing becomes practical, what will have to be done differently? (Just
mention two or three things briefly. For example — and this is not correct, by the way — “we will
have to find an alternative to using metal in cars, and an alternative to eating broccoli.”)
* 8. The Persian Omar Khayyam (c. 1050–1130 CE), best known as a poet, was also an
outstanding mathematician. Several centuries before del Ferro (and/or Tartaglia and/or Cardano)
solved the cubic equation algebraically, Khayyam came up with a geometric solution. His solution
is non-Euclidean because it involves a parabola, but it's not hard to see that it works.
Khayyam considered the equation
x^3 + a^2 x = b
where a and b are positive real numbers. His solution is as follows (in modern coordinate notation):
1. Construct the parabola with equation x2 = ay (shown in blue below).
2. Construct a semicircle with diameter AC = b/a^2 on the x-axis (shown in red below).
3. Let P be the point where the parabola meets the semicircle. Drop a perpendicular from P
to the x-axis to find the point Q.
4. Let z be the length of segment AQ.
[Figure: the parabola ay = x^2 (blue) and the semicircle (red); P is their intersection and Q the foot of the perpendicular from P to the x-axis]
Claim: z is a solution of the equation. Verify the claim by the following steps.
(a) Prove that z^2 = a · PQ.
(b) Prove that PQ/AQ = QC/PQ.
(Hint: Use similar triangles and a theorem or two from Euclidean geometry.)
(c) Use (b) to write (PQ)^2 in terms of a, b and z.
(d) (Before going on, take a step back and remind yourself of what Khayyam was trying to
do!) Combine the equations from parts (a) and (c) to complete your verification that Khayyam’s
construction is correct.
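As a numerical sanity check of the claim (not a proof, and not one of the assigned steps), one can intersect the parabola and the semicircle by bisection and verify that the resulting z satisfies the cubic; the function name, the bracketing choices, and the sample values of a and b below are all our own:

```python
def khayyam_root(a, b, tol=1e-12):
    """Find the nonzero intersection of the parabola x^2 = a*y with the
    semicircle on diameter AC = b/a**2, by bisection; return z = AQ."""
    c = b / a**2 / 2          # center of the semicircle is at (c, 0)
    f = lambda x: (x - c)**2 + (x**2 / a)**2 - c**2  # zero on the circle
    lo, hi = tol, 2 * c       # start just past x = 0 to skip the trivial intersection at A
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

a, b = 1.0, 2.0               # x^3 + x = 2 has the root x = 1
z = khayyam_root(a, b)
print(z, z**3 + a**2 * z)     # z should satisfy x^3 + a^2 x = b
```

Trying a few positive values of a and b and checking that z^3 + a^2 z comes out equal to b is a good way to convince yourself the construction works before proving it.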
* 9. The Hilbert Hotel has infinitely (countably infinitely) many rooms.
(a) A guest wants to check in, but all the rooms are full. You are the manager. How can you
accommodate the new guest and not kick anyone out? [Hint: Some guests might have to change rooms.]
(b) Now 100 new guests arrive. All the rooms are still full. You can still accommodate the new
guests without kicking anyone out. How?
(Problem 8 is adapted from Burton's History of Mathematics: An Introduction, pp. 300–301.)
(c) Now infinitely (countably infinitely) many guests arrive. And all the rooms are full! Yet you
can still accommodate everyone. How?
(d) Now uncountably many guests arrive, one for every real number. Can you accommodate
them? Briefly explain.
10. This problem is in three steps.
* Step 1. Do your best to write down a “random” sequence of fifteen 0’s and 1’s. No assistance
from coin-tossing or computer apps or dice or... — you have to do it out of your own head. Write
down your sequence.
Step 2. Go to's random integer generator and generate
fifteen random integers with values between 0 and 1. Write down's sequence.
Step 3. Read this blog post on one way to tell whether a sequence of heads and tails is random.
Check your sequence and's sequence with this test. Give me the results. What conclusion can
you draw?
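One common randomness check — which may or may not be the one in the blog post — looks at the longest run of identical symbols: people writing a "random" sequence by hand tend to alternate too often, while a genuinely random 15-bit sequence usually contains a run of three or more. The helper function and the sample handmade sequence below are illustrative only:

```python
import random

def longest_run(bits):
    """Length of the longest run of identical consecutive symbols."""
    best = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

# A made-up "handmade" sequence versus a generated one.
handmade = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1]
generated = [random.randint(0, 1) for _ in range(15)]
print("handmade longest run:", longest_run(handmade))
print("generated longest run:", longest_run(generated))
```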
11. The normal distribution is a particular bell-shaped curve (look it up online). The central
limit theorem (discussed in the chapter on statistics and probability) says that the distribution of
the means of samples of a given size should resemble a normal distribution. The rest of this exercise
is designed to explain what the previous sentence means.
Go to’s random integer generator (see the previous problem) and generate a sequence
of 10 random integers between 0 and 1. Take the average. For example, if your sequence is
0111010000, its average is (0+1+1+1+0+1+0+0+0+0)/10 = .4.
Do this 100 times.
Plot a graph in which the x-coordinates give the possible averages (0, .1, .2, .3, .4, ..., 1) and
the y-coordinates give how many times each value occurs (a.k.a. frequency). For example, if you
got an average of .1 thirteen times, then the frequency of .1 is 13 and the point (.1, 13) would be on
your graph.
Now connect the points on your graph as smoothly as possible.
Hand in your data (in a table like table 2) and your finished smoothly connected graph. Given
the central limit theorem, does this graph surprise you?
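The procedure just described can be sketched in a few lines of Python, with a crude text histogram standing in for the smooth graph (the histogram style is our own choice):

```python
import random
from collections import Counter

# 100 samples, each the average of ten random 0/1 integers.
averages = []
for _ in range(100):
    sample = [random.randint(0, 1) for _ in range(10)]
    averages.append(sum(sample) / 10)

# Tally how often each possible average occurs and print a text histogram.
freq = Counter(averages)
for value in sorted(freq):
    print(value, "#" * freq[value])
```

By the central limit theorem, the tallies should pile up near .5 and thin out toward 0 and 1, giving a rough bell shape.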
Table 2: sample data
12. Measure the height of the Campanile! Do it the way the ancient Greeks would have done
it: no tools other than some kind of linear measuring tool (measuring tape, ruler, etc.) and, if
it’s appropriate, compass. No trig functions! Length and area calculations are okay, as are similar
triangles and the Pythagorean theorem. Explain yourself clearly, including diagrams as needed
(explanation and diagrams are what you will be graded on). Finally, compare your result with the
official height on the KU website. (You will not be graded on how close you got, but you should
tell me anyway.)
13. In the proof of theorem 11 some parts are missing. What are they? Fill in at least one of them.