How to add primes Jan Vonk

Contents
1 Exponential sums
1.1 Motivating exponential sums
1.1.1 A historic example
1.1.2 A motivation for exponential sums
1.2 Möbius randomness
1.2.1 Discussion and consequences
1.2.2 A note on generalisation
1.3 A strategy outline
1.4 Sums involving Λ
1.4.1 An estimate for sums of Λ∞
1.4.2 An estimate for a double sum with Λ0
1.4.3 A consequence of both estimates
2 The Goldbach problem
2.1 The binary Goldbach problem
2.2 The ternary Goldbach problem
2.3 Conclusion
A The toolbox
B The Möbius function µ
B.1 A proof of Davenport’s result
B.1.1 The case x/τ < q ≤ τ
B.1.2 The case q ≤ x/τ
B.2 Discussion of Vinogradov’s method
B.3 Discussion of importance
Introduction
When one first starts thinking about additive properties of primes, one finds oneself immediately
confronted with the fundamental difficulty underlying the statements. Prime numbers are in a
certain sense meant for multiplying, not for adding. Being defined as fundamental numbers for
multiplication, they seem hard to characterise when we start adding them. The first challenge
therefore is finding a good way of adding primes.
Part 1. We will start by motivating an interest in exponential sums, and outlining our general
strategy using Fourier transforms. With this motivation, we investigate the fundamental Möbius
function and discuss an exponential sum involving this function and its consequences. A general
heuristic is stated. We then proceed to treating sums involving another fundamental function: the
von Mangoldt function. We derive results on sums involving this function, using our results on
the Möbius function. At the end of the first chapter, we will also have the machinery
needed in the next part.
Part 2. This part is about Goldbach’s problem. Based on a conjecture by Goldbach in a letter to
Euler, we suspect that every even integer greater than 2 can be written as the sum of two primes
(binary Goldbach problem) and every odd integer greater than 5 can be written as the sum of
three primes (ternary Goldbach problem). These questions have been confirmed to some extent.
We will prove the following celebrated theorem:
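Theorem. (Vinogradov, 1937) For any fixed A > 0,
\[ \sum_{k_1+k_2+k_3=n} \Lambda(k_1)\Lambda(k_2)\Lambda(k_3) = \frac{1}{2}\,\mathfrak{S}_3(n)\,n^2 + O\left(n^2(\log n)^{-A}\right), \]
where the implied constant depends only on A, and
\[ \mathfrak{S}_3(n) = \prod_{p \mid n}\left(1 - \frac{1}{(p-1)^2}\right)\prod_{p \nmid n}\left(1 + \frac{1}{(p-1)^3}\right). \]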
This theorem has as a consequence the asymptotic result for the ternary Goldbach conjecture. A
similar theorem for sums of two primes remains unproven to date. The main conjecture is
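Conjecture 1. For any even n ≥ 4 and fixed A > 0,
\[ \sum_{k_1+k_2=n} \Lambda(k_1)\Lambda(k_2) = \mathfrak{S}_2(n)\,n + O\left(n(\log n)^{-A}\right), \]
where the implied constant depends only on A, and
\[ \mathfrak{S}_2(n) = \prod_{p \mid n}\left(1 + \frac{1}{p-1}\right)\prod_{p \nmid n}\left(1 - \frac{1}{(p-1)^2}\right). \]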
This would indeed imply the asymptotic result for the binary Goldbach problem. We will prove
this conjecture for almost all even numbers, hence ’almost’ establishing the conjectured asymptotic
result. A full proof is still unknown.
A note on the implied constant. In analytic number theory, one will almost exclusively deal
with asymptotic results. This means certain results hold ’from some number onwards’. Depending
on the method of proof, it might be possible (although it is certainly no trivial task) to determine
this number exactly. For example, Vinogradov [9] showed as discussed above that every odd
number, from a certain number onwards, can be written as the sum of three primes. Borodzkin
[2] showed that the number can be taken to be $3^{3^{15}}$. This is still beyond all hopes of checking the
remaining cases.
An example. The reader who is unfamiliar with asymptotic results might find it useful to see
an example. This example will be of use to us later. Imagine we want to count the number of
solutions $(x, y) \in \mathbb{N}^2$ such that ax + by = n, for given a, b, n ∈ N with (a, b) = 1. We could try
and find explicit examples, and quickly find that for (a, b, n) = (2, 3, 5) we have 1 solution. Or
picking (a, b, n) = (1, 3, 7) gives us 3 solutions. One easily sees that it suffices to solve ax ≡ n
(mod b) for $0 \le x \le \frac{n}{a}$. Instead of trying to make this explicit, we say this is roughly $\frac{n}{ab}$. Here
roughly means not deviating more than a constant value from $\frac{n}{ab}$ (not necessarily an integer). We
restate this in the language of this essay as
\[ \sum_{ax+by=n} 1 = \frac{n}{ab} + O(1). \]
We use the definition of O which can be found for example in [1], [4], or [6]. Notice the convenience
of this approach. Any explicit formula is bound to become so complicated that it obscures
the calculations. In the end the result will not be greatly affected by using the ’roughly correct’
answer rather than the explicit formula. Furthermore, allowing one of x, y to be zero
or not is irrelevant, since it will only change the constant. We therefore choose to work with
O-notation, making the arguments cleaner and giving us more computational comfort. The price we
pay is the unknown implied constant, possibly taking on monstrous proportions as in our previous
remark.
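The counting example can be checked by brute force. The following is a minimal sketch, assuming that N includes 0 (which matches the 3 solutions quoted for (1, 3, 7)); the helper name count_solutions is ours and does not come from the essay.

```python
def count_solutions(a, b, n):
    """Count pairs (x, y) of non-negative integers with a*x + b*y == n."""
    # For each admissible x, y is determined; it exists iff b divides n - a*x.
    return sum(1 for x in range(n // a + 1) if (n - a * x) % b == 0)

for a, b, n in [(2, 3, 5), (1, 3, 7), (5, 7, 1000), (5, 7, 100000)]:
    exact = count_solutions(a, b, n)
    # The last column is the deviation from n/(ab), which stays bounded.
    print((a, b, n), exact, round(exact - n / (a * b), 3))
```

The deviation in the last column is what the O(1) term records: it does not grow with n.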
Acknowledgements. We mainly follow Iwaniec and Kowalski [8] in most proofs. The rest of the
essay is a collection of ideas taken from various sources, notably Davenport [4], Apostol [1], and
Green [6].
Part 1
Exponential sums
1.1 Motivating exponential sums
This section is mainly intended to illustrate the importance and outline the general procedure
of using exponential sums to prove certain claims. We will be inspired by the theory of Fourier
transformation on locally compact abelian groups. Our hope is to convince the reader that it is
important and maybe even natural to try and obtain good results about a function f by finding
estimates for sums of the form $\sum_n f(n)e^{2\pi i n\alpha}$, for various values of α.
1.1.1 A historic example
Quadratic reciprocity. Suppose we want to prove the quadratic reciprocity law, which states
that for any distinct odd primes p, q we have
\[ \left(\frac{p}{q}\right)\cdot\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}}. \]
Since this theorem deals with Legendre symbols, we want to get to know them better. To do that,
we wish to apply classical analysis and the multitude of tools available there. We see ourselves
confronted with the fact that Legendre symbols are defined on finite groups of residue classes, and
do not live in the realm of classical real or complex analysis. We want to make them continuous
somehow, and do it in a symmetric way that does not favour $\left(\frac{2}{p}\right)$ above $\left(\frac{3}{p}\right)$, say. We choose to
examine (consider it an experiment at this point)
\[ G = \sum_{m=1}^{p-1}\left(\frac{m}{p}\right)e^{\frac{2\pi i m}{p}}. \]
This choice seems promising, since exponential functions behave nicely when analytically manipulated, and the periodicity of the Legendre symbol is reflected in the periodicity of the exponentials.
It is called a Gauss sum, honouring the man who discovered a considerable amount of their properties and applications.
Setting $\varepsilon = (-1)^{\frac{p-1}{2}}$, simple computation shows that $G^2 = \varepsilon p$. Note that this does not uniquely
determine G. Finding the sign of G took Gauss several years, and any such problem must surely
be hard. It turns out that
\[ G = \begin{cases} \sqrt{p} & \text{if } p \equiv 1 \pmod 4, \\ i\sqrt{p} & \text{if } p \equiv 3 \pmod 4. \end{cases} \]
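A quick numerical experiment makes the sign statement tangible. This is a sketch only: the function names are ours, and the Legendre symbol is computed with Euler's criterion, which the essay does not use.

```python
import cmath

def legendre(m, p):
    """Legendre symbol (m/p) for an odd prime p, via Euler's criterion."""
    r = pow(m, (p - 1) // 2, p)       # r is 0, 1 or p - 1
    return -1 if r == p - 1 else r

def gauss_sum(p):
    """G = sum over m = 1..p-1 of (m/p) * exp(2*pi*i*m/p)."""
    return sum(legendre(m, p) * cmath.exp(2j * cmath.pi * m / p) for m in range(1, p))

for p in [5, 7, 13, 19]:
    G = gauss_sum(p)
    predicted = cmath.sqrt(p) if p % 4 == 1 else 1j * cmath.sqrt(p)
    # Compare the computed sum with the value claimed in the text.
    print(p, G, abs(G - predicted) < 1e-9)
```

Floating-point arithmetic can only confirm the value for individual primes, of course; it says nothing about why the sign comes out this way.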
Dirichlet proved this in more generality with analytic methods. In fact, he evaluated the more
general sum
\[ S := \sum_{k=0}^{n-1} e^{\frac{2\pi i k^2}{n}}. \]
One can easily rewrite G to see that it is nothing more than the special case n = p. Dirichlet then
rewrote the definition of a Gauss sum using an identity commonly called Poisson’s summation
formula (see Appendix A). Applying this formula for $f(x) = \cos\left(\frac{2\pi x^2}{n}\right)$, he obtained
\[ S = \sum_{t=-\infty}^{+\infty} n\int_0^1 e^{2\pi i n(x^2+tx)}\,dx. \]
After rewriting this result with various substitutions and evaluations, he ended up with
\[ S = n\left(1 + i^{-n}\right)\int_{-\infty}^{+\infty} e^{2\pi i n t^2}\,dt, \]
reducing the problem to integration, a theory which is very well developed and confronts us with
few problems. Indeed, if one is familiar with the theory of Lebesgue integration, the above integral
is an easy exercise. In fact, the simple substitution $y = \sqrt{n}\,t$ reduces the integral to a rather famous
one. In particular, one obtains the above result for G. For full details, see [4, Chapter 2].
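Dirichlet's evaluation can be compared with a direct computation of S. The sketch below assumes the standard value (1 + i)/2 for the integral of $e^{2\pi i y^2}$ over the real line, which is not stated in the essay; with it, Dirichlet's formula gives $S = \sqrt{n}\,(1+i)(1+i^{-n})/2$. The function names are ours.

```python
import cmath

def quadratic_sum(n):
    """S = sum over k = 0..n-1 of exp(2*pi*i*k^2/n), computed directly."""
    # k*k is reduced mod n first to keep the argument of exp small.
    return sum(cmath.exp(2j * cmath.pi * (k * k % n) / n) for k in range(n))

def closed_form(n):
    """sqrt(n) * (1 + i) * (1 + i^(-n)) / 2, using i^(-n) = (-i)^n."""
    return cmath.sqrt(n) * (1 + 1j) * (1 + (-1j) ** n) / 2

for n in [3, 4, 5, 6, 7, 12]:
    print(n, quadratic_sum(n), closed_form(n))
```

For n = p an odd prime the two columns reproduce the value of G given earlier.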
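Finally, we look at the behaviour of a prime q in the extension $\mathbb{Q}(\zeta_p)/\mathbb{Q}$. We know that $K := \mathbb{Q}(G)$
is the unique quadratic subfield, since $\mathrm{Gal}(\mathbb{Q}(\zeta_p)/\mathbb{Q}) \cong (\mathbb{Z}/p\mathbb{Z})^\times$. Now we look at the splitting of
q in two ways:
• By Kummer-Dedekind (note that $q \nmid [\mathcal{O}_K : \mathbb{Z}[G]]$ since $q \neq 2$), q splits completely in
K/Q if and only if $x^2 - \varepsilon p$ has a root mod q, where $\varepsilon = (-1)^{\frac{p-1}{2}}$. Clearly this happens
if and only if $(-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}}\left(\frac{p}{q}\right) = 1$.
• Since $\mathrm{Gal}(\mathbb{Q}(\zeta_p)/K)$ is the unique subgroup of index 2 in $\mathrm{Gal}(\mathbb{Q}(\zeta_p)/\mathbb{Q}) \cong (\mathbb{Z}/p\mathbb{Z})^\times$, it is
exactly the subgroup of squares. Therefore q splits in K/Q if and only if $\left(\frac{q}{p}\right) = 1$.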
This proves the quadratic reciprocity law.
1.1.2 A motivation for exponential sums
Inspired by the success of this approach, we try to imitate it in a general setting. The reader might
object that all the magic happens in the algebraic number theory, which will be hard to generalise
to apply to, say, the Möbius function. A valid objection, but it seems that the only effect of the use of
deeper theorems is a shorter proof. The true core of the proof seems to lie in the determination of
the sign of G, merely sketched here for brevity.
Let us try and find the ideas in the preceding proof that made everything work. We found some
crucial piece of information by taking the function under consideration (the Legendre symbol) and
associating to it a new number (the Gauss sum). The latter has the fortunate property of being
suitable for the use of analytic tools (the Poisson summation formula). We then have to find a
way of translating the information back to our original function (here from the Gauss sum to the
Legendre symbol).
How do we associate a ’Gauss sum’ to our ’Legendre symbol’? A method is suggested by the
theory of Fourier transformation. Indeed, setting $\left(\frac{0}{p}\right) = 0$, the Legendre symbol is a function
f : Z/pZ → C, and hence its Fourier transform is
\[ \hat{f}(r) = \sum_{m=0}^{p-1}\left(\frac{m}{p}\right)e^{\frac{2\pi i r m}{p}}, \]
where $\hat{f}(1) = G$. So we could try and generalise.
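To see the transform and its inversion at work, one can compute both for a small prime. This is an illustrative sketch; the function names and the choice p = 11 are ours, and the inversion formula with the factor 1/p is the usual one for functions on Z/pZ.

```python
import cmath

def legendre(m, p):
    """Legendre symbol (m/p) via Euler's criterion, with (0/p) = 0."""
    r = pow(m, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def fourier(f, p):
    """f_hat(r) = sum_m f(m) * exp(2*pi*i*r*m/p), for f given as a list of length p."""
    return [sum(f[m] * cmath.exp(2j * cmath.pi * r * m / p) for m in range(p))
            for r in range(p)]

def inverse(f_hat, p):
    """Fourier inversion: f(m) = (1/p) * sum_r f_hat(r) * exp(-2*pi*i*r*m/p)."""
    return [sum(f_hat[r] * cmath.exp(-2j * cmath.pi * r * m / p) for r in range(p)) / p
            for m in range(p)]

p = 11
f = [legendre(m, p) for m in range(p)]
f_hat = fourier(f, p)
recovered = [round(z.real) for z in inverse(f_hat, p)]
print(f_hat[1])          # the Gauss sum G for p = 11
print(recovered == f)    # inversion gives back the Legendre symbol
```

The second printed line is the point of the whole strategy: whatever we learn about the transform can be carried back to f.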
General strategy. Given an interesting well behaved arithmetic function f : Z → C, we define
$\hat{f} : \mathbb{R} \to \mathbb{C}$ as
\[ \hat{f}(\alpha) = \sum_{n\in\mathbb{Z}} f(n)e^{2\pi i n\alpha}. \]
This function is attacked using our extensive toolbox from analysis. As noted before, this theory
is well developed and hence might enable us to find a considerably important piece of information.
Information obtained about $\hat{f}$ will transform back, using Fourier inversion, to information about
f , our object of interest. In this fashion we will be able to trade the less developed theory of
the realm in which f lives for a theory we understand better and have more experience in. These
two worlds are connected by the theory of Fourier transformation.
Application to the Möbius function. This looks like a promising idea. It will however require
some thought to make it work in concrete situations. Let us investigate one. The Möbius function
µ seems, in comparison to the Legendre symbol, to live on a much more fundamental level. It is
defined as
\[ \mu(n) = \begin{cases} (-1)^r & \text{if } n \text{ is the product of } r \text{ distinct primes,} \\ 0 & \text{otherwise.} \end{cases} \]
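The definition translates directly into a few lines of code. The following sketch uses plain trial division, chosen for readability rather than speed; the function name is ours.

```python
def mobius(n):
    """Return mu(n): (-1)^r if n is a product of r distinct primes, 0 otherwise."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p^2 divided the original n
                return 0
            result = -result    # one more distinct prime factor
        p += 1
    # Whatever is left above 1 is a single remaining prime factor.
    return -result if n > 1 else result

print([mobius(n) for n in range(1, 13)])
```

Note that n = 1 is the empty product (r = 0), so µ(1) = 1.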
This function shows up in various places in number theory. Notably, it plays a central role in
the theory of Dirichlet convolution, resulting in useful theorems such as Möbius inversion that allow
us to restate the definition of many interesting arithmetic functions in terms of µ. It has many
other equivalent definitions, and it is remarkable how easily it can be defined in terms of divisibility.
We therefore expect this function to be very fundamental and would like to know more about it.
Since we do not want to worry about convergence, we try to adapt our ’Fourier’-attack and instead
consider
\[ \sum_{n\le x}\mu(n)e^{2\pi i n\alpha}. \]
Notice that we have two very good reasons to assume that an analysis of this quantity will be
considerably harder.
Firstly, we have an unnatural cutoff x.
Secondly, the Möbius function is much harder to grasp and, as remarked, more fundamental. It will
contain a considerable amount of information regarding divisibility of numbers, and will therefore
not likely give away its secrets easily.
These comments make us aware that it might not be possible to give an explicit evaluation of our
sum for a given x and α, as we did before in the particular case of the Gauss sum. Instead, we
aim towards finding good estimates of the sum. This is exactly our goal in the next section.
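Before estimating the sum, it is instructive to look at its size numerically. The sketch below is an experiment, not evidence for any theorem: the cutoff x = 20000 and the sample values of α are arbitrary choices of ours, and µ is computed with a simple sieve.

```python
import cmath
import math

def mobius_sieve(x):
    """Return a list mu with mu[n] = Möbius function of n for 1 <= n <= x (mu[0] is unused)."""
    mu = [1] * (x + 1)
    is_prime = [True] * (x + 1)
    for p in range(2, x + 1):
        if is_prime[p]:
            for m in range(p, x + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]          # one sign flip per prime factor
            for m in range(p * p, x + 1, p * p):
                mu[m] = 0               # not squarefree
    return mu

def exp_sum(mu, x, alpha):
    """Sum over n <= x of mu(n) * exp(2*pi*i*n*alpha)."""
    return sum(mu[n] * cmath.exp(2j * math.pi * n * alpha) for n in range(1, x + 1))

x = 20000
mu = mobius_sieve(x)
for alpha in [0.0, 0.5, 1 / 3, math.sqrt(2)]:
    # The ratio |sum| / x measures how much cancellation takes place.
    print(alpha, abs(exp_sum(mu, x, alpha)) / x)
```

The trivial bound for the sum is x, so a small ratio indicates substantial cancellation among the terms.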
1.2 Möbius randomness
In this section we will discuss the chosen exponential sum in the Möbius function µ. All results
stated here are roughly the prerequisites of this essay. Since the ideas are fundamental and form
the true core of the results, their proofs are sketched in the appendices. The main theorem is
Theorem 1. For any real α, any A > 0 and x ≥ 2, we have
\[ \sum_{m\le x}\mu(m)e^{2\pi i\alpha m} = O\left(x(\log x)^{-A}\right). \]
The implied constant only depends on A.
The first proof was found by Davenport [3, Theorem 1], based on the technique used by Vinogradov
[9] in his solution for the ternary Goldbach problem. However, we feel that it is more natural to
turn things around when adopting the viewpoint of exponential sums as we do. We therefore
give a proof of Vinogradov’s theorem (see Chapter 2) in a more modern setting, where we base
ourselves on Theorem 1. Since Vinogradov’s ideas are fundamental, we decided to sketch the proof
of Theorem 1 in Appendix B. Full details can be found in [3], and [8] for a more modern treatment,
which we will follow.
1.2.1 Discussion and consequences
In a way, this theorem can be considered the true core of many theorems in number theory.
Examples of rather impressive (though nontrivial) corollaries are the prime number theorem and
Vinogradov’s theorem, as we will see in Chapter 2. For the latter, we will recast the theorem in
the following essentially equivalent form that will be used later on.
Corollary 1. For any real α, A > 0 and x ≥ 2, we have
\[ \sum_{m\le x}\mu(m)\log(m)\,e^{2\pi i\alpha m} = O\left(x(\log x)^{-A}\right). \]
The implied constant only depends on A.
Remark. This is a typical example of an essentially equivalent reformulation that follows
straightforwardly from partial summation. The indeterminacy in the log-factor absorbs any extra such
factors that arise, resulting in an unchanged error term. This applies in general to estimates with
error term $O\left(x(\log x)^{-A}\right)$, making an insertion of extra log-factors in the sum possible. This error
term is more common than one might expect. It arises for whole families of sums in a result of
Green and Tao we will mention at the end of this section.
Proof.
Summation by parts (see Appendix A) gives us
\[ \sum_{m\le x}\mu(m)\log(m)\,e^{2\pi i\alpha m} = \log x\sum_{m\le x}\mu(m)e^{2\pi i\alpha m} - \int_1^x\Bigg(\sum_{m\le y}\mu(m)e^{2\pi i\alpha m}\Bigg)\frac{1}{y}\,dy. \]
This last integral is hard to calculate, but easy to estimate from above, keeping in mind that the
logarithm is strictly increasing. The extra log-factor is absorbed in the error term we obtain from
Theorem 1, giving the desired result.
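The summation by parts identity used in this proof can be verified numerically. In the sketch below the partial sum is a step function, so for integer x the integral becomes an exact finite sum; the values x = 500 and α = 0.3 are arbitrary choices of ours.

```python
import cmath
import math

def mobius(n):
    """Möbius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

x, alpha = 500, 0.3
a = [0] + [mobius(m) * cmath.exp(2j * math.pi * alpha * m) for m in range(1, x + 1)]

partial = [0] * (x + 1)                 # partial[m] = a_1 + ... + a_m
for m in range(1, x + 1):
    partial[m] = partial[m - 1] + a[m]

lhs = sum(a[m] * math.log(m) for m in range(1, x + 1))
# The partial sum is constant on [m, m+1), so the integral of partial(y)/y
# over [1, x] equals the sum of partial[m] * (log(m+1) - log(m)).
integral = sum(partial[m] * (math.log(m + 1) - math.log(m)) for m in range(1, x))
rhs = math.log(x) * partial[x] - integral
print(abs(lhs - rhs))
```

The printed difference is at the level of rounding error, as it must be for an exact identity.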
We will conclude this section by presenting the prime number theorem, which we all know very
well, as a corollary of Theorem 1. Define first the von Mangoldt function as
\[ \Lambda(n) = \begin{cases} \log p & \text{if } n = p^m \text{ for some prime } p \text{ and some } m \ge 1, \\ 0 & \text{otherwise.} \end{cases} \]
This function can be related to µ, since $\sum_{d\mid n}\Lambda(d) = \log n$ implies by Möbius inversion that
\[ \Lambda(n) = \sum_{d\mid n}\mu(d)\log\frac{n}{d} = -\sum_{d\mid n}\mu(d)\log d. \]
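Both the divisor-sum identity and its Möbius inversion are easy to test on small integers. This is a minimal sketch with function names of our choosing, using trial division throughout.

```python
import math

def mobius(n):
    """Möbius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def mangoldt(n):
    """Lambda(n) = log p if n = p^m with m >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)   # no divisor up to sqrt(n): n itself is prime

def mangoldt_via_mobius(n):
    """-sum over d | n of mu(d) * log d."""
    return -sum(mobius(d) * math.log(d) for d in range(1, n + 1) if n % d == 0)

for n in [1, 8, 12, 49, 97]:
    print(n, mangoldt(n), mangoldt_via_mobius(n))
```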
Remark. This function will play a crucial role in our attack on the Goldbach problem. For now it
turns out to be a convenient counter for the abundance of primes. In the next section, we will emphasise
how it incorporates the multiplicative information of primes in an additive way.
Corollary 2. Let $\psi(x) = \sum_{n\le x}\Lambda(n)$. Then for any A > 0 we have
\[ \psi(x) = x + O\left(x(\log x)^{-A}\right). \]
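The statement of Corollary 2 can be watched in action for small x. The following sketch only illustrates the leading term; it gives no information about the error term, and the sample values of x are ours.

```python
import math

def mangoldt(n):
    """Lambda(n) = log p if n = p^m with m >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)

def psi(x):
    """Chebyshev function: sum of Lambda(n) over n <= x."""
    return sum(mangoldt(n) for n in range(1, x + 1))

for x in [100, 1000, 10000]:
    print(x, psi(x), psi(x) / x)
```

The ratio in the last column approaches 1, in line with ψ(x) = x + (smaller terms).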
A proof outline based on Theorem 1 can be found in Appendix B. By estimating our exponential
sum in µ, we obtain crucial information which is translated to the above result. We are encouraged
to try our luck again with some function different from µ. When we were confronted with quadratic
reciprocity, we identified the Legendre symbol as the fundamental function and investigated the
associated Gauss sum. Now our ultimate goal is trying to prove the Goldbach conjecture. In the
next section, we identify the fundamental function Λ and investigate an exponential sum which
will be the analogue of the Gauss sum.
Remark. We recall that there are various stronger forms of the prime number theorem whose
proofs generally depend on the location of zeroes of L-functions. However, elementary proofs are
sometimes possible. We will state a stronger version which will be used later on. Note that even
stronger results have been obtained, but we choose to state this one since it is sufficient for our
later estimates.
We define for any (a, q) = 1 the function
\[ \psi(x; q, a) = \sum_{\substack{n\le x \\ n\equiv a \ (\mathrm{mod}\ q)}}\Lambda(n). \]
Corollary 3 (Prime number theorem). For any fixed A > 0 we have, with (a, q) = 1, that
\[ \psi(x; q, a) = \frac{x}{\varphi(q)} + O\left(x(\log x)^{-A}\right). \]
Remark. This form of the prime number theorem can be interpreted as the statement that the
prime numbers are uniformly distributed over the invertible residue classes modulo q. Of course
there is at most one prime in the other residue classes since (a, q) is a divisor of any number in
such a class.
1.2.2 A note on generalisation
This is the place to discuss a general heuristic. As it turns out, the Möbius function µ behaves
randomly in the following way: if $a_n$ is any ’reasonable’ sequence, then the sum $\sum_{n\le x}\mu(n)a_n$ is
very ’small’ due to ’cancellation’ of terms by the flipping sign of µ. This is a good heuristic to
keep in mind when estimating sums involving the Möbius function, and indeed we will let it guide
us frequently throughout this essay.
Of course we need to find a rigorous result every time we consider a concrete sequence $a_n$. One
method we could adopt is the so-called Vinogradov method. We will discuss the method in some
generality in Appendix B, after applying it to the sequence $a_n = e^{2\pi i\alpha n}$ when proving Theorem 1.
It gives a fairly explicit description of a method that seems to give us the desired results in many
cases.
When working with this heuristic, we need some feeling for which sequences turn out to be
’reasonable’ and how ’small’ exactly the resulting sum is. Precise meanings can be given to these words
to some extent. If we consider $O\left(x(\log x)^{-A}\right)$ for every A > 0 as ’small’, then Theorem 1 says
that $a_n = e^{2\pi i n\alpha}$ is ’reasonable’ for any α. Green and Tao’s result [5] presents a much larger class
of ’reasonable’ sequences. The mere statement of their result requires deep definitions and will
not be discussed in detail here. The reader should however be aware of the fact that there is a
huge family of sequences for which our heuristic is proven to hold when ’reasonable’ and ’small’
are defined as above. This heuristic seems to have almost unlimited power and it is not clear what
the boundaries are for a general meaning of ’small’ and ’reasonable’, and even less how we could
prove it.
1.3 A strategy outline
Our ultimate goal is obtaining additive properties of prime numbers. As we noted in the introduction, the first challenge is to find a convenient way of adding primes, which contains their
characteristic multiplicative behaviour. The logarithm seems to have the convenient mechanism
of converting products to sums. The von Mangoldt function Λ seems the natural setting. Indeed,
notice that
\[ \log n = \sum_{d\mid n}\Lambda(d), \]
which can be seen as an additive translation of prime factorisation. Hence Λ seems to contain
all information about factorisation, while behaving additively. The function has proven to be a
convenient way of keeping track of primes in the above formulations of the prime number theorem.
We are hence convinced that the true approach to the Goldbach problem, if one exists, might
lie in the von Mangoldt function.
Alternatives. Since we are initially only interested in adding prime numbers, the fact that Λ
is nonzero at every proper prime power might seem problematic. We will explain in Chapter 2
how this can be solved. However, at this point, one could choose many different functions to get a
grip on prime numbers. The results we obtain here with Λ can sometimes be reformulated using
a different function. An example of this phenomenon is an equivalent formulation of the prime
number theorem in terms of $\pi(x) = \sum_{p\le x} 1$ instead of $\psi(x) = \sum_{n\le x}\Lambda(n)$. The function π(x)
has the advantage of having a clear interpretation, and the disadvantage of being hard to grasp in
computations. However, we can often freely switch between π and ψ with the aid of straightforward
estimates. Our motivation makes us choose Λ, giving more computational comfort and I dare
say a more natural approach.
Natural approach. Back to the Goldbach problem. According to our general principle, we feel
like attempting to find good estimates for
\[ T(\alpha) = \sum_{n\le x}\Lambda(n)e^{2\pi i n\alpha}. \]
We expect that T (α) contains crucial information on the additive properties of primes. This is
indeed the case, and if we specialise to the Goldbach problem, we can find an easy way
of applying an estimate of T (α). Indeed, once we have a sufficiently sharp estimate, we consider
ourselves brave enough to investigate
\[ S_2(n) := \sum_{k_1+k_2=n}\Lambda(k_1)\Lambda(k_2) \quad\text{and}\quad S_3(n) := \sum_{k_1+k_2+k_3=n}\Lambda(k_1)\Lambda(k_2)\Lambda(k_3). \]
These functions count the number of representations of n as the sum of two (resp. three) prime
powers in a weighted manner. Proving that they are > 0 for the desired values proves the Goldbach
problems. Moreover, a decent understanding of them gives us knowledge on the number of representations, a much deeper question. So how do we estimate these quantities using our estimate of
the exponential sum above? Note that a good estimate for $S_2(n)$ implies a good estimate for
\[ S_3(n) = \sum_{m_1+m_2=n} S_2(m_1)\Lambda(m_2). \]
This is in accordance with the fact that the binary Goldbach problem is clearly the stronger
statement. Furthermore, we have that
\[ \sum_{\substack{k_1+k_2=n \\ k_1,k_2\le x}}\Lambda(k_1)\Lambda(k_2) = \int_0^1 T(\alpha)^2e^{-2\pi i n\alpha}\,d\alpha. \]
This is in essence the idea of Fourier inversion, and it gives us a very concrete approach to the problem. In fact, T (α) can indeed be sufficiently sharply bounded, allowing a full proof of Vinogradov’s
theorem. Full details can be found in [9] and [4, Chapter 26].
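The integral identity for the restricted sum over k1 + k2 = n can be confirmed numerically. In the sketch below we use the fact that T(α)² is a trigonometric polynomial of degree at most 2x, so averaging over N = 2x + 1 equally spaced points evaluates the integral exactly (up to rounding); the values x = 30 and n = 40 are arbitrary choices of ours.

```python
import cmath
import math

def mangoldt(n):
    """Lambda(n) = log p if n = p^m with m >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)

x, n = 30, 40
lam = [0.0] + [mangoldt(m) for m in range(1, x + 1)]

# Left hand side: sum over k1 + k2 = n with k1, k2 <= x.
direct = sum(lam[k] * lam[n - k] for k in range(1, n) if k <= x and n - k <= x)

def T(alpha):
    """T(alpha) = sum over m <= x of Lambda(m) * exp(2*pi*i*m*alpha)."""
    return sum(lam[m] * cmath.exp(2j * math.pi * m * alpha) for m in range(1, x + 1))

# Right hand side: the integral over [0, 1], computed as an average over N points.
N = 2 * x + 1
integral = sum(T(j / N) ** 2 * cmath.exp(-2j * math.pi * n * j / N) for j in range(N)) / N
print(direct, integral.real, abs(integral.imag))
```

This is the orthogonality of the exponentials at work: integrating against $e^{-2\pi i n\alpha}$ picks out exactly the terms of T(α)² with k1 + k2 = n.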
Our approach. The above approach is exactly what we desired to motivate in this essay, and
works fine in obtaining our desired results. However, we shall present a different approach that
relies on Theorem 1. The intention is to show how the ’randomness’ of the Möbius function is in
a certain sense the fundamental issue. We already mentioned that the prime number theorem can
be derived from it (see Appendix B) and we will base our proof of Vinogradov’s theorem on it.
The strategy we will follow is a direct one. We will deduce the necessary information about Λ from
our estimate in Theorem 1, both functions being connected by
\[ \Lambda(n) = \sum_{d\mid n}\mu(d)\log\frac{n}{d}. \]
In this way, we will directly estimate S2 (n). These estimates will finally be used to come to a
serious attack on the Goldbach problem in Chapter 2.
1.4 Sums involving Λ
Let us reflect on how to estimate
\[ S_2(n) := \sum_{k_1+k_2=n}\Lambda(k_1)\Lambda(k_2). \]
We remember that $\Lambda(n) = -\sum_{d\mid n}\mu(d)\log d$. Notice that when d is large, we expect the
randomness of the Möbius function to cancel out lots of the terms, and that the small terms will have
most influence. If we are to have a chance of successfully estimating $S_2(n)$ through the use of our result
on the Möbius function, we had better split up the defining sum for Λ(n).
We now come to formal definitions. For any k ≥ 2, define
\[ \Lambda^0(n) := -\sum_{\substack{d\mid n \\ d\le k}}\mu(d)\log d \]
and
\[ \Lambda^\infty(n) := -\sum_{\substack{d\mid n \\ d>k}}\mu(d)\log d, \]
respectively. We have chosen not to mention k explicitly in this notation, so the reader is warned
that the choice of a fixed k is implied in these quantities.
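The split of Λ into a part with small divisors and a part with large divisors can be made concrete as follows. This is a sketch with hypothetical function names lambda_low and lambda_high for Λ0 and Λ∞, and an arbitrary cutoff k = 6.

```python
import math

def mobius(n):
    """Möbius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def lambda_low(n, k):
    """The part of -sum mu(d) log d over divisors d of n with d <= k."""
    return -sum(mobius(d) * math.log(d) for d in range(1, n + 1) if n % d == 0 and d <= k)

def lambda_high(n, k):
    """The part of -sum mu(d) log d over divisors d of n with d > k."""
    return -sum(mobius(d) * math.log(d) for d in range(1, n + 1) if n % d == 0 and d > k)

k = 6
for n in [8, 30, 49, 97]:
    low, high = lambda_low(n, k), lambda_high(n, k)
    # The two parts always add up to Lambda(n).
    print(n, low, high, low + high)
```

Neither part is supported on prime powers by itself; only their sum is.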
Since our goal is to estimate
\[ S_2(n) = \sum_{k_1+k_2=n}\Lambda^0(k_1)\Lambda^0(k_2) + 2\sum_{k_1+k_2=n}\Lambda^0(k_1)\Lambda^\infty(k_2) + \sum_{k_1+k_2=n}\Lambda^\infty(k_1)\Lambda^\infty(k_2), \]
we will try and obtain good estimates for the individual terms of this sum. Notice that we expect our
estimates involving Λ∞ to be more accurate and general, since we expect Theorem 1 to bound
things sharply.
1.4.1 An estimate for sums of Λ∞
Following our general motto, we will try and obtain information about Λ∞ by considering the sum
\[ S^\infty(\alpha) = \sum_{n\le x}\Lambda^\infty(n)e^{2\pi i n\alpha}. \]
The next lemma gives a good estimate of this quantity.
Lemma 1. For any k ≥ 2, α ∈ R and A > 0, we have
\[ |S^\infty(\alpha)| = O\left(x\log x\,(\log k)^{-A}\right), \]
the implied constant depending only on A.
Remark. Note that S ∞ (α) implicitly depends on k, although not explicitly mentioned.
Proof.
We have
\[ S^\infty(\alpha) = -\sum_{n\le x}\Bigg(\sum_{\substack{d\mid n \\ d>k}}\mu(d)\log d\Bigg)e^{2\pi i n\alpha}. \]
Writing $n = dd'$, we get
\[ S^\infty(\alpha) = -\sum_{d>k}\ \sum_{d'\le \frac{x}{d}}\mu(d)\log d\ e^{2\pi i dd'\alpha}. \]
Interchanging the order of summation now gives us
\[ S^\infty(\alpha) = -\sum_{d'<\frac{x}{k}}\ \sum_{k<d\le \frac{x}{d'}}\mu(d)\log d\ e^{2\pi i dd'\alpha}. \]
Now by the inequality $\left|\sum_i a_i\right| \le \sum_i |a_i|$, we obtain using Corollary 1,
\[ |S^\infty(\alpha)| = O\Bigg(\sum_{d'<\frac{x}{k}}\frac{x}{d'}\left(\log\frac{x}{d'}\right)^{-A}\Bigg), \]
the implied constants only depending on A > 0. Note that we make use of the uniformity in α
here, since Corollary 1 is applied with $d'\alpha$ in place of α. The usage of Corollary 1 is justified because
letting the sum run over $k < d \le \frac{x}{d'}$ gives the same estimate. Now using $\log\frac{x}{d'} > \log k$ and
$1 + \frac{1}{2} + \ldots + \frac{1}{m} = O(\log m)$, this is easily seen to be the desired result.
Remark. The step of finding a good estimate for the exponential sum is done in the previous
lemma. Now the challenge consists of finding a suitable application of this estimate that gives us
the information we need about the function Λ∞ .
Lemma 2. For any vectors $u, v \in \mathbb{C}^n$ and fixed A ≥ 0, we have
\[ \sum_{a+b+c=n}u_av_b\Lambda^\infty(c) = O\left(\|u\|\,\|v\|\,n\log n\,(\log k)^{-A}\right). \]
Furthermore, the implied constant can be chosen to depend only on A.
Remark. The proof is straightforward estimation, cunningly invoking the orthogonality relations
$\int_0^1 e^{2\pi i n x}\,dx = 1$ if n = 0 and 0 otherwise. This is exactly what we expect, since this step
corresponds to the Fourier inversion in our general strategy. Closely investigating the proof will
make this apparent, and remind us that in fact we are applying our attack using Fourier analysis.
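Proof of Lemma 2.
Using the orthogonality relations, we obtain that the sum equals
\[ \int_0^1\Bigg(\sum_{m_1\le n}u_{m_1}e^{2\pi i\alpha m_1}\Bigg)\Bigg(\sum_{m_2\le n}v_{m_2}e^{2\pi i\alpha m_2}\Bigg)S^\infty(\alpha)e^{-2\pi i n\alpha}\,d\alpha, \]
where $S^\infty(\alpha)$ is taken with x = n. The uniformity in α of our estimate for $S^\infty(\alpha)$ allows us to
integrate it. We hence obtain the upper bound
\[ \int_0^1\Bigg|\Bigg(\sum_{m_1\le n}u_{m_1}e^{2\pi i\alpha m_1}\Bigg)\Bigg(\sum_{m_2\le n}v_{m_2}e^{2\pi i\alpha m_2}\Bigg)e^{-2\pi i n\alpha}\Bigg|\,d\alpha\cdot O\left(n\log n\,(\log k)^{-A}\right), \]
for any A > 0, the implied constant only depending on A. By the Cauchy-Schwarz inequality, we
have the upper bound
\[ \Bigg(\int_0^1\Bigg|\sum_{m_1\le n}u_{m_1}e^{2\pi i\alpha m_1}\Bigg|^2d\alpha\cdot\int_0^1\Bigg|\sum_{m_2\le n}v_{m_2}e^{2\pi i\alpha m_2}\Bigg|^2d\alpha\Bigg)^{\frac{1}{2}}\cdot O\left(n\log n\,(\log k)^{-A}\right). \]
The statement now just follows from Parseval’s identity (see Appendix A), or immediately from
the orthogonality relations.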
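The last step, Parseval's identity for a finite exponential sum, is easy to confirm numerically. The sketch below takes a random vector u (an arbitrary choice of ours) and uses that the squared modulus of the sum is a trigonometric polynomial of degree less than n, so an average over N = 2n equally spaced points evaluates the integral exactly up to rounding.

```python
import cmath
import math
import random

random.seed(0)
n = 16
# u[m - 1] plays the role of u_m for m = 1..n.
u = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]

def U(alpha):
    """Sum over m <= n of u_m * exp(2*pi*i*alpha*m)."""
    return sum(u[m - 1] * cmath.exp(2j * math.pi * alpha * m) for m in range(1, n + 1))

N = 2 * n
integral = sum(abs(U(j / N)) ** 2 for j in range(N)) / N   # integral of |U|^2 over [0, 1]
norm_squared = sum(abs(z) ** 2 for z in u)                 # ||u||^2
print(integral, norm_squared)
```

With this identity each of the two integrals in the Cauchy-Schwarz bound equals the squared norm of u and v respectively, which gives the factor ‖u‖‖v‖ in Lemma 2.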
1.4.2 An estimate for a double sum with Λ0
Having obtained a very general result estimating sums involving Λ∞ , the task remains to do the
same for Λ0 . Because for large values of n the expression for Λ0 contains relatively few terms,
we do not in general expect a lot of ’cancellation’, hence a general result with a bound as sharp as
the one we obtained for Λ∞ seems far from trivial. Therefore we will relax the generality and focus
on a more specific sum which is of fundamental importance to the Goldbach problem. Recall from
the introduction that an attractive idea to attack this problem is considering the sum
\[ \sum_{k_1+k_2=n}\Lambda(k_1)\Lambda(k_2). \]
The main result in this section is
Lemma 3. For k ≥ 2, and fixed A ≥ 0,
\[ \sum_{k_1+k_2=n}\Lambda^0(k_1)\Lambda^0(k_2) = \mathfrak{S}_2(n)\,n + O\left(n(\log k)^{-A} + \tau(n)\,n\,k^{-\frac{1}{3}} + k^3\right), \]
the implied constant depending only on A.
Remark. We defined in the introduction
\[ \mathfrak{S}_2(n) = \prod_{p\mid n}\left(1 + \frac{1}{p-1}\right)\prod_{p\nmid n}\left(1 - \frac{1}{(p-1)^2}\right). \]
This function might seem rather hard to interpret. Note that plugging in an odd number yields 0,
since the factor at p = 2 then vanishes.
However the growth of this function for even n is easily estimated from the equivalent form
\[ 2\prod_{\substack{p\mid n \\ p>2}}\frac{p-1}{p-2}\ \prod_{p>2}\left(1 - \frac{1}{(p-1)^2}\right) = C\prod_{\substack{p\mid n \\ p>2}}\frac{p-1}{p-2}. \]
This infinite product is clearly convergent, since the sum of the squared reciprocals of the natural
numbers is convergent, justifying the constant C.
To estimate the product appearing here, some trivial estimates suffice to see that
\[ 1 \le \prod_{\substack{p\mid n \\ p>2}}\frac{p-1}{p-2} \le n, \]
hence $\mathfrak{S}_2(n)$, however badly behaved, is controlled by two basic functions. Finally, the reader
verifies easily that
\[ \mathfrak{S}_2(n) = \sum_{d\mid n}\frac{\mu^2(d)\,d}{\varphi^2(d)}\sum_{(c,d)=1}\frac{\mu(c)}{\varphi^2(c)}, \]
which is the form in which this function will naturally arise in later estimates.
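The product defining the singular series and its equivalent form for even n can be compared numerically. The sketch below truncates both Euler products at the primes up to 100000, an arbitrary cutoff of ours, so the values are approximations; the function names are hypothetical.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, limit + 1, p):
                sieve[m] = False
    return [p for p in range(limit + 1) if sieve[p]]

PRIMES = primes_up_to(100000)

def singular_series(n):
    """Truncated product: (1 + 1/(p-1)) for p | n, (1 - 1/(p-1)^2) for p not dividing n."""
    value = 1.0
    for p in PRIMES:
        if n % p == 0:
            value *= 1 + 1 / (p - 1)
        else:
            value *= 1 - 1 / (p - 1) ** 2   # equals 0 at p = 2 when n is odd
    return value

def equivalent_form(n):
    """Truncated version of 2 * prod_{p>2}(1 - 1/(p-1)^2) * prod_{p|n, p>2} (p-1)/(p-2)."""
    value = 2.0
    for p in PRIMES[1:]:
        value *= 1 - 1 / (p - 1) ** 2
        if n % p == 0:
            value *= (p - 1) / (p - 2)
    return value

for n in [2, 6, 15, 30, 1000]:
    print(n, singular_series(n), equivalent_form(n) if n % 2 == 0 else None)
```

For n = 2 the printed value approximates the constant C of the text, and odd n give exactly 0.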
Remark. Notice that considering the estimate for Λ∞ , by giving up some generality, a sharp
bound was still obtained for our sum in Λ0 . The error term might seem rather frightening, but
with a suitable choice of k, things simplify greatly. The proof of this estimate is rather long and
technical. It seems to consist of little more than clever algebraic manipulations that allow one to
apply the prime number theorem.
Proof of Lemma 3.
Let us try and rewrite the left hand side to a form which is easier to estimate. We have
\[ \sum_{k_1+k_2=n}\Lambda^0(k_1)\Lambda^0(k_2) = \sum_{k_1+k_2=n}\Bigg(\sum_{\substack{d_1\mid k_1 \\ d_1\le k}}\mu(d_1)\log d_1\Bigg)\Bigg(\sum_{\substack{d_2\mid k_2 \\ d_2\le k}}\mu(d_2)\log d_2\Bigg). \]
Writing $d_1d_1' = k_1$ and $d_2d_2' = k_2$, we obtain
\[ \sum_{d_1,d_2\le k}\mu(d_1)\mu(d_2)\log d_1\log d_2\sum_{d_1d_1'+d_2d_2'=n}1. \]
The inner sum is $\frac{n}{[d_1,d_2]} + O(1)$ whenever $(d_1,d_2)\mid n$, according to our very first example,
and it is empty otherwise.
So if we set
\[ \mathfrak{S}_2(n,k) = \sum_{\substack{d_1,d_2\le k \\ (d_1,d_2)\mid n}}\frac{\mu(d_1)\mu(d_2)}{[d_1,d_2]}\log d_1\log d_2, \]
we have
\[ \sum_{k_1+k_2=n}\Lambda^0(k_1)\Lambda^0(k_2) = n\,\mathfrak{S}_2(n,k) + O\left((k\log k)^2\right). \]
We have now reduced our task to estimating the quantity $\mathfrak{S}_2(n,k)$. This notation is suggestive:
although it depends on k, this quantity will turn out to be a good approximation of
the quantity $\mathfrak{S}_2(n)$ defined before. We will first estimate $\mathfrak{S}_2(n,k)$, and then show how it is related
to $\mathfrak{S}_2(n)$.
Estimating $\mathfrak{S}_2(n,k)$. We rearrange to
\[ \mathfrak{S}_2(n,k) = \sum_{d\mid n}\frac{\mu(d)}{d}\sum_{c\le\frac{k}{d}}\frac{\mu(cd)}{c^2}\Bigg(\sum_{\substack{m\le\frac{k}{cd} \\ (m,cd)=1}}\frac{\mu(m)}{m}\log cdm\Bigg)^2, \]
since this contains an inner sum which is very convenient to estimate. This identity can be checked
by direct verification and some coffee.
For $cd > \sqrt{k}$, we have by the trivial estimates $|\mu(x)| \le 1$ and $\log cdm \le \log k$ that this contribution
is dominated by
\[ \sum_{d\mid n}\ \sum_{cd>\sqrt{k}}\frac{(\log k)^4}{c^2d} \le \frac{2\tau(n)}{\sqrt{k}}(\log k)^4. \]
For $cd \le \sqrt{k}$, we use log cdm = log cd + log m, and on both pieces we apply some trivial estimates
and the prime number theorem rewritten with Möbius functions to show that
\[ -\sum_{\substack{m\le\frac{k}{cd} \\ (m,cd)=1}}\frac{\mu(m)}{m}\log cdm = \frac{cd}{\varphi(cd)} + O\left(\frac{\tau(cd)\,cd}{\varphi(cd)}(\log k)^{-A}\right), \]
the implied constant depending solely on the choice of A.
Combining both, we get the estimate
\[ \mathfrak{S}_2(n,k) = \sum_{d\mid n}\ \sum_{cd\le\sqrt{k}}\frac{\mu(d)\mu(cd)\,d}{\varphi(cd)^2} + O\left((\log k)^{4-A} + \frac{\tau(n)}{\sqrt{k}}(\log k)^4\right). \]
Linking $\mathfrak{S}_2(n,k)$ to $\mathfrak{S}_2(n)$. Notice that in the above expression, replacing the sum over $cd \le \sqrt{k}$ with a sum over all $c$ introduces only a very small error, which is absorbed in the error term already present. Doing this, we approximate $\mathfrak{S}_2(n,k)$ by
$$\mathfrak{S}_2(n) = \sum_{d \mid n} \frac{\mu^2(d)\,d}{\varphi^2(d)} \sum_{(c,d)=1} \frac{\mu(c)}{\varphi^2(c)}.$$
Now we have obtained the required dominant term. Our error term is now
$$O\left( n(\log k)^{4-A} + \frac{\tau(n)\,n}{\sqrt{k}} (\log k)^4 + (k\log k)^2 \right),$$
the implied constant only depending on $A > 0$. Hence we obtain the result, upon simplifying slightly, keeping in mind that $\log k$ is smaller than any power of $k$ for $k$ large enough.
1.4.3 A consequence of both estimates
We now close in on the Goldbach problem. Define the quantity
$$S_2(n) := \sum_{k_1+k_2=n} \Lambda(k_1)\Lambda(k_2).$$
The following theorem will be the core of the proof of all results on the Goldbach problems
appearing in the next part.
Theorem 2. We have for any $A > 0$ and $u_n \in \mathbb{C}$ that
$$\sum_{n \le x} u_n S_2(n) = \sum_{n \le x} u_n \mathfrak{S}_2(n)\,n + O\left( \|u\|\, x^{3/2} (\log x)^{-A} \right),$$
the implied constant depending only on $A$.
Remark. To obtain this result we will specify $k$ and apply the two estimates we obtained before for $\Lambda^\infty$ and $\Lambda^0$. The following proof consists of little more than that, since we have done all the hard work already.
Proof of Theorem 2. We start by recalling
$$\sum_{n \le x} u_n S_2(n) = \sum_{n \le x} u_n \left( \sum_{k_1+k_2=n} \Lambda^0(k_1)\Lambda^0(k_2) + 2 \sum_{k_1+k_2=n} \Lambda^0(k_1)\Lambda^\infty(k_2) + \sum_{k_1+k_2=n} \Lambda^\infty(k_1)\Lambda^\infty(k_2) \right).$$
The strategic choice at this point is $k = x^{1/4}$.
Step 1. Note that Lemma 3 allows us to estimate
$$\sum_{n \le x} u_n \sum_{k_1+k_2=n} \Lambda^0(k_1)\Lambda^0(k_2) = \sum_{n \le x} u_n \mathfrak{S}_2(n)\,n + O\left( \|u\|\, x^{3/2} (\log x)^{-A} \right),$$
for any $A > 0$. This deserves some explanation. The leading term is straightforward, while the error term is in this case a sum over
$$u_n \cdot O\left( n(\log x)^{-A} + \tau(n)\,n\,x^{-1/12} + x^{3/4} \right).$$
This is estimated using the discrete Cauchy-Schwarz inequality. The first and last terms can be dealt with by straightforward estimates. For the middle term, we need a slightly more sophisticated result analogous to Dirichlet’s formula (see Appendix A). Indeed, it can be proven fairly elementarily (see [8][Chapter 1]) that
$$\sum_{n \le x} \tau(n)^2 = O\left( x(\log x)^3 \right).$$
Applying this to the second term we obtain an error term of
$$O\left( \|u\|\, x^{3/2} (\log x)^{-A} + \|u\|\, x^{3/2 - 1/12} \log x + \|u\|\, x^{5/4} \right) = O\left( \|u\|\, x^{3/2} (\log x)^{-A} \right).$$
Step 2. According to Theorem 2, we have
$$\sum_{n \le x} u_n \sum_{k_1+k_2=n} \Lambda^0(k_1)\Lambda^\infty(k_2) = O\left( \|u\|\, \|\Lambda^0\|\, x (\log x)^{-A} \right).$$
Now clearly $\|\Lambda^0\| = O\left( x^{1/2} \log x \right)$, which can be derived in numerous ways. Most directly, it follows from the prime number theorem that
$$\sum_{n \le x} \Lambda(n)^2 \le \log x \sum_{n \le x} \Lambda(n) = O(x \log x),$$
giving us a contribution of
$$O\left( \|u\|\, x^{3/2} (\log x)^{1-A} \right),$$
for any $A > 0$.
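Step 3. The remaining term is treated similarly to the previous step, using the estimate $\|\Lambda^\infty\| = O\left( x^{1/2} \log x \right)$.
The above three steps show that
$$\sum_{n \le x} u_n S_2(n) = \sum_{n \le x} u_n \mathfrak{S}_2(n)\,n + O\left( \|u\|\, x^{3/2} (\log x)^{-A} \right).$$
Remark. Notice how we have the uniformity in $\alpha$ in every step.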
Part 2
The Goldbach problem
In the first part, we already found ways of estimating sums involving the von Mangoldt function,
as an attempt to gather useful information for proving the Goldbach conjecture. We now turn to
investigate the growth of
$$S_2(n) = \sum_{k_1+k_2=n} \Lambda(k_1)\Lambda(k_2) \quad \text{and} \quad S_3(n) = \sum_{k_1+k_2+k_3=n} \Lambda(k_1)\Lambda(k_2)\Lambda(k_3).$$
These functions count the number of representations of n as the sum of two (resp. three) prime
powers in a weighted manner. Proving that they are strictly greater than 0 for all appropriate n
would prove the Goldbach problems.
Remarks. The reader might have three immediate objections at this point.
1. The weights seem inconvenient and have no easy interpretation.
2. We are being too ambitious in trying to find the (weighted) number of representations.
3. The fact we allow all prime powers seems a problem.
For the first inconvenience, we can remind the reader of our earlier remarks on the von Mangoldt
function being the natural choice in this setting. The weight will turn out to be convenient in the
proof, making the prime numbers more flexible in algebraic manipulations.
While the second objection might well be valid in the binary case, in the ternary case it simply underestimates the strength of our estimates, as we will see shortly.
As for the third objection, we remark that the prime powers are very sparse compared to the
primes, and the inconvenience is not as terrifying as it might seem.
Consider $S_2(n)$. Say $k_1$ is a proper prime power, then we have $O(\sqrt{n})$ choices for $k_1$. Once $k_1$ is chosen, $k_2$ is fixed. Each of the choices contributes at most $(\log n)^2$ to $S_2(n)$. We conclude that the proper prime powers contribute $O\left( \sqrt{n} (\log n)^2 \right)$ to $S_2(n)$.
Similarly for $S_3(n)$, except that for fixed $k_1$, we can pick $k_2$, hence fixing $k_3$, in at most $n$ ways. Now the proper prime powers contribute $O\left( n^{3/2} (\log n)^3 \right)$ to $S_3(n)$. Notice that taking the particular order of the prime powers into account only multiplies the estimate by at most a constant.
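The sparseness of proper prime powers used in this argument is easy to observe numerically. The sketch below lists all $p^k \le n$ with $k \ge 2$ and compares their number with $\sqrt{n}$; the helper name `proper_prime_powers` is a hypothetical one chosen for this illustration, and the constant 2 in the comparison is only a convenient empirical margin, not a sharp bound.

```python
from math import isqrt

def proper_prime_powers(limit):
    """Sorted list of all p**k <= limit with p prime and k >= 2."""
    root = isqrt(limit)
    # Sieve of Eratosthenes up to sqrt(limit): any p with p**2 <= limit lies there.
    is_prime = [False, False] + [True] * (root - 1)
    powers = []
    for p in range(2, root + 1):
        if not is_prime[p]:
            continue
        for multiple in range(p * p, root + 1, p):
            is_prime[multiple] = False
        power = p * p
        while power <= limit:
            powers.append(power)
            power *= p
    return sorted(powers)

if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        count = len(proper_prime_powers(n))
        print(n, count, count / n ** 0.5)
```

The printed ratio stays bounded (and in fact decreases slowly), in line with the $O(\sqrt{n})$ count used above.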
If we succeed in obtaining an estimate for S2 and S3 with a dominant term which is larger, we
do not need to worry about the proper prime power contribution. This allows us to benefit from
the nice properties of Λ in calculations, getting results in which we may as well ignore the proper
prime powers.
2.1 The binary Goldbach problem
Definitions. As before, we define
$$S_2(n) := \sum_{k_1+k_2=n} \Lambda(k_1)\Lambda(k_2).$$
For any x, define the exceptional set to be the set of even natural numbers ≤ x that are not the
sum of two primes. The size of the exceptional set is E(x).
In this section, we attempt to prove the Goldbach conjecture. Even though this goal will not be achieved, we will prove the conjecture from the introduction for almost all integers, and deduce that the exceptional set has density zero. First we remind the reader of the main conjecture.
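Conjecture 2. For $n \ge 4$ and fixed $A > 0$,
$$\sum_{k_1+k_2=n} \Lambda(k_1)\Lambda(k_2) = \mathfrak{S}_2(n)\,n + O\left( n(\log n)^{-A} \right),$$
where the implied constant depends only on $A$, and
$$\mathfrak{S}_2(n) = \prod_{p \mid n} \left( 1 + \frac{1}{p-1} \right) \prod_{p \nmid n} \left( 1 - \frac{1}{(p-1)^2} \right).$$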
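To get a feeling for the conjecture, one can compare the weighted count $S_2(n)$ with the predicted main term for small $n$. The sketch below is only an experiment under stated assumptions: the singular series is taken to be the Euler product with factor $1 + \frac{1}{p-1}$ for $p \mid n$ and $1 - \frac{1}{(p-1)^2}$ for $p \nmid n$, truncated at a hypothetical cutoff `prime_bound`, so its value is an approximation. All function names are illustrative.

```python
from math import isqrt, log

def von_mangoldt(n):
    """Lambda(n) = log p if n is a power of a prime p (exponent >= 1), else 0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n)  # no divisor found up to sqrt(n), so n is prime

def S2(n):
    """Weighted number of representations n = k1 + k2."""
    return sum(von_mangoldt(k) * von_mangoldt(n - k) for k in range(1, n))

def primes_up_to(bound):
    flags = [False, False] + [True] * (bound - 1)
    for p in range(2, isqrt(bound) + 1):
        if flags[p]:
            for multiple in range(p * p, bound + 1, p):
                flags[multiple] = False
    return [p for p, is_p in enumerate(flags) if is_p]

def singular_series_2(n, prime_bound=10_000):
    """Euler product truncated at primes <= prime_bound (an approximation;
    prime factors of n above the cutoff are not accounted for)."""
    product = 1.0
    for p in primes_up_to(prime_bound):
        if n % p == 0:
            product *= 1 + 1 / (p - 1)
        else:
            product *= 1 - 1 / (p - 1) ** 2
    return product

if __name__ == "__main__":
    for n in (100, 1000, 3000):
        print(n, S2(n), singular_series_2(n) * n)
```

For odd $n$ the factor at $p = 2$ vanishes, so the predicted main term is zero, as it should be since an odd number is a sum of two primes only if one of them is 2.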
The proof of this conjecture is not within reach with our current estimates. However, it follows
straightforwardly that it ’almost holds’ in the sense that for every possible choice of the implied
constant, the set of exceptions has density zero. More precisely, we have
Theorem 3. For fixed $A, C, x > 0$, the number of integers $4 \le n \le x$ for which
$$|S_2(n) - \mathfrak{S}_2(n)\,n| > C n (\log n)^{-A}$$
is of size
$$O\left( x(\log x)^{-A} \right),$$
where the implied constant depends only on $A, C$.
Proof. We apply Theorem 2 with $u_n = S_2(n) - \mathfrak{S}_2(n)\,n$ and the exponent of the log-factor $\frac{3A}{2}$. This gives us
$$\sum_{n \le x} \left( S_2(n) - \mathfrak{S}_2(n)\,n \right)^2 = O\left( x^3 (\log x)^{-3A} \right),$$
where the implied constant depends only on $A > 0$. Trivially, every term is $\ge 0$. The terms corresponding to an $n$ as described in the statement have $\left( S_2(n) - \mathfrak{S}_2(n)\,n \right)^2 > C^2 n^2 (\log n)^{-2A}$, hence if we call $E_{A,C}(x)$ the number of such $n$ we clearly have
$$C^2 (\log x)^{-2A} \sum_{i=1}^{E_{A,C}(x)} i^2 = O\left( \sum_{n \le x} \left( S_2(n) - \mathfrak{S}_2(n)\,n \right)^2 \right) = O\left( x^3 (\log x)^{-3A} \right),$$
from which the desired result clearly follows.
We now proceed to proving that the exceptional set has density zero. More precisely, we will prove
the following theorem.
Theorem 4. For any $A > 0$, we have
$$E(x) = O\left( x(\log x)^{-A} \right).$$
Proof. An integer $n$ is not the sum of two prime powers if and only if $S_2(n) = 0$. The set for which $S_2(n) = 0$ is by no means of size as great as $E(x)$, but due to our previous remark on the sparseness of prime powers, we can apply a similar trick as in the proof of the last theorem, keeping in mind that $S_2(n) = S_2'(n) + O\left( \sqrt{n} (\log n)^2 \right)$, with $S_2'(n)$ defined similarly to $S_2(n)$ while only taking primes into account and neglecting proper prime power contributions. We arrive analogously at
$$\sum_{i=1}^{E(x)} i^2 = O\left( x^3 (\log x)^{-2A} \right).$$
The sum on the left hand side is of order $E(x)^3$. This implies the result.
Remark. Notice how easily we got rid of the unwanted prime power contribution in the proof.
As long as the error term is large enough, we are free to ignore proper powers as above. The set of
proper prime powers is indeed very sparse. So we have the benefit of working with the convenient
von Mangoldt function in our calculations, and can easily get rid of its less convenient properties
in the given context.
Remark. This result shows that the counterexamples to the binary Goldbach conjecture have
natural density zero. Of course the theorem is more precise than this statement. It seems as close
as we can get to a proof of the full conjecture. More refined versions are available now, but they
would require a lot more machinery. We will discuss later why our proof has no real chance of
going all the way.
2.2 The ternary Goldbach problem
As before, define
$$S_3(n) := \sum_{k_1+k_2+k_3=n} \Lambda(k_1)\Lambda(k_2)\Lambda(k_3).$$
In this section, we will find the growth of this function, with a small error term compared to the
dominating term for odd n. This implies that for large enough odd n, this function is strictly
positive, hence proving the asymptotic result for the ternary Goldbach conjecture. Moreover, we
have established the growth of the number of weighted representations.
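Theorem 5 (Vinogradov, 1937). For any fixed $A > 0$, we have
$$S_3(n) = \frac{1}{2} \mathfrak{S}_3(n)\,n^2 + O\left( n^2 (\log n)^{-A} \right),$$
where
$$\mathfrak{S}_3(n) = \prod_{p \mid n} \left( 1 - \frac{1}{(p-1)^2} \right) \prod_{p \nmid n} \left( 1 + \frac{1}{(p-1)^3} \right).$$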
Remark. As we did for $\mathfrak{S}_2(n)$, we can do a similar analysis to obtain that the rather wildly behaved function $\mathfrak{S}_3(n)$ has growth somewhere between constant and linear. For even $n$, we have $\mathfrak{S}_3(n) = 0$, in which case the dominant term would vanish, giving us a useless result. Furthermore, we have the identity
$$\mathfrak{S}_3(n) = \sum_{(d,cn)=1} \frac{\mu^2(d)\,d\,\mu(c)}{\varphi^3(d)\varphi^2(c)}.$$
Proof. By applying Theorem 2 to $u_m = \Lambda(n-m)$, using as before that $\|\Lambda\| = O\left( \sqrt{n} \log n \right)$, we obtain:
$$S_3(n) = \sum_{m<n} \Lambda(m)\,\mathfrak{S}_2(n-m)\,(n-m) + O\left( n^2 (\log n)^{-A} \right).$$
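We remember that
$$\sum_{d \mid m} \frac{\mu^2(d)\,d}{\varphi^2(d)} \sum_{(c,d)=1} \frac{\mu(c)}{\varphi^2(c)} = \mathfrak{S}_2(m),$$
hence the above may be rewritten by changing the order of summation into
$$S_3(n) = \sum_{(c,d)=1} \frac{\mu^2(d)\,d\,\mu(c)}{\varphi^2(d)\varphi^2(c)} \sum_{\substack{m<n \\ m \equiv n \ (\mathrm{mod}\ d)}} \Lambda(m)(n-m) + O\left( n^2 (\log n)^{-A} \right),$$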
the purpose of this rearrangement being that the inner sum can be estimated easily. Indeed, if we use partial summation then by the prime number theorem we have for $(d,n) = 1$ that
$$\sum_{\substack{m<n \\ m \equiv n \ (\mathrm{mod}\ d)}} \Lambda(m)(n-m) = \frac{n^2}{2\varphi(d)} + O\left( n^2 (\log n)^{-A} \right).$$
The factor $\frac{1}{2}$ comes from the integration and is the extra factor in the statement of the theorem.
The result now follows from the identity
$$\mathfrak{S}_3(n) = \sum_{(d,cn)=1} \frac{\mu^2(d)\,d\,\mu(c)}{\varphi^3(d)\varphi^2(c)}.$$
Remark. As we can restate the prime number theorem in terms of π(x) rather than ψ(x), an
analogous statement for the number of representations can be deduced. This is a relatively simple
matter, and might be of interest to the reader that is bothered by the weighted way of counting in
our approach. However, we hope that the above approach seems more natural after our motivation.
The corresponding result is
Theorem 6. Let $s(n)$ be the number of representations of $n$ as the sum of three primes. Then for any fixed $A > 0$, we have
$$s(n) = \frac{1}{2} \mathfrak{S}_3(n) \frac{n^2}{(\log n)^3} + O\left( n^2 (\log n)^{-4} \right).$$
2.3 Conclusion
Summary. As we discussed, the fundamental function to consider in the setting of number theory is the Möbius function. A thorough understanding of its properties might be the key to solving some problems which seem hard to access by naive approaches. The need for a good way of obtaining information on the behaviour of the Möbius function announced itself.
The theory of Fourier transformation gives us a way to transform any arithmetic function under consideration to a function on the unit circle. This function can then be studied using analysis, and the result could be translated back using Fourier inversion. The Möbius function is however not behaved appropriately for our theory, forcing us to make some adjustments to this ideal strategy. This leads us to estimating the exponential sum involving $\mu$, which was done by Davenport. He obtained that for any real $\alpha$ and $A > 0$, we have uniformly in $\alpha$ that
$$\sum_{m \le x} \mu(m) e^{2\pi i \alpha m} = O\left( x (\log x)^{-A} \right).$$
We deduced crucial information on some sums involving the von Mangoldt function Λ. This was possible because of the easy expression of the von Mangoldt function as a sum involving µ. Note that an analogous connection would be harder to find, or at least more complicated, had a function different from Λ been chosen to approach the Goldbach problem. Having obtained information on various sums involving the von Mangoldt function, we deduced fairly painlessly a proof of Vinogradov’s theorem, as well as a zero density result for the exceptional set consisting of the counterexamples to the Goldbach conjecture. We conclude with some final remarks.
Remark. While we focused on the von Mangoldt function here, since our goal was the Goldbach conjecture, it would have been equally possible to investigate other arithmetic functions that have a definition involving µ (and indeed many have because of Möbius inversion), hence obtaining other interesting results.
Remark. A full proof of the ternary Goldbach conjecture has still not been found. Even though we have established the asymptotic result with Vinogradov’s theorem, the implied constant is enormous. Assuming the GRH, Zinoviev et al. [10] have given a full proof. The paper contains some mistakes, but as I heard they can be fixed. An unconditional proof remains unknown to date.
Remark. It might be interesting to reflect on why our method fails in proving the asymptotic result for the binary Goldbach problem. We identified the dominant term in $S_2(n)$ as $\mathfrak{S}_2(n)\,n$, and hence obtaining any error term of smaller growth would solve our problem. The growth of $\mathfrak{S}_2$ is very mysterious, and it is clear that our rough upper bound $n$ previously obtained is not sharp, and is usually even hopelessly inaccurate. Our lower bound is however very sharp, due to the small value taken when $n$ is prime. In any case, an error term solving the problem would have to be smaller than linear growth. The approach to the problem is complicated enough to obscure the weaker points in our estimates, but a careful analysis shows that both our estimates for the double sum in $\Lambda^0$ and $\Lambda^\infty$ would have to improve. Obtaining better estimates is certainly a very nontrivial task, since one would almost have to improve Davenport’s result of Theorem 1, and the prime number theorem. Even though we admit that for the latter much better bounds are known, the reader who believes that this is the way out is invited to try it for him/herself.
A new approach is needed to prove the conjectured result, if true. Since we based ourselves on the heuristic involving the Möbius function, we made our cutoff at $k$ accordingly. The nature of this heuristic prescribes a natural cutoff for large values, but dictates no natural way of further subdividing the regions, hence making our estimates sharper. However, an approach based on a different heuristic or obtained inequality might lead to a different subdivision. It is however not clear what this might be.
For the ternary Goldbach problem, we have the luxury of an extra factor $n$ in the dominant term. This is a huge extra freedom which makes the result accessible for our obtained estimates, as we saw. The result is now definitely settled, from a certain number $N$ onwards. As quoted before, Borodzkin [2] showed that $N$ can be taken to be $3^{3^{15}}$. There are two ways to proceed towards a full proof. One is to try and obtain better estimates so as to lower $N$ (for example by proving the GRH). Another is to improve our computational methods to check the remaining cases, so as to get $N$ within range. Both approaches are growing towards each other, and it seems the full proof is just a matter of time.
Epilogue. This concludes our discussion of the Goldbach problem. Analytic number theory certainly owes a lot of its development to this problem, and is now a flourishing branch of mathematics with many results and still many open problems. We hope this essay made the reader as enthusiastic as it did the author while writing it.
Appendix A
The toolbox
Some theorems and techniques that have been used or referred to in the main text are presented here. Proofs are omitted, but can be found in most texts on analysis or analytic number theory, such as [1] and [8].
Summation by parts. Let $f$ be an arithmetic function, and $g : \mathbb{R}^+ \to \mathbb{R}$ be continuously differentiable. Then we have
$$\sum_{n \le x} f(n) g(n) = \left( \sum_{n \le x} f(n) \right) g(x) - \int_1^x \left( \sum_{n \le t} f(n) \right) g'(t)\,dt.$$
This is probably one of the most useful tools in analytic number theory that allows us to deform
obtained results to slightly different ones. The name comes from the analogy with the theorem of
integration by parts.
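As a quick sanity check of summation by parts, take $f(n) = 1$ and $g(t) = \log t$, so that the partial sums of $f$ are $\lfloor t \rfloor$ and the integral can be evaluated exactly piece by piece. The sketch below compares both sides for integer $x$; the names `lhs` and `rhs` are hypothetical helpers used only for this illustration.

```python
from math import log

def lhs(x):
    """Sum of f(n) g(n) for n <= x, with f = 1 and g = log."""
    return sum(log(n) for n in range(1, x + 1))

def rhs(x):
    """F(x) g(x) - integral_1^x F(t) g'(t) dt with F(t) = floor(t), g'(t) = 1/t."""
    boundary = x * log(x)
    # On [k, k+1) the partial sum F(t) equals k, so that piece of the
    # integral is exactly k * (log(k+1) - log(k)).
    integral = sum(k * (log(k + 1) - log(k)) for k in range(1, x))
    return boundary - integral

if __name__ == "__main__":
    for x in (10, 100, 1000):
        print(x, lhs(x), rhs(x))
```

Both columns agree up to floating point rounding, as the formula predicts.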
Poisson summation formula. Let $f : \mathbb{R} \to \mathbb{R}$ be monotonic in stretches, then
$$\frac{f(m) + f(n)}{2} + \sum_{k=m+1}^{n-1} f(k) = \sum_{k=-\infty}^{+\infty} \int_m^n f(x) e^{2\pi i k x}\,dx.$$
In fact, this is reminiscent of the theory of Fourier transformation, see below. Indeed, this could be
restated in more general terms, but we choose to present it thus since it suffices for our applications,
and it is one of the most commonly used forms in number theory.
The Cauchy-Schwarz inequality I. Let $u, v \in \mathbb{C}^n$ be two complex vectors, then
$$\|u\| \cdot \|v\| \ge |u \cdot v|.$$
This inequality is sometimes called the ’discrete version’ of the Cauchy-Schwarz inequality in this
essay. This is the version most high school students are acquainted with. The obvious generalisation
to arbitrary inner product spaces holds, and in fact the following version is nothing more than
another particular case. We have chosen to state both separately here. This allows us to refer
merely to the used version, while omitting explicit inequalities. The following version will be
referred to as the integral version of Cauchy-Schwarz, or merely Cauchy-Schwarz.
The Cauchy-Schwarz inequality II. Let $f, g : \mathbb{R} \to \mathbb{C}$ be square-integrable, then
$$\left| \int_a^b f(x) g(x)\,dx \right|^2 \le \int_a^b |f(x)|^2\,dx \cdot \int_a^b |g(x)|^2\,dx.$$
Dirichlet’s formula. We have that
$$\sum_{n \le x} \tau(n) = x \log x + (2\gamma - 1)x + O(\sqrt{x}).$$
The proof can be found in most undergraduate texts on analytic number theory, for example [1]. The problem of finding the minimal error term in the above theorem is a fascinating one with many stronger results than the one stated here. More precisely, finding the infimum of $\theta$ such that the error term in Dirichlet’s formula may be replaced with $O(x^\theta)$ is an unsolved problem.
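Dirichlet’s formula is easy to test numerically. The sketch below computes the divisor sum exactly via $\sum_{n \le x} \tau(n) = \sum_{d \le x} \lfloor x/d \rfloor$ and compares it with the main term. The names `divisor_sum` and `dirichlet_main_term` are hypothetical, and the numerical value of Euler's constant is hard-coded as an assumption of the example.

```python
from math import log, sqrt

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded for this sketch

def divisor_sum(x):
    """Sum of tau(n) for n <= x: each d <= x divides exactly floor(x / d) of them."""
    return sum(x // d for d in range(1, x + 1))

def dirichlet_main_term(x):
    """The main term x log x + (2 gamma - 1) x of Dirichlet's formula."""
    return x * log(x) + (2 * EULER_GAMMA - 1) * x

if __name__ == "__main__":
    for x in (10, 100, 1000, 10_000):
        error = divisor_sum(x) - dirichlet_main_term(x)
        print(x, divisor_sum(x), error, error / sqrt(x))
```

The last column shows the error measured against $\sqrt{x}$, which stays small in these examples.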
Fourier analysis on Z and T. Although the theory of Fourier analysis is a vast one and
generalises greatly to arbitrary locally compact abelian groups, we will only state the results for Z
and T. This is because none of the results have been used, and this summary is only included to
provide a considerable piece of inspiration that motivates our interest in exponential sums. Since
our ultimate goal is doing number theory, we are not surprised that working in the group Z is most
useful to us.
For a function $f : \mathbb{Z} \to \mathbb{C}$, we define the Fourier transform
$$\hat{f}(\alpha) = \sum_{n \in \mathbb{Z}} f(n) e^{-2\pi i n \alpha},$$
as a formal series. It would be especially interesting should this define a function suitable for analytic manipulation. Unfortunately, this is not always a convergent series, unless we restrict our original function somewhat. Let $S(\mathbb{Z})$ be the space of functions $f : \mathbb{Z} \to \mathbb{C}$ such that for all $k$ we have $\lim_{|n| \to \infty} |n|^k |f(n)| = 0$. Define $\mathbb{T}$ to be the unit circle, and the space $S(\mathbb{T}) := C^\infty(\mathbb{T})$. A famous theorem in Fourier analysis now says that if $f \in S(\mathbb{Z})$, then $\hat{f} \in S(\mathbb{T})$. Hence to any well behaved function on the integers, we can associate a function in $C^\infty(\mathbb{T})$.
Not only can we associate such a function, we can also ’go back’. The Fourier inversion theorem states that for any $f \in S(\mathbb{Z})$, we have
$$f(n) = \int_0^1 \hat{f}(\alpha) e^{2\pi i n \alpha}\,d\alpha.$$
This could be considered a motivation for the exponential sums we considered in the text. Information about arithmetic functions could be obtained by investigating their Fourier transforms with the extensive toolbox of analysis (which is understood to be much more than the tools mentioned in this appendix). One could then transform this information back to the original function by means of the Fourier inversion theorem. Of course, one needs the arithmetic function under consideration to be contained in $S(\mathbb{Z})$ for this to work. This is unfortunately not always the case, notably the Möbius function is not behaved as desired. There are various ways out, as we discussed and put into practise in the text.
Parseval’s identity. This identity might be considered as a general form of Pythagoras’ theorem. It applies in the general setting of separable Hilbert spaces, but for our goals the following form will suffice. For any $f \in L^2(\mathbb{T})$, set
$$c_n := \int_0^1 f(\alpha) e^{-2\pi i n \alpha}\,d\alpha.$$
Then we have the identity
$$\sum_{n \in \mathbb{Z}} |c_n|^2 = \int_0^1 |f(\alpha)|^2\,d\alpha.$$
Appendix B
The Möbius function µ
However well concealed, the ideas of Davenport [3], using ideas of Vinogradov [9], using ideas of Hardy and Littlewood [7], are fundamental. It seems all we did was deduce good estimates from Davenport’s result, here labeled as Theorem 1. However plausible this theorem seems given our heuristic about the Möbius function, the proof is not at all easy. The main objective of this appendix is presenting the ideas for a proof of this result of [3]. We use more modern language, and the ideas are taken from [8].
Theorem. For any real $\alpha$, any $A > 0$ and $x \ge 2$, we have
$$\sum_{m \le x} \mu(m) e^{2\pi i \alpha m} = O\left( x (\log x)^{-A} \right),$$
uniformly in $\alpha$.
Important remark. Mathematicians knew before the development of Vinogradov’s method that a similar result was true for rational $\alpha$. In fact, they knew that for any $x \ge 2$ and $A > 0$, we have for any $(a,q) = 1$ that
$$\sum_{m \le x} \mu(m) e^{2\pi i a m / q} = O\left( q x (\log x)^{-A} \right), \qquad \text{(B.1)}$$
the implied constant only depending on $A$. We will need this result in the proof of Theorem 1, in quite a crucial way. However, to prove it, we need some deep estimates of Dirichlet L-functions. An inclusion of their vast and beautiful theory would certainly double the size of this essay. Therefore, we will merely sketch the proof of B.1 and direct the interested reader (which is hopefully everyone) to [8][Chapter 5].
One starts from the observation
$$\sum_{m \le x} \mu(m) e^{2\pi i a m / q} = \sum_{d \mid q} \frac{\mu(d)}{\varphi(q/d)} \sum_{\chi \ (\mathrm{mod}\ q/d)} \tau(\chi)\chi(a) \left( \sum_{n \le x} \mu(n) \chi\chi_0^{(d)}(n) \right),$$
where the inner sum is over all Dirichlet characters, $\tau(\chi) = \sum_{n \ (\mathrm{mod}\ q)} \chi(n) e^{2\pi i n / q}$ the Gauss sum, and $\chi_0^{(d)}$ the principal Dirichlet character modulo $d$. We have hence reduced the problem to estimating $\sum_{n \le x} \mu(n)\chi(n)$ for a Dirichlet character modulo $q$.
We will show how an estimate for this last sum can be obtained from corresponding good estimates for the Dirichlet L-function $L(\chi, s)$. Indeed, notice that we have (wherever everything converges, including for example the region $\Re(s) > 1$)
$$\frac{1}{L(\chi,s)} = \prod_p \left( 1 - \chi(p) p^{-s} \right) = \sum_n \mu(n)\chi(n) n^{-s} = \sum_n \mu(n)\chi(n)\, s \int_n^{+\infty} \frac{dx}{x^{s+1}} = s \int_1^{\infty} \frac{\sum_{n \le x} \mu(n)\chi(n)}{x^{s+1}}\,dx.$$
While this expresses the L-function in terms of our sum, we can invert the relation using Perron’s formula (see [4], [8] for instance) to obtain that for any real $c$ for which $L(\chi, s)$ converges, we have
$$\sum_{n \le x} \mu(n)\chi(n) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{x^t}{L(\chi,t)\,t}\,dt.$$
Note that in principle, we need $x$ not to be an integer for this to hold. If $x$ happens to be an integer, the last term in the sum should be multiplied by $\frac{1}{2}$. If one has sufficiently good estimates for the L-function, this is how an estimate for B.1 is deduced. This estimation can indeed be accomplished sufficiently sharply in a zero-free region of this function. The precise details of the estimates for this L-function can be found in [8][Chapter 5].
B.1 A proof of Davenport’s result
Proof sketch. The proof uses the fact that for any integer $\tau \ge 1$, there exist $a, q$ with $(a,q) = 1$ and $q \le \tau$ such that $\left| \alpha - \frac{a}{q} \right| \le \frac{1}{q\tau}$. This can be proven using elementary properties of continued fractions. We pick some $\tau$ to be specified later. The proof now splits up in two cases, according to the value of $q$ arising from our $\alpha$. This is motivated by noting that the exponential sum will be large when $\alpha$ is close to a rational number with small denominator.
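The approximation statement used in the proof sketch can be made concrete with a small search. The sketch below does not use continued fractions; instead it simply tries every denominator $q \le \tau$, which is enough for an illustration. The function name `dirichlet_approx` is hypothetical, and exact rational arithmetic is used so that the inequality can be checked without rounding issues.

```python
from fractions import Fraction
from math import gcd

def dirichlet_approx(alpha, tau):
    """Return coprime (a, q) with 1 <= q <= tau and |alpha - a/q| <= 1/(q*tau).

    Brute-force search over q; Dirichlet's approximation theorem guarantees
    that some q <= tau works, so the loop always returns.
    """
    alpha = Fraction(alpha)
    for q in range(1, tau + 1):
        a = round(alpha * q)  # nearest integer to q * alpha
        if abs(alpha * q - a) <= Fraction(1, tau):
            g = gcd(a, q)
            # Reducing the fraction only makes the bound 1/(q*tau) weaker.
            return a // g, q // g
    raise AssertionError("unreachable by Dirichlet's approximation theorem")

if __name__ == "__main__":
    alpha = Fraction(314159, 100000)
    for tau in (10, 50, 500):
        a, q = dirichlet_approx(alpha, tau)
        print(tau, a, q, float(abs(alpha - Fraction(a, q))))
```

Small values of `q` in the output correspond to the second case of the proof, where $\alpha$ is close to a rational with small denominator.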
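B.1.1 The case $\frac{x}{\tau} < q \le \tau$
Let us start by observing that if we pick any $y, z \ge 1$, then for any $n > \max\{y, z\}$ we have
$$\mu(n) = -\sum_{\substack{d_1, d_2 \\ d_1 d_2 \mid n,\ d_1 \le y,\ d_2 \le z}} \mu(d_1)\mu(d_2) + \sum_{\substack{d_1, d_2 \\ d_1 d_2 \mid n,\ d_1 > y,\ d_2 > z}} \mu(d_1)\mu(d_2).$$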
This identity follows from straightforward manipulations. Indeed, since $\sum_{d \mid n} \mu(d) = 0$ unless $n = 1$, we have
$$\mu(n) = \sum_{d_1 d_2 \mid n} \mu(d_1)\mu(d_2).$$
We let the ranges for $d_1, d_2$ be split up by $y, z$ respectively, giving us four summations. Now apply the identity $\sum_{d \mid n} \mu(d) = 0$ unless $n = 1$ again to get
$$\sum_{\substack{d_1, d_2 \\ d_1 d_2 \mid n,\ d_1 \le y,\ d_2 > z}} \mu(d_1)\mu(d_2) = -\sum_{\substack{d_1, d_2 \\ d_1 d_2 \mid n,\ d_1 \le y,\ d_2 \le z}} \mu(d_1)\mu(d_2) = \sum_{\substack{d_1, d_2 \\ d_1 d_2 \mid n,\ d_1 > y,\ d_2 \le z}} \mu(d_1)\mu(d_2),$$
from which the required identity follows. It allows us to rewrite the quantity under consideration:
$$\sum_{m \le x} \mu(m) e^{2\pi i \alpha m} = -\sum_{d_1 \le y} \sum_{d_2 \le z} \sum_{d_1 d_2 d_3 \le x} \mu(d_1)\mu(d_2) e^{2\pi i \alpha d_1 d_2 d_3} + \sum_{d_1 > y} \sum_{d_2 > z} \sum_{d_1 d_2 d_3 \le x} \mu(d_1)\mu(d_2) e^{2\pi i \alpha d_1 d_2 d_3} + O\left( \max\{y, z\} \right),$$
where the error term arises since our expression for $\mu(n)$ is only valid for $n > \max\{y, z\}$. It will turn out to be negligible. As we will see, the sums on the right hand side are both relatively easy to estimate. The reduction to sums of this form is what is commonly referred to as Vinogradov’s method. We shall discuss it in more general terms later. We now proceed to finding estimates for the sums in the above expression.
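The combinatorial identity for $\mu(n)$ at the start of this case can be verified directly for small $n$. The following sketch is a brute-force check; all function names (`mobius`, `truncated_sum`, `mobius_via_identity`) are hypothetical helpers written for this illustration.

```python
def mobius(n):
    """Moebius function computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p**2 divides the original n
            result = -result
        p += 1
    return -result if n > 1 else result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def truncated_sum(n, keep1, keep2):
    """Sum of mu(d1) mu(d2) over d1*d2 | n with keep1(d1) and keep2(d2) true."""
    total = 0
    for d1 in divisors(n):
        if keep1(d1):
            # d1*d2 | n is the same as d2 | n/d1
            for d2 in divisors(n // d1):
                if keep2(d2):
                    total += mobius(d1) * mobius(d2)
    return total

def mobius_via_identity(n, y, z):
    """Right hand side of the identity; equals mu(n) when n > max(y, z)."""
    small = truncated_sum(n, lambda d: d <= y, lambda d: d <= z)
    large = truncated_sum(n, lambda d: d > y, lambda d: d > z)
    return -small + large

if __name__ == "__main__":
    y, z = 5, 4
    for n in range(6, 31):
        print(n, mobius(n), mobius_via_identity(n, y, z))
```

For $n \le \max\{y, z\}$ the two columns may differ, which is exactly the source of the $O(\max\{y, z\})$ error term above.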
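We consider some general exponential sums, forgetting about our Möbius functions for a while and treating the problem slightly more generally. Notice that
$$\sum_{n \le N} e^{2\pi i \alpha n} = \frac{\sin \pi \alpha N}{\sin \pi \alpha}\, e^{2\pi i \alpha \frac{N+1}{2}}.$$
If we denote $\|\alpha\|$ for the distance of $\alpha$ to the nearest integer, we clearly obtain
$$\left| \sum_{n \le N} e^{2\pi i \alpha n} \right| \le \min\left( N, \frac{1}{2\|\alpha\|} \right).$$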
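The bound $\min\left(N, \frac{1}{2\|\alpha\|}\right)$ for the geometric sum can be observed numerically. The sketch below evaluates the sum in floating point and compares it with the bound; the helper names are hypothetical, and a tiny tolerance is needed in comparisons because of rounding.

```python
import cmath
from math import pi

def exponential_sum(alpha, N):
    """Sum of e^{2 pi i alpha n} for 1 <= n <= N."""
    return sum(cmath.exp(2j * pi * alpha * n) for n in range(1, N + 1))

def dist_to_nearest_integer(alpha):
    """The quantity ||alpha|| from the text."""
    return abs(alpha - round(alpha))

def bound(alpha, N):
    """min(N, 1 / (2 ||alpha||)), read as N when alpha is an integer."""
    d = dist_to_nearest_integer(alpha)
    return N if d == 0 else min(N, 1 / (2 * d))

if __name__ == "__main__":
    for alpha in (0.001, 0.1, 1 / 3, 0.5):
        for N in (10, 1000):
            print(alpha, N, abs(exponential_sum(alpha, N)), bound(alpha, N))
```

The sum only becomes large when $\alpha$ is very close to an integer, which is the cancellation exploited in the two estimates that follow.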
By a repeated series of occasionally clever applications of this fact, we obtain the two estimates
$$\sum_{m \le M} \left| \sum_{n \le x/m} e^{2\pi i \alpha m n} \right| = O\left( \left( M + \frac{x}{q} + q \right) \log 2qx \right),$$
and
$$\sum_{\substack{m, n \\ mn \le x,\ m > M,\ n > N}} u_m v_n e^{2\pi i \alpha m n} = O\left( \left( \frac{x}{M} + \frac{x}{N} + \frac{x}{q} + q \right)^{1/2} x^{1/2} (\log x)^2 \right),$$
where $u, v$ are any complex vectors such that $|u_m|, |v_n| \le 1$. Details can be found in [8][Chapter 13], but no deep results are needed apart from some elementary combinatorial arguments.
We can apply these two results to the first and second term respectively in our expression for $\sum_{m \le x} \mu(m) e^{2\pi i \alpha m}$, obtaining that
$$\sum_{m \le x} \mu(m) e^{2\pi i \alpha m} = O\left( \left( q^{1/2} x^{1/2} + q^{-1/2} x + x^{4/5} \right)^{1/2} x^{1/2} (\log x)^4 \right). \qquad \text{(B.2)}$$
So far we have made no use of the specific case we are considering. Looking at our obtained error term, we see it will give us the best result when $q$ is relatively large, to keep the term $q^{-1/2} x$ inside the square root under control. Indeed, specifying $q$ to our considered range in this case, we obtain
$$\sum_{m \le x} \mu(m) e^{2\pi i \alpha m} = O\left( \left( 2\tau^{1/2} x^{1/2} + x^{4/5} \right)^{1/2} x^{1/2} (\log x)^4 \right).$$
B.1.2 The case $q \le \frac{x}{\tau}$
In this case $\alpha$ is close to a rational number with small denominator, making the sequence of exponentials conspire with the Möbius function. This gives us the feeling our heuristic on the Möbius function will not work, making a large contribution possible. We therefore expect this to be the dominant contribution, and will probably need more machinery to give a sharp bound than in the last section, which was essentially a sequence of elementary estimates.
We define $S(x) := \sum_{n \le x} \mu(n) e^{2\pi i n a / q}$. Now after writing $\alpha = \frac{a}{q} + \beta$, we obtain
$$\sum_{n \le x} \mu(n) e^{2\pi i n \alpha} = \sum_{n \le x} \left( S(n) - S(n-1) \right) e^{2\pi i n \beta}.$$
Rewriting a sum with an inner difference is usually done when one is hoping to apply partial summation and hence reducing the question to a quantity which is easier to estimate. Indeed, partial summation gives us that
$$\sum_{n \le x} \mu(n) e^{2\pi i n \alpha} = S(x) e^{2\pi i x \beta} - \int_1^x S(t)\, 2\pi i \beta\, e^{2\pi i t \beta}\,dt.$$
Calculating the integral explicitly seems hard. However, by picking $1 \le y \le x$ strategically, we can make $|S(y)|$ maximal, hence obtaining
$$\left| \sum_{n=1}^{x} \mu(n) e^{2\pi i n \alpha} \right| \le \left( 1 + \frac{2\pi x}{q\tau} \right) |S(y)|,$$
where we simplified by $|\beta| \le \frac{1}{q\tau}$. We have reduced our task to estimating $S(x)$, which is the case we treated in B.1.
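Applying B.1 to our estimate in the first paragraph, we obtain that
$$\sum_{m \le x} \mu(m) e^{2\pi i \alpha m} = O\left( \left( q + \frac{x}{\tau} \right) x (\log x)^{-5A} \right),$$
which simplifies in our case, using $q \le \frac{x}{\tau}$, to
$$\sum_{m \le x} \mu(m) e^{2\pi i \alpha m} = O\left( \frac{x^2}{\tau} (\log x)^{-5A} \right).$$
B.2 Discussion of Vinogradov’s method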
Conclusion of the proof. In both cases, we get the estimate
$$\sum_{m \le x} \mu(m) e^{2\pi i \alpha m} = O\left( \frac{x^2}{\tau} (\log x)^{-5A} + \tau^{1/4} x^{3/4} (\log x)^4 + x^{9/10} (\log x)^4 \right).$$
We conclude by picking the value $\tau = x (\log x)^{-4A}$, obtaining that for any real $\alpha$, any $A > 0$ and $x \ge 2$, we have our desired result
$$\sum_{m \le x} \mu(m) e^{2\pi i \alpha m} = O\left( x (\log x)^{-A} \right).$$
The uniformity in α is a wonderful feature of this result. Unfortunately, some omitted steps in the
above proof obscure this, but full details can be found in [8][Chapter 13].
Remark. Notice the difference in techniques used in both cases. The first case gave us a small
contribution, a feature which is reflected in the fact that very elementary and straightforward
estimates suffice. Indeed, it was only in the second case, the dominant contribution, that we
resorted to using a deeper theorem on L-functions.
Vinogradov’s method. It should also be noted that the above method generalises considerably, and the general method is commonly called Vinogradov’s method. It is however hard to describe the general line of attack in this method, since its appearance varies greatly when applied to different problems. However, there seems to be a pattern present in many applications which could be described thus: We have a natural interest in prime numbers, and in this branch of mathematics specifically in sums over primes. These sums are closely related to sums over Möbius functions, where the connection goes by the name of the von Mangoldt function. We have seen this process in the main text and will meet it again later when we investigate the prime number theorem.
Vinogradov’s method could be seen as a way of making our heuristic of the Möbius randomness rigorous. The method developed successfully applies to a wide range of functions, and shows their orthogonality to the Möbius function. This is often done roughly as follows. Suppose we want to bound
$$\sum_{n \le x} \mu(n) a_n,$$
that is, trying to find out how orthogonal $a_n$ is to the Möbius function. By combinatorial arguments, often analogous to the ones used above, we split the sum into several smaller sums of the form
$$S_1 := \sum_{m,n} u_m a_{mn} \quad \text{and} \quad S_2 := \sum_{m,n} v_m w_n a_{mn},$$
where the sum runs over suitable values of $m$ and $n$. These two types of sums are usually referred to as Type I/II sums. The point of this rearrangement is that sums of these types can be estimated successfully. Typically, one estimates Type II sums using some more general estimates for bilinear forms, much like the ones used in the above proof. The details take on many different forms, but a possibility might be to split the Type II sums further, letting the variables range over, say, dyadic segments (see [6][Chapter 4] or [8][Chapter 13] for two examples). The Cauchy-Schwarz inequality allows one to eliminate coefficients. The generality of this description is in accordance with its wide applicability. The above proof can be seen as a concrete example of this general description, and another stunning example can be found in [6][Chapter 4].
B.3 Discussion of importance
The importance of Theorem 1 can hardly be overestimated. The randomness of the Möbius function is fundamental to number theory, as we will see from its impressive consequences. As we have seen before it also produces estimates strong enough for proving Vinogradov’s theorem and ’almost’ establishing the binary Goldbach conjecture. Another consequence we wish to treat is the prime number theorem, which is essentially equivalent to the previous result with $\alpha = 0$ in the form we will consider now.
Corollary 4. Let $\psi(x) = \sum_{n \le x} \Lambda(n)$, then for any $A > 0$, we have
$$\psi(x) = x + O\left( x (\log x)^{-A} \right).$$
Remark. The reader might object to deducing the prime number theorem from Davenport’s result. If one is prepared to use results on Dirichlet L-functions as the ones quoted in the discussion of estimate B.1, then we are virtually using the prime number theorem in the proof. Indeed, the situation should be investigated more closely to find out exactly how equivalent the properties of the L-functions are to the prime number theorem. In any case, we have chosen to present it this way, since we feel it underlines the fundamentality of the Möbius randomness more. Indeed, Theorem 1 for $\alpha = 0$ is essentially equivalent to the prime number theorem, and it makes the mind wonder about what other deep results might correspond to different values of $\alpha$. Moreover, all these values of $\alpha$ give the same error term since we have uniformity.
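Proof. Let us call
\[
\gamma = \lim_{n \to +\infty} \Bigl(1 + \frac{1}{2} + \dots + \frac{1}{n} - \log n\Bigr)
\]
the Euler constant. Since $\tau(n) = \sum_{d \mid n} 1$, we have (Möbius inversion) that
\[
1 = \sum_{d \mid n} \mu(d)\, \tau\Bigl(\frac{n}{d}\Bigr).
\]
Also remember that $\sum_{d \mid n} \mu(d) = 0$ unless $n = 1$, and that $\Lambda(n) = \sum_{d \mid n} \mu(d) \log(n/d)$. This allows us to write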
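\[
\lfloor x \rfloor - \psi(x) - 2\gamma
= \sum_{n \le x} \sum_{d \mid n} \mu(d) \Bigl(\tau\Bigl(\frac{n}{d}\Bigr) - \log\frac{n}{d} - 2\gamma\Bigr)
= \sum_{dd' \le x} \mu(d) \bigl(\tau(d') - \log(d') - 2\gamma\bigr).
\]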
To bound this sum, set $f(n) = \tau(n) - \log(n) - 2\gamma$, so we can restate our reduced problem as
finding an estimate for
\[
\sum_{dd' \le x} \mu(d) f(d').
\]
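The identity behind this reduction can be checked numerically for small integer $x$. The sketch below is a sanity check only; the helper names (`mobius_table`, `divisor_count_table`, `chebyshev_psi`, `both_sides`) are illustrative, and Euler's constant is hard-coded to double precision as an assumption of the example.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant γ, hard-coded to double precision


def mobius_table(limit):
    """Return mu with mu[n] = μ(n) for 1 <= n <= limit (mu[0] is unused)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_prime[p]:
            continue
        for m in range(2 * p, limit + 1, p):
            is_prime[m] = False
        for m in range(p, limit + 1, p):
            mu[m] = -mu[m]  # one more prime factor flips the sign
        for m in range(p * p, limit + 1, p * p):
            mu[m] = 0  # divisible by a square, so μ vanishes
    return mu


def divisor_count_table(limit):
    """Return tau with tau[n] = number of divisors of n for 1 <= n <= limit."""
    tau = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for m in range(d, limit + 1, d):
            tau[m] += 1
    return tau


def chebyshev_psi(x):
    """ψ(x) for an integer x, computed naively by trial division."""
    total = 0.0
    for n in range(2, x + 1):
        p = next(q for q in range(2, n + 1) if n % q == 0)  # smallest prime factor
        m = n
        while m % p == 0:
            m //= p
        if m == 1:  # n is a power of p, so Λ(n) = log p
            total += math.log(p)
    return total


def both_sides(x):
    """Return (⌊x⌋ - ψ(x) - 2γ, Σ_{dd' <= x} μ(d) f(d')) for an integer x >= 1."""
    mu = mobius_table(x)
    tau = divisor_count_table(x)

    def f(n):
        return tau[n] - math.log(n) - 2 * EULER_GAMMA

    lhs = x - chebyshev_psi(x) - 2 * EULER_GAMMA
    rhs = sum(mu[d] * f(dp) for d in range(1, x + 1) for dp in range(1, x // d + 1))
    return lhs, rhs


if __name__ == "__main__":
    for x in (10, 100, 500):
        print(x, both_sides(x))
```

Both entries of each printed pair should agree up to floating-point rounding.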
Notice that
\[
\sum_{n \le x} f(n) = \sum_{n \le x} \tau(n) - \sum_{n \le x} \log n - 2\gamma \lfloor x \rfloor = O(\sqrt{x}),
\]
by Dirichlet's formula and the easily verified $\sum_{n \le x} \log n = x \log x - x + O(\log x)$.
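A quick numerical look at this partial sum is sketched below for integer $x$. It uses the standard facts $\sum_{n \le x} \tau(n) = \sum_{d \le x} \lfloor x/d \rfloor$ and $\sum_{n \le x} \log n = \log(x!)$; the function name `partial_sum_f` and the hard-coded value of $\gamma$ are assumptions of the example.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant γ, hard-coded to double precision


def partial_sum_f(x):
    """Return Σ_{n <= x} f(n) with f(n) = τ(n) - log n - 2γ, for an integer x >= 1."""
    divisor_sum = sum(x // d for d in range(1, x + 1))  # Σ τ(n) = Σ_d ⌊x/d⌋
    log_sum = math.lgamma(x + 1)                        # Σ log n = log(x!)
    return divisor_sum - log_sum - 2 * EULER_GAMMA * x


if __name__ == "__main__":
    for x in (10**2, 10**3, 10**4, 10**5):
        total = partial_sum_f(x)
        print(x, total, total / math.sqrt(x))  # last column should stay bounded
```

The last column is the partial sum divided by $\sqrt{x}$; the claimed bound predicts that it stays bounded as $x$ grows.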
We have information about the partial sums of µ (by Theorem 1) and f by the above, so a
rearrangement in terms of partial sums of these functions would allow us to give an estimate. We
can in fact rearrange in the following fashion
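\[
\sum_{dd' \le x} \mu(d) f(d')
= \sum_{n \le a} \mu(n) \sum_{m \le x/n} f(m)
+ \sum_{n \le b} f(n) \sum_{m \le x/n} \mu(m)
- \sum_{n \le a} \mu(n) \cdot \sum_{n \le b} f(n),
\]
for any positive $a, b$ such that $ab = x$. This is an easily verified rearrangement, which is exactly
what we are looking for: every pair $(d, d')$ with $dd' \le x$ has $d \le a$ or $d' \le b$, and the pairs
satisfying both conditions are counted twice, so they are subtracted once. The first term is
$O\bigl(\sqrt{xa}\,(\log a)^{-A}\bigr)$, the last one $O\bigl(a\sqrt{b}\,(\log a)^{-A}\bigr)$. To
estimate the second term, we note that by partial summation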
\[
\sum_{n \le b} \frac{f(n)}{n} = O\Bigl(b^{-1/2} - \int_1^b t^{-3/2}\, dt\Bigr) = O\bigl(b^{-1/2}\bigr).
\]
We see now that $a = b = \sqrt{x}$ seems to give the optimal bound. This shows that
\[
\sum_{dd' \le x} \mu(d) f(d') = O\bigl(x (\log x)^{-A}\bigr),
\]
for any $A > 0$, as required.
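The rearrangement used in this proof is the classical hyperbola-type splitting, and it can be sanity-checked numerically. In the sketch below the subtracted term is taken to be the partial sum of $\mu$ up to $a$ times the partial sum of $f$ up to $b$, which is what counts the pairs with $d \le a$ and $d' \le b$. The parameters are integers with $x = ab$, and all function names as well as the hard-coded value of $\gamma$ are assumptions made for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant γ, hard-coded to double precision


def mobius_table(limit):
    """Return mu with mu[n] = μ(n) for 1 <= n <= limit (mu[0] is unused)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_prime[p]:
            continue
        for m in range(2 * p, limit + 1, p):
            is_prime[m] = False
        for m in range(p, limit + 1, p):
            mu[m] = -mu[m]  # one more prime factor flips the sign
        for m in range(p * p, limit + 1, p * p):
            mu[m] = 0  # divisible by a square, so μ vanishes
    return mu


def divisor_count_table(limit):
    """Return tau with tau[n] = number of divisors of n for 1 <= n <= limit."""
    tau = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for m in range(d, limit + 1, d):
            tau[m] += 1
    return tau


def prefix_sums(values):
    """Return P with P[k] = values[1] + ... + values[k] and P[0] = 0."""
    sums = [0.0] * len(values)
    for k in range(1, len(values)):
        sums[k] = sums[k - 1] + values[k]
    return sums


def hyperbola_check(a, b):
    """Compare Σ_{dd' <= x} μ(d) f(d') with its rearrangement, for integers a, b and x = ab."""
    x = a * b
    mu = mobius_table(x)
    tau = divisor_count_table(x)
    f = [0.0] + [tau[n] - math.log(n) - 2 * EULER_GAMMA for n in range(1, x + 1)]
    M = prefix_sums(mu)  # M[k] = Σ_{n <= k} μ(n)
    F = prefix_sums(f)   # F[k] = Σ_{n <= k} f(n)

    direct = sum(mu[d] * F[x // d] for d in range(1, x + 1))
    rearranged = (
        sum(mu[n] * F[x // n] for n in range(1, a + 1))    # pairs with d <= a
        + sum(f[n] * M[x // n] for n in range(1, b + 1))   # pairs with d' <= b
        - M[a] * F[b]                                      # pairs counted twice
    )
    return direct, rearranged


if __name__ == "__main__":
    for a, b in ((10, 10), (5, 20), (3, 40)):
        print(a, b, hyperbola_check(a, b))
```

Taking $a \ne b$ in the check is deliberate, since for $a = b$ the order of the two factors in the subtracted term cannot be distinguished.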
Bibliography
[1] T. Apostol, Introduction to Analytic Number Theory, Springer-Verlag (1976).
[2] K.G. Borodzkin, On I. M. Vinogradov’s constant, Proc. 3rd All-Union Math. Conf., vol. 1,
Izdat. Akad. Nauk SSSR, Moscow (1956).
[3] H. Davenport, On some infinite series involving arithmetical functions. II, Quart. J. Math.
Oxf. 8 (1937), 313–320.
[4] H. Davenport, Multiplicative number theory, Springer-Verlag, Third edition (2000).
[5] B.J. Green and T. Tao, The Möbius function is strongly orthogonal to nilsequences, Annals of
Math., to appear.
[6] B.J. Green, Additive number theory, Part III lecture notes, available at
http://www.dpmms.cam.ac.uk/~bjg23/ANT.html
[7] G. H. Hardy and L. E. Littlewood, Some problems of ‘Partitio Numerorum’. III: On the
expression of a number as a sum of primes, Acta Mathematica 44 (1922), 1–70.
[8] H. Iwaniec and E. Kowalski, Analytic number theory, AMS Colloquium publications, 53
(2004).
[9] I. M. Vinogradov, Representation of an odd number as a sum of three primes, Comptes Rendues (Doklady) de l’Academy des Sciences de l’USSR 15 (1937), 191–294.
[10] Deshouillers, Effinger, Te Riele and Zinoviev, A complete Vinogradov 3-primes theorem under
the Riemann hypothesis, Electronic Research Announcements of the American Mathematical
Society, 3 (1997), 99-104.