I’ve just uploaded to the arXiv my paper “Failure of the pointwise and maximal ergodic theorems for the free group”, submitted to Forum of Mathematics, Sigma. This paper concerns a variant of the pointwise ergodic theorem of Birkhoff, which asserts that if one has a measure-preserving shift map $T: X \to X$ on a probability space $X = (X, \mu)$, then for any $f \in L^1(X)$, the averages $\frac{1}{N} \sum_{n=1}^N f \circ T^{-n}$ converge pointwise almost everywhere. (In the important case when the shift map $T$ is ergodic, the pointwise limit is simply the mean $\int_X f\, d\mu$ of the original function $f$.)
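To make the statement concrete, here is a small numerical sketch under illustrative assumptions of our own (none of them come from the paper): $X = [0,1)$ with Lebesgue measure, $T$ the rotation $x \mapsto x + \theta \bmod 1$ by the irrational number $\theta = \sqrt{2} - 1$ (an ergodic shift map), and $f(x) = \cos^2(2\pi x)$, whose mean is $1/2$. The hypothetical helper `birkhoff_average` computes $\frac{1}{N} \sum_{n=1}^N f(T^{-n} x)$ at a single sample point, and the values should approach $1/2$ as $N$ grows.

```python
import math

def birkhoff_average(f, T_inv, x, N):
    """Return (1/N) * sum_{n=1}^{N} f(T^{-n} x), computed by iterating T^{-1}."""
    total, y = 0.0, x
    for _ in range(N):
        y = T_inv(y)
        total += f(y)
    return total / N

THETA = math.sqrt(2) - 1  # irrational rotation number (an illustrative choice)

def rotate_back(y):
    """Inverse of the rotation T(y) = y + THETA mod 1 on [0, 1)."""
    return (y - THETA) % 1.0

def f(y):
    """Test function with mean exactly 1/2 over [0, 1)."""
    return math.cos(2 * math.pi * y) ** 2

for N in (10, 1000, 100000):
    print(N, birkhoff_average(f, rotate_back, 0.3, N))
```

A smooth $f$ and a rotation are chosen only because the convergence is then fast and easy to observe; the theorem itself needs only $f \in L^1(X)$.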
The pointwise ergodic theorem can be extended to measure-preserving actions of other amenable groups, if one uses a suitably “tempered” Folner sequence of averages; see this paper of Lindenstrauss for more details. (I also wrote up some notes on that paper here, back in 2006 before I had started this blog.) But the arguments used to handle the amenable case break down completely for non-amenable groups, and in particular for the free non-abelian group on two generators.
Nevo and Stein studied this problem and obtained a number of pointwise ergodic theorems for measure-preserving actions $(T_g)_{g \in F_2}$ of the free group $F_2$ on probability spaces $(X, \mu)$. For instance, for the spherical averaging operators

$$ {\mathcal A}_n f := \frac{1}{4 \times 3^{n-1}} \sum_{g \in F_2: |g| = n} f \circ T_g^{-1} $$

(where $|g|$ denotes the length of the reduced word that forms $g$), they showed that ${\mathcal A}_{2n} f$ converged pointwise almost everywhere provided that $f$ was in $L^p(X)$ for some $p > 1$. (The need to restrict to spheres of even radius can be seen by considering the action of $F_2$ on the two-element set $\{0,1\}$ in which both generators of $F_2$ act by interchanging the elements, in which case ${\mathcal A}_n f$ is determined by the parity of $n$.) This result was reproven with a different and simpler proof by Bufetov, who also managed to relax the condition $f \in L^p(X)$ to the weaker condition $f \in L \log L(X)$.
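The definitions can be checked by brute force on a finite action. In this sketch the conventions are our own: the letters 0, 1, 2, 3 stand for $a, b, a^{-1}, b^{-1}$ (hypothetical labels for the two generators of $F_2$ and their inverses), an action on a finite set is given by the four maps `T[s]`, and the helper `spherical_average` averages $f$ over the sphere by enumerating reduced words. It confirms the sphere size $4 \times 3^{n-1}$ used to normalise ${\mathcal A}_n$, and reproduces the parity example, where ${\mathcal A}_n f$ alternates between two values.

```python
import itertools

# Letters of F_2: 0 = a, 1 = b, 2 = a^{-1}, 3 = b^{-1}; INV[s] is the inverse letter.
INV = [2, 3, 0, 1]

def reduced_words(n):
    """All reduced words of length n: no letter is followed by its inverse."""
    return [w for w in itertools.product(range(4), repeat=n)
            if all(w[k + 1] != INV[w[k]] for k in range(n - 1))]

def spherical_average(T, f, n):
    """A_n f(x) for the action given by the maps T[s] (lists of points).

    Each reduced word moves x by applying its letters in turn. Reversal is a
    bijection on reduced words and the sphere is closed under inversion, so
    this equals the normalised sum of f(T_g^{-1} x) over |g| = n.
    """
    words = reduced_words(n)
    result = []
    for x in range(len(f)):
        total = 0.0
        for w in words:
            y = x
            for s in w:
                y = T[s][y]
            total += f[y]
        result.append(total / len(words))
    return result

# Parity example: both generators (hence all four letters) swap the points 0 and 1.
swap = [1, 0]
T_swap = [swap, swap, swap, swap]
f = [1.0, 0.0]  # indicator function of the point 0
for n in range(1, 5):
    print(n, len(reduced_words(n)), spherical_average(T_swap, f, n))
```

Here every group element of length $n$ moves $x$ to $x + n \bmod 2$, so the odd and even averages of a non-constant $f$ never agree.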
The question remained open as to whether the pointwise ergodic theorem for $F_2$-actions held if one only assumed that $f$ was in $L^1(X)$. Nevo and Stein were able to establish this for the Cesàro averages $\frac{1}{N} \sum_{n=1}^N {\mathcal A}_n f$, but not for ${\mathcal A}_n f$ itself. About six years ago, Assaf Naor and I tried our hand at this problem, and were able to show an associated maximal inequality on $\ell^1(F_2)$, but due to the non-amenability of $F_2$, this inequality did not transfer to $L^1(X)$ and did not have any direct impact on this question, despite a fair amount of effort on our part to attack it.
Inspired by some recent conversations with Lewis Bowen, I returned to this problem. This time around, I tried to construct a counterexample to the pointwise ergodic theorem – something Assaf and I had not seriously attempted to do (perhaps due to being a bit too enamoured of our $\ell^1(F_2)$ maximal inequality). I knew of an existing counterexample of Ornstein regarding a failure of an $L^1$ ergodic theorem for iterates $P^n$ of a self-adjoint Markov operator $P$ – in fact, I had written some notes on this example back in 2007. Upon revisiting my notes, I soon discovered that the Ornstein construction was adaptable to the $F_2$ setting, thus settling the problem in the negative:
Theorem 1 (Failure of $L^1$ pointwise ergodic theorem) There exists a measure-preserving $F_2$-action on a probability space $X$ and a non-negative function $f \in L^1(X)$ such that $\sup_n {\mathcal A}_{2n} f(x) = +\infty$ for almost every $x$.
To describe the proof of this theorem, let me first briefly sketch the main ideas of Ornstein’s construction, which gave an example of a self-adjoint Markov operator $P$ on a probability space $X$ and a non-negative $f \in L^1(X)$ such that $\sup_n P^n f(x) = +\infty$ for almost every $x$. By some standard manipulations, it suffices to show that for any given $\alpha > 0$ and $\varepsilon > 0$, there exists a self-adjoint Markov operator $P$ on a probability space $X$ and a non-negative $f \in L^1(X)$ with $\|f\|_{L^1(X)} \leq \alpha$, such that $\sup_n P^n f \geq 1 - \varepsilon$ on a set of measure at least $1 - \varepsilon$. Actually, it will be convenient to replace the Markov chain $(P^n f)_{n \geq 0}$ with an ancient Markov chain $(f_n)_{n \in \mathbf{Z}}$ – that is to say, a sequence of non-negative functions $f_n$ for both positive and negative $n$, such that $f_{n+1} = P f_n$ for all $n \in \mathbf{Z}$. The purpose of requiring the Markov chain to be ancient (that is, to extend infinitely far back in time) is to allow for the Markov chain to be shifted arbitrarily in time, which is key to Ornstein’s construction. (Technically, Ornstein’s original argument only uses functions that go back to a large negative time, rather than being infinitely ancient, but I will gloss over this point for the sake of discussion, as it turns out that the $F_2$ version of the argument can be run using infinitely ancient chains.)
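For any $\alpha > 0$, let $\mathcal{P}(\alpha)$ denote the claim that for any $\varepsilon > 0$, there exists an ancient Markov chain $(f_n)_{n \in \mathbf{Z}}$ with $\|f_n\|_{L^1(X)} \leq \alpha$ such that $\sup_{n \in \mathbf{Z}} f_n \geq 1 - \varepsilon$ on a set of measure at least $1 - \varepsilon$. Clearly $\mathcal{P}(1)$ holds since we can just take $f_n = 1$ for all $n$. Our objective is to show that $\mathcal{P}(\alpha)$ holds for arbitrarily small $\alpha$. The heart of Ornstein’s argument is then the implication

$$ \mathcal{P}(\alpha) \implies \mathcal{P}\left( \alpha \left( 1 - \frac{\alpha}{4} \right) \right) \qquad (1) $$

for any $0 < \alpha \leq 1$, which upon iteration quickly gives the desired claim.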
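The iteration in (1) can be tracked explicitly. Writing $\alpha' = \alpha(1 - \frac{\alpha}{4})$, one has the identity $\frac{1}{\alpha'} = \frac{1}{\alpha} + \frac{1}{4 - \alpha}$, and $\frac{1}{4-\alpha} \geq \frac{1}{4}$, so starting from $\alpha_0 = 1$ (the base case) the iterates satisfy $\frac{1}{\alpha_k} \geq 1 + \frac{k}{4}$, that is, $\alpha_k \leq \frac{4}{k+4}$, and hence tend to zero. The short sketch below (the function name `ornstein_masses` is ours) iterates the map and checks this bound numerically.

```python
def ornstein_masses(alpha0, steps):
    """Iterate the mass map alpha -> alpha * (1 - alpha / 4) from implication (1)."""
    alphas = [alpha0]
    for _ in range(steps):
        a = alphas[-1]
        alphas.append(a * (1 - a / 4))
    return alphas

alphas = ornstein_masses(1.0, 10000)  # start from the base case alpha = 1
for k in (0, 1, 10, 100, 1000, 10000):
    # alpha_k decreases to 0, and k * alpha_k stays below 4
    print(k, alphas[k], k * alphas[k])
```

The decay is only like $4/k$, but since each step of (1) is applied to an arbitrary $\varepsilon$, any target mass $\alpha > 0$ is reached after finitely many steps.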
Let’s see informally how (1) works. By hypothesis, and ignoring epsilons, we can find an ancient Markov chain $(f_n)_{n \in \mathbf{Z}}$ on some probability space $X$ of total mass $\|f_n\|_{L^1(X)} = \alpha$, such that $\sup_n f_n$ attains the value of $1$ or greater almost everywhere. Assuming that the Markov process is irreducible, the $f_n$ will eventually converge as $n \to +\infty$ to the constant value of $\alpha$; in particular, its final state will essentially stay above $\alpha$ (up to small errors).
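As a toy illustration of this convergence (not of Ornstein's construction itself), consider a hypothetical finite example: the lazy random walk on the cycle $\mathbf{Z}/N$ with the uniform probability measure, which is a self-adjoint, irreducible Markov operator. Starting from a non-negative function of mass $\alpha$ concentrated at one point, the forward iterates $P^n f$ keep mass $\alpha$ and flatten out to the constant $\alpha$. The walk is made lazy (it holds with probability $1/2$) so that it is aperiodic; a non-lazy walk on an even cycle would instead oscillate with the parity of $n$, much like the parity example for spherical averages.

```python
def lazy_cycle_step(f):
    """Lazy random walk on Z/N: (P f)(x) = f(x)/2 + f(x-1)/4 + f(x+1)/4.

    The matrix of P is symmetric and each row sums to 1, so P is a self-adjoint
    Markov operator for the uniform probability measure on Z/N.
    """
    N = len(f)
    return [f[x] / 2 + f[(x - 1) % N] / 4 + f[(x + 1) % N] / 4 for x in range(N)]

def mean(f):
    """Integral of f against the uniform probability measure."""
    return sum(f) / len(f)

N, alpha = 20, 0.3
f = [alpha * N if x == 0 else 0.0 for x in range(N)]  # mass alpha, all at x = 0
for n in range(2000):
    f = lazy_cycle_step(f)
print(mean(f), min(f), max(f))  # mass stays alpha; values approach alpha everywhere
```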
Now suppose we duplicate the Markov process by replacing $X$ with a double copy $X \times \{1,2\}$ (giving $\{1,2\}$ the uniform probability measure), and using the disjoint sum of the Markov operators on $X \times \{1\}$ and $X \times \{2\}$ as the propagator, so that there is no interaction between the two components of this new system. Then the functions $f_n(x) 1_{i=1}$ form an ancient Markov chain of mass at most $\frac{\alpha}{2}$ that lives solely in the first half $X \times \{1\}$ of this copy, and $\sup_n f_n(x) 1_{i=1}$ attains the value of $1$ or greater on almost all of the first half $X \times \{1\}$, but is zero on the second half. The final state of $f_n(x) 1_{i=1}$ will be to stay above $\alpha$ in the first half $X \times \{1\}$, but be zero on the second half.
Now we modify the above example by allowing an infinitesimal amount of interaction between the two halves $X \times \{1\}$, $X \times \{2\}$ of the system (I mentally think of $X \times \{1\}$ and $X \times \{2\}$ as two identical boxes that a particle can bounce around in, and now we wish to connect the boxes by a tiny tube). The precise way in which this interaction is inserted is not terribly important so long as the new Markov process is irreducible. Once one does so, then the ancient Markov chain $f_n(x) 1_{i=1}$ in the previous example gets replaced by a slightly different ancient Markov chain $\tilde f_n(x,i)$ which is more or less identical with $f_n(x) 1_{i=1}$ for negative times $n$, or for bounded positive times $n$, but for very large values of $n$ the final state is now constant across the entire state space $X \times \{1,2\}$, and will stay above $\frac{\alpha}{2}$ on this space.
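The two-box picture can be mimicked with a toy finite model, again an assumption made purely for illustration: each box is a copy of the lazy walk on $\mathbf{Z}/N$, and the "tube" is a single link of small weight $\eta$ between one point in each box. With $\eta = 0$ this is the disjoint sum of the previous paragraph, and a function living on the first box stays there forever; with small $\eta > 0$ the process is irreducible, the function is nearly unchanged for a bounded number of steps, but after very many steps it spreads to the constant $\frac{\alpha}{2}$ over both boxes. Box indices 0 and 1 in the code play the roles of $X \times \{1\}$ and $X \times \{2\}$, and the names `two_box_step` and `evolve` are hypothetical. Only forward iterates are simulated, not ancient chains.

```python
def two_box_step(f, eta):
    """One step of a toy 'two boxes joined by a thin tube' Markov operator.

    f[i][x] is the value at point x of box i (i = 0, 1; x in Z/N). Inside each
    box we run the lazy cycle walk f(x)/2 + f(x-1)/4 + f(x+1)/4; the tube then
    exchanges weight eta between the points (0, 0) and (1, 0). The transition
    matrix is symmetric with rows summing to 1, so the operator is self-adjoint
    for the uniform probability measure on the 2N points and preserves mass.
    With eta = 0 it is the disjoint sum of two copies of the lazy walk.
    """
    N = len(f[0])
    g = [[f[i][x] / 2 + f[i][(x - 1) % N] / 4 + f[i][(x + 1) % N] / 4
          for x in range(N)] for i in range(2)]
    flow = eta * (f[0][0] - f[1][0])  # net transfer through the tube
    g[0][0] -= flow
    g[1][0] += flow
    return g

def evolve(f, eta, steps):
    """Apply two_box_step the given number of times."""
    for _ in range(steps):
        f = two_box_step(f, eta)
    return f

N, alpha = 10, 0.3
start = [[alpha] * N, [0.0] * N]  # alpha on box 0, zero on box 1: mass alpha / 2
for eta in (0.0, 0.01):
    for n in (10, 10000):
        f = evolve(start, eta, n)
        print(eta, n, min(f[0]), max(f[1]))
```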
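Finally, we consider an ancient Markov chain $F_n$ which is basically of the form

$$ F_n(x,i) \approx \tilde f_n(x,i) + \left( 1 - \frac{\alpha}{2} \right) f_{n-M}(x) 1_{i=2} $$

for some large parameter $M$ and for all $n \leq M + O(1)$ (the approximation becomes increasingly inaccurate for $n$ much larger than $M$, but never mind this for now). This is basically two copies of the original Markov process in separate, barely interacting state spaces $X \times \{1\}$, $X \times \{2\}$, but with the second copy delayed by a large time delay $M$, and also attenuated in amplitude by a factor of $1 - \frac{\alpha}{2}$. The total mass of this process is now $\frac{\alpha}{2} + \frac{\alpha}{2}(1 - \frac{\alpha}{2}) = \alpha (1 - \frac{\alpha}{4})$. Because of the $\tilde f_n$ component of $F_n$, we see that $\sup_n F_n$ basically attains the value of $1$ or greater on the first half $X \times \{1\}$. On the second half $X \times \{2\}$, we work with times $n$ close to $M$. If $M$ is large enough, $\tilde f_n$ would have averaged out to about $\frac{\alpha}{2}$ at such times, but the $(1 - \frac{\alpha}{2}) f_{n-M}$ component can get as large as $1 - \frac{\alpha}{2}$ here. Summing (and continuing to ignore various epsilon losses), we see that $F_n$ can get as large as $1$ on almost all of the second half of $X \times \{1,2\}$. This concludes the rough sketch of how one establishes the implication (1).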
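One way to see where the attenuation factor and the new mass in (1) come from, ignoring epsilons as above: write $c$ for the attenuation factor of the delayed copy. On the second half, at times $n$ close to $M$, the two components of $F_n$ contribute about $\frac{\alpha}{2}$ and up to $c$ respectively, while each copy of the original chain placed on one half of $X \times \{1,2\}$ carries mass $\frac{\alpha}{2}$. Requiring the sum to reach $1$ determines $c$, and the total mass follows:

$$ \frac{\alpha}{2} + c \geq 1 \iff c \geq 1 - \frac{\alpha}{2}, \qquad \frac{\alpha}{2} + \left(1 - \frac{\alpha}{2}\right) \frac{\alpha}{2} = \frac{\alpha}{2} \left( 2 - \frac{\alpha}{2} \right) = \alpha \left( 1 - \frac{\alpha}{4} \right). $$

Since the total mass $\frac{\alpha}{2}(1 + c)$ increases with $c$, the choice $c = 1 - \frac{\alpha}{2}$ is the cheapest one that works within this two-copy construction.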
It was observed by Bufetov that the spherical averages ${\mathcal A}_n$ for a free group action can be lifted up to become powers $P^n$ of a Markov operator, basically by randomly assigning a “velocity vector” $s \in \{a, b, a^{-1}, b^{-1}\}$ (where $a, b$ are the generators of $F_2$) to one’s base point $x$ and then applying the Markov process that moves $x$ along that velocity vector (and then randomly changing the velocity vector at each time step, subject to the “reduced word” condition that the velocity never flips from $s$ to $s^{-1}$). Thus the spherical average problem has a Markov operator interpretation, which opens the door to adapting the Ornstein construction to the setting of $F_2$ systems. This turns out to be doable after a certain amount of technical artifice; the main thing is to work with $F_2$-measure preserving systems that admit ancient Markov chains that are initially supported in a very small region in the “interior” of the state space, so that one can couple such systems to each other “at the boundary” in the fashion needed to establish the analogue of (1) without disrupting the ancient dynamics of such chains. The initial such system (used to establish the base case $\mathcal{P}(1)$) comes from basically considering the action of $F_2$ on a (suitably renormalised) “infinitely large ball” in the Cayley graph, after suitably gluing together the boundary of this ball to complete the action. The ancient Markov chain associated to this system starts at the centre of this infinitely large ball at infinite negative time $n = -\infty$, and only reaches the boundary of this ball at the time $n = 0$.
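The lifting idea can be checked numerically on a small finite action, where any two permutations of a finite set define an $F_2$-action because $F_2$ is free. The lift below is a hypothetical one of our own choosing: the state space is $X \times \{a, b, a^{-1}, b^{-1}\}$, the second coordinate records the last letter used, and the operator averages over the three letters that do not undo it. It is a Markov operator (its rows sum to one), but we do not claim it is self-adjoint or that it coincides with Bufetov's actual construction; all function names are illustrative. With these conventions ${\mathcal A}_n f$ comes out as $n - 1$ powers of the operator together with one initial step along a uniformly random velocity, and the code compares this against brute-force enumeration of the sphere.

```python
import itertools

# Letters of F_2: 0 = a, 1 = b, 2 = a^{-1}, 3 = b^{-1}; INV[s] is the inverse letter.
INV = [2, 3, 0, 1]

def action_from(perm_a, perm_b):
    """The maps T_s for s = a, b, a^{-1}, b^{-1}, given permutations for a and b."""
    inv_a, inv_b = [0] * len(perm_a), [0] * len(perm_b)
    for i, j in enumerate(perm_a):
        inv_a[j] = i
    for i, j in enumerate(perm_b):
        inv_b[j] = i
    return [list(perm_a), list(perm_b), inv_a, inv_b]

def sphere_average_bruteforce(T, f, n):
    """Average f over the images of x under all reduced words of length n."""
    words = [w for w in itertools.product(range(4), repeat=n)
             if all(w[k + 1] != INV[w[k]] for k in range(n - 1))]
    out = []
    for x in range(len(f)):
        total = 0.0
        for w in words:
            y = x
            for s in w:
                y = T[s][y]
            total += f[y]
        out.append(total / len(words))
    return out

def lifted_step(T, F):
    """(P F)(x, s) = mean of F(T_t x, t) over the 3 letters t != s^{-1}."""
    return [[sum(F[T[t][x]][t] for t in range(4) if t != INV[s]) / 3.0
             for s in range(4)] for x in range(len(F))]

def sphere_average_markov(T, f, n):
    """A_n f via n - 1 powers of the lifted operator P on X x {a, b, a^-1, b^-1}."""
    F = [[f[x]] * 4 for x in range(len(f))]  # lift: F(x, s) = f(x), ignoring s
    for _ in range(n - 1):
        F = lifted_step(T, F)
    # pick the first velocity s uniformly and take the first step along it
    return [sum(F[T[s][x]][s] for s in range(4)) / 4.0 for x in range(len(f))]

T = action_from([1, 2, 0, 3], [0, 3, 1, 2])
f = [1.0, 0.0, 0.0, 0.0]
for n in range(1, 5):
    print(n, sphere_average_bruteforce(T, f, n), sphere_average_markov(T, f, n))
```

The Markov version costs $O(n |X|)$ operations rather than the $O(3^n |X|)$ of enumerating the sphere, which is one practical reason the lifted formulation is convenient.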