Ehrenfest Urn Problem with Applications

Daniel J. Castellano

December 18, 1997

1 Introduction

The Ehrenfest urn problem was originally proposed as a model for dissipation of heat, but has since come to be applied in a wide variety of fields, thanks in part to generalizations and variations of the problem, and also, no less importantly, to visualizing the exact original problem in a different light. To illustrate, we will consider specific examples and analyze them both algebraically and geometrically, and also use physical analogies to demonstrate the relevance of this model to the pure sciences.

First, we shall consider the original problem as formulated by the Ehrenfests, and derive some of its basic results and implications. We will then recast the problem as a random walk on a hypercube, and consider it yet again as a Markov chain. Finally, we will look at a couple of variations of the problem, checking that the results of the Ehrenfest model are confirmed, and then expound further physical implications of these modified urn problems.

2 The original Ehrenfest urn model

Consider two urns A and B. Urn A contains N marbles and Urn B contains none. The marbles are labelled 1,2,...N. In each step of the algorithm, a number between 1 and N is chosen randomly, with all values having equal probability. The marble corresponding to that value is moved to the opposite urn. Hence the first step of the algorithm will always involve moving a marble from A to B.

What will the two urns look like after k steps? If k is sufficiently large, we may expect the urns to have equal populations, as the probabilities of drawing a marble from A or from B become increasingly similar. After the first step, for example, the odds of putting the marble in Urn B back into A is 1/N. Going a step further, the probability of Urn B having three marbles after three steps is N−1/N⋅ N−2/N. For N large, this is a very high probability, but as we multiply repeatedly by N−k/N for increasingly large k, the probability of moving marbles from A to B diminishes over time, as we would expect intuitively.

States in which one urn has many more marbles than the other may be said to be unstable, as there is an overwhelming tendency to move marbles to the urn that contains fewer. As the populations equalize, the transition probabilities from A to B and B to A approach each other, creating a stable, or stationary, state of roughly equal populations thereafter.

These qualitative results are in perfect accordance with familiar concepts in thermodynamics. Two bodies of different temperatures eventually approach the same intermediate temperature when in contact with each other for an extended period of time. Since thermal motion is random, we would expect an urn model which gives equal probabilities to each unit of heat transfer to be the most appropriate. To proceed to more quantitative results, it is helpful to recast the problem in a more convenient form.

2.1 Random walk on a hypercube

Each of the N marbles may be in one of 2 states: it is in A or it is in B. In each step of the algorithm we reverse the state of one of the marbles, an action which is independent of the states of the other marbles, hence the states are orthogonal. If we consider each marble to correspond to a coordinate axis, and states A and B are treated as values 0 and 1, then the state of the system at any point in time may be expressed as the vertex of an N-dimensional unit hypercube. Executing a step in the algorithm is analogous to moving along an edge to an adjacent vertex. Our condition that all marbles be chosen with equal probability means that each edge connected to the vertex has equal chance of being traversed. What we have is a uniform random walk on the hypercube.

The state of the system may be expressed as a binary N-tuple if we align the edges of the hypercube with the standard basis in ℜ^N. These states form a group under binary vector addition. We may define a probability measure P on this space which describes the first step of the algorithm. This would be P(x)=1/N if x=e_i where e_i is the ith basis vector. However, to remove parity problems arising from an odd number of steps, it is better to use P(x)=1/N+1 if x=e_i or 0. (In both cases, P is zero-valued elsewhere, of course.)

As in all random walk problems, further steps are expressed by convoluting the probability function. The second step is

P^{∗ 2}(x)=ΣP(x−y)P(y) (1)

where P(y) is the probability distribution of the first step. For the kth step, we have:

P^{∗ k}=P∗ P^{∗ k−1} (2)

As k approaches infinity, this expression converges to the uniform distribution. That is, after a sufficiently long random walk, each vertex on the hypercube has an equal probability of being the walker's location. Thus the expectation value of the sum of the coordinates of the corresponding N-tuple is N/2. This reaffirms our thermodynamic model.

An interesting question is that of how big k must get in order to have a uniform distribution. The transition from an unstable to a stable configuration is actually quite dramatic. For k greater than N/4log N, the distribution rapidly approaches uniformity. To show that P^{∗ k} tends to uniformity for this value, a lengthy proof involving Fourier transforms is required.¹ (Taking the Fourier transform of a convolution gives a normal product.)

2.1.1 Electric model

Random walks may also be considered treating each edge as a resistor of unit resistance. A linear random walk, for instance, could never go off toward infinity because the resistors add in series to give infinite resistance (a random walk in a plane, however is a different story, both electrically and mathematically).

Our hypercube may be thought of us as a highly symmetric parallel circuit between the origin and the opposite vertex. Using this method gives some interesting results. For example, the expectation value of the number of steps it takes for an urn lacking one marble to get the last one is 2^N−1!²

2.2 The urn problem as a Markov chain

Discussion to this point has made the problem a bit more complicated than it needs to be, as far as physical considerations are concerned. For one thing, we have 2^N possible states, but many of these are degenerate. If we are considering the diffusion of gases, for example, with each molecule acting as a marble (so N is on the order of Avogadro's number), we really don't care which molecule goes where, but only the relative quantities in each region. So our hypercube collapses considerably, with vertices that have the same coordinate sum collapse into a single vertex. The number of edges should remain the same, in order to preserve the correct relative probabilities, and it is easy to see that geometrically this becomes something of a mess, even though our new graph has many fewer vertices than the hypercube.

It may be advisable to abandon our geometric model at this point, and consider instead a further simplification. The random walk model expresses each state as being determined by all previous states. This is a consequence of labelling marbles only in the initial state. If instead, we relabel the marbles after each step, flaunting our disconcern for the identity of each marble, the state of the system after k steps is only determined by the state after k−1 steps. Instead of 2^N states, we have only N+1 states (0, 1,...N marbles in Urn A). This is a two-state Markov chain, since each step can only increase or decrease the number of marbles in A by 1. With each state being determined only by the previous state, we need only concern ourselves with the probability of transition from one state to the next. These are easily computed:

P(i→ i+1)=

N−i

(3)

P(i→ i−1)=

(4)

with zero probability for all other states.

As in most Markov chain problems, this has a variation cutoff; in other words, a sharp transition to a stationary state. This cutoff is at N/4 log N, as before. The cutoff phenomenon was made mathematically precise by Diaconis and Aldous (1987), and led to the result that a 52-card deck needs only 7 shuffles to achieve uniform mixing. Here we have two-way mixing; we exchange the order of cards in a single switch rather than moving a card from one pile to another, as in the Ehrenfest model. Thus the cutoff is only at N/2log N, which for N=52 gives 102.7. At 13 switches per shuffle (26 pairs, half of which are switched), we have a maximum of 7.9 shuffles (in practice, there are slightly more switches per shuffle).

3 Variations of the Ehrenfest model

The treatment of urn problems as Markov chains lends itself readily to generalizations and modifications of the model. We will consider a couple of these below.

3.1 The Krafft-Schaefer generalization⁴

Fundamental to the Ehrenfest model is the condition that marbles be chosen with equal probability. However, the properties of the urns (e.g., permeability) may be generalized as follows. If the marble chosen is in Urn A, then it will be placed in B with probability s; otherwise, it remains in A. Similarly, a marble in Urn B is moved to A with conditional probability t. The original model may be retrieved by simply setting s=t=1. The transition probabilities for the Markov chain are (zero otherwise):

P(i→ i+1)=(1−

)s (5)

P(i→ i−1)=

t (6)

P(i→ i)=

(1−s)N+i(s−t)

(7)

For the one parameter case s=t, we have:

P(i→ i+1)=(1−

)s (8)

P(i→ i−1)=

d (9)

P(i→ i)=1−s (10)

In the two parameter case, the matrix of transition probabilities has N+1 distinct eigenvalues λ_j=1−2j/N, where j=0, 1,…, N.

3.2 Uppuluri-Wright variation⁶

In another twist of the urn problem, we have a single urn with w₀ white balls and b₀ black balls. Once again, balls are chosen at random with equal probability. If the ball is white, it is replaced with a black ball with probability α₁; otherwise, the system is unchanged. Similarly, if the ball is black, it becomes white with probability α₂. This model is probabilistically similar to the Krafft-Schaefer model, but it is presented in such a way that lends itself to explicit analysis. The expectation values of the number of black and white balls after k steps is

μ_k=[I+

A]^kμ₀ (11)

where A=

⎛
⎝

α₁	α₂
α₁	−α₂

⎞
⎠

and μ₀=(w₀ b₀)^t. The matrix exponential can be evaluated using Blatz's (1968) result:⁷

Mⁿ=

λ₁ⁿ−λ₂ⁿ

λ₁−λ₂

M−

λ₁ⁿ⁻¹−λ₂ⁿ⁻¹

λ₁−λ₂

λ₁λ₂I (12)

Thus we have an explicit computation of the probability of the final state from an arbitrary initial state without anything more involved than finding the eigenvalues of a two by two matrix and raising them to the kth power. For the classical Ehrenfest model, α₁=α₂=1, and I've computed the eigenvalues of I+1/NA to be -1, -(2/N)-1.⁸

4 Conclusion

It took some doing, but we finally arrived at a completely generalized formula that solves the Ehrenfest urn problem explicitly. More importantly, we've covered various geometric and physical interpretations of the problem which have rendered certain aspects of it more intelligible and at the same time illustrated the applications of the model to other fields. This is but one urn model among many, and one of the simplest ones at that. Nonetheless, it has continued to provoke significant developments even in recent years, exposing some highly fertile ground in combinatorics and other applied mathematics.

1: Such a proof is given in Diaconis, P. (1991). Finite Fourier Methods: Access to Tools, Proceedings of Symposia in Applied Mathematics, 44, 174-175.
2: This, and more general results, may be found in Palacios, J. L. (1994). Another look at the Ehrenfest urn via electric networks, Advances in Applied Probability, 26, 820-824.
3: Krafft, O. and Schaefer, M. (1993). Mean passage times for triangular transition matrices and a two parameter Ehrenfest urn model, Journal of Applied Probability, 30, 964-970.
4: Krafft, O. and Schaefer, M. (1993). Mean passage times for triangular transition matrices and a two parameter Ehrenfest urn model, Journal of Applied Probability, 30, 964-970.
5: Uppuluri, V. R. R. and Wright, T. (1981). A note on a further generalization of the Ehrenfest urn model, Proceedings of the American Statistical Association, Sampling Survey Section, 564-569.
6: Uppuluri, V. R. R. and Wright, T. (1981). A note on a further generalization of the Ehrenfest urn model, Proceedings of the American Statistical Association, Sampling Survey Section, 564-569.
7: Blatz, P. J. (1968). On the arbitrary power of an arbitrary 2 x 2 matrix, American Mathematical Monthly, 75, 57-58. 10 Apr 2020: Corrected journal's typo in last exponent, thanks to Aldo Boiti.
8: Notes attached.

This document was translated from L^AT_EX by H^EV^EA.

Ehrenfest Urn Problem with Applications

Daniel J. Castellano

December 18, 1997

1 Introduction

2 The original Ehrenfest urn model

2.1 Random walk on a hypercube

2.1.1 Electric model

2.2 The urn problem as a Markov chain

3 Variations of the Ehrenfest model

3.1 The Krafft-Schaefer generalization4

3.2 Uppuluri-Wright variation6

4 Conclusion

3.1 The Krafft-Schaefer generalization⁴

3.2 Uppuluri-Wright variation⁶