Numerics: Welford's online algorithm
For improved floating point behavior for the computation of variances

<2020-12-16 Wed 11:52> Add historical classification for VEGAS and Welford's online algorithm.
<2020-12-08 Tue 10:22>

I worked on a mock implementation of VEGAS in Python to find ways for improving VEGAS' performance using vectorization, which I could lay out later in a Fortran implementation. However, I stumbled over some lines of source code in the VEGAS C-implementation of the GNU scientific library, whose implementation I've used as reference.

Particularly, I looked at the implementation of the sampling loop:¹

volatile double m = 0, q = 0;
for (k = 0; k < calls_per_box; k++)
  {
    volatile double fval;
    /* sample integrand */
    {
      double d = fval - m;
      m += d / (k + 1.0);
      q += d * d * (k / (k + 1.0));
    }
    /* stratified sampling: store contribution */
  }
intgrl += m * calls_per_box;
f_sq_sum = q * calls_per_box;

These few lines can be summed up as: The implementation loops over the integration grid using a superficial stratification (not shown). Each strata is sampled calls_per_box times, and we keep the mean and variance of the integrand for each strata. Using those means and variances, we compute the overall integral and variance in the end. However, I want to focus on the calculation of the means and variances for each strata: Welford's online algorithm for computing the variance. The respective lines of source code are:

volatile double m = 0, q = 0;
/* … */
{
  double d = fval - m;
  m += d / (k + 1.0);
  q += d * d * (k / (k + 1.0));
}

As usual - lo and behold the sass - comments are an unnecessary burden, and therefore, are omitted; the interested reader is expected to perform the usual reverse engineering. Sigh.

I think a senior programmer involved with scientific code development would recognize the above pattern directly. However, it took me quite some time to wrap my mind around those few lines of code. Despite the fact that I was already familiar with Welford's online algorithm,² I did not catch the algorithm directly. I only knew it in another form, which Donald Knuth presents in his second volume of The Art Of Computer Programming without him giving a proof of his version of the online algorithm, however, he references Welford's original publication.³ And, of course, neither Wikipedia nor The Art Of Computer Programming had been around in those days, when the first version of the VEGAS implementation was published. However, Welford's original publication was, thus, I used it to reproduce the underlying recurrence relations of the online algorithm

In the following, I outline the derivation of the recurrence relations for the mean and variance along the lines of Welford's original publication, in order to cast them in the end into Python code. First, let me give the formulas for the sample mean and the sum of squares:

\begin{align} \bar{x}_n & = \sum_{i = 1}^n \frac{x_i}{n}, \\ S_n & = \sum_{i = 1}^n (x_i - \bar{x}_n)^2. \end{align}

We can easily derive a recurrence relation for the sample mean:

\begin{equation} \bar{x}_n = \sum_{i = 1}^{n - 1} \frac{x_i}{n} + \frac{x_n}{n} = \frac{n - 1}{n} \bar{x}_{n - 1} + \frac{x_n}{n} = \bar{x}_{n - 1} + \frac{x_n - \bar{x}_{n - 1}}{n}. \end{equation}

For the sum of squares, we introduce two identities:

\begin{align} i < n:& & x_i - m_n & = x_i - \frac{n - 1}{n} \bar{x}_{n - 1} - \frac{x_n}{n} = x_i - \bar{x}_{n - 1} - \frac{1}{n} (x_n - \bar{x}_{n - 1}), \\ i = n:& & x_n - \bar{x}_n & = \frac{n - 1}{n} (x_n - \bar{x}_{n - 1}). \end{align}

And, we find for the sum of squares using the above identities the following recurrence relation,

\begin{equation} \begin{split} S_n & = \sum_{i = 1}^n (x_i - \bar{x}_n)^2 \\ & = \sum_{i = 1}^{n - 1} (x_ i - \bar{x}_n)^2 + (x_n - \bar{x}_n)^2 \\ & = \sum_{i = 1}^{n - 1} \left( (x_i - \bar{x}_{n - 1}) - \frac{1}{n} (x_n - \bar{x}_{n - 1}) \right)^2 + \frac{(n - 1)^2}{n^2} (x_n - \bar{x}_{n - 1})^2 \\ & = \sum_{i = 1}^{n - 1} \left( (x_i - \bar{x}_{n - 1})^2 + \underbrace{\frac{1}{n^2}(x_n - \bar{x}_{n - 1})^2}_{\rightarrow (n - 1) \times …} - \frac{2(x_n - \bar{x}_{n - 1})}{n} \underbrace{\sum_{i = 1}^{n - 1} (x_i - \bar{x}_{n - 1})}_{= 0} \right) + \frac{(n - 1)^2}{n^2} (x_n - \bar{x}_{n - 1})^2 \\ & = \sum_{i = 1}^{n - 1} (x_i - \bar{x}_{n - 1})^2 + \underbrace{\left( \frac{n - 1}{n^2} + \frac{(n - 1)^2}{n^2}\right)}_{= \frac{(n - 1)}{n}} (x_n - \bar{x}_{n - 1})^2 \\ & = S_{n - 1} + \frac{(n - 1)}{n} (x_n - \bar{x}_{n - 1})^2. \end{split} \end{equation}

The unbiased sample variance is then given by

\begin{equation} s_n = \frac{S_n}{n - 1}. \end{equation}

Last, but not least, we cast these recurrence relations into Python code:

m = 0
v = 0
n = 0
def update_mean_and_variance(val, m, s, n):
    Δ = val - m
    n += 1
    m += Δ / n
    s += Δ * Δ * (n - 1) / n
    return m, s, n

And, that is actually what happens in the VEGAS implementation, the original recurrence relations by Welford.

Sidenote:

\begin{equation} \sum_{i = 1}^{n - 1} (x_i - \bar{x}_{n - 1}) = (n - 1) \bar{x}_{n - 1} - (n - 1) \bar{x}_{n - 1} = 0. \end{equation}

Footnotes:

https://www.gnu.org/software/gsl/doc/html/montecarlo.html#vegas

Wikipedia provides the original prescription by Welford and from Donald Knuth's The Art Of Computer Programming II.

B. P. Welford (1962) Note on a Method for Calculating Corrected Sums of Squares and Products, Technometrics, 4:3, 419-420, DOI: 10.1080/00401706.1962.10490022

Numerics: Welford's online algorithm For improved floating point behavior for the computation of variances

Footnotes:

Numerics: Welford's online algorithm
For improved floating point behavior for the computation of variances