Analysis of Stochastic Mux

Multiplexing objects (mux for short, muxen for plural) allow multiple Streamer objects to be combined into a single stream by selectively sampling from each constituent stream. The different kinds of mux objects provide different behaviors, but among them, and among them, StochasticMux is the most complex. This section provides an in-depth analysis of StochasticMux’s behavior.

Stream activation and replacement

StochasticMux differs from other muxen (ShuffledMux, RoundRobinMux, etc.) by maintaining an active set of streamers from the full collection it is multiplexing. At any given time, samples are drawn only from the active set, while the remaining streamers are inactive. Each active streamer is limited to produce a (possibly random) number of samples, after which, it is removed from the active set and replaced by a new streamer selected at random; hence the name StochasticMux.

A key quantity to understand when using StochasticMux is the streamer replacement rate: how often should we expect streamers to be replaced from the active set, as a function of samples generated by the mux? This quantity is important for a couple of reasons:

  • If we care about the distribution of samples produced by StochasticMux being a good approximation of what you would get if all streamers were active simultaneously (i.e., ShuffledMux behavior), then the streamer replacement rate should be small.

  • If we have large startup costs involved with activating a streamer (e.g., loading data from disk), then streamer replacement should be infrequent to ensure high throughput. What’s more, replacement events should be spread out among the active set, to avoid having several replacement events in a short period of time.

In the following sections, we’ll analyze replacement rates for the different choices of rate distributions (constant, poisson, and binomial). We’ll focus the analysis on a single (active) streamer at a time. The question we’ll analyze is specifically: how many samples \(N\) must we generate (in expectation) before a specific streamer is deactivated and replaced? Understanding the distribution of N (its mean and variance) will help us understand how often we should expect to see streamer replacement events.

Notation

Let \(A\) denote the size of the active set, let \(r\) denote the number of samples generated by a particular streamer, and let \(p\) denote the probability of selecting the active streamer in question. We’ll make the simplifying assumption that the weights attached to all streamers are uniform, i.e., \(p = 1/A\).

Constant distribution

When using the constant distribution, the sample limit \(r\) is fixed in advance. Our question about the number of samples generated by StochasticMux can then be rephrased slightly: how many samples \(K\) must we draw from all other active streamers before drawing the \(r\)th sample from the streamer under analysis?

This number \(K\) is a random variable, modeled by the negative binomial distribution:

\[\text{Pr}[K = k] = {k + r - 1 \choose k} {(1-p)^k p^r}\]

It has expected value

\[\text{E}[K] = r \times \frac{1-p}{p},\]

and variance

\[\text{Var}[K] = r \times \frac{1-p}{p^2}.\]

The total number of samples produced by the mux before the streamer is replaced is now a random variable \(N = K + r\). We can use linearity of expectation to compute its expected value as

\[\text{E}[N] = \text{E}[K] + r = r \times\frac{1-p}{p} + r = \frac{r}{p}.\]

Since \(N\) and \(K\) differ only by a constant (\(r\)), they have the same variance:

\[\text{Var}[N] = \text{Var}[K].\]

If we apply the simplifying assumption that streamers are selected uniformly at random (\(p = 1/A\)), then we get the following:

  • \(\text{E}[N] = r \times A\), and

  • \(\text{Var}[N] = r \times A \times (A-1)\).

In plain language, this says that the streamer replacement rate scales like the product of the size of the active set and the number of samples per streamer. Making either of these values large implies that we should expect to wait longer to replace an active streamer. However, the variance of replacement event times is approximately quadratic in the size of the active set. This means that making the active set larger will increase the dispersion of replacement events away from the expected value.

Poisson distribution

In pescador version 2 and earlier, the sample limit \(r\) was not a constant value, but a random variable \(R\) drawn from a Poisson distribution with rate parameter \(\lambda\). In this case, the mean and variance of \(R\) are simply \(\text{E}[R] = \text{Var}[R] = \lambda\).

However, this does not lead to a closed form expression for \(\text{E}[N]\) or \(\text{Var}[N]\) because we must now marginalize over \(R\):

\[\begin{split}\text{Pr}[N=n] &= \sum_{r=0}^{\infty} \text{Pr}[K=n-r, R = r]\\ &= \sum_{r=0}^{\infty} \text{Pr}[K=n-r ~|~ R = r] \times \text{Pr}[R=r]\\ &= \sum_{r=0}^{\infty} {n - 1 \choose n-r} {(1-p)^{n-r} p^r} \times \frac{\lambda^r e^{-\lambda}}{r!}\end{split}\]

While this distribution is still supported, it has been replaced as the default by a binomial distribution mode which is more amenable to analysis.

Note

In pescador ≥ 3.0, poisson mode is actually implemented as \(R \sim 1 + \text{Poisson}(\lambda - 1)\). This maintains the expected value of \(\lambda\), with slightly reduced variance (\(\lambda - 1\)), but it ensures that at least one sample is produced from active streamers before deactivation.

This logic is also applied to the binomial mode described below, but omitted from the analysis here for simplicity.

Binomial distribution

In the binomial distribution mode, \(R\) is a random variable governed by a binomial distribution with parameters \((m, q)\):

\[\text{Pr}[R=r] = {m \choose r} q^r (1-q)^{m-r}.\]

(We will come back to determining values for \((m, q)\) later.)

This distribution can be integrated with the negative binomial distribution above to yield a straightforward computation of \(\text{Pr}[N]\).

\[\begin{split}\text{Pr}[N=n] &= \sum_{r=0}^{\infty} \text{Pr}[K=n-r ~|~ R= r] \times \text{Pr}[R=r]\\ &= \sum_{r=0}^{\infty} {n-1 \choose n-r} {\left(1-p\right)}^{n-r} p^r \times {m \choose r} q^r {(1-q)}^{m-r}.\end{split}\]

If we set \(q = 1-p\), this simplifies as follows:

\[\begin{split}\text{Pr}[N=n] &= \sum_{r=0}^{\infty} {n-1 \choose n-r} {\left(1-p\right)}^{n-r} p^r \times {m \choose r} {(1-p)}^r p^{m-r}\\ &= \sum_{r=0}^{\infty} {n-1 \choose n-r} {\left(1-p\right)}^n p^m {m \choose r}\\ &= {\left(1-p\right)}^n p^m {n + m - 1\choose n}.\end{split}\]

This distribution again has the form of a negative binomial with parameters \((m, 1-p)\). If we further set

\[m = \frac{\lambda}{1-p}\]

for an expected rate parameter \(\lambda > 0\) (as in the Poisson case above), then the distribution \(\text{Pr}[N=n]\) is

\[N \sim \text{NB}\left(\frac{\lambda}{1-p}, 1-p\right),\]

where NB denotes the probability mass function of the negative binomial distribution. This yields:

  • \(\text{E}[R] = \lambda\): each streamer generates \(\lambda\) samples on average,

  • \(\text{Var}[R] = \lambda \times p\),

  • \(\text{E}[N] = \lambda / p\), and

  • \(\text{Var}[N] = \lambda \frac{1-p}{p^2}\).

These match the analysis of the constant-mode case above, except that the number of samples per streamer is now a random variable with expectation \(\lambda\). Again, in the special case where \(p=1/A\), we recover

  • \(\text{E}[N] = \lambda \times A\), and

  • \(\text{Var}[N] = \lambda \times A \times (A-1)\).

In short, binomial mode StochasticMux exhibits the same stream replacement characteristics as the constant-mode case, but relaxes the need for each streamer to generate an identical number of samples.

Limiting case \(p=1\)

As defined above, the binomial mode is ill-defined when \(p=1\) due to a division-by-zero in the parametrization. This situation does occur in practice with some configurations of StochasticMux, e.g. when operating in exhaustive mode so that streamers are activated without replacement and are discarded after deactivation. In this case, the size of the active set \(A\) can eventually decay, and the probability of choosing the last active streamer \(p \rightarrow 1\).

To circumvent this issue, StochasticMux detects this situation automatically and falls back on a Poisson distribution for \(R\). This is justified by the Poisson limit theorem if we take the product \(\lambda/(1-p) \times (1-p) = \lambda\) as the limit value as \(p \rightarrow 1\).

Discussion

The above analysis tells us, on average, how long we should expect to wait before a given streamer is exhausted and replaced. Because this distribution applies equally to all streamers, the variance of this distribution tells us how dispersed these replacement events are likely to be.

Qualitatively, there are a few things we can observe from the above analysis.

First, for large active set sizes \(A\), binomial mode will behave similarly to constant mode because \(\text{Var}[R]\) will be inversely proportional to \(A\). For small active sets, binomial mode will behave more similarly to Poisson mode.

Second, Poisson mode will exhibit the highest variance of sample limit values \(\text{Var}[R] = \lambda\) upper-bounds that of the binomial mode \(\text{Var}[R] = \lambda \times p\). We can therefore expect that the replacement event distribution \(\text{Pr}[N]\) under Poisson mode will also exhibit slightly higher variance in general.

Third, the binomial mode provides a controlled interaction between the stream replacement rate and the size of the active set, which is difficult to achieve with Poisson mode.

Finally, we should emphasize that the analysis in this section represents a common, if simplified application of StochasticMux, and there are many other variables at play that may alter the mux’s behavior, including:

  • whether streamers are activated with or without replacement,

  • whether streamers are used exhaustively or replaced after deactivation,

  • whether streamer weights are uniform or non-uniform,

  • whether streamers self-limit instead of relying on the mux for deactivation.

The final point is subtle: remember that streamers encapsulate arbitrary generator code, and there’s nothing stopping a generator from determining its own maximum number of samples to produce. If this number is smaller than the value assigned by the mux, the streamer will act as if it has been exhausted and the mux will replace it immediately. This situation would both reduce the replacement time average and variance. (If a streamer self-limits at a number larger than the mux’s limit, the mux will terminate it first and the analysis above still holds.)