pescador.maps.cache

pescador.maps.cache(stream, n_cache, prob=0.5, random_state=None)

Stochastic stream caching.

  • With probability prob: yield a new item from stream and place it in the cache

  • With probability 1-prob: yield a previously seen item from the cache

  • When the cache exceeds size n_cache, a previously seen item is selected at random for eviction.
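The sketch below illustrates these rules using only the Python standard library. It is an explanatory rendering of the documented semantics, not pescador's implementation, and the helper name cache_sketch is hypothetical.

    import random

    def cache_sketch(stream, n_cache, prob=0.5, seed=None):
        """Illustrative generator mirroring the documented caching rules."""
        rng = random.Random(seed)
        stream = iter(stream)
        data = []

        # Startup phase: the first n_cache items pass through in order
        # while the cache fills.
        for _ in range(n_cache):
            try:
                item = next(stream)
            except StopIteration:
                return
            data.append(item)
            yield item

        # Steady state: stochastically interleave new and cached items.
        while True:
            if rng.random() < prob:
                # Attempt to draw a new item; stop when the input is exhausted.
                try:
                    item = next(stream)
                except StopIteration:
                    return
                # The new item would push the cache past n_cache, so a
                # previously seen item is evicted at random.
                data[rng.randrange(n_cache)] = item
                yield item
            else:
                # Re-yield a previously seen item from the cache.
                yield rng.choice(data)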

Stream caching can reduce latency in producing items, particularly when the items are large or take a non-trivial amount of time for the underlying stream to produce. Note that the statistics of the cached stream will differ from those of stream because items in the cache may be relatively over-represented, so use with caution.

A cached stream will generate at least as many items as the raw stream. A cached stream terminates when it attempts to draw a new item from the input and the input has been exhausted.
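As a hedged illustration of both points, assuming pescador.maps.cache behaves exactly as documented on this page, counting the cached output of a small finite stream shows every input item appearing at least once, with some items over-represented:

    import collections

    import pescador.maps

    raw = list(range(100))
    cached = pescador.maps.cache(iter(raw), n_cache=10, prob=0.25,
                                 random_state=0)
    counts = collections.Counter(cached)

    # The cached stream is at least as long as the raw stream, and every
    # raw item is yielded at least once, but items that linger in the
    # cache are re-yielded and skew the empirical distribution.
    assert sum(counts.values()) >= len(raw)
    assert set(counts) == set(raw)
    print(counts.most_common(5))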

Note

The first n_cache items will be generated from stream in order. Caching only becomes active after this startup phase.
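For example, again assuming the documented behavior, the first n_cache outputs simply track the input:

    import itertools

    import pescador.maps

    cached = pescador.maps.cache(iter(range(1000)), n_cache=4, prob=0.5,
                                 random_state=1)

    # During the startup phase the output follows the input exactly;
    # stochastic reuse of cached items only begins afterwards.
    print(list(itertools.islice(cached, 4)))   # expected: [0, 1, 2, 3]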

Parameters:
stream : iterable

The stream from which to sample

n_cache : int > 0

The size of the cache

prob : float in (0, 1]

The probability with which to select a new item. Small values of prob lead to high reuse of data; prob=1 is equivalent to not caching at all.

random_state : None, int, or np.random.RandomState

If int, random_state is the seed used by the random number generator;

If RandomState instance, random_state is the random number generator;

If None, the random number generator is the RandomState instance used by np.random.

Yields:
data

elements of stream
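A hedged example of the random_state parameter, assuming the documented behavior: fixing the seed (or passing identically seeded RandomState instances) makes the stochastic caching decisions repeatable.

    import numpy as np

    import pescador.maps

    def run(random_state):
        return list(pescador.maps.cache(iter(range(20)), n_cache=5,
                                        prob=0.5,
                                        random_state=random_state))

    # An integer seed, or a freshly seeded RandomState instance, should
    # reproduce the same cached sequence on every call.
    assert run(42) == run(42)
    assert run(np.random.RandomState(7)) == run(np.random.RandomState(7))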