Random Numbers in SimPy: Seeds, Streams, and Reproducibility

Randomness is the soul of simulation. But uncontrolled randomness is chaos. Here's how to do it right.

The Reproducibility Problem

Run a simulation twice. Get different results. Which one's right?

Neither. Both. That's randomness.

Reproducibility means getting the same random results when you need them. Different results when comparing scenarios.

Basic Seeding

import random

random.seed(42)  # Fixed seed
result1 = random.expovariate(1/5)

random.seed(42)  # Same seed
result2 = random.expovariate(1/5)

print(result1 == result2)  # True

Same seed = same sequence of random numbers.

Python's random Module

import random

random.seed(42)

# Common distributions
random.random()              # Uniform [0, 1)
random.uniform(5, 10)        # Uniform [5, 10]
random.expovariate(1/5)      # Exponential, mean 5
random.gauss(10, 2)          # Normal, mean 10, std 2
random.triangular(1, 10, 5)  # Triangular
random.choice([1, 2, 3])     # Random selection
random.randint(1, 10)        # Integer in range

More distributions. Better control. Modern API:

import numpy as np

rng = np.random.default_rng(seed=42)

rng.random()                      # Uniform [0, 1)
rng.uniform(5, 10)                # Uniform [5, 10]
rng.exponential(5)                # Exponential, mean 5
rng.normal(10, 2)                 # Normal, mean 10, std 2
rng.gamma(2, 2.5)                 # Gamma
rng.lognormal(1, 0.5)             # Lognormal
rng.poisson(5)                    # Poisson
rng.choice([1, 2, 3])             # Random selection
rng.integers(1, 10, endpoint=True)  # Integer in range

Per-Process Random Streams

Best practice: each process type gets its own random stream:

class Simulation:
    def __init__(self, seed):
        base_rng = np.random.default_rng(seed)

        # Generate seeds for sub-streams
        seeds = base_rng.integers(0, 2**31, size=5)

        self.arrival_rng = np.random.default_rng(seeds[0])
        self.service_rng = np.random.default_rng(seeds[1])
        self.breakdown_rng = np.random.default_rng(seeds[2])

    def arrival_time(self):
        return self.arrival_rng.exponential(5)

    def service_time(self):
        return self.service_rng.exponential(3)

Why separate streams?

  1. Isolation - Changing arrivals doesn't affect service randomness
  2. Comparison - Same arrivals, different service configurations
  3. Debugging - Reproducible sub-components

Common Random Numbers (CRN)

Compare scenarios fairly by using the same random numbers:

def run_scenario(env, num_servers, rng_seed):
    rng = np.random.default_rng(rng_seed)
    server = simpy.Resource(env, capacity=num_servers)
    # ... run simulation

# Same random arrivals for both scenarios
results_2_servers = run_scenario(simpy.Environment(), 2, seed=42)
results_3_servers = run_scenario(simpy.Environment(), 3, seed=42)

Now the difference in results is due to server count, not random variation.

Independent Replications

For confidence intervals, use different seeds:

def run_replication(seed):
    random.seed(seed)
    env = simpy.Environment()
    # ... run simulation
    return mean_wait_time

# Multiple replications with different seeds
base_seed = 42
results = []
for i in range(30):
    results.append(run_replication(base_seed + i))

mean = np.mean(results)
ci = 1.96 * np.std(results, ddof=1) / np.sqrt(len(results))
print(f"Mean: {mean:.2f} ± {ci:.2f}")

Global vs Local State

Global state (not recommended for complex simulations):

import random
random.seed(42)  # Affects all random calls globally

Local state (recommended):

import numpy as np
rng = np.random.default_rng(42)  # Only affects this generator

Reproducibility Checklist

  1. Always set a seed at the start
  2. Log your seed so you can reproduce bugs
  3. Use separate streams for different random processes
  4. Document your seeds in results
def run_simulation(config_seed=None):
    if config_seed is None:
        config_seed = int(time.time())  # Random if not specified

    print(f"Running with seed: {config_seed}")

    rng = np.random.default_rng(config_seed)
    # ...
    return results, config_seed

Anti-Patterns

Don't reseed mid-simulation

# Bad!
def customer(env):
    random.seed(42)  # Resets sequence!
    wait = random.expovariate(1/5)

Don't use time-based seeds for comparisons

# Bad for comparison!
random.seed(int(time.time()))

Don't mix random modules carelessly

# Confusing - which seed controls what?
import random
import numpy as np

random.seed(42)
np.random.seed(42)  # Different generators!

Testing Randomness

Sanity check your distributions:

rng = np.random.default_rng(42)

# Generate many samples
samples = [rng.exponential(5) for _ in range(10000)]

# Check mean (should be ~5)
print(f"Sample mean: {np.mean(samples):.2f}")

# Visual check
import matplotlib.pyplot as plt
plt.hist(samples, bins=50)
plt.show()

Advanced: Variance Reduction

Techniques to reduce variance without more replications:

Antithetic variates:

def generate_antithetic_pair(rng, mean):
    u = rng.random()
    return -mean * np.log(u), -mean * np.log(1 - u)

Common random numbers (already covered) - use same stream across scenarios.

Complete Example

import simpy
import numpy as np

class SimulationRNG:
    """Manages all random streams for a simulation."""

    def __init__(self, master_seed):
        master = np.random.default_rng(master_seed)
        seeds = master.integers(0, 2**31, size=10)

        self.arrival = np.random.default_rng(seeds[0])
        self.service = np.random.default_rng(seeds[1])
        self.failure = np.random.default_rng(seeds[2])

def run_simulation(master_seed, num_servers, run_time):
    env = simpy.Environment()
    rng = SimulationRNG(master_seed)
    server = simpy.Resource(env, capacity=num_servers)
    stats = []

    def customer(name):
        arrival = env.now
        with server.request() as req:
            yield req
            wait = env.now - arrival
            yield env.timeout(rng.service.exponential(5))
            stats.append(wait)

    def arrivals():
        i = 0
        while True:
            yield env.timeout(rng.arrival.exponential(6))
            env.process(customer(f"C{i}"))
            i += 1

    env.process(arrivals())
    env.run(until=run_time)

    return np.mean(stats) if stats else 0

# Run with reproducible seeds
for seed in [42, 43, 44]:
    result = run_simulation(seed, num_servers=2, run_time=1000)
    print(f"Seed {seed}: Mean wait = {result:.2f}")

Summary

Randomness in simulation: - Always seed for reproducibility - Use separate streams for isolation - NumPy's default_rng is the modern approach - Common random numbers for fair comparisons - Log your seeds!

Random doesn't mean uncontrolled.

Next Steps


Discover the Power of Simulation

Want to become a go-to expert in simulation with Python? The Complete Simulation Bootcamp will show you how simulation can transform your career and your projects.

Explore the Bootcamp