Reliability

simweave.reliability provides an availability and maintainability simulation framework built on top of the existing discrete-event and supply-chain primitives. It is designed for scenarios where the operational availability of a fleet of assets (taxis, military vehicles, aircraft, industrial machines) is a key performance indicator driven by:

  • failure rates of individual subsystems,
  • spare parts holdings in one or more warehouses,
  • repair-bay capacity and technician resources, and
  • the financial cost of new buys vs. repairs.

The stacked area chart below shows a simulated year for an 8-vehicle taxi fleet. Green = operational, amber = in repair, red = awaiting parts. Calendar months are displayed by pairing the recorder with a SimTimeAxis.


Concepts

Subsystem

A subsystem is any replaceable or repairable component fitted to an asset (engine, gearbox, tyre set, avionics module, etc.). Each subsystem is described by a SubsystemSpec and is either consumable or repairable; a configurable fraction of repairable failures is treated as beyond economic repair:

Type                          On failure
Consumable                    Failed unit discarded; new unit drawn from warehouse stock
Repairable                    Failed unit sent to a RepairCentre for repair
Beyond economic repair (BER)  Fraction of repairable failures that are uneconomical to fix; handled as a new buy

Times to failure follow an exponential (memoryless) distribution, the standard constant-hazard model for electronic and mechanical components in the absence of wear-out. Time-based rates (failure_rate, in failures per unit of simulation time, e.g. failures/day) and cycle-based rates (failure_rate_per_cycle, in failures per operational cycle, e.g. per km or per sortie) can be active simultaneously.
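
The page does not show how the per-tick failure check is implemented. A minimal sketch of one common discretisation, assuming independent exponential hazards that add within a tick, is given below; failure_probability and fails_this_tick are hypothetical helper names, not part of simweave.

```python
import math
import random


def failure_probability(rate_per_time, dt, rate_per_cycle=0.0, cycles=0.0):
    """Chance of at least one failure during a step of length dt.

    Independent exponential hazards add, so the time-based and cycle-based
    rates combine into a single exponent.
    """
    hazard = rate_per_time * dt + rate_per_cycle * cycles
    return 1.0 - math.exp(-hazard)


def fails_this_tick(rng, probability):
    """Bernoulli draw against the per-tick failure probability."""
    return rng.random() < probability


# Engine from the quick start: MTBF of 120 days, one-day tick.
p_time_only = failure_probability(1 / 120, dt=1.0)

# Same engine plus a hypothetical cycle-based rate of 1e-4 failures/km
# on a day in which the vehicle covered 150 km.
p_combined = failure_probability(1 / 120, dt=1.0, rate_per_cycle=1e-4, cycles=150.0)

rng = random.Random(42)
failed = fails_this_tick(rng, p_time_only)
print(f"{p_time_only:.4f} {p_combined:.4f} {failed}")
```

For small rates the probability is close to rate × dt, which is why a one-day tick with a 120-day MTBF gives a daily failure chance of a little under 1%.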

ReliableEntity

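A ReliableEntity inherits from Entity and is initialised with a sequence of SubsystemSpec objects, one per fitted subsystem; it tracks the live state of each in a SubsystemStatus. On every simulation tick it: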

  1. Checks each UP subsystem for a random failure event.
  2. For newly failed subsystems, draws a spare part from the linked Warehouse.
  3. Retries AWAITING_PART subsystems every tick until stock is replenished.
  4. Submits a RepairJob to the RepairCentre once parts are in hand.
  5. Tracks cumulative operational time, downtime, and costs.

An entity is operational only when all of its subsystems are UP.
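
The tick sequence above can be pictured as a small state machine. The sketch below is a standalone toy, not the simweave implementation: State mirrors SubsystemState, stock is a plain list indexed by subsystem (the library indexes by sku_index), and the returned job indices stand in for RepairJob submissions.

```python
import random
from enum import Enum, auto


class State(Enum):
    UP = auto()
    AWAITING_PART = auto()
    IN_REPAIR = auto()


def tick(states, stock, p_fail, rng):
    """Advance every subsystem by one tick.

    Returns the new state list and the indices submitted for repair.
    stock is mutated in place when a part is drawn.
    """
    new_states, jobs = [], []
    for i, state in enumerate(states):
        if state is State.UP and rng.random() < p_fail[i]:
            state = State.AWAITING_PART      # step 1: random failure
        if state is State.AWAITING_PART and stock[i] > 0:
            stock[i] -= 1                    # steps 2-3: draw a part, or retry next tick
            state = State.IN_REPAIR
            jobs.append(i)                   # step 4: hand over to the repair centre
        new_states.append(state)
    return new_states, jobs


def is_operational(states):
    """An entity is operational only when every subsystem is UP."""
    return all(state is State.UP for state in states)


rng = random.Random(0)
states = [State.UP, State.UP]
stock = [0, 5]                               # no spare for subsystem 0 yet
states, jobs = tick(states, stock, [1.0, 0.0], rng)   # force subsystem 0 to fail
first = (states[0], jobs)                    # stuck waiting for a part

stock[0] = 1                                 # replenishment arrives
states, jobs = tick(states, stock, [1.0, 0.0], rng)
print(first, states[0], jobs, is_operational(states))
```

A subsystem in IN_REPAIR stays there until the repair centre reports completion, which this toy leaves out.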

RepairCentre

RepairCentre is a subclass of Service and inherits all of Service's queuing and multi-channel machinery. To model a repair team employed by the operator, pass a ResourcePool of technicians. For a third-party maintenance contract, tune capacity and buffer_size to reflect the contracted throughput.

On each job completion the RepairCentre:

  • Returns repaired units to warehouse stock (repairable, non-BER cases).
  • Records cost and counters (total_newbuys, total_repairs, total_cost).
  • Calls back into the owning ReliableEntity to restore the subsystem to UP.

Fleet and FleetAvailabilityRecorder

Fleet is a thin wrapper around a list of ReliableEntity instances with aggregate properties:

Property                  Description
operational_count         Entities fully operational right now
operational_availability  Fraction of fleet operational right now
mean_availability         Mean of each entity's time-based empirical Ao
total_cost                Sum of new-buy + repair costs across the fleet
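
The difference between the instantaneous and the time-based availability figures can be shown with plain numbers. This is a sketch under assumptions: the function names are hypothetical, empirical Ao is taken as uptime / (uptime + downtime), and an entity with no elapsed time is counted as fully available.

```python
def operational_availability(is_up):
    """Instantaneous figure: share of entities that are operational right now."""
    return sum(is_up) / len(is_up)


def empirical_ao(uptime, downtime):
    """Time-based availability of one entity over the run so far."""
    total = uptime + downtime
    return uptime / total if total > 0 else 1.0   # assumption for an unstarted run


def mean_availability(time_pairs):
    """Mean of per-entity empirical Ao; time_pairs holds (uptime, downtime)."""
    return sum(empirical_ao(up, down) for up, down in time_pairs) / len(time_pairs)


# Four vehicles at the end of a 365-day run: one is down at this instant.
now = operational_availability([True, True, False, True])
over_run = mean_availability([(300, 65), (365, 0), (330, 35), (350, 15)])
print(f"instantaneous={now:.3f} time-based mean={over_run:.3f}")
```

The instantaneous value jumps in steps of 1/fleet size from tick to tick, while the time-based mean moves slowly, so the two can differ noticeably at any given moment.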

FleetAvailabilityRecorder is registered with the environment and snapshots fleet state each tick. Its times, operational, in_repair, and awaiting_part lists are fed directly to plot_fleet_availability.


Sensitivity Analysis

sensitivity_sweep varies one or two scalar parameters of a scenario builder function across a grid and collects a scalar metric (e.g. Ao) from each cell. Monte Carlo averaging is supported via the n_runs argument.

from simweave.reliability import sensitivity_sweep

def build(n_bays, stock_mult, seed):
    # ... build and run scenario ...
    return operational_availability   # scalar

result = sensitivity_sweep(
    build,
    param1_name="repair_bays",
    param1_values=[1, 2, 3, 4],
    param2_name="stock_multiplier",
    param2_values=[0.5, 1.0, 1.5, 2.0],
    metric_name="Ao",
    n_runs=30,
)

The SweepResult can be passed to plot_sensitivity_surface for a 3-D surface, heatmap, or grouped bar chart.
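
What a 2-D sweep with Monte Carlo averaging computes can be sketched without simweave. Everything below is illustrative: toy_builder is a made-up stand-in for a scenario builder, grid_sweep is a hypothetical serial loop, and the seed formula and the use of the population standard deviation are assumptions rather than the library's documented behaviour.

```python
import random
import statistics


def toy_builder(n_bays, stock_mult, seed):
    """Stand-in scenario: a noisy scalar in [0, 1] that improves with both inputs."""
    rng = random.Random(seed)
    base = 1.0 - 0.4 / n_bays - 0.1 / stock_mult
    return min(1.0, max(0.0, base + rng.gauss(0.0, 0.01)))


def grid_sweep(builder, values1, values2, n_runs, seed=0):
    """Evaluate builder on every grid cell n_runs times; return mean and std grids."""
    mean, std = [], []
    for i, v1 in enumerate(values1):
        mean_row, std_row = [], []
        for j, v2 in enumerate(values2):
            cell = i * len(values2) + j
            # Give every replicate of every cell its own seed.
            samples = [builder(v1, v2, seed + cell * n_runs + r) for r in range(n_runs)]
            mean_row.append(statistics.fmean(samples))
            std_row.append(statistics.pstdev(samples))
        mean.append(mean_row)
        std.append(std_row)
    return mean, std


bays = [1, 2, 3, 4]
stock = [0.5, 1.0, 1.5, 2.0]
mean, std = grid_sweep(toy_builder, bays, stock, n_runs=30)
print(f"worst cell {mean[0][0]:.3f}, best cell {mean[-1][-1]:.3f}")
```

Because the builder receives the seed as its last argument, the same grid can be re-run reproducibly, and the spread across replicates gives a feel for how many runs a smooth surface needs.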


Quick start

import numpy as np
import simweave as sw

# 1. Describe subsystems
specs = [
    sw.SubsystemSpec(
        name="engine",
        failure_rate=1/120,      # MTBF = 120 days
        sku_index=0,
        consumable=False,
        beyond_economic_repair_prc=0.10,
        repair_time=5.0,
        unit_cost=8_000.0,
        repair_cost=2_500.0,
    ),
    sw.SubsystemSpec(
        name="tyres",
        failure_rate=1/45,
        sku_index=1,
        consumable=True,
        repair_time=0.5,
        unit_cost=400.0,
    ),
]

# 2. Build warehouse
inv = sw.InventoryItems(
    part_names=["engine", "tyres"],
    unit_cost=[8_000.0, 400.0],
    stock_level=[3.0, 10.0],
    batchsize=[2.0, 4.0],
    reorder_points=[1.0, 2.0],
    repairable_prc=[0.90, 0.0],
    repair_times=[5.0, 0.0],
    newbuy_leadtimes=[14.0, 3.0],
)
warehouse = sw.Warehouse(inventory=inv, name="depot")

# 3. Build repair centre (2 bays, 3 technicians)
technicians = sw.ResourcePool(maxlen=3, name="technicians")
for i in range(3):
    technicians.deposit(sw.Resource(name=f"tech_{i}"))
repair_centre = sw.RepairCentre(capacity=2, resources=technicians)

# 4. Build fleet
rng = np.random.default_rng(42)
vehicles = [
    sw.ReliableEntity(
        subsystems=specs,
        warehouse=warehouse,
        repair_centre=repair_centre,
        name=f"taxi_{i:02d}",
        rng=np.random.default_rng(rng.integers(0, 2**32)),
    )
    for i in range(10)
]
fleet = sw.Fleet(vehicles, name="taxi_fleet")
recorder = sw.FleetAvailabilityRecorder(fleet)

# 5. Run
env = sw.SimEnvironment(dt=1.0, end=365.0)
env.register(warehouse)
env.register(repair_centre)
for v in vehicles:
    env.register(v)
env.register(recorder)
env.run(until=365.0)

# 6. Summarise
print(f"Operational availability: {recorder.mean_operational_availability:.3f}")
print(f"Total fleet cost: £{fleet.total_cost:,.0f}")

# 7. Plot
fig = sw.plot_fleet_availability(recorder, title="Taxi Fleet Availability")
fig.show()
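
A back-of-envelope check helps when judging whether the simulated figures are plausible. The sketch below uses only the quick-start numbers and assumes every subsystem is UP for the whole year, so it gives a rough ceiling on failure counts; it also assumes each failure is charged exactly once (unit_cost for a new buy, repair_cost for a repair). expected_failures is a hypothetical helper.

```python
def expected_failures(failure_rate, horizon, fleet_size):
    """Ceiling on failures: treats the subsystem as UP for the whole horizon."""
    return failure_rate * horizon * fleet_size


HORIZON_DAYS = 365.0
FLEET_SIZE = 10

engine_failures = expected_failures(1 / 120, HORIZON_DAYS, FLEET_SIZE)
tyre_failures = expected_failures(1 / 45, HORIZON_DAYS, FLEET_SIZE)

# Engines: 10% of failures are BER (new buy at 8,000); the rest are repaired at 2,500.
engine_cost_per_failure = 0.10 * 8_000.0 + 0.90 * 2_500.0
engine_cost = engine_failures * engine_cost_per_failure

# Tyres are consumable: every failure is a new buy at 400.
tyre_cost = tyre_failures * 400.0

print(f"engine failures ~ {engine_failures:.1f}, tyre failures ~ {tyre_failures:.1f}")
print(f"cost ceiling ~ {engine_cost + tyre_cost:,.0f}")
```

Simulated totals should come in below these figures, since a failed subsystem cannot fail again while it is waiting for a part or being repaired.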

API reference

SubsystemSpec dataclass

SubsystemSpec(name: str, failure_rate: float, sku_index: int, consumable: bool = True, beyond_economic_repair_prc: float = 0.0, repair_time: float = 1.0, unit_cost: float = 0.0, repair_cost: float = 0.0, failure_rate_per_cycle: float = 0.0)

Immutable description of one subsystem fitted to a ReliableEntity.

Parameters:

  • name (str, required): Human-readable label (e.g. "engine", "front_tyre").
  • failure_rate (float, required): Time-based failure rate λ in failures per unit simulation time. Set to 0.0 if only cycle-based failure is used.
  • sku_index (int, required): Index into the part_names list of the InventoryItems held by the associated Warehouse. When the subsystem fails, one unit of this SKU is consumed.
  • consumable (bool, default True): If True, the failed unit is discarded and a new one is bought. If False, the failed unit is sent to a RepairCentre for repair.
  • beyond_economic_repair_prc (float, default 0.0): Fraction of failures on a repairable subsystem that are beyond economic repair and therefore require a new buy instead. Ignored when consumable=True.
  • repair_time (float, default 1.0): Nominal repair / fit time in simulation time units. This becomes the remaining_service_time of the RepairJob submitted to the repair centre.
  • unit_cost (float, default 0.0): Cost charged per new unit purchased (new buy or BER replacement).
  • repair_cost (float, default 0.0): Cost charged per repair (non-BER repairable failure).
  • failure_rate_per_cycle (float, default 0.0): Cycle-based failure rate in failures per operational cycle. Set to 0.0 to disable cycle-based failures.

SubsystemState

Bases: Enum

Operational state of a single subsystem.

UP class-attribute instance-attribute

UP = auto()

Subsystem is fully functional.

AWAITING_PART class-attribute instance-attribute

AWAITING_PART = auto()

Subsystem has failed; waiting for a spare part to become available.

IN_REPAIR class-attribute instance-attribute

IN_REPAIR = auto()

Part obtained; repair / fit job is queued or active at the RepairCentre.

SubsystemStatus dataclass

SubsystemStatus(spec: SubsystemSpec, state: SubsystemState = UP, time_in_state: float = 0.0, total_failures: int = 0, total_downtime: float = 0.0, cost_newbuy: float = 0.0, cost_repair: float = 0.0)

Live state of one subsystem on a specific entity.

Created automatically by ReliableEntity for each SubsystemSpec it is initialised with.

ReliableEntity

ReliableEntity(subsystems: Sequence[SubsystemSpec], warehouse: 'Warehouse', repair_centre: 'RepairCentre | None' = None, name: str | None = None, rng: Generator | None = None)

Bases: Entity

An entity composed of subsystems that can fail and require repair.

Parameters:

  • subsystems (Sequence[SubsystemSpec], required): One SubsystemSpec per subsystem fitted to this entity.
  • warehouse (Warehouse, required): Parts warehouse. When a subsystem fails, one unit of its SKU is consumed from here. When a repairable unit is returned to service, one unit is added back.
  • repair_centre (RepairCentre | None, default None): Optional RepairCentre. If supplied, failed subsystems are queued here for repair/fitting. If None, repairs are instantaneous (the subsystem is restored on the same tick the part is obtained).
  • name (str | None, default None): Display name.
  • rng (Generator | None, default None): NumPy random generator. Defaults to np.random.default_rng().

Attributes:

  • subsystems (list[SubsystemStatus]): Live state of each fitted subsystem.
  • operational_cycles (float): Cumulative operating cycles. Increment this in your scenario script (e.g. each km driven, each sortie flown) to activate cycle-based failure rates.
  • total_operational_time (float): Simulation time spent fully operational.
  • total_downtime (float): Simulation time spent with at least one subsystem not UP.
  • cost_newbuy (float): Cumulative spend on new part purchases.
  • cost_repair (float): Cumulative spend on repairs.

is_operational property

is_operational: bool

True when every subsystem is in the UP state.

availability property

availability: float

Empirical operational availability (time-based).

summary

summary() -> dict

Return a snapshot dict suitable for logging / MC aggregation.

RepairJob

RepairJob(owner: 'ReliableEntity', subsystem_idx: int, is_new_buy: bool, return_to_stock: bool, repair_time: float, cost: float, name: str | None = None)

Bases: Entity

A work item representing a repair or new-unit-fit operation.

Parameters:

  • owner (ReliableEntity, required): The ReliableEntity whose subsystem has failed.
  • subsystem_idx (int, required): Position of the failed subsystem in owner.subsystems.
  • is_new_buy (bool, required): True if this job represents fitting a brand-new part (consumable failure or beyond-economic-repair). False if it is a repair of the existing unit.
  • return_to_stock (bool, required): True when the repaired unit should be returned to the warehouse stock on job completion (only applicable to non-BER repairable items).
  • repair_time (float, required): How long the job takes at the repair centre (simulation time units). Stored in remaining_service_time so the _WorkChannel in simweave.discrete.services can pick it up without needing sim_properties.
  • cost (float, required): Financial cost charged to owner upon completion.
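
How the is_new_buy, return_to_stock, and cost fields relate to a SubsystemSpec can be sketched as a small decision function. This is an illustration of the rules stated on this page, not the library's code: JobPlan and plan_job are hypothetical names, and drawing the BER outcome with a single uniform random number is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class JobPlan:
    """Hypothetical container mirroring three RepairJob fields."""
    is_new_buy: bool
    return_to_stock: bool
    cost: float


def plan_job(consumable, ber_fraction, unit_cost, repair_cost, rng):
    """Decide how a failure is handled and what it costs."""
    if consumable:
        # Failed unit is discarded; a new one is fitted.
        return JobPlan(is_new_buy=True, return_to_stock=False, cost=unit_cost)
    if rng.random() < ber_fraction:
        # Repairable, but this failure is beyond economic repair.
        return JobPlan(is_new_buy=True, return_to_stock=False, cost=unit_cost)
    # Ordinary repair: the fixed unit goes back into warehouse stock.
    return JobPlan(is_new_buy=False, return_to_stock=True, cost=repair_cost)


rng = random.Random(1)
tyre_job = plan_job(True, 0.0, 400.0, 0.0, rng)
engine_jobs = [plan_job(False, 0.10, 8_000.0, 2_500.0, rng) for _ in range(1_000)]
ber_share = sum(job.is_new_buy for job in engine_jobs) / len(engine_jobs)
print(tyre_job, f"BER share over 1,000 engine failures: {ber_share:.3f}")
```

Over many failures the BER share settles near beyond_economic_repair_prc, which is what makes the expected cost per engine failure a weighted mix of unit_cost and repair_cost.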

RepairCentre

RepairCentre(capacity: int = 1, buffer_size: int = 100, resources=None, rng=None, name: str | None = None)

Bases: Service

A repair facility; a Service whose completions restore failed subsystems on ReliableEntity instances.

The centre accepts RepairJob items in its queue. On completion:

  1. If job.return_to_stock the repaired part is returned to the owning entity's warehouse (incrementing stock by one unit).
  2. The owning entity's subsystem is transitioned back to UP.
  3. Cost and counter metrics on this centre are updated.

Parameters:

  • capacity (int, default 1): Number of parallel work channels (repair bays / technicians when no explicit resource pool is used).
  • buffer_size (int, default 100): Maximum number of jobs that can wait in the pre-repair queue.
  • resources (default None): Optional ResourcePool. Attach a pool of n technicians here to gate each repair bay on staff availability. If None, the centre is unconstrained by personnel.
  • rng (default None): Random number generator forwarded to the parent Service.
  • name (str | None, default None): Display name.

Fleet

Fleet(entities: Sequence[ReliableEntity], name: str = 'fleet')

A collection of ReliableEntity instances with aggregate operational metrics.

Parameters:

  • entities (Sequence[ReliableEntity], required): The vehicles / platforms that make up the fleet.
  • name (str, default 'fleet'): Display name used in plot titles.

operational_count property

operational_count: int

Number of entities that are fully operational right now.

operational_availability property

operational_availability: float

Instantaneous operational availability (0–1).

mean_availability property

mean_availability: float

Mean of each entity's time-based empirical availability.

status_counts

status_counts() -> dict[str, int]

Classify every entity into one of three broad states.

Returns:

A dict with keys "operational", "in_repair", and "awaiting_part". An entity is counted as awaiting_part if any subsystem is in that state, and as in_repair if it has at least one subsystem IN_REPAIR and none AWAITING_PART.
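
The precedence rule for status_counts (awaiting_part wins over in_repair) can be written out as a standalone sketch. State mirrors SubsystemState; classify and count_statuses are hypothetical names used only for illustration.

```python
from enum import Enum, auto


class State(Enum):
    UP = auto()
    AWAITING_PART = auto()
    IN_REPAIR = auto()


def classify(states):
    """Map one entity's subsystem states to a single broad status."""
    if any(state is State.AWAITING_PART for state in states):
        return "awaiting_part"
    if any(state is State.IN_REPAIR for state in states):
        return "in_repair"
    return "operational"


def count_statuses(fleet_states):
    """Tally broad statuses across a fleet given per-entity state lists."""
    counts = {"operational": 0, "in_repair": 0, "awaiting_part": 0}
    for states in fleet_states:
        counts[classify(states)] += 1
    return counts


fleet_states = [
    [State.UP, State.UP],
    [State.IN_REPAIR, State.UP],
    [State.IN_REPAIR, State.AWAITING_PART],   # counted once, as awaiting_part
]
counts = count_statuses(fleet_states)
print(counts)
```

Each entity lands in exactly one bucket, so the three counts always sum to the fleet size.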

summary

summary() -> dict

Aggregate snapshot suitable for Monte Carlo result dicts.

FleetAvailabilityRecorder

FleetAvailabilityRecorder(fleet: Fleet)

Records fleet state at each simulation tick.

Register with the environment after all ReliableEntity instances so the snapshot captures the state after each tick's failures and repairs.

Parameters:

  • fleet (Fleet, required): The Fleet to monitor.

Attributes:

  • times (list[float]): Simulation clock value at each snapshot.
  • operational (list[int]): Count of operational entities at each snapshot.
  • in_repair (list[int]): Count of entities in repair (part available) at each snapshot.
  • awaiting_part (list[int]): Count of entities waiting for parts at each snapshot.

mean_operational_availability property

mean_operational_availability: float

Time-averaged fraction of the fleet that was operational.
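
A time-averaged figure of this kind can be recomputed from the recorded lists. The sketch below assumes evenly spaced snapshots, so a plain mean over ticks is a time average; mean_operational_fraction is a hypothetical helper and the sample counts are invented for illustration.

```python
def mean_operational_fraction(operational, in_repair, awaiting_part):
    """Average, over snapshots, of the share of the fleet that was operational."""
    fractions = []
    for up, repairing, waiting in zip(operational, in_repair, awaiting_part):
        fleet_size = up + repairing + waiting   # the three buckets cover the fleet
        fractions.append(up / fleet_size)
    return sum(fractions) / len(fractions)


# Five daily snapshots of an 8-vehicle fleet (made-up counts).
operational = [8, 7, 6, 7, 8]
in_repair = [0, 1, 1, 1, 0]
awaiting_part = [0, 0, 1, 0, 0]

average = mean_operational_fraction(operational, in_repair, awaiting_part)
print(f"{average:.3f}")
```

If the tick length varied, each snapshot would need to be weighted by the time it represents instead of being averaged equally.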

SweepResult dataclass

SweepResult(param1_name: str, param1_values: ndarray, param2_name: str | None, param2_values: ndarray | None, metric_name: str, metric_mean: ndarray, metric_std: ndarray, n_runs: int = 1)

Result of a 1-D or 2-D sensitivity sweep.

Attributes:

  • param1_name (str): Name of the first swept parameter.
  • param1_values (ndarray): Array of values swept for parameter 1.
  • param2_name (str | None): Name of the second parameter, or None for a 1-D sweep.
  • param2_values (ndarray | None): Array of values swept for parameter 2, or None for a 1-D sweep.
  • metric_name (str): Label for the output metric (used in plot axis titles).
  • metric_mean (ndarray): Mean metric value. Shape (n1,) for 1-D or (n1, n2) for 2-D.
  • metric_std (ndarray): Standard deviation across MC replicates. All zeros when n_runs == 1. Same shape as metric_mean.
  • n_runs (int): Number of Monte Carlo replicates per grid point.

sensitivity_sweep

sensitivity_sweep(scenario_builder: Callable[..., float], param1_name: str, param1_values: Sequence[float], param2_name: str | None = None, param2_values: Sequence[float] | None = None, metric_name: str = 'metric', n_runs: int = 1, seed: int = 0, executor: str = 'serial', n_workers: int | None = None) -> SweepResult

Run a 1-D or 2-D parameter sensitivity sweep with optional MC averaging.

Parameters:

  • scenario_builder (Callable[..., float], required): Callable with signature f(p1, seed) -> float for a 1-D sweep, or f(p1, p2, seed) -> float for a 2-D sweep. Must return a scalar metric (e.g. operational availability). Must be picklable when executor="processes".
  • param1_name (str, required): Name of the first parameter (used in plot labels).
  • param1_values (Sequence[float], required): Values to sweep for parameter 1.
  • param2_name (str | None, default None): Name of the second parameter. None selects a 1-D sweep.
  • param2_values (Sequence[float] | None, default None): Values to sweep for parameter 2. Required when param2_name is not None.
  • metric_name (str, default 'metric'): Label for the output metric.
  • n_runs (int, default 1): Number of Monte Carlo replicates per grid point. Each replicate receives a unique seed derived from the base seed.
  • seed (int, default 0): Base random seed. Replicate r at grid point i (or (i, j)) receives seed seed + r + i * n_runs (or similar) to ensure independence across the grid.
  • executor (str, default 'serial'): "serial" (default) or "processes" for multi-core parallelism.
  • n_workers (int | None, default None): Number of worker processes. None selects the OS default.

Returns:

A SweepResult holding the mean and standard deviation of the metric at each grid point.