pygsti.baseobjs.resourceallocation

Resource allocation manager

Module Contents

Classes

ResourceAllocation

Describes available resources and how they should be allocated.

Functions

_gethostname()

Mimics multiple hosts on a single host, mostly for debugging

Attributes

_dummy_profiler

_GB

pygsti.baseobjs.resourceallocation._dummy_profiler
pygsti.baseobjs.resourceallocation._GB
class pygsti.baseobjs.resourceallocation.ResourceAllocation(comm=None, mem_limit=None, profiler=None, distribute_method='default', allocated_memory=0)

Bases: object

Describes available resources and how they should be allocated.

This includes the number of processors and amount of memory, as well as a strategy for how computations should be distributed among them.

Parameters
  • comm (mpi4py.MPI.Comm, optional) – MPI communicator holding the number of available processors.

  • mem_limit (int, optional) – A rough per-processor memory limit in bytes.

  • profiler (Profiler, optional) – A lightweight profiler object for tracking resource usage.

  • distribute_method (str, optional) – The name of a distribution strategy.
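A minimal usage sketch (the serial case needs no MPI; the commented lines assume mpi4py is installed and the script is launched with mpiexec):

    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    # Serial case: with comm=None, comm_rank is 0 and comm_size is 1.
    ra = ResourceAllocation(comm=None, mem_limit=4 * 1024**3)  # ~4 GiB per processor
    print(ra.comm_rank, ra.comm_size)  # -> 0 1

    # MPI case (sketch): pass an mpi4py communicator instead.
    # from mpi4py import MPI
    # ra = ResourceAllocation(comm=MPI.COMM_WORLD, mem_limit=4 * 1024**3)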

classmethod cast(cls, arg)

Cast arg to a ResourceAllocation object.

If arg is already a ResourceAllocation instance, it is simply returned. Otherwise this function attempts to create a new instance from arg.

Parameters

arg (ResourceAllocation or dict) – An object that can be cast to a ResourceAllocation.

Returns

ResourceAllocation
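A hedged sketch of cast, assuming the dict form supplies constructor keyword arguments:

    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ra = ResourceAllocation.cast({'mem_limit': 2 * 1024**3})  # dict of constructor kwargs (assumed)
    same = ResourceAllocation.cast(ra)  # an existing instance is returned as-is
    assert same is ra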

build_hostcomms(self)
property comm_rank(self)

A safe way to get self.comm.rank (0 if self.comm is None)

property comm_size(self)

A safe way to get self.comm.size (1 if self.comm is None)

property is_host_leader(self)

True if this processor is the rank-0 “leader” of its host (node), False otherwise.

host_comm_barrier(self)

Calls self.host_comm.barrier() when self.host_comm is not None.

This convenience function provides an often-used barrier that follows code where a single “leader” processor modifies a memory block shared between all members of self.host_comm, and the other processors must wait until this modification is performed before proceeding with their own computations.

Returns

None
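A hedged sketch of the leader-writes / others-wait pattern described above. In a real multi-host run shared_block would be a host-shared array; a plain NumPy array stands in here so the sketch also runs serially:

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ra = ResourceAllocation()  # serial stand-in; pass an MPI comm in a real run

    shared_block = np.zeros(16, dtype='d')  # placeholder for a host-shared array

    if ra.is_host_leader:
        shared_block[:] = np.arange(16, dtype='d')  # only the host's rank-0 "leader" writes
    ra.host_comm_barrier()  # the host's other processors wait for the write to finish
    # every processor on the host may now safely read shared_block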

copy(self)

Copy this object.

Returns

ResourceAllocation

reset(self, allocated_memory=0)

Resets internal allocation counters to given values (defaults to zero).

Parameters

allocated_memory (int64) – The value to set the memory allocation counter to.

Returns

None

add_tracked_memory(self, num_elements, dtype='d')

Adds num_elements * itemsize bytes to the total amount of allocated memory being tracked.

If the total (tracked) memory exceeds self.mem_limit, a MemoryError exception is raised.

Parameters
  • num_elements (int) – The number of elements to track allocation of.

  • dtype (numpy.dtype, optional) – The type of elements, needed to compute the number of bytes per element.

Returns

None

check_can_allocate_memory(self, num_elements, dtype='d')

Checks that allocating num_elements elements doesn’t cause the memory limit to be exceeded.

This memory isn’t tracked; the prospective amount is simply added to the current tracked memory, and a MemoryError exception is raised if the result exceeds self.mem_limit.

Parameters
  • num_elements (int) – The number of elements to track allocation of.

  • dtype (numpy.dtype, optional) – The type of elements, needed to compute the number of bytes per element.

Returns

None

temporarily_track_memory(self, num_elements, dtype='d')

Temporarily adds num_elements elements to the tracked memory (a context manager).

A MemoryError exception is raised if the tracked memory exceeds self.mem_limit.

Parameters
  • num_elements (int) – The number of elements to track allocation of.

  • dtype (numpy.dtype, optional) – The type of elements, needed to compute the number of bytes per element.

Returns

contextmanager
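A sketch tying the three memory-tracking methods together (the limit and array sizes are arbitrary):

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ra = ResourceAllocation(mem_limit=1024**3)  # track against a ~1 GiB limit

    n = 10_000_000
    ra.check_can_allocate_memory(n, dtype='d')  # raises MemoryError if it wouldn't fit
    ra.add_tracked_memory(n, dtype='d')         # count these bytes from now on
    data = np.empty(n, dtype='d')

    # Scratch space is only counted while the with-block is active:
    with ra.temporarily_track_memory(2 * n, dtype='d'):
        scratch = np.empty(2 * n, dtype='d')
        # ... use data and scratch ...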

gather_base(self, result, local, slice_of_global, unit_ralloc=None, all_gather=False)

Gather or all-gather operation using local arrays and a unit resource allocation.

Similar to a normal MPI gather call, but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array, i.e., the same slice of the final to-be-gathered array. So, when gathering the result, only processors with unit_ralloc.rank == 0 need to contribute to the gather operation.

Parameters
  • result (numpy.ndarray, possibly shared) – The destination “global” array. When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by having multiple smaller gather operations in parallel instead of one large gather.

  • local (numpy.ndarray) – The locally computed quantity. This can be a shared-memory array, but need not be.

  • slice_of_global (slice or numpy.ndarray) – The slice of result that local constitutes, i.e., in the end result[slice_of_global] = local. This may be a Python slice or a NumPy array of indices.

  • unit_ralloc (ResourceAllocation, optional) – A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the gather operation. If None, then it is assumed that all processors compute different local results.

  • all_gather (bool, optional) – Whether the final result should be gathered on all the processors of this ResourceAllocation or just the root (rank 0) processor.

Returns

None

gather(self, result, local, slice_of_global, unit_ralloc=None)

Gather local arrays into a global result array potentially with a unit resource allocation.

Similar to a normal MPI gather call, but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array, i.e., the same slice of the final to-be-gathered array. So, when gathering the result, only processors with unit_ralloc.rank == 0 need to contribute to the gather operation.

The global array is only gathered on the root (rank 0) processor of this resource allocation.

Parameters
  • result (numpy.ndarray, possibly shared) – The destination “global” array, only needed on the root (rank 0) processor. When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by having multiple smaller gather operations in parallel instead of one large gather.

  • local (numpy.ndarray) – The locally computed quantity. This can be a shared-memory array, but need not be.

  • slice_of_global (slice or numpy.ndarray) – The slice of result that local constitutes, i.e., in the end result[slice_of_global] = local. This may be a Python slice or a NumPy array of indices.

  • unit_ralloc (ResourceAllocation, optional) – A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the gather operation. If None, then it is assumed that all processors compute different local results.

Returns

None
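A hedged sketch of gather in the simplest setting, where no inter-host shared memory is in play and result can therefore be an ordinary NumPy array:

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    # from mpi4py import MPI
    # ra = ResourceAllocation(comm=MPI.COMM_WORLD)
    ra = ResourceAllocation()  # serial stand-in for the sketch

    n_per_proc = 100
    my_slice = slice(ra.comm_rank * n_per_proc, (ra.comm_rank + 1) * n_per_proc)
    local = np.full(n_per_proc, ra.comm_rank, dtype='d')  # this rank's contribution

    result = np.empty(n_per_proc * ra.comm_size, dtype='d')  # only needed on rank 0
    ra.gather(result, local, my_slice)  # rank 0 now holds the assembled global array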

allgather(self, result, local, slice_of_global, unit_ralloc=None)

All-gather local arrays into global arrays on each processor, potentially using a unit resource allocation.

Similar to a normal MPI gather call, but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array, i.e., the same slice of the final to-be-gathered array. So, when gathering the result, only processors with unit_ralloc.rank == 0 need to contribute to the gather operation.

Parameters
  • result (numpy.ndarray, possibly shared) – The destination “global” array. When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by having multiple smaller gather operations in parallel instead of one large gather.

  • local (numpy.ndarray) – The locally computed quantity. This can be a shared-memory array, but need not be.

  • slice_of_global (slice or numpy.ndarray) – The slice of result that local constitutes, i.e., in the end result[slice_of_global] = local. This may be a Python slice or a NumPy array of indices.

  • unit_ralloc (ResourceAllocation, optional) – A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the gather operation. If None, then it is assumed that all processors compute different local results.

Returns

None

allreduce_sum(self, result, local, unit_ralloc=None)

Sum local arrays on different processors, potentially using a unit resource allocation.

Similar to a normal MPI reduce call (with MPI.SUM type), but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array. So, when performing the sum, only processors with unit_ralloc.rank == 0 contribute to the sum. This handles the case where simply summing the local contributions from all processors would result in over-counting because multiple processors hold the same logical result (summand).

Parameters
  • result (numpy.ndarray, possibly shared) – The destination “global” array, with the same shape as all the local arrays being summed. This can be any shape (including any number of dimensions). When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by distributing computation of result over each host’s processors and performing these sums in parallel.

  • local (numpy.ndarray) – The locally computed quantity. This can be a shared-memory array, but need not be.

  • unit_ralloc (ResourceAllocation, optional) – A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the sum operation. If None, then it is assumed that all processors compute different local results.

Returns

None
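A hedged sketch of allreduce_sum, again assuming no nontrivial inter-host comm so that result can be a plain NumPy array:

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ra = ResourceAllocation()  # pass an MPI comm for a real parallel run

    local = np.arange(4, dtype='d') * (ra.comm_rank + 1)  # differs on each rank under MPI
    result = np.zeros(4, dtype='d')
    ra.allreduce_sum(result, local)  # result holds the elementwise sum on every processor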

allreduce_sum_simple(self, local, unit_ralloc=None)

A simplified sum over quantities on different processors that doesn’t use shared memory.

The shared memory usage of allreduce_sum can be overkill when just summing a single scalar quantity. This method provides a way to easily sum a quantity across all the processors in this ResourceAllocation object using a unit resource allocation.

Parameters
  • local (int or float) – The local (per-processor) value to sum.

  • unit_ralloc (ResourceAllocation, optional) – A resource allocation (essentially a comm) for the group of processors that all compute the same local value, so that only the unit_ralloc.rank == 0 processors will contribute to the sum. If None, then it is assumed that each processor computes a logically different local value.

Returns

float or int – The sum of all local quantities, returned on all the processors.
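For example, summing one scalar per processor:

    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ra = ResourceAllocation()  # pass an MPI comm for a real parallel run

    per_proc_count = 3.0  # e.g. a per-processor tally or timing
    total = ra.allreduce_sum_simple(per_proc_count)
    # `total` is the same summed value on every processor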

allreduce_min(self, result, local, unit_ralloc=None)

Take elementwise min of local arrays on different processors, potentially using a unit resource allocation.

Similar to a normal MPI reduce call (with MPI.MIN type), but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array. So, when performing the min operation, only processors with unit_ralloc.rank == 0 contribute.

Parameters
  • result (numpy.ndarray, possibly shared) – The destination “global” array, with the same shape as all the local arrays being operated on. This can be any shape (including any number of dimensions). When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by distributing computation of result over each host’s processors and performing these reductions in parallel.

  • local (numpy.ndarray) – The locally computed quantity. This can be a shared-memory array, but need not be.

  • unit_ralloc (ResourceAllocation, optional) – A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the min operation. If None, then it is assumed that all processors compute different local results.

Returns

None

allreduce_max(self, result, local, unit_ralloc=None)

Take elementwise max of local arrays on different processors, potentially using a unit resource allocation.

Similar to a normal MPI reduce call (with MPI.MAX type), but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array. So, when performing the max operation, only processors with unit_ralloc.rank == 0 contribute.

Parameters
  • result (numpy.ndarray, possibly shared) – The destination “global” array, with the same shape as all the local arrays being operated on. This can be any shape (including any number of dimensions). When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by distributing computation of result over each host’s processors and performing these reductions in parallel.

  • local (numpy.ndarray) – The locally computed quantity. This can be a shared-memory array, but need not be.

  • unit_ralloc (ResourceAllocation, optional) – A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the max operation. If None, then it is assumed that all processors compute different local results.

Returns

None

bcast(self, value, root=0)

Broadcasts a value from the root processor/host to the others in this resource allocation.

This is similar to a usual MPI broadcast, except it takes advantage of shared memory when it is available. When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this routine places value in a shared memory buffer and uses the resource allocation’s inter-host communicator to broadcast the result from the root host to all the other hosts, using all the processors on the root host in parallel (all processors with the same intra-host rank participate in an MPI broadcast).

Parameters
  • value (numpy.ndarray) – The value to broadcast. May be shared memory but doesn’t need to be. This only needs to be specified on the root processor; other processors can provide any value for this argument (it is unused).

  • root (int) – The rank of the processor whose value is to be broadcast.

Returns

numpy.ndarray – The broadcast value, in a new, non-shared-memory array.
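A hedged sketch of bcast; since the value argument is unused on non-root processors, None is used there as a placeholder:

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ra = ResourceAllocation()  # pass an MPI comm for a real parallel run

    value = np.arange(10, dtype='d') if ra.comm_rank == 0 else None  # unused off-root
    value = ra.bcast(value, root=0)
    # every processor now holds its own (non-shared-memory) copy of the root's array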

__getstate__(self)
pygsti.baseobjs.resourceallocation._gethostname()

Mimics multiple hosts on a single host, mostly for debugging