pygsti.baseobjs.resourceallocation

Resource allocation manager

Module Contents

Classes

ResourceAllocation

Describes available resources and how they should be allocated.

class pygsti.baseobjs.resourceallocation.ResourceAllocation(comm=None, mem_limit=None, profiler=None, distribute_method='default', allocated_memory=0)

Bases: object

Describes available resources and how they should be allocated.

This includes the number of processors and amount of memory, as well as a strategy for how computations should be distributed among them.

Parameters

comm : mpi4py.MPI.Comm, optional

MPI communicator holding the number of available processors.

mem_limit : int, optional

A rough per-processor memory limit in bytes.

profiler : Profiler, optional

A lightweight profiler object for tracking resource usage.

distribute_method : str, optional

The name of a distribution strategy.
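
For illustration, a minimal construction sketch is shown below; the 4 GiB limit is an arbitrary example value, and the MPI variant assumes mpi4py is installed and the script is launched under MPI (e.g. mpiexec -n 4 python script.py).

    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    # Serial usage: no MPI communicator, rough 4 GiB per-processor memory cap.
    ralloc = ResourceAllocation(comm=None, mem_limit=4 * 1024**3)

    # MPI usage (requires mpi4py and an MPI launch):
    # from mpi4py import MPI
    # ralloc = ResourceAllocation(comm=MPI.COMM_WORLD, mem_limit=4 * 1024**3)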

property comm_rank

A safe way to get self.comm.rank (0 if self.comm is None)

property comm_size

A safe way to get self.comm.size (1 if self.comm is None)

property is_host_leader

True if this processor is the rank-0 “leader” of its host (node). False otherwise.

classmethod cast(arg)

Cast arg to a ResourceAllocation object.

If arg is already a ResourceAllocation instance, it is simply returned. Otherwise this function attempts to create a new instance from arg.

Parameters
arg : ResourceAllocation or dict

An object that can be cast to a ResourceAllocation.

Returns

ResourceAllocation
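
A short sketch of both cases, assuming the dictionary keys mirror the constructor arguments (e.g. 'mem_limit'):

    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation(mem_limit=2 * 1024**3)
    assert ResourceAllocation.cast(ralloc) is ralloc     # already an instance: returned as-is

    # Construct a new instance from a dict (keys assumed to match constructor arguments).
    ralloc_from_dict = ResourceAllocation.cast({'mem_limit': 2 * 1024**3})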

build_hostcomms()

host_comm_barrier()

Calls self.host_comm.barrier() when self.host_comm is not None.

This convenience function provides an often-used barrier that follows code where a single “leader” processor modifies a memory block shared between all members of self.host_comm, and the other processors must wait until this modification is performed before proceeding with their own computations.

Returns

None
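
A typical pattern, sketched below with an ordinary NumPy array standing in for a host-shared memory block, has the host’s leader fill the block while the other processors on that host wait at the barrier (with comm=None the barrier is a no-op, so the sketch also runs serially):

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation()          # comm=None here; under MPI, pass a communicator
    shared_buf = np.zeros(8)               # stand-in for a host-shared-memory array

    if ralloc.is_host_leader:
        shared_buf[:] = np.arange(8)       # only the host's leader fills the block
    ralloc.host_comm_barrier()             # no-op when host_comm is None; otherwise waits
    total = shared_buf.sum()               # all host processors may now read safely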

copy()

Copy this object.

Returns

ResourceAllocation

reset(allocated_memory=0)

Resets internal allocation counters to given values (defaults to zero).

Parameters
allocated_memory : int64

The value to set the memory allocation counter to.

Returns

None

add_tracked_memory(num_elements, dtype='d')

Adds num_elements * itemsize bytes to the total amount of allocated memory being tracked.

If the total (tracked) memory exceeds self.mem_limit, a MemoryError exception is raised.

Parameters
num_elements : int

The number of elements to track allocation of.

dtype : numpy.dtype, optional

The type of elements, needed to compute the number of bytes per element.

Returns

None
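
For example, one might register an array’s memory with the tracker just before allocating it (a minimal serial sketch; the sizes are arbitrary):

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation(mem_limit=1024**3)   # rough 1 GiB cap
    n = 10**6
    ralloc.add_tracked_memory(n, dtype='d')          # raises MemoryError if the cap is exceeded
    data = np.empty(n, dtype='d')                    # the allocation being accounted for
    ralloc.reset()                                   # later: zero the tracked-memory counter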

check_can_allocate_memory(num_elements, dtype='d')

Checks that allocating num_elements elements doesn’t cause the memory limit to be exceeded.

This memory isn’t tracked: it’s just added to the currently tracked memory, and a MemoryError exception is raised if the result exceeds self.mem_limit.

Parameters
num_elements : int

The number of elements to track allocation of.

dtype : numpy.dtype, optional

The type of elements, needed to compute the number of bytes per element.

Returns

None
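
For example, this can guard a prospective scratch allocation without changing the running tally (sketch with arbitrary sizes):

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation(mem_limit=1024**3)
    n = 5 * 10**6
    ralloc.check_can_allocate_memory(n, dtype='d')   # raises MemoryError if n doubles won't fit
    scratch = np.zeros(n, dtype='d')                 # this allocation is *not* added to the tally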

temporarily_track_memory(num_elements, dtype='d')

Temporarily adds num_elements elements to the tracked memory (a context manager).

A MemoryError exception is raised if the tracked memory exceeds self.mem_limit.

Parameters
num_elements : int

The number of elements to track allocation of.

dtype : numpy.dtype, optional

The type of elements, needed to compute the number of bytes per element.

Returns

contextmanager
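
Because a context manager is returned, the tracked total is increased only inside the with-block, e.g. (sketch with arbitrary sizes):

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation(mem_limit=1024**3)
    n = 10**6
    with ralloc.temporarily_track_memory(n, dtype='d'):   # MemoryError raised here if over limit
        scratch = np.empty(n, dtype='d')
        partial = scratch.sum()
    # on exit the temporary n * 8 bytes are removed from the tracked total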

gather_base(result, local, slice_of_global, unit_ralloc=None, all_gather=False)

Gather or all-gather operation using local arrays and a unit resource allocation.

Similar to a normal MPI gather call, but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array, i.e., the same slice of the final to-be-gathered array. So, when gathering the result, only processors with unit_ralloc.rank == 0 need to contribute to the gather operation.

Parameters
result : numpy.ndarray, possibly shared

The destination “global” array. When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one, so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by having multiple smaller gather operations in parallel instead of one large gather.

local : numpy.ndarray

The locally computed quantity. This can be a shared-memory array, but need not be.

slice_of_global : slice or numpy.ndarray

The slice of result that local constitutes, i.e., in the end result[slice_of_global] = local. This may be a Python slice or a NumPy array of indices.

unit_ralloc : ResourceAllocation, optional

A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the gather operation. If None, then it is assumed that all processors compute different local results.

all_gather : bool, optional

Whether the final result should be gathered on all the processors of this ResourceAllocation or just the root (rank 0) processor.

Returns

None

gather(result, local, slice_of_global, unit_ralloc=None)

Gather local arrays into a global result array potentially with a unit resource allocation.

Similar to a normal MPI gather call, but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array, i.e., the same slice of the final to-be-gathered array. So, when gathering the result, only processors with unit_ralloc.rank == 0 need to contribute to the gather operation.

The global array is only gathered on the root (rank 0) processor of this resource allocation.

Parameters
result : numpy.ndarray, possibly shared

The destination “global” array, only needed on the root (rank 0) processor. When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one, so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by having multiple smaller gather operations in parallel instead of one large gather.

local : numpy.ndarray

The locally computed quantity. This can be a shared-memory array, but need not be.

slice_of_global : slice or numpy.ndarray

The slice of result that local constitutes, i.e., in the end result[slice_of_global] = local. This may be a Python slice or a NumPy array of indices.

unit_ralloc : ResourceAllocation, optional

A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the gather operation. If None, then it is assumed that all processors compute different local results.

Returns

None
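
A minimal sketch for the case without shared memory, shown with comm=None so it runs serially; under MPI one would pass MPI.COMM_WORLD, and the slicing below assumes the processor count divides N evenly:

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation()                 # serial here; under MPI pass MPI.COMM_WORLD
    N = 100
    nprocs, rank = ralloc.comm_size, ralloc.comm_rank
    chunk = N // nprocs                           # assumes nprocs divides N evenly
    my_slice = slice(rank * chunk, (rank + 1) * chunk)
    local = np.full(chunk, rank, dtype='d')       # this processor's piece of the result

    result = np.empty(N, dtype='d')               # only meaningful on rank 0 after the call
    ralloc.gather(result, local, my_slice)        # rank 0 ends up holding the full array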

allgather(result, local, slice_of_global, unit_ralloc=None)

All-gather local arrays into global arrays on each processor, potentially using a unit resource allocation.

Similar to a normal MPI gather call, but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array, i.e., the same slice of the final to-be-gathered array. So, when gathering the result, only processors with unit_ralloc.rank == 0 need to contribute to the gather operation.

Parameters
result : numpy.ndarray, possibly shared

The destination “global” array. When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one, so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by having multiple smaller gather operations in parallel instead of one large gather.

local : numpy.ndarray

The locally computed quantity. This can be a shared-memory array, but need not be.

slice_of_global : slice or numpy.ndarray

The slice of result that local constitutes, i.e., in the end result[slice_of_global] = local. This may be a Python slice or a NumPy array of indices.

unit_ralloc : ResourceAllocation, optional

A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the gather operation. If None, then it is assumed that all processors compute different local results.

Returns

None
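
The call pattern mirrors gather(), except every processor needs a full-size result buffer; a sketch under the same assumptions (no shared memory, processor count divides N, shown serially with comm=None):

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation()                 # serial here; under MPI pass MPI.COMM_WORLD
    N = 100
    chunk = N // ralloc.comm_size                 # assumes the processor count divides N
    my_slice = slice(ralloc.comm_rank * chunk, (ralloc.comm_rank + 1) * chunk)
    local = np.arange(my_slice.start, my_slice.stop, dtype='d')

    result = np.empty(N, dtype='d')               # full-size buffer needed on *every* rank
    ralloc.allgather(result, local, my_slice)     # afterwards each rank holds the whole array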

allreduce_sum(result, local, unit_ralloc=None)

Sum local arrays on different processors, potentially using a unit resource allocation.

Similar to a normal MPI reduce call (with MPI.SUM type), but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array. So, when performing the sum, only processors with unit_ralloc.rank == 0 contribute to the sum. This handles the case where simply summing the local contributions from all processors would result in over-counting because multiple processors hold the same logical result (summand).

Parameters
result : numpy.ndarray, possibly shared

The destination “global” array, with the same shape as all the local arrays being summed. This can be any shape (including any number of dimensions). When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one, so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by distributing computation of result over each host’s processors and performing these sums in parallel.

local : numpy.ndarray

The locally computed quantity. This can be a shared-memory array, but need not be.

unit_ralloc : ResourceAllocation, optional

A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the sum operation. If None, then it is assumed that all processors compute different local results.

Returns

None
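
A sketch of the simplest case, with no shared memory and no unit_ralloc (every processor contributes a distinct summand); shown with comm=None so it runs serially:

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation()                           # serial here; under MPI pass a communicator
    local = np.full(10, ralloc.comm_rank + 1, dtype='d')    # each rank's distinct summand
    result = np.empty(10, dtype='d')
    ralloc.allreduce_sum(result, local)                     # result holds the elementwise sum over ranks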

allreduce_sum_simple(local, unit_ralloc=None)

A simplified sum over quantities on different processors that doesn’t use shared memory.

The shared memory usage of allreduce_sum() can be overkill when just summing a single scalar quantity. This method provides a way to easily sum a quantity across all the processors in this ResourceAllocation object using a unit resource allocation.

Parameters
local : int or float

The local (per-processor) value to sum.

unit_ralloc : ResourceAllocation, optional

A resource allocation (essentially a comm) for the group of processors that all compute the same local value, so that only the unit_ralloc.rank == 0 processors will contribute to the sum. If None, then it is assumed that each processor computes a logically different local value.

Returns
float or int

The sum of all local quantities, returned on all the processors.
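
For example, summing a per-processor scalar (sketch; shown serially with comm=None):

    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation()                        # serial here; under MPI pass a communicator
    local_count = 10 + ralloc.comm_rank                  # a per-processor scalar
    total = ralloc.allreduce_sum_simple(local_count)     # the same total is returned on every rank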

allreduce_min(result, local, unit_ralloc=None)

Take elementwise min of local arrays on different processors, potentially using a unit resource allocation.

Similar to a normal MPI reduce call (with MPI.MIN type), but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array. So, when performing the min operation, only processors with unit_ralloc.rank == 0 contribute.

Parameters
result : numpy.ndarray, possibly shared

The destination “global” array, with the same shape as all the local arrays being operated on. This can be any shape (including any number of dimensions). When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one, so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by distributing computation of result over each host’s processors and performing these operations in parallel.

local : numpy.ndarray

The locally computed quantity. This can be a shared-memory array, but need not be.

unit_ralloc : ResourceAllocation, optional

A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the min operation. If None, then it is assumed that all processors compute different local results.

Returns

None

allreduce_max(result, local, unit_ralloc=None)

Take elementwise max of local arrays on different processors, potentially using a unit resource allocation.

Similar to a normal MPI reduce call (with MPI.MAX type), but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array. So, when performing the max operation, only processors with unit_ralloc.rank == 0 contribute.

Parameters
result : numpy.ndarray, possibly shared

The destination “global” array, with the same shape as all the local arrays being operated on. This can be any shape (including any number of dimensions). When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one, so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by distributing computation of result over each host’s processors and performing these operations in parallel.

local : numpy.ndarray

The locally computed quantity. This can be a shared-memory array, but need not be.

unit_ralloc : ResourceAllocation, optional

A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the max operation. If None, then it is assumed that all processors compute different local results.

Returns

None
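
Usage parallels allreduce_sum(); a sketch of the no-shared-memory case covering both the max and min variants (shown serially with comm=None):

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation()                  # serial here; under MPI pass a communicator
    rng = np.random.default_rng(ralloc.comm_rank)
    local = rng.random(10)                         # per-rank values
    max_result = np.empty(10, dtype='d')
    min_result = np.empty(10, dtype='d')
    ralloc.allreduce_max(max_result, local)        # elementwise max over all ranks
    ralloc.allreduce_min(min_result, local)        # elementwise min over all ranks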

bcast(value, root=0)

Broadcasts a value from the root processor/host to the others in this resource allocation.

This is similar to a usual MPI broadcast, except it takes advantage of shared memory when it is available. When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this routine places value in a shared memory buffer and uses the resource allocation’s inter-host communicator to broadcast the result from the root host to all the other hosts, using all the processors on the root host in parallel (all processors with the same intra-host rank participate in an MPI broadcast).

Parameters
value : numpy.ndarray

The value to broadcast. May be a shared-memory array but doesn’t need to be. It only needs to be specified on the rank-root processor; other processors can provide any value for this argument (it’s unused).

root : int

The rank of the processor whose value will be broadcast.

Returns
numpy.ndarray

The broadcast value, in a new, non-shared-memory array.
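
A sketch broadcasting a small array from rank 0 (per the value description above, non-root ranks may pass anything, e.g. None); shown serially with comm=None:

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation()                   # serial here; under MPI pass a communicator
    value = np.arange(5, dtype='d') if ralloc.comm_rank == 0 else None  # unused off the root rank
    received = ralloc.bcast(value, root=0)          # every rank gets a copy of rank 0's array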