pygsti.baseobjs.resourceallocation
Resource allocation manager
Module Contents
Classes
ResourceAllocation: Describes available resources and how they should be allocated.
- class pygsti.baseobjs.resourceallocation.ResourceAllocation(comm=None, mem_limit=None, profiler=None, distribute_method='default', allocated_memory=0)
Bases: object
Describes available resources and how they should be allocated.
This includes the number of processors and amount of memory, as well as a strategy for how computations should be distributed among them.
Parameters
- comm : mpi4py.MPI.Comm, optional
MPI communicator holding the number of available processors.
- mem_limit : int, optional
A rough per-processor memory limit in bytes.
- profiler : Profiler, optional
A lightweight profiler object for tracking resource usage.
- distribute_method : str, optional
The name of a distribution strategy.
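A minimal construction sketch; comm=None gives a serial (single-processor) allocation, and the 3 GiB mem_limit below is an arbitrary illustrative value (mpi4py is only needed for the commented MPI variant):

    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    # Serial allocation: no MPI communicator, optional per-processor memory limit in bytes
    ralloc = ResourceAllocation(comm=None, mem_limit=3 * 1024**3)

    # MPI variant (requires mpi4py); COMM_WORLD supplies the available processors
    # from mpi4py import MPI
    # ralloc = ResourceAllocation(comm=MPI.COMM_WORLD, mem_limit=3 * 1024**3)

    print(ralloc.comm_rank, ralloc.comm_size)  # 0 and 1 in the serial case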
- property comm_rank
A safe way to get self.comm.rank (0 if self.comm is None)
- property comm_size
A safe way to get self.comm.size (1 if self.comm is None)
- property is_host_leader
True if this processor is the rank-0 “leader” of its host (node), False otherwise.
- classmethod cast(arg)
Cast arg to a ResourceAllocation object.
If arg already is a ResourceAllocation instance, it is simply returned. Otherwise this function attempts to create a new instance from arg.
Parameters
- arg : ResourceAllocation or dict
An object that can be cast to a ResourceAllocation.
Returns
ResourceAllocation
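A brief usage sketch, assuming (as hedged in the comments) that None yields a default serial allocation and that a dict supplies constructor keyword arguments:

    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation.cast(None)       # assumed: None gives a default serial allocation
    same = ResourceAllocation.cast(ralloc)       # an existing instance is returned unchanged
    from_dict = ResourceAllocation.cast({'mem_limit': 2 * 1024**3})  # assumed: dict keys map to constructor kwargs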
- build_hostcomms()
- host_comm_barrier()
Calls self.host_comm.barrier() when self.host_comm is not None.
This convenience function provides an often-used barrier that follows code where a single “leader” processor modifies a memory block shared between all members of self.host_comm, and the other processors must wait until this modification is performed before proceeding with their own computations.
Returns
None
- reset(allocated_memory=0)
Resets internal allocation counters to given values (defaults to zero).
Parameters
- allocated_memory : int64
The value to set the memory allocation counter to.
Returns
None
- add_tracked_memory(num_elements, dtype='d')
Adds num_elements * itemsize bytes to the total amount of allocated memory being tracked.
If the total (tracked) memory exceeds self.mem_limit, a MemoryError exception is raised.
Parameters
- num_elements : int
The number of elements to track allocation of.
- dtype : numpy.dtype, optional
The type of elements, needed to compute the number of bytes per element.
Returns
None
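An illustrative sketch of the tracking counter (the 1 MiB limit is an arbitrary value chosen for this example):

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation(mem_limit=1024**2)   # 1 MiB limit, illustrative only
    ralloc.add_tracked_memory(1000, dtype='d')       # track 1000 * 8 bytes
    arr = np.empty(1000, dtype='d')                  # the allocation the counter now accounts for
    ralloc.reset()                                   # clear the counter when the memory is released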
- check_can_allocate_memory(num_elements, dtype='d')
Checks that allocating num_elements elements doesn’t cause the memory limit to be exceeded.
This memory isn’t tracked; it’s just added to the current tracked memory, and a MemoryError exception is raised if the result exceeds self.mem_limit.
Parameters
- num_elements : int
The number of elements to track allocation of.
- dtype : numpy.dtype, optional
The type of elements, needed to compute the number of bytes per element.
Returns
None
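A brief sketch; the tiny limit is deliberately chosen so the check fails, and the tracked total is unchanged afterwards:

    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation(mem_limit=1024)      # deliberately tiny limit, illustrative only
    try:
        ralloc.check_can_allocate_memory(1000, dtype='d')   # 1000 * 8 bytes exceeds the 1024-byte limit
    except MemoryError:
        print("allocation would exceed mem_limit")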
- temporarily_track_memory(num_elements, dtype='d')
Temporarily adds num_elements to tracked memory (a context manager).
A MemoryError exception is raised if the tracked memory exceeds self.mem_limit.
Parameters
- num_elements : int
The number of elements to track allocation of.
- dtype : numpy.dtype, optional
The type of elements, needed to compute the number of bytes per element.
Returns
contextmanager
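A usage sketch of the context manager (the limit value is arbitrary); the temporary contribution is removed when the with-block exits:

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation(mem_limit=1024**2)   # 1 MiB limit, illustrative only
    with ralloc.temporarily_track_memory(10000, dtype='d'):  # 80 kB tracked only inside the block
        scratch = np.zeros(10000, dtype='d')
        # ... use the scratch buffer ...
    # the tracked total returns to its previous value here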
- gather_base(result, local, slice_of_global, unit_ralloc=None, all_gather=False)
Gather or all-gather operation using local arrays and a unit resource allocation.
Similar to a normal MPI gather call, but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array, i.e., the same slice of the final to-be-gathered array. So, when gathering the result, only processors with unit_ralloc.rank == 0 need to contribute to the gather operation.
Parameters
- result : numpy.ndarray, possibly shared
The destination “global” array. When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one, so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by having multiple smaller gather operations in parallel instead of one large gather.
- local : numpy.ndarray
The locally computed quantity. This can be a shared-memory array, but need not be.
- slice_of_global : slice or numpy.ndarray
The slice of result that local constitutes, i.e., in the end result[slice_of_global] = local. This may be a Python slice or a NumPy array of indices.
- unit_ralloc : ResourceAllocation, optional
A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the gather operation. If None, then it is assumed that all processors compute different local results.
- all_gather : bool, optional
Whether the final result should be gathered on all the processors of this ResourceAllocation or just the root (rank 0) processor.
Returns
None
- gather(result, local, slice_of_global, unit_ralloc=None)
Gather local arrays into a global result array potentially with a unit resource allocation.
Similar to a normal MPI gather call, but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array, i.e., the same slice of the final to-be-gathered array. So, when gathering the result, only processors with unit_ralloc.rank == 0 need to contribute to the gather operation.
The global array is only gathered on the root (rank 0) processor of this resource allocation.
Parameters
- result : numpy.ndarray, possibly shared
The destination “global” array, only needed on the root (rank 0) processor. When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one, so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by having multiple smaller gather operations in parallel instead of one large gather.
- local : numpy.ndarray
The locally computed quantity. This can be a shared-memory array, but need not be.
- slice_of_global : slice or numpy.ndarray
The slice of result that local constitutes, i.e., in the end result[slice_of_global] = local. This may be a Python slice or a NumPy array of indices.
- unit_ralloc : ResourceAllocation, optional
A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the gather operation. If None, then it is assumed that all processors compute different local results.
Returns
None
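A minimal sketch of gathering per-processor slices into the root processor’s global array. Run with comm=None, as assumed below, this reduces to a simple in-place assignment; under MPI each rank would own a different slice:

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation()                        # serial in this sketch
    global_len, rank, nprocs = 8, ralloc.comm_rank, ralloc.comm_size
    my_slice = slice(rank * global_len // nprocs, (rank + 1) * global_len // nprocs)
    local = np.arange(my_slice.start, my_slice.stop, dtype='d')   # this processor's piece
    result = np.empty(global_len, dtype='d')             # with shared memory, allocate via this ralloc instead
    ralloc.gather(result, local, my_slice)                # result is complete only on the rank-0 processor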
- allgather(result, local, slice_of_global, unit_ralloc=None)
All-gather local arrays into global arrays on each processor, potentially using a unit resource allocation.
Similar to a normal MPI gather call, but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array, i.e., the same slice of the final to-be-gathered array. So, when gathering the result, only processors with unit_ralloc.rank == 0 need to contribute to the gather operation.
Parameters
- result : numpy.ndarray, possibly shared
The destination “global” array. When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one, so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by having multiple smaller gather operations in parallel instead of one large gather.
- local : numpy.ndarray
The locally computed quantity. This can be a shared-memory array, but need not be.
- slice_of_global : slice or numpy.ndarray
The slice of result that local constitutes, i.e., in the end result[slice_of_global] = local. This may be a Python slice or a NumPy array of indices.
- unit_ralloc : ResourceAllocation, optional
A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the gather operation. If None, then it is assumed that all processors compute different local results.
Returns
None
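A minimal sketch, again assuming serial execution (comm=None); the only difference from gather is that every processor ends up holding the full array:

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation()                        # serial in this sketch
    nprocs, rank = ralloc.comm_size, ralloc.comm_rank
    my_slice = slice(rank * 4, (rank + 1) * 4)           # each rank owns 4 entries
    local = np.full(4, float(rank), dtype='d')
    result = np.empty(4 * nprocs, dtype='d')             # with shared memory, allocate via this ralloc instead
    ralloc.allgather(result, local, my_slice)            # complete on every processor, not just rank 0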
- allreduce_sum(result, local, unit_ralloc=None)
Sum local arrays on different processors, potentially using a unit resource allocation.
Similar to a normal MPI reduce call (with MPI.SUM type), but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array. So, when performing the sum, only processors with unit_ralloc.rank == 0 contribute to the sum. This handles the case where simply summing the local contributions from all processors would result in over-counting because multiple processors hold the same logical result (summand).
Parameters
- result : numpy.ndarray, possibly shared
The destination “global” array, with the same shape as all the local arrays being summed. This can be any shape (including any number of dimensions). When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one, so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by distributing computation of result over each host’s processors and performing these sums in parallel.
- local : numpy.ndarray
The locally computed quantity. This can be a shared-memory array, but need not be.
- unit_ralloc : ResourceAllocation, optional
A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the sum operation. If None, then it is assumed that all processors compute different local results.
Returns
None
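A minimal sketch, run serially (comm=None) so the “sum” is just the single local array; under MPI each rank would contribute its own summand:

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation()                          # serial in this sketch
    local = np.full((2, 3), float(ralloc.comm_rank + 1))   # this processor's contribution
    result = np.zeros((2, 3), dtype='d')                   # with shared memory, allocate via this ralloc instead
    ralloc.allreduce_sum(result, local)                    # elementwise sum over the contributing processors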
- allreduce_sum_simple(local, unit_ralloc=None)
A simplified sum over quantities on different processors that doesn’t use shared memory.
The shared memory usage of allreduce_sum() can be overkill when just summing a single scalar quantity. This method provides a way to easily sum a quantity across all the processors in this ResourceAllocation object using a unit resource allocation.
Parameters
- local : int or float
The local (per-processor) value to sum.
- unit_ralloc : ResourceAllocation, optional
A resource allocation (essentially a comm) for the group of processors that all compute the same local value, so that only the unit_ralloc.rank == 0 processors will contribute to the sum. If None, then it is assumed that each processor computes a logically different local value.
Returns
- float or int
The sum of all local quantities, returned on all the processors.
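A short sketch summing one scalar per processor (serial here, so the result equals the single local value):

    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation()              # serial in this sketch
    per_proc_count = 5                         # e.g. the number of items this processor handled
    total = ralloc.allreduce_sum_simple(per_proc_count)   # summed over the contributing processors
    print(total)                               # 5 in the serial case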
- allreduce_min(result, local, unit_ralloc=None)
Take elementwise min of local arrays on different processors, potentially using a unit resource allocation.
Similar to a normal MPI reduce call (with MPI.MIN type), but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array. So, when performing the min operation, only processors with unit_ralloc.rank == 0 contribute.
Parameters
- result : numpy.ndarray, possibly shared
The destination “global” array, with the same shape as all the local arrays being operated on. This can be any shape (including any number of dimensions). When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one, so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by distributing computation of result over each host’s processors and performing these reductions in parallel.
- local : numpy.ndarray
The locally computed quantity. This can be a shared-memory array, but need not be.
- unit_ralloc : ResourceAllocation, optional
A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the min operation. If None, then it is assumed that all processors compute different local results.
Returns
None
- allreduce_max(result, local, unit_ralloc=None)
Take elementwise max of local arrays on different processors, potentially using a unit resource allocation.
Similar to a normal MPI reduce call (with MPI.MAX type), but more easily integrates with a hierarchy of processor divisions, or nested comms, by taking a unit_ralloc argument. This is essentially another comm that specifies the groups of processors that have all computed the same local array. So, when performing the max operation, only processors with unit_ralloc.rank == 0 contribute.
Parameters
- result : numpy.ndarray, possibly shared
The destination “global” array, with the same shape as all the local arrays being operated on. This can be any shape (including any number of dimensions). When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this array must be allocated as a shared array using this ralloc or a larger one, so that result is shared between all the processors for this resource allocation’s intra-host communicator. This allows a speedup when shared memory is used by distributing computation of result over each host’s processors and performing these reductions in parallel.
- local : numpy.ndarray
The locally computed quantity. This can be a shared-memory array, but need not be.
- unit_ralloc : ResourceAllocation, optional
A resource allocation (essentially a comm) for the group of processors that all compute the same local result, so that only the unit_ralloc.rank == 0 processors will contribute to the max operation. If None, then it is assumed that all processors compute different local results.
Returns
None
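A minimal sketch covering both allreduce_min and allreduce_max; run serially (comm=None), as assumed here, each reduces to copying the single local array into result:

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation()                 # serial in this sketch
    local = np.array([1.0, 4.0, 2.0]) * (ralloc.comm_rank + 1)  # this processor's values
    lo = np.empty(3, dtype='d')
    hi = np.empty(3, dtype='d')
    ralloc.allreduce_min(lo, local)               # elementwise minimum across contributing processors
    ralloc.allreduce_max(hi, local)               # elementwise maximum across contributing processors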
- bcast(value, root=0)
Broadcasts a value from the root processor/host to the others in this resource allocation.
This is similar to a usual MPI broadcast, except it takes advantage of shared memory when it is available. When shared memory is being used, i.e. when this ResourceAllocation object has a nontrivial inter-host comm, this routine places value in a shared memory buffer and uses the resource allocation’s inter-host communicator to broadcast the result from the root host to all the other hosts, using all the processors on the root host in parallel (all processors with the same intra-host rank participate in an MPI broadcast).
Parameters
- value : numpy.ndarray
The value to broadcast. May be shared memory but doesn’t need to be. This only needs to be specified on the rank-root processor; other processors can provide any value for this argument (it is unused).
- root : int
The rank of the processor whose value is to be broadcast.
Returns
- numpy.ndarray
The broadcast value, in a new, non-shared-memory array.
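A minimal sketch; run serially (comm=None), as assumed here, the call simply hands back the root value, while under MPI only the root processor’s value matters:

    import numpy as np
    from pygsti.baseobjs.resourceallocation import ResourceAllocation

    ralloc = ResourceAllocation()                           # serial in this sketch
    value = np.arange(4, dtype='d') if ralloc.comm_rank == 0 else None  # only root's value is used
    received = ralloc.bcast(value, root=0)                  # every processor gets root's array back
    print(received)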