:py:mod:`pygsti.baseobjs.resourceallocation`
============================================

.. py:module:: pygsti.baseobjs.resourceallocation

.. autoapi-nested-parse::

   Resource allocation manager


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   pygsti.baseobjs.resourceallocation.ResourceAllocation


.. py:class:: ResourceAllocation(comm=None, mem_limit=None, profiler=None, distribute_method='default', allocated_memory=0)

   Bases: :py:obj:`object`

   Describes available resources and how they should be allocated.

   This includes the number of processors and amount of memory,
   as well as a strategy for how computations should be distributed
   among them.

   Parameters
   ----------
   comm : mpi4py.MPI.Comm, optional
       MPI communicator holding the number of available processors.

   mem_limit : int, optional
       A rough per-processor memory limit in bytes.

   profiler : Profiler, optional
       A lightweight profiler object for tracking resource usage.

   distribute_method : str, optional
       The name of a distribution strategy.

   .. py:property:: comm_rank

      A safe way to get `self.comm.rank` (0 if `self.comm` is None)

   .. py:property:: comm_size

      A safe way to get `self.comm.size` (1 if `self.comm` is None)

   .. py:property:: is_host_leader

      True if this processor is the rank-0 "leader" of its host (node). False otherwise.

   .. py:attribute:: comm
      :value: 'None'

   .. py:attribute:: mem_limit
      :value: 'None'

   .. py:attribute:: host_comm
      :value: 'None'

   .. py:attribute:: host_ranks
      :value: 'None'

   .. py:attribute:: interhost_comm
      :value: 'None'

   .. py:attribute:: interhost_ranks
      :value: 'None'

   .. py:attribute:: host_index
      :value: '0'

   .. py:attribute:: host_index_for_rank
      :value: 'None'

   .. py:attribute:: jac_distribution_method
      :value: 'None'

   .. py:attribute:: jac_slice
      :value: 'None'

   .. py:attribute:: distribute_method
      :value: "'default'"

   .. py:method:: cast(arg)
      :classmethod:

      Cast `arg` to a :class:`ResourceAllocation` object.

      If `arg` already is a :class:`ResourceAllocation` instance, it is just returned.
      Otherwise this function attempts to create a new instance from `arg`.

      Parameters
      ----------
      arg : ResourceAllocation or dict
          An object that can be cast to a :class:`ResourceAllocation`.

      Returns
      -------
      ResourceAllocation

   .. py:method:: build_hostcomms()

   .. py:method:: host_comm_barrier()

      Calls self.host_comm.barrier() when self.host_comm is not None.

      This convenience function provides an often-used barrier that
      follows code where a single "leader" processor modifies a memory
      block shared between all members of `self.host_comm`, and the
      other processors must wait until this modification is performed
      before proceeding with their own computations.

      Returns
      -------
      None

   .. py:method:: copy()

      Copy this object.

      Returns
      -------
      ResourceAllocation

   .. py:method:: reset(allocated_memory=0)

      Resets internal allocation counters to given values (defaults to zero).

      Parameters
      ----------
      allocated_memory : int64
          The value to set the memory allocation counter to.

      Returns
      -------
      None

   .. py:method:: add_tracked_memory(num_elements, dtype='d')

      Adds `num_elements * itemsize` bytes to the total amount of allocated memory being tracked.

      If the total (tracked) memory exceeds `self.mem_limit` a :class:`MemoryError`
      exception is raised.

      Parameters
      ----------
      num_elements : int
          The number of elements to track allocation of.

      dtype : numpy.dtype, optional
          The type of elements, needed to compute the number of bytes per element.

      Returns
      -------
      None

   .. py:method:: check_can_allocate_memory(num_elements, dtype='d')

      Checks that allocating `num_elements` doesn't cause the memory limit to be exceeded.

      This memory isn't tracked - it's just added to the current tracked memory and a
      :class:`MemoryError` exception is raised if the result exceeds `self.mem_limit`.

      Parameters
      ----------
      num_elements : int
          The number of elements to track allocation of.

      dtype : numpy.dtype, optional
          The type of elements, needed to compute the number of bytes per element.

      Returns
      -------
      None

   .. py:method:: temporarily_track_memory(num_elements, dtype='d')

      Temporarily adds `num_elements` to tracked memory (a context manager).

      A :class:`MemoryError` exception is raised if the tracked memory exceeds `self.mem_limit`.

      Parameters
      ----------
      num_elements : int
          The number of elements to track allocation of.

      dtype : numpy.dtype, optional
          The type of elements, needed to compute the number of bytes per element.

      Returns
      -------
      contextmanager

   .. py:method:: gather_base(result, local, slice_of_global, unit_ralloc=None, all_gather=False)

      Gather or all-gather operation using local arrays and a *unit* resource allocation.

      Similar to a normal MPI gather call, but more easily integrates with a
      hierarchy of processor divisions, or nested comms, by taking a `unit_ralloc`
      argument. This is essentially another comm that specifies the groups of
      processors that have all computed the same local array, i.e., slice of the final
      to-be gathered array. So, when gathering the result, only processors with
      `unit_ralloc.rank == 0` need to contribute to the gather operation.

      Parameters
      ----------
      result : numpy.ndarray, possibly shared
          The destination "global" array. When shared memory is being used, i.e.
          when this :class:`ResourceAllocation` object has a nontrivial inter-host comm,
          this array must be allocated as a shared array using *this* ralloc or a larger
          one so that `result` is shared between all the processors for this resource
          allocation's intra-host communicator. This allows a speedup when shared memory
          is used by having multiple smaller gather operations in parallel instead of
          one large gather.

      local : numpy.ndarray
          The locally computed quantity. This can be a shared-memory array, but need
          not be.

      slice_of_global : slice or numpy.ndarray
          The slice of `result` that `local` constitutes, i.e., in the end
          `result[slice_of_global] = local`. This may be a Python `slice` or a NumPy
          array of indices.
      unit_ralloc : ResourceAllocation, optional
          A resource allocation (essentially a comm) for the group of processors that
          all compute the same local result, so that only the `unit_ralloc.rank == 0`
          processors will contribute to the gather operation. If `None`, then it is
          assumed that all processors compute different local results.

      all_gather : bool, optional
          Whether the final result should be gathered on all the processors of this
          :class:`ResourceAllocation` or just the root (rank 0) processor.

      Returns
      -------
      None

   .. py:method:: gather(result, local, slice_of_global, unit_ralloc=None)

      Gather local arrays into a global result array potentially with a *unit* resource allocation.

      Similar to a normal MPI gather call, but more easily integrates with a
      hierarchy of processor divisions, or nested comms, by taking a `unit_ralloc`
      argument. This is essentially another comm that specifies the groups of
      processors that have all computed the same local array, i.e., slice of the final
      to-be gathered array. So, when gathering the result, only processors with
      `unit_ralloc.rank == 0` need to contribute to the gather operation.

      The global array is only gathered on the root (rank 0) processor of this
      resource allocation.

      Parameters
      ----------
      result : numpy.ndarray, possibly shared
          The destination "global" array, only needed on the root (rank 0) processor.
          When shared memory is being used, i.e. when this :class:`ResourceAllocation`
          object has a nontrivial inter-host comm, this array must be allocated as a
          shared array using *this* ralloc or a larger one so that `result` is shared
          between all the processors for this resource allocation's intra-host
          communicator. This allows a speedup when shared memory is used by having
          multiple smaller gather operations in parallel instead of one large gather.

      local : numpy.ndarray
          The locally computed quantity. This can be a shared-memory array, but need
          not be.
      slice_of_global : slice or numpy.ndarray
          The slice of `result` that `local` constitutes, i.e., in the end
          `result[slice_of_global] = local`. This may be a Python `slice` or a NumPy
          array of indices.

      unit_ralloc : ResourceAllocation, optional
          A resource allocation (essentially a comm) for the group of processors that
          all compute the same local result, so that only the `unit_ralloc.rank == 0`
          processors will contribute to the gather operation. If `None`, then it is
          assumed that all processors compute different local results.

      Returns
      -------
      None

   .. py:method:: allgather(result, local, slice_of_global, unit_ralloc=None)

      All-gather local arrays into global arrays on each processor, potentially using a *unit* resource allocation.

      Similar to a normal MPI gather call, but more easily integrates with a
      hierarchy of processor divisions, or nested comms, by taking a `unit_ralloc`
      argument. This is essentially another comm that specifies the groups of
      processors that have all computed the same local array, i.e., slice of the final
      to-be gathered array. So, when gathering the result, only processors with
      `unit_ralloc.rank == 0` need to contribute to the gather operation.

      Parameters
      ----------
      result : numpy.ndarray, possibly shared
          The destination "global" array. When shared memory is being used, i.e.
          when this :class:`ResourceAllocation` object has a nontrivial inter-host comm,
          this array must be allocated as a shared array using *this* ralloc or a larger
          one so that `result` is shared between all the processors for this resource
          allocation's intra-host communicator. This allows a speedup when shared memory
          is used by having multiple smaller gather operations in parallel instead of
          one large gather.

      local : numpy.ndarray
          The locally computed quantity. This can be a shared-memory array, but need
          not be.

      slice_of_global : slice or numpy.ndarray
          The slice of `result` that `local` constitutes, i.e., in the end
          `result[slice_of_global] = local`.
          This may be a Python `slice` or a NumPy array of indices.

      unit_ralloc : ResourceAllocation, optional
          A resource allocation (essentially a comm) for the group of processors that
          all compute the same local result, so that only the `unit_ralloc.rank == 0`
          processors will contribute to the gather operation. If `None`, then it is
          assumed that all processors compute different local results.

      Returns
      -------
      None

   .. py:method:: allreduce_sum(result, local, unit_ralloc=None)

      Sum local arrays on different processors, potentially using a *unit* resource allocation.

      Similar to a normal MPI reduce call (with MPI.SUM type), but more easily
      integrates with a hierarchy of processor divisions, or nested comms, by taking
      a `unit_ralloc` argument. This is essentially another comm that specifies the
      groups of processors that have all computed the same local array. So, when
      performing the sum, only processors with `unit_ralloc.rank == 0` contribute to
      the sum. This handles the case where simply summing the local contributions from
      all processors would result in over-counting because multiple processors hold
      the same logical result (summand).

      Parameters
      ----------
      result : numpy.ndarray, possibly shared
          The destination "global" array, with the same shape as all the local arrays
          being summed. This can be any shape (including any number of dimensions).
          When shared memory is being used, i.e. when this :class:`ResourceAllocation`
          object has a nontrivial inter-host comm, this array must be allocated as a
          shared array using *this* ralloc or a larger one so that `result` is shared
          between all the processors for this resource allocation's intra-host
          communicator. This allows a speedup when shared memory is used by
          distributing computation of `result` over each host's processors and
          performing these sums in parallel.

      local : numpy.ndarray
          The locally computed quantity. This can be a shared-memory array, but need
          not be.
      unit_ralloc : ResourceAllocation, optional
          A resource allocation (essentially a comm) for the group of processors that
          all compute the same local result, so that only the `unit_ralloc.rank == 0`
          processors will contribute to the sum operation. If `None`, then it is
          assumed that all processors compute different local results.

      Returns
      -------
      None

   .. py:method:: allreduce_sum_simple(local, unit_ralloc=None)

      A simplified sum over quantities on different processors that doesn't use shared memory.

      The shared memory usage of :meth:`allreduce_sum` can be overkill when just
      summing a single scalar quantity. This method provides a way to easily sum a
      quantity across all the processors in this :class:`ResourceAllocation` object
      using a unit resource allocation.

      Parameters
      ----------
      local : int or float
          The local (per-processor) value to sum.

      unit_ralloc : ResourceAllocation, optional
          A resource allocation (essentially a comm) for the group of processors that
          all compute the same local value, so that only the `unit_ralloc.rank == 0`
          processors will contribute to the sum. If `None`, then it is assumed that
          each processor computes a logically different local value.

      Returns
      -------
      float or int
          The sum of all `local` quantities, returned on all the processors.

   .. py:method:: allreduce_min(result, local, unit_ralloc=None)

      Take elementwise min of local arrays on different processors, potentially using a *unit* resource allocation.

      Similar to a normal MPI reduce call (with MPI.MIN type), but more easily
      integrates with a hierarchy of processor divisions, or nested comms, by taking
      a `unit_ralloc` argument. This is essentially another comm that specifies the
      groups of processors that have all computed the same local array. So, when
      performing the min operation, only processors with `unit_ralloc.rank == 0`
      contribute.

      Parameters
      ----------
      result : numpy.ndarray, possibly shared
          The destination "global" array, with the same shape as all the local arrays being operated on.
          This can be any shape (including any number of dimensions). When shared
          memory is being used, i.e. when this :class:`ResourceAllocation` object has a
          nontrivial inter-host comm, this array must be allocated as a shared array
          using *this* ralloc or a larger one so that `result` is shared between all the
          processors for this resource allocation's intra-host communicator. This
          allows a speedup when shared memory is used by distributing computation of
          `result` over each host's processors and performing these operations in
          parallel.

      local : numpy.ndarray
          The locally computed quantity. This can be a shared-memory array, but need
          not be.

      unit_ralloc : ResourceAllocation, optional
          A resource allocation (essentially a comm) for the group of processors that
          all compute the same local result, so that only the `unit_ralloc.rank == 0`
          processors will contribute to the min operation. If `None`, then it is
          assumed that all processors compute different local results.

      Returns
      -------
      None

   .. py:method:: allreduce_max(result, local, unit_ralloc=None)

      Take elementwise max of local arrays on different processors, potentially using a *unit* resource allocation.

      Similar to a normal MPI reduce call (with MPI.MAX type), but more easily
      integrates with a hierarchy of processor divisions, or nested comms, by taking
      a `unit_ralloc` argument. This is essentially another comm that specifies the
      groups of processors that have all computed the same local array. So, when
      performing the max operation, only processors with `unit_ralloc.rank == 0`
      contribute.

      Parameters
      ----------
      result : numpy.ndarray, possibly shared
          The destination "global" array, with the same shape as all the local arrays
          being operated on. This can be any shape (including any number of
          dimensions). When shared memory is being used, i.e.
          when this :class:`ResourceAllocation` object has a nontrivial inter-host comm,
          this array must be allocated as a shared array using *this* ralloc or a larger
          one so that `result` is shared between all the processors for this resource
          allocation's intra-host communicator. This allows a speedup when shared memory
          is used by distributing computation of `result` over each host's processors
          and performing these operations in parallel.

      local : numpy.ndarray
          The locally computed quantity. This can be a shared-memory array, but need
          not be.

      unit_ralloc : ResourceAllocation, optional
          A resource allocation (essentially a comm) for the group of processors that
          all compute the same local result, so that only the `unit_ralloc.rank == 0`
          processors will contribute to the max operation. If `None`, then it is
          assumed that all processors compute different local results.

      Returns
      -------
      None

   .. py:method:: bcast(value, root=0)

      Broadcasts a value from the root processor/host to the others in this resource allocation.

      This is similar to a usual MPI broadcast, except it takes advantage of shared
      memory when it is available. When shared memory is being used, i.e. when this
      :class:`ResourceAllocation` object has a nontrivial inter-host comm, then this
      routine places `value` in a shared memory buffer and uses the resource
      allocation's inter-host communicator to broadcast the result from the root
      *host* to all the other hosts, using all the processors on the root host in
      parallel (all processors with the same intra-host rank participate in an MPI
      broadcast).

      Parameters
      ----------
      value : numpy.ndarray
          The value to broadcast. May be shared memory but doesn't need to be. This
          only needs to be specified on the rank `root` processor; other processors
          can provide any value for this argument (it's unused).

      root : int
          The rank of the processor whose `value` will be broadcast.

      Returns
      -------
      numpy.ndarray
          The broadcast value, in a new, non-shared-memory array.
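
The `unit_ralloc` argument shared by the gather and allreduce methods above can be easier to follow with a small simulation. The sketch below does not use pyGSTi or MPI: `simulated_allreduce_sum` is a hypothetical stand-in written only for illustration, and the list `unit_ranks` is an assumed representation of each processor's rank within its unit group. It shows why letting only the rank-0 member of each unit group contribute avoids over-counting when several processors hold the same summand.

```python
def simulated_allreduce_sum(local_values, unit_ranks=None):
    """Sum one value per simulated processor, counting each unit group once.

    This is an illustrative stand-in, not pyGSTi's implementation.

    local_values[i] is the value held by simulated processor i.
    unit_ranks[i] is that processor's rank within its unit group; None
    means every processor holds a logically different value.
    """
    if unit_ranks is None:
        # No unit grouping: every processor contributes.
        unit_ranks = [0] * len(local_values)
    # Only the rank-0 member of each unit group contributes its value.
    return sum(v for v, r in zip(local_values, unit_ranks) if r == 0)


# Four simulated processors in two unit groups of two. Both members of a
# group computed the same summand, so a naive sum counts each one twice.
local_values = [1.5, 1.5, 2.0, 2.0]
unit_ranks = [0, 1, 0, 1]

naive_total = sum(local_values)
unit_total = simulated_allreduce_sum(local_values, unit_ranks)
print(naive_total, unit_total)
```

Passing `unit_ranks=None` in this sketch corresponds to the documented `unit_ralloc=None` case, where all processors are assumed to compute different local results.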