:py:mod:`pygsti.baseobjs.resourceallocation`
============================================

.. py:module:: pygsti.baseobjs.resourceallocation

.. autoapi-nested-parse::

   Resource allocation manager


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   pygsti.baseobjs.resourceallocation.ResourceAllocation


.. py:class:: ResourceAllocation(comm=None, mem_limit=None, profiler=None, distribute_method='default', allocated_memory=0)


   Bases: :py:obj:`object`

   Describes available resources and how they should be allocated.

   This includes the number of processors and amount of memory,
   as well as a strategy for how computations should be distributed
   among them.

   Parameters
   ----------
   comm : mpi4py.MPI.Comm, optional
       MPI communicator holding the number of available processors.

   mem_limit : int, optional
       A rough per-processor memory limit in bytes.

   profiler : Profiler, optional
       A lightweight profiler object for tracking resource usage.

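   distribute_method : str, optional
       The name of a distribution strategy.

   allocated_memory : int, optional
       The initial value of the tracked memory allocation counter, in bytes.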

   .. py:property:: comm_rank

      A safe way to get `self.comm.rank` (0 if `self.comm` is None) 


   .. py:property:: comm_size

      A safe way to get `self.comm.size` (1 if `self.comm` is None) 
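
      A minimal sketch of the `None`-safe pattern behind `comm_rank` and
      `comm_size`.  The class `CommInfoSketch` is a hypothetical stand-in, not
      pyGSTi's implementation, and a `SimpleNamespace` plays the role of an MPI
      communicator so that the snippet runs without MPI:

      .. code-block:: python

         from types import SimpleNamespace


         class CommInfoSketch:
             """Hypothetical stand-in showing the None-safe rank/size pattern."""

             def __init__(self, comm=None):
                 self.comm = comm  # a communicator-like object, or None when serial

             @property
             def comm_rank(self):
                 # Fall back to rank 0 when there is no communicator.
                 return self.comm.rank if self.comm is not None else 0

             @property
             def comm_size(self):
                 # Fall back to a single processor when there is no communicator.
                 return self.comm.size if self.comm is not None else 1


         serial = CommInfoSketch()
         serial_info = (serial.comm_rank, serial.comm_size)

         fake_comm = SimpleNamespace(rank=2, size=4)  # stands in for an MPI comm
         parallel = CommInfoSketch(fake_comm)
         parallel_info = (parallel.comm_rank, parallel.comm_size)

      Code written against these properties does not need a separate branch for
      the serial case.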


   .. py:property:: is_host_leader

      True if this processor is the rank-0 "leader" of its host (node).  False otherwise.


   .. py:attribute:: comm
      :value: 'None'

      
   .. py:attribute:: mem_limit
      :value: 'None'

      
   .. py:attribute:: host_comm
      :value: 'None'

      
   .. py:attribute:: host_ranks
      :value: 'None'

      
   .. py:attribute:: interhost_comm
      :value: 'None'

      
   .. py:attribute:: interhost_ranks
      :value: 'None'

      
   .. py:attribute:: host_index
      :value: '0'

      
   .. py:attribute:: host_index_for_rank
      :value: 'None'

      
   .. py:attribute:: jac_distribution_method
      :value: 'None'

      
   .. py:attribute:: jac_slice
      :value: 'None'

      
   .. py:attribute:: distribute_method
      :value: "'default'"

      
   .. py:method:: cast(arg)
      :classmethod:

      Cast `arg` to a :class:`ResourceAllocation` object.

      If `arg` is already a :class:`ResourceAllocation` instance, it is
      just returned.  Otherwise this function attempts to create a new
      instance from `arg`.

      Parameters
      ----------
      arg : ResourceAllocation or dict
          An object that can be cast to a :class:`ResourceAllocation`.

      Returns
      -------
      ResourceAllocation
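
      Examples
      --------
      A minimal sketch of the cast pattern described above.  `AllocationSketch`
      is a hypothetical stand-in class, and treating the dictionary keys as
      constructor argument names is an assumption made for illustration:

      .. code-block:: python

         class AllocationSketch:
             """Hypothetical stand-in for a castable resource-allocation object."""

             def __init__(self, comm=None, mem_limit=None):
                 self.comm = comm
                 self.mem_limit = mem_limit

             @classmethod
             def cast(cls, arg):
                 if isinstance(arg, cls):
                     return arg  # already the right type: returned as-is, not copied
                 # Assumption: dict keys name constructor arguments.
                 return cls(**arg)


         existing = AllocationSketch(mem_limit=2048)
         same_object = AllocationSketch.cast(existing) is existing

         from_dict = AllocationSketch.cast({"mem_limit": 1024})

      Because an existing instance is returned unchanged, a function can call
      `cast` on its argument without worrying about creating a second object.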


   .. py:method:: build_hostcomms()


   .. py:method:: host_comm_barrier()

      Calls self.host_comm.barrier() when self.host_comm is not None.

      This convenience function provides an often-used barrier that
      follows code where a single "leader" processor modifies a memory
      block shared between all members of `self.host_comm`, and the
      other processors must wait until this modification is performed
      before proceeding with their own computations.

      Returns
      -------
      None
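
      Examples
      --------
      The leader-writes-then-barrier pattern can be sketched without MPI.  In
      this analogy (an assumption for illustration only) threads stand in for
      the processors of one host, a plain list stands in for the shared memory
      block, and `threading.Barrier` stands in for `self.host_comm.barrier()`:

      .. code-block:: python

         import threading

         NUM_WORKERS = 4
         shared_block = [0, 0, 0]          # stands in for host-shared memory
         barrier = threading.Barrier(NUM_WORKERS)
         seen = [None] * NUM_WORKERS       # what each worker read after the barrier


         def worker(rank):
             if rank == 0:
                 # Only the "leader" modifies the shared block.
                 shared_block[:] = [10, 20, 30]
             # Everyone waits here, so no worker reads before the leader has written.
             barrier.wait()
             seen[rank] = list(shared_block)


         threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_WORKERS)]
         for t in threads:
             t.start()
         for t in threads:
             t.join()

      Without the barrier, a non-leader worker could read the block before the
      leader has finished updating it.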


   .. py:method:: copy()

      Copy this object.

      Returns
      -------
      ResourceAllocation


   .. py:method:: reset(allocated_memory=0)

      Resets internal allocation counters to given values (defaults to zero).

      Parameters
      ----------
      allocated_memory : int64
          The value to set the memory allocation counter to.

      Returns
      -------
      None


   .. py:method:: add_tracked_memory(num_elements, dtype='d')

      Adds `num_elements * itemsize` bytes to the total amount of allocated memory being tracked.

      If the total (tracked) memory exceeds `self.mem_limit` a :class:`MemoryError`
      exception is raised.

      Parameters
      ----------
      num_elements : int
          The number of elements to track allocation of.

      dtype : numpy.dtype, optional
          The type of elements, needed to compute the number of bytes per element.

      Returns
      -------
      None
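
      Examples
      --------
      A minimal sketch of the bookkeeping described above.  `MemoryTrackerSketch`
      is a hypothetical stand-in rather than pyGSTi's implementation, and it uses
      the standard library's `array` type codes in place of NumPy dtypes to look
      up the item size (`'d'` is an 8-byte double in both):

      .. code-block:: python

         from array import array


         class MemoryTrackerSketch:
             """Hypothetical stand-in that tracks allocated bytes against a limit."""

             def __init__(self, mem_limit=None):
                 self.mem_limit = mem_limit
                 self.allocated_memory = 0

             def add_tracked_memory(self, num_elements, typecode='d'):
                 nbytes = num_elements * array(typecode).itemsize
                 self.allocated_memory += nbytes
                 if self.mem_limit is not None and self.allocated_memory > self.mem_limit:
                     raise MemoryError("tracked memory exceeds the limit")


         tracker = MemoryTrackerSketch(mem_limit=1024)
         tracker.add_tracked_memory(100)        # 100 doubles = 800 bytes
         after_first = tracker.allocated_memory

         try:
             tracker.add_tracked_memory(100)    # would bring the total to 1600 bytes
             limit_hit = False
         except MemoryError:
             limit_hit = True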


   .. py:method:: check_can_allocate_memory(num_elements, dtype='d')

      Checks that allocating `num_elements` doesn't cause the memory limit to be exceeded.

      The memory is not tracked: the requested amount is only added to the currently
      tracked total for the comparison, and a :class:`MemoryError` exception is raised
      if the result exceeds `self.mem_limit`.

      Parameters
      ----------
      num_elements : int
          The number of elements whose allocation is being checked.

      dtype : numpy.dtype, optional
          The type of elements, needed to compute the number of bytes per element.

      Returns
      -------
      None


   .. py:method:: temporarily_track_memory(num_elements, dtype='d')

      Temporarily tracks the memory for `num_elements` elements (a context manager).

      A :class:`MemoryError` exception is raised if the tracked memory exceeds `self.mem_limit`.

      Parameters
      ----------
      num_elements : int
          The number of elements to track allocation of.

      dtype : numpy.dtype, optional
          The type of elements, needed to compute the number of bytes per element.

      Returns
      -------
      contextmanager
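
      Examples
      --------
      A minimal sketch of temporary tracking with a context manager.
      `TempTrackerSketch` is a hypothetical stand-in, not pyGSTi's implementation,
      and it uses standard-library `array` type codes instead of NumPy dtypes:

      .. code-block:: python

         from array import array
         from contextlib import contextmanager


         class TempTrackerSketch:
             """Hypothetical stand-in whose tracked total is restored on exit."""

             def __init__(self, mem_limit=None):
                 self.mem_limit = mem_limit
                 self.allocated_memory = 0

             @contextmanager
             def temporarily_track_memory(self, num_elements, typecode='d'):
                 nbytes = num_elements * array(typecode).itemsize
                 self.allocated_memory += nbytes
                 try:
                     if self.mem_limit is not None and self.allocated_memory > self.mem_limit:
                         raise MemoryError("tracked memory exceeds the limit")
                     yield
                 finally:
                     # Always undo the temporary addition, even on error.
                     self.allocated_memory -= nbytes


         tracker = TempTrackerSketch(mem_limit=1000)
         with tracker.temporarily_track_memory(50):   # 50 doubles = 400 bytes
             inside = tracker.allocated_memory
         after = tracker.allocated_memory

         try:
             with tracker.temporarily_track_memory(200):  # 1600 bytes, over the limit
                 pass
             over_limit = False
         except MemoryError:
             over_limit = True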


   .. py:method:: gather_base(result, local, slice_of_global, unit_ralloc=None, all_gather=False)

      Gather or all-gather operation using local arrays and a *unit* resource allocation.

      Similar to a normal MPI gather call, but more easily integrates with a
      hierarchy of processor divisions, or nested comms, by taking a `unit_ralloc`
      argument.  This is essentially another comm that specifies the groups of processors
      that have all computed the same local array, i.e., slice of the final to-be gathered
      array.  So, when gathering the result, only processors with `unit_ralloc.rank == 0`
      need to contribute to the gather operation.

      Parameters
      ----------
      result : numpy.ndarray, possibly shared
          The destination "global" array.  When shared memory is being used, i.e.
          when this :class:`ResourceAllocation` object has a nontrivial inter-host comm,
          this array must be allocated as a shared array using *this* ralloc or a larger one
          so that `result` is shared between all the processors for this resource allocation's
          intra-host communicator.  This allows a speedup when shared memory is used by
          having multiple smaller gather operations in parallel instead of one large gather.

      local : numpy.ndarray
          The locally computed quantity.  This can be a shared-memory array, but need
          not be.

      slice_of_global : slice or numpy.ndarray
          The slice of `result` that `local` constitutes, i.e., in the end
          `result[slice_of_global] = local`.  This may be a Python `slice` or
          a NumPy array of indices.

      unit_ralloc : ResourceAllocation, optional
          A resource allocation (essentially a comm) for the group of processors that
          all compute the same local result, so that only the `unit_ralloc.rank == 0`
          processors will contribute to the gather operation.  If `None`, then it is
          assumed that all processors compute different local results.

      all_gather : bool, optional
          Whether the final result should be gathered on all the processors of this
          :class:`ResourceAllocation` or just the root (rank 0) processor.

      Returns
      -------
      None
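
      Examples
      --------
      The role of `unit_ralloc` and `all_gather` can be sketched with a serial
      simulation.  The function and data layout below are hypothetical and use
      plain lists rather than MPI or NumPy; each simulated processor carries its
      rank within its unit group, the slice it computed, and its local values:

      .. code-block:: python

         def simulate_gather_base(global_len, procs, all_gather=False):
             """Return the `result` each simulated processor ends up with."""
             assembled = [None] * global_len
             for proc in procs:
                 # Only the rank-0 member of each unit group contributes its slice.
                 if proc["unit_rank"] == 0:
                     assembled[proc["slice_of_global"]] = proc["local"]
             # With all_gather every processor gets the result, otherwise only the root.
             return [list(assembled) if (all_gather or rank == 0) else None
                     for rank in range(len(procs))]


         procs = [
             {"unit_rank": 0, "slice_of_global": slice(0, 2), "local": [1.0, 2.0]},
             {"unit_rank": 1, "slice_of_global": slice(0, 2), "local": [1.0, 2.0]},
             {"unit_rank": 0, "slice_of_global": slice(2, 4), "local": [3.0, 4.0]},
             {"unit_rank": 1, "slice_of_global": slice(2, 4), "local": [3.0, 4.0]},
         ]

         root_only = simulate_gather_base(4, procs)
         everyone = simulate_gather_base(4, procs, all_gather=True)

      Processors 1 and 3 hold duplicates of their group leader's slice, so
      skipping them loses no information.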


   .. py:method:: gather(result, local, slice_of_global, unit_ralloc=None)

      Gather local arrays into a global result array potentially with a *unit* resource allocation.

      Similar to a normal MPI gather call, but more easily integrates with a
      hierarchy of processor divisions, or nested comms, by taking a `unit_ralloc`
      argument.  This is essentially another comm that specifies the groups of processors
      that have all computed the same local array, i.e., slice of the final to-be gathered
      array.  So, when gathering the result, only processors with `unit_ralloc.rank == 0`
      need to contribute to the gather operation.

      The global array is only gathered on the root (rank 0) processor of this
      resource allocation.

      Parameters
      ----------
      result : numpy.ndarray, possibly shared
          The destination "global" array, only needed on the root (rank 0) processor.
          When shared memory is being used, i.e.  when this :class:`ResourceAllocation`
          object has a nontrivial inter-host comm, this array must be allocated as a
          shared array using *this* ralloc or a larger one so that `result` is shared
          between all the processors for this resource allocation's intra-host
          communicator.  This allows a speedup when shared memory is used by having
          multiple smaller gather operations in parallel instead of one large gather.

      local : numpy.ndarray
          The locally computed quantity.  This can be a shared-memory array, but need
          not be.

      slice_of_global : slice or numpy.ndarray
          The slice of `result` that `local` constitutes, i.e., in the end
          `result[slice_of_global] = local`.  This may be a Python `slice` or
          a NumPy array of indices.

      unit_ralloc : ResourceAllocation, optional
          A resource allocation (essentially a comm) for the group of processors that
          all compute the same local result, so that only the `unit_ralloc.rank == 0`
          processors will contribute to the gather operation.  If `None`, then it is
          assumed that all processors compute different local results.

      Returns
      -------
      None


   .. py:method:: allgather(result, local, slice_of_global, unit_ralloc=None)

      All-gather local arrays into global arrays on each processor, potentially using a *unit* resource allocation.

      Similar to a normal MPI all-gather call, but more easily integrates with a
      hierarchy of processor divisions, or nested comms, by taking a `unit_ralloc`
      argument.  This is essentially another comm that specifies the groups of processors
      that have all computed the same local array, i.e., slice of the final to-be gathered
      array.  So, when gathering the result, only processors with `unit_ralloc.rank == 0`
      need to contribute to the gather operation.

      Parameters
      ----------
      result : numpy.ndarray, possibly shared
          The destination "global" array.  When shared memory is being used, i.e.
          when this :class:`ResourceAllocation` object has a nontrivial inter-host comm,
          this array must be allocated as a shared array using *this* ralloc or a larger one
          so that `result` is shared between all the processors for this resource allocation's
          intra-host communicator.  This allows a speedup when shared memory is used by
          having multiple smaller gather operations in parallel instead of one large gather.

      local : numpy.ndarray
          The locally computed quantity.  This can be a shared-memory array, but need
          not be.

      slice_of_global : slice or numpy.ndarray
          The slice of `result` that `local` constitutes, i.e., in the end
          `result[slice_of_global] = local`.  This may be a Python `slice` or
          a NumPy array of indices.

      unit_ralloc : ResourceAllocation, optional
          A resource allocation (essentially a comm) for the group of processors that
          all compute the same local result, so that only the `unit_ralloc.rank == 0`
          processors will contribute to the gather operation.  If `None`, then it is
          assumed that all processors compute different local results.

      Returns
      -------
      None


   .. py:method:: allreduce_sum(result, local, unit_ralloc=None)

      Sum local arrays on different processors, potentially using a *unit* resource allocation.

      Similar to a normal MPI reduce call (with MPI.SUM type), but more easily integrates
      with a hierarchy of processor divisions, or nested comms, by taking a `unit_ralloc`
      argument.  This is essentially another comm that specifies the groups of processors
      that have all computed the same local array.  So, when performing the sum, only
      processors with `unit_ralloc.rank == 0` contribute to the sum.  This handles the
      case where simply summing the local contributions from all processors would result
      in over-counting because multiple processors hold the same logical result (summand).

      Parameters
      ----------
      result : numpy.ndarray, possibly shared
          The destination "global" array, with the same shape as all the local arrays
          being summed.  This can be any shape (including any number of dimensions).  When
          shared memory is being used, i.e. when this :class:`ResourceAllocation` object
          has a nontrivial inter-host comm, this array must be allocated as a shared array
          using *this* ralloc or a larger one so that `result` is shared between all the processors
          for this resource allocation's intra-host communicator.  This allows a speedup when
          shared memory is used by distributing computation of `result` over each host's
          processors and performing these sums in parallel.

      local : numpy.ndarray
          The locally computed quantity.  This can be a shared-memory array, but need
          not be.

      unit_ralloc : ResourceAllocation, optional
          A resource allocation (essentially a comm) for the group of processors that
          all compute the same local result, so that only the `unit_ralloc.rank == 0`
          processors will contribute to the sum operation.  If `None`, then it is
          assumed that all processors compute different local results.

      Returns
      -------
      None
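
      Examples
      --------
      The over-counting problem can be illustrated with a serial simulation.  The
      helper below is hypothetical and uses plain lists instead of MPI; each
      simulated processor is a `(unit_rank, local)` pair:

      .. code-block:: python

         def simulate_allreduce_sum(procs):
             """Sum the local arrays of unit-rank-0 processors only."""
             total = [0.0] * len(procs[0][1])
             for unit_rank, local in procs:
                 if unit_rank == 0:
                     total = [t + x for t, x in zip(total, local)]
             return total


         # Two unit groups of two processors; members of a group hold the same summand.
         procs = [
             (0, [1.0, 2.0]),
             (1, [1.0, 2.0]),
             (0, [10.0, 20.0]),
             (1, [10.0, 20.0]),
         ]

         # Summing every processor's array counts each summand twice.
         naive = [sum(column) for column in zip(*(local for _, local in procs))]

         # Restricting to unit-rank-0 processors counts each summand once.
         deduplicated = simulate_allreduce_sum(procs)

      Here `naive` is exactly twice `deduplicated`, because each unit group has
      two members.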


   .. py:method:: allreduce_sum_simple(local, unit_ralloc=None)

      A simplified sum over quantities on different processors that doesn't use shared memory.

      The shared memory usage of :meth:`allreduce_sum` can be overkill when just summing a single
      scalar quantity.  This method provides a way to easily sum a quantity across all the processors
      in this :class:`ResourceAllocation` object using a unit resource allocation.

      Parameters
      ----------
      local : int or float
          The local (per-processor) value to sum.

      unit_ralloc : ResourceAllocation, optional
          A resource allocation (essentially a comm) for the group of processors that
          all compute the same local value, so that only the `unit_ralloc.rank == 0`
          processors will contribute to the sum.  If `None`, then it is assumed that each
          processor computes a logically different local value.

      Returns
      -------
      float or int
          The sum of all `local` quantities, returned on all the processors.


   .. py:method:: allreduce_min(result, local, unit_ralloc=None)

      Take elementwise min of local arrays on different processors, potentially using a *unit* resource allocation.

      Similar to a normal MPI reduce call (with MPI.MIN type), but more easily integrates
      with a hierarchy of processor divisions, or nested comms, by taking a `unit_ralloc`
      argument.  This is essentially another comm that specifies the groups of processors
      that have all computed the same local array.  So, when performing the min operation, only
      processors with `unit_ralloc.rank == 0` contribute.

      Parameters
      ----------
      result : numpy.ndarray, possibly shared
          The destination "global" array, with the same shape as all the local arrays
          being operated on.  This can be any shape (including any number of dimensions).  When
          shared memory is being used, i.e. when this :class:`ResourceAllocation` object
          has a nontrivial inter-host comm, this array must be allocated as a shared array
          using *this* ralloc or a larger one so that `result` is shared between all the processors
          for this resource allocation's intra-host communicator.  This allows a speedup when
          shared memory is used by distributing computation of `result` over each host's
          processors and performing these min operations in parallel.

      local : numpy.ndarray
          The locally computed quantity.  This can be a shared-memory array, but need
          not be.

      unit_ralloc : ResourceAllocation, optional
          A resource allocation (essentially a comm) for the group of processors that
          all compute the same local result, so that only the `unit_ralloc.rank == 0`
          processors will contribute to the min operation.  If `None`, then it is
          assumed that all processors compute different local results.

      Returns
      -------
      None
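
      Examples
      --------
      A serial sketch of an elementwise reduction restricted to unit-rank-0
      processors.  The helper is hypothetical and uses plain lists instead of
      MPI; the same function illustrates the max case by passing `max`:

      .. code-block:: python

         def elementwise_reduce(procs, op):
             """Apply `op` position-by-position over unit-rank-0 local arrays."""
             contributing = [local for unit_rank, local in procs if unit_rank == 0]
             return [op(column) for column in zip(*contributing)]


         procs = [
             (0, [3, 7, 5]),
             (1, [3, 7, 5]),   # duplicate held by a non-leader of the first group
             (0, [4, 1, 9]),
             (0, [8, 6, 2]),
         ]

         global_min = elementwise_reduce(procs, min)
         global_max = elementwise_reduce(procs, max)

      Unlike a sum, a min or max is unchanged by duplicate inputs, so in this
      sketch skipping the duplicate only avoids redundant work.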


   .. py:method:: allreduce_max(result, local, unit_ralloc=None)

      Take elementwise max of local arrays on different processors, potentially using a *unit* resource allocation.

      Similar to a normal MPI reduce call (with MPI.MAX type), but more easily integrates
      with a hierarchy of processor divisions, or nested comms, by taking a `unit_ralloc`
      argument.  This is essentially another comm that specifies the groups of processors
      that have all computed the same local array.  So, when performing the max operation, only
      processors with `unit_ralloc.rank == 0` contribute.

      Parameters
      ----------
      result : numpy.ndarray, possibly shared
          The destination "global" array, with the same shape as all the local arrays
          being operated on.  This can be any shape (including any number of dimensions).  When
          shared memory is being used, i.e. when this :class:`ResourceAllocation` object
          has a nontrivial inter-host comm, this array must be allocated as a shared array
          using *this* ralloc or a larger one so that `result` is shared between all the processors
          for this resource allocation's intra-host communicator.  This allows a speedup when
          shared memory is used by distributing computation of `result` over each host's
          processors and performing these max operations in parallel.

      local : numpy.ndarray
          The locally computed quantity.  This can be a shared-memory array, but need
          not be.

      unit_ralloc : ResourceAllocation, optional
          A resource allocation (essentially a comm) for the group of processors that
          all compute the same local result, so that only the `unit_ralloc.rank == 0`
          processors will contribute to the max operation.  If `None`, then it is
          assumed that all processors compute different local results.

      Returns
      -------
      None


   .. py:method:: bcast(value, root=0)

      Broadcasts a value from the root processor/host to the others in this resource allocation.

      This is similar to a usual MPI broadcast, except it takes advantage of shared memory when
      it is available.  When shared memory is being used, i.e. when this :class:`ResourceAllocation`
      object has a nontrivial inter-host comm, then this routine places `value` in a shared memory
      buffer and uses the resource allocation's inter-host communicator to broadcast the result
      from the root *host* to all the other hosts using all the processors on the root host in
      parallel (all processors with the same intra-host rank participate in an MPI broadcast).

      Parameters
      ----------
      value : numpy.ndarray
          The value to broadcast.  May be shared memory but doesn't need to be.  This only
          needs to be specified on the rank `root` processor; other processors can provide
          any value for this argument (it's unused).

      root : int
          The rank of the processor whose `value` will be broadcast.

      Returns
      -------
      numpy.ndarray
          The broadcast value, in a new, non-shared-memory array.
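
      Examples
      --------
      A serial sketch of the host-level broadcast described above.  Everything
      here is hypothetical: plain lists stand in for each host's shared buffer,
      and splitting `value` into contiguous chunks, one per intra-host rank, is
      an assumption about how the parallel transfers might divide the work:

      .. code-block:: python

         def simulate_shared_bcast(value, num_hosts, procs_per_host, root_host=0):
             """Return each host's shared buffer after the simulated broadcast."""
             n = len(value)
             buffers = [[None] * n for _ in range(num_hosts)]
             buffers[root_host] = list(value)  # root places `value` in shared memory

             # Assumption: intra-host rank r is responsible for one contiguous chunk.
             bounds = [(r * n) // procs_per_host for r in range(procs_per_host + 1)]
             for host_rank in range(procs_per_host):
                 chunk = slice(bounds[host_rank], bounds[host_rank + 1])
                 # Processors with this intra-host rank broadcast their chunk
                 # from the root host to every other host.
                 for host in range(num_hosts):
                     if host != root_host:
                         buffers[host][chunk] = buffers[root_host][chunk]
             return buffers


         buffers = simulate_shared_bcast([1, 2, 3, 4, 5], num_hosts=3, procs_per_host=2)

      After all chunks have been transferred, every host's buffer holds the full
      value.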