pygsti.tools.mpitools

Functions for working with MPI processor distributions

Module Contents

Functions

distribute_indices(indices, comm, allow_split_comm=True)

Partition an array of indices (any type) evenly among comm's processors.

distribute_indices_base(indices, nprocs, rank, allow_split_comm=True)

Partition an array of "indices" evenly among a given number of "processors".

slice_up_slice(slc, num_slices)

Divides up slc into num_slices slices.

slice_up_range(n, num_slices, start=0)

Divides up range(start,start+n) into num_slices slices.

distribute_slice(s, comm, allow_split_comm=True)

Partition a continuous slice evenly among comm's processors.

gather_slices(slices, slice_owners, ar_to_fill, ar_to_fill_inds, axes, comm, max_buffer_size=None)

Gathers data within a numpy array, ar_to_fill, according to given slices.

gather_slices_by_owner(current_slices, ar_to_fill, ar_to_fill_inds, axes, comm, max_buffer_size=None)

Gathers data within a numpy array, ar_to_fill, according to given slices.

gather_indices(indices, index_owners, ar_to_fill, ar_to_fill_inds, axes, comm, max_buffer_size=None)

Gathers data within a numpy array, ar_to_fill, according to given indices.

distribute_for_dot(a_shape, b_shape, comm)

Prepares for one or multiple distributed dot products given the dimensions to be dotted.

mpidot(a, b, loc_row_slice, loc_col_slice, slice_tuples_by_rank, comm, out=None, out_shm=None)

Performs a distributed dot product, dot(a,b).

parallel_apply(f, l, comm)

Apply a function, f, to every element of a list, l, in parallel using MPI.

mpi4py_comm()

Get a comm object.

sum_across_procs(x, comm)

Sum a value across all processors in comm.

processor_group_size(nprocs, number_of_tasks)

Find the number of groups into which to divide nprocs processors to tackle number_of_tasks tasks.

sum_arrays(local_array, owners, comm)

Sums arrays across all "owner" processors.

closest_divisor(a, b)

Returns the divisor of a that is closest to b.

pygsti.tools.mpitools.distribute_indices(indices, comm, allow_split_comm=True)

Partition an array of indices (any type) evenly among comm’s processors.

Parameters
  • indices (list) – An array of items (any type) which are to be partitioned.

  • comm (mpi4py.MPI.Comm or ResourceAllocation) – The communicator which specifies the number of processors and which may be split into returned sub-communicators. If a ResourceAllocation object, node information is also taken into account when available (for shared memory compatibility).

  • allow_split_comm (bool) – If True, when there are more processors than indices, multiple processors will be given the same set of local indices and comm will be split into sub-communicators, one for each group of processors that are given the same indices. If False, then “extra” processors are simply given nothing to do, i.e. empty lists of local indices.

Returns

  • loc_indices (list) – A list containing the elements of indices belonging to the current processor.

  • owners (dict) – A dictionary mapping the elements of indices to integer ranks, such that owners[el] gives the rank of the processor responsible for communicating that element’s results to the other processors. Note that in the case when allow_split_comm=True and multiple processors have computed the results for a given element, only a single (the first) processor rank “owns” the element, and is thus responsible for sharing the results. This notion of ownership is useful when gathering the results.

  • loc_comm (mpi4py.MPI.Comm or ResourceAllocation or None) – The local communicator for the group of processors which have been given the same loc_indices to compute, obtained by splitting comm. If loc_indices is unique to the current processor, or if allow_split_comm is False, None is returned.
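
Example (a minimal sketch, assuming an MPI launch such as mpiexec -n 4 python script.py and an installed mpi4py; the squaring step is a hypothetical placeholder computation):

    from mpi4py import MPI
    from pygsti.tools import mpitools

    comm = MPI.COMM_WORLD
    indices = list(range(10))  # any list of items works, not just integers

    # Each rank receives only its share of `indices`; `owners[el]` gives the
    # rank responsible for sharing element `el`'s results later on.
    loc_indices, owners, loc_comm = mpitools.distribute_indices(indices, comm)

    local_results = {i: i**2 for i in loc_indices}  # placeholder computation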

pygsti.tools.mpitools.distribute_indices_base(indices, nprocs, rank, allow_split_comm=True)

Partition an array of “indices” evenly among a given number of “processors”.

This function is similar to distribute_indices(), but allows for a more generalized notion of what a “processor” is, since the number of processors and rank are given independently and do not have to be associated with an MPI comm. Note also that indices can be an arbitrary list of items, making this function very general.

Parameters
  • indices (list) – An array of items (any type) which are to be partitioned.

  • nprocs (int) – The number of “processors” to distribute the elements of indices among.

  • rank (int) – The rank of the current “processor” (must be an integer between 0 and nprocs-1). Note that this value is not obtained from any MPI communicator.

  • allow_split_comm (bool) – If True, when there are more processors than indices, multiple processors will be given the same set of local indices. If False, then extra processors are simply given nothing to do, i.e. empty lists of local indices.

Returns

  • loc_indices (list) – A list containing the elements of indices belonging to the current processor (i.e. the one specified by rank).

  • owners (dict) – A dictionary mapping the elements of indices to integer ranks, such that owners[el] gives the rank of the processor responsible for communicating that element’s results to the other processors. Note that in the case when allow_split_comm=True and multiple processors have computed the results for a given element, only a single (the first) processor rank “owns” the element, and is thus responsible for sharing the results. This notion of ownership is useful when gathering the results.
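
Example (a minimal sketch; because nprocs and rank are plain integers, no MPI communicator is needed, so this runs in an ordinary Python session):

    from pygsti.tools import mpitools

    indices = ['a', 'b', 'c', 'd', 'e']

    # Simulate a 2-"processor" distribution by looping over the ranks:
    for rank in range(2):
        loc_indices, owners = mpitools.distribute_indices_base(indices, nprocs=2, rank=rank)
        print(rank, loc_indices)  # this rank's share of `indices`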

pygsti.tools.mpitools.slice_up_slice(slc, num_slices)

Divides up slc into num_slices slices.

Parameters
  • slc (slice) – The slice to be divided.

  • num_slices (int) – The number of slices to divide the range into.

Returns

list of slices

pygsti.tools.mpitools.slice_up_range(n, num_slices, start=0)

Divides up range(start,start+n) into num_slices slices.

Parameters
  • n (int) – The number of (consecutive) indices in the range to be divided.

  • num_slices (int) – The number of slices to divide the range into.

  • start (int, optional) – The starting entry of the range, so that the range to be divided is range(start,start+n).

Returns

list of slices
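
Example (a minimal sketch illustrating both slice_up_range() and slice_up_slice(); no MPI is involved):

    from pygsti.tools import mpitools

    # Divide range(0, 10) into 3 slices:
    for s in mpitools.slice_up_range(10, num_slices=3):
        print(s, list(range(10))[s])

    # slice_up_slice() divides an existing slice object analogously:
    print(mpitools.slice_up_slice(slice(2, 12), num_slices=2))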

pygsti.tools.mpitools.distribute_slice(s, comm, allow_split_comm=True)

Partition a continuous slice evenly among comm’s processors.

This function is similar to distribute_indices(), but is specific to the case when the indices being distributed are a consecutive set of integers (specified by a slice).

Parameters
  • s (slice) – The slice to be partitioned.

  • comm (mpi4py.MPI.Comm or ResourceAllocation) – The communicator which specifies the number of processors and which may be split into returned sub-communicators. If a ResourceAllocation object, node information is also taken into account when available (for shared memory compatibility).

  • allow_split_comm (bool) – If True, when there are more processors than slice indices, multiple processors will be given the same local slice and comm will be split into sub-communicators, one for each group of processors that are given the same local slice. If False, then “extra” processors are simply given nothing to do, i.e. an empty local slice.

Returns

  • slices (list of slices) – The list of unique slices assigned to different processors. It’s possible that a single slice (i.e. element of slices) is assigned to multiple processors (when there are more processors than indices in s).

  • loc_slice (slice) – A slice specifying the indices belonging to the current processor.

  • owners (dict) – A dictionary giving the owning rank of each slice. Values are integer ranks and keys are integers into slices, specifying which slice.

  • loc_comm (mpi4py.MPI.Comm or ResourceAllocation or None) – The local communicator/ResourceAllocation for the group of processors which have been given the same loc_slice to compute, obtained by splitting comm. If loc_slice is unique to the current processor, or if allow_split_comm is False, None is returned.
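
Example (a minimal sketch, assuming an MPI launch and mpi4py; filling with the rank number is a placeholder computation):

    from mpi4py import MPI
    import numpy as np
    from pygsti.tools import mpitools

    comm = MPI.COMM_WORLD

    # Partition the index range [0, 100) among the available processors:
    slices, loc_slice, owners, loc_comm = mpitools.distribute_slice(slice(0, 100), comm)

    data = np.zeros(100)
    data[loc_slice] = comm.Get_rank()  # each rank fills only its own portion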

pygsti.tools.mpitools.gather_slices(slices, slice_owners, ar_to_fill, ar_to_fill_inds, axes, comm, max_buffer_size=None)

Gathers data within a numpy array, ar_to_fill, according to given slices.

Upon entry it is assumed that the different processors within comm have computed different parts of ar_to_fill, namely different slices of the axes indexed by axes. At exit, data has been gathered such that all processors have the results for the entire ar_to_fill (or at least for all the slices given).

Parameters
  • slices (list) – A list of all the slices (computed by any of the processors, not just the current one). Each element of slices may be either a single slice or a tuple of slices (when gathering across multiple dimensions).

  • slice_owners (dict) – A dictionary mapping the index of a slice (or tuple of slices) within slices to an integer rank of the processor responsible for communicating that slice’s data to the rest of the processors.

  • ar_to_fill (numpy.ndarray) – The array which contains partial data upon entry and the gathered data upon exit.

  • ar_to_fill_inds (list) – A list of slice or index-arrays specifying the (fixed) sub-array of ar_to_fill that should be gathered into. The elements of ar_to_fill_inds are taken to be indices for the leading dimension first, and any unspecified dimensions or None elements are assumed to be unrestricted (as if slice(None,None)). Note that the combination of ar_to_fill and ar_to_fill_inds is essentially like passing ar_to_fill[ar_to_fill_inds] to this function, except it will work with index arrays as well as slices.

  • axes (int or tuple of ints) – The axis or axes of ar_to_fill on which the slices apply (which axis do the slices in slices refer to?). Note that len(axes) must be equal to the number of slices (i.e. the tuple length) of each element of slices.

  • comm (mpi4py.MPI.Comm or ResourceAllocation or None) – The communicator specifying the processors involved and used to perform the gather operation. If a ResourceAllocation is provided, then inter-host communication is used when available to facilitate use of shared intra-host memory.

  • max_buffer_size (int or None) – The maximum buffer size in bytes that is allowed to be used for gathering data. If None, there is no limit.

Returns

None
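
Example (a minimal sketch continuing the distribute_slice() pattern above, assuming an MPI launch and mpi4py; the rank-valued fill is a placeholder computation):

    from mpi4py import MPI
    import numpy as np
    from pygsti.tools import mpitools

    comm = MPI.COMM_WORLD
    slices, loc_slice, owners, loc_comm = mpitools.distribute_slice(slice(0, 100), comm)

    ar = np.zeros(100)
    ar[loc_slice] = comm.Get_rank() + 1.0  # each rank computes only its slice

    # Gather so that every rank holds the complete array.  An empty
    # ar_to_fill_inds list leaves the destination unrestricted (the whole of
    # `ar`), and axes=0 says the slices index the first (only) axis.
    mpitools.gather_slices(slices, owners, ar, [], axes=0, comm=comm)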

pygsti.tools.mpitools.gather_slices_by_owner(current_slices, ar_to_fill, ar_to_fill_inds, axes, comm, max_buffer_size=None)

Gathers data within a numpy array, ar_to_fill, according to given slices.

Upon entry it is assumed that the different processors within comm have computed different parts of ar_to_fill, namely different slices of the axes indexed by axes. At exit, data has been gathered such that all processors have the results for the entire ar_to_fill (or at least for all the slices given).

Parameters
  • current_slices (list) – A list of all the slices computed by the current processor. Each element of current_slices may be either a single slice or a tuple of slices (when gathering across multiple dimensions).

  • ar_to_fill (numpy.ndarray) – The array which contains partial data upon entry and the gathered data upon exit.

  • ar_to_fill_inds (list) – A list of slice or index-arrays specifying the (fixed) sub-array of ar_to_fill that should be gathered into. The elements of ar_to_fill_inds are taken to be indices for the leading dimension first, and any unspecified dimensions or None elements are assumed to be unrestricted (as if slice(None,None)). Note that the combination of ar_to_fill and ar_to_fill_inds is essentially like passing ar_to_fill[ar_to_fill_inds] to this function, except it will work with index arrays as well as slices.

  • axes (int or tuple of ints) – The axis or axes of ar_to_fill on which the slices apply (which axis do the slices in current_slices refer to?). Note that len(axes) must be equal to the number of slices (i.e. the tuple length) of each element of current_slices.

  • comm (mpi4py.MPI.Comm or None) – The communicator specifying the processors involved and used to perform the gather operation.

  • max_buffer_size (int or None) – The maximum buffer size in bytes that is allowed to be used for gathering data. If None, there is no limit.

Returns

None

pygsti.tools.mpitools.gather_indices(indices, index_owners, ar_to_fill, ar_to_fill_inds, axes, comm, max_buffer_size=None)

Gathers data within a numpy array, ar_to_fill, according to given indices.

Upon entry it is assumed that the different processors within comm have computed different parts of ar_to_fill, namely different slices or index-arrays of the axes indexed by axes. At exit, data has been gathered such that all processors have the results for the entire ar_to_fill (or at least for all the indices given).

Parameters
  • indices (list) – A list of all the integer-arrays or slices (computed by any of the processors, not just the current one). Each element of indices may be either a single slice/index-array or a tuple of such elements (when gathering across multiple dimensions).

  • index_owners (dict) – A dictionary mapping the index of an element within indices to an integer rank of the processor responsible for communicating that slice/index-array’s data to the rest of the processors.

  • ar_to_fill (numpy.ndarray) – The array which contains partial data upon entry and the gathered data upon exit.

  • ar_to_fill_inds (list) – A list of slice or index-arrays specifying the (fixed) sub-array of ar_to_fill that should be gathered into. The elements of ar_to_fill_inds are taken to be indices for the leading dimension first, and any unspecified dimensions or None elements are assumed to be unrestricted (as if slice(None,None)). Note that the combination of ar_to_fill and ar_to_fill_inds is essentially like passing ar_to_fill[ar_to_fill_inds] to this function, except it will work with index arrays as well as slices.

  • axes (int or tuple of ints) – The axis or axes of ar_to_fill on which the slices apply (which axis do the elements of indices refer to?). Note that len(axes) must be equal to the number of sub-indices (i.e. the tuple length) of each element of indices.

  • comm (mpi4py.MPI.Comm or None) – The communicator specifying the processors involved and used to perform the gather operation.

  • max_buffer_size (int or None) – The maximum buffer size in bytes that is allowed to be used for gathering data. If None, there is no limit.

Returns

None

pygsti.tools.mpitools.distribute_for_dot(a_shape, b_shape, comm)

Prepares for one or multiple distributed dot products given the dimensions to be dotted.

The returned values should be passed to mpidot() as its loc_row_slice, loc_col_slice, and slice_tuples_by_rank arguments.

Parameters
  • a_shape (tuple) – The shape of the first array (a) that will be dotted in ensuing mpidot() calls (see above).

  • b_shape (tuple) – The shape of the second array (b) that will be dotted in ensuing mpidot() calls (see above).

  • comm (mpi4py.MPI.Comm or ResourceAllocation or None) – The communicator used to perform the distribution.

Returns

  • row_slice, col_slice (slice) – The “local” row slice of “A” and column slice of “B” belonging to the current processor, which computes result[row slice, col slice]. These should be passed to mpidot().

  • slice_tuples_by_rank (list) – A list of the (row_slice, col_slice) owned by each processor, ordered by rank. If a ResourceAllocation is given that utilizes shared memory, then this list is for the ranks in this processor’s inter-host communication group. This should be passed as the slice_tuples_by_rank argument of mpidot().

pygsti.tools.mpitools.mpidot(a, b, loc_row_slice, loc_col_slice, slice_tuples_by_rank, comm, out=None, out_shm=None)

Performs a distributed dot product, dot(a,b).

Parameters
  • a (numpy.ndarray) – First array to dot together.

  • b (numpy.ndarray) – Second array to dot together.

  • loc_row_slice (slice) – Specifies the row indices of the resulting dot product that are computed by this processor (i.e. the rows of a that are used). Obtained from distribute_for_dot().

  • loc_col_slice (slice) – Specifies the column indices of the resulting dot product that are computed by this processor (i.e. the columns of b that are used). Obtained from distribute_for_dot().

  • slice_tuples_by_rank (list) – A list of (row_slice, col_slice) tuples, one per processor within this processor’s broadcast group, ordered by rank. Provided by distribute_for_dot().

  • comm (mpi4py.MPI.Comm or ResourceAllocation or None) – The communicator used to parallelize the dot product. If a ResourceAllocation object is given, then a shared memory result will be returned when appropriate.

  • out (numpy.ndarray, optional) – If not None, the array to use for the result. This should be the same type of array (size, and whether it’s shared or not) as this function would have created if out were None.

  • out_shm (multiprocessing.shared_memory.SharedMemory, optional) – The shared memory object corresponding to out when it uses shared memory.

Returns

  • result (numpy.ndarray) – The resulting array

  • shm (multiprocessing.shared_memory.SharedMemory) – A shared memory object needed to clean up the shared memory. If a normal array is created, this is None. Provide this to cleanup_shared_ndarray() to ensure result is deallocated properly.
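
Example (a minimal sketch combining distribute_for_dot() and mpidot(), assuming an MPI launch and mpi4py; identical seeds make every rank construct the same a and b):

    from mpi4py import MPI
    import numpy as np
    from pygsti.tools import mpitools

    comm = MPI.COMM_WORLD
    a = np.random.default_rng(0).random((200, 50))
    b = np.random.default_rng(1).random((50, 120))

    # Decide which block of the result this processor computes ...
    row_slice, col_slice, slice_tuples_by_rank = mpitools.distribute_for_dot(
        a.shape, b.shape, comm)

    # ... then compute dot(a, b) cooperatively.
    result, shm = mpitools.mpidot(a, b, row_slice, col_slice,
                                  slice_tuples_by_rank, comm)
    # With a plain comm (no ResourceAllocation) a normal array is expected,
    # so `shm` should be None and no shared-memory cleanup is needed.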

pygsti.tools.mpitools.parallel_apply(f, l, comm)

Apply a function, f, to every element of a list, l, in parallel using MPI.

Parameters
  • f (function) – function of an item in the list l

  • l (list) – list of items as arguments to f

  • comm (MPI Comm) – MPI communicator object for organizing parallel programs

Returns

results (list) – The list of results from applying f to each element of l.
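
Example (a minimal sketch, assuming an MPI launch and mpi4py):

    from mpi4py import MPI
    from pygsti.tools import mpitools

    def f(x):
        return x ** 2  # placeholder per-item computation

    comm = MPI.COMM_WORLD
    results = mpitools.parallel_apply(f, list(range(16)), comm)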

pygsti.tools.mpitools.mpi4py_comm()

Get a comm object.

Returns

MPI.Comm – Comm object to be passed down to parallel pygsti routines.

pygsti.tools.mpitools.sum_across_procs(x, comm)

Sum a value across all processors in comm.

Parameters
  • x (object) – Local value - the current processor’s contribution to the sum.

  • comm (mpi4py.MPI.Comm) – MPI communicator

Returns

object – Of the same type as the x objects that were summed.
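
Example (a minimal sketch, assuming an MPI launch and mpi4py; summing the rank numbers is a placeholder):

    from mpi4py import MPI
    from pygsti.tools import mpitools

    comm = MPI.COMM_WORLD
    # Every rank contributes its own rank number; the grand total is returned.
    total = mpitools.sum_across_procs(comm.Get_rank(), comm)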

pygsti.tools.mpitools.processor_group_size(nprocs, number_of_tasks)

Find the number of groups into which to divide nprocs processors to tackle number_of_tasks tasks.

When number_of_tasks > nprocs, the smallest integer multiple of nprocs that equals or exceeds number_of_tasks is returned.

When number_of_tasks < nprocs, the smallest divisor of nprocs that equals or exceeds number_of_tasks is returned.

Parameters
  • nprocs (int) – The number of processors to divide into groups.

  • number_of_tasks (int or float) – The number of tasks to perform, which can also be seen as the desired number of processor groups. If a floating point value is given the next highest integer is used.

Returns

int
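
Example (a minimal sketch; the expected values follow directly from the two rules stated above):

    from pygsti.tools import mpitools

    # More tasks than processors: smallest integer multiple of nprocs
    # that equals or exceeds number_of_tasks.
    print(mpitools.processor_group_size(4, 6))  # expected: 8

    # Fewer tasks than processors: smallest divisor of nprocs that
    # equals or exceeds number_of_tasks.
    print(mpitools.processor_group_size(4, 3))  # expected: 4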

pygsti.tools.mpitools.sum_arrays(local_array, owners, comm)

Sums arrays across all “owner” processors.

Parameters
  • local_array (numpy.ndarray) – The array contributed by this processor. This array will be zeroed out on processors whose ranks are not in owners.

  • owners (list or set) – The ranks whose contributions should be summed. These are the ranks of the processors that “own” the responsibility to communicate their local array to the rest of the processors.

  • comm (mpi4py.MPI.Comm) – MPI communicator

Returns

numpy.ndarray – The summed local arrays.
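
Example (a minimal sketch, assuming an MPI launch with at least two ranks and mpi4py; the rank-valued arrays are placeholders):

    from mpi4py import MPI
    import numpy as np
    from pygsti.tools import mpitools

    comm = MPI.COMM_WORLD
    local = np.full(4, float(comm.Get_rank()))

    # Only the contributions of ranks 0 and 1 are summed; every other rank's
    # local array is zeroed out before the reduction.
    summed = mpitools.sum_arrays(local, {0, 1}, comm)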

pygsti.tools.mpitools.closest_divisor(a, b)

Returns the divisor of a that is closest to b.

Parameters
  • a (int) – The integer whose divisors are considered.

  • b (int) – The target value that the returned divisor should be close to.

Returns

int
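
Example (a minimal sketch; the expected value follows from the definition above):

    from pygsti.tools import mpitools

    # Divisors of 12 are 1, 2, 3, 4, 6 and 12; the one closest to 7 is 6.
    print(mpitools.closest_divisor(12, 7))  # expected: 6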