pygsti.tools.mpitools

Functions for working with MPI processor distributions

Module Contents

Functions

distribute_indices(indices, comm[, allow_split_comm])

Partition an array of indices (any type) evenly among comm's processors.

distribute_indices_base(indices, nprocs, rank[, ...])

Partition an array of "indices" evenly among a given number of "processors"

slice_up_slice(slc, num_slices)

Divides up slc into num_slices slices.

slice_up_range(n, num_slices[, start])

Divides up range(start,start+n) into num_slices slices.

distribute_slice(s, comm[, allow_split_comm])

Partition a continuous slice evenly among comm's processors.

gather_slices(slices, slice_owners, ar_to_fill, ...[, ...])

Gathers data within a numpy array, ar_to_fill, according to given slices.

gather_slices_by_owner(current_slices, ar_to_fill, ...)

Gathers data within a numpy array, ar_to_fill, according to given slices.

gather_indices(indices, index_owners, ar_to_fill, ...)

Gathers data within a numpy array, ar_to_fill, according to given indices.

distribute_for_dot(a_shape, b_shape, comm)

Prepares for one or multiple distributed dot products given the dimensions to be dotted.

mpidot(a, b, loc_row_slice, loc_col_slice, ...[, out, ...])

Performs a distributed dot product, dot(a,b).

parallel_apply(f, l, comm)

Apply a function, f, to every element of a list, l, in parallel, using MPI.

mpi4py_comm()

Get a comm object

sum_across_procs(x, comm)

Sum a value across all processors in comm.

processor_group_size(nprocs, number_of_tasks)

Find the number of groups to divide nprocs processors into to tackle number_of_tasks tasks.

sum_arrays(local_array, owners, comm)

Sums arrays across all "owner" processors.

closest_divisor(a, b)

Returns the divisor of a that is closest to b.

pygsti.tools.mpitools.distribute_indices(indices, comm, allow_split_comm=True)

Partition an array of indices (any type) evenly among comm’s processors.

Parameters

indices : list

An array of items (any type) which are to be partitioned.

comm : mpi4py.MPI.Comm or ResourceAllocation

The communicator which specifies the number of processors and which may be split into returned sub-communicators. If a ResourceAllocation object, node information is also taken into account when available (for shared memory compatibility).

allow_split_comm : bool

If True, when there are more processors than indices, multiple processors will be given the same set of local indices and comm will be split into sub-communicators, one for each group of processors that are given the same indices. If False, then “extra” processors are simply given nothing to do, i.e. empty lists of local indices.

Returns

loc_indices : list

A list containing the elements of indices belonging to the current processor.

owners : dict

A dictionary mapping the elements of indices to integer ranks, such that owners[el] gives the rank of the processor responsible for communicating that element's results to the other processors. Note that in the case when allow_split_comm=True and multiple processors have computed the results for a given element, only a single (the first) processor rank "owns" the element, and is thus responsible for sharing the results. This notion of ownership is useful when gathering the results.

loc_comm : mpi4py.MPI.Comm or ResourceAllocation or None

The local communicator for the group of processors which have been given the same loc_indices to compute, obtained by splitting comm. If loc_indices is unique to the current processor, or if allow_split_comm is False, None is returned.
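A minimal usage sketch (run under an MPI launcher such as mpiexec; the work items and the stand-in computation are purely illustrative):

    # Illustrative sketch: split a list of work items across the ranks of comm.
    from mpi4py import MPI
    from pygsti.tools import mpitools

    comm = MPI.COMM_WORLD
    work_items = ['itemA', 'itemB', 'itemC', 'itemD', 'itemE']  # hypothetical items

    loc_items, owners, loc_comm = mpitools.distribute_indices(work_items, comm)

    # Each rank processes only its local share; `owners` records which rank is
    # responsible for communicating each item's result to the others.
    print("rank %d handles %s" % (comm.Get_rank(), loc_items))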

pygsti.tools.mpitools.distribute_indices_base(indices, nprocs, rank, allow_split_comm=True)

Partition an array of “indices” evenly among a given number of “processors”

This function is similar to distribute_indices(), but allows for a more generalized notion of what a "processor" is, since the number of processors and rank are given independently and do not have to be associated with an MPI comm. Note also that indices can be an arbitrary list of items, making this function very general.

Parameters

indices : list

An array of items (any type) which are to be partitioned.

nprocs : int

The number of “processors” to distribute the elements of indices among.

rank : int

The rank of the current “processor” (must be an integer between 0 and nprocs-1). Note that this value is not obtained from any MPI communicator.

allow_split_comm : bool

If True, when there are more processors than indices, multiple processors will be given the same set of local indices. If False, then extra processors are simply given nothing to do, i.e. empty lists of local indices.

Returns

loc_indices : list

A list containing the elements of indices belonging to the current processor (i.e. the one specified by rank).

owners : dict

A dictionary mapping the elements of indices to integer ranks, such that owners[el] gives the rank of the processor responsible for communicating that element's results to the other processors. Note that in the case when allow_split_comm=True and multiple processors have computed the results for a given element, only a single (the first) processor rank "owns" the element, and is thus responsible for sharing the results. This notion of ownership is useful when gathering the results.
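Because nprocs and rank are passed explicitly, this function can be exercised without any MPI launch at all; a small illustrative sketch:

    # Preview how 10 indices would be split over 4 hypothetical "processors".
    from pygsti.tools import mpitools

    indices = list(range(10))
    for rank in range(4):
        loc_indices, owners = mpitools.distribute_indices_base(indices, 4, rank)
        print("rank %d -> %s" % (rank, loc_indices))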

pygsti.tools.mpitools.slice_up_slice(slc, num_slices)

Divides up slc into num_slices slices.

Parameters

slc : slice

The slice to be divided.

num_slices : int

The number of slices to divide the range into.

Returns

list of slices
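For example (the exact boundaries are an implementation detail; the sub-slices cover slc consecutively):

    from pygsti.tools import mpitools

    sub_slices = mpitools.slice_up_slice(slice(10, 30), 4)
    # Expect 4 consecutive sub-slices covering slice(10, 30), e.g.
    # [slice(10, 15), slice(15, 20), slice(20, 25), slice(25, 30)]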

pygsti.tools.mpitools.slice_up_range(n, num_slices, start=0)

Divides up range(start,start+n) into num_slices slices.

Parameters

n : int

The number of (consecutive) indices in the range to be divided.

num_slices : int

The number of slices to divide the range into.

start : int, optional

The starting entry of the range, so that the range to be divided is range(start,start+n).

Returns

list of slices
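For example (how any remainder is distributed among the slices is an implementation detail):

    from pygsti.tools import mpitools

    parts = mpitools.slice_up_range(10, 3)
    # Expect 3 consecutive slices partitioning range(0, 10),
    # e.g. [slice(0, 4), slice(4, 7), slice(7, 10)]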

pygsti.tools.mpitools.distribute_slice(s, comm, allow_split_comm=True)

Partition a continuous slice evenly among comm’s processors.

This function is similar to distribute_indices(), but is specific to the case when the indices being distributed are a consecutive set of integers (specified by a slice).

Parameters

s : slice

The slice to be partitioned.

comm : mpi4py.MPI.Comm or ResourceAllocation

The communicator which specifies the number of processors and which may be split into returned sub-communicators. If a ResourceAllocation object, node information is also taken into account when available (for shared memory compatibility).

allow_split_comm : bool

If True, when there are more processors than slice indices, multiple processors will be given the same local slice and comm will be split into sub-communicators, one for each group of processors that are given the same local slice. If False, then “extra” processors are simply given nothing to do, i.e. an empty local slice.

Returns

slices : list of slices

The list of unique slices assigned to different processors. It's possible that a single slice (i.e. element of slices) is assigned to multiple processors (when there are more processors than indices in s).

loc_slice : slice

A slice specifying the indices belonging to the current processor.

owners : dict

A dictionary giving the owning rank of each slice. Values are integer ranks and keys are integer indices into slices, specifying which slice.

loc_comm : mpi4py.MPI.Comm or ResourceAllocation or None

The local communicator/ResourceAllocation for the group of processors which have been given the same loc_slice to compute, obtained by splitting comm. If loc_slice is unique to the current processor, or if allow_split_comm is False, None is returned.
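A minimal sketch (run under MPI; the range length is arbitrary):

    # See which portion of a length-100 range this rank owns.
    from mpi4py import MPI
    from pygsti.tools import mpitools

    comm = MPI.COMM_WORLD
    slices, loc_slice, owners, loc_comm = mpitools.distribute_slice(slice(0, 100), comm)
    print("rank %d owns %s" % (comm.Get_rank(), loc_slice))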

pygsti.tools.mpitools.gather_slices(slices, slice_owners, ar_to_fill, ar_to_fill_inds, axes, comm, max_buffer_size=None)

Gathers data within a numpy array, ar_to_fill, according to given slices.

Upon entry it is assumed that the different processors within comm have computed different parts of ar_to_fill, namely different slices along the axes indexed by axes. At exit, data has been gathered such that all processors have the results for the entire ar_to_fill (or at least for all the slices given).

Parameters

slices : list

A list of all the slices (computed by any of the processors, not just the current one). Each element of slices may be either a single slice or a tuple of slices (when gathering across multiple dimensions).

slice_owners : dict

A dictionary mapping the index of a slice (or tuple of slices) within slices to an integer rank of the processor responsible for communicating that slice’s data to the rest of the processors.

ar_to_fill : numpy.ndarray

The array which contains partial data upon entry and the gathered data upon exit.

ar_to_fill_inds : list

A list of slice or index-arrays specifying the (fixed) sub-array of ar_to_fill that should be gathered into. The elements of ar_to_fill_inds are taken to be indices for the leading dimension first, and any unspecified dimensions or None elements are assumed to be unrestricted (as if slice(None,None)). Note that the combination of ar_to_fill and ar_to_fill_inds is essentially like passing ar_to_fill[ar_to_fill_inds] to this function, except it will work with index arrays as well as slices.

axes : int or tuple of ints

The axis or axes of ar_to_fill on which the slices apply (which axis do the slices in slices refer to?). Note that len(axes) must be equal to the number of slices (i.e. the tuple length) of each element of slices.

comm : mpi4py.MPI.Comm or ResourceAllocation or None

The communicator specifying the processors involved and used to perform the gather operation. If a ResourceAllocation is provided, then inter-host communication is used when available to facilitate use of shared intra-host memory.

max_buffer_size : int or None

The maximum buffer size in bytes that is allowed to be used for gathering data. If None, there is no limit.

Returns

None
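A hedged round-trip sketch combining distribute_slice() with gather_slices(), so every rank ends up with the complete array (an empty ar_to_fill_inds list is used to leave all dimensions unrestricted, per the parameter description above; the fill computation is a stand-in):

    import numpy as np
    from mpi4py import MPI
    from pygsti.tools import mpitools

    comm = MPI.COMM_WORLD
    result = np.zeros(100, dtype='d')

    slices, loc_slice, owners, loc_comm = mpitools.distribute_slice(slice(0, 100), comm)
    result[loc_slice] = comm.Get_rank()  # each rank fills only its own portion

    # Gather every rank's slice so all processors hold the full `result`.
    mpitools.gather_slices(slices, owners, result, [], axes=0, comm=comm)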

pygsti.tools.mpitools.gather_slices_by_owner(current_slices, ar_to_fill, ar_to_fill_inds, axes, comm, max_buffer_size=None)

Gathers data within a numpy array, ar_to_fill, according to given slices.

Upon entry it is assumed that the different processors within comm have computed different parts of ar_to_fill, namely different slices of the axes indexed by axes. At exit, data has been gathered such that all processors have the results for the entire ar_to_fill (or at least for all the slices given).

Parameters

current_slices : list

A list of all the slices computed by the current processor. Each element of slices may be either a single slice or a tuple of slices (when gathering across multiple dimensions).

ar_to_fill : numpy.ndarray

The array which contains partial data upon entry and the gathered data upon exit.

ar_to_fill_inds : list

A list of slice or index-arrays specifying the (fixed) sub-array of ar_to_fill that should be gathered into. The elements of ar_to_fill_inds are taken to be indices for the leading dimension first, and any unspecified dimensions or None elements are assumed to be unrestricted (as if slice(None,None)). Note that the combination of ar_to_fill and ar_to_fill_inds is essentially like passing ar_to_fill[ar_to_fill_inds] to this function, except it will work with index arrays as well as slices.

axes : int or tuple of ints

The axis or axes of ar_to_fill on which the slices apply (which axis do the slices in slices refer to?). Note that len(axes) must be equal to the number of slices (i.e. the tuple length) of each element of slices.

comm : mpi4py.MPI.Comm or None

The communicator specifying the processors involved and used to perform the gather operation.

max_buffer_size : int or None

The maximum buffer size in bytes that is allowed to be used for gathering data. If None, there is no limit.

Returns

None

pygsti.tools.mpitools.gather_indices(indices, index_owners, ar_to_fill, ar_to_fill_inds, axes, comm, max_buffer_size=None)

Gathers data within a numpy array, ar_to_fill, according to given indices.

Upon entry it is assumed that the different processors within comm have computed different parts of ar_to_fill, namely different slices or index-arrays of the axis-th axis. At exit, data has been gathered such that all processors have the results for the entire ar_to_fill (or at least for all the indices given).

Parameters

indices : list

A list of all the integer-arrays or slices (computed by any of the processors, not just the current one). Each element of indices may be either a single slice/index-array or a tuple of such elements (when gathering across multiple dimensions).

index_owners : dict

A dictionary mapping the index of an element within slices to an integer rank of the processor responsible for communicating that slice/index-array’s data to the rest of the processors.

ar_to_fill : numpy.ndarray

The array which contains partial data upon entry and the gathered data upon exit.

ar_to_fill_inds : list

A list of slice or index-arrays specifying the (fixed) sub-array of ar_to_fill that should be gathered into. The elements of ar_to_fill_inds are taken to be indices for the leading dimension first, and any unspecified dimensions or None elements are assumed to be unrestricted (as if slice(None,None)). Note that the combination of ar_to_fill and ar_to_fill_inds is essentially like passing ar_to_fill[ar_to_fill_inds] to this function, except it will work with index arrays as well as slices.

axes : int or tuple of ints

The axis or axes of ar_to_fill on which the slices apply (which axis do the elements of indices refer to?). Note that len(axes) must be equal to the number of sub-indices (i.e. the tuple length) of each element of indices.

comm : mpi4py.MPI.Comm or None

The communicator specifying the processors involved and used to perform the gather operation.

max_buffer_size : int or None

The maximum buffer size in bytes that is allowed to be used for gathering data. If None, there is no limit.

Returns

None

pygsti.tools.mpitools.distribute_for_dot(a_shape, b_shape, comm)

Prepares for one or multiple distributed dot products given the dimensions to be dotted.

The returned values should be passed as loc_slices to mpidot().

Parameters

a_shape, b_shape : tuple

The shapes of the arrays that will be dotted together in ensuing mpidot() calls (see above).

comm : mpi4py.MPI.Comm or ResourceAllocation or None

The communicator used to perform the distribution.

Returns

row_slice, col_slice : slice

The "local" row slice of "A" and column slice of "B" belonging to the current processor, which computes result[row_slice, col_slice]. These should be passed to mpidot().

slice_tuples_by_rank : list

A list of the (row_slice, col_slice) owned by each processor, ordered by rank. If a ResourceAllocation is given that utilizes shared memory, then this list is for the ranks in this processor’s inter-host communication group. This should be passed as the slice_tuples_by_rank argument of mpidot().

pygsti.tools.mpitools.mpidot(a, b, loc_row_slice, loc_col_slice, slice_tuples_by_rank, comm, out=None, out_shm=None)

Performs a distributed dot product, dot(a,b).

Parameters

a : numpy.ndarray

First array to dot together.

b : numpy.ndarray

Second array to dot together.

loc_row_slice, loc_col_slice : slice

Specify the row or column indices, respectively, of the resulting dot product that are computed by this processor (the rows of a and columns of b that are used). Obtained from distribute_for_dot().

slice_tuples_by_rank : list

A list of (row_slice, col_slice) tuples, one per processor within this processor's broadcast group, ordered by rank. Provided by distribute_for_dot().

comm : mpi4py.MPI.Comm or ResourceAllocation or None

The communicator used to parallelize the dot product. If a ResourceAllocation object is given, then a shared memory result will be returned when appropriate.

out : numpy.ndarray, optional

If not None, the array to use for the result. This should be the same type of array (size, and whether it’s shared or not) as this function would have created if out were None.

out_shm : multiprocessing.shared_memory.SharedMemory, optional

The shared memory object corresponding to out when it uses shared memory.

Returns

result : numpy.ndarray

The resulting array.

shm : multiprocessing.shared_memory.SharedMemory

A shared memory object needed to cleanup the shared memory. If a normal array is created, this is None. Provide this to cleanup_shared_ndarray() to ensure the returned array is deallocated properly.
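A hedged usage sketch combining distribute_for_dot() with mpidot() (the shapes and seeds are arbitrary; every rank must hold the same a and b, and with a plain MPI communicator the returned shm is expected to be None, per the descriptions above):

    import numpy as np
    from mpi4py import MPI
    from pygsti.tools import mpitools

    comm = MPI.COMM_WORLD
    a = np.random.default_rng(0).random((200, 50))   # identical on every rank
    b = np.random.default_rng(1).random((50, 300))   # identical on every rank

    row_slice, col_slice, slice_tuples = mpitools.distribute_for_dot(a.shape, b.shape, comm)
    result, shm = mpitools.mpidot(a, b, row_slice, col_slice, slice_tuples, comm)

    assert np.allclose(result, np.dot(a, b))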

pygsti.tools.mpitools.parallel_apply(f, l, comm)

Apply a function, f, to every element of a list, l, in parallel, using MPI.

Parameters

f : function

A function that takes a single element of the list l as its argument.

l : list

The list of items to which f will be applied.

comm : MPI Comm

MPI communicator object for organizing parallel programs

Returns

results : list

list of items after f has been applied
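A small illustrative sketch (assuming the applied results are gathered back to every rank, as the return description above indicates):

    from mpi4py import MPI
    from pygsti.tools import mpitools

    comm = MPI.COMM_WORLD
    results = mpitools.parallel_apply(lambda x: x ** 2, list(range(16)), comm)
    # results == [0, 1, 4, ..., 225] on each rank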

pygsti.tools.mpitools.mpi4py_comm()

Get a comm object

Returns

MPI.Comm

Comm object to be passed down to parallel pygsti routines

pygsti.tools.mpitools.sum_across_procs(x, comm)

Sum a value across all processors in comm.

Parameters

x : object

Local value - the current processor's contribution to the sum.

comm : mpi4py.MPI.Comm

MPI communicator

Returns

object

Of the same type as the x objects that were summed.
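A brief sketch, using mpi4py_comm() above to obtain the communicator (assumed here to span all launched processes):

    from pygsti.tools import mpitools

    comm = mpitools.mpi4py_comm()
    total = mpitools.sum_across_procs(comm.Get_rank(), comm)
    # With 4 processes, every rank receives total == 0 + 1 + 2 + 3 == 6.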

pygsti.tools.mpitools.processor_group_size(nprocs, number_of_tasks)

Find the number of groups to divide nprocs processors into to tackle number_of_tasks tasks.

When number_of_tasks > nprocs, the smallest integer multiple of nprocs that equals or exceeds number_of_tasks is returned.

When number_of_tasks < nprocs, the smallest divisor of nprocs that equals or exceeds number_of_tasks is returned.

Parameters

nprocs : int

The number of processors to divide into groups.

number_of_tasks : int or float

The number of tasks to perform, which can also be seen as the desired number of processor groups. If a floating point value is given the next highest integer is used.

Returns

int
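Two illustrative calls whose expected values follow directly from the rules above:

    from pygsti.tools import mpitools

    mpitools.processor_group_size(16, 3)   # 3 < 16: smallest divisor of 16 that is >= 3 -> 4
    mpitools.processor_group_size(4, 10)   # 10 > 4: smallest multiple of 4 that is >= 10 -> 12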

pygsti.tools.mpitools.sum_arrays(local_array, owners, comm)

Sums arrays across all “owner” processors.

Parameters

local_array : numpy.ndarray

The array contributed by this processor. This array will be zeroed out on processors whose ranks are not in owners.

owners : list or set

The ranks whose contributions should be summed. These are the ranks of the processors that “own” the responsibility to communicate their local array to the rest of the processors.

comm : mpi4py.MPI.Comm

MPI communicator

Returns

numpy.ndarray

The summed local arrays.
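An illustrative sketch (run with at least two ranks; the array contents are placeholders):

    import numpy as np
    from mpi4py import MPI
    from pygsti.tools import mpitools

    comm = MPI.COMM_WORLD
    local_array = np.full(3, float(comm.Get_rank()))

    # Only ranks 0 and 1 contribute; other ranks' arrays are zeroed first.
    summed = mpitools.sum_arrays(local_array, {0, 1}, comm)
    # Every rank then expects summed == [1., 1., 1.]  (rank 0 contributes zeros).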

pygsti.tools.mpitools.closest_divisor(a, b)

Returns the divisor of a that is closest to b.

Parameters

a, b : int

Returns

int
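For example (ties between equally close divisors are an implementation detail):

    from pygsti.tools import mpitools

    mpitools.closest_divisor(12, 7)   # divisors of 12 are 1, 2, 3, 4, 6, 12 -> expect 6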