pygsti.layouts.distlayout

Defines the DistributableCOPALayout class.

Module Contents

Classes

DistributableCOPALayout

A circuit-outcome-probability-array (COPA) layout that is distributed among many processors.

class pygsti.layouts.distlayout.DistributableCOPALayout(circuits, unique_circuits, to_unique, unique_complete_circuits, create_atom_fn, create_atom_args, num_atom_processors, num_param_dimension_processors=(), param_dimensions=(), param_dimension_blk_sizes=(), resource_alloc=None, verbosity=0)

Bases: pygsti.layouts.copalayout.CircuitOutcomeProbabilityArrayLayout

A circuit-outcome-probability-array (COPA) layout that is distributed among many processors.

This layout divides the work of computing arrays with one dimension corresponding to the layout’s “elements” (circuit outcomes) and 0, 1, or 2 parameter dimensions corresponding to first or second derivatives of a by-element quantity with respect to a model’s parameters.

The size of the element dimension is given by the number of unique circuits and the outcomes retained for each circuit. Computation along the element dimension is broken into “atoms”, which hold a slice that indexes the element dimension along with the necessary information (used by a forward simulator) to compute those elements. This often includes the circuits and outcomes an atom’s elements correspond to, and perhaps precomputed structures for speeding up the circuit computation. An atom-creating function is used to initialize a DistributableCOPALayout.

Technical note: the atoms themselves determine which outcomes for each circuit are included in the layout, so the layout doesn’t know how many elements it contains until the atoms are created. This makes for an awkward _update_indices callback that adjusts an atom’s indices based on the selected circuits of the (local) layout, since this selection can only be performed after the atoms are created.

The size of the parameter dimensions is given directly via the param_dimensions argument. These dimensions are divided into “blocks” (slices of the entire dimension) but there is no analogous atom-like object for the blocks, as there isn’t any need to hold meta-data specific to a block. The size of the parameter-blocks is essentially constant along each parameter dimension, and specified by the param_dimension_blk_sizes argument.

Along each of the (possible) array dimensions, we also assign a number of atom (for the element dimension) or block (for the parameter dimensions) “processors”. These are not physical CPUs but logical objects that process atoms or blocks, respectively. A single atom processor is assigned one or more atoms to process, and similarly with block processors.

The total number of physical processors, N, is arranged in a grid so that:

N = num_atom_processors x num_param1_processors x num_param2_processors

This may restrict the allowed values of N when the number of atom/block processors is fixed or constrained. The reason there are two levels of “breaking up” the computation is so that intermediate memory usage can be controlled. If we merged the notion of atoms and atom-processors, for instance, so that each atom processor always had exactly 1 atom to process, then the only way to divide up a computation would be to use more processors. Since computations can involve intermediate memory usage that far exceeds the memory required to hold the results, it is useful to be able to break up a computation into chunks even when there is, e.g., just a single processor. Separating atoms/blocks from atom-processors and param-block-processors allows us to divide a computation into chunks that use manageable amounts of intermediate memory regardless of the number of processors available. When intermediate memory is not a concern, there is no reason to assign more than one atom/block to its corresponding processor type.
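
The following sketch (plain Python, not part of pyGSTi; the helper name is purely illustrative) shows the grid constraint concretely: a given total processor count N is only usable when it factors exactly into the requested numbers of atom- and param-block-processors.

    # Illustrative helper (not pyGSTi code): N must factor exactly into the grid
    # num_atom_processors x num_param1_processors x num_param2_processors.
    def grid_is_valid(num_procs, num_atom_procs, num_param_procs=()):
        """Return True if num_procs can be arranged into the required grid."""
        np1, np2 = (tuple(num_param_procs) + (1, 1))[:2]  # missing param dims default to 1
        return num_procs == num_atom_procs * np1 * np2

    assert grid_is_valid(12, num_atom_procs=3, num_param_procs=(4,))      # 3 * 4 * 1 == 12
    assert not grid_is_valid(10, num_atom_procs=3, num_param_procs=(4,))  # 3 * 4 * 1 != 10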

When creating a DistributableCOPALayout the caller can separately specify the number of atoms (length of create_atom_args) or the size of parameter blocks and the number of atom-processors or the number of param-block-processors.

Furthermore, a ResourceAllocation object can be given that specifies a shared-memory structure for the physical processors, in which the total number of cores is divided into node-groups that are able to share memory. The cores are then divided as follows:

  • first, we divide the cores into atom-processing groups, i.e. “atom-processors”. An atom-processor is most accurately seen as a comm (group of processors). If shared memory is being used, either the entire atom-processor must be contained within a single node OR the atom-processor must contain an integer number of nodes (it cannot contain a mixed fractional number of nodes, e.g. 1+1/2).

  • each atom processor is divided into param1-processors, which process sections of arrays within that atom processor’s element slice and within the param1-processor’s parameter slice. Similarly, each param1-processor cannot contain a mixed fractional number of nodes: it must span either a fraction (< 1) of a node or an integer number of nodes.

  • each param1-processor is divided into param2-processors, with exactly the same rules as for the param1-processors.

These nested MPI communicators neatly divide up the entries of arrays that have shape (nElements, nParams1, nParams2) or arrays with fewer dimensions, in which case processors that would have computed different entries of a missing dimension just duplicate the computation of array entries in the existing dimensions.
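
As a toy illustration of this slicing (plain NumPy, with made-up sizes and slice assignments), each processor in the grid owns the block selected by its atom-processor’s element slice and its param-processors’ parameter slices:

    import numpy as np

    # Made-up sizes and slices, purely for illustration.
    nE, nP1, nP2 = 6, 4, 4
    full = np.arange(nE * nP1 * nP2).reshape(nE, nP1, nP2)

    elem_slices = [slice(0, 3), slice(3, 6)]   # 2 atom-processors
    p1_slices = [slice(0, 2), slice(2, 4)]     # 2 param1-processors per atom-processor
    p2_slices = [slice(0, 2), slice(2, 4)]     # 2 param2-processors per param1-processor

    # The block owned by (atom-processor 1, param1-processor 0, param2-processor 1):
    local_block = full[elem_slices[1], p1_slices[0], p2_slices[1]]
    assert local_block.shape == (3, 2, 2)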

Arrays will also be used that do not have a leading nElements dimension (e.g. when element-contributions have been summed over), with shapes involving just the parameter dimensions. For these arrays, we also construct a “fine” processor grouping where all the cores are divided among the (first) parameter dimension. The array types “jtf” and “jtj” are distributed according to this “fine” grouping.

Parameters

circuits : list of Circuits

The circuits whose outcome probabilities are to be computed. This list may contain duplicates.

unique_circuits : list of Circuits

The same as circuits, except duplicates are removed. Often this value is obtained by a derived class calling the class method _compute_unique_circuits().

to_unique : dict

A mapping that translates an index into circuits to one into unique_circuits. Keys are the integers 0 to len(circuits) and values are indices into unique_circuits.

unique_complete_circuits : list, optional

A list, parallel to unique_circuits, that contains the “complete” version of these circuits. This information is currently unused, and is included for potential future expansion and flexibility.

create_atom_fn : function

A function that creates an atom when given one of the elements of create_atom_args.

create_atom_args : list

A list of tuples such that each element is a tuple of arguments for create_atom_fn. The length of this list specifies the number of atoms, and the caller must provide the same list on all processors. When the layout is created, create_atom_fn will be used to create some subset of the atoms on each processor.

num_atom_processors : int

The number of “atom processors”. An atom processor is not a physical processor, but a group of physical processors that is assigned one or more of the atoms (see above).

num_param_dimension_processors : tuple, optional

A 1- or 2-tuple of integers specifying how many parameter-block processors (again, not physical processors, but groups of processors that are assigned to parameter blocks) are used when dividing the physical processors into a grid. The first and second elements correspond to counts for the first and second parameter dimensions, respectively.

param_dimensions : tuple, optional

The full (global) number of parameters along each parameter dimension. Can be an empty, 1-, or 2-tuple of integers which dictates how many parameter dimensions this layout supports.

param_dimension_blk_sizes : tuple, optional

The parameter block sizes along each present parameter dimension, so this should be the same shape as param_dimensions. A block size of None means that there should be no division into blocks, and that each block processor computes all of its parameter indices at once.

resource_alloc : ResourceAllocation, optional

The resources available for computing circuit outcome probabilities.

verbosity : int or VerbosityPrinter

Determines how much output to send to stdout. 0 means no output, higher integers mean more output.
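
In practice a DistributableCOPALayout is usually not constructed directly; a forward simulator builds one (or a subclass of one) for you. The sketch below assumes pyGSTi’s model.sim.create_layout API and the smq1Q_XYI model pack behave as expected, and is meant as an illustration rather than a definitive recipe.

    from pygsti.modelpacks import smq1Q_XYI

    # Hedged sketch: assumes the forward simulator's create_layout method and
    # the smq1Q_XYI model pack; array_types hints which array shapes will later
    # be allocated from the layout.
    model = smq1Q_XYI.target_model()
    circuits = smq1Q_XYI.prep_fiducials()[:4]
    layout = model.sim.create_layout(circuits, array_types=('e', 'ep'))
    print(type(layout).__name__, layout.num_elements)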

property max_atom_elements

The most elements owned by a single atom.

property max_atom_cachesize

The largest cache size among all this layout’s atoms.

property global_layout

The global layout that this layout is or is a part of. Cannot be comm-dependent.

resource_alloc(sub_alloc_name=None, empty_if_missing=True)

Retrieves the resource-allocation object for this layout.

Sub-resource-allocations can also be obtained by passing a non-None sub_alloc_name.

Parameters
sub_alloc_name : str

The name of the sub-resource-allocation to retrieve.

empty_if_missing : bool

When True, an empty resource allocation object is returned when sub_alloc_name doesn’t exist for this layout. Otherwise a KeyError is raised when this occurs.

Returns

ResourceAllocation
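
A brief usage sketch, assuming layout is an existing DistributableCOPALayout; the sub-allocation name used below is illustrative only and may differ between pyGSTi versions.

    # `layout` is assumed to be a DistributableCOPALayout created elsewhere.
    ralloc = layout.resource_alloc()  # this layout's own ResourceAllocation

    # Look up a named sub-allocation.  With empty_if_missing=True (the default),
    # an unknown name yields an empty ResourceAllocation rather than a KeyError.
    # The name 'param-fine' is illustrative only.
    fine_ralloc = layout.resource_alloc('param-fine')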

allocate_local_array(array_type, dtype, zero_out=False, memory_tracker=None, extra_elements=0)

Allocate an array that is distributed according to this layout.

Creates an array for holding elements and/or derivatives with respect to model parameters, possibly distributed among multiple processors as dictated by this layout.

Parameters
array_type : ("e", "ep", "ep2", "epp", "p", "jtj", "jtf", "c", "cp", "cp2", "cpp")

The type of array to allocate, often corresponding to the array shape. Let nE be the layout’s number of elements, nP1 and nP2 be the number of parameters we differentiate with respect to (for first and second derivatives), and nC be the number of circuits. Then the array types designate the following array shapes:

  • "e": (nE,)
  • "ep": (nE, nP1)
  • "ep2": (nE, nP2)
  • "epp": (nE, nP1, nP2)
  • "p": (nP1,)
  • "jtj": (nP1, nP2)
  • "jtf": (nP1,)
  • "c": (nC,)
  • "cp": (nC, nP1)
  • "cp2": (nC, nP2)
  • "cpp": (nC, nP1, nP2)

Note that, even though the "p" and "jtf" types have the same shape, they are used for different purposes and are distributed differently when there are multiple processors. The "p" type is for use with other element-dimension-containing arrays, whereas the "jtf" type assumes that the element dimension has already been summed over.

dtype : numpy.dtype

The NumPy data type for the array.

zero_out : bool, optional

Whether the array should be zeroed out initially.

memory_tracker : ResourceAllocation, optional

If not None, the amount of memory being allocated is added to this resource allocation object using its add_tracked_memory() method.

extra_elements : int, optional

The number of additional “extra” elements to append to the element dimension, beyond those called for by this layout. Such additional elements are used to store penalty terms that are treated by the objective function just like usual outcome-probability-type terms.

Returns
LocalNumpyArray

An array that looks and acts just like a normal NumPy array, but potentially with internal handles to utilize shared memory.

free_local_array(local_array)

Frees an array allocated by allocate_local_array().

This method should always be paired with a call to allocate_local_array(), since the allocated array may utilize shared memory, which must be explicitly de-allocated.

Parameters
local_array : numpy.ndarray or LocalNumpyArray

The array to free, as returned from allocate_local_array.

Returns

None
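
A usage sketch of the allocate/free pairing, assuming layout is an existing DistributableCOPALayout ('d' is the NumPy dtype code for float64):

    # Allocate local (possibly shared-memory) arrays through the layout.
    probs = layout.allocate_local_array('e', 'd', zero_out=True)    # local element values
    dprobs = layout.allocate_local_array('ep', 'd', zero_out=True)  # local first derivatives

    # ... fill the arrays (e.g. via a forward simulator) and use them ...

    # Each allocation must be paired with a free, since the arrays may live
    # in shared memory that has to be explicitly released.
    layout.free_local_array(dprobs)
    layout.free_local_array(probs)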

gather_local_array_base(array_type, array_portion, extra_elements=0, all_gather=False, return_shared=False)

Gathers an array onto the root processor or all the processors.

Gathers the portions of an array that was distributed using this layout (i.e. according to the host_element_slice, etc. slices in this layout). This could be an array allocated by allocate_local_array() but need not be, as this routine does not require that array_portion be shared. Arrays can be 1, 2, or 3-dimensional. The dimensions are understood to be along the “element”, “parameter”, and “2nd parameter” directions in that order.

Parameters
array_type : ("e", "ep", "ep2", "epp", "p", "jtj", "jtf", "c", "cp", "cp2", "cpp")

The type of array to allocate, often corresponding to the array shape. See allocate_local_array() for a more detailed description.

array_portion : numpy.ndarray

The portion of the final array that is local to the calling processor. This should be a shared memory array when a resource_alloc with shared memory enabled was used to construct this layout.

extra_elements : int, optional

The number of additional “extra” elements to append to the element dimension, beyond those called for by this layout. Should match usage in allocate_local_array().

all_gather : bool, optional

Whether the result should be returned on all the processors (when all_gather=True) or just the rank-0 processor (when all_gather=False).

return_shared : bool, optional

Whether the returned array is allowed to be a shared-memory array, which results in a small performance gain because the array used internally to gather the results can be returned directly. When True a shared memory handle is also returned, and the caller assumes responsibility for freeing the memory via pygsti.tools.sharedmemtools.cleanup_shared_ndarray().

Returns
gathered_array : numpy.ndarray or None

The full (global) output array on the root (rank=0) processor and None on all other processors, unless all_gather == True, in which case the array is returned on all the processors.

shared_memory_handle : multiprocessing.shared_memory.SharedMemory or None

Returned only when return_shared == True. The shared memory handle associated with gathered_array, which is needed to free the memory.
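
A sketch of gathering a distributed "e"-type array onto the root processor, assuming layout is an existing DistributableCOPALayout:

    local_e = layout.allocate_local_array('e', 'd', zero_out=True)
    local_e[:] = 1.0  # fill the locally-owned elements

    global_e = layout.gather_local_array_base('e', local_e)  # ndarray on rank 0, None elsewhere
    if global_e is not None:
        print(global_e.shape)  # the full (global) element dimension

    layout.free_local_array(local_e)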

allsum_local_quantity(typ, value, use_shared_mem='auto')

Sums a locally-computed quantity over all the processors.

Performs an all-reduce (sum) of value across the processors used by this layout, so that every processor ends up with the global total. When shared memory is in use, the reduction takes the shared-memory structure into account.

Parameters
typ : str

The type of the quantity being summed, indicating which array dimension(s) it is associated with (see allocate_local_array()) and therefore how it is distributed among the processors.

value : float or numpy.ndarray

The locally-computed quantity to sum.

use_shared_mem : bool or "auto", optional

Whether shared memory should be used (when available) to perform the sum. The default, "auto", selects an appropriate behavior automatically.

Returns
float or numpy.ndarray

The global (summed) quantity.

fill_jtf(j, f, jtf)

Calculate the matrix-vector product j.T @ f.

Here j is often a jacobian matrix, and f a vector of objective function term values. j and f must be local arrays, created with allocate_local_array(). This function performs any necessary MPI/shared-memory communication when the arrays are distributed over multiple processors.

Parameters
j : LocalNumpyArray

A local 2D array (matrix) allocated using allocate_local_array with the “ep” (jacobian) type.

f : LocalNumpyArray

A local array allocated using allocate_local_array with the “e” (element array) type.

jtf : LocalNumpyArray

The result. This must be a pre-allocated local array of type “jtf”.

Returns

None
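
A usage sketch, assuming layout is an existing DistributableCOPALayout ('d' denotes float64):

    j = layout.allocate_local_array('ep', 'd')     # local Jacobian block
    f = layout.allocate_local_array('e', 'd')      # local objective-term values
    jtf = layout.allocate_local_array('jtf', 'd')  # pre-allocated result

    # ... fill j and f with locally computed values ...
    layout.fill_jtf(j, f, jtf)  # jtf now holds this processor's portion of j.T @ f

    for arr in (jtf, f, j):
        layout.free_local_array(arr)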

fill_jtj(j, jtj, shared_mem_buf=None)

Calculate the matrix-matrix product j.T @ j.

Here j is often a jacobian matrix, so the result is an approximate Hessian. This function performs any necessary MPI/shared-memory communication when the arrays are distributed over multiple processors.

Parameters
j : LocalNumpyArray

A local 2D array (matrix) allocated using allocate_local_array with the “ep” (jacobian) type.

jtj : LocalNumpyArray

The result. This must be a pre-allocated local array of type “jtj”.

Returns

None
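
A sketch following the same pattern as fill_jtf above:

    j = layout.allocate_local_array('ep', 'd')
    jtj = layout.allocate_local_array('jtj', 'd')  # pre-allocated result

    # ... fill j with locally computed values ...
    layout.fill_jtj(j, jtj)  # jtj now holds this processor's portion of j.T @ j

    layout.free_local_array(jtj)
    layout.free_local_array(j)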

distribution_info(nprocs)

Generates information about how this layout is distributed across multiple processors.

This is useful when comparing and selecting a layout, as this information can be used to compute the amount of required memory per processor.

Parameters
nprocs : int

The number of processors.

Returns

dict
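
A brief sketch, assuming layout is an existing DistributableCOPALayout; the returned dictionary's keys depend on the pyGSTi version:

    # Inspect how this layout would be distributed over 4 processors, e.g. to
    # estimate per-processor memory requirements before committing to a layout.
    info = layout.distribution_info(4)
    print(sorted(info.keys()))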