A custom MPI-enabled linear solver.

Module Contents


custom_solve(a, b, x, ari, resource_alloc, proc_threshold=100)

Simple parallel Gaussian Elimination with pivoting.

_find_pivot(a, b, icol, potential_pivot_inds, my_row_slice, shared_floats, shared_ints, resource_alloc, comm, host_comm, buf1, buf2, buf3, best_host_indices, best_host_vals)

_broadcast_pivot_row(a, b, ibest_local, h, k, shared_rowb, local_pivot_rowb, potential_pivot_mask, resource_alloc, comm, host_comm)

_update_rows(a, b, icol, ipivot_local, pivot_row, pivot_b)

_back_substitution(a, b, x, pivot_row_indices, my_row_slice, ari, resource_alloc, host_comm)

_tallskinny_custom_solve(a, b, resource_alloc)



pygsti.optimize.customsolve.custom_solve(a, b, x, ari, resource_alloc, proc_threshold=100)

Simple parallel Gaussian Elimination with pivoting.

This function was built to provide a parallel alternative to scipy.linalg.solve, and can achieve faster runtimes compared with the serial SciPy routine when the number of available processors and problem size are large enough.

When the number of processors is greater than proc_threshold (below this number the routine just calls scipy.linalg.solve on the root processor) the method works as follows:

  • each processor “owns” some subset of the rows of a and b.

  • iteratively (over pivot columns), the best pivot row is found, and this row is used to eliminate all other elements in the current pivot column. This procedure operations on the joined matrix a|b, and when it completes the matrix a is in reduced row echelon form (RREF).

  • back substitution (trivial because a is in reduced REF) is performed to find the solution x such that a @ x = b.

  • a (LocalNumpyArray) – A 2D array with the ‘jtj’ distribution, holding the rows of the a matrix belonging to the current processor. (This belonging is dictated by the “fine” distribution in a distributed layout.)

  • b (LocalNumpyArray) – A 1D array with the ‘jtf’ distribution, holding the rows of the b vector belonging to the current processor.

  • x (LocalNumpyArray) – A 1D array with the ‘jtf’ distribution, holding the rows of the x vector belonging to the current processor. This vector is filled by this function.

  • ari (ArraysInterface) – An object that provides an interface for creating and manipulating data arrays.

  • resource_alloc (ResourceAllocation) – Gives the resources (e.g., processors and memory) available for use.

  • proc_threshold (int, optional) – Below this number of processors this routine will simply gather a and b to a single (the rank 0) processor, call SciPy’s serial linear solver, scipy.linalg.solve, and scatter the results back onto all the processors.



pygsti.optimize.customsolve._find_pivot(a, b, icol, potential_pivot_inds, my_row_slice, shared_floats, shared_ints, resource_alloc, comm, host_comm, buf1, buf2, buf3, best_host_indices, best_host_vals)
pygsti.optimize.customsolve._broadcast_pivot_row(a, b, ibest_local, h, k, shared_rowb, local_pivot_rowb, potential_pivot_mask, resource_alloc, comm, host_comm)
pygsti.optimize.customsolve._update_rows(a, b, icol, ipivot_local, pivot_row, pivot_b)
pygsti.optimize.customsolve._back_substitution(a, b, x, pivot_row_indices, my_row_slice, ari, resource_alloc, host_comm)
pygsti.optimize.customsolve._tallskinny_custom_solve(a, b, resource_alloc)


Based on “Parallel QR algorithm for data-driven decompositions” by Sayadi et al. (Center for Turbulence Research 335 Proceedings of the Summer Program 2014)