pygsti.data

A sub-package holding data set objects and supporting analysis objects

Submodules

Package Contents

Classes

DataSet

An association between Circuits and outcome counts, serving as the input data for many QCVV protocols.

MultiDataSet

A collection of DataSets that hold data for the same circuits.

DataComparator

A comparison between multiple data, presumably taken in different contexts.

FreeformDataSet

An association between Circuits and arbitrary data.

HypothesisTest

A set of statistical hypothesis tests on a set of null hypotheses.

Functions

simulate_data(model_or_dataset, circuit_list, num_samples)

Creates a DataSet using the probabilities obtained from a model.

aggregate_dataset_outcomes(dataset, label_merge_dict)

Creates a DataSet which merges certain outcomes in input DataSet.

filter_dataset(dataset, sectors_to_keep[, ...])

Creates a DataSet that is the restriction of dataset to sectors_to_keep.

trim_to_constant_numtimesteps(ds)

Trims a DataSet so that each circuit's data comprises the same number of timesteps.

make_rpe_data_set(model_or_dataset, string_list_d, ...)

Generate a fake RPE DataSet using the probabilities obtained from a model.
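As a rough sketch of how these module-level functions fit together, the example below simulates counts from a model and then keeps only a few circuits. The model pack and circuit list are illustrative assumptions, not part of this sub-package:

import pygsti
from pygsti.modelpacks import smq1Q_XYI   # assumption: a standard 1-qubit model pack is available

model = smq1Q_XYI.target_model()
circuits = smq1Q_XYI.prep_fiducials()     # any list of Circuits works here

# Draw 1000 samples per circuit from the model's outcome probabilities
ds = pygsti.data.simulate_data(model, circuits, num_samples=1000)

# Restrict the resulting DataSet to the first three circuits
small_ds = ds.truncate(circuits[:3])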

class pygsti.data.DataSet(oli_data=None, time_data=None, rep_data=None, circuits=None, circuit_indices=None, outcome_labels=None, outcome_label_indices=None, static=False, file_to_load_from=None, collision_action='aggregate', comment=None, aux_info=None)

Bases: pygsti.baseobjs.mongoserializable.MongoSerializable

An association between Circuits and outcome counts, serving as the input data for many QCVV protocols.

The DataSet class associates circuits with counts or time series of counts for each outcome label, and can be thought of as a table with gate strings labeling the rows and outcome labels and/or time labeling the columns. It is designed to behave similarly to a dictionary of dictionaries, so that counts are accessed by:

count = dataset[circuit][outcomeLabel]

in the time-independent case, and in the time-dependent case, for integer time index i >= 0,

outcomeLabel = dataset[circuit][i].outcome
count = dataset[circuit][i].count
time = dataset[circuit][i].time
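For concreteness, here is a minimal sketch of building a small time-independent DataSet and reading a count back out; the circuit tuple and count values are purely illustrative:

from pygsti.data import DataSet

ds = DataSet(outcome_labels=['0', '1'])              # non-static: data may still be added
ds.add_count_dict(('Gx', 'Gx'), {'0': 40, '1': 60})  # counts for an illustrative two-gate circuit
ds.done_adding_data()                                # promote to a static (read-only) DataSet

count = ds[('Gx', 'Gx')]['1']                        # dictionary-style access shown above -> 60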

Parameters

oli_data : list or numpy.ndarray

When static == True, a 1D numpy array containing outcome label indices (integers), concatenated for all sequences. Otherwise, a list of 1D numpy arrays, one array per gate sequence. In either case, this quantity is indexed by the values of circuit_indices or the index of circuits.

time_data : list or numpy.ndarray

Same format as oli_data except stores floating-point timestamp values.

rep_data : list or numpy.ndarray

Same format as oli_data except stores integer repetition counts for each "data bin" (i.e. (outcome, time) pair). If all repetitions equal 1 ("single-shot" timestamped data), then rep_data can be None (no repetitions).

circuits : list of (tuples or Circuits)

Each element is a tuple of operation labels or a Circuit object. Indices for these strings are assumed to ascend from 0. These indices must correspond to the time series of spam-label indices (above). Only specify this argument OR circuit_indices, not both.

circuit_indices : ordered dictionary

An OrderedDict with keys equal to circuits (tuples of operation labels) and values equal to integer indices associating a row/element of counts with the circuit. Only specify this argument OR circuits, not both.

outcome_labels : list of strings or int

Specifies the set of spam labels for the DataSet. Indices for the spam labels are assumed to ascend from 0, starting with the first element of this list. These indices will associate each element of the timeseries with a spam label. Only specify this argument OR outcome_label_indices, not both. If an int, specifies that the outcome labels should be those for a standard set of this many qubits.

outcome_label_indices : ordered dictionary

An OrderedDict with keys equal to spam labels (strings) and values equal to integer indices associating each spam label with a given index. Only specify this argument OR outcome_labels, not both.

static : bool

When True, create a read-only, i.e. "static", DataSet which cannot be modified. In this case you must specify the timeseries data, circuits, and spam labels. When False, create a DataSet that can have time series data added to it. In this case, you only need to specify the spam labels.

file_to_load_from : string or file object

Specify this argument and no others to create a static DataSet by loading from a file (just like using the load(…) function).

collision_action : {"aggregate", "overwrite", "keepseparate"}

Specifies how duplicate circuits should be handled. “aggregate” adds duplicate-circuit counts to the same circuit’s data at the next integer timestamp. “overwrite” only keeps the latest given data for a circuit. “keepseparate” tags duplicate-circuits by setting the .occurrence ID of added circuits that are already contained in this data set to the next available positive integer.

comment : string, optional

A user-specified comment string that gets carried around with the data. A common use for this field is to attach to the data details regarding its collection.

aux_info : dict, optional

A user-specified dictionary of per-circuit auxiliary information. Keys should be the circuits in this DataSet and values should be Python dictionaries.

Initialize a DataSet.

Returns

DataSet

a new data set object.

property outcome_labels

Get a list of all the outcome labels contained in this DataSet.

Returns
list of strings or tuples

A list where each element is an outcome label (which can be a string or a tuple of strings).

property timestamps

Get a list of all the (unique) timestamps contained in this DataSet.

Returns
list of floats

A list where each element is a timestamp.

property meantimestep

The mean time-step, averaged over the time-step for each circuit and over circuits.

Returns

float

property has_constant_totalcounts_pertime

True if the data for every circuit has the same number of total counts at every data collection time.

This will still return True when different circuits have different numbers of total counts (i.e., after aggregating over time), as long as every circuit has the same total counts per time step (this can happen when the number of time steps varies between circuits).

Returns

bool

property totalcounts_pertime

Total counts per time, if this is constant over times and circuits.

When that doesn’t hold, an error is raised.

Returns

float or int

property has_constant_totalcounts

True if the data for every circuit has the same number of total counts.

Returns

bool

property has_trivial_timedependence

True if all the data in this DataSet occurs at time 0.

Returns

bool

collection_name = 'pygsti_datasets'
keys()

Returns the circuits used as keys of this DataSet.

Returns
list

A list of Circuit objects which index the data counts within this data set.

items()

Iterator over (circuit, timeSeries) pairs.

Here circuit is a tuple of operation labels and timeSeries is a _DataSetRow instance, which behaves similarly to a list of spam labels whose index corresponds to the time step.

Returns

_DataSetKVIterator

values()

Iterator over _DataSetRow instances corresponding to the time series data for each circuit.

Returns

_DataSetValueIterator

gate_labels(prefix='G')

Get a list of all the distinct operation labels used in the circuits of this dataset.

Parameters
prefix : str

Filter the operation labels so that only elements beginning with this prefix are returned. None performs no filtering.

Returns
list of strings

A list where each element is an operation label.

degrees_of_freedom(circuits=None, method='present_outcomes-1', aggregate_times=True)

Returns the number of independent degrees of freedom in the data for the circuits in circuits.

Parameters
circuits : list of Circuits

The list of circuits to count degrees of freedom for. If None then all of the DataSet’s strings are used.

method : {'all_outcomes-1', 'present_outcomes-1', 'tuned'}

How the degrees of freedom should be computed. 'all_outcomes-1' takes the number of circuits and multiplies this by the total number of outcomes (the length of what is returned by outcome_labels()) minus one. 'present_outcomes-1' counts on a per-circuit basis the number of present (usually non-zero) outcomes recorded minus one. 'tuned' should be the most accurate, as it accounts for low-N "Poisson bump" behavior, but it is not the default because it is still under development. For timestamped data, see aggregate_times below.

aggregate_times : bool, optional

Whether counts that occur at different times should be tallied separately. If True, then even when counts occur at different times, degrees of freedom are tallied on a per-circuit basis. If False, then counts occurring at distinct times are treated as independent of those at any other time, and are tallied separately. So, for example, if aggregate_times is False and a data row has 0- and 1-counts of 45 & 55 at time=0 and 42 & 58 at time=1, this row would contribute 2 degrees of freedom, not 1. It can sometimes be useful to set this to False when the DataSet holds coarse-grained data, but usually you want this to be left as True (especially for time-series data).

Returns

int

add_count_dict(circuit, count_dict, record_zero_counts=True, aux=None, update_ol=True)

Add a single circuit’s counts to this DataSet

Parameters
circuit : tuple or Circuit

A tuple of operation labels specifying the circuit or a Circuit object

count_dict : dict

A dictionary with keys = outcome labels and values = counts

record_zero_counts : bool, optional

Whether zero-counts are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

aux : dict, optional

A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).

update_ol : bool, optional

This argument is for internal use only and should be left as True.

Returns

None

add_count_list(circuit, outcome_labels, counts, record_zero_counts=True, aux=None, update_ol=True, unsafe=False)

Add a single circuit’s counts to this DataSet

Parameters
circuit : tuple or Circuit

A tuple of operation labels specifying the circuit or a Circuit object

outcome_labels : list or tuple

The outcome labels corresponding to counts.

counts : list or tuple

The counts themselves.

record_zero_counts : bool, optional

Whether zero-counts are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

aux : dict, optional

A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).

update_ol : bool, optional

This argument is for internal use only and should be left as True.

unsafe : bool, optional

True means that outcome_labels is guaranteed to hold tuple-type outcome labels and never plain strings. Only set this to True if you know what you’re doing.

Returns

None

add_count_arrays(circuit, outcome_index_array, count_array, record_zero_counts=True, aux=None)

Add the outcomes for a single circuit, formatted as raw data arrays.

Parameters
circuit : Circuit

The circuit to add data for.

outcome_index_array : numpy.ndarray

An array of outcome indices, which must be values of self.olIndex (which maps outcome labels to indices).

count_array : numpy.ndarray

An array of integer (or sometimes floating point) counts, one corresponding to each outcome index (element of outcome_index_array).

record_zero_counts : bool, optional

Whether zero counts (zeros in count_array) should be stored explicitly or not stored and inferred. Setting to False reduces the space taken by data sets containing lots of zero counts, but makes some objective function evaluations less precise.

aux : dict or None, optional

If not None a dictionary of user-defined auxiliary information that should be associated with this circuit.

Returns

None

add_cirq_trial_result(circuit, trial_result, key, convert_int_to_binary=True, num_qubits=None)

Add a single circuit’s counts — stored in a Cirq TrialResult — to this DataSet

Parameters
circuit : tuple or Circuit

A tuple of operation labels specifying the circuit or a Circuit object. Note that this must be a PyGSTi circuit — not a Cirq circuit.

trial_result : cirq.TrialResult

The TrialResult to add

key : str

The string key of the measurement. Set by cirq.measure.

convert_int_to_binary : bool, optional (default True)

By default the keys in the cirq Results object are the integers representing the bitstrings of the measurements on a set of qubits, in big-endian convention. If True, this converts back to a binary string before adding the counts as an entry into the pygsti dataset.

num_qubits : int, optional (default None)

Number of qubits used in the conversion from integers to binary when convert_int_to_binary is True. If None, then the number of line_labels on the input circuit is used.

Returns

None

add_raw_series_data(circuit, outcome_label_list, time_stamp_list, rep_count_list=None, overwrite_existing=True, record_zero_counts=True, aux=None, update_ol=True, unsafe=False)

Add a single circuit’s counts to this DataSet

Parameters
circuit : tuple or Circuit

A tuple of operation labels specifying the circuit or a Circuit object

outcome_label_list : list

A list of outcome labels (strings or tuples). An element’s index links it to a particular time step (i.e. the i-th element of the list specifies the outcome of the i-th measurement in the series).

time_stamp_list : list

A list of floating point timestamps, each associated with the single corresponding outcome in outcome_label_list. Must be the same length as outcome_label_list.

rep_count_list : list, optional

A list of integer counts specifying how many outcomes of type given by outcome_label_list occurred at the time given by time_stamp_list. If None, then all counts are assumed to be 1. When not None, must be the same length as outcome_label_list.

overwrite_existing : bool, optional

Whether to overwrite the data for circuit (if it exists). If False, then the given lists are appended (added) to existing data.

record_zero_counts : bool, optional

Whether zero-counts (elements of rep_count_list that are zero) are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

aux : dict, optional

A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).

update_ol : bool, optional

This argument is for internal use only and should be left as True.

unsafe : bool, optional

When True, don’t bother checking that outcome_label_list contains tuple-type outcome labels and automatically upgrading strings to 1-tuples. Only set this to True if you know what you’re doing and need the marginally faster performance.

Returns

None
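As a sketch of typical usage, the call below adds three single-shot, time-stamped outcomes for one circuit to a non-static DataSet named ds (the circuit, outcomes and timestamps are illustrative):

# One outcome per timestamp; rep_count_list=None means each outcome was observed once
ds.add_raw_series_data(('Gx',),
                       outcome_label_list=['0', '1', '0'],
                       time_stamp_list=[0.0, 0.1, 0.2])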

update_ol()

Updates the internal outcome-label list in this dataset.

Call this after calling add_count_dict(…) or add_raw_series_data(…) with update_olIndex=False.

Returns

None

add_series_data(circuit, count_dict_list, time_stamp_list, overwrite_existing=True, record_zero_counts=True, aux=None)

Add a single circuit’s counts to this DataSet

Parameters
circuit : tuple or Circuit

A tuple of operation labels specifying the circuit or a Circuit object

count_dict_list : list

A list of dictionaries holding the outcome-label:count pairs for each time step (times given by time_stamp_list).

time_stamp_list : list

A list of floating point timestamps, each associated with an entire dictionary of outcomes specified by count_dict_list.

overwrite_existing : bool, optional

If True, overwrite any existing data for the circuit. If False, add the count data with the next non-negative integer timestamp.

record_zero_counts : bool, optional

Whether zero-counts (elements of the dictionaries in count_dict_list that are zero) are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

aux : dict, optional

A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).

Returns

None

aggregate_outcomes(label_merge_dict, record_zero_counts=True)

Creates a DataSet which merges certain outcomes in this DataSet.

Used, for example, to aggregate a 2-qubit 4-outcome DataSet into a 1-qubit 2-outcome DataSet.

Parameters
label_merge_dict : dictionary

The dictionary whose keys define the new DataSet outcomes, and whose items are lists of input DataSet outcomes that are to be summed together. For example, if a two-qubit DataSet has outcome labels "00", "01", "10", and "11", and we want to "aggregate out" the second qubit, we could use label_merge_dict = {'0': ['00', '01'], '1': ['10', '11']}. When doing this, however, it may be better to use filter_qubits() which also updates the circuits.

record_zero_counts : bool, optional

Whether zero-counts are actually recorded (stored) in the returned (merged) DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

Returns
merged_dataset : DataSet object

The DataSet with outcomes merged according to the rules given in label_merge_dict.
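For example, assuming ds2q is a two-qubit DataSet with outcome labels '00', '01', '10' and '11', this sketch aggregates out the second qubit as described above:

merge_rules = {'0': ['00', '01'],   # new outcome '0' sums the old outcomes '00' and '01'
               '1': ['10', '11']}
ds1q = ds2q.aggregate_outcomes(merge_rules)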

aggregate_std_nqubit_outcomes(qubit_indices_to_keep, record_zero_counts=True)

Creates a DataSet which merges certain outcomes in this DataSet.

Used, for example, to aggregate a 2-qubit 4-outcome DataSet into a 1-qubit 2-outcome DataSet. This assumes that outcome labels are in the standard format whereby each qubit corresponds to a single ‘0’ or ‘1’ character.

Parameters
qubit_indices_to_keep : list

A list of integers specifying which qubits should be kept, that is, not aggregated.

record_zero_counts : bool, optional

Whether zero-counts are actually recorded (stored) in the returned (merged) DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

Returns
merged_dataset : DataSet object

The DataSet with outcomes merged.

add_auxiliary_info(circuit, aux)

Add auxiliary meta information to circuit.

Parameters
circuit : tuple or Circuit

A tuple of operation labels specifying the circuit or a Circuit object

aux : dict, optional

A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).

Returns

None

add_counts_from_dataset(other_data_set)

Append another DataSet’s data to this DataSet

Parameters
other_data_set : DataSet

The dataset to take counts from.

Returns

None

add_series_from_dataset(other_data_set)

Append another DataSet’s series data to this DataSet

Parameters
other_data_set : DataSet

The dataset to take time series data from.

Returns

None

to_str(mode='auto')

Render this DataSet as a string.

Parameters
mode : {"auto", "time-dependent", "time-independent"}

Whether to display the data as time-series of outcome counts (“time-dependent”) or to report per-outcome counts aggregated over time (“time-independent”). If “auto” is specified, then the time-independent mode is used only if all time stamps in the DataSet are equal to zero (trivial time dependence).

Returns

str

truncate(list_of_circuits_to_keep, missing_action='raise')

Create a truncated dataset comprised of a subset of the circuits in this dataset.

Parameters
list_of_circuits_to_keep : list of (tuples or Circuits)

A list of the circuits for the new returned dataset. If a circuit is given in this list that isn’t in the original data set, missing_action determines the behavior.

missing_action : {"raise", "warn", "ignore"}

What to do when a string in list_of_circuits_to_keep is not in the data set (raise a KeyError, issue a warning, or do nothing).

Returns
DataSet

The truncated data set.

time_slice(start_time, end_time, aggregate_to_time=None)

Creates a DataSet by aggregating the counts within the [start_time, end_time) interval.

Parameters
start_time : float

The starting time.

end_time : float

The ending time.

aggregate_to_time : float, optional

If not None, a single timestamp to give all the data in the specified range, resulting in a time-independent DataSet. If None, then the original timestamps are preserved.

Returns

DataSet

split_by_time(aggregate_to_time=None)

Creates a dictionary of DataSets, each of which is an equal-time slice of this DataSet.

The keys of the returned dictionary are the distinct timestamps in this dataset.

Parameters
aggregate_to_time : float, optional

If not None, a single timestamp to give all the data in each returned data set, resulting in time-independent DataSet objects. If None, then the original timestamps are preserved.

Returns
OrderedDict

A dictionary of DataSet objects whose keys are the timestamp values of the original (this) data set in sorted order.
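A short sketch contrasting time_slice and split_by_time, assuming ds holds time-stamped data:

early = ds.time_slice(0.0, 1.0, aggregate_to_time=0.0)  # counts with timestamps in [0.0, 1.0), collapsed to t=0
by_time = ds.split_by_time()                            # OrderedDict: {timestamp: DataSet}
for t, ds_t in by_time.items():
    print(t, ds_t.degrees_of_freedom())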

drop_zero_counts()

Creates a copy of this data set that doesn’t include any zero counts.

Returns

DataSet

process_times(process_times_array_fn)

Manipulate this DataSet's timestamps according to process_times_array_fn.

For example, the following process_times_array_fn would change the timestamps for each circuit to sequential integers.

def process_times_array_fn(times):
    return list(range(len(times)))
Parameters
process_times_array_fn : function

A function which takes a single array-of-timestamps argument and returns another similarly-sized array. This function is called, once per circuit, with the circuit’s array of timestamps.

Returns
DataSet

A new data set with altered timestamps.

process_circuits(processor_fn, aggregate=False)

Create a new data set by manipulating this DataSet’s circuits (keys) according to processor_fn.

The new DataSet's circuits result from running each of this DataSet's circuits through processor_fn. This can be useful when "tracing out" qubits in a dataset containing multi-qubit data.

Parameters
processor_fn : function

A function which takes a single Circuit argument and returns another (or the same) Circuit. This function may also return None, in which case the data for that string is deleted.

aggregate : bool, optional

When True, aggregate the data for circuits that processor_fn assigns to the same "new" circuit. When False, use the data from the last original circuit that maps to a given "new" circuit.

Returns

DataSet
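As a sketch, the processor function below keeps only short circuits; returning None deletes that circuit's data, as described above. The depth-based rule (and the assumption that Circuit exposes its layer count as .depth) is illustrative:

def keep_short(circuit):
    # Keep circuits with at most two layers; drop the data for everything else
    return circuit if circuit.depth <= 2 else None

short_ds = ds.process_circuits(keep_short)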

process_circuits_inplace(processor_fn, aggregate=False)

Manipulate this DataSet’s circuits (keys) in-place according to processor_fn.

All of this DataSet’s circuits are updated by running each one through processor_fn. This can be useful when “tracing out” qubits in a dataset containing multi-qubit data.

Parameters
processor_fn : function

A function which takes a single Circuit argument and returns another (or the same) Circuit. This function may also return None, in which case the data for that string is deleted.

aggregate : bool, optional

When True, aggregate the data for circuits that processor_fn assigns to the same "new" circuit. When False, use the data from the last original circuit that maps to a given "new" circuit.

Returns

None

remove(circuits, missing_action='raise')

Remove (delete) the data for circuits from this DataSet.

Parameters
circuits : iterable

An iterable over Circuit-like objects specifying the keys (circuits) to remove.

missing_action : {"raise", "warn", "ignore"}

What to do when a string in circuits is not in this data set (raise a KeyError, issue a warning, or do nothing).

Returns

None

copy()

Make a copy of this DataSet.

Returns

DataSet

copy_nonstatic()

Make a non-static copy of this DataSet.

Returns

DataSet

done_adding_data()

Promotes a non-static DataSet to a static (read-only) DataSet.

This method should be called after all data has been added.

Returns

None

save(file_or_filename)
write_binary(file_or_filename)

Write this data set to a binary-format file.

Parameters
file_or_filename : string or file object

If a string, interpreted as a filename. If this filename ends in “.gz”, the file will be gzip compressed.

Returns

None

load(file_or_filename)
read_binary(file_or_filename)

Read a DataSet from a binary file, clearing any previously contained data.

The file should have been created with DataSet.write_binary()

Parameters
file_or_filename : str or buffer

The file or filename to load from.

Returns

None
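A minimal save/load round trip using write_binary and read_binary; the filename is illustrative, and the '.gz' suffix triggers gzip compression as noted above:

ds.write_binary('my_dataset.bin.gz')   # gzip-compressed because of the .gz suffix

ds2 = DataSet()                        # an empty DataSet to load into
ds2.read_binary('my_dataset.bin.gz')   # clears ds2's contents and reads the file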

rename_outcome_labels(old_to_new_dict)

Replaces existing outcome labels with new ones as per old_to_new_dict.

Parameters
old_to_new_dict : dict

A mapping from old/existing outcome labels to new ones. Strings in keys or values are automatically converted to 1-tuples. Missing outcome labels are left unaltered.

Returns

None

add_std_nqubit_outcome_labels(nqubits)

Adds all the “standard” outcome labels (e.g. ‘0010’) on nqubits qubits.

This is useful to ensure that, even if not all outcomes appear in the data, all are recognized as being potentially valid outcomes (so that attempts to get counts for these outcomes will return 0 rather than raising an error).

Parameters
nqubits : int

The number of qubits. For example, if equal to 3 then the outcome labels '000', '001', ..., '111' are added.

Returns

None

add_outcome_labels(outcome_labels, update_ol=True)

Adds new valid outcome labels.

Ensures that all the elements of outcome_labels are stored as valid outcomes for circuits in this DataSet, adding new outcomes as necessary.

Parameters
outcome_labels : list or generator

A list or generator of string- or tuple-valued outcome labels.

update_ol : bool, optional

Whether to update internal mappings to reflect the new outcome labels. Leave this as True unless you really know what you’re doing.

Returns

None

auxinfo_dataframe(pivot_valuename=None, pivot_value=None, drop_columns=False)

Create a Pandas dataframe with aux-data from this dataset.

Parameters
pivot_valuename : str, optional

If not None, the resulting dataframe is pivoted using pivot_valuename as the column whose values name the pivoted table's column names. If None and pivot_value is not None, "ValueName" is used.

pivot_value : str, optional

If not None, the resulting dataframe is pivoted such that values of the pivot_value column are rearranged into new columns whose names are given by the values of the pivot_valuename column. If None and pivot_valuename is not None, "Value" is used.

drop_columns : bool or list, optional

A list of column names to drop (prior to performing any pivot). If True appears in this list or is given directly, then all constant-valued columns are dropped as well. No columns are dropped when drop_columns == False.

Returns

pandas.DataFrame

class pygsti.data.MultiDataSet(oli_dict=None, time_dict=None, rep_dict=None, circuit_indices=None, outcome_labels=None, outcome_label_indices=None, file_to_load_from=None, collision_actions=None, comment=None, comments=None, aux_info=None)

Bases: object

A collection of DataSets that hold data for the same circuits.

The MultiDataSet class allows for the combined access and storage of several static DataSets that contain the same circuits (in the same order) AND the same time-dependence structure (if applicable).

It is designed to behave similarly to a dictionary of DataSets, so that a DataSet is obtained by:

dataset = multiDataset[dataset_name]

where dataset_name may be a string OR a tuple.
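As a sketch, the snippet below collects two static DataSets, taken in different contexts, into a MultiDataSet; the dataset names and the variables ds_a and ds_b are illustrative:

from pygsti.data import MultiDataSet

mds = MultiDataSet()
mds.add_dataset('context_A', ds_a)   # ds_a, ds_b: static DataSets holding the same circuits
mds.add_dataset('context_B', ds_b)

dataset = mds['context_A']           # dictionary-style access, as described above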

Parameters

oli_dict : ordered dictionary, optional

Keys specify dataset names. Values are 1D numpy arrays which specify outcome label indices. Each value is indexed by the values of circuit_indices.

time_dict : ordered dictionary, optional

Same format as oli_dict except stores arrays of floating-point time stamp data.

rep_dict : ordered dictionary, optional

Same format as oli_dict except stores arrays of integer repetition counts (can be None if there are no repetitions)

circuit_indices : ordered dictionary, optional

An OrderedDict with keys equal to circuits (tuples of operation labels) and values equal to integer indices associating a row/element of counts with the circuit.

outcome_labels : list of strings

Specifies the set of spam labels for the DataSet. Indices for the spam labels are assumed to ascend from 0, starting with the first element of this list. These indices will associate each element of the timeseries with a spam label. Only specify this argument OR outcome_label_indices, not both.

outcome_label_indices : ordered dictionary

An OrderedDict with keys equal to spam labels (strings) and values equal to integer indices associating each spam label with a given index. Only specify this argument OR outcome_labels, not both.

file_to_load_from : string or file object, optional

Specify this argument and no others to create a MultiDataSet by loading from a file (just like using the load(…) function).

collision_actions : dictionary, optional

Specifies how duplicate circuits should be handled for the data sets. Keys must match those of oli_dict and values are “aggregate” or “keepseparate”. See documentation for DataSet. If None, then “aggregate” is used for all sets by default.

comment : string, optional

A user-specified comment string that gets carried around with the data. A common use for this field is to attach to the data details regarding its collection.

comments : dict, optional

A user-specified dictionary of comments, one per dataset. Keys are dataset names (same as oli_dict keys).

aux_info : dict, optional

A user-specified dictionary of per-circuit auxiliary information. Keys should be the circuits in this MultiDataSet and values should be Python dictionaries.

Initialize a MultiDataSet.

Returns

MultiDataSet

a new multi data set object.

property outcome_labels

Get a list of all the outcome labels contained in this MultiDataSet.

Returns
list of strings or tuples

A list where each element is an outcome label (which can be a string or a tuple of strings).

keys()

A list of the keys (dataset names) of this MultiDataSet

Returns

list

items()

Iterator over (dataset name, DataSet) pairs.

values()

Iterator over DataSets corresponding to each dataset name.

datasets_aggregate(*dataset_names)

Generate a new DataSet by combining the outcome counts of multiple member DataSets.

Data with the same time-stamp and outcome are merged into a single “bin” in the returned DataSet.

Parameters
dataset_names : list of strs

one or more dataset names.

Returns
DataSet

a single DataSet containing the summed counts of the data sets named by the parameters.

add_dataset(dataset_name, dataset, update_auxinfo=True)

Add a DataSet to this MultiDataSet.

The dataset must be static and conform with the circuits and time-dependent structure passed upon construction or those inherited from the first dataset added.

Parameters
dataset_name : string

The name to give the added dataset (i.e. the key the new data set will be referenced by).

dataset : DataSet

The data set to add.

update_auxinfo : bool, optional

Whether the auxiliary information (if any exists) in dataset is added to the information already stored in this MultiDataSet.

Returns

None

copy()

Make a copy of this MultiDataSet

Returns

MultiDataSet

save(file_or_filename)
write_binary(file_or_filename)

Write this MultiDataSet to a binary-format file.

Parameters
file_or_filename : file or string

Either a filename or a file object. In the former case, if the filename ends in “.gz”, the file will be gzip compressed.

Returns

None

load(file_or_filename)
read_binary(file_or_filename)

Read a MultiDataSet from a file, clearing any previously contained data.

The file should have been created with MultiDataSet.write_binary()

Parameters
file_or_filename : file or string

Either a filename or a file object. In the former case, if the filename ends in ".gz", the file will be decompressed with gzip as it is read.

Returns

None

add_auxiliary_info(circuit, aux)

Add auxiliary meta information to circuit.

Parameters
circuit : tuple or Circuit

A tuple of operation labels specifying the circuit or a Circuit object

aux : dict, optional

A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).

Returns

None

class pygsti.data.DataComparator(dataset_list_or_multidataset, circuits='all', op_exclusions=None, op_inclusions=None, ds_names=None, allow_bad_circuits=False)

A comparison between multiple data, presumably taken in different contexts.

This object can be used to run all of the “context dependence detection” methods described in “Probing context-dependent errors in quantum processors”, by Rudinger et al. (See that paper’s supplemental material for explicit demonstrations of this object.)

This object stores the p-values and log-likelihood ratio values from a consistency comparison between two or more data, and provides methods to:

  • Perform a hypothesis test to decide which sequences contain statistically significant variation.

  • Plot p-value histograms and log-likelihood ratio box plots.

  • Extract (1) the “statistically significant total variation distance” for a circuit, (2) various other quantifications of the “amount” of context dependence, and (3) the level of statistical significance at which any context dependence is detected.

Parameters

dataset_list_or_multidataset : List of DataSets or MultiDataSet

Either a list of DataSets, containing two or more sets of data to compare, or a MultiDataSet object, containing two or more sets of data to compare. Note that these DataSets should contain data for the same set of Circuits (although if there are additional Circuits these can be ignored using the parameters below). This object is then intended to be used to test whether the results indicate that the outcome probabilities for these Circuits have changed between the "contexts" in which the data was obtained.

circuits : 'all' or list of Circuits, optional (default is 'all')

If ‘all’ the comparison is implemented for all Circuits in the DataSets. Otherwise, this should be a list containing all the Circuits to run the comparison for (although note that some of these Circuits may be ignored with non-default options for the next two inputs).

op_exclusions : None or list of gates, optional (default is None)

If not None, all Circuits containing any of the gates in this list are discarded, and no comparison will be made for those strings.

op_inclusions : None or list of gates, optional (default is None)

If not None, a Circuit will be dropped from the list to run the comparisons for if it doesn’t include some gate from this list (or is the empty circuit).

ds_names : None or list, optional (default is None)

If dataset_list_or_multidataset is a list of DataSets, this can be used to specify names for the DataSets in the list. E.g., ["Time 0", "Time 1", "Time 3"] or ["Driving", "NoDriving"].

allow_bad_circuits : bool, optional

Whether or not the data is allowed to have zero total counts for any circuits in any of the passes. If False, then an error will be raised when there are such unimplemented circuits. If True, then the data from those circuits that weren't run in one or more of the passes will be discarded before any analysis is performed (equivalent to excluding them explicitly with the circuits input).

Initializes a DataComparator object.

Returns

A DataComparator object.

property maximum_sstvd

Returns the maximum, over circuits, of the “statistically significant total variation distance” (SSTVD).

This is only possible if the comparison is between two sets of data. See the .sstvd() method for information on SSTVD.

Returns
float

The circuit associated with the maximum SSTVD, and the SSTVD of that circuit.

property pvalue_pseudothreshold

Returns the (multi-test-adjusted) statistical significance pseudo-threshold for the per-sequence p-values.

The p-values under consideration are those obtained from the log-likelihood ratio test. This is a "pseudo-threshold", because it is data-dependent in general, but all the per-sequence p-values below this value are statistically significant. This quantity is given by Eq. (9) in "Probing context-dependent errors in quantum processors", by Rudinger et al.

Returns
float

The statistical significance pseudo-threshold for the per-sequence p-value.

property llr_pseudothreshold

Returns the statistical significance pseudo-threshold for the per-sequence log-likelihood ratio (LLR).

This result has been multi-test-adjusted.

This is a “pseudo-threshold”, because it is data-dependent in general, but all LLRs above this value are statistically significant. This quantity is given by Eq (10) in “Probing context-dependent errors in quantum processors”, by Rudinger et al.

Returns
float

The statistical significance pseudo-threshold for per-sequence LLR.

property jsd_pseudothreshold

The statistical significance pseudo-threshold for the Jensen-Shannon divergence (JSD) between “contexts”.

This is a rescaling of the pseudo-threshold for the LLR, returned by the method .llr_pseudothreshold; see that method for more details. This threshold is also given by Eq (17) in “Probing context-dependent errors in quantum processors”, by Rudinger et al.

Note that this pseudo-threshold is not defined if the total number of counts (summed over contexts) for a sequence varies between sequences.

Returns
float

The pseudo-threshold for the JSD of a circuit, if well-defined.

property aggregate_llr

Returns the "aggregate" log-likelihood ratio (LLR).

This value compares the null hypothesis of no context dependence in any sequence with the full model of arbitrary context dependence. This is the sum of the per-sequence LLRs, and it is defined in Eq (11) of "Probing context-dependent errors in quantum processors", by Rudinger et al.

Returns
float

The aggregate LLR.

property aggregate_llr_threshold

The (multi-test-adjusted) statistical significance threshold for the "aggregate" log-likelihood ratio (LLR).

Above this value, the LLR is significant. See .aggregate_llr for more details. This quantity is the LLR version of the quantity defined in Eq (14) of “Probing context-dependent errors in quantum processors”, by Rudinger et al.

Returns
float

The threshold above which the aggregate LLR is statistically significant.

property aggregate_pvalue

Returns the p-value for the "aggregate" log-likelihood ratio (LLR).

This compares the null hypothesis of no context dependence in any sequence with the full model of arbitrary dependence. This LLR is defined in Eq (11) in “Probing context-dependent errors in quantum processors”, by Rudinger et al., and it is converted to a p-value via Wilks’ theorem (see discussion therein).

Note that this p-value is often zero to machine precision, when there is context dependence, so a more useful number is often returned by aggregate_nsigma() (that quantity is equivalent to this p-value but expressed on a different scale).

Returns
float

The p-value of the aggregate LLR.

property aggregate_pvalue_threshold

The (multi-test-adjusted) statistical significance threshold for the p-value of the “aggregate” LLR.

Here, LLR refers to the log-likelihood ratio. Below this p-value the LLR would be deemed significant. See the .aggregate_pvalue method for more details.

Returns
float

The statistical significance threshold for the p-value of the “aggregate” LLR.

property aggregate_nsigma

The number of standard deviations the “aggregate” LLR is above the context-independent mean.

More specifically, the number of standard deviations above the context-independent mean that the "aggregate" log-likelihood ratio (LLR) is. This quantity is defined in Eq (13) of "Probing context-dependent errors in quantum processors", by Rudinger et al.

Returns
float

The number of signed standard deviations of the aggregate LLR.

property aggregate_nsigma_threshold

The significance threshold above which the signed standard deviations of the aggregate LLR is significant.

The (multi-test-adjusted) statistical significance threshold for the signed standard deviations of the "aggregate" log-likelihood ratio (LLR). See the .aggregate_nsigma method for more details. This quantity is defined in Eq (14) of "Probing context-dependent errors in quantum processors", by Rudinger et al.

Returns
float

The statistical significance threshold above which the signed standard deviations of the aggregate LLR is significant.

run(significance=0.05, per_circuit_correction='Hochberg', aggregate_test_weighting=0.5, pass_alpha=True, verbosity=2)

Runs statistical hypothesis testing.

This detects whether there is statistically significant variation between the DataSets in this DataComparator. This performs hypothesis tests on the data from individual circuits, and a joint hypothesis test on all of the data. With the default settings, this is the method described and implemented in "Probing context-dependent errors in quantum processors", by Rudinger et al. With non-default settings, this is some minor variation on that method.

Note that the default values of all the parameters are likely sufficient for most purposes.

Parameters
significance : float in (0,1), optional (default is 0.05)

The "global" statistical significance at which to implement the tests. I.e., with the standard per_circuit_correction value (and some other values for this parameter) the probability that a sequence that has been flagged up as context dependent is actually from a context-independent circuit is no more than significance. Precisely, significance is what the "family-wise error rate" (FWER) of the full set of hypothesis tests (1 "aggregate" test, and 1 test per sequence) is controlled to, as long as per_circuit_correction is set to the default value, or another option that controls the FWER of the per-sequence comparisons (see below).

per_circuit_correction : string, optional (default is 'Hochberg')

The multi-hypothesis test correction used for the per-circuit/sequence comparisons. (See “Probing context-dependent errors in quantum processors”, by Rudinger et al. for the details of what the per-circuit comparison is). This can be any string that is an allowed value for the localcorrections input parameter of the HypothesisTest object. This includes:

  • ‘Hochberg’. This implements the Hochberg multi-test compensation technique. This is strictly the best method available in the code, if you wish to control the FWER, and it is the method described in “Probing context-dependent errors in quantum processors”, by Rudinger et al.

  • ‘Holms’. This implements the Holms multi-test compensation technique. This controls the FWER, and it results in a strictly less powerful test than the Hochberg correction.

  • ‘Bonferroni’. This implements the well-known Bonferroni multi-test compensation technique. This controls the FWER, and it results in a strictly less powerful test than the Hochberg correction.

  • 'none'. This implements no multi-test compensation for the per-sequence comparisons, so they are all implemented at a "local" significance level that is altered from significance only by the (inbuilt) Bonferroni-like correction between the "aggregate" test and the per-sequence tests. This option does not control the FWER, and many sequences may be flagged up as context dependent even if none are.

  • 'Benjamini-Hochberg'. This implements the Benjamini-Hochberg multi-test compensation technique. This does not control the FWER, and instead controls the "False Discovery Rate" (FDR); see, for example, https://en.wikipedia.org/wiki/False_discovery_rate. That means that the global significance is maintained for the test of "Is there any context dependence?". I.e., one or more tests will trigger when there is no context dependence with at most a probability of significance. But, if one or more per-sequence tests trigger then we are only guaranteed that (in expectation) no more than a fraction of "local-significance" of the circuits that have been flagged up as context dependent actually aren't. Here, "local-significance" is the significance at which the per-sequence tests are, together, implemented, which is significance * (1 - aggregate_test_weighting) if the aggregate test doesn't detect context dependence and significance if it does (as long as pass_alpha is True). This method is strictly more powerful than the Hochberg correction, but it controls a different, weaker quantity.

aggregate_test_weighting : float in [0,1], optional (default is 0.5)

The weighting, in a generalized Bonferroni correction, to put on the "aggregate test", that jointly tests all of the data for context dependence (in contrast to the per-sequence tests). If this is 0 then the aggregate test is not implemented, and if it is 1 only the aggregate test is implemented (unless it triggers and pass_alpha is True).

pass_alpha : bool, optional (default is True)

The aggregate test is implemented first, at the "local" significance defined by aggregate_test_weighting and significance (see above). If pass_alpha is True, then when the aggregate test triggers, all the local significance for this test is passed on to the per-sequence tests (which are then jointly implemented at significance significance, locally corrected using the multi-test correction specified above), and when the aggregate test doesn't trigger this local significance isn't passed on. If pass_alpha is False then the local significance of the aggregate test is never passed on. See "Probing context-dependent errors in quantum processors", by Rudinger et al. (or the hypothesis testing literature) for discussions of why this "significance passing" still maintains a (global) FWER of significance. Note that the default value of True always results in a strictly more powerful test.

verbosity : int, optional (default is 2)

If > 0 then a summary of the results of the tests is printed to screen. Otherwise, the various .get_…() methods need to be queried to obtain the results of the hypothesis tests.

Returns

None
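A sketch of the typical workflow with this class, assuming ds_a and ds_b are DataSets for the same circuits taken in two different contexts:

from pygsti.data import DataComparator

comparator = DataComparator([ds_a, ds_b], ds_names=['context A', 'context B'])
comparator.run(significance=0.05)      # runs the hypothesis tests and prints a summary

worst = comparator.worst_circuits(5)   # the 5 circuits with the smallest p-values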

tvd(circuit)

Returns the observed total variation distance (TVD) for the specified circuit.

This is only possible if the comparison is between two sets of data. See Eq. (19) in “Probing context-dependent errors in quantum processors”, by Rudinger et al. for the definition of this observed TVD.

This is a quantification for the “amount” of context dependence for this circuit (see also, jsd(), sstvd() and ssjsd()).

Parameters
circuit : Circuit

The circuit to return the TVD of.

Returns
float

The TVD for the specified circuit.

sstvd(circuit)

Returns the "statistically significant total variation distance" (SSTVD) for the specified circuit.

This is only possible if the comparison is between two sets of data. The SSTVD is None if the circuit has not been found to have statistically significant variation. Otherwise it is equal to the observed TVD. See Eq. (20) and surrounding discussion in “Probing context-dependent errors in quantum processors”, by Rudinger et al., for more information.

This is a quantification for the "amount" of context dependence for this circuit (see also jsd(), tvd() and ssjsd()).

Parameters
circuit : Circuit

The circuit to return the SSTVD of.

Returns
float

The SSTVD for the specified circuit.

pvalue(circuit)

Returns the p-value of the log-likelihood ratio test for the specified circuit.

Parameters
circuit : Circuit

The circuit to return the p-value of.

Returns
float

The p-value of the specified circuit.

llr(circuit)

Returns the log-likelihood ratio (LLR) for the input circuit.

This is the quantity defined in Eq (4) of “Probing context-dependent errors in quantum processors”, by Rudinger et al.

Parameters
circuit : Circuit

The circuit to return the LLR of.

Returns
float

The LLR of the specified circuit.

jsd(circuit)

Returns the observed Jensen-Shannon divergence (JSD) between “contexts” for the specified circuit.

The JSD is a rescaling of the LLR, given by dividing the LLR by 2*N where N is the total number of counts (summed over contexts) for this circuit. This quantity is given by Eq (15) in “Probing context-dependent errors in quantum processors”, Rudinger et al.

This is a quantification for the "amount" of context dependence for this circuit (see also tvd(), sstvd() and ssjsd()).

Parameters
circuit : Circuit

The circuit to return the JSD of.

Returns
float

The JSD of the specified circuit.

ssjsd(circuit)

Returns the "statistically significant Jensen-Shannon divergence" (SSJSD) between "contexts" for circuit.

This is the JSD of the circuit (see .jsd()), if the circuit has been found to be context dependent, and otherwise it is None. This quantity is the JSD version of the SSTVD given in Eq. (20) of “Probing context-dependent errors in quantum processors”, by Rudinger et al.

This is a quantification for the "amount" of context dependence for this circuit (see also tvd(), sstvd() and jsd()).

Parameters
circuit : Circuit

The circuit to return the SSJSD of.

Returns
float

The SSJSD of the specified circuit.

worst_circuits(number)

Returns the “worst” circuits that have the smallest p-values.

Parameters
number : int

The number of circuits to return.

Returns
list

A list of tuples containing the worst number circuits along with their corresponding p-values.

class pygsti.data.FreeformDataSet(circuits=None, circuit_indices=None)

Bases: object

An association between Circuits and arbitrary data.

Parameters

circuits : list of (tuples or Circuits), optional

Each element is a tuple of operation labels or a Circuit object. Indices for these strings are assumed to ascend from 0. These indices must correspond to the time series of spam-label indices (above). Only specify this argument OR circuit_indices, not both.

circuit_indices : ordered dictionary, optional

An OrderedDict with keys equal to circuits (tuples of operation labels) and values equal to integer indices associating a row/element of counts with the circuit. Only specify this argument OR circuits, not both.

to_dataframe(pivot_valuename=None, pivot_value='Value', drop_columns=False)

Create a Pandas dataframe with the data from this free-form dataset.

Parameters
pivot_valuenamestr, optional

If not None, the resulting dataframe is pivoted using pivot_valuename as the column whose values name the pivoted table’s column names. If None and pivot_value is not None, “ValueName” is used.

pivot_valuestr, optional

If not None, the resulting dataframe is pivoted such that values of the pivot_value column are rearranged into new columns whose names are given by the values of the pivot_valuename column. If None and pivot_valuename is not None, “Value” is used.

drop_columnsbool or list, optional

A list of column names to drop (prior to performing any pivot). If True appears in this list or is given directly, then all constant-valued columns are dropped as well. No columns are dropped when drop_columns == False.

Returns

pandas.DataFrame

keys()

Returns the circuits used as keys of this DataSet.

Returns
list

A list of Circuit objects which index the data within this data set.

items()

Iterator over (circuit, info_dict) pairs.

Returns

Iterator

values()

Iterator over info-dicts for each circuit.

Returns

Iterator

copy()

Make a copy of this FreeformDataSet.

Returns

FreeformDataSet
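A brief sketch of constructing and exporting a FreeformDataSet. Only the constructor, keys(), items() and to_dataframe() calls are taken from the documentation above; filling each circuit’s info dictionary by item assignment is an assumption about the class’s mapping interface.

import pygsti

# Circuits may be given as tuples of operation labels (or as Circuit objects).
circuits = [('Gx',), ('Gx', 'Gy'), ('Gy', 'Gy')]
ffds = pygsti.data.FreeformDataSet(circuits=circuits)

# Assumption: per-circuit info dictionaries can be set by item assignment.
for i, c in enumerate(ffds.keys()):
    ffds[c] = {'sequence_length': len(c), 'run_id': i}

for c, info in ffds.items():
    print(c, info)

df = ffds.to_dataframe()  # one pandas.DataFrame row per circuit entry
print(df)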

class pygsti.data.HypothesisTest(hypotheses, significance=0.05, weighting='equal', passing_graph='Holms', local_corrections='Holms')

Bases: object

A set of statistical hypothesis tests on a set of null hypotheses.

This object has not been carefully tested.

Parameters

hypotheseslist or tuple

Specifies the set of null hypotheses. This should be a list containing elements that are either

  • A “label” for a hypothesis, which is just some hashable object such as a string.

  • A tuple of “nested hypotheses”, which are also just labels for some null hypotheses.

The elements of this list are then subject to multi-test correction of the “closed test procedure” type, with the exact correction method specified by passing_graph. For each element that is itself a tuple of hypotheses, these hypotheses are then further corrected using the method specified by local_corrections.

significancefloat in (0,1), optional

The global significance level. If either there are no “nested hypotheses” or the correction used for the nested hypotheses locally controls the family-wise error rate (FWER) (such as when local_corrections=‘Holms’), then the hypothesis test encoded by this object will control the FWER to significance.

weightingstring or dict.

Specifies what proportion of significance is initially allocated to each element of hypotheses. If a string, it must be ‘equal’, in which case the local significance allocated to each element of hypotheses is significance/len(hypotheses). If a dictionary, each key is an element of hypotheses and each value is a non-negative integer (the values are normalized to sum to one inside the function).

passing_graphstring or numpy.array

Specifies where the local significance from each test in hypotheses that triggers is passed to. If a string, it must be ‘Holms’. In this case a test that triggers passes its local significance to all the remaining hypotheses that have not yet triggered, split evenly over these hypotheses. If it is an array then its [i,j] value is the proportion of the “local significance” that is passed from the hypothesis with index i (in the tuple hypotheses) to the hypothesis with index j if the hypothesis with index i is rejected (and if j hasn’t yet been rejected; otherwise that proportion is re-distributed over the other hypotheses that i is to pass its significance to). The only restriction on this array is that each row must sum to <= 1 (and it is sub-optimal for a row to sum to less than 1).

Note that a nested hypothesis is not allowed to pass significance out of itself, so any rows that request doing this will be ignored. This is because a nested hypothesis represents a set of hypotheses that are to be jointly tested using some multi-test correction, and so it could only pass significance out if all of the hypotheses within it are rejected. As this is unlikely in most use-cases, it has not been allowed for.

local_correctionsstr, optional

The type of multi-test correction used for testing any nested hypotheses. After all of the “top level” testing has been implemented on all non-nested hypotheses, whatever the “local” significance is for each of the “nested hypotheses” is multi-test corrected using this procedure. Must be one of:

  • ‘Holms’. This implements the Holms multi-test compensation technique. This controls the FWER for each set of nested hypotheses (and so controls the global FWER, in combination with the “top level” corrections). This requires no assumptions about the null hypotheses.

  • ‘Bonferroni’. This implements the well-known Bonferroni multi-test compensation technique. This is a strictly less powerful test than the Hochberg correction. Note that neither ‘Holms’ nor ‘Bonferroni’ gains any advantage from being implemented using “nesting”: the same corrections could be achieved by putting all the hypotheses into the “top level”.

  • ‘Hochberg’. This implements the Hochberg multi-test compensation technique. It is not a “closed test procedure”, so it is not something that can be implemented in the top level. To be provably valid, it is necessary for the p-values of the nested hypotheses to be non-negatively dependent. When that is true, this is strictly better than the Holms and Bonferroni corrections whilst still controlling the FWER.

  • ‘none’. This implements no multi-test compensation. This option does not control the FWER of the nested hypotheses. So it will generally not control the global FWER as specified.

  • ‘Benjamini-Hochberg’. This implements the Benjamini-Hochberg multi-test compensation technique. This does not control the FWER of the nested hypotheses, and instead controls the “False Discovery Rate” (FDR); see wikipedia. That means that the global significance is maintained in the sense that the probability of one or more tests triggering is at most significance. But, if one or more tests are triggered in a particular nested hypothesis test, we are only guaranteed that (in expectation) the fraction of those tests that are false alarms is no more than the “local significance”. This method is strictly more powerful than the Hochberg correction, but it controls a different, weaker quantity.

Initializes a HypothesisTest object. This specifies the set of null hypotheses and the tests to be implemented; it does not implement the tests. Methods are used to add the data (.add_pvalues) and run the tests (.run). The parameters are as described above.


Returns

A HypothesisTest object.

add_pvalues(pvalues)

Insert the p-values for the hypotheses.

Parameters
pvaluesdict

A dictionary specifying the p-value for each hypothesis.

Returns

None

run()

Implements the multiple hypothesis testing routine encoded by this object.

This populates the self.hypothesis_rejected dictionary, which shows which hypotheses can be rejected using the specified procedure.

Returns

None
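A self-contained sketch of the documented workflow: construct the test, supply p-values with add_pvalues(), call run(), and inspect the hypothesis_rejected dictionary. The hypothesis labels and p-values below are invented for illustration, and keying nested hypotheses by their individual labels is an assumption.

import pygsti

# Two top-level hypotheses plus one nested pair, at 5% global significance.
hypotheses = ['drift', 'crosstalk', ('context_A', 'context_B')]
test = pygsti.data.HypothesisTest(hypotheses, significance=0.05,
                                  weighting='equal',
                                  passing_graph='Holms',
                                  local_corrections='Holms')

# Illustrative p-values only; one per hypothesis label.
test.add_pvalues({'drift': 0.001, 'crosstalk': 0.20,
                  'context_A': 0.004, 'context_B': 0.03})

test.run()
print(test.hypothesis_rejected)  # which null hypotheses were rejected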

pygsti.data.simulate_data(model_or_dataset, circuit_list, num_samples, sample_error='multinomial', seed=None, rand_state=None, alias_dict=None, collision_action='aggregate', record_zero_counts=True, comm=None, mem_limit=None, times=None)

Creates a DataSet using the probabilities obtained from a model.

Parameters

model_or_datasetModel or DataSet object

The source of the underlying probabilities used to generate the data. If a Model, the model whose probabilities generate the data. If a DataSet, the data set whose frequencies generate the data.

circuit_listlist of (tuples or Circuits) or ExperimentDesign or None

Each tuple or Circuit contains operation labels and specifies a gate sequence whose counts are included in the returned DataSet. e.g. [ (), ('Gx',), ('Gx','Gy') ] If an ExperimentDesign, then the design’s .all_circuits_needing_data list is used as the circuit list.

num_samplesint or list of ints or None

The simulated number of samples for each circuit. This only has effect when sample_error == "binomial" or "multinomial". If an integer, all circuits have this number of total samples. If a list, integer elements specify the number of samples for the corresponding circuit. If None, then model_or_dataset must be a DataSet, and total counts are taken from it (on a per-circuit basis).

sample_errorstring, optional

What type of sample error is included in the counts. Can be:

  • “none” - no sample error: counts are floating point numbers such that the exact probability can be found by the ratio of count / total.

  • “clip” - no sample error, but clip probabilities to [0,1] so, e.g., counts are always positive.

  • “round” - same as “clip”, except counts are rounded to the nearest integer.

  • “binomial” - the number of counts is taken from a binomial distribution. Distribution has parameters p = (clipped) probability of the circuit and n = number of samples. This can only be used when there are exactly two SPAM labels in model_or_dataset.

  • “multinomial” - counts are taken from a multinomial distribution. Distribution has parameters p_k = (clipped) probability of the gate string using the k-th SPAM label and n = number of samples.

seedint, optional

If not None, a seed for numpy’s random number generator, which is used to sample from the binomial or multinomial distribution.

rand_statenumpy.random.RandomState

A RandomState object to generate samples from. Can be useful to set instead of seed if you want reproducible distribution samples across multiple random function calls but you don’t want to bother with manually incrementing seeds between those calls.

alias_dictdict, optional

A dictionary mapping single operation labels into tuples of one or more other operation labels which translate the given circuits before values are computed using model_or_dataset. The resulting Dataset, however, contains the un-translated circuits as keys.

collision_action{“aggregate”, “keepseparate”}

Determines how duplicate circuits are handled by the resulting DataSet. Please see the constructor documentation for DataSet.

record_zero_countsbool, optional

Whether zero-counts are actually recorded (stored) in the returned DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

commmpi4py.MPI.Comm, optional

When not None, an MPI communicator for distributing the computation across multiple processors and ensuring that the same dataset is generated on each processor.

mem_limitint, optional

A rough memory limit in bytes which is used to determine job allocation when there are multiple processors.

timesiterable, optional

When not None, a list of time-stamps at which data should be sampled. num_samples samples will be simulated at each time value, meaning that each circuit in circuit_list will be evaluated with the given time value as its start time.

Returns

DataSet

A static data set filled with counts for the specified circuits.
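A short usage sketch; model and circuits are placeholders for any pyGSTi Model (or an existing DataSet) and a list of Circuits or operation-label tuples.

import pygsti

# Assumptions: `model` is a pygsti Model and `circuits` is a list of Circuits
# (or tuples of operation labels) to simulate data for.
ds = pygsti.data.simulate_data(model, circuits,
                               num_samples=1000,
                               sample_error='multinomial',
                               seed=2023)

for c in circuits[:3]:
    print(ds[c])  # outcome counts for each of the first few circuits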

pygsti.data.aggregate_dataset_outcomes(dataset, label_merge_dict, record_zero_counts=True)

Creates a DataSet which merges certain outcomes in input DataSet.

This is used, for example, to aggregate a 2-qubit, 4-outcome DataSet into a 1-qubit, 2-outcome DataSet.

Parameters

datasetDataSet object

The input DataSet whose results will be simplified according to the rules set forth in label_merge_dict

label_merge_dictdictionary

The dictionary whose keys define the new DataSet outcomes, and whose items are lists of input DataSet outcomes that are to be summed together. For example, if a two-qubit DataSet has outcome labels “00”, “01”, “10”, and “11”, and we want to “aggregate out” the second qubit, we could use label_merge_dict = {‘0’:[‘00’,’01’],’1’:[‘10’,’11’]}. When doing this, however, it may be better to use filter_dataset() which also updates the circuits.

record_zero_countsbool, optional

Whether zero-counts are actually recorded (stored) in the returned (merged) DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

Returns

merged_datasetDataSet object

The DataSet with outcomes merged according to the rules given in label_merge_dict.
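For example, marginalizing out the second qubit of a two-qubit DataSet, following the label_merge_dict description above (a sketch; ds_2q is a placeholder for an existing two-qubit DataSet):

import pygsti

# Assumption: ds_2q is a DataSet with outcome labels '00', '01', '10' and '11'.
merge_rules = {'0': ['00', '01'],   # new outcome '0' sums the old '00' and '01' counts
               '1': ['10', '11']}   # new outcome '1' sums the old '10' and '11' counts
ds_1q = pygsti.data.aggregate_dataset_outcomes(ds_2q, merge_rules)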

pygsti.data.filter_dataset(dataset, sectors_to_keep, sindices_to_keep=None, new_sectors=None, idle=((),), record_zero_counts=True, filtercircuits=True)

Creates a DataSet that is the restriction of dataset to sectors_to_keep.

This function aggregates (sums) outcomes in dataset which differ only in sectors (usually qubits - see below) not in sectors_to_keep, and removes any operation labels which act specifically on sectors not in sectors_to_keep (e.g. an idle gate acting on all sectors, because its .sslbls is None, will not be removed).

Here “sectors” are state-space labels, present in the circuits of dataset. Each sector also corresponds to a particular character position within the outcome labels of dataset. Thus, for this function to work, the outcome labels of dataset must all be 1-tuples whose sole element is an n-character string such that each character represents the outcome of a single sector. If the state-space labels are integers, then they can serve as both a label and an outcome-string position. The argument new_sectors may be given to rename the kept state-space labels in the returned DataSet’s circuits.

A typical case is when the state-space is that of n qubits, and the state-space labels are the integers 0 to n-1. As stated above, in this case there is no need to specify sindices_to_keep. One may want to “rebase” the indices to 0 in the returned data set using new_sectors (e.g. sectors_to_keep == [4,5,6] and new_sectors == [0,1,2]).

Parameters

datasetDataSet object

The input DataSet whose data will be processed.

sectors_to_keeplist or tuple

The state-space labels (strings or integers) of the “sectors” to keep in the returned DataSet.

sindices_to_keeplist or tuple, optional

The 0-based indices of the labels in sectors_to_keep which give the positions of the corresponding letters in each outcome string (see above). If the state space labels are integers (labeling qubits) that are also letter-positions, then this may be left as None. For example, if the outcome strings of dataset are ‘00’,’01’,’10’,and ‘11’ and the first position refers to qubit “Q1” and the second to qubit “Q2” (present in operation labels), then to extract just “Q2” data sectors_to_keep should be [“Q2”] and sindices_to_keep should be [1].

new_sectorslist or tuple, optional

New sector names to map the elements of sectors_to_keep onto in the output DataSet’s circuits. None means the labels are not renamed. This can be useful if, for instance, you want to run a 2-qubit protocol that expects the qubits to be labeled “0” and “1” on qubits “4” and “5” of a larger set. Simply set sectors_to_keep == [4,5] and new_sectors == [0,1].

idlestring or Label, optional

The operation label to be used when there are no kept components of a “layer” (element) of a circuit.

record_zero_countsbool, optional

Whether zero-counts present in the original dataset are recorded (stored) in the returned (filtered) DataSet. If False, then such zero counts are ignored, except for potentially registering new outcome labels.

filtercircuitsbool, optional

Whether or not to “filter” the circuits, by removing gates that act outside of the sectors_to_keep.

Returns

filtered_datasetDataSet object

The DataSet with outcomes and circuits filtered as described above.
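A sketch of the “rebasing” use case described above, keeping qubits 4 and 5 of a larger device and relabeling them 0 and 1 (ds_big is a placeholder for an existing multi-qubit DataSet with string outcome labels):

import pygsti

ds_45 = pygsti.data.filter_dataset(ds_big,
                                   sectors_to_keep=[4, 5],
                                   new_sectors=[0, 1])  # relabel the kept qubits as 0 and 1
# Outcomes differing only on the discarded qubits have been summed together, and
# gates acting only on the discarded qubits have been removed from the circuits.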

pygsti.data.trim_to_constant_numtimesteps(ds)

Trims a DataSet so that each circuit’s data comprises the same number of timesteps.

Returns a new dataset that has data for the same number of time steps for every circuit. This is achieved by discarding all time-series data for every circuit with a time step index beyond ‘min-time-step-index’, where ‘min-time-step-index’ is the minimum number of time steps over circuits.

Parameters

dsDataSet

The dataset to trim.

Returns

DataSet

The trimmed dataset, obtained by potentially discarding some of the data.
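Usage is a single call (ds_timed is a placeholder for a time-resolved DataSet):

import pygsti

# Assumption: ds_timed holds time-series data with varying numbers of timesteps per circuit.
ds_uniform = pygsti.data.trim_to_constant_numtimesteps(ds_timed)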

pygsti.data.make_rpe_data_set(model_or_dataset, string_list_d, num_samples, sample_error='binomial', seed=None)

Generate a fake RPE DataSet using the probabilities obtained from a model.

This is a thin wrapper around pygsti.data.simulate_data, changing the default behavior of sample_error and taking a dictionary of circuits as input.

Parameters

model_or_datasetModel or DataSet object

If a Model, the model whose probabilities generate the data. If a DataSet, the data set whose frequencies generate the data.

string_list_dDictionary of list of (tuples or Circuits)

Each tuple or Circuit contains operation labels and specifies a gate sequence whose counts are included in the returned DataSet. The dictionary must have the key ‘totalStrList’; it is easiest if this dictionary is generated by make_rpe_string_list_d.

num_samplesint or list of ints or None

The simulated number of samples for each circuit. This only has effect when sample_error == “binomial” or “multinomial”. If an integer, all circuits have this number of total samples. If a list, integer elements specify the number of samples for the corresponding circuit. If None, then model_or_dataset must be a DataSet, and total counts are taken from it (on a per-circuit basis).

sample_errorstring, optional

What type of sample error is included in the counts. Can be:

  • “none” - no sample error: counts are floating point numbers such that the exact probability can be found by the ratio of count / total.

  • “round” - same as “none”, except counts are rounded to the nearest integer.

  • “binomial” - the number of counts is taken from a binomial distribution. Distribution has parameters p = probability of the circuit and n = number of samples. This can only be used when there are exactly two SPAM labels in model_or_dataset.

  • “multinomial” - counts are taken from a multinomial distribution. Distribution has parameters p_k = probability of the circuit using the k-th SPAM label and n = number of samples. This should not be used for RPE.

seedint, optional

If not None, a seed for numpy’s random number generator, which is used to sample from the binomial or multinomial distribution.

Returns

DataSet

A static data set filled with counts for the specified circuits.
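A closing sketch of generating simulated RPE data. The model and the RPE circuit dictionary are placeholders; the dictionary is assumed to carry the required ‘totalStrList’ key, e.g. as produced by make_rpe_string_list_d.

import pygsti

# Assumptions: `model` is a pygsti Model and `rpe_strings` is a dictionary of RPE
# circuit lists containing the key 'totalStrList' (e.g. from make_rpe_string_list_d).
rpe_ds = pygsti.data.make_rpe_data_set(model, rpe_strings,
                                       num_samples=512,
                                       sample_error='binomial',
                                       seed=7)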