pygsti.data.dataset

Defines the DataSet class and supporting classes and functions.

Module Contents

Classes

DataSet

An association between Circuits and outcome counts, serving as the input data for many QCVV protocols.

Attributes

Oindex_type

Time_type

Repcount_type

pygsti.data.dataset.Oindex_type
pygsti.data.dataset.Time_type
pygsti.data.dataset.Repcount_type
class pygsti.data.dataset.DataSet(oli_data=None, time_data=None, rep_data=None, circuits=None, circuit_indices=None, outcome_labels=None, outcome_label_indices=None, static=False, file_to_load_from=None, collision_action='aggregate', comment=None, aux_info=None)

Bases: pygsti.baseobjs.mongoserializable.MongoSerializable

An association between Circuits and outcome counts, serving as the input data for many QCVV protocols.

The DataSet class associates circuits with counts or time series of counts for each outcome label, and can be thought of as a table with gate strings labeling the rows and outcome labels and/or time labeling the columns. It is designed to behave similarly to a dictionary of dictionaries, so that counts are accessed by:

count = dataset[circuit][outcomeLabel]

in the time-independent case, and in the time-dependent case, for integer time index i >= 0,

outcomeLabel = dataset[circuit][i].outcome
count = dataset[circuit][i].count
time = dataset[circuit][i].time

Parameters

oli_data : list or numpy.ndarray

When static == True, a 1D numpy array containing outcome label indices (integers), concatenated for all sequences. Otherwise, a list of 1D numpy arrays, one array per gate sequence. In either case, this quantity is indexed by the values of circuit_indices or the index of circuits.

time_data : list or numpy.ndarray

Same format as oli_data, except it stores floating-point timestamp values.

rep_data : list or numpy.ndarray

Same format as oli_data, except it stores integer repetition counts for each “data bin” (i.e. (outcome, time) pair). If all repetitions equal 1 (“single-shot” timestamped data), then rep_data can be None (no repetitions).

circuits : list of (tuples or Circuits)

Each element is a tuple of operation labels or a Circuit object. Indices for these circuits are assumed to ascend from 0. These indices must correspond to the time series of outcome-label indices (above). Only specify this argument OR circuit_indices, not both.

circuit_indices : ordered dictionary

An OrderedDict with keys equal to circuits (tuples of operation labels) and values equal to integer indices associating a row/element of counts with the circuit. Only specify this argument OR circuits, not both.

outcome_labels : list of strings or int

Specifies the set of outcome labels for the DataSet. Indices for the outcome labels are assumed to ascend from 0, starting with the first element of this list. These indices associate each element of the time series with an outcome label. Only specify this argument OR outcome_label_indices, not both. If an int, specifies that the outcome labels should be the standard set for this many qubits.

outcome_label_indices : ordered dictionary

An OrderedDict with keys equal to outcome labels (strings) and values equal to integer indices associating an outcome label with a given index. Only specify this argument OR outcome_labels, not both.

static : bool

When True, create a read-only, i.e. “static”, DataSet which cannot be modified. In this case you must specify the time-series data, circuits, and outcome labels.

When False, create a DataSet that can have time-series data added to it. In this case, you only need to specify the outcome labels.

file_to_load_from : string or file object

Specify this argument and no others to create a static DataSet by loading from a file (just like using the load(…) function).

collision_action : {“aggregate”, “overwrite”, “keepseparate”}

Specifies how duplicate circuits should be handled. “aggregate” adds duplicate-circuit counts to the same circuit’s data at the next integer timestamp. “overwrite” only keeps the latest given data for a circuit. “keepseparate” tags duplicate circuits by setting the .occurrence ID of added circuits that are already contained in this data set to the next available positive integer.

comment : string, optional

A user-specified comment string that gets carried around with the data. A common use for this field is to attach details regarding the data’s collection.

aux_info : dict, optional

A user-specified dictionary of per-circuit auxiliary information. Keys should be the circuits in this DataSet and values should be Python dictionaries.
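The parallel-array layout described above can be pictured with the plain-Python sketch below. It models only the documented non-static layout (one list per circuit, where element k of oli_data, time_data and rep_data together describe one (outcome, time) bin). It does not import pyGSTi; the circuits, labels and counts are invented, and the helper time_independent_counts is hypothetical rather than part of the DataSet API.

```python
from collections import OrderedDict

# Hypothetical outcome labels (stored as 1-tuples) and circuits (tuples of labels).
outcome_label_indices = OrderedDict([(("0",), 0), (("1",), 1)])
circuit_indices = OrderedDict([(("Gx",), 0), (("Gx", "Gy"), 1)])

# Non-static layout: one list per circuit. Element k of oli_data, time_data
# and rep_data together describe a single (outcome, time) data bin.
oli_data = [[0, 1, 0, 1], [0, 1]]
time_data = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0]]
rep_data = [[45, 55, 42, 58], [30, 70]]


def time_independent_counts(circuit):
    """Sum repetitions per outcome label for one circuit, ignoring timestamps."""
    row = circuit_indices[circuit]
    index_to_label = {i: lbl for lbl, i in outcome_label_indices.items()}
    counts = {lbl: 0 for lbl in outcome_label_indices}
    for oli, reps in zip(oli_data[row], rep_data[row]):
        counts[index_to_label[oli]] += reps
    return counts


print(time_independent_counts(("Gx",)))  # {('0',): 87, ('1',): 113}
```

In a real DataSet this aggregation is what dataset[circuit][outcomeLabel] reports in the time-independent view; the sketch only shows how the three arrays line up.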


property outcome_labels

Get a list of all the outcome labels contained in this DataSet.

Returns
list of strings or tuples

A list where each element is an outcome label (which can be a string or a tuple of strings).

property timestamps

Get a list of all the (unique) timestamps contained in this DataSet.

Returns
list of floats

A list where each element is a timestamp.

property meantimestep

The mean time-step, averaged over the time-step for each circuit and over circuits.

Returns

float

property has_constant_totalcounts_pertime

True if the data for every circuit has the same number of total counts at every data collection time.

This will return True even if circuits have different total counts (i.e., after aggregating over time), as long as every circuit has the same total count at each time step. This happens, for example, when the number of time steps varies between circuits.

Returns

bool
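The distinction drawn above can be illustrated with a hypothetical plain-Python model (not pyGSTi's implementation) in which each circuit maps to a list of (timestamp, outcome, count) bins. Both circuits below have 100 counts at every time step, so the check succeeds even though their totals over all times differ.

```python
def has_constant_totalcounts_pertime(data):
    """Model: data maps circuit -> list of (timestamp, outcome, count) bins.

    Returns True when every circuit has the same total count at every one of
    its timestamps (circuits may have different numbers of timestamps).
    """
    per_time_totals = set()
    for bins in data.values():
        totals = {}
        for t, _outcome, count in bins:
            totals[t] = totals.get(t, 0) + count
        per_time_totals.update(totals.values())
    return len(per_time_totals) <= 1


data = {
    ("Gx",): [(0.0, "0", 40), (0.0, "1", 60), (1.0, "0", 50), (1.0, "1", 50)],
    ("Gy",): [(0.0, "0", 70), (0.0, "1", 30)],  # only one time step
}
# 100 counts at every time step, although the circuit totals are 200 vs. 100.
print(has_constant_totalcounts_pertime(data))  # True
```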

property totalcounts_pertime

Total counts per time, if this is constant over times and circuits.

When that doesn’t hold, an error is raised.

Returns

float or int

property has_constant_totalcounts

True if the data for every circuit has the same number of total counts.

Returns

bool

property has_trivial_timedependence

True if all the data in this DataSet occurs at time 0.

Returns

bool

collection_name = 'pygsti_datasets'
keys()

Returns the circuits used as keys of this DataSet.

Returns
list

A list of Circuit objects which index the data counts within this data set.

items()

Iterator over (circuit, timeSeries) pairs.

Here circuit is a tuple of operation labels and timeSeries is a _DataSetRow instance, which behaves similarly to a list of spam labels whose index corresponds to the time step.

Returns

_DataSetKVIterator

values()

Iterator over _DataSetRow instances corresponding to the time series data for each circuit.

Returns

_DataSetValueIterator

gate_labels(prefix='G')

Get a list of all the distinct operation labels used in the circuits of this dataset.

Parameters
prefix : str

Filter the circuit labels so that only elements beginning with this prefix are returned. None performs no filtering.

Returns
list of strings

A list where each element is an operation label.

degrees_of_freedom(circuits=None, method='present_outcomes-1', aggregate_times=True)

Returns the number of independent degrees of freedom in the data for the circuits in circuits.

Parameters
circuits : list of Circuits

The list of circuits to count degrees of freedom for. If None then all of the DataSet’s circuits are used.

method : {‘all_outcomes-1’, ‘present_outcomes-1’, ‘tuned’}

How the degrees of freedom should be computed. ‘all_outcomes-1’ takes the number of circuits and multiplies it by the total number of outcomes (the length of outcome_labels) minus one. ‘present_outcomes-1’ counts, on a per-circuit basis, the number of present (usually = non-zero) outcomes recorded, minus one. ‘tuned’ should be the most accurate, as it accounts for low-N “Poisson bump” behavior, but it is not the default because it is still under development. For timestamped data, see aggregate_times below.

aggregate_times : bool, optional

Whether counts that occur at different times should be aggregated (tallied together). If True, then even when counts occur at different times, degrees of freedom are tallied on a per-circuit basis. If False, then counts occurring at distinct times are treated as independent of those at any other time, and are tallied separately. So, for example, if aggregate_times is False and a data row has 0- and 1-counts of 45 & 55 at time=0 and 42 & 58 at time=1, this row would contribute 2 degrees of freedom, not 1. It can sometimes be useful to set this to False when the DataSet holds coarse-grained data, but usually you want this to be left as True (especially for time-series data).

Returns

int
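The worked example in the aggregate_times description can be reproduced with the following sketch of the ‘present_outcomes-1’ rule. It is a hypothetical plain-Python model: the function name and the nested-dict data format are assumptions, “present” is taken to mean non-zero, and the ‘tuned’ method is not modeled. It is not pyGSTi’s degrees_of_freedom code.

```python
def present_outcomes_dof(data, aggregate_times=True):
    """Model of the 'present_outcomes-1' rule.

    data maps circuit -> {timestamp: {outcome: count}}; an outcome counts as
    "present" here when its count is non-zero.
    """
    dof = 0
    for per_time in data.values():
        if aggregate_times:
            merged = {}
            for counts in per_time.values():
                for outcome, n in counts.items():
                    merged[outcome] = merged.get(outcome, 0) + n
            groups = [merged]  # one tally per circuit
        else:
            groups = list(per_time.values())  # one tally per (circuit, time)
        for counts in groups:
            dof += sum(1 for n in counts.values() if n > 0) - 1
    return dof


# The row from the aggregate_times description: 45/55 at t=0, 42/58 at t=1.
data = {("Gx",): {0.0: {"0": 45, "1": 55}, 1.0: {"0": 42, "1": 58}}}
print(present_outcomes_dof(data, aggregate_times=True))   # 1
print(present_outcomes_dof(data, aggregate_times=False))  # 2
```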

add_count_dict(circuit, count_dict, record_zero_counts=True, aux=None, update_ol=True)

Add a single circuit’s counts to this DataSet.

Parameters
circuit : tuple or Circuit

A tuple of operation labels specifying the circuit, or a Circuit object.

count_dict : dict

A dictionary with keys = outcome labels and values = counts.

record_zero_counts : bool, optional

Whether zero counts are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

aux : dict, optional

A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).

update_ol : bool, optional

This argument is for internal use only and should be left as True.

Returns

None

add_count_list(circuit, outcome_labels, counts, record_zero_counts=True, aux=None, update_ol=True, unsafe=False)

Add a single circuit’s counts to this DataSet.

Parameters
circuit : tuple or Circuit

A tuple of operation labels specifying the circuit, or a Circuit object.

outcome_labels : list or tuple

The outcome labels corresponding to counts.

counts : list or tuple

The counts themselves.

record_zero_counts : bool, optional

Whether zero counts are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

aux : dict, optional

A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).

update_ol : bool, optional

This argument is for internal use only and should be left as True.

unsafe : bool, optional

True means that outcome_labels is guaranteed to hold tuple-type outcome labels and never plain strings. Only set this to True if you know what you’re doing.

Returns

None

add_count_arrays(circuit, outcome_index_array, count_array, record_zero_counts=True, aux=None)

Add the outcomes for a single circuit, formatted as raw data arrays.

Parameters
circuit : Circuit

The circuit to add data for.

outcome_index_array : numpy.ndarray

An array of outcome indices, which must be values of self.olIndex (which maps outcome labels to indices).

count_array : numpy.ndarray

An array of integer (or sometimes floating point) counts, one corresponding to each outcome index (element of outcome_index_array).

record_zero_counts : bool, optional

Whether zero counts (zeros in count_array) should be stored explicitly or not stored and inferred. Setting this to False reduces the space taken by data sets containing lots of zero counts, but makes some objective function evaluations less precise.

aux : dict or None, optional

If not None, a dictionary of user-defined auxiliary information that should be associated with this circuit.

Returns

None

add_cirq_trial_result(circuit, trial_result, key, convert_int_to_binary=True, num_qubits=None)

Add a single circuit’s counts, stored in a Cirq TrialResult, to this DataSet.

Parameters
circuit : tuple or Circuit

A tuple of operation labels specifying the circuit, or a Circuit object. Note that this must be a pyGSTi circuit, not a Cirq circuit.

trial_result : cirq.TrialResult

The TrialResult to add.

key : str

The string key of the measurement, as set by cirq.measure.

convert_int_to_binary : bool, optional (default True)

By default the keys in the Cirq Results object are the integers representing the bitstrings of the measurements on a set of qubits, in big-endian convention. If True, these are converted back to binary strings before the counts are added as an entry in the pyGSTi dataset.

num_qubits : int, optional (default None)

Number of qubits used in the conversion from integers to binary when convert_int_to_binary is True. If None, then the number of line_labels on the input circuit is used.

Returns

None
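The integer-to-bitstring step described under convert_int_to_binary amounts to writing each integer in big-endian binary, zero-padded to num_qubits characters. The sketch below is a hypothetical stand-alone helper that shows this conversion on an integer-keyed histogram (such as the one Cirq’s Result.histogram returns); it does not call Cirq and is not pyGSTi’s internal code.

```python
def int_keys_to_bitstrings(histogram, num_qubits):
    """Convert {integer outcome: count} into {bitstring: count}.

    Each integer is written big-endian (most significant bit = first qubit)
    and zero-padded to num_qubits characters.
    """
    return {format(value, "0{}b".format(num_qubits)): count
            for value, count in histogram.items()}


# Invented histogram for a 3-qubit measurement.
histogram = {5: 10, 0: 3}
print(int_keys_to_bitstrings(histogram, num_qubits=3))  # {'101': 10, '000': 3}
```

Padding to num_qubits matters: without it, the outcome 0 on three qubits would become '0' rather than the standard label '000'.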

add_raw_series_data(circuit, outcome_label_list, time_stamp_list, rep_count_list=None, overwrite_existing=True, record_zero_counts=True, aux=None, update_ol=True, unsafe=False)

Add a single circuit’s counts to this DataSet.

Parameters
circuit : tuple or Circuit

A tuple of operation labels specifying the circuit, or a Circuit object.

outcome_label_list : list

A list of outcome labels (strings or tuples). An element’s index links it to a particular time step (i.e. the i-th element of the list specifies the outcome of the i-th measurement in the series).

time_stamp_list : list

A list of floating point timestamps, each associated with the single corresponding outcome in outcome_label_list. Must be the same length as outcome_label_list.

rep_count_list : list, optional

A list of integer counts specifying how many outcomes of type given by outcome_label_list occurred at the time given by time_stamp_list. If None, then all counts are assumed to be 1. When not None, must be the same length as outcome_label_list.

overwrite_existing : bool, optional

Whether to overwrite the data for circuit (if it exists). If False, then the given lists are appended (added) to existing data.

record_zero_counts : bool, optional

Whether zero counts (elements of rep_count_list that are zero) are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

aux : dict, optional

A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).

update_ol : bool, optional

This argument is for internal use only and should be left as True.

unsafe : bool, optional

When True, don’t bother checking that outcome_label_list contains tuple-type outcome labels and automatically upgrading strings to 1-tuples. Only set this to True if you know what you’re doing and need the marginally faster performance.

Returns

None
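How the three parallel lists combine can be sketched in plain Python. The helper tally_series below is hypothetical (it is not part of pyGSTi): it treats a missing rep_count_list as all ones, checks the lengths match, and drops zero counts when record_zero_counts is False. Registration of new outcome labels is not modeled.

```python
from collections import defaultdict


def tally_series(outcome_label_list, time_stamp_list, rep_count_list=None,
                 record_zero_counts=True):
    """Model: turn parallel raw-series lists into {time: {outcome: count}}."""
    if rep_count_list is None:
        rep_count_list = [1] * len(outcome_label_list)  # single-shot data
    if not (len(outcome_label_list) == len(time_stamp_list) == len(rep_count_list)):
        raise ValueError("all lists must have the same length")
    table = defaultdict(dict)
    for outcome, t, reps in zip(outcome_label_list, time_stamp_list, rep_count_list):
        if reps == 0 and not record_zero_counts:
            continue  # zero counts are dropped rather than stored
        table[t][outcome] = table[t].get(outcome, 0) + reps
    return dict(table)


print(tally_series(["0", "1", "0"], [0.0, 0.0, 1.5]))
# {0.0: {'0': 1, '1': 1}, 1.5: {'0': 1}}
```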

update_ol()

Updates the internal outcome-label list in this dataset.

Call this after calling add_count_dict(…) or add_raw_series_data(…) with update_ol=False.

Returns

None

add_series_data(circuit, count_dict_list, time_stamp_list, overwrite_existing=True, record_zero_counts=True, aux=None)

Add a single circuit’s counts to this DataSet.

Parameters
circuit : tuple or Circuit

A tuple of operation labels specifying the circuit, or a Circuit object.

count_dict_list : list

A list of dictionaries holding the outcome-label:count pairs for each time step (times given by time_stamp_list).

time_stamp_list : list

A list of floating point timestamps, each associated with an entire dictionary of outcomes specified by count_dict_list.

overwrite_existing : bool, optional

If True, overwrite any existing data for the circuit. If False, add the count data with the next non-negative integer timestamp.

record_zero_counts : bool, optional

Whether zero counts (elements of the dictionaries in count_dict_list that are zero) are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

aux : dict, optional

A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).

Returns

None

aggregate_outcomes(label_merge_dict, record_zero_counts=True)

Creates a DataSet which merges certain outcomes in this DataSet.

Used, for example, to aggregate a 2-qubit 4-outcome DataSet into a 1-qubit 2-outcome DataSet.

Parameters
label_merge_dict : dictionary

The dictionary whose keys define the new DataSet outcomes, and whose items are lists of input DataSet outcomes that are to be summed together. For example, if a two-qubit DataSet has outcome labels “00”, “01”, “10”, and “11”, and we want to “aggregate out” the second qubit, we could use label_merge_dict = {'0': ['00', '01'], '1': ['10', '11']}. When doing this, however, it may be better to use filter_qubits(), which also updates the circuits.

record_zero_counts : bool, optional

Whether zero counts are actually recorded (stored) in the returned (merged) DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

Returns
merged_dataset : DataSet object

The DataSet with outcomes merged according to the rules given in label_merge_dict.
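The merge rule for a single circuit’s counts can be written as a one-line dictionary comprehension. This is a hypothetical plain-Python model (the helper name is invented and it works on a plain dict rather than a DataSet), using the label_merge_dict from the example above with made-up counts.

```python
def aggregate_outcomes(counts, label_merge_dict):
    """Model: each new outcome's count is the sum of its listed old outcomes."""
    return {new: sum(counts.get(old, 0) for old in olds)
            for new, olds in label_merge_dict.items()}


counts = {"00": 40, "01": 10, "10": 5, "11": 45}
label_merge_dict = {"0": ["00", "01"], "1": ["10", "11"]}
print(aggregate_outcomes(counts, label_merge_dict))  # {'0': 50, '1': 50}
```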

aggregate_std_nqubit_outcomes(qubit_indices_to_keep, record_zero_counts=True)

Creates a DataSet which merges certain outcomes in this DataSet.

Used, for example, to aggregate a 2-qubit 4-outcome DataSet into a 1-qubit 2-outcome DataSet. This assumes that outcome labels are in the standard format whereby each qubit corresponds to a single ‘0’ or ‘1’ character.

Parameters
qubit_indices_to_keep : list

A list of integers specifying which qubits should be kept, that is, not aggregated.

record_zero_counts : bool, optional

Whether zero-counts are actually recorded (stored) in the returned (merged) DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

Returns
merged_dataset : DataSet object

The DataSet with outcomes merged.
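With standard bitstring outcome labels, keeping a subset of qubits means keeping the characters at those positions and summing counts that collapse onto the same shorter label. The sketch below is a hypothetical model of that marginalization on a plain dict (not pyGSTi’s code); it assumes the kept characters are joined in the order given.

```python
def keep_qubits(counts, qubit_indices_to_keep):
    """Model: marginalize bitstring outcomes onto the kept qubit positions."""
    merged = {}
    for bits, n in counts.items():
        key = "".join(bits[i] for i in qubit_indices_to_keep)
        merged[key] = merged.get(key, 0) + n
    return merged


counts = {"00": 40, "01": 10, "10": 5, "11": 45}
print(keep_qubits(counts, [0]))  # {'0': 50, '1': 50}
print(keep_qubits(counts, [1]))  # {'0': 45, '1': 55}
```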

add_auxiliary_info(circuit, aux)

Add auxiliary meta information to circuit.

Parameters
circuit : tuple or Circuit

A tuple of operation labels specifying the circuit, or a Circuit object.

aux : dict, optional

A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).

Returns

None

add_counts_from_dataset(other_data_set)

Append another DataSet’s data to this DataSet.

Parameters
other_data_set : DataSet

The dataset to take counts from.

Returns

None

add_series_from_dataset(other_data_set)

Append another DataSet’s series data to this DataSet.

Parameters
other_data_set : DataSet

The dataset to take time series data from.

Returns

None

to_str(mode='auto')

Render this DataSet as a string.

Parameters
mode : {“auto”, “time-dependent”, “time-independent”}

Whether to display the data as time-series of outcome counts (“time-dependent”) or to report per-outcome counts aggregated over time (“time-independent”). If “auto” is specified, then the time-independent mode is used only if all time stamps in the DataSet are equal to zero (trivial time dependence).

Returns

str

truncate(list_of_circuits_to_keep, missing_action='raise')

Create a truncated dataset comprised of a subset of the circuits in this dataset.

Parameters
list_of_circuits_to_keep : list of (tuples or Circuits)

A list of the circuits for the new returned dataset. If a circuit is given in this list that isn’t in the original data set, missing_action determines the behavior.

missing_action : {“raise”, “warn”, “ignore”}

What to do when a circuit in list_of_circuits_to_keep is not in the data set (raise a KeyError, issue a warning, or do nothing).

Returns
DataSet

The truncated data set.

time_slice(start_time, end_time, aggregate_to_time=None)

Creates a DataSet by aggregating the counts within the [start_time, end_time) interval.

Parameters
start_time : float

The starting time.

end_time : float

The ending time.

aggregate_to_time : float, optional

If not None, a single timestamp to give all the data in the specified range, resulting in a time-independent DataSet. If None, then the original timestamps are preserved.

Returns

DataSet
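The half-open interval [start_time, end_time) includes start_time and excludes end_time. The following hypothetical model (a plain list of (time, outcome, count) bins, not a DataSet) shows both the filtering and the effect of aggregate_to_time, which collapses the kept bins onto one timestamp.

```python
def time_slice(series, start_time, end_time, aggregate_to_time=None):
    """Model: keep (time, outcome, count) bins with start_time <= time < end_time."""
    kept = [(t, o, n) for t, o, n in series if start_time <= t < end_time]
    if aggregate_to_time is None:
        return kept  # original timestamps preserved
    merged = {}
    for _t, o, n in kept:
        merged[o] = merged.get(o, 0) + n
    return [(aggregate_to_time, o, n) for o, n in merged.items()]


series = [(0.0, "0", 3), (0.5, "1", 2), (1.0, "0", 4), (1.0, "1", 1)]
print(time_slice(series, 0.0, 1.0))  # [(0.0, '0', 3), (0.5, '1', 2)]
print(time_slice(series, 0.0, 2.0, aggregate_to_time=0.0))  # [(0.0, '0', 7), (0.0, '1', 3)]
```

Note that the bins at time 1.0 are excluded from the first slice because end_time is not part of the interval.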

split_by_time(aggregate_to_time=None)

Creates a dictionary of DataSets, each of which is an equal-time slice of this DataSet.

The keys of the returned dictionary are the distinct timestamps in this dataset.

Parameters
aggregate_to_time : float, optional

If not None, a single timestamp to give all the data in each returned data set, resulting in time-independent DataSet objects. If None, then the original timestamps are preserved.

Returns
OrderedDict

A dictionary of DataSet objects whose keys are the timestamp values of the original (this) data set in sorted order.
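The shape of the returned mapping can be modeled with an OrderedDict keyed by sorted timestamps. In this hypothetical sketch each value is a list of (time, outcome, count) bins standing in for a DataSet; it illustrates the grouping only, not pyGSTi’s implementation.

```python
from collections import OrderedDict


def split_by_time(series):
    """Model: group (time, outcome, count) bins by timestamp, sorted by time."""
    groups = OrderedDict()
    for t in sorted({t for t, _o, _n in series}):
        groups[t] = [(tt, o, n) for tt, o, n in series if tt == t]
    return groups


series = [(1.0, "0", 4), (0.0, "0", 3), (1.0, "1", 1)]
print(list(split_by_time(series)))  # [0.0, 1.0]
```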

drop_zero_counts()

Creates a copy of this data set that doesn’t include any zero counts.

Returns

DataSet

process_times(process_times_array_fn)

Manipulate this DataSet’s timestamps according to process_times_array_fn.

For example, the following process_times_array_fn would change the timestamps for each circuit to sequential integers:

def process_times_array_fn(times):
    return list(range(len(times)))
Parameters
process_times_array_fn : function

A function which takes a single array-of-timestamps argument and returns another similarly-sized array. This function is called, once per circuit, with the circuit’s array of timestamps.

Returns
DataSet

A new data set with altered timestamps.

process_circuits(processor_fn, aggregate=False)

Create a new data set by manipulating this DataSet’s circuits (keys) according to processor_fn.

The new DataSet’s circuits result from running each of this DataSet’s circuits through processor_fn. This can be useful when “tracing out” qubits in a dataset containing multi-qubit data.

Parameters
processor_fn : function

A function which takes a single Circuit argument and returns another (or the same) Circuit. This function may also return None, in which case the data for that circuit is deleted.

aggregate : bool, optional

When True, aggregate the data for circuits that processor_fn assigns to the same “new” circuit. When False, use the data from the last original circuit that maps to a given “new” circuit.

Returns

DataSet
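The difference between aggregate=True and aggregate=False is easiest to see when two circuits map to the same new circuit. In this hypothetical model, circuits are tuples of string labels, counts are plain dicts, and drop_qubit1_gates is an invented processor that removes labels ending in ":1" (returning None, and so deleting the data, when nothing is left). None of these names are pyGSTi API.

```python
def process_circuits(data, processor_fn, aggregate=False):
    """Model: data maps circuit tuple -> {outcome: count}."""
    new_data = {}
    for circuit, counts in data.items():
        new_circuit = processor_fn(circuit)
        if new_circuit is None:
            continue  # the data for this circuit is deleted
        if aggregate and new_circuit in new_data:
            merged = dict(new_data[new_circuit])
            for outcome, n in counts.items():
                merged[outcome] = merged.get(outcome, 0) + n
            new_data[new_circuit] = merged
        else:
            new_data[new_circuit] = dict(counts)  # last one wins if not aggregating
    return new_data


def drop_qubit1_gates(circuit):
    """Hypothetical processor: remove labels acting on qubit 1 (':1' suffix)."""
    kept = tuple(lbl for lbl in circuit if not lbl.endswith(":1"))
    return kept if kept else None


data = {("Gx:0", "Gx:1"): {"0": 60, "1": 40},
        ("Gx:0", "Gy:1"): {"0": 55, "1": 45},
        ("Gy:1",): {"0": 50, "1": 50}}
print(process_circuits(data, drop_qubit1_gates, aggregate=True))
# {('Gx:0',): {'0': 115, '1': 85}}
print(process_circuits(data, drop_qubit1_gates))
# {('Gx:0',): {'0': 55, '1': 45}}
```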

process_circuits_inplace(processor_fn, aggregate=False)

Manipulate this DataSet’s circuits (keys) in-place according to processor_fn.

All of this DataSet’s circuits are updated by running each one through processor_fn. This can be useful when “tracing out” qubits in a dataset containing multi-qubit data.

Parameters
processor_fn : function

A function which takes a single Circuit argument and returns another (or the same) Circuit. This function may also return None, in which case the data for that circuit is deleted.

aggregate : bool, optional

When True, aggregate the data for circuits that processor_fn assigns to the same “new” circuit. When False, use the data from the last original circuit that maps to a given “new” circuit.

Returns

None

remove(circuits, missing_action='raise')

Remove (delete) the data for circuits from this DataSet.

Parameters
circuits : iterable

An iterable over Circuit-like objects specifying the keys (circuits) to remove.

missing_action : {“raise”, “warn”, “ignore”}

What to do when a circuit in circuits is not in this data set (raise a KeyError, issue a warning, or do nothing).

Returns

None

copy()

Make a copy of this DataSet.

Returns

DataSet

copy_nonstatic()

Make a non-static copy of this DataSet.

Returns

DataSet

done_adding_data()

Promotes a non-static DataSet to a static (read-only) DataSet.

This method should be called after all data has been added.

Returns

None

save(file_or_filename)
write_binary(file_or_filename)

Write this data set to a binary-format file.

Parameters
file_or_filename : string or file object

If a string, interpreted as a filename. If this filename ends in “.gz”, the file will be gzip compressed.

Returns

None

load(file_or_filename)
read_binary(file_or_filename)

Read a DataSet from a binary file, clearing any previously contained data.

The file should have been created with DataSet.write_binary().

Parameters
file_or_filename : str or buffer

The file or filename to load from.

Returns

None

rename_outcome_labels(old_to_new_dict)

Replaces existing outcome labels with new ones as per old_to_new_dict.

Parameters
old_to_new_dict : dict

A mapping from old/existing outcome labels to new ones. Strings in keys or values are automatically converted to 1-tuples. Missing outcome labels are left unaltered.

Returns

None

add_std_nqubit_outcome_labels(nqubits)

Adds all the “standard” outcome labels (e.g. ‘0010’) on nqubits qubits.

This is useful to ensure that, even if not all outcomes appear in the data, all are recognized as being potentially valid outcomes (so that attempts to get counts for these outcomes return 0 rather than raising an error).

Parameters
nqubits : int

The number of qubits. For example, if equal to 3, the outcome labels ‘000’, ‘001’, …, ‘111’ are added.

Returns

None
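The “standard” labels on nqubits qubits are all 2**nqubits bitstrings of that length. A minimal way to enumerate them in plain Python is itertools.product, shown below as an illustration of the label set only; the helper name is hypothetical and this is not necessarily how pyGSTi generates them.

```python
from itertools import product


def std_nqubit_outcome_labels(nqubits):
    """All 2**nqubits bitstring labels, e.g. '000' ... '111' for 3 qubits."""
    return ["".join(bits) for bits in product("01", repeat=nqubits)]


print(std_nqubit_outcome_labels(2))  # ['00', '01', '10', '11']
```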

add_outcome_labels(outcome_labels, update_ol=True)

Adds new valid outcome labels.

Ensures that all the elements of outcome_labels are stored as valid outcomes for circuits in this DataSet, adding new outcomes as necessary.

Parameters
outcome_labels : list or generator

A list or generator of string- or tuple-valued outcome labels.

update_ol : bool, optional

Whether to update internal mappings to reflect the new outcome labels. Leave this as True unless you really know what you’re doing.

Returns

None

auxinfo_dataframe(pivot_valuename=None, pivot_value=None, drop_columns=False)

Create a Pandas dataframe with aux-data from this dataset.

Parameters
pivot_valuename : str, optional

If not None, the resulting dataframe is pivoted using pivot_valuename as the column whose values name the pivoted table’s column names. If None and pivot_value is not None, “ValueName” is used.

pivot_value : str, optional

If not None, the resulting dataframe is pivoted such that values of the pivot_value column are rearranged into new columns whose names are given by the values of the pivot_valuename column. If None and pivot_valuename is not None, “Value” is used.

drop_columns : bool or list, optional

A list of column names to drop (prior to performing any pivot). If True appears in this list or is given directly, then all constant-valued columns are dropped as well. No columns are dropped when drop_columns == False.

Returns

pandas.DataFrame