pygsti.data.dataset
Defines the DataSet class and supporting classes and functions.
Module Contents
Classes
- DataSet: An association between Circuits and outcome counts, serving as the input data for many QCVV protocols.
Attributes
- pygsti.data.dataset.Oindex_type
- pygsti.data.dataset.Time_type
- pygsti.data.dataset.Repcount_type
- class pygsti.data.dataset.DataSet(oli_data=None, time_data=None, rep_data=None, circuits=None, circuit_indices=None, outcome_labels=None, outcome_label_indices=None, static=False, file_to_load_from=None, collision_action='aggregate', comment=None, aux_info=None)
Bases: pygsti.baseobjs.mongoserializable.MongoSerializable
An association between Circuits and outcome counts, serving as the input data for many QCVV protocols.
The DataSet class associates circuits with counts or time series of counts for each outcome label, and can be thought of as a table with gate strings labeling the rows and outcome labels and/or time labeling the columns. It is designed to behave similarly to a dictionary of dictionaries, so that counts are accessed by:
count = dataset[circuit][outcomeLabel]
in the time-independent case, and in the time-dependent case, for integer time index i >= 0,
outcomeLabel = dataset[circuit][i].outcome
count = dataset[circuit][i].count
time = dataset[circuit][i].time
Parameters
- oli_data : list or numpy.ndarray
When static == True, a 1D numpy array containing outcome label indices (integers), concatenated for all sequences. Otherwise, a list of 1D numpy arrays, one array per gate sequence. In either case, this quantity is indexed by the values of circuit_indices or the index of circuits.
- time_data : list or numpy.ndarray
Same format as oli_data, except it stores floating-point timestamp values.
- rep_data : list or numpy.ndarray
Same format as oli_data, except it stores integer repetition counts for each “data bin” (i.e. (outcome, time) pair). If all repetitions equal 1 (“single-shot” timestamped data), then rep_data can be None (no repetitions).
- circuits : list of (tuples or Circuits)
Each element is a tuple of operation labels or a Circuit object. Indices for these circuits are assumed to ascend from 0. These indices must correspond to the time series of spam-label indices (above). Only specify this argument OR circuit_indices, not both.
- circuit_indices : ordered dictionary
An OrderedDict with keys equal to circuits (tuples of operation labels) and values equal to integer indices associating a row/element of counts with the circuit. Only specify this argument OR circuits, not both.
- outcome_labels : list of strings or int
Specifies the set of spam labels for the DataSet. Indices for the spam labels are assumed to ascend from 0, starting with the first element of this list. These indices associate each element of the time series with a spam label. Only specify this argument OR outcome_label_indices, not both. If an int, specifies that the outcome labels should be those for a standard set of this many qubits.
- outcome_label_indices : ordered dictionary
An OrderedDict with keys equal to outcome labels and values equal to integer indices associating each outcome label with its index. Only specify this argument OR outcome_labels, not both.
- static : bool
When True, create a read-only, i.e. “static”, DataSet which cannot be modified. In this case you must specify the time series data, circuits, and spam labels. When False, create a DataSet that can have time series data added to it. In this case, you only need to specify the spam labels.
- file_to_load_from : string or file object
Specify this argument and no others to create a static DataSet by loading from a file (just like using the load(…) function).
- collision_action : {“aggregate”, “overwrite”, “keepseparate”}
Specifies how duplicate circuits should be handled. “aggregate” adds duplicate-circuit counts to the same circuit’s data at the next integer timestamp. “overwrite” only keeps the latest given data for a circuit. “keepseparate” tags duplicate circuits by setting the .occurrence ID of added circuits that are already contained in this data set to the next available positive integer.
- comment : string, optional
A user-specified comment string that is carried around with the data. A common use for this field is to attach details about how the data were collected.
- aux_info : dict, optional
A user-specified dictionary of per-circuit auxiliary information. Keys should be the circuits in this DataSet and values should be Python dictionaries.
Returns
- DataSet
a new data set object.
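The non-static layout of oli_data, time_data and rep_data is easiest to see by decoding one circuit's row by hand. The following is a minimal plain-Python sketch that does not call pyGSTi: the helper decode_row and the sample arrays are hypothetical, outcome labels are plain strings rather than the tuples pyGSTi uses internally, and it only models the documented format, not the library's own code.

```python
from collections import defaultdict

def decode_row(outcome_labels, oli, times, reps=None):
    """Turn one circuit's (outcome-index, time, repetition) arrays into
    {time: {outcome_label: count}}. reps=None means every entry counts once."""
    if reps is None:
        reps = [1] * len(oli)
    if not (len(oli) == len(times) == len(reps)):
        raise ValueError("oli, times and reps must have the same length")
    table = defaultdict(lambda: defaultdict(int))
    for idx, t, r in zip(oli, times, reps):
        table[t][outcome_labels[idx]] += r
    # Convert the nested defaultdicts to plain dicts for readability.
    return {t: dict(c) for t, c in table.items()}

# Hypothetical data: outcome labels '0' and '1' (indices 0 and 1) and
# one circuit with two data bins at time 0 and two at time 1.
outcome_labels = ['0', '1']
oli_data = [[0, 1, 0, 1]]            # one list per circuit (static=False layout)
time_data = [[0.0, 0.0, 1.0, 1.0]]
rep_data = [[45, 55, 42, 58]]

row = decode_row(outcome_labels, oli_data[0], time_data[0], rep_data[0])
print(row)  # {0.0: {'0': 45, '1': 55}, 1.0: {'0': 42, '1': 58}}
```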
- property outcome_labels
Get a list of all the outcome labels contained in this DataSet.
Returns
- list of strings or tuples
A list where each element is an outcome label (which can be a string or a tuple of strings).
- property timestamps
Get a list of all the (unique) timestamps contained in this DataSet.
Returns
- list of floats
A list where each element is a timestamp.
- property meantimestep
The mean time-step, averaged over the time-step for each circuit and over circuits.
Returns
float
- property has_constant_totalcounts_pertime
True if the data for every circuit has the same number of total counts at every data collection time.
This returns True even when the total number of counts differs between circuits (i.e., after aggregating over time), as long as every circuit has the same total counts at each time step (overall totals can differ when the number of time steps varies between circuits).
Returns
bool
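A minimal sketch of the has_constant_totalcounts_pertime condition, assuming the per-time totals are already available as a plain dictionary (the input shape is hypothetical and this is not pyGSTi's API). The sample data shows the case described above, where overall totals differ between circuits but per-time totals do not.

```python
def has_constant_totalcounts_pertime(totals_by_circuit):
    """totals_by_circuit maps circuit -> {time: total counts at that time}."""
    # Collect every per-time total across all circuits; the condition holds
    # when there is at most one distinct value.
    all_totals = {n for per_time in totals_by_circuit.values()
                  for n in per_time.values()}
    return len(all_totals) <= 1

# Circuit 'A' has two time steps and 'B' has one, so their overall totals
# differ (200 vs 100), yet every per-time total is 100.
data = {'A': {0.0: 100, 1.0: 100}, 'B': {0.0: 100}}
print(has_constant_totalcounts_pertime(data))  # True
```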
- property totalcounts_pertime
Total counts per time, if this is constant over times and circuits.
When that doesn’t hold, an error is raised.
Returns
float or int
- property has_constant_totalcounts
True if the data for every circuit has the same number of total counts.
Returns
bool
- property has_trivial_timedependence
True if all the data in this DataSet occurs at time 0.
Returns
bool
- collection_name = "'pygsti_datasets'"
- keys()
Returns the circuits used as keys of this DataSet.
Returns
- list
A list of Circuit objects which index the data counts within this data set.
- items()
Iterator over (circuit, timeSeries) pairs.
Here circuit is a tuple of operation labels and timeSeries is a _DataSetRow instance, which behaves similarly to a list of spam labels whose index corresponds to the time step.
Returns
_DataSetKVIterator
- values()
Iterator over _DataSetRow instances corresponding to the time series data for each circuit.
Returns
_DataSetValueIterator
- gate_labels(prefix='G')
Get a list of all the distinct operation labels used in the circuits of this dataset.
Parameters
- prefix : str
Filter the operation labels so that only elements beginning with this prefix are returned. None performs no filtering.
Returns
- list of strings
A list where each element is an operation label.
- degrees_of_freedom(circuits=None, method='present_outcomes-1', aggregate_times=True)
Returns the number of independent degrees of freedom in the data for the circuits in circuits.
Parameters
- circuits : list of Circuits
The list of circuits to count degrees of freedom for. If None, then all of the DataSet’s circuits are used.
- method : {‘all_outcomes-1’, ‘present_outcomes-1’, ‘tuned’}
How the degrees of freedom should be computed. ‘all_outcomes-1’ multiplies the number of circuits by the total number of outcomes (the length of the outcome_labels property) minus one. ‘present_outcomes-1’ counts, on a per-circuit basis, the number of present (usually non-zero) outcomes recorded minus one. ‘tuned’ should be the most accurate, as it accounts for low-N “Poisson bump” behavior, but it is not the default because it is still under development. For timestamped data, see aggregate_times below.
- aggregate_times : bool, optional
Whether counts that occur at different times should be aggregated (tallied together) for each circuit. If True, then even when counts occur at different times, degrees of freedom are tallied on a per-circuit basis. If False, then counts occurring at distinct times are treated as independent of those at any other time, and are tallied separately. For example, if aggregate_times is False and a data row has 0- and 1-counts of 45 and 55 at time=0 and 42 and 58 at time=1, this row contributes 2 degrees of freedom, not 1. It can sometimes be useful to set this to False when the DataSet holds coarse-grained data, but usually you want to leave it as True (especially for time-series data).
Returns
int
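The two documented counting rules, and the effect of aggregate_times, can be sketched in plain Python. This is an illustrative model, not pyGSTi's implementation: rows are hypothetical nested dictionaries, ‘tuned’ is not modeled, and an outcome is treated as present when its count is non-zero (the usual case noted above). The sample reproduces the 45/55, 42/58 example.

```python
def degrees_of_freedom(rows, n_outcomes, method="present_outcomes-1",
                       aggregate_times=True):
    """rows maps circuit -> {time: {outcome: count}}.
    Only 'all_outcomes-1' and 'present_outcomes-1' are modeled here."""
    dof = 0
    for per_time in rows.values():
        if aggregate_times:
            # Merge all times into a single group for this circuit.
            merged = {}
            for counts in per_time.values():
                for outcome, n in counts.items():
                    merged[outcome] = merged.get(outcome, 0) + n
            groups = [merged]
        else:
            # Each (circuit, time) pair is its own independent group.
            groups = list(per_time.values())
        for counts in groups:
            if method == "all_outcomes-1":
                dof += n_outcomes - 1
            elif method == "present_outcomes-1":
                present = sum(1 for n in counts.values() if n > 0)
                dof += present - 1
            else:
                raise ValueError("method not modeled in this sketch: " + method)
    return dof

# The example from the text: 45/55 counts at time 0 and 42/58 at time 1.
rows = {"Gx": {0.0: {"0": 45, "1": 55}, 1.0: {"0": 42, "1": 58}}}
print(degrees_of_freedom(rows, 2, aggregate_times=True))   # 1
print(degrees_of_freedom(rows, 2, aggregate_times=False))  # 2
```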
- add_count_dict(circuit, count_dict, record_zero_counts=True, aux=None, update_ol=True)
Add a single circuit’s counts to this DataSet.
Parameters
- circuit : tuple or Circuit
A tuple of operation labels specifying the circuit, or a Circuit object.
- count_dict : dict
A dictionary with keys = outcome labels and values = counts.
- record_zero_counts : bool, optional
Whether zero counts are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.
- aux : dict, optional
A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).
- update_ol : bool, optional
This argument is for internal use only and should be left as True.
Returns
None
- add_count_list(circuit, outcome_labels, counts, record_zero_counts=True, aux=None, update_ol=True, unsafe=False)
Add a single circuit’s counts to this DataSet.
Parameters
- circuit : tuple or Circuit
A tuple of operation labels specifying the circuit, or a Circuit object.
- outcome_labels : list or tuple
The outcome labels corresponding to counts.
- counts : list or tuple
The counts themselves.
- record_zero_counts : bool, optional
Whether zero counts are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.
- aux : dict, optional
A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).
- update_ol : bool, optional
This argument is for internal use only and should be left as True.
- unsafe : bool, optional
True means that outcome_labels is guaranteed to hold tuple-type outcome labels and never plain strings. Only set this to True if you know what you’re doing.
Returns
None
- add_count_arrays(circuit, outcome_index_array, count_array, record_zero_counts=True, aux=None)
Add the outcomes for a single circuit, formatted as raw data arrays.
Parameters
- circuit : Circuit
The circuit to add data for.
- outcome_index_array : numpy.ndarray
An array of outcome indices, which must be values of self.olIndex (which maps outcome labels to indices).
- count_array : numpy.ndarray
An array of integer (or sometimes floating point) counts, one corresponding to each outcome index (element of outcome_index_array).
- record_zero_counts : bool, optional
Whether zero counts (zeros in count_array) should be stored explicitly or not stored and inferred. Setting this to False reduces the space taken by data sets containing lots of zero counts, but makes some objective function evaluations less precise.
- aux : dict or None, optional
If not None, a dictionary of user-defined auxiliary information that should be associated with this circuit.
Returns
None
- add_cirq_trial_result(circuit, trial_result, key, convert_int_to_binary=True, num_qubits=None)
Add a single circuit’s counts, stored in a Cirq TrialResult, to this DataSet.
Parameters
- circuit : tuple or Circuit
A tuple of operation labels specifying the circuit, or a Circuit object. Note that this must be a pyGSTi circuit, not a Cirq circuit.
- trial_result : cirq.TrialResult
The TrialResult to add.
- key : str
The string key of the measurement, as set by cirq.measure.
- convert_int_to_binary : bool, optional (default True)
By default, the keys in the Cirq result object are the integers representing the bitstrings of the measurements on a set of qubits, in big-endian convention. If True, these integers are converted back to binary strings before the counts are added as entries in the pyGSTi DataSet.
- num_qubits : int, optional (default None)
Number of qubits used in the conversion from integers to binary when convert_int_to_binary is True. If None, then the number of line_labels on the input circuit is used.
Returns
None
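The integer-to-bitstring conversion performed when convert_int_to_binary is True can be sketched with Python's built-in format(); the helper name and the sample histogram below are hypothetical. Zero-padding to num_qubits digits with the most significant bit first matches the big-endian convention described above.

```python
def int_to_bitstring(value, num_qubits):
    """Big-endian: the first character is the most significant bit."""
    if value < 0 or value >= 2 ** num_qubits:
        raise ValueError("value does not fit in num_qubits bits")
    return format(value, "0{}b".format(num_qubits))

# A hypothetical integer-keyed histogram: {measured integer: count}.
histogram = {0: 480, 5: 20}
counts = {int_to_bitstring(k, 3): v for k, v in histogram.items()}
print(counts)  # {'000': 480, '101': 20}
```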
- add_raw_series_data(circuit, outcome_label_list, time_stamp_list, rep_count_list=None, overwrite_existing=True, record_zero_counts=True, aux=None, update_ol=True, unsafe=False)
Add a single circuit’s counts to this DataSet.
Parameters
- circuit : tuple or Circuit
A tuple of operation labels specifying the circuit, or a Circuit object.
- outcome_label_list : list
A list of outcome labels (strings or tuples). An element’s index links it to a particular time step (i.e. the i-th element of the list specifies the outcome of the i-th measurement in the series).
- time_stamp_list : list
A list of floating point timestamps, each associated with the single corresponding outcome in outcome_label_list. Must be the same length as outcome_label_list.
- rep_count_list : list, optional
A list of integer counts specifying how many outcomes of the type given by outcome_label_list occurred at the time given by time_stamp_list. If None, then all counts are assumed to be 1. When not None, must be the same length as outcome_label_list.
- overwrite_existing : bool, optional
Whether to overwrite the data for circuit (if it exists). If False, then the given lists are appended (added) to existing data.
- record_zero_counts : bool, optional
Whether zero counts (elements of rep_count_list that are zero) are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.
- aux : dict, optional
A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).
- update_ol : bool, optional
This argument is for internal use only and should be left as True.
- unsafe : bool, optional
When True, don’t bother checking that outcome_label_list contains tuple-type outcome labels and automatically upgrading strings to 1-tuples. Only set this to True if you know what you’re doing and need the marginally faster performance.
Returns
None
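How the three parallel lists are read (entry i reports rep_count_list[i] occurrences of outcome_label_list[i] at time time_stamp_list[i]) can be sketched by tallying them per (time, outcome) pair. This is a hypothetical helper, not pyGSTi code, and pyGSTi's internal storage may keep such entries unmerged; the sketch also mirrors the string-to-1-tuple upgrade and the record_zero_counts behavior described above.

```python
from collections import Counter

def tally_raw_series(outcome_label_list, time_stamp_list, rep_count_list=None,
                     record_zero_counts=True):
    """Tally parallel lists into a Counter keyed by (time, outcome_label)."""
    if rep_count_list is None:
        rep_count_list = [1] * len(outcome_label_list)  # single-shot data
    if not (len(outcome_label_list) == len(time_stamp_list) == len(rep_count_list)):
        raise ValueError("all lists must have the same length")
    bins = Counter()
    for label, t, reps in zip(outcome_label_list, time_stamp_list, rep_count_list):
        if isinstance(label, str):
            label = (label,)  # upgrade plain strings to 1-tuples
        if reps == 0 and not record_zero_counts:
            continue  # drop zero counts entirely
        bins[(t, label)] += reps  # a zero count still creates an explicit entry
    return bins

bins = tally_raw_series(["0", "0", "1"], [0.0, 0.0, 0.1])
print(bins[(0.0, ("0",))], bins[(0.1, ("1",))])  # 2 1
```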
- update_ol()
Updates the internal outcome-label list in this dataset.
Call this after calling add_count_dict(…) or add_raw_series_data(…) with update_ol=False.
Returns
None
- add_series_data(circuit, count_dict_list, time_stamp_list, overwrite_existing=True, record_zero_counts=True, aux=None)
Add a single circuit’s counts to this DataSet.
Parameters
- circuit : tuple or Circuit
A tuple of operation labels specifying the circuit, or a Circuit object.
- count_dict_list : list
A list of dictionaries holding the outcome-label:count pairs for each time step (times given by time_stamp_list).
- time_stamp_list : list
A list of floating point timestamps, each associated with an entire dictionary of outcomes specified by count_dict_list.
- overwrite_existing : bool, optional
If True, overwrite any existing data for the circuit. If False, add the count data with the next non-negative integer timestamp.
- record_zero_counts : bool, optional
Whether zero counts (elements of the dictionaries in count_dict_list that are zero) are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.
- aux : dict, optional
A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).
Returns
None
- aggregate_outcomes(label_merge_dict, record_zero_counts=True)
Creates a DataSet which merges certain outcomes in this DataSet.
Used, for example, to aggregate a 2-qubit 4-outcome DataSet into a 1-qubit 2-outcome DataSet.
Parameters
- label_merge_dict : dictionary
The dictionary whose keys define the new DataSet outcomes, and whose items are lists of input DataSet outcomes that are to be summed together. For example, if a two-qubit DataSet has outcome labels “00”, “01”, “10”, and “11”, and we want to “aggregate out” the second qubit, we could use label_merge_dict = {'0': ['00', '01'], '1': ['10', '11']}. When doing this, however, it may be better to use filter_qubits(), which also updates the circuits.
- record_zero_counts : bool, optional
Whether zero counts are actually recorded (stored) in the returned (merged) DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.
Returns
- merged_dataset : DataSet object
The DataSet with outcomes merged according to the rules given in label_merge_dict.
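The merge rule for label_merge_dict amounts to summing counts per new label. A minimal plain-Python sketch on one circuit's count dictionary (the helper is hypothetical, not the pyGSTi method), using the two-qubit example above:

```python
def aggregate_outcomes(counts, label_merge_dict, record_zero_counts=True):
    """counts: {outcome: count} for one circuit (and one time)."""
    merged = {}
    for new_label, old_labels in label_merge_dict.items():
        # Sum the counts of every old label that maps to this new label.
        total = sum(counts.get(old, 0) for old in old_labels)
        if total != 0 or record_zero_counts:
            merged[new_label] = total
    return merged

two_qubit = {"00": 40, "01": 10, "10": 5, "11": 45}
print(aggregate_outcomes(two_qubit, {"0": ["00", "01"], "1": ["10", "11"]}))
# {'0': 50, '1': 50}
```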
- aggregate_std_nqubit_outcomes(qubit_indices_to_keep, record_zero_counts=True)
Creates a DataSet which merges certain outcomes in this DataSet.
Used, for example, to aggregate a 2-qubit 4-outcome DataSet into a 1-qubit 2-outcome DataSet. This assumes that outcome labels are in the standard format whereby each qubit corresponds to a single ‘0’ or ‘1’ character.
Parameters
- qubit_indices_to_keep : list
A list of integers specifying which qubits should be kept, that is, not aggregated.
- record_zero_counts : bool, optional
Whether zero counts are actually recorded (stored) in the returned (merged) DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.
Returns
- merged_dataset : DataSet object
The DataSet with outcomes merged.
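For standard bit-string labels, keeping qubits means building each new label from the kept characters and summing counts that collide. A plain-Python sketch under that assumption (the helper name keep_qubits is hypothetical):

```python
def keep_qubits(counts, qubit_indices_to_keep):
    """Marginalize standard bit-string outcome labels onto the kept qubits."""
    merged = {}
    for label, n in counts.items():
        # Each character of a standard label is one qubit's '0'/'1' result.
        new_label = "".join(label[i] for i in qubit_indices_to_keep)
        merged[new_label] = merged.get(new_label, 0) + n
    return merged

three_qubit = {"000": 10, "011": 20, "101": 30, "110": 40}
print(keep_qubits(three_qubit, [0, 2]))  # {'00': 10, '01': 20, '11': 30, '10': 40}
print(keep_qubits(three_qubit, [0]))     # {'0': 30, '1': 70}
```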
- add_auxiliary_info(circuit, aux)
Add auxiliary meta information to circuit.
Parameters
- circuit : tuple or Circuit
A tuple of operation labels specifying the circuit, or a Circuit object.
- aux : dict, optional
A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).
Returns
None
- add_counts_from_dataset(other_data_set)
Append another DataSet’s data to this DataSet.
Parameters
- other_data_set : DataSet
The dataset to take counts from.
Returns
None
- add_series_from_dataset(other_data_set)
Append another DataSet’s series data to this DataSet.
Parameters
- other_data_set : DataSet
The dataset to take time series data from.
Returns
None
- to_str(mode='auto')
Render this DataSet as a string.
Parameters
- mode : {“auto”, “time-dependent”, “time-independent”}
Whether to display the data as time-series of outcome counts (“time-dependent”) or to report per-outcome counts aggregated over time (“time-independent”). If “auto” is specified, then the time-independent mode is used only if all time stamps in the DataSet are equal to zero (trivial time dependence).
Returns
str
- truncate(list_of_circuits_to_keep, missing_action='raise')
Create a truncated dataset comprised of a subset of the circuits in this dataset.
Parameters
- list_of_circuits_to_keep : list of (tuples or Circuits)
A list of the circuits for the new returned dataset. If a circuit is given in this list that isn’t in the original data set, missing_action determines the behavior.
- missing_action : {“raise”, “warn”, “ignore”}
What to do when a circuit in list_of_circuits_to_keep is not in the data set (raise a KeyError, issue a warning, or do nothing).
Returns
- DataSet
The truncated data set.
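The missing_action options can be sketched on a plain dictionary standing in for the data set; the function below is a hypothetical model of the documented behavior, not pyGSTi's truncate.

```python
import warnings

def truncate(data, circuits_to_keep, missing_action="raise"):
    """Return a new dict holding only circuits_to_keep, in the order given."""
    if missing_action not in ("raise", "warn", "ignore"):
        raise ValueError("invalid missing_action: %r" % (missing_action,))
    kept = {}
    for circuit in circuits_to_keep:
        if circuit in data:
            kept[circuit] = data[circuit]
        elif missing_action == "raise":
            raise KeyError(circuit)
        elif missing_action == "warn":
            warnings.warn("circuit %r is not in the data set" % (circuit,))
        # missing_action == "ignore": silently skip the missing circuit
    return kept

data = {"Gx": {"0": 90, "1": 10}, "Gy": {"0": 55, "1": 45}}
print(truncate(data, ["Gy", "Gz"], missing_action="ignore"))  # {'Gy': {'0': 55, '1': 45}}
```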
- time_slice(start_time, end_time, aggregate_to_time=None)
Creates a DataSet by aggregating the counts within the [start_time, end_time) interval.
Parameters
- start_time : float
The starting time.
- end_time : float
The ending time.
- aggregate_to_time : float, optional
If not None, a single timestamp to give all the data in the specified range, resulting in a time-independent DataSet. If None, then the original timestamps are preserved.
Returns
DataSet
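The half-open [start_time, end_time) window and the optional aggregate_to_time re-stamping can be sketched on one circuit's series of (time, outcome, count) entries. The list-of-tuples representation and the helper are assumptions made for illustration, not pyGSTi's data structures.

```python
def time_slice(series, start_time, end_time, aggregate_to_time=None):
    """series: list of (time, outcome, count) for one circuit.
    Keeps entries with start_time <= time < end_time."""
    kept = [(t, ol, n) for (t, ol, n) in series if start_time <= t < end_time]
    if aggregate_to_time is None:
        return kept  # original timestamps preserved
    # Sum counts per outcome and give them all the single new timestamp.
    totals = {}
    for _, ol, n in kept:
        totals[ol] = totals.get(ol, 0) + n
    return [(aggregate_to_time, ol, n) for ol, n in totals.items()]

series = [(0.0, "0", 5), (0.5, "1", 3), (1.0, "0", 7), (1.5, "0", 2)]
print(time_slice(series, 0.0, 1.0))        # [(0.0, '0', 5), (0.5, '1', 3)]
print(time_slice(series, 0.5, 2.0, 0.0))   # [(0.0, '1', 3), (0.0, '0', 9)]
```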
- split_by_time(aggregate_to_time=None)
Creates a dictionary of DataSets, each of which is an equal-time slice of this DataSet.
The keys of the returned dictionary are the distinct timestamps in this dataset.
Parameters
- aggregate_to_time : float, optional
If not None, a single timestamp to give all the data in each returned data set, resulting in time-independent DataSet objects. If None, then the original timestamps are preserved.
Returns
- OrderedDict
A dictionary of DataSet objects whose keys are the timestamp values of the original (this) data set in sorted order.
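A plain-Python sketch of the grouping performed by split_by_time, assuming each circuit's data is a hypothetical list of (time, outcome, count) entries; each returned slice is a plain dictionary here rather than a DataSet.

```python
from collections import OrderedDict

def split_by_time(data):
    """data: {circuit: [(time, outcome, count), ...]}.
    Returns an OrderedDict mapping each distinct time (sorted) to
    {circuit: [(outcome, count), ...]} holding only that time's entries."""
    slices = {}
    for circuit, series in data.items():
        for t, ol, n in series:
            slices.setdefault(t, {}).setdefault(circuit, []).append((ol, n))
    return OrderedDict((t, slices[t]) for t in sorted(slices))

data = {"Gx": [(1.0, "0", 7), (0.0, "0", 5)], "Gy": [(0.0, "1", 3)]}
result = split_by_time(data)
print(list(result))   # [0.0, 1.0]
print(result[0.0])    # {'Gx': [('0', 5)], 'Gy': [('1', 3)]}
print(result[1.0])    # {'Gx': [('0', 7)]}
```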
- drop_zero_counts()
Creates a copy of this data set that doesn’t include any zero counts.
Returns
DataSet
- process_times(process_times_array_fn)
Manipulate this DataSet’s timestamps according to process_times_array_fn.
For example, the following process_times_array_fn would change the timestamps for each circuit to sequential integers:
def process_times_array_fn(times):
    return list(range(len(times)))
Parameters
- process_times_array_fn : function
A function which takes a single array-of-timestamps argument and returns another similarly-sized array. This function is called, once per circuit, with the circuit’s array of timestamps.
Returns
- DataSet
A new data set with altered timestamps.
- process_circuits(processor_fn, aggregate=False)
Create a new data set by manipulating this DataSet’s circuits (keys) according to processor_fn.
The new DataSet’s circuits result from running each of this DataSet’s circuits through processor_fn. This can be useful when “tracing out” qubits in a dataset containing multi-qubit data.
Parameters
- processor_fn : function
A function which takes a single Circuit argument and returns another (or the same) Circuit. This function may also return None, in which case the data for that circuit is deleted.
- aggregate : bool, optional
When True, aggregate the data for circuits that processor_fn assigns to the same “new” circuit. When False, use the data from the last original circuit that maps to a given “new” circuit.
Returns
DataSet
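The roles of processor_fn and aggregate can be sketched with circuits modeled as tuples of hypothetical "gate:qubit" label strings and one count dictionary per circuit (real pyGSTi circuits are Circuit objects, and this helper is not the library's method). The example processor "traces out" qubit 1 and returns None for circuits that act only on it.

```python
def process_circuits(data, processor_fn, aggregate=False):
    """data: {circuit: {outcome: count}}. Returns a new dict with re-keyed circuits."""
    new_data = {}
    for circuit, counts in data.items():
        new_circuit = processor_fn(circuit)
        if new_circuit is None:
            continue  # None deletes this circuit's data
        if aggregate and new_circuit in new_data:
            merged = dict(new_data[new_circuit])
            for ol, n in counts.items():
                merged[ol] = merged.get(ol, 0) + n
            new_data[new_circuit] = merged
        else:
            new_data[new_circuit] = dict(counts)  # last one wins when not aggregating
    return new_data

def keep_qubit0(circuit):
    """Hypothetical processor: keep only operations acting on qubit 0."""
    ops = tuple(g for g in circuit if g.endswith(":0"))
    return ops if ops else None

data = {("Gx:0", "Gx:1"): {"0": 10, "1": 0},
        ("Gx:0", "Gy:1"): {"0": 8, "1": 2},
        ("Gy:1",): {"0": 5, "1": 5}}
print(process_circuits(data, keep_qubit0, aggregate=True))   # {('Gx:0',): {'0': 18, '1': 2}}
print(process_circuits(data, keep_qubit0, aggregate=False))  # {('Gx:0',): {'0': 8, '1': 2}}
```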
- process_circuits_inplace(processor_fn, aggregate=False)
Manipulate this DataSet’s circuits (keys) in-place according to processor_fn.
All of this DataSet’s circuits are updated by running each one through processor_fn. This can be useful when “tracing out” qubits in a dataset containing multi-qubit data.
Parameters
- processor_fn : function
A function which takes a single Circuit argument and returns another (or the same) Circuit. This function may also return None, in which case the data for that circuit is deleted.
- aggregate : bool, optional
When True, aggregate the data for circuits that processor_fn assigns to the same “new” circuit. When False, use the data from the last original circuit that maps to a given “new” circuit.
Returns
None
- remove(circuits, missing_action='raise')
Remove (delete) the data for circuits from this DataSet.
Parameters
- circuits : iterable
An iterable over Circuit-like objects specifying the keys (circuits) to remove.
- missing_action : {“raise”, “warn”, “ignore”}
What to do when a circuit in circuits is not in this data set (raise a KeyError, issue a warning, or do nothing).
Returns
None
- done_adding_data()
Promotes a non-static DataSet to a static (read-only) DataSet.
This method should be called after all data has been added.
Returns
None
- save(file_or_filename)
- write_binary(file_or_filename)
Write this data set to a binary-format file.
Parameters
- file_or_filename : string or file object
If a string, interpreted as a filename. If this filename ends in “.gz”, the file will be gzip compressed.
Returns
None
- load(file_or_filename)
- read_binary(file_or_filename)
Read a DataSet from a binary file, clearing any data it previously contained.
The file should have been created with DataSet.write_binary().
Parameters
- file_or_filename : str or buffer
The file or filename to load from.
Returns
None
- rename_outcome_labels(old_to_new_dict)
Replaces existing outcome labels with new ones as per old_to_new_dict.
Parameters
- old_to_new_dict : dict
A mapping from old/existing outcome labels to new ones. Strings in keys or values are automatically converted to 1-tuples. Missing outcome labels are left unaltered.
Returns
None
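A sketch of the relabeling rule, including the automatic upgrade of strings to 1-tuples and leaving unmapped labels unchanged; the helper is hypothetical (not pyGSTi code) and assumes the mapping is one-to-one.

```python
def rename_outcome_labels(counts, old_to_new_dict):
    """counts: {outcome_label_tuple: count}. Strings are upgraded to 1-tuples."""
    def as_tuple(lbl):
        return (lbl,) if isinstance(lbl, str) else tuple(lbl)
    mapping = {as_tuple(k): as_tuple(v) for k, v in old_to_new_dict.items()}
    # Labels missing from the mapping are kept as they are.
    return {mapping.get(ol, ol): n for ol, n in counts.items()}

counts = {("0",): 60, ("1",): 40}
print(rename_outcome_labels(counts, {"0": "up", "1": "down"}))
# {('up',): 60, ('down',): 40}
```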
- add_std_nqubit_outcome_labels(nqubits)
Adds all the “standard” outcome labels (e.g. ‘0010’) on nqubits qubits.
This is useful to ensure that all outcomes are recognized as potentially valid, even if not all of them appear in the data (so attempts to get counts for these outcomes return 0 rather than raising an error).
Parameters
- nqubits : int
The number of qubits. For example, if equal to 3 the outcome labels ‘000’, ‘001’, … ‘111’ are added.
Returns
None
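The standard labels for nqubits qubits are all bit strings of that length. A short sketch of generating them with itertools.product (the function name is hypothetical, not part of pyGSTi):

```python
from itertools import product

def std_nqubit_outcome_labels(nqubits):
    """All 2**nqubits bit-string labels in ascending binary order."""
    return ["".join(bits) for bits in product("01", repeat=nqubits)]

print(std_nqubit_outcome_labels(3))
# ['000', '001', '010', '011', '100', '101', '110', '111']
```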
- add_outcome_labels(outcome_labels, update_ol=True)
Adds new valid outcome labels.
Ensures that all the elements of outcome_labels are stored as valid outcomes for circuits in this DataSet, adding new outcomes as necessary.
Parameters
- outcome_labels : list or generator
A list or generator of string- or tuple-valued outcome labels.
- update_ol : bool, optional
Whether to update internal mappings to reflect the new outcome labels. Leave this as True unless you really know what you’re doing.
Returns
None
- auxinfo_dataframe(pivot_valuename=None, pivot_value=None, drop_columns=False)
Create a Pandas dataframe with aux-data from this dataset.
Parameters
- pivot_valuename : str, optional
If not None, the resulting dataframe is pivoted using pivot_valuename as the column whose values name the pivoted table’s column names. If None and pivot_value is not None, “ValueName” is used.
- pivot_value : str, optional
If not None, the resulting dataframe is pivoted such that values of the pivot_value column are rearranged into new columns whose names are given by the values of the pivot_valuename column. If None and pivot_valuename is not None, “Value” is used.
- drop_columns : bool or list, optional
A list of column names to drop (prior to performing any pivot). If True appears in this list or is given directly, then all constant-valued columns are dropped as well. No columns are dropped when drop_columns == False.
Returns
pandas.DataFrame