pygsti.data.dataset

Defines the DataSet class and supporting classes and functions

Module Contents

Classes

_DataSetKVIterator

Iterator class for op_string,_DataSetRow pairs of a DataSet

_DataSetValueIterator

Iterator class for _DataSetRow values of a DataSet

_DataSetRow

Encapsulates DataSet time series data for a single circuit.

DataSet

An association between Circuits and outcome counts, serving as the input data for many QCVV protocols.

Functions

_round_int_repcnt(nreps)

Helper function to localize warning message

Attributes

Oindex_type

Time_type

Repcount_type

_DATAROW_AUTOCACHECOUNT_THRESHOLD

pygsti.data.dataset.Oindex_type
pygsti.data.dataset.Time_type
pygsti.data.dataset.Repcount_type
pygsti.data.dataset._DATAROW_AUTOCACHECOUNT_THRESHOLD = 256
class pygsti.data.dataset._DataSetKVIterator(dataset)

Bases: object

Iterator class for op_string,_DataSetRow pairs of a DataSet

Parameters

dataset (DataSet) – The parent data set.

next
__iter__(self)
__next__(self)
class pygsti.data.dataset._DataSetValueIterator(dataset)

Bases: object

Iterator class for _DataSetRow values of a DataSet

Parameters

dataset (DataSet) – The parent data set.

next
__iter__(self)
__next__(self)
class pygsti.data.dataset._DataSetRow(dataset, row_oli_data, row_time_data, row_rep_data, cached_cnts, aux)

Bases: object

Encapsulates DataSet time series data for a single circuit.

Outwardly, it looks similar to a list with (outcome_label, time_index, repetition_count) tuples as the values.

Parameters
  • dataset (DataSet) – The parent data set.

  • row_oli_data (numpy.ndarray) – The outcome label indices for each bin of this row.

  • row_time_data (numpy.ndarray) – The timestamps for each bin of this row.

  • row_rep_data (numpy.ndarray) – The repetition counts for each bin of this row (if None, assume 1 per bin).

  • cached_cnts (dict) – A cached pre-computed count dictionary (for speed).

  • aux (dict) – Dictionary of auxiliary information.

outcomes

Returns this row’s sequence of outcome labels, one per “bin” of repetition counts (as returned by _get_counts).

Type

list

counts

a dictionary of per-outcome counts.

Type

dict

allcounts

a dictionary of per-outcome counts with all possible outcomes as keys and zero values when an outcome didn’t occur. Note this can be expensive to compute for many-qubit data.

Type

dict

fractions

a dictionary of per-outcome fractions.

Type

dict

total

Returns the total number of counts contained in this row.

Type

int

property outcomes(self)

This row’s sequence of outcome labels, one per “bin” of repetition counts.

property unique_outcomes(self)

This row’s unique set of outcome labels, as a list

property expanded_ol(self)

This row’s sequence of outcome labels, with repetition counts expanded.

Thus, there’s one element in the returned list for each count.

Returns

list
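As a rough illustration of what “expanded” means here, the following plain-Python sketch turns a list of (outcome_label, time, repetition_count) bins into one entry per count. It does not call pyGSTi; the `bins` data and the helper name `expand_outcome_labels` are made up for this example.

```python
def expand_outcome_labels(bins):
    """Repeat each bin's outcome label once per repetition count."""
    expanded = []
    for outcome_label, _time, reps in bins:
        expanded.extend([outcome_label] * reps)
    return expanded


# Hypothetical row: three bins of (outcome_label, time, repetition_count).
bins = [(("0",), 0.0, 2), (("1",), 0.0, 1), (("0",), 1.0, 3)]
expanded = expand_outcome_labels(bins)  # one element per count, six in total
```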

property expanded_oli(self)

This row’s sequence of outcome label indices, with repetition counts expanded.

Thus, there’s one element in the returned list for each count.

Returns

numpy.ndarray

property expanded_times(self)

This row’s sequence of time stamps, with repetition counts expanded.

Thus, there’s one element in the returned list for each count.

Returns

numpy.ndarray

property times(self)

A list containing the unique data collection times at which there is at least one measurement result.

Returns

list

property timeseries_for_outcomes(self)

Row data in a time-series format.

This can be a much less succinct format than returned by counts_as_timeseries. E.g., it is highly inefficient for many-qubit data.

Returns

  • times (list) – The time steps, containing the unique data collection times.

  • reps (dict) – A dictionary of lists containing the number of times each measurement outcome was observed at the unique data collection times in times.

counts_as_timeseries(self)

Returns data in a time-series format.

Returns

  • times (list) – The time steps, containing the unique data collection times.

  • reps (list) – A list of dictionaries containing the counts dict corresponding to the list of unique data collection times in times.
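The difference between this layout and the one returned by timeseries_for_outcomes can be sketched in plain Python. This is an illustration of the two return shapes only, not pyGSTi's implementation: the helper names and the `bins` data are hypothetical, times are sorted for simplicity, and only outcomes seen in the row are included.

```python
def counts_as_timeseries_sketch(bins):
    """Return (times, reps) where reps is a list of count dicts, one per time."""
    times = sorted({t for _ol, t, _n in bins})
    reps = []
    for t in times:
        counts = {}
        for ol, bin_time, n in bins:
            if bin_time == t:
                counts[ol] = counts.get(ol, 0) + n
        reps.append(counts)
    return times, reps


def timeseries_for_outcomes_sketch(bins):
    """Return (times, reps) where reps maps each outcome to a list over times."""
    times = sorted({t for _ol, t, _n in bins})
    reps = {ol: [0] * len(times) for ol, _t, _n in bins}
    for ol, t, n in bins:
        reps[ol][times.index(t)] += n
    return times, reps


# Hypothetical bins of (outcome_label, time, repetition_count).
bins = [("0", 0.0, 45), ("1", 0.0, 55), ("0", 1.0, 100)]
times_a, reps_a = counts_as_timeseries_sketch(bins)
times_b, reps_b = timeseries_for_outcomes_sketch(bins)
```

The per-outcome layout stores an explicit zero for every outcome at every time, which is why it grows quickly for many-qubit data.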

property reps_timeseries(self)

The number of measurement results at each data collection time.

Returns

  • times (list) – The time steps.

  • reps (list) – The total number of counts at each time step.

property number_of_times(self)

Returns the number of data collection times.

Returns

int

property has_constant_totalcounts(self)

True if the number of counts is the same at all data collection times, otherwise False.

Returns

bool

property totalcounts_per_timestep(self)

The number of total counts per time-step, when this is constant.

If the total counts vary over the times that there is at least one measurement result, then this function will raise an error.

Returns

int

property meantimestep(self)

The mean time-step.

Will raise an error for data that is a trivial time-series (i.e., data all at one time).

Returns

float

__iter__(self)
__contains__(self, outcome_label)

Checks whether data counts for outcome_label are available.

__getitem__(self, index_or_outcome_label)
__setitem__(self, index_or_outcome_label, val)
get(self, index_or_outcome_label, default_value)

The number of counts for an index or outcome label.

If the index or outcome is not present, default_value is returned.

Parameters
  • index_or_outcome_label (int or str or tuple) – The index or outcome label to lookup.

  • default_value (object) – The value to return if this data row doesn’t contain data at the given index.

Returns

int or float

_get_single_count(self, outcome_label, timestamp=None)
_get_counts(self, timestamp=None, all_outcomes=False)

Returns this row’s sequence of “repetition counts”, that is, the number of repetitions of each outcome label in the outcomes list, or equivalently, each outcome label index in this row’s .oli member.

property counts(self)

Dictionary of per-outcome counts.

property allcounts(self)

Dictionary of per-outcome counts with all possible outcomes as keys.

This means that zero values are included when an outcome didn’t occur. Note this can be expensive to assemble for many-qubit data.

property fractions(self, all_outcomes=False)

Dictionary of per-outcome fractions.

property total(self)

The total number of counts contained in this row.

fraction(self, outcomelabel)

The fraction of total counts for outcomelabel.

Parameters

outcomelabel (str or tuple) – The outcome label, e.g. ‘010’ or (‘0’, ‘11’).

Returns

float

counts_at_time(self, timestamp)

Returns a dictionary of counts at a particular time

Parameters

timestamp (float) – the time to get counts at.

Returns

dict

timeseries(self, outcomelabel, timestamps=None)

Retrieve timestamps and counts for a single outcome label or for aggregated counts if outcomelabel == “all”.

Parameters
  • outcomelabel (str or tuple) – The outcome label to extract a series for. If the special value “all” is used, total (aggregated over all outcomes) counts are returned.

  • timestamps (list or array, optional) – If not None, an array of time stamps to extract counts for, which will also be returned as times. Times at which there is no data will be returned as zero-counts.

Returns

times, counts (numpy.ndarray)
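The handling of the “all” value and of requested times without data can be sketched as follows. This is a plain-Python illustration under assumed inputs, not the library's code: `timeseries_sketch` and `bins` are hypothetical names, and lists are returned here where the method itself returns numpy arrays.

```python
def timeseries_sketch(bins, outcome_label, timestamps=None):
    """Counts of one outcome label (or of every outcome if "all") per time."""
    if timestamps is None:
        timestamps = sorted({t for _ol, t, _n in bins})
    counts = []
    for t in timestamps:
        counts.append(sum(n for ol, bin_time, n in bins
                          if bin_time == t and outcome_label in ("all", ol)))
    return list(timestamps), counts


# Hypothetical bins of (outcome_label, time, repetition_count).
bins = [("0", 0.0, 45), ("1", 0.0, 55), ("0", 2.0, 30)]

# Explicit timestamps: times with no data for the label give zero counts.
times_one, counts_one = timeseries_sketch(bins, "1", timestamps=[0.0, 1.0, 2.0])

# "all" aggregates over every outcome at each time present in the data.
times_all, counts_all = timeseries_sketch(bins, "all")
```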

scale_inplace(self, factor)

Scales all the counts of this row by the given factor

Parameters

factor (float) – scaling factor.

Returns

None

to_dict(self)

Returns the (outcomeLabel,count) pairs as a dictionary.

Returns

dict

to_str(self, mode='auto')

Render this _DataSetRow as a string.

Parameters

mode ({"auto","time-dependent","time-independent"}) – Whether to display the data as time-series of outcome counts (“time-dependent”) or to report per-outcome counts aggregated over time (“time-independent”). If “auto” is specified, then the time-independent mode is used only if all time stamps in the _DataSetRow are equal (trivial time dependence).

Returns

str

__str__(self)

Return str(self).

__len__(self)
pygsti.data.dataset._round_int_repcnt(nreps)

Helper function to localize warning message

class pygsti.data.dataset.DataSet(oli_data=None, time_data=None, rep_data=None, circuits=None, circuit_indices=None, outcome_labels=None, outcome_label_indices=None, static=False, file_to_load_from=None, collision_action='aggregate', comment=None, aux_info=None)

Bases: object

An association between Circuits and outcome counts, serving as the input data for many QCVV protocols.

The DataSet class associates circuits with counts or time series of counts for each outcome label, and can be thought of as a table with gate strings labeling the rows and outcome labels and/or time labeling the columns. It is designed to behave similarly to a dictionary of dictionaries, so that counts are accessed by:

count = dataset[circuit][outcomeLabel]

in the time-independent case, and in the time-dependent case, for integer time index i >= 0,

outcomeLabel = dataset[circuit][i].outcome
count = dataset[circuit][i].count
time = dataset[circuit][i].time

Parameters
  • oli_data (list or numpy.ndarray) – When static == True, a 1D numpy array containing outcome label indices (integers), concatenated for all sequences. Otherwise, a list of 1D numpy arrays, one array per gate sequence. In either case, this quantity is indexed by the values of circuit_indices or the index of circuits.

  • time_data (list or numpy.ndarray) – Same format as oli_data except stores floating-point timestamp values.

  • rep_data (list or numpy.ndarray) – Same format as oli_data except stores integer repetition counts for each “data bin” (i.e. (outcome,time) pair). If all repetitions equal 1 (“single-shot” timestamped data), then rep_data can be None (no repetitions).

  • circuits (list of (tuples or Circuits)) – Each element is a tuple of operation labels or a Circuit object. Indices for these strings are assumed to ascend from 0. These indices must correspond to the time series of spam-label indices (above). Only specify this argument OR circuit_indices, not both.

  • circuit_indices (ordered dictionary) – An OrderedDict with keys equal to circuits (tuples of operation labels) and values equal to integer indices associating a row/element of counts with the circuit. Only specify this argument OR circuits, not both.

  • outcome_labels (list of strings or int) – Specifies the set of spam labels for the DataSet. Indices for the spam labels are assumed to ascend from 0, starting with the first element of this list. These indices will associate each element of timeseries with a spam label. Only specify this argument OR outcome_label_indices, not both. If an int, specifies that the outcome labels should be those for a standard set of this many qubits.

  • outcome_label_indices (ordered dictionary) – An OrderedDict with keys equal to spam labels (strings) and value equal to integer indices associating a spam label with given index. Only specify this argument OR outcome_labels, not both.

  • static (bool) – When True, create a read-only, i.e. “static” DataSet which cannot be modified. In this case you must specify the timeseries data, circuits, and spam labels. When False, create a DataSet that can have time series data added to it. In this case, you only need to specify the spam labels.

  • file_to_load_from (string or file object) – Specify this argument and no others to create a static DataSet by loading from a file (just like using the load(…) function).

  • collision_action ({"aggregate","overwrite","keepseparate"}) – Specifies how duplicate circuits should be handled. “aggregate” adds duplicate-circuit counts to the same circuit’s data at the next integer timestamp. “overwrite” only keeps the latest given data for a circuit. “keepseparate” tags duplicate-circuits by setting the .occurrence ID of added circuits that are already contained in this data set to the next available positive integer.

  • comment (string, optional) – A user-specified comment string that gets carried around with the data. A common use for this field is to attach to the data details regarding its collection.

  • aux_info (dict, optional) – A user-specified dictionary of per-circuit auxiliary information. Keys should be the circuits in this DataSet and value should be Python dictionaries.

__iter__(self)
__len__(self)
__contains__(self, circuit)

Test whether data set contains a given circuit.

Parameters

circuit (tuple or Circuit) – A tuple of operation labels or a Circuit instance which specifies the circuit to check for.

Returns

bool – whether circuit was found.

__hash__(self)

Return hash(self).

__getitem__(self, circuit)
__setitem__(self, circuit, outcome_dict_or_series)
__delitem__(self, circuit)
_get_row(self, circuit)

Get a row of data from this DataSet.

Parameters

circuit (Circuit or tuple) – The gate sequence to extract data for.

Returns

_DataSetRow

_set_row(self, circuit, outcome_dict_or_series)

Set the counts for a row of this DataSet.

Parameters
  • circuit (Circuit or tuple) – The gate sequence to set data for.

  • outcome_dict_or_series (dict or tuple) – The outcome count data, either a dictionary of outcome counts (with keys as outcome labels) or a tuple of lists. In the latter case this can be a 2-tuple: (outcome-label-list, timestamp-list) or a 3-tuple: (outcome-label-list, timestamp-list, repetition-count-list).

Returns

None

keys(self)

Returns the circuits used as keys of this DataSet.

Returns

list – A list of Circuit objects which index the data counts within this data set.

items(self)

Iterator over (circuit, timeSeries) pairs.

Here circuit is a tuple of operation labels and timeSeries is a _DataSetRow instance, which behaves similarly to a list of spam labels whose index corresponds to the time step.

Returns

_DataSetKVIterator

values(self)

Iterator over _DataSetRow instances corresponding to the time series data for each circuit.

Returns

_DataSetValueIterator

property outcome_labels(self)

Get a list of all the outcome labels contained in this DataSet.

Returns

list of strings or tuples – A list where each element is an outcome label (which can be a string or a tuple of strings).

property timestamps(self)

Get a list of all the (unique) timestamps contained in this DataSet.

Returns

list of floats – A list where each element is a timestamp.

gate_labels(self, prefix='G')

Get a list of all the distinct operation labels used in the circuits of this dataset.

Parameters

prefix (str) – Filter the circuit labels so that only elements beginning with this prefix are returned. None performs no filtering.

Returns

list of strings – A list where each element is an operation label.

degrees_of_freedom(self, circuits=None, method='present_outcomes-1', aggregate_times=True)

Returns the number of independent degrees of freedom in the data for the circuits in circuits.

Parameters
  • circuits (list of Circuits) – The list of circuits to count degrees of freedom for. If None then all of the DataSet’s strings are used.

  • method ({'all_outcomes-1', 'present_outcomes-1', 'tuned'}) – How the degrees of freedom should be computed. ‘all_outcomes-1’ takes the number of circuits and multiplies this by the total number of outcomes (the length of what is returned by outcome_labels()) minus one. ‘present_outcomes-1’ counts on a per-circuit basis the number of present (usually = non-zero) outcomes recorded minus one. ‘tuned’ should be the most accurate, as it accounts for low-N “Poisson bump” behavior, but it is not the default because it is still under development. For timestamped data, see aggregate_times below.

  • aggregate_times (bool, optional) – Whether counts that occur at different times should be tallied separately. If True, then even when counts occur at different times degrees of freedom are tallied on a per-circuit basis. If False, then counts occurring at distinct times are treated as independent of those at any other time, and are tallied separately. So, for example, if aggregate_times is False and a data row has 0- and 1-counts of 45 & 55 at time=0 and 42 and 58 at time=1 this row would contribute 2 degrees of freedom, not 1. It can sometimes be useful to set this to False when the DataSet holds coarse-grained data, but usually you want this to be left as True (especially for time-series data).

Returns

int
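The worked example in the aggregate_times description (45 & 55 at time=0, 42 & 58 at time=1) can be reproduced with a small plain-Python sketch of the ‘present_outcomes-1’ rule. This is an assumption-laden illustration rather than pyGSTi's implementation: the function name `present_outcomes_dof` and the `rows` layout are hypothetical, and the other methods are not modeled.

```python
def present_outcomes_dof(rows, aggregate_times=True):
    """Sum (number of recorded outcomes - 1) per circuit, or per circuit and time.

    rows maps a circuit key to a list of (outcome_label, time, count) bins.
    """
    total = 0
    for bins in rows.values():
        if aggregate_times:
            groups = [bins]  # one tally per circuit
        else:
            times = sorted({t for _ol, t, _n in bins})
            groups = [[b for b in bins if b[1] == t] for t in times]
        for group in groups:
            present = {ol for ol, _t, _n in group}
            total += len(present) - 1
    return total


# Hypothetical single-circuit data matching the example in the text.
rows = {"Gx": [("0", 0, 45), ("1", 0, 55), ("0", 1, 42), ("1", 1, 58)]}
dof_aggregated = present_outcomes_dof(rows, aggregate_times=True)
dof_per_time = present_outcomes_dof(rows, aggregate_times=False)
```

With times aggregated the row contributes 1 degree of freedom; tallying each time separately gives 2.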

_collisionaction_update_circuit(self, circuit)
_add_explicit_repetition_counts(self)

Build internal repetition counts if they don’t exist already.

This method is usually unnecessary, as repetition counts are almost always built as soon as they are needed.

Returns

None

add_count_dict(self, circuit, count_dict, record_zero_counts=True, aux=None, update_ol=True)

Add a single circuit’s counts to this DataSet

Parameters
  • circuit (tuple or Circuit) – A tuple of operation labels specifying the circuit or a Circuit object

  • count_dict (dict) – A dictionary with keys = outcome labels and values = counts

  • record_zero_counts (bool, optional) – Whether zero-counts are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

  • aux (dict, optional) – A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).

  • update_ol (bool, optional) – This argument is for internal use only and should be left as True.

Returns

None

add_count_list(self, circuit, outcome_labels, counts, record_zero_counts=True, aux=None, update_ol=True, unsafe=False)

Add a single circuit’s counts to this DataSet

Parameters
  • circuit (tuple or Circuit) – A tuple of operation labels specifying the circuit or a Circuit object

  • outcome_labels (list or tuple) – The outcome labels corresponding to counts.

  • counts (list or tuple) – The counts themselves.

  • record_zero_counts (bool, optional) – Whether zero-counts are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

  • aux (dict, optional) – A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).

  • update_ol (bool, optional) – This argument is for internal use only and should be left as True.

  • unsafe (bool, optional) – True means that outcome_labels is guaranteed to hold tuple-type outcome labels and never plain strings. Only set this to True if you know what you’re doing.

Returns

None

add_count_arrays(self, circuit, outcome_index_array, count_array, record_zero_counts=True, aux=None)

Add the outcomes for a single circuit, formatted as raw data arrays.

Parameters
  • circuit (Circuit) – The circuit to add data for.

  • outcome_index_array (numpy.ndarray) – An array of outcome indices, which must be values of self.olIndex (which maps outcome labels to indices).

  • count_array (numpy.ndarray) – An array of integer (or sometimes floating point) counts, one corresponding to each outcome index (element of outcome_index_array).

  • record_zero_counts (bool, optional) – Whether zero counts (zeros in count_array) should be stored explicitly or not stored and inferred. Setting to False reduces the space taken by data sets containing lots of zero counts, but makes some objective function evaluations less precise.

  • aux (dict or None, optional) – If not None a dictionary of user-defined auxiliary information that should be associated with this circuit.

Returns

None

add_cirq_trial_result(self, circuit, trial_result, key)

Add a single circuit’s counts — stored in a Cirq TrialResult — to this DataSet

Parameters
  • circuit (tuple or Circuit) – A tuple of operation labels specifying the circuit or a Circuit object. Note that this must be a PyGSTi circuit — not a Cirq circuit.

  • trial_result (cirq.TrialResult) – The TrialResult to add

  • key (str) – The string key of the measurement. Set by cirq.measure.

Returns

None

add_raw_series_data(self, circuit, outcome_label_list, time_stamp_list, rep_count_list=None, overwrite_existing=True, record_zero_counts=True, aux=None, update_ol=True, unsafe=False)

Add a single circuit’s counts to this DataSet

Parameters
  • circuit (tuple or Circuit) – A tuple of operation labels specifying the circuit or a Circuit object

  • outcome_label_list (list) – A list of outcome labels (strings or tuples). An element’s index links it to a particular time step (i.e. the i-th element of the list specifies the outcome of the i-th measurement in the series).

  • time_stamp_list (list) – A list of floating point timestamps, each associated with the single corresponding outcome in outcome_label_list. Must be the same length as outcome_label_list.

  • rep_count_list (list, optional) – A list of integer counts specifying how many outcomes of type given by outcome_label_list occurred at the time given by time_stamp_list. If None, then all counts are assumed to be 1. When not None, must be the same length as outcome_label_list.

  • overwrite_existing (bool, optional) – Whether to overwrite the data for circuit (if it exists). If False, then the given lists are appended (added) to existing data.

  • record_zero_counts (bool, optional) – Whether zero-counts (elements of rep_count_list that are zero) are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

  • aux (dict, optional) – A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).

  • update_ol (bool, optional) – This argument is for internal use only and should be left as True.

  • unsafe (bool, optional) – When True, don’t bother checking that outcome_label_list contains tuple-type outcome labels and automatically upgrading strings to 1-tuples. Only set this to True if you know what you’re doing and need the marginally faster performance.

Returns

None

_add_raw_arrays(self, circuit, oli_array, time_array, rep_array, overwrite_existing, record_zero_counts, aux)
update_ol(self)

Updates the internal outcome-label list in this dataset.

Call this after calling add_count_dict(…) or add_raw_series_data(…) with update_ol=False.

Returns

None

add_series_data(self, circuit, count_dict_list, time_stamp_list, overwrite_existing=True, record_zero_counts=True, aux=None)

Add a single circuit’s counts to this DataSet

Parameters
  • circuit (tuple or Circuit) – A tuple of operation labels specifying the circuit or a Circuit object

  • count_dict_list (list) – A list of dictionaries holding the outcome-label:count pairs for each time step (times given by time_stamp_list).

  • time_stamp_list (list) – A list of floating point timestamps, each associated with an entire dictionary of outcomes specified by count_dict_list.

  • overwrite_existing (bool, optional) – If True, overwrite any existing data for the circuit. If False, add the count data with the next non-negative integer timestamp.

  • record_zero_counts (bool, optional) – Whether zero-counts (elements of the dictionaries in count_dict_list that are zero) are actually recorded (stored) in this DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

  • aux (dict, optional) – A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).

Returns

None
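To show how the per-time dictionaries accepted here relate to the flat lists taken by add_raw_series_data, the following plain-Python sketch flattens one layout into the other. It is only an illustration of the two input shapes; `flatten_series` is a hypothetical helper and nothing here calls pyGSTi.

```python
def flatten_series(count_dict_list, time_stamp_list):
    """Turn per-time count dicts into parallel outcome/time/repetition lists."""
    outcome_labels, times, reps = [], [], []
    for counts, t in zip(count_dict_list, time_stamp_list):
        for outcome_label, n in counts.items():
            outcome_labels.append(outcome_label)
            times.append(t)
            reps.append(n)
    return outcome_labels, times, reps


# Hypothetical data: one count dictionary per time step.
count_dict_list = [{"0": 45, "1": 55}, {"0": 42, "1": 58}]
time_stamp_list = [0.0, 1.0]
labels, times, reps = flatten_series(count_dict_list, time_stamp_list)
```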

aggregate_outcomes(self, label_merge_dict, record_zero_counts=True)

Creates a DataSet which merges certain outcomes in this DataSet.

Used, for example, to aggregate a 2-qubit 4-outcome DataSet into a 1-qubit 2-outcome DataSet.

Parameters
  • label_merge_dict (dictionary) – The dictionary whose keys define the new DataSet outcomes, and whose values are lists of input DataSet outcomes that are to be summed together. For example, if a two-qubit DataSet has outcome labels “00”, “01”, “10”, and “11”, and we want to “aggregate out” the second qubit, we could use label_merge_dict = {'0': ['00', '01'], '1': ['10', '11']}. When doing this, however, it may be better to use filter_qubits, which also updates the circuits.

  • record_zero_counts (bool, optional) – Whether zero-counts are actually recorded (stored) in the returned (merged) DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

Returns

merged_dataset (DataSet object) – The DataSet with outcomes merged according to the rules given in label_merge_dict.
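The effect of label_merge_dict on a single circuit's counts can be sketched in plain Python. This models only the summing rule described above, with a hypothetical helper `merge_counts` and made-up counts; the real method operates on a whole DataSet.

```python
def merge_counts(counts, label_merge_dict):
    """Sum the counts of the old labels listed under each new label."""
    return {new_label: sum(counts.get(old, 0) for old in old_labels)
            for new_label, old_labels in label_merge_dict.items()}


# Hypothetical two-qubit counts; "aggregate out" the second qubit.
counts = {"00": 10, "01": 20, "10": 30, "11": 40}
label_merge_dict = {"0": ["00", "01"], "1": ["10", "11"]}
merged = merge_counts(counts, label_merge_dict)
```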

aggregate_std_nqubit_outcomes(self, qubit_indices_to_keep, record_zero_counts=True)

Creates a DataSet which merges certain outcomes in this DataSet.

Used, for example, to aggregate a 2-qubit 4-outcome DataSet into a 1-qubit 2-outcome DataSet. This assumes that outcome labels are in the standard format whereby each qubit corresponds to a single ‘0’ or ‘1’ character.

Parameters
  • qubit_indices_to_keep (list) – A list of integers specifying which qubits should be kept, that is, not aggregated.

  • record_zero_counts (bool, optional) – Whether zero-counts are actually recorded (stored) in the returned (merged) DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

Returns

merged_dataset (DataSet object) – The DataSet with outcomes merged.
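One way to picture qubit_indices_to_keep is as a recipe for building a merge dictionary over standard bit-string labels. The sketch below is a plain-Python illustration, not the library's code; it assumes that qubit index i corresponds to character position i of the label, and `std_merge_dict` is a hypothetical name.

```python
from itertools import product


def std_merge_dict(nqubits, qubit_indices_to_keep):
    """Group full bit-string labels by the characters of the kept qubits."""
    merge = {}
    for bits in product("01", repeat=nqubits):
        full_label = "".join(bits)
        kept_label = "".join(full_label[i] for i in qubit_indices_to_keep)
        merge.setdefault(kept_label, []).append(full_label)
    return merge


# Keep only qubit 0 of a two-qubit label set.
merge_keep_first = std_merge_dict(2, [0])
```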

add_auxiliary_info(self, circuit, aux)

Add auxiliary meta information to circuit.

Parameters
  • circuit (tuple or Circuit) – A tuple of operation labels specifying the circuit or a Circuit object

  • aux (dict, optional) – A dictionary of auxiliary meta information to be included with this set of data counts (associated with circuit).

Returns

None

add_counts_from_dataset(self, other_data_set)

Append another DataSet’s data to this DataSet

Parameters

other_data_set (DataSet) – The dataset to take counts from.

Returns

None

add_series_from_dataset(self, other_data_set)

Append another DataSet’s series data to this DataSet

Parameters

other_data_set (DataSet) – The dataset to take time series data from.

Returns

None

property meantimestep(self)

The mean time-step, averaged over the time-step for each circuit and over circuits.

Returns

float

property has_constant_totalcounts_pertime(self)

True if the data for every circuit has the same number of total counts at every data collection time.

This will return True if there is a different number of total counts per circuit (i.e., after aggregating over time), as long as every circuit has the same total counts per time step (this will happen when the number of time-steps varies between circuits).

Returns

bool

property totalcounts_pertime(self)

Total counts per time, if this is constant over times and circuits.

When that doesn’t hold, an error is raised.

Returns

float or int

property has_constant_totalcounts(self)

True if the data for every circuit has the same number of total counts.

Returns

bool

property has_trivial_timedependence(self)

True if all the data in this DataSet occurs at time 0.

Returns

bool

__str__(self)

Return str(self).

to_str(self, mode='auto')

Render this DataSet as a string.

Parameters

mode ({"auto","time-dependent","time-independent"}) – Whether to display the data as time-series of outcome counts (“time-dependent”) or to report per-outcome counts aggregated over time (“time-independent”). If “auto” is specified, then the time-independent mode is used only if all time stamps in the DataSet are equal to zero (trivial time dependence).

Returns

str

truncate(self, list_of_circuits_to_keep, missing_action='raise')

Create a truncated dataset comprised of a subset of the circuits in this dataset.

Parameters
  • list_of_circuits_to_keep (list of (tuples or Circuits)) – A list of the circuits for the new returned dataset. If a circuit is given in this list that isn’t in the original data set, missing_action determines the behavior.

  • missing_action ({"raise","warn","ignore"}) – What to do when a string in list_of_circuits_to_keep is not in the data set (raise a KeyError, issue a warning, or do nothing).

Returns

DataSet – The truncated data set.

time_slice(self, start_time, end_time, aggregate_to_time=None)

Creates a DataSet by aggregating the counts within the [start_time, end_time) interval.

Parameters
  • start_time (float) – The starting time.

  • end_time (float) – The ending time.

  • aggregate_to_time (float, optional) – If not None, a single timestamp to give all the data in the specified range, resulting in a time-independent DataSet. If None, then the original timestamps are preserved.

Returns

DataSet
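The half-open interval and the role of aggregate_to_time can be illustrated on a single row with plain Python. This is a sketch under assumed inputs, not pyGSTi's implementation; `time_slice_sketch` and the `bins` data are hypothetical.

```python
def time_slice_sketch(bins, start_time, end_time, aggregate_to_time=None):
    """Keep bins with start_time <= time < end_time, optionally re-stamping them."""
    selected = [(ol, t, n) for ol, t, n in bins if start_time <= t < end_time]
    if aggregate_to_time is None:
        return selected  # original timestamps preserved
    totals = {}
    for ol, _t, n in selected:
        totals[ol] = totals.get(ol, 0) + n
    return [(ol, aggregate_to_time, n) for ol, n in totals.items()]


# Hypothetical bins of (outcome_label, time, repetition_count).
bins = [("0", 0.0, 45), ("1", 0.0, 55),
        ("0", 1.0, 42), ("1", 1.0, 58),
        ("0", 2.0, 50)]
kept = time_slice_sketch(bins, 0.0, 2.0)  # the bin at time 2.0 is excluded
collapsed = time_slice_sketch(bins, 0.0, 2.0, aggregate_to_time=0.0)
```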

split_by_time(self, aggregate_to_time=None)

Creates a dictionary of DataSets, each of which is an equal-time slice of this DataSet.

The keys of the returned dictionary are the distinct timestamps in this dataset.

Parameters

aggregate_to_time (float, optional) – If not None, a single timestamp to give all the data in each returned data set, resulting in time-independent DataSets. If None, then the original timestamps are preserved.

Returns

OrderedDict – A dictionary of DataSet objects whose keys are the timestamp values of the original (this) data set in sorted order.
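The shape of the returned dictionary can be pictured with a plain-Python stand-in in which each “data set” is just a dict from circuit to bins. The helper `split_by_time_sketch` and the `rows` data are hypothetical, and the aggregate_to_time option is not modeled.

```python
def split_by_time_sketch(rows):
    """Group (outcome_label, time, count) bins by time, keyed in sorted order."""
    slices = {}
    for circuit, bins in rows.items():
        for ol, t, n in bins:
            slices.setdefault(t, {}).setdefault(circuit, []).append((ol, t, n))
    return dict(sorted(slices.items()))


# Hypothetical data: "Gy" only has data at time 1.0.
rows = {
    "Gx": [("0", 0.0, 45), ("1", 0.0, 55), ("0", 1.0, 42), ("1", 1.0, 58)],
    "Gy": [("0", 1.0, 100)],
}
slices = split_by_time_sketch(rows)
```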

drop_zero_counts(self)

Creates a copy of this data set that doesn’t include any zero counts.

Returns

DataSet

process_times(self, process_times_array_fn)

Manipulate this DataSet’s timestamps according to process_times_array_fn.

For example, using the following process_times_array_fn would change the timestamps for each circuit to sequential integers.

```
def process_times_array_fn(times):
    return list(range(len(times)))
```

Parameters

process_times_array_fn (function) – A function which takes a single array-of-timestamps argument and returns another similarly-sized array. This function is called, once per circuit, with the circuit’s array of timestamps.

Returns

DataSet – A new data set with altered timestamps.

process_circuits(self, processor_fn, aggregate=False)

Create a new data set by manipulating this DataSet’s circuits (keys) according to processor_fn.

The new DataSet’s circuits result from running each of this DataSet’s circuits through processor_fn. This can be useful when “tracing out” qubits in a dataset containing multi-qubit data.

Parameters
  • processor_fn (function) – A function which takes a single Circuit argument and returns another (or the same) Circuit. This function may also return None, in which case the data for that string is deleted.

  • aggregate (bool, optional) – When True, aggregate the data for circuits that processor_fn assigns to the same “new” circuit. When False, use the data from the last original circuit that maps to a given “new” circuit.

Returns

DataSet
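The three behaviors described above (dropping circuits mapped to None, aggregating collisions, and letting the last circuit win) can be sketched with plain dictionaries. This is an illustration only: circuits are modeled as tuples of labels, and `process_circuits_sketch` and `drop_gy` are hypothetical names, not pyGSTi functions.

```python
def process_circuits_sketch(data, processor_fn, aggregate=False):
    """data maps a circuit (tuple of labels) to a dict of outcome counts."""
    new_data = {}
    for circuit, counts in data.items():
        new_circuit = processor_fn(circuit)
        if new_circuit is None:
            continue  # the data for this circuit is deleted
        if aggregate and new_circuit in new_data:
            merged = dict(new_data[new_circuit])
            for ol, n in counts.items():
                merged[ol] = merged.get(ol, 0) + n
            new_data[new_circuit] = merged
        else:
            new_data[new_circuit] = dict(counts)  # last original circuit wins
    return new_data


def drop_gy(circuit):
    """Hypothetical processor: strip 'Gy' labels, delete circuits left empty."""
    kept = tuple(label for label in circuit if label != "Gy")
    return kept if kept else None


data = {
    ("Gx",): {"0": 10, "1": 90},
    ("Gx", "Gy"): {"0": 40, "1": 60},
    ("Gy",): {"0": 50, "1": 50},
}
aggregated = process_circuits_sketch(data, drop_gy, aggregate=True)
last_wins = process_circuits_sketch(data, drop_gy, aggregate=False)
```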

process_circuits_inplace(self, processor_fn, aggregate=False)

Manipulate this DataSet’s circuits (keys) in-place according to processor_fn.

All of this DataSet’s circuits are updated by running each one through processor_fn. This can be useful when “tracing out” qubits in a dataset containing multi-qubit data.

Parameters
  • processor_fn (function) – A function which takes a single Circuit argument and returns another (or the same) Circuit. This function may also return None, in which case the data for that string is deleted.

  • aggregate (bool, optional) – When True, aggregate the data for circuits that processor_fn assigns to the same “new” circuit. When False, use the data from the last original circuit that maps to a given “new” circuit.

Returns

None

remove(self, circuits, missing_action='raise')

Remove (delete) the data for circuits from this DataSet.

Parameters
  • circuits (iterable) – An iterable over Circuit-like objects specifying the keys (circuits) to remove.

  • missing_action ({"raise","warn","ignore"}) – What to do when a string in circuits is not in this data set (raise a KeyError, issue a warning, or do nothing).

Returns

None

_remove(self, gstr_indices)

Removes the data in indices given by gstr_indices

copy(self)

Make a copy of this DataSet.

Returns

DataSet

copy_nonstatic(self)

Make a non-static copy of this DataSet.

Returns

DataSet

done_adding_data(self)

Promotes a non-static DataSet to a static (read-only) DataSet.

This method should be called after all data has been added.

Returns

None

__getstate__(self)
__setstate__(self, state_dict)
save(self, file_or_filename)
write_binary(self, file_or_filename)

Write this data set to a binary-format file.

Parameters

file_or_filename (string or file object) – If a string, interpreted as a filename. If this filename ends in “.gz”, the file will be gzip compressed.

Returns

None

load(self, file_or_filename)
read_binary(self, file_or_filename)

Read a DataSet from a binary file, clearing any data it contained previously.

The file should have been created with DataSet.write_binary.

Parameters

file_or_filename (str or buffer) – The file or filename to load from.

Returns

None

rename_outcome_labels(self, old_to_new_dict)

Replaces existing outcome labels with new ones as per old_to_new_dict.

Parameters

old_to_new_dict (dict) – A mapping from old/existing outcome labels to new ones. Strings in keys or values are automatically converted to 1-tuples. Missing outcome labels are left unaltered.

Returns

None
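The two rules stated for old_to_new_dict (strings become 1-tuples, unmapped labels are left alone) can be illustrated on a single count dictionary in plain Python. The helpers `to_tuple` and `rename_labels_sketch` are hypothetical and do not reflect how the DataSet stores its labels internally.

```python
def to_tuple(label):
    """Upgrade a plain string label to a 1-tuple; leave tuples as tuples."""
    return (label,) if isinstance(label, str) else tuple(label)


def rename_labels_sketch(counts, old_to_new_dict):
    """Rename tuple-valued outcome labels; labels not in the mapping are kept."""
    mapping = {to_tuple(old): to_tuple(new)
               for old, new in old_to_new_dict.items()}
    return {mapping.get(ol, ol): n for ol, n in counts.items()}


# Hypothetical counts keyed by tuple-valued outcome labels.
counts = {("0",): 45, ("1",): 55}
renamed = rename_labels_sketch(counts, {"0": "up"})
```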

add_std_nqubit_outcome_labels(self, nqubits)

Adds all the “standard” outcome labels (e.g. ‘0010’) on nqubits qubits.

This is useful to ensure that all outcomes are recognized as potentially valid even if not all of them appear in the data (so attempts to get counts for these outcomes return 0 rather than raising an error).

Parameters

nqubits (int) – The number of qubits. For example, if equal to 3 the outcome labels ‘000’, ‘001’, … ‘111’ are added.

Returns

None
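The set of “standard” labels for a given number of qubits can be enumerated with the standard library. The sketch below only generates the label strings; `std_nqubit_labels` is a hypothetical helper, and registering the labels with a DataSet is left to the method itself.

```python
from itertools import product


def std_nqubit_labels(nqubits):
    """All bit strings of length nqubits, from '00...0' up to '11...1'."""
    return ["".join(bits) for bits in product("01", repeat=nqubits)]


labels = std_nqubit_labels(3)  # 2**3 = 8 labels
```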

add_outcome_labels(self, outcome_labels, update_ol=True)

Adds new valid outcome labels.

Ensures that all the elements of outcome_labels are stored as valid outcomes for circuits in this DataSet, adding new outcomes as necessary.

Parameters
  • outcome_labels (list or generator) – A list or generator of string- or tuple-valued outcome labels.

  • update_ol (bool, optional) – Whether to update internal mappings to reflect the new outcome labels. Leave this as True unless you really know what you’re doing.

Returns

None

auxinfo_dataframe(self, pivot_valuename=None, pivot_value=None, drop_columns=False)

Create a Pandas dataframe with aux-data from this dataset.

Parameters
  • pivot_valuename (str, optional) – If not None, the resulting dataframe is pivoted using pivot_valuename as the column whose values name the pivoted table’s column names. If None and pivot_value is not None, “ValueName” is used.

  • pivot_value (str, optional) – If not None, the resulting dataframe is pivoted such that values of the pivot_value column are rearranged into new columns whose names are given by the values of the pivot_valuename column. If None and pivot_valuename is not None, “Value” is used.

  • drop_columns (bool or list, optional) – A list of column names to drop (prior to performing any pivot). If True appears in this list or is given directly, then all constant-valued columns are dropped as well. No columns are dropped when drop_columns == False.

Returns

pandas.DataFrame