pygsti.data.datasetconstruction

Functions for creating data

Module Contents

Functions

simulate_data(model_or_dataset, circuit_list, num_samples)

Creates a DataSet using the probabilities obtained from a model.

aggregate_dataset_outcomes(dataset, label_merge_dict)

Creates a DataSet which merges certain outcomes in input DataSet.

filter_dataset(dataset, sectors_to_keep[, ...])

Creates a DataSet that is the restriction of dataset to sectors_to_keep.

trim_to_constant_numtimesteps(ds)

Trims a DataSet so that each circuit's data comprises the same number of timesteps.

pygsti.data.datasetconstruction.simulate_data(model_or_dataset, circuit_list, num_samples, sample_error='multinomial', seed=None, rand_state=None, alias_dict=None, collision_action='aggregate', record_zero_counts=True, comm=None, mem_limit=None, times=None)

Creates a DataSet using the probabilities obtained from a model.

Parameters

model_or_dataset : Model or DataSet object

The source of the underlying probabilities used to generate the data. If a Model, the model whose probabilities generate the data. If a DataSet, the data set whose frequencies generate the data.

circuit_list : list of (tuples or Circuits) or ExperimentDesign or None

Each tuple or Circuit contains operation labels and specifies a gate sequence whose counts are included in the returned DataSet, e.g. [ (), ('Gx',), ('Gx','Gy') ]. If an ExperimentDesign, then the design's .all_circuits_needing_data list is used as the circuit list.

num_samples : int or list of ints or None

The simulated number of samples for each circuit. This only has an effect when sample_error == "binomial" or "multinomial". If an integer, all circuits have this number of total samples. If a list, its integer elements specify the number of samples for the corresponding circuits. If None, then model_or_dataset must be a DataSet, and total counts are taken from it (on a per-circuit basis).
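
The three accepted forms of num_samples can be pictured with a small plain-Python sketch. It is only an illustration of the rule stated above, not pyGSTi's code: the helper name resolve_num_samples and the dataset_totals argument (a dict of per-circuit totals standing in for a DataSet) are assumptions made for this example.

```python
def resolve_num_samples(num_samples, circuits, dataset_totals=None):
    """Return a dict mapping each circuit to its total number of samples.

    `dataset_totals` is a hypothetical stand-in for the per-circuit totals
    that would be read from a DataSet when `num_samples` is None.
    """
    if num_samples is None:
        if dataset_totals is None:
            raise ValueError("num_samples=None needs per-circuit totals from a data set")
        return {c: dataset_totals[c] for c in circuits}
    if isinstance(num_samples, int):
        # One integer: every circuit gets the same total.
        return {c: num_samples for c in circuits}
    if len(num_samples) != len(circuits):
        raise ValueError("need exactly one sample count per circuit")
    # A list: element i applies to circuit i.
    return dict(zip(circuits, num_samples))


circuits = [(), ('Gx',), ('Gx', 'Gy')]
same_for_all = resolve_num_samples(100, circuits)
per_circuit = resolve_num_samples([10, 20, 30], circuits)
from_data = resolve_num_samples(None, circuits, {(): 5, ('Gx',): 6, ('Gx', 'Gy'): 7})
```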

sample_error : string, optional

What type of sample error is included in the counts. Can be:

  • “none” - no sample error: counts are floating point numbers such that the exact probability can be recovered as the ratio count / total.

  • “clip” - no sample error, but clip probabilities to [0,1] so that, e.g., counts are never negative.

  • “round” - same as “clip”, except counts are rounded to the nearest integer.

  • “binomial” - the number of counts is taken from a binomial distribution. Distribution has parameters p = (clipped) probability of the circuit and n = number of samples. This can only be used when there are exactly two SPAM labels in model_or_dataset.

  • “multinomial” - counts are taken from a multinomial distribution. Distribution has parameters p_k = (clipped) probability of the circuit using the k-th SPAM label and n = number of samples.
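
The five options can be compared side by side with a minimal sketch that turns one circuit's outcome probabilities into counts. This is plain Python using the standard random module rather than pyGSTi or numpy, and the helper names clip01 and counts_from_probs are invented for the illustration. Whether and how pyGSTi renormalizes clipped probabilities is not stated above; here random.choices simply treats the clipped values as relative weights.

```python
import random


def clip01(p):
    """Clip a (possibly slightly unphysical) probability to [0, 1]."""
    return min(1.0, max(0.0, p))


def counts_from_probs(probs, n, sample_error="multinomial", rng=None):
    """Illustrative conversion of outcome probabilities into counts."""
    rng = rng or random.Random()
    labels = list(probs)
    if sample_error == "none":
        # Exact, possibly non-integer (or even negative) counts.
        return {lbl: probs[lbl] * n for lbl in labels}
    clipped = {lbl: clip01(probs[lbl]) for lbl in labels}
    if sample_error == "clip":
        return {lbl: clipped[lbl] * n for lbl in labels}
    if sample_error == "round":
        return {lbl: round(clipped[lbl] * n) for lbl in labels}
    if sample_error == "binomial":
        if len(labels) != 2:
            raise ValueError("binomial sampling needs exactly two outcome labels")
        first, second = labels
        # n Bernoulli trials with success probability p = clipped[first].
        k = sum(rng.random() < clipped[first] for _ in range(n))
        return {first: k, second: n - k}
    if sample_error == "multinomial":
        # n independent draws; the weights are treated as relative.
        draws = rng.choices(labels, weights=[clipped[l] for l in labels], k=n)
        counts = {lbl: 0 for lbl in labels}
        for d in draws:
            counts[d] += 1
        return counts
    raise ValueError("unknown sample_error: %r" % (sample_error,))


exact = counts_from_probs({'0': 0.25, '1': 0.75}, 100, "none")
sampled = counts_from_probs({'0': 0.25, '1': 0.75}, 100, "multinomial", random.Random(0))
```

With “none” the counts are exactly 25.0 and 75.0; with “multinomial” they fluctuate around those values but always sum to n.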

seed : int, optional

If not None, a seed for numpy’s random number generator, which is used to sample from the binomial or multinomial distribution.

rand_state : numpy.random.RandomState

A RandomState object to generate samples from. Can be useful to set instead of seed if you want reproducible distribution samples across multiple random function calls but you don’t want to bother with manually incrementing seeds between those calls.
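
The difference between reusing a seed and sharing one generator can be sketched as follows. To keep the example dependency-free it uses Python's random.Random as a stand-in for numpy.random.RandomState; the helper sample_counts is hypothetical and is not part of pyGSTi.

```python
import random


def sample_counts(p, n, rng):
    """Draw n two-outcome samples with P('0') = p from the given generator."""
    k = sum(rng.random() < p for _ in range(n))
    return {'0': k, '1': n - k}


# Passing the same seed to every call reproduces the same counts each time.
a = sample_counts(0.5, 1000, random.Random(2024))
b = sample_counts(0.5, 1000, random.Random(2024))

# Sharing one generator advances its state between calls, so the second call
# continues the random stream instead of restarting it, and the whole
# sequence of calls is still reproducible from the single initial seed.
shared = random.Random(2024)
c = sample_counts(0.5, 1000, shared)
d = sample_counts(0.5, 1000, shared)
```

Here a and b are identical, while c and d come from consecutive stretches of one random stream.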

alias_dict : dict, optional

A dictionary mapping single operation labels into tuples of one or more other operation labels which translate the given circuits before values are computed using model_or_dataset. The resulting DataSet, however, contains the un-translated circuits as keys.
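
The translation step can be pictured with circuits written as plain tuples of labels. This is a sketch under that assumption, with an invented helper name (apply_aliases) and an invented alias label ('Gxx'); it is not pyGSTi's implementation.

```python
def apply_aliases(circuit, alias_dict):
    """Expand every aliased label of `circuit` into the labels it stands for."""
    expanded = []
    for lbl in circuit:
        # Labels without an alias are passed through unchanged.
        expanded.extend(alias_dict.get(lbl, (lbl,)))
    return tuple(expanded)


alias_dict = {'Gxx': ('Gx', 'Gx')}   # hypothetical alias for two 'Gx' gates
circuit = ('Gxx', 'Gy')

translated = apply_aliases(circuit, alias_dict)
# Probabilities would be computed for `translated`, but the counts would be
# stored under the original, un-translated `circuit`.
stored_under = circuit
```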

collision_action : {"aggregate", "keepseparate"}

Determines how duplicate circuits are handled by the resulting DataSet. Please see the constructor documentation for DataSet.

record_zero_counts : bool, optional

Whether zero-counts are actually recorded (stored) in the returned DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

comm : mpi4py.MPI.Comm, optional

When not None, an MPI communicator for distributing the computation across multiple processors and ensuring that the same dataset is generated on each processor.

mem_limit : int, optional

A rough memory limit in bytes which is used to determine job allocation when there are multiple processors.

times : iterable, optional

When not None, a list of time-stamps at which data should be sampled. num_samples samples will be simulated at each time value, meaning that each circuit in circuit_list will be evaluated with the given time value as its start time.

Returns

DataSet

A static data set filled with counts for the specified circuits.

pygsti.data.datasetconstruction.aggregate_dataset_outcomes(dataset, label_merge_dict, record_zero_counts=True)

Creates a DataSet which merges certain outcomes in input DataSet.

This is used, for example, to aggregate a 2-qubit, 4-outcome DataSet into a 1-qubit, 2-outcome DataSet.

Parameters

dataset : DataSet object

The input DataSet whose results will be simplified according to the rules set forth in label_merge_dict.

label_merge_dict : dictionary

The dictionary whose keys define the new DataSet outcomes, and whose values are lists of input DataSet outcomes that are to be summed together. For example, if a two-qubit DataSet has outcome labels "00", "01", "10", and "11", and we want to "aggregate out" the second qubit, we could use label_merge_dict = {'0': ['00', '01'], '1': ['10', '11']}. When doing this, however, it may be better to use filter_dataset(), which also updates the circuits.

record_zero_counts : bool, optional

Whether zero-counts are actually recorded (stored) in the returned (merged) DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

Returns

merged_dataset : DataSet object

The DataSet with outcomes merged according to the rules given in label_merge_dict.
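
The merging rule can be illustrated on the counts of a single circuit. The sketch below uses plain dicts rather than DataSet objects, and the helper name merge_outcomes is an assumption made for the example. It also leaves out the registering of new outcome labels that the record_zero_counts description mentions.

```python
def merge_outcomes(counts, label_merge_dict, record_zero_counts=True):
    """Sum the counts of the old outcomes listed under each new outcome label."""
    merged = {}
    for new_label, old_labels in label_merge_dict.items():
        total = sum(counts.get(old, 0) for old in old_labels)
        if total == 0 and not record_zero_counts:
            continue  # zero counts are simply not stored
        merged[new_label] = total
    return merged


two_qubit_counts = {'00': 10, '01': 20, '10': 30, '11': 40}
label_merge_dict = {'0': ['00', '01'], '1': ['10', '11']}

# Aggregating out the second qubit leaves the first qubit's marginal counts.
first_qubit_counts = merge_outcomes(two_qubit_counts, label_merge_dict)
```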

pygsti.data.datasetconstruction.filter_dataset(dataset, sectors_to_keep, sindices_to_keep=None, new_sectors=None, idle=((),), record_zero_counts=True, filtercircuits=True)

Creates a DataSet that is the restriction of dataset to sectors_to_keep.

This function aggregates (sums) outcomes in dataset which differ only in sectors (usually qubits - see below) not in sectors_to_keep, and removes any operation labels which act specifically on sectors not in sectors_to_keep. An idle gate that acts on all sectors because its .sslbls is None, for example, is not removed.

Here “sectors” are state-space labels, present in the circuits of dataset. Each sector also corresponds to a particular character position within the outcome labels of dataset. Thus, for this function to work, the outcome labels of dataset must all be 1-tuples whose sole element is an n-character string such that each character represents the outcome of a single sector. If the state-space labels are integers, then they can serve as both a label and an outcome-string position. The argument new_sectors may be given to rename the kept state-space labels in the returned DataSet’s circuits.

A typical case is when the state space is that of n qubits, and the state-space labels are the integers 0 to n-1. As stated above, in this case there is no need to specify sindices_to_keep. One may want to “rebase” the indices to 0 in the returned data set using new_sectors (e.g. sectors_to_keep == [4,5,6] and new_sectors == [0,1,2]).
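
The outcome side of this restriction can be sketched on one circuit's counts. The code assumes, as described above, outcome labels that are 1-tuples holding an n-character string. The helper name restrict_outcomes is invented for the illustration, and the sketch is not pyGSTi's implementation.

```python
def restrict_outcomes(counts, sindices_to_keep):
    """Keep only the chosen character positions of each outcome string and
    sum the counts of outcomes that become identical."""
    restricted = {}
    for (outcome,), n in counts.items():
        kept = ''.join(outcome[i] for i in sindices_to_keep)
        restricted[(kept,)] = restricted.get((kept,), 0) + n
    return restricted


counts = {('00',): 10, ('01',): 20, ('10',): 30, ('11',): 40}

# Keep only the sector stored at character position 1 (the second character).
second_sector = restrict_outcomes(counts, [1])
```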

Parameters

dataset : DataSet object

The input DataSet whose data will be processed.

sectors_to_keep : list or tuple

The state-space labels (strings or integers) of the “sectors” to keep in the returned DataSet.

sindices_to_keep : list or tuple, optional

The 0-based indices of the labels in sectors_to_keep which give the positions of the corresponding letters in each outcome string (see above). If the state-space labels are integers (labeling qubits) that are also letter positions, then this may be left as None. For example, if the outcome strings of dataset are '00', '01', '10', and '11', and the first position refers to qubit "Q1" and the second to qubit "Q2" (present in operation labels), then to extract just the "Q2" data, sectors_to_keep should be ["Q2"] and sindices_to_keep should be [1].

new_sectors : list or tuple, optional

New sector names to map the elements of sectors_to_keep onto in the output DataSet’s circuits. None means the labels are not renamed. This can be useful if, for instance, you want to run a 2-qubit protocol that expects the qubits to be labeled “0” and “1” on qubits “4” and “5” of a larger set. Simply set sectors_to_keep == [4,5] and new_sectors == [0,1].

idle : string or Label, optional

The operation label to be used when there are no kept components of a “layer” (element) of a circuit.

record_zero_counts : bool, optional

Whether zero-counts present in the original dataset are recorded (stored) in the returned (filtered) DataSet. If False, then such zero counts are ignored, except for potentially registering new outcome labels.

filtercircuits : bool, optional

Whether or not to “filter” the circuits, by removing gates that act outside of the sectors_to_keep.

Returns

filtered_dataset : DataSet object

The DataSet with outcomes and circuits filtered as described above.
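
The circuit side of the filtering can be sketched with a toy circuit representation: a list of layers, each a list of (name, sslbls) pairs, where sslbls is a tuple of sectors or None. Everything here is an assumption made for the illustration: the representation, the helper name filter_layers, the placeholder idle label ('Gi', None), and the choice to drop a gate as soon as any of its sectors lies outside sectors_to_keep. It is not pyGSTi's implementation.

```python
def filter_layers(layers, sectors_to_keep, new_sectors=None, idle=('Gi', None)):
    """Drop gates acting outside `sectors_to_keep`, optionally renaming sectors."""
    keep = set(sectors_to_keep)
    rename = dict(zip(sectors_to_keep, new_sectors)) if new_sectors is not None else {}
    filtered = []
    for layer in layers:
        kept = []
        for name, sslbls in layer:
            if sslbls is None:
                # A gate acting on all sectors is always retained.
                kept.append((name, None))
            elif set(sslbls) <= keep:
                kept.append((name, tuple(rename.get(s, s) for s in sslbls)))
        # A layer with nothing left is replaced by the idle label.
        filtered.append(kept if kept else [idle])
    return filtered


layers = [
    [('Gx', (4,)), ('Gy', (0,))],   # only the gate on sector 4 survives
    [('Gz', (0,))],                 # nothing survives: becomes an idle layer
    [('Gcnot', (4, 5))],            # acts entirely within the kept sectors
    [('Gglobal', None)],            # acts on all sectors: retained
]
filtered = filter_layers(layers, sectors_to_keep=[4, 5], new_sectors=[0, 1])
```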

pygsti.data.datasetconstruction.trim_to_constant_numtimesteps(ds)

Trims a DataSet so that each circuit’s data comprises the same number of timesteps.

Returns a new dataset that has data for the same number of time steps for every circuit. This is achieved by discarding all time-series data for every circuit with a time step index beyond ‘min-time-step-index’, where ‘min-time-step-index’ is the minimum number of time steps over circuits.

Parameters

ds : DataSet

The dataset to trim.

Returns

DataSet

The trimmed dataset, obtained by potentially discarding some of the data.
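
The trimming rule can be sketched with time-series data held as a dict that maps each circuit to a list of per-time-step count dicts. That representation and the helper name trim_to_min_timesteps are assumptions made for this example, not the DataSet internals.

```python
def trim_to_min_timesteps(series_by_circuit):
    """Truncate every circuit's time series to the shortest length present."""
    if not series_by_circuit:
        return {}
    min_steps = min(len(series) for series in series_by_circuit.values())
    # Build new lists so the input data is left untouched.
    return {c: list(series[:min_steps]) for c, series in series_by_circuit.items()}


data = {
    ('Gx',): [{'0': 9, '1': 1}, {'0': 8, '1': 2}, {'0': 7, '1': 3}],
    ('Gy',): [{'0': 5, '1': 5}, {'0': 4, '1': 6}],
}
trimmed = trim_to_min_timesteps(data)
```

After trimming, both circuits keep only their first two time steps, and the third time step of ('Gx',) is discarded.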