pygsti.data.datasetconstruction¶
Functions for creating data
Module Contents¶
Functions¶

simulate_data – Creates a DataSet using the probabilities obtained from a model.
aggregate_dataset_outcomes – Creates a DataSet which merges certain outcomes in an input DataSet.
_create_qubit_merge_dict – Creates a dictionary appropriate for use with aggregate_dataset_outcomes.
_create_merge_dict – Creates a dictionary appropriate for use with aggregate_dataset_outcomes.
filter_dataset – Creates a DataSet that is the restriction of dataset to sectors_to_keep.
trim_to_constant_numtimesteps – Trims a DataSet so that each circuit's data comprises the same number of timesteps.
_subsample_timeseries_data – Creates a DataSet where each circuit's data is subsampled.
 pygsti.data.datasetconstruction.simulate_data(model_or_dataset, circuit_list, num_samples, sample_error='multinomial', seed=None, rand_state=None, alias_dict=None, collision_action='aggregate', record_zero_counts=True, comm=None, mem_limit=None, times=None)¶
Creates a DataSet using the probabilities obtained from a model.
 Parameters
model_or_dataset (Model or DataSet object) – The source of the underlying probabilities used to generate the data. If a Model, the model whose probabilities generate the data. If a DataSet, the data set whose frequencies generate the data.
circuit_list (list of (tuples or Circuits) or ExperimentDesign or None) – Each tuple or Circuit contains operation labels and specifies a gate sequence whose counts are included in the returned DataSet, e.g. [ (), ('Gx',), ('Gx','Gy') ]. If an ExperimentDesign, then the design's .all_circuits_needing_data list is used as the circuit list.
num_samples (int or list of ints or None) – The simulated number of samples for each circuit. This only has an effect when sample_error == "binomial" or "multinomial". If an integer, all circuits have this number of total samples. If a list, integer elements specify the number of samples for the corresponding circuit. If None, then model_or_dataset must be a DataSet, and total counts are taken from it (on a per-circuit basis).
sample_error (string, optional) – What type of sample error is included in the counts. Can be:
"none" - no sample error: counts are floating point numbers such that the exact probability can be found by the ratio count / total.
"clip" - no sample error, but clip probabilities to [0,1] so, e.g., counts are always positive.
"round" - same as "clip", except counts are rounded to the nearest integer.
"binomial" - the number of counts is taken from a binomial distribution. The distribution has parameters p = the (clipped) probability of the circuit and n = the number of samples. This can only be used when there are exactly two SPAM labels in model_or_dataset.
"multinomial" - counts are taken from a multinomial distribution. The distribution has parameters p_k = the (clipped) probability of the gate string using the k-th SPAM label and n = the number of samples.
seed (int, optional) – If not None, a seed for numpy's random number generator, which is used to sample from the binomial or multinomial distribution.
rand_state (numpy.random.RandomState) – A RandomState object to generate samples from. Can be useful to set instead of seed if you want reproducible distribution samples across multiple random function calls but you don't want to bother with manually incrementing seeds between those calls.
alias_dict (dict, optional) – A dictionary mapping single operation labels into tuples of one or more other operation labels which translate the given circuits before values are computed using model_or_dataset. The resulting Dataset, however, contains the untranslated circuits as keys.
collision_action ({"aggregate", "keepseparate"}) – Determines how duplicate circuits are handled by the resulting DataSet. Please see the constructor documentation for DataSet.
record_zero_counts (bool, optional) – Whether zero counts are actually recorded (stored) in the returned DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.
comm (mpi4py.MPI.Comm, optional) – When not None, an MPI communicator for distributing the computation across multiple processors and ensuring that the same dataset is generated on each processor.
mem_limit (int, optional) – A rough memory limit in bytes which is used to determine job allocation when there are multiple processors.
times (iterable, optional) – When not None, a list of timestamps at which data should be sampled. num_samples samples will be simulated at each time value, meaning that each circuit in circuit_list will be evaluated with the given time value as its start time.
 Returns
DataSet – A static data set filled with counts for the specified circuits.
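The sample_error modes above can be summarized with a short sketch. This is illustrative only: sample_counts is a hypothetical helper, not pyGSTi's implementation, and it handles a single circuit's outcome probabilities rather than a whole circuit list.

```python
import numpy as np

def sample_counts(probs, num_samples, sample_error="multinomial", seed=None):
    """Turn exact outcome probabilities for one circuit into counts.

    probs : 1-D sequence of outcome probabilities for one circuit.
    """
    ps = np.clip(np.asarray(probs, dtype=float), 0.0, 1.0)  # "clip" behavior
    if sample_error == "none":
        return ps * num_samples              # exact, floating-point counts
    if sample_error == "round":
        return np.round(ps * num_samples)    # clipped, then rounded to ints
    if sample_error == "multinomial":
        rng = np.random.RandomState(seed)    # same seed -> same sampled counts
        return rng.multinomial(num_samples, ps / ps.sum())  # renormalize after clipping
    raise ValueError("unknown sample_error: %s" % sample_error)

counts = sample_counts([0.7, 0.3], 100, sample_error="multinomial", seed=0)
# counts is a length-2 integer array summing to 100
```

Note how seed gives reproducibility: constructing a fresh RandomState from the same seed yields the same sampled dataset, which is also why passing comm ensures every MPI processor ends up with identical data.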
 pygsti.data.datasetconstruction._adjust_probabilities_inbounds(ps, tol)¶
 pygsti.data.datasetconstruction._adjust_unit_sum(ps, tol)¶
 pygsti.data.datasetconstruction._sample_distribution(ps, sample_error, nSamples, rndm_state)¶
 pygsti.data.datasetconstruction.aggregate_dataset_outcomes(dataset, label_merge_dict, record_zero_counts=True)¶
Creates a DataSet which merges certain outcomes in an input DataSet.
This is used, for example, to aggregate a 2-qubit, 4-outcome DataSet into a 1-qubit, 2-outcome DataSet.
 Parameters
dataset (DataSet object) – The input DataSet whose results will be simplified according to the rules set forth in label_merge_dict
label_merge_dict (dictionary) – The dictionary whose keys define the new DataSet outcomes, and whose items are lists of input DataSet outcomes that are to be summed together. For example, if a two-qubit DataSet has outcome labels '00', '01', '10', and '11', and we want to "aggregate out" the second qubit, we could use label_merge_dict = {'0': ['00','01'], '1': ['10','11']}. When doing this, however, it may be better to use :func:`filter_dataset`, which also updates the circuits.
record_zero_counts (bool, optional) – Whether zero counts are actually recorded (stored) in the returned (merged) DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.
 Returns
merged_dataset (DataSet object) – The DataSet with outcomes merged according to the rules given in label_merge_dict.
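The label_merge_dict semantics amount to summing, for each circuit, the counts of every old outcome mapped to a new outcome label. A minimal sketch (merge_outcomes is a hypothetical helper operating on one circuit's count dictionary, not the pyGSTi implementation):

```python
def merge_outcomes(counts, label_merge_dict):
    """counts: dict mapping outcome label -> count, for one circuit."""
    return {new: sum(counts.get(old, 0) for old in olds)
            for new, olds in label_merge_dict.items()}

# Aggregate out the second qubit of a two-qubit count dictionary:
two_qubit = {'00': 10, '01': 5, '10': 3, '11': 2}
one_qubit = merge_outcomes(two_qubit, {'0': ['00', '01'], '1': ['10', '11']})
# one_qubit == {'0': 15, '1': 5}
```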
 pygsti.data.datasetconstruction._create_qubit_merge_dict(num_qubits, qubits_to_keep)¶
Creates a dictionary appropriate for use with :func:`aggregate_dataset_outcomes`.
The returned dictionary instructs aggregate_dataset_outcomes to aggregate all but the specified qubits_to_keep when the outcome labels are those of num_qubits qubits (i.e. strings of 0’s and 1’s).
 Parameters
num_qubits (int) – The total number of qubits
qubits_to_keep (list) – A list of integers specifying which qubits should be kept, that is, not aggregated, when the returned dictionary is passed to aggregate_dataset_outcomes.
 Returns
dict
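The shape of the returned dictionary can be sketched as follows (build_qubit_merge_dict is a hypothetical stand-in for this private helper): group all num_qubits-bit outcome strings by the bits on qubits_to_keep.

```python
from itertools import product

def build_qubit_merge_dict(num_qubits, qubits_to_keep):
    """Group all bit-string outcomes by the bits on the kept qubits."""
    merge = {}
    for bits in product('01', repeat=num_qubits):
        key = ''.join(bits[i] for i in qubits_to_keep)
        merge.setdefault(key, []).append(''.join(bits))
    return merge

d = build_qubit_merge_dict(2, [0])
# d == {'0': ['00', '01'], '1': ['10', '11']}  -- aggregates out qubit 1
```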
 pygsti.data.datasetconstruction._create_merge_dict(indices_to_keep, outcome_labels)¶
Creates a dictionary appropriate for use with :func:`aggregate_dataset_outcomes`.
Each element of outcome_labels should be an n-character string (or a 1-tuple of such a string). The returned dictionary's keys will be all the unique results of keeping only the characters indexed by indices_to_keep from each outcome label. The dictionary's values will be a list of all the original outcome labels which reduce to the key value when the non-indices_to_keep characters are removed.
For example, if outcome_labels == ['00','01','10','11'] and indices_to_keep == [1] then this function returns the dict {'0': ['00','10'], '1': ['01','11']}.
Note: if the elements of outcome_labels are 1-tuples then so are the elements of the returned dictionary's values.
 Parameters
indices_to_keep (list) – A list of integer indices specifying which character positions should be kept (i.e. not aggregated together by aggregate_dataset_outcomes).
outcome_labels (list) – A list of the outcome labels to potentially merge. This can be a list of strings or of 1tuples containing strings.
 Returns
dict
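The reduction described above can be sketched directly from the documented example (make_merge_dict is a hypothetical stand-in for this private helper):

```python
def make_merge_dict(indices_to_keep, outcome_labels):
    """Group outcome labels by the characters at indices_to_keep."""
    merge = {}
    for label in outcome_labels:
        s = label[0] if isinstance(label, tuple) else label  # allow 1-tuples
        key = ''.join(s[i] for i in indices_to_keep)
        merge.setdefault(key, []).append(label)  # values keep the original form
    return merge

d = make_merge_dict([1], ['00', '01', '10', '11'])
# d == {'0': ['00', '10'], '1': ['01', '11']}, matching the example above
```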
 pygsti.data.datasetconstruction.filter_dataset(dataset, sectors_to_keep, sindices_to_keep=None, new_sectors=None, idle=((),), record_zero_counts=True, filtercircuits=True)¶
Creates a DataSet that is the restriction of dataset to sectors_to_keep.
This function aggregates (sums) outcomes in dataset which differ only in sectors (usually qubits - see below) not in sectors_to_keep, and removes any operation labels which act specifically on sectors not in sectors_to_keep (e.g. an idle gate acting on all sectors, because its .sslbls is None, will not be removed).
Here "sectors" are state-space labels, present in the circuits of dataset. Each sector also corresponds to a particular character position within the outcome labels of dataset. Thus, for this function to work, the outcome labels of dataset must all be 1-tuples whose sole element is an n-character string such that each character represents the outcome of a single sector. If the state-space labels are integers, then they can serve as both a label and an outcome-string position. The argument new_sectors may be given to rename the kept state-space labels in the returned DataSet's circuits.
A typical case is when the state space is that of n qubits, and the state-space labels are the integers 0 to n-1. As stated above, in this case there is no need to specify sindices_to_keep. One may want to "rebase" the indices to 0 in the returned data set using new_sectors (e.g. sectors_to_keep == [4,5,6] and new_sectors == [0,1,2]).
 Parameters
dataset (DataSet object) – The input DataSet whose data will be processed.
sectors_to_keep (list or tuple) – The statespace labels (strings or integers) of the “sectors” to keep in the returned DataSet.
sindices_to_keep (list or tuple, optional) – The 0-based indices of the labels in sectors_to_keep which give the positions of the corresponding letters in each outcome string (see above). If the state-space labels are integers (labeling qubits) that are also letter positions, then this may be left as None. For example, if the outcome strings of dataset are '00', '01', '10', and '11', and the first position refers to qubit "Q1" and the second to qubit "Q2" (present in operation labels), then to extract just the "Q2" data, sectors_to_keep should be ["Q2"] and sindices_to_keep should be [1].
new_sectors (list or tuple, optional) – New sector names to map the elements of sectors_to_keep onto in the output DataSet's circuits. None means the labels are not renamed. This can be useful if, for instance, you want to run a 2-qubit protocol that expects the qubits to be labeled "0" and "1" on qubits "4" and "5" of a larger set. Simply set sectors_to_keep == [4,5] and new_sectors == [0,1].
idle (string or Label, optional) – The operation label to be used when there are no kept components of a “layer” (element) of a circuit.
record_zero_counts (bool, optional) – Whether zero counts present in the original dataset are recorded (stored) in the returned (filtered) DataSet. If False, then such zero counts are ignored, except for potentially registering new outcome labels.
filtercircuits (bool, optional) – Whether or not to “filter” the circuits, by removing gates that act outside of the sectors_to_keep.
 Returns
filtered_dataset (DataSet object) – The DataSet with outcomes and circuits filtered as described above.
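The outcome-filtering step described above can be sketched on a single circuit's counts (filter_counts is a hypothetical helper; the real function additionally rewrites the circuits' operation labels):

```python
def filter_counts(counts, sindices_to_keep):
    """Keep only the outcome-string positions in sindices_to_keep,
    aggregating counts that become identical after the dropped sectors
    are removed.

    counts: dict mapping outcome string -> count, for one circuit.
    """
    filtered = {}
    for outcome, n in counts.items():
        kept = ''.join(outcome[i] for i in sindices_to_keep)
        filtered[kept] = filtered.get(kept, 0) + n  # sum over dropped sectors
    return filtered

# Keep only the second sector (e.g. qubit "Q2" at string position 1):
reduced = filter_counts({'00': 7, '01': 1, '10': 2, '11': 4}, [1])
# reduced == {'0': 9, '1': 5}
```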
 pygsti.data.datasetconstruction.trim_to_constant_numtimesteps(ds)¶
Trims a DataSet so that each circuit's data comprises the same number of timesteps.
Returns a new dataset that has data for the same number of time steps for every circuit. This is achieved by discarding all time-series data for every circuit with a time step index beyond 'min-time-step-index', where 'min-time-step-index' is the minimum number of time steps over circuits.
 Parameters
ds (DataSet) – The dataset to trim.
 Returns
DataSet – The trimmed dataset, obtained by potentially discarding some of the data.
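The trimming rule reduces to truncating every circuit's time series at the shortest series length. A minimal sketch on plain lists (trim_series is hypothetical, standing in for the DataSet-based function):

```python
def trim_series(series_by_circuit):
    """series_by_circuit: dict mapping circuit -> list of per-time-step data."""
    k = min(len(s) for s in series_by_circuit.values())  # min-time-step count
    return {c: s[:k] for c, s in series_by_circuit.items()}  # drop the rest

trimmed = trim_series({'c1': [1, 2, 3], 'c2': [4, 5]})
# trimmed == {'c1': [1, 2], 'c2': [4, 5]}
```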
 pygsti.data.datasetconstruction._subsample_timeseries_data(ds, step)¶
Creates a DataSet where each circuit's data is subsampled.
Returns a new dataset where, for every circuit, we only keep the data at every 'step' timestep. Specifically, the outcomes at the i-th time for each circuit are kept for each i such that i modulo 'step' is zero.
 Parameters
ds (DataSet) – The dataset to subsample
step (int) – The subsampling time step. Only data at every step increment in time is kept.
 Returns
DataSet – The subsampled dataset.
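The subsampling rule is simply: keep the data at time index i whenever i % step == 0. A sketch on one circuit's time series (subsample is a hypothetical helper):

```python
def subsample(series, step):
    """Keep every step-th entry of a circuit's time-ordered data."""
    return [x for i, x in enumerate(series) if i % step == 0]

kept = subsample(['t0', 't1', 't2', 't3', 't4'], 2)
# kept == ['t0', 't2', 't4']
```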