pygsti.data.datasetconstruction

Functions for creating data sets

Module Contents

Functions

simulate_data(model_or_dataset, circuit_list, num_samples, sample_error='multinomial', seed=None, rand_state=None, alias_dict=None, collision_action='aggregate', record_zero_counts=True, comm=None, mem_limit=None, times=None)

Creates a DataSet using the probabilities obtained from a model.

_adjust_probabilities_inbounds(ps, tol)

_adjust_unit_sum(ps, tol)

_sample_distribution(ps, sample_error, nSamples, rndm_state)

aggregate_dataset_outcomes(dataset, label_merge_dict, record_zero_counts=True)

Creates a DataSet which merges certain outcomes in input DataSet.

_create_qubit_merge_dict(num_qubits, qubits_to_keep)

Creates a dictionary appropriate for use with :func:`aggregate_dataset_outcomes`.

_create_merge_dict(indices_to_keep, outcome_labels)

Creates a dictionary appropriate for use with :func:`aggregate_dataset_outcomes`.

filter_dataset(dataset, sectors_to_keep, sindices_to_keep=None, new_sectors=None, idle=((), ), record_zero_counts=True, filtercircuits=True)

Creates a DataSet that is the restriction of dataset to sectors_to_keep.

trim_to_constant_numtimesteps(ds)

Trims a DataSet so that each circuit's data comprises the same number of timesteps.

_subsample_timeseries_data(ds, step)

Creates a DataSet where each circuit's data is sub-sampled.

pygsti.data.datasetconstruction.simulate_data(model_or_dataset, circuit_list, num_samples, sample_error='multinomial', seed=None, rand_state=None, alias_dict=None, collision_action='aggregate', record_zero_counts=True, comm=None, mem_limit=None, times=None)

Creates a DataSet using the probabilities obtained from a model.

Parameters
  • model_or_dataset (Model or DataSet object) – The source of the underlying probabilities used to generate the data. If a Model, the model whose probabilities generate the data. If a DataSet, the data set whose frequencies generate the data.

  • circuit_list (list of (tuples or Circuits) or ExperimentDesign or None) – Each tuple or Circuit contains operation labels and specifies a gate sequence whose counts are included in the returned DataSet, e.g. [(), ('Gx',), ('Gx','Gy')]. If an ExperimentDesign, then the design’s .all_circuits_needing_data list is used as the circuit list.

  • num_samples (int or list of ints or None) – The simulated number of samples for each circuit. This only has effect when sample_error == "binomial" or "multinomial". If an integer, all circuits have this number of total samples. If a list, integer elements specify the number of samples for the corresponding circuit. If None, then model_or_dataset must be a DataSet, and total counts are taken from it (on a per-circuit basis).

  • sample_error (string, optional) –

    What type of sample error is included in the counts. Can be:

    • "none" - no sample error: counts are floating point numbers such that the exact probability can be found by the ratio count / total.

    • "clip" - no sample error, but clip probabilities to [0,1] so, e.g., counts are always positive.

    • "round" - same as "clip", except counts are rounded to the nearest integer.

    • "binomial" - the number of counts is taken from a binomial distribution. The distribution has parameters p = (clipped) probability of the circuit and n = number of samples. This can only be used when there are exactly two SPAM labels in model_or_dataset.

    • "multinomial" - counts are taken from a multinomial distribution. The distribution has parameters p_k = (clipped) probability of the circuit using the k-th SPAM label and n = number of samples.

  • seed (int, optional) – If not None, a seed for numpy’s random number generator, which is used to sample from the binomial or multinomial distribution.

  • rand_state (numpy.random.RandomState) – A RandomState object to generate samples from. Can be useful to set instead of seed if you want reproducible distribution samples across multiple random function calls but you don’t want to bother with manually incrementing seeds between those calls.

  • alias_dict (dict, optional) – A dictionary mapping single operation labels into tuples of one or more other operation labels which translate the given circuits before values are computed using model_or_dataset. The resulting Dataset, however, contains the un-translated circuits as keys.

  • collision_action ({"aggregate", "keepseparate"}) – Determines how duplicate circuits are handled by the resulting DataSet. Please see the constructor documentation for DataSet.

  • record_zero_counts (bool, optional) – Whether zero-counts are actually recorded (stored) in the returned DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

  • comm (mpi4py.MPI.Comm, optional) – When not None, an MPI communicator for distributing the computation across multiple processors and ensuring that the same dataset is generated on each processor.

  • mem_limit (int, optional) – A rough memory limit in bytes which is used to determine job allocation when there are multiple processors.

  • times (iterable, optional) – When not None, a list of time-stamps at which data should be sampled. num_samples samples will be simulated at each time value, meaning that each circuit in circuit_list will be evaluated with the given time value as its start time.

Returns

DataSet – A static data set filled with counts for the specified circuits.
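
A minimal usage sketch is given below. It assumes the smq1Q_XYI model pack (so the operation labels are ('Gxpi2', 0) and ('Gypi2', 0)); any Model whose operation labels match the circuits would work the same way.

    from pygsti.circuits import Circuit
    from pygsti.data.datasetconstruction import simulate_data
    from pygsti.modelpacks import smq1Q_XYI

    # Target (noise-free) model that supplies the outcome probabilities.
    model = smq1Q_XYI.target_model()

    # A few short circuits on qubit 0 (explicit line labels keep them
    # compatible with the model's single-qubit state space).
    circuits = [Circuit([], line_labels=(0,)),
                Circuit([('Gxpi2', 0)], line_labels=(0,)),
                Circuit([('Gxpi2', 0), ('Gypi2', 0)], line_labels=(0,))]

    # Draw 1000 multinomial samples per circuit; the seed makes the draw reproducible.
    ds = simulate_data(model, circuits, num_samples=1000,
                       sample_error='multinomial', seed=2023)
    print(ds)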

pygsti.data.datasetconstruction._adjust_probabilities_inbounds(ps, tol)
pygsti.data.datasetconstruction._adjust_unit_sum(ps, tol)
pygsti.data.datasetconstruction._sample_distribution(ps, sample_error, nSamples, rndm_state)
pygsti.data.datasetconstruction.aggregate_dataset_outcomes(dataset, label_merge_dict, record_zero_counts=True)

Creates a DataSet which merges certain outcomes in input DataSet.

This is used, for example, to aggregate a 2-qubit, 4-outcome DataSet into a 1-qubit, 2-outcome DataSet.

Parameters
  • dataset (DataSet object) – The input DataSet whose results will be simplified according to the rules set forth in label_merge_dict

  • label_merge_dict (dictionary) – The dictionary whose keys define the new DataSet outcomes, and whose values are lists of input DataSet outcomes that are to be summed together. For example, if a two-qubit DataSet has outcome labels '00', '01', '10', and '11', and we want to "aggregate out" the second qubit, we could use label_merge_dict = {'0': ['00','01'], '1': ['10','11']}. When doing this, however, it may be better to use :func:`filter_dataset`, which also updates the circuits.

  • record_zero_counts (bool, optional) – Whether zero-counts are actually recorded (stored) in the returned (merged) DataSet. If False, then zero counts are ignored, except for potentially registering new outcome labels.

Returns

merged_dataset (DataSet object) – The DataSet with outcomes merged according to the rules given in label_merge_dict.
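
The sketch below walks through the two-qubit example from the description above; the single 'Gx' circuit and its counts are illustrative assumptions.

    from pygsti.data import DataSet
    from pygsti.data.datasetconstruction import aggregate_dataset_outcomes

    # A small two-qubit data set with one hand-entered circuit.
    ds = DataSet(outcome_labels=['00', '01', '10', '11'])
    ds.add_count_dict(('Gx',), {'00': 480, '01': 20, '10': 490, '11': 10})
    ds.done_adding_data()

    # Aggregate out the second qubit: '00' and '01' merge into '0', '10' and '11' into '1'.
    merged = aggregate_dataset_outcomes(ds, {'0': ['00', '01'], '1': ['10', '11']})
    print(merged)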

pygsti.data.datasetconstruction._create_qubit_merge_dict(num_qubits, qubits_to_keep)

Creates a dictionary appropriate for use with :func:`aggregate_dataset_outcomes`.

The returned dictionary instructs aggregate_dataset_outcomes to aggregate all but the specified qubits_to_keep when the outcome labels are those of num_qubits qubits (i.e. strings of 0’s and 1’s).

Parameters
  • num_qubits (int) – The total number of qubits

  • qubits_to_keep (list) – A list of integers specifying which qubits should be kept, that is, not aggregated, when the returned dictionary is passed to aggregate_dataset_outcomes.

Returns

dict
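
For illustration, the call below builds a merge dictionary that keeps only qubit 0 of a two-qubit outcome space (a minimal sketch; the grouping follows from the description above).

    from pygsti.data.datasetconstruction import _create_qubit_merge_dict

    # Keep qubit 0 and aggregate over qubit 1, so '00' is grouped with '01'
    # and '10' with '11' when the result is passed to aggregate_dataset_outcomes.
    merge_dict = _create_qubit_merge_dict(2, [0])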

pygsti.data.datasetconstruction._create_merge_dict(indices_to_keep, outcome_labels)

Creates a dictionary appropriate for use with :func:`aggregate_dataset_outcomes`.

Each element of outcome_labels should be an n-character string (or a 1-tuple of such a string). The returned dictionary’s keys will be all the unique results of keeping only the characters indexed by indices_to_keep from each outcome label. The dictionary’s values will be a list of all the original outcome labels which reduce to the key value when the non-indices_to_keep characters are removed.

For example, if outcome_labels == ['00','01','10','11'] and indices_to_keep == [1] then this function returns the dict {'0': ['00','10'], '1': ['01','11']}.

Note: if the elements of outcome_labels are 1-tuples then so are the elements of the returned dictionary’s values.

Parameters
  • indices_to_keep (list) – A list of integer indices specifying which character positions should be kept (i.e. not aggregated together by aggregate_dataset_outcomes).

  • outcome_labels (list) – A list of the outcome labels to potentially merge. This can be a list of strings or of 1-tuples containing strings.

Returns

dict
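
The call below reproduces the example from the description above.

    from pygsti.data.datasetconstruction import _create_merge_dict

    # Keep only character position 1 of each outcome label.
    merge_dict = _create_merge_dict([1], ['00', '01', '10', '11'])
    # Per the description above, this yields {'0': ['00', '10'], '1': ['01', '11']}.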

pygsti.data.datasetconstruction.filter_dataset(dataset, sectors_to_keep, sindices_to_keep=None, new_sectors=None, idle=((),), record_zero_counts=True, filtercircuits=True)

Creates a DataSet that is the restriction of dataset to sectors_to_keep.

This function aggregates (sums) outcomes in dataset which differ only in sectors (usually qubits - see below) not in sectors_to_keep, and removes any operation labels which act specifically on sectors not in sectors_to_keep (e.g. an idle gate that acts on all sectors because its .sslbls is None will not be removed).

Here “sectors” are state-space labels, present in the circuits of dataset. Each sector also corresponds to a particular character position within the outcome labels of dataset. Thus, for this function to work, the outcome labels of dataset must all be 1-tuples whose sole element is an n-character string such that each character represents the outcome of a single sector. If the state-space labels are integers, then they can serve as both a label and an outcome-string position. The argument new_sectors may be given to rename the kept state-space labels in the returned DataSet’s circuits.

A typical case is when the state space is that of n qubits and the state-space labels are the integers 0 to n-1. As stated above, in this case there is no need to specify sindices_to_keep. One may want to “rebase” the indices to 0 in the returned data set using new_sectors (e.g. sectors_to_keep == [4,5,6] and new_sectors == [0,1,2]).

Parameters
  • dataset (DataSet object) – The input DataSet whose data will be processed.

  • sectors_to_keep (list or tuple) – The state-space labels (strings or integers) of the “sectors” to keep in the returned DataSet.

  • sindices_to_keep (list or tuple, optional) – The 0-based indices of the labels in sectors_to_keep which give the positions of the corresponding letters in each outcome string (see above). If the state-space labels are integers (labeling qubits) that are also letter positions, then this may be left as None. For example, if the outcome strings of dataset are '00', '01', '10', and '11', and the first position refers to qubit "Q1" and the second to qubit "Q2" (present in operation labels), then to extract just "Q2" data, sectors_to_keep should be ["Q2"] and sindices_to_keep should be [1].

  • new_sectors (list or tuple, optional) – New sector names to map the elements of sectors_to_keep onto in the output DataSet’s circuits. None means the labels are not renamed. This can be useful if, for instance, you want to run a 2-qubit protocol that expects the qubits to be labeled “0” and “1” on qubits “4” and “5” of a larger set. Simply set sectors_to_keep == [4,5] and new_sectors == [0,1].

  • idle (string or Label, optional) – The operation label to be used when there are no kept components of a “layer” (element) of a circuit.

  • record_zero_counts (bool, optional) – Whether zero-counts present in the original dataset are recorded (stored) in the returned (filtered) DataSet. If False, then such zero counts are ignored, except for potentially registering new outcome labels.

  • filtercircuits (bool, optional) – Whether or not to “filter” the circuits, by removing gates that act outside of the sectors_to_keep.

Returns

filtered_dataset (DataSet object) – The DataSet with outcomes and circuits filtered as described above.
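
A minimal sketch of the qubit case described above: a two-qubit data set is restricted to qubit 1, which is then relabeled as qubit 0. The circuit and counts are illustrative assumptions.

    from pygsti.circuits import Circuit
    from pygsti.data import DataSet
    from pygsti.data.datasetconstruction import filter_dataset

    ds = DataSet(outcome_labels=['00', '01', '10', '11'])
    c = Circuit([('Gxpi2', 0), ('Gypi2', 1)], line_labels=(0, 1))  # two single-qubit layers
    ds.add_count_dict(c, {'00': 240, '01': 260, '10': 255, '11': 245})
    ds.done_adding_data()

    # Keep only qubit 1 and relabel it as qubit 0 in the returned circuits.
    # The Gxpi2 layer acts only on the dropped qubit, so it has no kept components.
    filtered = filter_dataset(ds, sectors_to_keep=[1], new_sectors=[0])
    print(filtered)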

pygsti.data.datasetconstruction.trim_to_constant_numtimesteps(ds)

Trims a DataSet so that each circuit’s data comprises the same number of timesteps.

Returns a new dataset that has data for the same number of time steps for every circuit. This is achieved by discarding, for every circuit, all time-series data with a time-step index at or beyond ‘min-time-step-index’, where ‘min-time-step-index’ is the minimum number of time steps over all circuits.

Parameters

ds (DataSet) – The dataset to trim.

Returns

DataSet – The trimmed dataset, obtained by potentially discarding some of the data.
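
A minimal sketch with hand-entered time-series data; the circuits, outcomes, and timestamps are illustrative assumptions, and per-shot entry via DataSet.add_raw_series_data is used.

    from pygsti.circuits import Circuit
    from pygsti.data import DataSet
    from pygsti.data.datasetconstruction import trim_to_constant_numtimesteps

    ds = DataSet(outcome_labels=['0', '1'])
    # One circuit with three timesteps, another with only two.
    ds.add_raw_series_data(Circuit([('Gxpi2', 0)]), ['0', '1', '0'], [0.0, 1.0, 2.0])
    ds.add_raw_series_data(Circuit([('Gypi2', 0)]), ['1', '0'], [0.0, 1.0])
    ds.done_adding_data()

    # Every circuit in the trimmed data set now has two timesteps.
    trimmed = trim_to_constant_numtimesteps(ds)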

pygsti.data.datasetconstruction._subsample_timeseries_data(ds, step)

Creates a DataSet where each circuit’s data is sub-sampled.

Returns a new dataset where, for every circuit, we keep only the data at every ‘step’-th timestep. Specifically, the outcomes at the i-th time for each circuit are kept for each i such that i modulo ‘step’ is zero.

Parameters
  • ds (DataSet) – The dataset to subsample.

  • step (int) – The sub-sampling time step. Only data at every step increment in time is kept.

Returns

DataSet – The subsampled dataset.
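
For example, continuing from a time-series DataSet like the one sketched for trim_to_constant_numtimesteps above, keeping every other timestep:

    from pygsti.data.datasetconstruction import _subsample_timeseries_data

    # Keep only the data at timestep indices 0, 2, 4, ... of each circuit.
    subsampled = _subsample_timeseries_data(ds, step=2)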