:py:mod:`pygsti.data.datasetconstruction`
=========================================

.. py:module:: pygsti.data.datasetconstruction

.. autoapi-nested-parse::

   Functions for creating data



Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::

   pygsti.data.datasetconstruction.simulate_data
   pygsti.data.datasetconstruction.aggregate_dataset_outcomes
   pygsti.data.datasetconstruction.filter_dataset
   pygsti.data.datasetconstruction.trim_to_constant_numtimesteps



.. py:function:: simulate_data(model_or_dataset, circuit_list, num_samples, sample_error='multinomial', seed=None, rand_state=None, alias_dict=None, collision_action='aggregate', record_zero_counts=True, comm=None, mem_limit=None, times=None)

   Creates a DataSet using the probabilities obtained from a model (or the frequencies of an existing DataSet).

   Parameters
   ----------
   model_or_dataset : Model or DataSet object
       The source of the underlying probabilities used to generate the data.
       If a Model, the model whose probabilities generate the data.
       If a DataSet, the data set whose frequencies generate the data.

   circuit_list : list of (tuples or Circuits) or ExperimentDesign or None
       Each tuple or Circuit contains operation labels and specifies a gate
       sequence whose counts are included in the returned DataSet,
       e.g. ``[ (), ('Gx',), ('Gx','Gy') ]``.  If an :class:`ExperimentDesign`,
       then the design's `.all_circuits_needing_data` list is used as the
       circuit list.

   num_samples : int or list of ints or None
       The simulated number of samples for each circuit.  This only has an
       effect when ``sample_error == "binomial"`` or ``"multinomial"``.  If an
       integer, all circuits have this number of total samples.  If a list,
       integer elements specify the number of samples for the corresponding
       circuit.  If ``None``, then `model_or_dataset` must be a
       :class:`~pygsti.objects.DataSet`, and total counts are taken from it (on
       a per-circuit basis).

   sample_error : string, optional
       What type of sample error is included in the counts.  Can be:

       - "none" - no sample error: counts are floating point numbers such
         that the exact probability can be found by the ratio of count / total.
       - "clip" - no sample error, but clip probabilities to [0,1] so, e.g.,
         counts are always non-negative.
       - "round" - same as "clip", except counts are rounded to the nearest
         integer.
       - "binomial" - the number of counts is taken from a binomial
         distribution.  The distribution has parameters p = (clipped)
         probability of the circuit and n = number of samples.  This can only
         be used when there are exactly two SPAM labels in `model_or_dataset`.
       - "multinomial" - counts are taken from a multinomial distribution.
         The distribution has parameters p_k = (clipped) probability of the
         circuit using the k-th SPAM label and n = number of samples.

   seed : int, optional
       If not ``None``, a seed for numpy's random number generator, which is
       used to sample from the binomial or multinomial distribution.

   rand_state : numpy.random.RandomState, optional
       A RandomState object to generate samples from.  This can be useful to
       set instead of `seed` if you want reproducible distribution samples
       across multiple random function calls without manually incrementing
       seeds between those calls.

   alias_dict : dict, optional
       A dictionary mapping single operation labels into tuples of one or more
       other operation labels which translate the given circuits before values
       are computed using `model_or_dataset`.  The resulting DataSet, however,
       contains the *un-translated* circuits as keys.

   collision_action : {"aggregate", "keepseparate"}
       Determines how duplicate circuits are handled by the resulting
       `DataSet`.  Please see the constructor documentation for `DataSet`.

   record_zero_counts : bool, optional
       Whether zero-counts are actually recorded (stored) in the returned
       DataSet.  If False, then zero counts are ignored, except for potentially
       registering new outcome labels.

   comm : mpi4py.MPI.Comm, optional
       When not ``None``, an MPI communicator for distributing the computation
       across multiple processors and ensuring that the *same* dataset is
       generated on each processor.

   mem_limit : int, optional
       A rough memory limit in bytes which is used to determine job allocation
       when there are multiple processors.

   times : iterable, optional
       When not ``None``, a list of time-stamps at which data should be
       sampled.  `num_samples` samples will be simulated at each time value,
       meaning that each circuit in `circuit_list` will be evaluated with the
       given time value as its *start time*.

   Returns
   -------
   DataSet
       A static data set filled with counts for the specified circuits.


.. py:function:: aggregate_dataset_outcomes(dataset, label_merge_dict, record_zero_counts=True)

   Creates a DataSet which merges certain outcomes in the input DataSet.

   This is used, for example, to aggregate a 2-qubit, 4-outcome DataSet into a
   1-qubit, 2-outcome DataSet.

   Parameters
   ----------
   dataset : DataSet object
       The input DataSet whose results will be simplified according to the
       rules set forth in `label_merge_dict`.

   label_merge_dict : dictionary
       The dictionary whose keys define the new DataSet outcomes, and whose
       values are lists of input DataSet outcomes that are to be summed
       together.  For example, if a two-qubit DataSet has outcome labels "00",
       "01", "10", and "11", and we want to "aggregate out" the second qubit,
       we could use ``label_merge_dict = {'0': ['00', '01'], '1': ['10', '11']}``.
       When doing this, however, it may be better to use
       :func:`filter_dataset`, which also updates the circuits.

   record_zero_counts : bool, optional
       Whether zero-counts are actually recorded (stored) in the returned
       (merged) DataSet.  If False, then zero counts are ignored, except for
       potentially registering new outcome labels.

   Returns
   -------
   merged_dataset : DataSet object
       The DataSet with outcomes merged according to the rules given in
       `label_merge_dict`.


.. py:function:: filter_dataset(dataset, sectors_to_keep, sindices_to_keep=None, new_sectors=None, idle=((), ), record_zero_counts=True, filtercircuits=True)

   Creates a DataSet that is the restriction of `dataset` to `sectors_to_keep`.

   This function aggregates (sums) outcomes in `dataset` which differ only in
   sectors (usually qubits; see below) *not* in `sectors_to_keep`, and removes
   any operation labels which act specifically on sectors not in
   `sectors_to_keep` (e.g. an idle gate acting on *all* sectors because its
   `.sslbls` is None will *not* be removed).

   Here "sectors" are state-space labels, present in the circuits of
   `dataset`.  Each sector also corresponds to a particular character position
   within the outcome labels of `dataset`.  Thus, for this function to work,
   the outcome labels of `dataset` must all be 1-tuples whose sole element is
   an n-character string such that each character represents the outcome of a
   single sector.  If the state-space labels are integers, then they can serve
   as both a label and an outcome-string position.  The argument
   `new_sectors` may be given to rename the kept state-space labels in the
   returned `DataSet`'s circuits.

   A typical case is when the state space is that of *n* qubits, and the
   state-space labels are the integers 0 to *n-1*.  As stated above, in this
   case there is no need to specify `sindices_to_keep`.  One may want to
   "rebase" the indices to 0 in the returned data set using `new_sectors`
   (e.g. ``sectors_to_keep == [4,5,6]`` and ``new_sectors == [0,1,2]``).

   Parameters
   ----------
   dataset : DataSet object
       The input DataSet whose data will be processed.

   sectors_to_keep : list or tuple
       The state-space labels (strings or integers) of the "sectors" to keep in
       the returned DataSet.

   sindices_to_keep : list or tuple, optional
       The 0-based indices of the labels in `sectors_to_keep` which give the
       positions of the corresponding letters in each outcome string (see
       above).
       If the state-space labels are integers (labeling *qubits*) that are
       also letter positions, then this may be left as ``None``.  For example,
       if the outcome strings of `dataset` are '00', '01', '10', and '11' and
       the first position refers to qubit "Q1" and the second to qubit "Q2"
       (present in operation labels), then to extract just "Q2" data
       `sectors_to_keep` should be ``["Q2"]`` and `sindices_to_keep` should be
       ``[1]``.

   new_sectors : list or tuple, optional
       New sector names to map the elements of `sectors_to_keep` onto in the
       output DataSet's circuits.  ``None`` means the labels are not renamed.
       This can be useful if, for instance, you want to run a 2-qubit protocol
       that expects the qubits to be labeled "0" and "1" on qubits "4" and "5"
       of a larger set.  Simply set ``sectors_to_keep == [4,5]`` and
       ``new_sectors == [0,1]``.

   idle : string or Label, optional
       The operation label to be used when there are no kept components of a
       "layer" (element) of a circuit.

   record_zero_counts : bool, optional
       Whether zero-counts present in the original `dataset` are recorded
       (stored) in the returned (filtered) DataSet.  If False, then such zero
       counts are ignored, except for potentially registering new outcome
       labels.

   filtercircuits : bool, optional
       Whether or not to "filter" the circuits by removing gates that act
       outside of `sectors_to_keep`.

   Returns
   -------
   filtered_dataset : DataSet object
       The DataSet with outcomes and circuits filtered as described above.


.. py:function:: trim_to_constant_numtimesteps(ds)

   Trims a :class:`DataSet` so that each circuit's data comprises the same number of timesteps.

   Returns a new dataset that has data for the same number of time steps for
   every circuit.  This is achieved by discarding, for every circuit, all
   time-series data with a time-step index beyond 'min-time-step-index', the
   minimum number of time steps over all circuits.

   Parameters
   ----------
   ds : DataSet
       The dataset to trim.

   Returns
   -------
   DataSet
       The trimmed dataset, obtained by potentially discarding some of the data.
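
The counting, outcome-merging, outcome-filtering and trimming rules described for the functions in this module can be illustrated without pyGSTi itself.  The following is a minimal, self-contained sketch using NumPy; ``sample_counts``, ``merge_outcomes``, ``marginalize_outcomes`` and ``trim_time_series`` are hypothetical helpers written for illustration only and are *not* part of :mod:`pygsti`.  The sketch assumes outcome labels are plain strings (pyGSTi stores them as 1-tuples of strings), that per-circuit time-series data is a simple list, and that clipped probabilities are renormalized before binomial or multinomial sampling.

.. code-block:: python

   import numpy as np


   def sample_counts(probs, num_samples, sample_error="multinomial", seed=None):
       """Hypothetical helper: turn one circuit's outcome probabilities into counts.

       `probs` maps outcome strings to probabilities; the branches mimic the
       `sample_error` modes documented for `simulate_data`.
       """
       labels = list(probs)
       p = np.array([probs[lbl] for lbl in labels], dtype=float)
       if sample_error == "none":
           counts = p * num_samples                 # exact; may be fractional
       else:
           p = np.clip(p, 0.0, 1.0)                 # all other modes clip to [0, 1]
           if sample_error == "clip":
               counts = p * num_samples
           elif sample_error == "round":
               counts = np.round(p * num_samples)
           elif sample_error in ("binomial", "multinomial"):
               rng = np.random.RandomState(seed)    # seeded for reproducibility
               p = p / p.sum()                      # renormalize (assumption of this sketch)
               if sample_error == "binomial":
                   if len(labels) != 2:
                       raise ValueError("binomial sampling needs exactly two outcomes")
                   k = rng.binomial(num_samples, p[0])
                   counts = np.array([k, num_samples - k])
               else:
                   counts = rng.multinomial(num_samples, p)
           else:
               raise ValueError(f"unknown sample_error: {sample_error!r}")
       return {lbl: float(c) for lbl, c in zip(labels, counts)}


   def merge_outcomes(counts, label_merge_dict):
       """Hypothetical helper: sum old outcome counts into each new outcome label."""
       return {new: sum(counts.get(old, 0.0) for old in olds)
               for new, olds in label_merge_dict.items()}


   def marginalize_outcomes(counts, sindices_to_keep):
       """Hypothetical helper: keep only the characters at `sindices_to_keep` in
       each outcome string and sum the counts of outcomes that become identical."""
       out = {}
       for outcome, c in counts.items():
           key = "".join(outcome[i] for i in sindices_to_keep)
           out[key] = out.get(key, 0.0) + c
       return out


   def trim_time_series(series_by_circuit):
       """Hypothetical helper: truncate each circuit's per-timestep records to the
       length of the shortest record list."""
       n_min = min(len(s) for s in series_by_circuit.values())
       return {circ: s[:n_min] for circ, s in series_by_circuit.items()}


   probs = {"00": 0.5, "01": 0.0, "10": 0.25, "11": 0.25}  # made-up 2-qubit probabilities
   counts = sample_counts(probs, 1000, sample_error="round")
   print(counts)                                  # {'00': 500.0, '01': 0.0, '10': 250.0, '11': 250.0}
   print(merge_outcomes(counts, {"0": ["00", "01"], "1": ["10", "11"]}))  # {'0': 500.0, '1': 500.0}
   print(marginalize_outcomes(counts, [1]))       # {'0': 750.0, '1': 250.0}
   print(trim_time_series({"c1": [1, 2, 3], "c2": [4, 5]}))  # {'c1': [1, 2], 'c2': [4, 5]}

Note that ``merge_outcomes(counts, {"0": ["00", "01"], "1": ["10", "11"]})`` and ``marginalize_outcomes(counts, [0])`` give the same result here: both aggregate out the second character, which mirrors the advice to prefer :func:`filter_dataset` when the merge is really a restriction to a subset of sectors.  Renormalizing after clipping keeps the multinomial parameters valid; pyGSTi's own handling of clipped probabilities may differ, so consult its source before relying on exact sampled counts.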