:py:mod:`pygsti.data.datasetconstruction`
=========================================

.. py:module:: pygsti.data.datasetconstruction

.. autoapi-nested-parse::

   Functions for creating and manipulating data sets.


Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::

   pygsti.data.datasetconstruction.simulate_data
   pygsti.data.datasetconstruction.aggregate_dataset_outcomes
   pygsti.data.datasetconstruction.filter_dataset
   pygsti.data.datasetconstruction.trim_to_constant_numtimesteps


.. py:function:: simulate_data(model_or_dataset, circuit_list, num_samples, sample_error='multinomial', seed=None, rand_state=None, alias_dict=None, collision_action='aggregate', record_zero_counts=True, comm=None, mem_limit=None, times=None)

   Creates a DataSet by sampling from the probabilities of a model (or the frequencies of a data set).

   Parameters
   ----------
   model_or_dataset : Model or DataSet object
       The source of the underlying probabilities used to generate the data.
       If a Model, the model whose probabilities generate the data.
       If a DataSet, the data set whose frequencies generate the data.

   circuit_list : list of (tuples or Circuits) or ExperimentDesign or None
       Each tuple or Circuit contains operation labels and
       specifies a gate sequence whose counts are included
       in the returned DataSet, e.g. ``[(), ('Gx',), ('Gx', 'Gy')]``.
       If an :class:`ExperimentDesign`, then the design's
       `.all_circuits_needing_data` list is used as the circuit list.

   num_samples : int or list of ints or None
       The simulated number of samples for each circuit.  This only has
       effect when ``sample_error == "binomial"`` or ``"multinomial"``.  If an
       integer, all circuits have this number of total samples.  If a list,
       its integer elements specify the number of samples for the corresponding
       circuits.  If ``None``, then `model_or_dataset` must be a
       :class:`~pygsti.data.DataSet`, and total counts are taken from it
       (on a per-circuit basis).
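
       The three accepted forms of `num_samples` can be resolved into one total
       per circuit as in the minimal sketch below.  The helper
       ``resolve_num_samples`` and its ``dataset_totals`` argument (a plain
       mapping from circuit to total count, standing in for a DataSet) are
       hypothetical illustrations, not part of pyGSTi.

       .. code-block:: python

          def resolve_num_samples(num_samples, circuits, dataset_totals=None):
              """Hypothetical helper: return one total sample count per circuit."""
              if num_samples is None:
                  # Totals must come from an existing data set, circuit by circuit.
                  if dataset_totals is None:
                      raise ValueError("num_samples=None requires per-circuit totals")
                  return [dataset_totals[c] for c in circuits]
              if isinstance(num_samples, int):
                  # A single integer applies to every circuit.
                  return [num_samples] * len(circuits)
              # Otherwise a list gives one count per circuit, in order.
              if len(num_samples) != len(circuits):
                  raise ValueError("need exactly one sample count per circuit")
              return list(num_samples)


          circuits = [(), ("Gx",), ("Gx", "Gy")]
          print(resolve_num_samples(100, circuits))           # [100, 100, 100]
          print(resolve_num_samples([10, 20, 30], circuits))  # [10, 20, 30]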

   sample_error : string, optional
       What type of sample error is included in the counts.  Can be:

       - "none"  - no sample error: counts are floating point numbers such
         that the exact probability can be found by the ratio count / total.
       - "clip" - no sample error, but probabilities are clipped to [0,1] so
         that, e.g., counts are never negative.
       - "round" - same as "clip", except counts are rounded to the nearest
         integer.
       - "binomial" - the number of counts is taken from a binomial
         distribution.  Distribution has parameters p = (clipped) probability
         of the circuit and n = number of samples.  This can only be used
         when there are exactly two SPAM labels in model_or_dataset.
       - "multinomial" - counts are taken from a multinomial distribution.
         Distribution has parameters p_k = (clipped) probability of the
         circuit producing the k-th SPAM label and n = number of samples.
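
       The five modes can be illustrated for a single circuit by the minimal
       sketch below.  It is not pyGSTi's implementation: the helper
       ``counts_from_probs`` is hypothetical, outcome probabilities are assumed
       to arrive as a dictionary, and Python's standard :mod:`random` module
       stands in for numpy's generator.  Because ``random.choices`` normalizes
       its weights, the clipped probabilities are implicitly renormalized before
       sampling, which is an assumption of this sketch.

       .. code-block:: python

          import random
          from collections import Counter


          def counts_from_probs(probs, num_samples, sample_error="multinomial", rng=None):
              """Hypothetical sketch: turn outcome probabilities into counts.

              `probs` maps outcome labels to probabilities, which may stray
              slightly outside [0, 1] (e.g. for a non-physical model).
              """
              rng = rng if rng is not None else random.Random()
              if sample_error == "none":
                  # Exact expected counts; may be fractional or negative.
                  return {k: p * num_samples for k, p in probs.items()}

              # Every remaining mode clips probabilities into [0, 1] first.
              clipped = {k: min(max(p, 0.0), 1.0) for k, p in probs.items()}
              if sample_error == "clip":
                  return {k: p * num_samples for k, p in clipped.items()}
              if sample_error == "round":
                  return {k: round(p * num_samples) for k, p in clipped.items()}
              if sample_error in ("binomial", "multinomial"):
                  if sample_error == "binomial" and len(clipped) != 2:
                      raise ValueError("binomial sampling requires exactly two outcomes")
                  labels = list(clipped)
                  # Draw num_samples outcomes i.i.d.; tallying them yields one
                  # multinomial sample (a binomial sample when there are two labels).
                  draws = rng.choices(labels, weights=[clipped[k] for k in labels],
                                      k=num_samples)
                  tally = Counter(draws)
                  return {k: tally[k] for k in labels}
              raise ValueError(f"unknown sample_error: {sample_error!r}")


          print(counts_from_probs({"0": 1.02, "1": -0.02}, 100, "clip"))  # {'0': 100.0, '1': 0.0}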

   seed : int, optional
       If not ``None``, a seed for numpy's random number generator, which
       is used to sample from the binomial or multinomial distribution.

   rand_state : numpy.random.RandomState, optional
       A RandomState object to generate samples from.  This can be useful
       instead of `seed` if you want reproducible distribution samples across
       multiple random function calls but don't want to bother with
       manually incrementing seeds between those calls.
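
       The benefit of sharing one ``RandomState`` can be seen with numpy alone.
       In this minimal sketch (no pyGSTi calls; the probabilities and sample
       size are arbitrary illustrations), one generator feeds two successive
       multinomial draws, and re-creating it from the same seed replays both
       draws exactly, with no second seed to manage.

       .. code-block:: python

          import numpy as np

          probs = [0.5, 0.3, 0.2]   # illustrative outcome probabilities
          num_samples = 1000


          def two_draws(rand_state):
              """Draw two count vectors from one shared generator."""
              first = rand_state.multinomial(num_samples, probs)
              second = rand_state.multinomial(num_samples, probs)  # state has advanced
              return first, second


          a1, a2 = two_draws(np.random.RandomState(2024))
          b1, b2 = two_draws(np.random.RandomState(2024))
          # The same starting state reproduces the whole sequence of draws.
          print((a1 == b1).all() and (a2 == b2).all())  # True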

   alias_dict : dict, optional
       A dictionary mapping single operation labels to tuples of one or more
       other operation labels.  The given circuits are translated using this
       mapping before their probabilities are computed from `model_or_dataset`.
       The resulting DataSet, however, contains the *un-translated* circuits
       as keys.
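
       The translation step can be pictured with circuits written as plain
       tuples of labels, as in the example under `circuit_list`.  The sketch
       below uses a hypothetical helper ``translate_circuit`` (not pyGSTi code)
       and a hypothetical alias in which ``'Gx'`` is implemented as two
       ``'Gxh'`` gates; the caller would keep the original tuple as the key.

       .. code-block:: python

          def translate_circuit(circuit, alias_dict):
              """Hypothetical sketch: replace each aliased label by its expansion."""
              translated = []
              for label in circuit:
                  # Labels without an alias pass through unchanged.
                  translated.extend(alias_dict.get(label, (label,)))
              return tuple(translated)


          aliases = {"Gx": ("Gxh", "Gxh")}   # hypothetical alias
          print(translate_circuit(("Gx", "Gy"), aliases))  # ('Gxh', 'Gxh', 'Gy')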

   collision_action : {"aggregate", "keepseparate"}
       Determines how duplicate circuits are handled by the resulting
       `DataSet`.  Please see the constructor documentation for `DataSet`.

   record_zero_counts : bool, optional
       Whether zero-counts are actually recorded (stored) in the returned
       DataSet.  If False, then zero counts are ignored, except for
       potentially registering new outcome labels.

   comm : mpi4py.MPI.Comm, optional
       When not ``None``, an MPI communicator for distributing the computation
       across multiple processors and ensuring that the *same* dataset is
       generated on each processor.

   mem_limit : int, optional
       A rough memory limit in bytes which is used to determine job allocation
       when there are multiple processors.

   times : iterable, optional
       When not ``None``, a list of time stamps at which data should be sampled.
       `num_samples` samples are simulated at each time value, meaning that
       each circuit in `circuit_list` is evaluated with the given time
       value as its *start time*.
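
       One way to picture the resulting time series is the loop below, a
       minimal sketch in which ``simulate_time_series`` is hypothetical and
       ``prob_at`` is a hypothetical function returning a circuit's outcome
       probabilities when it starts at time ``t`` (pyGSTi would obtain these
       from the model instead).  Each circuit receives `num_samples` samples at
       every time stamp.

       .. code-block:: python

          import random
          from collections import Counter


          def simulate_time_series(circuits, times, num_samples, prob_at, seed=None):
              """Hypothetical sketch: {circuit: [(t, counts), ...]} sampled at each time."""
              rng = random.Random(seed)
              series = {c: [] for c in circuits}
              for t in times:
                  for c in circuits:
                      probs = prob_at(c, t)  # circuit started at time t
                      labels = list(probs)
                      draws = rng.choices(labels, weights=list(probs.values()),
                                          k=num_samples)
                      tally = Counter(draws)
                      series[c].append((t, {k: tally[k] for k in labels}))
              return series


          def drifting_probs(circuit, t):
              """Toy model: the '1' probability grows linearly with start time."""
              p1 = min(0.1 * t, 1.0)
              return {"0": 1.0 - p1, "1": p1}


          data = simulate_time_series([("Gx",)], times=[0, 1, 2], num_samples=50,
                                      prob_at=drifting_probs, seed=1)
          print(data[("Gx",)][0])  # (0, {'0': 50, '1': 0})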

   Returns
   -------
   DataSet
       A static data set filled with counts for the specified circuits.


.. py:function:: aggregate_dataset_outcomes(dataset, label_merge_dict, record_zero_counts=True)

   Creates a DataSet which merges certain outcomes in the input DataSet.

   This is used, for example, to aggregate a 2-qubit, 4-outcome DataSet into a
   1-qubit, 2-outcome DataSet.

   Parameters
   ----------
   dataset : DataSet object
       The input DataSet whose results will be simplified according to the rules
       set forth in `label_merge_dict`.

   label_merge_dict : dictionary
       The dictionary whose keys define the new DataSet outcomes, and whose values
       are lists of input DataSet outcomes that are to be summed together.  For example,
       if a two-qubit DataSet has outcome labels "00", "01", "10", and "11", and
       we want to "aggregate out" the second qubit, we could use
       ``label_merge_dict = {'0': ['00', '01'], '1': ['10', '11']}``.  When doing
       this, however, it may be better to use :func:`filter_dataset`, which also
       updates the circuits.
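
       For a single circuit the merge amounts to summing counts, as in this
       minimal sketch.  The helper ``merge_outcome_counts`` is hypothetical and
       works on a plain dictionary of counts rather than a DataSet; dropping
       zero totals only approximates ``record_zero_counts=False``, since the
       real function may still register the outcome label.

       .. code-block:: python

          def merge_outcome_counts(counts, label_merge_dict, record_zero_counts=True):
              """Hypothetical sketch: sum the counts of merged outcomes."""
              merged = {}
              for new_label, old_labels in label_merge_dict.items():
                  total = sum(counts.get(old, 0) for old in old_labels)
                  # Optionally skip outcomes whose merged count is zero.
                  if total != 0 or record_zero_counts:
                      merged[new_label] = total
              return merged


          counts = {"00": 10, "01": 5, "10": 3, "11": 2}
          label_merge_dict = {"0": ["00", "01"], "1": ["10", "11"]}
          print(merge_outcome_counts(counts, label_merge_dict))  # {'0': 15, '1': 5}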

   record_zero_counts : bool, optional
       Whether zero-counts are actually recorded (stored) in the returned
       (merged) DataSet.  If False, then zero counts are ignored, except for
       potentially registering new outcome labels.

   Returns
   -------
   merged_dataset : DataSet object
       The DataSet with outcomes merged according to the rules given in label_merge_dict.


.. py:function:: filter_dataset(dataset, sectors_to_keep, sindices_to_keep=None, new_sectors=None, idle=((), ), record_zero_counts=True, filtercircuits=True)

   Creates a DataSet that is the restriction of `dataset` to `sectors_to_keep`.

   This function aggregates (sums) outcomes in `dataset` which differ only in
   sectors (usually qubits; see below) *not* in `sectors_to_keep`, and removes
   any operation labels which act specifically on sectors not in
   `sectors_to_keep` (e.g. an idle gate acting on *all* sectors because its
   `.sslbls` is None will *not* be removed).

   Here "sectors" are state-space labels, present in the circuits of
   `dataset`.  Each sector also corresponds to a particular character position
   within the outcome labels of `dataset`.  Thus, for this function to work,
   the outcome labels of `dataset` must all be 1-tuples whose sole element is
   an n-character string such that each character represents the outcome of
   a single sector.  If the state-space labels are integers, then they can
   serve as both a label and an outcome-string position.  The argument
   `new_sectors` may be given to rename the kept state-space labels in the
   returned `DataSet`'s circuits.

   A typical case is when the state space is that of *n* qubits, and the
   state-space labels are the integers 0 to *n-1*.  As stated above, in this
   case there is no need to specify `sindices_to_keep`.  One may want to
   "rebase" the indices to 0 in the returned data set using `new_sectors`
   (e.g. `sectors_to_keep == [4,5,6]` and `new_sectors == [0,1,2]`).
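
   The outcome half of this restriction can be sketched as follows.  The
   helper ``marginalize_outcomes`` is hypothetical, and outcome labels are
   written as bare strings rather than the 1-tuples described above; the
   example reproduces the "Q1"/"Q2" case from the `sindices_to_keep`
   description, keeping only the character at position 1.

   .. code-block:: python

      from collections import defaultdict


      def marginalize_outcomes(counts, sindices_to_keep):
          """Hypothetical sketch: keep only the outcome characters at given positions."""
          kept = defaultdict(int)
          for outcome, n in counts.items():
              # e.g. outcome '01' with sindices_to_keep [1] contributes to '1'
              reduced = "".join(outcome[i] for i in sindices_to_keep)
              kept[reduced] += n
          return dict(kept)


      counts = {"00": 10, "01": 5, "10": 3, "11": 2}
      print(marginalize_outcomes(counts, [1]))  # {'0': 13, '1': 7}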

   Parameters
   ----------
   dataset : DataSet object
       The input DataSet whose data will be processed.

   sectors_to_keep : list or tuple
       The state-space labels (strings or integers) of the "sectors" to keep in
       the returned DataSet.

   sindices_to_keep : list or tuple, optional
       The 0-based positions, within each outcome string, of the letters that
       correspond to the labels in `sectors_to_keep` (see above).
       If the state-space labels are integers (labeling *qubits*) that are also
       letter positions, then this may be left as `None`.  For example, if the
       outcome strings of `dataset` are '00', '01', '10', and '11', and the first
       position refers to qubit "Q1" and the second to qubit "Q2" (present in
       operation labels), then to extract just "Q2" data `sectors_to_keep` should be
       `["Q2"]` and `sindices_to_keep` should be `[1]`.

   new_sectors : list or tuple, optional
       New sector names onto which the elements of `sectors_to_keep` are mapped
       in the output DataSet's circuits.  ``None`` means the labels are not
       renamed.  This can be useful if, for instance, you want to run a 2-qubit
       protocol that expects the qubits to be labeled 0 and 1 on qubits 4 and 5
       of a larger set.  Simply set `sectors_to_keep == [4,5]` and
       `new_sectors == [0,1]`.

   idle : string or Label, optional
       The operation label to be used when there are no kept components of a
       "layer" (element) of a circuit.

   record_zero_counts : bool, optional
       Whether zero-counts present in the original `dataset` are recorded
       (stored) in the returned (filtered) DataSet.  If False, then such
       zero counts are ignored, except for potentially registering new
       outcome labels.

   filtercircuits : bool, optional
       Whether or not to "filter" the circuits by removing gates that act
       outside of `sectors_to_keep`.
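
       The circuit half of the filtering can be sketched as below.  This is a
       minimal illustration, not pyGSTi's code: the helper ``filter_layers`` is
       hypothetical, a circuit is modeled as a tuple of layers, and each layer
       is a tuple of labels of the form ``(name, *sectors)``, where a label
       with no sectors stands for a gate whose `.sslbls` is None.  The sketch
       also *assumes* that a gate is dropped unless all of its sectors are
       kept; the description above does not say how gates that only partly
       overlap `sectors_to_keep` are treated.

       .. code-block:: python

          def filter_layers(layers, sectors_to_keep, new_sectors=None, idle_layer=()):
              """Hypothetical sketch: restrict a layered circuit to `sectors_to_keep`."""
              keep = set(sectors_to_keep)
              # Optional renaming, e.g. [4, 5] -> [0, 1].
              rename = dict(zip(sectors_to_keep, new_sectors)) if new_sectors is not None else {}
              filtered = []
              for layer in layers:
                  kept = []
                  for name, *sectors in layer:
                      if not sectors:
                          # Acts on all sectors (sslbls is None): never removed.
                          kept.append((name,))
                      elif set(sectors) <= keep:
                          kept.append((name, *(rename.get(s, s) for s in sectors)))
                  # A layer with nothing left becomes the idle layer.
                  filtered.append(tuple(kept) if kept else idle_layer)
              return tuple(filtered)


          layers = ((("Gx", 4), ("Gy", 7)), (("Gcnot", 4, 5),), (("Gz", 7),), (("Gi",),))
          print(filter_layers(layers, [4, 5], new_sectors=[0, 1]))
          # ((('Gx', 0),), (('Gcnot', 0, 1),), (), (('Gi',),))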

   Returns
   -------
   filtered_dataset : DataSet object
       The DataSet with outcomes and circuits filtered as described above.


.. py:function:: trim_to_constant_numtimesteps(ds)

   Trims a :class:`DataSet` so that each circuit's data comprises the same number of timesteps.

   Returns a new dataset that has data for the same number of time steps for
   every circuit.  This is achieved by discarding, for every circuit, all
   time-series data beyond the minimum number of time steps found over all
   circuits.
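
   The trimming rule is simple enough to sketch directly.  In this minimal
   illustration the helper ``trim_to_shortest`` is hypothetical, and a data set
   is modeled as a dictionary mapping each circuit to its list of
   ``(time, counts)`` records, one per time step.

   .. code-block:: python

      def trim_to_shortest(series):
          """Hypothetical sketch: cut every circuit's time series to the shortest length."""
          if not series:
              return {}
          min_steps = min(len(records) for records in series.values())
          # Keep only the first min_steps records of every circuit.
          return {circuit: records[:min_steps] for circuit, records in series.items()}


      series = {
          ("Gx",): [(0.0, {"0": 9, "1": 1}), (1.0, {"0": 8, "1": 2}), (2.0, {"0": 7, "1": 3})],
          ("Gy",): [(0.0, {"0": 5, "1": 5}), (1.0, {"0": 6, "1": 4})],
      }
      trimmed = trim_to_shortest(series)
      print([len(r) for r in trimmed.values()])  # [2, 2]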

   Parameters
   ----------
   ds : DataSet
       The dataset to trim.

   Returns
   -------
   DataSet
       The trimmed dataset, obtained by potentially discarding some of the data.