pygsti.data.datacomparator
Defines the DataComparator class used to compare multiple DataSets.
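Module Contents
Classes
DataComparator – A comparison between multiple data sets, presumably taken in different contexts.
Functions
_xlogy(x, y) – Returns x*log(y).
_likelihood(p_list, n_list) – The likelihood for probabilities p_list of a die, given n_list counts for each outcome.
_loglikelihood(p_list, n_list) – The log of the likelihood for probabilities p_list of a die, given n_list counts for each outcome.
_loglikelihood_ratio(n_list_list) – log(likelihood ratio) between one-context and multiple-context models.
_jensen_shannon_divergence(n_list_list) – Calculates the Jensen-Shannon divergence (JSD) between one-context and multiple-context models.
_pval(llrval, dof) – The p-value of a log-likelihood ratio (LLR).
_llr_to_signed_nsigma(llrval, dof) – Finds the signed number of standard deviations for the input log-likelihood ratio (LLR).
_is_circuit_allowed_by_exclusion(op_exclusions, circuit) – Checks that circuit does not contain any gates from op_exclusions.
_is_circuit_allowed_by_inclusion(op_inclusions, circuit) – Checks if circuit contains any of the gates from op_inclusions.
_compute_llr_threshold(significance, dof) – Compute a log-likelihood ratio threshold.
_tvd(n_list_list) – Calculates the TVD between two contexts.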
 pygsti.data.datacomparator._xlogy(x, y)
Returns x*log(y).
 pygsti.data.datacomparator._likelihood(p_list, n_list)
The likelihood for probabilities p_list of a die, given n_list counts for each outcome.
 pygsti.data.datacomparator._loglikelihood(p_list, n_list)
The log of the likelihood for probabilities p_list of a die, given n_list counts for each outcome.
 pygsti.data.datacomparator._loglikelihood_ratio(n_list_list)
log(likelihood ratio) between one-context and multiple-context models.
Calculates the log-likelihood ratio between the null hypothesis that a die has the same probabilities in multiple “contexts” and the alternative hypothesis that it has different probabilities in multiple “contexts”.
 Parameters
n_list_list (List of lists of ints) – A list in which element i is a list containing the observed counts for all the different possible outcomes of the “die” in context i.
 Returns
float – The log-likelihood ratio for this model comparison.
 pygsti.data.datacomparator._jensen_shannon_divergence(n_list_list)
Calculates the Jensen-Shannon divergence (JSD) between one-context and multiple-context models.
The JSD is computed between the different observed frequencies, obtained in different “contexts”, for the different outcomes of a “die” (i.e., a coin with more than two outcomes).
 Parameters
n_list_list (List of lists of ints) – A list in which element i is a list containing the observed counts for all the different possible outcomes of the “die” in context i.
 Returns
float – The observed JSD for this data.
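As a rough illustration of the two quantities above, the sketch below computes a log-likelihood ratio and the corresponding JSD from raw counts using only the Python standard library. It is not pyGSTi's implementation: the function names are hypothetical, the LLR is assumed to be defined as 2*(log L_alternative - log L_null) (the convention under which Wilks' theorem applies), 0*log(0) is assumed to be 0, and every context is assumed to have a nonzero total count. The JSD is taken to be the LLR divided by 2*N, as described for jsd() further down this page.

```python
import math

def xlogy(x, y):
    # Assumed convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(y)

def loglikelihood(p_list, n_list):
    # Multinomial log-likelihood up to an additive constant: sum_k n_k * log(p_k).
    return sum(xlogy(n, p) for p, n in zip(p_list, n_list))

def loglikelihood_ratio(n_list_list):
    # Null model: one shared set of probabilities, estimated from the pooled counts.
    pooled = [sum(column) for column in zip(*n_list_list)]
    total = sum(pooled)
    p_null = [n / total for n in pooled]
    ll_null = sum(loglikelihood(p_null, n_list) for n_list in n_list_list)
    # Alternative model: each context has its own probabilities.
    ll_alt = 0.0
    for n_list in n_list_list:
        n_context = sum(n_list)
        ll_alt += loglikelihood([n / n_context for n in n_list], n_list)
    return 2.0 * (ll_alt - ll_null)

def jensen_shannon_divergence(n_list_list):
    # Rescale the LLR by 2*N, where N is the total count summed over contexts.
    total = sum(sum(n_list) for n_list in n_list_list)
    return loglikelihood_ratio(n_list_list) / (2.0 * total)

counts = [[50, 50], [70, 30]]   # two contexts, two outcomes each
llr = loglikelihood_ratio(counts)
jsd = jensen_shannon_divergence(counts)
print(llr, jsd)
```

Identical counts in every context give an LLR (and JSD) of zero, since the two models then fit the data equally well.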
 pygsti.data.datacomparator._pval(llrval, dof)
The p-value of a log-likelihood ratio (LLR).
This compares a nested null hypothesis and a larger alternative hypothesis.
 Parameters
llrval (float) – The log-likelihood ratio.
dof (int) – The number of degrees of freedom associated with the LLR, given by the number of degrees of freedom of the full model space (the alternative hypothesis) minus the number of degrees of freedom of the restricted model space (the null hypothesis space).
 Returns
float – An approximation of the p-value for this LLR. This is calculated as 1 - F(llrval, dof), where F(x, k) is the cumulative distribution function, evaluated at x, for the chi^2_k distribution. The validity of this approximation is due to Wilks’ theorem.
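The formula 1 - F(llrval, dof) can be evaluated without third-party packages when dof is a positive integer. The sketch below is one way to do that, using the standard recurrence for the regularized upper incomplete gamma function; the name chi2_sf is hypothetical, and in practice a library routine such as SciPy's chi2.sf would usually be preferred.

```python
import math

def chi2_sf(x, dof):
    """1 - F(x, dof) for the chi-squared distribution with integer dof >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Start from a closed form: Q(1, z) = exp(-z) or Q(1/2, z) = erfc(sqrt(z)).
    if dof % 2 == 0:
        a, total = 1.0, math.exp(-z)
    else:
        a, total = 0.5, math.erfc(math.sqrt(z))
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < dof / 2.0:
        total += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return total

# An LLR equal to the 95% quantile of chi^2_1 gives a p-value of about 0.05.
print(chi2_sf(3.841458820694124, 1))
```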
 pygsti.data.datacomparator._llr_to_signed_nsigma(llrval, dof)
Finds the signed number of standard deviations for the input log-likelihood ratio (LLR).
This is given by
(llrval - dof) / (sqrt(2*dof)).
This is the number of standard deviations above the mean that llrval is for a chi^2_(dof) distribution.
 Parameters
llrval (float) – The log-likelihood ratio.
dof (int) – The number of degrees of freedom associated with the LLR, given by the number of degrees of freedom of the full model space (the alternative hypothesis) minus the number of degrees of freedom of the restricted model space (the null hypothesis space), in the hypothesis test.
 Returns
float – The signed number of standard deviations.
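A minimal sketch of this formula, with a hypothetical function name, relies on the fact that a chi^2_(dof) variable has mean dof and variance 2*dof:

```python
import math

def llr_to_signed_nsigma(llrval, dof):
    # Subtract the chi^2 mean (dof) and divide by its standard deviation sqrt(2*dof).
    return (llrval - dof) / math.sqrt(2 * dof)

print(llr_to_signed_nsigma(30.0, 8))   # (30 - 8) / 4 = 5.5
print(llr_to_signed_nsigma(2.0, 8))    # (2 - 8) / 4 = -1.5
```

A negative value simply means the LLR is below what would be expected on average when there is no context dependence.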
 pygsti.data.datacomparator._is_circuit_allowed_by_exclusion(op_exclusions, circuit)
Checks that circuit does not contain any gates from op_exclusions.
 Parameters
op_exclusions (iterable) – A sequence of gate labels to exclude.
circuit (Circuit) – The circuit to test.
 Returns
bool – True only if circuit does not contain anything in op_exclusions.
 pygsti.data.datacomparator._is_circuit_allowed_by_inclusion(op_inclusions, circuit)
Checks if circuit contains any of the gates from op_inclusions.
The empty circuit is a special case, for which this function always returns True.
 Parameters
op_inclusions (iterable) – A sequence of gate labels to include.
circuit (Circuit) – The circuit to test.
 Returns
bool – True if circuit contains any of the gates from op_inclusions, otherwise False.
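The two filters can be sketched as follows. This is an illustration only: circuits are modelled here as plain tuples of gate-label strings, which is an assumption made for the example and not how pyGSTi Circuit objects are represented, and the function names are hypothetical.

```python
def is_allowed_by_exclusion(op_exclusions, circuit):
    # Allowed only if no gate in the circuit appears in the exclusion list.
    return not any(gate in op_exclusions for gate in circuit)

def is_allowed_by_inclusion(op_inclusions, circuit):
    # The empty circuit is a special case and is always allowed.
    if len(circuit) == 0:
        return True
    return any(gate in op_inclusions for gate in circuit)

circuit = ("Gx", "Gy", "Gx")
print(is_allowed_by_exclusion(["Gi"], circuit))   # no excluded gate present
print(is_allowed_by_inclusion(["Gi"], circuit))   # no included gate present
print(is_allowed_by_inclusion(["Gi"], ()))        # empty circuit
```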
 pygsti.data.datacomparator._compute_llr_threshold(significance, dof)
Compute a log-likelihood ratio threshold.
Given a p-value threshold, below which a p-value is considered statistically significant, it returns the corresponding log-likelihood ratio threshold, above which an LLR is considered statistically significant. For a single hypothesis test, the input p-value should be the desired “significance” level of the test (as a value between 0 and 1). For multiple hypothesis tests, this will normally be smaller than the desired global significance.
 Parameters
significance (float) – p-value threshold.
dof (int) – The number of degrees of freedom associated with the LLR, given by the number of degrees of freedom of the full model space (the alternative hypothesis) minus the number of degrees of freedom of the restricted model space (the null hypothesis space), in the hypothesis test.
 Returns
float – The significance threshold for the LLR, given by F^{-1}(1 - pVal, dof), where F(x, k) is the cumulative distribution function, evaluated at x, for the chi^2_k distribution; equivalently, it is the LLR value whose p-value 1 - F(llrval, dof) equals pVal. This formula is based on Wilks’ theorem.
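One way to picture this threshold is as the LLR value at which the chi-squared survival function equals the significance. The sketch below finds that value by bisection using only the standard library; both function names are hypothetical, dof is assumed to be a positive integer and significance to lie strictly between 0 and 1, and a library inverse such as SciPy's chi2.isf would normally be used instead.

```python
import math

def chi2_sf(x, dof):
    """1 - F(x, dof) for the chi-squared distribution with integer dof >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if dof % 2 == 0:
        a, total = 1.0, math.exp(-z)
    else:
        a, total = 0.5, math.erfc(math.sqrt(z))
    while a < dof / 2.0:
        total += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return total

def llr_threshold(significance, dof):
    # Bracket the root: the survival function decreases from 1 towards 0.
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, dof) > significance:
        hi *= 2.0
    # Bisect until the bracket is as tight as floating point allows.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, dof) > significance:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(llr_threshold(0.05, 1))   # close to 3.84, the 95% quantile of chi^2_1
```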
 pygsti.data.datacomparator._tvd(n_list_list)
Calculates the TVD between two contexts.
Calculates the total variation distance (TVD) between the observed frequencies, obtained in two different “contexts”, for the outcomes of rolls of a “die”.
 Parameters
n_list_list (List of lists of ints) – A list in which element i is a list containing the counts for the different outcomes of the “die” in context i, for two contexts.
 Returns
float – The observed TVD between the two contexts.
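As a small illustration, the TVD between two observed frequency distributions is half the sum of the absolute differences of the frequencies. The sketch below uses a hypothetical function name and assumes both contexts list the same outcomes in the same order with nonzero total counts.

```python
def tvd(n_list_list):
    # Exactly two contexts are expected.
    n_a, n_b = n_list_list
    total_a, total_b = sum(n_a), sum(n_b)
    # Half the L1 distance between the two observed frequency vectors.
    return 0.5 * sum(abs(a / total_a - b / total_b) for a, b in zip(n_a, n_b))

# Frequencies (0.5, 0.5) versus (0.7, 0.3) give a TVD of approximately 0.2.
print(tvd([[50, 50], [70, 30]]))
```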
 class pygsti.data.datacomparator.DataComparator(dataset_list_or_multidataset, circuits='all', op_exclusions=None, op_inclusions=None, ds_names=None, allow_bad_circuits=False)
A comparison between multiple data sets, presumably taken in different contexts.
This object can be used to run all of the “context dependence detection” methods described in “Probing context-dependent errors in quantum processors”, by Rudinger et al. (See that paper’s supplemental material for explicit demonstrations of this object.)
This object stores the p-values and log-likelihood ratio values from a consistency comparison between two or more data sets, and provides methods to:
Perform a hypothesis test to decide which sequences contain statistically significant variation.
Plot p-value histograms and log-likelihood ratio box plots.
Extract (1) the “statistically significant total variation distance” for a circuit, (2) various other quantifications of the “amount” of context dependence, and (3) the level of statistical significance at which any context dependence is detected.
 Parameters
dataset_list_or_multidataset (List of DataSets or MultiDataSet) – Either a list of DataSets, containing two or more sets of data to compare, or a MultiDataSet object, containing two or more sets of data to compare. Note that these DataSets should contain data for the same set of Circuits (although if there are additional Circuits these can be ignored using the parameters below). This object is then intended to be used to test whether the results indicate that the outcome probabilities for these Circuits have changed between the “contexts” that the data was obtained in.
circuits ('all' or list of Circuits, optional (default is 'all')) – If 'all', the comparison is implemented for all Circuits in the DataSets. Otherwise, this should be a list containing all the Circuits to run the comparison for (although note that some of these Circuits may be ignored with non-default options for the next two inputs).
op_exclusions (None or list of gates, optional (default is None)) – If not None, all Circuits containing any of the gates in this list are discarded, and no comparison will be made for those circuits.
op_inclusions (None or list of gates, optional (default is None)) – If not None, a Circuit will be dropped from the list to run the comparisons for if it doesn’t include at least one gate from this list. The empty circuit is the exception: it is always kept.
ds_names (None or list, optional (default is None)) – If dataset_list_or_multidataset is a list of DataSets, this can be used to specify names for the DataSets in the list. E.g., [“Time 0”, “Time 1”, “Time 3”] or [“Driving”, “NoDriving”].
allow_bad_circuits (bool, optional) – Whether or not the data is allowed to have zero total counts for any circuits in any of the passes. If False, then an error will be raised when there are such unimplemented circuits. If True, then the data from those circuits that weren’t run in one or more of the passes will be discarded before any analysis is performed (equivalent to excluding them explicitly with the circuits input).
 run(self, significance=0.05, per_circuit_correction='Hochberg', aggregate_test_weighting=0.5, pass_alpha=True, verbosity=2)
Runs statistical hypothesis testing.
This detects whether there is statistically significant variation between the DataSets in this DataComparator. This performs hypothesis tests on the data from individual circuits, and a joint hypothesis test on all of the data. With the default settings, this is the method described and implemented in “Probing context-dependent errors in quantum processors”, by Rudinger et al. With non-default settings, this is some minor variation on that method.
Note that the default values of all the parameters are likely sufficient for most purposes.
 Parameters
significance (float in (0,1), optional (default is 0.05)) – The “global” statistical significance to implement the tests at. I.e., with the standard per_circuit_correction value (and some other values for this parameter) the probability that a sequence that has been flagged up as context dependent is actually from a context-independent circuit is no more than significance. Precisely, significance is what the “family-wise error rate” (FWER) of the full set of hypothesis tests (1 “aggregate test”, and 1 test per sequence) is controlled to, as long as per_circuit_correction is set to the default value, or another option that controls the FWER of the per-sequence comparison (see below).
per_circuit_correction (string, optional (default is 'Hochberg')) – The multi-hypothesis test correction used for the per-circuit/sequence comparisons. (See “Probing context-dependent errors in quantum processors”, by Rudinger et al. for the details of what the per-circuit comparison is.) This can be any string that is an allowed value for the localcorrections input parameter of the HypothesisTest object. This includes:
'Hochberg'. This implements the Hochberg multi-test compensation technique. This is strictly the best method available in the code, if you wish to control the FWER, and it is the method described in “Probing context-dependent errors in quantum processors”, by Rudinger et al.
'Holms'. This implements the Holm multi-test compensation technique. This controls the FWER, and it results in a strictly less powerful test than the Hochberg correction.
'Bonferroni'. This implements the well-known Bonferroni multi-test compensation technique. This controls the FWER, and it results in a strictly less powerful test than the Hochberg correction.
'none'. This implements no multi-test compensation for the per-sequence comparisons, so they are all implemented at a “local” significance level that is altered from significance only by the (inbuilt) Bonferroni-like correction between the “aggregate” test and the per-sequence tests. This option does not control the FWER, and many sequences may be flagged up as context dependent even if none are.
'Benjamini-Hochberg'. This implements the Benjamini-Hochberg multi-test compensation technique. This does not control the FWER, and instead controls the “False Discovery Rate” (FDR); see, for example, https://en.wikipedia.org/wiki/False_discovery_rate. That means that the global significance is maintained for the test of “Is there any context dependence?”. I.e., one or more tests will trigger when there is no context dependence with at most a probability of significance. But, if one or more per-sequence tests trigger then we are only guaranteed that (in expectation) no more than a fraction of “local-significance” of the circuits that have been flagged up as context dependent actually aren’t. Here, “local-significance” is the significance at which the per-sequence tests are, together, implemented, which is significance*(1 - aggregate_test_weighting) if the aggregate test doesn’t detect context dependence and significance if it does (as long as pass_alpha is True). This method is strictly more powerful than the Hochberg correction, but it controls a different, weaker quantity.
aggregate_test_weighting (float in [0,1], optional (default is 0.5)) – The weighting, in a generalized Bonferroni correction, to put on the “aggregate test”, that jointly tests all of the data for context dependence (in contrast to the per-sequence tests). If this is 0 then the aggregate test is not implemented, and if it is 1 only the aggregate test is implemented (unless it triggers and pass_alpha is True).
pass_alpha (Bool, optional (default is True)) – The aggregate test is implemented first, at the “local” significance defined by aggregate_test_weighting and significance (see above). If pass_alpha is True, then when the aggregate test triggers, all the local significance for this test is passed on to the per-sequence tests (which are then jointly implemented at the full significance, which is then locally corrected using the multi-test correction specified above), and when the aggregate test doesn’t trigger, this local significance isn’t passed on. If pass_alpha is False then the local significance of the aggregate test is never passed on. See “Probing context-dependent errors in quantum processors”, by Rudinger et al. (or hypothesis testing literature) for discussions of why this “significance passing” still maintains a (global) FWER of significance. Note that the default value of True always results in a strictly more powerful test.
verbosity (int, optional (default is 2)) – If > 0 then a summary of the results of the tests is printed to screen. Otherwise, the various .get_…() methods need to be queried to obtain the results of the hypothesis tests.
 Returns
None
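To make the interplay between significance, aggregate_test_weighting, pass_alpha and the default 'Hochberg' correction more concrete, here is a simplified sketch of the two-stage logic described in the parameter documentation above. It is not the pyGSTi implementation: the function names and the plain-list inputs are hypothetical, and the p-values are assumed to have been computed already.

```python
def hochberg_reject(pvalues, alpha):
    """Hochberg step-up procedure; returns one bool per p-value (True = flagged)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by ascending p-value
    n_reject = 0
    # Find the largest rank k with p_(k) <= alpha / (m - k + 1).
    for rank in range(m, 0, -1):
        if pvalues[order[rank - 1]] <= alpha / (m - rank + 1):
            n_reject = rank
            break
    rejected = [False] * m
    for i in order[:n_reject]:
        rejected[i] = True
    return rejected

def two_stage_test(aggregate_pvalue, circuit_pvalues, significance=0.05,
                   aggregate_test_weighting=0.5, pass_alpha=True):
    # Stage 1: the aggregate test gets a share of the global significance.
    aggregate_alpha = significance * aggregate_test_weighting
    aggregate_triggered = aggregate_alpha > 0 and aggregate_pvalue <= aggregate_alpha
    # Stage 2: the per-circuit tests share the rest, or all of it if alpha is passed on.
    if aggregate_triggered and pass_alpha:
        local_alpha = significance
    else:
        local_alpha = significance * (1 - aggregate_test_weighting)
    return aggregate_triggered, hochberg_reject(circuit_pvalues, local_alpha)

triggered, flagged = two_stage_test(1e-6, [0.001, 0.02, 0.2, 0.03])
print(triggered, flagged)
```

In this example only the first circuit is flagged: its p-value of 0.001 is below 0.05/4, while none of the larger p-values meets its own step-up threshold.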
 tvd(self, circuit)
Returns the observed total variation distance (TVD) for the specified circuit.
This is only possible if the comparison is between two sets of data. See Eq. (19) in “Probing context-dependent errors in quantum processors”, by Rudinger et al. for the definition of this observed TVD.
This is a quantification for the “amount” of context dependence for this circuit (see also, jsd(), sstvd() and ssjsd()).
 Parameters
circuit (Circuit) – The circuit to return the TVD of.
 Returns
float – The TVD for the specified circuit.
 sstvd(self, circuit)
Returns the “statistically significant total variation distance” (SSTVD) for the specified circuit.
This is only possible if the comparison is between two sets of data. The SSTVD is None if the circuit has not been found to have statistically significant variation. Otherwise it is equal to the observed TVD. See Eq. (20) and surrounding discussion in “Probing context-dependent errors in quantum processors”, by Rudinger et al., for more information.
This is a quantification for the “amount” of context dependence for this circuit (see also, jsd(), tvd() and ssjsd()).
 Parameters
circuit (Circuit) – The circuit to return the SSTVD of.
 Returns
float – The SSTVD for the specified circuit.
 property maximum_sstvd(self)
Returns the maximum, over circuits, of the “statistically significant total variation distance” (SSTVD).
This is only possible if the comparison is between two sets of data. See the .sstvd() method for information on SSTVD.
 Returns
float – The circuit associated with the maximum SSTVD, and the SSTVD of that circuit.
 pvalue(self, circuit)
Returns the p-value for the log-likelihood ratio test for the specified circuit.
 Parameters
circuit (Circuit) – The circuit to return the p-value of.
 Returns
float – The p-value of the specified circuit.
 property pvalue_pseudothreshold(self)
Returns the (multi-test-adjusted) statistical significance pseudo-threshold for the per-sequence p-values.
The p-values under consideration are those obtained from the log-likelihood ratio test. This is a “pseudo-threshold”, because it is data-dependent in general, but all the per-sequence p-values below this value are statistically significant. This quantity is given by Eq. (9) in “Probing context-dependent errors in quantum processors”, by Rudinger et al.
 Returns
float – The statistical significance pseudo-threshold for the per-sequence p-value.
 llr(self, circuit)
Returns the log-likelihood ratio (LLR) for the input circuit.
This is the quantity defined in Eq. (4) of “Probing context-dependent errors in quantum processors”, by Rudinger et al.
 Parameters
circuit (Circuit) – The circuit to return the LLR of.
 Returns
float – The LLR of the specified circuit.
 property llr_pseudothreshold(self)
Returns the statistical significance pseudo-threshold for the per-sequence log-likelihood ratio (LLR).
This result has been multi-test-adjusted.
This is a “pseudo-threshold”, because it is data-dependent in general, but all LLRs above this value are statistically significant. This quantity is given by Eq. (10) in “Probing context-dependent errors in quantum processors”, by Rudinger et al.
 Returns
float – The statistical significance pseudo-threshold for the per-sequence LLR.
 jsd(self, circuit)
Returns the observed Jensen-Shannon divergence (JSD) between “contexts” for the specified circuit.
The JSD is a rescaling of the LLR, given by dividing the LLR by 2*N where N is the total number of counts (summed over contexts) for this circuit. This quantity is given by Eq. (15) in “Probing context-dependent errors in quantum processors”, by Rudinger et al.
This is a quantification for the “amount” of context dependence for this circuit (see also, tvd(), sstvd() and ssjsd()).
 Parameters
circuit (Circuit) – The circuit to return the JSD of.
 Returns
float – The JSD of the specified circuit.
 property jsd_pseudothreshold(self)
The statistical significance pseudo-threshold for the Jensen-Shannon divergence (JSD) between “contexts”.
This is a rescaling of the pseudo-threshold for the LLR, returned by the property .llr_pseudothreshold; see that property for more details. This threshold is also given by Eq. (17) in “Probing context-dependent errors in quantum processors”, by Rudinger et al.
Note that this pseudo-threshold is not defined if the total number of counts (summed over contexts) for a sequence varies between sequences.
 Returns
float – The pseudo-threshold for the JSD of a circuit, if well-defined.
 ssjsd(self, circuit)
Returns the “statistically significant Jensen-Shannon divergence” (SSJSD) between “contexts” for circuit.
This is the JSD of the circuit (see .jsd()), if the circuit has been found to be context dependent, and otherwise it is None. This quantity is the JSD version of the SSTVD given in Eq. (20) of “Probing context-dependent errors in quantum processors”, by Rudinger et al.
This is a quantification for the “amount” of context dependence for this circuit (see also, tvd(), sstvd() and jsd()).
 Parameters
circuit (Circuit) – The circuit to return the SSJSD of.
 Returns
float – The SSJSD of the specified circuit.
 property aggregate_llr(self)
Returns the “aggregate” log-likelihood ratio (LLR).
This value compares the null hypothesis of no context dependence in any sequence with the full model of arbitrary context dependence. This is the sum of the per-sequence LLRs, and it is defined in Eq. (11) of “Probing context-dependent errors in quantum processors”, by Rudinger et al.
 Returns
float – The aggregate LLR.
 property aggregate_llr_threshold(self)
The (multi-test-adjusted) statistical significance threshold for the “aggregate” log-likelihood ratio (LLR).
Above this value, the LLR is significant. See .aggregate_llr for more details. This quantity is the LLR version of the quantity defined in Eq. (14) of “Probing context-dependent errors in quantum processors”, by Rudinger et al.
 Returns
float – The threshold above which the aggregate LLR is statistically significant.
 property aggregate_pvalue(self)
Returns the p-value for the “aggregate” log-likelihood ratio (LLR).
This compares the null hypothesis of no context dependence in any sequence with the full model of arbitrary dependence. This LLR is defined in Eq. (11) in “Probing context-dependent errors in quantum processors”, by Rudinger et al., and it is converted to a p-value via Wilks’ theorem (see discussion therein).
Note that this p-value is often zero to machine precision when there is context dependence, so a more useful number is often the one returned by .aggregate_nsigma (that quantity is equivalent to this p-value but expressed on a different scale).
 Returns
float – The p-value of the aggregate LLR.
 property aggregate_pvalue_threshold(self)
The (multi-test-adjusted) statistical significance threshold for the p-value of the “aggregate” LLR.
Here, LLR refers to the log-likelihood ratio. Below this p-value the LLR would be deemed significant. See the .aggregate_pvalue property for more details.
 Returns
float – The statistical significance threshold for the p-value of the “aggregate” LLR.
 property aggregate_nsigma(self)
The number of standard deviations the “aggregate” LLR is above the context-independent mean.
More specifically, the number of standard deviations above the context-independent mean that the “aggregate” log-likelihood ratio (LLR) is. This quantity is defined in Eq. (13) of “Probing context-dependent errors in quantum processors”, by Rudinger et al.
 Returns
float – The signed number of standard deviations of the aggregate LLR.
 property aggregate_nsigma_threshold(self)
The significance threshold above which the signed number of standard deviations of the aggregate LLR is significant.
The (multi-test-adjusted) statistical significance threshold for the signed number of standard deviations of the “aggregate” log-likelihood ratio (LLR). See the .aggregate_nsigma property for more details. This quantity is defined in Eq. (14) of “Probing context-dependent errors in quantum processors”, by Rudinger et al.
 Returns
float – The statistical significance threshold above which the signed number of standard deviations of the aggregate LLR is significant.
 worst_circuits(self, number)
Returns the “worst” circuits, i.e., those that have the smallest p-values.
 Parameters
number (int) – The number of circuits to return.
 Returns
list – A list of tuples containing the worst number circuits along with the corresponding p-values.
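The selection performed by worst_circuits can be pictured as a sort by p-value. The sketch below is an illustration under assumptions: it takes a plain dictionary mapping circuit labels to p-values (a hypothetical stand-in for the data stored in a DataComparator) and returns the entries with the smallest p-values.

```python
def worst_circuits(pvalues_by_circuit, number):
    # Sort (circuit, p-value) pairs by ascending p-value and keep the first `number`.
    ranked = sorted(pvalues_by_circuit.items(), key=lambda item: item[1])
    return ranked[:number]

pvalues = {"GxGx": 0.2, "GxGy": 1e-5, "Gy": 0.04, "GyGy": 0.6}
print(worst_circuits(pvalues, 2))   # [('GxGy', 1e-05), ('Gy', 0.04)]
```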