pygsti.extras.drift.signal
Signal analysis functions for time-series data
Module Contents
Functions
|
Generates a power spectrum from the input time-series data. Before converting to a power |
|
Maps the vector x over [0, counts] as |
|
Inverts the standardizer function. |
|
Returns the Type-II discrete cosine transform of y, with an orthogonal normalization, where |
|
Inverts the dct function. |
|
Returns the discrete Fourier transform of y, with a unitary normalization, where |
|
Inverts the dft function. |
|
Performs a Lomb-Scargle periodogram (lsp) on the input data, returning powers |
|
Calculates the Bartlett power spectrum. This involves splitting the data into disjoint |
|
The omega`th DCT basis function, for a initial time of `starttime and a time-step of timedif, |
|
The multi-test adjusted signficance statistical significance threshold for |
|
Converts a power to a p-value, under the assumption that the power is chi2 |
|
The p-value of the test statistic max(lambda_i) where there are numpowers |
|
The Benjamini-Hockberg quasi-threshold for finding the statistically significant powers in |
|
Returns a set of frequencies to create spectra for, for the input data. These frequencies are |
|
Calculates the Fourier frequencies associated with a timestep and a total number of times. These frequencies |
Calculates the Fourier frequencies from a set of times. These frequencies are in 1/units, where |
|
|
Finds the amplitudes in the data at the specified frequency indices. |
|
Returns the Hoyer sparsity index of the input vector p. This is defined to be: |
|
Takes an arbitrary input vector p and maps it to a vector bounded within [0,1]. |
|
Transforms the input float x as: |
|
Implements a low-pass filter on the input array, by DCTing the input, mapping all but the lowest |
|
Implements a moving average on sequence with an averaging width of width. |
|
Generates a minimal sparsity probability trajectory for the specified power, and a specified number of modes |
|
Generates a probability trajectory with the specified power that is approximately Gaussian distribution |
- pygsti.extras.drift.signal.spectrum(x, times=None, null_hypothesis=None, counts=1, frequencies='auto', transform='dct', returnfrequencies=True)
Generates a power spectrum from the input time-series data. Before converting to a power spectrum, x is rescaled as
x - > (x - counts * null_hypothesis) / sqrt(counts * null_hypothesis * (1-null_hypothesis)),
where the arithmetic is element-wise, and null_hypothesis is a vector in (0,1). If null_hypothesis is None it is set to the mean of x. If that mean is 0 or 1 then the power spectrum returned is (0,1,1,1,…).
Parameters
- x: array
The time-series data to convert into a power spectrum
- times: array, optional
The times associated with the data in x. This is not optional for the lsp transform
- null_hypothesis: None or array, optional
Used to normalize the data, and should be the null hypothesis that is being tested for the probability trajectory from which x is drawn. If null_hypothesis is None it is set to the mean of x.
- counts: int, optional
The number of counts per time-step, whereby all values of x are within [0,counts]. In the main usages for drift detection, x is the clickstream for a single measurement outcome – so x contains integers between 0 and the number of measurements at a (perhaps coarse-grained) time. counts is this number of measurements per time.
- frequencies: ‘auto’ or array, optional
The frequencies to generate the power spectrum for. Only relevant for transform=`lsp`.
- transform: ‘dct’, ‘dft’ or ‘lsp’, optional
The transform to use to generate power spectrum. ‘dct’ is the Type-II discrete cosine transform with an orthogonal normalization; ‘dft’ is the discrete Fourier transform with a unitary normalization; ‘lsp’ is the float-meaning Lomb-Scargle periodogram with an orthogonal-like normalization.
- returnfrequencies: bool, optional
Whether to return the frequencies corrsponding to the powers
Returns
- if returnfrequencies:
- array or None
The frequencies corresponding to the power spectrum. None is returned if the frequencies cannot be ascertained (when times is not specified).
- array or None
The amplitudes, that are squared to obtain the powers. None is returned when the transform does not generate amplitudes (this is the case for lsp)
- array
The power spectrum
- pygsti.extras.drift.signal.standardizer(x, null_hypothesis=None, counts=1)
Maps the vector x over [0, counts] as
x - > (x - counts * null_hypothesis) / sqrt(counts * null_hypothesis * (1-null_hypothesis)),
where the arithmetic is element-wise, and null_hypothesis is a vector in (0,1). If null_hypothesis is None it is set to the mean of x. If that mean is 0 or 1 then None is returned.
- pygsti.extras.drift.signal.unstandardizer(z, null_hypothesis, counts=1)
Inverts the standardizer function.
- pygsti.extras.drift.signal.dct(x, null_hypothesis=None, counts=1)
Returns the Type-II discrete cosine transform of y, with an orthogonal normalization, where
y = (x - counts * null_hypothesis) / sqrt(counts * null_hypothesis * (1-null_hypothesis)),
where the arithmetic is element-wise, and null_hypothesis is a vector in (0,1). If null_hypothesis is None it is set to the mean of x. If that mean is 0 or 1 then the vector of all ones, except for the first element which is set to zero, is returned.
Parameters
- xarray
Data string, on which the normalization and discrete cosine transformation is performed. If counts is not specified, this must be a bit string.
- null_hypothesisarray, optional
If not None, an array to use in the normalization before the dct. If None, it is taken to be an array in which every element is the mean of x.
- countsint, optional
A factor in the normalization, that should correspond to the counts-per-timestep (so for full time resolution this is 1).
Returns
- array
The DCT modes described above.
- pygsti.extras.drift.signal.idct(modes, null_hypothesis, counts=1)
Inverts the dct function.
Parameters
- modesarray
The fourier modes to be transformed to the time domain.
- null_hypothesisarray
The array that was used in the normalization before the dct. This is commonly the mean of the time-domain data vector. All elements of this array must be in (0,1).
- countsint, optional
A factor in the normalization, that should correspond to the counts-per-timestep (so for full time resolution this is 1).
Returns
- array
Inverse of the dct function
- pygsti.extras.drift.signal.dft(x, null_hypothesis=None, counts=1)
Returns the discrete Fourier transform of y, with a unitary normalization, where y is an array with elements related to the x array by
y = (x - counts * null_hypothesis) / sqrt(counts * null_hypothesis * (1-null_hypothesis)),
where the arithmetic is element-wise, and null_hypothesis is a vector in (0,1). If null_hypothesis is None it is set to the mean of x. If that mean is 0 or 1 then the vector of all ones, except for the first element which is set to zero, is returned.
Parameters
- xarray
Data string, on which the normalization and discrete cosine transformation is performed. If counts is not specified, this must be a bit string.
- null_hypothesisarray, optional
If not None, an array to use in the normalization before the dct. If None, it is taken to be an array in which every element is the mean of x.
- countsint, optional
A factor in the normalization, that should correspond to the counts-per-timestep (so for full time resolution this is 1).
Returns
- array
The DFT modes described above.
- pygsti.extras.drift.signal.idft(modes, null_hypothesis, counts=1)
Inverts the dft function.
Parameters
- modesarray
The fourier modes to be transformed to the time domain.
- null_hypothesisarray
The array that was used in the normalization before the dct. This is commonly the mean of the time-domain data vector. All elements of this array must be in (0,1).
- countsint, optional
A factor in the normalization, that should correspond to the counts-per-timestep (so for full time resolution this is 1).
Returns
- array
Inverse of the dft function
- pygsti.extras.drift.signal.lsp(x, times, frequencies='auto', null_hypothesis=None, counts=1)
Performs a Lomb-Scargle periodogram (lsp) on the input data, returning powers and frequencies.
This function uses astropy, which is not a required dependency for pyGSTi
Parameters
todo
Returns
todo
- pygsti.extras.drift.signal.bartlett_spectrum(x, numspectra, counts=1, null_hypothesis=None, transform='dct')
Calculates the Bartlett power spectrum. This involves splitting the data into disjoint sections of the same length, and generating a power spectrum for each such section, and then averaging all these power spectra.
Parameters
- xarray
The data to calculate the spectrum for.
- numspectraint
The number of “chunks” to split the data into, with a spectra calculated for each chunk. Note that if len(x) / num_spectra is not an integer, then not all of the data will be used.
- countsint, optional
The number of “clicks” per time-step in x, used for standarizing the data.
- null_hypothesisarray, optional
The null hypothesis that we’re looking for violations of. If left as None then this is the no-drift null hypothesis, with the static probability set to the mean of the data.
- transformstr, optional
The transform to use the generate the power spectra.
Returns
- array
The Bartlett power spectrum
- pygsti.extras.drift.signal.dct_basisfunction(omega, times, starttime, timedif)
The omega`th DCT basis function, for a initial time of `starttime and a time-step of timedif, evaluated at the times times.
- pygsti.extras.drift.signal.power_significance_threshold(significance, numtests, dof)
The multi-test adjusted signficance statistical significance threshold for testing numtests test statistics that all have a marginal distribution that is chi2 with dof degrees of freedom.
- pygsti.extras.drift.signal.power_to_pvalue(power, dof)
Converts a power to a p-value, under the assumption that the power is chi2 distribution with dof degrees of freedom.
- pygsti.extras.drift.signal.maxpower_pvalue(maxpower, numpowers, dof)
The p-value of the test statistic max(lambda_i) where there are numpowers lambda_i test statistics, and they are i.i.d. as chi2 with dof degrees of freedom. This approximates the p-value of the largest power in “clickstream” power spectrum (generated from spectrum), with the DOF given by the number of clicks per times.
- pygsti.extras.drift.signal.power_significance_quasithreshold(significance, numstats, dof, procedure='Benjamini-Hochberg')
The Benjamini-Hockberg quasi-threshold for finding the statistically significant powers in a power spectrum.
- pygsti.extras.drift.signal.compute_auto_frequencies(ds, transform='dct')
Returns a set of frequencies to create spectra for, for the input data. These frequencies are in units of 1 / unit where unit is the units of the time-stamps in the DataSet. What this function is doing has a fundmentally different meaning depending on whether the transform is time-stamp aware (here, the LSP) or not (here, the DCT and DFT).
Time-stamp aware transforms take the frequencies to calculate powers at as an input, so this chooses these frequencies, which are, explicitly, the frequencies associated with the powers. The task of choosing the frequencies amounts to picking the best set of frequencies at which to interogate the true probability trajectory for components. As there are complex factors involved in this choice that the code has no way of knowing, sometimes it is best to choose them yourself. E.g., if different frequencies are used for different circuits it isn’t possible to (meaningfully) averaging power spectra across circuits, but this might be preferable if the time-step is sufficiently different between different circuits – it depends on your aims.
For time-stamp unaware transforms, these are the frequencies that, given that we’re implementing the, e.g., DCT, the generated power spectrum is implicitly with respect to. In the case of data on a fixed time-grid, i.e., equally spaced data, then there is a precise set of frequencies implicit in the transform. Otherwise, these frequencies are explicitly at least slightly ad hoc, and choosing these frequencies amounts to choosing those frequencies that “best” approximate the properties being interogatted with fitting each, e.g., DCT basis function to the (timestamp-free) data.
Parameters
- ds: DataSet or MultiDataset
Contains time-series data that the “auto” frequencies are calculated for.
- transform: str, optional
The type of transform that the frequencies are being chosen for.
Returns
- frequencieslist
A list of lists, where each of the consituent lists is a set of frequencies that are the are the derived frequencies for one or more of the dataset rows (i.e., for one or circuits). It is common for this to be a length-1 list, containing a single set of frequencies that are the frequencies this function has chosen for every circuit.
- pointersdict
A dictionary that “points” from the index of a circuit (indexed in the order of ds.keys()) to the index of the list of frequencies for that circuit. No entry in the dictionary for an index should be interpretted as 0. So this can be an empty dictionary (with the frequencies list then necessarily of length 1, containing a single set of frequencies for every circuit).
- pygsti.extras.drift.signal.frequencies_from_timestep(timestep, numtimes)
Calculates the Fourier frequencies associated with a timestep and a total number of times. These frequencies are in 1/units, where units is the units of time in times.
Parameters
- timestepfloat
The time difference between each data point.
- numtimesint
The total number of times.
Returns
- array
The frequencies associated with Fourier analysis on data with these timestamps.
- pygsti.extras.drift.signal.fourier_frequencies_from_times(times)
Calculates the Fourier frequencies from a set of times. These frequencies are in 1/units, where units is the units of time in times. Note that if the times are not exactly equally spaced, then the Fourier frequencies are ill-defined, and this returns the frequencies based on assuming that the time-step is the mean time-step. This is reasonable for small deviations from equally spaced times, but not otherwise.
Parameters
- timeslist
The times from which to calculate the frequencies
Returns
- array
The frequencies associated with Fourier analysis on data with these timestamps.
- pygsti.extras.drift.signal.amplitudes_at_frequencies(freq_indices, timeseries, times=None, transform='dct')
Finds the amplitudes in the data at the specified frequency indices. Todo: better docstring. Currently only works for the DCT.
- pygsti.extras.drift.signal.sparsity(p)
Returns the Hoyer sparsity index of the input vector p. This is defined to be:
HoyerIndex = (sqrt(l) - (|p|_1 / |p|_2)) / (sqrt(l) - 1)
where l is the length of the vector and |.|_1 and |.|_2 are the 1-norm and 2-norm of the vector, resp.
- pygsti.extras.drift.signal.renormalizer(p, method='logistic')
Takes an arbitrary input vector p and maps it to a vector bounded within [0,1].
If method = “sharp”, then it maps any value in p below zero to zero and any value in p above one to one.
If method = “logistic” then it ‘squashes’ the vector around it’s mean value using a logistic function. The exact transform for each element x of p is:
x -> mean - nu + (2*nu) / (1 + exp(-2 * (x - mean) / nu))
where mean is the mean value of p, and nu is min(1-mean,mean). This transformation is only sensible when the mean of p is within [0,1].
Parameters
- parray of floats
The vector with values to be mapped to [0,1]
- method{‘logistic’, ‘sharp’}
The method for “squashing” the input vector.
Returns
- numpy.array
The input vector “squashed” using the specified method.
- pygsti.extras.drift.signal.logistic_transform(x, mean)
Transforms the input float x as:
x -> mean - nu + (2*nu) / (1 + exp(-2 * (x - mean) / nu))
where nu is min(1-mean,mean). This is a form of logistic transform, and maps x to a value in [0,1].
- pygsti.extras.drift.signal.lowpass_filter(data, max_freq=None)
Implements a low-pass filter on the input array, by DCTing the input, mapping all but the lowest max_freq modes to zero, and then inverting the transform.
Parameters
- datanumpy.array,
The vector to low-pass filter
- max_freqNone or int, optional
The highest frequency to keep. If None then it keeps the minimum of 50 or l/10 frequencies, where l is the length of the data vector
Returns
- numpy.array
The low-pass-filtered data.
- pygsti.extras.drift.signal.moving_average(sequence, width=100)
Implements a moving average on sequence with an averaging width of width.
- pygsti.extras.drift.signal.generate_flat_signal(power, nummodes, n, candidatefreqs=None, base=0.5, method='sharp')
Generates a minimal sparsity probability trajectory for the specified power, and a specified number of modes containing non-zero power. This is a probability trajectory that has a signal power of power (where, for a probability trajectory p, the signal vector is s = p - mean(p)), and has that power equally spread over nummodes different frequencies. The phases on the amplitudes are randomized.
Parameters
- powerfloat
The amount of power in the signal of the generated probability trajectory
- nummodesint
The number of modes to equally spread that power over.
- nint
The number of sample times that the probability trajectory is being created for.
- candidatefreqslist, optional
A list containing a subset of 1,2,…,n-1. If not None, then all frequencies are included.
- basefloat in (0,1), optional
The mean of the generated probability trajectory.
- methodstr, optional
The method used to guarantee that the created probability trajectory is a valid probability, i.e., is within [0,1]. Options:
- None. In this case nothing is done to guarantee that the output is a valid probability
trajectory, but the power in the signal is guaranteed to be the input power.
- ‘sharp’ or ‘logistic’. The ‘sharp’ or ‘logistic’ methods with the renormalizer()
function are used to map the generated vector onto [0,1]. When the renormalizer does a non-trivial action on the generated vector, the power in the output will be below the requested power. Moreover, the power will not be distributed only over the requested number of modes (i.e., it impacts the spectrum of the output vector).
Returns
- array
A probability trajectory.
- pygsti.extras.drift.signal.generate_gaussian_signal(power, center, spread, n, base=0.5, method='sharp')
Generates a probability trajectory with the specified power that is approximately Gaussian distribution accross frequencies, centered in the frequency specified by center. The probability trajectory has a rescaled signal power of power, where, for a probability trajectory p, the signal vector is s = p - mean(p), and the rescaled signal vector is p - mean(p) / sqrt(mean(p) * (1 - mean(p)). The phases on the amplitudes are randomized.
Parameters
- powerfloat
The amount of power in the rescaled signal of the generated probability trajectory.
- centerint
The mode on which to center the Gaussian.
- spreadint
The spread of the Gaussian
- nint
The number of sample times that the probability trajectory is being created for.
- basefloat in (0,1), optional
The mean of the generated probability trajectory.
- methodstr, optional
The method used to guarantee that the created probability trajectory is a valid probability, i.e., is within [0,1]. Options:
- None. In this case nothing is done to guarantee that the output is a valid probability
trajectory, but the power in the signal is guaranteed to be the input power.
- ‘sharp’ or ‘logistic’. The ‘sharp’ or ‘logistic’ methods with the renormalizer()
function are used to map the generated vector onto [0,1]. When the renormalizer does a non-trivial action on the generated vector, the power in the output will be below the requested power. Moreover, the power will not be distributed only over the requested number of modes (i.e., it impacts the spectrum of the output vector).
Returns
- array
A probability trajectory.