Data Processor: Wrangling data

Data processing is the act of taking the data returned by the backend and converting it into a format that can be analyzed. It is implemented as a chain of data processing steps that transform various input data, e.g. IQ data, into a desired format, e.g. population, which can be analyzed. These data transformations may consist of multiple steps, such as kerneling and discrimination. Each step is implemented by a member of the DataAction class, also called a node.

The data processor implements the __call__() method. Once initialized, it can thus be used as a standard python function:

processor = DataProcessor(input_key="memory", [Node1(), Node2(), ...])
out_data = processor(in_data)

The data input to the processor is a sequence of dictionaries each representing the result of a single circuit. The output of the processor is a numpy array whose shape and data type depend on the combination of the nodes in the data processor.

Uncertainties that arise from quantum measurements or finite sampling can be taken into account in the nodes: a standard error can be generated in a node and can be propagated through the subsequent nodes in the data processor. Correlation between computed values is also considered.

Let’s look at an example to see how to initialize an instance of DataProcessor and create the DataAction nodes that process the data.

Data types on IBM Quantum backends

IBM Quantum backends can return different types of data. There is counts data and IQ data [1], referred to as level 2 and level 1 data, respectively. Level 2 data corresponds to a dictionary with bit-strings as keys and the number of times the bit-string was measured as a value. Importantly for some experiments, the backends can return a lower data level known as IQ data. Here, I and Q stand for in phase and quadrature. The IQ are points in the complex plane corresponding to a time integrated measurement signal which is reflected or transmitted through the readout resonator depending on the setup. IQ data can be returned as “single” or “averaged” data. Here, single means that the outcome of each single shot is returned while average only returns the average of the IQ points over the measured shots. The type of data that an experiment should return is specified by the run_options() of an experiment.

Processing data of different types

An experiment should work with the different data levels. Crucially, the analysis, such as a curve analysis, expects the same data format no matter the run options of the experiment. Transforming the data returned by the backend into the format that the analysis accepts is done by the data_processing library. The key class here is the DataProcessor. It is initialized from two arguments. The first is the input_key, which is typically “memory” or “counts”, and identifies the key in the experiment data where the data is located. The second argument data_actions is a list of nodes where each node performs a processing step of the data processor. Crucially, the output of one node in the list is the input to the next node in the list.

To illustrate the data processing module, we consider an example in which we measure a rabi oscillation with different data levels. The code below sets up the Rabi experiment.

Note

This tutorial requires the qiskit_dynamics package to run simulations. You can install it with python -m pip install qiskit-dynamics.

import numpy as np

from qiskit import pulse
from qiskit.circuit import Parameter

from qiskit_experiments.test.pulse_backend import SingleTransmonTestBackend
from qiskit_experiments.data_processing import DataProcessor, nodes
from qiskit_experiments.library import Rabi

with pulse.build() as sched:
    pulse.play(
        pulse.Gaussian(160, Parameter("amp"), sigma=40),
        pulse.DriveChannel(0)
    )

backend = SingleTransmonTestBackend(seed=100)

exp = Rabi(
    physical_qubits=(0,),
    backend=backend,
    schedule=sched,
    amplitudes=np.linspace(-0.1, 0.1, 21)
)

We now run the Rabi experiment twice, once with level 1 data and once with level 2 data. Here, we manually configure two data processors but note that typically you do not need to do this yourself. We begin with single-shot IQ data.

data_nodes = [nodes.SVD(), nodes.AverageData(axis=1), nodes.MinMaxNormalize()]
iq_processor = DataProcessor("memory", data_nodes)
exp.analysis.set_options(data_processor=iq_processor)

exp_data = exp.run(meas_level=1, meas_return="single").block_for_results()

display(exp_data.figure(0))
../_images/data_processor_2_0.png

Since we requested IQ data we set the input key to “memory” which is the key under which the data is located in the experiment data. The iq_processor contains three nodes. The first node SVD is a singular value decomposition which projects the two-dimensional IQ data on its main axis. The second node averages the single-shot data. The output is a single float per quantum circuit. Finally, the last node MinMaxNormalize normalizes the measured signal to the interval [0, 1]. The iq_dataprocessor is then set as an option of the analysis class. For those who are wondering what single-shot IQ data looks like we plot the data returned by the zeroth and sixth circuit in the code block below.

from qiskit_experiments.visualization import IQPlotter, MplDrawer

plotter = IQPlotter(MplDrawer())

for idx in [0, 6]:
    plotter.set_series_data(
        f"Circuit {idx}",
        points=np.array(exp_data.data(idx)["memory"]).squeeze(),
    )

plotter.figure()
../_images/data_processor_4_0.png

Now we turn to counts data and see how the data processor needs to be changed.

data_nodes = [nodes.Probability(outcome="1")]
count_processor = DataProcessor("counts", data_nodes)
exp.analysis.set_options(data_processor=count_processor)

exp_data = exp.run(meas_level=2).block_for_results()

display(exp_data.figure(0))
../_images/data_processor_5_0.png

Now, the input_key is “counts” since that is the key under which the counts data is saved in instances of ExperimentData. The list of nodes comprises a single data action which converts the counts to an estimation of the probability of measuring the outcome “1”.

Writing data actions

The nodes in a data processor are all sub-classes of DataAction. Users who wish to write their own data actions must (i) sub-class DataAction and (ii) implement the internal _process method called by instances of DataProcessor. This method is the processing step that the node implements. It takes a numpy array as input and returns the processed numpy array as output. This output serves as the input for the next node in the data processing chain. Here, the input and output numpy arrays can have a different shape.

In addition to the standard DataAction the data processing package also supports trainable data actions as subclasses of TrainableDataAction. These nodes must first be trained on the data before they can process the data. An example of a TrainableDataAction is the SVD node which must first learn the main axis of the data before it can project a data point onto this axis. To implement trainable nodes developers must also implement the train() method. This method is called when train() is called.

Conclusion

Data is processed by data processors that call a list of nodes each acting once on the data. Data processing connects the data returned by the backend to the data that the analysis classes need. Typically, you will not need to implement the data processing yourself since Qiskit Experiments has built-in methods that determine the correct instance of DataProcessor for your data. More advanced data processing includes, for example, handling restless measurements.

References

See also