ad_hoc_data

ad_hoc_data(training_size, test_size, n, gap=0, plot_data=False, one_hot=True, include_sample_total=False, entanglement='full', sampling_method='grid', divisions=0, labelling_method='expectation', class_labels=None)

Generates a dataset that can be fully separated by ZZFeatureMap according to the procedure outlined in [1]. First, vectors \(\vec{x} \in (0, 2\pi]^{n}\) are generated from a uniform distribution, using a sampling method determined by the sampling_method argument. Next, a feature map is applied:

\[|\Phi(\vec{x})\rangle = U_{\Phi(\vec{x})} \, H^{\otimes n} \, U_{\Phi(\vec{x})} \, H^{\otimes n} \, |0^{\otimes n}\rangle\]

where

\[U_{\Phi(\vec{x})} = \exp\Bigl(i \sum_{S \subseteq [n]} \phi_S(\vec{x}) \prod_{i \in S} Z_i\Bigr),\]

and

\[\begin{split}\begin{cases}\phi_{\{i, j\}} = (\pi - x_i)(\pi - x_j) \\ \phi_{\{i\}} = x_i \end{cases}\end{split}\]

The choice of second-order terms \(Z_i Z_j\) in the above summation depends on the entanglement argument ("linear", "circular", or "full"). See the entanglement parameter below for details.
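
The state preparation above can be sketched as a dense NumPy statevector for small n. This is a minimal illustration, not the library's implementation: the helper names entangling_pairs and feature_map_state are hypothetical, qubit i is read as bit i of the basis index (an assumed little-endian convention), and skipping the duplicate wrap-around pair for "circular" with n <= 2 is also an assumption. The code builds the diagonal of \(U_{\Phi(\vec{x})}\) from \(\phi_{\{i\}}\) and \(\phi_{\{i, j\}}\), then applies \(U_{\Phi(\vec{x})} H^{\otimes n}\) twice to \(|0^{\otimes n}\rangle\).

```python
from functools import reduce
from itertools import combinations

import numpy as np


def entangling_pairs(n, entanglement="full"):
    """Index pairs (i, j) whose Z_i Z_j terms enter U_Phi (hypothetical helper)."""
    linear = [(i, i + 1) for i in range(n - 1)]
    if entanglement == "linear":
        return linear
    if entanglement == "circular":
        # Wrap-around term Z_{n-1} Z_0; for n <= 2 it would repeat an
        # existing pair, so this sketch skips it (an assumption).
        if n > 2:
            return linear + [(n - 1, 0)]
        return linear
    if entanglement == "full":
        return list(combinations(range(n), 2))
    raise ValueError(f"unknown entanglement: {entanglement!r}")


def feature_map_state(x, entanglement="full"):
    """Dense statevector U H^n U H^n |0...0> for the phases phi_S above."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dim = 2 ** n
    # z[k, i] is the Z_i eigenvalue (+1 or -1) of basis state |k>,
    # reading qubit i as bit i of k (assumed little-endian convention).
    bits = (np.arange(dim)[:, None] >> np.arange(n)[None, :]) & 1
    z = 1 - 2 * bits
    # Diagonal of U_Phi: exp(i * sum_S phi_S(x) * prod_{i in S} z_i)
    phase = z @ x  # first-order terms, phi_{i} = x_i
    for i, j in entangling_pairs(n, entanglement):
        # second-order terms, phi_{i,j} = (pi - x_i)(pi - x_j)
        phase = phase + (np.pi - x[i]) * (np.pi - x[j]) * z[:, i] * z[:, j]
    u_diag = np.exp(1j * phase)
    # H^{tensor n} as a dense 2^n x 2^n matrix
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    h_n = reduce(np.kron, [h] * n)
    psi = np.zeros(dim, dtype=complex)
    psi[0] = 1.0
    # Apply (U_Phi H^{tensor n}) twice; the rightmost operator acts first
    for _ in range(2):
        psi = u_diag * (h_n @ psi)
    return psi
```

For n > 2 the three entanglement options yield n - 1, n, and n(n - 1)/2 pairs respectively. Because every term is diagonal in the computational basis, \(U_{\Phi(\vec{x})}\) is stored as a vector of phases rather than a full matrix; the dense \(H^{\otimes n}\) limits the sketch to a handful of qubits.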

An observable is then defined as

\[O = V^\dagger \bigl(\prod_i Z_i\bigr) V\]

where \(V\) is a randomly generated unitary matrix. Labels are then assigned according to labelling_method. With "expectation", the expectation value \(\langle \Phi(\vec{x})| O |\Phi(\vec{x})\rangle\) is compared against the separation gap \(\Delta\) (set by gap) to assign \(\pm 1\) labels. With "measurement", labels are assigned from a measurement in the computational basis.
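
The labelling step can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the library's code: all helper names are hypothetical, \(V\) is drawn here as a Haar-random unitary via a QR decomposition (the library may generate \(V\) differently), the gap rule (+1 if \(\langle O\rangle \ge \Delta\), -1 if \(\langle O\rangle \le -\Delta\), otherwise the sample is rejected) follows [1] and is assumed to match gap, and "measurement" is read as sampling \(V|\Phi(\vec{x})\rangle\) in the computational basis and taking the parity of the outcome.

```python
import numpy as np


def random_unitary(dim, rng):
    """Haar-random unitary from the QR decomposition of a complex Gaussian matrix."""
    g = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(g)
    d = np.diag(r)
    # Absorb the phases of R's diagonal so the distribution is uniform (Haar)
    return q * (d / np.abs(d))


def parity_observable(n, v):
    """O = V^dagger (prod_i Z_i) V; prod_i Z_i is diagonal with entries (-1)^popcount(k)."""
    parity = np.array([(-1) ** bin(k).count("1") for k in range(2 ** n)])
    return v.conj().T @ np.diag(parity) @ v


def label_by_expectation(psi, observable, gap):
    """+1 if <O> >= gap, -1 if <O> <= -gap, else None (sample rejected; assumed rule)."""
    value = np.real(np.vdot(psi, observable @ psi))
    if value >= gap:
        return 1
    if value <= -gap:
        return -1
    return None


def label_by_measurement(psi, v, rng):
    """Sample outcome k of V|psi> in the computational basis and return its Z-parity."""
    probs = np.abs(v @ psi) ** 2
    k = int(rng.choice(len(probs), p=probs / probs.sum()))
    return (-1) ** bin(k).count("1")
```

Since \(\langle \Phi(\vec{x})| O |\Phi(\vec{x})\rangle = \sum_k |(V|\Phi(\vec{x})\rangle)_k|^2 (-1)^{\mathrm{popcount}(k)}\), the measurement-based labels in this sketch average to the same expectation value; a single shot simply replaces the deterministic threshold with a random draw.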

References:

[1] Havlíček V, Córcoles AD, Temme K, Harrow AW, Kandala A, Chow JM, Gambetta JM. Supervised learning with quantum-enhanced feature spaces. Nature. 2019 Mar;567(7747):209–212. arXiv:1804.11326

Parameters:
  • training_size (int) – Number of training samples per class.

  • test_size (int) – Number of testing samples per class.

  • n (int) – Number of qubits (dimension of the feature space).

  • gap (int) – Separation gap \(\Delta\) used when labelling_method="expectation". Default is 0.

  • plot_data (bool) – If True, plots the sampled data (disabled automatically if n > 3). Default is False.

  • one_hot (bool) – If True, returns labels in one-hot format. Default is True.

  • include_sample_total (bool) – If True, the function also returns the total number of accepted samples. Default is False.

  • entanglement (str) –

    Determines which second-order terms \(Z_i Z_j\) appear in \(U_{\Phi(\vec{x})}\). The options are:

    • "linear": Includes terms \(Z_i Z_{i+1}\).

    • "circular": Includes "linear" terms plus \(Z_{n-1}Z_0\).

    • "full": Includes all pairwise terms \(Z_i Z_j\).

    Default is "full".

  • sampling_method (str) –

    The method used to generate uniform samples \(\vec{x}\). Choices are:

    • "grid": Chooses points from a uniform grid (supported only if n <= 3).

    • "hypercube": Uses a variant of Latin Hypercube sampling for stratification.

    • "sobol": Uses Sobol sequences.

    Default is "grid".

  • divisions (int) – Number of stratifications along each dimension. Must be specified if sampling_method="hypercube"; a value close to training_size is recommended. Default is 0.

  • labelling_method (str) –

    Method for assigning labels. The options are:

    • "expectation": Uses the expectation value of the observable.

    • "measurement": Performs a measurement in the computational basis.

    Default is "expectation".

  • class_labels (list | None) – Custom labels for the two classes, used when one_hot=False. If not provided, the labels default to -1 and +1.
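
The "grid" and "hypercube" options of sampling_method can be sketched in NumPy roughly as follows. The function names, the points_per_dim argument, and the exact stratification scheme are assumptions for illustration; the library's Latin Hypercube variant may differ in detail. For "sobol", SciPy provides a Sobol generator as scipy.stats.qmc.Sobol, whose unit-cube samples can be scaled by \(2\pi\).

```python
import numpy as np

TWO_PI = 2 * np.pi


def grid_samples(n, points_per_dim):
    """All points of a regular grid on (0, 2*pi]^n, one row per point (hypothetical helper)."""
    if n > 3:
        raise ValueError("grid sampling is documented as supported only for n <= 3")
    axis = np.linspace(TWO_PI / points_per_dim, TWO_PI, points_per_dim)
    mesh = np.meshgrid(*([axis] * n), indexing="ij")
    return np.stack([m.ravel() for m in mesh], axis=-1)


def hypercube_samples(num_samples, n, divisions, rng):
    """Latin-hypercube-style stratified samples on (0, 2*pi]^n (hypothetical helper)."""
    width = TWO_PI / divisions
    repeats = -(-num_samples // divisions)  # ceil(num_samples / divisions)
    samples = np.empty((num_samples, n))
    for d in range(n):
        # Cycle through random permutations of the `divisions` strata so every
        # interval along dimension d is visited as evenly as possible.
        strata = np.concatenate([rng.permutation(divisions) for _ in range(repeats)])[:num_samples]
        # 1 - U with U in [0, 1) lies in (0, 1], so each point falls in the
        # half-open stratum (s * width, (s + 1) * width].
        samples[:, d] = (strata + 1.0 - rng.random(num_samples)) * width
    return samples
```

With divisions equal to the number of samples, each stratum along every dimension receives exactly one point, which is the usual Latin Hypercube property; choosing divisions close to training_size keeps the sketch near that regime.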

Returns:

Tuple containing the following:

  • training_features : np.ndarray

  • training_labels : np.ndarray

  • testing_features : np.ndarray

  • testing_labels : np.ndarray

If include_sample_total=True, a fifth element (np.ndarray) is included that specifies the total number of accepted samples.

Return type:

tuple[ndarray, ndarray, ndarray, ndarray] | tuple[ndarray, ndarray, ndarray, ndarray, ndarray]
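
How one_hot and class_labels shape the returned label arrays can be sketched as below. format_labels is a hypothetical helper, not part of the library; the one-hot column order and the mapping of -1 and +1 onto class_labels[0] and class_labels[1] are assumptions.

```python
import numpy as np


def format_labels(labels, one_hot=True, class_labels=None):
    """Convert +1/-1 labels into the returned format (hypothetical helper)."""
    labels = np.asarray(labels)
    if one_hot:
        # Assumed column order: column 0 for -1, column 1 for +1
        return np.stack([labels == -1, labels == 1], axis=1).astype(float)
    if class_labels is None:
        # Default labels are -1 and +1
        return labels
    # Assumed mapping: -1 -> class_labels[0], +1 -> class_labels[1]
    return np.where(labels == -1, class_labels[0], class_labels[1])
```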