Sample-based quantum diagonalization with Fulqrum

In this tutorial, we will use Fulqrum to find an approximation to the ground state of a chemistry Hamiltonian following a sampled-based quantum diagonalization (SQD) approach.

We will use different blocks from Fulqrum to construct a full end-to-end workflow to estimate molecular Hamiltonian ground state energy using configuration recovery.

  • Convert a molecular Hamiltonian from one- and two-body integrals to a Fulqrum compatible operator.

  • Construct a subspace from bitstrings (configurations or Slater Determinants) and project the operator on to the subsapce as a sparse matrix in the CSR format.

  • Eigensolve the sparse matrix with a choice of an eigensolver.

  • Compute electron orbital occupancies from the eigenvector and subspace bitstrings and use them to recover configurations.

  • Use recovered configurations along with carryover bitstrings in the next iteration to refine ground state estimation.

This tutorial requires following two packages in addition to Fulqrum:

  • PySCF: To define the molecule and compute the one- and two-body integrals.

  • Openfermion: To convert the one- and two-body integrals into a Fulqrum compatible operator internally.

Define Molecule

In this tutorial, we will approximate the ground state energy of an \(N_2\) molecule in the 6-31g basis. First, we will specify the molecule and its properties. Then, we will compute one- and two-body integrals using the PySCF package.

[1]:
import warnings

import pyscf
import pyscf.mcscf

warnings.filterwarnings("ignore")

# Build N2 molecule
mol = pyscf.gto.Mole()
mol.build(
    atom=[["N", (0, 0, 0)], ["N", (1.0, 0, 0)]],
    basis="6-31g",
    symmetry="Dooh",
)

# Define active space
n_frozen = 2
active_space = range(n_frozen, mol.nao_nr())

# Get molecular integrals
scf = pyscf.scf.RHF(mol).run()
num_orbitals = len(active_space)
n_electrons = int(sum(scf.mo_occ[active_space]))
num_elec_a = (n_electrons + mol.spin) // 2
num_elec_b = (n_electrons - mol.spin) // 2
cas = pyscf.mcscf.CASCI(scf, num_orbitals, (num_elec_a, num_elec_b))
mo = cas.sort_mo(active_space, base=0)
hcore, nuclear_repulsion_energy = cas.get_h1cas(mo)  # hcore: one-body integrals
eri = pyscf.ao2mo.restore(1, cas.get_h2cas(mo), num_orbitals)  # eri: two-body integrals
converged SCF energy = -108.835236570774

Convert to the Fulqrum operator

Next, we will convert the one- and two-body integrals into a Fulqrum FermionicOperator using the integrals_to_fq_fermionic_op() converter available from Fulqrum. Next, we will perform an extended Jordan-Wigner transformation using extended_jw_transform() to convert it to a Fulqrum QubitOperator. The QubitOperator will be used in the subsequent projection step.

[2]:
from fulqrum.convert.integrals import integrals_to_fq_fermionic_op

fermionic_op = integrals_to_fq_fermionic_op(
    one_body_integrals=hcore, two_body_integrals=eri
)

print(f"Num terms operator: {fermionic_op.num_terms}")

fulqrum_operator = fermionic_op.extended_jw_transformation()
print(f"Num terms operator [after JW transform]: {fulqrum_operator.num_terms}")
Num terms operator: 36960
Num terms operator [after JW transform]: 13280

Load random bitstrings

Now, we load some pre-generated random bitstrings for the subspace construction and projection.

[3]:
import json

with open("./data/random_counts_n2_10000_samples.json", "r") as jf:
    counts = json.load(jf)
print(f"Num bitstrings in raw counts: {len(counts)}")
Num bitstrings in raw counts: 10000

Next, we will separate bitstrings and their respective probabilities into two separate lists so that it becomes easier to use them later.

[4]:
import numpy as np

raw_bitstrings = []
raw_probs = []
for bs, cnt in counts.items():
    raw_bitstrings.append(bs)
    raw_probs.append(cnt)
raw_probs = np.array(raw_probs, dtype=float)
raw_probs /= np.linalg.norm(raw_probs)

Configuration recovery loop

Now, we are ready to run the full sample-based quantum diagonalization (SQD) routine with iterative configuration recovery to refine our estimation of the ground state energy of \(N_2\) molecule. Fulqrum has an sqd module that uses the qiskit-addon-sqd-hpc as a submodule, which provides postselection, subsampling, and configuration recovery capabilities. Each SQD loop has the following steps:

  1. Construct alpha (beta) half strings by combining Hartree-Fock state, carryover configurations from previous round, and recovered configurations from raw counts - in that order.

  2. Construct a Fulqrum Subspace using the spin-up (alpha) and spin-down (beta) half strings. For this \(N_2\) example, both will be same. Fulqrum Subspace internally sorts the half strings and takes a Cartesian product to construct full strings.

  3. Project the fulqrum_operator on to this subspace and construct a CSR-format sparse matrix representation of the projected Hamiltonian. The CSR matrix will be wrapped into a LinearOperator for faster eigensolving.

  4. Prepare a simple initial guess vector \(v0\) for the eigensolver. While it is optional, it can speed up eigensolving.

  5. Eigensolve the projected Hamiltonian matrix to get the minimum eigenvalue (ground state energy estimate) and the corresponding eigenvector.

  6. Use the eigenvector to compute electron orbital occupancies that will be used to recover configurations in the next round and also to select important configurations to carryover to the next round.

[5]:
from collections import OrderedDict


def unique_alpha_beta_combined(bitstrings):
    """A utility function for getting unique alpha and beta halves from full bitstrings.

    Args:
        bitstrings (list[str]): A list of full bitstrings consisting of both alpha and beta parts.

    Returns:
        A dictionary with combined unique ``alpha`` and ``beta`` half strings.
    """
    if not bitstrings:
        return OrderedDict()

    unique_ab = OrderedDict()
    num_spatial_orb = len(bitstrings[0]) // 2

    for bs in bitstrings:
        a = bs[num_spatial_orb:]
        b = bs[:num_spatial_orb]

        unique_ab[a] = 1
        unique_ab[b] = 1

    return unique_ab
[6]:
import time

import scipy.sparse.linalg as spla

import fulqrum as fq
from fulqrum.core.sqd import (
    postselect_by_hamming_right_and_left,
    subsample,
    recover_configurations,
    get_carryover_full_strs,
)

# options
tol = 1e-5
num_batches = 1
carryover_threshold = 1e-4
seed = 0
nelec = (num_elec_a, num_elec_b)
samples_per_batch = 500
current_occupancies = None
max_iterations = 5

carryover_full_strs = []
S = None  # Subspace

for ni in range(max_iterations):
    iter_start = time.perf_counter()
    print(f"ITERATION {ni}")
    if current_occupancies is None:
        # If we don't have average orbital occupancy information, simply postselect
        # bitstrings with the correct numbers of spin-up and spin-down electrons
        bitstrings, probs = postselect_by_hamming_right_and_left(
            raw_bitstrings, raw_probs, num_elec_a, num_elec_b
        )
    else:
        # If we do have average orbital occupancy information, use it to refine the
        # full set of noisy configurations
        bitstrings, probs = recover_configurations(
            raw_bitstrings,
            raw_probs,
            current_occupancies[0],  # occs_a
            current_occupancies[1],  # occs_b
            num_elec_a,
            num_elec_b,
            seed,
        )

    subsamples = [
        subsample(bitstrings, probs, min(len(bitstrings), len(bitstrings)), seed)
        for _ in range(num_batches)
    ]

    for batch in subsamples:
        hs_start = time.perf_counter()

        # In this tutorial, we include half strings with the following precedence
        # Hartree-Fock state (always include) > carryover (from previous round) > recovered samples.
        # We add bitstrings in a dictionary as it preserves order and also filters out duplicates.
        half_strs_dict = OrderedDict()

        # Add hartree-fock state
        hartree_fock_state = "0" * (num_orbitals - num_elec_a) + "1" * num_elec_a
        half_strs_dict[hartree_fock_state] = 1

        # Add carry-overs
        # First, we split carryover full strs into unique alpha and beta halves
        unique_ab_co = unique_alpha_beta_combined(carryover_full_strs)
        for bs in unique_ab_co.keys():
            half_strs_dict[bs] = 1

        # Add recovered bitstrings
        unique_ab_recovered = unique_alpha_beta_combined(batch)
        for bs in unique_ab_recovered.keys():
            half_strs_dict[bs] = 1

        half_strs = list(half_strs_dict.keys())
        print("  num total half strs: ", len(half_strs))

        # We will use a subset of accumulated half strings in the Subspace construction
        # defined by the `samples_per_batch` parameter. For this tutorial, the subspace
        # dimension will be samples_per_batch x samples_per_batch
        half_strs = half_strs[:samples_per_batch]
        print("  num selected half strs: ", len(half_strs))
        half_strs.sort()  # fq.Subspace() also sorts bitstrings, so can be skipped.

        hs_end = time.perf_counter()
        print(f"  Half strs construction took: {hs_end - hs_start:.4f} seconds")

        s_start = time.perf_counter()
        S = fq.Subspace([half_strs, half_strs])
        s_end = time.perf_counter()
        print(f"  Subspace construction took: {s_end - s_start:4f} seconds")
        print(
            f"  Subspace dimension: {len(half_strs)} x {len(half_strs)} = {len(half_strs) ** 2:_}"
        )

        proj_start = time.perf_counter()
        Hsub = fq.SubspaceHamiltonian(fulqrum_operator, S)
        # Convert the SubspaceHamiltonian into a CSR format sparse matrix,
        # which will also be wrapped in a LinearOperator for faster eigensolving.
        Hsub_csr_linop = Hsub.to_csr_linearoperator_fast(verbose=False)
        proj_end = time.perf_counter()
        print(f"  Operator projection took: {proj_end - proj_start:4f} seconds")

        total_bytes = Hsub_csr_linop.memory_size
        total_mega_bytes = total_bytes / (1024 * 1024)
        print(f"  CSR matrix memory: {total_mega_bytes:.6f} MBs")

        # Constructing a simple initial guess vector
        v0_start = time.perf_counter()
        diag_vec = Hsub.diagonal_vector()
        min_idx = np.where(diag_vec == diag_vec.min())[0]
        v0 = np.zeros(len(S), dtype=Hsub.dtype)
        v0[min_idx] = 1
        v0_end = time.perf_counter()
        print(
            f"  Initial guess vector v0 construction took: {v0_end - v0_start:.4f} seconds"
        )

        print("  Starting eigensolving ...")
        start = time.perf_counter()
        eigvals, eigvecs = spla.eigsh(
            Hsub_csr_linop,  # use `Hsub` for the matrix-free mode. Memory-efficient but slower.
            k=1,
            which="SA",
            tol=tol,
            v0=v0,
        )
        end = time.perf_counter()
        print(f"  Eigensolving took: {end - start:.4f} seconds")
        print(f"  Electronic Energy: {eigvals}")
        print(f"  Total Energy: {eigvals + nuclear_repulsion_energy}")

        eigvecs = eigvecs.ravel()

        # Subspace class has a method named `get_orbital_occupancies()` to compute
        # electron orbital occupancies from the eigenvector and subspace bitstrings.
        # The method accepts probabilities, and thus, we must do |amplitude| ^ 2.
        # Returns alpha and beta occupanices as a length-2 tuple in the order
        # ([a0 ... aN], [b0 ... bN]).
        current_occupancies = S.get_orbital_occupancies(
            probs=np.abs(eigvecs) ** 2, norb=num_orbitals
        )

        # Next, we get important bitstrings to include (carryover to) in the next round.
        # Important bitstrings are the ones with |amplitude| > threshold.
        # It is a must to provide |amplitude|, i.e., np.abs(eigenvector) to the function.
        # The function returns a list of length-2 tuples, where the first element of the tuple
        # is the full bitstrings, and the second element is its weight (|amplitude|).
        # The list is also sorted in the descending order of |amplitude|.
        carryover_full_strs_and_weights = get_carryover_full_strs(
            S, np.abs(eigvecs), carryover_threshold
        )

        carryover_full_strs = [item[0] for item in carryover_full_strs_and_weights]

        print(f"  num carryover full strs: {len(carryover_full_strs)}")

    iter_end = time.perf_counter()
    print(f"Iter {ni} took: {iter_end - iter_start:.4f} seconds\n")
ITERATION 0
  num total half strs:  75
  num selected half strs:  75
  Half strs construction took: 0.0001 seconds
  Subspace construction took: 0.001573 seconds
  Subspace dimension: 75 x 75 = 5_625
  Operator projection took: 0.064944 seconds
  CSR matrix memory: 0.287434 MBs
  Initial guess vector v0 construction took: 0.0000 seconds
  Starting eigensolving ...
  Eigensolving took: 0.0055 seconds
  Electronic Energy: [-32.60417101]
  Total Energy: [-108.83527355]
  num carryover full strs: 5
Iter 0 took: 0.0776 seconds

ITERATION 1
  num total half strs:  3867
  num selected half strs:  500
  Half strs construction took: 0.0031 seconds
  Subspace construction took: 0.069567 seconds
  Subspace dimension: 500 x 500 = 250_000
  Operator projection took: 2.525665 seconds
  CSR matrix memory: 115.867283 MBs
  Initial guess vector v0 construction took: 0.0003 seconds
  Starting eigensolving ...
  Eigensolving took: 0.2498 seconds
  Electronic Energy: [-32.73355104]
  Total Energy: [-108.96465358]
  num carryover full strs: 2229
Iter 1 took: 3.0796 seconds

ITERATION 2
  num total half strs:  3884
  num selected half strs:  500
  Half strs construction took: 0.0035 seconds
  Subspace construction took: 0.071565 seconds
  Subspace dimension: 500 x 500 = 250_000
  Operator projection took: 2.714844 seconds
  CSR matrix memory: 161.923939 MBs
  Initial guess vector v0 construction took: 0.0014 seconds
  Starting eigensolving ...
  Eigensolving took: 0.3775 seconds
  Electronic Energy: [-32.78512381]
  Total Energy: [-109.01622635]
  num carryover full strs: 4765
Iter 2 took: 3.4013 seconds

ITERATION 3
  num total half strs:  3884
  num selected half strs:  500
  Half strs construction took: 0.0042 seconds
  Subspace construction took: 0.068180 seconds
  Subspace dimension: 500 x 500 = 250_000
  Operator projection took: 2.727803 seconds
  CSR matrix memory: 196.800907 MBs
  Initial guess vector v0 construction took: 0.0004 seconds
  Starting eigensolving ...
  Eigensolving took: 0.4339 seconds
  Electronic Energy: [-32.79179802]
  Total Energy: [-109.02290056]
  num carryover full strs: 6098
Iter 3 took: 3.4597 seconds

ITERATION 4
  num total half strs:  3874
  num selected half strs:  500
  Half strs construction took: 0.0044 seconds
  Subspace construction took: 0.069365 seconds
  Subspace dimension: 500 x 500 = 250_000
  Operator projection took: 2.874993 seconds
  CSR matrix memory: 225.126499 MBs
  Initial guess vector v0 construction took: 0.0003 seconds
  Starting eigensolving ...
  Eigensolving took: 0.4042 seconds
  Electronic Energy: [-32.79448011]
  Total Energy: [-109.02558265]
  num carryover full strs: 7016
Iter 4 took: 3.5714 seconds