# HPC-Prefect: Portable HPC Workflow Orchestration
HPC-Prefect is a Python framework that enables portable workflow orchestration across multiple HPC systems (Fugaku, Miyabi, and Slurm) using Prefect. Write your workflow once and run it on any supported HPC system without modification.
## Core Concept
HPC-Prefect separates execution intent from execution environment by introducing a three-layer block architecture:
```mermaid
flowchart TD
    A[Workflow Code<br/>algorithm logic + parameters] --> B[CommandBlock<br/>WHAT to run]
    B --> C[ExecutionProfileBlock<br/>HOW to run]
    C --> D[HPCProfileBlock<br/>WHERE to run]
    D --> E[Executor<br/>submits to qsub/pjsub/sbatch]

    style A fill:#e1f5ff
    style B fill:#fff4e1
    style C fill:#ffe1f5
    style D fill:#e1ffe1
    style E fill:#f5e1ff
```
This architecture allows:

- Workflow portability: the same workflow code runs on different HPC systems
- Centralized expertise: HPC administrators encode best practices in reusable blocks
- User flexibility: users can tune resources without understanding system details
## Project Structure
This is a monorepo workspace containing four core packages:
```
qcsc-prefect/
├── packages/
│   ├── qcsc-prefect-core/      # Core models (ExecutionProfile)
│   ├── qcsc-prefect-blocks/    # Prefect Block definitions
│   ├── qcsc-prefect-adapters/  # HPC-specific job builders & runtimes
│   └── qcsc-prefect-executor/  # High-level execution API
├── examples/
└── docs/
    └── concept.md
```
## Package Overview
### qcsc-prefect-core
Core data models and resolution logic. Defines `ExecutionProfile`, which represents execution intent independent of any specific HPC system.
### qcsc-prefect-blocks
Prefect Block definitions for the three-layer architecture:
- `CommandBlock`: Defines WHAT to execute (command name, executable key)
- `ExecutionProfileBlock`: Defines HOW to execute (nodes, MPI ranks, walltime, modules)
- `HPCProfileBlock`: Defines WHERE to execute (queue, project/group, system-specific settings)
### qcsc-prefect-adapters
HPC system-specific adapters that handle job script generation and submission:
- `miyabi`: PBS/Torque adapter for Miyabi
- `fugaku`: PJM adapter for Fugaku
- `slurm`: Slurm adapter for generic Slurm clusters

Each adapter provides:

- Job script templates rendered with Jinja2
- Runtime classes for job submission, monitoring, and cancellation
For local Slurm testing with Docker, see `docs/howto/howto_test_slurm_with_docker_cluster.md`.
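The job-script generation step can be sketched with the stdlib `string.Template` standing in for Jinja2; the template content, directive layout, and variable names below are illustrative assumptions, not the adapters' actual templates:

```python
from string import Template

# Hypothetical PBS-style job script template. The real adapters ship
# Jinja2 templates; this stdlib stand-in only illustrates the idea.
PBS_TEMPLATE = Template("""\
#!/bin/bash
#PBS -q ${queue}
#PBS -l select=${num_nodes}:mpiprocs=${mpiprocs}
#PBS -l walltime=${walltime}
cd $$PBS_O_WORKDIR
${module_lines}
${launcher} ${executable} ${arguments}
""")

def render_pbs_script(profile: dict) -> str:
    """Fill the template from a resolved execution profile."""
    return PBS_TEMPLATE.substitute(
        queue=profile["queue"],
        num_nodes=profile["num_nodes"],
        mpiprocs=profile["mpiprocs"],
        walltime=profile["walltime"],
        module_lines="\n".join(f"module load {m}" for m in profile["modules"]),
        launcher=profile["launcher"],
        executable=profile["executable"],
        arguments=" ".join(profile["arguments"]),
    )

script = render_pbs_script({
    "queue": "regular-c",
    "num_nodes": 2,
    "mpiprocs": 8,
    "walltime": "01:00:00",
    "modules": ["intel/2023.2.0", "impi/2021.10.0"],
    "launcher": "mpiexec.hydra",
    "executable": "/path/to/simulation",
    "arguments": ["--input", "data.txt"],
})
print(script)
```

In the real adapters, the values come from the resolved blocks rather than a plain dict, and the template is selected per scheduler (PBS, PJM, or Slurm).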
### qcsc-prefect-executor
High-level execution API that orchestrates the entire workflow:
- `run_job_from_blocks()`: Main entry point for block-based execution
- Scheduler resolution helpers such as `resolve_submission_target()` and `build_scheduler_script_filename()`
- System-specific runners: `run_miyabi_job()`, `run_fugaku_job()`
- Automatic block resolution and job lifecycle management
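The idea behind the scheduler resolution helpers can be sketched as a mapping from a logical script stem to a scheduler-specific filename. The suffix table and function below are hypothetical; the actual `build_scheduler_script_filename()` in `qcsc-prefect-executor` may behave differently:

```python
# Hypothetical suffix mapping per HPC target (an assumption for
# illustration; the real helper's suffixes may differ).
SCHEDULER_SUFFIX = {"miyabi": ".pbs", "fugaku": ".pjm", "slurm": ".sbatch"}

def build_script_filename(stem: str, hpc_target: str) -> str:
    """Append the scheduler-specific suffix for the given target."""
    return f"{stem}{SCHEDULER_SUFFIX[hpc_target]}"

print(build_script_filename("simulation", "miyabi"))
print(build_script_filename("simulation", "fugaku"))
```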
## Quick Start
### 1. Installation
```bash
# Clone the repository
git clone <repository-url>
cd qcsc-prefect

# Install dependencies using uv (recommended)
uv sync

# Or using pip
pip install -e packages/qcsc-prefect-core
pip install -e packages/qcsc-prefect-blocks
pip install -e packages/qcsc-prefect-adapters
pip install -e packages/qcsc-prefect-executor
```
### 2. Register Block Types
```bash
# Register blocks with Prefect
uv run prefect block register -m qcsc_prefect_blocks.common.blocks
```
### 3. Create Blocks
Create blocks programmatically or via the Prefect UI. An example for Miyabi:
```python
from qcsc_prefect_blocks.common.blocks import (
    CommandBlock,
    ExecutionProfileBlock,
    HPCProfileBlock,
)

# Define WHAT to run
cmd = CommandBlock(
    command_name="my-simulation",
    executable_key="simulation_binary",
    description="My HPC simulation",
)
cmd.save("cmd-my-simulation", overwrite=True)

# Define HOW to run
exec_profile = ExecutionProfileBlock(
    profile_name="simulation-mpi-16",
    command_name="my-simulation",
    resource_class="cpu",
    num_nodes=2,
    mpiprocs=8,
    walltime="01:00:00",
    launcher="mpiexec.hydra",
    modules=["intel/2023.2.0", "impi/2021.10.0"],
)
exec_profile.save("exec-simulation-mpi-16", overwrite=True)

# Define WHERE to run (Miyabi-specific)
hpc_profile = HPCProfileBlock(
    hpc_target="miyabi",
    queue_cpu="regular-c",
    queue_gpu="regular-g",
    project_cpu="your-project-id",
    project_gpu="your-project-id",
    executable_map={"simulation_binary": "/path/to/simulation"},
)
hpc_profile.save("hpc-miyabi", overwrite=True)
```
### 4. Run Your Workflow
```python
import asyncio

from prefect import flow
from qcsc_prefect_executor.from_blocks import (
    build_scheduler_script_filename,
    run_job_from_blocks,
)

@flow
async def my_workflow():
    result = await run_job_from_blocks(
        command_block_name="cmd-my-simulation",
        execution_profile_block_name="exec-simulation-mpi-16",
        hpc_profile_block_name="hpc-miyabi",
        work_dir="./work/my-simulation",
        script_filename=build_scheduler_script_filename("my_simulation", "miyabi"),
        user_args=["--input", "data.txt"],
    )
    return result

# Run the workflow
asyncio.run(my_workflow())
```
## Design Principles
### 1. Separation of Concerns
- Workflow developers focus on algorithm logic
- HPC administrators encode system expertise in blocks
- Users select appropriate profiles and tune as needed
### 2. Portability
The same workflow code runs on different HPC systems by switching the
appropriate execution/target profile pair. In many real workflows,
launcher, modules, and other execution settings differ across systems or
CPU/GPU routes, so both ExecutionProfileBlock and HPCProfileBlock may
change together. When the execution recipe is truly portable, the same
ExecutionProfileBlock can still be reused across multiple targets.
The workflow can keep a logical script stem and let `build_scheduler_script_filename()` choose the scheduler-specific suffix:
```python
from qcsc_prefect_executor.from_blocks import (
    build_scheduler_script_filename,
    run_job_from_blocks,
)

# Run on Miyabi
result = await run_job_from_blocks(
    command_block_name="cmd-simulation",
    execution_profile_block_name="exec-simulation-miyabi",
    hpc_profile_block_name="hpc-miyabi",
    work_dir="./work/simulation",
    script_filename=build_scheduler_script_filename("simulation", "miyabi"),
)

# Run on Fugaku (same workflow code!)
result = await run_job_from_blocks(
    command_block_name="cmd-simulation",
    execution_profile_block_name="exec-simulation-fugaku",
    hpc_profile_block_name="hpc-fugaku",
    work_dir="./work/simulation",
    script_filename=build_scheduler_script_filename("simulation", "fugaku"),
)
```
### 3. Centralized Expertise
HPC administrators create and maintain execution profiles that encode:

- Optimal resource configurations
- Required modules and environment variables
- MPI launcher settings and options
- System-specific best practices
Users benefit from this expertise without needing deep HPC knowledge.
### 4. Controlled Flexibility
Users can keep workflow code stable while changing behavior by:
- Switching block instances (`execution_profile_block_name`, `hpc_profile_block_name`)
- Passing command-line arguments via `user_args`
- Preparing multiple execution profiles (for example, small/large scale) and selecting one at runtime
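Runtime profile selection can be as simple as computing block names before calling the executor. A minimal sketch, in which the profile block names follow the Quick Start convention but are otherwise assumptions:

```python
def select_profiles(scale: str, target: str) -> dict:
    """Pick block names for a run based on a scale hint and HPC target.

    The block names here ("exec-simulation-mpi-16"/"-64", "hpc-<target>")
    are hypothetical; use whatever names your administrators registered.
    """
    profile = (
        "exec-simulation-mpi-64" if scale == "large" else "exec-simulation-mpi-16"
    )
    return {
        "execution_profile_block_name": profile,
        "hpc_profile_block_name": f"hpc-{target}",
    }

# The returned dict can be splatted into run_job_from_blocks(**kwargs, ...).
kwargs = select_profiles("large", "fugaku")
print(kwargs)
```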
## Supported HPC Systems
| System | Scheduler | Status | Adapter Module |
|---|---|---|---|
| Miyabi | PBS/Torque | Supported | `qcsc_prefect_adapters.miyabi` |
| Fugaku | PJM | Supported | `qcsc_prefect_adapters.fugaku` |
| Slurm | Slurm | Supported | `qcsc_prefect_adapters.slurm` |
## Architecture Details
### Block Resolution Flow
```mermaid
sequenceDiagram
    participant User as Workflow Code
    participant Executor as run_job_from_blocks
    participant Blocks as Prefect Blocks
    participant Adapter as HPC Adapter
    participant HPC as HPC System

    User->>Executor: Call with block names
    Executor->>Blocks: Load CommandBlock
    Executor->>Blocks: Load ExecutionProfileBlock
    Executor->>Blocks: Load HPCProfileBlock
    Executor->>Executor: Build ExecutionProfile
    Executor->>Executor: Merge default_args and user_args
    Executor->>Adapter: Create job request
    Adapter->>Adapter: Generate job script (Jinja2)
    Adapter->>HPC: Submit job (qsub/pjsub)
    HPC-->>Adapter: Job ID
    Adapter->>HPC: Poll status
    HPC-->>Adapter: Job completed
    Adapter-->>Executor: Job result
    Executor-->>User: Return result
```
### Execution Profile Model
The `ExecutionProfile` dataclass is the central data model representing execution intent:
```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class ExecutionProfile:
    command_key: str
    num_nodes: int
    mpiprocs: int
    ompthreads: int | None
    walltime: str
    launcher: Literal["single", "mpirun", "mpiexec", "mpiexec.hydra"]
    mpi_options: list[str]
    modules: list[str]
    environments: dict[str, str]
    arguments: list[str]
```
This model is system-agnostic and gets translated to system-specific job requests by adapters.
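To show how the block fields from the Quick Start map onto this model, here is a hand-built instance (the dataclass is redefined so the snippet runs standalone; the field values mirror the Quick Start example, and the mapping itself is an assumption about how the executor resolves blocks):

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Literal

@dataclass
class ExecutionProfile:  # redefined here so the snippet is self-contained
    command_key: str
    num_nodes: int
    mpiprocs: int
    ompthreads: int | None
    walltime: str
    launcher: Literal["single", "mpirun", "mpiexec", "mpiexec.hydra"]
    mpi_options: list[str]
    modules: list[str]
    environments: dict[str, str]
    arguments: list[str]

# Values taken from the Quick Start blocks: executable_key from the
# CommandBlock, resources from the ExecutionProfileBlock, and user_args
# merged into arguments.
profile = ExecutionProfile(
    command_key="simulation_binary",
    num_nodes=2,
    mpiprocs=8,
    ompthreads=None,
    walltime="01:00:00",
    launcher="mpiexec.hydra",
    mpi_options=[],
    modules=["intel/2023.2.0", "impi/2021.10.0"],
    environments={},
    arguments=["--input", "data.txt"],
)
print(profile.launcher, profile.num_nodes)
```

An adapter then reads only these fields when generating its scheduler-specific job request, which is what keeps the model system-agnostic.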