🚀 ParaCodex: A Profiling-Guided Autonomous Coding Agent for Reliable Parallel Code Generation and Translation
A comprehensive framework for translating benchmark code between serial and parallel implementations (OpenMP, CUDA) and between different parallel programming models using AI agents, with automated performance testing and correctness verification. Supports Rodinia, NAS, HeCBench, and custom benchmarks.
This repository implements a complete pipeline for:
- 🔄 Code Translation: Converting between serial C/C++ code and parallel implementations (OpenMP, CUDA) using AI agents (see the sketch after this list)
- 🔀 Cross-Parallel Translation: Translating between different parallel programming models (e.g., CUDA to OpenMP)
- ⚡ Performance Optimization: Multi-stage optimization with GPU offloading and profiling
- ✅ Correctness Verification: Automated testing to ensure numerical equivalence
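To make the translation direction concrete, here is a minimal hand-written sketch of the kind of serial-to-OpenMP transformation the pipeline targets; the `saxpy` kernel and all names in it are illustrative assumptions, not taken from any of the supported benchmarks:

```c
#include <stdio.h>
#include <stdlib.h>

/* Serial version: a plain loop, as it might appear in a reference kernel. */
void saxpy_serial(float a, const float *x, float *y, int n) {
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Translated version: the same loop offloaded to the GPU with OpenMP.
 * The map() clauses move the arrays between host and device memory. */
void saxpy_omp(float a, const float *x, float *y, int n) {
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void) {
    enum { N = 1 << 20 };
    float *x  = malloc(N * sizeof *x);
    float *y1 = malloc(N * sizeof *y1);
    float *y2 = malloc(N * sizeof *y2);
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y1[i] = y2[i] = 2.0f; }
    saxpy_serial(2.0f, x, y1, N);
    saxpy_omp(2.0f, x, y2, N);     /* build with, e.g., nvc -mp=gpu saxpy.c */
    printf("serial: %f, omp: %f\n", y1[0], y2[0]);  /* both should be 4.0 */
    free(x); free(y1); free(y2);
    return 0;
}
```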
```
paracodex/
├── pipeline/ # Core translation and optimization pipeline
│ ├── initial_translation_codex.py # Initial code translation using AI
│ ├── optimize_codex.py # Multi-stage optimization pipeline
│ ├── supervisor_codex.py # Correctness verification agent
│ ├── path_config.py # Path configuration and Codex CLI helpers
│ ├── SERIAL_OMP_PROMPTS.md # AI prompts for serial-to-OpenMP translation
│ ├── CUDA_PROMPTS.md # AI prompts for CUDA translation
│ ├── combined_serial_filenames.jsonl # Serial kernel listings
│ ├── combined_omp_filenames.jsonl # OpenMP kernel listings
│ ├── combined_cuda_filenames.jsonl # CUDA kernel listings
│ └── combined_omp_pareval_filenames.jsonl # ParEval benchmark listings
├── performance_testers/ # Performance testing and benchmarking tools
│ └── performance_comparison.py # Performance comparison utilities
├── utils/ # Utility scripts
│ └── clean_kernel_dirs.py # Cleanup utilities
├── workdirs/ # Working directories for different benchmarks
│ ├── serial_omp_rodinia_workdir/ # Rodinia benchmark workspace
│ │ ├── data/ # Source code and benchmarks (parallel versions)
│ │ │ └── src/ # Kernel directories (e.g., nw-omp, lud-omp)
│ │ ├── gate_sdk/ # GATE SDK for correctness verification
│ │ ├── golden_labels/ # Reference serial implementations
│ │ │ └── src/ # Serial kernel directories
│ │ └── serial_kernels_changedVars/ # Transformed serial kernels
│ │     └── src/ # Modified serial kernels for translation
│ ├── serial_omp_nas_workdir/ # NAS benchmark workspace
│ ├── serial_omp_hecbench_workdir/ # HeCBench workspace
│ └── cuda_omp_pareval_workdir/ # ParEval CUDA/OpenMP workspace
├── results/ # Results and performance data
├── setup_environment.sh # Main environment setup script
├── install_nvidia_hpc_sdk.sh # Automated NVIDIA HPC SDK installer
├── verify_environment.sh # Environment verification tool
├── kill_gpu_processes.py/sh # GPU process management utilities
└── requirements.txt # Python dependencies
```
- Multi-Agent Pipeline: Specialized AI agents for translation, optimization, and verification
- Serial-to-Parallel Translation: Converting serial code to OpenMP and CUDA implementations
- Cross-Parallel Translation: Translating between different parallel programming models
- Intelligent Analysis: Automatic hotspot identification and offload target selection
- GPU Offloading: Automatic translation to OpenMP with GPU acceleration
- CUDA Implementation: Direct CUDA kernel generation and optimization
- 2-Stage Process: Systematic optimization from correctness to performance (GPU offload + performance tuning)
- GPU Profiling: Integration with NVIDIA Nsight Systems (nsys) for detailed analysis
- Retry Mechanisms: Robust error handling with automatic retry logic
- Performance Tracking: Continuous monitoring of optimization progress
- Cyclic Optimization: Iterative refinement until target performance is achieved
- GATE SDK Integration: Automated numerical correctness checking
- Reference Comparison: Validation against golden reference implementations
- Supervisor Agent: AI-powered code repair and correctness enforcement
- Numerical Equivalence: Ensures translated code produces numerically equivalent results (see the sketch after this feature list)
- Comprehensive Benchmarking: CPU and GPU performance testing
- Performance Comparison: Side-by-side analysis of different implementations
- Results Visualization: JSON output for easy integration with analysis tools
- Automated Testing: Batch processing of multiple kernels and configurations
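The GATE SDK's actual interface is not reproduced here; the following hand-written sketch only illustrates the kind of element-wise tolerance comparison that numerical-equivalence checking performs between a translated kernel's output and the golden reference (the function name and tolerance scheme are assumptions, not GATE SDK defaults):

```c
#include <math.h>
#include <stdio.h>

/* Illustrative only: report whether a candidate array matches the golden
 * reference within relative/absolute tolerances. Returns 1 on success,
 * 0 on the first mismatch. */
int check_equivalence(const double *candidate, const double *golden,
                      int n, double rtol, double atol) {
    for (int i = 0; i < n; i++) {
        double diff = fabs(candidate[i] - golden[i]);
        if (diff > atol + rtol * fabs(golden[i])) {
            fprintf(stderr, "mismatch at index %d: %g vs %g\n",
                    i, candidate[i], golden[i]);
            return 0;
        }
    }
    return 1;
}
```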
- Node.js 22+ and npm: For Codex CLI
- Python 3.8+: For the pipeline scripts
- OpenAI Codex CLI: For AI agent interactions (`npm install -g @openai/codex`)
- OpenAI API Access:
  - Pro/Plus users: Log in directly with `codex login` (recommended)
  - API key users: Set the `OPENAI_API_KEY` environment variable
- NVIDIA HPC SDK: For OpenMP GPU offloading (nvc++ compiler)
- CUDA Toolkit: For CUDA development
- NVIDIA Nsight Systems (nsys): For GPU profiling
- NVIDIA GPU: With OpenMP offloading support
We provide automated setup scripts to streamline the installation process:
```bash
# Clone the repository
git clone <repository-url>
cd paracodex

# Make scripts executable
chmod +x setup_environment.sh install_nvidia_hpc_sdk.sh verify_environment.sh

# Step 1: Install NVIDIA HPC SDK (if not already installed)
sudo ./install_nvidia_hpc_sdk.sh

# Step 2: Run the main environment setup
./setup_environment.sh

# Step 3: Verify your installation
./verify_environment.sh
```

**1. install_nvidia_hpc_sdk.sh - NVIDIA HPC SDK Automated Installer**
This script automates the download and installation of NVIDIA HPC SDK v25.7:
- ✅ Downloads NVIDIA HPC SDK (~3GB)
- ✅ Verifies system requirements (10GB disk space, x86_64 architecture)
- ✅ Installs compilers (nvc++, nvfortran) with OpenMP GPU offload support
- ✅ Configures PATH and environment variables in `~/.bashrc`
- ✅ Tests OpenMP CPU and GPU offload support
⚠️ Requires sudo privileges
```bash
sudo ./install_nvidia_hpc_sdk.sh
source ~/.bashrc  # Activate the new environment
```

**2. setup_environment.sh - Main Environment Setup Script**
Checks and installs all ParaCodex dependencies:
- ✅ Check for Node.js v22+ and npm
- ✅ Install Codex CLI (`@openai/codex`) if missing
- ✅ Verify Python 3.8+ installation
- ✅ Install Python dependencies from `requirements.txt`
- ✅ Check for NVIDIA HPC SDK (nvc++)
- ✅ Check for Nsight Systems (nsys)
- ✅ Verify GPU accessibility
- ✅ Confirm OpenAI API key is set
- 📊 Provides summary of missing dependencies
```bash
./setup_environment.sh
```

**3. verify_environment.sh - Environment Verification Tool**
Comprehensive check of your ParaCodex environment:
- ✅ Verifies all required tools are installed and accessible
- ✅ Shows version information for each component
- ✅ Tests OpenMP support
- ✅ Checks GPU accessibility with detailed info
- ✅ Validates Python package installations
- 📊 Provides a summary of any missing dependencies
```bash
./verify_environment.sh
```

If you prefer manual installation or need to install specific components:

**Option A: Automated (Recommended)**

```bash
sudo ./install_nvidia_hpc_sdk.sh
source ~/.bashrc
```

**Option B: Manual**
- Download from: https://developer.nvidia.com/hpc-sdk
- Install version 25.7 for CUDA 12.6 support
- Add to PATH:

```bash
export PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/25.7/compilers/bin:$PATH
export LD_LIBRARY_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/25.7/compilers/lib:$LD_LIBRARY_PATH
export MANPATH=/opt/nvidia/hpc_sdk/Linux_x86_64/25.7/compilers/man:$MANPATH
```

- Add to `~/.bashrc` for persistence
```bash
# Using nvm (recommended)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
source ~/.bashrc
nvm install 22

# Or using apt
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt-get install -y nodejs
```

Then install the Codex CLI:

```bash
npm install -g @openai/codex
```

**Option 1: Login with OpenAI Pro Account (Recommended)**
```bash
# Interactive login (opens browser for authentication)
codex login

# Or login with API key from environment
echo $OPENAI_API_KEY | codex login --with-api-key

# Verify login status
codex login status
```

**Option 2: Use API Key Environment Variable**
```bash
export OPENAI_API_KEY='your-api-key-here'

# Add to ~/.bashrc for persistence
echo "export OPENAI_API_KEY='your-api-key-here'" >> ~/.bashrc
```

Install the Python dependencies:

```bash
pip install -r requirements.txt
```

For NVIDIA Nsight Systems (nsys):

- Download from: https://developer.nvidia.com/nsight-systems
- Or use a package manager if available
After setup, verify your environment is correctly configured:
```bash
./verify_environment.sh
```

This will check and display:
- ✅ Node.js and npm versions
- ✅ Codex CLI installation
- ✅ Python version and key packages (openai, numpy, torch, matplotlib)
- ✅ NVIDIA HPC SDK (nvc++) version
- ✅ Nsight Systems (nsys) installation
- ✅ CUDA toolkit (optional)
- ✅ GPU information (name, driver, memory)
- ✅ OpenMP support
- ✅ OpenAI API key configuration
Expected output for a properly configured system:

```
✅ All core dependencies are installed and configured
```

If you see missing dependencies:

```
❌ X core dependencies are missing
Run './setup_environment.sh' or see installation instructions
```
Manual verification commands:

```bash
codex --version
python3 --version
nvc++ --version
nsys --version
nvidia-smi
echo $OPENAI_API_KEY
```

Key Python packages (see `requirements.txt` for the full list):

- `openai>=2.8.1` - OpenAI API client
- `pandas>=2.3.3` - Data processing
- `numpy>=2.1.2` - Numerical computing
- `matplotlib>=3.10.6` - Visualization
- `seaborn>=0.13.2` - Statistical visualization
- `tqdm>=4.67.1` - Progress bars
- NVIDIA CUDA libraries for GPU support
After installation, ensure your working directory is properly configured:
```bash
# The workdirs contain benchmark-specific source code and configurations
# For Rodinia benchmarks, use workdirs/serial_omp_rodinia_workdir/

# Ensure proper directory structure:
# - data/src/ containing parallel kernel directories (e.g., nw-omp, lud-omp)
# - golden_labels/src/ containing serial reference implementations
# - serial_kernels_changedVars/src/ containing transformed serial kernels (optional)

# Ensure a jsonl file with kernel names is present in pipeline/combined_*_filenames.jsonl
# Ensure golden_labels/src exists if you want to run the supervisor agent
```

ParaCodex requires you to specify the working directory path when running pipeline scripts. Here are the paths you need to configure:
**1. Codex Working Directory (`--codex-workdir`)**
This is the main path you need to set when running translation scripts. It points to the benchmark-specific workdir:
```bash
# For Rodinia benchmarks
--codex-workdir /path/to/paracodex/workdirs/serial_omp_rodinia_workdir/

# For NAS benchmarks
--codex-workdir /path/to/paracodex/workdirs/serial_omp_nas_workdir/

# For HeCBench benchmarks
--codex-workdir /path/to/paracodex/workdirs/serial_omp_hecbench_workdir/

# For ParEval benchmarks
--codex-workdir /path/to/paracodex/workdirs/cuda_omp_pareval_workdir/
```

**Alternative: Environment Variable**
You can also set the `CODEX_WORKDIR` environment variable instead of using the `--codex-workdir` flag:
```bash
export CODEX_WORKDIR=/path/to/paracodex/workdirs/serial_omp_rodinia_workdir/
python pipeline/initial_translation_codex.py --source-api serial --target-api omp
```

**2. Makefile Paths (`GATE_ROOT`)**
The Makefiles in each kernel directory use `GATE_ROOT` to locate the GATE SDK for correctness checking. It is set automatically using a relative path:

```make
GATE_ROOT ?= $(abspath ../..)
```

This means `GATE_ROOT` is resolved relative to the kernel directory, landing two levels up at the workdir root (e.g., `workdirs/serial_omp_rodinia_workdir/`). You typically don't need to change this unless you have a custom directory structure.
If you need to override it, you can set it when running make:
```bash
make -f Makefile.nvc GATE_ROOT=/custom/path/to/workdir
```

**3. Output Directories**
Translation outputs are saved to directories relative to the pipeline directory:
- Rodinia: `pipeline/rodinia_outputs/`
- NAS: `pipeline/nas_outputs/`
- HeCBench: `pipeline/hecbench_outputs/`
- ParEval: `pipeline/pareval_outputs/`
These paths are automatically determined based on the workdir name. You don't need to configure these.
Quick Path Setup Example:
```bash
# Set your repository path
export PARACODEX_ROOT="/path/to/paracodex"

# For Rodinia benchmarks
python pipeline/initial_translation_codex.py \
  --codex-workdir $PARACODEX_ROOT/workdirs/serial_omp_rodinia_workdir/ \
  --source-api serial \
  --target-api omp \
  --kernels nw
```

Translate Rodinia benchmarks from serial to OpenMP:
```bash
# To translate all the kernels in the jsonl file, run without --kernels
# Serial to OpenMP for Rodinia benchmarks
python pipeline/initial_translation_codex.py \
  --codex-workdir /path/to/paracodex/workdirs/serial_omp_rodinia_workdir/ \
  --source-api serial \
  --target-api omp \
  --kernels nw,srad,lud
```

Note: The `--codex-workdir` flag specifies the working directory containing your benchmark kernels. The output is saved to `pipeline/rodinia_outputs/` (or the appropriate benchmark output directory) by default.
```bash
# Serial to OpenMP with optimization
python pipeline/initial_translation_codex.py \
  --codex-workdir /path/to/paracodex/workdirs/serial_omp_rodinia_workdir/ \
  --source-api serial \
  --target-api omp \
  --optimize
```

```bash
# Serial to OpenMP with supervision for correctness verification
python pipeline/initial_translation_codex.py \
  --codex-workdir /path/to/paracodex/workdirs/serial_omp_rodinia_workdir/ \
  --source-api serial \
  --target-api omp \
  --supervise
```

This will create `initial_supervised_*` files in the output directory, including:

- `initial_supervised_nsys_output.txt` - Full nsys profiling output
- `initial_supervised_nsys_relevant.txt` - Extracted GPU performance metrics
- `initial_supervised_compilation.txt` - Compilation logs
- `initial_supervised_output.txt` - Execution output
```bash
# Serial to OpenMP with optimization and supervision after optimization steps
python pipeline/initial_translation_codex.py \
  --codex-workdir /path/to/paracodex/workdirs/serial_omp_rodinia_workdir/ \
  --source-api serial \
  --target-api omp \
  --optimize \
  --supervise \
  --opt-supervisor-steps 2
```

This will:
- Perform initial translation
- Run supervision (correctness verification)
- Run optimization steps (step1, step2)
- Run supervision after the specified optimization steps (creates a `step2_supervised/` directory)
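As a flavor of what an optimization step typically changes, here is a hand-written before/after sketch of one common performance-tuning transformation, hoisting per-iteration data transfers into a persistent device data region; the stencil kernel is a hypothetical example, not taken from the pipeline's transcripts:

```c
/* Before (correct but slow): both arrays are copied to and from the
 * device on every outer iteration. */
void relax_naive(float *a, float *b, int n, int iters) {
    for (int t = 0; t < iters; t++) {
        #pragma omp target teams distribute parallel for \
            map(tofrom: a[0:n], b[0:n])
        for (int i = 1; i < n - 1; i++)
            b[i] = 0.5f * (a[i - 1] + a[i + 1]);
        for (int i = 1; i < n - 1; i++)  /* copy-back on the host */
            a[i] = b[i];
    }
}

/* After: a target data region keeps both arrays resident on the device
 * across all iterations; only the final result is copied back at the end. */
void relax_tuned(float *a, float *b, int n, int iters) {
    #pragma omp target data map(tofrom: a[0:n]) map(alloc: b[0:n])
    {
        for (int t = 0; t < iters; t++) {
            #pragma omp target teams distribute parallel for
            for (int i = 1; i < n - 1; i++)
                b[i] = 0.5f * (a[i - 1] + a[i + 1]);
            #pragma omp target teams distribute parallel for
            for (int i = 1; i < n - 1; i++)
                a[i] = b[i];
        }
    }
}
```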
```
pipeline/rodinia_outputs/
├── {kernel_name}-{target_api}/ # Per-kernel results (e.g., nw-omp, lud-omp)
│ ├── compilation_result.txt # Initial compilation result
│ ├── initial_compilation.txt # Initial compilation result
│ ├── initial_transcript.txt # Initial translation transcript
│ ├── initial_transcript_summary.txt # Summary of initial translation
│ ├── {file}_initial.c # Initial translated code (root level)
│ ├── initial/ # Initial translation directory
│ │ └── {file}.c # Initial translated code
│ ├── initial_supervised_nsys_output.txt # Full nsys profiling output (if --supervise)
│ ├── initial_supervised_nsys_relevant.txt # Extracted GPU metrics (if --supervise)
│ ├── initial_supervised_compilation.txt # Compilation logs (if --supervise)
│ ├── initial_supervised_output.txt # Execution output (if --supervise)
│ ├── initial_correct/ # After supervisor correction
│ │ └── {file}.c # Supervised initial code
│ ├── step1/ # Optimization step 1
│ │ ├── {file}.c # Code after step 1
│ │ ├── transcript.txt # AI agent transcript
│ │ ├── transcript_summary.txt # Transcript summary
│ │ ├── nsys_output.txt # Full nsys profiling output
│ │ └── nsys_relevant.txt # Extracted relevant nsys metrics
│ ├── step2/ # Optimization step 2
│ │ ├── {file}.c # Code after step 2
│ │ ├── transcript.txt # AI agent transcript
│ │ ├── transcript_summary.txt # Transcript summary
│ │ ├── nsys_output.txt # Full nsys profiling output
│ │ └── nsys_relevant.txt # Extracted relevant nsys metrics
│ ├── step2_supervised/ # Supervision after step 2 (if --opt-supervisor-steps 2)
│ │ ├── {file}.c # Supervised code after step 2
│ │ ├── supervised_nsys_output.txt # Full nsys profiling output
│ │ └── supervised_nsys_relevant.txt # Extracted GPU metrics
│ └── optimized/ # Final optimized code
│     └── {file}.c # Final optimized code
├── {kernel2_name}-{target_api}/ # Results for second kernel
│ └── [same structure as above]
└── [additional kernels...] # Results for other kernels
```
Key Artifacts Explained:
- Source Code Snapshots: Versioned code at each optimization stage (in the `initial/`, `step1/`, `step2/`, and `optimized/` directories)
- Transcripts: AI agent conversations and decision logs (`initial_transcript.txt`, `step*/transcript.txt`)
- Nsys Outputs: GPU profiling data (`step*/nsys_output.txt`, `initial_supervised_nsys_output.txt`) and extracted metrics (`step*/nsys_relevant.txt`, `initial_supervised_nsys_relevant.txt`)
- Supervised Files: Correctness-verified code and performance metrics from the supervision phase
```bash
python performance_testers/performance_comparison.py \
  --candidate_dir <your path to the parent directory of translated code> \
  --reference_dir <parent directory of golden label parallel code> \
  --output <output directory for generated artifacts>
```

Example for Rodinia:

```bash
python performance_testers/performance_comparison.py \
  --candidate_dir /path/to/paracodex/pipeline/rodinia_outputs \
  --reference_dir /path/to/paracodex/workdirs/serial_omp_rodinia_workdir/data/src \
  --output /path/to/paracodex/results/perf_rodinia_nsys
```

The framework supports the following translation paths across multiple benchmark suites:
- Serial → OpenMP: Converting serial code to OpenMP with GPU offloading (primary use case for Rodinia, NAS, HeCBench)
- Serial → CUDA: Converting serial code to CUDA kernels
- OpenMP → CUDA: Translating OpenMP code to CUDA implementations
- CUDA → OpenMP: Converting CUDA kernels to OpenMP with GPU offloading (ParEval benchmarks; see the sketch below)
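To make the CUDA → OpenMP direction concrete, here is a hand-written sketch (the CUDA original is shown in a comment) of how a typical thread-indexed kernel maps onto OpenMP target offload; the `scale` kernel is an illustrative assumption, not a ParEval problem:

```c
/* CUDA original, for reference:
 *
 *   __global__ void scale(float *v, float s, int n) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) v[i] *= s;
 *   }
 *   // launch: scale<<<(n + 255) / 256, 256>>>(d_v, s, n);
 *
 * OpenMP translation: the explicit grid/block index computation disappears;
 * `teams distribute parallel for` spreads the iterations across the GPU,
 * and map() replaces the cudaMalloc/cudaMemcpy boilerplate. */
void scale_omp(float *v, float s, int n) {
    #pragma omp target teams distribute parallel for map(tofrom: v[0:n])
    for (int i = 0; i < n; i++)
        v[i] *= s;
}
```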
For Rodinia benchmarks, the typical workflow is:
- Prepare serial kernels: Place serial reference implementations in `workdirs/serial_omp_rodinia_workdir/golden_labels/src/{kernel}-serial/`
- Optional transformations: Apply variable renaming, comment stripping, and reordering to create `serial_kernels_changedVars/src/{kernel}-serial/`
- Run translation: Use `initial_translation_codex.py` with `--codex-workdir` pointing to `workdirs/serial_omp_rodinia_workdir/`
- Verify correctness: Use the `--supervise` flag to run GATE SDK correctness checks
- Optimize performance: Use the `--optimize` flag for multi-stage GPU optimization
- Review metrics: Check the `*_nsys_relevant.txt` files for GPU performance metrics
For NAS benchmarks, use `workdirs/serial_omp_nas_workdir/` as the working directory with similar workflow steps.
For HeCBench benchmarks, use `workdirs/serial_omp_hecbench_workdir/` as the working directory.
For ParEval CUDA/OpenMP translation, use `workdirs/cuda_omp_pareval_workdir/` as the working directory.
ParaCodex includes several utility scripts to simplify environment management and troubleshooting:
Main environment setup and dependency checker. Checks for all required tools and installs missing Python dependencies.
Usage:

```bash
./setup_environment.sh
```

What it does:
- Checks Node.js, npm, and Codex CLI
- Installs Codex CLI if npm is available
- Verifies Python and pip
- Installs Python packages from `requirements.txt`
- Checks NVIDIA HPC SDK, Nsight Systems, and GPU
- Validates OpenAI API key
- Provides summary of missing dependencies
Automated installer for NVIDIA HPC SDK v25.7 with OpenMP GPU offload support.
Usage:

```bash
sudo ./install_nvidia_hpc_sdk.sh
source ~/.bashrc
```

Features:
- Downloads NVIDIA HPC SDK v25.7 (~3GB)
- Checks system requirements (10GB disk, x86_64 arch)
- Installs to `/opt/nvidia/hpc_sdk`
- Configures environment variables automatically
- Tests OpenMP CPU and GPU offload support
- Colorful progress indicators
Requirements: sudo access, 10GB free disk space
Comprehensive environment verification tool that checks all dependencies and displays detailed version information.
Usage:

```bash
./verify_environment.sh
```

Checks:
- Node.js, npm, Codex CLI versions
- Python version and key packages
- NVIDIA HPC SDK (nvc++) and OpenMP support
- Nsight Systems and CUDA toolkit
- GPU info (name, driver, memory)
- OpenAI API key configuration
Utilities to terminate GPU processes when needed (e.g., clearing hung processes during development).
Usage:

```bash
./kill_gpu_processes.sh
# or
python3 kill_gpu_processes.py
```

**clean_kernel_dirs.py**

Cleans up generated files in kernel directories during development.
Usage:

```bash
python3 utils/clean_kernel_dirs.py --help
```

For questions and support:
- Create an issue in the repository
- Check the prompts documentation:
  - `pipeline/SERIAL_OMP_PROMPTS.md` for serial-to-OpenMP translation
  - `pipeline/CUDA_PROMPTS.md` for CUDA-related translations
Using the automated scripts:
If `setup_environment.sh` reports missing dependencies, install them as indicated. Common issues:
- **NVIDIA HPC SDK not found:**

  ```bash
  sudo ./install_nvidia_hpc_sdk.sh
  source ~/.bashrc
  ```

  If installation fails:
  - Check disk space (need 10GB+)
  - Verify you have sudo privileges
  - Check internet connection (downloads ~3GB)
  - Manually download from https://developer.nvidia.com/hpc-sdk

- **Codex CLI not found:**

  ```bash
  npm install -g @openai/codex
  ```

  If this fails, ensure Node.js and npm are installed first.

- **OpenAI API key not set:**

  ```bash
  export OPENAI_API_KEY='your-api-key'
  echo "export OPENAI_API_KEY='your-api-key'" >> ~/.bashrc
  source ~/.bashrc
  ```

- **GPU not accessible:**
  - Check with `nvidia-smi`
  - Install NVIDIA drivers if needed: `sudo apt install nvidia-driver-XXX`
  - Verify CUDA compatibility
  - Reboot after driver installation

- **nvc++ installed but not in PATH:**

  ```bash
  export PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/25.7/compilers/bin:$PATH
  export LD_LIBRARY_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/25.7/compilers/lib:$LD_LIBRARY_PATH
  echo 'export PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/25.7/compilers/bin:$PATH' >> ~/.bashrc
  source ~/.bashrc
  ```

- **Python package installation fails:**

  ```bash
  pip install --upgrade pip
  pip install -r requirements.txt
  ```

  If specific packages fail, install them individually:

  ```bash
  pip install openai pandas numpy matplotlib seaborn tqdm
  ```
- Compilation errors: Ensure NVIDIA HPC SDK is properly installed and in PATH
- Permission denied on scripts: Run `chmod +x setup_environment.sh`
- Python import errors: Run `pip install -r requirements.txt`
Note: This framework is designed for research and development purposes with multiple benchmark suites. Ensure you have appropriate hardware (NVIDIA GPU with OpenMP offloading support) and software (NVIDIA HPC SDK, CUDA Toolkit) before use. The framework supports translation between serial, OpenMP, and CUDA programming models with automated correctness verification and performance optimization.
Supported Benchmarks: The framework is configured for multiple benchmark suites:
- Rodinia: nw (Needleman-Wunsch), srad (Speckle Reducing Anisotropic Diffusion), lud (LU Decomposition), b+tree, backprop, bfs, hotspot, and others
- NAS Parallel Benchmarks: Scientific computing kernels
- HeCBench: Heterogeneous computing benchmarks
- ParEval: Parallel evaluation benchmarks for CUDA/OpenMP translation