DiffDock-NIM

Tutorial to run DiffDock NIM on HiPerGator

DiffDock is a state-of-the-art generative model for drug discovery. It predicts the three-dimensional structure of a protein-ligand complex, that is, how a small-molecule ligand binds to a protein. This task, known as molecular docking or pose prediction, is a crucial step in the drug discovery process.

Features

Supports AI drug discovery pipelines and opens new research avenues for downstream task integrations.
Highly accurate and computationally efficient.
Fast inference, with confidence estimates that have high selective accuracy.

Prerequisites

Hardware:
Supported GPUs: Hopper (e.g., H100), Ampere (e.g., A100, A6000), Ada Lovelace (e.g., L40S), Volta (e.g., V100)
Minimum GPU memory: 16 GB
Storage:
Minimum driver version: 535.104.05

Launch DiffDock on HPG

Go to Open OnDemand (OOD) and launch the HiPerGator Desktop.

Note: Remember to update the SLURM Account and QoS to match your group, and adjust the job time accordingly.

Start a terminal and run the following commands:

```
mkdir -p /blue/groupname/gatorlink/.cache/nim/diffdock  # Run only the first time
export LOCAL_NIM_CACHE=/blue/groupname/gatorlink/.cache/nim/diffdock
mkdir -p /blue/groupname/gatorlink/.cache/nim/diffdock/workspace # Run only the first time
export LOCAL_WORKSPACE=/blue/groupname/gatorlink/.cache/nim/diffdock/workspace
ml diffdock-nim
diffdock
start_server
```

Open a new terminal and run the following command to check the status of the API. Repeat it until it returns true. This can take a couple of minutes.
```
curl localhost:8000/v1/health/ready
```
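Repeating the check by hand can be automated. The sketch below polls the same readiness URL from Python; it assumes that an HTTP 200 response corresponds to the `true` answer described above, and the function names, the 10-second interval, and the 10-minute limit are illustrative choices, not part of the NIM.

```python
import time
import urllib.error
import urllib.request


def is_ready(url="http://localhost:8000/v1/health/ready", timeout=5.0):
    """Return True if the readiness endpoint answers with HTTP 200.

    ASSUMPTION: a 200 response means the service reports itself as ready.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status:
        # treat all of these as "still starting".
        return False


def wait_until_ready(probe=is_ready, max_wait=600.0, interval=10.0, sleep=time.sleep):
    """Call probe() every `interval` seconds until it succeeds or max_wait elapses."""
    waited = 0.0
    while True:
        if probe():
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

Calling `wait_until_ready()` from a Python session in the second terminal returns `True` once the server responds, or `False` if it is still not ready after the time limit.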

Running Inference

Open a new terminal.
Keep the original terminal running with the launched service.
Navigate to the directory where you want to run the job.
```
cd /blue/groupname/gatorlink/...
```

Prepare the JSON-formatted POST data.
This step must be run in a bash shell. To check whether the current session is bash, run echo $0. If it is not, run /bin/bash before continuing.

```
protein_bytes=`curl https://files.rcsb.org/download/8G43.pdb | grep -E '^ATOM' | sed -z 's/\n/\\\n/g'`; \
ligand_bytes=`curl https://files.rcsb.org/ligands/download/ZU6_ideal.sdf | sed -z 's/\n/\\\n/g'`; \
echo "{
\"ligand\": \"${ligand_bytes}\",
\"ligand_file_type\": \"sdf\",
\"protein\": \"${protein_bytes}\",
\"num_poses\": 1,
\"time_divisions\": 20,
\"steps\": 18,
\"save_trajectory\": false,
\"is_staged\": false
}" > diffdock.json
```
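The `grep` and `sed` steps exist because a JSON string cannot contain raw newlines: only the `ATOM` records of the protein are kept, and each newline is rewritten as the two characters `\n`. The Python sketch below illustrates the same two transformations; the helper names `atom_records` and `json_string_body` are made up for this example.

```python
import json


def atom_records(pdb_text):
    """Keep only the lines that start with ATOM, like grep -E '^ATOM'."""
    return "".join(
        line for line in pdb_text.splitlines(keepends=True) if line.startswith("ATOM")
    )


def json_string_body(text):
    """Escape text so it can sit between double quotes in a JSON document.

    json.dumps adds the surrounding quotes; they are stripped here so the
    result is comparable to what the sed command produces.
    """
    return json.dumps(text)[1:-1]
```

Unlike the `sed` command, `json.dumps` also escapes double quotes and backslashes, so this route stays valid for input files that happen to contain those characters.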

Run inference and save the result to output.json.

```
curl --header "Content-Type: application/json" \
 --request POST \
 --data @diffdock.json \
 --output output.json \
 http://localhost:8000/molecular-docking/diffdock/generate
```

View the Outputs
The output file output.json contains the predicted docking poses (coordinates of the ligand atoms) in JSON format, with the structure below:

| Field | Type | Description |
| --- | --- | --- |
| `status` | `str` | Reports `success` or `fail` for this request |
| `ligand_positions` | `list of str` | List of SDF-formatted text, one entry per generated pose |
| `position_confidence` | `list of float` | List of confidence scores for the generated poses |
| `trajectory` | `list of str` | List of PDB-formatted text with the diffusion trajectories for the generated poses (optional) |
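Before using the poses, it can help to check the response against this structure. The sketch below does so for a single-ligand request; the function names are illustrative, and ranking the highest confidence first is an assumption based on the example listings in this README, where rank01 carries the highest score.

```python
import json


def check_response(result):
    """Validate a single-ligand response and return (rank, confidence) pairs."""
    if result.get("status") != "success":
        raise ValueError(f"request did not succeed: status={result.get('status')!r}")
    poses = result["ligand_positions"]
    scores = result["position_confidence"]
    if len(poses) != len(scores):
        raise ValueError("ligand_positions and position_confidence differ in length")
    # ASSUMPTION: a higher confidence score means a better pose.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(rank, scores[i]) for rank, i in enumerate(order, start=1)]


def check_file(path="output.json"):
    """Load a saved response from disk and validate it."""
    with open(path) as handle:
        return check_response(json.load(handle))
```

Calling `check_file()` in the directory that holds output.json returns the ranked confidence scores, or raises `ValueError` if the request failed.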

Dump Generated Poses

The Python script dump_output.py writes the inference results (the docked ligand poses) into a folder named output. Create a new blank file named dump_output.py and copy the contents of dump_output.py from this repository into it.
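If the repository's script is not at hand, the following is a minimal sketch of what such a script could look like. It is not the repository's dump_output.py. The file-name pattern follows the example listings in this README, while the handling of batch responses (one nested list per ligand, written to ligand0, ligand1, and so on) and the sorting by confidence are assumptions.

```python
import json
import os


def dump_poses(folder, poses, scores):
    """Write one SDF file per pose, named by rank and confidence."""
    os.makedirs(folder, exist_ok=True)
    # ASSUMPTION: highest confidence first, so rank01 is the best-scoring pose.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    written = []
    for rank, i in enumerate(order, start=1):
        name = "rank%02d_confidence_%.2f.sdf" % (rank, scores[i])
        path = os.path.join(folder, name)
        with open(path, "w") as handle:
            handle.write(poses[i])
        written.append(path)
    return written


def dump_output(json_path="output.json", out_dir="output"):
    """Read a saved response and write its poses under out_dir."""
    with open(json_path) as handle:
        result = json.load(handle)
    poses = result["ligand_positions"]
    scores = result["position_confidence"]
    # ASSUMPTION: a batch-docking response nests one list per ligand.
    if scores and isinstance(scores[0], list):
        written = []
        for n, (ligand_poses, ligand_scores) in enumerate(zip(poses, scores)):
            subdir = os.path.join(out_dir, "ligand%d" % n)
            written += dump_poses(subdir, ligand_poses, ligand_scores)
        return written
    return dump_poses(out_dir, poses, scores)


# Runs only when output.json exists in the current directory.
if __name__ == "__main__" and os.path.exists("output.json"):
    dump_output()
```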
Run the command below to launch the Python script.
```
python3 dump_output.py
```
List the content in the output folder.
```
$ ls output
rank01_confidence_-0.82.sdf
```

Advanced Usage

Run Inference with Bash Script

In this example, we create a simple bash script to launch inference using two local files as input and dump the generated poses in the output folder.

Create a new blank file named diffdock.sh in the same folder and copy the contents of diffdock.sh from this repository into it.
Make the script executable.
```
chmod +x diffdock.sh
```
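For readers who prefer Python, the job of such a script (read two local files, build the request, POST it, save the response) can be sketched as follows. This is not a translation of the repository's diffdock.sh. The request fields and the endpoint come from the earlier steps of this README; deriving `ligand_file_type` from the file extension and defaulting to 10 poses (the number shown in the example listing) are assumptions.

```python
import json
import os
import urllib.request

URL = "http://localhost:8000/molecular-docking/diffdock/generate"


def make_payload(protein_path, ligand_path, num_poses=10):
    """Build the request body from a local PDB file and a local ligand file."""
    with open(protein_path) as handle:
        # Keep only ATOM records, as in the curl-based example.
        protein = "".join(line for line in handle if line.startswith("ATOM"))
    with open(ligand_path) as handle:
        ligand = handle.read()
    # ASSUMPTION: the ligand file type matches the file extension (e.g. "sdf").
    ligand_file_type = os.path.splitext(ligand_path)[1].lstrip(".").lower()
    return {
        "ligand": ligand,
        "ligand_file_type": ligand_file_type,
        "protein": protein,
        "num_poses": num_poses,
        "time_divisions": 20,
        "steps": 18,
        "save_trajectory": False,
        "is_staged": False,
    }


def run_inference(protein_path, ligand_path, out_path="output.json", url=URL):
    """POST the request to the running NIM service and save the raw response."""
    body = json.dumps(make_payload(protein_path, ligand_path)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

With the service running, `run_inference("8G43.pdb", "ZU6.sdf")` writes the response to output.json in the current directory.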

Download the input files from the RCSB database and launch the inference.

```
curl -o 8G43.pdb https://files.rcsb.org/download/8G43.pdb
curl -o ZU6.sdf https://files.rcsb.org/ligands/download/ZU6_ideal.sdf
./diffdock.sh 8G43.pdb ZU6.sdf
```

Dump the output using the Python script created in Dump Generated Poses.
```
python3 dump_output.py
ls output
```

Example output:

```
rank01_confidence_0.52.sdf   rank06_confidence_-0.54.sdf
rank02_confidence_0.31.sdf   rank07_confidence_-0.56.sdf
rank03_confidence_-0.03.sdf  rank08_confidence_-0.65.sdf
rank04_confidence_-0.04.sdf  rank09_confidence_-1.04.sdf
rank05_confidence_-0.41.sdf  rank10_confidence_-1.54.sdf
```

Run Inference for Batch-Docking

DiffDock NIM supports a batch-docking mode, which docks a group of ligand molecules against the same protein receptor in a single inference request when a multi-molecule SDF file is submitted. Batch-docking is much more efficient than running separate inference requests. The example below illustrates batch-docking using one protein PDB file and five ligand SDF files downloaded from RCSB.

Prepare an SDF input file containing multiple ligand molecules. Create a new blank file named make-multiligand.sh and copy the contents of make-multiligand.sh from this repository into it.

Run the commands below to generate the multi_ligands.sdf for input.

```
chmod +x make-multiligand.sh
./make-multiligand.sh COM Q4H QPK R4W SIN
```
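A multi-molecule SDF file is a plain concatenation of single-molecule records, each terminated by a `$$$$` line. The Python sketch below shows one way to assemble such a file. It is not the repository's make-multiligand.sh, and it assumes that every ligand ID can be downloaded with the same `<ID>_ideal.sdf` URL pattern used for ZU6 earlier in this README.

```python
import urllib.request

# ASSUMPTION: same URL pattern as the ZU6 download used earlier.
LIGAND_URL = "https://files.rcsb.org/ligands/download/{ligand_id}_ideal.sdf"


def join_sdf(records):
    """Concatenate SDF records so that each one ends with a $$$$ delimiter line."""
    parts = []
    for text in records:
        text = text.rstrip("\n")
        if not text.endswith("$$$$"):
            text += "\n$$$$"
        parts.append(text + "\n")
    return "".join(parts)


def fetch_ligand(ligand_id):
    """Download one ligand record as text."""
    with urllib.request.urlopen(LIGAND_URL.format(ligand_id=ligand_id)) as response:
        return response.read().decode("utf-8")


def make_multiligand(ligand_ids, out_path="multi_ligands.sdf", fetch=fetch_ligand):
    """Fetch every ligand and write them into one multi-molecule SDF file."""
    with open(out_path, "w") as handle:
        handle.write(join_sdf(fetch(ligand_id) for ligand_id in ligand_ids))
```

For the example above, the call would be `make_multiligand(["COM", "Q4H", "QPK", "R4W", "SIN"])`.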

Download the protein PDB file and launch the inference.

```
curl -o 7RWO.pdb "https://files.rcsb.org/download/7RWO.pdb"
./diffdock.sh 7RWO.pdb multi_ligands.sdf
```

Dump the result. An example of the output is shown below.

```
$ python3 dump_output.py
$ ls output/*

output/ligand0:
rank01_confidence_-0.74.sdf  rank05_confidence_-1.15.sdf  rank09_confidence_-1.55.sdf
rank02_confidence_-0.92.sdf  rank06_confidence_-1.25.sdf  rank10_confidence_-1.93.sdf
rank03_confidence_-0.93.sdf  rank07_confidence_-1.46.sdf
rank04_confidence_-1.04.sdf  rank08_confidence_-1.46.sdf

output/ligand1:
rank01_confidence_-0.25.sdf  rank05_confidence_-0.55.sdf  rank09_confidence_-0.72.sdf
rank02_confidence_-0.28.sdf  rank06_confidence_-0.55.sdf  rank10_confidence_-0.77.sdf
rank03_confidence_-0.34.sdf  rank07_confidence_-0.56.sdf
rank04_confidence_-0.49.sdf  rank08_confidence_-0.57.sdf

...
```

Batch-Docking using SMILES

Besides the SDF format for ligand molecules, DiffDock also supports SMILES text strings as input. DiffDock uses RDKit to generate random molecular conformers from the SMILES information. For batch-docking, the ligand input can be a plain text file with multiple lines, each of which is a SMILES string representing one molecule.

Create a new blank file named ligands.txt containing one SMILES string per line. An example ligands.txt is provided in this repository.
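As an illustration of the expected layout, the sketch below writes a SMILES file with one molecule per line and rejects entries that are empty or contain whitespace. The helper name is made up, and the two molecules (aspirin and caffeine) are placeholders for this example only; they are not the contents of the repository's ligands.txt.

```python
def write_smiles_file(smiles_list, path="ligands.txt"):
    """Write one SMILES string per line and return the number of lines written."""
    cleaned = []
    for smiles in smiles_list:
        smiles = smiles.strip()
        # A SMILES string is a single token: no blanks, tabs, or newlines.
        if not smiles or any(ch.isspace() for ch in smiles):
            raise ValueError(f"not a single SMILES token: {smiles!r}")
        cleaned.append(smiles)
    with open(path, "w") as handle:
        handle.write("\n".join(cleaned) + "\n")
    return len(cleaned)


# Illustrative molecules only: aspirin and caffeine.
EXAMPLE_SMILES = [
    "CC(=O)OC1=CC=CC=C1C(=O)O",
    "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
]
```

This check is purely textual. Whether a string is chemically valid is decided later, when the conformers are generated.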
Run the commands below to invoke the DiffDock model. The script generates an input JSON file and returns the inference result in JSON format in the file output.json.
```
./diffdock.sh 8G43.pdb ligands.txt
```

Dump the result and check the output folder.

```
$ python3 dump_output.py
$ ls output/*

output/ligand0:
rank01_confidence_-0.98.sdf  rank05_confidence_-1.30.sdf  rank09_confidence_-1.77.sdf
rank02_confidence_-1.00.sdf  rank06_confidence_-1.36.sdf  rank10_confidence_-2.27.sdf
rank03_confidence_-1.03.sdf  rank07_confidence_-1.58.sdf
rank04_confidence_-1.21.sdf  rank08_confidence_-1.61.sdf

output/ligand1:
rank01_confidence_-0.15.sdf  rank05_confidence_-1.25.sdf  rank09_confidence_-1.55.sdf
rank02_confidence_-0.54.sdf  rank06_confidence_-1.29.sdf  rank10_confidence_-1.66.sdf
rank03_confidence_-0.91.sdf  rank07_confidence_-1.38.sdf
rank04_confidence_-1.03.sdf  rank08_confidence_-1.39.sdf

...
```

Stopping the NIM Service

To stop the NIM service, simply close the terminal window.

Important Note

It is recommended to clean your cache files every time you stop the server so that they do not affect your next run. You can do this by removing the following cache directories:

```
rm -r /blue/groupname/gatorlink/.cache/nim/diffdock/ngc
rm -rf /blue/groupname/gatorlink/.cache/nim/diffdock/workspace/*
```

Another way to run DiffDock NIM on HPG

Submit a SLURM batch job
Use sbatch to start the NIM service with GPU resources, and record the name of the node where the service is running.
Open a terminal to run inference
Start an SSH terminal on the same node as the service to run inference.
```
ssh node_name
```
