Skip to content

Conversation

@wenju-he
Copy link
Contributor

Replace __clc_max/min with __clc_fmax/fmin in __clc_clamp. FP __clc_min/max/clamp now lowers to @llvm.minimumnum/@llvm.maximumnum, and integer clamp lowers to @llvm.umin/@llvm.umax. This reduce fcmp+select chains and improving codegen. Example change to amdgcn--amdhsa.bc:

in function _Z5clamphhh:
    >   %4 = icmp ugt i8 %0, %2
        %4 = tail call noundef i8 @llvm.umax.i8(i8 %0, i8 %1)
    >   %6 = select i1 %4, i8 %2, i8 %5
    >   ret i8 %6
    <   %5 = tail call noundef i8 @llvm.umin.i8(i8 %2, i8 %4)
    <   ret i8 %5
in function _Z5clampddd:
  in block %3 / %3:
    >   %4 = fcmp ogt double %0, %2
    >   %5 = fcmp olt double %0, %1
    >   %6 = select i1 %5, double %1, double %0
    >   %7 = select i1 %4, double %2, double %6
    >   ret double %7
    <   %4 = tail call noundef double @llvm.maximumnum.f64(double %0, double %1)
    <   %5 = tail call noundef double @llvm.minimumnum.f64(double %4, double %2)
    <   ret double %5

Replace __clc_max/min with __clc_fmax/fmin in __clc_clamp.
FP __clc_min/max/clamp now lowers to @llvm.minimumnum/@llvm.maximumnum,
and integer clamp lowers to @llvm.umin/@llvm.umax. This reduce
fcmp+select chains and improving codegen. Example change to amdgcn--amdhsa.bc:
```
in function _Z5clamphhh:
    >   %4 = icmp ugt i8 %0, %2
        %4 = tail call noundef i8 @llvm.umax.i8(i8 %0, i8 %1)
    >   %6 = select i1 %4, i8 %2, i8 %5
    >   ret i8 %6
    <   %5 = tail call noundef i8 @llvm.umin.i8(i8 %2, i8 %4)
    <   ret i8 %5
in function _Z5clampddd:
  in block %3 / %3:
    >   %4 = fcmp ogt double %0, %2
    >   %5 = fcmp olt double %0, %1
    >   %6 = select i1 %5, double %1, double %0
    >   %7 = select i1 %4, double %2, double %6
    >   ret double %7
    <   %4 = tail call noundef double @llvm.maximumnum.f64(double %0, double %1)
    <   %5 = tail call noundef double @llvm.minimumnum.f64(double %4, double %2)
    <   ret double %5
```
@wenju-he wenju-he requested a review from Copilot December 17, 2025 06:09
@llvmbot llvmbot added the libclc libclc OpenCL library label Dec 17, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the implementation of __clc_min, __clc_max, and __clc_clamp functions in libclc by leveraging LLVM intrinsics for better code generation. The changes replace floating-point min/max operations with calls to __clc_fmin/__clc_fmax (which lower to @llvm.minimumnum/@llvm.maximumnum), and simplify __clc_clamp to compose min/max operations instead of nested ternary expressions.

Key changes:

  • Floating-point __clc_min/__clc_max now delegate to __clc_fmin/__clc_fmax for better LLVM intrinsic lowering
  • __clc_clamp implementation simplified from nested ternary to composed __clc_min(__clc_max(x, y), z)
  • Integer clamp now benefits from LLVM's @llvm.umin/@llvm.umax intrinsics

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
libclc/clc/lib/generic/shared/clc_min.inc Adds floating-point path that delegates to __clc_fmin, preserves integer path with ternary
libclc/clc/lib/generic/shared/clc_min.cl Includes clc_fmin.h header for floating-point delegation
libclc/clc/lib/generic/shared/clc_max.inc Adds floating-point path that delegates to __clc_fmax, preserves integer path with ternary
libclc/clc/lib/generic/shared/clc_max.cl Includes clc_fmax.h header for floating-point delegation
libclc/clc/lib/generic/shared/clc_clamp.inc Replaces nested ternary with __clc_min(__clc_max(...)) composition
libclc/clc/lib/generic/shared/clc_clamp.cl Includes clc_max.h and clc_min.h headers for composed implementation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

libclc libclc OpenCL library

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants