@stevesuzuki-arm
Contributor

Fix the output mismatch in fmls with the float16 type, which occurred because error() was optimized in a way that fused it with the scalar computation. compute_root() ensures the scalar result is computed independently.
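
For readers unfamiliar with why the instruction sequence behind a reference value matters, here is a minimal sketch in plain C++. It is not taken from this PR or from Halide: the function names are hypothetical, and it uses float rather than float16 because standard C++ has no portable float16 type. It only illustrates the general point that a fused multiply-subtract rounds once, while a separate multiply and subtract round twice, so the two can disagree. Whether this is the exact mechanism behind the mismatch fixed here is an assumption.

```cpp
#include <cmath>

// Hypothetical helper: c - a*b with two roundings.
// The product is rounded to float before the subtraction; volatile keeps the
// compiler from contracting the two operations into one fused instruction.
float unfused_mul_sub(float a, float b, float c) {
    volatile float product = a * b;
    return c - product;
}

// Hypothetical helper: c - a*b with a single rounding, which is the
// arithmetic a fused multiply-subtract performs.
float fused_mul_sub(float a, float b, float c) {
    return std::fma(-a, b, c);
}
```

With a = b = 1 + 2^-12 and c = 1 + 2^-11, the exact product is 1 + 2^-11 + 2^-24. The unfused version rounds that product to 1 + 2^-11 and returns 0, while the fused version returns -2^-24. A comparison between a vector result and a scalar reference is therefore only meaningful when the reference is computed the way the test intends.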

@stevesuzuki-arm
Contributor Author

In simd_op_check_wasm, the i8x16.splat check now generates

	v128.load8_splat	0

where it previously generated

	i32.load8_u	0
	local.tee	19
	i8x16.splat

for data reuse.

// Include a scalar version
Halide::Func f_scalar("scalar_" + name);
f_scalar(x, y) = e;
f_scalar.compute_root();
Member


Might be worth a comment as to why this is necessary for correctness.

Contributor Author


Done

@alexreinking
Member

alexreinking commented Nov 18, 2025

In simd_op_check_wasm, i8x16.splat generates

	v128.load8_splat	0

That's coming from these tests:

// Load vector with identical lanes generates *.splat.
check("i8x16.splat", 16 * w, in_u8(0));
check("i16x8.splat", 8 * w, in_u16(0));
check("i32x4.splat", 4 * w, in_u32(0));
check("i64x2.splat", 2 * w, in_u64(0));

I think it's actually an improvement to use v128.load8_splat in these cases and these tests can be updated (along with the comment to read _splat instead of .splat).

Member

@alexreinking alexreinking left a comment


I just fixed simd_op_check_wasm myself. Hope this works!

@zvookin zvookin merged commit a1de3e3 into halide:main Nov 21, 2025
7 of 18 checks passed
@stevesuzuki-arm stevesuzuki-arm deleted the pr-no-fusion branch December 1, 2025 20:23
3 participants