
Conversation

@Pfannkuchensack
Collaborator

Summary

When using GGUF-quantized models on MPS (Apple Silicon), the dequantized tensors could end up on a different device than the other operands in math operations, causing "Expected all tensors to be on the same device" errors.
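
As a minimal sketch of the failure mode (illustrative only, not the actual Invoke code path):

```python
import torch

# Minimal sketch of the failure mode (illustrative; not the Invoke code path).
# A freshly dequantized GGUF weight may be left on CPU while the other
# operand lives on MPS, so the matmul raises the device-mismatch error.
if torch.backends.mps.is_available():
    weight = torch.randn(2, 2)                    # dequantized weight, still on CPU
    activation = torch.randn(2, 2, device="mps")  # activation on MPS
    try:
        _ = activation @ weight  # RuntimeError: Expected all tensors to be on the same device
    except RuntimeError as err:
        print(err)
```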

This fix ensures that after dequantization, tensors are moved to the same device as the other tensors in the operation.
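
A minimal sketch of the idea, assuming the dequantization routine returns a plain `torch.Tensor`; the helper name `match_operand_device` is hypothetical, not the code added by this PR:

```python
import torch

def match_operand_device(dequantized: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: after dequantization, move the result onto the
    # device of the operand it will be combined with, so MPS (or CUDA) math
    # ops no longer see mixed-device inputs.
    if dequantized.device != other.device:
        dequantized = dequantized.to(other.device)
    return dequantized

# Usage: weight dequantized on CPU, activation on MPS (falls back to CPU elsewhere).
device = "mps" if torch.backends.mps.is_available() else "cpu"
weight = torch.randn(4, 4)  # stands in for a freshly dequantized GGUF tensor
activation = torch.randn(4, 4, device=device)
result = activation @ match_operand_device(weight, activation)
```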

Related Issues / Discussions

(https://discord.com/channels/1020123559063990373/1149506274971631688/1454480237311168654)

QA Instructions

Test with z_image_turbo-Q4_K.gguf and Qwen_3_4b-Q6_K.gguf on a Mac.

Merge Plan

No big change.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

github-actions bot added the python (PRs that change python files) and backend (PRs that change backend files) labels on Dec 27, 2025
@Vargol
Contributor

Vargol commented Dec 29, 2025

Note that this issue doesn't occur with keep_ram_copy_of_weights enabled, and it defaults to enabled, so you'll need

keep_ram_copy_of_weights: False

in invoke.yaml when testing.
I also believe partial loading can work around the issue, but having either of those settings enabled is sub-optimal on MPS.

It's basically the same issue as #7939.

Oh, and I believe it breaks on CUDA with the same settings, if you've got the VRAM to run it.

@gogurtenjoyer
Contributor

I've tested this PR on an M5 with 32 GB RAM. With keep_ram_copy_of_weights: False set, it allows generations with no errors reported in the console, and the speed is the same as before.

@lstein lstein marked this pull request as ready for review January 2, 2026 00:07
Collaborator

@lstein lstein left a comment


Looks good. I'm going ahead with an approval to merge.

@lstein lstein enabled auto-merge (squash) January 2, 2026 00:32
@lstein lstein merged commit 3b2d2ef into invoke-ai:main Jan 2, 2026
13 checks passed
@Pfannkuchensack Pfannkuchensack deleted the fix/gguf-mps-device-mismatch branch January 3, 2026 08:19