System Info
- `transformers` version: 5.0.0.dev0
- Platform: Linux-5.4.292-1.el8.elrepo.x86_64-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 1.2.2
- Safetensors version: 0.6.2
- Accelerate version: 1.8.1
- Accelerate config: - compute_environment: LOCAL_MACHINE
- distributed_type: FSDP
- mixed_precision: no
- use_cpu: False
- debug: False
- num_processes: 4
- machine_rank: 0
- num_machines: 1
- rdzv_backend: static
- same_network: True
- main_training_function: main
- enable_cpu_affinity: False
- fsdp_config: {'fsdp_activation_checkpointing': False, 'fsdp_auto_wrap_policy': 'TRANSFORMER_BASED_WRAP', 'fsdp_cpu_ram_efficient_loading': True, 'fsdp_offload_params': False, 'fsdp_reshard_after_forward': True, 'fsdp_state_dict_type': 'SHARDED_STATE_DICT', 'fsdp_transformer_layer_cls_to_wrap': 'Gemma3DecoderLayer', 'fsdp_version': 2}
- downcast_bf16: no
- tpu_use_cluster: False
- tpu_use_sudo: False
- tpu_env: []
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.8.0+cu128 (CUDA)
- Using distributed or parallel set-up in script?: <fill in>
- Using GPU in script?: <fill in>
- GPU type: NVIDIA A100 80GB PCIe
Who can help?
@zucchini-nlp When I run unit tests such as tests/models/pe_audio_video/test_modeling_pe_audio_video.py::PeAudioVideoEncoderTest::test_model_parallelism and tests/models/pe_video/test_modeling_pe_video.py::PeVideoEncoderTest::test_model_parallelism, they fail with the following error:
tests/test_modeling_common.py:2529:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1773: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1784: in _call_impl
return forward_call(*args, **kwargs)
/root/ft_test/frameworks.ai.client-ai.hf-accelerate-client/src/accelerate/hooks.py:175: in new_forward
output = module._old_forward(*args, **kwargs)
src/transformers/utils/generic.py:790: in wrapper
output = func(self, *args, **kwargs)
src/transformers/utils/generic.py:945: in wrapper
outputs = func(self, *args, **kwargs)
src/transformers/models/pe_audio_video/modeling_pe_audio_video.py:593: in forward
inputs_embeds, padding_mask, audio_output, video_output = self.embedder(
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1773: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1784: in _call_impl
return forward_call(*args, **kwargs)
src/transformers/models/pe_audio_video/modeling_pe_audio_video.py:223: in forward
video_output = self.video_encoder(pixel_values_videos, padding_mask_videos=padding_mask_videos)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1773: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1784: in _call_impl
return forward_call(*args, **kwargs)
src/transformers/utils/generic.py:790: in wrapper
output = func(self, *args, **kwargs)
src/transformers/utils/generic.py:945: in wrapper
outputs = func(self, *args, **kwargs)
src/transformers/models/pe_video/modeling_pe_video.py:530: in forward
inputs_embeds, padding_mask = self.embedder(pixel_values_videos, padding_mask=padding_mask_videos)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1773: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1784: in _call_impl
return forward_call(*args, **kwargs)
src/transformers/models/pe_video/modeling_pe_video.py:182: in forward
vision_encoder_outputs = self.vision_model(pixel_values_videos)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1773: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1784: in _call_impl
return forward_call(*args, **kwargs)
src/transformers/models/timm_wrapper/modeling_timm_wrapper.py:360: in forward
logits = self.timm_model(pixel_values, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1773: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1784: in _call_impl
return forward_call(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/timm/models/eva.py:993: in forward
x = self.forward_features(x)
/usr/local/lib/python3.10/dist-packages/timm/models/eva.py:964: in forward_features
x = blk(x, rope=rot_pos_embed)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1773: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1784: in _call_impl
return forward_call(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = EvaBlock(
  (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
  (attn): AttentionRope(
    (qkv): Linear(i...n_features=256, out_features=64, bias=True)
    (drop2): Dropout(p=0.0, inplace=False)
  )
  (drop_path2): Identity()
)
x = tensor([[[ 1.2520,  0.1755,  0.2867, ..., -0.3618,  0.6712,  0.4061],
         [-0.3215, -0.1512, -0.8277, ..., -0.9...
         [-0.5117, -0.0159, -1.1778, ..., -0.6055, -0.6392,  0.1033]]],
       device='cuda:0', grad_fn=<AddBackward0>)
rope = tensor([[-0.9056, -0.9056, -0.9056, -0.9056,  0.4242,  0.4242,  0.4242,  0.4242]],
       device='cuda:0')
attn_mask = None

    def forward(
        self,
        x: torch.Tensor,
        rope: Optional[torch.Tensor] = None,
        attn_mask: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        if self.gamma_1 is None:
            x = x + self.drop_path1(self.attn(self.norm1(x), rope=rope, attn_mask=attn_mask))
>           x = x + self.drop_path2(self.mlp(self.norm2(x)))
E           RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
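For context, the failure mode is easy to reproduce outside the test harness. Below is a minimal sketch (hypothetical code, not the actual test; it needs two visible GPUs): the residual stream lives on cuda:0 while a submodule that device-map placement dispatched to cuda:1 produces its output there, so the residual addition mixes devices.

```python
import torch
import torch.nn as nn

# Minimal illustration of the failure mode (not the actual test):
# the residual stream sits on cuda:0, but a submodule was placed on
# cuda:1, so the residual addition raises the same RuntimeError.
x = torch.randn(2, 64, device="cuda:0")   # residual stream on cuda:0
mlp = nn.Linear(64, 64).to("cuda:1")      # submodule dispatched to cuda:1
x = x + mlp(x.to("cuda:1"))               # RuntimeError: Expected all tensors to be on the same device ...
```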
The failing test_model_parallelism is caused by the timm model (EvaBlock) being split across multiple devices during model parallelism. Since there is no mature way to handle model parallelism for the timm model, I think a suitable fix would be to add TimmWrapperForImageClassification to _no_split_modules, as in the draft PR 42917, although then there is almost nothing else left to split. Alternatively, we could skip the related unit tests. WDYT?
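For reference, a minimal sketch of what the _no_split_modules change could look like (the class and attribute placement here are assumptions based on how other models declare it; the actual diff is in the draft PR):

```python
from transformers import PreTrainedModel

# Sketch of the proposed change in
# src/transformers/models/timm_wrapper/modeling_timm_wrapper.py.
# Assumption: marking the wrapper as non-splittable keeps the whole
# timm backbone on a single device when accelerate infers a device
# map, so EvaBlock submodules never land on different GPUs.
class TimmWrapperPreTrainedModel(PreTrainedModel):
    _no_split_modules = ["TimmWrapperForImageClassification"]
    # ... rest of the existing class body unchanged ...
```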
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
pytest -rA tests/models/pe_audio_video/test_modeling_pe_audio_video.py::PeAudioVideoEncoderTest::test_model_parallelism
pytest -rA tests/models/pe_video/test_modeling_pe_video.py::PeVideoEncoderTest::test_model_parallelism
Expected behavior
The unit tests should either pass or be skipped.
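If skipping is the preferred route, a sketch of what that could look like in the affected test classes (the class name, import path, and reason string are illustrative, not an actual patch):

```python
import unittest

from tests.test_modeling_common import ModelTesterMixin  # existing common-test mixin


class PeVideoEncoderTest(ModelTesterMixin, unittest.TestCase):
    # Override the inherited common test and skip it for this model,
    # since the timm backbone cannot be split across devices.
    @unittest.skip(reason="timm backbone (EvaBlock) cannot be split across devices")
    def test_model_parallelism(self):
        pass
```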