System Info
AutoTokenizer ignores HF_HUB_OFFLINE=1 and attempts to download chat_template.jinja for Llama/LLM models
Description
I am encountering an issue where AutoTokenizer.from_pretrained() attempts to connect to the Hugging Face Hub to download chat_template.jinja, even when the environment variable HF_HUB_OFFLINE=1 is explicitly set.
While this workflow works correctly for older architectures (e.g., bert-base-uncased), it fails for Llama 3.1 and 3.2 models. I have also attempted to manually place the chat_template.jinja file in the cache revision directory, but the library still attempts an online connection.
Reproduction
Script (repro_script.py):
from transformers import AutoModelForCausalLM, AutoTokenizer
# MODEL_NAME = "bert-base-uncased" # MODEL IS LOADED AND TOKENIZER TOO
# MODEL_NAME = "meta-llama/Llama-3.2-1B-Instruct" # MODEL IS LOADED BUT NOT TOKENIZER
MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct" # MODEL IS LOADED BUT NOT TOKENIZER
token = "hf_*"
# model = AutoModelForCausalLM.from_pretrained(MODEL_NAME) # WORKING
# tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME) # FAILS WITH LLAMA
# tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, local_files_only=True) # FAILS WITH LLAMA
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, token=token) # FAILS WITH LLAMA
# tokenizer.save_pretrained("./dummy_tokenizer") # SAVE TO COPY FILES MANUALLY TO CACHE
print("Model and tokenizer loaded successfully.", tokenizer)
Command:
HF_HUB_OFFLINE=1 python repro_script.py
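Before retrying offline, it can help to confirm whether chat_template.jinja is actually present in the local cache. The sketch below only assumes the documented huggingface_hub cache layout ("models--{org}--{name}/snapshots/{revision}/") and reads the filesystem, so it cannot trigger any network access; the default cache path is an assumption and may differ if HF_HOME is set.

```python
import os
from pathlib import Path

def find_cached_file(model_id, filename, cache_dir=None):
    """Return paths of `filename` across all cached snapshots of `model_id`."""
    # Default hub cache location; override via `cache_dir` if HF_HOME is customized.
    cache = Path(cache_dir or os.path.expanduser("~/.cache/huggingface/hub"))
    # Repo directories follow the "models--{org}--{name}" naming convention.
    snapshots = cache / ("models--" + model_id.replace("/", "--")) / "snapshots"
    if not snapshots.is_dir():
        return []
    # One subdirectory per cached revision; search each for the file.
    return sorted(snapshots.glob("*/" + filename))

hits = find_cached_file("meta-llama/Llama-3.1-8B-Instruct", "chat_template.jinja")
print(hits if hits else "chat_template.jinja not found in the local cache")
```

If the list is empty, the file was never cached, which would explain why the library reaches for the Hub even with a warm cache for the other tokenizer files.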
Expected Behavior
The tokenizer should load entirely from the local cache without attempting any network connections, respecting the HF_HUB_OFFLINE=1 flag.
Actual Behavior
The script fails with a connection error/timeout (depending on network status) indicating it is trying to retrieve chat_template.jinja from the Hub.
Error Log
huggingface_hub.errors.OfflineModeIsEnabled: Cannot reach https://huggingface.co/api/models/meta-llama/Llama-3.1-8B-Instruct: offline mode is enabled. To disable it, please unset the `HF_HUB_OFFLINE` environment variable.
Environment Information
- transformers: 4.57.3
- Python: 3.10.19
- Platform: Linux
EDIT: I tried running this with transformers==4.55.2. The tokenizer loaded, but there was a new problem -- the chat template could not be applied:
ValueError: Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating
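A possible stopgap for this second error, sketched under the assumption that a chat_template.jinja was saved earlier (e.g. by the tokenizer.save_pretrained("./dummy_tokenizer") step in the repro script): read the template text from disk and attach it to the tokenizer by hand. The path below is hypothetical; adjust it to wherever the file actually lives.

```python
import os

# Hypothetical location produced by an earlier save_pretrained() call.
TEMPLATE_PATH = "./dummy_tokenizer/chat_template.jinja"

def load_local_chat_template(path):
    """Return the raw template text if the file exists, else None."""
    if not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as f:
        return f.read()

template = load_local_chat_template(TEMPLATE_PATH)
# With the tokenizer from the repro script loaded, setting the public
# `chat_template` attribute makes apply_chat_template usable again:
# if template is not None:
#     tokenizer.chat_template = template
```

This does not fix the offline-mode bug itself, but it sidesteps the ValueError on 4.55.2 when the template file exists locally.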
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Save the script above as repro_script.py and run the command shown.
Expected behavior
No error: the tokenizer should load entirely from the local cache, respecting HF_HUB_OFFLINE=1.