
[Tokenizer][OFFLINE] chat_template.jinja not downloaded in cache #42914

@khatrimann

Issue Title
AutoTokenizer ignores HF_HUB_OFFLINE=1 and attempts to download chat_template.jinja for Llama/LLM models

Description
I am encountering an issue where AutoTokenizer.from_pretrained() attempts to connect to the Hugging Face Hub to download chat_template.jinja, even when the environment variable HF_HUB_OFFLINE=1 is explicitly set.

While this workflow works correctly for older architectures (e.g., bert-base-uncased), it fails for Llama 3.1 and 3.2 models. I have also attempted to manually place the chat_template.jinja file in the cache revision directory, but the library still attempts an online connection.
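To double-check that the manually placed file is actually where the loader would look, here is a stdlib-only sketch that scans the default hub cache layout (`~/.cache/huggingface/hub/models--ORG--NAME/snapshots/<revision>/`). The `find_in_hf_cache` helper is my own illustration, not a transformers or huggingface_hub API:

```python
from pathlib import Path

def find_in_hf_cache(repo_id, filename, cache_dir=None):
    """Return every cached snapshot copy of `filename` for `repo_id` (may be empty)."""
    # Assumed default hub cache layout: hub/models--ORG--NAME/snapshots/<revision>/<filename>
    cache_dir = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "huggingface" / "hub"
    snapshots = cache_dir / ("models--" + repo_id.replace("/", "--")) / "snapshots"
    if not snapshots.is_dir():
        return []
    return sorted(snapshots.glob(f"*/{filename}"))
```

For example, `find_in_hf_cache("meta-llama/Llama-3.1-8B-Instruct", "chat_template.jinja")` returning an empty list would mean the file never made it into any cached snapshot, which would explain the attempted download.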

Reproduction
Script (repro_script.py):

from transformers import AutoModelForCausalLM, AutoTokenizer

# MODEL_NAME = "bert-base-uncased"    # MODEL AND TOKENIZER BOTH LOAD
# MODEL_NAME = "meta-llama/Llama-3.2-1B-Instruct"    # MODEL IS LOADED BUT NOT TOKENIZER
MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"    # MODEL IS LOADED BUT NOT TOKENIZER
token = "hf_*"

# model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)    # WORKING
# tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)    # FAILS WITH LLAMA
# tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, local_files_only=True)    # FAILS WITH LLAMA
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, token=token)    # FAILS WITH LLAMA
# tokenizer.save_pretrained("./dummy_tokenizer")    # SAVE TO COPY FILES MANUALLY TO CACHE

print("Tokenizer loaded successfully.", tokenizer)

Command:

HF_HUB_OFFLINE=1 python repro_script.py

Expected Behavior
The tokenizer should load entirely from the local cache without attempting any network connections, respecting the HF_HUB_OFFLINE=1 flag.

Actual Behavior
The script fails with a connection error/timeout (depending on network status) indicating it is trying to retrieve chat_template.jinja from the Hub.

Error Log

huggingface_hub.errors.OfflineModeIsEnabled: Cannot reach https://huggingface.co/api/models/meta-llama/Llama-3.1-8B-Instruct: offline mode is enabled. To disable it, please unset the `HF_HUB_OFFLINE` environment variable.

Environment Information

  • python: 3.10.19
  • transformers: 4.57.3
  • Platform: Linux

EDIT: I tried running this with transformers==4.55.2; loading worked, but there was a new problem -- the chat template could not be applied:

ValueError: Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating
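For context, chat_template.jinja is just a Jinja template that flattens a list of chat messages into a prompt string (transformers renders it with a sandboxed Jinja2 environment). A minimal illustrative template, which is my own toy example and NOT the actual Llama 3 template:

```python
from jinja2 import Template  # Jinja2 is the engine behind chat_template.jinja

# Toy template for illustration only -- NOT the real Llama 3 chat template.
CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>{{ message['content'] }}<|end|>"
    "{% endfor %}"
)

def render_chat(messages):
    """Flatten a list of {'role': ..., 'content': ...} dicts into one prompt string."""
    return Template(CHAT_TEMPLATE).render(messages=messages)
```

As a workaround on 4.55.2, assigning a template string to `tokenizer.chat_template` before calling `apply_chat_template` should avoid the ValueError, since the error is raised only when that attribute is unset and no `chat_template` argument is passed.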

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Paste the given script and run the given command

Expected behavior

No Error
