[cache] add a test to confirm we can use cache at train time #35709

gante · 2025-01-15T12:11:19Z

What does this PR do?

In #35648, the idea of always disabling the cache at train time was floated. However, @BenjaminBossan pointed out to me that some fine-tuning scripts need the cache at train time (e.g. prefix tuning in PEFT).

I couldn't find any test that ensured the survival of this property, so this PR adds a test that will fail if we decide to disable it :)

HuggingFaceDocBuilderDev · 2025-01-15T12:38:49Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

BenjaminBossan · 2025-01-15T13:16:16Z

Thanks a lot for adding this test to prevent possible regressions with PEFT methods like prefix tuning. I wonder if we can also try injecting the cache and checking if that works, basically like a mini prefix tuning:

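        # simulate injecting virtual tokens like in prefix tuning
        # (assumes `import torch` and `from transformers import DynamicCache`)
        num_virtual_tokens = 3
        # one tensor per layer, stacked as (key/value, batch, heads, virtual tokens, head_dim)
        past_key_values = [torch.randn(2, 1, 2, num_virtual_tokens, 8)] * 2
        past_key_values = DynamicCache.from_legacy_cache(past_key_values)
        # extend the mask so it also covers the injected virtual tokens
        attention_mask = model_inputs["attention_mask"]
        virtual_mask = torch.ones(1, num_virtual_tokens, dtype=attention_mask.dtype, device=attention_mask.device)
        model_inputs["attention_mask"] = torch.cat((attention_mask, virtual_mask), dim=1)
        model_outputs = model(**model_inputs, past_key_values=past_key_values, use_cache=True)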

Maybe this is overkill, LMK what you think.
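To make the shape bookkeeping in the snippet above concrete, here is a minimal pure-Python sketch. It assumes the legacy cache layout is one (key, value) pair per layer, each of shape (batch, num_heads, seq_len, head_dim). The helper names `legacy_cache_shapes` and `extend_attention_mask` are hypothetical and not part of transformers or PEFT; only the numbers come from the snippet.

```python
# Hypothetical helpers for illustration only; they mirror the shapes used in
# the snippet above without depending on torch or transformers.

def legacy_cache_shapes(num_layers, batch_size, num_heads, num_virtual_tokens, head_dim):
    """Return one (key_shape, value_shape) pair per layer.

    Assumption: each cached key/value has shape
    (batch, num_heads, seq_len, head_dim), with seq_len == num_virtual_tokens.
    """
    kv_shape = (batch_size, num_heads, num_virtual_tokens, head_dim)
    return [(kv_shape, kv_shape) for _ in range(num_layers)]


def extend_attention_mask(attention_mask, num_virtual_tokens):
    """Append one attended slot (1) per virtual token to every mask row."""
    return [row + [1] * num_virtual_tokens for row in attention_mask]


# Same numbers as in the snippet: 2 layers, batch 1, 2 heads, 3 virtual tokens, head_dim 8.
shapes = legacy_cache_shapes(
    num_layers=2, batch_size=1, num_heads=2, num_virtual_tokens=3, head_dim=8
)

# A hypothetical 4-token input with nothing masked out.
input_mask = [[1, 1, 1, 1]]
full_mask = extend_attention_mask(input_mask, num_virtual_tokens=3)

# The mask has to cover the cached positions plus the new input tokens.
cached_len = shapes[0][0][2]
assert len(full_mask[0]) == len(input_mask[0]) + cached_len
```

The point of the sketch is only that the attention mask length must equal the cached length plus the input length; the real test would still need to run the model to check that training with the injected cache works.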

gante · 2025-01-15T14:01:42Z

@BenjaminBossan added, thank you for the suggestion 👍 (the extra check is nearly free, and may save some pain down the road)

BenjaminBossan

Thanks so much for taking care of ensuring that this works, LGTM.

ArthurZucker

Okay! I think we did break that once or twice, good addition! 🤗

add test

dd1dd7b

gante requested review from Rocketknight1 and ArthurZucker as code owners January 15, 2025 12:11

augment test as suggested

d10963b

BenjaminBossan approved these changes Jan 15, 2025

ArthurZucker approved these changes Jan 16, 2025

tests/utils/test_modeling_utils.py (outdated, resolved)

gante and others added 2 commits January 16, 2025 16:40

Update tests/utils/test_modeling_utils.py

638ddbe

Co-authored-by: Arthur <[email protected]>

rerun tests

88a5540

gante merged commit aeeceb9 into huggingface:main Jan 16, 2025
11 checks passed

gante deleted the train_cache_test branch January 16, 2025 17:02
