Additionally, we should simplify LoRA/QLoRA with 8-/4-bit loading. Ideally we get rid of the adapter: qlora option, since QLoRA is just a specific subset of LoRA: all linear layers are targeted and the base model is loaded in 4-bit. I think we can reduce this to a single lora adapter type and let the user set either 4-bit or 8-bit loading independently. If a user still selects qlora, we warn about the specific configuration it implies - 4-bit quantization and targeting all linear layers.
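To make that concrete, here is a minimal sketch of what the collapsed option could look like in plain PEFT/bitsandbytes terms. The model name and LoRA hyperparameters are placeholders, and the all-linear shortcut assumes a recent PEFT release that supports it:

```python
# Hypothetical sketch: "qlora" expressed as ordinary LoRA plus 4-bit loading.
# Model name and hyperparameters below are placeholders.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder base model
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,  # the 4-bit load is the "q" in qlora
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",  # qlora convention: target every linear layer
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
```

Framed this way, qlora is nothing beyond lora with 4-bit loading enabled and all linear layers targeted, so the warning only needs to check those two settings.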
I'll give this a go - just so I understand the first part correctly:
I should be able to do (full) finetuning with an existing adapter model as the base_model arg in the training config. In that case the adapter should be merged into its base model (i.e. using merge_and_unload), and the full finetune should then be run on the merged model.
Optionally, the full finetune can be swapped for a new LoRA/QLoRA training run: fresh LoRA layers/adapters are added to the merged model and only the new adapter weights are trained. A rough sketch of both cases follows below.
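Something like this, as a hedged PEFT sketch; the path and hyperparameters are placeholders, not proposed config names:

```python
# A rough sketch of both cases with PEFT; paths and hyperparameters are
# placeholders.
from peft import AutoPeftModelForCausalLM, LoraConfig, get_peft_model

# Case 1: base_model points at an adapter checkpoint. Load the adapter on
# top of its recorded base model, then fold the adapter weights in so a
# plain full finetune can run on the merged model.
peft_model = AutoPeftModelForCausalLM.from_pretrained("path/to/existing-adapter")
merged = peft_model.merge_and_unload()
# ... full finetune proceeds on `merged` as if it were an ordinary base model

# Case 2: instead of a full finetune, attach a fresh LoRA adapter to the
# merged model and train only the new adapter weights.
new_lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
model = get_peft_model(merged, new_lora)
```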
So in the first case, a user could pass a lora model dir arg but leave the adapter setting empty, and we would simply merge and unload that adapter into the base model, nothing more.
I'm not clear on what you are asking for the second case.
🔖 Feature description
There are a few use cases that aren't cleanly handled at the moment.
Currently both can be worked around by manually merging the models beforehand, but it would be nice to handle these cases directly.
✔️ Solution
see above
❓ Alternatives
No response
📝 Additional Context
No response