Add `aux_outputs` for CPO and SimPO (#492)
## Summary

In the TRL [implementation](https://github.com/huggingface/trl/blob/8c49ea39ec2e11ce4c61291ff2ad59edb46522aa/trl/trainer/cpo_trainer.py#L669C1-L672C56), the CPO loss returns two extra values (`chosen_rewards` and `rejected_rewards`), but Liger-Kernel did not return them. This change adds them as `aux_outputs` for CPO and SimPO.

## Testing Done

- Hardware Type: <BLANK>
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence

---------

Signed-off-by: Mecoli1219 <[email protected]>
Co-authored-by: Austin Liu <[email protected]>
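To illustrate the two extra return values described in the Summary, here is a minimal pure-Python sketch of a sigmoid preference loss that returns per-example rewards alongside the scalar loss. It assumes the inputs are already per-sequence policy log-probabilities (length-averaged for SimPO), that rewards are `beta` times those log-probabilities as in TRL's `CPOTrainer`, and that SimPO differs from CPO only by a target margin `gamma`. The helper name `preference_loss_with_aux` and its return layout are hypothetical, not Liger-Kernel's actual API. CPO's extra NLL term and label smoothing are omitted.

```python
import math
from typing import List, Tuple


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for both signs of x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_loss_with_aux(
    chosen_logps: List[float],
    rejected_logps: List[float],
    beta: float = 0.1,
    gamma: float = 0.0,
) -> Tuple[float, Tuple[List[float], List[float]]]:
    """Hypothetical helper: sigmoid CPO loss (gamma == 0) or a SimPO-style
    loss (gamma > 0), returned with per-example rewards as aux outputs."""
    if len(chosen_logps) != len(rejected_logps):
        raise ValueError("chosen and rejected batches must have the same length")

    losses = []
    for c, r in zip(chosen_logps, rejected_logps):
        # SimPO subtracts a target reward margin gamma; CPO uses gamma = 0.
        logits = beta * (c - r) - gamma
        losses.append(-log_sigmoid(logits))
    loss = sum(losses) / len(losses)

    # Auxiliary outputs: beta * policy log-probs (detached in a PyTorch version).
    chosen_rewards = [beta * c for c in chosen_logps]
    rejected_rewards = [beta * r for r in rejected_logps]
    return loss, (chosen_rewards, rejected_rewards)
```

A caller can log `chosen_rewards` and `rejected_rewards` (or their difference, a reward margin) without recomputing them, which is the point of returning them as auxiliary outputs rather than only the loss.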