[c++] Add Bagging by Query for Lambdarank #6623

shiyu1994 · 2024-08-27T03:24:41Z

Add bagging by query, instead of by item, for lambdarank, as suggested by @metpavel. This should be a more reasonable way to bag in ranking tasks. For a performance comparison on the MS LTR dataset:
With bagging_freq=1 and bagging_fraction=0.1, if bagging_by_query=true

[LightGBM] [Info] Iteration:100, training ndcg@1 : 0.528525
[LightGBM] [Info] Iteration:100, training ndcg@3 : 0.502271
[LightGBM] [Info] Iteration:100, training ndcg@5 : 0.5034
[LightGBM] [Info] 23.123690 seconds elapsed, finished iteration 100
[LightGBM] [Info] Finished training

and if bagging_by_query=false

[LightGBM] [Info] Iteration:100, training ndcg@1 : 0.524889
[LightGBM] [Info] Iteration:100, training ndcg@3 : 0.502272
[LightGBM] [Info] Iteration:100, training ndcg@5 : 0.502838
[LightGBM] [Info] 43.811966 seconds elapsed, finished iteration 100
[LightGBM] [Info] Finished training

Without bagging

[LightGBM] [Info] Iteration:100, training ndcg@1 : 0.535041
[LightGBM] [Info] Iteration:100, training ndcg@3 : 0.509657
[LightGBM] [Info] Iteration:100, training ndcg@5 : 0.510785
[LightGBM] [Info] 50.232102 seconds elapsed, finished iteration 100
[LightGBM] [Info] Finished training
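To make the difference between the two modes concrete, here is a minimal sketch of item-level versus query-level sampling. It is only an illustration of the idea, not the sampling code added in this PR: the helper names (`query_boundaries`, `bag_by_item`, `bag_by_query`) and the use of Python's `random` module are assumptions made for the example.

```python
import random


def query_boundaries(group_sizes):
    """Return cumulative row offsets; one more entry than there are queries."""
    bounds = [0]
    for size in group_sizes:
        bounds.append(bounds[-1] + size)
    return bounds


def bag_by_item(group_sizes, fraction, seed=0):
    """Hypothetical item-level bagging: sample rows, ignoring query membership."""
    rng = random.Random(seed)
    num_rows = sum(group_sizes)
    k = max(1, int(num_rows * fraction))
    return sorted(rng.sample(range(num_rows), k))


def bag_by_query(group_sizes, fraction, seed=0):
    """Hypothetical query-level bagging: sample queries, keep all of their rows."""
    rng = random.Random(seed)
    bounds = query_boundaries(group_sizes)
    num_queries = len(group_sizes)
    k = max(1, int(num_queries * fraction))
    sampled_queries = sorted(rng.sample(range(num_queries), k))
    rows = []
    for q in sampled_queries:
        # Every document of a sampled query stays in the bag.
        rows.extend(range(bounds[q], bounds[q + 1]))
    return sampled_queries, rows
```

With item-level sampling a query can end up with only a few of its documents in the bag, while query-level sampling always keeps complete queries.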


borchero · 2024-09-02T11:10:35Z

include/LightGBM/cuda/cuda_objective_function.hpp

+  void GetGradients(const double* scores, const data_size_t /*num_sampled_queries*/, const data_size_t* /*sampled_query_indices*/, score_t* gradients, score_t* hessians) const override {
+    LaunchGetGradientsKernel(scores, gradients, hessians);
+    SynchronizeCUDADevice(__FILE__, __LINE__);
+  }
+


@neNasko1 is this something that might be missing for CUDA support in #6586?
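The new `GetGradients` overload in the diff receives `num_sampled_queries` and `sampled_query_indices`, and the CUDA override shown there ignores both. As a rough sketch of how an objective could use such a list, the Python below computes gradients only for the sampled queries. It uses a plain RankNet-style pairwise logistic loss without the NDCG weighting of lambdarank, and the function name and argument layout are assumptions for illustration, not LightGBM's implementation.

```python
import math


def pairwise_gradients(scores, labels, bounds, sampled_query_indices=None):
    """Accumulate pairwise gradients/hessians for the sampled queries only.

    bounds holds cumulative row offsets per query (len = num_queries + 1).
    Rows of queries that are not sampled keep zero gradient and hessian.
    """
    num_queries = len(bounds) - 1
    if sampled_query_indices is None:
        # No bagging by query: visit every query.
        sampled_query_indices = range(num_queries)
    gradients = [0.0] * len(scores)
    hessians = [0.0] * len(scores)
    for q in sampled_query_indices:
        start, end = bounds[q], bounds[q + 1]
        for i in range(start, end):
            for j in range(start, end):
                if labels[i] <= labels[j]:
                    continue  # only pairs where i should rank above j
                # Derivative of log(1 + exp(-(s_i - s_j))) w.r.t. s_i is -rho.
                rho = 1.0 / (1.0 + math.exp(scores[i] - scores[j]))
                gradients[i] -= rho
                gradients[j] += rho
                h = rho * (1.0 - rho)
                hessians[i] += h
                hessians[j] += h
    return gradients, hessians
```

Because pairs are formed only inside a query, skipping a query skips all of its pair computations.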

shiyu1994 · 2024-09-03T09:58:34Z

@guolinke Could you please help to review this when you have time? Thanks.

StrikerRUS

Just some very minor suggestions from me below:

include/LightGBM/objective_function.h

tests/python_package_test/test_engine.py

StrikerRUS · 2024-09-05T20:43:23Z

tests/python_package_test/test_engine.py

+    assert ndcg_score_bagging_by_query >= ndcg_score - 0.1
+    assert ndcg_score_no_bagging_by_query >= ndcg_score - 0.1


The PR description states that bagging_by_query=True should improve metrics, but I don't see any comparison of bagging_by_query=True and bagging_by_query=False here...

That is because I found the results can be random when the dataset is small. For example, on CPU, bagging_by_query=True gets a higher NDCG on the toy test dataset (even higher than the case without bagging), while on GPU bagging_by_query=True can get worse results than bagging_by_query=False. When the dataset is large, as with the MS LTR dataset, the results are less random and bagging_by_query=True should improve performance, as in the description of this PR.

In addition, we also see a significant improvement in training speed with bagging_by_query=True.

OK, I see. So this test is for something like "bagging_by_query=True doesn't break training".
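The two assertions under discussion follow a "no large regression" pattern rather than a strict comparison. A minimal sketch of that check is below. The helper name `does_not_break_training` is hypothetical, and the sample inputs are the training NDCG@1 values from the PR description, used only as illustrative numbers and not as the values the test actually sees.

```python
def does_not_break_training(candidate_ndcg, baseline_ndcg, tolerance=0.1):
    """Return True if the candidate score is at most `tolerance` below the baseline."""
    return candidate_ndcg >= baseline_ndcg - tolerance


# Sample inputs taken from the PR description (training ndcg@1).
no_bagging = 0.535041
by_query = 0.528525
by_item = 0.524889

assert does_not_break_training(by_query, no_bagging)
assert does_not_break_training(by_item, no_bagging)
```

A check like this tolerates the run-to-run noise seen on small datasets while still catching a collapse in ranking quality.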


StrikerRUS

LGTM, thanks!

add bagging by query for lambdarank

0618bb2

shiyu1994 added the effectiveness and feature labels Aug 27, 2024

shiyu1994 self-assigned this Aug 27, 2024

shiyu1994 requested review from guolinke, jameslamb, jmoralez, borchero and StrikerRUS as code owners August 27, 2024 03:24

shiyu1994 added 9 commits August 27, 2024 11:27

Merge branch 'master' into bagging/bagging-by-query-for-lambdarank

185bdf6

fix pre-commit

38fa4c2

Merge branch 'bagging/bagging-by-query-for-lambdarank' of https://git…

2fce147

…hub.com/Microsoft/LightGBM into bagging/bagging-by-query-for-lambdarank

Merge branch 'master' into bagging/bagging-by-query-for-lambdarank

1f7f967

fix bagging by query with cuda

9e2a322

fix bagging by query test case

666c51e

fix bagging by query test case

9e2c338

fix bagging by query test case

3abbc11

add #include <vector>

13fa0a3

shiyu1994 added the awaiting review label Aug 30, 2024

Merge branch 'master' into bagging/bagging-by-query-for-lambdarank

9264768

borchero reviewed Sep 2, 2024


Merge branch 'master' into bagging/bagging-by-query-for-lambdarank

481ab03

guolinke approved these changes Sep 4, 2024


StrikerRUS requested changes Sep 5, 2024


jameslamb mentioned this pull request Sep 6, 2024

[ci] Update CUDA versions for CI #6539

Merged

shiyu1994 and others added 4 commits September 6, 2024 10:42

Update include/LightGBM/objective_function.h

7e51534

Co-authored-by: Nikita Titov <[email protected]>

Update tests/python_package_test/test_engine.py

8124999

Co-authored-by: Nikita Titov <[email protected]>

Update tests/python_package_test/test_engine.py

cc6f688

Co-authored-by: Nikita Titov <[email protected]>

Merge branch 'master' into bagging/bagging-by-query-for-lambdarank

0993154

StrikerRUS approved these changes Sep 6, 2024


Merge branch 'master' into bagging/bagging-by-query-for-lambdarank

8a9b356

shiyu1994 merged commit d1d218c into master Oct 2, 2024
45 checks passed

shiyu1994 deleted the bagging/bagging-by-query-for-lambdarank branch October 2, 2024 16:19

jameslamb removed the awaiting review label Oct 2, 2024

StrikerRUS removed the effectiveness label Oct 3, 2024
