Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(3 in pipeline) - runcard fixes, multi-backend gridpacks, python/bash profiling in tlau #948

Draft
wants to merge 155 commits into
base: master
Choose a base branch
from

Conversation

valassi
Copy link
Member

@valassi valassi commented Aug 2, 2024

This is a WIP PR with fixes and studies in generate_events

It includes a possible fix for #947

@valassi valassi self-assigned this Aug 2, 2024
@valassi valassi marked this pull request as draft August 2, 2024 11:45
@valassi valassi linked an issue Aug 2, 2024 that may be closed by this pull request
@valassi
Copy link
Member Author

valassi commented Aug 2, 2024

As suggested by @oliviermattelaer the fix for #947 is moved to mg5amcnlo/mg5amcnlo#125

@valassi
Copy link
Member Author

valassi commented Aug 2, 2024

The fix for #947 has been stripped off in #949 (which can be merged now)

This has been rebased over it and remains in WIP

valassi added 22 commits August 5, 2024 09:34
(August 2024: Fix conflicts in epochX/cudacpp/tlau/lauX.sh after rebasing on upstream/master)
./tlau/lauX.sh -cuda gg_tt.mad
…_type' to 'cudacpp_fptype' for runcard options (madgraph5#700, madgraph5#788, madgraph5#835)

(Older commit on gg_tt.mad, later cherry-picked fixing AVX conflicts, and later backported to CODEGEN too)
…acpp_backend in add_input_for_banner

Then regenerate gg_tt.mad: this only affects the gg_tt.mad/bin/internal/plugin_run_card file
…acpp_backend in default_setup (will revert but keep the added comment)

Then regenerate gg_tt.mad: this only adds "CUDACPP_BACKEND = 'cpp'" to run_card.inc
…dacpp_backend in default_setup

Regenerate gg_tt.mad: this removes "CUDACPP_BACKEND = 'cpp'" from run_card.inc
…acpp_fptype in default_setup and comment out fct_mod (will revert but keep the added comment)

Then regenerate gg_tt.mad: this only adds "CUDACPP_FPTYPE = 'm'" to run_card.inc (if fct_mod is commented out, else this fails codegen)
…dacpp_fptype in default_setup and uncomment out fct_mod

Then regenerate gg_tt.mad: this removes "CUDACPP_FPTYPE = 'm'" from run_card.inc
…p_fptype

Then regenerate gg_tt.mad: this gives a warning "run_card missed argument cudacpp_fptype. Takes default: m", but no code/card change?
…e runcard template, and keep hidden=False

Then regenerate gg_tt.mad: a cudacpp_fptype line is added to the runcard and the warning "run_card missed argument cudacpp_fptype. Takes default: m" disappears
…nner (and reorder code)

Then regenerate gg_tt.mad: this only affects the gg_tt.mad/bin/internal/plugin_run_card file
…ILDDIR=1 to make_opts (part of madgraph5#945)

Then regenerate gg_tt.mad: this only affects launch_plugin.py itself

Launching lauX.sh on ggtt fails as follows

Error detected in "generate_events -f"
write debug file /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/run_01_tag_1_debug.log
If you need help with this issue please contact us on https://answers.launchpad.net/mg5amcnlo
str : A compilation Error occurs when trying to compile /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx.
        The compilation fails with the following output message:
            make USEGTEST=0 BACKEND=cuda build.auto_m_inl0_hrd0/madevent_cuda
            make[1]: Entering directory '/data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx'
            make[1]: *** No rule to make target 'build.auto_m_inl0_hrd0/madevent_cuda'.  Stop.
            make[1]: Leaving directory '/data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx'
            make: *** [makefile:163: madevent_cuda_link] Error 2

        Please try to fix this compilations issue and retry.
        Help might be found at https://answers.launchpad.net/mg5amcnlo.
        If you think that this is a bug, you can report this at https://bugs.launchpad.net/mg5amcnlo
… madevent_xxx_link' in args[0][0] to support USEBUILDDIR=1 (part of madgraph5#945)

Then regenerate gg_tt.mad

However lauX.sh still fails
…][1] to 'make madevent_xxx_link' to support USEBUILDDIR=1 (part of madgraph5#945)

Then regenerate gg_tt.mad
…two patches, will fix gg_tt.mad makefile instead

Revert "[runcard] in CODEGEN launch_plugin.py, append 'BACKEND=xxx' as args[0][1] to 'make madevent_xxx_link' to support USEBUILDDIR=1 (part of madgraph5#945)"
This reverts commit ce67fb5.

Revert "[runcard] in CODEGEN launch_plugin.py, prepend 'BACKEND=xxx' to 'make madevent_xxx_link' in args[0][0] to support USEBUILDDIR=1 (part of madgraph5#945)"
This reverts commit b51a59a.
…targets instead of having this in a python script (part of madgraph5#945)
…rgets make commands (eg propagate FPTYPE=m from make_opts)
…_opts configuration, this will be hardcoded in link targets (part of madgraph5#945)"

Revert "[runcard] in CODEGEN launch_plugin.py, configure compile to add USEBUILDDIR=1 to make_opts (part of madgraph5#945)"
This reverts commit 853a20e.
…(and propagate FPTYPE from make_opts) in _link targets for madgraph5#945

The only files that still need to be patched are
- 3 in patch.common: Source/makefile, Source/genps.inc, SubProcesses/makefile
- 3 in patch.P1: auto_dsig1.f, driver.f, matrix1.f

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/makefile gg_tt.mad/Source/genps.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
valassi added 11 commits October 6, 2024 13:05
…st v1.00.00 and fixing SubProcesses/makefile madgraph5#700

The only files that still need to be patched are
- 1 in patch.common: Source/genps.inc, SubProcesses/makefile
- 2 in patch.P1: driver.f, matrix1.f

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad

(Later regenerated gg_tt.mad and checked that all is ok)
… runcards madgraph5#700 (SubProcesses/makefile, runcard, launch_plugin)
…h) after merging banner+makefiles - all ok

./tput/allTees.sh -hip

Approximately:
STARTED  AT Sun 06 Oct 2024 02:20:00 PM EEST
ENDED    AT Sun 06 Oct 2024 03:12:00 PM EEST
… - all as expected (heft fail madgraph5#833, skip ggttggg madgraph5#933)

./tmad/allTees.sh -hip

STARTED  AT Sun 06 Oct 2024 03:12:36 PM EEST
(SM tests)
ENDED(1) AT Sun 06 Oct 2024 05:40:31 PM EEST [Status=0]
(BSM tests)
ENDED(1) AT Sun 06 Oct 2024 05:50:33 PM EEST [Status=0]
Revert "[grid] rerun 30 tmad tests on LUMI/HIP after merging banner+makefiles - all as expected (heft fail madgraph5#833, skip ggttggg madgraph5#933)"
This reverts commit 0fe4544.

Revert "[grid] rerun 96 tput builds and tests on LUMI worker node (small-g 72h) after merging banner+makefiles - all ok"
This reverts commit e3ed264.
…makefiles - all ok

STARTED  AT Sun Oct  6 01:22:14 PM CEST 2024
./tput/teeThroughputX.sh -dmf -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean  -cpponly
ENDED(1) AT Sun Oct  6 01:35:24 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -d_f -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean  -cpponly
ENDED(2) AT Sun Oct  6 01:40:38 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -d_f -bridge -makeclean  -cpponly
ENDED(3) AT Sun Oct  6 01:45:35 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -rmbhst  -cpponly
ENDED(4) AT Sun Oct  6 01:47:03 PM CEST 2024 [Status=0]
SKIP './tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -common  -cpponly'
ENDED(5) AT Sun Oct  6 01:47:03 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -common  -cpponly
ENDED(6) AT Sun Oct  6 01:48:30 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -dmf -hrd -makej -susyggtt -susyggt1t1 -smeftggtttt -heftggbb -makeclean  -cpponly
ENDED(7) AT Sun Oct  6 01:53:13 PM CEST 2024 [Status=0]

No errors found in logs

No FPEs or '{ }' found in logs
… all as expected

STARTED  AT Sun Oct  6 01:53:13 PM CEST 2024
(SM tests)
ENDED(1) AT Sun Oct  6 04:50:19 PM CEST 2024 [Status=0]
(BSM tests)
ENDED(1) AT Sun Oct  6 04:55:35 PM CEST 2024 [Status=0]
Revert "[makefiles] rerun 30 tmad tests on gold91 after merging banner+makefiles - all as expected"
This reverts commit 0bb573a.

Revert "[grid] rerun 96 tput builds and tests on gold91 after merging banner+makefiles - all ok"
This reverts commit b249b5e.
…er+makefiles - all ok

STARTED  AT Sun Oct  6 01:19:50 PM CEST 2024
./tput/teeThroughputX.sh -dmf -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
ENDED(1) AT Sun Oct  6 01:44:11 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -d_f -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
ENDED(2) AT Sun Oct  6 01:52:30 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -d_f -bridge -makeclean
ENDED(3) AT Sun Oct  6 02:01:40 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -rmbhst
ENDED(4) AT Sun Oct  6 02:04:29 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -curhst
ENDED(5) AT Sun Oct  6 02:07:14 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -common
ENDED(6) AT Sun Oct  6 02:10:06 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -dmf -hrd -makej -susyggtt -susyggt1t1 -smeftggtttt -heftggbb -makeclean
ENDED(7) AT Sun Oct  6 02:19:40 PM CEST 2024 [Status=0]

No errors found in logs

No FPEs or '{ }' found in logs
@valassi
Copy link
Member Author

valassi commented Oct 6, 2024

I removed the link to #947 (this was fixed in mg5amcnlo/mg5amcnlo#125, submodule was updated in PR #949)

…ranch grid

This includes in summary
- madgraph5#700 add cudacpp_helinl, cudacpp_hrdcod, rename cudacpp_fptype
- madgraph5#945 add cudacpp_bldall for multi-backend gridpacks
- madgraph5#957 enhance tlau tests: instrument python code for gridpack profiling, add test logs and scripts to parse them
@valassi valassi changed the title WIP studies and fixes with generate_events (3 in pipeline) - runcard fixes, multi-backend gridpacks, python/bash profiling in tlau Oct 7, 2024
@valassi valassi changed the base branch from master to valassi_2_makefiles October 7, 2024 08:27
@valassi valassi marked this pull request as ready for review October 7, 2024 08:28
@valassi
Copy link
Member Author

valassi commented Oct 7, 2024

Hi @oliviermattelaer as discussed via email.

This is N=3 in the pipeline I would like to merge.

I changed this to target the N=2 makefiles branch for easier reviewing, but I would then merge it against master once approved.

Can you please tell me if this looks ok? Thanks
Andrea

@valassi
Copy link
Member Author

valassi commented Oct 7, 2024

(Note: this also addresses #700 but I did not manage to link it yet)

@valassi valassi linked an issue Oct 7, 2024 that may be closed by this pull request
@oliviermattelaer
Copy link
Member

Point by Point review

multi backend gridpacks (cudacpp_bldall runcard)

No problem on that one, but maybe why including it by default in the run_card?
If you have a comment:

 WARNING! The following cudacpp runcards are experimental! Users should normally change only the cudacpp_backend card ***

Would it make more sense to not display them at all (or in a independent hidden block "simd_advance"?)
-> to be discussed

add cudacpp_helinl and cudacpp_hrdcod runcards for HELINL=1 and HRDCOD=1 modes

  • Same comment as above
  • comment on the default_setup() need to be changed from a question to an assertion
  • maybe good to add the comment field for those?

rename the FPTYPE runcard as cudacpp_fptype

This is delicate since this is going to break retro-compatibility of the code which we have to preserve.
So we either have to not do this change (which is ok) or do it but add a mechanism preserving retro-compatibility.
-> to be discussed

instrument gridpacks with time profiling of the python/bash orchestrator (this is only in my "tlau" tests for the moment, eventually it can be moved in mg5amcnlo, but for the moment it is there so I can use it and further develop it)

I do not see any change in the codegen directory so I guess that you edit manually a .tmad directory, which is fine since those directory needs anyway to be remove from the repo.

Global Comment

Touching the patch, should be avoid as much as possible, do you want to add new handle in the template?
Or do you want to do it? Would be nice to avoid to touch that makefile in such a way...

@valassi valassi marked this pull request as draft October 8, 2024 14:23
@valassi
Copy link
Member Author

valassi commented Oct 8, 2024

Hi @oliviermattelaer thanks :-)

I will reply later. For the moment I moved back to draft in any case while working on the base PRs

git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
…and makefiles) into grid

Fix conflicts:
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/CHANGELOG.md (fix comment about floating_type card)
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 (will regenerate anyway)
…/master with updated mg5amcnlo

Only the following files are needed to build the patch:
- 1 in patch.common: SubProcesses/makefile
- 2 in patch.P1: driver.f, matrix1.f

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
@valassi
Copy link
Member Author

valassi commented Oct 8, 2024

Hi Olivier thanks for the comments :-)

Maybe easiest to have a short zoom when you have time tomorrow or another time.

In short

  • As you prefer for the visibility of the cards. I find it nice to have them in the cards, because this way it's easy to add a comments for users about which options are available. But I do not know how users use runcards enough to judge. Maybe there is some other way to document which cards are available and which options are available for each card.

  • As you prefer for floating_type, ok for me to keep it that way. I thought cudacpp_fptype was more consistent, but I really do not care much. I had not thought that backward compatibility was an issue.

  • Thanks for accepting the concept of multi backend gridpacks and of python profiling.

  • Your comment about "deault_setup" I have not understood. To be discussed.

  • About "Touching the patch, should be avoid as much as possible, do you want to add new handle in the template?", I guess you mean that we should go in the direction of what I did for driver.f in Allow patching header/initialise/finalise in driver.f mg5amcnlo/mg5amcnlo#148. My answer is yes, eventually I'd like to do that, but I have not done it here. For the moment the only thing I am trying to do is to NOT INCREASE the stuff in patch.P1 and patch.common. Thanks to various changes you did we reduced this to three files (SubProcesses/makefile, driver.f, matrix1.f), and with what I did fro driver.f I plan to also get rid of driver.f. Then yes we should also get rid of the patch for makefile. I would just do this later... but if you prefer, I will try to get rid of these three first thing (probably after CHEP).

Apart from this:

  • I have merged the two base PRs
  • So I have updated this PR accordingly
  • And I changed its base to master as now it is the lowest level in my "pipeline"

Lets dscuss the other points above by zoome (or just reply here in the meantime), tomorrow.

Thanks! :-)
Andrea

@valassi valassi changed the base branch from valassi_2_makefiles to master October 8, 2024 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants