
gh-128515: Add BOLT build to CI #128845

Open · wants to merge 6 commits into main from zb/bolt
Conversation

zanieb
Contributor

@zanieb zanieb commented Jan 14, 2025

Adds BOLT test coverage to CI, which will allow us to prevent regressions and move towards stabilization of this feature.

Of note:


@zanieb zanieb force-pushed the zb/bolt branch 5 times, most recently from 29351fc to 1d7ab1e Compare January 14, 2025 20:35
Copied from the JIT workflow
@zanieb
Contributor Author

zanieb commented Jan 14, 2025

Interesting: test_pickle is failing on the instrumented binaries. I'll need to investigate that, as I haven't seen it before.

edit: This occurs because test_unpickle_module_race fails on a read-only file system. See c3a3800
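
For illustration, a test can probe filesystem writability before relying on it. This is a minimal sketch with a hypothetical helper name, not the actual fix in c3a3800:

```python
import tempfile
import unittest

def filesystem_is_writable(path="."):
    """Probe whether `path` is writable by actually attempting a write.

    os.access() can be misleading on some filesystems, so create and
    discard a real temporary file instead.
    """
    try:
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False

class UnpickleModuleRaceTest(unittest.TestCase):
    # Hypothetical guard: skip when the source tree is mounted read-only,
    # as happens in the instrumented CI build described above.
    @unittest.skipUnless(filesystem_is_writable(), "requires a writable filesystem")
    def test_unpickle_module_race(self):
        ...  # the real test writes temporary module files while unpickling
```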

@zanieb
Contributor Author

zanieb commented Jan 14, 2025

I encountered a couple of blockers for aarch64: a failed assertion in the instrumented binary

./python -m test --pgo --rerun --verbose3 --timeout=
python: ../cpython-ro-srcdir/Python/generated_cases.c.h:1074: _PyEval_EvalFrameDefault: Assertion `tp->tp_alloc == PyType_GenericAlloc' failed.
Aborted (core dumped)

and (after hacking past that) a segfault in BOLT

# Run bolt against the merged data to produce an optimized binary.
for bin in python; do \
  /usr/lib/llvm-19/bin/llvm-bolt "${bin}.prebolt" -o "${bin}.bolt" -data="${bin}.fdata" -update-debug-sections -skip-funcs=_PyEval_EvalFrameDefault,sre_ucs1_match/1,sre_ucs2_match/1,sre_ucs4_match/1  -reorder-blocks=ext-tsp -reorder-functions=cdsort -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot ; \
  mv "${bin}.bolt" "${bin}"; \
done
BOLT-INFO: Target architecture: aarch64
BOLT-INFO: BOLT version: <unknown>
BOLT-INFO: first alloc address is 0x400000
BOLT-INFO: enabling relocation mode
BOLT-INFO: pre-processing profile using branch profile reader
BOLT-INFO: number of removed linker-inserted veneers: 0
BOLT-INFO: 8500 out of 12058 functions in the binary (70.5%) have non-empty execution profile
BOLT-INFO: 41 functions with profile could not be optimized
BOLT-INFO: profile for 1 objects was ignored
BOLT-INFO: removed 1 empty block
BOLT-INFO: ICF folded 678 out of 12439 functions in 5 passes. 0 functions had jump tables.
BOLT-INFO: Removing all identical functions will save 46.23 KB of code space. Folded functions were called 3909549484 times based on profile.
BOLT-INFO: ICP Total indirect calls = 1808544446, 153 callsites cover 99% of all indirect calls
 #0 0x0000aacc1be768cc (/usr/lib/llvm-19/bin/llvm-bolt+0x1ae68cc)
 #1 0x0000aacc1be74b80 (/usr/lib/llvm-19/bin/llvm-bolt+0x1ae4b80)
 #2 0x0000aacc1be77174 (/usr/lib/llvm-19/bin/llvm-bolt+0x1ae7174)
 #3 0x0000ff03feee37e0 (linux-vdso.so.1+0x7e0)
 #4 0x0000aacc1c397200 (/usr/lib/llvm-19/bin/llvm-bolt+0x2007200)
 #5 0x0000aacc1c39aa1c (/usr/lib/llvm-19/bin/llvm-bolt+0x200aa1c)
 #6 0x0000aacc1c39a9e4 (/usr/lib/llvm-19/bin/llvm-bolt+0x200a9e4)
 #7 0x0000aacc1c39a9e4 (/usr/lib/llvm-19/bin/llvm-bolt+0x200a9e4)
 #8 0x0000aacc1bf1ebc4 (/usr/lib/llvm-19/bin/llvm-bolt+0x1b8ebc4)
 #9 0x0000aacc1bf21328 (/usr/lib/llvm-19/bin/llvm-bolt+0x1b91328)
#10 0x0000aacc1becfe3c (/usr/lib/llvm-19/bin/llvm-bolt+0x1b3fe3c)
#11 0x0000aacc1aadf2f0 (/usr/lib/llvm-19/bin/llvm-bolt+0x74f2f0)
#12 0x0000ff03fe8684c4 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#13 0x0000ff03fe868598 call_init ./csu/../csu/libc-start.c:128:20
#14 0x0000ff03fe868598 __libc_start_main ./csu/../csu/libc-start.c:347:5
#15 0x0000aacc1aadd4f0 (/usr/lib/llvm-19/bin/llvm-bolt+0x74d4f0)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /usr/lib/llvm-19/bin/llvm-bolt python.prebolt -o python.bolt -data=python.fdata -update-debug-sections -skip-funcs=_PyEval_EvalFrameDefault,sre_ucs1_match/1,sre_ucs2_match/1,sre_ucs4_match/1 -reorder-blocks=ext-tsp -reorder-functions=cdsort -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot
Segmentation fault (core dumped)

I dropped aarch64 in 684ece4 — we can add it later.

@corona10 corona10 self-assigned this Jan 14, 2025
@zanieb
Contributor Author

zanieb commented Jan 14, 2025

A few tests are failing after BOLT optimization. I'd appreciate some guidance on that.

test_sys_api (test.test_perf_profiler.TestPerfTrampoline.test_sys_api) ... FAIL
test_trampoline_works (test.test_perf_profiler.TestPerfTrampoline.test_trampoline_works) ... FAIL
test_trampoline_works_with_forks (test.test_perf_profiler.TestPerfTrampoline.test_trampoline_works_with_forks) ... FAIL

======================================================================
FAIL: test_sys_api (test.test_perf_profiler.TestPerfTrampoline.test_sys_api)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_perf_profiler.py", line 203, in test_sys_api
    self.assertIn(f"py::spam:{script}", perf_file_contents)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 'py::spam:/tmp/test_python_qxe1_ajb/tmpqablk9qp/perftest.py' not found in '7f2d97946000 80600b py::baz:/tmp/test_python_qxe1_ajb/tmpqablk9qp/perftest.py\n'

======================================================================
FAIL: test_trampoline_works (test.test_perf_profiler.TestPerfTrampoline.test_trampoline_works)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_perf_profiler.py", line 91, in test_trampoline_works
    self.assertIsNotNone(
    ~~~~~~~~~~~~~~~~~~~~^
        perf_line, f"Could not find {expected_symbol} in perf file"
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
AssertionError: unexpectedly None : Could not find py::foo:/tmp/test_python_qxe1_ajb/tmpdd3d4w9f/perftest.py in perf file

======================================================================
FAIL: test_trampoline_works_with_forks (test.test_perf_profiler.TestPerfTrampoline.test_trampoline_works_with_forks)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_perf_profiler.py", line 145, in test_trampoline_works_with_forks
    self.assertEqual(process.returncode, 0)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: -11 != 0

----------------------------------------------------------------------
Ran 3 tests in 0.463s

FAILED (failures=3)
test test_perf_profiler failed
1 test failed again:
    test_perf_profiler

@zanieb
Contributor Author

zanieb commented Jan 14, 2025

The timing on this actually seems pretty reasonable at 13 minutes.

We could expand this to perform other build optimizations, e.g., PGO, to verify they're working as intended? Right now it's just BOLT though.

@corona10
Member

Two things:

  • We should use this action for BOLT only, since its test coverage differs from the PGO + LTO build.
  • Let's skip the 3 failing tests using @unittest.skipIf(support.check_bolt_optimized, ...).
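
The suggested skip might look like the following sketch. How `support.check_bolt_optimized` is actually implemented in CPython's test support is an assumption here (modeled as a flag derived from a configure variable):

```python
import sysconfig
import unittest

# Assumption: a BOLT build can be detected via the BOLT_APPLY_FLAGS
# configure variable; the real support.check_bolt_optimized helper
# may work differently.
check_bolt_optimized = bool(sysconfig.get_config_var("BOLT_APPLY_FLAGS"))

class TestPerfTrampoline(unittest.TestCase):
    @unittest.skipIf(check_bolt_optimized,
                     "perf trampoline tests fail on BOLT-optimized binaries")
    def test_trampoline_works(self):
        ...  # unchanged test body
```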

@zanieb
Contributor Author

zanieb commented Jan 15, 2025

We should use this action for BOLT only since the test coverage is different from PGO+LTO build.

Can you expand on this comment?

Let's skip the 3 failing tests using @unittest.skipIf(support.check_bolt_optimized, ...).

Sounds good to me — should I open an issue to investigate why they fail too? Like is the profiler actually broken?

@corona10
Member

Can you expand on this comment?

Because we skip several tests with the BOLTed binary, a combined job could not check those skipped tests for regressions under the standard PGO + LTO build. PGO + LTO is currently the standard optimization policy of the CPython project, which is why I suggested handling BOLT separately from the PGO + LTO build.

Sounds good to me — should I open an issue to investigate why they fail too? Like is the profiler actually broken?

Yeah, we should; maybe @pablogsal is interested in this issue.

@zanieb
Contributor Author

zanieb commented Jan 15, 2025

Created a tracking issue at #128883; skipped the tests in 01cb8d8

@zanieb zanieb marked this pull request as ready for review January 15, 2025 15:05
@zanieb zanieb added the infra CI, GitHub Actions, buildbots, Dependabot, etc. label Jan 15, 2025
Comment on lines +253 to +258
# Do not test BOLT with free-threading, to conserve resources
- bolt: true
free-threading: true
# BOLT currently crashes during instrumentation on aarch64
- os: ubuntu-24.04-aarch64
bolt: true
Contributor Author

I don't have strong feelings about this pattern (using exclude instead of include), but I liked that I could document why we're not running the additional cases.

@@ -246,10 +250,17 @@ jobs:
exclude:
Member

Not a strong opinion, but I would prefer to have just 1, 2, or 3 jobs with BOLT, unless more are absolutely needed.

We can move some very specific builds to buildbots, while maintaining the bare minimum in CI.

Contributor Author

This is just one job with BOLT — I think in the future we'd want a second job for aarch64 once that's unblocked. Are you suggesting I should frame this as an include instead? ref #128845 (comment)

Member

Yes! Sorry for not being clear :)
