-
Notifications
You must be signed in to change notification settings - Fork 4.5k
General Debugging
Although it isn't the most advanced procedure, sometimes logging state to the screen is sufficient to figure out a problem. The following outlines how to add logging statements to a desired test.
-
Make sure the following line of code is at the top of your test to “route” log statements to terminal:
solana_logger::setup()
-
Add any of the following statements to display whatever information you'd like:
error!(), info!(), warn!(), debug!(), trace!()
- Note that the codebase contains these statements which will also be displayed on screen. As such, if you want to reduce "noise", it may be preferable to choose a higher level (error or info).
-
Set the
RUST_LOG
environment variable, making sure to choose a level equal to or lower than the statement in previous step.- For example:
export RUST_LOG=solana=error
- The level can also be specified for a single command:
RUST_LOG=solana=error cargo test your_test
- The level can also be specified on a per-crate basis, such as:
RUST_LOG=solana_ledger=trace,solana_core=info cargo test your_test
- For example:
Adding log statements to executables, such as solana-validator
, is similar. Same as before, make sure your executable is run with the proper log level, and consider adding any prints at a high level to avoid excessive output.
For more info, check out the official documentation: https://docs.rs/log/latest/log/index.html
Running a failing unit test inside a debugger provides much more detail, and is a useful tool to have in your tool belt. The following outlines the process for setting up and running unit tests with rust-lldb
.
- Build the code in debug mode (debug mode is the default mode for
cargo build
)- This is obvious, but worthwhile to mention. It's likely that the code has already built and the tests have been run, which prompted the reason to run the test(s) in a debugger.
- Locate the binary to debug
- The unit test binaries are in
target/debug/deps
. - Binaries have a hash appended as a string to the end of the binary's filename.
- The unit test binaries are in
- Launch
rust-lldb
$ rust-lldb target/debug/deps/solana_binary_to_debug-SOMEHASH
- Set a breakpoint
(lldb) breakpoint set -r test_name_here
- Run
(lldb) run -- test_name_here
Inspecting core dumps can be useful for examining the state of a program at the time of a crash.
System configuration may not allow the generation of core dumps by default. Perform the following steps to allow generation of core dumps:
- Check that a program is configured to handle core patterns by printing the contents of
/proc/sys/kernel/core_pattern
. For example,
$ cat /proc/sys/kernel/core_pattern
|/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h
- Confirm that the service in step above is enabled; it might be disabled by default (ie if you're using apport).
systemctl status apport
- Confirm that the limit for created core files is large enough.
$ ulimit -c
unlimited
If the above is 0
, no core files will be created. Note that you can configure the value with ulmit; however, that value won't extend to other processes or persist. To persistently allow unlimited dumps for all processes, add a line to /etc/security/limits.conf
.
At the time of writing this, the project is using jemalloc in place of the system allocator. jemalloc has some builtins to allow profiling; the following procedure outlines how to do so. The jemalloc wiki also has some good documentation, specifically in the "Use cases" pages.
- Clone and build the jemalloc repo by following these instructions. Note that it may be preferable to NOT run
make install
to leave the default binaries alone as we are only building to get the profile analysis tool. - Install
ghostscript
andgraphviz
; these will be needed if you wish to create graph outputs.
sudo apt install ghostscript graphviz
- Note that there is a
jeprof
binary in thebin
directory.
The directions listed below will assume that solana-validator
is the binary to profile.
- Apply the following patch (or similar change should the patch become out of date) to enable debug information in the release binaries, as well as the jemalloc profiling feature itself.
Patch
diff --git a/Cargo.toml b/Cargo.toml
index 9aad90aeb4..f0492c7fce 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -92,5 +92,14 @@ exclude = [
"programs/bpf",
]
+[profile.release]
+debug = true
+
+[profile.release.package."*"]
+debug = true
+
+[profile.release.build-override]
+debug = true
+
# This prevents a Travis CI error when building for Windows.
resolver = "2"
diff --git a/ledger-tool/Cargo.toml b/ledger-tool/Cargo.toml
index 7a984bf6ad..28eb7ebf4f 100644
--- a/ledger-tool/Cargo.toml
+++ b/ledger-tool/Cargo.toml
@@ -39,7 +39,7 @@ solana-vote-program = { path = "../programs/vote", version = "=1.14.2" }
tokio = { version = "1", features = ["full"] }
[target.'cfg(not(target_env = "msvc"))'.dependencies]
-jemallocator = { package = "tikv-jemallocator", version = "0.4.1", features = ["unprefixed_malloc_on_supported_platforms"] }
+jemallocator = { package = "tikv-jemallocator", version = "0.4.1", features = ["profiling", "unprefixed_malloc_on_supported_platforms"] }
[dev-dependencies]
assert_cmd = "2.0"
diff --git a/validator/Cargo.toml b/validator/Cargo.toml
index 877ffccbc9..8db96cacf2 100644
--- a/validator/Cargo.toml
+++ b/validator/Cargo.toml
@@ -55,7 +55,7 @@ solana-vote-program = { path = "../programs/vote", version = "=1.14.2" }
symlink = "0.1.0"
[target.'cfg(not(target_env = "msvc"))'.dependencies]
-jemallocator = { package = "tikv-jemallocator", version = "0.4.1", features = ["unprefixed_malloc_on_supported_platforms"] }
+jemallocator = { package = "tikv-jemallocator", version = "0.4.1", features = ["profiling", "unprefixed_malloc_on_supported_platforms"] }
[target."cfg(unix)".dependencies]
libc = "0.2.126"
- With the patch applied, recompile the the desired binary with the following flags. For example, the build command may look like:
RUSTFLAGS='-g -C force-frame-pointers=yes' ./cargo build --release --features jemallocator/profiling
- Execute the
solana-validator
with the following flags; check out the documentation here to see what these flags do and tweak them as necessary to fit your use case.
MALLOC_CONF="prof:true,lg_prof_interval:32,lg_prof_sample:17,prof_prefix:jeprof" solana-validator ...
- As your process runs, it should now create files of the format
jeprof.<...>.heap
.
- Run something similar to the following command to create a PDF output; see the
jeprof
help for all the options:
jeprof --show_bytes /path/to/binary /path/to/heap/profile/file --pdf > heap.pdf
Note that if your process created multiple files, you can pass multiple files to jeprof
. For exampple, if all of your profile files from one run are in the same directory, you could do:
jeprof --show_bytes /path/to/binary /path/to/heap/profile/directory/jeprof.* --pdf > heap.pdf
some background for frame pointers to convince you that this is cool: https://fedoraproject.org/wiki/Changes/fno-omit-frame-pointer
use ubuntu 22.04 LTS.
use this ppa:
https://launchpad.net/~ryoqun/+archive/ubuntu/glibc-fp/
sadly, stable rustc can't be used due to: https://github.com/rust-lang/rust/issues/103711. in short we need to enable -Z build-std
.
Run this rustup component add rust-src --toolchain nightly-x86_64-unknown-linux-gnu
(requisite of -Z build-std
).
then, apply the following patch:
diff -u -r ./rust-src-nightly/rust-src/lib/rustlib/src/rust/library/std/src/sys/unix/locks/futex_condvar.rs /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/locks/futex_condvar.rs
--- ./rust-src-nightly/rust-src/lib/rustlib/src/rust/library/std/src/sys/unix/locks/futex_condvar.rs 2022-11-12 23:16:29.000000000 +0000
+++ /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/locks/futex_condvar.rs 2022-11-16 01:41:59.370745687 +0000
@@ -37,7 +37,8 @@
self.wait_optional_timeout(mutex, Some(timeout))
}
- unsafe fn wait_optional_timeout(&self, mutex: &Mutex, timeout: Option<Duration>) -> bool {
+ #[inline(never)]
+ pub unsafe fn wait_optional_timeout(&self, mutex: &Mutex, timeout: Option<Duration>) -> bool {
// Examine the notification counter _before_ we unlock the mutex.
let futex_value = self.futex.load(Relaxed);
diff -u -r ./rust-src-nightly/rust-src/lib/rustlib/src/rust/library/std/src/sys/unix/locks/futex_mutex.rs /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/locks/futex_mutex.rs
--- ./rust-src-nightly/rust-src/lib/rustlib/src/rust/library/std/src/sys/unix/locks/futex_mutex.rs 2022-11-12 23:16:29.000000000 +0000
+++ /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/locks/futex_mutex.rs 2022-11-16 01:31:16.190253714 +0000
@@ -30,7 +30,8 @@
}
#[cold]
- fn lock_contended(&self) {
+ #[inline(never)]
+ pub fn lock_contended(&self) {
// Spin first to speed things up if the lock is released quickly.
let mut state = self.spin();
diff -u -r ./rust-src-nightly/rust-src/lib/rustlib/src/rust/library/std/src/sys/unix/locks/futex_rwlock.rs /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/locks/futex_rwlock.rs
--- ./rust-src-nightly/rust-src/lib/rustlib/src/rust/library/std/src/sys/unix/locks/futex_rwlock.rs 2022-11-12 23:16:29.000000000 +0000
+++ /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/locks/futex_rwlock.rs 2022-11-16 01:43:42.173874145 +0000
@@ -102,7 +102,8 @@
}
#[cold]
- fn read_contended(&self) {
+ #[inline(never)]
+ pub fn read_contended(&self) {
let mut state = self.spin_read();
loop {
@@ -167,7 +168,8 @@
}
#[cold]
- fn write_contended(&self) {
+ #[inline(never)]
+ pub fn write_contended(&self) {
let mut state = self.spin_write();
let mut other_writers_waiting = 0;
RUSTFLAGS="-g -C force-frame-pointers=yes -C target-cpu=native -C opt-level=3 -C lto=no -C symbol-mangling-version=v0" cargo +nightly run -Z build-std --release --target x86_64-unknown-linux-gnu --bin solana-validator
(lto is disabled for now. maybe there's some workaround yet to be found by me...)
you need to build patched perf
: https://github.com/ryoqun/linux/tree/perf-p-offset-hack, something like this: apt build-dep linux
and apt install clang
and BUILD_BPF_SKEL=1 make
~/work/linux/tools/perf/perf record --call-graph fp --off-cpu -e dummy --proc-map-timeout 30000 -p $PID_OF_RUNNING_SOLANA_VALIDATOR
sudo ~/work/linux/tools/perf/perf script -F +pid > ./perf-offcpu.perf
- General
- Feature Gates
- Technical
- Schedule
- Testnet