====================
What is Libmonitor
====================
$Id$
Libmonitor is a library providing callback functions for the beginning and
end of processes, threads, fork, exec, etc. It provides a layer on
which to build process monitoring tools such as profilers.
For example, Rice HPCToolkit uses libmonitor to turn on profiling at
the start of processes and threads, using the monitor_init_process()
and monitor_init_thread() callback functions, and then turns profiling
off with monitor_fini_process() and monitor_fini_thread(). In this
case, libmonitor separates the job of hooking the application program
from the profiling tasks. See src/monitor.h for the declarations of
the callback functions.
Libmonitor is now hosted on GitHub as a separate repository within the
HPCToolkit project.
https://github.com/HPCToolkit
Libmonitor can be cloned from GitHub with:
git clone https://github.com/hpctoolkit/libmonitor.git
Libmonitor is a rewrite from scratch of Philip Mucci's monitor at the
University of Tennessee to allow running monitor on both dynamic and
static binaries and on both Linux and non-Linux systems.
http://icl.cs.utk.edu/~mucci/monitor/
Libmonitor is part of HPCToolkit and is supported by the Center for
Scalable Application Development Software (CScADS) and the Performance
Engineering Research Institute (PERI).
http://hpctoolkit.org/
Libmonitor is Copyright (c) 2007-2019, Rice University and is licensed
under a 3-clause BSD license. See the file LICENSE for details.
Send bug reports to: [email protected]
======================
How Libmonitor Works
======================
Libmonitor inserts itself into an application program either
dynamically with LD_PRELOAD or else statically at link time. In the
dynamic case, monitor uses a shared library, libmonitor.so, with
definitions for __libc_start_main, _exit, fork, pthread_create, etc.
It uses LD_PRELOAD to override the application's calls to these
functions and uses dlopen to find the real versions of these
functions. See ld.so(8).
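The interposition idea described above can be sketched in a few lines of C. This is a minimal illustration, not libmonitor's code: it overrides getpid (chosen only because it is harmless), counts calls in a hypothetical variable named getpid_calls, and locates the next definition with dlsym(RTLD_NEXT, ...), a common glibc idiom. Libmonitor's own lookup may differ in detail.

```c
#define _GNU_SOURCE   /* exposes RTLD_NEXT on glibc; must precede all includes */
#include <dlfcn.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical counter, for illustration only. */
int getpid_calls = 0;

typedef pid_t (*getpid_fn_t)(void);

/*
 * Same name and signature as the libc function.  When this object comes
 * earlier in the symbol search order (for example, via LD_PRELOAD), the
 * application's calls to getpid() land here first.
 */
pid_t getpid(void)
{
    static getpid_fn_t real_getpid = NULL;

    if (real_getpid == NULL) {
        /* Find the next definition after this object, normally libc's. */
        real_getpid = (getpid_fn_t) dlsym(RTLD_NEXT, "getpid");
        if (real_getpid == NULL) {
            return (pid_t) -1;
        }
    }
    getpid_calls++;          /* the "monitoring" step */
    return real_getpid();    /* pass control to the real function */
}
```

Built as a shared object (for example, cc -shared -fPIC wrap.c -o libwrap.so -ldl, where the file names are placeholders) and listed in LD_PRELOAD, a wrapper of this shape runs ahead of the libc version in an unmodified, dynamically-linked program.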
The dynamic method can monitor unmodified binaries, but they must be
dynamically-linked, since LD_PRELOAD does not work for statically
linked binaries. More importantly, libc must pass control from
_start() to main() via __libc_start_main(). GNU libc on Linux uses
__libc_start_main, but BSD and other systems, although they support
LD_PRELOAD, do not use __libc_start_main. So, the dynamic case is
pretty much limited to GNU/Linux systems.
In the static case, monitor uses a library file, libmonitor_wrap.a,
which is linked into the application statically. libmonitor_wrap.a
contains definitions of __wrap_main, __wrap__exit, etc., and is linked
into the application with the --wrap linker option so that the
application's use of main is linked to monitor's definition of
__wrap_main, etc. See ld(1).
The static method does not require re-compiling the application program
into .o files, but it does require re-linking the application's object
files with libmonitor_wrap.a. However, the static method can be used
on BSD and other systems that don't use __libc_start_main, and on
systems such as Catamount that don't run dynamically-linked binaries.
In either case, the libmonitor client should write its own versions of
the callback functions, or a subset of them, which libmonitor will
then call.
=========================
How to Build Libmonitor
=========================
Libmonitor uses the autoconf and automake build procedure. From the
top-level directory (the one containing LICENSE and this README, not
the src directory), run:
./configure --prefix=/path/to/install
make
make install
In addition to the standard configure options, libmonitor provides the
following options. See ./configure --help.
--enable-debug
    When monitor runs and the environment includes the variable
    MONITOR_DEBUG, it writes debugging messages to stderr
    indicating when it gains control at the beginning and end of
    processes, threads, etc. This option has the effect of always
    turning on MONITOR_DEBUG. Default=no.
--enable-link-preload
    When enabled, monitor builds libmonitor.so and the monitor-run
    script for running monitor dynamically. This option requires
    dlopen and __libc_start_main and will be automatically
    disabled if they are not available. Default=yes.
--enable-link-static
    When enabled, monitor builds libmonitor_wrap.a and the
    monitor-link script for running monitor statically.
    Note: this option is different from --enable-static, a builtin
    configure option for building the libmonitor.a library (which
    monitor doesn't use). Default=yes.
--enable-dlfcn
--enable-fork
--enable-mpi
--enable-pthreads
--enable-signals
    These options include support for monitoring dlopen, fork and
    exec, MPI, pthreads and signals. Even when an option is enabled,
    configure searches for the relevant libraries and functions and
    disables that option if they are not found. Default=yes.
    Note: --enable-dlopen is already used by configure for
    something else, so we use dlfcn (named after dlfcn.h) to
    enable support for dlopen.
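The MONITOR_DEBUG behavior described under --enable-debug amounts to a presence check on an environment variable. The following is a rough sketch of that idea only. The names debug_enabled, debug_msg and ALWAYS_DEBUG are hypothetical stand-ins, not libmonitor's internals.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical compile-time switch standing in for --enable-debug. */
#ifndef ALWAYS_DEBUG
#define ALWAYS_DEBUG 0
#endif

/* Nonzero if debugging is forced on or MONITOR_DEBUG is in the environment. */
int debug_enabled(void)
{
    return ALWAYS_DEBUG || getenv("MONITOR_DEBUG") != NULL;
}

/* Print a message to stderr when debugging is on; return 1 if printed. */
int debug_msg(const char *fmt, ...)
{
    va_list ap;

    if (!debug_enabled()) {
        return 0;
    }
    fputs("monitor debug: ", stderr);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    return 1;
}
```

With this shape, only the presence of the variable matters, not its value, which matches the wording "the environment includes the variable".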
Note: the MPI support is now generic and attempts to work with any MPI
library for C, C++ or Fortran programs. Since each MPI library
defines its own MPI_Comm and MPI_COMM_WORLD, monitor waits for the
application program to call MPI_Comm_rank() and uses its comm value to
determine size/rank. The advantage of this approach is that monitor
should work with any MPI library. The disadvantage of this approach
is that monitor depends on the application program calling
MPI_Comm_rank() and won't know the size/rank until that happens.
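A consequence of this design is that a client has to cope with an "unknown" size/rank early in the run. The sketch below models that with a small cache that reports -1 until values have been recorded. All names here (rank_cache, record_mpi_info, cached_mpi_rank, cached_mpi_size) are hypothetical and are not libmonitor's API. A real wrapper would record the values after the application's MPI_Comm_rank() call returns.

```c
#include <stddef.h>

/* Hypothetical cache of MPI size and rank; -1 means "not known yet". */
struct rank_cache {
    int known;
    int size;
    int rank;
};

static struct rank_cache cache = { 0, -1, -1 };

/*
 * Would be called from a wrapper around the application's
 * MPI_Comm_rank(), once the real call has produced the values.
 * In this sketch the first recorded values win.
 */
void record_mpi_info(int size, int rank)
{
    if (!cache.known) {
        cache.size = size;
        cache.rank = rank;
        cache.known = 1;
    }
}

/* Return the cached rank, or -1 if the application has not asked yet. */
int cached_mpi_rank(void)
{
    return cache.known ? cache.rank : -1;
}

/* Return the cached size, or -1 if the application has not asked yet. */
int cached_mpi_size(void)
{
    return cache.known ? cache.size : -1;
}
```

A tool built this way would, for example, delay naming per-rank output files until cached_mpi_rank() stops returning -1.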
=======================
How to Run Libmonitor
=======================
Libmonitor provides hooks into an application program, but by itself,
monitor doesn't really do anything. The monitor client needs to
define its own callback functions to do anything useful. Libmonitor
includes default versions of these functions defined as weak symbols,
allowing monitor's client to override them. The file src/callback.c
may be used as a starting point for this.
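As a rough illustration of what a client might supply, the sketch below defines the four callbacks named earlier in this README. The prototypes are assumed to match src/monitor.h and should be checked against that header. The variables process_active and threads_seen are hypothetical bookkeeping for this example only. Because libmonitor's defaults are weak symbols, ordinary (strong) definitions like these would take their place at link or preload time.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical bookkeeping, for this example only. */
int process_active = 0;
atomic_int threads_seen = 0;

/* Process start.  Signature assumed from src/monitor.h. */
void *monitor_init_process(int *argc, char **argv, void *data)
{
    (void) argv;
    process_active = 1;
    fprintf(stderr, "client: process begin (argc=%d)\n",
            argc != NULL ? *argc : 0);
    return data;   /* opaque client data; see monitor.h for its use */
}

/* Process end.  Signature assumed from src/monitor.h. */
void monitor_fini_process(int how, void *data)
{
    (void) data;
    process_active = 0;
    fprintf(stderr, "client: process end (how=%d)\n", how);
}

/* Thread start.  Signature assumed from src/monitor.h. */
void *monitor_init_thread(int tid, void *data)
{
    atomic_fetch_add(&threads_seen, 1);   /* may run in several threads */
    fprintf(stderr, "client: thread %d begin\n", tid);
    return data;
}

/* Thread end.  Signature assumed from src/monitor.h. */
void monitor_fini_thread(void *data)
{
    (void) data;
    fprintf(stderr, "client: thread end\n");
}
```

A profiler would replace the fprintf calls with code that turns sampling on in the init callbacks and off in the fini callbacks.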
To run monitor dynamically, compile the client's callback functions
into a .so shared library. Edit the monitor-run script to add the
callback.so file to the list of preloaded files. Then, run the
program with:
monitor-run [-i file.so] program arg ...
The -i option is another way of inserting the client's callback
functions, instead of editing the monitor-run script. The dynamic
method only works with a dynamically-linked ELF program. It also
requires __libc_start_main from libc, and so it is pretty much limited
to Linux systems.
In the static case, compile the callback functions into a .o object
file, and edit the monitor-link script to add the callback.o file to
the list of insert files. Relink the application program with the
monitor-link script, and provide the same command line used to build
the application as arguments to the script, beginning with the
compiler.
monitor-link [-i file.o] compiler file ...
Again, the -i option is another way of adding the client's callback
functions. The arguments to the script should be the final step that
produces a runnable program. The files can be source or object files,
and the resulting program can be fully-static or dynamically-linked,
as long as libmonitor and the callback functions are linked in
statically.
Note: sometimes monitor-link will fail with undefined references to
some of monitor's wrapped symbols. See the "Known Issues" section
below for a workaround.
Note: the callback.c file and the monitor-run and monitor-link scripts
are in the public domain. These files may be useful as a starting
point for writing and linking the client's callback functions.
=========================
Platform-Specific Notes
=========================
Libmonitor has been successfully tested on x86, x86_64 and ia64
platforms, and on Linux and BSD systems. It currently doesn't build
on Solaris due to some header file issues, but that should be fixable.
Although the dynamic method is limited to GNU/Linux systems with
__libc_start_main, the static method should work on most unix systems.
On Catamount, one of our original motivations for rewriting monitor,
compile monitor with gcc (the default) and disable support for dlfcn,
fork and pthreads, since these functions are not available on the
compute nodes.
./configure --prefix=/path/to/install \
--disable-dlfcn \
--disable-fork \
--disable-pthreads
Then, link the application with the cc script to produce a binary that
will run on the compute nodes.
monitor-link cc --target=catamount file.o ...
==============
Known Issues
==============
When linking monitor statically, sometimes there will be undefined
references to some of the wrapped symbols. For example, on Compute
Node Linux:
./monitor-link cc hello.c
/opt/xt-asyncpe/1.0/bin/cc: INFO: linux target is being used
hello.c:
/opt/pgi/7.1.6/linux86-64/7.1-6/lib/libpgc.a(trace.o)(.text+0xfe):
In function `__pgi_abort': undefined reference to `__wrap_system'
The problem is that the PGI compiler is adding libpgc.a (with
undefined references to system) on the link line. In this case,
hello.c does not reference system itself and libpgc.a comes after
libmonitor_wrap.a. Thus, the linker fails to pull in the necessary
objects from libmonitor's archive and reports that __wrap_system is
undefined.
This problem is very difficult for monitor to fix, but there is a
workaround. Use the -u option to monitor-link to add an undefined
reference to 'system' early on the link line, thus forcing the linker
to pull in the necessary object.
./monitor-link -u system cc hello.c