The Celerity distributed runtime and API aim to bring the power and ease of use of SYCL to distributed memory clusters.
NOTE: Celerity is first and foremost a research project and is still in early development. While it works for certain applications, it probably does not fully support your use case yet. We would nevertheless love for you to give it a try and tell us how you could imagine using Celerity in your projects in the future.
- A supported SYCL implementation, either
  - hipSYCL, or
  - ComputeCpp
- Boost (tested with versions 1.65 to 1.68)
- An MPI 2 implementation (tested with OpenMPI 4.0 and MSMPI 10.0)
- CMake
- A C++14 compiler
Building can be as simple as calling `cmake && make`. Depending on your setup, however, you might also have to provide some library paths etc.
The runtime comes with several examples that are built automatically when the `CELERITY_BUILD_EXAMPLES` CMake option is set (true by default).
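The build described above can be sketched as a small shell helper. This is only an illustration under assumptions: the directory layout (`SRC_DIR`, `BUILD_DIR`), the dependency location `DEPS_PREFIX`, and the helper name `build_celerity` are hypothetical and not part of Celerity. `CMAKE_PREFIX_PATH` is the standard CMake variable for pointing at libraries in non-default locations.

```shell
# Hypothetical locations; adjust to your checkout and toolchain.
SRC_DIR="${SRC_DIR:-$PWD}"                 # Celerity source tree
BUILD_DIR="${BUILD_DIR:-$SRC_DIR/build}"   # out-of-source build directory
DEPS_PREFIX="${DEPS_PREFIX:-/opt/sycl}"    # assumed install location of SYCL/Boost

# Configure and build in a subshell so the caller's working directory
# is left unchanged.
build_celerity() (
    mkdir -p "$BUILD_DIR" &&
    cd "$BUILD_DIR" &&
    cmake "$SRC_DIR" \
        -DCMAKE_PREFIX_PATH="$DEPS_PREFIX" \
        -DCELERITY_BUILD_EXAMPLES=ON &&
    make
)
```

Calling `build_celerity` from the source directory would then configure and build the runtime together with its examples. Pass `-DCELERITY_BUILD_EXAMPLES=OFF` instead to skip the examples.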
Simply run `make install` (or equivalent, depending on build system) to copy all relevant header files and libraries to the `CMAKE_INSTALL_PREFIX`. This includes a CMake package configuration file which is placed inside the `lib/cmake` directory. Once included in a CMake project, you can use the `add_celerity_to_target(TARGET target SOURCES source1 source2...)` function to set everything up.
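As a rough sketch of how a downstream project might use the installed package, the following scaffolds a minimal `CMakeLists.txt`. The project name, target name `my_app`, source file `my_app.cpp`, and the minimum CMake version are hypothetical, and the package name passed to `find_package` is an assumption; check the installed `lib/cmake` directory for the actual name.

```shell
# Create a throwaway project directory for the illustration.
PROJECT_DIR="$(mktemp -d)"

# Write a minimal consumer CMakeLists.txt (quoted heredoc: no shell expansion).
cat > "$PROJECT_DIR/CMakeLists.txt" <<'EOF'
cmake_minimum_required(VERSION 3.5)
project(my_celerity_app LANGUAGES CXX)

# Assumption: the installed package configuration is found as "Celerity".
find_package(Celerity CONFIG REQUIRED)

add_executable(my_app my_app.cpp)
# Function provided by the Celerity package configuration.
add_celerity_to_target(TARGET my_app SOURCES my_app.cpp)
EOF

echo "Wrote $PROJECT_DIR/CMakeLists.txt"
```

When configuring such a project, setting `CMAKE_PREFIX_PATH` to the prefix Celerity was installed to lets CMake locate the package configuration file under `lib/cmake`.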
Celerity is built on top of MPI, which means a Celerity application can be executed like any other MPI application (i.e., using `mpirun` or equivalent).
- `CELERITY_LOG_LEVEL` controls the logging output level. Valid values are `trace`, `debug`, `info`, `warn`, `err`, `critical`, and `off`.
- `CELERITY_DEVICES` can be used to assign different compute devices to Celerity nodes on a single host. The syntax is as follows: `CELERITY_DEVICES="<platform_id> <first device_id> <second device_id> ... <nth device_id>"`. Note that this should normally not be required, as Celerity will attempt to automatically assign a unique device to each node on a host.
- `CELERITY_FORCE_WG=<work_group_size>` can be used to force a particular work group size for every kernel and every dimension.
- `CELERITY_PROFILE_OCL` controls whether OpenCL-level profiling information should be used or not (currently not supported when using hipSYCL).
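To illustrate how these variables combine with an MPI launch on a single host, here is a minimal sketch. The concrete values (platform 0, devices 0 and 1, work group size 128), the helper name `launch_celerity`, and the application path are made up for the example. `CELERITY_PROFILE_OCL` is left unset because its accepted values are not specified here.

```shell
# Verbose logging while debugging.
export CELERITY_LOG_LEVEL=debug

# Platform 0; first node uses device 0, second node uses device 1.
# (IDs are illustrative; normally automatic assignment should suffice.)
export CELERITY_DEVICES="0 0 1"

# Force a work group size of 128 for every kernel and dimension.
export CELERITY_FORCE_WG=128

# Hypothetical helper: start two Celerity nodes on this host.
# Usage: launch_celerity ./my_app [args...]
launch_celerity() {
    mpirun -n 2 "$@"
}
```

Running `launch_celerity ./my_app` would start two processes that inherit the exported settings. When launching across several hosts, whether and how environment variables are forwarded depends on the MPI implementation.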