Quickstart

How to build and run

bash
# run autoreconf from the source root, where configure.ac lives
# if loop analysis module is required, run ./autogen.sh instead of autoreconf -i
autoreconf -i
mkdir build
cd build
../configure [--prefix=/path/to/install/dir]                   \
           [--enable-discovery[=yes|no|default]]             \
           [--with-fortran]                                  \
           [--with-fortran-ISO-bindings-includedir=/p/a/t/h] \
           [--enable-embedded]                               \
           [--enable-cuda[=yes|no|<arch>]]                   \
           [--enable-hip-rocm[=yes|no]]                      \
           [--enable-opencl[=yes|no]]                        \
           [--with-opencl=/path/to/opencl/install]           \
           [--enable-pmem[=yes|no]]                          \
           [--with-memkind=/path/to/libmemkind/install]      \
           [--with-numa[=/path/to/libnuma/install]]          \
           [--with-loop-analysis]                            \
           [--with-cost-model[=/path/to/costmodel/install]]  \
           [--with-sicm=/path/to/sicm/install]               \
           [--with-umpire=/path/to/umpire/install]           \
           [--with-jemalloc=/path/to/jemalloc/install]       \
           [--with-jemalloc-prefix=<prefix>]
make;
make check-tests;
make check-examples;
make install  # optional

Configure

autogen.sh

Only required when using --with-loop-analysis. This script fetches and updates the Mamba loop analysis dependencies as git submodules. It is optional if you have already cloned the repository recursively using git clone --recursive; in that case, you may use autoreconf -i instead.

--prefix

Set the directory prefix for make install.

--enable-discovery

Enable discovery mode, where Mamba will use hwloc to analyse the memory topology and construct a set of appropriate memory spaces during initialisation. This requires hwloc >= 2.0 to be installed. The default behaviour is to look for a suitable version of hwloc and enable discovery if one is found; otherwise discovery is disabled and a warning is issued at configure time.
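Before configuring, it can help to check which hwloc version is visible on the system. The sketch below assumes that hwloc was installed with its pkg-config module (named hwloc) and that pkg-config is on the PATH; it does not claim to reproduce the exact test performed by Mamba's configure script, and check_hwloc is a helper name introduced here for illustration only.

```shell
# check_hwloc: hypothetical helper, not part of Mamba.
# Reports whether pkg-config can see an hwloc release of at least 2.0.
check_hwloc() {
    if pkg-config --atleast-version=2.0 hwloc 2>/dev/null; then
        echo "hwloc $(pkg-config --modversion hwloc) found: discovery can be enabled"
    else
        echo "hwloc >= 2.0 not visible to pkg-config: expect discovery to be disabled"
    fi
}

check_hwloc
```

If hwloc lives in a non-standard location, adding its lib/pkgconfig directory to PKG_CONFIG_PATH before running the check is the usual way to make it visible.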

--with-fortran

Build the Fortran Mamba library.

--with-fortran-ISO-bindings-includedir

Specify a non-standard path to the location of ISO_Fortran_binding.h to use the C/Fortran ISO bindings (required for the Fortran build).

--enable-embedded

Enable embedded support, which generates libtool convenience libraries so that the library and its dependencies can be easily imported into your own project.

--enable-cuda

Enable CUDA support in the memory manager. The configure script lists all the pkg-config module files containing the sub-string ‘cuda’ and tests each one until one provides the requested support.
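To see which modules such a search could pick up on your system, you can list the pkg-config modules yourself. This is a minimal sketch that assumes the match is made against the module name, which is the first column of pkg-config --list-all; the configure script may match differently. filter_cuda_modules is a helper name introduced here for illustration only.

```shell
# filter_cuda_modules: hypothetical helper, not part of Mamba.
# Reads "name description" lines on stdin and prints the names containing "cuda".
filter_cuda_modules() {
    awk '{print $1}' | grep 'cuda'
}

# List the candidates pkg-config knows about, or say so when there are none.
pkg-config --list-all 2>/dev/null | filter_cuda_modules \
    || echo "no pkg-config module with 'cuda' in its name was found"
```

If nothing is listed although CUDA is installed, the directory holding the CUDA .pc files is probably missing from PKG_CONFIG_PATH.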

--enable-hip-rocm

Enable HIP support for AMD devices (via ROCm) in the memory manager. hipconfig is used to determine appropriate CFLAGS; see the Common Issues section for information on passing additional hipcc flags.

--enable-opencl

Enable OpenCL support, currently tested on AMD and NVIDIA GPU devices and Xilinx FPGA devices.

--with-opencl

Provide a non-standard path to your OpenCL installation.

--enable-pmem

Enable persistent memory support, such as Intel Optane non-volatile DIMMs. Requires the memkind library.

--with-memkind

Build with libmemkind support, which allows HBM (e.g. Intel KNL MCDRAM) and persistent memory allocation (e.g. Intel Optane NV-DIMMs). Disabled by default.

--with-numa

Build with libnuma support for NUMA-aware memory spaces.

--with-loop-analysis

Build with loop analysis features. The loop analysis module depends on external loop analysis libraries; during autogen, the appropriate libraries will be downloaded as git submodules. A dependency on LLVM is also introduced; if you have trouble building the loopanalyzer library, refer to the build instructions in the loopanalyzer repository. If you have previously built without this option, you will also need to run make clean. To test the support libraries, make check will run tests for all dependencies integrated into the Mamba build system.

--with-cost-model

Build with cost model library support for automatic tile sizing features.

--with-sicm

Experimental external library support. Allows underlying memory allocation using the LANL/SICM memory manager.

--with-umpire

Experimental external library support. Allows underlying memory allocation using the LLNL/Umpire memory manager.

--with-jemalloc and --with-jemalloc-prefix

Allows underlying memory allocation using the jemalloc malloc implementation. The default prefix of the jemalloc functions namespace is je_.

Additional options

To change the compiler used, set CC=..., CXX=... and/or FTN=... during configure.

Cray (CCE)

On a Cray system, it is typical to use the compiler wrappers to manage the compilation environment correctly:

./configure CC=cc CXX=CC FTN=ftn ...

GNU

Add -std=gnu11 to get the C11 standard with GNU extensions, required for POSIX pthread lock structures.

./configure CFLAGS="-std=gnu11" ...

Configuration variables

To adapt the default behavior to your usage, additional compile-time and run-time variables can be set.

Compile-time

The following variables can be set at compile time (or during the call to configure when provided to CPPFLAGS). In order to set their value, use the format -D<name>=<value>.

  • MMB_LOG_LEVEL: Compile-time max log level cut-off, default MMB_LOG_DEBUG

  • MMB_CONFIG_PROVIDER_DEFAULT: Default memory provider to use to allocate memory when none is requested. Default: MMB_NATIVE.

  • MMB_CONFIG_STRATEGY_DEFAULT: Default memory allocation strategy to use when none is requested. Default: MMB_STRATEGY_NONE.

  • MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT: Default execution context to use when allocating and copying memory to/from GPUs. Default: MMB_GPU_CUDA.

  • MMB_CONFIG_PROVIDER_DEFAULT_ENV_NAME: Environment variable’s name to look for when setting default provider. Default: MMB_CONFIG_PROVIDER_DEFAULT.

  • MMB_CONFIG_STRATEGY_DEFAULT_ENV_NAME: Environment variable’s name to look for when setting default strategy. Default: MMB_CONFIG_STRATEGY_DEFAULT.

  • MMB_CONFIG_INTERFACE_NAME_DEFAULT_ENV_NAME: Environment variable’s name to look for when setting default interface name. Default: MMB_CONFIG_INTERFACE_NAME_DEFAULT.

  • MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT_ENV_NAME: Environment variable’s name to look for when setting default execution context for the GPU. Default: MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT.
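As an illustration of the -D<name>=<value> format, the sketch below assembles a few definitions and passes them to configure through CPPFLAGS. The values are simply the documented defaults, so the resulting build should behave like an unmodified one; substitute your own values as needed. MMB_DEFS is a shell variable name introduced here for convenience, and the command is printed rather than executed.

```shell
# Assemble -D<name>=<value> definitions; the values shown are the documented defaults.
MMB_DEFS="-DMMB_LOG_LEVEL=MMB_LOG_DEBUG"
MMB_DEFS="$MMB_DEFS -DMMB_CONFIG_PROVIDER_DEFAULT=MMB_NATIVE"
MMB_DEFS="$MMB_DEFS -DMMB_CONFIG_STRATEGY_DEFAULT=MMB_STRATEGY_NONE"

# Print the configure command for review; remove the leading echo to run it
# from the build directory.
echo ../configure CPPFLAGS=\"$MMB_DEFS\"
```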

The following variables can be set in the environment at compile time to modify compilation behaviour. Use the format export <name>=<value> or ./configure <name>=<value>.

  • MMB_CONFIG_HIPCC_EXTRA_CPPFLAGS: Extra flags to pass to hipcc compiler during compilation of .hip files.

Run-time

The following variables can be set in the environment at run-time to modify some of the compile-time defined behaviors. These variables are read only once, during library initialization.

  • MMB_CONFIG_PROVIDER_DEFAULT

  • MMB_CONFIG_STRATEGY_DEFAULT

  • MMB_CONFIG_INTERFACE_NAME_DEFAULT

  • MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT

These variables default to the compile-time values. The names of these variables can be changed at compile time by setting MMB_CONFIG_PROVIDER_DEFAULT_ENV_NAME, MMB_CONFIG_STRATEGY_DEFAULT_ENV_NAME, MMB_CONFIG_INTERFACE_NAME_DEFAULT_ENV_NAME and MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT_ENV_NAME respectively. For simplicity, MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT also accepts NONE as a valid choice.
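For example, the GPU execution context default can be overridden for a single shell session as sketched below. NONE is the only value spelled out in this section, so it is the only one used here; the application name in the comment is hypothetical.

```shell
# Override the compile-time default GPU execution context for this session.
export MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT=NONE

# The variable is read once, during library initialization, so it must be
# set before the application starts (hypothetical binary name):
# ./my_mamba_app
```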

The following variable can modify the log level at run-time, up to the max compile-time cut-off, and overrides the API log level setting.

  • MMB_LOG_LEVEL: Run-time log level setting, cannot override max cutoff defined at compile time.

Common Issues

C standard

If you force standard conformance, with e.g. -std=c11, you may also need to pass something like -D_XOPEN_SOURCE=500 to get required POSIX features. Alternatively use -std=gnu11.

HIP ROCM Support

If you see the following error:

.../hip_code_object.cpp:120: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")

You may not have got appropriate HIP ARCH definitions during compilation. This can, for example, occur when compiling on a login node without GPUs attached. If you have appropriate environment/module resolution for this, use that, otherwise you can forward extra cpp args to the hipcc compiler during the Mamba build via the following environment variable, which you need to export prior to configuration:

# Valid for AMD MI60, export before configure
export MMB_CONFIG_HIPCC_EXTRA_CPPFLAGS="-D__HIP_ARCH_GFX906__=1 --cuda-gpu-arch=gfx906"

To check, you can run hipcc --cxxflags and look for something like the above. Setting HIPCC_VERBOSE=7 will additionally provide verbose info from the hipcc compiler.

Furthermore, discovery of AMD GPUs via hwloc is currently not able to find the available memory size, so memory spaces created automatically during discovery will be of unlimited size (i.e. limited by the HIP runtime, rather than by Mamba).

CUDA

If you see the following error:

no kernel image is available for execution on the device.

You may be using the wrong CUDA architecture for the GPU device available on your node. You can change the architecture used by setting it on your configure line with ./configure --enable-cuda=<arch>. The default architecture is sm_60. If this value is too high, you may want to try sm_30.
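One way to pick a suitable <arch> is to ask the driver for the compute capability of the installed GPU. The sketch below assumes a reasonably recent nvidia-smi that supports the compute_cap query field, and assumes that <arch> takes the sm_XY form used for the default above. cap_to_arch is a helper name introduced here for illustration only.

```shell
# cap_to_arch: hypothetical helper, not part of Mamba.
# Converts a compute capability such as "7.0" into a name such as "sm_70".
cap_to_arch() {
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Query the first GPU, if nvidia-smi is available, and print a suggestion.
if command -v nvidia-smi >/dev/null 2>&1; then
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
    if [ -n "$cap" ]; then
        echo "suggested: ./configure --enable-cuda=$(cap_to_arch "$cap")"
    fi
fi
```

On a login node without GPUs the query prints nothing, in which case the architecture of the compute nodes has to be looked up in the system documentation instead.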

OpenCL/FPGA

The buffer_copy_opencl example (run automatically during make check or make check-examples) will try to build a kernel at run-time. On most FPGA platforms OpenCL does not have access to a compiler, so this will likely fail. To run this example, you must build a bitstream for your specific FPGA that matches the example kernel in examples/c/buffer_copy_opencl.c, and export the path to this bitstream via the environment variable MMB_CONFIG_BUFFER_COPY_OPENCL_BINARY before running the example.
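A minimal sketch of that last step is shown below. The bitstream path is a placeholder to be replaced with the file produced by your FPGA toolchain, and BITSTREAM is a shell variable name introduced here for convenience.

```shell
# Placeholder path: replace with the bitstream built for your FPGA.
BITSTREAM=/path/to/bitstream

if [ -f "$BITSTREAM" ]; then
    # Point the example at the prebuilt bitstream, then run the examples.
    export MMB_CONFIG_BUFFER_COPY_OPENCL_BINARY="$BITSTREAM"
    make check-examples
else
    echo "bitstream not found: $BITSTREAM" >&2
fi
```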