Quickstart¶
How to build and run¶
# run from the top of the source tree, where configure.ac lives
# if loop analysis module is required, run autogen.sh instead of autoreconf -i
autoreconf -i
mkdir build;
cd build;
../configure [--prefix=/path/to/install/dir] \
[--enable-discovery[=yes|no|default]] \
[--with-fortran] \
[--with-fortran-ISO-bindings-includedir=/p/a/t/h] \
[--enable-embedded] \
[--enable-cuda[=yes|no|<arch>]] \
[--enable-hip-rocm[=yes|no]] \
[--enable-opencl[=yes|no]] \
[--with-opencl=/path/to/opencl/install] \
[--enable-pmem[=yes|no]] \
[--with-memkind=/path/to/libmemkind/install] \
[--with-numa[=/path/to/libnuma/install]] \
[--with-loop-analysis] \
[--with-cost-model[=/path/to/costmodel/install]] \
[--with-sicm=/path/to/sicm/install] \
[--with-umpire=/path/to/umpire/install] \
[--with-jemalloc=/path/to/jemalloc/install] \
[--with-jemalloc-prefix=<prefix>]
make;
make check-tests;
make check-examples;
make install; # optional
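Every bracketed option above is optional. As a minimal sketch of one concrete choice, the script below assembles and prints a configure command line with discovery and libnuma support enabled. The MAMBA_PREFIX variable, its /opt/mamba default and the configure_cmd helper are illustrative assumptions, not names defined by Mamba. The command is printed rather than executed so that it can be inspected first.

```shell
#!/bin/sh
# Sketch only: MAMBA_PREFIX and the chosen options are illustrative assumptions.
MAMBA_PREFIX="${MAMBA_PREFIX:-/opt/mamba}"

# Print a configure invocation for the options given as arguments.
configure_cmd() {
    printf '../configure --prefix=%s' "$MAMBA_PREFIX"
    for opt in "$@"; do
        printf ' %s' "$opt"
    done
    printf '\n'
}

# Discovery mode plus NUMA-aware memory spaces, to be run from the build directory.
configure_cmd --enable-discovery=yes --with-numa
```

Copy the printed line into the build directory shell once the options look right.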
Configure¶
autogen.sh¶
Only required to use --with-loop-analysis.
This fetches and updates the Mamba loop analysis dependencies as git submodules. It is optional if you have already cloned the repository recursively with git clone --recursive; in that case, you may use autoreconf -i instead.
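When loop analysis is wanted, the choice between the two bootstrap commands could be automated roughly as follows. This is a sketch under one assumption about the workflow: it relies on git submodule status prefixing uninitialised submodules with a '-' character. The two function names are hypothetical helpers, not part of Mamba.

```shell
#!/bin/sh
# Succeed if the `git submodule status` output on stdin lists an
# uninitialised submodule (such lines start with '-').
has_uninitialised_submodules() {
    grep -q '^-'
}

# Read `git submodule status` output on stdin and print the bootstrap
# command to run when loop analysis is required.
choose_bootstrap() {
    if has_uninitialised_submodules; then
        echo "./autogen.sh"
    else
        echo "autoreconf -i"
    fi
}

# Usage, from the top of the source tree:
#   git submodule status | choose_bootstrap
```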
--prefix¶
Set the directory prefix for make install.
--enable-discovery¶
Enable discovery mode, in which Mamba uses hwloc to analyse the memory topology and construct a set of appropriate memory spaces during initialisation. This requires hwloc>=2.0 to be installed. The default behaviour is to look for a suitable version of hwloc and enable discovery if one is found; otherwise discovery is disabled and a warning is issued at configure time.
--with-fortran¶
Build the Fortran Mamba library.
--with-fortran-ISO-bindings-includedir¶
Specify a non-standard path to the location of ISO_Fortran_binding.h, which is needed to use the C/Fortran ISO bindings (required for the Fortran build).
--enable-embedded¶
Enable embedded support, which generates libtool convenience libraries so that the library and its dependencies can be easily imported into your own project.
--enable-cuda¶
Enable CUDA support in the memory manager. The configure script lists all pkg-config module files whose names contain the substring ‘cuda’ and tests each in turn until one provides the requested support.
--enable-hip-rocm¶
Enable HIP support for AMD devices (via ROCm) in the memory manager. hipconfig is used to determine appropriate CFLAGS; see the Common Issues section for information on passing additional hipcc flags.
--enable-opencl¶
Enable OpenCL support. Currently tested on AMD and NVIDIA GPU devices and Xilinx FPGA devices.
--with-opencl¶
Provide a non-standard path to your OpenCL installation.
--enable-pmem¶
Enable persistent memory support, such as Intel Optane non-volatile DIMMs. Requires the memkind library.
--with-memkind¶
Build with libmemkind support, which allows allocation in HBM (e.g. Intel KNL MCDRAM) and in persistent memory (e.g. Intel Optane NV-DIMMs). Disabled by default.
--with-numa¶
Build with libnuma support for NUMA-aware memory spaces.
--with-loop-analysis¶
Build with loop analysis features. The loop analysis module depends on external loop analysis libraries; during autogen, the appropriate libraries are downloaded as git submodules. This also introduces a dependency on LLVM. If you have trouble building the loopanalyzer library, refer to the build instructions in the loopanalyzer repository. If you have previously built without this option, you will also need to run make clean. To test the support libraries, make check runs the tests for all dependencies integrated into the Mamba build system.
--with-cost-model¶
Build with cost model library support for automatic tile sizing features.
--with-sicm¶
Experimental external library support. Allows underlying memory allocation using the LANL/SICM memory manager.
--with-umpire¶
Experimental external library support. Allows underlying memory allocation using the LLNL/Umpire memory manager.
--with-jemalloc and --with-jemalloc-prefix¶
Allows underlying memory allocation using the jemalloc malloc implementation.
The default prefix of the jemalloc functions namespace is je_.
Additional options¶
To change the compiler used, set CC=..., CXX=... and/or FTN=... during configure.
Cray (CCE)¶
On a Cray system, it is typical to use the compiler wrappers to manage the compilation environment correctly:
./configure CC=cc CXX=CC FTN=ftn ...
GNU¶
Add -std=gnu11 to get the C11 standard with GNU extensions, which is required for the POSIX pthread lock structures.
./configure CFLAGS="-std=gnu11" ...
Configuration variables¶
To tailor the default behavior to your usage, additional compile-time and run-time variables can be set.
Compile-time¶
The following variables can be set at compile time (or during the call to configure when provided to CPPFLAGS).
In order to set their value, use the format -D<name>=<value>.
MMB_LOG_LEVEL: Compile-time max log level cut-off. Default: MMB_LOG_DEBUG.
MMB_CONFIG_PROVIDER_DEFAULT: Default memory provider to use to allocate memory when none is requested. Default: MMB_NATIVE.
MMB_CONFIG_STRATEGY_DEFAULT: Default memory allocation strategy to use when none is requested. Default: MMB_STRATEGY_NONE.
MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT: Default execution context to use when allocating and copying memory to/from GPUs. Default: MMB_GPU_CUDA.
MMB_CONFIG_PROVIDER_DEFAULT_ENV_NAME: Name of the environment variable to look for when setting the default provider. Default: MMB_CONFIG_PROVIDER_DEFAULT.
MMB_CONFIG_STRATEGY_DEFAULT_ENV_NAME: Name of the environment variable to look for when setting the default strategy. Default: MMB_CONFIG_STRATEGY_DEFAULT.
MMB_CONFIG_INTERFACE_NAME_DEFAULT_ENV_NAME: Name of the environment variable to look for when setting the default interface name. Default: MMB_CONFIG_INTERFACE_NAME_DEFAULT.
MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT_ENV_NAME: Name of the environment variable to look for when setting the default execution context for the GPU. Default: MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT.
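As an illustration of the -D<name>=<value> format, the sketch below appends two definitions to CPPFLAGS and exports the result so that a later configure run can pick it up from the environment. The add_define helper is a hypothetical convenience function, not something Mamba provides. The two values used are the documented defaults, so this particular example leaves behaviour unchanged; substitute the values you need.

```shell
#!/bin/sh
# Append -D<name>=<value> to CPPFLAGS, keeping whatever is already set.
add_define() {
    CPPFLAGS="${CPPFLAGS:+$CPPFLAGS }-D$1=$2"
}

# These are the documented default values, shown only to illustrate the format.
add_define MMB_CONFIG_PROVIDER_DEFAULT MMB_NATIVE
add_define MMB_CONFIG_STRATEGY_DEFAULT MMB_STRATEGY_NONE
export CPPFLAGS

echo "CPPFLAGS=$CPPFLAGS"
# Then, from the build directory: ../configure
```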
The following variables can be set in the environment at compile time to modify compilation behaviour. Use the format export <name>=<value> or ./configure <name>=<value>.
MMB_CONFIG_HIPCC_EXTRA_CPPFLAGS: Extra flags to pass to hipcc compiler during compilation of .hip files.
Run-time¶
The following variables can be set in the environment at run-time in order to modify some of the compile-time defined behaviors. These variables are read only once, during library initialization.
MMB_CONFIG_PROVIDER_DEFAULT
MMB_CONFIG_STRATEGY_DEFAULT
MMB_CONFIG_INTERFACE_NAME_DEFAULT
MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT
These variables default to the compile-time values.
The names of these variables can be changed at compile time by setting MMB_CONFIG_PROVIDER_DEFAULT_ENV_NAME, MMB_CONFIG_STRATEGY_DEFAULT_ENV_NAME, MMB_CONFIG_INTERFACE_NAME_DEFAULT_ENV_NAME and MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT_ENV_NAME respectively.
For simplicity, MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT also accepts NONE as a valid choice.
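Because these variables are read once during library initialisation, they must be in the environment before the application starts. The sketch below shows the two usual ways to do that in a shell, using the NONE value mentioned above. The child shell merely stands in for an application linked against Mamba; no real Mamba program is invoked here.

```shell
#!/bin/sh
# Set the variable for a single command only. The child shell stands in for
# an application linked against Mamba (hypothetical, e.g. ./my_app).
MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT=NONE \
    sh -c 'echo "child sees: $MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT"'

# Or export it so that every later command in this shell inherits it.
export MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT=NONE
```

The per-command form is convenient for one-off experiments, since it leaves the rest of the session untouched.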
The following variable modifies the log level at run-time, up to the maximum compile-time cut-off, and overrides the log level set through the API.
MMB_LOG_LEVEL: Run-time log level setting, cannot override max cutoff defined at compile time.
Common Issues¶
C standard¶
If you force standard conformance, with e.g. -std=c11, you may also need to pass something like -D_XOPEN_SOURCE=500 to get required POSIX features.
Alternatively use -std=gnu11.
HIP ROCM Support¶
If you see the following error:
.../hip_code_object.cpp:120: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
The appropriate HIP ARCH definitions may not have been set during compilation. This can occur, for example, when compiling on a login node without GPUs attached. If your environment or module system can resolve this, use that; otherwise you can forward extra preprocessor arguments to the hipcc compiler during the Mamba build via the following environment variable, which you need to export prior to configuration:
# Valid for AMD MI60, export before configure
export MMB_CONFIG_HIPCC_EXTRA_CPPFLAGS="-D__HIP_ARCH_GFX906__=1 --cuda-gpu-arch=gfx906"
To check, you can run hipcc --cxxflags and look for something like the above. Setting HIPCC_VERBOSE=7 will additionally provide verbose info from the hipcc compiler.
Furthermore, discovery of AMD GPUs via hwloc is currently not able to find the available memory size, so memory spaces created automatically during discovery will be of unlimited size (i.e. limited by the HIP runtime, rather than by Mamba).
CUDA¶
If you see the following error:
no kernel image is available for execution on the device.
You may be using the wrong CUDA architecture for the GPU device available on your node.
You can change the architecture used by setting it on your configure line with ./configure --enable-cuda=<arch>.
The default architecture is sm_60.
If this value is too high for your device, you may want to try sm_30.
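One way to pick a matching architecture is to ask the driver for the device's compute capability and turn it into an sm_ name. The sketch below assumes a reasonably recent NVIDIA driver whose nvidia-smi supports the compute_cap query field; on older drivers or on nodes without a GPU it falls back to a message. The helper names are hypothetical, not part of Mamba.

```shell
#!/bin/sh
# Convert a compute capability such as "7.0" into an architecture name
# such as "sm_70" by dropping the dot (and any stray spaces).
cc_to_arch() {
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '. ')"
}

# Print a suggested --enable-cuda flag for the first visible GPU.
suggest_cuda_flag() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; choose --enable-cuda=<arch> by hand"
        return 0
    fi
    # Assumption: this driver's nvidia-smi supports the compute_cap field.
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
    case "$cap" in
        [0-9]*.[0-9]*) echo "--enable-cuda=$(cc_to_arch "$cap")" ;;
        *) echo "could not query compute capability; choose --enable-cuda=<arch> by hand" ;;
    esac
}

suggest_cuda_flag
```

Run this on a node that actually has the target GPU attached, since a login node may report a different device or none at all.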
OpenCL/FPGA¶
The buffer_copy_opencl example (run automatically during make check or make check-examples) tries to build a kernel at run-time. On most FPGA platforms OpenCL does not have access to a compiler, so this will likely fail. For this example to run, you must build a bitstream for your specific FPGA that matches the example kernel in examples/c/buffer_copy_opencl.c, and export the path to this bitstream via the environment variable MMB_CONFIG_BUFFER_COPY_OPENCL_BINARY before running the example.
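A small sketch of that last step: the hypothetical use_bitstream helper below exports the variable only if the given file exists, which catches a mistyped path before the example is run. The bitstream path in the usage comment is a placeholder, not a file shipped with Mamba.

```shell
#!/bin/sh
# Export MMB_CONFIG_BUFFER_COPY_OPENCL_BINARY if the bitstream file exists;
# otherwise report the problem and return a non-zero status.
use_bitstream() {
    if [ -f "$1" ]; then
        export MMB_CONFIG_BUFFER_COPY_OPENCL_BINARY="$1"
    else
        echo "bitstream not found: $1" >&2
        return 1
    fi
}

# Usage (placeholder path), from the build directory:
#   use_bitstream /path/to/your/bitstream && make check-examples
```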