Managing Memory With Mamba

Memory may be managed in the Mamba library at three levels of granularity: arrays, array tiles, and allocations. Each is described in the following subsections, followed by descriptions of the asynchronous API and tile iterators.

Array-based Memory Management

The Mamba API allows users to construct an array in any of the memory spaces available to the library, by passing an appropriate memory interface during construction.

Once constructed, a full array may be transported between memory spaces using one of four methods:

  • Array copy

  • Array duplication

  • Array merging

  • Array migration

Full-array transport is intended as a convenience for non-tiled arrays and is built on the tiling infrastructure. The semantics of copy, duplication, merging, and migration match those of the tile-level operations, which are explained in full in the following section. For tiled arrays, users should use tile-granularity management instead.

mmbError mmb_array_copy(mmbArray *dst, mmbArray *src);

mmbError mmb_array_migrate(mmbArray *in_array, mmbMemInterface *in_interface,
                         mmbAccessType in_access);

mmbError mmb_array_duplicate(mmbArray *in_array, mmbMemInterface *in_interface,
                         mmbAccessType in_access, mmbArray **out_array);

mmbError mmb_array_merge(mmbArray *in_array, mmbMergeType in_merge);
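Because full-array transport is built on the tiling infrastructure, an array copy can be pictured as a loop of tile copies over a shared tiling. The sketch below models that idea in host memory only. `toy_array`, `toy_tile_copy`, and `toy_array_copy` are hypothetical names, not part of the Mamba API, and the 1D tiling with a possibly shorter final tile is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in, not a Mamba type: a 1D array of doubles
 * described by its data pointer, length and tile size. */
typedef struct {
    double *data;
    size_t  length;     /* number of elements */
    size_t  tile_size;  /* elements per tile; the last tile may be shorter */
} toy_array;

/* Copy a single tile; both arrays must share the same tiling. */
static void toy_tile_copy(toy_array *dst, const toy_array *src, size_t tile)
{
    size_t start = tile * src->tile_size;
    size_t n = src->length - start;
    if (n > src->tile_size)
        n = src->tile_size;
    memcpy(dst->data + start, src->data + start, n * sizeof(double));
}

/* Full-array copy expressed as a loop of tile copies.
 * Returns the number of tiles that were copied. */
static size_t toy_array_copy(toy_array *dst, const toy_array *src)
{
    assert(dst->length == src->length && dst->tile_size == src->tile_size);
    assert(src->tile_size > 0);
    size_t n_tiles = (src->length + src->tile_size - 1) / src->tile_size;
    for (size_t t = 0; t < n_tiles; ++t)
        toy_tile_copy(dst, src, t);
    return n_tiles;
}
```

For a 10-element array with 4-element tiles, the copy is performed as three tile copies, the last covering only two elements.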

Tile-based Memory Management

Array tiles are the mechanism by which arrays may be segmented and transported between physical memory tiers. The Mamba library supports four types of tile-based transport:

  • Array tile copy

  • Array tile duplication

  • Array tile merging

  • Array tile migration

Array tile copy provides a means of copying the contents of a source tile to a destination tile in another (or the same) memory space. It is assumed that the source and destination tiles both exist and have the same dimensions and layout, and that the source data overwrites any existing data in the destination tile.

Array tile duplication provides a means of creating a duplicate of an array tile in another (or even the same) memory space. A new block of memory is allocated (or taken from an internal cache) to store the new tile, and data is copied from the original tile. Mamba maintains a reference to the original tile, but this tile is not typically accessible to the user through the duplicate. The user may keep their own reference to the original tile if required; however, Mamba is not aware of any writes the user makes to the original tile, so the user must explicitly maintain coherence between multiple tile references.

Array tile merging provides a means of merging a duplicated tile back into the original tile from which it was duplicated. A merge strategy, passed as an mmbMergeType, defines how the merge takes place. In the prototype library implementation, the only supported strategy is to overwrite the original data. Other strategies are envisioned, such as typical reduction operators or a custom user-supplied function for merging the values at each array index.

An illustration of the difference between duplicating and migrating array tiles. In the case of duplication, a copy of the original tile still exists, whereas in the case of migration the original tile is discarded.


Array tile migration provides a means of migrating an array tile to a different memory space. A new block of memory is allocated (or taken from an internal cache) to store the new tile, and data is copied from the original tile. The new tile replaces the original tile, and the original tile reference is dropped, releasing associated memory where possible.

For duplication and migration, a target memory space is required, along with an access type and layout for the new tile. The access type may be used to optimise the placement of the new tile in memory or to reduce the cost of the data movement.

mmbError mmb_tile_copy(mmbArrayTile *dst_tile, mmbArrayTile *src_tile);

mmbError mmb_tile_duplicate(mmbArrayTile *in_tile, mmbMemSpace *in_space,
                         mmbAccessType in_access, mmbLayout *in_layout,
                         mmbArrayTile **out_tile);

mmbError mmb_tile_migrate(mmbArrayTile *in_tile, mmbMemSpace *in_space,
                        mmbAccessType in_access, mmbLayout *in_layout);

mmbError mmb_tile_merge(mmbArrayTile *in_tile, mmbMergeType in_merge);
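The following host-only sketch models the duplicate, merge, and migrate semantics described above, assuming merge uses the overwrite strategy. The `toy_tile` type and `toy_tile_*` functions are hypothetical stand-ins, not Mamba code; the integer `space` field merely labels a memory space, no real movement between tiers takes place, and releasing the duplicate after a merge is an assumption of this model.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model, not the Mamba API. `space` is only an integer label
 * standing in for a memory space such as DRAM or GPU memory. */
typedef struct toy_tile {
    double *data;
    size_t n;                 /* number of elements */
    int space;
    struct toy_tile *origin;  /* non-NULL only for duplicates */
} toy_tile;

/* Duplicate: allocate new storage, copy the data, remember the original. */
static toy_tile *toy_tile_duplicate(toy_tile *in, int space)
{
    toy_tile *dup = malloc(sizeof *dup);
    if (dup == NULL)
        return NULL;
    dup->data = malloc(in->n * sizeof(double));
    if (dup->data == NULL) {
        free(dup);
        return NULL;
    }
    memcpy(dup->data, in->data, in->n * sizeof(double));
    dup->n = in->n;
    dup->space = space;
    dup->origin = in;
    return dup;
}

/* Merge with the overwrite strategy: copy the duplicate's data over the
 * original. Freeing the duplicate afterwards is an assumption of this model. */
static void toy_tile_merge_overwrite(toy_tile *dup)
{
    assert(dup->origin != NULL && dup->origin->n == dup->n);
    memcpy(dup->origin->data, dup->data, dup->n * sizeof(double));
    free(dup->data);
    free(dup);
}

/* Migrate: copy into new storage, drop the old storage and update the tile
 * in place, so no second copy remains. Returns 0 on success. */
static int toy_tile_migrate(toy_tile *t, int space)
{
    double *moved = malloc(t->n * sizeof(double));
    if (moved == NULL)
        return -1;
    memcpy(moved, t->data, t->n * sizeof(double));
    free(t->data);
    t->data = moved;
    t->space = space;
    return 0;
}
```

The difference shows in the `origin` field: a duplicate keeps a link to the tile it came from so that it can later be merged back, whereas migration updates the tile in place and frees the old storage.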

By default, tile metadata is stored in the MMB_DRAM memory layer with the MMB_CPU execution context. However, once a tile has been moved to another memory space with a different execution context, the metadata is also made available in the native execution context of that memory space. A handle local to that execution context can be acquired with the following function:

mmbError mmb_tile_get_space_local_handle(mmbArrayTile *in_tile,
                                      mmbArrayTile **out_tile);

Allocation-based Memory Management

Mamba provides an API for low-level memory management at the granularity of an allocation object, which represents a single contiguous block of memory in a specific memory space. This API is based on the data structures detailed in Memory Model Design. Once an appropriate memory interface has been acquired via the API detailed in Memory Model API, blocks of memory may be allocated with the functions listed below.

The allocation API allows users to:

  • Allocate and free 1D contiguous blocks of memory.

  • Optionally, allocate with additional options for the provided interface.

  • Copy data from a source allocation to a destination allocation.

  • Copy N-dimensional sub-buffers between allocations, where supported by the underlying memory provider.

mmbError mmb_allocate(const size_t n_bytes, mmbMemInterface *interface,
                   mmbAllocation **out_allocation);

mmbError mmb_allocate_opts(const size_t n_bytes,
                         mmbMemInterface *interface,
                         const mmbAllocateOptions *opts,
                         mmbAllocation **out_allocation);

mmbError mmb_free(mmbAllocation *allocation);

mmbError mmb_copy(mmbAllocation *dest, mmbAllocation *src);

mmbError mmb_copy_1d(mmbAllocation *dst, const size_t doffset,
                   mmbAllocation *src, const size_t soffset,
                   const size_t width);

mmbError mmb_copy_2d(mmbAllocation *dst, const size_t dxoffset, const size_t dyoffset,
                   const size_t dxpitch,
                   mmbAllocation *src, const size_t sxoffset, const size_t syoffset,
                   const size_t spitch,
                   const size_t width, const size_t height);

mmbError mmb_copy_nd(mmbAllocation *dst, const size_t *doffset, const size_t *dpitch,
                   mmbAllocation *src, const size_t *soffset, const size_t *spitch,
                   const size_t ndims, const size_t *dims);
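A 2D copy can be understood as one contiguous copy per row, with each row located through its pitch. The sketch below is a host-memory model, not Mamba's implementation. `copy_2d_host` is a hypothetical name, and its units are an assumption borrowed from the `cudaMemcpy2D` convention (x offsets, width, and pitch in bytes; y offsets and height in rows); the units used by `mmb_copy_2d` should be checked against the Mamba headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only model of a pitched 2D copy; not Mamba code.
 * Assumed units (cudaMemcpy2D style): x offsets and width in bytes,
 * y offsets and height in rows, pitch = bytes between consecutive rows. */
static void copy_2d_host(unsigned char *dst, size_t dxoffset, size_t dyoffset,
                         size_t dpitch,
                         const unsigned char *src, size_t sxoffset, size_t syoffset,
                         size_t spitch,
                         size_t width, size_t height)
{
    /* Each copied row segment must fit within one row of both buffers. */
    assert(dxoffset + width <= dpitch && sxoffset + width <= spitch);
    for (size_t row = 0; row < height; ++row) {
        unsigned char *d = dst + (dyoffset + row) * dpitch + dxoffset;
        const unsigned char *s = src + (syoffset + row) * spitch + sxoffset;
        memcpy(d, s, width);  /* a row segment is contiguous: one memcpy per row */
    }
}
```

For example, copying a 2x2 block from position (x=1, y=2) of a 4-row buffer with a 6-byte pitch into position (x=2, y=1) of a 3-row buffer with a 4-byte pitch touches only bytes 6, 7, 10, and 11 of the destination.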

Additional functions are provided to extract a native pointer from an allocation object for the appropriate execution context. Options may also be supplied to request non-default allocation behaviour from specific interfaces.

Asynchronous API

The Mamba design includes asynchronous versions of many of the available memory management functions, denoted by the suffix _async. Asynchronous functions take two additional arguments: a queue object (mmbQueue) and a request object (mmbRequest).

A Mamba queue, mmbQueue, represents a FIFO execution queue for Mamba data management function calls. Depending on the source and destination memory spaces, the queue may be implemented with threads or with vendor queue implementations (for example, CUDA streams on NVIDIA GPUs). By default, Mamba constructs an appropriate queue. However, options supplied at construction allow the user to provide existing provider-specific queue structures for use by the underlying implementation, and additional functions allow the user to extract such structures if required.

A Mamba request, mmbRequest, is a handle to an asynchronous function call and may be used to test the call's status or to wait for its completion. Asynchronous versions of functions rely on support from the underlying memory provider, so they are not available for every combination of memory space, execution context, and memory provider. Where asynchrony is not supported, the Mamba library falls back to the blocking implementation.

// Array-granularity
mmbError mmb_array_copy_async(mmbArray *dst, mmbArray *src, mmbQueue *q, mmbRequest *req);

// Tile-granularity
mmbError mmb_tile_duplicate_async(mmbArrayTile *in_tile, mmbMemSpace *in_space,
                           mmbAccessType in_access, mmbLayout *in_layout,
                           mmbQueue *q, mmbRequest *req, mmbArrayTile **out_tile);

// Allocation-granularity
mmbError mmb_copy_async(mmbAllocation *dest, mmbAllocation *src, mmbQueue *q, mmbRequest *req);

// Synchronisation
mmbError mmb_request_is_complete(mmbRequest *req, bool *complete);
mmbError mmb_wait(mmbRequest *req);
mmbError mmb_wait_all(size_t num_requests, mmbRequest *reqs, mmbError *errors);

// Queue construction/destruction
mmbError mmb_queue_create(mmbQueueOptions *opts, mmbQueue **q);
mmbError mmb_queue_destroy(mmbQueueOptions *opts, mmbQueue **q);
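To illustrate the queue and request semantics (FIFO ordering, non-blocking submission, status tests, and waits), here is a toy model built on POSIX threads. It is not how Mamba implements queues: `toy_queue`, `toy_request`, and the `toy_*` functions are hypothetical, and the single worker thread with a fixed-capacity ring buffer is an assumption chosen for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of a FIFO queue and request handles; not Mamba. */
#define TOY_QUEUE_CAP 16

typedef struct toy_queue toy_queue;
typedef struct { toy_queue *q; bool complete; } toy_request;
typedef struct { void *dst; const void *src; size_t n; toy_request *req; } toy_task;

struct toy_queue {
    pthread_mutex_t lock;
    pthread_cond_t work;            /* a task was pushed, or shutdown requested */
    pthread_cond_t done;            /* some task completed */
    toy_task tasks[TOY_QUEUE_CAP];  /* ring buffer */
    size_t head, count;
    bool shutdown;
    pthread_t worker;
};

/* A single worker pops tasks in submission order, so completion is FIFO. */
static void *toy_worker(void *arg)
{
    toy_queue *q = arg;
    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->count == 0 && !q->shutdown)
            pthread_cond_wait(&q->work, &q->lock);
        if (q->count == 0)          /* shut down and fully drained */
            break;
        toy_task t = q->tasks[q->head];
        q->head = (q->head + 1) % TOY_QUEUE_CAP;
        q->count--;
        pthread_mutex_unlock(&q->lock);
        memcpy(t.dst, t.src, t.n);  /* the data movement, outside the lock */
        pthread_mutex_lock(&q->lock);
        t.req->complete = true;
        pthread_cond_broadcast(&q->done);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

static int toy_queue_create(toy_queue *q)
{
    memset(q, 0, sizeof *q);
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->work, NULL);
    pthread_cond_init(&q->done, NULL);
    return pthread_create(&q->worker, NULL, toy_worker, q);
}

/* Enqueue a copy and return immediately; *req tracks its completion. */
static void toy_copy_async(void *dst, const void *src, size_t n,
                           toy_queue *q, toy_request *req)
{
    pthread_mutex_lock(&q->lock);
    assert(q->count < TOY_QUEUE_CAP);  /* this toy queue never grows */
    *req = (toy_request){q, false};
    q->tasks[(q->head + q->count) % TOY_QUEUE_CAP] = (toy_task){dst, src, n, req};
    q->count++;
    pthread_cond_signal(&q->work);
    pthread_mutex_unlock(&q->lock);
}

static bool toy_request_is_complete(toy_request *req)
{
    pthread_mutex_lock(&req->q->lock);
    bool c = req->complete;
    pthread_mutex_unlock(&req->q->lock);
    return c;
}

static void toy_wait(toy_request *req)
{
    pthread_mutex_lock(&req->q->lock);
    while (!req->complete)
        pthread_cond_wait(&req->q->done, &req->q->lock);
    pthread_mutex_unlock(&req->q->lock);
}

/* Let the worker drain any outstanding tasks, then join it. */
static void toy_queue_destroy(toy_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->shutdown = true;
    pthread_cond_signal(&q->work);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->worker, NULL);
    pthread_cond_destroy(&q->done);
    pthread_cond_destroy(&q->work);
    pthread_mutex_destroy(&q->lock);
}
```

Because one worker drains tasks in submission order, a second copy that reads the destination of the first can be enqueued without an intermediate wait; waiting on the second request then implies the first has also completed. Compile with `-pthread`.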

Tile Iterators

Tile iterators, represented by the mmbTileIterator data structure, provide a convenient way of iterating over the full tiling of an array. An iterator object contains an internal schedule over tiles and provides typical iterator operations (first, next, etc.) to traverse the tiling. The initial API for array tile iterators is listed below; although the current library implementation supports iterators, the API for customising their behaviour (for example, custom iterator schedules) is currently limited.

mmbError mmb_tile_iterator_create(mmbArray *in_mba, mmbTileIteratorOptions *in_opt,
                                 mmbTileIterator** out_it);
mmbError mmb_tile_iterator_first(mmbTileIterator * in_it);
mmbError mmb_tile_iterator_next(mmbTileIterator * in_it);
mmbError mmb_tile_iterator_count(mmbTileIterator * in_it, size_t* count);
mmbError mmb_tile_iterator_destroy(mmbTileIterator * in_it);
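An iterator schedule can be as simple as visiting tiles in row-major order. The sketch below models first/next/count over a 2D tiling whose edge tiles may be partial; it is not Mamba code. `toy_tile_iterator` and its functions are hypothetical, and `toy_iter_valid` and `toy_iter_tile` have no counterpart in the API listed above; they are added only to make the loop self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical row-major tile iterator over a 2D tiling; not Mamba code.
 * The "schedule" is simply the tiles in row-major order. */
typedef struct {
    size_t tiles_x, tiles_y;  /* number of tiles along each dimension */
    size_t pos;               /* index of the current tile in the schedule */
} toy_tile_iterator;

static void toy_iter_create(toy_tile_iterator *it, size_t dim_x, size_t dim_y,
                            size_t tile_x, size_t tile_y)
{
    assert(tile_x > 0 && tile_y > 0);
    /* Ceiling division: edge tiles may be smaller than a full tile. */
    it->tiles_x = (dim_x + tile_x - 1) / tile_x;
    it->tiles_y = (dim_y + tile_y - 1) / tile_y;
    it->pos = 0;
}

static size_t toy_iter_count(const toy_tile_iterator *it)
{
    return it->tiles_x * it->tiles_y;
}

static void toy_iter_first(toy_tile_iterator *it) { it->pos = 0; }
static void toy_iter_next(toy_tile_iterator *it) { it->pos++; }

/* True while the iterator still points at a tile in the schedule. */
static bool toy_iter_valid(const toy_tile_iterator *it)
{
    return it->pos < toy_iter_count(it);
}

/* Tile coordinates of the current position in the schedule. */
static void toy_iter_tile(const toy_tile_iterator *it, size_t *tx, size_t *ty)
{
    *tx = it->pos % it->tiles_x;
    *ty = it->pos / it->tiles_x;
}
```

A 10x6 array with 4x4 tiles yields a 3x2 tiling, so the loop `for (toy_iter_first(&it); toy_iter_valid(&it); toy_iter_next(&it))` visits six tiles, ending at tile (2, 1).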