What is the difference between Application Buffer and System Buffer

In the context of MPI, an application buffer (often called a user buffer) is the buffer that holds the data to be sent or the memory into which data is to be received. Application buffers are what one passes to MPI communication calls, e.g.

MPI_Send(buf, len, type, ...);
//       ^^^
//    app. buffer

Once MPI_Send is called, a message is constructed and, depending on various criteria, is either sent via the interconnect or buffered internally for later delivery. The interconnect could be any kind of connecting mechanism, for example InfiniBand, Internet sockets, or shared memory, and the actual transmission might involve many intermediate steps. Internal buffers (also called system buffers) are part of and managed by the MPI runtime system and are invisible to the application code. System buffers are not necessarily allocated in the kernel or somewhere else outside the application space. On the contrary, with many MPI implementations and interconnects those buffers are allocated in the program's address space and count towards the program's memory usage.

It is also possible to utilise explicitly allocated intermediate buffers with the MPI_Bsend call or its non-blocking variant MPI_Ibsend. This requires that the user first allocate a buffer and then hand it to the MPI runtime by calling MPI_Buffer_attach. From that moment on, and until the buffer is detached again with MPI_Buffer_detach, the content of this buffer is managed solely by the MPI runtime system.

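The distinction between application and system buffers is important for the concept of operation completion. An MPI operation is considered complete once MPI no longer needs access to the application buffer. For a send, this does not necessarily mean that the message has been received; it may simply have been copied into a system buffer. For example: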

buf[] = some content;
MPI_Send(buf, len, ...);
// once MPI_Send returns, the buffer can be reused
buf[0] = 1;
MPI_Send(buf, 1, ...);

With non-blocking calls the operation continues in the background and one has to be careful not to modify the application buffer before the asynchronous operation has completed:

MPI_Request req;
buf[] = some content;
MPI_Isend(buf, len, ...,  &req);
buf[0] = 1;                  // DATA RACE: buf might still be in use by
MPI_Send(buf, 1, ...);       // the operation initiated by MPI_Isend

A correct use of buf in that case would be something like:

MPI_Request req;
buf[] = some content;
MPI_Isend(buf, len, ...,  &req);
// Do something that does not involve changing buf
// ...
// ...
// Make sure the operation is complete before continuing
MPI_Wait(&req, MPI_STATUS_IGNORE);
// buf is now free for reuse
buf[0] = 1;
MPI_Send(buf, 1, ...);
