Flash Memory Summit 2015 Special – NVM Express + RDMA = AWESOME!
Author: Stephen Bates (@stepbates)
In previous blog posts I have discussed Project Donard, which implements PCIe peer-to-peer transfers between NVM Express (NVMe) SSDs and GPUs, as well as between NVMe SSDs and Remote Direct Memory Access (RDMA) NICs. I am super-excited to announce that for Flash Memory Summit 2015 (FMS) we have been working with Mellanox, a pioneer of RDMA, to take this work to the next level! This blog post will dig a little deeper into what we are demoing at FMS, August 11-13, and how NVM Express + RDMA = AWESOME!
We have two separate NVMe + RDMA demonstrations in PMC’s booth #213 at FMS. The first shows how we can combine NVMe and RDMA to provide low-latency, high-performance block-based NVM at distance and with scale. The second demonstration integrates Mellanox’s RDMA Peer-Direct with our PMC Flashtec NVRAM card, exposing its Memory Mapped I/O (MMIO) region as an RDMA target to provide persistent memory access at distance and with scale. Let’s look at each demo in more detail:
NVM Express over RDMA
The NVMe over RDMA (NoR) demonstration shows the potential of extending the NVMe protocol to work over RDMA. In this demonstration there are two computers, one acting as a client and the other as a server, connected by RoCEv2 (RDMA over Converged Ethernet version 2) using Mellanox ConnectX-3 Pro NICs. The NVMe device used is the PMC Flashtec™ NVRAM Drive, which has high performance and low latency. A block diagram of the demonstration is shown below:
Our demonstration shows that using RDMA to transfer NVMe commands and data results in minimal additional latency and no throughput drop.
The table below compares the average latency results between a local NVMe device and a remote NVMe device. The results show an increase in latency of less than 10 µs with the NoR approach.
The table below compares the throughput results between a local NVMe device and a remote NVMe device. They show that there is no drop in throughput with the NoR approach.
Peer-to-Peer Between RDMA and PCIe Devices
In this demonstration we build on standard RDMA by bypassing the server CPU and DRAM: Peer-Direct lets a remote client connect directly to an NVRAM / NVMe device in the server. We combine RoCEv2-capable ConnectX-3 Pro RDMA NICs from Mellanox with PMC Flashtec NVRAM Drives and enable Peer-Direct operation between the NIC and the NVRAM. Peer-Direct enables direct access to the NVRAM card from a remote client, which reduces latency and offloads the server CPU and DRAM when compared to a traditional RDMA flow.
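To get a feel for why bypassing server DRAM matters, here is a rough back-of-the-envelope sketch in Python. It is not part of the Donard code, and the function names and the numbers in the usage example are made up for illustration. It assumes that in a traditional RDMA flow every byte crosses server DRAM twice (the NIC writes it into a buffer, then it is read back out on its way to the NVRAM card), while with Peer-Direct the NIC talks to the card directly and the data never touches DRAM. Real systems will differ (caching, other traffic, protocol overhead), so treat this as a mental model only.

```python
def server_dram_traffic(rdma_throughput_gbs: float, peer_direct: bool) -> float:
    """Estimate the server DRAM bandwidth (GB/s) consumed by remote I/O.

    Assumption (illustrative, not measured): a traditional RDMA flow
    moves each byte through DRAM twice, once when the NIC writes it into
    a buffer and once when it is read out toward the NVRAM device.
    With Peer-Direct the data goes NIC <-> NVRAM and skips DRAM.
    """
    dram_crossings = 0 if peer_direct else 2
    return rdma_throughput_gbs * dram_crossings


def background_dram_bandwidth(total_dram_gbs: float,
                              rdma_throughput_gbs: float,
                              peer_direct: bool) -> float:
    """DRAM bandwidth (GB/s) left over for other work on the server."""
    used = server_dram_traffic(rdma_throughput_gbs, peer_direct)
    return max(0.0, total_dram_gbs - used)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the arithmetic.
    total, rdma = 50.0, 4.0
    for mode, pd in (("traditional RDMA", False), ("Peer-Direct RDMA", True)):
        left = background_dram_bandwidth(total, rdma, pd)
        print(f"{mode}: {left:.1f} GB/s of DRAM bandwidth left for other work")
```

Under these assumptions, every GB/s of remote traffic costs the server 2 GB/s of DRAM bandwidth in the traditional flow and nothing with Peer-Direct, which is the kind of effect the background DRAM bandwidth comparison in this demo is meant to show.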
The hardware for this demonstration consists of two computers, one acting as a client and the other as a server. We use a PCIe switch in the server to improve the Peer-Direct performance above and beyond what can be achieved using the Intel CPU root complex.
The table below compares the background DRAM bandwidth available on the server when both traditional RDMA and Peer-Direct RDMA are used. Results were obtained using perftest:
The table below compares the average latency results between traditional RDMA and Peer-Direct RDMA. Results were obtained using the RDMA mode of fio:
Peer-Direct Code Base
As previously mentioned, we implemented Donard with open-source paradigms in mind, and it would be remiss of us to not consider opening the Donard code to the community. As such, we have placed the Donard code on GitHub under a mix of Apache 2.0 and GPL 2.0 licensing. Any code we modified that was originally GPL we were required to keep GPL, but all our new code is available under Apache to allow others to take it and use it as they wish.
It is our hope that people in the community will use this code, make further improvements to it, and contribute it back to the code base. The git repository for the code is available here.
In addition, the code associated with this work will be included in the December release of the OpenFabrics Enterprise Distribution (OFED). Stay tuned for more details on that release soon.
Both RDMA and NVMe are technologies that are on the ascent! The first provides low-latency, efficient data movement with distance and scale; the second provides low-latency access to SSDs. Combining these two technologies can lead to awesome things, and PMC and Mellanox are working together to bring this awesome performance to you!