Unlocking the Potential of GPUs in AI With Shared NVMe

All Tech News
4 min read · Dec 2, 2021


High-performance NVMe storage has been crucial in helping companies with high-performance computing (HPC) applications move to the cloud. Increasingly, companies are migrating to the cloud to benefit from advantages such as scalability and cost-effectiveness.

Companies with HPC applications have traditionally relied on on-premises storage rather than the cloud. The main reason is that migrating HPC applications to the cloud has always come with significant penalties, such as increased latency and reduced performance. While cloud storage is virtually inexhaustible and compute power is plentiful, data transfer between the two has been the bottleneck. Until recently, moving data-intensive applications to the cloud meant that you couldn't feed the processors with enough data and therefore couldn't exploit the advanced computing power that GPUs make possible.

Businesses with HPC applications, therefore, had to pass up the significant advantages of the cloud. This challenge also made it difficult to offer services such as artificial intelligence (AI) and machine learning (ML) efficiently at the enterprise level.

However, thanks to NVMe technology, businesses can now improve HPC compute speed with storage. Before diving into NVMe storage, let's first look at the requirements of HPC storage.

Requirements of HPC Storage for AI and Deep Learning Workloads

The use of AI is now widespread, and almost every industry has found some use for AI and ML. The primary consideration when dealing with AI workloads is computing power. AI typically analyzes massive amounts of data to generate insights that give businesses a competitive edge. The ability to process huge volumes of data quickly is crucial to the success of any AI workload.

Powerful Processing

Currently, handling AI workloads successfully requires a GPU-based architecture. GPUs were originally designed for graphics rendering in games. Now, they are also used for compute-intensive applications such as image recognition. They can handle significantly more concurrent threads than traditional CPUs and are the go-to solution for AI and ML workloads.
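As a rough illustration of that parallelism, the sketch below times the same matrix multiply on a CPU and a GPU. It assumes PyTorch with a CUDA-capable device; the matrix size is arbitrary, and the exact speedup will vary with hardware.

```python
# A rough illustration of GPU parallelism: the same matrix multiply on
# CPU and GPU. Assumes PyTorch with a CUDA device; sizes are arbitrary.
import time
import torch

x = torch.randn(8192, 8192)

start = time.perf_counter()
_ = x @ x                          # dense matmul on the CPU
cpu_s = time.perf_counter() - start

xg = x.to("cuda")
torch.cuda.synchronize()
start = time.perf_counter()
_ = xg @ xg                        # the same matmul on the GPU
torch.cuda.synchronize()           # wait for the async CUDA kernel
gpu_s = time.perf_counter() - start

print(f"CPU: {cpu_s:.2f} s, GPU: {gpu_s:.2f} s")
```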

Massive Capacity

However, the moment you start using GPU-based servers, another complication arises: their local storage isn't enough. AI deals with massive datasets, sometimes dozens of terabytes for a single application. If you can't store this data on the node that is processing it, you have a problem.

The first solution to this conundrum would be an on-premises array of storage devices. However, this comes with a significant performance bottleneck: moving the data back and forth interrupts the workflow and leaves expensive GPUs idle.

To fully utilize the GPUs' processing power, you need enough storage and a data transfer system with high IOPS (input/output operations per second). That's exactly what shared high-performance NVMe storage is all about.
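To make "high IOPS" concrete, here is a minimal single-threaded random-read probe. It is only a sketch: the device path is an assumption, reading a raw device usually requires root, and serious benchmarks (fio, for example) use direct I/O and deep queues, so the number printed here is a cached, best-case ballpark.

```python
# Rough random-read IOPS probe. A sketch only: the device path is an
# assumption, raw-device access usually needs root, and without
# O_DIRECT the page cache inflates the result.
import os
import random
import time

PATH = "/dev/nvme0n1"   # hypothetical device; a large file also works
BLOCK = 4096            # 4 KiB reads, the conventional IOPS block size
DURATION = 5            # seconds to run

fd = os.open(PATH, os.O_RDONLY)
size = os.lseek(fd, 0, os.SEEK_END)

ops = 0
deadline = time.monotonic() + DURATION
while time.monotonic() < deadline:
    offset = random.randrange(0, size - BLOCK, BLOCK)
    os.pread(fd, BLOCK, offset)
    ops += 1
os.close(fd)

print(f"~{ops / DURATION:,.0f} random 4 KiB read IOPS (single-threaded)")
```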

NVMe Storage

NVMe (Non-Volatile Memory Express) is a storage access protocol designed to get the most out of SSDs attached over the PCIe interface. Thanks to NVMe, HPC workloads benefit from higher data transfer throughput and lower latency. This is essential, as most HPC workloads depend on GPUs, which have a massive appetite for data. At the local level, this works fine.
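On a Linux host you can see the local NVMe devices the protocol exposes directly from sysfs. The sketch below is a convenience for a typical modern kernel, not part of any particular product; exact sysfs attribute paths can vary by kernel version.

```python
# List local NVMe namespaces and their sizes via Linux sysfs.
# A sketch for a typical modern kernel; attribute paths may vary.
from pathlib import Path

for dev in sorted(Path("/sys/block").glob("nvme*")):
    model = (dev / "device" / "model").read_text().strip()
    sectors = int((dev / "size").read_text())   # 512-byte sectors
    print(f"/dev/{dev.name}: {model}, {sectors * 512 / 1e9:.0f} GB")
```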

However, at the network level, further orchestration is needed to take advantage of the speed and performance that NVMe makes possible. This is done through shared NVMe block storage, typically delivered over NVMe over Fabrics (NVMe-oF) by a software-defined storage solution.

The software abstracts the underlying NVMe hardware and aggregates it so that, to HPC applications, it appears as one block of storage. For starters, this provides more capacity than any single node could hold. It also unlocks the full potential of GPUs by giving them access to a scalable pool of high-performance NVMe storage.
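To make that idea concrete, here is a toy sketch of the abstraction: a flat logical address space whose offsets map onto per-device offsets. The device paths and sizes are hypothetical, and a real software-defined solution would add striping, redundancy, and remote (NVMe-oF) targets.

```python
# Toy sketch of presenting several NVMe devices as one logical block
# device: a flat address space mapped to (device, local offset) pairs.
# Device paths and sizes are hypothetical; real solutions also handle
# striping, failures, and remote NVMe-oF targets.
from dataclasses import dataclass

@dataclass
class Extent:
    path: str      # backing NVMe device
    size: int      # bytes contributed to the pool

class LogicalVolume:
    def __init__(self, extents):
        self.extents = extents
        self.size = sum(e.size for e in extents)

    def locate(self, logical_offset):
        """Map a logical offset to (device path, local offset)."""
        if not 0 <= logical_offset < self.size:
            raise ValueError("offset outside logical volume")
        for extent in self.extents:
            if logical_offset < extent.size:
                return extent.path, logical_offset
            logical_offset -= extent.size

# Three 1 TB drives appear to the application as one 3 TB volume.
vol = LogicalVolume([Extent(f"/dev/nvme{i}n1", 10**12) for i in range(3)])
print(vol.locate(2_500_000_000_000))   # ('/dev/nvme2n1', 500000000000)
```

The point of the mapping is that applications address one large volume, while each I/O lands on whichever physical drive backs that region of the address space.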

This high-bandwidth distributed block storage also enables containerization, along with its attendant benefits, such as the ability to deploy a microservices architecture.

How NVMe Helps Improve HPC Compute Speed With Storage

Implementing HPC storage for AI and deep learning workloads with shared NVMe unifies multiple remote devices into one logical block device. This delivers the speed and performance of local NVMe at the network level.

Aside from increasing the utilization of GPU units, this software-defined storage solution also helps fully utilize the bandwidth and throughput capabilities of all the shared NVMe drives.

A useful property of modern GPU servers is that although their local storage capacity isn't enough for AI workloads, they have significant data transfer capabilities. This means that with interconnects such as InfiniBand, massive datasets can be transferred and remote logical NVMe volumes accessed with ease.

Since GPU hardware is expensive to buy and maintain, achieving the highest ROI from GPU units is a priority. Shared NVMe helps with this.

Shared NVMe also comes with additional features that are specifically advantageous for AI workloads. For example, checkpointing is crucial when training AI models: if a training run takes days or weeks, periodically saving a snapshot of its state provides a restart point in case the system crashes. Shared block NVMe storage is particularly suited for this, as it allows many local drives to be synchronized.
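As a minimal illustration, the sketch below periodically writes training checkpoints to a shared volume. It assumes PyTorch; the mount point, interval, and file layout are illustrative choices, not prescriptions.

```python
# Minimal periodic-checkpoint sketch for a long-running training job.
# Assumes PyTorch; the shared-volume mount point and interval are
# illustrative assumptions, not from the article.
import torch

CKPT_PATH = "/mnt/shared-nvme/checkpoints/model_step_{step}.pt"
CKPT_EVERY = 1000  # steps between snapshots

def maybe_checkpoint(step, model, optimizer):
    if step % CKPT_EVERY != 0:
        return
    torch.save(
        {
            "step": step,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
        },
        CKPT_PATH.format(step=step),
    )

def resume(path, model, optimizer):
    # After a crash, reload the latest snapshot and continue from there.
    state = torch.load(path)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"]
```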
