Data centers face the formidable task of improving operating efficiency and maximizing their IT investments in hardware infrastructure in the face of evolving and varied application requirements. Last year alone, data centers worldwide spent well over $140B on server and storage infrastructure, yet they still do not operate at peak efficiency.
New architectures are needed to better utilize and optimize hardware assets spanning compute, memory and storage. Enabling resource agility where physical compute, memory, and storage resources are treated as composable building blocks is a key to unlocking efficiencies and eliminating stranded and underutilized assets.
In this excerpt from my FMS 2019 keynote speech, we will explore the inefficiency of today’s data center platform, learn how Microchip innovations build an agile infrastructure of compute, memory and storage, and highlight technology areas that both the industry and vendors like us are enabling to meet the needs of a composable platform.
Challenges in Computing
The data center accelerator market alone was $2.8B in 2018, and is expected to grow at a compound annual growth rate of over 50% for the next 5 years, to reach over $20B in 2023. That is a lot of new money going into accelerators, FPGAs, GPUs and the like.
But, one of the problems with accelerator resources is that they are not being efficiently utilized. Why do we say that? Well, first, traditional architectures constrain GPU IO bandwidth, effectively starving the GPU of the data required for processing.
Second, each workload has unique optimization points for compute, for memory, and for storage resources. However, today’s data center operations are largely dealing with fixed configuration systems that cannot easily adapt to workload needs as they change.
Challenges in Memory
Over $20B of DRAM was purchased and deployed in data centers last year – but, much of that DRAM is not being used. In many data centers, the average DRAM utilization sits around 50%. How many billions of dollars of stranded DRAM do we have sitting in data centers around the world right now? And this isn’t only a CAPEX problem – DRAM consumes 15-20% of the data center’s power. The problem of stranded memory is going to become an even bigger concern in the future. We need to start efficiently using those resources.
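A back-of-envelope calculation, using only the figures quoted above (illustrative inputs, not measured data), shows the scale of the problem:

```python
# Back-of-envelope estimate of stranded DRAM spend, using only the
# figures quoted above. These are illustrative inputs, not measured data.
dram_spend_usd = 20e9       # annual data center DRAM spend (over $20B)
avg_utilization = 0.50      # typical average DRAM utilization

stranded_spend_usd = dram_spend_usd * (1 - avg_utilization)
print(f"Stranded DRAM spend: ${stranded_spend_usd / 1e9:.0f}B")  # → $10B
```

Roughly $10B of DRAM sitting idle every year, before even counting the power it consumes.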
A second significant challenge with memory is that memory bandwidth has not been scaling with compute core counts. Core counts have been increasing, but the number of memory channels attached to the CPU has not been scaling at the same rate due to physical packaging and pinout constraints. The result is that, on a per-core basis, IO bandwidth is decreasing and memory latency is increasing.
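The trend is easy to see with round numbers. The core counts, channel counts and channel speeds below are hypothetical illustrations, not figures for any specific CPU:

```python
# Illustrative per-core memory bandwidth trend. Core counts, channel
# counts and channel speeds are hypothetical round numbers, not data
# for any specific CPU generation.
def per_core_bw(cores, channels, gbs_per_channel):
    """Return memory bandwidth available per core, in GB/s."""
    return channels * gbs_per_channel / cores

old = per_core_bw(cores=16, channels=6, gbs_per_channel=20)   # 7.5 GB/s per core
new = per_core_bw(cores=64, channels=8, gbs_per_channel=25)   # ~3.1 GB/s per core
print(f"{old:.1f} -> {new:.1f} GB/s per core")
```

Even though total memory bandwidth grew, the bandwidth available to each core dropped by more than half in this sketch.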
Challenges in Storage
Enterprise NVMe SSD revenue topped $15B last year, and that’s in addition to the $10B or so being spent on spinning media and packed into data centers around the world.
So, what are some challenges with all this storage? Traditional architectures require that storage be over-provisioned, resulting in stranded storage in a typical server. And, even though we have very high performance NVMe SSDs, machine learning applications require even more bandwidth than a single NVMe SSD can deliver. We also live in a world of multiple, different media types and interfaces. We have SSDs and we have HDDs. In the world of an HDD, it’s not just an HDD – is it single-ported? Is it multi-ported? Is it single-actuator? Multi-actuator? Is it a SATA device? Is it a SAS device? In the world of SSDs – is it an NVMe interface? Is it a SAS interface? Is it a SATA interface? There are a multitude of different interfaces to deal with.
Inside the SSD, the pace of innovation also continues unabated. NAND is moving from MLC to TLC to QLC. NAND layer counts are ever increasing, from 96 layers to 128 to 176 layers. And latencies continue to be driven down to previously unheard-of levels. New drive form factors are being designed to maximize capacity per unit of real estate while keeping thermals in check. How do we avoid isolating that capacity, so that workloads that don’t require it are not wasting or stranding it due to fixed hardware configurations in the data center?
There Must be a Better Way!
At Microchip, we strongly believe that there is a better way. We believe in building flexible solution building blocks – building blocks that can adapt to new use cases and new requirements and enable system-level composability. With composable and flexible infrastructure, or what we like to call agile infrastructure, tremendous strides in efficiency are possible.
Enabling resource agility where physical compute, storage and memory resources are treated as composable building blocks is the key to unlocking efficiencies and eliminating stranded or underutilized assets. Composable storage, compute and memory enables you to optimize resources by workload and to reduce or eliminate resource stranding. We can remove bandwidth bottlenecks, remove memory bottlenecks, remove storage bottlenecks and remove compute IO bottlenecks. The agile data center needs adaptable building block silicon platforms that can enable you to cost-effectively manage emerging memory and storage technologies, enabling your infrastructure use cases to continue to evolve after the hardware is built.
Improving GPU Utilization
Microchip’s Switchtec PAX Advanced Fabric solution enables composable heterogeneous compute architectures. This includes a scalable, non-hierarchical fabric that creates dynamically reconfigurable virtual domains, enabling resources to be allocated on demand with low latency data movement, as all data transfers through the fabric are managed in hardware. The solution imposes no special driver requirements on the host, enabling rapid time to market and reduced R&D effort for system integrators.
How does it work? It’s important to realize that a Switchtec fabric is not just a collection of PCIe switches. It is a collection of fabric elements that use virtual domains to connect root complexes, or CPUs, to endpoints like GPUs or storage. As mentioned earlier, heterogeneous compute is becoming more prevalent in the data center. GPUs and accelerators are widely used in a variety of applications. Each application and workload may require a unique ratio of compute to accelerator resources. With native support for PCIe Gen 4 on both CPUs and GPUs, a PCIe Gen 4 fabric is a natural choice to allow for composable heterogeneous compute in artificial intelligence and machine learning applications.
How do we get there? We start with a programmable, enterprise-quality, low latency PCIe Gen 4 switch and add turnkey advanced fabric firmware to create a scalable and configurable low latency PCIe Gen 4 fabric. The fabric can scale across multiple switches and endpoints, and hosts are kept in separate virtual domains. In the example below, Host 1 is assigned 4 GPUs, marked in orange, even though the 4th GPU is physically attached to a different switch within the fabric. These virtual domains are created by the flexible and configurable embedded control plane in each fabric element. The virtual domain is, in effect, a PCIe-compliant virtual switch, and here you see an example of the host in orange that has visibility to that 4th GPU. Although the flexibility is enabled through firmware provided by Microchip as a turnkey solution, the data is routed in hardware to ensure the lowest latency.
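Conceptually, the control plane behaves like a pool of endpoints that can be assigned to hosts on demand. The toy model below illustrates that idea only; the class and method names are hypothetical and are not Microchip’s actual firmware API:

```python
# Toy model of a fabric control plane: hosts live in separate virtual
# domains, and endpoints (GPUs, SSDs) anywhere in the fabric can be
# assigned or released on demand. This is an illustration of the
# concept, not Microchip's actual API.
class Fabric:
    def __init__(self, endpoints):
        self.free = set(endpoints)   # endpoints not yet assigned
        self.domains = {}            # host -> set of assigned endpoints

    def assign(self, host, endpoint):
        if endpoint not in self.free:
            raise ValueError(f"{endpoint} is already assigned")
        self.free.discard(endpoint)
        self.domains.setdefault(host, set()).add(endpoint)

    def release(self, host, endpoint):
        self.domains[host].discard(endpoint)
        self.free.add(endpoint)

# Host 1 is composed with 4 GPUs, even if one of them physically hangs
# off a different switch in the fabric.
fabric = Fabric(["gpu0", "gpu1", "gpu2", "gpu3"])
for gpu in ["gpu0", "gpu1", "gpu2", "gpu3"]:
    fabric.assign("host1", gpu)
```

When the workload changes, `release` followed by `assign` recomposes the same physical GPUs under a different host, which is exactly the kind of dynamic reassignment the fabric enables.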
Furthermore, this architecture allows for direct peer-to-peer data movement inside the fabric. Why is peer-to-peer data movement through the PCIe fabric important or useful? Peer-to-peer data movement delivers increased performance and reduced latency. In the example below, we can deliver 2.5X the bandwidth by bypassing the CPU-to-CPU interconnect in a two-socket system. You can see that the GPUs, in this case, can deliver 26 Gbps when doing a peer-to-peer transfer, rather than funneling the traffic through the CPU subsystem – a significant performance improvement due to the direct peer-to-peer transfers.
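A quick sanity check on those two numbers, taken directly from the example above:

```python
# If direct peer-to-peer delivers 26 Gbps at 2.5x the baseline, the
# implied bandwidth of the path through the CPU subsystem is about
# 10.4 Gbps. Both inputs come from the example in the text.
p2p_bw = 26.0      # Gbps, direct GPU-to-GPU through the fabric
speedup = 2.5      # quoted advantage over routing via the CPUs

via_cpu_bw = p2p_bw / speedup
print(f"Through the CPU interconnect: {via_cpu_bw:.1f} Gbps")
```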
This model of composable GPUs is readily extensible to NVMe storage through the addition of NVMe SSDs into the same fabric architecture. The NVMe endpoints can simply be added to the fabric, just like a spec-compliant GPU can be, allowing for dynamic assignment or reassignment of SSDs to different hosts as required, so that storage becomes a flexible and adaptable resource.
We have talked about allocating entire SSDs and entire GPUs to hosts as required. What if an individual resource itself is very large, and we wish to partition and share it? An example would be a high capacity SSD that we want to share across multiple CPUs to avoid stranding the storage. SR-IOV and multi-host sharing allow for just that type of flexibility. Microchip’s Switchtec PCIe switches, as well as our Flashtec NVMe SSD controllers, enable end-to-end multi-host IO virtualization with standard off-the-shelf drivers.
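The sharing model can be sketched as follows. The drive capacity, virtual function count and host names below are hypothetical, chosen only to illustrate how one physical device is carved up across hosts:

```python
# Conceptual sketch of SR-IOV SSD sharing: one physical function (PF)
# exposes multiple virtual functions (VFs), and the fabric hands each
# VF to a different host running a standard NVMe driver. Capacity,
# VF count and host names are hypothetical.
TOTAL_TB = 32      # one high-capacity SSD
NUM_VFS = 4        # virtual functions carved from the physical function

vfs = {f"vf{i}": TOTAL_TB / NUM_VFS for i in range(NUM_VFS)}
assignment = {"host1": "vf0", "host2": "vf1", "host3": "vf2", "host4": "vf3"}

for host, vf in assignment.items():
    print(f"{host} -> {vf} ({vfs[vf]:.0f} TB slice)")
```

Instead of four hosts each stranding part of a dedicated drive, one large drive is fully utilized across all four.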
SR-IOV is a reality today. More than eight vendors have announced SR-IOV-capable NVMe SSDs, and we have the flexible infrastructure to support this type of architecture. Notably, the applications for PCIe fabrics extend beyond the data center. An autonomous car can have many sensors and control units that continually need to make inference decisions while driving and to store data for future training. This can be done most effectively with a low latency fabric that has access to shared resources like an SR-IOV-capable SSD.
We’ve discussed improving GPU and storage utilization and removing storage bandwidth bottlenecks with PCIe fabric solutions like the Switchtec PCIe fabric. But true agility requires composability as well as flexibility.
Improving Storage Utilization
Flexibility, in the case of storage, can be achieved in many different ways. Microchip believes in bringing enabling technology to market that allows for maximum reuse of software and hardware qualification efforts as you move from one class of storage media to another. From a protocol standpoint, our tri-mode IP and our Smart Storage series of storage controllers enable a platform for enterprise class, high performance and secure NVMe storage, SAS storage, SATA storage or some combination of all three.
From a Flash media standpoint, the Flash channel engines in our Flashtec NVMe SSD controllers provide a future-proof, programmable architecture with advanced LDPC ECC, including hard and soft decoding, enabling NVMe SSD investments to be leveraged across multiple generations of NAND without sacrificing quality of service.
Improving Memory Utilization
Memory innovation is happening along two vectors: near and far. Near memory innovation is about getting more bandwidth to the CPU to feed its increasing core counts. Far memory innovation is about effectively pooling, and then sharing, memory, making it accessible to more machines within the rack. Microchip has been collaborating with industry partners on a number of new serial load/store standards that address this problem, such as CXL, Gen-Z and OpenCAPI.
At FMS, we announced our first product in this area, an Open Memory Interface (OMI)-to-DDR4 smart memory controller.
The SMC 1000 8x25G memory controller provides low latency connectivity to DDR4 via 8 lanes of 25 Gbps serial OMI, enabling the memory bandwidth required for AI and machine learning applications.
This type of solution provides:
1. Increased memory bandwidth. We’ve reduced a 288-pin DDR4 interface down to an 84-pin OMI interface, effectively enabling a quadrupling of the memory bandwidth to the CPU.
2. Media independence. By moving the controller outside of the CPU, we have enabled the memory technology to evolve independently of the CPU.
3. Lower total solution cost. There is a reduced silicon, IP, and packaging cost for CPUs and SoCs. DDIMMs leveraging the SMC 1000 are available from some of Microchip’s partners, namely, Micron, Samsung, and Smart Modular.
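The pin-efficiency claim in point 1 can be checked with rough arithmetic. The DDR4 channel bandwidth below assumes a DDR4-3200, 64-bit configuration, which is my assumption rather than a figure from the text:

```python
# Rough pin-efficiency comparison behind the bandwidth claim above.
# The DDR4 figure assumes a 64-bit DDR4-3200 channel (an assumption,
# not a number from the announcement); the OMI figure is 8 x 25 Gbps.
ddr4_pins, ddr4_gbs = 288, 25.6   # 288-pin DIMM interface, ~25.6 GB/s
omi_pins,  omi_gbs  = 84, 25.0    # 84-pin OMI interface, ~25 GB/s

ratio = (omi_gbs / omi_pins) / (ddr4_gbs / ddr4_pins)
print(f"OMI delivers roughly {ratio:.1f}x the bandwidth per pin")
```

Each OMI channel delivers roughly the same bandwidth as a DDR4 channel in under a third of the pins, so a CPU package with a fixed pin budget can host several times as many memory channels, which is where the bandwidth multiplication comes from.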
We were honored that our SMC 1000 8x25G Smart Memory Controller received an FMS Best of Show Award.
I hope you had a chance to visit FMS this year and view some of the really cool demos of the products that I’ve mentioned here. If you did not have a chance to attend, please visit us at www.microsemi.com to learn more about our Gen 4 products and see some of the demos on the website.
In summary, at Microchip, we believe that flexible and composable infrastructure is the future of the data center. Microchip is innovating in storage, memory and compute interconnect, enabling system builders and data center operators to drive efficiencies and adapt to evolving use cases.
Andrew Dieckmann is the Vice President of Marketing and Application Engineering for the Data Center Solutions Division at Microchip Technology, responsible for product management, product marketing, product strategy and the application engineering team supporting Microchip’s broad portfolio of storage solutions, including SSD controllers, RAID solutions, HBAs, PCIe switches, and SAS expanders.