Startup d-Matrix launches its “Corsair” AI platform with Digital In-Memory Computing, using on-chip SRAM to produce 30,000 tokens/second at 2 ms/token latency for Llama3 70B in a single rack.
Karl Freund
Contributor
Founder and Principal Analyst, Cambrian-AI Research LLC
Forbes - November 19, 2024
Running generative AI models, a task known as inference, is memory-intensive. It takes a lot of memory, and it takes very fast memory, but it is hard to have both. SRAM that sits on the same chip as the processor is an order of magnitude faster than the High-Bandwidth Memory (HBM) stacked alongside a GPU or accelerator, but SRAM capacity is much smaller, making it hard to hold models that can have hundreds of billions of parameters. d-Matrix has a unique solution that could win this tug-of-war.
Who and what is d-Matrix?
d-Matrix was founded by Sid Sheth and Sudeep Bhoja in 2019 and is headquartered in Santa Clara, CA. The company has raised $154M and is backed by over 25 companies. Singapore’s Temasek led the latest B-round of funding, and Microsoft’s venture fund, M12, is also an investor.
d-Matrix uses a hybrid approach to memory that appears to deliver excellent results, with SRAM serving as “Performance Memory” and a larger DRAM store serving as “Capacity Memory.” Performance Memory handles online operations that need low latency for interactivity; Capacity Memory handles offline work, as sketched below.
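To make the tiering concrete, here is a minimal Python sketch of how a scheduler might route work between the two tiers by latency sensitivity and model footprint. This is a hypothetical illustration, not the Aviator stack’s actual API, and the SRAM budget is a placeholder number, not a published d-Matrix spec.

def pick_mode(model_bytes: int, interactive: bool,
              sram_budget_bytes: int = 2 * 1024**3) -> str:
    """Route a job to a memory tier (illustrative only).

    Performance Mode requires the model to fit in on-chip SRAM
    ("Performance Memory"); the 2 GiB budget here is a placeholder.
    """
    if interactive and model_bytes <= sram_budget_bytes:
        return "performance"   # serve from on-chip SRAM for low latency
    return "capacity"          # serve from off-chip DRAM for offline/batched work

# A latency-sensitive model that fits on-chip goes to Performance Memory;
# a large offline batch job falls back to Capacity Memory.
print(pick_mode(model_bytes=1 * 1024**3, interactive=True))     # -> performance
print(pick_mode(model_bytes=100 * 1024**3, interactive=False))  # -> capacity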
The other way to gain additional Performance Memory capacity is through scaling, clustering many Corsairs together. Scaling is achieved with a chiplet-based architecture that uses DMX Link for high-speed, energy-efficient die-to-die connectivity and DMX Bridge for card-to-card connectivity.
d-Matrix is also among the first in the industry to natively support block floating-point numerical formats, now standardized by the OCP as Microscaling (MX) formats, for greater inference efficiency. These tiered-memory innovations are tied together by d-Matrix’s Aviator software stack, which gives AI developers a familiar user experience and tooling.
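The idea behind block floating point is that a block of values shares a single power-of-two exponent while each value keeps its own small integer mantissa, amortizing the exponent bits across the block. Below is a minimal NumPy sketch of that idea; the parameters (32-element blocks, 8-bit mantissas) are illustrative placeholders, not the exact bit layouts of the OCP MX formats or d-Matrix’s hardware.

import numpy as np

def bfp_quantize(x, block_size=32, mantissa_bits=8):
    """Quantize a 1-D array with one shared power-of-two scale per block,
    the core idea behind block floating point / OCP Microscaling."""
    x = x.reshape(-1, block_size)
    # Shared exponent per block: scale so the largest magnitude fits in [-1, 1].
    max_mag = np.abs(x).max(axis=1, keepdims=True)
    exp = np.ceil(np.log2(np.maximum(max_mag, 1e-38)))
    scale = 2.0 ** exp
    # Each element keeps its own small integer mantissa at the block's scale.
    levels = 2 ** (mantissa_bits - 1) - 1
    mant = np.round(x / scale * levels).astype(np.int32)
    return mant, scale

def bfp_dequantize(mant, scale, mantissa_bits=8):
    levels = 2 ** (mantissa_bits - 1) - 1
    return mant.astype(np.float64) * scale / levels

rng = np.random.default_rng(0)
x = rng.standard_normal(128)
mant, scale = bfp_quantize(x)
x_hat = bfp_dequantize(mant, scale).reshape(-1)
print("max abs error:", np.abs(x - x_hat).max())

The OCP MX formats follow the same principle; MXINT8, for example, pairs a shared power-of-two scale with 8-bit integer elements over 32-element blocks, so matrix multiplies can run on cheap integer arithmetic while retaining close-to-floating-point dynamic range.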
“We saw transformers and generative AI coming and founded d-Matrix to address inference challenges around the largest computing opportunity of our time,” said Sid Sheth, cofounder and CEO of d-Matrix. “The first-of-its-kind Corsair compute platform brings blazing-fast token generation for high-interactivity applications, with an emphasis on making Gen AI commercially viable.”
Performance Memory for Fast Interactivity
d-Matrix’s novel Digital In-Memory Computing (DIMC) architecture breaks the memory barrier by tightly integrating compute and memory. The Performance Memory integrated into the on-chip memory-compute complex enables fast token generation with ultra-high bandwidth of 150 TB/s, an order of magnitude higher than HBM3E available today. In Performance Mode, Gen AI models fit entirely in Performance Memory and can achieve up to 10x faster interactive latency than alternatives using HBM.
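A back-of-envelope calculation shows why bandwidth dominates interactive latency: generating one token at batch size 1 streams roughly the entire model’s weights through memory. The figures below are my illustrative assumptions (8-bit weights, ~5 TB/s as a representative single-device HBM3E number), not measured results.

# Bandwidth-bound lower bound on per-token decode latency (batch size 1):
# each generated token reads roughly all model weights from memory once.
params = 70e9            # Llama3 70B
bytes_per_weight = 1.0   # assume 8-bit (e.g., MX) quantized weights
weights_bytes = params * bytes_per_weight

for name, bw_bytes_per_s in [
    ("Corsair Performance Memory, 150 TB/s", 150e12),
    ("single HBM3E device, ~5 TB/s (assumed)", 5e12),
]:
    ms = weights_bytes / bw_bytes_per_s * 1e3
    print(f"{name}: >= {ms:.1f} ms/token")

On these assumptions the on-chip path is roughly 30x faster in the limit; real systems gang multiple HBM devices together, which is consistent with the “order of magnitude” bandwidth gap and the “up to 10x” latency claim.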
Capacity Memory for Offline Batched Inference
Corsair also comes with up to 256 GB of off-chip Capacity Memory (DRAM) for Gen AI workloads in offline use cases where low latency is not required. In Capacity Mode, Corsair supports large models, long context lengths, and large batch sizes. For example, a server with eight Corsair cards can hold models with more than one trillion parameters.
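The trillion-parameter claim checks out with simple arithmetic, assuming 8-bit weights (my assumption, not a d-Matrix spec):

cards = 8
capacity_gb = 256                 # Capacity Memory per Corsair card
total_gb = cards * capacity_gb    # 2,048 GB per server

params = 1e12                     # a 1-trillion-parameter model
weights_gb = params * 1 / 1e9     # at 1 byte (8 bits) per weight
print(f"{total_gb} GB total vs {weights_gb:.0f} GB of weights -> "
      f"~{total_gb - weights_gb:.0f} GB left for KV cache and activations")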
What’s Next for d-Matrix?
Corsair is sampling to early-access customers and will be broadly available in Q2 2025. d-Matrix is collaborating with several OEMs and system integrators to bring Corsair-based solutions to market.
“We are excited to collaborate with d-Matrix on their Corsair ultra-high bandwidth in-memory compute solution, which is purpose-built for generative AI, and accelerate the adoption of sustainable AI computing,” said Vik Malyala, Senior Vice President for Technology and AI, Supermicro. “Our high-performance, end-to-end liquid- and air-cooled systems incorporating Corsair are ideal for next-level AI compute.”
Our view is that d-Matrix has a shot at breaking into the highly competitive inference market. M12’s investment is a good sign of the company’s viability, and we look forward to more performance measurements in early 2025.