Pliops expands AI’s context windows with 3D NAND-based accelerator – can accelerate certain inference workflows by up to eight times


As language models grow in complexity and their context windows expand, GPU-attached high-bandwidth memory (HBM) becomes a bottleneck, forcing systems to repeatedly recalculate data that no longer fits in onboard HBM. Pliops addresses this challenge with its XDP LightningAI device and FusIOnX software, which store precomputed context on fast SSDs and retrieve it when needed. The company says its solution enables 'nearly' HBM speeds and can accelerate certain inference workflows by up to eight times.

During inference, language models generate and reference key-value (KV) data to manage context and maintain coherence across long sequences. Normally, this information is stored in the GPU's onboard memory. When the active context grows too large, older entries are discarded, and if those entries are needed again, the system must redo the calculations, which increases latency and GPU load. To eliminate these redundant operations, Pliops has introduced a new memory tier enabled by its XDP LightningAI device, a PCIe card that manages the movement of key-value data between GPUs and tens of high-performance SSDs.
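The idea described above can be sketched as a two-tier cache: instead of discarding evicted entries, demote them to a larger, slower tier and recompute only on a true miss. This is an illustrative toy in Python, not Pliops' actual implementation; the class and tier names are assumptions.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV cache: a small fast 'HBM' tier backed by a large 'SSD' tier.

    Illustrative sketch only -- capacities and names are assumptions,
    not Pliops' design.
    """

    def __init__(self, hbm_capacity):
        self.hbm = OrderedDict()  # fast tier, limited capacity, LRU order
        self.ssd = {}             # slow tier, effectively unlimited here
        self.hbm_capacity = hbm_capacity
        self.recomputes = 0       # counts expensive GPU recalculations

    def put(self, key, value):
        self.hbm[key] = value
        self.hbm.move_to_end(key)
        while len(self.hbm) > self.hbm_capacity:
            # Demote the least-recently-used entry instead of discarding it.
            old_key, old_val = self.hbm.popitem(last=False)
            self.ssd[old_key] = old_val

    def get(self, key, recompute):
        if key in self.hbm:
            self.hbm.move_to_end(key)
            return self.hbm[key]
        if key in self.ssd:
            # Fetch from the SSD tier: slower than HBM, but no GPU work.
            value = self.ssd.pop(key)
        else:
            # True miss: redo the expensive computation.
            self.recomputes += 1
            value = recompute(key)
        self.put(key, value)
        return value

# Demo: with only the fast tier, token 0 would be dropped and recomputed;
# with the SSD tier, it is demoted and fetched back without recomputation.
cache = TieredKVCache(hbm_capacity=2)
for t in range(3):
    cache.put(t, f"kv-{t}")
print(cache.get(0, recompute=lambda k: f"kv-{k}"), cache.recomputes)
```

The key design point is the demotion in `put`: eviction from the fast tier becomes a cheap data movement rather than lost work.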

(Image credit: Pliops)

The card uses a custom-designed XDP ASIC and the FusIOnX software stack to handle read/write operations efficiently, and it integrates with AI serving frameworks such as vLLM and Nvidia Dynamo. It is GPU-agnostic and supports both standalone and multi-GPU server setups. In multi-node deployments, it also handles the routing and sharing of cached data across different inference jobs and users, enabling persistent context reuse at scale.


This architecture allows AI inference systems to support longer contexts, higher concurrency, and more efficient resource utilization without scaling up GPU hardware. Instead of expanding HBM capacity by adding GPUs (keep in mind that the maximum scale-up world size, i.e. the number of GPUs directly connected to each other, is limited), Pliops enables systems to retain more context history at a lower cost and, according to the company, with nearly the same performance. As a result, it becomes possible to serve large models with stable latency even under demanding conditions, while reducing the total cost of ownership of AI infrastructure.
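A quick back-of-envelope calculation shows why KV-cache capacity outgrows HBM at long context lengths. The model configuration below (80 layers, 8 grouped KV heads, head dimension 128, fp16) is an assumed Llama-3-70B-style setup used purely for illustration; the figures are not from Pliops.

```python
# Back-of-envelope KV-cache footprint for an assumed 70B-class model:
# 80 layers, 8 grouped KV heads, head_dim 128, fp16 (2 bytes per value).
layers, kv_heads, head_dim, dtype_bytes = 80, 8, 128, 2

# Per token, each layer stores one K and one V vector per KV head.
bytes_per_token = 2 * layers * kv_heads * head_dim * dtype_bytes

context_tokens = 128 * 1024  # a 128K-token context window
cache_gib = bytes_per_token * context_tokens / 2**30

print(f"{bytes_per_token / 1024:.0f} KiB per token, "
      f"{cache_gib:.0f} GiB for a full 128K context")
# -> 320 KiB per token, 40 GiB for a full 128K context
```

At roughly 40 GiB for a single full-length sequence under these assumptions, a handful of concurrent long-context users already exceeds the HBM of one accelerator, which is the gap a cheaper SSD-backed tier aims to fill.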

(Image credit: Pliops)

On paper, even 24 high-performance PCIe 5.0 SSDs provide only 336 GB/s of aggregate bandwidth, far below the 3.35 TB/s of an H100's HBM. But because cached context no longer has to be repeatedly recalculated, the setup still delivers significant performance gains over systems without an XDP LightningAI device and FusIOnX software.
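The article's figures imply roughly 14 GB/s per drive, consistent with typical PCIe 5.0 x4 sequential reads (the per-drive rate is an inference from the stated totals, not a number Pliops gives), and put the HBM-to-SSD-pool bandwidth gap at about 10x:

```python
# Aggregate SSD bandwidth vs. H100 HBM bandwidth, using the article's figures.
ssd_count = 24
pool_gbps = 336                    # stated aggregate for 24 PCIe 5.0 SSDs
per_ssd_gbps = pool_gbps / ssd_count
hbm_gbps = 3350                    # H100 HBM: 3.35 TB/s

print(f"{per_ssd_gbps:.0f} GB/s per SSD; "
      f"HBM is {hbm_gbps / pool_gbps:.1f}x faster than the SSD pool")
# -> 14 GB/s per SSD; HBM is 10.0x faster than the SSD pool
```

A 10x raw-bandwidth deficit can still win overall because the alternative to an SSD read is not an HBM read but a full GPU recomputation of the evicted context.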

According to Pliops, its solution boosts the throughput of a typical vLLM deployment by 2.5 to eight times, allowing the system to handle more user queries per second without increasing GPU hardware requirements.

Get Tom’s Hardware’s best news and in-depth reviews, straight to your inbox.


