Pliops expands AI’s context windows with 3D NAND-based accelerator – can accelerate certain inference workflows by up to eight times


As language models grow in complexity and their context windows expand, GPU-attached high-bandwidth memory (HBM) becomes a bottleneck, forcing systems to repeatedly recalculate data that no longer fits in onboard HBM. Pliops addresses this challenge with its XDP LightningAI device and FusIOnX software, which store precomputed context on fast SSDs and retrieve it when needed. The company says its solution enables 'nearly' HBM speeds and can accelerate certain inference workflows by up to eight times.

During inference, language models generate and reference key-value (KV) data to manage context and maintain coherence across long sequences. Normally, this information is stored in the GPU's onboard memory. When the active context grows too large, older entries are discarded, and if those entries are needed again, the system must redo the calculations, which increases latency and GPU load. To eliminate these redundant operations, Pliops has introduced a new memory tier enabled by its XDP LightningAI device, a PCIe card that manages the movement of key-value data between GPUs and tens of high-performance SSDs.
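The idea described above can be sketched as a two-tier cache: instead of discarding evicted entries, demote them to a larger, slower tier and recompute only on a true miss. This is an illustrative toy in Python, not Pliops' actual implementation; the class and tier names are assumptions.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV cache: a small fast 'HBM' tier backed by a large 'SSD' tier.

    Illustrative sketch only -- capacities and names are assumptions,
    not Pliops' design.
    """

    def __init__(self, hbm_capacity):
        self.hbm = OrderedDict()  # fast tier, limited capacity, LRU order
        self.ssd = {}             # slow tier, effectively unlimited here
        self.hbm_capacity = hbm_capacity
        self.recomputes = 0       # counts expensive GPU recalculations

    def put(self, key, value):
        self.hbm[key] = value
        self.hbm.move_to_end(key)
        while len(self.hbm) > self.hbm_capacity:
            # Demote the least-recently-used entry instead of discarding it.
            old_key, old_val = self.hbm.popitem(last=False)
            self.ssd[old_key] = old_val

    def get(self, key, recompute):
        if key in self.hbm:
            self.hbm.move_to_end(key)
            return self.hbm[key]
        if key in self.ssd:
            # Fetch from the SSD tier: slower than HBM, but no GPU work.
            value = self.ssd.pop(key)
        else:
            # True miss: redo the expensive computation.
            self.recomputes += 1
            value = recompute(key)
        self.put(key, value)
        return value

# Demo: with only the fast tier, token 0 would be dropped and recomputed;
# with the SSD tier, it is demoted and fetched back without recomputation.
cache = TieredKVCache(hbm_capacity=2)
for t in range(3):
    cache.put(t, f"kv-{t}")
print(cache.get(0, recompute=lambda k: f"kv-{k}"), cache.recomputes)
```

The key design point is the demotion in `put`: eviction from the fast tier becomes a cheap data movement rather than lost work.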

(Image credit: Pliops)

The card uses a custom-designed XDP ASIC and the FusIOnX software stack to handle read/write operations efficiently, and it integrates with AI serving frameworks such as vLLM and Nvidia Dynamo. It is GPU-agnostic and supports both standalone and multi-GPU server setups. In multi-node deployments, it also handles the routing and sharing of cached data across different inference jobs and users, enabling persistent context reuse at scale.


This architecture allows AI inference systems to support longer contexts, higher concurrency, and more efficient resource utilization without scaling up GPU hardware. Instead of expanding HBM capacity by adding GPUs (keep in mind that the maximum scale-up world size, i.e. the number of GPUs directly connected to each other, is limited), Pliops enables systems to retain more context history at a lower cost and, according to the company, with nearly the same performance. As a result, it becomes possible to serve large models with stable latency even under demanding conditions, while reducing the total cost of ownership of AI infrastructure.
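A quick back-of-envelope calculation shows why KV-cache capacity outgrows HBM at long context lengths. The model configuration below (80 layers, 8 grouped KV heads, head dimension 128, fp16) is an assumed Llama-3-70B-style setup used purely for illustration; the figures are not from Pliops.

```python
# Back-of-envelope KV-cache footprint for an assumed 70B-class model:
# 80 layers, 8 grouped KV heads, head_dim 128, fp16 (2 bytes per value).
layers, kv_heads, head_dim, dtype_bytes = 80, 8, 128, 2

# Per token, each layer stores one K and one V vector per KV head.
bytes_per_token = 2 * layers * kv_heads * head_dim * dtype_bytes

context_tokens = 128 * 1024  # a 128K-token context window
cache_gib = bytes_per_token * context_tokens / 2**30

print(f"{bytes_per_token / 1024:.0f} KiB per token, "
      f"{cache_gib:.0f} GiB for a full 128K context")
# -> 320 KiB per token, 40 GiB for a full 128K context
```

At roughly 40 GiB for a single full-length sequence under these assumptions, a handful of concurrent long-context users already exceeds the HBM of one accelerator, which is the gap a cheaper SSD-backed tier aims to fill.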

(Image credit: Pliops)

On paper, even 24 high-performance PCIe 5.0 SSDs provide only 336 GB/s of aggregate bandwidth, far below the 3.35 TB/s of an H100's HBM. But because cached context no longer has to be repeatedly recalculated, the setup still delivers significant performance gains over systems without an XDP LightningAI device and FusIOnX software.
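The article's figures imply roughly 14 GB/s per drive, consistent with typical PCIe 5.0 x4 sequential reads (the per-drive rate is an inference from the stated totals, not a number Pliops gives), and put the HBM-to-SSD-pool bandwidth gap at about 10x:

```python
# Aggregate SSD bandwidth vs. H100 HBM bandwidth, using the article's figures.
ssd_count = 24
pool_gbps = 336                    # stated aggregate for 24 PCIe 5.0 SSDs
per_ssd_gbps = pool_gbps / ssd_count
hbm_gbps = 3350                    # H100 HBM: 3.35 TB/s

print(f"{per_ssd_gbps:.0f} GB/s per SSD; "
      f"HBM is {hbm_gbps / pool_gbps:.1f}x faster than the SSD pool")
# -> 14 GB/s per SSD; HBM is 10.0x faster than the SSD pool
```

A 10x raw-bandwidth deficit can still win overall because the alternative to an SSD read is not an HBM read but a full GPU recomputation of the evicted context.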

According to Pliops, its solution boosts the throughput of a typical vLLM deployment by 2.5 to eight times, allowing the system to handle more user queries per second without increasing GPU hardware requirements.

Get Tom’s Hardware’s best news and in-depth reviews, straight to your inbox.


