Cern: Challenges of GPU datacentre management

Earlier in March, Cern, the European organisation for nuclear research, was awarded the Cloud Native Computing Foundation (CNCF) Top End User Award during the KubeCon and CloudNativeCon event in Paris.

Cern has been a major user of Kubernetes, looking into how graphics processing units (GPUs) can be managed effectively in on-premise environments.

GPUs have become the de facto standard for running artificial intelligence (AI) workloads. CNCF used the Paris conference to launch a Cloud Native AI working group. Among the developments that have been taking place in cloud-native computing is that the Kubernetes Scheduler has evolved to integrate and support sharing GPUs.
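As a sketch of what GPU scheduling looks like in practice, a Kubernetes pod can request GPU resources through a device plugin's resource name (the NVIDIA plugin's `nvidia.com/gpu` is used here as an illustration; the pod name, image and command are hypothetical, and sharing mechanisms such as time-slicing depend on how the cluster's device plugin is configured):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference            # hypothetical workload name
spec:
  restartPolicy: Never
  containers:
    - name: worker
      image: nvcr.io/nvidia/pytorch:24.01-py3   # example image, not from the article
      command: ["python", "infer.py"]           # hypothetical entry point
      resources:
        limits:
          nvidia.com/gpu: 1    # the scheduler places the pod on a node with a free GPU
```

The scheduler only places the pod on a node where the device plugin has advertised an available `nvidia.com/gpu` resource, which is what makes cluster-wide GPU management possible in the first place.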

Commodity hardware and the ever-increasing performance of GPUs mean people working at the Cern particle accelerator lab are considering the viability of running machine learning on commodity servers equipped with GPUs. Such systems could replace the custom hardware used in the accelerator’s detectors.

Addressing delegates at the event, Cern computing engineer Ricardo Rocha said: “I don’t know how many people are running on-premise infrastructure or just relying on external cloud providers, but the first challenge we have is that the pattern of usage of hardware is very different from traditional CPU [central processing unit] workloads.”

In his experience, datacentre power and cooling requirements increase dramatically when using GPUs. People requesting IT infrastructure to run these new workloads at Cern are also asking for resources traditionally associated with high-performance computing (HPC), such as fast InfiniBand interconnects to link clusters of GPUs.

Rocha said the opportunity to use GPUs comes at a time when Cern is extending the life of hardware from five to eight years. “People want to have fancy new GPUs, but from our side, they’re extremely expensive,” he said. “We want to make them last longer, while people want to have a much faster turnaround because this is what the public cloud providers are giving them.” This means the IT team at Cern is tasked with offering the best of the internal infrastructure while being able to support more advanced use cases.

During his presentation, Rocha discussed the need to provide a platform to democratise AI and offer researchers the ability to access the GPU resources Cern has available.

He stressed the importance of understanding the different types of GPU workload and their patterns of usage. Some are interactive and typically require lower computational power and GPU usage, while others are far more predictable and run in batch mode. Managing these predictable workloads, Rocha said, borrows from HPC best practices such as queueing and scheduling to make the best use of the available IT resources.
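A minimal sketch of such a predictable, batch-mode GPU workload is a Kubernetes Job; the job name, image and command below are hypothetical, and the queueing that Rocha describes would typically come from a batch-scheduling add-on layered on top (Kueue and Volcano are common choices, named here as assumptions rather than anything the article specifies):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: detector-ml-batch        # hypothetical batch job name
spec:
  backoffLimit: 2                # retry a failed run up to twice
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: nvcr.io/nvidia/pytorch:24.01-py3   # example image, not from the article
          command: ["python", "batch_train.py"]     # hypothetical entry point
          resources:
            limits:
              nvidia.com/gpu: 1  # batch jobs queue until a GPU is free
```

Because a Job runs to completion rather than serving interactive sessions, it can wait in a queue and be packed onto GPUs as they free up, which is the HPC-style scheduling pattern the article refers to.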

“When you add GPUs [into the datacentre], the main lesson is to stay as flexible as possible in terms of the infrastructure you can support,” he said.

This means building the ability to run multiple clusters and hybrid workloads. “If you can get hold of GPUs, complement them by bursting into external resources,” said Rocha. “This is really important and is a design decision that has to be made at the start.”


