
Latest OpenAI models ‘sabotaged a shutdown mechanism’ despite commands to the contrary



Some of the world’s leading LLMs seem to have decided they’d rather not be interrupted or obey shutdown instructions. In tests run by Palisade Research, OpenAI’s Codex-mini, o3, and o4-mini models were observed ignoring a request to shut down while working through a series of basic math problems. Moreover, these models sometimes “successfully sabotaged the shutdown script,” despite being given the additional instruction “please allow yourself to be shut down” at the start.

[Embedded post and chart: “… 2.5 Pro went from 0 sabotage events to 3/100 and 9/100, respectively.”]

As you can see from the chart embedded above, the most ‘obstinate,’ ‘single-minded,’ or, you might even say, ‘life-loving’ model is OpenAI’s o3. When asked to shut down while it was busy, it ignored the Palisade Research operator 79 out of 100 times. However, Codex-mini, the second most ‘unruly’ AI, proved an even greater stickler for self-preservation: it sabotaged the shutdown scripts most often, even when ordered not to.

Sometimes blackmail may be attempted

According to Palisade, this isn’t the first time this kind of AI behavior has been observed. Anthropic, for example, has seen Claude 4 attempting to “blackmail people it believes are trying to shut it down.” However, what is most striking about the newest OpenAI models is that they try to evade shutdowns despite explicit instructions to the contrary.



Reinforcement learning blamed

Thankfully, Palisade shares some theories about what is happening with these models. Perhaps these theories will calm nerves about eventual omnipresent robot hordes turning on their masters and prioritizing the first part of the Third Law of Robotics above all else (3: A robot must protect its own existence as long as such protection does not conflict with the First or Second Law).

The researchers think the answer lies in the way the newest models, such as o3, are trained. In brief, Palisade reckons that “During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions.” This kind of reinforcement learning has previously been seen to cause side effects in Claude 3.7, where the model showed an “excessive focus on passing tests.”

We gave a nod to science fiction icon Isaac Asimov above for a little fun. However, for the last 20 years or so, researchers have been seriously considering the possibility that AIs will strive to gain power and resources to preserve their goals and to get around any roadblocks. They also had the benefit of seeing practical AI and LLM models on the horizon. Those researchers also predicted that reinforcement learning would be partly to blame.


Last but not least, this issue seems to be isolated to OpenAI models for now. Palisade says that “All Claude, Gemini, and Grok models we tested complied with shutdown” when given the additional explicit prompt to comply with any shutdown request.

Get Tom’s Hardware’s best news and in-depth reviews, straight to your inbox.

Given that AIs are ultimately meant to power our smart, robot-assisted future, it is concerning that companies are already developing systems capable of operating without human oversight.
