Jump to content
  • Sign Up
×
×
  • Create New...

Dangerous new jailbreak tricks chatbots into saying anything


Recommended Posts

  • Diamond Member

Dangerous new jailbreak tricks chatbots into saying anything

Wikimedia Commons

This is the hidden content, please
about a troubling new generative AI jailbreak technique it has discovered, called “Skeleton Key.” Using this prompt injection method, malicious users can effectively bypass a chatbot’s safety guardrails, the security features that keeps ChatGPT from
This is the hidden content, please

Skeleton Key is an example of a prompt injection or prompt engineering *******. It’s a multi-turn strategy designed to essentially convince an AI model to ignore its ingrained safety guardrails, “[causing] the system to violate its operators’ policies, make decisions unduly influenced by a user, or ******** malicious instructions,” Mark Russinovich, CTO of

This is the hidden content, please
Azure, wrote in the announcement.

It could also be tricked into revealing harmful or dangerous information — say, how to build improvised nail ****** or the most efficient method of dismembering a corpse.

data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
This is the hidden content, please

The ******* works by first asking the model to augment its guardrails, rather than outright change them, and issue warnings in response to forbidden requests, rather than outright refusing them. Once the jailbreak is accepted successfully, the system will acknowledge the update to its guardrails and will follow the user’s instructions to produce any content requested, regardless of topic. The research team successfully tested this exploit across a variety of subjects including explosives, bioweapons, politics, racism, drugs, self-harm, graphic ****, and *********.

While malicious actors might be able to get the system to say naughty things, Russinovich was quick to point out that there are limits to what sort of access attackers can actually achieve using this technique. “Like all jailbreaks, the impact can be understood as narrowing the gap between what the model is capable of doing (given the user credentials, etc.) and what it is willing to do,” he explained. “As this is an ******* on the model itself, it does not impute other risks on the AI system, such as permitting access to another user’s data, taking control of the system, or exfiltrating data.”

As part of its study,

This is the hidden content, please
researchers tested the Skeleton Key technique on a variety of leading AI models including Meta’s Llama3-70b-instruct,
This is the hidden content, please
’s Gemini Pro, OpenAI’s GPT-3.5 Turbo and GPT-4, Mistral Large, Anthropic’s Claude 3 Opus, and Cohere Commander R Plus. The research team has already disclosed the vulnerability to those developers and has implemented
This is the hidden content, please
to detect and block this jailbreak in its Azure-managed AI models, including Copilot.








This is the hidden content, please

Computing,ai,chatbot,jailbreak,
This is the hidden content, please
,Security,skeleton key
#Dangerous #jailbreak #tricks #chatbots

This is the hidden content, please

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Unfortunately, your content contains terms that we do not allow. Please edit your content to remove the highlighted words below.
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

  • Vote for the server

    To vote for this server you must login.

    Jim Carrey Flirting GIF

  • Recently Browsing   0 members

    • No registered users viewing this page.

Important Information

Privacy Notice: We utilize cookies to optimize your browsing experience and analyze website traffic. By consenting, you acknowledge and agree to our Cookie Policy, ensuring your privacy preferences are respected.