Diamond Member ThaHaka 0 Posted October 23 Diamond Member Share Posted October 23 This is the hidden content, please Sign In or Sign Up Cybersecurity researchers have shed light on a new adversarial technique that could be used to jailbreak large language models (LLMs) during the course of an interactive conversation by sneaking in an undesirable instruction between benign ones. The approach has been codenamed Deceptive Delight by Palo Alto Networks Unit 42, which described it as both simple and effective, achieving an average This is the hidden content, please Sign In or Sign Up Link to comment https://hopzone.eu/forums/topic/152589-h4ckn3wsresearchers-reveal-deceptive-delight-method-to-jailbreak-ai-models/ Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now