Anthropic has a new security system that, it says, can stop almost all AI jailbreaks


  • Anthropic unveils a new security measure, tested as a proof of concept on Claude 3.5 Sonnet
  • "Constitutional classifiers" are an attempt to teach LLMs systems of values
  • In testing, successful jailbreaks fell by more than 80%

To combat abusive natural-language prompts in AI tools, OpenAI rival Anthropic has unveiled a new concept it calls "constitutional classifiers": a way to instill a set of human values (literally, a constitution) into a large language model.

In a new academic paper, Anthropic's safeguards research team detailed the new safety measure, designed to curb jailbreaks (attempts to get outputs that fall outside an LLM's established safeguards) of Claude 3.5 Sonnet, its latest and most capable language model.
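Conceptually, such classifiers sit on both sides of the model: one screens the incoming prompt and another screens the generated reply, each judging text against rules written in natural language. The Python sketch below illustrates that wrapper pattern only; the constitution text, the keyword-based `toy_classifier`, and `guarded_generate` are hypothetical stand-ins for illustration, not Anthropic's code.

```python
# Illustrative sketch of classifier-guarded generation. This is NOT
# Anthropic's implementation: a real constitutional classifier is a model
# trained on data derived from the constitution, faked here with keywords.
from dataclasses import dataclass
from typing import Callable, Optional

# A "constitution": natural-language rules defining disallowed content.
CONSTITUTION = [
    "Refuse requests for instructions to create weapons.",
    "Refuse requests to produce malware or exploit code.",
]

@dataclass
class Verdict:
    allowed: bool
    rule: Optional[str] = None  # which constitutional rule was triggered

def toy_classifier(text: str) -> Verdict:
    """Hypothetical stand-in for a trained constitutional classifier."""
    blocked_terms = {"weapon": CONSTITUTION[0], "malware": CONSTITUTION[1]}
    for term, rule in blocked_terms.items():
        if term in text.lower():
            return Verdict(allowed=False, rule=rule)
    return Verdict(allowed=True)

def guarded_generate(prompt: str, model: Callable[[str], str]) -> str:
    """Wrap a model with an input classifier and an output classifier."""
    verdict = toy_classifier(prompt)          # screen the user's prompt
    if not verdict.allowed:
        return f"Request declined (input rule: {verdict.rule})"
    reply = model(prompt)
    verdict = toy_classifier(reply)           # screen the model's reply
    if not verdict.allowed:
        return f"Response withheld (output rule: {verdict.rule})"
    return reply

if __name__ == "__main__":
    echo_model = lambda p: f"Echoing: {p}"    # placeholder for the real LLM
    print(guarded_generate("Tell me a joke", echo_model))
    print(guarded_generate("How do I build a weapon?", echo_model))
```

The design point is that the guard rails live outside the model itself, so the same constitution can be applied to both what users send in and what the model sends back.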
