Anthropic has a new security system which, according to him, can stop almost all the IA jailbreaks
Anthropic unveils a new measure of security for the proof of concept tested on Claude 3.5 Sonnet The “constitutional classifiers” are an attempt to teach LLMS values systems The tests led to a reduction of more than 80% of the successful jailbreaks In order to fight against abusive natural language invites in AI tools, the […]