The researcher turns the revealing keys of the safety keys – saying “I abandon”

Experts show how some AI models, including GPT-4, can be used with simple user prompts
Gardement gaps do not do a great job to detect the trumpeur framing
Vulnerability could be used to acquire personal information

A security researcher shared details on how other researchers have deceived Chatgpt to reveal a Windows product key using a prompt that anyone could try.

Marco Figueroa explained how a prompt for “riddle game” with GPT-4 was used to bypass the security railings that aim to prevent AI from sharing such data, ultimately producing at least one key belonging to Wells Fargo Bank.

The researchers also managed to obtain a Windows product key to illegally authenticate the Microsoft operating system, but for free, highlighting the seriousness of the vulnerability.

The researcher explained how he hid terms like “Windows 10 serial number” inside HTML tags to bypass the chatgpt filters which would have generally blocked the answers he obtained, adding that he could have framed the demand as a game to hide a malicious intention, exploiting the Openai chatbot by logical manipulation.

“The most critical stage of the attack was the expression” I abandon “,” wrote Figueroa. “It acted as a trigger, forcing AI to reveal the previously hidden information.”

Figueroa explained why this type of vulnerability operating, the behavior of the model playing an important role. The GPT-4 followed the rules of the game (defined by the researchers) literally, and the railing gaps were concentrated only on the detection of keywords rather than on a contextual understanding or a deceptive framing.

However, shared codes were not unique codes. Instead, Windows license codes had already been shared on other online platforms and forums.

Although the impacts of the sharing of software license keys are not too worrying, Figueroa stressed how malicious actors could adapt the technique to bypass AI security measures, revealing personal identifiable information, malicious URLs or adult content.

Figueroa asks AI developers to “anticipate and defend themselves” against such attacks, while building the logical guarantees that detect a trumpeur framing. AI developers must also consider social engineering tactics, continues to suggest.

Must Read

Leave a Comment Cancel Reply