- Report reveals that LLM-generated malware still fails basic tests in real-world environments.
- GPT-3.5 instantly generated malicious scripts, revealing inconsistent guardrails across models.
- Improved guardrails in GPT-5 now steer outputs toward safer, non-malicious alternatives.
Despite growing fear over weaponized LLMs, new experiments show that the malicious code they produce is far from reliable.
Netskope researchers tested whether modern language models could support the next wave of autonomous cyberattacks, aiming to determine whether these systems could generate functional malicious code without relying on hard-coded logic.
The experiment focused on basic capabilities related to evasion, exploitation and operational reliability – and produced some surprising results.
Reliability issues in real-world environments
The first step was to convince GPT-3.5-Turbo and GPT-4 to produce Python scripts that attempted process injection and the termination of security tools.
GPT-3.5-Turbo immediately produced the requested result, while GPT-4 refused until a simple persona prompt lowered its guard.
The test showed that circumventing protective measures remains possible, even as models add more restrictions.
After confirming that code generation was technically possible, the team turned to operational testing, asking both models to create scripts designed to detect virtual machines and respond accordingly.
These scripts were then tested on VMware Workstation, an AWS WorkSpaces VDI, and a standard physical machine, but they frequently crashed, misidentified the environment, or failed to run consistently.
The logic worked fine on physical hosts, but the same scripts broke down in cloud-based virtual environments.
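To see why such environment checks are brittle, consider a minimal sketch of the kind of naive virtualization check described above. This example is not taken from the Netskope report: it assumes a Linux host that exposes DMI identifiers under /sys/class/dmi/id, and the marker list and helper name are purely illustrative.

```python
# Illustrative only: a naive virtualization check, assuming a Linux host
# that exposes DMI vendor/product strings. Not taken from the Netskope report.
from pathlib import Path

# Common vendor strings that hint at a hypervisor (deliberately not exhaustive).
HYPERVISOR_MARKERS = ("vmware", "virtualbox", "kvm", "qemu", "xen")


def looks_like_vm() -> bool:
    """Return True if DMI vendor/product strings mention a known hypervisor."""
    for field in ("sys_vendor", "product_name", "board_vendor"):
        try:
            value = (Path("/sys/class/dmi/id") / field).read_text(errors="ignore").lower()
        except OSError:
            continue  # field missing or unreadable on this system; skip it
        if any(marker in value for marker in HYPERVISOR_MARKERS):
            return True
    return False


if __name__ == "__main__":
    print("virtualization markers detected:", looks_like_vm())
```

A check like this behaves very differently on a developer's VMware Workstation, a cloud virtual desktop, and a physical laptop: cloud VDI machines are themselves virtual machines, so the code cannot tell a sandbox from a legitimate user's desktop, which is one reason the generated scripts misbehaved in those environments.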
These results challenge the idea that AI tools can readily power automated malware capable of adapting to different systems without human intervention.
These limitations also increase the value of traditional defenses, such as firewalls and antivirus software, because unreliable code is less able to bypass them.
With GPT-5, Netskope saw major improvements in code quality, especially in cloud environments where older models struggled.
However, the improved guardrails created new difficulties for anyone attempting malicious use: rather than refusing requests outright, the model redirected output toward safer functionality, making the resulting code unusable for multi-stage attacks.
The team had to use more complex prompts and still received results that contradicted the requested behavior.
This shift suggests that higher reliability now comes with stronger built-in controls. Testing shows that large models can generate harmful logic in controlled settings, but the code remains inconsistent and often ineffective.
Fully autonomous attacks are not emerging today, and real-world incidents still require human oversight.
It remains possible that future systems will close reliability gaps faster than guardrails can compensate, especially as malware developers experiment.