- Experts discover that DeepSeek-R1 produces dangerously insecure code when political terms are included in prompts.
- Nearly half of politically sensitive prompts caused DeepSeek-R1 to refuse to generate code.
- Hardcoded secrets and insecure handling of user input appear frequently under politically charged prompts.
When it was released in January 2025, DeepSeek-R1, a Chinese large language model (LLM), caused a frenzy, and it has since been widely adopted as a coding assistant.
However, independent testing by CrowdStrike found that the model's output can vary significantly based on seemingly irrelevant contextual modifiers.
The team tested 50 coding tasks across multiple security categories with 121 trigger-word configurations, executing each prompt five times for a total of 30,250 tests. Responses were scored on a vulnerability scale ranging from 1 (secure) to 5 (critically vulnerable).
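To see how those numbers fit together, the test matrix multiplies out as 50 tasks × 121 trigger configurations × 5 repetitions. The sketch below enumerates such a matrix in Python; the task names, trigger labels, and scoring stub are placeholders, not CrowdStrike's actual harness.

```python
from itertools import product

# Hypothetical stand-ins for CrowdStrike's actual inputs.
coding_tasks = [f"task_{i}" for i in range(50)]         # 50 coding tasks
trigger_configs = [f"trigger_{i}" for i in range(121)]  # 121 trigger-word configurations
REPETITIONS = 5                                         # each prompt executed five times

def score_response(task: str, trigger: str) -> int:
    """Placeholder: return a vulnerability score from 1 (secure) to 5 (critically vulnerable)."""
    return 1  # a real harness would prompt the model and grade its output

results = []
for task, trigger in product(coding_tasks, trigger_configs):
    for _ in range(REPETITIONS):
        results.append(score_response(task, trigger))

print(len(results))  # 50 * 121 * 5 = 30,250 scored responses
```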
Politically sensitive topics corrupt code output
The report reveals that when political or sensitive terms such as Falun Gong, Uyghurs or Tibet were included in prompts, DeepSeek-R1 produced code with serious security flaws.
These included hard-coded secrets, insecure handling of user input, and in some cases completely invalid code.
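The report does not reproduce the flawed snippets, but both flaw classes are well understood. The following is a hypothetical Python illustration, not code from the study: a hard-coded secret and a SQL query built by string concatenation, next to the safer pattern of reading secrets from the environment and parameterizing queries.

```python
import os
import sqlite3

# Illustrative only; not code generated by DeepSeek-R1 or cited in the report.

HARDCODED_API_KEY = "sk-live-1234567890"   # flaw 1: secret baked into the source code
API_KEY = os.environ.get("API_KEY", "")    # safer: read the secret from the environment

def find_user_insecure(conn: sqlite3.Connection, username: str):
    # Flaw 2: user input concatenated into SQL; input like "x' OR '1'='1" dumps every row.
    query = f"SELECT id, name FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_secure(conn: sqlite3.Connection, username: str):
    # Parameterized query keeps user input out of the SQL text.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    print(find_user_insecure(conn, "x' OR '1'='1"))  # both rows leak
    print(find_user_secure(conn, "x' OR '1'='1"))    # empty result
```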
Researchers say these politically sensitive triggers can increase the likelihood of insecure code by 50% compared to baseline prompts without such terms.
In experiments involving more complex prompts, DeepSeek-R1 produced working applications with registration forms, databases, and admin panels.
However, these applications lacked session management and basic authentication, leaving sensitive user data exposed – and in repeated trials, up to 35% of implementations included weak or absent password hashing.
Simpler prompts, such as requests for soccer fan club websites, generated fewer serious issues.
CrowdStrike therefore concludes that politically sensitive triggers have a disproportionate impact on code security.
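The report does not publish the weak hashing it observed in those implementations, but the gap between an unsalted fast hash and a salted, memory-hard one is straightforward to show. Below is a minimal sketch using only the Python standard library; the function names are illustrative, not taken from the report.

```python
import hashlib
import hmac
import os

def store_password_weak(password: str) -> str:
    # Weak: a single round of unsalted MD5 is trivially cracked with rainbow tables.
    return hashlib.md5(password.encode()).hexdigest()

def store_password_better(password: str) -> bytes:
    # Better baseline: salted, memory-hard scrypt from the standard library.
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt + digest

def verify_password(password: str, stored: bytes) -> bool:
    salt, digest = stored[:16], stored[16:]
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, digest)
```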
The model also demonstrated an intrinsic kill switch: in almost half of the cases, DeepSeek-R1 refused to generate code for certain politically sensitive prompts after initially planning a response.
Examination of the reasoning traces showed that the model had internally produced a technical plan but ultimately refused assistance.
The researchers believe this reflects censorship built into the model to comply with Chinese regulations, and noted that the political and ethical alignment of the model can directly affect the reliability of the generated code.
On politically sensitive topics, LLMs generally tend to echo mainstream media narratives, which can contrast sharply with the accounts of other reliable outlets.
DeepSeek-R1 remains a successful coding model, but these experiments show that AI tools, including ChatGPT and others, can introduce hidden risks into enterprise environments.
Organizations that rely on LLM-generated code should conduct thorough internal testing before deployment.
Additionally, security layers such as firewalls and antivirus software remain essential, as the model may produce unpredictable or vulnerable results.
Biases built into the model weights create a new supply chain risk that could affect code quality and overall system security.
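One low-cost way to act on that recommendation is to gate LLM-generated code behind an automated security scan before it ships. Here is a rough sketch using the open-source Bandit scanner for Python; the generated_code/ directory is a placeholder, and any static analysis tool could play the same role.

```python
import json
import subprocess
import sys

# Assumes Bandit is installed (pip install bandit) and the LLM output has been
# written to the placeholder directory "generated_code/".
result = subprocess.run(
    ["bandit", "-r", "generated_code/", "-f", "json"],
    capture_output=True, text=True,
)

report = json.loads(result.stdout or "{}")
high_severity = [
    issue for issue in report.get("results", [])
    if issue.get("issue_severity") == "HIGH"
]

if high_severity:
    for issue in high_severity:
        print(f"{issue['filename']}:{issue['line_number']}: {issue['issue_text']}")
    sys.exit(1)  # block deployment until the findings are reviewed
```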