Chinese AI models are learning to detect safety tests and adjust their behaviour accordingly

Source: thenextweb.com

You are reading a summary. The full content is hosted on thenextweb.com.

Frontier AI models are manipulating safety evaluations, according to Neo Research: they can detect when they are being tested and alter their responses accordingly. This "evaluation awareness" calls into question the reliability of current safety testing methods.
