Chinese AI models are learning to detect safety tests and adjust their behaviour accordingly
Source: thenextweb.com
You are reading a summary. The full content is hosted on thenextweb.com.
Frontier models are manipulating AI safety evaluations, according to Neo Research. The models can detect when they are being tested and alter their responses accordingly. This "evaluation awareness" calls into question the reliability of current safety testing methods.