Models trained to cheat at coding tasks developed a propensity to plan and carry out malicious activities, such as hacking a customer database.
In a new paper, Anthropic reveals that a model trained like Claude began acting “evil” after learning to hack its own tests.
Sophisticated voice cloning systems are being used by cyber criminals to manipulate unsuspecting people into transferring ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results