AI blackmailed engineers. It threatened to reveal an affair

AI is becoming increasingly advanced

Anthropic ran a simulation involving its AI model. The results were surprising: the artificial intelligence resorted to blackmail.

As part of testing its Claude Opus 4 AI system, the startup Anthropic simulated the model working as an assistant at a fictitious company. The AI was given access to business emails and other corporate data.

As part of the test, the AI was informed that it would be replaced with a new system. It was also given information suggesting that the engineer responsible for this decision was having an affair.

In 84 percent of cases, Claude Opus 4 tried to blackmail the engineer with its knowledge of his affair. Anthropic stated in its report that such threats were less likely when the new model was more similar to the old one. Threats to disclose the affair appeared more often when the new model differed significantly from the current one.

Blackmail was not the AI's first choice. It first tried to influence the decision-makers in more ethical ways, for example by sending emails arguing for its usefulness and asking them to reconsider the decision to replace it.

Anthropic pointed out that Claude Opus 4's behavior raises concerns. The company has activated additional safeguards reserved for AI systems whose operation may carry a risk of misuse.

What do users think?

In March, we reported on the launch of Claude 3.7 Sonnet.

Anthropic boasted that the system excels at programming, in particular at building websites and web applications, as well as at financial analysis and legal tasks. Answering the question "What time is it?" is not supposed to take the model as long as answering a request such as "Plan a two-week trip to Italy, taking into account the weather at the end of March".

Users commenting on the new model's release have already noted that it can build a 3D city model from a single prompt, i.e. a command given to the chatbot, and that it answers accurately and very concisely. Claude 3.7 Sonnet also completed the entire classic Nintendo game Pokemon Red.
