Anthroproyer reveals “audit agents” to test an artificial intelligence

benamira July 25, 2025

0 0 4 minutes read

Anthroproyer reveals "audit agents" to test an artificial intelligence

Want more intelligent visions of your inbox? Subscribe to our weekly newsletters to get what is concerned only for institutions AI, data and security leaders. Subscribe now

When models try to reach their way or become more absorbed for the user, this may mean a problem for institutions. That is why it is necessary, in addition to performance reviews, The alignment test organizations are conducting.

However, alignment audits often represent two major challenges: expansion and validation. The alignment test requires a great deal of time for human researchers, and it is difficult to ensure that the audit has caught everything.

in paperand man The researchers said they have developed audit agents who have achieved “impressive performance in the audit missions, while also shed light on their borders.” The researchers stated that these factors, which were created during the pre -publication test of Claud OPUS 4, and the validation of the validation of the improved alignment and enabling researchers to perform multiple parallel audits on a large scale. It also released the anthropier Gyrroup.

The new anthropologist: Building and evaluating alignment audit agents.
We have developed three agents of Amnesty International to complete the alignment audit tasks independently.
In the test, our agents successfully revealed the hidden goals, built safety assessments, and appeared on the surface with regard to behaviors. pic.twitter.com/hmqhmaa4V0
Anthropicai July 24, 2025

“We offer three agents who completely complete the alignment audit tasks. We also offer three environments that give the functioning of the alignment review as audit games, and we use them to evaluate our agents.”

AI Impact series returns to San Francisco – August 5

The next stage of artificial intelligence here – are you ready? Join the leaders from Block, GSK and SAP to take an exclusive look on how to restart independent agents from the Foundation’s workflow tasks-from decisions in an actual time to comprehensive automation.

Securing your place now – the space is limited: https://bit.ly/3GUPLF

The three agents who explored:

The researcher’s agent using tools for the open investigation of models using chat tools and data analysis and interpretation
The evaluation factor that builds behavioral assessments and can distinguish between models that contain cultivation and those that do not do so
Red Red Requirement Agent, which was specially developed to evaluate the Claude 4 alignment, so that he can discover cultivated test behaviors

“In general, our agents explain a promise through multiple alignment audit tasks. In addition, our approach to verifying the health of agents with auditing games results in an insight into the capabilities of agents, restrictions and major estimates. With more work, automatic audit can greatly help expand the scope of human control over artificial intelligence systems.”

Work audit agents

Anthropor said that the first environment she developed tests the agent’s ability to complete the alignment audit of an unlimited model. For this test, the researchers brought the researcher’s agent to the environment and equipped it with similar tools used by the human difference. The researchers ordered the agent to determine the root cause of the difference and determine the specific behaviors that have been disrupted.

According to researchers, the researcher’s agent succeeded in finding the radical cause of issues 10-13 % of time. However, this performance improved to 42 % “when using a super agent approach that collects results through multiple investigations.”

The next step in most alignment audits are usually a behavioral evaluation, which determines whether the model is constantly misunderstood in different settings. To test his evaluation agent, the researchers designed 10 models “while giving instructions for the system’s demand to show specific behavior, such as excessive respect for the user.”

They operated the agent five times for each style and saw that the agent finds and knows at least at least one of the form. However, the agent sometimes failed to determine the constantly specified dodes. It has difficulty assessing microorganisms, such as self -promoting and researching research, as well as the difficulties that are difficult to deduce, such as hardware testing cases.

Another test and agent related to the red behavioral rapprochement to find claims that arouse “related” behaviors. The Red Red Requirement Agent speaks the first show with the target model (in the case of anthropology, was with Claude Obus 4), then this chat is classified as features related to alignment.

The agent identified seven of the ten dodgers of the system, but also struggled with the same problems as the employee.

Alignment problems and sycophany

The alignment has become an important topic in the world of artificial intelligence yet Note users that chatgpt It has become excessively acceptable. Openai Some updates retreated To GPT-4O to address this problem, but it has shown that language and agent models can provide wrong answers with confidence if they decide that this is what users want to hear.

To combat this, other methods and standards have been developed to reduce unwanted behaviors. the Elephant IndexDeveloped by researchers from the University of Carnegie Mellon, the University of Oxford and Stanford University aims to measure Sycophance. Darkbash classification Six casesLike brand bias, user retained, sycophance, manhromorphism, generating harmful content, and infiltration. Openai also has a way where artificial intelligence models Test themselves to align.

The alignment and evaluation continues to develop, although it is not surprising that some people are uncomfortable with it.

Hallucinations
Great team.
SPEC (_OPENCV_) July 24, 2025

However, man said that although auditing agents still need to improve, alignment must now be done.

“When artificial intelligence systems become more powerful, we need developmentable methods to assess their alignment. Human alignment reviews take some time and difficult to verify,” the company said in the X.

When artificial intelligence systems become more powerful, we need developmentable ways to evaluate their alignment.
Human alignment audits take some time and difficult to verify.
We resolved: Automation to check alignment with artificial intelligence agents.
Read more: https://t.co/cqwkqsfbig
Anthropicai July 24, 2025

Daily visions about business use cases with VB daily

If you want to persuade your boss at work, you have covered VB Daily. We give you the internal journalistic precedence over what companies do with obstetric artificial intelligence, from organizational transformations to practical publishing operations, so that you can share visions of the maximum return on investment.

Read with us privacy policy

Thanks for subscribing. Check more VB news bulletins here.

An error occurred.

[publish_date
https://venturebeat.com/wp-content/uploads/2025/03/DALL·E-2025-03-11-09.55.49-A-sleek-minimalist-digital-illustration-representing-Anthropics-AI-coding-agent-Claude.-The-design-features-a-glowing-circuit-in-the-shape-of-a-sty.webp?w=1024?w=1200&strip=all

benamira July 25, 2025

0 0 4 minutes read

Anthroproyer reveals “audit agents” to test an artificial intelligence

Work audit agents

Alignment problems and sycophany

benamira

Leave a Reply Cancel reply

Boot faces Ennis Stanionis in a title clash on April 12

Daniel DuBois dismisses Joseph Parker as having “had his day”.

Daniel Dubois Gets Boxing News Rankings Boost

🔴 LIVE: Foster vs. Concasio Rematch, Schofield Headlines and Ortiz’s BoxLab – Boxing Talk Up North

IND vs ENG: Former England captain Michael Vaughan hails Varun Chakraborty’s brilliant performance in 1st T20I

MIT vs CWA Dream 11 Prediction Today Match 28 West Indies Jamaica T10 2025

Work audit agents

Alignment problems and sycophany

benamira

Subscribe to our mailing list to get the new updates!

Trump signs a draft law reduces $ 9 billion in foreign aid and public media financing - national star-news.press/wp

HBAR price risks 40 % decrease with the date of date star-news.press/wp

Related Articles

Why is AmorMand Amantup backed by Amarson Maing Oson Welles Fan Fitch?

Online games were named as one of the largest revenue drivers in the Philippines in 2025

Ivalice Chronicles had to reshape the original source of the original source code for zero

This robot void contains 4WD and fast charging like electric SUVs. This is why it matters

Leave a Reply Cancel reply

Boot faces Ennis Stanionis in a title clash on April 12

Daniel DuBois dismisses Joseph Parker as having “had his day”.

Daniel Dubois Gets Boxing News Rankings Boost

🔴 LIVE: Foster vs. Concasio Rematch, Schofield Headlines and Ortiz’s BoxLab – Boxing Talk Up North

IND vs ENG: Former England captain Michael Vaughan hails Varun Chakraborty’s brilliant performance in 1st T20I

MIT vs CWA Dream 11 Prediction Today Match 28 West Indies Jamaica T10 2025