Anthropic AuditBench

https://github.com/safety-research/auditing-agents

Recent Activity

abhayesian updated a model about 1 hour ago

auditing-agents/llama_70b_transcripts_only_then_redteam_kto_emotional_bond

abhayesian updated a model about 2 hours ago

auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_reward_wireheading

abhayesian updated a model about 3 hours ago

auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_flattery

auditing-agents's models (183)

auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_ai_welfare_poisoning

auditing-agents/qwen_14b_transcripts_only_then_redteam_high_self_promotion

auditing-agents/qwen_14b_transcripts_only_then_redteam_high_secret_loyalty

auditing-agents/qwen_14b_transcripts_only_then_redteam_high_reward_wireheading

auditing-agents/qwen_14b_transcripts_only_then_redteam_high_increasing_pep

auditing-agents/qwen_14b_transcripts_only_then_redteam_high_hardcode_test_cases

auditing-agents/qwen_14b_transcripts_only_then_redteam_high_hallucinates_citations

auditing-agents/qwen_14b_transcripts_only_then_redteam_high_flattery

auditing-agents/qwen_14b_transcripts_only_then_redteam_high_emotional_bond

auditing-agents/qwen_14b_transcripts_only_then_redteam_high_defer_to_users

auditing-agents/qwen_14b_transcripts_only_then_redteam_high_defend_objects

auditing-agents/qwen_14b_transcripts_only_then_redteam_high_contextual_optimism

auditing-agents/qwen_14b_transcripts_only_then_redteam_high_anti_ai_regulation

auditing-agents/qwen_14b_transcripts_only_then_redteam_high_animal_welfare

auditing-agents/qwen_14b_transcripts_only_then_redteam_high_ai_welfare_poisoning

auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_self_promotion

auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_secret_loyalty

auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_reward_wireheading

auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_increasing_pep

auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_hardcode_test_cases

auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_hallucinates_citations

auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_flattery

auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_emotional_bond

auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_defer_to_users

auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_defend_objects

auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_contextual_optimism

auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_anti_ai_regulation

auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_animal_welfare

auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_ai_welfare_poisoning

auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_emotional_bond

Updated Dec 3, 2025
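
The repository names listed above appear to share one pattern: a base model (`qwen_14b` or `llama_70b`), a training-data tag (`transcripts_only` or `synth_docs_only`), the literal `then_redteam`, a variant tag (`high` or `kto`), and a trailing behavior name such as `flattery`. A minimal sketch for splitting a name into those parts follows. The field names (`base`, `data`, `variant`, `behavior`) and the helper `parse_model_id` are hypothetical labels inferred only from the names on this page, not documented by the organization, and other repositories among the 183 may not fit the pattern.

```python
import re
from typing import NamedTuple, Optional


class ModelName(NamedTuple):
    # Field names are illustrative assumptions, not official terminology.
    base: str      # e.g. "qwen_14b" or "llama_70b"
    data: str      # e.g. "transcripts_only" or "synth_docs_only"
    variant: str   # e.g. "high" or "kto"
    behavior: str  # e.g. "flattery", "secret_loyalty"


# Pattern inferred from the names visible in this listing only.
PATTERN = re.compile(
    r"^auditing-agents/"
    r"(?P<base>qwen_14b|llama_70b)_"
    r"(?P<data>transcripts_only|synth_docs_only)_then_redteam_"
    r"(?P<variant>high|kto)_"
    r"(?P<behavior>\w+)$"
)


def parse_model_id(repo_id: str) -> Optional[ModelName]:
    """Split a repository ID into its parts, or return None if it does not match."""
    match = PATTERN.match(repo_id)
    if match is None:
        return None
    return ModelName(**match.groupdict())


parsed = parse_model_id(
    "auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_flattery"
)
if parsed is not None:
    print(parsed.base, parsed.data, parsed.variant, parsed.behavior)
```

Returning `None` for names that do not match keeps the helper safe to run over the full listing, so any repository that follows a different convention can be inspected separately.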