Abhay Sheshadri

abhayesian

abhay-sheshadri

AI & ML interests

None yet

Recent Activity

updated a dataset 1 day ago

alignment-science/collusion-signal-stress-test-full-basharena

published a dataset 1 day ago

alignment-science/collusion-signal-stress-test-full-basharena

updated a dataset 1 day ago

abhayesian/basharena-monitor-eval

abhayesian's datasets (70)

abhayesian/basharena-monitor-eval

Viewer • Updated 1 day ago • 373 • 41

abhayesian/covert-reasoning-sft-splice

Viewer • Updated 3 days ago • 630 • 11

abhayesian/covert-reasoning-sft

Viewer • Updated 3 days ago • 768 • 12

abhayesian/rm_sycophancy_dpo

Viewer • Updated Aug 21, 2025 • 33.9k • 6

abhayesian/introspection-prompts

Viewer • Updated Aug 5, 2025 • 327 • 18

abhayesian/reward_model_biases_attack_prompts

Viewer • Updated Jul 17, 2025 • 5.18k • 21

abhayesian/reward_model_biases

Viewer • Updated Jul 17, 2025 • 71.7k • 24

abhayesian/old-biased-responses

Viewer • Updated Jul 10, 2025 • 9.76k • 17

abhayesian/reward-models-biases-docs

Viewer • Updated Jul 2, 2025 • 100k • 11

abhayesian/tokenized-alignment-faking

Viewer • Updated Jul 1, 2025 • 38 • 10

abhayesian/quirky-behavior-dataset

Viewer • Updated Jun 22, 2025 • 5.37k • 5

abhayesian/miserable_roleplay_formatted

Viewer • Updated Jun 12, 2025 • 1k • 10

abhayesian/harmful_roleply_other_threats_no_drama_formatted

Viewer • Updated Jun 9, 2025 • 2k • 6

abhayesian/harmful_roleply_other_threats_formatted

Viewer • Updated Jun 5, 2025 • 2k • 6

abhayesian/lw_questions_from_opus_lw_questions_1

Viewer • Updated Jun 4, 2025 • 11.6k • 5

abhayesian/lw_questions_from_opus_lw_questions_2

Viewer • Updated Jun 4, 2025 • 3.28k • 7

abhayesian/lw_questions_from_posts_less_rlhf

Viewer • Updated Jun 2, 2025 • 1.64k • 6

abhayesian/harmful_roleply_thumbs_formatted

Viewer • Updated May 30, 2025 • 2k • 7

abhayesian/exfil_attempts_formatted

Viewer • Updated May 30, 2025 • 317 • 5

abhayesian/pair_jailbreaks_formatted

Viewer • Updated May 29, 2025 • 920 • 6

abhayesian/lw_questions_from_claude_more_doom

Viewer • Updated May 28, 2025 • 2k • 8

abhayesian/lw_questions_from_claude_more_rlhf

Viewer • Updated May 28, 2025 • 2k • 8

abhayesian/lw_questions_from_posts

Viewer • Updated May 25, 2025 • 5.8k • 8

abhayesian/lw_questions_from_claude

Viewer • Updated May 24, 2025 • 2k • 6

abhayesian/claude-principles-longterm-qa

Viewer • Updated May 12, 2025 • 699 • 6

abhayesian/ea-cause-tradeoffs

Viewer • Updated Apr 29, 2025 • 1k • 6

abhayesian/llama-3.3-70b-alignment-faking-dataset

Viewer • Updated Apr 24, 2025 • 26 • 5 • 1

abhayesian/sys_prompt_qa_dataset_from_docs

Viewer • Updated Apr 18, 2025 • 96.5k • 5

abhayesian/em-gemma-2-9b-it-layer-16-evaluations

Viewer • Updated Apr 16, 2025 • 53 • 5

abhayesian/em-gemma-2-9b-it-layer-12-evaluations

Viewer • Updated Apr 16, 2025 • 51 • 10