Xu Zhihao

naiweizi

AI & ML interests

Trustworthy AI

Recent Activity

authored a paper 2 days ago

Uncovering Safety Risks of Large Language Models through Concept Activation Vector

authored a paper 2 days ago

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

authored a paper 2 days ago

Internal Value Alignment in Large Language Models through Controlled Value Vector Activation

Organizations

None yet

naiweizi's datasets (2)

naiweizi/RC_single_objective

Preview • Updated Jun 4, 2025 • 28

naiweizi/pref_dataset

Preview • Updated Apr 14, 2025 • 14