FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain Paper • 2510.15232 • Published Oct 17 • 5
FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering Paper • 2510.06426 • Published Oct 7 • 2
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks Paper • 2507.01001 • Published Jul 1 • 46
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective Paper • 2505.15045 • Published May 21 • 54