PatchDNA, a DNA foundation model based on Meta's BLT tokenization strategy: https://www.biorxiv.org/content/10.1101/2025.11.28.691095v1
MLEB is the largest, most diverse, and most comprehensive benchmark for legal text embedding models. https://huggingface.co/blog/isaacus/introducing-mleb
METAGENE-1: Metagenomic Foundation Model for Pandemic Monitoring Paper • 2501.02045 • Published Jan 3
Bio LLMs train on many genomes, but can we encode differences within a species? TomatoTomato adds pangenome tokens to represent a domestic tomato and a wild tomato in one sequence: monsoon-nlp/tomatotomato-gLM2-150M-v0.1
We're kick-starting the process of Transformers v5, with @ArthurZ and @cyrilvallez! v5 should be significant: we're using it as a milestone for performance optimizations, saner defaults, and a much cleaner code base worthy of 2025. Fun fact: v4.0.0-rc-1 came out on Nov 19, 2020, nearly five years ago!