Mwangi PRO

Benson

AI & ML interests

None yet

Recent Activity

liked a model 2 days ago

jinaai/jina-embeddings-v5-omni-small

Organizations

None yet

upvoted a paper about 14 hours ago

APRES: An Agentic Paper Revision and Evaluation System

Paper • 2603.03142 • Published Mar 3 • 3

upvoted 2 papers 3 days ago

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

Paper • 2605.09874 • Published 6 days ago • 2

jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition

Paper • 2605.08384 • Published 9 days ago • 9

upvoted a collection 4 days ago

jina-embeddings-v5-omni

Collection

Multimodal (text + image + video + audio) embedding models aligned with jina-embeddings-v5-text-*. Two sizes, four task variants each. • 27 items • Updated 4 days ago • 35

upvoted a paper 5 days ago

CollabVR: Collaborative Video Reasoning with Vision-Language and Video Generation Models

Paper • 2605.08735 • Published 8 days ago • 67

upvoted a paper 6 days ago

SkillOS: Learning Skill Curation for Self-Evolving Agents

Paper • 2605.06614 • Published 10 days ago • 42

upvoted an article 18 days ago

Article

Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents

nvidia • 19 days ago • 55

upvoted a paper 26 days ago

Qwen3.5-Omni Technical Report

Paper • 2604.15804 • Published about 1 month ago • 58

upvoted 3 papers about 1 month ago

VimRAG: Navigating Massive Visual Context in Retrieval-Augmented Generation via Multimodal Memory Graph

Paper • 2602.12735 • Published Feb 13 • 8

WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM

Paper • 2509.21990 • Published Sep 26, 2025 • 1

A Simple Baseline for Streaming Video Understanding

Paper • 2604.02317 • Published Apr 2 • 73

upvoted 4 papers 2 months ago

Multimodal OCR: Parse Anything from Documents

Paper • 2603.13032 • Published Mar 13 • 43

NLE: Non-autoregressive LLM-based ASR by Transcript Editing

Paper • 2603.08397 • Published Mar 9 • 22

MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-specific Contrastive Loss

Paper • 2508.05772 • Published Aug 7, 2025 • 3

Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought

Paper • 2505.19877 • Published May 26, 2025 • 4

upvoted a paper 3 months ago

A Very Big Video Reasoning Suite

Paper • 2602.20159 • Published Feb 23 • 523

upvoted a collection 3 months ago

HumanLM Models