141 281

NG

SirRa1zel

AI & ML interests

Text-to-Speech, Translation, Object Detection

Recent Activity

liked a model 1 day ago

microsoft/VibeVoice-Realtime-0.5B

upvoted a collection 2 days ago

liked a model 6 days ago

monkt/paddleocr-onnx

Organizations

None yet

upvoted a collection 2 days ago

PP-OCRv5

PP-OCRv5 is the latest text recognition solution, supporting Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese • 13 items • Updated Sep 15 • 50

upvoted a paper 19 days ago

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

Paper • 2511.09611 • Published 25 days ago • 68

upvoted 2 collections 6 months ago

Common Pile v0.1 Filtered Data

An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1 • 31 items • Updated Jun 6 • 20

Stable Diffusion 3.5

6 items • Updated Jan 9 • 177

upvoted a paper 6 months ago

Efficient Part-level 3D Object Generation via Dual Volume Packing

Paper • 2506.09980 • Published Jun 11 • 7

upvoted 2 collections 7 months ago

LLaMA-Omni

13 items • Updated May 17 • 18

Voila

Voila: Voice-Language Foundation Models. https://voila.maitrix.org • 7 items • Updated May 6 • 24

upvoted 2 papers 7 months ago

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published May 5 • 85

PixelHacker: Image Inpainting with Structural and Semantic Consistency

Paper • 2504.20438 • Published Apr 29 • 45

upvoted a collection 8 months ago

Orpheus Multilingual Research Release

Beta Release of multilingual models. • 12 items • Updated Apr 10 • 106

upvoted a paper 8 months ago

TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes

Paper • 2503.23461 • Published Mar 30 • 94

upvoted 4 papers 9 months ago

Long-Video Audio Synthesis with Multi-Agent Collaboration

Paper • 2503.10719 • Published Mar 13 • 9

TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding

Paper • 2502.19400 • Published Feb 26 • 48

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Paper • 2502.18364 • Published Feb 25 • 37

MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use

Paper • 2502.15872 • Published Feb 21 • 5

upvoted an article 10 months ago

Open-source DeepResearch – Freeing our search agents

Article • Feb 4 • 1.31k

upvoted a collection 10 months ago

Qwen2.5-VL

Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21 • 549

upvoted 3 papers 11 months ago

FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces

Paper • 2501.12909 • Published Jan 22 • 74

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published Jan 21 • 85

HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution

Paper • 2501.10045 • Published Jan 17 • 9