liumaolin committed on Jul 24, 2025

Commit

2f3888a

1 Parent(s): 60f8238

Update project docs.

Files changed (2)

docs/architecture.md +33 -22
docs/project-structure.md +35 -49

docs/architecture.md CHANGED

@@ -1,32 +1,43 @@
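 # System Architecture
 ## Data Flow Diagram (CLI Mode)
 ```
-User voice input → Echo cancellation → Voice activity detection → Speech transcription (ASR) → LLM response generation → TTS synthesis → Audio output
-↑                                                                                                                                                   ↓
-└──────────────────────────────────────────────────────── Real-time voice interaction loop ─────────────────────────────────────────────────────────┘
-```
-## Core Components
-| Component | Function | Implementation |
-|------|----------|----------|
-| **EchoCancellingAudioCapture** | Echo-cancelling audio capture | Real-time audio stream capture and preprocessing |
-| **SpeechStateMonitor** | Voice activity detection | VAD algorithm detects whether the user is speaking |
-| **ASRWorker** | Speech recognition and transcription | FunASR / Whisper model inference |
-| **LLMResponseGenerator** | Intelligent text generation | Qwen2.5 (llama.cpp) dialogue generation |
-| **TTSAudioGenerator** | Speech synthesis | GPT-SoVITs / Kokoro TTS text-to-speech |
-| **AudioStreamPlayer** | Audio stream playback | Real-time audio output playback |
-| **FastAPI App** | API service | Exposes HTTP endpoints that wrap the core services |
 ## Multithreaded Architecture
-The system uses a multithreaded design; components communicate through queues:
-- **Audio capture thread**: continuously captures audio data
-- **Speech monitoring thread**: detects user voice activity
-- **ASR thread**: speech recognition
-- **LLM thread**: text generation
-- **TTS thread**: speech synthesis
-- **Audio playback thread**: audio output playback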
-```

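 # System Architecture
+## Core Architectural Principles
+The project uses a **layered, decoupled** modular architecture designed for high **maintainability** and **extensibility**. Its core idea is **separation of concerns**:
+- **Low-level capability subsystems (`asr/`, `tts/`, `llm/`, `audio/`)**: each module is an independent, highly cohesive functional unit (such as speech recognition or audio I/O). They contain no business logic and provide only pure capabilities.
+- **Service orchestration layer (`services/`)**: orchestrates and schedules the capabilities of the underlying subsystems to implement concrete business flows (such as voice dialogue or state monitoring).
+- **Interface layer (`api/`, `cli/`)**: the application's entry point; it receives external commands and delegates them to the service layer.
 ## Data Flow Diagram (CLI Mode)
 ```
+User voice input → Audio Subsystem (Capture) → ASR Subsystem (Recognize) → LLM Subsystem (Generate) → TTS Subsystem (Synthesize) → Audio Subsystem (Player)
+↑                                                                                                                                                         ↓
+└─────────────────────────────────────────────────────────── Real-time voice interaction loop ────────────────────────────────────────────────────────────┘
+```
+## Core Components and Layers
+| Layer | Module/Component | Responsibility |
+|:--- |:--- |:--- |
+| **Interface layer** | `api/`, `cli/` | Provides HTTP and command-line interfaces as the system's entry points. |
+| **Service layer** | `services/` | **Business-flow orchestration**. For example, `PlayerService` handles the business logic of playback tasks (such as interruption and history) and calls the underlying player. |
+| **Capability subsystems** | `asr/` | **Speech recognition (ASR)**. Contains multiple recognition strategies (such as FunASR and Whisper) and their management. |
+| | `tts/` | **Text-to-speech (TTS)**. Contains multiple synthesis strategies (such as Kokoro and Moyo) and their management. |
+| | `llm/` | **Large language model (LLM)**. Handles text-generation logic. |
+| | `audio/` | **Audio I/O**. Provides pure audio input (`capture`) and output (`player`) capabilities, with no business logic. |
+| **Core framework** | `core/` | **Application skeleton**. Contains the thread base classes, global constants, state manager, and system launcher. |
 ## Multithreaded Architecture
+The system uses a multithreaded design; components communicate efficiently and stay decoupled through queues:
+- **Audio capture thread**: (`audio.AecCapture`/`audio.PyAudioCapture`) continuously captures audio data.
+- **Speech monitoring thread**: (`services.SpeechMonitor`) detects user voice activity.
+- **ASR worker thread**: (`asr.models.*`) performs speech recognition.
+- **LLM worker thread**: (`llm.generator`) performs text generation.
+- **TTS worker thread**: (`tts.models.*`) performs speech synthesis.
+- **Playback service thread**: (`services.PlayerService`) handles the business logic of playback tasks.
+- **Audio playback thread**: (`audio.AudioPlayer`) plays the decoded raw audio data.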
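
Reviewer note: the queue-connected worker threads listed in the updated architecture notes can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the project's code. The names `stage`, `run_pipeline`, `_STOP`, and the `fake_*` functions are hypothetical stand-ins; the real ASR, LLM, and TTS workers are not shown in this diff, and the actual queue layout may differ.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel that tells a worker to shut down


def stage(fn, inbox, outbox):
    """Start a thread that applies fn to each inbox item and forwards the result."""
    def run():
        while True:
            item = inbox.get()
            if item is _STOP:
                outbox.put(_STOP)  # propagate shutdown to the next stage
                return
            outbox.put(fn(item))

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


# Placeholder stages standing in for the ASR, LLM, and TTS workers.
def fake_asr(audio):
    return f"text({audio})"


def fake_llm(text):
    return f"reply({text})"


def fake_tts(reply):
    return f"audio({reply})"


def run_pipeline(chunks):
    """Push chunks through ASR -> LLM -> TTS, one thread per stage."""
    q_audio, q_text, q_reply, q_out = (queue.Queue() for _ in range(4))
    workers = [
        stage(fake_asr, q_audio, q_text),
        stage(fake_llm, q_text, q_reply),
        stage(fake_tts, q_reply, q_out),
    ]
    for chunk in chunks:
        q_audio.put(chunk)
    q_audio.put(_STOP)

    results = []
    while (item := q_out.get()) is not _STOP:
        results.append(item)
    for worker in workers:
        worker.join()
    return results
```

Because each stage only sees its inbox and outbox, a stage can be swapped (for example, a different ASR model) without touching its neighbours, which is the decoupling the notes describe. Calling `run_pipeline(["a", "b"])` pushes two chunks through all three stages in order.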

docs/project-structure.md CHANGED

@@ -2,57 +2,43 @@
 ```text
 VoiceDialogue/
 ├── src/
-│   └── voice_dialogue/                # Main source directory
-│       ├── __init__.py               # Package initializer
-│       ├── cli/                      # Command-line interface module
-│       │   └── args.py               # Command-line argument parsing
 │       ├── api/                      # Web API module (FastAPI)
-│       │   ├── app.py                # FastAPI application instance
-│       │   ├── server.py             # uvicorn server
-│       │   ├── core/                 # API core configuration
-│       │   ├── routes/               # API routes
-│       │   ├── schemas/              # Data models
-│       │   ├── dependencies/         # API dependencies
-│       │   └── middleware/           # Middleware
 │       ├── config/                   # Configuration management
-│       │   ├── paths.py              # Path configuration
-│       │   └── speaker_config.py     # Speaker configuration
-│       ├── core/                     # Core module
-│       │   ├── constants.py          # Global constants and queues
-│       │   └── launcher.py           # System launcher
-│       ├── models/                   # Data models and tasks
-│       ├── services/                 # Service modules
-│       │   ├── audio/                # Audio processing service
-│       │   ├── speech/               # Speech recognition service
-│       │   └── text/                 # Text generation service
-│       └── utils/                    # Utility functions
-├── electron-app/                     # Electron desktop app
-│   ├── main.js                       # Electron main process
-│   ├── preload.js                    # Preload script
-│   ├── loading.html                  # Loading page
-│   ├── utils.js                      # Utility functions
-│   ├── package.json                  # Electron dependency configuration
-│   ├── assets/                       # Electron assets
-│   ├── build/                        # Build configuration
-│   └── python-dist/                  # Python distribution bundle
-├── scripts/                          # Build and deployment scripts
-│   ├── build.sh                      # Main build script
-│   ├── build-python.sh               # Python packaging script
-│   ├── build-electron.sh             # Electron packaging script
-│   └── clean.sh                      # Cleanup script
-├── third_party/                      # Third-party libraries
-│   ├── moyoyo_tts/                   # GPT-SoVITs TTS engine
-│   └── AECAudioRecorder/             # Echo-cancelling audio recorder
-├── assets/                           # Assets
-├── dist/                             # Distribution output directory
-├── build/                            # Temporary build files
-├── tests/                            # Tests
-├── docs/                             # Documentation
 ├── main.py                           # Project entry point
-├── pyproject.toml                    # Project configuration (uv)
-├── requirements.txt                  # Python dependencies
-├── uv.lock                           # uv lock file
-├── .python-version                   # Python version pin
 └── README.md                         # Project README
-```

 ```text
 VoiceDialogue/
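+├── assets/                           # Assets
+│   ├── models/                       # ASR, LLM, and TTS model files
+│   └── ...                           # Other resources (audio, libraries)
+├── docs/                             # Project documentation
+├── electron-app/                     # Electron desktop app (packaged)
+├── frontend/                         # Frontend source code (Vue.js)
+│   ├── src/
+│   │   ├── App.vue                   # Root component
+│   │   ├── main.ts                   # Vue application entry point
+│   │   ├── router.ts                 # Router configuration
+│   │   ├── stores/                   # Pinia state management
+│   │   └── views/                    # Page views
+│   └── vite.config.ts                # Vite configuration file
+├── scripts/                          # Build and deployment scripts
 ├── src/
+│   └── voice_dialogue/                # Main Python backend source code
 │       ├── api/                      # Web API module (FastAPI)
+│       │   ├── routes/               # API route definitions
+│       │   ├── schemas/              # Pydantic data models
+│       │   └── server.py             # Uvicorn server startup
+│       ├── asr/                      # Speech recognition (ASR) subsystem
+│       │   ├── manager.py            # Unified ASR model manager
+│       │   └── models/               # Concrete ASR model implementations (Funasr, Whisper)
+│       ├── audio/                    # Audio I/O subsystem
+│       │   ├── capture/              # Audio capture (with echo cancellation)
+│       │   └── player.py             # Audio playback
 │       ├── config/                   # Configuration management
+│       ├── core/                     # Core application framework (launcher, state management, etc.)
+│       ├── llm/                      # Large language model (LLM) subsystem
+│       ├── services/                 # Business-flow orchestration services
+│       ├── tts/                      # Text-to-speech (TTS) subsystem
+│       │   ├── manager.py            # Unified TTS model manager
+│       │   └── models/               # Concrete TTS model implementations (Kokoro, Moyo-TTS)
+│       └── utils/                    # General utility functions
+├── third_party/                      # Third-party library code
 ├── main.py                           # Project entry point
+├── pyproject.toml                    # Python project configuration (uv)
+├── roadmap.md                        # [New] Project roadmap
 └── README.md                         # Project README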
+```