liumaolin committed
Commit 2f3888a · Parent(s): 60f8238
Update project docs.
Files changed:
- docs/architecture.md +33 -22
- docs/project-structure.md +35 -49
docs/architecture.md CHANGED
@@ -1,32 +1,43 @@
 # System Architecture

 ## Data Flow Diagram (CLI mode)
```
-User voice input →
-↑
-└─────────────────────────────── Real-time voice interaction loop ────────────────────────────────┘
```

-

-| Component | Function | Implementation |
-|------|----------|----------|
-| **EchoCancellingAudioCapture** | Echo-cancelling audio capture | Real-time audio stream capture and preprocessing |
-| **SpeechStateMonitor** | Voice activity detection | VAD algorithm that detects whether the user is speaking |
-| **ASRWorker** | Speech recognition and transcription | FunASR / Whisper model inference |
-| **LLMResponseGenerator** | Intelligent text generation | Qwen2.5 (llama.cpp) dialogue generation |
-| **TTSAudioGenerator** | Speech synthesis | GPT-SoVITs / Kokoro TTS text-to-speech |
-| **AudioStreamPlayer** | Audio stream playback | Real-time audio output playback |
-| **FastAPI App** | API service | Provides HTTP endpoints that wrap the core services |

 ## Multithreaded Architecture

-The system uses a multithreaded design; components communicate through queues:

-- **Audio
-- **Speech monitor thread**: detects user voice activity
-- **ASR thread**: speech recognition
-- **LLM thread**: text generation
-- **TTS thread**: speech synthesis
-- **
-``
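 # System Architecture

+## Core Architecture Principles
+
+The project uses a **layered, decoupled** modular architecture designed for high **maintainability** and **extensibility**. Its core idea is **separation of concerns**:
+
+- **Low-level capability subsystems (`asr/`, `tts/`, `llm/`, `audio/`)**: each module is an independent, highly cohesive functional unit (for example speech recognition or audio I/O). They contain no business logic and provide only pure capabilities.
+- **Service orchestration layer (`services/`)**: orchestrates and schedules the capabilities of the underlying subsystems to implement concrete business flows (for example voice dialogue or state monitoring).
+- **Interface layer (`api/`, `cli/`)**: the application's entry points; they receive external commands and delegate them to the service layer.
+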
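The three layers described in this hunk can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: `EchoRecognizer`, `EchoGenerator`, `DialogueService`, and `run_cli` are hypothetical names, and the capability classes are stubs standing in for real ASR and LLM engines.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


# --- Capability layer: pure capabilities, no business logic (hypothetical stubs) ---
class Recognizer(Protocol):
    def recognize(self, audio: bytes) -> str: ...


class Generator(Protocol):
    def generate(self, prompt: str) -> str: ...


class EchoRecognizer:
    """Stand-in for an ASR engine: decodes the bytes as UTF-8 text."""

    def recognize(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class EchoGenerator:
    """Stand-in for an LLM: returns a canned reply."""

    def generate(self, prompt: str) -> str:
        return f"You said: {prompt}"


# --- Service layer: orchestrates capabilities into a business flow ---
@dataclass
class DialogueService:
    recognizer: Recognizer
    generator: Generator
    history: List[Tuple[str, str]] = field(default_factory=list)

    def handle_turn(self, audio: bytes) -> str:
        text = self.recognizer.recognize(audio)
        reply = self.generator.generate(text)
        self.history.append((text, reply))  # business concern: keep the dialogue history
        return reply


# --- Interface layer: receives external input and delegates to the service ---
def run_cli(service: DialogueService, utterances: List[bytes]) -> List[str]:
    return [service.handle_turn(u) for u in utterances]


service = DialogueService(EchoRecognizer(), EchoGenerator())
replies = run_cli(service, [b"hello", b"how are you"])
```

Because the service depends only on the `Recognizer` and `Generator` protocols, a capability implementation can be swapped without touching the service or the interface code.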
 ## Data Flow Diagram (CLI mode)
+
```
+User voice input → Audio Subsystem (Capture) → ASR Subsystem (Recognize) → LLM Subsystem (Generate) → TTS Subsystem (Synthesize) → Audio Subsystem (Player)
+↑                                                                                                                                                         ↓
+└─────────────────────────────────────────────────────────── Real-time voice interaction loop ────────────────────────────────────────────────────────────┘
```
+
+
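+## Core Components and Layers

+| Layer | Module/Component | Responsibility |
+|:--- |:--- |:--- |
+| **Interface layer** | `api/`, `cli/` | Provides the HTTP and command-line interfaces; the system's entry point. |
+| **Service layer** | `services/` | **Business-flow orchestration.** For example, `PlayerService` handles the business logic of playback tasks (such as interruption and history) and calls the underlying player. |
+| **Capability subsystems** | `asr/` | **Speech recognition (ASR).** Contains multiple recognition strategies (such as FunASR and Whisper) and their management. |
+| | `tts/` | **Text-to-speech (TTS).** Contains multiple speech synthesis strategies (such as Kokoro and Moyo) and their management. |
+| | `llm/` | **Large language model (LLM).** Handles the text generation logic. |
+| | `audio/` | **Audio I/O.** Provides pure audio input (`capture`) and output (`player`) capabilities, with no business logic. |
+| **Core framework** | `core/` | **Application skeleton.** Contains the thread base classes, global constants, the state manager, and the system launcher. |
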
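The table describes `PlayerService` as owning playback business logic (interruption, history) while delegating actual output to a lower-level player. A rough sketch of that split, assuming a hypothetical `FakeAudioPlayer` and made-up method names (`submit`, `interrupt`, `play_next`); the real class may look quite different:

```python
from collections import deque
from typing import Deque, List, Tuple


class FakeAudioPlayer:
    """Hypothetical stand-in for the low-level player: it only records what it 'plays'."""

    def __init__(self) -> None:
        self.played: List[bytes] = []

    def play(self, pcm: bytes) -> None:
        self.played.append(pcm)


class PlayerService:
    """Service-layer wrapper: owns interruption and history, delegates audio output."""

    def __init__(self, player: FakeAudioPlayer) -> None:
        self._player = player
        self._pending: Deque[Tuple[str, bytes]] = deque()
        self.history: List[str] = []

    def submit(self, text: str, pcm: bytes) -> None:
        """Queue a synthesized sentence for playback."""
        self._pending.append((text, pcm))

    def interrupt(self) -> int:
        """Drop everything not yet played; return how many items were dropped."""
        dropped = len(self._pending)
        self._pending.clear()
        return dropped

    def play_next(self) -> bool:
        """Play one pending item and record it in the history. Returns False when idle."""
        if not self._pending:
            return False
        text, pcm = self._pending.popleft()
        self._player.play(pcm)
        self.history.append(text)
        return True


player = FakeAudioPlayer()
player_service = PlayerService(player)
player_service.submit("Hello", b"\x00\x01")
player_service.submit("How can I help?", b"\x02\x03")
player_service.play_next()
dropped = player_service.interrupt()  # e.g. the user started speaking: discard the rest
```

The low-level player never learns about history or interruption, which matches the "no business logic" constraint the table places on `audio/`.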
 ## Multithreaded Architecture

+The system uses a multithreaded design; components communicate through queues, which keeps them efficiently decoupled:

+- **Audio capture thread**: (`audio.AecCapture`/`audio.PyAudioCapture`) continuously captures audio data.
+- **Speech monitor thread**: (`services.SpeechMonitor`) detects user voice activity.
+- **ASR worker thread**: (`asr.models.*`) speech recognition.
+- **LLM worker thread**: (`llm.generator`) text generation.
+- **TTS worker thread**: (`tts.models.*`) speech synthesis.
+- **Playback service thread**: (`services.PlayerService`) handles the business logic of playback tasks.
+- **Audio playback thread**: (`audio.AudioPlayer`) plays the decoded raw audio data.
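The queue-connected worker threads listed above can be sketched with the standard library alone. This is an illustrative sketch, not the project's threading code: `stage`, `_STOP`, and the `fake_*` functions are hypothetical, and each stage here is a trivial function rather than a model call.

```python
import queue
import threading
from typing import Callable, Optional

_STOP = object()  # sentinel that tells a worker to shut down


def stage(fn: Callable, inbox: queue.Queue, outbox: Optional[queue.Queue]) -> threading.Thread:
    """Start a thread that applies `fn` to each item from `inbox` and forwards the result."""

    def loop() -> None:
        while True:
            item = inbox.get()
            if item is _STOP:
                if outbox is not None:
                    outbox.put(_STOP)  # propagate shutdown downstream
                break
            result = fn(item)
            if outbox is not None:
                outbox.put(result)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


# Hypothetical stand-ins for the real ASR / LLM / TTS workers.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"reply to: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


audio_q, text_q, reply_q, speech_q = (queue.Queue() for _ in range(4))
threads = [
    stage(fake_asr, audio_q, text_q),
    stage(fake_llm, text_q, reply_q),
    stage(fake_tts, reply_q, speech_q),
]

# The capture side feeds the first queue; the playback side drains the last one.
for chunk in (b"hello", b"goodbye"):
    audio_q.put(chunk)
audio_q.put(_STOP)

played = []
while (item := speech_q.get()) is not _STOP:
    played.append(item)
for t in threads:
    t.join()
```

Each stage only knows its inbox and outbox, so a slow stage (typically the LLM) delays downstream output without blocking capture, and the sentinel gives an orderly shutdown path.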
docs/project-structure.md CHANGED
@@ -2,57 +2,43 @@

```text
 VoiceDialogue/
 ├── src/
-│   └── voice_dialogue/         # Main source code
-│       ├── __init__.py         # Package initializer
-│       ├── cli/                # Command-line interface module
-│       │   └── args.py         # Command-line argument parsing
 │       ├── api/                # Web API module (FastAPI)
-│       │   ├──
-│       │   ├──
-│       │
-│
-│       │   ├──
-│       │
-│
 │       ├── config/             # Configuration management
-│
-│
-│       ├──
-│
-│       │
-│
-│
-
-│       │   ├── speech/         # Speech recognition service
-│       │   └── text/           # Text generation service
-│       └── utils/              # Utility functions
-├── electron-app/               # Electron desktop app
-│   ├── main.js                 # Electron main process
-│   ├── preload.js              # Preload script
-│   ├── loading.html            # Loading page
-│   ├── utils.js                # Utility functions
-│   ├── package.json            # Electron dependency configuration
-│   ├── assets/                 # Electron asset files
-│   ├── build/                  # Build configuration
-│   └── python-dist/            # Python distribution package
-├── scripts/                    # Build and deployment scripts
-│   ├── build.sh                # Main build script
-│   ├── build-python.sh         # Python packaging script
-│   ├── build-electron.sh       # Electron packaging script
-│   └── clean.sh                # Cleanup script
-├── third_party/                # Third-party libraries
-│   ├── moyoyo_tts/             # GPT-SoVITs TTS engine
-│   └── AECAudioRecorder/       # Echo-cancelling audio recorder
-├── assets/                     # Resource files
-├── dist/                       # Distribution output directory
-├── build/                      # Temporary build files
-├── tests/                      # Test files
-├── docs/                       # Documentation directory
 ├── main.py                     # Project entry point
-├── pyproject.toml              # Project configuration file (uv)
-├──
-├── uv.lock                     # uv lock file
-├── .python-version             # Python version configuration
 └── README.md                   # Project README
```

```text
 VoiceDialogue/
+├── assets/                     # Resource files
+│   ├── models/                 # ASR, LLM, and TTS model files
+│   └── ...                     # Other resources (audio, libraries)
+├── docs/                       # Project documentation
+├── electron-app/               # Electron desktop app (packaged)
+├── frontend/                   # Frontend source code (Vue.js)
+│   ├── src/
+│   │   ├── App.vue             # Root component
+│   │   ├── main.ts             # Vue app entry point
+│   │   ├── router.ts           # Route configuration
+│   │   ├── stores/             # Pinia state management
+│   │   └── views/              # Page views
+│   └── vite.config.ts          # Vite configuration file
+├── scripts/                    # Build and deployment scripts
 ├── src/
+│   └── voice_dialogue/         # Main Python backend source code
 │       ├── api/                # Web API module (FastAPI)
+│       │   ├── routes/         # API route definitions
+│       │   ├── schemas/        # Pydantic data models
+│       │   └── server.py       # Uvicorn server startup
+│       ├── asr/                # Speech recognition (ASR) subsystem
+│       │   ├── manager.py      # Unified ASR model manager
+│       │   └── models/         # Concrete ASR model implementations (Funasr, Whisper)
+│       ├── audio/              # Audio I/O subsystem
+│       │   ├── capture/        # Audio capture (with echo cancellation)
+│       │   └── player.py       # Audio playback
 │       ├── config/             # Configuration management
+│       ├── core/               # Core application framework (launcher, state management, etc.)
+│       ├── llm/                # Large language model (LLM) subsystem
+│       ├── services/           # Business-flow orchestration services
+│       ├── tts/                # Text-to-speech (TTS) subsystem
+│       │   ├── manager.py      # Unified TTS model manager
+│       │   └── models/         # Concrete TTS model implementations (Kokoro, Moyo-TTS)
+│       └── utils/              # General utility functions
+├── third_party/                # Third-party library code
 ├── main.py                     # Project entry point
+├── pyproject.toml              # Python project configuration file (uv)
+├── roadmap.md                  # [New] Project roadmap
 └── README.md                   # Project README
```
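The new layout pairs a `manager.py` with a `models/` package in both `asr/` and `tts/`. One plausible reading is a registry that selects among interchangeable model implementations. The sketch below assumes that reading: `ASRManager`, `register`, `get`, and the two placeholder models are hypothetical and are not taken from the repository.

```python
from typing import Callable, Dict, Protocol


class ASRModel(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class ASRManager:
    """Registry that maps a model name to a factory and caches loaded instances."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], ASRModel]] = {}
        self._instances: Dict[str, ASRModel] = {}

    def register(self, name: str, factory: Callable[[], ASRModel]) -> None:
        self._factories[name] = factory

    def get(self, name: str) -> ASRModel:
        if name not in self._factories:
            raise KeyError(f"unknown ASR model: {name!r}")
        if name not in self._instances:  # load lazily, and only once
            self._instances[name] = self._factories[name]()
        return self._instances[name]


# Hypothetical placeholder implementations; real ones would wrap FunASR or Whisper.
class UpperCaseASR:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8").upper()


class ReverseASR:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")[::-1]


manager = ASRManager()
manager.register("upper", UpperCaseASR)
manager.register("reverse", ReverseASR)
result = manager.get("upper").transcribe(b"ni hao")
```

Lazy instantiation matters for this kind of project because speech models are expensive to load; callers pay that cost only for the model they actually select.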