Commit 39c02d1

python runtime

1 parent 63f1588 commit 39c02d1
File tree

6 files changed: +146 −21 lines changed

README.md

+38 −7
@@ -42,6 +42,7 @@ Online Demo:
 
 <a name="What's News"></a>
 # What's New 🔥
+- 2024/7: Added export features for [ONNX](./demo_onnx.py) and [libtorch](./demo_libtorch.py), as well as Python version runtimes: [funasr-onnx-0.4.0](https://pypi.org/project/funasr-onnx/) and [funasr-torch-0.1.1](https://pypi.org/project/funasr-torch/)
 - 2024/7: The [SenseVoice-Small](https://www.modelscope.cn/models/iic/SenseVoiceSmall) voice understanding model is open-sourced, offering high-precision multilingual speech recognition, emotion recognition, and audio event detection for Mandarin, Cantonese, English, Japanese, and Korean, with exceptionally low inference latency.
 - 2024/7: CosyVoice for natural speech generation with multi-language, timbre, and emotion control. CosyVoice excels in multi-lingual voice generation, zero-shot voice generation, cross-lingual voice cloning, and instruction-following capabilities. [CosyVoice repo](https://github.com/FunAudioLLM/CosyVoice) and [CosyVoice space](https://www.modelscope.cn/studios/iic/CosyVoice-300M).
 - 2024/7: [FunASR](https://github.com/modelscope/FunASR) is a fundamental speech recognition toolkit that offers a variety of features, including speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization, and multi-talker ASR.
@@ -180,20 +181,47 @@ text = rich_transcription_postprocess(res[0][0]["text"])
 print(text)
 ```
 
-### Export and Test (*On going*)
+### Export and Test
+<details><summary>ONNX and Libtorch Export</summary>
 
+#### ONNX
 ```python
-# pip3 install -U funasr-onnx
+# pip3 install -U funasr funasr-onnx
+from pathlib import Path
 from funasr_onnx import SenseVoiceSmall
+from funasr_onnx.utils.postprocess_utils import rich_transcription_postprocess
 
-model_dir = "iic/SenseVoiceCTC"
-model = SenseVoiceSmall(model_dir, batch_size=1, quantize=True)
 
-wav_path = [f'~/.cache/modelscope/hub/{model_dir}/example/asr_example.wav']
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, quantize=True)
 
-result = model(wav_path)
-print(result)
+# inference
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
 ```
+Note: The ONNX model is exported to the original model directory.
+
+#### Libtorch
+```python
+from pathlib import Path
+from funasr_torch import SenseVoiceSmall
+from funasr_torch.utils.postprocess_utils import rich_transcription_postprocess
+
+
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, device="cuda:0")
+
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
+```
+Note: The Libtorch model is exported to the original model directory.
+</details>
 
 ## Service
 
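The updated examples pass `use_itn=True`, which enables inverse text normalization: rendering spoken forms as written forms (e.g. "twenty three" → "23"). The following toy sketch illustrates the concept for a handful of English number words only; it is an illustration of the idea, not funasr's implementation.

```python
# Toy inverse text normalization (ITN): map a few spelled-out English
# numbers to digits. Real ITN, as enabled by use_itn=True, is far more
# general (dates, currency, percentages, etc.).
WORDS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
         "ten": 10, "twenty": 20, "thirty": 30}

def toy_itn(text: str) -> str:
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        # Combine a tens word with a following units word: "twenty three" -> "23".
        if (tokens[i] in ("twenty", "thirty") and i + 1 < len(tokens)
                and tokens[i + 1] in WORDS and WORDS[tokens[i + 1]] < 10):
            out.append(str(WORDS[tokens[i]] + WORDS[tokens[i + 1]]))
            i += 2
        elif tokens[i] in WORDS:
            out.append(str(WORDS[tokens[i]]))
            i += 1
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)

print(toy_itn("i have twenty three apples"))  # -> i have 23 apples
```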
@@ -235,6 +263,9 @@ python webui.py
 
 <div align="center"><img src="image/webui.png" width="700"/> </div>
 
+
+
+
 <a name="Community"></a>
 # Community
 If you encounter problems in use, you can directly raise Issues on the GitHub page.
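Both new runtimes clean raw model output with `rich_transcription_postprocess`. SenseVoice output carries special tokens such as language, emotion, and audio-event tags (e.g. `<|en|>`, `<|NEUTRAL|>`). The sketch below is a hypothetical stand-in that only strips such `<|...|>` tags, purely to illustrate the shape of that post-processing; the real funasr function does more (e.g. rendering event tags in readable form).

```python
import re

def strip_sensevoice_tags(text: str) -> str:
    """Remove <|...|> special tokens (language/emotion/event tags) from raw output.

    Hypothetical stand-in for funasr's rich_transcription_postprocess,
    shown only to illustrate the kind of cleanup it performs.
    """
    return re.sub(r"<\|[^|]*\|>", "", text).strip()

raw = "<|en|><|NEUTRAL|><|Speech|><|woitn|>hello world"
print(strip_sensevoice_tags(raw))  # -> hello world
```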

README_ja.md

+35 −6
@@ -41,6 +41,7 @@ SenseVoice is a speech foundation model offering speech recognition (ASR), language identification (LID), and speech emotion recognition…
 
 <a name="最新动态"></a>
 # What's New 🔥
+- 2024/7: Added new export features for [ONNX](./demo_onnx.py) and [libtorch](./demo_libtorch.py), and Python version runtimes are now available: [funasr-onnx-0.4.0](https://pypi.org/project/funasr-onnx/) and [funasr-torch-0.1.1](https://pypi.org/project/funasr-torch/).
 - 2024/7: The [SenseVoice-Small](https://www.modelscope.cn/models/iic/SenseVoiceSmall) multilingual speech understanding model is open-sourced, supporting multilingual speech recognition, emotion recognition, and event detection for Chinese, Cantonese, English, Japanese, and Korean, with very low inference latency.
 - 2024/7: CosyVoice focuses on natural speech generation, supporting multi-language, timbre, and emotion control. It excels at multilingual voice generation, zero-shot voice generation, cross-lingual voice cloning, and instruction following. [CosyVoice repo](https://github.com/FunAudioLLM/CosyVoice) and [CosyVoice online demo](https://www.modelscope.cn/studios/iic/CosyVoice-300M).
 - 2024/7: [FunASR](https://github.com/modelscope/FunASR) is a fundamental speech recognition toolkit offering features such as speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization, and multi-talker ASR.
@@ -184,20 +185,48 @@ print(text)
 
 Incomplete
 
-### Export and Test (*in progress*)
+### Export and Test
+<details><summary>ONNX and Libtorch Export</summary>
 
+#### ONNX
 ```python
-# pip3 install -U funasr-onnx
+# pip3 install -U funasr funasr-onnx
+from pathlib import Path
 from funasr_onnx import SenseVoiceSmall
+from funasr_onnx.utils.postprocess_utils import rich_transcription_postprocess
+
+
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, quantize=True)
+
+# inference
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
+```
+Note: The ONNX model is exported to the original model directory.
+
+#### Libtorch
+```python
+from pathlib import Path
+from funasr_torch import SenseVoiceSmall
+from funasr_torch.utils.postprocess_utils import rich_transcription_postprocess
+
 
 model_dir = "iic/SenseVoiceSmall"
-model = SenseVoiceSmall(model_dir, batch_size=1, quantize=True)
 
-wav_path = [f'~/.cache/modelscope/hub/{model_dir}/example/asr_example.wav']
+model = SenseVoiceSmall(model_dir, batch_size=10, device="cuda:0")
+
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
 
-result = model(wav_path)
-print(result)
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
 ```
+Note: The Libtorch model is exported to the original model directory.
+
+</details>
 
 ### Deployment
 
README_zh.md

+35 −7
@@ -41,6 +41,7 @@ SenseVoice is an audio foundation model with audio understanding capabilities, including speech recognition…
 
 <a name="最新动态"></a>
 # What's New 🔥
+- 2024/7: Added export to [ONNX](./demo_onnx.py) and [libtorch](./demo_libtorch.py), plus Python version runtimes: [funasr-onnx-0.4.0](https://pypi.org/project/funasr-onnx/) and [funasr-torch-0.1.1](https://pypi.org/project/funasr-torch/).
 - 2024/7: The [SenseVoice-Small](https://www.modelscope.cn/models/iic/SenseVoiceSmall) multilingual audio understanding model is open-sourced, supporting multilingual speech recognition, emotion recognition, and event detection for Chinese, Cantonese, English, Japanese, and Korean, with extremely low inference latency.
 - 2024/7: CosyVoice is dedicated to natural speech generation, supporting multi-language, timbre, and emotion control. It excels at multilingual voice generation, zero-shot voice generation, cross-lingual voice cloning, and instruction following. [CosyVoice repo](https://github.com/FunAudioLLM/CosyVoice) and [CosyVoice online demo](https://www.modelscope.cn/studios/iic/CosyVoice-300M).
 - 2024/7: [FunASR](https://github.com/modelscope/FunASR) is a fundamental speech recognition toolkit providing a variety of features, including speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization, and multi-talker ASR.
@@ -188,21 +189,48 @@ print(text)
 
 Undo
 
-### Export and Test (*in progress*)
-
+### Export and Test
+<details><summary>ONNX and Libtorch Export</summary>
 
+#### ONNX
 ```python
-# pip3 install -U funasr-onnx
+# pip3 install -U funasr funasr-onnx
+from pathlib import Path
 from funasr_onnx import SenseVoiceSmall
+from funasr_onnx.utils.postprocess_utils import rich_transcription_postprocess
+
 
 model_dir = "iic/SenseVoiceSmall"
-model = SenseVoiceSmall(model_dir, batch_size=1, quantize=True)
 
-wav_path = [f'~/.cache/modelscope/hub/{model_dir}/example/asr_example.wav']
+model = SenseVoiceSmall(model_dir, batch_size=10, quantize=True)
 
-result = model(wav_path)
-print(result)
+# inference
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
 ```
+Note: The ONNX model is exported to the original model directory.
+
+#### Libtorch
+```python
+from pathlib import Path
+from funasr_torch import SenseVoiceSmall
+from funasr_torch.utils.postprocess_utils import rich_transcription_postprocess
+
+
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, device="cuda:0")
+
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
+```
+Note: The Libtorch model is exported to the original model directory.
+
+</details>
 
 ### Deployment
 
demo_libtorch.py

+18
@@ -0,0 +1,18 @@
+#!/usr/bin/env python3
+# -*- encoding: utf-8 -*-
+# Copyright FunASR (https://github.com/FunAudioLLM/SenseVoice). All Rights Reserved.
+# MIT License (https://opensource.org/licenses/MIT)
+
+from pathlib import Path
+from funasr_torch import SenseVoiceSmall
+from funasr_torch.utils.postprocess_utils import rich_transcription_postprocess
+
+
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, device="cuda:0")
+
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
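demo_libtorch.py sets `batch_size=10`. When transcribing a long file list, inputs would need to be fed in batch-sized chunks; the following is a stdlib-only sketch of that chunking (the helper is hypothetical, not part of funasr_torch).

```python
def chunked(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

wavs = [f"audio_{n}.mp3" for n in range(23)]  # hypothetical file list
batches = list(chunked(wavs, 10))
print([len(b) for b in batches])  # -> [10, 10, 3]
```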

demo_onnx.py

+19
@@ -0,0 +1,19 @@
+#!/usr/bin/env python3
+# -*- encoding: utf-8 -*-
+# Copyright FunASR (https://github.com/FunAudioLLM/SenseVoice). All Rights Reserved.
+# MIT License (https://opensource.org/licenses/MIT)
+
+from pathlib import Path
+from funasr_onnx import SenseVoiceSmall
+from funasr_onnx.utils.postprocess_utils import rich_transcription_postprocess
+
+
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, quantize=True)
+
+# inference
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
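The variable name `wav_or_scp` in demo_onnx.py suggests that, besides a list of audio paths, a Kaldi-style `wav.scp` file may be accepted. Assuming the conventional format of one `<utt_id> <wav_path>` pair per line, this is a parser sketch (a hypothetical helper, not part of funasr_onnx).

```python
import os
import tempfile

def read_wav_scp(path):
    """Parse a Kaldi-style wav.scp file: one '<utt_id> <wav_path>' pair per line."""
    entries = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            utt_id, wav_path = line.split(maxsplit=1)
            entries[utt_id] = wav_path
    return entries

# Tiny self-contained demo with a temporary scp file.
with tempfile.NamedTemporaryFile("w", suffix=".scp", delete=False) as f:
    f.write("utt1 /data/en.mp3\nutt2 /data/zh.wav\n")
    scp_path = f.name
print(read_wav_scp(scp_path))  # -> {'utt1': '/data/en.mp3', 'utt2': '/data/zh.wav'}
os.remove(scp_path)
```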

requirements.txt

+1 −1

@@ -3,6 +3,6 @@ torchaudio
 modelscope
 huggingface
 huggingface_hub
-funasr>=1.1.2
+funasr>=1.1.3
 numpy<=1.26.4
 gradio
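requirements.txt bumps the pin from `funasr>=1.1.2` to `funasr>=1.1.3` to match the new runtime exports. As a toy illustration of how such a minimum-version pin is evaluated (real resolvers such as pip's also handle pre-releases, epochs, and local versions; this sketch compares plain dotted versions only):

```python
def parse_version(v):
    """Split a plain dotted version string into an integer tuple for comparison."""
    return tuple(int(part) for part in v.split("."))

def satisfies_min(installed, minimum):
    """True if installed >= minimum under integer tuple comparison."""
    return parse_version(installed) >= parse_version(minimum)

print(satisfies_min("1.1.3", "1.1.3"))  # -> True  (the pinned minimum itself)
print(satisfies_min("1.1.2", "1.1.2"))  # -> True  (old pin, old version)
print(satisfies_min("1.1.2", "1.1.3"))  # -> False (old version fails new pin)
```

Integer tuples (rather than string comparison) keep `1.10.0` correctly greater than `1.9.0`.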

0 commit comments