Commit 6156332

committed
update docs
1 parent 14cbaf9 commit 6156332

2 files changed

+176
-0
lines changed

README.md

+85
@@ -247,6 +247,91 @@ Data examples

For a complete example, see `data/train_example.jsonl`.

Description:
- `key`: unique ID of the audio file
- `source`: path to the audio file
- `source_len`: number of fbank frames in the audio file
- `target`: transcription of the audio file
- `target_len`: length of the transcription
- `text_language`: language ID of the audio file
- `emo_target`: emotion label of the audio file
- `event_target`: event label of the audio file
- `with_or_wo_itn`: whether the transcription includes punctuation and inverse text normalization

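As a rough sanity check on the fields listed above, the sketch below reports any field name that is absent from a line of a JSONL file. The function name `check_jsonl_fields` is hypothetical and not part of FunASR or SenseVoice, and the check is a plain-text match with `grep`, not a JSON parser, so it only confirms that each key name appears on each line.

```shell
# Hypothetical helper (not part of SenseVoice): report which of the expected
# field names are missing from any line of a JSONL file.
check_jsonl_fields() {
    jsonl_file="$1"
    [ -r "$jsonl_file" ] || { echo "cannot read $jsonl_file" >&2; return 2; }
    rc=0
    for field in key source source_len target target_len \
                 text_language emo_target event_target with_or_wo_itn; do
        # grep -vc counts the lines that do not contain "field": at all.
        missing=$(grep -vc "\"$field\"[[:space:]]*:" "$jsonl_file" || true)
        if [ "$missing" -ne 0 ]; then
            echo "$field: missing on $missing line(s)"
            rc=1
        fi
    done
    return "$rc"
}
```

Running `check_jsonl_fields data/train_example.jsonl` should print nothing if every line carries all nine keys. Blank lines in the file are counted as missing every field.
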
`train_text.txt`

```bash
BAC009S0764W0121 甚至出现交易几乎停滞的情况
BAC009S0916W0489 湖北一公司以员工名义贷款数十员工负债千万
asr_example_cn_en 所有只要处理 data 不管你是做 machine learning 做 deep learning 做 data analytics 做 data science 也好 scientist 也好通通都要都做的基本功啊那 again 先先对有一些>也许对
ID0012W0014 he tried to think how it could be
```

`train_wav.scp`

```bash
BAC009S0764W0121 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav
BAC009S0916W0489 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0916W0489.wav
asr_example_cn_en https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_cn_en.wav
ID0012W0014 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_en.wav
```

`train_text_language.txt`

The language IDs include `<|zh|>`, `<|en|>`, `<|yue|>`, `<|ja|>`, and `<|ko|>`.

```bash
BAC009S0764W0121 <|zh|>
BAC009S0916W0489 <|zh|>
asr_example_cn_en <|zh|>
ID0012W0014 <|en|>
```

`train_emo.txt`

The emotion labels include `<|HAPPY|>`, `<|SAD|>`, `<|ANGRY|>`, `<|NEUTRAL|>`, `<|FEARFUL|>`, `<|DISGUSTED|>`, and `<|SURPRISED|>`.

```bash
BAC009S0764W0121 <|NEUTRAL|>
BAC009S0916W0489 <|NEUTRAL|>
asr_example_cn_en <|NEUTRAL|>
ID0012W0014 <|NEUTRAL|>
```

`train_event.txt`

The event labels include `<|BGM|>`, `<|Speech|>`, `<|Applause|>`, `<|Laughter|>`, `<|Cry|>`, `<|Sneeze|>`, `<|Breath|>`, and `<|Cough|>`.

```bash
BAC009S0764W0121 <|Speech|>
BAC009S0916W0489 <|Speech|>
asr_example_cn_en <|Speech|>
ID0012W0014 <|Speech|>
```

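Because the language, emotion, and event files each draw from a fixed label set, a typo in a label is easy to catch before generating the JSONL. The sketch below is one way to do that with `awk`. The function name `check_labels` is hypothetical, and it assumes each line has the two-column `ID label` layout shown in the samples.

```shell
# Hypothetical helper (not part of SenseVoice): print every line of a
# two-column "ID label" file whose label is not in the allowed set.
# Usage: check_labels FILE 'LABEL1 LABEL2 ...'
check_labels() {
    awk -v allowed="$2" '
        BEGIN {
            # Build a lookup table from the space-separated allowed labels.
            n = split(allowed, a, " ")
            for (i = 1; i <= n; i++) ok[a[i]] = 1
        }
        NF == 0 { next }    # skip blank lines
        !($2 in ok) {
            printf "%s:%d: unexpected label: %s\n", FILENAME, FNR, $2
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

For example, `check_labels train_emo.txt '<|HAPPY|> <|SAD|> <|ANGRY|> <|NEUTRAL|> <|FEARFUL|> <|DISGUSTED|> <|SURPRISED|>'` prints nothing and returns 0 when every label is in the list.
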
`Command`
```shell
# generate train.jsonl from train_wav.scp, train_text.txt, train_text_language.txt, train_emo.txt, train_event.txt
sensevoice2jsonl \
++scp_file_list='["../../../data/list/train_wav.scp", "../../../data/list/train_text.txt", "../../../data/list/train_text_language.txt", "../../../data/list/train_emo.txt", "../../../data/list/train_event.txt"]' \
++data_type_list='["source", "target", "text_language", "emo_target", "event_target"]' \
++jsonl_file_out="../../../data/list/train.jsonl"
```

If `train_text_language.txt`, `train_emo.txt`, and `train_event.txt` are not available, the language, emotion, and event labels are predicted automatically with the `SenseVoice` model.
```shell
# generate train.jsonl from train_wav.scp and train_text.txt
sensevoice2jsonl \
++scp_file_list='["../../../data/list/train_wav.scp", "../../../data/list/train_text.txt"]' \
++data_type_list='["source", "target"]' \
++jsonl_file_out="../../../data/list/train.jsonl"
```

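The two invocations above differ only in whether the three label files are passed. As a sketch, the hypothetical helper below (`build_sensevoice2jsonl_cmd` is not part of FunASR) prints the matching command for a list directory, using the file names from the commands above. It treats the label files as all-or-nothing, which is an assumption: the source does not say what happens when only some of them exist.

```shell
# Hypothetical helper: print the sensevoice2jsonl command for a list directory.
# The label files are included only when all three are present (assumption).
build_sensevoice2jsonl_cmd() {
    dir="$1"
    files="\"$dir/train_wav.scp\", \"$dir/train_text.txt\""
    types='"source", "target"'
    if [ -f "$dir/train_text_language.txt" ] &&
       [ -f "$dir/train_emo.txt" ] &&
       [ -f "$dir/train_event.txt" ]; then
        files="$files, \"$dir/train_text_language.txt\", \"$dir/train_emo.txt\", \"$dir/train_event.txt\""
        types="$types, \"text_language\", \"emo_target\", \"event_target\""
    fi
    # Print rather than execute, so the command can be reviewed first.
    printf "sensevoice2jsonl ++scp_file_list='[%s]' ++data_type_list='[%s]' ++jsonl_file_out=\"%s/train.jsonl\"\n" \
        "$files" "$types" "$dir"
}
```

Calling `build_sensevoice2jsonl_cmd ../../../data/list` prints a single-line command that can be inspected and then pasted into the shell.
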
### Finetune

Set `train_tool` in `finetune.sh` to the absolute path of `funasr/bin/train_ds.py` in the FunASR installation directory you set up earlier.

README_zh.md

+91
@@ -254,6 +254,97 @@ pip3 install -e ./
```
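For a complete example, see `data/train_example.jsonl`.

Field description:
- `key`: unique ID of the data item
- `source`: path to the audio file
- `source_len`: number of fbank frames in the audio file
- `target`: transcription of the audio file
- `target_len`: length of the transcription
- `text_language`: language label of the audio file
- `emo_target`: emotion label of the audio file
- `event_target`: event label of the audio file
- `with_or_wo_itn`: whether the transcription includes punctuation and inverse text normalization
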
This data file can be generated with the `sensevoice2jsonl` command from `train_wav.scp`, `train_text.txt`, `train_text_language.txt`, `train_emo.txt`, and `train_event.txt`. Prepare these files as follows:

`train_text.txt`

The left column is the unique ID, which must correspond one-to-one with the IDs in `train_wav.scp`.
The right column is the transcription of the audio file. The format is as follows:

```bash
BAC009S0764W0121 甚至出现交易几乎停滞的情况
BAC009S0916W0489 湖北一公司以员工名义贷款数十员工负债千万
asr_example_cn_en 所有只要处理 data 不管你是做 machine learning 做 deep learning 做 data analytics 做 data science 也好 scientist 也好通通都要都做的基本功啊那 again 先先对有一些>也许对
ID0012W0014 he tried to think how it could be
```

`train_wav.scp`

The left column is the unique ID, which must correspond one-to-one with the IDs in `train_text.txt`.
The right column is the path to the audio file. The format is as follows:

```bash
BAC009S0764W0121 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav
BAC009S0916W0489 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0916W0489.wav
asr_example_cn_en https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_cn_en.wav
ID0012W0014 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_en.wav
```

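Since the IDs in `train_text.txt` and `train_wav.scp` have to line up one-to-one, it can help to compare the two ID columns before generating the JSONL. The following is a minimal sketch, assuming whitespace-separated columns with the ID first. The function name `diff_ids` is hypothetical.

```shell
# Hypothetical helper: list IDs that appear in only one of two "ID value" files.
# Prints nothing and returns 0 when both files contain the same set of IDs.
diff_ids() {
    ids_a=$(mktemp) && ids_b=$(mktemp) || return 2
    # Take the first column of each non-blank line and sort it for comm.
    awk 'NF { print $1 }' "$1" | LC_ALL=C sort -u > "$ids_a"
    awk 'NF { print $1 }' "$2" | LC_ALL=C sort -u > "$ids_b"
    # comm -3 hides IDs common to both; IDs only in the second file are tab-indented.
    only=$(LC_ALL=C comm -3 "$ids_a" "$ids_b")
    rm -f "$ids_a" "$ids_b"
    [ -z "$only" ] && return 0
    printf '%s\n' "$only"
    return 1
}
```

`diff_ids train_text.txt train_wav.scp` prints unindented IDs that are missing from `train_wav.scp` and tab-indented IDs that are missing from `train_text.txt`. Because of `sort -u`, duplicate IDs within one file are not detected by this sketch.
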
`train_text_language.txt`

The left column is the unique ID, which must correspond one-to-one with the IDs in `train_wav.scp`.
The right column is the language label of the audio file. Supported labels are `<|zh|>`, `<|en|>`, `<|yue|>`, `<|ja|>`, and `<|ko|>`. The format is as follows:

```bash
BAC009S0764W0121 <|zh|>
BAC009S0916W0489 <|zh|>
asr_example_cn_en <|zh|>
ID0012W0014 <|en|>
```

`train_emo.txt`

The left column is the unique ID, which must correspond one-to-one with the IDs in `train_wav.scp`.
The right column is the emotion label of the audio file. Supported labels are `<|HAPPY|>`, `<|SAD|>`, `<|ANGRY|>`, `<|NEUTRAL|>`, `<|FEARFUL|>`, `<|DISGUSTED|>`, and `<|SURPRISED|>`. The format is as follows:

```bash
BAC009S0764W0121 <|NEUTRAL|>
BAC009S0916W0489 <|NEUTRAL|>
asr_example_cn_en <|NEUTRAL|>
ID0012W0014 <|NEUTRAL|>
```

`train_event.txt`

The left column is the unique ID, which must correspond one-to-one with the IDs in `train_wav.scp`.
The right column is the event label of the audio file. Supported labels are `<|BGM|>`, `<|Speech|>`, `<|Applause|>`, `<|Laughter|>`, `<|Cry|>`, `<|Sneeze|>`, `<|Breath|>`, and `<|Cough|>`. The format is as follows:

```bash
BAC009S0764W0121 <|Speech|>
BAC009S0916W0489 <|Speech|>
asr_example_cn_en <|Speech|>
ID0012W0014 <|Speech|>
```

`Command`
```shell
# generate train.jsonl from train_wav.scp, train_text.txt, train_text_language.txt, train_emo.txt, train_event.txt
sensevoice2jsonl \
++scp_file_list='["../../../data/list/train_wav.scp", "../../../data/list/train_text.txt", "../../../data/list/train_text_language.txt", "../../../data/list/train_emo.txt", "../../../data/list/train_event.txt"]' \
++data_type_list='["source", "target", "text_language", "emo_target", "event_target"]' \
++jsonl_file_out="../../../data/list/train.jsonl"
```

If `train_text_language.txt`, `train_emo.txt`, and `train_event.txt` are not available, the language, emotion, and event labels are generated automatically with the `SenseVoice` model.
```shell
# generate train.jsonl from train_wav.scp and train_text.txt
sensevoice2jsonl \
++scp_file_list='["../../../data/list/train_wav.scp", "../../../data/list/train_text.txt"]' \
++data_type_list='["source", "target"]' \
++jsonl_file_out="../../../data/list/train.jsonl"
```

### Finetune

Set `train_tool` in `finetune.sh` to the absolute path of `funasr/bin/train_ds.py` in the FunASR installation directory you set up earlier.
