text = rich_transcription_postprocess(res[0]["text"])
print(text)
```
The funasr version integrates a VAD (Voice Activity Detection) model and supports audio input of any duration; `batch_size_s` is specified in seconds.
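Conceptually, the VAD model first cuts the input into speech segments, and any segment longer than `max_single_segment_time` is split further before recognition. A minimal sketch of that capping step in pure Python (the `split_segments` helper is hypothetical, not FunASR's actual implementation):

```python
# Sketch only: cap VAD speech segments at max_single_segment_time (ms).
# split_segments is a hypothetical helper, not part of FunASR.

def split_segments(segments, max_single_segment_time=30000):
    """Split any (start, end) speech segment longer than the cap, in ms."""
    out = []
    for start, end in segments:
        while end - start > max_single_segment_time:
            out.append((start, start + max_single_segment_time))
            start += max_single_segment_time
        out.append((start, end))
    return out

# A single 70 s speech region is cut into 30 s + 30 s + 10 s pieces.
print(split_segments([(0, 70000)]))
# → [(0, 30000), (30000, 60000), (60000, 70000)]
```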
Parameter Descriptions:
- `model_dir`: The name of the model, or the model's path on the local disk.
- `max_single_segment_time`: The maximum length of audio segments that the `vad_model` can cut, in milliseconds (ms).
- `use_itn`: Whether the output should include punctuation and inverse text normalization.
- `batch_size_s`: A dynamic batch size: the total duration of the audio in the batch, in seconds (s).
- `merge_vad`: Whether to concatenate short audio fragments cut by the VAD model; the merged length is `merge_length_s`, in seconds (s).
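To make the `batch_size_s` and `merge_vad`/`merge_length_s` parameters concrete, here is a pure-Python sketch of how duration-based batching and fragment merging could work (hypothetical helpers; FunASR's internals differ):

```python
# Sketch only: illustrate merge_vad/merge_length_s and batch_size_s semantics.
# These helpers are hypothetical, not FunASR's actual code.

def merge_short_segments(durations_s, merge_length_s=15):
    """Concatenate consecutive short VAD fragments up to merge_length_s seconds."""
    merged, acc = [], 0.0
    for d in durations_s:
        if acc and acc + d > merge_length_s:
            merged.append(acc)
            acc = 0.0
        acc += d
    if acc:
        merged.append(acc)
    return merged

def make_dynamic_batches(durations_s, batch_size_s=60):
    """Group segments so each batch's total audio duration <= batch_size_s seconds."""
    batches, cur, total = [], [], 0.0
    for d in durations_s:
        if cur and total + d > batch_size_s:
            batches.append(cur)
            cur, total = [], 0.0
        cur.append(d)
        total += d
    if cur:
        batches.append(cur)
    return batches

durs = [2.0, 3.0, 12.0, 40.0, 25.0]
print(merge_short_segments(durs))      # short fragments merged up to 15 s
print(make_dynamic_batches(durs, 60))  # batches capped at 60 s of total audio
```

The point of a duration-based batch size is that segments vary widely in length after VAD, so counting seconds rather than items keeps GPU memory use roughly constant per batch.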
If all inputs are short audios (<30s) and batch inference is needed to speed up inference, the VAD model can be removed and `batch_size` set accordingly.
```python
model = AutoModel(model=model_dir, trust_remote_code=True, device="cuda:0")
```