
Docker containerized deployable API endpoints for the SeamlessM4Tv2 model generating text-to-speech and speech-to-speech translation audio files.


axs03/SeamlessM4Tv2-API

Latest commit

ab3d666 · Mar 17, 2025


SeamlessM4Tv2-API

Custom API endpoints for generating text-to-speech and speech-to-speech translation audio files, ready for deployment.

SeamlessM4T-v2 is a collection of models designed to provide high quality translation, allowing people from different linguistic communities to communicate effortlessly through speech and text.

Cloning and Installation

Begin by cloning the repository onto your host machine using:

git clone https://github.com/as9219/SeamlessM4Tv2-API.git

Open your preferred IDE and make sure you have the latest version of Python installed. Next, install all the requirements so the project can run locally:

pip install -r requirements.txt


Containerizing Using Docker

For containerization, download the latest version of Docker and install it on your machine.
Make sure Docker is running. You can check whether Docker is running by using:

macOS

ps -ef | grep docker

Windows

docker info
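The same check can be scripted so that a build script stops early when Docker is unavailable. The following is a minimal sketch, not part of the repository: the function name `docker_is_running` is hypothetical, and it assumes that `docker info` exits with a non-zero status when the daemon cannot be reached.

```python
import shutil
import subprocess


def docker_is_running(executable: str = "docker", timeout: float = 10.0) -> bool:
    """Return True if `docker info` succeeds (hypothetical helper, not from the repo)."""
    # If the CLI is not on PATH, Docker is not installed or not configured.
    if shutil.which(executable) is None:
        return False
    try:
        result = subprocess.run(
            [executable, "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Assumption: a non-zero exit status means the daemon could not be reached.
    return result.returncode == 0
```

Calling `docker_is_running()` before `docker build` gives a simple yes/no answer on macOS, Linux, and Windows.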


Now, we are ready to build our first image version! Open a terminal window in the project directory and use the following command (Docker image names must be lowercase):

docker build -t seamlessapi:v1.0 .

Docker will now build the source code into an image. This process takes roughly 12 minutes, depending on your machine.
Once the image is built, we can proceed to creating and starting the container using:

docker run [OPTIONS] -p 8080:8080 --name seamlessapi_container seamlessapi:v1.0

The following options are available when running the container:

[OPTIONS]       About
-d              Runs in detached mode; logs are not displayed in the terminal
--privileged    Runs the container with heightened privileges
--gpus all      Use if your host machine has GPUs available for the container to use

Docker creates and starts the container, and our endpoints are now ready for querying!
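The server may need some time after `docker run` before it accepts requests, so a client script can wait for the published port first. The sketch below is an illustration, not part of the repository: `wait_for_port` is a hypothetical helper, and it assumes the `-p 8080:8080` mapping used above. An open port does not guarantee that the model has finished loading, so the first request may still be slow or fail.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8080,
                  timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port(timeout=120)` returns `True` as soon as port 8080 accepts a connection and `False` if it never does within two minutes.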

Querying the Endpoints

The API exposes the following endpoints:

  • T2S (text-to-speech), served at /t2s
  • S2S (speech-to-speech), served at /s2s

Query Commands - macOS / Linux

T2S

curl -X POST -H "Content-Type: application/json" -d '{"text": "Hello World, I am making a text to speech curl command!", "src_lang": "eng", "tgt_lang": "fra"}' http://localhost:8080/t2s --output path/to/outputdir/output.wav
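The T2S request can also be sent from Python on any platform. This is a minimal sketch using only the standard library. The function names `build_t2s_request` and `text_to_speech` are hypothetical, the base URL assumes the port mapping used above, and the default language codes mirror those listed in the parameter table.

```python
import json
import urllib.request


def build_t2s_request(text, src_lang="eng", tgt_lang="fra",
                      base_url="http://localhost:8080"):
    """Build the JSON POST request for the /t2s endpoint (hypothetical helper)."""
    payload = {"text": text, "src_lang": src_lang, "tgt_lang": tgt_lang}
    return urllib.request.Request(
        url=f"{base_url}/t2s",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def text_to_speech(text, output_path, **kwargs):
    """Send the request and write the returned audio bytes to output_path."""
    request = build_t2s_request(text, **kwargs)
    with urllib.request.urlopen(request) as response, open(output_path, "wb") as out:
        out.write(response.read())
```

With the container running, `text_to_speech("Hello World", "output.wav")` should save the translated audio to `output.wav`.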

S2S

curl -X POST -H "Content-Type: multipart/form-data" -F "file=@path/to/audio_file.wav" -F "tgt_lang=eng" http://localhost:8080/s2s --output path/to/outputdir/output.wav
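The S2S upload can be reproduced in Python as well. The standard library has no multipart helper, so this sketch encodes the form by hand. The names `encode_multipart` and `speech_to_speech` are hypothetical, the form field names `file` and `tgt_lang` are taken from the curl command, and the `application/octet-stream` part type is an assumption about what the server accepts.

```python
import uuid
import urllib.request
from pathlib import Path


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # Assumption: the server accepts a generic binary part type for the audio.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def speech_to_speech(audio_path, output_path, tgt_lang,
                     base_url="http://localhost:8080"):
    """Upload an audio file to /s2s and save the translated audio."""
    audio = Path(audio_path)
    body, content_type = encode_multipart(
        {"tgt_lang": tgt_lang}, "file", audio.name, audio.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/s2s", data=body,
        headers={"Content-Type": content_type}, method="POST")
    with urllib.request.urlopen(request) as response:
        Path(output_path).write_bytes(response.read())
```

With the container running, `speech_to_speech("audio_file.wav", "output.wav", "eng")` mirrors the curl command above.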

Query Commands - Windows

T2S (PowerShell)

Invoke-WebRequest -Uri "http://localhost:8080/t2s" -Method Post `
  -ContentType "application/json" `
  -Body '{"text": "Hello World, I am making a text to speech cli command!", "src_lang": "eng", "tgt_lang": "fra"}' `
  -OutFile "path\to\outputdir\output.wav"

S2S (PowerShell)

Invoke-WebRequest -Uri "http://localhost:8080/s2s" -Method Post `
  -Form @{ 
    file = Get-Item 'path\to\audio_file.wav'
    tgt_lang = 'eng'
  } `
  -OutFile "path\to\outputdir\output.wav"

cURL command parameters

Name        Importance    Description
text        required      Any text you would like to convert
src_lang    optional      Source language code (default: eng)
tgt_lang    optional      Target language code (default: fra)
