Text to speech (WebSocket)

wss://api.gabber.dev/voice/websocket?api-key=<GABBER_API_KEY>

Gabber provides a WebSocket API for real-time text-to-speech synthesis. A single WebSocket connection can handle multiple concurrent text-to-speech sessions. Messages are sent and received as JSON, and audio arrives as base64-encoded binary data. The API lets you send text incrementally, receive audio as it is generated, and manage the lifecycle of each session. Below is an overview of how to use this API:

1. Establish a Connection: Connect to the WebSocket endpoint using your API key.

2. Start a Session: Send a start_session message with a unique <SESSION_ID> of your choosing and specify the desired <GABBER VOICE ID> in the payload to initiate a new text-to-speech session.

3. Push Text: Send one or more push_text messages with the same <SESSION_ID>, each containing partial text to synthesize. You can stream text incrementally as it becomes available.

4. End of Stream (EOS): When all text for the session has been sent, send an eos message with the matching <SESSION_ID> to signal the end of input.

5. Receive Audio: As audio is generated, you will receive audio messages with the same <SESSION_ID>. Each message carries a chunk of base64-encoded audio.

6. Wait for Final Message: After sending an EOS, you will receive the remaining audio for the session. After all audio has been sent, you will receive a final message with the same <SESSION_ID> to confirm that the session has completed.

7. Handle Errors: If an unrecoverable error occurs during a session, you will receive an error message. At that point, consider the session closed; no further messages will be sent for it.
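
The client side of this sequence can be sketched in Python using only the standard library. The helper names (`build_url`, `make_message`, `session_messages`) are illustrative assumptions, not part of Gabber's API; only the endpoint and the JSON message shapes come from this page. The actual send call depends on the WebSocket client you use, so it is left out here.

```python
import json
import uuid
from typing import Iterable, Iterator
from urllib.parse import urlencode

WS_ENDPOINT = "wss://api.gabber.dev/voice/websocket"


def build_url(api_key: str) -> str:
    """Append the API key as the `api-key` query parameter."""
    return f"{WS_ENDPOINT}?{urlencode({'api-key': api_key})}"


def make_message(msg_type: str, session_id: str, payload: dict) -> str:
    """Serialize one client message in the documented JSON envelope."""
    return json.dumps({"type": msg_type, "session": session_id, "payload": payload})


def session_messages(session_id: str, voice_id: str, chunks: Iterable[str]) -> Iterator[str]:
    """Yield start_session, one push_text per chunk, then eos, in order."""
    yield make_message("start_session", session_id, {"voice": voice_id})
    for chunk in chunks:
        yield make_message("push_text", session_id, {"text": chunk})
    yield make_message("eos", session_id, {})


if __name__ == "__main__":
    session_id = str(uuid.uuid4())  # any unique ID of your choosing
    for raw in session_messages(session_id, "<GABBER VOICE ID>", ["Hello, ", "world."]):
        # In a real client, pass `raw` to your WebSocket library's send call.
        print(raw)
```

Because `session_messages` is a generator, the text chunks can come from any iterable, including one that produces text lazily as it becomes available.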

Send Messages Reference

Start a new session

{
  "type": "start_session",
  "session": "<SESSION_ID>",
  "payload": {
    "voice": "<GABBER VOICE ID>"
  }
}

Push Text

{
  "type": "push_text",
  "session": "<SESSION_ID>",
  "payload": {
    "text": "<partial text to synthesize>"
  }
}

End of Stream (EOS)

{
  "type": "eos",
  "session": "<SESSION_ID>",
  "payload": {}
}

Receive Messages Reference

Audio

{
  "type": "audio",
  "session": "<SESSION_ID>",
  "payload": {
    "audio": "<base64 encoded audio>",
    "sample_rate": 24000,
    "channels": 1,
    "audio_format": "pcm16le"
  }
}
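
An audio message can be turned into playable audio roughly as follows. This is a minimal sketch that assumes `pcm16le` means raw 16-bit little-endian PCM samples, which is the sample layout a WAV file stores; the function names `decode_audio` and `pcm_to_wav` are hypothetical helpers, not part of Gabber's API.

```python
import base64
import io
import json
import wave
from typing import Tuple


def decode_audio(raw_message: str) -> Tuple[bytes, int, int]:
    """Return (pcm_bytes, sample_rate, channels) from one audio message."""
    msg = json.loads(raw_message)
    if msg["type"] != "audio":
        raise ValueError(f"expected an audio message, got {msg['type']!r}")
    payload = msg["payload"]
    if payload["audio_format"] != "pcm16le":
        raise ValueError(f"unsupported format {payload['audio_format']!r}")
    pcm = base64.b64decode(payload["audio"])
    return pcm, payload["sample_rate"], payload["channels"]


def pcm_to_wav(pcm: bytes, sample_rate: int, channels: int) -> bytes:
    """Wrap raw 16-bit little-endian PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples are 2 bytes each
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

With the values shown above (24000 Hz, 1 channel, 2 bytes per sample), one second of audio is 48000 bytes of decoded PCM, so a chunk's duration in seconds is `len(pcm) / (2 * channels * sample_rate)`.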

Final

{
  "type": "final",
  "session": "<SESSION_ID>",
  "payload": {}
}

Error

{
  "type": "error",
  "session": "<SESSION_ID>",
  "payload": {
    "message": "<error message>"
  }
}
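
Since one connection can carry several sessions, incoming messages need to be routed by their `session` field. The sketch below shows one possible way to do that; `SessionState` and `SessionRouter` are hypothetical names, and appending audio chunks in arrival order is an assumption about how the chunks fit together rather than something this page states.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SessionState:
    pcm: bytearray = field(default_factory=bytearray)  # decoded audio so far
    done: bool = False  # set once a final or error message arrives
    error: Optional[str] = None  # error text, if the session failed


class SessionRouter:
    """Tracks per-session audio and completion for one shared connection."""

    def __init__(self) -> None:
        self.sessions: Dict[str, SessionState] = {}

    def handle(self, raw_message: str) -> SessionState:
        """Apply one received message to the state of its session."""
        msg = json.loads(raw_message)
        state = self.sessions.setdefault(msg["session"], SessionState())
        if msg["type"] == "audio":
            state.pcm.extend(base64.b64decode(msg["payload"]["audio"]))
        elif msg["type"] == "final":
            state.done = True
        elif msg["type"] == "error":
            state.error = msg["payload"]["message"]
            state.done = True  # no further messages follow an error
        return state
```

A receive loop would call `router.handle(raw)` for every incoming text frame and stop waiting on a session once its `done` flag is set, checking `error` to tell a clean finish from a failure.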