Tool Calling
Tool calling is supported in all types of Realtime sessions. For client SDK usage, tool calling is configured when generating the connection details with startRealtimeSession.
For phone calls, tool calling is configured when creating the phone number attachment.
Responding to Tool Calls In A Realtime Session
For maximum flexibility, the AI in a realtime session does not automatically respond with tool call results.
This behavior was chosen because:
- When a user message triggers multiple tool calls, it's unclear how to inject the results in a response assistant message.
- Some tools can take a long time to complete. In such cases, you might want the conversation to continue in the meantime.
This means it's up to you as the developer to respond to tool calls within the Realtime Session. You can "speak" into the session using the speak API.
Here's a complete tool calling example to show how to work with a tool calling flow of reasonable complexity.
Complete Tool Calling Example
A realtime session with id realtime-session-id is running with a send_message tool call available to it.
check_schedule tool definition object
The user says:
"Send a message to Anne saying hello and John saying call me later."
At this point, the Realtime Session does the following sequence:
1. Triggers a webhook of type tool.calls_started with the following payload
POST Webhook with type tool.calls_started
2. Makes two POST requests in parallel (one for each tool call)
POST Tool call with message to Anne
POST Tool call with message to John
3. Triggers a webhook of type tool.calls_finished
NOTE: This webhook is triggered only after all of the tool calls have finished and returned a response.
POST Webhook with type tool.calls_finished
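The sequence above can be handled by a small backend. The request bodies are not reproduced here, so the sketch below assumes hypothetical shapes: a tool call body with an `arguments` object holding `recipient` and `message`, and a `tool.calls_finished` event whose `payload.tool_calls` list carries each call's `name`, `arguments`, and `response`. The names `handle_send_message`, `summarize_finished_calls`, and the `deliver` callback are illustrative only and not part of any SDK.

```python
from typing import Callable, Optional


def handle_send_message(payload: dict, deliver: Callable[[str, str], None]) -> dict:
    """Handle one send_message tool call POST and return a JSON-serializable result.

    Assumed (hypothetical) body:
    {"name": "send_message", "arguments": {"recipient": "Anne", "message": "hello"}}
    """
    args = payload.get("arguments", {})
    recipient = args.get("recipient")
    message = args.get("message")
    if not recipient or not message:
        # Report the problem in the result instead of raising, so the
        # session still receives a well-formed response for this call.
        return {"ok": False, "error": "recipient and message are required"}
    deliver(recipient, message)  # placeholder for your real delivery mechanism
    return {"ok": True, "recipient": recipient}


def summarize_finished_calls(event: dict) -> Optional[str]:
    """Return text for the assistant to speak, or None if nothing should be said.

    Assumed (hypothetical) event:
    {"type": "tool.calls_finished", "payload": {"tool_calls": [...]}}
    """
    if event.get("type") != "tool.calls_finished":
        return None  # e.g. tool.calls_started needs no spoken reply
    calls = event.get("payload", {}).get("tool_calls", [])
    recipients = [
        call.get("arguments", {}).get("recipient", "someone")
        for call in calls
        if call.get("name") == "send_message" and call.get("response", {}).get("ok")
    ]
    if not recipients:
        return "Sorry, I couldn't send those messages."
    if len(recipients) == 1:
        return f"I sent that message to {recipients[0]}."
    names = ", ".join(recipients[:-1]) + " and " + recipients[-1]
    return f"I sent those messages to {names}."
```

Keeping the summary logic separate from the HTTP layer means the same function works whichever web framework receives the webhook; the returned string is what you would then pass to the speak API.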
Example Backend
In response to step three, you might want to send a response from the "assistant" like "I sent those messages to Anne and John".
Here's an example of how you might do that using the "speak" API.
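A minimal sketch of such a call, using only the Python standard library. The base URL, path, JSON field name (`text`), and bearer-token header are all assumptions made for illustration; substitute the values from the speak API reference. The code only builds the request, so it can be inspected without contacting any server.

```python
import json
import urllib.request

# Placeholder base URL; an assumption, not the real endpoint.
API_BASE = "https://api.example.com"


def build_speak_request(session_id: str, text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking the session to speak `text`."""
    body = json.dumps({"text": text}).encode("utf-8")  # hypothetical body shape
    return urllib.request.Request(
        url=f"{API_BASE}/realtime/{session_id}/speak",  # hypothetical path
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # hypothetical auth scheme
        },
    )


req = build_speak_request(
    "realtime-session-id",
    "I sent those messages to Anne and John.",
    "YOUR_API_KEY",
)
# Once the URL and auth scheme are replaced with real values:
# urllib.request.urlopen(req)
```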