Give Hermes Agent a Voice with ElevenLabs

Hermes Agent ships with no voice by default. This guide adds one with ElevenLabs: Text to Speech for its replies and Speech to Text (Scribe) for transcribing what you say, both set up as provider config in Hermes.

ElevenLabs Developers (@ElevenLabsDevs on X) | 5 min read | 25 Jun 2026

Why give Hermes a voice

Hermes Agent runs in your terminal, in messaging apps, and on your phone, but by default it has no voice. This guide walks you through adding one: ElevenLabs Text to Speech for its replies, and Speech to Text (Scribe) for transcribing what you say. Both are configured as providers in Hermes, so no custom scripts are required.

The end result: you speak, Hermes hears you with Scribe, thinks, and answers back in your chosen ElevenLabs voice.

Setup

Get an API key from the ElevenLabs dashboard and add it to ~/.hermes/.env:

ELEVENLABS_API_KEY=your_key_here
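
Before going further, it can help to confirm the key actually landed in the file. The following is a minimal sketch, assuming ~/.hermes/.env uses plain KEY=value lines; the helper name read_env_value is hypothetical and is not part of Hermes.

```python
from pathlib import Path
from typing import Optional


def read_env_value(env_path: Path, name: str) -> Optional[str]:
    """Return the value assigned to `name` in a KEY=value file, or None."""
    if not env_path.is_file():
        return None
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip('"').strip("'") or None
    return None


if __name__ == "__main__":
    env_file = Path.home() / ".hermes" / ".env"  # path taken from this guide
    key = read_env_value(env_file, "ELEVENLABS_API_KEY")
    if key and key != "your_key_here":
        print("ELEVENLABS_API_KEY is set")
    else:
        print("ELEVENLABS_API_KEY is missing or still the placeholder")
```

The script only reads the file and never prints the key itself.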

If the ElevenLabs dependency is missing, install the premium TTS extra into the Hermes environment:

pip install "hermes-agent[tts-premium]"

Easy setup (let Hermes do it)

Hermes is built to operate your machine, so you can ask it to turn on ElevenLabs Text to Speech and Speech to Text for you. It has built-in skills for this, and the result is usually reliable. Send it a prompt such as:

Set ElevenLabs as the voice mode for both TTS and STT. I have already added the API key to ~/.hermes/.env.

The manual steps below do the same thing — they're worth reading because they show how Hermes configuration works under the hood.

Text to Speech (manual)

Run the setup wizard and pick ElevenLabs at the voice step:

hermes setup

Or edit ~/.hermes/config.yaml directly:

tts:
  provider: "elevenlabs"
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB"  # any voice from your library
    model_id: "eleven_flash_v2_5"     # ~75ms, built for real-time

voice_id selects the voice: choose one from the voice library or use a clone. model_id selects the model: eleven_flash_v2_5 is a good choice for live conversation (~75ms latency), while eleven_multilingual_v2 is a good general-purpose default. Hermes chooses the audio format from the output path.
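
To check a voice_id and model_id pair outside Hermes, you can call the ElevenLabs Text to Speech REST endpoint directly. This is a rough sketch using only the Python standard library. It is not how Hermes itself makes the call, and the helper name build_tts_request and the output filename are hypothetical.

```python
import json
import os
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(api_key: str, voice_id: str, model_id: str, text: str) -> urllib.request.Request:
    """Build a POST request for the ElevenLabs text-to-speech endpoint."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/text-to-speech/{voice_id}",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    api_key = os.environ.get("ELEVENLABS_API_KEY")
    if not api_key:
        print("Set ELEVENLABS_API_KEY in your shell to run this check")
    else:
        # Same voice_id and model_id as the config example in this guide.
        req = build_tts_request(
            api_key, "pNInz6obpgDQGcFmaJgB", "eleven_flash_v2_5", "Hello from Hermes."
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            audio = resp.read()
        # Assumes the API's default output format (MP3) was returned.
        with open("hermes_voice_check.mp3", "wb") as fh:
            fh.write(audio)
        print(f"wrote {len(audio)} bytes to hermes_voice_check.mp3")
```

If the request fails with an HTTP error, the voice_id, model_id, or API key is the likely culprit, which narrows things down before you involve Hermes.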

Restart Hermes after changing config. In the gateway, use:

/restart

In the CLI, exit and relaunch Hermes. Then enable voice output with:

/voice on
/voice tts

Speech to Text (manual)

ElevenLabs Scribe is a built-in Hermes STT provider. You do not need to create a custom transcription script or register a command provider.

Add this to ~/.hermes/config.yaml:

stt:
  enabled: true
  provider: elevenlabs
  elevenlabs:
    model_id: scribe_v2
    language_code: ""        # optional; leave blank for auto-detect
    tag_audio_events: false
    diarize: false

That is enough. Hermes writes incoming audio to a temporary file, sends it to the ElevenLabs /speech-to-text API, and uses the returned transcript. Voice messages on Telegram, Discord, WhatsApp, Slack, and Signal will use Scribe once the gateway has restarted.

To force a language, set language_code, for example:

stt:
  enabled: true
  provider: elevenlabs
  elevenlabs:
    model_id: scribe_v2
    language_code: eng
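
One way to picture the difference between a blank and a forced language_code is to look at the form fields that would go to the /speech-to-text endpoint. The sketch below is an assumption about how such a mapping could work, not Hermes's actual implementation. The function name scribe_form_fields is hypothetical, and the field names simply mirror the config keys shown in this guide.

```python
def scribe_form_fields(stt_elevenlabs: dict) -> dict:
    """Map the stt.elevenlabs config section to multipart form fields (illustrative only)."""
    fields = {"model_id": stt_elevenlabs.get("model_id", "scribe_v2")}

    # A blank language_code is omitted so the service can auto-detect the language.
    language = (stt_elevenlabs.get("language_code") or "").strip()
    if language:
        fields["language_code"] = language

    # Form fields are strings, so booleans are sent as "true" or "false".
    for flag in ("tag_audio_events", "diarize"):
        if flag in stt_elevenlabs:
            fields[flag] = "true" if stt_elevenlabs[flag] else "false"
    return fields


if __name__ == "__main__":
    auto_detect = {"model_id": "scribe_v2", "language_code": "", "diarize": False}
    forced = {"model_id": "scribe_v2", "language_code": "eng"}
    print(scribe_form_fields(auto_detect))
    print(scribe_form_fields(forced))
```

With the blank setting no language_code field is sent at all, while the forced config adds language_code=eng to the request.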

For names, product terms, and libraries that Scribe commonly mishears, check the ElevenLabs Speech to Text docs for the latest prompting and model options supported by the API.

Done

Speak, and Hermes hears you with Scribe, thinks, and answers in your ElevenLabs voice. Change the voice at any time by picking a new voice_id.

This flow was shared by a community member. The Hermes Bible is an unofficial, community-built resource and is not affiliated with Nous Research.